  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Speech segmentation and speaker diarisation for transcription and translation

Sinclair, Mark January 2016
This dissertation outlines work related to Speech Segmentation – segmenting an audio recording into regions of speech and non-speech, and Speaker Diarization – further segmenting those regions into those pertaining to homogeneous speakers. Knowing not only what was said but also who said it and when has many useful applications. As well as providing a richer level of transcription for speech, we will show how such knowledge can improve Automatic Speech Recognition (ASR) system performance and can also benefit downstream Natural Language Processing (NLP) tasks such as machine translation and punctuation restoration. While segmentation and diarization may appear to be relatively simple tasks to describe, in practice we find that they are very challenging and are, in general, ill-defined problems. Therefore, we first provide a formalisation of each of the problems as the sub-division of speech within acoustic space and time. Here, we see that the task can become very difficult when we want to partition this domain into our target classes of speakers, whilst avoiding other classes that reside in the same space, such as phonemes. We present a theoretical framework for describing and discussing the tasks as well as introducing existing state-of-the-art methods and research. Current Speaker Diarization systems are notoriously sensitive to hyper-parameters and lack robustness across datasets. Therefore, we present a method which uses a series of oracle experiments to expose the limitations of current systems and to identify the system components to which these limitations can be attributed. We also demonstrate how Diarization Error Rate (DER), the dominant error metric in the literature, is not a comprehensive or reliable indicator of overall performance or of error propagation to subsequent downstream tasks. These results inform our subsequent research. We find that, as a precursor to Speaker Diarization, the task of Speech Segmentation is a crucial first step in the system chain.
Current methods typically do not account for the inherent structure of spoken discourse. As such, we explored a novel method which exploits an utterance-duration prior in order to better model the segment distribution of speech. We show how this method improves not only segmentation, but also the performance of subsequent speech recognition, machine translation and speaker diarization systems. Typical ASR transcriptions do not include punctuation, and the task of enriching transcriptions with this information is known as ‘punctuation restoration’. The benefit is not only improved readability but also better compatibility with NLP systems that expect sentence-like units, as in conventional machine translation. We show how segmentation and diarization are related tasks that are able to contribute acoustic information that complements existing linguistically based punctuation approaches. There is a growing demand for speech technology applications in the broadcast media domain. This domain presents many new challenges including diverse noise and recording conditions. We show that the capacity of existing GMM-HMM based speech segmentation systems is limited for such scenarios and present a Deep Neural Network (DNN) based method which offers more robust speech segmentation, resulting in improved speech recognition performance for a television broadcast dataset. Ultimately, we are able to show that speech segmentation is an inherently ill-defined problem for which the solution is highly dependent on the downstream task that it is intended for.
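The abstract above names Diarization Error Rate (DER) without defining it. As commonly defined, DER is the sum of missed speech, false-alarm speech and speaker-confusion time divided by the total reference speech time. The following is a minimal frame-level sketch of that definition, not the dissertation's own scoring setup. It assumes fixed-length frames, no overlapping speech, and hypothesis speaker labels that have already been mapped onto the reference labels; the function name and the example sequences are hypothetical.

```python
from typing import Optional, Sequence


def diarization_error_rate(
    reference: Sequence[Optional[str]],
    hypothesis: Sequence[Optional[str]],
) -> float:
    """Frame-level DER sketch; None marks a non-speech frame.

    Assumes hypothesis labels are already mapped to reference speakers
    and that at most one speaker is active per frame.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1
            if hyp is None:
                missed += 1          # speech frame labelled as non-speech
            elif hyp != ref:
                confusion += 1       # speech frame given to the wrong speaker
        elif hyp is not None:
            false_alarm += 1         # non-speech frame labelled as speech

    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


# Hypothetical 10-frame example: one miss, one confusion, one false alarm
# against 7 reference speech frames.
reference = ["A", "A", "A", None, "B", "B", "B", "B", None, None]
hypothesis = ["A", "A", None, None, "B", "A", "B", "B", "B", None]
print(diarization_error_rate(reference, hypothesis))
```

Because the three error types are summed into one number, two systems with the same DER can make very different kinds of mistakes, which is one way to read the abstract's point that DER is not a comprehensive indicator of downstream performance.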
2

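As fases da entrevista de pré-mediação e suas implicações interacionais para a atividade profissional da mediação / The phases of the pre-mediation interview and their interactional implications for the professional activity of mediation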

Sant'Anna, Priscila Fernandes 22 August 2017
This study proposes a mapping of the phases of the communicative activity known as the pre-mediation interview. Working within interactionally based Applied Linguistics, we aim to understand how this professional activity is organized in light of the discourses constructed in it. The legal context and family mediation are highlighted in this research, given the emphasis placed on alternative forms of dispute resolution in Brazil, which culminated in the enactment of the Mediation Law and of the provisions on mediation and conciliation in the Code of Civil Procedure, in force from 2015 and 2016, respectively. In an attempt to think about social life from the point of view of those who generally have no voice in academic research (MOITA LOPES, 2006), we carried out a qualitative and interpretative study (DENZIN; LINCOLN, 2000), in which the discursively constructed identities of the participants guide the analyses. The mediation data presented in this research are real talk-in-interaction data, generated in a Family Court at the courthouse of a city in the interior of the state of Rio de Janeiro in 2008. The data were audio-recorded and then transcribed according to the Jefferson transcription model (LODER, 2008).
Mapping the pre-mediation interviews revealed five phases of the activity type under analysis: "Clarifying the Rules of the Game", "Getting to Know the Participants", "Recounting the Conflicts", "Understanding the Process" and "Agreeing on the Next Step". Each phase has distinct characteristics made up of specific actions by the participants. Understanding these phases can contribute to the mediator's profession and, more broadly, to the users of the Brazilian judicial system, by supporting a professional practice that understands mediation as a pedagogical space for conflict prevention, so that the participants in this speech activity can become protagonists of their own decisions and thereby deal with the conflicts they will face throughout their social and family lives.
3

Формирование фонетических навыков английского языка у китайских студентов направления «Лингвистика» : магистерская диссертация / Development of Phonetic Skills in English Language for Chinese Students Majoring in Linguistics

Бай, Я., Bai, Y. January 2023
This dissertation examines the development of English phonetic skills among Chinese students majoring in Linguistics. The first chapter covers the theoretical foundations of developing English phonetic skills in Chinese students. It addresses the problems students from China face in adapting to a new educational environment, examines the national and psychological characteristics of Chinese students, analyzes the main types of speech activity, and considers phonetics as one of the most important aspects of language. The second chapter covers the practical issues of developing English phonetic skills in Chinese students.
It examines the phonetic systems of English and Chinese, reviews the methods of teaching phonetics in educational institutions in China, presents a set of exercises for developing English phonetic skills in Chinese Linguistics students, which was tested on first-year students, and offers recommendations for overcoming the difficulties of teaching English phonetics to Chinese students. Keywords: phonetics, Chinese students, English language, set of exercises, speech activity.
4

Robust Speech Activity Detection and Direction of Arrival Using Convolutional Neural Networks

Näslund, Anton, Jeansson, Charlie January 2020
Social robots are becoming more and more common in our everyday lives. In the field of conversational robotics, development is moving towards socially engaging robots capable of humanlike conversation. This project looked into one of the technical aspects of speech recognition, namely speech activity detection (SAD). The presented solution uses a convolutional neural network (CNN) based system to detect speech in a forward azimuth area. The project used a dataset from FestVox called CMU ARCTIC, complemented with a number of recorded noises. A library called Pyroomacoustics was used to simulate a real-world setup in order to create a robust system. A simplified version was built; this model only detected speech activity and reached an accuracy of 95%. The finished model reached an accuracy of 93%. It was compared with a similar project, the WebRTC voice activity detection (VAD) algorithm combined with beamforming, as no previously published solutions to our problem were found. In both cases our solution achieved higher accuracy than WebRTC did on our dataset.
Bachelor's degree project in electrical engineering 2020, KTH, Stockholm
5

[en] THE CONVERSATIONAL STYLES OF THE AERIAL REPORTER ON THE CONTEXT OF RADIOS IN THE CITY OF RIO DE JANEIRO / [pt] OS ESTILOS CONVERSACIONAIS DO REPÓRTER AÉREO NO CONTEXTO DE RÁDIOS NA CIDADE DO RIO DE JANEIRO

MARCO AURELIO SILVA SOUZA 13 November 2013
[en] The present study focuses on the conversational styles of aerial reporters on radio stations in the city of Rio de Janeiro during the real-time broadcast of news about traffic flow in the city. The study aims to evaluate how aerial reporters shift their styles depending on the audience and on the situational context of traffic in a large urban center.
The theoretical perspective is that of interactional sociolinguistics, in interface with accommodation theory and audience design, in micro and macro contexts. The discussion of the concept of conversational style is central to the theories brought together here. Other important concepts are speech activity, evaluation, framing, alignment, contextualization cues and discourse strategies. The research is qualitative and interpretative in nature. The analysis is based on data generated by recording news about traffic flow from four aerial reporters on six FM radio stations in Rio de Janeiro. The data were transcribed according to conversation analysis conventions and analyzed in the course of the aerial reporters' talk-in-interaction with the radio announcers, with a focus on the audience in traffic. The study shows that the same aerial reporters vary their conversational styles across radio stations, with different types of discourse, ranging from an informative conversational style of low interpersonal involvement on some stations to an informative conversational style of high interpersonal involvement on others.
6

Voice Activity Detection

Ent, Petr January 2009
This thesis deals with the use of support vector machines in voice activity detection. The first part examines various kinds of features, their extraction and processing, and identifies the combination that gives the best results. The second part presents the voice activity detection system itself and the tuning of its parameters. Finally, the results are compared with two other systems based on different principles. The ERT broadcast news database was used for testing and tuning. The comparison between systems was then carried out on the database from the NIST06 Rich Test Evaluations.
