  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
451

Sintese e reconhecimento da fala humana / Synthesis and recognition of human speech

Stolfi, Rumiko Oishi 31 October 2006 (has links)
Advisors: Fabio Violaro, Anamaria Gomide / Dissertation (professional master's) - Universidade Estadual de Campinas, Instituto de Computação / Abstract: The goal of this dissertation is to review the main concepts relating to the synthesis, processing, and recognition of human speech by computer. These technologies have many applications, which have increased substantially in recent years with the spread of portable communication equipment (mobile phones, laptops, palmtops) and universal access to the Internet. The first part of this work is a review of fundamental concepts of signal processing, including the Fourier transform, power spectrum and spectrogram, filters, signal digitization, and Nyquist's theorem. The second part describes the main characteristics of human speech, the mechanisms involved in its production and perception, and the concept of the phone (the linguistic unit of sound). In this part we also briefly describe the main techniques used for orthographic-phonetic transcription, for speech synthesis from a phonetic description, and for the recognition of natural speech. The third part describes a practical project we developed to consolidate the knowledge acquired in our Master's studies: a program that generates Japanese popular songs from a textual description of the lyrics and music, using the concatenative synthesis method. At the end of this dissertation, we list some available software products (free and commercial) for speech synthesis and speech recognition. / Master's / Computer Engineering / Master in Computer Science
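As a rough illustration of the concatenative synthesis method this abstract mentions, the sketch below joins pre-recorded waveform units with a short crossfade. It is a minimal assumed example, not the thesis's actual program; the unit inventory, sample rate, and crossfade length are invented, and a real synthesizer would also adjust pitch and duration.

```python
import numpy as np

def concatenate_units(units, sample_rate=22050, crossfade_ms=10):
    """Join pre-recorded waveform units with a short linear crossfade.

    `units` is a list of 1-D numpy arrays (e.g. diphone or syllable
    recordings). This sketch only smooths the joins; prosody matching
    (pitch/duration) is deliberately left out.
    """
    n_fade = int(sample_rate * crossfade_ms / 1000)
    out = units[0].astype(np.float64)
    for unit in units[1:]:
        unit = unit.astype(np.float64)
        fade = np.linspace(0.0, 1.0, n_fade)
        # Overlap-add the tail of the output with the head of the next unit.
        out[-n_fade:] = out[-n_fade:] * (1.0 - fade) + unit[:n_fade] * fade
        out = np.concatenate([out, unit[n_fade:]])
    return out

# Toy usage: two 100 ms sine "units" joined into one waveform.
t = np.arange(0, 0.1, 1 / 22050)
units = [np.sin(2 * np.pi * 220 * t), np.sin(2 * np.pi * 440 * t)]
waveform = concatenate_units(units)
```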
452

Human-AI Teaming for Dynamic Interpersonal Skill Training

Ogletree, Xavian Alexander 26 May 2021 (has links)
No description available.
453

Zvyšování účinnosti strojového rozpoznávání řeči / Enhancing the effectiveness of automatic speech recognition

Zelinka, Petr January 2012 (has links)
This work identifies the causes of the unsatisfactory reliability of contemporary automatic speech recognition systems when deployed in demanding conditions. The impact of the individual sources of performance degradation is documented, and a list of known methods for identifying them from the recognized signal is given. An overview of the usual methods for suppressing the impact of these disruptive influences on recognition performance is provided. The essential contribution of the work is the formulation of new approaches to constructing acoustic models of noisy speech and nonstationary noise that allow high recognition performance in challenging conditions. The viability of the proposed methods is verified on an isolated-word speech recognizer using a several-hour-long recording of real operating-room background noise made at the Uniklinikum Marburg in Germany. This work is the first to identify the impact of changes in the speaker's vocal effort on the reliability of automatic speech recognition across the full vocal-effort range (from whispering to shouting). A new concept of a speech recognizer immune to changes in vocal effort is proposed. For the purposes of research on changes in vocal effort, a new speech database, BUT-VE1, was created.
454

Automatic Speech Recognition System for Somali in the interest of reducing Maternal Morbidity and Mortality.

Laryea, Joycelyn, Jayasundara, Nipunika January 2020 (has links)
Developing an Automatic Speech Recognition (ASR) system for the Somali language, though not novel, has not been actively explored; no model for conversational Somali speech has yet succeeded, and related work is not accessible as open source. The scarcity of digital data is what labels Somali a low-resource language and poses the greatest impediment to the development of an ASR system for it. The incentive to develop an ASR system for Somali is to contribute to reducing the Maternal Mortality Rate (MMR) in Somalia. Researchers collect interview audio about maternal health and behaviour in the Somali language; to engage the relevant stakeholders and bring about the needed change, these recordings must be transcribed into text, which is an important step towards translation into any other language. This work investigates the ASR resources available for Somali and develops a prototype ASR system to convert Somali audio into Somali text. To achieve this, we first identified the available open-source speech recognition systems and selected the DeepSpeech engine for the prototype implementation. With only three hours of audio data, the transcription accuracy falls short of what is required and cannot be deployed for use. We attribute this to insufficient training data and estimate that a meaningful Somali ASR effort would require about 1,200 hours of audio to train the DeepSpeech engine.
455

Speech to Text for Swedish using KALDI / Tal till text, utvecklandet av en svensk taligenkänningsmodell i KALDI

Kullmann, Emelie January 2016 (has links)
The field of speech recognition has during the last decade left the research stage and found its way into the public market. Most computers and mobile phones sold today support dictation and transcription in a number of chosen languages; Swedish is often not one of them. In this thesis, which was carried out on behalf of Swedish Radio (Sveriges Radio), an Automatic Speech Recognition model for Swedish is trained and its performance evaluated. The model is built using the open-source toolkit Kaldi. Two approaches to training the acoustic part of the model are investigated: first, using Hidden Markov Models and Gaussian Mixture Models, and second, using Hidden Markov Models and Deep Neural Networks. The latter approach, using deep neural networks, achieves the better performance in terms of Word Error Rate.
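Word Error Rate, the metric used above to compare the two acoustic-modeling approaches, is the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the reference length. The following is a small illustrative sketch of that computation, not the evaluation code used in the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard Levenshtein dynamic program over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution and one deletion against a 5-word reference -> 0.4
print(word_error_rate("tal till text på svenska", "tal till test svenska"))
```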
456

Query By Example Keyword Spotting

Sunde Valfridsson, Jonas January 2021 (has links)
Voice user interfaces have been growing in popularity, and with them an interest in open-vocabulary keyword spotting. In this thesis we focus on one particular approach to open-vocabulary keyword spotting: query-by-example keyword spotting. Three types of query-by-example approaches are described and evaluated: sequence distances, speech-to-phonemes, and deep distance learning. Evaluation is done on a series of custom tasks designed to measure a variety of aspects. The Google Speech Commands benchmark is used for evaluation as well, to make the results comparable with existing work. From the results, the deep distance learning approach seems the most promising in most environments, except when memory is very constrained, in which case sequence distances might be considered. The speech-to-phonemes methods fall short in the usability evaluation.
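Sequence-distance approaches to query-by-example spotting are commonly realized with dynamic time warping over frame-level acoustic features. The sketch below assumes DTW over MFCC-like frames and random toy data; it is an illustration of the general technique, not the thesis's implementation or feature choice.

```python
import numpy as np

def dtw_distance(query: np.ndarray, utterance: np.ndarray) -> float:
    """Dynamic time warping cost between two feature sequences.

    Both inputs are (frames, feature_dim) arrays, e.g. MFCC frames of the
    spoken query example and of a window of the search utterance. A lower
    cost suggests the keyword is present in that window.
    """
    n, m = len(query), len(utterance)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(query[i - 1] - utterance[j - 1])  # frame distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)  # length-normalised alignment cost

# Toy usage with random "MFCC" frames.
rng = np.random.default_rng(0)
q, u = rng.normal(size=(20, 13)), rng.normal(size=(50, 13))
print(dtw_distance(q, u))
```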
457

Automatic Speech Recognition in Somali

Gabriel, Naveen January 2020 (has links)
The field of speech recognition has during the last decade left the research stage and found its way into the public market, and today speech recognition software is ubiquitous around us. An automatic speech recognizer understands human speech and represents it as text. Most current speech recognition software employs variants of deep neural networks. Before the deep learning era, the hybrid of hidden Markov model and Gaussian mixture model (HMM-GMM) was a popular statistical model for speech recognition. In this thesis, an automatic speech recognizer using HMM-GMM was trained on Somali data consisting of voice recordings and their transcriptions. HMM-GMM is a hybrid system whose framework is composed of an acoustic model and a language model. The acoustic model represents the time-variant aspect of the speech signal, and the language model determines how probable the observed sequence of words is. The thesis begins with background on speech recognition, and the literature survey covers some of the work that has been done in this field. The thesis evaluates how different language models and discounting methods affect the performance of speech recognition systems. Log scores were also calculated for the top 5 predicted sentences, along with confidence measures of the predicted sentences. The model was trained on 4.5 hours of voice data and its corresponding transcription and evaluated on 3 minutes of test data. The performance of the trained model on the test set was good, given that the data was devoid of background noise and lacked variability. Performance is measured using word error rate (WER) and sentence error rate (SER), and the implemented model is also compared with the results of other research. The thesis further discusses why the log and confidence scores of a sentence might not be a good way to measure the performance of the resulting model, the shortcomings of the HMM-GMM approach, how the existing model could be improved, and alternative ways to solve the problem.
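The language-model half of the HMM-GMM framework described above assigns a probability to a candidate word sequence, typically with an n-gram factorization P(w_1…w_T) ≈ ∏ P(w_t | w_{t-1}). The sketch below is a hedged illustration of a bigram model with add-k smoothing standing in for the discounting methods the thesis evaluates; the toy corpus and counts are invented and are not taken from the Somali data.

```python
from collections import Counter
import math

# Invented toy corpus; a real model would be trained on the transcriptions.
corpus = "the patient needs a doctor the patient needs rest".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab = len(unigrams)

def bigram_logprob(sentence: str, k: float = 1.0) -> float:
    """Add-k smoothed log P(sentence) under a bigram language model."""
    words = sentence.split()
    logp = 0.0
    for prev, word in zip(words, words[1:]):
        num = bigrams[(prev, word)] + k          # smoothed bigram count
        den = unigrams[prev] + k * vocab         # smoothed history count
        logp += math.log(num / den)
    return logp

print(bigram_logprob("the patient needs rest"))
```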
458

Research on dialogue-based CALL integrating tutoring and implicit learning: the design of an automatic joining-in type robot assisted language learning / 個別教示学習と潜在学習手法とを統合するCALLに関する研究

AlBara Jamal Khalifa 20 September 2019 (has links)
This dissertation presents the design of a novel joining-in-type humanoid robot-assisted language learning system that uses two robots to conduct a goal-oriented conversation with the human learner to practice English as a second language. The system uses implicit learning as the main learning style to teach the usage of a specific expression form. A mix of tutoring and peer learning is implemented in the course of a three-party conversation. This learning style enables the learner to gain linguistic knowledge, and at the same time it improves the performance of the speech recognition engine. / Doctor of Philosophy in Engineering / Doshisha University
459

Adaptive Voice Control System using AI

Steen, Jasmine, Wilroth, Markus January 2021 (has links)
Controlling external actions with the voice is something humans have tried to do for a long time. There are many ways to implement a voice control system, but many of them require an internet connection, which leaves the application area limited; commercially available voice controllers have also stagnated because of the cost of developing and maintaining them. In this project an artifact was created to work as an easy-to-use, generic voice-controller tool that allows the user to easily create different voice commands that can be implemented in many different applications and platforms. The user needs no understanding of, or experience with, voice control in order to use and implement the voice controller.
460

Oral Proficiency Assessment of French Using an Elicited Imitation Test and Automatic Speech Recognition

Millard, Benjamin J. 27 June 2011 (has links) (PDF)
Testing oral proficiency is an important but often neglected part of the foreign language classroom. Currently accepted methods of testing oral proficiency are time-consuming and expensive. Some work has been done to test and implement new assessment methods, but it has focused primarily on English or Spanish (Graham et al. 2008). In this thesis, I demonstrate that the processes established for English and Spanish elicited imitation (EI) testing are relevant to French EI testing. First, I document the development, implementation, and evaluation of an EI test to assess French oral proficiency. I also detail the use of automatic speech recognition to score French EI items. Last, I substantiate with statistical analyses that carefully engineered, automatically scored French EI items correlate to a high degree with French OPI scores.
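The correlation claim above is usually quantified with a Pearson coefficient between the automatic EI scores and the human-rated OPI scores. The following is a minimal sketch of that computation with invented score pairs; the numbers are placeholders, not data from the thesis.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between paired score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented example: automatically scored EI items vs. human OPI ratings.
ei_scores = [0.42, 0.55, 0.61, 0.70, 0.88]
opi_scores = [1.0, 1.5, 2.0, 2.0, 3.0]
print(pearson_r(ei_scores, opi_scores))
```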
