661 |
A Design of Recognition Rate Improving Strategy for Japanese Speech Recognition System
Lin, Cheng-Hung, 24 August 2010
This thesis investigates recognition rate improvement strategies for a Japanese speech recognition system. Both training data development and a consonant correction scheme are studied. For training data development, a database of 995 two-syllable Japanese words is established by phonetically balanced sieving. Furthermore, feature models for the 188 common Japanese mono-syllables are derived through a mixed-position training scheme to increase the recognition rate. For consonant correction, a sub-syllable model is developed to enhance consonant recognition accuracy and hence further improve the overall correct rate for whole Japanese phrases. Experimental results indicate that the average correct rate of a Japanese phrase recognition system with 34 thousand phrases improves from 86.91% to 92.38%.
|
662 |
A Study On Bandpassed Speech From The Point Of Intelligibility
Ganesh, Murthy C N S, 10 1900
Speech has been a subject of interest for a very long time. Even with so much advancement in processing techniques and in the understanding of the source of speech, it is, even today, rather difficult to generate speech in the laboratory in all its aspects. A simple aspect, such as how speech retains its intelligibility even when it is distorted or bandpassed, is not really understood. This thesis deals with one small feature of speech, viz., that the intelligibility of speech is retained even when it is bandpassed with a minimum bandwidth of around 1 kHz located anywhere in the speech spectrum of 0-4 kHz.
Several experiments have been conducted by earlier workers by passing speech through various distorters such as differentiators, integrators and infinite peak clippers, and it was found that intelligibility is retained to a very large extent in the distorted speech. The integrator and the differentiator essentially remove a certain portion of the spectrum. It is therefore thought that the intelligibility of speech is spread over the entire speech spectrum, and that the intelligibility of speech may not be impaired even when it is bandpassed with a minimum bandwidth, with the band located anywhere in the speech spectrum. To test this idea and establish this feature if it exists, preliminary experiments were conducted by passing speech through different filters, and the conjecture was found to be on the right line.
To carry out systematic experiments on this, an experimental setup has been designed and fabricated, consisting of a microprocessor-controlled speech recording, storage and playback system. A personal computer is coupled to the microprocessor system to enable storage and processing of the data. Thirty persons drawn from different walks of life, such as teachers, mechanics and students, were involved in providing the speech samples and in recognising the content of the processed speech. Even though sentences like 'This is devices lab' are used to ascertain the effect of bandwidth on intelligibility, vowels are used as the speech samples for the purpose of analysis.
The experiments essentially consist of recording words and sentences spoken by the 30 participants; these recorded speech samples are passed through filters with different bandwidths and centre frequencies. The filtered output is played back to the various listeners and observations regarding the intelligibility of the speech are noted. The listeners have no prior information about the content of the speech. It has been found that in almost all (95%) cases the messages or words are intelligible to most of the listeners when the bandwidth of the filter is about 1 kHz, independent of the location of the pass band in the spectrum of 0-4 kHz. To understand how this feature of speech arises, spectra of vowels spoken by the 30 people have been computed using FFT algorithms on the digitised samples of the speech.
A cyclic behaviour of the spectrum is observed in all the samples. To make sure that the periodicity is present, and to arrive at its value, a moving-average procedure is employed to smoothen the spectrum. The smoothed spectra of all the vowels indeed show a periodicity of about 1 kHz. When the periodicities are analysed, their average value is found to be 1038 Hz with a standard deviation of 19 Hz. In view of this, it is thought that the acoustic source responsible for speech must have generated this periodic spectrum, which might have been modified periodically to imprint the intelligibility. If this is true, one can perhaps easily understand this feature of speech, viz., that intelligibility is retained in bandpassed speech of bandwidth 1 kHz with the pass band located anywhere in the speech spectrum of 0-4 kHz. This thesis, describing the experiments and the analysis of the speech, is presented in 5 chapters. Chapter 1 deals with the basics of speech and the processing tools used to analyse the speech signal. Chapter 2 presents the literature survey from which the present problem is tracked down. Chapter 3 describes the details of the structure and fabrication of the experimental setup that has been used. Chapter 4 gives a detailed account of the way in which the experiments are conducted and the speech is analysed. In conclusion, chapter 5 summarises the work and suggests the future work needed to establish the mechanism of speech responsible for the feature described in this thesis.
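The moving-average smoothing and periodicity-estimation procedure described in this abstract can be sketched roughly as follows; the window width, the autocorrelation-based period search and all function names are illustrative assumptions, not the thesis's actual tooling.

```python
import numpy as np

def smoothed_spectrum_periodicity(signal, fs, win_hz=100.0):
    """Estimate the dominant ripple period (in Hz) of a magnitude
    spectrum after moving-average smoothing -- a rough analogue of
    the procedure described in the abstract."""
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    # moving-average smoothing over a window of roughly win_hz
    df = freqs[1] - freqs[0]
    w = max(1, int(win_hz / df))
    smooth = np.convolve(spec, np.ones(w) / w, mode="same")
    # autocorrelate the de-meaned smoothed spectrum to find its ripple period
    s = smooth - smooth.mean()
    ac = np.correlate(s, s, mode="full")[len(s) - 1:]
    # the first local maximum after lag 0 gives the period in bins
    for lag in range(1, len(ac) - 1):
        if ac[lag] > ac[lag - 1] and ac[lag] > ac[lag + 1]:
            return lag * df
    return None
```

As a sanity check, two impulses 1 ms apart produce a spectral magnitude ripple with a 1 kHz period, which the routine should recover.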
|
663 |
Robuste Spracherkennung unter raumakustischen Umgebungsbedingungen
Petrick, Rico, 14 January 2010
Bei der Überführung eines wissenschaftlichen Laborsystems zur automatischen Spracherkennung in eine reale Anwendung ergeben sich verschiedene praktische Problemstellungen, von denen eine der Verlust an Erkennungsleistung durch umgebende akustische Störungen ist. Im Gegensatz zu additiven Störungen wie Lüfterrauschen o. ä. hat die Wissenschaft bislang die Störung des Raumhalls bei der Spracherkennung nahezu ignoriert. Dabei besitzen, wie in der vorliegenden Dissertation deutlich gezeigt wird, bereits geringfügig hallende Räume einen stark störenden Einfluss auf die Leistungsfähigkeit von Spracherkennern.
Mit dem Ziel, die Erkennungsleistung wieder in einen praktisch benutzbaren Bereich zu bringen, nimmt sich die Arbeit dieser Problemstellung an und schlägt Lösungen vor. Der Hintergrund der wissenschaftlichen Aktivitäten ist die Erstellung von funktionsfähigen Sprachbenutzerinterfaces für Gerätesteuerungen im Wohn- und Büroumfeld, wie z. B. bei der Hausautomation. Aus diesem Grund werden praktische Randbedingungen wie die Restriktionen von embedded Computerplattformen in die Lösungsfindung einbezogen.
Die Argumentation beginnt bei der Beschreibung der raumakustischen Umgebung und der Ausbreitung von Schallfeldern in Räumen. Es wird theoretisch gezeigt, dass die Störung eines Sprachsignals durch Hall von zwei Parametern abhängig ist: der Sprecher-Mikrofon-Distanz (SMD) und der Nachhallzeit T60. Um die Abhängigkeit der Erkennungsleistung vom Grad der Hallstörung zu ermitteln, wird eine Anzahl von Erkennungsexperimenten durchgeführt, die den Einfluss von T60 und SMD nachweisen. Weitere Experimente zeigen, dass die Spracherkennung kaum durch hochfrequente Hallanteile beeinträchtigt wird, wohl aber durch tieffrequente.
In einer Literaturrecherche wird ein Überblick über den Stand der Technik zu Maßnahmen gegeben, die den störenden Einfluss des Halls unterdrücken bzw. kompensieren können. Jedoch wird auch gezeigt, dass, obwohl bei einigen Maßnahmen von Verbesserungen berichtet wird, keiner der gefundenen Ansätze den o. a. praktischen Einsatzbedingungen genügt.
In dieser Arbeit wird die Methode Harmonicity-based Feature Analysis (HFA) vorgeschlagen. Sie basiert auf drei Ideen, die aus den Betrachtungen der vorangehenden Kapitel abgeleitet werden. Experimentelle Ergebnisse weisen die Verbesserung der Erkennungsleistung in halligen Umgebungen nach. Es werden sogar praktisch relevante Erkennungsraten erzielt, wenn die Methode mit verhalltem Training kombiniert wird. Die HFA wird gegen Ansätze aus der Literatur evaluiert, die ebenfalls praktischen Implementierungskriterien genügen. Auch Kombinationen der HFA und einigen dieser Ansätze werden getestet.
Im letzten Kapitel werden die beiden Basistechnologien Stimmhaft-Stimmlos-Entscheidung und Grundfrequenzdetektion umfangreich unter Hallbedingungen getestet, da sie Voraussetzung für die Funktionsfähigkeit der HFA sind. Als Ergebnis wird dargestellt, dass derzeit für beide Technologien kein Verfahren existiert, das unter Hallbedingungen robust arbeitet. Es kann allerdings gezeigt werden, dass die HFA trotz der Unsicherheiten der Verfahren arbeitet und signifikante Steigerungen der Erkennungsleistung erreicht. / Automatic speech recognition (ASR) systems used in real-world indoor scenarios suffer from performance degradation if noise and reverberation conditions differ from the training conditions of the recognizer. This thesis deals with the problem of room reverberation as a cause of distortion in ASR systems. The background of this research is the design of practical command and control applications, such as a voice-controlled light switch in rooms or similar applications. Therefore, the design aims to incorporate several restricting working conditions for the recognizer and still achieve a high level of robustness. One of those design restrictions is the minimisation of computational complexity to allow practical implementation on an embedded processor.
One chapter comprehensively describes the room acoustic environment,
including the behavior of the sound field in rooms. It addresses the speaker-room-microphone (SRM) system, which is expressed in the time domain as the room impulse response (RIR). The convolution of the RIR with the clean speech signal yields the reverberant signal at the microphone.
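The SRM relationship just described — the microphone signal as the convolution of clean speech with the RIR — can be sketched with a crude synthetic impulse response; the exponential-decay noise model and all names below are illustrative assumptions, not the author's measured responses.

```python
import numpy as np

def synthetic_rir(t60, fs, length_s=0.5, seed=0):
    """Very rough RIR model: white noise shaped by an exponential
    decay envelope whose amplitude reaches -60 dB at t60 seconds
    (an illustrative stand-in for a measured room impulse response)."""
    rng = np.random.default_rng(seed)
    n = int(length_s * fs)
    t = np.arange(n) / fs
    decay = 10.0 ** (-3.0 * t / t60)  # -60 dB amplitude at t = t60
    h = rng.standard_normal(n) * decay
    return h / np.max(np.abs(h))

def reverberate(clean, h):
    """Microphone signal = clean speech convolved with the RIR."""
    return np.convolve(clean, h)
```

Larger t60 values stretch the decay envelope, which is exactly the low-frequency smearing the recognition experiments below probe.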
A thorough analysis proposes that the degree of distortion caused by reverberation depends on two parameters, the reverberation time T60 and the speaker-to-microphone distance (SMD). To evaluate the dependency of the recognition rate on the degree of distortion, a number of experiments have been conducted, confirming the dependency on these two parameters. Further experiments have shown that ASR is barely affected by high-frequency reverberation, whereas low-frequency reverberation has a detrimental effect on the recognition rate.
A literature survey concludes that, although several approaches exist which claim significant improvements, none of them fulfils the above-mentioned practical implementation criteria. Within this thesis, a new approach entitled 'harmonicity-based feature analysis' (HFA) is proposed. It is based on three ideas derived in the preceding chapters. Experimental results prove that HFA is able to enhance the recognition rate in reverberant environments. Practically applicable results are even achieved when HFA is combined with reverberant training. The method is further evaluated against three other approaches from the literature, and combinations of methods are also tested.
In a last chapter the two base technologies, fundamental frequency (F0) estimation and voiced/unvoiced decision (VUD), are evaluated in reverberant environments, since they are necessary to run HFA. This evaluation aims to find one optimal method for each of these technologies. The results show that the performance of all F0 estimation methods and all VUD methods degrades strongly in reverberant environments. Nevertheless, it is shown that HFA can deal with the uncertainties of these base technologies such that the recognition performance still improves.
|
664 |
Steuerung sprechernormalisierender Abbildungen durch künstliche neuronale Netzwerke
Müller, Knut, 01 November 2000
No description available.
|
665 |
Using Speech Recognition Software to Increase Writing Fluency for Individuals with Physical Disabilities
Garrett, Jennifer Tumlin, 03 July 2007
Writing is an important skill that is necessary throughout school and life. Many students with physical disabilities, however, have difficulty with writing skills due to disability-specific factors, such as motor coordination problems. Because of these difficulties, assistive technology is often utilized. One piece of assistive technology, speech recognition software, may remove the motor demand of writing and help students become more fluent writers. Past research on the use of speech recognition software, however, reveals little about its impact on individuals with physical disabilities. Therefore, this study involved students of high school age with physical disabilities that affected hand use. Using an alternating treatments design to compare the use of word processing with the use of speech recognition software, this study analyzed first-draft writing samples in the areas of fluency, accuracy, type of word errors, recall of intended meaning, and length. Data on fluency, calculated in words correct per minute (wcpm), indicated that all participants wrote much faster with speech recognition than with word processing. However, accuracy, calculated as percent correct, was much lower when participants used speech recognition. Word errors and recall of intended meaning were coded by type and varied across participants. In terms of length, all participants wrote longer drafts when using speech recognition software, primarily because their higher fluency allowed them to write more words. Although participants wrote more fluently with speech recognition, their low accuracy makes it difficult to determine whether speech recognition is a viable solution for all individuals with physical disabilities.
Therefore, additional research is needed that takes into consideration the editing and error correction time when using speech recognition software.
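For reference, the study's two primary measures can be computed directly from their conventional definitions (the formulas below are the standard ones, assumed rather than quoted from the study):

```python
def writing_measures(words_correct, words_total, minutes):
    """Fluency as words correct per minute (wcpm) and accuracy as
    percent of words correct -- the two primary measures described
    above, using their standard textbook definitions."""
    wcpm = words_correct / minutes
    accuracy = 100.0 * words_correct / words_total
    return wcpm, accuracy
```

For example, a draft of 100 words with 90 correct, produced in 5 minutes, scores 18 wcpm at 90% accuracy.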
|
666 |
Programinė įranga kompiuterio valdymui balsu / Software for computer control by voice
Ringelienė, Živilė, 24 September 2008
Magistro darbe pristatoma sukurta programa, realizuojanti interneto naršyklės valdymą balsu. Ši programa papildo atskirų žodžių prototipinę atpažinimo sistemą, pagrįstą paslėptaisiais Markovo modeliais (PMM). Šios dvi dalys ir sudaro interneto naršyklės valdymo balsu prototipą, kuris gali atpažinti 71 komandą (vienas arba du žodžiai) lietuvių kalba: 1 komandą, skirtą naršyklės atvėrimui, 54 naršyklės valdymo komandas, 16 komandų, atveriančių konkrečius iš anksto sistemai nurodytus tinklalapius. Darbe aprašytas lietuvių kalbos atskirų žodžių atpažinimo sistemos akustinių modelių, grįstų paslėptaisiais Markovo modeliais, rinkinių eksperimentinis tyrimas. Atsižvelgiant į įvairius atpažinimui turinčius įtakos veiksnius (mokymo duomenų kiekį, mišinio komponenčių skaičių, kalbėtojo lytį, skirtingos techninės įrangos naudojimą atpažinime), buvo sukurti skirtingi balso komandų akustinių modelių rinkiniai. Eksperimentinio tyrimo metu buvo tiriama šių rinkinių panaudojimo atpažinimo sistemoje įtaka sistemos atpažinimo tikslumui. Eksperimentinio tyrimo rezultatai parodė, kad interneto naršyklės valdymo balsu sistemos prototipo atpažinimo tikslumas siekia 98%. Sistema gali būti naudojama kaip vaizdinė priemonė vyresniųjų klasių moksleiviams informacinių technologijų, fizikos, psichologijos, matematikos pamokose. / The thesis presents a prototype of the software (system) for Web browser control by voice. The prototype consists of two parts: the Hidden Markov Models based word recognition system and the program, which implements browser control by voice commands and is integrated in the word recognition system. The prototype is a speaker-independent Lithuanian word (voice commands) recognition system and can recognize 71 voice commands: 1 command is intended to run browser, 54 commands – for browser control, and 16 commands – to open various user predefined websites. 
Taking into account various factors that affect recognition (the amount of training data, the number of Gaussian mixture components, the speaker's gender, and the use of different hardware for recognition), different sets of acoustic models of Lithuanian voice commands were created and trained. An experimental investigation of how the use of these sets in the Lithuanian word recognition system influences word recognition accuracy was performed. The results showed that the created prototype system achieves 98% word recognition accuracy. The prototype can be used at secondary school as a visual speech recognition learning tool in informatics, physics, psychology, and mathematics lessons for pupils of the senior classes.
|
667 |
Election markup language (EML) based tele-voting system
Gong, XiangQi, January 2009
Elections are one of the most fundamental activities of a democratic society. As in any other aspect of life, developments in technology have resulted in changes to the voting procedure, from traditional paper-based voting to voting by electronic means, or e-voting. E-voting involves different forms of electronic means, such as voting machines, voting via the Internet, telephone, SMS and digital interactive television. This thesis concerns voting by telephone, or televoting. It starts by giving a brief overview and evaluation of various models and technologies that are implemented within such systems. The aspects of televoting that have been investigated are the technologies that provide a voice interface to the voter and conduct the voting process, namely the Election Markup Language (EML), Automated Speech Recognition (ASR) and Text-to-Speech (TTS).
|
668 |
Lietuvių kalbos atpažinimas iOS įrenginiuose / Lithuanian speech recognition in iOS devices
Sabaliauskas, Darius, 06 August 2014
Šiuolaikiniame pasaulyje vis daugiau žmonių naudoja išmaniuosius telefonus, kurie perima vis daugiau su kompiuterio atliekamo darbo (el. pašto tikrinimas, apsipirkimas internetu ir t.t.). Šiuose įrenginiuose vis daugiau funkcijų galima atlikti balsu (atidaryti programėles ir kt.), tačiau kol kas tik anglų ir keletu kitų kalbų. Todėl šiame darbe bus nagrinėjamas lietuvių kalbos atpažinimo uždavinys iOS platformai (viena iš pagrindinių išmaniųjų telefonų ir planšetinių kompiuterių platformų), kuri yra naudojama mobiliuose Apple įrenginiuose. Šiame darbe nagrinėjamas CMU Sphinx ir Julius bibliotekų panaudojimas iOS įrenginiuose atpažįstant lietuvių kalbą. Tyrimui buvo sukurtas LSR karkasas paslepiantis CMU Sphinx ir Julius bibliotekų realizacijos ypatumus po Objective-C kalbos sąsaja. Tyrimui buvo naudojamas skaičių nuo 0 iki 9 garsynas ir analizuota, koks atpažinimo tikslumas ir greitaveika yra su tokiu nedideliu 10 žodžių žodynu atpažįstant pavienius skaičius. / Nowadays more and more people use smartphones, which take over more and more of the work done on a personal computer (e-mail checking, e-shopping, etc.). On these devices more and more functions can be performed by voice (opening apps and so on), but so far only in English and a few other languages. Therefore, in our work we investigate the Lithuanian speech recognition task on iOS (one of the major smartphone and tablet platforms), which runs on Apple's mobile devices. In this work we investigate the use of the CMU Sphinx and Julius libraries for Lithuanian speech recognition on iOS devices. For this task an LSR framework was created, which encapsulates the CMU Sphinx and Julius implementation details behind Objective-C interfaces. Experiments were performed with a corpus of the digits from 0 to 9, and recognition accuracy and speed were investigated for this small 10-word vocabulary when recognising isolated digits.
|
669 |
A voice controlled measurement procedure for the high energy physics laboratory
Chen, Chang, January 1990
A Zenith-386 workstation was outfitted with a DICRES-54.8 parallel port board to facilitate I/O with a large Summagrid x-y coordinate digital measurement pad that has a resolution of 10 microns. Film views of high-energy particle collisions can be projected onto this pad for measurement. Voice prompts via a Votrax speech synthesis system are sent at critical points during the algorithm from the Z-386 through other ports of the DICRES board. Progress in measurement is fed into the Z-386's serial port from an Interstate voice recognition system at other points of the measurement algorithm. The whole measurement process is managed by an assembler-language-based modular computer program. / Department of Physics and Astronomy
|
670 |
Automatic speech recognition for resource-scarce environments
Kleynhans, Neil Taylor, January 2013
Automatic speech recognition (ASR) technology has matured over the past few decades and has made significant impacts in a variety of fields, from assistive technologies to commercial products. However, ASR system development is a resource-intensive activity and requires language resources in the form of text-annotated audio recordings and pronunciation dictionaries. Unfortunately, many languages found in the developing world fall into the resource-scarce category, and due to this resource scarcity the deployment of ASR systems in the developing world is severely inhibited. In this thesis we present research into developing techniques and tools to (1) harvest audio data, (2) rapidly adapt ASR systems and (3) select “useful” training samples in order to assist with resource-scarce ASR system development.
We demonstrate an automatic audio harvesting approach which efficiently creates a speech recognition corpus by harvesting an easily available audio resource. We show that by starting with bootstrapped acoustic models, trained with language data obtained from a dialect, and then running through a few iterations of an alignment-filter-retrain phase, it is possible to create an accurate speech recognition corpus. As a demonstration we create a South African English speech recognition corpus using our approach, harvesting an internet website which provides audio and approximate transcriptions. The acoustic models developed from the harvested data are evaluated on independent corpora and show that the proposed harvesting approach provides a robust means to create ASR resources.
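The alignment-filter-retrain loop can be sketched as follows; the function names, the confidence score and the threshold are assumptions standing in for the authors' actual alignment tooling, and the training and scoring callables are supplied by the caller.

```python
def harvest_corpus(audio_pool, approx_transcripts, train, align_score,
                   n_iter=3, threshold=0.8):
    """Sketch of the alignment-filter-retrain harvesting loop.

    `train` builds an acoustic model from a list of (audio, transcript)
    pairs; `align_score` returns a 0..1 forced-alignment confidence for
    one pair under the current model. Both are caller-supplied stand-ins
    for real ASR tooling."""
    corpus = list(zip(audio_pool, approx_transcripts))
    model = train(corpus)  # bootstrap model from the noisy pairs
    for _ in range(n_iter):
        # keep only utterances whose forced alignment looks reliable
        corpus = [(a, t) for a, t in zip(audio_pool, approx_transcripts)
                  if align_score(model, a, t) >= threshold]
        model = train(corpus)  # retrain on the filtered corpus
    return model, corpus
```

With a trustworthy confidence score, each pass discards the utterances whose approximate transcriptions do not align, so the retained corpus becomes progressively cleaner.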
As there are many acoustic model adaptation techniques available to an ASR system developer, selecting the best one becomes a costly endeavour. We investigate how performance depends on the amount of adaptation data by systematically varying that amount and comparing various adaptation techniques. We establish a guideline which an ASR developer can use to choose the best adaptation technique given a size constraint on the adaptation data, for the scenario where adaptation between narrow- and wide-band corpora must be performed. In addition, we investigate the effectiveness of a novel channel normalisation technique and compare its performance with standard normalisation and adaptation techniques.
Lastly, we propose a new data selection framework which can be used to design a speech recognition corpus. We show that for limited data sets, independent of language and bandwidth, the most effective strategy for data selection is frequency-matched selection, and that the widely used maximum entropy methods generally produce the least promising results. In our model, the frequency-matched selection method corresponds to a logarithmic relationship between accuracy and corpus size; we also investigated other model relationships, and found that a hyperbolic relationship (as suggested by simple asymptotic arguments in learning theory) may lead to somewhat better performance under certain conditions. / Thesis (PhD (Computer and Electronic Engineering))--North-West University, Potchefstroom Campus, 2013.
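The two accuracy-versus-corpus-size models mentioned above can be compared with a simple least-squares fit; the helper below is an illustrative sketch of that comparison, not the authors' modelling code.

```python
import numpy as np

def fit_accuracy_models(sizes, accuracies):
    """Least-squares fits of two accuracy-vs-corpus-size models:
    logarithmic (acc = a + b*log n) and hyperbolic (acc = a - b/n).
    Returns {name: (coefficients, rms error)} for each model."""
    n = np.asarray(sizes, dtype=float)
    acc = np.asarray(accuracies, dtype=float)
    results = {}
    for name, feature in (("logarithmic", np.log(n)),
                          ("hyperbolic", -1.0 / n)):
        # design matrix [1, feature]; solve for intercept a and slope b
        A = np.column_stack([np.ones_like(n), feature])
        coef, _, _, _ = np.linalg.lstsq(A, acc, rcond=None)
        pred = A @ coef
        results[name] = (coef, float(np.sqrt(np.mean((acc - pred) ** 2))))
    return results
```

Fitting both forms to measured accuracy points and comparing the residual errors is one way to decide which relationship better describes a given selection strategy.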
|