601 |
Development of Tongan Materials for Determining Speech Recognition Thresholds
Bunker, Lisa Dawn, 16 June 2008
Speech recognition threshold (SRT) is an important clinical measure that validates the pure-tone average (PTA), assists in the diagnosis and prognosis of hearing impairment, and helps identify non-organic hearing impairment. Few published, recorded, and standardized materials exist in languages other than English, which results in audiologists testing individuals using materials developed in a non-native language. Research shows that this is problematic, as certain criteria for SRT testing are not met. Thus, performance may reflect test-language deficiency rather than hearing impairment. Currently, there are no known published materials for use in measuring the SRT in individuals whose native language is Tongan. The purpose of this project was to record and develop psychometrically equivalent words in Tongan for measuring the SRT. This study identified 28 trisyllabic words that were relatively homogeneous in relation to audibility and psychometric function slope. The intensity of these 28 words was adjusted to equate 50% threshold performance for each word with the mean PTA (5.92 dB HL) for the 20 normally hearing participants. These materials were digitally recorded onto compact disc for distribution and use for SRT testing in Tongan.
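The level adjustment described in this abstract can be illustrated with a minimal sketch. The word labels and per-word thresholds below are invented for illustration and are not data from the study; only the 5.92 dB HL mean PTA comes from the abstract, and the function name is hypothetical.

```python
# Hypothetical sketch: shift each word's recording level so that its
# 50% threshold lands on the listeners' mean pure-tone average (PTA).
MEAN_PTA_DB_HL = 5.92  # mean PTA reported in the abstract


def level_adjustments(thresholds_db, target_db=MEAN_PTA_DB_HL):
    """Return the gain in dB to apply to each word's recording.

    A word whose 50% threshold lies above the target is harder to hear,
    so its recording is raised by the difference; a word whose threshold
    lies below the target is lowered.
    """
    return {word: round(thr - target_db, 2) for word, thr in thresholds_db.items()}


# Invented example thresholds (dB HL) for three placeholder words.
example = {"word_a": 8.0, "word_b": 4.5, "word_c": 5.92}
gains = level_adjustments(example)
```

After applying these gains, every word would be expected to reach 50% recognition at the same presentation level, which is the sense in which the words are made equivalent.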
|
602 |
Development of Thai Speech Audiometry Materials for Measuring Speech Recognition Thresholds
Hart, Lauren Alexandra, 16 July 2008
Speech audiometry materials are essential for thorough audiological testing. One aspect of speech audiometry is evaluating an individual's speech recognition threshold (SRT). Recorded materials for SRT are available in many languages; however, there are no widely published recorded SRT materials available in the Thai language. The goal of this study was to develop relatively psychometrically equivalent SRT materials for evaluating the hearing abilities of native speakers of the Thai language. To accomplish this, 90 commonly used bisyllabic Thai words were digitally recorded by a male and a female talker and evaluated by 20 native Thai listeners. Twenty-eight words with relatively steep and homogeneous psychometric function slopes were selected and adjusted to reduce threshold variability. These 28 selected words were digitally recorded onto compact disc to facilitate SRT testing for native Thai speakers.
|
603 |
Semi-Supervised Learning with Sparse Autoencoders in Automatic Speech Recognition / Semi-övervakad inlärning med glesa autoencoders i automatisk taligenkänning
Dhaka, Akash Kumar, January 2016
This work is aimed at exploring semi-supervised learning techniques to improve the performance of Automatic Speech Recognition systems. Semi-supervised learning takes advantage of unlabeled data in order to improve the quality of the representations extracted from the data. The proposed model is a neural network where the weights are updated by minimizing the weighted sum of a supervised and an unsupervised cost function, simultaneously. These costs are evaluated on the labeled and unlabeled portions of the data set, respectively. The combined cost is optimized through mini-batch stochastic gradient descent via standard backpropagation. The model was tested on a phone classification task on the TIMIT American English data set and on a written digit classification task on the MNIST data set. Our results show that the model outperforms a network trained with standard backpropagation on the labelled material alone. The results are also in line with state-of-the-art graph-based semi-supervised training methods.
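The weighted cost described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: cross-entropy and mean squared reconstruction error are assumed as the two cost functions, the `alpha` / `1 - alpha` weighting is an assumed form of the "weighted sum", and any sparsity penalty of the autoencoder is omitted.

```python
import math


def cross_entropy(probs, label):
    """Supervised cost for one labeled example: -log p(correct class)."""
    return -math.log(probs[label])


def reconstruction_error(x, x_hat):
    """Unsupervised cost for one example: mean squared reconstruction error."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def combined_cost(labeled, unlabeled, alpha=0.5):
    """Weighted sum of the mean supervised and mean unsupervised costs.

    labeled:   list of (class_probabilities, true_label) pairs
    unlabeled: list of (input_vector, reconstruction) pairs
    alpha:     hypothetical weight on the supervised term (assumption)
    """
    sup = sum(cross_entropy(p, y) for p, y in labeled) / len(labeled)
    unsup = sum(reconstruction_error(x, r) for x, r in unlabeled) / len(unlabeled)
    return alpha * sup + (1.0 - alpha) * unsup


# Toy mini-batch: one labeled example and one perfectly reconstructed input.
cost = combined_cost(
    labeled=[([0.5, 0.5], 0)],
    unlabeled=[([1.0, 0.0], [1.0, 0.0])],
    alpha=0.5,
)
```

In a training loop, a scalar of this kind would be computed per mini-batch and its gradient with respect to the network weights would drive the stochastic gradient descent update.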
|
604 |
Detection and Identification of Instability and Blow-off/Flashback Precursors in Aeronautical Engines using Deep Learning techniques
Cellier, Antony Hermann Guy, January 2020
The evolution of injection processes toward more fuel-efficient and less polluting combustion systems tends to make them more prone to critical events such as Thermo-Acoustic Instabilities, Blow-Off and Flash-Back. Moreover, aeronautical engine manufacturers are discussing the addition of dihydrogen as a secondary or main fuel. It drastically modifies the stability of the system and thus raises several questions about its many possible uses. Being able to predict critical phenomena becomes a necessity in order to operate a system efficiently without having to pre-test every configuration and without sacrificing the safety of the user. Based on Deep Learning techniques, and more specifically Speech Recognition, this study presents the steps to develop a tool able to detect and translate precursors of instability of an aeronautical-grade swirled injector confined in a tubular combustion chamber. The promising results obtained lead to proposals for future transposition to real-size systems.
|
605 |
Robust Dialog Management Through A Context-centric Architecture
Hung, Victor C., 01 January 2010
This dissertation presents and evaluates a method of managing spoken dialog interactions with a robust attention to fulfilling the human user’s goals in the presence of speech recognition limitations. Assistive speech-based embodied conversation agents are computer-based entities that interact with humans to help accomplish a certain task or communicate information via spoken input and output. A challenging aspect of this task involves open dialog, where the user is free to converse in an unstructured manner. With this style of input, the machine’s ability to communicate may be hindered by poor reception of utterances, caused by a user’s inadequate command of a language and/or faults in the speech recognition facilities. Since a speech-based input is emphasized, this endeavor involves the fundamental issues associated with natural language processing, automatic speech recognition and dialog system design. Driven by Context-Based Reasoning, the presented dialog manager features a discourse model that implements mixed-initiative conversation with a focus on the user’s assistive needs. The discourse behavior must maintain a sense of generality, where the assistive nature of the system remains constant regardless of its knowledge corpus. The dialog manager was encapsulated into a speech-based embodied conversation agent platform for prototyping and testing purposes. A battery of user trials was performed on this agent to evaluate its performance as a robust, domain-independent, speech-based interaction entity capable of satisfying the needs of its users.
|
606 |
Phoneme-based Video Indexing Using Phonetic Disparity Search
Barth, Carlos Leon, 01 January 2010
This dissertation presents and evaluates an approach to the video indexing problem by investigating a categorization method that transcribes audio content through Automatic Speech Recognition (ASR) combined with Dynamic Contextualization (DC), Phonetic Disparity Search (PDS) and Metaphone indexation. The suggested approach applies genome pattern matching algorithms with computational summarization to build a database infrastructure that provides an indexed summary of the original audio content. PDS complements the contextual phoneme indexing approach by optimizing topic seek performance and accuracy in large video content structures. A prototype was established to translate news broadcast video into text and phonemes automatically by using ASR utterance conversions. Each phonetic utterance extraction was then categorized, converted to Metaphones, and stored in a repository with contextual topical information attached and indexed for posterior search analysis. Following the original design strategy, a custom parallel interface was built to measure the capabilities of dissimilar phonetic queries and provide an interface for result analysis. The postulated solution provides evidence of superior topic matching when compared to traditional word and phoneme search methods. Experimental results demonstrate that PDS can be 3.7% better than the same phoneme query, while Metaphone search proved to be 154.6% better than the same phoneme seek and 68.1% better than the equivalent word search.
|
607 |
Automated Regression Testing Approach To Expansion And Refinement Of Speech Recognition Grammars
Dookhoo, Raul, 01 January 2008
This thesis describes an approach to automated regression testing for speech recognition grammars. A prototype Audio Regression Tester called ART has been developed using Microsoft's Speech API and C#. ART allows a user to perform any of three tasks: automatically generate a new XML-based grammar file from standardized SQL database entries, record and cross-reference audio files for use by an underlying speech recognition engine, and perform regression tests with the aid of an oracle grammar. ART takes as input a wave sound file containing speech and a newly created XML grammar file. It then simultaneously executes two tests: one with the wave file and the new grammar file and the other with the wave file and the oracle grammar. The comparison result of the tests is used to determine whether the test was successful or not. This allows rapid exhaustive evaluations of additions to grammar files to guarantee forward progress as the complexity of the voice domain grows. The data used in this research to derive results were taken from the LifeLike project. However, the capabilities of ART extend beyond LifeLike. The results gathered have shown that using a person's recorded voice to do regression testing is as effective as having the person do live testing. A cost-benefit analysis, using two published equations, one for Cost and the other for Benefit, was also performed to determine if automated regression testing is really more effective than manual testing. Cost captures the salaries of the engineers who perform regression testing tasks and Benefit captures revenue gains or losses related to changes in product release time. ART had a higher benefit of $21461.08 when compared to manual regression testing, which had a benefit of $21393.99. Coupled with its excellent error detection rates, ART has proven to be very efficient and cost-effective in speech grammar creation and refinement.
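The oracle comparison described in this abstract can be sketched roughly as below. This is an illustrative Python sketch, not ART itself (which the abstract says was written in C# against Microsoft's Speech API): the `recognize` callable, the toy `fake_recognize` stand-in, the set-based grammars, and the file names are all hypothetical, and the two recognitions run one after the other rather than simultaneously.

```python
def regression_test(wave_file, new_grammar, oracle_grammar, recognize):
    """Run the same audio against both grammars and compare the results.

    `recognize` is a hypothetical callable (wave_file, grammar) -> text
    standing in for the speech recognition engine; it is not a real API.
    """
    new_result = recognize(wave_file, new_grammar)
    oracle_result = recognize(wave_file, oracle_grammar)
    return {
        "passed": new_result == oracle_result,
        "new": new_result,
        "oracle": oracle_result,
    }


def fake_recognize(wave_file, grammar):
    """Toy stand-in: 'hears' the phrase named by the file if the grammar allows it."""
    phrase = wave_file.rsplit(".", 1)[0].replace("_", " ")
    return phrase if phrase in grammar else ""


oracle = {"open door"}
ok = regression_test("open_door.wav", {"open door", "close door"}, oracle, fake_recognize)
broken = regression_test("open_door.wav", {"close door"}, oracle, fake_recognize)
```

Here `ok["passed"]` is true because the expanded grammar still yields the oracle's result, while `broken["passed"]` is false because the phrase was lost from the new grammar.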
|
608 |
End-to-end Transcription of Presentations and Meetings / 講演・会議のend-to-end自動書き起こし
Mimura, Masato, 26 September 2022
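Kyoto University / New-system doctorate by coursework / Doctor of Informatics / Degree No. Kō 24256 / Informatics Doctorate No. 800 / 新制||情||135 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Tatsuya Kawahara, Professor Shinsuke Mori, Professor Takayuki Ito / Conferred under Article 4, Paragraph 1 of the Degree Regulations / DFAM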
|
609 |
Effect of phonological and semantic predictability on perceived clarity of degraded speech for non-native listeners
Hoshi Larsson, Kaori, January 2022
Many of us have experienced that understanding speech in a non-native language under noise can be challenging. This study examined whether semantic and phonological predictability improves the intelligibility of degraded speech in a non-native language. An online experiment was conducted with 15 participants. Based on these data, a repeated-measures ANOVA showed that, overall, both semantic and phonological predictability enhanced perceptual clarity in degraded speech for non-native listeners. Semantic predictability was effective for non-native speakers only when the sound quality was slightly intelligible. In contrast, phonological predictability enhanced perceptual clarity at all sound quality levels except the clear and unintelligible settings. Another aim of this study was to investigate whether individual differences in cognitive ability are related to the benefit of phonological and semantic predictability in the non-native context. Results showed a positive Spearman correlation between working memory score and the overall benefit of phonological predictability. As for the effect per sound level, the results were significant only at the intermediately intelligible sound quality level. However, there was no correlation between working memory and the benefit of semantic coherence. Verbal fluency did not correlate with either of the benefits of semantic or phonetic predictability.
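For readers unfamiliar with the statistic mentioned in this abstract, Spearman's correlation is the Pearson correlation computed on ranks. The sketch below is a generic pure-Python illustration; the function names are hypothetical and the scores are invented placeholders, not data from the study.

```python
def ranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Invented placeholder scores: working memory vs. predictability benefit.
working_memory = [3, 5, 4, 7, 6]
benefit = [0.1, 0.4, 0.2, 0.9, 0.5]
rho = spearman(working_memory, benefit)
```

Because the statistic depends only on rank order, it suits small samples such as the 15 participants here, where a linear relationship between the raw scores cannot be assumed.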
|
610 |
Taligenkänningseffekt på Klinisk Dokumentationskvalitet / The Impact of Speech Recognition on Clinical Documentation Quality
McClure, Madelena, January 2023
Health care providers report several stressors related to the use of electronic health record (EHR) systems to complete clinical documentation. These stressors include frustrations resulting from time-consuming and cumbersome interaction with the EHR. Speech recognition (SR) has been suggested as a way to help reduce this stress. Consensus is lacking in the research regarding the effect of SR on clinical documentation quality, and the research that has been conducted is primarily quantitative. The purpose of this study was to increase understanding of how the use of SR changes the work of completing clinical documentation and to identify strategies that would facilitate the implementation and use of SR for documentation. Additionally, the study aimed to examine how the use of SR is perceived to impact clinical documentation quality. Qualitative methods were employed. Four physicians (three radiologists and one internist) with experience of using SR to complete documentation participated in semi-structured interviews. The results showed that the internist reported increased time spent on documentation due to the need to proofread and correct errors. Radiologists reported experiencing no significant change in the amount of time spent completing documentation. All physicians experienced an increased rate of errors and increased effort needed for proofreading documentation generated via SR. Physicians reported worries arising from the increased error rate to be a source of stress. A set of strategies to improve users’ experience of SR was developed based on physicians’ experiences, and issues to consider when healthcare organizations implement the use of SR for documentation were identified. Uncorrected SR errors, the ability to see the text while using SR and the immediacy that results from eliminating turn-around time were found to affect physicians’ perception of their documentation quality.
|