11
Metody potlačení šumu pro rozpoznávače řeči / Methods of noise suppression for speech recognition systems. Moldková, Zuzana. January 2014 (has links)
This diploma thesis deals with methods of noise suppression for speech recognition systems. The theoretical part introduces the basic terminology of the topic and surveys noise-suppression methods: spectral subtraction, Wiener filtering, RASTA, spectrogram mapping, and algorithms based on noise estimation. The second part analyzes types of noise and proposes and implements a spectral-subtraction method of noise suppression for a speech recognition system. Extensive testing of spectral subtractive algorithms is conducted in comparison with the Wiener filter. The testing is assessed with defined metrics: recognition success rate, recognition system score, and signal-to-noise ratio.
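The spectral subtraction method the thesis implements can be sketched in a few lines: estimate the average noise magnitude spectrum from a noise-only segment, then subtract it frame by frame from the noisy signal's spectrum, keeping the noisy phase. The sketch below is a generic NumPy illustration, not the thesis's implementation; the frame size, hop, and spectral-floor constant are illustrative assumptions.

```python
import numpy as np

N_FFT, HOP = 256, 128  # 50% overlap; Hann analysis windows, then overlap-add

def spectral_subtraction(noisy, noise_mag, floor=0.05):
    """Subtract an estimated noise magnitude spectrum from each frame,
    keeping the noisy phase and flooring the result to avoid negatives."""
    window = np.hanning(N_FFT)
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - N_FFT, HOP):
        spec = np.fft.rfft(noisy[start:start + N_FFT] * window)
        mag, phase = np.abs(spec), np.angle(spec)
        clean_mag = np.maximum(mag - noise_mag, floor * mag)  # spectral floor
        frame = np.fft.irfft(clean_mag * np.exp(1j * phase), N_FFT)
        out[start:start + N_FFT] += frame  # Hann at 50% overlap sums to ~1
    return out

# Toy demo: a 440 Hz tone buried in white noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 8000.0
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.5 * rng.standard_normal(len(t))

# Estimate the noise magnitude spectrum from a noise-only recording.
window = np.hanning(N_FFT)
noise_only = 0.5 * rng.standard_normal(16000)
noise_mag = np.mean(
    [np.abs(np.fft.rfft(noise_only[i:i + N_FFT] * window))
     for i in range(0, len(noise_only) - N_FFT, HOP)], axis=0)

denoised = spectral_subtraction(noisy, noise_mag)
```

The spectral floor is the standard remedy for the "musical noise" artifacts that plain subtraction produces when a bin's magnitude dips below the noise estimate.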
12
臺灣大學生透過電腦輔助軟體學習英語發音的研究 (A Study of Taiwanese College Students' Learning of English Pronunciation through Computer-Assisted Software) / A Passage to Being Understood and Understanding Others. 蔡碧華, Tsai, Pi Hua. Unknown Date (has links)
This study investigated the impact of the computer-assisted English pronunciation training software MyET on learners' English pronunciation. The generalization effect of practicing English with computer-assisted pronunciation training (CAPT) software was also a focus of the investigation, as were the difficulties and challenges students encountered while using CAPT and the strategies they developed through their interaction with it. The aim was to position CAPT properly within English pronunciation instruction and to explore how other mediating tools (such as human peers) can strengthen the software's effect on learning.
Ninety college students participated, divided into three groups of thirty: two CAPT groups (the experimental groups, which used CAPT to learn English pronunciation either on their own or together with peers) and one non-CAPT group (the control group). All students spent ten weeks practicing reading aloud a passage excerpted from Cinderella, provided online free of charge by the company that publishes MyET. Students took one test before the experiment and one after. At the end of each week's practice, students recorded their reflections in learning logs, and the instructor gave each student feedback on those reflections.
The results showed that students in both CAPT groups made clear, positive progress in learning English prosody, with improvement in intonation and timing far exceeding that in segmental pronunciation. Furthermore, after ten weeks of learning English with CAPT, the experimental students showed generalization effects in both pronunciation and intonation when reading a new passage, though not in timing. However, the three groups' pronunciation performance did not differ significantly in the quantitative statistics.
Nevertheless, qualitative analysis of the students' reflections showed that, of all the groups, the students who used CAPT on their own were best able to monitor their own language-learning process (including imitation and enjoyment of learning), while the students who used CAPT together with peers reported the greatest improvement in English fluency, intonation, and pronunciation. The control-group students, lacking both peer scaffolding and feedback and the practice feedback provided by MyET, reported difficulties most frequently during practice and felt they gained the least. The experimental participants also felt that the design of CAPT's practice-feedback mechanism has room for improvement. The theoretical and pedagogical implications of these findings, together with the study's limitations, are discussed in the conclusion.
Keywords: computer-assisted language learning, speech recognition software, suprasegmentals, intonation, timing, learning strategies, mediation / The present study investigated the impact of computer-assisted pronunciation training (CAPT) software, i.e., MyET, on students' learning of English pronunciation. The investigation also covered the generalization of the effect of practice with the CAPT system, the difficulties and challenges reported by the students who used it, and the strategies they developed through their interaction with the system. This study aimed to position the CAPT system within instruction on English pronunciation and to investigate how other kinds of mediation, such as peer support, could reinforce its efficacy.
This study involved 90 Taiwanese college students, divided into two experimental groups and one control group. The two experimental groups practiced English pronunciation with a computer-assisted pronunciation training (CAPT) program, either independently or with peers, while the control group had access only to MP3 files. For ten weeks, all groups practiced texts adapted from a play, Cinderella, provided by MyET free of charge online. All participants took a pretest and a posttest on both the practiced texts and a novel text. Each week, after practicing the texts, the participants wrote reflections on their learning process in Chinese in their learning logs, and the instructor provided weekly feedback on those reflections.
The results showed that ten weeks of practice with the CAPT system produced significant, positive changes in the English pronunciation learning of the CAPT groups (i.e., the Self-Access CAPT Group and the Collaborative CAPT Group). The participants' progress in intonation and timing was consistently greater than their progress in segmental pronunciation. Moreover, the effect of the ten-week practice generalized, though only modestly, to the participants' segmental pronunciation and intonation when reading the novel text, but not to the timing component. However, the improvement of the CAPT groups was not large enough to differentiate them from the MP3 Group.
Although the quantitative investigation revealed no significant group differences, qualitative analysis of the students' reflections showed that the three groups went through different learning processes. The Self-Access CAPT Group outperformed the other two groups in developing self-monitoring of language learning and production and in enjoying working with the CAPT system and texts. The Collaborative CAPT Group outscored the other two groups in reported gains in fluency, intonation, and segmental pronunciation, and in developing strategies to deal with learning difficulties. Although the students in the MP3 Group also made significant progress, without peer scaffolding or the feedback provided by MyET they reported the highest frequency of difficulties and the lowest frequency of gains and strategies. The participants also considered the CAPT system's feedback design in need of improvement. The study closes with theoretical and pedagogical implications as well as research limitations.
Key words: Computer-Assisted Language Learning (CALL), Automatic Speech Recognition System (ASRS), segmental pronunciation, prosody, intonation, timing, learning strategies, mediation
13
Hardware/Software Co-Design for Keyword Spotting on Edge Devices. Jacob Irenaeus M Bushur (15360553). 29 April 2023 (has links)
The introduction of artificial neural networks (ANNs) to speech recognition has sparked the rapid development and popularization of digital assistants. These assistants perform keyword spotting (KWS), constantly monitoring the audio captured by a microphone for a small set of words or phrases known as keywords. Upon recognizing a keyword, a longer audio recording is saved and processed by a separate, more complex neural network. More broadly, neural networks have popularized voice as a means of interacting with electronic devices, sparking interest among individuals in using speech recognition in their own projects. However, while large companies can develop custom neural network architectures alongside proprietary hardware platforms, those lacking similar resources struggle to develop efficient and effective neural networks for embedded systems. Small, low-power embedded systems are widely available in the hobbyist space, but a clear process is needed for developing a neural network that accounts for the limitations of these resource-constrained systems. Conversely, a wide variety of neural network architectures exists, but often little thought is given to deploying them on edge devices.
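The KWS loop described above can be illustrated with a minimal sketch: slide a fixed-size frame over incoming audio, run a small always-on detector on each frame, and hand a longer buffered clip to the downstream recognizer when it triggers. In this sketch a simple energy threshold stands in for the keyword network; the names, frame size, and threshold are illustrative assumptions, not the thesis's design.

```python
import numpy as np
from collections import deque

FRAME = 512          # samples per analysis frame
CAPTURE_FRAMES = 8   # frames of context handed to the larger recognizer

def detector(frame):
    # Stand-in for the small always-on KWS network: flag high-energy frames.
    return np.mean(frame ** 2) > 0.1

def monitor(stream):
    """Scan audio frame by frame; on a detection, return the buffered
    clip that a separate, more complex network would then process."""
    history = deque(maxlen=CAPTURE_FRAMES)
    for i in range(0, len(stream) - FRAME + 1, FRAME):
        frame = stream[i:i + FRAME]
        history.append(frame)
        if detector(frame):
            return np.concatenate(list(history))
    return None

rng = np.random.default_rng(1)
audio = 0.01 * rng.standard_normal(16000)   # quiet background noise
audio[8000:8512] += np.sin(np.arange(512))  # loud burst standing in for a keyword
clip = monitor(audio)
```

The ring buffer (`deque`) mirrors how real KWS front ends keep a short rolling history so the captured clip includes audio from just before the trigger.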
This thesis first presents an overview of audio processing techniques, artificial neural network fundamentals, and machine learning tools, along with a summary of a set of specific neural network architectures. It then demonstrates the process of implementing and modifying these architectures and training specific models in Python using TensorFlow. The trained models are subjected to post-training quantization to evaluate its effect on model performance. The models are evaluated using metrics relevant to deployment on resource-constrained systems, such as memory consumption, latency, and model size, in addition to the standard comparisons of accuracy and parameter count. Finally, the process of deploying one of the trained and quantized models is explored on an Arduino Nano 33 BLE using TensorFlow Lite for Microcontrollers and on a Digilent Nexys 4 FPGA board using CFU Playground.
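Post-training quantization of the kind applied here typically maps float32 weights to int8 using a per-tensor scale and zero point. The sketch below shows the arithmetic in plain NumPy; it mimics the affine scheme used by converters such as TensorFlow Lite but is not that library's actual API (the function names and clamping details are assumptions).

```python
import numpy as np

def quantize_int8(w):
    """Affine quantization of a float tensor to int8: pick a scale and
    zero point so the tensor's range maps onto [-128, 127]."""
    lo, hi = float(w.min()), float(w.max())
    lo, hi = min(lo, 0.0), max(hi, 0.0)        # zero must be representable
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-128 - lo / scale))
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover approximate float values for comparison against the originals.
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.2, size=(64, 32)).astype(np.float32)
q, scale, zp = quantize_int8(weights)
recovered = dequantize(q, scale, zp)
max_err = float(np.max(np.abs(recovered - weights)))
```

The int8 tensor occupies a quarter of the float32 tensor's memory, which is the main reason quantization matters on targets like the Arduino Nano 33 BLE; the price is a per-weight rounding error bounded by roughly one quantization step.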
14
Channel Modeling Applied to Robust Automatic Speech Recognition. Sklar, Alexander Gabriel. 01 January 2007 (has links)
In automatic speech recognition (ASR) systems, training is a phase critical to the system's success. Communication media, whether analog (such as analog landline phones) or digital (VoIP), distort the speaker's speech signal, often in very complex ways: linear distortion, in either the magnitude or the phase spectrum, occurs in all channels, and nonlinear but time-invariant distortion appears in every real system. Digital systems add network effects that produce packet losses, delays, and repeated packets. Finally, one cannot know in advance what path a signal will take, so error or distortion along the way is almost a certainty. The channel introduces an acoustic mismatch between the speaker's signal and the data the ASR was trained on, which degrades recognition performance. The approach so far has been to undo the havoc produced by the channel, i.e., to compensate for the channel's behavior. In this thesis, we instead characterize the effects of different transmission media and use that characterization as an inexpensive and repeatable way to train ASR systems.
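The characterization idea can be illustrated with a toy channel simulator: pass clean speech through a linear filter (the magnitude distortion present in all channels) and then drop random fixed-size packets (the network effects of VoIP), producing acoustically mismatched audio that could serve as training material. The sketch below is a generic NumPy illustration, not the thesis's channel model; the filter taps, packet size, and loss rate are assumptions.

```python
import numpy as np

PACKET = 160  # samples per packet (20 ms at 8 kHz)

def simulate_channel(speech, rng, loss_rate=0.1):
    """Degrade clean speech with a toy channel: a 3-tap low-pass filter
    (linear magnitude distortion) followed by random packet drops."""
    distorted = np.convolve(speech, [0.25, 0.5, 0.25], mode="same")
    out = distorted.copy()
    for p in range(len(out) // PACKET):
        if rng.random() < loss_rate:           # this packet was lost in transit
            out[p * PACKET:(p + 1) * PACKET] = 0.0
    return out

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 300 * np.arange(8000) / 8000.0)  # 1 s of 300 Hz tone
degraded = simulate_channel(clean, rng)  # candidate channel-matched training data
```

Training an ASR on audio passed through such a simulator, rather than compensating for the channel at recognition time, is the inexpensive and repeatable alternative the abstract proposes.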