Automatické hodnocení anglické výslovnosti nerodilých mluvčích / Automatic Pronunciation Evaluation of Non-Native English Speakers
Gazdík, Peter, January 2019 (has links)
Computer-Assisted Pronunciation Training (CAPT) is becoming increasingly popular, yet the accuracy of existing CAPT systems remains quite low. This diploma thesis therefore focuses on improving existing methods for automatic pronunciation evaluation at the segmental level. The first part describes common techniques for this task. We then propose a system based on two approaches. Finally, our experiments show a significant improvement over the reference system.
Využití řečových technologií při výuce výslovnosti cizích jazyků / Speech Technology Application in Pronunciation Training and Foreign Language Learning
Barotová, Štěpánka, January 2020 (has links)
This diploma thesis deals with the use of the Dynamic Time Warping (DTW) algorithm for automatic evaluation of English pronunciation. The work focuses on improving an existing pronunciation-training application in three areas: the user interface, the algorithm itself, and the corrective feedback given to the user. The first part surveys the techniques used in this field; a new user-interface design is then introduced, and the proposed system and experiments are described. The experiments address error detection at the phoneme level, detection of primary-stress errors at the syllable level, and intonation assessment at the word level. All of the methods are designed to provide corrective feedback to the user. The final part describes how all three improved areas of the application were tested.
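A minimal, hypothetical sketch of the DTW comparison at the heart of such a system: align a learner's feature sequence against a reference speaker's and use the length-normalized alignment cost as a crude pronunciation score. The function name is illustrative, and feature extraction (e.g., MFCCs) is assumed to happen elsewhere.

```python
def dtw_distance(ref, hyp):
    """Classic dynamic-time-warping cost between two sequences of
    feature vectors (each a list of equal-length float lists)."""
    n, m = len(ref), len(hyp)
    INF = float("inf")
    # cost[i][j] = minimal cumulative cost aligning ref[:i] with hyp[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames
            d = sum((a - b) ** 2 for a, b in zip(ref[i - 1], hyp[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # skip a reference frame
                                 cost[i][j - 1],      # skip a learner frame
                                 cost[i - 1][j - 1])  # match frames
    return cost[n][m] / (n + m)  # length-normalized alignment cost
```

A lower score means the learner's utterance warps more cheaply onto the reference; a threshold on this score is one simple way to flag a mispronounced word.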
A Real-Time and Automatic Ultrasound-Enhanced Multimodal Second Language Training System: A Deep Learning Approach
Mozaffari Maaref, Mohammad Hamed, 08 May 2020 (has links)
Pronunciation plays a critical role in communicative competence, especially for second-language learners. Despite renewed awareness of the importance of articulation, addressing learners' pronunciation needs remains a challenge for instructors, and pedagogical tools for pronunciation teaching and learning are relatively scarce; traditional instruction such as listen-and-repeat is inefficient. Recently, electronic visual feedback (EVF) systems (e.g., medical ultrasound imaging) have been exploited in new approaches that can be incorporated effectively into a range of teaching and learning contexts. Evaluations of ultrasound-enhanced, multimodal methods for pronunciation training suggest that visualizing the articulatory system as biofeedback to language learners can improve the efficiency of articulation learning. Despite the recent success of multimodal techniques, manual work and human intervention remain unavoidable at many stages of those systems. Furthermore, recognizing tongue shape in noisy, low-contrast ultrasound images is challenging, especially for non-expert users in real-time applications. In addition, our user study revealed that users could not comfortably perceive the placement of their tongue inside the mouth just by watching pre-recorded videos.
Machine learning is a subset of artificial intelligence (AI) in which machines learn from experience and acquire skills without human involvement. Inspired by the functioning of the human brain, deep artificial neural networks learn from large amounts of data to perform a task. In recent years, deep learning has emerged as the dominant paradigm in many computer vision tasks: unlike traditional image-processing methods, deep models can cope with challenges such as object occlusion, geometric transformations, and background artifacts. In this dissertation, we implemented a guided language pronunciation training system that benefits from the strengths of deep learning techniques. Our modular system provides a fully automatic, real-time pronunciation training tool using ultrasound-enhanced augmented reality. Qualitative and quantitative assessments indicate that our system outperforms previous techniques in flexibility, generalization, robustness, and autonomy. Using our ultrasound-enhanced system, a language learner can observe his or her tongue movements during real-time speech, automatically superimposed on his or her face.
Mispronunciation Detection with SpeechBlender Data Augmentation Pipeline / Uttalsfelsdetektering med SpeechBlender data-förstärkning
Elkheir, Yassine, January 2023 (links)
The rise of multilingualism has fueled demand for computer-assisted pronunciation training (CAPT) systems for language learning. CAPT systems build on advances in speech technology and offer features such as learner assessment and curriculum management. Mispronunciation detection (MD) is a crucial aspect of CAPT, aimed at identifying and correcting mispronunciations in second-language learners' speech. One of the main challenges in developing MD models is the limited availability of labeled second-language speech data. To overcome this, the thesis introduces SpeechBlender, a fine-grained data augmentation pipeline designed to generate mispronunciations. SpeechBlender targets different regions of a phonetic unit and blends raw speech signals through linear interpolation, producing erroneous pronunciation instances; this yields more effective sample generation than traditional cut-and-paste methods. The thesis also explores the use of pre-trained automatic speech recognition (ASR) systems for mispronunciation detection and examines various phone-level features that can be extracted from pre-trained ASR models and utilized for MD tasks. A deep neural model is proposed that enhances the representations of the extracted acoustic features combined with positional phoneme embeddings. The efficacy of the augmentation technique is demonstrated through a phone-level pronunciation quality assessment task using only well-pronounced non-native speech data. Our proposed technique achieves state-of-the-art results on the Speechocean762 dataset [54] for ASR-dependent MD models at the phoneme level, with a 2.0% gain in Pearson Correlation Coefficient (PCC) over the previous state of the art [17]. Additionally, we demonstrate a 5.0% improvement at the phoneme level over our baseline.
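The Pearson Correlation Coefficient used above to evaluate phone-level scoring measures linear agreement between predicted and human-annotated scores. A minimal, dependency-free sketch (the function name is illustrative):

```python
def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length lists of scores,
    e.g. model-predicted vs. human-annotated phone-level quality."""
    n = len(xs)
    mx = sum(xs) / n          # mean of predictions
    my = sum(ys) / n          # mean of annotations
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)    # 1.0 = perfect linear agreement
```

A value of 1.0 indicates perfect linear agreement with the human raters, 0 no linear relationship, so a 2.0% PCC gain directly reflects predictions that track annotator judgments more closely.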
In this thesis, we also developed AraVoiceL2, the first Arabic pronunciation learning corpus, to demonstrate the generality of our proposed model and augmentation technique. We used the corpus to evaluate the effectiveness of our approach in improving mispronunciation detection for non-native learners of Arabic. Our experiments showed promising results, with a 4.6% increase in F1-score on the AraVoiceL2 test set, demonstrating the effectiveness of our model and augmentation technique in improving pronunciation learning for non-native speakers of Arabic.
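The linear-interpolation blending that SpeechBlender performs on raw speech signals can be illustrated with a hypothetical sketch. The function name is illustrative, and the assumption that the two phone segments are already time-aligned (equal length) simplifies away the pipeline's region selection and alignment steps:

```python
def blend_segments(good, bad, alpha):
    """Linearly interpolate two time-aligned raw-audio segments
    (lists of float samples) of the same phonetic unit, synthesizing
    a partially mispronounced instance for data augmentation.
    alpha = 0.0 returns the good pronunciation, 1.0 the bad one."""
    assert len(good) == len(bad), "segments must be time-aligned"
    return [(1.0 - alpha) * g + alpha * b for g, b in zip(good, bad)]
```

Sweeping alpha over (0, 1) yields a continuum of erroneous pronunciation instances from a single pair of recordings, which is what makes this more sample-efficient than cut-and-paste augmentation.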