Global ETD Search

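1	Acoustic-articulatory DNN Model based on Transfer Learning for Pronunciation Error Detection and Diagnosis Duan, Richeng 25 September 2018 Kyoto University / 0048 / New-system course doctorate / Doctor of Informatics / Degree No. Kō 21391 / Jōhaku No. 677 / 新制||情||117 (Main Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Tatsuya Kawahara, Professor Sadao Kurohashi, Professor Masatake Dantsuji, Associate Professor Hiroaki Nanjo / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM Acoustic-articulatory model Transfer Learning DNN CAPT 007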
2	Mispronunciation Detection with SpeechBlender Data Augmentation Pipeline Elkheir, Yassine January 2023 The rise of multilingualism has fueled the demand for computer-assisted pronunciation training (CAPT) systems for language learning. CAPT systems make use of speech technology advancements and offer features such as learner assessment and curriculum management. Mispronunciation detection (MD) is a crucial aspect of CAPT, aimed at identifying and correcting mispronunciations in second language learners’ speech. One of the significant challenges in developing MD models is the limited availability of labeled second-language speech data. To overcome this, the thesis introduces SpeechBlender, a fine-grained data augmentation pipeline designed to generate mispronunciations. SpeechBlender targets different regions of a phonetic unit and blends raw speech signals through linear interpolation, resulting in erroneous pronunciation instances. This method generates samples more effectively than traditional cut/paste methods. The thesis also explores the use of pre-trained automatic speech recognition (ASR) systems for MD, and examines various phone-level features that can be extracted from pre-trained ASR models and used for MD tasks. A deep neural model is proposed that enhances the representations of extracted acoustic features combined with positional phoneme embeddings. The efficacy of the augmentation technique is demonstrated through a phone-level pronunciation quality assessment task using only well-pronounced non-native speech data. The proposed technique achieves state-of-the-art results on the Speechocean762 dataset [54] among ASR-dependent MD models at the phoneme level, with a 2.0% gain in Pearson correlation coefficient (PCC) compared to the previous state of the art [17].
Additionally, we demonstrate a 5.0% improvement at the phoneme level compared to our baseline. In this thesis, we also developed AraVoiceL2, the first Arabic pronunciation learning corpus, to demonstrate the generality of our proposed model and augmentation technique. We used the corpus to evaluate the effectiveness of our approach in improving mispronunciation detection for non-native learners of Arabic. Our experiments showed promising results, with a 4.6% increase in F1-score on the AraVoiceL2 test set, demonstrating the effectiveness of our model and augmentation technique in improving pronunciation learning for non-native speakers of Arabic. / Increasing multilingualism has raised the demand for computer-assisted pronunciation training (CAPT) systems for language learning. CAPT systems exploit advances in speech technology and offer features such as learner assessment and curriculum management. Mispronunciation detection is an important aspect of CAPT that aims to identify and correct mispronunciations in second-language learners' speech. One of the major challenges in developing MD models is the limited availability of labeled second-language speech data. To overcome this, the thesis introduces SpeechBlender, a fine-grained data augmentation pipeline designed to generate mispronunciations. SpeechBlender targets different regions of a phonetic unit and blends raw speech signals through linear interpolation, resulting in erroneous pronunciation instances. This method generates samples more effectively than traditional cut/paste methods. The thesis examines the use of pre-trained automatic speech recognition (ASR) systems for mispronunciation detection, along with various phoneme-level features that can be extracted from pre-trained ASR models and used to detect mispronunciation.
An LSTM model is proposed that improves the representation of extracted acoustic features in combination with positional phoneme embeddings. The effectiveness of the augmentation technique is demonstrated through a phoneme-level pronunciation quality assessment task using only well-pronounced non-native speech data. The proposed technique achieves state-of-the-art results on the Speechocean762 dataset [54] among ASR-dependent mispronunciation detection models at the phoneme level, with a 2.0% increase in the Pearson correlation coefficient (PCC) over the previous state of the art [17]. In addition, we show a 5.0% improvement at the phoneme level over our baseline. We also observed a 4.6% increase in F1-score on the Arabic AraVoiceL2 test set. Automatic Speech Recognition (ASR) Computer-assisted pronunciation training (CAPT) Electrical Engineering and Electronics
3	ComPron : Learning Pronunciation through Building Associations between Native Language and Second Language Speech Sounds Lessing, Sara January 2020 Current computer-assisted pronunciation training (CAPT) tools focus too much on what technologies can do rather than on learner needs and pedagogy. They also lack an embodied perspective on learning. This thesis presents a Research through Design project exploring what kinds of interactive design features can support second language learners’ pronunciation learning of segmental speech sounds with embodiment in mind. The result is ComPron, an open simulated prototype that supports learners in perceiving and producing new segmental speech sounds in a second language by comparing them to native language speech sounds. ComPron was evaluated through think-aloud user tests and semi-structured interviews (N=4). The findings indicate that ComPron supports awareness of speech sound-movement connections, association building between sounds, and production of sounds. The thesis discusses the design features that enabled awareness, association building, and speech sound production support, as well as what ComPron offers in comparison to other CAPT tools. Research through Design (RtD) Co-design Embodied Learning Human Computer Interaction
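
Entry 2 describes SpeechBlender as blending raw speech signals through linear interpolation over a region of a phonetic unit. The following is only a minimal sketch of that general idea, not the thesis's implementation: the function name `blend_region`, its parameters, and the fixed mixing weight `lam` are hypothetical, and the abstract does not say how regions, donor segments, or weights are actually chosen.

```python
def blend_region(good, donor, start, end, lam=0.5):
    """Linearly interpolate a donor segment into samples [start, end) of a signal.

    Hypothetical illustration only: `good` stands for a well-pronounced phone's
    raw samples, `donor` for samples taken from a different phone, and `lam`
    for the weight kept from the original signal.
    """
    if not 0 <= start <= end <= len(good):
        raise ValueError("region must lie inside the signal")
    if len(donor) < end - start:
        raise ValueError("donor segment is shorter than the region")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")

    blended = list(good)  # copy, so the input signal is left untouched
    for i in range(start, end):
        # Linear interpolation between the original and donor sample.
        blended[i] = lam * good[i] + (1.0 - lam) * donor[i - start]
    return blended


if __name__ == "__main__":
    original = [1.0, 1.0, 1.0, 1.0]
    other_phone = [0.0, 0.0]
    # Blend only the middle two samples, keeping half of the original.
    print(blend_region(original, other_phone, 1, 3, lam=0.5))
```

With `lam=1.0` the region is left unchanged and with `lam=0.0` it is replaced outright, which corresponds to the cut/paste behaviour that the abstract contrasts with interpolation.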
