Global ETD Search

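1	Acoustic-articulatory DNN Model based on Transfer Learning for Pronunciation Error Detection and Diagnosis / 発音誤りの検出と診断のための転移学習に基づく音響・調音DNNモデル Duan, Richeng 25 September 2018 Kyoto University / 0048 / New system, course-based doctorate / Doctor of Informatics / Kō No. 21391 / Informatics Doctorate No. 677 / New system||Informatics||117 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Tatsuya Kawahara, Professor Sadao Kurohashi, Professor Masatake Dantsuji, Associate Professor Hiroaki Nanjo / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM Acoustic-articulatory model Transfer Learning DNN CAPT 007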
2	Perceptually motivated speech recognition and mispronunciation detection Koniaris, Christos January 2012 This doctoral thesis is the result of research in two fields of speech technology: speech recognition and mispronunciation detection. Although the two areas are clearly distinct, the proposed approaches share a common hypothesis based on psychoacoustic processing of speech signals. The conjecture is that the human auditory periphery provides a relatively good separation of different sound classes. Hence, recent findings from psychoacoustic perception can be combined with mathematical and computational tools to model auditory sensitivity to small changes in the speech signal. The performance of an automatic speech recognition system depends strongly on the representation used for the front-end. If the extracted features do not include all relevant information, the performance of the classification stage is inherently suboptimal. The work described in Papers A, B and C is motivated by the fact that humans perform better at speech recognition than machines, particularly in noisy environments. The goal is to use knowledge of human perception in the selection and optimization of speech features for speech recognition. These papers show that maximizing the similarity of the Euclidean geometry of the features to the geometry of the perceptual domain is a powerful tool for selecting or optimizing features. Experiments with a practical speech recognizer confirm the validity of the principle. An approach to improving mel-frequency cepstral coefficients (MFCCs) through offline optimization is also presented. The method has three advantages: i) it is computationally inexpensive, ii) it does not use the auditory model directly, thus avoiding its computational cost, and iii) importantly, it provides better recognition performance than traditional MFCCs in both clean and noisy conditions.
The second task concerns automatic pronunciation error detection. The research, described in Papers D, E and F, is motivated by the observation that almost all native speakers perceive, relatively easily, the acoustic characteristics of their own language when it is produced by speakers of the language. Small variations within a phoneme category, which may differ from phoneme to phoneme, do not significantly change the perception of the language's own sounds. Several methods are introduced to identify the problematic phonemes for a given speaker, based on similarity measures between the Euclidean space spanned by the acoustic representations of the speech signal and the Euclidean space spanned by an auditory model output. The methods are tested on groups of speakers from different language backgrounds and evaluated against a theoretical linguistic study, showing that they capture many of the phonemes that speakers from each language mispronounce. Finally, a listening test on the same dataset verifies the validity of these methods. / QC 20120914 / European Union FP6-034362 research project ACORNS / Computer-Animated language Teachers (CALATea) feature extraction feature selection auditory models MFCCs speech recognition distortion measures perturbation analysis psychoacoustics human perception sensitivity matrix pronunciation error detection phoneme second language perceptual assessment
3	Využití řečových technologií při výuce výslovnosti cizích jazyků / Speech Technology Application in Pronunciation Training and Foreign Language Learning Barotová, Štěpánka January 2020 This master's thesis deals with the use of the Dynamic Time Warping (DTW) algorithm for automatic assessment of English pronunciation. The work focuses on improving an existing pronunciation training application in three areas: the user interface, the algorithm itself, and corrective feedback to the user. The first part reviews the techniques used in this field; it is followed by a presentation of the new user interface design and a description of the proposed system and the experiments. The experiments address error detection at the phoneme level, detection of errors in primary stress at the syllable level, and intonation assessment at the word level. All methods used are designed to provide corrective feedback to the user. The final part describes how all three improved areas of the application were tested.
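
Entry 3 names Dynamic Time Warping (DTW) as the core of its pronunciation assessment. The abstract does not describe the features, local cost, or scoring the thesis uses, so the following is only a minimal textbook sketch under stated assumptions: each utterance is a list of equal-length feature vectors, the local cost is Euclidean distance, and the function name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumptions (not from the thesis): Euclidean local cost, the standard
    three-way step pattern, and no path constraint or normalization.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b frame is repeated
                cost[i][j - 1],      # b advances, a frame is repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical one-dimensional "features": same contour, different timing.
reference = [[0.0], [1.0], [2.0]]
learner = [[0.0], [1.0], [1.0], [2.0]]
score = dtw_distance(reference, learner)
```

Because DTW may repeat frames, a learner utterance that follows the reference contour at a different speed can still align at low cost, while a different contour accumulates cost. How such a cost would be mapped to phoneme, stress, or intonation feedback is not stated in the abstract.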

