61 |
Phoneme set design for second language speech recognition / 第二言語音声認識のための音素セットの構築に関する研究 / ダイ2 ゲンゴ オンセイ ニンシキ ノ タメ ノ オンソ セット ノ コウチク ニカンスル ケンキュウ 王 暁芸, Xiaoyun Wang 22 March 2017 (has links)
本論文は第二言語話者の発話を高精度で認識するための音素セットの構成方法に関する研究結果を述べている.本論文では,第二言語話者の発話をネイティブ話者の発話とは異なる音響特徴量の頻度分布を持つ情報源とみなし,これを表現する適切な音素セットを構築する手法を提案している.具体的には,対象とする第二言語と母語との調音位置や調音様式などの類似性に加え,同音異義語の発生による単語識別性能の低下を総合した基準に基づき,最適な音素セットを決定する.提案手法を日本人学生の英語発話の音声認識に適用し,種々の条件下で認識精度の向上を検証した. / This dissertation addresses the problem of confused mispronunciation in order to improve the recognition performance of second language (L2) speech. A novel method that integrates acoustic and linguistic features is proposed to derive a reduced phoneme set for L2 speech recognition. The customized phoneme set is created with a phonetic decision tree (PDT)-based top-down sequential splitting method that uses phonological knowledge of the relationship between the first language (L1) and L2. The dissertation verifies the efficacy of the proposed method for Japanese English and shows that a speech recognizer built with the proposed method can alleviate the problem caused by confused mispronunciation by second language speakers. / 博士(工学) / Doctor of Philosophy in Engineering / 同志社大学 / Doshisha University
62 |
Osvojování primární gramotnosti u dětí s vývojovou dysfázií / Learning Primary Literacy of Children with Specific Language Impairment Milanovská, Lýdia January 2016 (has links)
This thesis deals with special-education problems in the acquisition of primary literacy by children with specific language impairment. The aim is to analyze the phenomena that disrupt the acquisition of written language forms by pupils with this diagnosis and to record the factors that help to streamline this process. Another task is to propose steps that help overcome the problems caused by impaired communication skills when teaching with the analytic-synthetic method. The theoretical part is the basis for meeting these targets. It describes specific language impairment as one of the categories of impaired communication skills and its consequences for education. Attention is given to the topic of literacy, with particular emphasis on initial reading literacy. This period is seen in the context of speech development, and the psycholinguistic concept of literacy is introduced, in which the skill of phonemic awareness has a central position. At the end of the theoretical part, the methods of teaching reading are described. The practical part presents the research study, which was carried out in several consecutive phases: initial diagnostic phase, observation of pupils in various stages of teaching of reading, implementation of methodical support measures and the final assessment of the...
63 |
Controle por unidades verbais mínimas e extensão da unidade ensinada: o efeito do treino de fonemas na emergência da leitura recombinativa / Minimal verbal units control and extension of the unit taught: effect of the phoneme training upon the emergence of recombinative reading Inhauser, Luana Zeolla 14 November 2012 (has links)
A leitura é uma habilidade complexa que envolve uma rede de relações entre estímulos e entre estímulos e respostas. Para que um repertório de leitura seja considerado proficiente, o leitor deve apresentar a leitura de novas palavras, não diretamente ensinadas e formadas pela recombinação de unidades aprendidas previamente em outras relações (leitura recombinativa). Um requisito fundamental para o desenvolvimento da leitura recombinativa é, portanto, o estabelecimento de um responder diferencial sob controle das unidades menores, como por exemplo, sílabas, letras ou fonemas, componentes das palavras. O objetivo do presente trabalho foi investigar o efeito do treino direto de fonemas, combinado com um treino de palavras, sobre a emergência da leitura recombinativa de palavras inteiras. Verificou-se também se o momento de inserção do treino de fonemas no procedimento de ensino, se prévio (Condição I) ou sobreposto (Condição II) ao treino de palavras, foi uma variável relevante para a emergência da leitura recombinativa. Os participantes do estudo foram 12 estudantes universitários, com idades entre 18 e 37 anos, e que foram distribuídos igualmente entre três Condições Experimentais: a) Condição I Treino Prévio de Fonemas; b) Condição II Treino Sobreposto de Fonemas; c) Condição IIITreino de palavras. A Condição III foi realizada com o objetivo permitir uma comparação entre o desempenho dos participantes que foram submetidos ao treino de fonemas (Condições I e II) com o dos participantes que foram expostos somente ao treino de palavra (Condição III). Os estímulos utilizados no estudo consistiram de palavras faladas (A) e palavras escritas com um pseudoalfabeto (C), bem como de fonemas e letras do pseudoalfabeto, correspondentes a estes fonemas. Os procedimentos empregados tanto para o treino de fonemas como para o treino de palavras inteiras foram os procedimentos de MTS (relação AC) e de Nomeação Oral (relação CD). 
Os testes para verificar a emergência da leitura recombinativa foram os mesmos nas três Condições Experimentais e consistiram em testes parciais de leitura oral (Testes CD) e de leitura receptiva (Testes AC), e em um Teste Final de Leitura Oral. Os resultados demonstraram que o treino direto de fonemas (Condições I e II) foi eficaz em estabelecer leitura recombinativa com elevados índices de acertos e pouca variabilidade intra e inter-participantes. O momento de inserção do treino de fonemas no procedimento de ensino foi uma variável relevante e possibilitou a observação de resultados distintos entre os participantes submetidos às Condições I e II. Os resultados sugerem que os participantes submetidos a Condição II (Sobreposto) foram os que apresentaram maior velocidade na recombinação e índices mais elevados de leitura recombinativa, enquanto os participantes submetidos a Condição I (Prévio) foram os que apresentaram menor variabilidade entre si / Reading is a complex skill that involves a network of relationships between stimuli and between stimuli and responses. A reading repertoire is considered proficient when the reader is able to read new words that were not directly taught and that are formed by recombining units previously learned in other relationships (recombinative reading). An essential requirement for the development of recombinative reading is thus the establishment of differential responding under the control of the smaller units that make up words, such as syllables, letters or phonemes. The objective of this study was to investigate the effect of direct phoneme training, combined with word training, on the emergence of recombinative reading of whole words. The study also examined whether the moment at which phoneme training was inserted into the teaching procedure, either prior to (Condition I) or overlapping with (Condition II) word training, was a relevant variable for the emergence of recombinative reading. 
The participants were 12 college students, aged between 18 and 37 years, who were equally distributed among three Experimental Conditions: a) Condition I: Prior Phoneme Training; b) Condition II: Overlapping Phoneme Training; c) Condition III: Word Training. Condition III was conducted in order to allow a comparison between the performance of participants who received phoneme training (Conditions I and II) and the performance of participants who were exposed only to word training (Condition III). The stimuli used in the study consisted of spoken pseudowords (A) and printed pseudowords (C), as well as phonemes and the letters of the pseudo-alphabet corresponding to these phonemes. The procedures applied for both phoneme and word training were MTS (AC relation) and Oral Naming (CD relation). Tests to verify the emergence of recombinative reading were the same in all three Experimental Conditions and consisted of partial tests of oral reading (Tests C'D) and receptive reading (Tests A'C'), as well as a Final Test of Oral Reading. The results showed that the direct training of phonemes (Conditions I and II) was effective in establishing recombinative reading with high scores and low variability within and among participants. The moment at which phoneme training was inserted into the procedure proved to be a relevant variable, producing different results for participants in Conditions I and II. The results suggest that participants in Condition II (Overlapping) showed greater recombinative speed and higher rates of recombinative reading, while participants in Condition I (Prior) showed the lowest variability among themselves.
64 |
De l'utilisation de mesures de confiance en traduction automatique : évaluation, post-édition et application à la traduction de la parole / On the use of confidence measures in machine translation : evaluation, post edition and application to speech translation Raybaud, Sylvain 05 December 2012 (has links)
Cette thèse de doctorat aborde les problématiques de l'estimation de confiance pour la traduction automatique, et de la traduction automatique statistique de la parole spontanée à grand vocabulaire. J'y propose une formalisation du problème d'estimation de confiance, et aborde expérimentalement le problème sous le paradigme de la classification et régression multivariée. Je propose une évaluation des performances des différentes méthodes évoquées, présente les résultats obtenus lors d'une campagne d'évaluation internationale et propose une application à la post-édition par des experts de documents traduits automatiquement. J'aborde ensuite le problème de la traduction automatique de la parole. Après avoir passé en revue les spécificités du medium oral et les défis particuliers qu'il soulève, je propose des méthodes originales pour y répondre, utilisant notamment les réseaux de confusion phonétiques, les mesures de confiance et des techniques de segmentation de la parole. Je montre finalement que le prototype proposé rivalise avec des systèmes état de l'art à la conception plus classique / In this thesis I deal with confidence estimation for machine translation and with statistical machine translation of large-vocabulary spontaneous speech. I first formalize the problem of confidence estimation and present experiments under the paradigm of multivariate classification and regression. I review the performance of the different techniques, present the results obtained during the WMT2012 international evaluation campaign and give the details of an application to the post-editing of automatically translated documents by experts. I then deal with the issue of speech translation. After reviewing what makes it a very specific and particularly challenging problem, I present original methods to partially solve it, using phonetic confusion networks, confidence estimation techniques and speech segmentation. 
I show that the prototype I developed yields performance comparable to that of state-of-the-art systems of more standard design.
65 |
O fonema : linguística e história Garay, Rodrigo Garcia January 2016 (has links)
O presente trabalho é o produto de minha pesquisa acerca dos aspectos históricos e linguísticos que subjazem o conceito do fonema. Nossa ideia originou-se a partir de dois extratos diferentes escritos pelo linguista russo Roman Jakobson: 1) sobre a gênese do fonema: “A procura pelos constituintes diferenciais discretos mais elementares da linguagem nos faz remontar à doutrina do sphoṭa dos gramáticos do sânscrito e a concepção do στοιχεῖον de Platão, mas o verdadeiro estudo linguístico desses invariantes iniciou-se apenas em 1870” (Jakobson, 1962:467); e 2) acerca dos fundadores da Fonologia: “Os primeiros alicerces da Fonologia foram assentados por Baudouin de Courtenay, Ferdinand de Saussure e seus discípulos” (Jakobson, 1962:232). Desta forma, tentamos realizar uma “reconstrução” desta trajetória histórica e linguística, dos nomes, fatos e teorias que formam o conceito da unidade fonológica no estudo científico da língua. Iniciamos com o estudo da ciência da linguagem na Índia antiga (em particular, o estudo da gramática do sânscrito), seguido pelo estudo do alfabeto grego (incluindo aí os problemas relativos à língua grega, assim como à Gramática e à Filosofia). Finalmente, tentamos fazer “um recorte” preciso do momento na história das ideias linguísticas quando o conceito científico do fonema foi delineado, definido e incorporado à terminologia da epistemologia linguística. Os grandes teóricos da escola incipiente da Linguística Geral, da Fonologia e do fonema, são, como disse Jakobson, o linguista e filólogo suíço Saussure, e o filólogo e foneticista polonês Courtenay; mas a história do fonema não é nada simples. 
Recentemente, um trabalho meticuloso por parte dos pesquisadores tem resgatado grande parte desta história já há muito esquecida, no que tange as teorias antigas dos gramáticos filósofos hindus e gregos, e os manuscritos de Saussure recentemente publicados, assim como os artigos de Courtenay e seus alunos (entre eles o polonês Mikołaj Kruszewski), escritos que, em sua maioria, permanecem sem tradução ao português. Nossa tarefa, então, foi trazer à luz esta história, seus desenvolvimentos no campo da Linguística em geral, e da Fonologia em particular. Realizamos nossa análise por meio de um cuidadoso estudo do fonema, um conceito no qual vários séculos de história e de ideias linguísticas estão sedimentados. / The present work is the product of my research into the historical and linguistic aspects that underlie the concept of the phoneme. Our main idea originated from two different extracts by the Russian linguist Roman Jakobson: 1) on the genesis of the phoneme: “the search for the ultimate discrete differential constituents of language can be traced back to the sphoṭa doctrine of the Sanskrit grammarians and to Plato’s conception of στοιχεῖον, but the actual linguistic study of these invariants started only in the 1870s” (Jakobson, 1962:467); and 2) on the founders of Phonology: “The first foundations of Phonology were laid by Baudouin de Courtenay, Ferdinand de Saussure and their disciples” (Jakobson, 1962:232). Thus, we attempted a historical and linguistic “reconstruction” of names, facts and theories that comprise the concept of a phonological unit and that of the phonological structure of language. We started with the study of the Science of Language in ancient India (in particular the grammar of Sanskrit), followed by the study of the Greek alphabet (including its implications concerning the Greek language, as well as Grammar and Philosophy). 
Finally, we attempted a precise “cut”, so to speak, on the moment in the history of Linguistic ideas when the scientific concept of the phoneme was outlined, defined and incorporated into the terminology of modern linguistic epistemology. The great theoreticians of the incipient school of General Linguistics, of Phonology and of the phoneme are, as Jakobson stated, the Swiss linguist and philologist Saussure, and the Polish philologist and phonetician Courtenay; yet the story inside the phoneme is anything but a simple one. Recently, meticulous scholarship has rescued a great part of this long forgotten history, in what concerns the ancient theories of both the Hindu and the Greek grammarian-philosophers, and the unpublished manuscript works of Saussure and the works of Courtenay and his students (among them the Polish professor Mikołaj Kruszewski), works that so far have remained without translation into Portuguese. Our task, then, has been to bring this history to light, its developments in the field of Linguistics in general, and Phonology in particular. We carried out this analysis by means of a careful study of the phoneme, a concept in which several hundred years of history and linguistic ideas have crystallized.
66 |
La pronunciación de ELE en los alumnos quebequenses : dificultades concretas y pautas de corrección Molinié, Luisa 02 1900 (has links)
Mon sujet de recherche traite sur la prononciation de l'espagnol comme langue étrangère chez les élèves québécois, sur leurs difficultés concrètes et lignes de correction qui peuvent leur être attribuées.
Dans une première partie plus générale, nous traiterons sur l'enseignement de la prononciation, de la place qu'elle occupe dans l'enseignement d'une langue étrangère. Nous croyons que la prononciation est un aspect de la langue qui a été mis de côté pour mettre en valeur la communication. Si une "mauvaise" prononciation n'entrave pas à la compréhension ou à la communication, elle n'est pas corrigée ni travaillée. Nous pouvons donc nous retrouver avec des étudiants ayant un haut niveau d'espagnol mais dont la prononciation connaît certaines lacunes.
Nous déterminerons également ce que nous entendons par "meilleure" ou "mauvaise" prononciation, nous nous interrogerons également sur la pertinence de l'enseignement de la phonétique. Nous nous poserons aussi la question sur la place de la prononciation selon la méthodologie didactique utilisée, et analyserons la quantité et qualité des exercices de prononciation présents ou pas dans les manuels scolaires, et s'ils correspondent aux exigences des documents officiels tels le Cadre commun européenne de référence, ou le Plan curricular de l'institut Cervantès.
Dans une deuxième partie nous nous questionnons sur les facteurs qui conditionnent l'apprentissage d'une langue et le perfectionnement de la prononciation dans une langue étrangère, car nous croyons que peut importe l'âge de l'étudiant, il y a toujours place à l'amélioration dans la prononciation. Nous nous interrogeons ensuite sur les tendances générales des francophones lors de leur prononciation de l'espagnol, nous ferons une étude contrastive des phonèmes espagnols et français, puis nous étudierons plus en détail les tendances des élèves québécois, car nous croyons que ces derniers sont dotés de certains atouts en comparaison à d'autres francophones.
Dans une troisième partie, nous proposons des exercices visant à améliorer la prononciation chez nos élèves, et afin de vérifier l'efficacité de ces exercices, nous enregistrerons des étudiants ayant bénéficié de ces exercices, et d'autres qui n'y auront pas eu droit. Cette étude comparative cherche à prouver que ces exercices aident réellement et qu'ils, ou d'autres exercices de ce genre, devraient être inclus dans l'enseignement. Le questionnaire dont il s'agit s'attarde principalement au phénomène du [r], que nous croyons être un, ou le son le plus difficile à prononcer en espagnol (autant la vibrante simple comme multiple). Bien entendu, une partie de ce chapitre sera consacrée à l'analyse de résultats. / My research concerns the pronunciation of Spanish as a foreign language among Quebec students, their specific difficulties, and guidelines for correcting them. In a first, more general part, we discuss the teaching of pronunciation and the place it occupies in foreign language teaching. We believe that pronunciation is an aspect of language that has been set aside in favor of communication. If a “bad” pronunciation does not interfere with comprehension or communication, it is neither corrected nor worked on. We can thus end up with students who have a high level of Spanish but whose pronunciation has certain gaps. We also define what we mean by “better” or “bad” pronunciation and question the relevance of teaching phonetics. We further consider the place given to pronunciation by the teaching methodology used, and analyze the quantity and quality of the pronunciation exercises present, or absent, in textbooks, and whether they meet the requirements of official documents such as the Common European Framework of Reference for Languages or the Plan curricular of the Instituto Cervantes.
In a second part, we examine the factors that condition the learning of a language and the improvement of pronunciation in a foreign language, because we believe that, whatever the student’s age, there is always room for improvement in pronunciation. We then look at the general tendencies of French speakers when they pronounce Spanish, carry out a contrastive study of Spanish and French phonemes, and study in more detail the tendencies of Quebec students, because we believe that they have certain advantages over other French speakers.
In a third and final part, we propose exercises aimed at improving our students’ pronunciation. To verify their effectiveness, we record students who have done these exercises and others who have not. This comparative study seeks to show that these exercises, or others of the same kind, really help and should be included in teaching. The questionnaire used focuses mainly on the [r] sound, which we believe to be one of the most difficult sounds, if not the most difficult, to pronounce in Spanish (both the simple vibrant, or tap, and the multiple vibrant, or trill). Part of this chapter is devoted to the analysis of the results.
68 |
Grapheme-to-phoneme conversion and its application to transliteration Jiampojamarn, Sittichai 06 1900 (has links)
Grapheme-to-phoneme conversion (G2P) is the task of converting a word, represented by a sequence of graphemes, to its pronunciation, represented by a sequence of phonemes. The G2P task plays a crucial role in speech synthesis systems, and is an important part of other applications, including spelling correction and speech-to-speech machine translation. G2P conversion is a complex task, for which a number of diverse solutions have been proposed. In general, the problem is challenging because the source string does not unambiguously specify the target representation. In addition, the training data include only example word pairs without the structural information of subword alignments.
In this thesis, I introduce several novel approaches for G2P conversion. My contributions can be categorized into (1) new alignment models and (2) new output generation models. With respect to alignment models, I present techniques including many-to-many alignment, phonetic-based alignment, alignment by integer linear programming and alignment-by-aggregation. Many-to-many alignment is designed to replace the one-to-one alignment that has been used almost exclusively in the past. The new many-to-many alignments are more precise and accurate in expressing grapheme-phoneme relationships. The other proposed alignment approaches attempt to advance the training method beyond the use of Expectation-Maximization (EM). With respect to generation models, I first describe a framework for integrating many-to-many alignments and language models for grapheme classification. I then propose joint processing for G2P using online discriminative training. I integrate a generative joint n-gram model into the discriminative framework. Finally, I apply the proposed G2P systems to name transliteration generation and mining tasks. Experiments show that the proposed system achieves state-of-the-art performance in both the G2P and name transliteration tasks.
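To make the idea of many-to-many alignment concrete, the following is a minimal Python sketch and not the thesis's own algorithm. It assumes the chunk-pair scores are already known (the hand-set `toy_scores` table, the `unseen` penalty and the function name `align` are all hypothetical) and only shows the dynamic-programming search for the best monotone alignment in which each grapheme chunk and each phoneme chunk may span one or two symbols. Learning such scores from data, for example with EM, is not covered here.

```python
from math import inf

def align(graphemes, phonemes, score, max_g=2, max_p=2, unseen=-10.0):
    """Find the best monotone many-to-many alignment by dynamic programming.

    `score` maps (grapheme_chunk, phoneme_chunk_tuple) to a log-score;
    chunk pairs missing from the table receive the `unseen` penalty.
    Returns a list of (grapheme_chunk, phoneme_chunk) pairs, or None
    when no alignment exists under the chunk-size limits.
    """
    n, m = len(graphemes), len(phonemes)
    best = [[-inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == -inf:
                continue  # this prefix pair cannot be reached
            for dg in range(1, max_g + 1):
                for dp in range(1, max_p + 1):
                    ni, nj = i + dg, j + dp
                    if ni > n or nj > m:
                        continue
                    chunk = (graphemes[i:ni], tuple(phonemes[j:nj]))
                    s = best[i][j] + score.get(chunk, unseen)
                    if s > best[ni][nj]:
                        best[ni][nj] = s
                        back[ni][nj] = (i, j)
    if best[n][m] == -inf:
        return None
    # Walk the back-pointers from the end to recover the chunk pairs.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((graphemes[pi:i], tuple(phonemes[pj:j])))
        i, j = pi, pj
    return pairs[::-1]

# Hypothetical scores for illustration only.
toy_scores = {
    ("ph", ("f",)): -0.1,
    ("o", ("ow",)): -0.2,
    ("ne", ("n",)): -0.3,
}
print(align("phone", ["f", "ow", "n"], toy_scores))
```

A strictly one-to-one aligner could not pair the five letters of "phone" with its three phonemes without inserting null symbols, whereas the many-to-many search groups "ph" and "ne" into single chunks.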
69 |
Perceptually motivated speech recognition and mispronunciation detection Koniaris, Christos January 2012 (has links)
This doctoral thesis is the result of a research effort performed in two fields of speech technology, i.e., speech recognition and mispronunciation detection. Although the two areas are clearly distinguishable, the proposed approaches share a common hypothesis based on psychoacoustic processing of speech signals. The conjecture implies that the human auditory periphery provides a relatively good separation of different sound classes. Hence, it is possible to use recent findings from psychoacoustic perception together with mathematical and computational tools to model the auditory sensitivities to small speech signal changes. The performance of an automatic speech recognition system strongly depends on the representation used for the front-end. If the extracted features do not include all relevant information, the performance of the classification stage is inherently suboptimal. The work described in Papers A, B and C is motivated by the fact that humans perform better at speech recognition than machines, particularly in noisy environments. The goal is to make use of knowledge of human perception in the selection and optimization of speech features for speech recognition. These papers show that maximizing the similarity of the Euclidean geometry of the features to the geometry of the perceptual domain is a powerful tool to select or optimize features. Experiments with a practical speech recognizer confirm the validity of the principle. An approach to improving mel frequency cepstrum coefficients (MFCCs) through offline optimization is also shown. The method has three advantages: i) it is computationally inexpensive, ii) it does not use the auditory model directly, thus avoiding its computational cost, and iii) importantly, it provides better recognition performance than traditional MFCCs for both clean and noisy conditions. The second task concerns automatic pronunciation error detection. 
The research, described in Papers D, E and F, is motivated by the observation that almost all native speakers perceive, relatively easily, the acoustic characteristics of their own language when it is produced by speakers of the language. Small variations within a phoneme category, sometimes different for various phonemes, do not significantly change the perception of the language’s own sounds. Several methods are introduced to identify the problematic phonemes for a given speaker, based on similarity measures between the Euclidean space spanned by the acoustic representations of the speech signal and the Euclidean space spanned by an auditory model output. The methods are tested on groups of speakers from different languages and evaluated against a theoretical linguistic study, showing that they can capture many of the problematic phonemes that speakers from each language mispronounce. Finally, a listening test on the same dataset verifies the validity of these methods. / QC 20120914 / European Union FP6-034362 research project ACORNS / Computer-Animated language Teachers (CALATea)
70 |
När lögnare blir lugnare : En sociofonetisk studie av sammanfallet mellan kort ö och kort u i uppländskan Wenner, Lena January 2010 (has links)
The phenomenon of an ongoing sound change leads in some cases to the pronunciation of short ö becoming more like that of short u. This thesis examines the relationship between short ö and u in Uppland Swedish. The localities included in the investigation were Uppsala, Norrtälje, Östervåla and Gräsö. In particular, the thesis examines the effects of age, gender and social status on the acquisition of a pronunciation where the phonemes are produced in a similar way, and whether the change occurs earlier in some words than others. The informants on Gräsö appear to have the highest occurrence of the merger, while those in Norrtälje are best at keeping ö and u apart. In general, men have a smaller difference between ö and u than women. Three different age groups were analysed and the results show that the oldest informants have the largest difference between ö and u and the youngest informants have the smallest difference. There are no significant differences between the three social status groups, but there is a tendency for those with the lowest social status to be better at keeping the phonemes apart than those with the highest social status. 13 minimal (or near-minimal) pairs were analysed to investigate whether the phonetic context has an effect on the degree to which ö and u are becoming more similar. The study shows that the smallest phonetic difference is found for word pairs with r occurring in the preceding or following context. The largest phonetic distance was found in word pairs beginning with a vowel. The study also examined whether there is a relationship between production, perception and attitude to u-sounding ö in Uppsala. By combining the production test results with the informants’ categorisation of u and ö in the perception test, the study shows that the informants with a small phonetic distance in their own speech were better at categorising stimuli correctly than the speakers who had a larger phonetic distance between ö and u in their own speech.
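The abstract does not say how the phonetic distance between ö and u was measured. As one possible illustration, the Python sketch below assumes that each vowel token is described by its first two formants (F1, F2) in Hz and takes the distance between two vowel categories to be the Euclidean distance between their mean formant points. The function name `vowel_distance` and all formant values are hypothetical and are not taken from the thesis.

```python
from math import dist
from statistics import mean

def vowel_distance(tokens_a, tokens_b):
    """Euclidean distance between the mean (F1, F2) points of two vowel sets.

    Each token is an (F1, F2) pair in Hz. This is an assumed measure,
    not necessarily the one used in the study.
    """
    def centroid(tokens):
        return (mean(t[0] for t in tokens), mean(t[1] for t in tokens))
    return dist(centroid(tokens_a), centroid(tokens_b))

# Hypothetical formant measurements for one speaker.
short_oe = [(400, 1500), (420, 1520)]
short_u = [(430, 1540), (450, 1560)]
print(vowel_distance(short_oe, short_u))
```

Under this assumption, a smaller value for a speaker or word pair would correspond to a more advanced merger of the two vowels.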