About: The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium or country archive and want to be added, details can be found on the NDLTD website.
51

Data-driven augmentation of pronunciation dictionaries

Loots, Linsen 03 1900
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2010. / ENGLISH ABSTRACT: This thesis investigates various data-driven techniques by which pronunciation dictionaries can be automatically augmented. First, well-established grapheme-to-phoneme (G2P) conversion techniques are evaluated for Standard South African English (SSAE), British English (RP) and American English (GenAm) by means of four appropriate dictionaries: SAEDICT, BEEP, CMUDICT and PRONLEX. Next, the decision tree algorithm is extended to allow the conversion of pronunciations between different accents by means of phoneme-to-phoneme (P2P) and grapheme-and-phoneme-to-phoneme (GP2P) conversion. P2P conversion uses the phonemes of the source accent as input to the decision trees. GP2P conversion additionally incorporates the graphemes into the decision tree input. Both P2P and GP2P conversion are evaluated using the four dictionaries. It is found that, when the pronunciation is needed for a word not present in the target accent, it is substantially more accurate to modify an existing pronunciation from a different accent than to derive it from the word's spelling using G2P conversion. When converting between accents, GP2P conversion provides a significant further increase in performance over P2P. Finally, experiments are performed to determine how large a training dictionary is required in a target accent for G2P, P2P and GP2P conversion. It is found that GP2P conversion requires less training data than P2P and substantially less than G2P conversion. Furthermore, very little training data is needed for GP2P to perform at almost maximum accuracy: the bulk of the accuracy is achieved within the initial 500 words, and after 3000 words there is almost no further improvement. Some specific approaches to compiling the best training set are also considered. By means of an iterative greedy algorithm, an optimal ranking of words to be included in the training set is discovered. Using this set is shown to lead to substantially better GP2P performance for the same training set size than alternative approaches such as the use of phonetically rich words or random selections. A mere 25 words of training data from this optimal set already achieve an accuracy within 1% of that of the full training dictionary.
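The abstract names an iterative greedy algorithm for ranking training words but does not give its details. The sketch below shows generic greedy forward selection under one assumption: a caller-supplied `score` function that, in a real setup, would train a GP2P converter on the candidate set and return its accuracy. `greedy_rank` and `letter_coverage` are hypothetical names, and the letter-coverage score is only a cheap stand-in for retraining a decision tree.

```python
from typing import Callable, Hashable, List, Optional, Sequence


def greedy_rank(
    candidates: Sequence[Hashable],
    score: Callable[[List[Hashable]], float],
    max_size: Optional[int] = None,
) -> List[Hashable]:
    """Rank candidate training words by iterative greedy forward selection.

    At each step every remaining word is tried as the next addition to the
    current training set, and the word giving the highest score is kept.
    Ties are broken in favour of the earliest remaining candidate.
    """
    remaining = list(candidates)
    ranking: List[Hashable] = []
    limit = len(remaining) if max_size is None else min(max_size, len(remaining))
    while len(ranking) < limit:
        # If score() retrains a converter, each step costs one training run
        # per remaining candidate.
        best = max(remaining, key=lambda word: score(ranking + [word]))
        ranking.append(best)
        remaining.remove(best)
    return ranking


def letter_coverage(words: List[str]) -> float:
    """Hypothetical proxy score: number of distinct letters in the set."""
    return float(len(set("".join(words))))


if __name__ == "__main__":
    words = ["cat", "act", "dog", "cadet", "good"]
    print(greedy_rank(words, letter_coverage, max_size=3))
```

With the toy word list the ranking starts with "cadet", the word that contributes the most new letters, followed by "dog" and "cat". Because the full ranking needs a quadratic number of score evaluations, a practical version would probably cap `max_size` at the training-set sizes of interest.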
52

O efeito do ensino de relações envolvendo sílabas e fonemas-grafemas sobre a leitura recombinativa / The effect of teaching relations involving syllables and phonemes-graphemes on recombinative reading

Silva, Camila Maria Silveira da 17 June 2015
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Behavior analysts have been investigating the acquisition of recombinative reading, but few studies have examined the effect of auditory-visual correspondence between phonemes and graphemes. This study aimed to verify the effect of teaching relations involving syllables and phonemes-graphemes on recombinative reading. The participants were six children. Three were submitted to a computerized teaching procedure, and the other three only performed the tests (control participants). For the teaching participants, three sets of four words, chosen on the basis of a syllabic matrix, were taught. For each word set, the procedure was: 1) pretest of the word set; 2) teaching of the spoken word-picture relation (AB) and picture naming; 3) teaching of the spoken word-written word relation (AC) and textual behavior with words; 4) intermediate test; 5) teaching of the auditory-visual relation with syllables and phonemes-graphemes, and textual behavior with these minimal units; 6) identity constructed-response matching to sample (CRMTS-ID) with letters and syllables, after the echoic of the minimal units and of the presented word; 7) post-test of the word set. In addition, at the beginning and end of the procedure the participants performed the Initial and Final Assessments, respectively, with all word sets, and the Phonological Awareness Test by Oral Production. The results showed that two of the three teaching participants demonstrated recombinative reading at the end of the study. One of them (E1) showed recombinative reading from the Set 1 post-test onward, although accuracy decreased after the teaching of Set 2 and rose again in the Set 3 post-test. The other (E2) demonstrated recombinative reading from the Set 2 post-test onward. C1 (E1's control) did not show an equivalent level of learning, whereas C2 (E2's control) did. The third participant (E3) learned only the directly taught relations, and his control (C3) showed recombinative reading from the Set 2 post-tests onward. A second control participant for E3 (C4) did not show recombinative reading. The children had greater difficulty with Set 2, as shown by the number of teaching applications needed to meet the criterion in the post-tests and by the percentage of correct responses in the textual-behavior teaching. These results indicate the need to revise the word matrix proposed in this study, since Set 2 did not provide for overlap with the syllables of Set 1. However, another study conducted with an identical teaching procedure, except that only phonemes and graphemes were taught as minimal units, obtained better results, even after the teaching of Set 2. We therefore ask whether teaching only phonemes and graphemes, without syllables, facilitated control of the children's behavior by the letters, so that the absence of syllabic overlap in the Set 2 words did not impair the participants' performance.
53

Anglų kalbos vizemų pritaikymas lietuvių kalbos garsų animacijai / English visemes and Lithuanian phonemes mapping for animation

Mažonavičiūtė, Ingrida 27 June 2008
This final thesis investigates the relationship between Lithuanian speech sounds and their visual information. Animation algorithms for talking-head models are analyzed and their problems are identified; on this basis, a methodology for synthesizing Lithuanian speech based on the use of English visemes is proposed. Thirty three-dimensional Lithuanian visemes are created and visually compared with the standard English phoneme visemes, yielding a correspondence table that maps Lithuanian phonemes to English visemes. The table is then used to animate a Lithuanian sound file.
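The abstract describes using a phoneme-to-viseme table to animate a sound file. A minimal sketch of that lookup step follows, assuming the sound file has already been segmented into timed phonemes (for example by a forced aligner). The table entries, viseme labels and the `to_viseme_track` name are hypothetical illustrations, not the 30-viseme table compiled in the thesis.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, float, float]  # (label, start in seconds, end in seconds)

# Hypothetical excerpt of a Lithuanian-phoneme -> English-viseme table.
# The labels are illustrative only and are NOT the table from the thesis.
PHONEME_TO_VISEME: Dict[str, str] = {
    "p": "PP", "b": "PP", "m": "PP",
    "f": "FF", "v": "FF",
    "s": "SS", "z": "SS",
    "a": "aa", "o": "O", "u": "U",
    "sil": "sil",
}


def to_viseme_track(
    phonemes: List[Segment],
    table: Dict[str, str] = PHONEME_TO_VISEME,
    default: str = "sil",
) -> List[Segment]:
    """Map timed phoneme segments to viseme segments, merging repeats."""
    track: List[Segment] = []
    for phoneme, start, end in phonemes:
        viseme = table.get(phoneme, default)  # unmapped phonemes fall back to default
        if track and track[-1][0] == viseme:
            # Same mouth shape as the previous segment: extend it in time.
            track[-1] = (viseme, track[-1][1], end)
        else:
            track.append((viseme, start, end))
    return track


if __name__ == "__main__":
    timed = [("m", 0.0, 0.08), ("b", 0.08, 0.15), ("a", 0.15, 0.3), ("s", 0.3, 0.42)]
    print(to_viseme_track(timed))
```

Merging consecutive identical visemes is a design choice here: "m" and "b" share a closed-lip shape, so the animation holds one keyframe instead of two.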
54

Leveraging supplementary transcriptions and transliterations via re-ranking

Bhargava, Aditya Unknown Date
No description available.
55

Reëlgebaseerde klemtoontoekenning in 'n grafeem-na-foneemstelsel vir Afrikaans (Rule-based stress assignment in a grapheme-to-phoneme system for Afrikaans) / E.W. Mouton

Mouton, Elsie Wilhelmina January 2010
Text-to-speech systems are currently of great importance to the community. One core technology in this human language technology resource is stress assignment, which plays an important role in any text-to-speech system. At present, no automatic stress assigner for Afrikaans exists. For these reasons, the two most important aims of this project are: a) to develop a complete and accurate set of stress rules for Afrikaans that can be implemented in an automatic stress assigner, and b) to develop an effective and highly accurate stress assigner that assigns Afrikaans stress to words quickly and effectively. To reach the first goal, a set of stress rules for Afrikaans was developed. It consists of 18 rules, divided into groups for words that contain a schwa, derivations, and disyllabic, trisyllabic and polysyllabic simplex words. Next, different approaches to developing a stress assigner were examined, and the rule-based approach was used to implement the developed stress rules within the stress assigner. The programming language Perl was chosen for the implementation of the rules. The chosen algorithm was used to generate a stress assigner for Afrikaans by implementing the stress rules developed. The hyphenator Calomo and the compound analyser CKarma were used to hyphenate all the test data and to detect word boundaries within compounds. A dataset of 10 000 correctly annotated tokens was developed during the testing process. The evaluation of the stress assigner consists of four phases. During the first phase, the stress assigner was evaluated on the 10 000 tokens and achieved an accuracy of 92.09%. The grapheme-to-phoneme converter was evaluated on the same data and scored 91.9%. The influence of various factors on stress assignment was determined, and it was established that stress assignment is an essential component of rule-based grapheme-to-phoneme conversion. In conclusion, the stress assigner achieved satisfactory results and can be used in future projects to develop training data for further experiments with stress assignment and grapheme-to-phoneme conversion for Afrikaans. Future experiments with data-driven approaches may lead to better results in Afrikaans stress assignment and grapheme-to-phoneme conversion. / Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2010.
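The abstract describes an ordered set of stress rules applied to hyphenated words and evaluated as token accuracy. The sketch below shows only that architecture, a cascade in which the first rule that fires decides the stressed syllable, in Python rather than the Perl used in the thesis. Both rules are hypothetical placeholders, not any of the thesis's 18 rules, and the input convention (SAMPA-like syllables with schwa written "@") is an assumption; the example syllables are made-up tokens, not Afrikaans transcriptions.

```python
from typing import Callable, List, Optional, Sequence

Rule = Callable[[Sequence[str]], Optional[int]]
SCHWA = "@"  # assumed input convention: SAMPA-like syllables, schwa written "@"


def rule_monosyllable(sylls: Sequence[str]) -> Optional[int]:
    """Hypothetical rule: a one-syllable word is stressed on that syllable."""
    return 0 if len(sylls) == 1 else None


def rule_skip_schwa(sylls: Sequence[str]) -> Optional[int]:
    """Hypothetical rule: stress the first syllable that contains no schwa."""
    for i, syll in enumerate(sylls):
        if SCHWA not in syll:
            return i
    return None


RULES: List[Rule] = [rule_monosyllable, rule_skip_schwa]


def assign_stress(sylls: Sequence[str], rules: List[Rule] = RULES, fallback: int = 0) -> int:
    """Return the index of the stressed syllable; the first rule that fires wins."""
    for rule in rules:
        index = rule(sylls)
        if index is not None:
            return index
    return fallback  # no rule fired


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Share of tokens whose predicted stress position matches the annotation."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    tokens = [["ta", "f@l"], ["b@", "ta", "r@"], ["k@"]]
    print([assign_stress(t) for t in tokens])
```

Keeping each rule as a separate function makes rule order explicit and lets individual rule groups (schwa words, derivations, simplex words of different lengths) be tested in isolation.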
57

Voice Transformation And Development Of Related Speech Analysis Tools For Turkish

Salor, Ozgul 01 January 2005
In this dissertation, new approaches to the design of a voice transformation (VT) system for Turkish are proposed. The objectives of this thesis are twofold. The first objective is to develop standard speech corpora and segmentation tools for Turkish speech research. The second objective is to consider new approaches for VT. A triphone-balanced set of 2462 Turkish sentences is prepared for analysis. An audio corpus of 100 speakers, each uttering 40 sentences out of the 2462-sentence set, is used to train a speech recognition system designed for English. This system is ported to Turkish to obtain a phonetic aligner and a phoneme recognizer. The triphone-balanced sentence set and the phonetic aligner are used to develop a speech corpus for VT. A new voice transformation approach based on the Mixed Excitation Linear Prediction (MELP) speech coding framework is proposed. The multi-stage vector quantization of MELP is used to obtain speaker-specific line spectral frequency (LSF) codebooks for the source and target speakers. Histograms mapping the LSF spaces of the source and target speakers are used for transformation in the baseline system. The baseline system is improved by a dynamic programming approach to estimating the target LSFs. As a second approach to the VT problem, the LSFs are quantized using the k-means clustering algorithm, combined with dimension reduction of the LSFs using principal component analysis (PCA). This approach derives speaker-specific codebooks from the speech corpus instead of using MELP's pre-trained LSF codebook. Evaluations show that both dimension reduction and dynamic programming improve the transformation performance.
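The abstract says that histograms mapping the source and target LSF spaces drive the baseline transformation, without giving the exact construction. One plausible reading is sketched below, under stated assumptions: source and target frames are already time-aligned (for instance with the phonetic aligner), each frame is quantized to a speaker-specific codebook, and a joint histogram of codeword indices maps every source codeword to its most frequent target codeword. The function names are hypothetical, and the 2-dimensional vectors only keep the example readable; real LSF vectors have more coefficients.

```python
import numpy as np


def quantize(frames: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Index of the nearest codeword (squared Euclidean distance) for each frame."""
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def learn_mapping(src_idx: np.ndarray, tgt_idx: np.ndarray, n_src: int, n_tgt: int) -> np.ndarray:
    """Joint histogram of aligned (source, target) codeword indices.

    Each source codeword maps to the target codeword it co-occurs with most
    often. Source codewords never seen in training map to target index 0.
    """
    hist = np.zeros((n_src, n_tgt), dtype=int)
    np.add.at(hist, (src_idx, tgt_idx), 1)  # count co-occurrences, repeats included
    return hist.argmax(axis=1)


def transform(src_frames: np.ndarray, src_codebook: np.ndarray,
              tgt_codebook: np.ndarray, mapping: np.ndarray) -> np.ndarray:
    """Replace each source LSF frame by its mapped target codeword."""
    return tgt_codebook[mapping[quantize(src_frames, src_codebook)]]


if __name__ == "__main__":
    src_cb = np.array([[0.1, 0.2], [0.5, 0.6]])
    tgt_cb = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
    mapping = learn_mapping(np.array([0, 0, 1, 1, 1]), np.array([2, 2, 0, 1, 1]), 2, 3)
    print(mapping, transform(np.array([[0.12, 0.21], [0.49, 0.61]]), src_cb, tgt_cb, mapping))
```

`np.add.at` is used instead of `hist[src_idx, tgt_idx] += 1` because the latter silently counts a repeated index pair only once. The dynamic programming refinement mentioned in the abstract would replace the per-frame argmax with a search over target codeword sequences and is not shown.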
58

Consciência fonológica e fonema: discutindo seus conceitos e seus empréstimos / Phonological awareness and phoneme: discussing their concepts and their borrowings

Souza, Érika Costa de 26 August 2011
This dissertation reflects on conceptual issues regarding the notion of phonological awareness as a prerequisite for the acquisition of written language, and on the concepts of phoneme and sound. Although they are frequently used in research on language pathology, the terms phoneme and sound have constantly been treated as equivalent entities, which has theoretical and methodological consequences for research in these fields. The appropriation of these terms from Linguistics in phonological awareness research is discussed on the basis of the theoretical constructs of Structural Linguistics, especially those developed by the founders of Phonology, such as Trubetzkoy and Jakobson, and by the American structuralist Edward Sapir. Their reflections served as conceptual support for examining the "borrowing" of these terms from Linguistics, as well as the theoretical commitment that the language-disorders fields maintain to the terms' original concepts. The work originated from theoretical reflections on data from an experiment conducted with deaf and hearing children in the 1st to 3rd grades of elementary school, whose aim was to evaluate phonological awareness through a task of copying monosyllabic and trisyllabic words and pseudowords. With the support of Linguistics, the study then examined the initial concepts that guided it, the theoretical views on phonological awareness, and the treatment of phonological awareness as a prerequisite for acquiring writing in an alphabetic language.
59

Perception of prosody by cochlear implant recipients

Van Zyl, Marianne January 2014
Recipients of present-day cochlear implants (CIs) display remarkable success with speech recognition in quiet, but not with speech recognition in noise. Normal-hearing (NH) listeners, in contrast, perform relatively well with speech recognition in noise. Understanding which speech features support successful perception in noise in NH listeners could provide insight into the difficulty that CI listeners experience in background noise. One set of speech features whose noise immunity has not been thoroughly investigated is prosody. Existing reports show that CI users have difficulty with prosody perception. The present study set out to determine whether prosody is particularly noise-immune in NH listeners and whether the difficulty that CI users experience in noise can be partly explained by poor prosody perception. This was done through three listening experiments. The first listening experiment examined the noise immunity of prosody in NH listeners by comparing perception of a prosodic pattern to word recognition in speech-weighted noise (SWN). Prosody perception was tested in a two-alternative forced-choice (2AFC) test paradigm using sentences conveying either conditional or unconditional permission, agreement or approval. Word recognition was measured in an open-set test paradigm using meaningful sentences. Results indicated that the deterioration slope of prosody recognition (corrected for guessing) was significantly shallower than that of word recognition. At the lowest signal-to-noise ratio (SNR) tested, prosody recognition was significantly better than word recognition. The second experiment compared recognition of prosody and phonemes in SWN by testing perception of both in a 2AFC test paradigm. NH and CI listeners were tested using single words as stimuli. Two prosody recognition tasks were used: the first required discrimination between questions and statements, while the second required discrimination between a certain and a hesitant attitude. Phoneme recognition was measured with three vowel pairs selected according to specific acoustic cues. In contrast to the first experiment, the results of this experiment indicated that vowel recognition was significantly better than prosody recognition in noise in both listener groups. The difference between the results of the first and second experiments was thought to be due either to the difference in test paradigm in the first experiment (closed set versus open set) or to a difference in stimuli between the experiments (single words versus sentences). The third experiment tested emotional prosody and phoneme perception of NH and CI listeners in SWN using sentence stimuli and a 4AFC test paradigm for both tasks. In NH listeners, the deterioration slopes of prosody and phonemes (vowels and consonants) did not differ significantly, and at the lowest SNR tested there was no significant difference in recognition of the different types of speech material. In the CI group, prosody and vowel perception deteriorated with a similar slope, while consonant recognition showed a steeper slope than prosody recognition. It is concluded that while prosody might support speech recognition in noise in NH listeners, explicit recognition of prosodic patterns is not particularly noise-immune and does not account for the difficulty that CI users experience in noise. / Thesis (PhD)--University of Pretoria, 2014. / lk2014 / Electrical, Electronic and Computer Engineering / PhD / unrestricted
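The abstract compares deterioration slopes of scores "corrected for guessing" in 2AFC and 4AFC tasks. A minimal sketch of the standard chance correction, p_c = (p - 1/n) / (1 - 1/n) for n alternatives, follows, together with a slope taken from a straight-line least-squares fit over SNR. Fitting a straight line is an assumption made here for illustration (the thesis may, for instance, fit psychometric functions instead), and the numbers in the usage example are made up, not results from the study.

```python
import numpy as np


def correct_for_guessing(p_correct, n_alternatives: int):
    """Chance-corrected score for an n-alternative forced-choice task.

    Maps chance performance (1/n) to 0 and perfect performance to 1.
    Scores below chance come out negative and are left unclipped here.
    """
    chance = 1.0 / n_alternatives
    p = np.asarray(p_correct, dtype=float)
    return (p - chance) / (1.0 - chance)


def deterioration_slope(snr_db, scores) -> float:
    """Slope (score change per dB SNR) of a straight-line least-squares fit."""
    slope, _intercept = np.polyfit(np.asarray(snr_db, dtype=float),
                                   np.asarray(scores, dtype=float), 1)
    return float(slope)


if __name__ == "__main__":
    snr = [-10.0, -5.0, 0.0]      # made-up SNR levels in dB
    raw_2afc = [0.5, 0.75, 1.0]   # made-up proportions correct in a 2AFC task
    corrected = correct_for_guessing(raw_2afc, 2)
    print(corrected, deterioration_slope(snr, corrected))
```

The correction matters when comparing tasks with different chance levels: a raw 50% correct is chance in a 2AFC task but well above chance in a 4AFC task, and only the corrected scores put both on a common 0-to-1 scale.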
60

Automated phoneme mapping for cross-language speech recognition

Sooful, Jayren Jugpal 11 January 2005
This dissertation explores a unique automated approach to mapping one phoneme set to another, based on the acoustic distances between the individual phonemes. Although the focus of this investigation is on cross-language applications, the automated approach can also be extended to same-language, different-database applications. The main goal of this investigation is to use the data of a source language to train the initial acoustic models of a target language for which very little speech data may be available. To do this, an automatic technique for mapping the phonemes of the two data sets must be found. Using this technique, it would be possible to accelerate the development of a speech recognition system for a new language. Current research in cross-language speech recognition has focused on manual methods to map phonemes. This investigation considers an English-to-Afrikaans phoneme mapping as well as an Afrikaans-to-English phoneme mapping; such mappings have previously been applied to these languages, but using manual phoneme mapping methods. To determine the best phoneme mapping, different acoustic distance measures are compared: the Kullback-Leibler measure, the Bhattacharyya distance metric, the Mahalanobis measure, the Euclidean measure, the L2 metric and the Jeffreys-Matusita distance. The distance measures are tested by comparing the cross-database recognition results obtained on phoneme models created from the TIMIT speech corpus and a locally compiled South African SUN Speech database. By selecting the most appropriate distance measure, phonemes can be mapped automatically from the source language to the target language. The best distance measure gives recognition rates comparable to a manual mapping undertaken by a phonetic expert. This study also investigates the effect of the number of Gaussian mixture components on the mapping and on the speech recognition system's performance. The results indicate that the recogniser's performance increases, up to a limit, as the number of mixtures increases. In addition, this study explores the effect of excluding the delta and acceleration Mel-frequency cepstral coefficients; including these temporal features is found to improve both the mapping and the recognition system's phoneme recognition rate. Experiments are also carried out to determine the impact of the number of HMM recogniser states; single-state HMMs are found to deliver the best cross-language phoneme recognition results. After the mapping, speaker adaptation strategies are applied to the recognisers to improve their target-language performance. The models of a fully trained speech recogniser in a source language are adapted to target-language models using Maximum Likelihood Linear Regression (MLLR) followed by Maximum A Posteriori (MAP) techniques. Embedded Baum-Welch re-estimation (EBWR) is used to further adapt the models to the target language. These techniques result in a considerable improvement in the phoneme recognition rate. Although a combination of MLLR and MAP techniques has been used previously in speech adaptation studies, the combination of MLLR, MAP and EBWR in cross-language speech recognition is a unique contribution of this study. Finally, a data pooling technique is applied to build a new recogniser using the automatically mapped phonemes from the target language as well as the source-language phonemes. This new recogniser demonstrates moderate bilingual phoneme recognition capabilities. The bilingual recogniser is then further adapted to the target language using MAP and embedded Baum-Welch re-estimation techniques. This combination of adaptation techniques with the data pooling strategy is a unique application in the field of cross-language recognition. The results obtained using this technique outperform all other techniques tested in terms of phoneme recognition rate, although it requires a considerably more time-consuming training process. Its phoneme recognition is only slightly poorer than that of recognisers trained and tested on the same-language database. / Dissertation (MEng (Computer Engineering))--University of Pretoria, 2006. / Electrical, Electronic and Computer Engineering / unrestricted
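The abstract lists six acoustic distance measures but does not say which one performed best. As an illustration of the general idea, the sketch below uses one of them, the Bhattacharyya distance, and assumes each phoneme is modelled by a single Gaussian with diagonal covariance; with multi-component mixtures, as also studied in the dissertation, there is no closed form and an approximation would be needed. Each target-language phoneme is mapped to the source-language phoneme with the smallest distance. `map_phonemes` and the toy phoneme models are hypothetical.

```python
from typing import Callable, Dict, Tuple

import numpy as np

Gaussian = Tuple[np.ndarray, np.ndarray]  # (mean vector, diagonal variances)


def bhattacharyya_diag(mu1, var1, mu2, var2) -> float:
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    var = 0.5 * (var1 + var2)  # averaged covariance (diagonal)
    diff = mu1 - mu2
    mean_term = 0.125 * np.sum(diff * diff / var)
    cov_term = 0.5 * np.sum(np.log(var) - 0.5 * (np.log(var1) + np.log(var2)))
    return float(mean_term + cov_term)


def map_phonemes(
    target: Dict[str, Gaussian],
    source: Dict[str, Gaussian],
    distance: Callable[..., float] = bhattacharyya_diag,
) -> Dict[str, str]:
    """Map every target-language phoneme to its acoustically closest source phoneme."""
    mapping = {}
    for name, (mu_t, var_t) in target.items():
        mapping[name] = min(source, key=lambda s: distance(mu_t, var_t, *source[s]))
    return mapping


if __name__ == "__main__":
    ones = np.ones(2)
    source = {"a": (np.zeros(2), ones), "i": (np.array([3.0, 3.0]), ones)}
    target = {"A": (np.array([0.2, 0.1]), ones), "I": (np.array([2.8, 3.1]), ones)}
    print(map_phonemes(target, source))
```

Passing the distance as a parameter keeps the mapping step independent of the measure, so the other measures named in the abstract could be swapped in and compared by the recognition rates they lead to.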
