Global ETD Search

1	Wavelet-based techniques for speech recognition Farooq, Omar January 2002 In this thesis, new wavelet-based techniques have been developed for extracting features from speech signals for automatic speech recognition (ASR). One advantage of the wavelet transform over the short-time Fourier transform (STFT) is its ability to process non-stationary signals. Since speech signals are not strictly stationary, the wavelet transform is a better choice for their time-frequency analysis. In addition, the wavelet transform has compactly supported basis functions, which reduces the amount of computation compared with the STFT, where an overlapping window is needed. 621 Phoneme recognition
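The abstract does not say which wavelet family or which features the thesis uses. As a minimal sketch of the general idea only, the code below assumes an orthonormal Haar wavelet and takes the log energy of each subband of a speech frame as a feature; `haar_dwt` and `subband_log_energies` are hypothetical names, not the author's method.

```python
import math
from typing import List, Tuple


def haar_dwt(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar wavelet transform.

    Returns (approximation, detail) coefficients; each list is half the
    input length, so the input length must be even.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    approx = [s * (signal[i] + signal[i + 1]) for i in pairs]
    detail = [s * (signal[i] - signal[i + 1]) for i in pairs]
    return approx, detail


def subband_log_energies(frame: List[float], levels: int,
                         eps: float = 1e-10) -> List[float]:
    """Hypothetical feature extractor: log energy of each wavelet subband.

    The frame is split `levels` times; the output holds the detail-band
    energies from the highest band down, followed by the final
    approximation band. len(frame) must be divisible by 2 ** levels.
    """
    feats = []
    current = list(frame)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        feats.append(math.log(sum(d * d for d in detail) + eps))
    feats.append(math.log(sum(a * a for a in current) + eps))
    return feats
```

Because each Haar basis function spans only two samples at its level, every coefficient costs a constant amount of work, which is the compact-support advantage the abstract mentions. Since the transform is orthonormal, the subband energies of a frame such as `[1, 2, 3, 4, 5, 6, 7, 8]` (with `levels=3`) add up to the frame's total energy, 204.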
2	Pronunciation modelling and bootstrapping Davel, Marelie Hattingh 11 October 2005 Bootstrapping techniques have the potential to accelerate the development of language technology resources. This is of specific importance in the developing world, where language technology resources are scarce and linguistic diversity is high. In this thesis we analyse the pronunciation modelling task within a bootstrapping framework, as a case study in the bootstrapping of language technology resources. We analyse the grapheme-to-phoneme conversion task in search of a grapheme-to-phoneme conversion algorithm that can be utilised during bootstrapping. We experiment with enhancements to the Dynamically Expanding Context algorithm and develop a new algorithm for grapheme-to-phoneme rule extraction (Default & Refine) that utilises the concept of a ‘default phoneme’ to create a cascade of increasingly specialised rules. This algorithm displays a number of attractive properties, including rapid learning, language independence, good asymptotic accuracy, robustness to noise, and the production of a compact rule set. To gain greater flexibility with regard to the various heuristic choices made during rewrite rule extraction, we define a new theoretical framework for analysing instance-based learning of rewrite rule sets. We define the concept of minimal representation graphs, and discuss the utility of these graphs in obtaining the smallest possible rule set describing a given set of discrete training data. We develop an approach for the interactive creation of pronunciation models via bootstrapping, and implement this approach in a system that integrates several of the analysed grapheme-to-phoneme alignment and conversion algorithms. The focus of this work is on combining machine learning and human intervention in such a way as to minimise the amount of human effort required during bootstrapping, and a generic framework for the analysis of this process is defined.
Practical tools that support the bootstrapping process are developed and the efficiency of the process is analysed from both a machine learning and a human factors perspective. We find that even linguistically untrained users can use the system to create electronic pronunciation dictionaries accurately, in a fraction of the time the traditional approach requires. We create new dictionaries in a number of languages (isiZulu, Afrikaans and Sepedi) and demonstrate the utility of these dictionaries by incorporating them in speech technology systems. / Thesis (PhD (Electronic Engineering))--University of Pretoria, 2006. / Electrical, Electronic and Computer Engineering / unrestricted Grapheme-to-phoneme conversion Grapheme-to-phoneme alignment Bootstrapping UCTD
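The abstract describes Default & Refine as building, around a ‘default phoneme’, a cascade of increasingly specialised rules. The sketch below shows only how such a cascade could be applied at prediction time (most specialised rule first, default last, first match wins); rule extraction is not shown. The fixed left/right grapheme-context rule format, the `#` word-boundary marker, the use of `""` as a null phoneme, and the toy rules are all assumptions for illustration, not the thesis's data structures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule format: fixed grapheme contexts around one grapheme.
    left: str     # grapheme context required immediately to the left
    right: str    # grapheme context required immediately to the right
    phoneme: str  # output when the rule fires; "" stands for a null phoneme


def matches(rule: Rule, word: str, i: int) -> bool:
    """True if rule.left ends just before position i and rule.right starts just after it."""
    start = i - len(rule.left)
    if start < 0 or word[start:i] != rule.left:
        return False
    return word[i + 1:i + 1 + len(rule.right)] == rule.right


def predict(word: str, rules: Dict[str, List[Rule]]) -> List[str]:
    """Predict one phoneme per grapheme with a first-match rule cascade.

    rules[g] is ordered from most specialised to least, ending with the
    default rule for g (empty contexts), so some rule always fires.
    "#" marks the word boundary.
    """
    padded = "#" + word + "#"
    phonemes = []
    for i in range(1, len(padded) - 1):
        for rule in rules[padded[i]]:
            if matches(rule, padded, i):
                phonemes.append(rule.phoneme)
                break
        else:
            raise ValueError(f"no rule for {padded[i]!r} in {word!r}")
    return phonemes


# Toy rule set (invented for illustration, not extracted from data).
TOY_RULES = {
    "c": [Rule("", "e", "s"), Rule("", "", "k")],
    "a": [Rule("", "", "a")],
    "e": [Rule("", "", "E")],
    "t": [Rule("", "", "t")],
    "l": [Rule("l", "", ""), Rule("", "", "l")],
}
```

With these toy rules, `predict("cat", TOY_RULES)` gives `["k", "a", "t"]` (the default rule for "c" fires), while `predict("cell", TOY_RULES)` gives `["s", "E", "l", ""]`: the more specialised "c before e" rule overrides the default, and the second "l" maps to the null phoneme.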
3	Mechanisms in phonetic tactile speech perception Ellis, Errol Mark January 1994 No description available. 621.31042 Hearing aids; Phoneme extraction
4	Fuzzy GMM-based Confidence Measure Towards Keywords Spotting Application Abida, Mohamed Kacem January 2007 The increasing need for more natural human-machine interfaces has generated intensive research on designing and implementing natural speech-enabled systems. The spectrum of speech recognition applications ranges from understanding simple commands to extracting all the information in the speech signal, such as words, meaning and the emotional state of the user. Because it is very hard to constrain a speaker when expressing a voice-based request, speech recognition systems have to handle out-of-vocabulary words in the user's utterance (by filtering them out) and extract only the information (keywords) the application needs to deal correctly with the user's query. In this thesis, we investigate an approach that can be deployed in keyword spotting systems. We propose a confidence measure feedback module that provides confidence values to be compared against the word confidences of an existing automatic speech recognizer. The feedback module mainly consists of a soft-computing system that uses fuzzy Gaussian mixture models to identify all English phonemes. Testing was carried out on the JULIUS system, and the preliminary results show that our feedback module outperforms the JULIUS confidence measures for both correctly spotted words and falsely mapped ones. The results could be refined further using other types of confidence measures, and the whole system could serve as the basis of a natural language understanding module for speech understanding applications. Confidence measure Phoneme classification Electrical and Computer Engineering
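The abstract does not give the fuzzy GMM formulation, so the sketch below swaps in ordinary (non-fuzzy) diagonal-covariance Gaussian mixture models to illustrate one common way phoneme models can yield a confidence score: the hypothesised phoneme's likelihood is normalised over all phoneme models (assuming equal priors), averaged over the frames of the phoneme, and then averaged over the phonemes of a word. The model format, function names and averaging scheme are assumptions; the comparison against JULIUS word confidences is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical model format: a diagonal-covariance GMM as a list of
# (weight, means, variances) components.
GMM = List[Tuple[float, Sequence[float], Sequence[float]]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float],
                   var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_loglik(x: Sequence[float], gmm: GMM) -> float:
    """log p(x) under the mixture, computed with log-sum-exp for stability."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def phoneme_confidence(frames: List[Sequence[float]], hypothesis: str,
                       models: Dict[str, GMM]) -> float:
    """Mean per-frame posterior of the hypothesised phoneme, assuming equal priors."""
    total = 0.0
    for x in frames:
        scores = {p: gmm_loglik(x, g) for p, g in models.items()}
        peak = max(scores.values())
        norm = sum(math.exp(s - peak) for s in scores.values())
        total += math.exp(scores[hypothesis] - peak) / norm
    return total / len(frames)


def word_confidence(segments: List[Tuple[List[Sequence[float]], str]],
                    models: Dict[str, GMM]) -> float:
    """Average phoneme confidence over the (frames, phoneme) segments of a word."""
    return sum(phoneme_confidence(f, p, models)
               for f, p in segments) / len(segments)
```

A word whose frames sit close to the models of its hypothesised phonemes scores near 1, while frames halfway between two competing models score about 0.5; a keyword spotter could then threshold or compare such scores against the recognizer's own word confidences.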
6	Phoneme Recognition by hidden Markov modeling Brighton, Andrew P. January 1989 No description available. Phoneme Recognition Hidden Markov Modeling Speech Recognition
7	Non-linguistic Influences on Infants' Nonnative Phoneme Perception: Exaggerated prosody and Visual Speech Information Aid Discrimination Ostroff, Wendy Louise 11 May 2000 Research indicates that infants lose the capacity to perceive distinctions in nonnative sounds as they become sensitive to the speech sounds of their native language (i.e., by 10 to 12 months of age). However, investigations into the decline in nonnative phonetic perception have neglected to examine the role of non-linguistic information. Exaggerated prosodic intonation and facial input are prominent in the infant's language-learning environment, and both have been shown to ease the task of speech perception. The current investigation was designed to examine the impact of infant-directed (ID) speech and facial input on infants' ability to discriminate phonemes that do not contrast in their native language. Specifically, 11-month-old infants were tested for discrimination of both a native phoneme contrast and a nonnative phoneme contrast across four conditions, formed by crossing an auditory manipulation (ID speech vs. adult-directed (AD) speech) with a visual manipulation (Face vs. Geometric Form). The results indicated that infants could discriminate the native phonemes in all four conditions. Furthermore, the infants could discriminate the nonnative phonemes if they had enhanced auditory and visual information available to them (i.e., if the phonemes were presented in ID speech with a synchronous facial display), and if the nonnative discrimination task was the infants' first test session. These results suggest that infants do not lose the capacity to discriminate nonnative phonemes by the end of the first postnatal year, but that they rely on certain language-relevant and non-linguistic sources of information to discriminate nonnative sounds. / Ph. D. Speech Perception Infant Language Learning Phoneme Perception
8	Effects of phonological awareness instruction on pre-reading skills of preschool children at-risk for reading disabilities Hsin, Yi-Wei 14 September 2007 No description available. Education, Special phonological awareness instruction phoneme blending phoneme segmentation word reading preschool children reading disabilities
9	A nasalização na língua Dâw / Nasalization in Dâw Andrade, Wallace Costa de 27 June 2014 Voiced stops and nasal consonants have articulatory similarities. In some indigenous Brazilian languages, these groups of phones are allophones of the same phoneme, and such systems have intermediate allophones with an oral-nasal contour. Although Dâw has been described with distinct phonemes for the stop and nasal classes, it has contour consonants as allophones in a very restricted context: in coda position after an oral vowel. This dissertation aims to describe and analyze the contexts of nasalization in Dâw through the elicitation of original data. We undertook three fieldwork trips in which we recorded data from native speakers. We obtained acoustic data with digital recorders and aerodynamic data with EVA2 equipment, which has separate transducers for measuring oral and nasal airflow. Because the language is typologically isolating-analytic and minimal pairs are lacking, we analyzed the data using the concept of distribution. We corroborated the earlier description (Martins, 2004) that nasals, both consonantal and vocalic, are distinct phonemes. We also confirmed the previously described spreading of nasalization from nasal vowels to tautosyllabic approximants, and we added to the description the spreading of nasalization to the voiceless glottal fricative /h/ when it occurs in the same syllable as a nasal vowel. We were able to determine that the prosodic domain of nasal spreading is the syllable, because the phenomenon does not occur across syllables. We also analyzed whether the oral contour of nasal consonants could be a long-range process; however, the data showed it to be local, likewise restricted to the structure of the syllable.
Oral-contour nasal consonants possibly hark back to an older state of the language, one still visible in its sister languages Hup and Yuhup, which restrict mixed oral-nasal adjacencies. Because the contour occurs only in coda position, we attribute its retention there to contact with Brazilian Portuguese (BP): BP has regressive nasal spreading, which would be undesirable in Dâw, a language with a phonemic contrast between oral and nasal vowels. The desynchronization of the velar gesture produces the contour because of the articulatory similarities between voiced stops and nasal consonants. In addition, some data showed a mismatch between the aerodynamics and the acoustic percept, i.e., we heard nasalization but there was no corresponding nasal airflow. We believe this discrepancy is due to some articulatory maneuver that is not yet understood. Regarding the processes analyzed within the framework of Prosodic Phonology, we concluded that neither process occurs in hierarchically higher prosodic constituents. Experimental Phonetics Nasal phoneme Phonology
