  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

Data sufficiency analysis for automatic speech recognition / by J.A.C. Badenhorst

Badenhorst, Jacob Andreas Cornelius January 2009
The languages spoken in developing countries are diverse, and most are currently under-resourced from an automatic speech recognition (ASR) perspective. In South Africa alone, 10 of the 11 official languages belong to this category. Given the potential for future applications of speech-based information systems such as spoken dialogue systems (SDSs) in these countries, the design of minimal ASR audio corpora is an important research area. Specifically, current ASR systems utilise acoustic models to represent acoustic variability, and effective ASR corpus design aims to optimise the amount of relevant variation within training data while minimising the size of the corpus. An investigation of the effect that different amounts and types of training data have on these models is therefore needed. In this dissertation, specific consideration is given to the data sufficiency principles that apply to the training of acoustic models. The investigation of this task led to the following main achievements: 1) We define a new stability measurement protocol that provides the capability to view the variability of ASR training data. 2) This protocol allows for the investigation of the effect that various acoustic model complexities and ASR normalisation techniques have on ASR training data requirements. Specific trends are observed with regard to the data requirements for different phone categories and how these are affected by various modelling strategies. 3) Based on this analysis, acoustic distances between phones are estimated across language borders, paving the way for further research in cross-language data sharing. Finally, the knowledge obtained from these experiments is applied to perform a data sufficiency analysis of a new speech recognition corpus of South African languages: the Lwazi ASR corpus. The findings correlate well with initial phone recognition results and yield insight into the number of speakers required for the development of minimal telephone ASR corpora. / Thesis (M. Ing. (Computer and Electronical Engineering))--North-West University, Potchefstroom Campus, 2009.
3

Comunicação acústica do lobo-guará: evidências de discriminação individual via playback de aulidos / The voice of the not so lonely maned wolf: evidence of individual discrimination via playback of the long-distance extended-bark

Balieiro, Flora Silveira 01 February 2016
The acoustic channel is an efficient long-distance signaling system that may be especially effective for animals with crepuscular/nocturnal habits. The maned wolf is a threatened canid with crepuscular/nocturnal habits that is commonly thought to be a solitary species. In fact, it would be better defined as a gregarious species, since male and female share the same wide territory and the spatial distance between them varies according to the female's reproductive period. The maned wolf's extended-bark is a long-distance vocalization that functions as a mechanism to increase spatial distance among conspecifics as well as to enable pair-mates to find each other. Individual variations in this vocalization have been reported, but the possibility that they can be perceived and used by the species has never been tested. One should expect these individual variations to be perceived by the species, since only in this scenario would it be plausible for the extended-bark to have the dual function stated above. If this individual variability is not perceived by conspecifics, the efficiency of this vocalization at long distances, at least for the hypothesized functions, would be compromised, as the hearer would not be able to identify whether the sender is its reproductive partner or a possible rival. In our study we used playbacks to test whether these individual variations can be perceived by captive wolves and concluded that they can. To our knowledge, this is the first time it has been demonstrated that the maned wolf is capable of discriminating among extended-barks of different individuals.
5

L’analyse factorielle pour la modélisation acoustique des systèmes de reconnaissance de la parole / Factor analysis for acoustic modeling of speech recognition systems

Bouallegue, Mohamed 16 December 2013
In this thesis, we propose to use techniques based on factor analysis to build acoustic models for automatic speech processing, especially Automatic Speech Recognition (ASR). First, we were interested in reducing the memory footprint of acoustic models. Our factor analysis-based method demonstrated that it is possible to pool the parameters of acoustic models and still maintain performance similar to that obtained with the baseline models. The proposed modeling leads us to decompose the set of acoustic model parameters into independent parameter subsets, which allows great flexibility for particular adaptations (speakers, gender, new tasks, etc.). With current modeling techniques, a state of a Hidden Markov Model (HMM) is represented by a mixture of Gaussians (GMM: Gaussian Mixture Model). We propose as an alternative a vector representation of states: the state factors. These state factors enable us to accurately measure the similarity between HMM states by means of a Euclidean distance, for example. Using this vector representation, we propose a simple and effective method for building acoustic models with shared states. This procedure is even more effective when applied to under-resourced languages. Finally, we concentrated our efforts on the robustness of speech recognition systems to acoustic variabilities, particularly those generated by the environment. In our various experiments, we examined speaker variability, channel variability and additive noise. Through our factor analysis-based approach, we demonstrated the possibility of modeling these different types of harmful acoustic variability as an additive component in the cepstral domain. We subtract this component from the cepstral vectors to cancel out its harmful effect on speech recognition.
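
The two operations named in this abstract, comparing state vectors by Euclidean distance and subtracting an additive component from cepstral vectors, can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, state labels, and all numeric values below are hypothetical, and the way the state factors and the nuisance component are estimated by factor analysis is not shown.

```python
import math

def closest_states(state_factors):
    """Return (distance, name_a, name_b) for the two closest state vectors.

    state_factors maps a state name to its vector (hypothetical values here;
    the abstract does not say how these vectors are estimated).
    """
    names = sorted(state_factors)
    best = None
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            d = math.dist(state_factors[first], state_factors[second])
            if best is None or d < best[0]:
                best = (d, first, second)
    return best

def subtract_component(frames, component):
    """Subtract an additive nuisance component from every cepstral frame."""
    return [[c - n for c, n in zip(frame, component)] for frame in frames]

# Hypothetical state vectors: s1 and s3 are near each other, so they would be
# candidates for sharing parameters under a distance-based tying rule.
factors = {"s1": [0.0, 0.0], "s2": [3.0, 4.0], "s3": [0.0, 1.0]}
nearest = closest_states(factors)

# Hypothetical cepstral frames and an assumed, already-estimated offset.
frames = [[1.0, 2.0], [2.0, 3.0]]
compensated = subtract_component(frames, [0.5, 0.5])
```

In this toy data the pair `s1`/`s3` is the closest, which is the kind of evidence a state-sharing procedure could act on; the merging rule itself is an assumption here, since the abstract only states that a Euclidean distance can be used.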
