11

Automatic Subtitle Generation for Sound in Videos

Guenebaut, Boris January 2009
The last ten years have witnessed the emergence of every kind of video content, and the appearance of websites dedicated to it has increased the importance the public gives to it. At the same time, some viewers are deaf and cannot always understand such videos because no text transcription is available. It is therefore necessary to find ways of making these media artefacts accessible to most people. Several software tools offer utilities for creating subtitles for videos, but all require extensive participation from the user, so a more automated approach is envisaged. This thesis report describes a way to generate standards-compliant subtitles by using speech recognition. It distinguishes three parts. The first separates the audio from the video and converts the audio to a suitable format if necessary. The second recognizes the speech contained in the audio. The final stage generates a subtitle file from the recognition results of the previous step. Directions for implementation are proposed for the three modules. The experimental results were not satisfactory, and adjustments are needed in further work. Decoding parallelization, the use of well-trained models, and punctuation insertion are some of the improvements still to be made.
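The third stage described above, turning recognition results into a subtitle file, can be sketched roughly as follows. The abstract does not say which subtitle standard the thesis targets, so the SubRip (SRT) layout used here is an assumption, and the `Segment` structure and function names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized utterance; times are in seconds (hypothetical structure)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a SubRip timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SubRip cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text}\n"
        )
    return "\n".join(cues)
```

In this sketch the recognizer is assumed to supply start and end times for each utterance; how those times are obtained is outside what the abstract describes.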
12

Probabilistic space maps for speech with applications

Kalgaonkar, Kaustubh 22 August 2011
The objective of the proposed research is to develop a probabilistic model of speech production that exploits the multiplicity of mapping between the vocal tract area functions (VTAF) and speech spectra. Two thrusts are developed. In the first, a latent variable model that captures uncertainty in estimating the VTAF from speech data is investigated. The latent variable model uses this uncertainty to generate many-to-one mapping between observations of the VTAF and speech spectra. The second uses the probabilistic model of speech production to improve the performance of traditional speech algorithms, such as enhancement, acoustic model adaptation, etc. In this thesis, we propose to model the process of speech production with a probability map. This proposed model treats speech production as a probabilistic process with many-to-one mapping between VTAF and speech spectra. The thesis not only outlines a statistical framework to generate and train these probabilistic models from speech, but also demonstrates its power and flexibility with such applications as enhancing speech from both perceptual and recognition perspectives.
13

Automatic speech recognition for resource-scarce environments / N.T. Kleynhans.

Kleynhans, Neil Taylor January 2013
Automatic speech recognition (ASR) technology has matured over the past few decades and has made significant impacts in a variety of fields, from assistive technologies to commercial products. However, ASR system development is a resource-intensive activity and requires language resources in the form of text-annotated audio recordings and pronunciation dictionaries. Unfortunately, many languages found in the developing world fall into the resource-scarce category, and this resource scarcity severely inhibits the deployment of ASR systems in the developing world. In this thesis we present research into developing techniques and tools to (1) harvest audio data, (2) rapidly adapt ASR systems and (3) select “useful” training samples in order to assist with resource-scarce ASR system development. We demonstrate an automatic audio harvesting approach which efficiently creates a speech recognition corpus by harvesting an easily available audio resource. We show that by starting with bootstrapped acoustic models, trained with language data obtained from a dialect, and then running through a few iterations of an alignment-filter-retrain phase, it is possible to create an accurate speech recognition corpus. As a demonstration we create a South African English speech recognition corpus by using our approach and harvesting an internet website which provides audio and approximate transcriptions. The acoustic models developed from harvested data are evaluated on independent corpora and show that the proposed harvesting approach provides a robust means to create ASR resources. As there are many acoustic model adaptation techniques which can be implemented by an ASR system developer, it becomes a costly endeavour to select the best adaptation technique. We investigate how performance depends on the amount of adaptation data and on the adaptation technique by systematically varying the amount of adaptation data and comparing the performance of various adaptation techniques.
We establish a guideline which can be used by an ASR developer to choose the best adaptation technique given a size constraint on the adaptation data, for the scenario where adaptation between narrow- and wide-band corpora must be performed. In addition, we investigate the effectiveness of a novel channel normalisation technique and compare the performance with standard normalisation and adaptation techniques. Lastly, we propose a new data selection framework which can be used to design a speech recognition corpus. We show that for limited data sets, independent of language and bandwidth, the most effective strategy for data selection is frequency-matched selection and that the widely-used maximum entropy methods generally produced the least promising results. In our model, the frequency-matched selection method corresponds to a logarithmic relationship between accuracy and corpus size; we also investigated other model relationships, and found that a hyperbolic relationship (as suggested from simple asymptotic arguments in learning theory) may lead to somewhat better performance under certain conditions. / Thesis (PhD (Computer and Electronic Engineering))--North-West University, Potchefstroom Campus, 2013.
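To make the two model relationships concrete, one possible way to write them is sketched below. The abstract names only the type of each relationship, so the symbols ($A$ for recognition accuracy, $N$ for corpus size, and the constants $a$, $b$, $A_\infty$, $c$) are assumptions introduced here for illustration and may differ from the thesis's own parametrisation.

```latex
A_{\log}(N) = a + b \ln N, \qquad A_{\mathrm{hyp}}(N) = A_{\infty} - \frac{c}{N}
```

Under these assumed forms, with $b > 0$ and $c > 0$, the logarithmic curve keeps growing as $N$ increases, whereas the hyperbolic curve saturates at $A_\infty$.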
16

Objective determination of vowel intelligibility of a cochlear implant model

Van Zyl, Jan Louis 08 March 2009
The goal of this study was to investigate the methodology for designing a vowel intelligibility model that can objectively predict the outcome of a vowel confusion test performed with normal-hearing individuals listening to a cochlear implant acoustic model. The model attempts to mimic the vowel perception of a cochlear implantee mathematically. The output of the model is the calculated probability of correct identification of vowel tokens and the probability of specific vowel confusions in a subjective vowel confusion test. In this way, the model can be used to aid cochlear implant research by complementing subjective listening tests. The model may also be used to test hypotheses concerning the use and relationship of acoustic cues in vowel identification. The objective vowel intelligibility model consists of two parts: the speech processing component (used to extract the acoustic cues which allow vowels to be identified) and the decision component (a simulation of the decision making that takes place in the brain). Acoustic cues were extracted from the vowel sounds and used to calculate probabilities of identifying or confusing specific vowels. The confusion matrices produced by the objective vowel perception model were compared with results from subjective tests performed with normal-hearing listeners listening to an acoustic cochlear implant model. The most frequent confusions could be predicted using the first two formant frequencies and the vowel duration as acoustic cues. The model could predict the deterioration of vowel recognition when noise was added to the speech being evaluated. The model provided a first approximation of vowel intelligibility and requires further development to completely predict speech perception of cochlear implantees. / Dissertation (ME)--University of Pretoria, 2009. / Electrical, Electronic and Computer Engineering / unrestricted
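As a rough illustration of how a decision component could turn acoustic cues into confusion probabilities, the sketch below applies a simple distance-based choice rule: each response is weighted by a Gaussian-shaped similarity to the stimulus in cue space, and the weights are normalized per stimulus. This rule, the function name, and the per-cue `scales` parameter are assumptions made for illustration; the abstract does not specify the decision model the dissertation actually uses.

```python
import math


def confusion_matrix(
    cues: dict[str, tuple[float, ...]],
    scales: tuple[float, ...],
) -> dict[str, dict[str, float]]:
    """Return P(response | stimulus) for every pair of vowel labels.

    cues maps each vowel label to its cue vector (for example F1, F2, duration).
    scales gives an assumed spread for each cue, used to make cues comparable.
    """
    labels = list(cues)
    matrix: dict[str, dict[str, float]] = {}
    for stim in labels:
        scores = {}
        for resp in labels:
            # Squared distance in cue space, each cue divided by its scale.
            d2 = sum(
                ((a - b) / s) ** 2
                for a, b, s in zip(cues[stim], cues[resp], scales)
            )
            scores[resp] = math.exp(-0.5 * d2)
        total = sum(scores.values())
        # Normalize so that each stimulus row sums to one.
        matrix[stim] = {resp: value / total for resp, value in scores.items()}
    return matrix
```

Each row of the result can be read as one row of a predicted confusion matrix: the diagonal entry is the probability of correct identification, and the off-diagonal entries are the probabilities of specific confusions.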
17

Rozpoznávání řeči s pomocí nástroje Sphinx-4 / Speech recognition using Sphinx-4

Kryške, Lukáš January 2014
This diploma thesis aims to find an effective method for continuous speech recognition. More precisely, it uses speech-to-text recognition for keyword spotting. The solution is applicable to phone-call analysis and similar applications. Most of the thesis describes and implements the Sphinx-4 speech recognition framework, which uses hidden Markov models (HMMs) to define a language's acoustic models. It explains how these models can be trained for a new language or a new dialect. Finally, it describes in detail how to implement keyword spotting in Java.
18

Factors that limit control effectiveness in self-excited noise driven combustors

Crawford, Jackie H., III 27 March 2012
A full Strouhal number thermo-acoustic model is proposed for the feedback control of self-excited, noise-driven combustors. The inclusion of time delays in the volumetric heat release perturbation models creates unique behavioral characteristics which are not properly reproduced within current low Strouhal number thermo-acoustic models. New analysis tools using probability density functions are introduced which enable exact expressions for the statistics of a time-delayed system. Additionally, preexisting tools from applied mathematics and control theory for spectral analysis of time-delay systems are introduced to the combustion community. These new analysis tools can be used to extend the sensitivity function analysis used in control theory to explain limits to control effectiveness in self-excited combustors. The control effectiveness of self-excited combustors with actuator constraints is found to be most sensitive to the location of non-minimum phase zeros. Modeling the non-minimum phase zeros correctly requires accurate volumetric heat release perturbation models. Designs that remove non-minimum phase zeros are more likely to have poles in the right-hand complex plane. As a result, unstable combustors are inherently more responsive to feedback control.
19

Modélisation analytique magnéto-acoustique des machines synchrones à commutation de flux à aimants permanents : optimisation du dimensionnement / Analytical approach for magnetic, mechanical and acoustic modeling of flux-switching permanent-magnet motors : application to geometrical optimization

Boisson, Julien 25 November 2014
This thesis deals with the study of flux-switching permanent-magnet synchronous machines, and more particularly with the magnetic, mechanical and acoustic behaviour of these particular structures. The aim is to carry out a geometric optimization that is both fast and robust while taking these multi-physics criteria into account. First, we give an overview of how these machines operate by presenting the principle of flux switching and studying a five-phase 20/18 structure. The origin of the noise generated in these structures is then discussed, with particular attention to magnetic phenomena. The various magnetic stresses are explored, and a mechanical and vibration analysis is carried out through finite element simulations and experimental measurements. Second, a multi-physics model for estimating the magnetic, mechanical and acoustic state of these structures is presented. A fully analytical model was chosen because a fast model was needed. The individual models were developed and validated either by finite element simulations or by experimental measurements. The magnetostatic model is obtained by formally solving Maxwell's magnetostatic equations using Fourier series expansion. The mechanical model computes the ovalization modes and natural frequencies of the stator using an energy approach, the Rayleigh quotient method. The vibro-acoustic model is obtained by solving the dynamic equilibrium equations of beams in the modal basis formed by the previously computed eigenmodes.
Third and finally, this model is applied in a geometric optimization to maximize the electromagnetic torque while minimizing the radiated acoustic power. Different machine structures are addressed and rules for "quiet" design are proposed. The influence of many parameters on noise generation is studied. These optimizations were carried out on three types of structure: a three-phase 12/10 structure, a four-phase 16/12 structure and a five-phase 20/18 structure. They were also carried out for two cases: one in which the machine is driven at fixed speed and one in which it is driven at variable speed.
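For readers unfamiliar with the Rayleigh quotient method named in this abstract, its generic discrete form is sketched below. The stiffness matrix $K$, mass matrix $M$ and trial mode shape $\boldsymbol{\phi}$ are generic symbols assumed here for illustration; the abstract does not give the thesis's own stator formulation.

```latex
\omega^2 \approx R(\boldsymbol{\phi})
  = \frac{\boldsymbol{\phi}^{\mathsf{T}} K \boldsymbol{\phi}}
         {\boldsymbol{\phi}^{\mathsf{T}} M \boldsymbol{\phi}}
```

The numerator is proportional to the strain energy of the assumed shape and the denominator to its kinetic energy per unit $\omega^2$. For symmetric $K$ and symmetric positive-definite $M$, $R(\boldsymbol{\phi})$ is never smaller than the square of the lowest natural frequency, and it equals $\omega_i^2$ exactly when $\boldsymbol{\phi}$ is the $i$-th eigenmode.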
20

Développement d'un traitement acoustique basses-fréquences pour application aérospatiale / Development of a low-frequency acoustic treatment for aerospace application

Kerkeni, Dhia January 2015
Over the last few decades, the aerospace industry, like aeronautics, has increasingly adopted composite shells, which are steadily replacing metallic ones. This transition has significantly reduced the weight of flying structures and, consequently, lowered fuel consumption and the environmental footprint of aircraft and space launch vehicles. However, the mass law implies that this transition cannot be without consequences for the acoustic transmission loss of fuselage panels and payload fairings, especially at low frequencies. Whether the goal is to meet standards and regulations on sound pressure levels inside pressurized cabins or to protect payloads inside launcher fairings, designing acoustic treatments that target low frequencies is a major challenge. Indeed, with very stringent constraints on added weight and volume, it is difficult to address low-frequency sound absorption with conventional passive acoustic treatments. In order to take advantage of resonant effects to enhance low-frequency absorption, this work investigates the integration of very thin resistive screens into acoustic coatings while minimizing weight. It was conducted within the framework of the industrial research chair in aeroacoustics, whose main funders are Bombardier Aerospace, Pratt & Whitney and Bell Helicopter. Related research was also carried out in partnership with ULA (United Launch Alliance). This master's thesis includes a bibliographical section that gives an exhaustive, critical review of existing low-frequency treatments. The theoretical part focuses on the different models of wave propagation and dissipation phenomena in porous media. The same part also lists the different types of screens and the corresponding propagation models. Criteria for an objective comparative study of weight versus performance are proposed. In addition to non-acoustic parameters, the layout and mounting conditions of the layers were studied by means of numerical simulations supported by experimental measurements. In the penultimate chapter, a detailed SEA (Statistical Energy Analysis) model of a launcher fairing is built from a concrete example. The simulations conclude with a comparative study of the sound pressure level reduction inside the fairing. The final chapter summarizes the main results, conclusions and perspectives of this study.
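The mass law invoked in this abstract can be illustrated with the textbook normal-incidence expression for a limp, unbounded panel. The sketch below is a generic illustration and not the thesis's model: the function name is hypothetical, and the value used for the characteristic impedance of air is an approximate room-temperature figure assumed here.

```python
import math

# Characteristic impedance of air, rho0 * c, in Pa·s/m (approximate, assumed).
RHO0_C = 415.0


def mass_law_tl(surface_density: float, frequency: float) -> float:
    """Normal-incidence mass-law transmission loss in dB.

    surface_density is the panel mass per unit area in kg/m^2;
    frequency is in Hz. Stiffness and damping are neglected.
    """
    x = math.pi * frequency * surface_density / RHO0_C
    return 10.0 * math.log10(1.0 + x * x)


# Halving the panel mass at a fixed frequency costs about 6 dB of
# transmission loss once the panel impedance dominates that of air.
loss_from_halving = mass_law_tl(10.0, 1000.0) - mass_law_tl(5.0, 1000.0)
```

The same expression shows why low frequencies are the hard case: at a fixed surface density the transmission loss falls by roughly 6 dB for each halving of frequency, so a lighter composite panel has the least margin there.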
