  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Deep Neural Network Acoustic Models for ASR

Mohamed, Abdel-rahman 01 April 2014
Automatic speech recognition (ASR) is a core technology for the information age. ASR systems have evolved from discriminating among isolated digits to recognizing telephone-quality, spontaneous speech, allowing for a growing number of practical applications in various sectors. Nevertheless, there are still serious challenges facing ASR which require major improvement in almost every stage of the speech recognition process. Until very recently, the standard approach to ASR had remained largely unchanged for many years. It used Hidden Markov Models (HMMs) to model the sequential structure of speech signals, with each HMM state using a mixture of diagonal covariance Gaussians (GMM) to model a spectral representation of the sound wave. This thesis describes new acoustic models based on Deep Neural Networks (DNN) that have begun to replace GMMs. For ASR, the deep structure of a DNN as well as its distributed representations allow for better generalization of learned features to new situations, even when only small amounts of training data are available. In addition, DNN acoustic models scale well to large vocabulary tasks, significantly improving upon the best previous systems. Different input feature representations are analyzed to determine which one is more suitable for DNN acoustic models. Mel-frequency cepstral coefficients (MFCC) are inferior to log Mel-frequency spectral coefficients (MFSC), which help DNN models marginalize out speaker-specific information while focusing on discriminant phonetic features. Various speaker adaptation techniques are also introduced to further improve DNN performance. Another deep acoustic model based on Convolutional Neural Networks (CNN) is also proposed. Rather than using fully connected hidden layers as in a DNN, a CNN uses a pair of convolutional and pooling layers as building blocks.
The convolution operation scans the frequency axis using a learned local spectro-temporal filter. In the pooling layer, a maximum operation is applied to the learned features, exploiting the smoothness of the input MFSC features to eliminate speaker variations expressed as shifts along the frequency axis, in a way similar to vocal tract length normalization (VTLN) techniques. We show that the proposed DNN and CNN acoustic models achieve significant improvements over GMMs on various small and large vocabulary tasks.
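The convolution-plus-pooling idea in this abstract can be illustrated with a small sketch. This is not the thesis's model: the 12-band frame, the hand-picked filter `w`, the pool size, and the helper names are all assumptions chosen for illustration, and a real CNN would learn its filters. The sketch only shows that a one-band shift of a spectral peak moves the convolution output but leaves the strongest response in the same pooled band.

```python
def conv1d_valid(x, w):
    """Slide filter w along the frequency axis of x (valid cross-correlation)."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


def max_pool(x, size):
    """Non-overlapping max pooling along the frequency axis."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def argmax(x):
    """Index of the largest value in x (first one on ties)."""
    return max(range(len(x)), key=lambda i: x[i])


# Hypothetical log Mel filter-bank frame (12 bands) with a single spectral peak.
frame = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
# The same peak moved up by one band, standing in for a speaker-dependent shift.
shifted = [0] + frame[:-1]

w = [1, 2, 1]  # hand-picked local filter (assumption; learned in a real CNN)

conv_a = conv1d_valid(frame, w)
conv_b = conv1d_valid(shifted, w)
pooled_a = max_pool(conv_a, 3)
pooled_b = max_pool(conv_b, 3)

print("peak position before pooling:", argmax(conv_a), argmax(conv_b))
print("peak position after pooling: ", argmax(pooled_a), argmax(pooled_b))
```

In this toy case the peak position differs before pooling and coincides after it. The invariance is only partial: the other pooled values still change, and a larger shift would cross a pooling boundary.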
22

Data sufficiency analysis for automatic speech recognition / by J.A.C. Badenhorst

Badenhorst, Jacob Andreas Cornelius January 2009
The languages spoken in developing countries are diverse and most are currently under-resourced from an automatic speech recognition (ASR) perspective. In South Africa alone, 10 of the 11 official languages belong to this category. Given the potential for future applications of speech-based information systems such as spoken dialog systems (SDSs) in these countries, the design of minimal ASR audio corpora is an important research area. Specifically, current ASR systems utilise acoustic models to represent acoustic variability, and effective ASR corpus design aims to optimise the amount of relevant variation within training data while minimising the size of the corpus. Therefore, an investigation of the effect that different amounts and types of training data have on these models is needed. In this dissertation, specific consideration is given to the data sufficiency principles that apply to the training of acoustic models. The investigation of this task led to the following main achievements: 1) We define a new stability measurement protocol that provides the capability to view the variability of ASR training data. 2) This protocol allows for the investigation of the effect that various acoustic model complexities and ASR normalisation techniques have on ASR training data requirements. Specific trends with regard to the data requirements for different phone categories and how these are affected by various modelling strategies are observed. 3) Based on this analysis, acoustic distances between phones are estimated across language borders, paving the way for further research in cross-language data sharing. Finally, the knowledge obtained from these experiments is applied to perform a data sufficiency analysis of a new speech recognition corpus of South African languages: the Lwazi ASR corpus.
The findings correlate well with initial phone recognition results and yield insight into the sufficient number of speakers required for the development of minimal telephone ASR corpora. / Thesis (M. Ing. (Computer and Electronical Engineering))--North-West University, Potchefstroom Campus, 2009.
24

Ground vehicle acoustic signal processing based on biological hearing models

Liu, Li, January 1999
Thesis (M.S.) -- University of Maryland, College Park, 1999. / Thesis research directed by Institute for Systems Research. "M.S. 99-6." Includes bibliographical references (leaves 75-78). Available also online as a PDF file via the World Wide Web.
25

Weakly non-local arbitrarily-shaped absorbing boundary conditions for acoustics and elastodynamics: theory and numerical experiments

Lee, Sanghoon, Kallivokas, Loukas F., January 2004
Thesis (Ph. D.)--University of Texas at Austin, 2004. / Supervisor: Loukas F. Kallivokas. Vita. Includes bibliographical references.
26

Three-dimensional acoustic propagation through shallow water internal, surface gravity and bottom sediment waves

Shmelev, Alexey Alexandrovich January 2011
Thesis (Ph. D.)--Joint Program in Applied Ocean Science and Engineering (Massachusetts Institute of Technology, Dept. of Mechanical Engineering; and the Woods Hole Oceanographic Institution), 2011. / This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. / Cataloged from PDF version of thesis. / Includes bibliographical references (p. 185-193). / This thesis describes the physics of fully three-dimensional low frequency acoustic interaction with internal waves, bottom sediment waves and surface swell waves that are often observed in shallow waters and on continental slopes. A simple idealized model of the ocean waveguide is used to analytically study the properties of acoustic normal modes and their perturbations due to waves of each type. The combined approach of a semi-quantitative study based on the geometrical acoustics approximation and on fully three-dimensional coupled mode numerical modeling is used to examine the azimuthal dependence of sound wave horizontal reflection from, transmission through and ducting between straight parallel waves of each type. The impact of the natural crossings of nonlinear internal waves on horizontally ducted sound energy is studied theoretically and modeled numerically using a three-dimensional parabolic equation acoustic propagation code. A realistic sea surface elevation is synthesized from the directional spectrum of long swells and used for three-dimensional numerical modeling of acoustic propagation. As a result, considerable normal mode amplitude scintillations were observed and shown to be strongly dependent on horizontal azimuth, range and mode number. Full field numerical modeling of low frequency sound propagation through large sand waves located on a sloped bottom was performed using the high resolution bathymetry of the mouth of San Francisco Bay. 
Very strong acoustic ducting is shown to steer acoustic energy beams along the sand wave's curved crests. / by Alexey Alexandrovich Shmelev. / Ph.D.
27

Receptivity to free stream acoustic disturbances due to a roughness element on a flat plate

Ashour, Osama Naim 05 September 2009
The boundary-layer receptivity resulting from acoustic forcing over a flat plate with a surface irregularity is investigated. The unsteady free-stream disturbances couple with the steady perturbations resulting from the surface irregularity to form a traveling-wave mode. The resonance condition necessary for receptivity requires a forcing at a wave number equal to that of the Tollmien-Schlichting (TS) eigenmode and a frequency equal to that of the free-stream acoustic disturbance. The basic (mean) flow is calculated using an interacting boundary layer (IBL) scheme that accounts for viscous/inviscid interactions. Then, the method of multiple scales is used to find the total amplitude of the generated wave. Results of this study show how the transition process is significantly stimulated. Also, the dependence of the receptivity on the geometry of the roughness element as well as on the amplitude and frequency of the acoustic disturbance is studied. Application of suction is shown to reduce the receptivity resulting from the roughness element. / Master of Science
28

Understanding and utilizing waveguide invariant range-frequency striations in ocean acoustic waveguides

Cockrell, Kevin L January 2011
Thesis (Ph. D.)--Joint Program in Oceanography/Applied Ocean Science and Engineering (Massachusetts Institute of Technology, Dept. of Mechanical Engineering; and the Woods Hole Oceanographic Institution), February 2011. / Cataloged from PDF version of thesis. / Includes bibliographical references (p. 163-170). / Much of the recent research in ocean acoustics has focused on developing methods to exploit the effects that the sea surface and seafloor have on acoustic propagation. Many of those methods require detailed knowledge of the acoustic properties of the seafloor and the sound speed profile (SSP), which limits their applicability. The range-frequency waveguide invariant describes striations that often appear in plots of acoustic intensity versus range and frequency. These range-frequency striations have properties that depend strongly on the frequency of the acoustic source and on the distance between the acoustic source and receiver, but only mildly on the SSP and seafloor properties. Because of this dependence, the waveguide invariant can be utilized for applications such as passive and active sonar, time-reversal mirrors, and array processing, even when the SSP or the seafloor properties are not well known. This thesis develops a framework for understanding and calculating the waveguide invariant, and uses that framework to develop signal processing techniques for the waveguide invariant. A method for passively estimating the range from an acoustic source to a receiver is developed and tested on experimental data. Heuristics are developed to estimate the minimum source bandwidth and minimum horizontal aperture required for range estimation. A semi-analytic formula for the waveguide invariant is derived using the WKB approximation along with a normal mode description of the acoustic field in a range-independent waveguide.
This formula is applicable to waveguides with arbitrary SSPs, and reveals precisely how the SSP and the seafloor reflection coefficient affect the value of the waveguide invariant. Previous research has shown that the waveguide invariant range-frequency striations can be observed using a single hydrophone or a horizontal line array (HLA) of hydrophones. This thesis shows that traditional array processing techniques are sometimes inadequate for the purpose of observing range-frequency striations using an HLA. Array processing techniques designed specifically for observing range-frequency striations are developed and demonstrated. Finally, a relationship between the waveguide invariant and wavenumber integrations is derived, which may be useful for studying range-frequency striations in elastic environments such as ice-covered waveguides. / by Kevin L. Cockrell. / Ph.D.
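For readers unfamiliar with the quantity this abstract discusses, the standard textbook definition of the waveguide invariant may help. It is not the thesis's semi-analytic formula, which the abstract does not state. The symbols are assumptions of this note: \(r\) is source-receiver range, \(\omega\) is angular frequency, and \(S_p\), \(S_g\) are modal phase and group slowness.

```latex
% Slope of an intensity striation in the range-frequency plane
\[
  \frac{d\omega}{dr} \;=\; \beta\,\frac{\omega}{r}
  \qquad\Longleftrightarrow\qquad
  \beta \;=\; \frac{r}{\omega}\,\frac{d\omega}{dr}
\]
% Equivalent modal definition in terms of phase and group slowness
\[
  \beta \;=\; -\,\frac{dS_p}{dS_g}
\]
% Rearranged for range, given a measured striation slope and an assumed beta
\[
  r \;=\; \beta\,\omega \left(\frac{d\omega}{dr}\right)^{-1}
\]
```

In an idealized shallow-water waveguide, \(\beta \approx 1\) is the commonly quoted value. The last line indicates why a measured striation slope, together with an assumed \(\beta\), carries information about range.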
29

Weakly non-local arbitrarily-shaped absorbing boundary conditions for acoustics and elastodynamics: theory and numerical experiments

Lee, Sanghoon 28 August 2008
Not available / text
30

Objective determination of vowel intelligibility of a cochlear implant model

Van Zyl, Joe. January 2009
Thesis (M.Eng.(Bio-Engineering))--University of Pretoria, 2008. / Summaries in Afrikaans and English. Includes bibliographical references.
