381

Statistical language models for Chinese recognition: speech and character

黃伯光, Wong, Pak-kwong. January 1998 (has links)
Published or final version / Computer Science / Doctoral / Doctor of Philosophy
382

Machine Learning Methods for Articulatory Data

Berry, Jeffrey James January 2012 (has links)
Humans make use of more than just the audio signal to perceive speech. Behavioral and neurological research has shown that a person's knowledge of how speech is produced influences what is perceived. With methods for collecting articulatory data becoming more widespread, methods for extracting useful information are needed to make such data useful to speech scientists and to speech technology applications. This dissertation presents feature extraction methods for ultrasound images of the tongue and for data collected with an Electro-Magnetic Articulograph (EMA), and tests the usefulness of these features in several phoneme classification tasks.

The feature extraction methods for ultrasound tongue images consist of automatically tracing the tongue surface contour using a modified Deep Belief Network (DBN) (Hinton et al. 2006), and of methods inspired by research in face recognition which use the entire image. The tongue tracing method consists of training a DBN as an autoencoder on concatenated images and traces, and then retraining the first two layers to accept only the image at runtime. This 'translational' DBN (tDBN) method is shown to produce traces comparable to those made by human experts. An iterative bootstrapping procedure is presented for using the tDBN to assist a human expert in labeling a new data set. Tongue contour traces are compared with the Eigentongues method of Hueber et al. (2007) and a Gabor Jet representation in a 6-class phoneme classification task using Support Vector Classifiers (SVC), with Gabor Jets performing best. These SVC methods are compared to a tDBN classifier, which extracts features from raw images and classifies them with accuracy only slightly lower than the Gabor Jet SVC method.

For EMA data, supervised binary SVC feature detectors are trained for each feature in three versions of Distinctive Feature Theory (DFT): Preliminaries (Jakobson et al. 1954), The Sound Pattern of English (Chomsky and Halle 1968), and Unified Feature Theory (Clements and Hume 1995). Each of these feature sets, together with a fourth, unsupervised feature set learned using Independent Components Analysis (ICA), is compared on its usefulness in a 46-class phoneme recognition task. Phoneme recognition is performed using a linear-chain Conditional Random Field (CRF) (Lafferty et al. 2001), which takes advantage of the temporal nature of speech by looking at observations adjacent in time. Results of the phoneme recognition task show that Unified Feature Theory performs slightly better than the other versions of DFT. Surprisingly, ICA actually performs worse than running the CRF on raw EMA data.
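The 6-class SVC comparison lends itself to a compact illustration. Below is a minimal sketch using scikit-learn, assuming feature vectors (e.g. Gabor Jet responses per ultrasound frame) have already been extracted; the feature dimensionality, sample counts and labels are hypothetical stand-ins, not the thesis's data.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical stand-in data: one feature vector per ultrasound frame,
# labeled with one of 6 phoneme classes.
rng = np.random.default_rng(0)
X = rng.standard_normal((600, 128))   # 600 frames, 128-dim features
y = rng.integers(0, 6, size=600)      # 6 phoneme classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# An RBF-kernel Support Vector Classifier for the 6-class task.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```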
383

Sensitivity analysis of blind separation of speech mixtures

Unknown Date (has links)
Blind source separation (BSS) refers to a class of methods by which multiple sensor signals are combined with the aim of estimating the original source signals. Independent component analysis (ICA) is one such method that effectively resolves static linear combinations of independent non-Gaussian distributions. We propose a method that can track variations in the mixing system by seeking a compromise between adaptive and block methods through the use of mini-batches; the resulting permutation indeterminacy is resolved based on the correlation continuity principle. Methods employing higher-order cumulants in the separation criterion are susceptible to outliers in the finite-sample case, so we propose a robust method based on low-order non-integer moments that exploits the Laplacian model of speech signals. We also study separation methods for even- or over-determined linear convolutive mixtures in the frequency domain based on joint diagonalization of matrices employing time-varying second-order statistics, and investigate the factors affecting the sensitivity of the solution in the finite-sample case, such as the set size, overlap amount and cross-spectrum estimation method. / by Savaskan Bulek. / Thesis (Ph.D.)--Florida Atlantic University, 2010. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2010. Mode of access: World Wide Web.
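The mini-batch tracking and robust low-order-moment criteria are the thesis's own contributions, but the static ICA baseline it builds on can be sketched with scikit-learn's FastICA; the sources and mixing matrix below are synthetic illustrations.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two synthetic non-Gaussian sources (Laplacian-like, as a crude
# stand-in for speech amplitude statistics).
rng = np.random.default_rng(1)
n = 8000
s1 = np.sign(rng.standard_normal(n)) * rng.exponential(1.0, n)
s2 = np.sign(rng.standard_normal(n)) * rng.exponential(0.5, n)
S = np.c_[s1, s2]

A = np.array([[1.0, 0.6],    # static mixing matrix, unknown to the separator
              [0.4, 1.0]])
X = S @ A.T                  # observed sensor mixtures

# FastICA recovers the sources only up to permutation and scaling --
# the indeterminacy the thesis resolves via correlation continuity.
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)
```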
384

Detection of Nonstationary Noise and Improved Voice Activity Detection in an Automotive Hands-free Environment

Laverty, Stephen William 11 May 2005 (has links)
Speech processing in the automotive environment is a challenging problem due to the presence of powerful and unpredictable nonstationary noise. This thesis addresses two detection problems involving both nonstationary noise signals and nonstationary desired signals. Two detectors are developed: one to detect passing vehicle noise in the presence of speech and one to detect speech in the presence of passing vehicle noise. The latter is then measured against a state-of-the-art voice activity detector used in telephony. The process of compiling a library of recordings in the automobile to facilitate this research is also detailed.
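As a point of reference for the detection problem, here is a sketch of the simplest frame-energy voice activity detector, the kind of baseline that nonstationary vehicle noise defeats; the frame length, threshold rule and sample rate are illustrative assumptions, not the thesis's design.

```python
import numpy as np

def energy_vad(x, sr=8000, frame_ms=20, margin_db=6.0):
    """Naive frame-energy VAD: flag frames whose log energy exceeds
    an estimated noise floor by a fixed margin. A powerful passing
    vehicle raises the energy of noise-only frames, defeating exactly
    this assumption -- the motivation for more robust detectors."""
    n = int(sr * frame_ms / 1000)
    frames = x[: len(x) // n * n].reshape(-1, n)
    log_e = 10 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    noise_floor = np.percentile(log_e, 10)   # crude noise-floor estimate
    return log_e > noise_floor + margin_db   # True = speech frame
```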
385

Election markup language (EML) based tele-voting system

Gong, XiangQi January 2009 (has links)
Elections are one of the most fundamental activities of a democratic society. As in any other aspect of life, developments in technology have resulted in changes to the voting procedure, from traditional paper-based voting to voting by electronic means, or e-voting. E-voting involves using different forms of electronic means such as voting machines, voting via the Internet, telephone, SMS and digital interactive television. This thesis concerns voting by telephone, or televoting. It starts by giving a brief overview and evaluation of various models and technologies that are implemented within such systems. The aspects of televoting investigated are the technologies that provide a voice interface to the voter and conduct the voting process, namely the Election Markup Language (EML), Automated Speech Recognition (ASR) and Text-to-Speech (TTS).
386

Bio-inspired noise robust auditory features

Javadi, Ailar 12 June 2012 (has links)
The purpose of this work is to investigate a series of biologically inspired modifications to state-of-the-art Mel-frequency cepstral coefficients (MFCCs) that may improve automatic speech recognition results. We provide recommendations for improving speech recognition results depending on the signal-to-noise ratio of the input signal. This work has been motivated by noise-robust auditory features (NRAF). In the feature extraction technique, after a signal is filtered using bandpass filters, a spatial derivative step is used to sharpen the results, followed by an envelope detector (rectification and smoothing) and down-sampling for each filter bank before compression. A DCT is then applied to the outputs of all filter banks to produce features. The Hidden Markov Model Toolkit (HTK) is used as the recognition back-end to perform speech recognition on the extracted features. We investigate the roles of filter type, window size, spatial derivative, rectification type, smoothing, down-sampling and compression, and compare the final results to state-of-the-art MFCCs. A series of conclusions and insights is provided for each step of the process. The goal of this work has not been to outperform MFCCs; however, we show that by changing the compression type from log compression to 0.07 root compression we are able to outperform MFCCs under all noisy conditions.
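The compression change the abstract reports is small enough to show directly. Below is a sketch of the final stage of such a filterbank-cepstral pipeline, assuming the power spectrogram and mel filterbank matrix are computed upstream; the shapes and the 13-coefficient truncation are conventional choices, not necessarily the thesis's.

```python
import numpy as np
from scipy.fftpack import dct

def cepstral_features(power_spec, mel_fb, compression="log"):
    """Filterbank energies -> compression -> DCT, the step the thesis
    varies. `power_spec` is an (n_frames, n_fft_bins) power spectrogram
    and `mel_fb` an (n_mels, n_fft_bins) filterbank matrix, both
    assumed computed upstream."""
    energies = power_spec @ mel_fb.T + 1e-12
    if compression == "log":
        compressed = np.log(energies)     # standard MFCC compression
    else:
        compressed = energies ** 0.07     # root compression, as reported
    return dct(compressed, type=2, axis=1, norm="ortho")[:, :13]
```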
387

Speech Analysis and Cognition Using Category-Dependent Features in a Model of the Central Auditory System

Jeon, Woojay 13 November 2006 (has links)
It is well known that machines perform far worse than humans in recognizing speech and audio, especially in noisy environments. One method of addressing this issue of robustness is to study physiological models of the human auditory system and to adopt some of their characteristics in computers. As a first step in studying the potential benefits of an elaborate computational model of the primary auditory cortex (A1) in the central auditory system, we qualitatively and quantitatively validate the model under existing speech processing and recognition methodology. Next, we develop new insights and ideas on how to interpret the model, and reveal some of the advantages of its dimension expansion that may potentially be used to improve existing speech processing and recognition methods. This is done by statistically analyzing the neural responses to various classes of speech signals and forming empirical conjectures on how cognitive information is encoded in a category-dependent manner. We also establish a theoretical framework that shows how noise and signal can be separated in the dimension-expanded cortical space. Finally, we develop new feature selection and pattern recognition methods to exploit the category-dependent encoding of noise-robust cognitive information in the cortical response. Category-dependent features are proposed as features that "specialize" in discriminating specific sets of classes, and as a natural way of incorporating them into a Bayesian decision framework, we propose methods to construct hierarchical classifiers that perform decisions in a two-stage process. Phoneme classification tasks using the TIMIT speech database are performed to quantitatively validate all developments in this work, and the results encourage future work in exploiting high-dimensional data with category- (or class-) dependent features for improved classification and detection.
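The two-stage decision process can be sketched generically: a first-stage classifier picks a broad category, then a classifier specialized to that category makes the final decision. The scikit-learn estimators and the label-to-category map below are illustrative stand-ins for the thesis's Bayesian formulation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class TwoStageClassifier:
    """First stage picks a broad phoneme category; second stage applies
    a classifier specialized to that category's classes (each category
    is assumed to contain at least two classes)."""

    def __init__(self, category_of):
        self.category_of = category_of      # maps class label -> category
        self.stage1 = LogisticRegression(max_iter=1000)
        self.stage2 = {}                    # one classifier per category

    def fit(self, X, y):
        cats = np.array([self.category_of[c] for c in y])
        self.stage1.fit(X, cats)
        for cat in np.unique(cats):
            mask = cats == cat
            clf = LogisticRegression(max_iter=1000)
            clf.fit(X[mask], y[mask])
            self.stage2[cat] = clf
        return self

    def predict(self, X):
        cats = self.stage1.predict(X)
        out = np.empty(len(X), dtype=object)
        for cat in np.unique(cats):
            m = cats == cat
            out[m] = self.stage2[cat].predict(X[m])
        return out
```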
388

Physiologically Motivated Methods For Audio Pattern Classification

Ravindran, Sourabh 20 November 2006 (has links)
Human-like performance by machines in tasks of speech and audio processing has remained an elusive goal. In an attempt to bridge the gap in performance between humans and machines, there has been an increased effort to study and model physiological processes. However, the widespread use of biologically inspired features proposed in the past has been hampered mainly by either a lack of robustness across a range of signal-to-noise ratios or formidable computational costs. In physiological systems, sensor processing occurs in several stages, and it is likely that signal features and biological processing techniques evolved together and are complementary or well matched. It is precisely for this reason that modeling of the feature extraction processes should go hand in hand with modeling of the processes that use these features. This research presents a front-end feature extraction method for audio signals inspired by the human peripheral auditory system. New developments in the field of machine learning are leveraged to build classifiers that maximize the performance gains afforded by these features. The structure of the classification system is similar to what might be expected in physiological processing. Further, the feature extraction and classification algorithms can be efficiently implemented using a low-power cooperative analog-digital signal processing platform. The usefulness of the features is demonstrated for tasks of audio classification, speech versus non-speech discrimination, and speech recognition. The low-power nature of the classification system makes it ideal for use in applications such as hearing aids, hand-held devices, and surveillance through acoustic scene monitoring.
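The thesis's front end is its own design, but a standard computational stand-in for peripheral auditory filtering is the gammatone filterbank, sketched here from its textbook impulse response; the parameter values are conventional defaults (Glasberg and Moore ERB scale), not taken from the thesis.

```python
import numpy as np

def gammatone_ir(fc, sr=16000, dur=0.05, order=4, b=1.019):
    """Textbook gammatone impulse response centered at fc (Hz):
    t^(n-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t)."""
    t = np.arange(int(sr * dur)) / sr
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)   # Glasberg & Moore ERB width
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb * t) \
        * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def gammatone_filterbank(x, centers, sr=16000):
    """Filter signal x through one gammatone channel per center frequency."""
    return np.stack([np.convolve(x, gammatone_ir(fc, sr), mode="same")
                     for fc in centers])
```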
389

A Study On Bandpassed Speech From The Point Of Intelligibility

Ganesh, Murthy C N S 10 1900 (has links)
Speech has been a subject of interest for a very long time. Even with so much advancement in processing techniques and in the understanding of the sources of speech, it is, even today, rather difficult to generate speech in the laboratory in all its aspects. A simple question, such as how speech retains its intelligibility even when it is distorted or bandpassed, is not really understood. This thesis deals with one small feature of speech, viz. that the intelligibility of speech is retained even when it is bandpassed with a minimum bandwidth of around 1 kHz located anywhere in the speech spectrum of 0-4 kHz. Several experiments have been conducted by earlier workers by passing speech through various distorters such as differentiators, integrators and infinite peak clippers, and it was found that intelligibility is retained to a very large extent in the distorted speech. The integrator and the differentiator essentially remove a certain portion of the spectrum. It is therefore thought that the intelligibility of speech is spread over the entire speech spectrum, and that it may not be impaired even when the speech is bandpassed with a minimum bandwidth and the band is located anywhere in the speech spectrum. To test this idea and establish this feature, if it exists, preliminary experiments were conducted by passing speech through different filters, and the conjecture was found to be on the right lines. To carry out systematic experiments, an experimental setup was designed and fabricated, consisting of a microprocessor-controlled speech recording, storage and playback system. A personal computer is coupled to the microprocessor system to enable storage and processing of the data. Thirty persons drawn from different walks of life, such as teachers, mechanics and students, were involved in providing the speech samples and in recognizing the content of the processed speech. Although sentences like 'This is devices lab' were used to ascertain the effect of bandwidth on intelligibility, vowels were used as the speech samples for the purpose of analysis. The experiments essentially consist of recording words and sentences spoken by the 30 participants; these recorded speech samples are passed through filters with different bandwidths and center frequencies, the filtered output is played back to various listeners, and observations regarding the intelligibility of the speech are noted. The listeners have no prior information about the content of the speech. It was found that in almost all (95%) cases the messages or words were intelligible to most listeners when the bandwidth of the filter was about 1 kHz, independent of the location of the pass band in the 0-4 kHz spectrum. To understand how this feature of speech arises, spectra of vowels spoken by the 30 people were computed using FFT algorithms on the digitized speech samples. A cyclic behavior appears in the spectra of all the samples. To confirm this periodicity and measure it, a moving-average procedure was employed to smooth the spectra. The smoothed spectra of all the vowels indeed show a periodicity of about 1 kHz; when analysed, the periodicities have an average value of 1038 Hz with a standard deviation of 19 Hz.
In view of this, it is thought that the acoustic source responsible for speech must generate this periodic spectrum, which may then be modified periodically to imprint intelligibility. If this is true, one can perhaps easily understand this feature of speech, viz. that intelligibility is retained in bandpassed speech of 1 kHz bandwidth with the pass band located anywhere in the speech spectrum of 0-4 kHz. This thesis presents the experiments and the analysis of the speech in five chapters. Chapter 1 deals with the basics of speech and the processing tools used to analyse the speech signal. Chapter 2 presents the literature survey from which the present problem is drawn. Chapter 3 describes the structure and fabrication of the experimental setup. Chapter 4 gives a detailed account of how the experiments were conducted and how the speech was analysed. Chapter 5 summarizes the work and suggests the future work needed to establish the mechanism of speech responsible for the feature described in this thesis.
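The smoothing-and-periodicity analysis translates directly into a short computation. Below is a sketch, assuming a windowed vowel segment as input; the smoothing width and peak-picking are illustrative choices, not the thesis's exact procedure.

```python
import numpy as np
from scipy.signal import find_peaks

def spectral_periodicity(x, sr=8000, smooth_hz=100):
    """Magnitude spectrum of a vowel segment, smoothed by a moving
    average, then the mean spacing of its peaks -- the quantity the
    thesis reports as averaging 1038 Hz across speakers."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    width = max(1, int(smooth_hz / (freqs[1] - freqs[0])))  # bins per window
    smooth = np.convolve(spec, np.ones(width) / width, mode="same")
    peaks, _ = find_peaks(smooth)
    return np.mean(np.diff(freqs[peaks])) if len(peaks) > 1 else None
```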
390

Steuerung sprechernormalisierender Abbildungen durch künstliche neuronale Netzwerke [Control of speaker-normalizing mappings by artificial neural networks]

Müller, Knut 01 November 2000 (has links)
No description available.
