211 |
Energy and nature based split multiple transform domain split vector quantization for speech coding. Basta, Moheb Mokhtar, 01 April 2003.
No description available.
|
212 |
Linear contractivity speech coding. Zuniga, Roberto Benjamin, 01 January 1993.
No description available.
|
213 |
World of faces, words and actions: Observations and neural linkages in early life. Handl, Andrea, January 2016.
From the start of their lives, infants and young children are surrounded by a tremendous amount of multimodal social information. One intriguing question in the study of early social cognition is how vital social information is detected and processed, and how and when young infants begin to make sense of what they see and hear and learn to understand other people’s behavior. The overall aim of this thesis was to provide new insights into this exciting field. Investigating behavior and/or neural mechanisms in early life, the three studies included in this thesis strive to increase our understanding of the perception and processing of social information. Study I used eye-tracking to examine infants’ observations of gaze in a third-party context. The results showed that 9-, 16- and 24-month-old infants differentiate between the body orientations of two individuals on the basis of static visual information. More specifically, they shift their gaze more often between them when the social partners face each other than when they are turned away from each other. Using the ERP technique, Study II demonstrated that infants at the age of 4 to 5 months show signs of integrating visual and auditory information at a neural level. Further, direct gaze in combination with backwards-spoken words leads to earlier or enhanced neural processing in comparison to other gaze-word combinations. Study III, also an EEG investigation, found that children between 18 and 30 months of age show a desynchronization of the mu rhythm during both the observation and execution of object-directed actions. The results also suggest motor system activation when young children observe others’ mimed actions. To summarize, the findings reported in this thesis strengthen the idea that infants are sensitive to others’ gaze and that this sensitivity may extend to third-party contexts. Gaze is also processed together with other information, for instance words, even before infants are able to understand others’ vocabulary. Furthermore, the motor system in young children is active during both the observation and imitation of another person’s goal-directed actions. This is in line with findings in infants, children and adults, indicating that these processes are linked at the neural level.
|
214 |
The speech processing skills of children with cochlear implants. Pieterse-Randall, Candice, 12 1900.
Thesis (MSL and HT (Interdisciplinary Health Sciences. Speech-Language and Hearing Therapy))--Stellenbosch University, 2008. / This study aims to describe the speech processing skills of three children, ages 6;0, 6;10 and 8;10, with cochlear implants. A psycholinguistic framework was used to profile each child’s
strengths and weaknesses, using a single case study approach. Each child’s speech processing
skills are described based on detailed psycholinguistically-orientated assessments. In addition,
retrospective data from 1-2 years post-implantation were examined in the light of the
psycholinguistic framework in order to describe each child’s development over time and in
relation to time of implantation. Results showed each child to have a unique profile of strengths
and weaknesses, and widely varying outcomes in terms of speech processing even though all
three children had the same initial difficulty (congenital bilateral hearing loss). Links between
speech processing and other aspects of development as well as contextual factors are discussed
in relation to outcomes for each child. The case studies contribute to knowledge of speech
processing skills in children with cochlear implants, and have clinical implications for those
who work with children with cochlear implants and their families.
|
215 |
USB telephony interface device for speech recognition applications. Muller, J. J., 12 1900.
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2005. / Automatic speech recognition (ASR) systems are an attractive means for companies to deliver value added
services with which to improve customer satisfaction. Such ASR systems require a telephony interface to
connect the speech recognition application to the telephone system. Commercially available telephony
interfaces are usually operating system specific, and therefore hardware device driver issues complicate the
development of software applications for different platforms that require telephony access. The drivers and
application programming interface (API) for telephony interfaces are often available only for the Microsoft
Windows operating systems. This poses a problem, as many of the software tools used for speech recognition
research and development operate only on Linux-based computers. These interfaces are also typically in
PCI/ISA card format, which hinders physical portability of the device to another computer. A simpler, cheaper
and easier-to-use USB telephony interface device offering cross-platform portability was developed and is
presented here, together with the necessary API.
|
216 |
Language identification using Gaussian mixture models. Nkadimeng, Calvin, 03 1900.
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2010. / ENGLISH ABSTRACT: The importance of Language Identification for African languages is increasing dramatically
due to the development of telecommunication infrastructure and, as a result, growing volumes of data and
speech traffic in public networks. By automatically processing the raw speech data, the vital assistance
given to people in distress can be sped up by referring their calls to a person
knowledgeable in that language.
To this effect a speech corpus was developed and various algorithms were implemented
and tested on raw telephone speech data. These algorithms entailed
data preparation, signal processing, and statistical analysis aimed at discriminating
between languages. The statistical model of Gaussian Mixture Models
(GMMs) was chosen for this research due to its ability to represent an entire
language with a single stochastic model that does not require phonetic transcription.
Language Identification for African languages using GMMs is feasible, although
a few challenges, such as proper classification and an accurate study of the
relationships between languages, still need to be overcome. Other methods
that make use of phonetically transcribed data need to be explored and
tested with the new corpus for the research to be more rigorous. / AFRIKAANSE OPSOMMING: Die belang van die Taal identifiseer vir Afrika-tale is sien ’n dramatiese toename
te danke aan die ontwikkeling van telekommunikasie-infrastruktuur en as gevolg
’n toename in volumes van data en spraak verkeer in die openbaar netwerke.Deur
outomaties verwerking van die ruwe toespraak gegee die noodsaaklike hulp verleen
aan mense in nood kan word vinniger-up ”, deur te verwys hul oproepe na
’n persoon ingelichte in daardie taal.
Tot hierdie effek van ’n toespraak corpus het ontwikkel en die verskillende algoritmes
is gemplementeer en getoets op die ruwe telefoon toespraak gegee.Hierdie
algoritmes behels die data voorbereiding, seinverwerking, en statistiese analise
wat gerig is op onderskei tussen tale.Die statistiese model van Gauss Mengsel
Modelle (GGM) was gekies is vir hierdie navorsing as gevolg van hul vermo
te verteenwoordig ’n hele taal met’ n enkele stogastiese model wat nodig nie
fonetiese tanscription nie.
Taal identifiseer vir die Afrikatale gebruik GGM haalbaar is, alhoewel daar
enkele paar uitdagings soos behoorlike klassifikasie en akkurate ondersoek na die
verhouding van TALE wat moet oorkom moet word.Ander metodes wat gebruik
maak van foneties getranskribeerde data nodig om ondersoek te word en getoets
word met die nuwe corpus vir die ondersoek te word strenger.
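As a rough illustration of the GMM approach described in the English abstract above, the following sketch fits one Gaussian mixture model per language on pooled frame-level features and picks the language whose model gives the highest average log-likelihood. The use of MFCC features, librosa, scikit-learn, a 16 kHz sampling rate and the function names are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np
import librosa                      # assumed available for MFCC extraction
from sklearn.mixture import GaussianMixture

def train_language_gmms(train_sets, sr=16000, n_components=64):
    """Fit one GMM per language on pooled MFCC frames.
    train_sets: dict mapping language name -> list of waveform arrays."""
    models = {}
    for lang, utterances in train_sets.items():
        frames = np.vstack([librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T
                            for y in utterances])
        models[lang] = GaussianMixture(n_components=n_components,
                                       covariance_type="diag").fit(frames)
    return models

def identify_language(models, y, sr=16000):
    """Score an utterance against every language model; the highest average
    per-frame log-likelihood wins (no phonetic transcription required)."""
    frames = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T
    return max(models, key=lambda lang: models[lang].score(frames))
```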
|
217 |
Analyse et reconnaissance des émotions lors de conversations de centres d'appels / Automatic emotions recognition during call center conversations. Vaudable, Christophe, 11 July 2012.
La reconnaissance automatique des émotions dans la parole est un sujet de recherche relativement récent dans le domaine du traitement de la parole, puisqu’il est abordé depuis une dizaine d’années environs. Ce sujet fait de nos jours l’objet d’une grande attention, non seulement dans le monde académique mais aussi dans l’industrie, grâce à l’augmentation des performances et de la fiabilité des systèmes. Les premiers travaux étaient fondés sur des donnés jouées par des acteurs, et donc non spontanées. Même aujourd’hui, la plupart des études exploitent des séquences pré-segmentées d’un locuteur unique et non une communication spontanée entre plusieurs locuteurs. Cette méthodologie rend les travaux effectués difficilement généralisables pour des informations collectées de manière naturelle.Les travaux entrepris dans cette thèse se basent sur des conversations de centre d’appels, enregistrés en grande quantité et mettant en jeu au minimum 2 locuteurs humains (un client et un agent commercial) lors de chaque dialogue. Notre but est la détection, via l’expression émotionnelle, de la satisfaction client. Dans une première partie nous présentons les scores pouvant être obtenus sur nos données à partir de modèles se basant uniquement sur des indices acoustiques ou lexicaux. Nous montrons que pour obtenir des résultats satisfaisants une approche ne prenant en compte qu’un seul de ces types d’indices ne suffit pas. Nous proposons pour palier ce problème une étude sur la fusion d’indices de types acoustiques, lexicaux et syntaxico-sémantiques. Nous montrons que l’emploi de cette combinaison d’indices nous permet d’obtenir des gains par rapport aux modèles acoustiques même dans les cas ou nous nous basons sur une approche sans pré-traitements manuels (segmentation automatique des conversations, utilisation de transcriptions fournies par un système de reconnaissance de la parole). Dans une seconde partie nous remarquons que même si les modèles hybrides acoustiques/linguistiques nous permettent d’obtenir des gains intéressants la quantité de données utilisées dans nos modèles de détection est un problème lorsque nous testons nos méthodes sur des données nouvelles et très variées (49h issus de la base de données de conversations). Pour remédier à ce problème nous proposons une méthode d’enrichissement de notre corpus d’apprentissage. Nous sélectionnons ainsi, de manière automatique, de nouvelles données qui seront intégrées dans notre corpus d’apprentissage. Ces ajouts nous permettent de doubler la taille de notre ensemble d’apprentissage et d’obtenir des gains par rapport aux modèles de départ. Enfin, dans une dernière partie nous choisissons d’évaluées nos méthodes non plus sur des portions de dialogues comme cela est le cas dans la plupart des études, mais sur des conversations complètes. Nous utilisons pour cela les modèles issus des études précédentes (modèles issus de la fusion d’indices, des méthodes d’enrichissement automatique) et ajoutons 2 groupes d’indices supplémentaires : i) Des indices « structurels » prenant en compte des informations comme la durée de la conversation, le temps de parole de chaque type de locuteurs. ii) des indices « dialogiques » comprenant des informations comme le thème de la conversation ainsi qu’un nouveau concept que nous nommons « implication affective ». Celui-ci a pour but de modéliser l’impact de la production émotionnelle du locuteur courant sur le ou les autres participants de la conversation. 
Nous montrons que lorsque nous combinons l’ensemble de ces informations nous arrivons à obtenir des résultats proches de ceux d’un humain lorsqu’il s’agit de déterminer le caractère positif ou négatif d’une conversation / Automatic emotion recognition in speech is a relatively recent research subject in the field of speech processing, having been proposed for the first time only about ten years ago. The subject nowadays receives much attention, not only in academia but also in industry, thanks to improved model performance and system reliability. The first studies were based on acted, non-spontaneous speech. Until now, most experiments carried out by the research community on emotions were performed on pre-segmented sequences from a single speaker, not on spontaneous speech with several speakers. With this methodology, models built on acted data are hardly usable on data collected in a natural context. The studies we present in this thesis are based on call center conversations, about 1620 hours of dialogs, with at least two human speakers (a commercial agent and a client) in each conversation. Our aim is the detection, via emotional expression, of client satisfaction. In the first part of this work we present the results we obtained from models using only acoustic or only linguistic features for emotion detection. We show that an approach taking into account only one of these feature types is not enough to obtain correct results. To overcome this problem we propose the combination of three types of features (acoustic, lexical and semantic). We show that models with feature fusion achieve higher recognition scores in all cases compared to models using only acoustic features. This gain also holds for an approach without manual pre-processing (automatic segmentation of conversations, transcriptions produced by an automatic speech recognition system). In the second part of our study we note that even though models based on feature combination are relevant for emotion detection, the amount of data in our training set is too small when the models are tested on a large amount of new data. To overcome this problem we propose a new method to automatically complete the training set with new data, selected on linguistic and acoustic criteria and drawn from 100 hours of data. These additions allow us to double the amount of data in our training set and increase the emotion recognition rate compared to the non-enriched models. Finally, in the last part we choose to evaluate our method on entire conversations and not only on conversation turns, as in most studies. To classify a dialog we use the models built in the previous steps of this work and add two new feature groups: i) structural features, including information such as the length of the conversation and the proportion of speech for each speaker in the dialog; ii) dialogic features, including information such as the topic of the conversation and a new concept we call “affective implication”, whose aim is to represent the impact of the current speaker’s emotional production on the other speakers. We show that when we combine all of this information we obtain results close to those of a human when determining whether a conversation is positive or negative.
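To make the acoustic/lexical fusion idea concrete, here is a minimal, hypothetical sketch of early fusion: per-turn acoustic statistics are concatenated with tf-idf lexical features from ASR transcripts and fed to a single classifier. The toy data, the two acoustic features, and the scikit-learn pipeline are illustrative assumptions; they do not reproduce the thesis's actual feature sets or models (which also include syntactico-semantic features).

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy, made-up data: per-turn acoustic statistics (e.g. mean pitch, mean energy),
# the ASR transcript of the turn, and a positive/negative satisfaction label.
acoustic = np.array([[180.0, 0.62], [140.0, 0.31], [210.0, 0.75], [150.0, 0.40]])
transcripts = ["thank you that solves my problem", "this is unacceptable",
               "great service thanks a lot", "I have been waiting for an hour"]
labels = [1, 0, 1, 0]

# Early fusion: concatenate acoustic features with lexical (tf-idf) features so
# that a single classifier sees both views of each conversation turn.
vectorizer = TfidfVectorizer()
lexical = vectorizer.fit_transform(transcripts).toarray()
fused = np.hstack([acoustic, lexical])

clf = LogisticRegression(max_iter=1000).fit(fused, labels)
print(clf.predict(fused[:1]))       # predicted satisfaction for the first turn
```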
|
218 |
Sensitivity analysis of blind separation of speech mixtures. Unknown Date.
Blind source separation (BSS) refers to a class of methods by which multiple sensor signals are combined with the aim of estimating the original source signals. Independent component analysis (ICA) is one such method that effectively resolves static linear combinations of independent non-Gaussian distributions. We propose a method that can track variations in the mixing system by seeking a compromise between adaptive and block methods by using mini-batches. The resulting permutation indeterminacy is resolved based on the correlation continuity principle. Methods employing higher order cumulants in the separation criterion are susceptible to outliers in the finite sample case. We propose a robust method based on low-order non-integer moments by exploiting the Laplacian model of speech signals. We study separation methods for even (over)-determined linear convolutive mixtures in the frequency domain based on joint diagonalization of matrices employing time-varying second order statistics. We investigate the sources affecting the sensitivity of the solution under the finite sample case such as the set size, overlap amount and cross-spectrum estimation methods. / by Savaskan Bulek. / Thesis (Ph.D.)--Florida Atlantic University, 2010. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2010. Mode of access: World Wide Web.
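The mini-batch tracking and correlation-continuity ideas mentioned in the abstract can be sketched as follows: ICA is run on short overlapping blocks of a two-channel mixture, and the permutation/sign ambiguity of each block is resolved by correlating its estimated sources with those of the previous block over the overlap. The use of scikit-learn's FastICA, the block sizes and the greedy matching are illustrative assumptions, not the dissertation's exact method.

```python
import numpy as np
from sklearn.decomposition import FastICA

def separate_minibatch_ica(x, batch_len=8000, overlap=2000):
    """Block-wise ICA on a two-channel mixture x of shape (n_samples, 2).
    The per-block permutation/sign ambiguity is resolved by correlating the
    new sources with the previous block's sources over the overlapping span."""
    n_samples, n_ch = x.shape
    hop = batch_len - overlap
    out = np.zeros_like(x, dtype=float)
    prev = None
    for start in range(0, n_samples - batch_len + 1, hop):
        block = x[start:start + batch_len]
        s = FastICA(n_components=n_ch, max_iter=1000).fit_transform(block)
        if prev is not None:
            # Cross-correlate previous and current sources on the shared samples.
            c = np.corrcoef(prev[-overlap:].T, s[:overlap].T)[:n_ch, n_ch:]
            order = np.abs(c).argmax(axis=1)          # greedy match (may collide)
            sign = np.sign(c[np.arange(n_ch), order])
            s = s[:, order] * sign
        keep_from = 0 if prev is None else overlap    # write only the new samples
        out[start + keep_from:start + batch_len] = s[keep_from:]
        prev = s
    return out
```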
|
219 |
Voice flow control in integrated packet networks. Hayden, Howard Paul, January 1981.
Thesis (Elec.E.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1981. / Includes bibliographical references. / by Howard Paul Hayden.
|
220 |
Spectral refinement to speech enhancement. Unknown Date.
The goal of a speech enhancement algorithm is to remove noise and recover the original signal with as little distortion and residual noise as possible. Most successful real-time algorithms work in the frequency domain, where the frequency amplitude of clean speech is estimated per short-time frame of the noisy signal. State-of-the-art short-time spectral amplitude estimator algorithms estimate the clean spectral amplitude in terms of the power spectral density (PSD) function of the noisy signal. The PSD has to be computed from a large ensemble of signal realizations. However, in practice, it may only be estimated from a finite-length sample of a single realization of the signal. Estimation errors introduced by these limitations cause the solution to deviate from the optimum. Various spectral estimation techniques, many with added spectral smoothing, have been investigated for decades to reduce the estimation errors. These algorithms do not significantly address the quality of speech as perceived by a human listener. This dissertation presents analysis and techniques that offer spectral refinements toward speech enhancement. We present an analytical framework for the effect of spectral estimate variance on the performance of speech enhancement. We use the variance quality factor (VQF) as a quantitative measure of estimated spectra. We show that reducing the spectral estimator VQF significantly reduces the VQF of the enhanced speech. The Autoregressive Multitaper (ARMT) spectral estimate is proposed as a low-VQF spectral estimator for use in speech enhancement algorithms. An innovative method of incorporating a speech production model using multiband excitation is also presented as a technique to emphasize the harmonic components of the glottal speech input. / The preconditioning of the noisy estimates by exploiting other avenues of information, such as pitch estimation and the speech production model, effectively increases the localized narrow-band signal-to-noise ratio (SNR) of the noisy signal, which is subsequently denoised by the amplitude gain. Combined with voicing structure enhancement, the ARMT spectral estimate delivers enhanced speech with sound clarity desirable to human listeners. The resulting improvements in enhanced speech are observed to be significant with both objective and subjective measurements. / by Werayuth Charoenruengkit. / Vita. / Thesis (Ph.D.)--Florida Atlantic University, 2009. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2009. Mode of access: World Wide Web.
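For context, here is a minimal sketch of the generic short-time spectral amplitude approach discussed above, not the ARMT estimator proposed in the dissertation: the noise PSD is estimated from the first few frames (assumed speech-free), a Wiener-style gain with a floor is applied per time-frequency bin, and the signal is resynthesized. The frame size, gain rule and noise-estimation assumption are illustrative choices.

```python
import numpy as np
from scipy.signal import stft, istft

def enhance_wiener_style(noisy, fs, noise_frames=10, floor=0.05):
    """Estimate the noise PSD from the first few frames, apply a Wiener-style
    gain (estimated clean power over noisy power) with a spectral floor per
    time-frequency bin, and resynthesize the enhanced signal."""
    f, t, X = stft(noisy, fs=fs, nperseg=512)
    noise_psd = np.mean(np.abs(X[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    snr_post = np.abs(X) ** 2 / np.maximum(noise_psd, 1e-12)   # a posteriori SNR
    gain = np.maximum(1.0 - 1.0 / np.maximum(snr_post, 1e-12), floor)
    _, enhanced = istft(gain * X, fs=fs, nperseg=512)
    return enhanced
```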
|