211 |
Energy and nature based split multiple transform domain split vector quantization for speech coding. Basta, Moheb Mokhtar, 01 April 2003.
No description available.
|
212 |
Linear contractivity speech coding. Zuniga, Roberto Benjamin, 01 January 1993.
No description available.
|
213 |
World of faces, words and actions: Observations and neural linkages in early life. Handl, Andrea, January 2016.
From the start of their lives, infants and young children are surrounded by a tremendous amount of multimodal social information. One intriguing question in the study of early social cognition is how vital social information is detected and processed, and how and when young infants begin to make sense of what they see and hear and learn to understand other people’s behavior. The overall aim of this thesis was to provide new insights into this exciting field. Investigating behavior and/or neural mechanisms in early life, the three studies included in this thesis strive to increase our understanding of the perception and processing of social information. Study I used eye-tracking to examine infants’ observations of gaze in a third-party context. The results showed that 9-, 16- and 24-month-old infants differentiate between the body orientations of two individuals on the basis of static visual information. More specifically, they shift their gaze more often between them when the social partners face each other than when they are turned away from each other. Using the ERP technique, Study II demonstrated that infants at the age of 4 to 5 months show signs of integrating visual and auditory information at a neural level. Further, direct gaze in combination with backwards-spoken words leads to earlier or enhanced neural processing in comparison to other gaze-word combinations. Study III, also an EEG investigation, found that children between 18 and 30 months of age show a desynchronization of the mu rhythm during both the observation and execution of object-directed actions. The results also suggest motor system activation when young children observe others’ mimed actions. To summarize, the findings reported in this thesis strengthen the idea that infants are sensitive to others’ gaze and that this sensitivity may extend to third-party contexts. Gaze is also processed together with other information, for instance words, even before infants are able to understand others’ vocabulary. Furthermore, the motor system in young children is active during both the observation and imitation of another person’s goal-directed actions. This is in line with findings in infants, children and adults, indicating that these processes are linked at the neural level.
|
214 |
The speech processing skills of children with cochlear implants. Pieterse-Randall, Candice, 12 1900.
Thesis (MSL and HT (Interdisciplinary Health Sciences. Speech-Language and Hearing Therapy))--Stellenbosch University, 2008. / This study aims to describe the speech processing skills of three children, ages 6;0, 6;10 and 8;10, with cochlear implants. A psycholinguistic framework was used to profile each child’s
strengths and weaknesses, using a single case study approach. Each child’s speech processing
skills are described based on detailed psycholinguistically-orientated assessments. In addition,
retrospective data from 1-2 years post-implantation were examined in the light of the
psycholinguistic framework in order to describe each child’s development over time and in
relation to time of implantation. Results showed each child to have a unique profile of strengths
and weaknesses, and widely varying outcomes in terms of speech processing even though all
three children had the same initial difficulty (congenital bilateral hearing loss). Links between
speech processing and other aspects of development as well as contextual factors are discussed
in relation to outcomes for each child. The case studies contribute to knowledge of speech
processing skills in children with cochlear implants, and have clinical implications for those
who work with children with cochlear implants and their families.
|
215 |
USB telephony interface device for speech recognition applications. Muller, J. J., 12 1900.
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2005. / Automatic speech recognition (ASR) systems are an attractive means for companies to deliver value added
services with which to improve customer satisfaction. Such ASR systems require a telephony interface to
connect the speech recognition application to the telephone system. Commercially available telephony
interfaces are usually operating system specific, and therefore hardware device driver issues complicate the
development of software applications for different platforms that require telephony access. The drivers and
application programming interface (API) for telephony interfaces are often available only for the Microsoft
Windows operating systems. This poses a problem, as many of the software tools used for speech recognition
research and development operate only on Linux-based computers. These interfaces are also typically in
PCI/ISA card format, which hinders physical portability of the device to another computer. A simpler, cheaper
and easier-to-use USB telephony interface device offering cross-platform portability was developed and is
presented here, together with the necessary API.
|
216 |
Language identification using Gaussian mixture models. Nkadimeng, Calvin, 03 1900.
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2010. / ENGLISH ABSTRACT: The importance of Language Identification for African languages is increasing dramatically
due to the development of telecommunication infrastructure and, as a result, growing volumes of data and
speech traffic in public networks. By automatically processing the raw speech data, the vital assistance
given to people in distress can be sped up by referring their calls to a person
knowledgeable in that language.
To this effect a speech corpus was developed and various algorithms were implemented
and tested on raw telephone speech data. These algorithms entailed
data preparation, signal processing, and statistical analysis aimed at discriminating
between languages. The statistical model of Gaussian Mixture Models
(GMMs) was chosen for this research due to its ability to represent an entire
language with a single stochastic model that does not require phonetic transcription.
Language Identification for African languages using GMMs is feasible, although
a few challenges, such as proper classification and an accurate study of the
relationships between languages, still need to be overcome. Other methods
that make use of phonetically transcribed data need to be explored and
tested with the new corpus for the research to be more rigorous. / AFRIKAANSE OPSOMMING: Die belang van die Taal identifiseer vir Afrika-tale is sien ’n dramatiese toename
te danke aan die ontwikkeling van telekommunikasie-infrastruktuur en as gevolg
’n toename in volumes van data en spraak verkeer in die openbaar netwerke.Deur
outomaties verwerking van die ruwe toespraak gegee die noodsaaklike hulp verleen
aan mense in nood kan word vinniger-up ”, deur te verwys hul oproepe na
’n persoon ingelichte in daardie taal.
Tot hierdie effek van ’n toespraak corpus het ontwikkel en die verskillende algoritmes
is gemplementeer en getoets op die ruwe telefoon toespraak gegee.Hierdie
algoritmes behels die data voorbereiding, seinverwerking, en statistiese analise
wat gerig is op onderskei tussen tale.Die statistiese model van Gauss Mengsel
Modelle (GGM) was gekies is vir hierdie navorsing as gevolg van hul vermo
te verteenwoordig ’n hele taal met’ n enkele stogastiese model wat nodig nie
fonetiese tanscription nie.
Taal identifiseer vir die Afrikatale gebruik GGM haalbaar is, alhoewel daar
enkele paar uitdagings soos behoorlike klassifikasie en akkurate ondersoek na die
verhouding van TALE wat moet oorkom moet word.Ander metodes wat gebruik
maak van foneties getranskribeerde data nodig om ondersoek te word en getoets
word met die nuwe corpus vir die ondersoek te word strenger.
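As a rough illustration of the GMM approach described in the English abstract above, the following sketch fits one Gaussian mixture model per language on pooled frame-level features and picks the language whose model gives the highest average log-likelihood. The use of MFCC features, librosa, scikit-learn, a 16 kHz sampling rate and the function names are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np
import librosa                      # assumed available for MFCC extraction
from sklearn.mixture import GaussianMixture

def train_language_gmms(train_sets, sr=16000, n_components=64):
    """Fit one GMM per language on pooled MFCC frames.
    train_sets: dict mapping language name -> list of waveform arrays."""
    models = {}
    for lang, utterances in train_sets.items():
        frames = np.vstack([librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T
                            for y in utterances])
        models[lang] = GaussianMixture(n_components=n_components,
                                       covariance_type="diag").fit(frames)
    return models

def identify_language(models, y, sr=16000):
    """Score an utterance against every language model; the highest average
    per-frame log-likelihood wins (no phonetic transcription required)."""
    frames = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T
    return max(models, key=lambda lang: models[lang].score(frames))
```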
|
217 |
Analyse et reconnaissance des émotions lors de conversations de centres d'appels / Automatic emotions recognition during call center conversations. Vaudable, Christophe, 11 July 2012.
La reconnaissance automatique des émotions dans la parole est un sujet de recherche relativement récent dans le domaine du traitement de la parole, puisqu’il est abordé depuis une dizaine d’années environs. Ce sujet fait de nos jours l’objet d’une grande attention, non seulement dans le monde académique mais aussi dans l’industrie, grâce à l’augmentation des performances et de la fiabilité des systèmes. Les premiers travaux étaient fondés sur des donnés jouées par des acteurs, et donc non spontanées. Même aujourd’hui, la plupart des études exploitent des séquences pré-segmentées d’un locuteur unique et non une communication spontanée entre plusieurs locuteurs. Cette méthodologie rend les travaux effectués difficilement généralisables pour des informations collectées de manière naturelle.Les travaux entrepris dans cette thèse se basent sur des conversations de centre d’appels, enregistrés en grande quantité et mettant en jeu au minimum 2 locuteurs humains (un client et un agent commercial) lors de chaque dialogue. Notre but est la détection, via l’expression émotionnelle, de la satisfaction client. Dans une première partie nous présentons les scores pouvant être obtenus sur nos données à partir de modèles se basant uniquement sur des indices acoustiques ou lexicaux. Nous montrons que pour obtenir des résultats satisfaisants une approche ne prenant en compte qu’un seul de ces types d’indices ne suffit pas. Nous proposons pour palier ce problème une étude sur la fusion d’indices de types acoustiques, lexicaux et syntaxico-sémantiques. Nous montrons que l’emploi de cette combinaison d’indices nous permet d’obtenir des gains par rapport aux modèles acoustiques même dans les cas ou nous nous basons sur une approche sans pré-traitements manuels (segmentation automatique des conversations, utilisation de transcriptions fournies par un système de reconnaissance de la parole). Dans une seconde partie nous remarquons que même si les modèles hybrides acoustiques/linguistiques nous permettent d’obtenir des gains intéressants la quantité de données utilisées dans nos modèles de détection est un problème lorsque nous testons nos méthodes sur des données nouvelles et très variées (49h issus de la base de données de conversations). Pour remédier à ce problème nous proposons une méthode d’enrichissement de notre corpus d’apprentissage. Nous sélectionnons ainsi, de manière automatique, de nouvelles données qui seront intégrées dans notre corpus d’apprentissage. Ces ajouts nous permettent de doubler la taille de notre ensemble d’apprentissage et d’obtenir des gains par rapport aux modèles de départ. Enfin, dans une dernière partie nous choisissons d’évaluées nos méthodes non plus sur des portions de dialogues comme cela est le cas dans la plupart des études, mais sur des conversations complètes. Nous utilisons pour cela les modèles issus des études précédentes (modèles issus de la fusion d’indices, des méthodes d’enrichissement automatique) et ajoutons 2 groupes d’indices supplémentaires : i) Des indices « structurels » prenant en compte des informations comme la durée de la conversation, le temps de parole de chaque type de locuteurs. ii) des indices « dialogiques » comprenant des informations comme le thème de la conversation ainsi qu’un nouveau concept que nous nommons « implication affective ». Celui-ci a pour but de modéliser l’impact de la production émotionnelle du locuteur courant sur le ou les autres participants de la conversation. 
Nous montrons que lorsque nous combinons l’ensemble de ces informations nous arrivons à obtenir des résultats proches de ceux d’un humain lorsqu’il s’agit de déterminer le caractère positif ou négatif d’une conversation / Automatic emotion recognition in speech is a relatively recent research subject in the field of speech processing, having been proposed for the first time only about ten years ago. The subject nowadays receives much attention, not only in academia but also in industry, thanks to improved model performance and system reliability. The first studies were based on acted, non-spontaneous speech. Until now, most experiments carried out by the research community on emotions were performed on pre-segmented sequences from a single speaker, not on spontaneous speech with several speakers. With this methodology, models built on acted data are hardly usable on data collected in a natural context. The studies we present in this thesis are based on call center conversations, about 1620 hours of dialogs, with at least two human speakers (a commercial agent and a client) in each conversation. Our aim is the detection, via emotional expression, of client satisfaction. In the first part of this work we present the results we obtained from models using only acoustic or only linguistic features for emotion detection. We show that an approach taking into account only one of these feature types is not enough to obtain correct results. To overcome this problem we propose the combination of three types of features (acoustic, lexical and semantic). We show that models with feature fusion achieve higher recognition scores in all cases compared to models using only acoustic features. This gain also holds for an approach without manual pre-processing (automatic segmentation of conversations, transcriptions produced by an automatic speech recognition system). In the second part of our study we note that even though models based on feature combination are relevant for emotion detection, the amount of data in our training set is too small when the models are tested on a large amount of new data. To overcome this problem we propose a new method to automatically complete the training set with new data, selected on linguistic and acoustic criteria and drawn from 100 hours of data. These additions allow us to double the amount of data in our training set and increase the emotion recognition rate compared to the non-enriched models. Finally, in the last part we choose to evaluate our method on entire conversations and not only on conversation turns, as in most studies. To classify a dialog we use the models built in the previous steps of this work and add two new feature groups: i) structural features, including information such as the length of the conversation and the proportion of speech for each speaker in the dialog; ii) dialogic features, including information such as the topic of the conversation and a new concept we call “affective implication”, whose aim is to represent the impact of the current speaker’s emotional production on the other speakers. We show that when we combine all of this information we obtain results close to those of a human when determining whether a conversation is positive or negative.
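To make the acoustic/lexical fusion idea concrete, here is a minimal, hypothetical sketch of early fusion: per-turn acoustic statistics are concatenated with tf-idf lexical features from ASR transcripts and fed to a single classifier. The toy data, the two acoustic features, and the scikit-learn pipeline are illustrative assumptions; they do not reproduce the thesis's actual feature sets or models (which also include syntactico-semantic features).

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy, made-up data: per-turn acoustic statistics (e.g. mean pitch, mean energy),
# the ASR transcript of the turn, and a positive/negative satisfaction label.
acoustic = np.array([[180.0, 0.62], [140.0, 0.31], [210.0, 0.75], [150.0, 0.40]])
transcripts = ["thank you that solves my problem", "this is unacceptable",
               "great service thanks a lot", "I have been waiting for an hour"]
labels = [1, 0, 1, 0]

# Early fusion: concatenate acoustic features with lexical (tf-idf) features so
# that a single classifier sees both views of each conversation turn.
vectorizer = TfidfVectorizer()
lexical = vectorizer.fit_transform(transcripts).toarray()
fused = np.hstack([acoustic, lexical])

clf = LogisticRegression(max_iter=1000).fit(fused, labels)
print(clf.predict(fused[:1]))       # predicted satisfaction for the first turn
```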
|
218 |
Sensitivity analysis of blind separation of speech mixtures. Unknown Date.
Blind source separation (BSS) refers to a class of methods by which multiple sensor signals are combined with the aim of estimating the original source signals. Independent component analysis (ICA) is one such method that effectively resolves static linear combinations of independent non-Gaussian distributions. We propose a method that can track variations in the mixing system by seeking a compromise between adaptive and block methods by using mini-batches. The resulting permutation indeterminacy is resolved based on the correlation continuity principle. Methods employing higher order cumulants in the separation criterion are susceptible to outliers in the finite sample case. We propose a robust method based on low-order non-integer moments by exploiting the Laplacian model of speech signals. We study separation methods for even (over)-determined linear convolutive mixtures in the frequency domain based on joint diagonalization of matrices employing time-varying second order statistics. We investigate the sources affecting the sensitivity of the solution under the finite sample case such as the set size, overlap amount and cross-spectrum estimation methods. / by Savaskan Bulek. / Thesis (Ph.D.)--Florida Atlantic University, 2010. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2010. Mode of access: World Wide Web.
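The mini-batch tracking and correlation-continuity ideas mentioned in the abstract can be sketched as follows: ICA is run on short overlapping blocks of a two-channel mixture, and the permutation/sign ambiguity of each block is resolved by correlating its estimated sources with those of the previous block over the overlap. The use of scikit-learn's FastICA, the block sizes and the greedy matching are illustrative assumptions, not the dissertation's exact method.

```python
import numpy as np
from sklearn.decomposition import FastICA

def separate_minibatch_ica(x, batch_len=8000, overlap=2000):
    """Block-wise ICA on a two-channel mixture x of shape (n_samples, 2).
    The per-block permutation/sign ambiguity is resolved by correlating the
    new sources with the previous block's sources over the overlapping span."""
    n_samples, n_ch = x.shape
    hop = batch_len - overlap
    out = np.zeros_like(x, dtype=float)
    prev = None
    for start in range(0, n_samples - batch_len + 1, hop):
        block = x[start:start + batch_len]
        s = FastICA(n_components=n_ch, max_iter=1000).fit_transform(block)
        if prev is not None:
            # Cross-correlate previous and current sources on the shared samples.
            c = np.corrcoef(prev[-overlap:].T, s[:overlap].T)[:n_ch, n_ch:]
            order = np.abs(c).argmax(axis=1)          # greedy match (may collide)
            sign = np.sign(c[np.arange(n_ch), order])
            s = s[:, order] * sign
        keep_from = 0 if prev is None else overlap    # write only the new samples
        out[start + keep_from:start + batch_len] = s[keep_from:]
        prev = s
    return out
```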
|
219 |
Voice flow control in integrated packet networks. Hayden, Howard Paul, January 1981.
Thesis (Elec.E.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1981. / Includes bibliographical references. / by Howard Paul Hayden.
|
220 |
Spectral refinement to speech enhancement. Unknown Date.
The goal of a speech enhancement algorithm is to remove noise and recover the original signal with as little distortion and residual noise as possible. Most successful real-time algorithms work in the frequency domain, where the frequency amplitude of clean speech is estimated per short-time frame of the noisy signal. State-of-the-art short-time spectral amplitude estimator algorithms estimate the clean spectral amplitude in terms of the power spectral density (PSD) function of the noisy signal. The PSD has to be computed from a large ensemble of signal realizations. However, in practice, it may only be estimated from a finite-length sample of a single realization of the signal. Estimation errors introduced by these limitations cause the solution to deviate from the optimum. Various spectral estimation techniques, many with added spectral smoothing, have been investigated for decades to reduce the estimation errors. These algorithms do not significantly address the quality of speech as perceived by a human listener. This dissertation presents analysis and techniques that offer spectral refinements toward speech enhancement. We present an analytical framework for the effect of spectral estimate variance on the performance of speech enhancement. We use the variance quality factor (VQF) as a quantitative measure of estimated spectra. We show that reducing the spectral estimator VQF significantly reduces the VQF of the enhanced speech. The Autoregressive Multitaper (ARMT) spectral estimate is proposed as a low-VQF spectral estimator for use in speech enhancement algorithms. An innovative method of incorporating a speech production model using multiband excitation is also presented as a technique to emphasize the harmonic components of the glottal speech input. / The preconditioning of the noisy estimates by exploiting other avenues of information, such as pitch estimation and the speech production model, effectively increases the localized narrow-band signal-to-noise ratio (SNR) of the noisy signal, which is subsequently denoised by the amplitude gain. Combined with voicing structure enhancement, the ARMT spectral estimate delivers enhanced speech with sound clarity desirable to human listeners. The resulting improvements in enhanced speech are observed to be significant with both objective and subjective measurements. / by Werayuth Charoenruengkit. / Vita. / Thesis (Ph.D.)--Florida Atlantic University, 2009. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2009. Mode of access: World Wide Web.
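For context, here is a minimal sketch of the generic short-time spectral amplitude approach discussed above, not the ARMT estimator proposed in the dissertation: the noise PSD is estimated from the first few frames (assumed speech-free), a Wiener-style gain with a floor is applied per time-frequency bin, and the signal is resynthesized. The frame size, gain rule and noise-estimation assumption are illustrative choices.

```python
import numpy as np
from scipy.signal import stft, istft

def enhance_wiener_style(noisy, fs, noise_frames=10, floor=0.05):
    """Estimate the noise PSD from the first few frames, apply a Wiener-style
    gain (estimated clean power over noisy power) with a spectral floor per
    time-frequency bin, and resynthesize the enhanced signal."""
    f, t, X = stft(noisy, fs=fs, nperseg=512)
    noise_psd = np.mean(np.abs(X[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    snr_post = np.abs(X) ** 2 / np.maximum(noise_psd, 1e-12)   # a posteriori SNR
    gain = np.maximum(1.0 - 1.0 / np.maximum(snr_post, 1e-12), floor)
    _, enhanced = istft(gain * X, fs=fs, nperseg=512)
    return enhanced
```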
|