1

Contribution de l'analyse du signal vocal à la détection de l'état de somnolence et du niveau de charge mentale / Contribution of the analysis of speech signal to the detection of drowsiness and mental load level

Boyer, Stanislas 20 June 2016 (has links)
Operational requirements of aircraft pilots may cause drowsiness and inadequate mental load levels (i.e., too low or too high) during flights. Sleep debt and circadian disruption linked to various factors (e.g., long duty periods, irregular work schedules) require pilots to constantly push their biological limits. Moreover, pilots' mental workload fluctuates strongly during a flight: high during the critical phases (i.e., takeoff and landing), it becomes very low during cruise. When mental load becomes too high or, conversely, too low, performance degrades and piloting errors may appear. Implementing methods that detect drowsiness and mental load level in near real time is a major challenge for monitoring and controlling flight activity. The aim of this thesis is to determine whether the human voice can be used to detect, on the one hand, the drowsiness and, on the other hand, the mental load level of an individual. In a first study, participants' voices were recorded during a reading task before and after a night of total sleep deprivation (TSD). Drowsiness variations following TSD were assessed using self-report and electrophysiological measures (ElectroEncephaloGraphy [EEG] and Evoked Potentials [EPs]). Results showed significant post-TSD variations in several acoustic features related to: (a) the amplitude of the glottal pulses (amplitude modulation frequency), (b) the shape of the acoustic signal (Euclidean length of the signal and its associated features) and (c) the spectrum of the vowel signal (harmonic-to-noise ratio, second formant frequency, skewness, spectral center of gravity, energy differences, spectral tilt and Mel-frequency cepstral coefficients). Most spectral features showed a different sensitivity to sleep deprivation depending on vowel type. Significant correlations were found between several acoustic features and several objective indicators (EEG and EPs) of drowsiness. In a second study, the speech signal was recorded during a word-list recall task. Task difficulty was manipulated by varying the number of words in each list (i.e., between one and seven, corresponding to seven mental load conditions). Pupil diameter, a relevant objective indicator of mental load level, was measured simultaneously with the voice recording to attest to the variation in mental load during the experimental task. Results showed that classical acoustic features (fundamental frequency and its standard deviation, shimmer, number of periods and harmonic-to-noise ratio) and original features (amplitude modulation frequency and short-term variations in the Euclidean length of the signal) were particularly sensitive to variations in mental load. Variations in these acoustic features were correlated with those of pupil diameter.
Taken together, the results suggest that the acoustic features of the human voice identified in these experiments could serve as relevant indicators for detecting the drowsiness and mental load level of an individual. The findings open up many research and application perspectives in the field of transport safety, particularly in the aeronautical sector.
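By way of illustration, a minimal sketch of how several of the features named in this abstract (fundamental frequency and its standard deviation, shimmer, harmonic-to-noise ratio, spectral center of gravity, skewness and MFCCs) can be extracted with the parselmouth Praat bindings; the file name, pitch range and shimmer parameters are placeholder assumptions, not the thesis's actual pipeline.

```python
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("speech.wav")        # placeholder file name

# Fundamental frequency and its standard deviation (voiced frames only)
pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
f0 = f0[f0 > 0]
f0_mean, f0_sd = f0.mean(), f0.std()

# Mean harmonic-to-noise ratio over the whole recording
hnr = call(snd.to_harmonicity_cc(), "Get mean", 0, 0)

# Local shimmer, from glottal pulse marks (standard Praat parameter values)
pulses = call(snd, "To PointProcess (periodic, cc)", 75, 500)
shimmer = call([snd, pulses], "Get shimmer (local)",
               0, 0, 0.0001, 0.02, 1.3, 1.6)

# Spectral center of gravity and skewness of the long-term spectrum
spectrum = snd.to_spectrum()
cog = call(spectrum, "Get centre of gravity", 2.0)
skew = call(spectrum, "Get skewness", 2.0)

# Mel-frequency cepstral coefficients, one vector per frame
mfcc = snd.to_mfcc(number_of_coefficients=12).to_array()

print(f"F0 {f0_mean:.1f}±{f0_sd:.1f} Hz, HNR {hnr:.1f} dB, "
      f"shimmer {shimmer:.3f}, CoG {cog:.0f} Hz, skewness {skew:.2f}")
```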
2

Urban Seismic Event Detection: A Non-Invasive Deep Learning Approach

Parth Sagar Hasabnis (18424092) 23 April 2024 (has links)
<p dir="ltr">As cameras increasingly populate urban environments for surveillance, the threat of data breaches and losses escalates as well. The rapid advancements in generative Artificial Intelligence have greatly simplified the replication of individuals’ appearances from video footage. This capability poses a grave risk as malicious entities can exploit it for various nefarious purposes, including identity theft and tracking individuals’ daily activities to facilitate theft or burglary.</p><p dir="ltr">To reduce reliance on video surveillance systems, this study introduces Urban Seismic Event Detection (USED), a deep learning-based technique aimed at extracting information about urban seismic events. Our approach involves synthesizing training data through a small batch of manually labelled field data. Additionally, we explore the utilization of unlabeled field data in training through semi-supervised learning, with the implementation of a mean-teacher approach. We also introduce pre-processing and post-processing techniques tailored to seismic data. Subsequently, we evaluate the trained models using synthetic, real, and unlabeled data and compare the results with recent statistical methods. Finally, we discuss the insights gained and the limitations encountered in our approach, while also proposing potential avenues for future research.</p>
3

Analyse acoustique de la voix émotionnelle de locuteurs lors d’une interaction humain-robot / Acoustic analysis of speakers emotional voices during a human-robot interaction

Tahon, Marie 15 November 2012 (has links)
This thesis deals with emotional speech in the context of human-robot interaction. In a realistic interaction, we define at least four major types of variability: the environment (room, microphone); the speaker, with his or her physical characteristics (gender, age, voice type) and personality; the speaker's emotional states; and finally the type of interaction (game, emergency or everyday-life situation). From audio signals collected under different conditions, we sought, using acoustic features, to characterize jointly a speaker and his or her emotional state while taking these variabilities into account. Determining which features are essential and which should be avoided is a difficult challenge, since it requires working across a large number of variabilities and therefore having rich and varied corpora at one's disposal. The main results concern both the collection and annotation of realistic emotional corpora with varied speakers (children, adults, elderly people) in several environments, and the robustness of acoustic features across these four types of variability. Two interesting results follow from this acoustic analysis: the acoustic characterization of a corpus and the establishment of a "black list" of highly variable features. Emotions are only one of the paralinguistic cues carried by the audio signal; personality and stress in the voice were also studied. We also implemented an automatic emotion recognition and speaker characterization module, which was tested during realistic human-robot interactions. An ethical reflection was conducted on this work.
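One hedged reading of the "black list" analysis above, as a sketch: given a table of per-utterance feature values and a recording-condition label, flag the features whose per-condition means shift the most. The DataFrame layout, column names and the coefficient-of-variation criterion are assumptions for illustration, not the thesis's actual method.

```python
import pandas as pd

def feature_black_list(df, feature_cols, condition_col="condition", top_k=5):
    """Rank features by how much their mean shifts across conditions
    (speaker group, room, microphone, ...); the highest-ranked ones are
    candidates for a 'black list' of unstable features."""
    per_condition_means = df.groupby(condition_col)[feature_cols].mean()
    cv = per_condition_means.std() / per_condition_means.mean().abs()
    return cv.sort_values(ascending=False).head(top_k)

# Usage sketch with a tiny synthetic table (values are made up)
df = pd.DataFrame({
    "f0_mean":   [120, 125, 210, 215, 118, 208],
    "hnr":       [14.2, 13.9, 14.1, 14.3, 14.0, 14.2],
    "condition": ["adult", "adult", "child", "child", "adult", "child"],
})
print(feature_black_list(df, ["f0_mean", "hnr"], top_k=2))
```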
4

Vocalização de suínos em grupo sob diferentes condições térmicas / Pig vocalization in group under different thermal conditions

Moura, Giselle Borges de 15 February 2013 (has links)
Quantifying and qualifying the well-being of farm animals is still a challenge. Any assessment of well-being should consider, above all, the absence of strong negative feelings, such as suffering, and the presence of positive feelings, such as pleasure. The main objective of this research was to quantify the vocalization of group-housed pigs under different thermal conditions. The specific objectives were to assess the existence of vocal communication patterns among group-housed animals and to extract the acoustic characteristics of the sound spectra of the vocalizations in relation to the different microclimate conditions of the facility. The trial was carried out in a controlled-environment experimental unit for pigs at the University of Illinois (USA). Four groups of six piglets were used for data collection. Dataloggers were installed to record the environmental variables (T, °C and RH, %), from which two thermal comfort indices were later computed (THI and air enthalpy). Cardioid microphones were installed at the geometric center of each pen housing the piglets to record the vocalizations. The microphones were connected to an amplifier, which in turn was connected to an audio and video capture card installed in a computer. The Goldwave® software was used to edit the audio files containing the piglets' vocalizations, separating the recordings and applying filters to remove background noise. The audio files were then analyzed with the Sound Analysis Pro 2011 software, from which the acoustic characteristics were extracted. Amplitude (dB), fundamental frequency (Hz), mean frequency (Hz), peak frequency (Hz) and entropy were used to characterize the sound spectrum of the piglets' vocalizations under the different thermal conditions. A randomized block design was used, with two treatments and three repetitions per week, executed over two weeks. The data were sampled to analyze the behavior of the vocalization database in relation to the applied treatments, and were submitted to an analysis of variance using proc GLM in SAS. Among the acoustic parameters analyzed, amplitude (dB), fundamental frequency and entropy differed significantly between the treatments (comfort condition and heat condition) by Tukey's test (p < 0.05). The analysis of variance showed differences in waveform for each thermal condition in the different periods of the day. It is therefore possible to quantify the vocalization of groups of pigs under different thermal conditions by extracting the acoustic characteristics of the sound samples. The extracted sound spectrum indicated possible variations in piglet behavior under the different thermal conditions within the periods of the day. However, the pattern recognition stage still requires a larger and more consistent database for recognizing the spectrum under each thermal condition, whether by image analysis or by extraction of acoustic characteristics. Among the acoustic characteristics analyzed, the amplitude (dB), fundamental frequency (Hz) and entropy of the group vocalizations were significant in expressing the animals' condition under different thermal conditions.
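A minimal sketch of the comparison described above (one-way analysis of variance followed by Tukey's test on an acoustic parameter across thermal treatments), using Python's statsmodels in place of SAS proc GLM; the synthetic data and column names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Synthetic stand-in for the vocalization table: amplitude (dB) per call,
# recorded under a 'comfort' or 'heat' treatment
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "amplitude_db": np.concatenate([rng.normal(62, 3, 60),
                                    rng.normal(68, 3, 60)]),
    "treatment": ["comfort"] * 60 + ["heat"] * 60,
})

# One-way ANOVA (the analogue of proc GLM for this simple design)
model = ols("amplitude_db ~ C(treatment)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))

# Tukey's HSD pairwise comparison at alpha = 0.05
print(pairwise_tukeyhsd(df["amplitude_db"], df["treatment"], alpha=0.05))
```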
5

Supervised Speech Separation Using Deep Neural Networks

Wang, Yuxuan 21 May 2015 (has links)
No description available.
6

On Generalization of Supervised Speech Separation

Chen, Jitong 30 August 2017 (has links)
No description available.
7

Lateralization Effects of Brainstem Responses and Middle Latency Responses to a Complex Tone and Speech Syllable

Anderson, Jill M. 23 September 2011 (has links)
No description available.
8

Acoustic Measurements of Clear Speech Cue Fade in Adults with Idiopathic Parkinson Disease

Diekema, Emily D. 19 May 2016 (has links)
No description available.
9

Acoustic gesture modeling. Application to a Vietnamese speech recognition system / Modélisation des gestes acoustiques. Application à un système de reconnaissance de la parole Vietnamienne

Tran, Thi-Anh-Xuan 30 March 2016 (has links)
Speech plays a vital role in human communication. The selection of relevant acoustic features is key to the design of any speech processing system. For some 40 years, speech was typically considered a sequence of quasi-stable portions of signal (vowels) separated by transitions (consonants). Despite a wealth of studies that clearly document the importance of coarticulation, and that reveal that articulatory and acoustic targets are not context-independent, the view that each vowel has an acoustic target that can be specified in a context-independent manner remains widespread. This point of view entails strong limitations. It is well known that formant frequencies are acoustic characteristics that bear a clear relationship to speech production and that can distinguish among vowels. Vowels are therefore generally described with static articulatory configurations represented by targets in the acoustic space, typically by formant frequencies in the F1-F2 and F2-F3 planes. Plosive consonants can be described in terms of places of articulation, represented by loci or locus equations in an acoustic plane. But formant frequency trajectories in fluent speech rarely display a steady state for each vowel: they vary with the speaker, the consonantal environment (coarticulation) and the speaking rate (relating to a continuum between hypo- and hyper-articulation). In view of the inherent limitations of static approaches, the approach adopted here consists in studying both vowel-to-vowel (V1V2) and vowel-consonant-vowel (V1CV2) transitions from a dynamic point of view.
We first studied the effects of the impulse response at the beginning, at the end and during transitions of the signal, both in the speech signal and at the perception level. Variations in the phases of the components were then examined. The results show that the effects of these parameters can be observed in spectrograms. Crucially, the amplitudes of the spectral components distinguished under the approach advocated here are sufficient for perceptual discrimination. Following this result, all subsequent speech analyses focus on the amplitude domain, deliberately leaving aside phase information. We then extended the work to vowel-consonant-vowel perception from a dynamic point of view. These perceptual results, together with those obtained earlier by Carré (2009a), show that vowel-to-vowel and vowel-consonant-vowel stimuli can be characterized and separated by the direction and rate of the transitions in the formant plane, even when the absolute frequency values lie outside the vowel triangle (i.e., the vowel acoustic space in absolute values). Because of the limitations of formant measurement, the dynamic approach requires new tools based on parameters that can replace formant frequency estimation. Spectral Subband Centroid Frequency (SSCF) features were studied; comparison with vowel formant frequencies shows that SSCFs can replace formant frequencies and act as "pseudo-formants", even during consonant production. On this basis, SSCF is used as a tool to compute dynamic characteristics. We propose a new way to model dynamic speech features, which we call SSCF Angles. Our analysis of SSCF Angles was performed on transitions of vowel-to-vowel (V1V2) sequences in both Vietnamese and French. SSCF Angles proved to be reliable and robust parameters. For each language, the analysis results show that: (i) SSCF Angles can distinguish V1V2 transitions; (ii) V1V2 and V2V1 have symmetrical properties in the acoustic domain based on SSCF Angles; (iii) SSCF Angles for male and female speakers are fairly similar for the same V1V2 transition; and (iv) they are also more or less invariant to speech rate (normal versus fast). Finally, these dynamic acoustic features were used in a Vietnamese automatic speech recognition system, with several interesting results.
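A minimal sketch of the Spectral Subband Centroid Frequency idea described above, assuming a short-time power spectrum split into a few linearly spaced subbands; the band layout, hop size and the angle definition (arctangent of the slope of a centroid trajectory, with an arbitrary frequency/time scaling) are illustrative assumptions rather than the thesis's exact formulation.

```python
import numpy as np
import librosa

def sscf(y, sr, n_bands=5, n_fft=512, hop=160):
    """Spectral Subband Centroid Frequencies: one centroid per subband
    per frame, computed on the short-time power spectrum."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    edges = np.linspace(0, len(freqs), n_bands + 1, dtype=int)
    centroids = []
    for b in range(n_bands):
        band = S[edges[b]:edges[b + 1]]              # bins of subband b
        f = freqs[edges[b]:edges[b + 1], None]
        centroids.append((f * band).sum(axis=0) / (band.sum(axis=0) + 1e-10))
    return np.stack(centroids)                       # (n_bands, n_frames)

def sscf_angles(tracks, t0, t1, hop_s=0.01, scale_hz_per_s=1000.0):
    """Angle (degrees) of each SSCF trajectory between frames t0 and t1:
    the arctangent of the slope, scaled so angles span a usable range."""
    slopes = (tracks[:, t1] - tracks[:, t0]) / ((t1 - t0) * hop_s)
    return np.degrees(np.arctan(slopes / scale_hz_per_s))

# Usage sketch on a synthetic vowel-to-vowel glide (a rising chirp stand-in)
sr = 16000
t = np.linspace(0, 0.3, int(0.3 * sr), endpoint=False)
y = np.sin(2 * np.pi * (500 + 800 * t / 0.3) * t)
tracks = sscf(y, sr)
print(sscf_angles(tracks, t0=2, t1=tracks.shape[1] - 3))
```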
