Global ETD Search

21	Paramètres spectraux à LPC Paramètres Mapping : approches multi-linéaires et GMM (appliqué aux voyelles françaises) / Spectral Parameters to Cued Speech Parameters Mapping: Multi-linear and GMM approaches (applied to French vowels) Ming, Zuheng 24 June 2013 Cued Speech (in French, langage parlé complété, LPC) is a visual communication system that uses hand shapes placed at different positions near the face, combined with natural lip-reading, to improve speech perception from visual input for deaf people. One important challenge, however, is speech communication between normal-hearing people, who produce acoustic speech but do not practice LPC, and deaf people with no residual hearing, who perceive speech through lip-reading complemented by the LPC code. In this work, we apply multiple linear regression (MLR) and a Gaussian mixture model (GMM) approach to map acoustic spectral parameters to the hand position in LPC and the accompanying lip shape. We thereby contribute to the development of an automatic translation system in the framework of visual speech synthesis. The results show that the MLR approach estimates lip parameters well from spectral parameters, because the two are strongly linearly correlated. Its performance in estimating the hand position is poor, however, because there is no relationship between hand positions and spectral parameters. Introducing an intermediate space shows that a similar topological structure is the key to MLR. To relax the linear constraint of the MLR approach, we apply a GMM-based mapping method, which has both classification and regression properties.
The GMM parameters are estimated with supervised, unsupervised and semi-supervised training methods in turn, from the viewpoint of machine learning theory. The supervised training method shows high efficiency and good robustness. Minimum Mean Square Error (MMSE) and Maximum A Posteriori (MAP) probability are each used as regression criteria in the GMM-based mapping approach. The MLR approach turns out to be a special case of the GMM approach when the number of Gaussians equals one. GMM-based mapping can therefore improve performance significantly over MLR by increasing the number of Gaussians. Finally, the mapping approaches used in this work are compared on a continuous transition. The comparison shows that the GMM approach performs well, thanks to its classification property, when the source and target data have "no relationship", as in hand position estimation, and that it can also improve performance through its local regression property when the source and target data are strongly correlated, as in lip parameter estimation. In addition, a direct prediction of lip geometry features from the natural image of the mouth region of interest (ROI) is proposed, based on the 2D Discrete Cosine Transform (DCT) combined with Principal Component Analysis (PCA). The results show that the geometric lip features can be estimated with good accuracy using a reduced set of predictors derived from the DCT coefficients. [SPI:OTHER] Engineering Sciences/Other; LPC; multiple linear regression (MLR); GMM; MMSE; MAP
22	Paramètres spectraux à LPC Paramètres Mapping : approches multi-linéaires et GMM (appliqué aux voyelles françaises) / Spectral Parameters to Cued Speech Parameters Mapping : Multi-linear and GMM approaches (applied to French vowels) Ming, Zuheng 24 June 2013 Cued Speech (CS) is a visual communication system that uses hand shapes placed at different positions near the face, in combination with natural lip-reading, to enhance speech perception from visual input for deaf people. One important challenge, however, is speech communication between normal-hearing people, who produce acoustic speech but do not practice CS, and deaf people with no residual hearing, who perceive speech through lip-reading complemented by the CS code. In this work, we apply the multi-linear regression (MLR) approach and the Gaussian Mixture Model (GMM)-based mapping approach to map acoustic spectral parameters to the hand position in CS and the accompanying lip shape. We thereby contribute to the development of an automatic translation system in the framework of visual speech synthesis. The MLR approach proves good at estimating the lip parameters from the spectral parameters, since the two are strongly linearly correlated. Its performance in estimating the hand position is poor, however, since there is no relationship between the hand positions and the spectral parameters. Introducing an intermediate space shows that a similar topological structure is the key to MLR. To relax the linear constraint of the MLR approach, we apply the GMM-based mapping approach, which has both classification-partitioning and regression properties. The GMM parameters are estimated with supervised, unsupervised and semi-supervised training methods in turn, from the viewpoint of machine learning theory. The supervised training method shows high efficiency and good robustness. Minimum Mean Square Error (MMSE) and Maximum A Posteriori (MAP) probability are each used as regression criteria in the GMM-based mapping approach. The MLR approach turns out to be a special case of the GMM-based mapping approach when the number of Gaussians equals one.
The GMM-based mapping approach can therefore improve performance significantly over MLR by increasing the number of Gaussians. Finally, a continuous transition obtained by linear interpolation in the acoustic space is introduced to compare the mapping approaches used in this work. It shows that the GMM-based mapping approach performs well, thanks to its classification-partitioning property, when the source and target data have "no relationship", as in hand position estimation, and that it can also improve performance through its local regression property when the source and target data are strongly correlated, as in lip parameter estimation. In addition, a direct prediction of lip geometry features from the natural image of the mouth region of interest (ROI) is proposed, based on the 2D Discrete Cosine Transform (DCT) combined with Principal Component Analysis (PCA). The results show that the geometric lip features can be estimated with good accuracy using a reduced set of predictors derived from the DCT coefficients. LPC; Cued Speech; acoustic speech to Cued Speech mapping; multi-linear regression (MLR); GMM; MMSE; MAP
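As a rough illustration of the MMSE criterion named in this abstract, the sketch below uses the textbook form of GMM-based regression for a scalar source x and a scalar target y: each Gaussian contributes its own local linear regression, and the contributions are weighted by the posterior probability of that Gaussian given x. The function names, dictionary keys and toy parameters are assumptions made for this example and are not taken from the thesis. With a single Gaussian the estimate reduces to one linear regression, which matches the special case stated in the abstract.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_mmse_predict(x, components):
    """MMSE estimate E[y | x] under a joint GMM over (x, y).

    Each component is a dict with the (hypothetical) keys
    weight, mean_x, mean_y, var_x and cov_xy.
    Assumes x is close enough to at least one component that the
    total likelihood does not underflow to zero.
    """
    # Classification part: posterior probability of each Gaussian given x.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)
    posteriors = [value / total for value in likelihoods]

    # Regression part: one local linear regression per Gaussian.
    local_estimates = [
        c["mean_y"] + c["cov_xy"] / c["var_x"] * (x - c["mean_x"])
        for c in components
    ]
    return sum(p * y for p, y in zip(posteriors, local_estimates))


# One Gaussian: the mapping is a single straight line, as in MLR.
single = [{"weight": 1.0, "mean_x": 0.0, "mean_y": 1.0, "var_x": 2.0, "cov_xy": 1.0}]

# Two well-separated, uncorrelated Gaussians: the mapping behaves like a classifier.
two = [
    {"weight": 0.5, "mean_x": -10.0, "mean_y": 0.0, "var_x": 1.0, "cov_xy": 0.0},
    {"weight": 0.5, "mean_x": 10.0, "mean_y": 5.0, "var_x": 1.0, "cov_xy": 0.0},
]
```

A MAP-style variant would keep only the local estimate of the Gaussian with the highest posterior instead of averaging over all of them.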
23	Reconhecimento de comandos de voz por redes neurais / Voice command recognition using neural networks Rodrigo Jorge Alvarenga 02 June 2012 Speech recognition systems are widely used in industry, in the improvement of human operations and procedures, and in entertainment and recreation.
The specific objective of this study was to design and develop a speaker-independent voice recognition system capable of identifying voice commands. The main purpose of the system is to control the movement of robots, with applications in industry and in assisting people with physical disabilities. Decisions are made by a neural network trained on the distinctive features of the speech of 16 speakers. The voice command samples were collected by convenience sampling (in age and sex), to ensure greater discrimination between voice characteristics and thus allow the neural network to generalize. Preprocessing consisted of determining the endpoints of each command utterance and applying adaptive Wiener filtering. Each speech command was segmented into 200 windows with 25% overlap. The features used were the zero-crossing rate, the short-term energy and the mel-frequency cepstral coefficients. The first two linear predictive coding coefficients and the corresponding prediction error were also tested. The neural network classifier was a multilayer perceptron trained with the backpropagation algorithm. Several experiments were performed to choose thresholds, practical values, features and neural network configurations. The results were considered very good, reaching a hit rate of 89.16% under worst-case conditions for the sampling of the commands. automation; signal processing; word recognition; MFCC; mel-frequency cepstral coefficients; LPC; neural networks; backpropagation; Mechanical Engineering
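The two simplest features named in this abstract, short-term energy and zero-crossing rate, can be sketched on overlapping windows as follows. This is a minimal illustration under assumptions: the function names, the fixed frame length and the toy signal are invented for the example, and the thesis's endpoint detection, Wiener filtering and 200-window segmentation are not reproduced.

```python
def frame_signal(samples, frame_len, overlap=0.25):
    """Split a sample list into fixed-length frames with fractional overlap.

    Trailing samples that do not fill a whole frame are dropped.
    """
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def short_term_energy(frame):
    """Sum of squared sample values in one frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


# Toy signal that changes sign at every sample.
toy_signal = [1, -1] * 8
toy_frames = frame_signal(toy_signal, frame_len=8, overlap=0.25)
toy_features = [(short_term_energy(f), zero_crossing_rate(f)) for f in toy_frames]
```

Each frame then yields one (energy, zero-crossing rate) pair that could be concatenated with other features before being fed to a classifier.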
24	Vliv alkoholu na řečový signál / Effect of alcohol on speech signal Kandus, Filip January 2011 The thesis examines the influence of alcohol on the speech apparatus and the speech signal. The first part focuses on the symptoms of alcohol in the human body and on detecting its concentration. The following part describes some scientific publications and projects that dealt with a similar theme. Czech documentation for the German ALC database was also created. A Czech text was compiled on the basis of phonetic knowledge. Different speakers read this text, which gave us our own database of intoxicated and sober speech. Samples from individual speakers are processed in MATLAB using linear prediction, formant analysis and cepstral analysis, and the effect of alcohol on selected parameters of the speech signal is evaluated.
25	The Role of Autotaxin in the Regulation of Lysophosphatidylcholine-Induced Cell Migration Gaetano, Cristoforo Giuseppe 06 1900 Increased expression of autotaxin has been shown to promote metastasis formation and cancer proliferation. These actions could result from the catalytic activity of autotaxin, which converts lysophosphatidylcholine (LPC) into lysophosphatidate (LPA) extracellularly, or from non-catalytic functions of autotaxin. Both LPC and LPA have also been reported to stimulate migration through their respective receptors. This work investigates the role of autotaxin in controlling the motility of two cancer cell lines. Autotaxin inhibitors blocked LPC-induced migration, and knocking down autotaxin secretion also blocked stimulation of migration by LPC. Autotaxin inhibitors abolished any migratory effects of media collected from autotaxin-secreting cells. We determined that LPC alone is unable to stimulate migration, and we did not observe non-catalytic effects of autotaxin on migration. This thesis provides strong evidence that inhibiting autotaxin production or activity would be a beneficial therapy for preventing tumour growth or metastasis in patients with autotaxin-expressing tumours. Autotaxin; Lysophosphatidylcholine; Lysophosphatidate; Lysophosphatidic acid; LPA; ATX; LPC; Melanoma; Breast Cancer; Migration; Metastasis
26	The Role of Autotaxin in the Regulation of Lysophosphatidylcholine-Induced Cell Migration Gaetano, Cristoforo Giuseppe Unknown Date No description available. Autotaxin; Lysophosphatidylcholine; Lysophosphatidate; Lysophosphatidic acid; LPA; ATX; LPC; Melanoma; Breast Cancer; Migration; Metastasis
27	Recognition Event-Related Potentials and Neuropsychological Indices in Healthy Ageing and Amnestic Mild Cognitive Impairment Megan Broughton Unknown Date Amnestic mild cognitive impairment (aMCI) has been established as a significant risk factor for Alzheimer's disease (AD), and in many cases this state appears to represent an early or incipient stage of AD. Given the difficulties with the diagnosis and prognosis of aMCI and AD, and the projected socioeconomic ramifications of AD, there is a need to establish sensitive and reliable biomarkers. Event-related potentials (ERPs) have been recommended in this context because of their reliability, non-invasive nature, low cost and relatively widespread availability. This thesis aims to further assess the potential efficacy of ERP markers for such applications. These aims are pursued by investigating ERPs in healthy ageing, MCI and AD with an explicit recognition task that requires key cognitive and memory processes often impaired in aMCI and AD. Two ERP effects were analysed: the N400 effect, which is assumed to index familiarity or trace strength, and the Late Positive Complex (LPC), which appears to index recollection or decision-related factors such as accuracy. Chapter 3 reports ERP and recognition accuracy comparisons between samples of 15 young adults (mean age = 21.73 years) and 15 older, cognitively healthy adults (mean age = 66.67 years). ERP data were acquired during performance of a word recognition task with high and low memory load conditions (long and short encoding lists, respectively). At test, participants were required to make old/new judgements on visually presented words. There was a trend for young participants to perform more accurately than the older sample, especially on the long list, although these differences only approached significance.
However, the N400 old/new effect was found to be significantly reduced in the old compared with the young participants across memory load conditions. LPC old/new effects were generally not observed, which is likely due to the nature of the task, which generally places minimal demands on controlled retrieval processes. These results indicate that the N400 effect may be more sensitive to the deleterious effects of ageing on recognition memory-related process(es) than behavioural measures of memory accuracy. Consistent with the view that the N400 indexes familiarity, these results are in accordance with other evidence that familiarity is affected in healthy ageing. The same methodology was used to compare ERPs between aMCI participants (n = 11) and healthy older adults (n = 11) in Chapter 4. The aMCI participants performed significantly worse than healthy elderly participants in discriminating 'old' from 'new' words. In the corresponding ERP data, the healthy control sample demonstrated significant N400 old/new effects at parietal electrode locations, whereas aMCI participants failed to demonstrate significant N400 old/new effects at any electrode location. Again, LPC effects were not observed in either sample. The absence of significant N400 effects in aMCI participants may reflect a disruption of familiarity-based recognition in aMCI. These results converge with other evidence that the N400 effect may be a sensitive ERP marker useful for detecting, monitoring and/or predicting amnestic-related cognitive decline. There are reported variations in the underlying causes and sequelae of aMCI (e.g., not all cases progress to AD). Chapter 5 reports an exploratory investigation aimed at determining whether baseline ERPs differentiate between aMCI participants on the basis of their clinical diagnosis at follow-up.
Baseline ERP data from a small sample (n = 7) of aMCI participants who remained cognitively stable at 12-month follow-up (SMCI) were compared with those of two aMCI participants who had progressed to meet an AD diagnosis (PMCI) by that time-point. There was a trend for PMCI participants to display smaller old/new effects. However, only one participant displayed significantly smaller N400 old/new effects under low memory load conditions. Interestingly, this participant was also more impaired in baseline cognitive functioning. Chapter 6 examines the relationship between baseline ERPs and performance on neuropsychological assessment at 12-month follow-up in a sample of aMCI and AD participants (n = 13), in order to investigate whether ERPs may prove informative for prognoses regarding general trajectories of cognitive decline, irrespective of diagnostic status. Smaller N400 old/new effects (at Fz and CPz) were associated with poorer performance on tasks assessing global cognitive functioning and auditory attention span. Reduced LPC old/new differences were related to poorer performance on tasks assessing global cognitive functioning and verbal learning and memory, and to better performance on a task assessing working memory, at follow-up. In contrast to these results, no relationships were observed between ERP effects and concurrent performance on neuropsychological assessment in this sample, or in 42 elderly participants (including healthy, aMCI and AD), as described in Chapter 7. Taken together, these results suggest that ERPs may be more sensitive in predicting future rather than concurrent cognitive functioning and may provide a more objective measure or classification of cognitive impairment irrespective of diagnosis. These outcomes are particularly novel, as the relationship between baseline ERP data and follow-up neuropsychological measures does not appear to have been systematically reported in the literature to date.
Collectively, these findings indicate that ERP measures, particularly the N400 old/new effect, are sensitive to neurocognitive changes associated with ageing and aMCI, and may prove a useful biomarker for the early detection of AD. This is of interest because the effects of healthy ageing and pathological decline on the N400 from explicit recognition tasks have not been thoroughly explored. Moreover, the N400 effect (and perhaps, to a lesser degree, the LPC effect) appears to have substantial value for informing prognoses of subsequent cognitive trajectories, at least for persons with amnestic impairment. These results may have significant clinical implications for the selection and application of efficacious therapeutic interventions in aMCI and AD. Event-related Potentials; N400; LPC; ageing; Alzheimer's Disease; Mild Cognitive Impairment; Recognition memory
28	Apports du numérique dans les outils de communication des personnes handicapées : développement d’un dictionnaire inversé : Langue des Signes Françaises -> Français / Contributions of digital technology to communication tools for people with disabilities: development of a reverse dictionary, French Sign Language -> French Zbakh, Mohammed 17 December 2014 Dictionaries can be seen as bridges between languages. In recent decades they have adapted quickly to new technologies, like many other sources of knowledge. They have moved beyond their traditional form as books to establish themselves in the new world of the Internet.
This development has made them more accessible and more responsive through the use of suitable indexing and classification systems. Although its structure differs from that of vocal languages, sign language is no exception. In this work, we developed an intelligent search system able to find the meaning of a sign in French Sign Language from the parameters of the sign itself. However, the visual-gestural structure of sign languages poses practical difficulties for implementing this language on a computer. The particularity of its grammar, which is performed in space, encouraged us to work on a pragmatic approach that facilitates access to its vocabulary for anyone interested in French Sign Language. In our experiments, we set up a web platform for searching signs and then analysed the queries of the users connected to this platform. The aim of this analysis was to identify the parameters needed to develop a lightweight system that can easily find the meaning of a sign in French. French Sign Language; automatic classification; dissimilarity; reverse dictionary; LSF; communication for deaf people; LPC
29	Odhad formantových kmitočtů pomocí strojového učení / Estimation of formant frequencies using machine learning Káčerová, Erika January 2019 This Master's thesis deals with formant extraction. A set of MATLAB scripts is created to generate values of the first three formant frequencies from speech recordings using Praat and Snack (WaveSurfer). Mel-frequency cepstral coefficients and linear predictive coefficients are extracted from the audio files and added to the database. This database is then used to train a neural network. Finally, the designed neural network is tested.
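For readers unfamiliar with the linear predictive coefficients mentioned in this abstract, the sketch below shows one standard way to compute them: the autocorrelation method solved with the Levinson-Durbin recursion. It is an illustrative assumption, not the thesis's MATLAB code; the function names are hypothetical, and no windowing or pre-emphasis is applied.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame (no normalization)."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for a predictor of the given order.

    Returns (coeffs, error), where the prediction of x[n] is
    sum(coeffs[k - 1] * x[n - k] for k in 1..order) and error is the
    residual prediction energy. Assumes r[0] > 0 (a non-silent frame).
    """
    a = [0.0] * (order + 1)  # a[0] is unused; a[k] multiplies x[n - k]
    error = float(r[0])
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        reflection = acc / error
        updated = a[:]
        updated[i] = reflection
        for k in range(1, i):
            updated[k] = a[k] - reflection * a[i - k]
        a = updated
        error *= 1.0 - reflection * reflection
    return a[1:], error


# Autocorrelation sequence of an ideal first-order process with coefficient 0.5.
ideal_r = [1.0, 0.5, 0.25]
lpc_coeffs, lpc_error = levinson_durbin(ideal_r, order=2)
```

For this idealized input the second coefficient comes out as zero, since a first-order predictor already explains the sequence. On real speech frames the order is typically chosen larger, and the resulting coefficients can serve as input features.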
30	Dekodér pro systém detekce klíčových slov / Decoder for key word detection system Krotký, Jan January 2009 The thesis presents the basic characteristics of human speech recognition, describes keyword detection systems, and then covers the design of the individual decoder blocks in three chapters. The first chapter describes the operations performed before the signal is divided into frames, and the segmentation itself. The second chapter describes the calculation of short-term energy, the number of zero crossings, and the autocorrelation, linear prediction and Mel-frequency cepstral coefficients. The third chapter, which covers the design of the decoder block, describes dynamic time warping and a method based on hidden Markov models. The final part describes decoders that work with speech and proposes a simple decoder for isolated words, which was implemented and tested on the basis of the preceding chapters.
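Dynamic time warping, one of the two decoding methods named in this abstract, can be sketched for isolated words as follows. This is a minimal textbook version under assumptions: features are single numbers per frame, the local cost is an absolute difference, and the names `dtw_distance`, `recognize` and the toy templates are invented for the example rather than taken from the thesis.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(features, templates):
    """Return the label of the stored template closest to the input."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))


# Hypothetical one-feature-per-frame templates for two isolated words.
word_templates = {"up": [1, 2, 3], "down": [3, 2, 1]}
```

A slowed-down utterance such as `[1, 1, 2, 3, 3]` still aligns with the "up" template at zero cost, which is the property that makes the method useful when speaking rate varies.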

Search results