  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
221

Vision-Based Force Planning and Voice-Based Human-Machine Interface of an Assistive Robotic Exoskeleton Glove for Brachial Plexus Injuries

Guo, Yunfei 18 October 2023
This dissertation focuses on improving the capabilities of an assistive robotic exoskeleton glove designed for patients with Brachial Plexus Injuries (BPI). The aim of this research is to develop a force control method, an automatic force planning method, and a Human-Machine Interface (HMI) that refine the grasping functionalities of the exoskeleton glove, thus supporting rehabilitation and independent living for individuals with BPI. The exoskeleton glove is a useful tool in post-surgery therapy for patients with BPI, as it helps counteract hand muscle atrophy by allowing controlled and assisted hand movements. This study introduces an assistive exoskeleton glove with rigid side-mounted linkages driven by Series Elastic Actuators (SEAs) to perform five different types of grasps. For force control, data-driven methods for predicting SEA fingertip force were developed to assist force control with the Linear Series Elastic Actuators (LSEAs). This data-driven method predicts SEA fingertip force precisely by taking into account the deformation of, and friction forces on, the exoskeleton glove. For force planning, a slip-grasp force planning method with hybrid slip detection was implemented. This method incorporates a vision-based approach that estimates object properties to refine grasp force predictions, mimicking the human grasping process, reducing the trial-and-error iterations required by the slip-grasp method, and increasing the grasp success rate from 71.9% to 87.5%. For the HMI, the Configurable Voice Activation and Speaker Verification (CVASV) system was developed to control the proposed exoskeleton glove; it was then complemented by an innovative one-shot learning-based alternative, which proved more effective than CVASV in terms of training time and connectivity requirements. Clinical trials were successfully conducted with patients with BPI, demonstrating the effectiveness of the exoskeleton glove.
/ Doctor of Philosophy / This dissertation focuses on improving the capabilities of a robotic exoskeleton glove designed to assist individuals with Brachial Plexus Injuries (BPI). The goal is to enhance the glove's ability to grasp and manipulate objects, which can help in the recovery process and enable patients with BPI to live more independently. The exoskeleton glove is a tool that patients with BPI can use after surgery to prevent the muscles of the hand from weakening due to lack of use. This research introduces an exoskeleton glove that uses special mechanisms to perform various types of grasps. The study has three main components. First, it focuses on ensuring that the glove can accurately control its grip strength. This is achieved through a method that takes into account factors such as how the materials in the glove deform when it moves and the amount of friction present. Second, the study develops a method for planning how much force the glove should use to hold objects without letting them slip. This method combines camera-based object and material detection to estimate the weight and size of the target object, making the glove better at holding things without dropping them. The third part involves designing how people can tell the glove what to do: commands can be sent to the robot by voice. This study proposed a new method that quickly learns how a user talks and recognizes their voice. The exoskeleton glove was tested on patients with BPI, and the results showed that it is successful in helping them. This study advances assistive technology, especially assistive exoskeleton gloves, making them more effective and beneficial for individuals with hand disabilities.
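The slip-grasp idea (raise grip force until slipping stops, but start from a vision-based estimate instead of from zero) can be illustrated with a small sketch. Everything here is an assumption for illustration, not the dissertation's implementation: an ideal Hooke's-law SEA (`LinearSEA`), a two-contact Coulomb-friction estimate of the holding force (`vision_initial_force`), and a callback standing in for the hybrid slip detector. The data-driven fingertip-force predictor is not reproduced.

```python
# Hypothetical sketch: SEA force sensing plus a slip-grasp loop seeded by a
# vision-based force estimate. Names, constants and models are assumptions.
from dataclasses import dataclass
from typing import Callable, Tuple

G = 9.81  # gravitational acceleration, m/s^2


@dataclass
class LinearSEA:
    """Ideal linear series elastic actuator with an assumed stiffness in N/mm."""
    stiffness_n_per_mm: float

    def force(self, motor_pos_mm: float, output_pos_mm: float) -> float:
        # Hooke's law on the spring deflection between motor and output sides.
        # Ignores glove deformation and friction, which the thesis's
        # data-driven predictor is described as accounting for.
        return self.stiffness_n_per_mm * (motor_pos_mm - output_pos_mm)


def vision_initial_force(mass_kg: float, friction_coeff: float, contacts: int = 2) -> float:
    """Normal force at which Coulomb friction on `contacts` surfaces balances the weight."""
    return mass_kg * G / (friction_coeff * contacts)


def slip_grasp(initial_force: float,
               slip_detected: Callable[[float], bool],
               step: float = 0.5,
               max_force: float = 20.0) -> Tuple[float, int]:
    """Raise the commanded force in fixed steps until no slip is reported.

    Returns (final_force, number_of_increments).
    """
    force = min(initial_force, max_force)
    increments = 0
    while slip_detected(force):
        if force >= max_force:
            raise RuntimeError("object still slipping at the force limit")
        force = min(force + step, max_force)
        increments += 1
    return force, increments


if __name__ == "__main__":
    required = vision_initial_force(0.3, 0.5)       # toy "true" holding force
    slips = lambda f: f < required                  # stand-in for the slip detector
    print(slip_grasp(0.5, slips))                   # blind start
    print(slip_grasp(vision_initial_force(0.25, 0.5), slips))  # vision-seeded start
```

With these toy numbers the blind start needs five increments, while the (slightly underestimated) vision-seeded start needs one, which mirrors the reduction in trial-and-error iterations described in the abstract.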
222

[en] TO SPEAK OR NOT TO SPEAK BEAUTIFUL ENGLISH?: THAT’S THE QUESTION: DILEMMAS OF A GROUP OF BRAZILIAN ENGLISH TEACHERS / [pt] TER OU NÃO TER O INGLÊS LINDO?: EIS A QUESTÃO: DILEMAS DE UM GRUPO DE PROFESSORAS BRASILEIRAS DE INGLÊS

EVELLYN DORDRON AZEVEDO 08 August 2013
[pt] This research seeks to deepen understanding of the production, mainly in the oral modality, of English as a foreign language (EFL). After realizing that I had issues related to my identities and to how other individuals might perceive me when using this foreign language, I conducted a semi-structured, informal conversation-interview with two other EFL teachers. Through a qualitative, interpretivist approach to the data obtained, I realized that we experienced similar tensions. The analysis of the discourse generated in the conversation-interviews focused on the teachers' beliefs regarding the role of English, as a native and as a foreign language, in contemporary social contexts, and on issues of personal and professional identity. In the second phase of data analysis, guided by the principles of Exploratory Practice and Reflective Practice, I held further conversation-interviews with the same teachers, separately. We reflected on the perceptions and beliefs we had expressed during the first phase, and also discussed our positions at this new moment. By analysing the data from both phases of the research with the adopted theoretical-methodological framework, I understood the complexity of our issues and the mutability of our identities and beliefs according to the experiences we live through and the contexts in which we are situated. Nevertheless, some ideas remain deeply rooted, resulting in tensions that lead us to inhabit an in-between place of positions. / [en] The current research aims to deepen understanding of the production, especially in the oral modality, of English as a Foreign Language (EFL).
After perceiving that I had issues related to my social identities, as well as to how other individuals might perceive me when using that foreign language, I arranged a semi-structured conversation-interview with two other participating EFL teachers. Through an interpretivist qualitative approach to the generated data, it became clear that the two participating teachers experienced tensions similar to mine. The analysis of the discourse generated through the conversation-interviews focused on the teachers' beliefs about the role of English, as a native and a foreign language, within contemporary social contexts, and on personal and professional identity issues. The second phase of this work, guided by the principles of Exploratory Practice and of Reflective Practice, led us to further conversation-interviews with the same participating teachers, separately. We reflected upon the perceptions and beliefs we had expressed during the first phase, and also discussed our positioning at that moment. By connecting the insights obtained during the two research phases, I came to understand the complexity of the issues raised and the changeability of our identities and beliefs according to the situations we experience and the contexts we are part of. However, some ideas are deeply ingrained in our minds, resulting in tensions that lead us to inhabit a middle-place of perspectives.
223

Αναγνώριση ομιλητή / Speaker recognition

Ganchev, Todor 25 June 2007
This dissertation deals with speaker recognition in real-world conditions. The main points of the work are: (1) evaluation of various approaches to speech feature extraction, (2) reduction of the impact of the environment on speaker recognition performance, and (3) study of classification techniques that are alternatives to the existing ones. Specifically, in (1), a new wavelet packet-based speech feature extraction scheme, designed specifically for speaker recognition, is proposed. It is derived objectively with respect to speaker recognition performance, in contrast to the MFCC approach, which is based on an approximation of human auditory perception. Next, in (2), a noise-robust MFCC-based feature extraction scheme is presented for improving speaker recognition performance in real environments. In brief, a model-based noise reduction technique adapted to the speaker verification problem is incorporated directly into the MFCC computation scheme. This approach demonstrated a significant advantage in real, rapidly varying environments. Finally, in (3), two new classifiers are introduced, referred to as the Locally Recurrent Probabilistic Neural Network (LR PNN) and the Generalized Locally Recurrent Probabilistic Neural Network (GLR PNN). They are hybrids of the Recurrent Neural Network (RNN) and the Probabilistic Neural Network (PNN), and combine the advantages of the generative and discriminative classification approaches. Moreover, these new neural networks are sensitive to temporal and specific correlations among successive inputs, and are therefore suitable for exploiting the correlation of speech parameters between speech frames. The experiments showed that the LR PNN and GLR PNN architectures provide better performance than the original PNN. / This dissertation deals with speaker recognition in real-world conditions.
The main emphasis is on: (1) evaluation of various speech feature extraction approaches, (2) reduction of the impact of environmental interference on speaker recognition performance, and (3) the study of alternatives to present state-of-the-art classification techniques. Specifically, within (1), a novel wavelet packet-based speech feature extraction scheme fine-tuned for speaker recognition is proposed. It is derived objectively with respect to speaker recognition performance, in contrast to the state-of-the-art MFCC scheme, which is based on an approximation of human auditory perception. Next, within (2), an advanced noise-robust MFCC-based feature extraction scheme is offered for improving speaker recognition performance in real-world environments. In brief, a model-based noise reduction technique adapted to the specifics of the speaker verification task is incorporated directly into the MFCC computation scheme. This approach demonstrated a significant advantage in real-world, fast-varying environments. Finally, within (3), two novel classifiers referred to as the Locally Recurrent Probabilistic Neural Network (LR PNN) and the Generalized Locally Recurrent Probabilistic Neural Network (GLR PNN) are introduced. They are hybrids between the Recurrent Neural Network (RNN) and the Probabilistic Neural Network (PNN) and combine the virtues of the generative and discriminative classification approaches. Moreover, these novel neural networks are sensitive to temporal and special correlations among consecutive inputs, and are therefore capable of exploiting the inter-frame correlations among speech features derived from successive speech frames. The experiments demonstrated that the LR PNN and GLR PNN architectures provide a performance benefit compared with the original PNN.
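For readers unfamiliar with the baseline that LR PNN and GLR PNN extend, a plain (non-recurrent) probabilistic neural network can be sketched as a Parzen-window classifier with a Gaussian kernel. This is only a sketch of the standard PNN, assuming precomputed feature vectors and a hand-picked smoothing parameter `sigma`; it does not implement the locally recurrent variants proposed in the dissertation.

```python
import numpy as np
from typing import Dict, Hashable


def pnn_scores(x: np.ndarray, train: Dict[Hashable, np.ndarray],
               sigma: float = 1.0) -> Dict[Hashable, float]:
    """Per-class Parzen-window density estimates, as in a standard PNN.

    The Gaussian normalisation constant is identical for every class (same
    sigma and dimension), so it is omitted; only relative scores matter.
    """
    scores = {}
    for label, patterns in train.items():
        # Pattern layer: one Gaussian kernel per stored training vector.
        sq_dist = np.sum((patterns - x) ** 2, axis=1)
        # Summation layer: average kernel response per class.
        scores[label] = float(np.mean(np.exp(-sq_dist / (2.0 * sigma ** 2))))
    return scores


def pnn_classify(x, train: Dict[Hashable, np.ndarray], sigma: float = 1.0) -> Hashable:
    """Decision layer: return the class with the highest estimated density."""
    scores = pnn_scores(np.asarray(x, dtype=float), train, sigma)
    return max(scores, key=scores.get)
```

Each training vector acts as one kernel, so the model has no iterative training phase, only the choice of `sigma`; the recurrent variants described above additionally feed back information from previous frames.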
224

Reliability of voice comparison for forensic applications / Fiabilité de la comparaison des voix dans le cadre judiciaire

Ajili, Moez 28 November 2017
In legal proceedings, voice recordings are increasingly presented as evidence. Generally, a scientific expert is called upon to establish whether the voice excerpt in question was uttered by a given suspect (prosecution hypothesis) or not (defence hypothesis). This process is known as Forensic Voice Comparison (FVC). Since the emergence of the DNA typing model, the Bayesian approach has become the new "gold standard" in forensic science. In this approach, the expert expresses the result of the analysis as a likelihood ratio (LR). This ratio not only favours one of the hypotheses (prosecution or defence) but also provides the weight of this decision. Although the LR is theoretically sufficient to summarize the result, in practice it is subject to certain limitations due to its estimation process. This is particularly true when automatic speaker recognition (ASpR) systems are used. These systems produce a score in all situations without taking into account the conditions specific to the case under study. Several factors are almost always ignored by the estimation process, such as the quality and quantity of information in the two voice recordings, the consistency of the information between the two recordings, their phonetic content, or the intrinsic characteristics of the speakers. All these factors call into question the notion of reliability of voice comparison in the forensic context. In this thesis, we address this issue for automatic (ASpR) systems on two main points. The first consists of establishing a hierarchical scale of the phonetic categories of speech sounds according to the amount of speaker-specific information they contain.
This study shows the importance of phonetic content: it highlights interesting differences between phonemes and the strong influence of intra-speaker variability. These results were confirmed by a complementary study on oral vowels based on formant parameters, independently of any speaker recognition system. The second point consists of implementing an approach to predict the reliability of the LR from the two recordings of a voice comparison without recourse to an ASpR system. To this end, we defined a homogeneity measure (NHM) capable of estimating the amount of information and the homogeneity of this information between the two recordings considered. Our hypothesis is that homogeneity is directly correlated with the degree of reliability of the LR. The results confirmed this hypothesis, with the NHM measure strongly correlated with the LR reliability measure. Our work also revealed significant differences in NHM behaviour between target comparisons and impostor comparisons. Our work showed that the "brute force" approach (relying on a large number of comparisons) is not sufficient to ensure a good assessment of reliability in FVC. Indeed, some variability factors can induce local system behaviours linked to particular situations. For a better understanding of the FVC approach and/or an ASpR system, it is necessary to explore the behaviour of the system at as detailed a scale as possible (the devil is in the details). / It is common to see voice recordings presented as forensic traces in court. Generally, a forensic expert is asked to analyse both the suspect's and the criminal's voice samples in order to indicate whether the evidence supports the prosecution (same-speaker) or defence (different-speakers) hypothesis. This process is known as Forensic Voice Comparison (FVC).
Since the emergence of the DNA typing model, the likelihood-ratio (LR) framework has become the new "gold standard" in forensic science. The LR not only supports one of the hypotheses but also quantifies the strength of its support. However, the LR is subject to some practical limitations due to its estimation process itself. This is particularly true when Automatic Speaker Recognition (ASpR) systems are considered, as they output a score in all situations regardless of the case-specific conditions. Indeed, several factors are not taken into account by the estimation process, such as the quality and quantity of information in both voice recordings, their phonological content, or the speakers' intrinsic characteristics. All these factors call into question the validity and reliability of FVC. In this thesis, we address these issues. First, we analyse how the phonetic content of a pair of voice recordings affects FVC accuracy. We show that oral vowels, nasal vowels and nasal consonants bring more speaker-specific information than averaged phonemic content. In contrast, plosives, liquids and fricatives do not have a significant impact on LR accuracy. This investigation demonstrates the importance of phonemic content and highlights interesting differences between inter-speaker and intra-speaker effects. A further study examines the speaker-specific information carried by each individual vowel, based on formant parameters and without any use of an ASpR system. This study revealed interesting differences between vowels in terms of the quantity of speaker information. The results clearly show the importance of intra-speaker variability effects in FVC reliability estimation. Second, we investigate an approach to predict LR reliability based only on the pair of voice recordings.
We define a homogeneity criterion (NHM) able to measure the presence of relevant information and the homogeneity of this information between the pair of voice recordings. We expect the lowest homogeneity values to correlate with the lowest LR accuracy measures, and the highest homogeneity values with the highest accuracy. The results showed the value of the homogeneity measure for FVC reliability. Our studies also reported large differences in behaviour between FVC genuine and impostor trials. The results confirmed the importance of intra-speaker variability effects in FVC reliability estimation. The main takeaway of this thesis is that averaging the system behaviour over a large number of factors (speaker, duration, content...) potentially hides many important details. For a better understanding of the FVC approach and/or an ASpR system, it is mandatory to explore the behaviour of the system at as detailed a scale as possible (the devil lies in the details).
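The likelihood ratio described above can be made concrete with one common, simple calibration scheme: fit one Gaussian to same-speaker ASpR scores and one to different-speaker scores, then divide the two densities at the observed score. The sketch also computes the log-likelihood-ratio cost (Cllr), a widely used accuracy measure for LRs. The Gaussian score models and the function names are assumptions for illustration; the thesis's own estimation procedure and its NHM homogeneity measure are not reproduced here.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def fit_score_models(same_speaker: Sequence[float],
                     different_speaker: Sequence[float]) -> Tuple[NormalDist, NormalDist]:
    """Fit one Gaussian to same-speaker (H_p) scores and one to different-speaker (H_d) scores."""
    return NormalDist.from_samples(same_speaker), NormalDist.from_samples(different_speaker)


def likelihood_ratio(score: float, h_p: NormalDist, h_d: NormalDist) -> float:
    """LR = p(score | H_p) / p(score | H_d); values above 1 support the prosecution hypothesis."""
    return h_p.pdf(score) / h_d.pdf(score)


def cllr(target_lrs: Sequence[float], nontarget_lrs: Sequence[float]) -> float:
    """Log-likelihood-ratio cost: near 0 for strong correct LRs, 1 for an uninformative LR of 1."""
    c_t = sum(math.log2(1.0 + 1.0 / lr) for lr in target_lrs) / len(target_lrs)
    c_nt = sum(math.log2(1.0 + lr) for lr in nontarget_lrs) / len(nontarget_lrs)
    return 0.5 * (c_t + c_nt)
```

Because a global calibration like this maps every score to an LR regardless of recording quality or phonetic content, it illustrates exactly the case-independent behaviour that the abstract questions.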
225

Convergence phonétique en interaction / Phonetic convergence in interaction

Lelong, Amélie 03 July 2012
The work presented in this thesis is based on the study of a phenomenon called phonetic convergence, which postulates that two interlocutors in interaction tend to adapt their way of speaking to each other for communicative purposes. We therefore set up a paradigm called "Verbal Dominoes" in order to collect a large corpus to characterize this phenomenon, the ultimate goal being to endow an animated conversational agent with this adaptive capacity in order to improve the quality of human-machine interactions. We conducted several studies of the phenomenon between pairs of strangers, long-time friends, and then people from the same family. The amplitude of convergence is expected to be related to the social distance between the two interlocutors, and this result was indeed found. We then studied the impact of knowledge of the linguistic target on adaptation. To characterize phonetic convergence, we developed two methods: the first based on a linear discriminant analysis between the MFCC coefficients of each speaker, the second using speech recognition. The latter method will later allow us to study the phenomenon in less controlled conditions. Finally, we characterized phonetic convergence with a subjective measure using a new perception test based on "online" detection of a change of speaker. The test was carried out with signals extracted from the interactions, but also with signals obtained by adaptive synthesis based on HNM modelling. We obtained comparable results, thus demonstrating the quality of our adaptive synthesis. / The work presented in this manuscript is based on the study of a phenomenon called phonetic convergence, which postulates that two people in interaction tend to adapt how they talk to their partner for communicative purposes.
We developed a paradigm called "Verbal Dominoes" to collect a large corpus to characterize this phenomenon, the ultimate goal being to endow a conversational agent with this adaptability in order to improve the quality of human-machine interactions. We carried out several studies of the phenomenon between pairs of strangers, good friends, and people from the same family. We expected the amplitude of convergence to be proportional to the social distance between the two speakers, and we found this result. We then studied the impact of knowledge of the linguistic target on adaptation. To characterize phonetic convergence, we developed two methods: the first is based on a linear discriminant analysis between the MFCC coefficients of each speaker, and the second uses speech recognition techniques. The latter method will allow us to study the phenomenon in less controlled conditions. Finally, we characterized phonetic convergence with a subjective measurement using a new perceptual test called speaker switching. The test was performed using signals from real interactions, but also with synthetic data obtained with the harmonic plus noise model (HNM). We obtained comparable results, demonstrating the quality of our adaptive synthesis.
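One way to read an LDA-based convergence measure is to track how separable the two speakers' MFCC frames are over successive time windows: if a linear discriminant separates them less well later in the interaction, their productions have moved closer. The sketch below computes the two-class Fisher criterion per window. It is a hypothetical illustration of that idea, not the thesis's exact procedure; MFCC extraction and windowing are assumed to be done elsewhere, and at least two feature dimensions are assumed.

```python
import numpy as np


def fisher_separability(frames_a: np.ndarray, frames_b: np.ndarray) -> float:
    """Two-class Fisher criterion between two speakers' feature frames (rows = frames)."""
    mean_diff = frames_a.mean(axis=0) - frames_b.mean(axis=0)
    # Pooled within-class scatter matrix S_w.
    s_w = (np.cov(frames_a, rowvar=False) * (len(frames_a) - 1)
           + np.cov(frames_b, rowvar=False) * (len(frames_b) - 1))
    # Fisher LDA direction w = S_w^{-1} (m_a - m_b).
    w = np.linalg.solve(s_w, mean_diff)
    # With this w, (w . d)^2 / (w^T S_w w) simplifies to d^T S_w^{-1} d.
    return float(mean_diff @ w)


def separability_trend(windows_a, windows_b):
    """Fisher separability for each aligned pair of time windows; a falling trend suggests convergence."""
    return [fisher_separability(a, b) for a, b in zip(windows_a, windows_b)]
```

Using the closed form d^T S_w^{-1} d avoids computing the projection explicitly; a real analysis would also need enough frames per window for S_w to be invertible.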
226

Adaptation de clones orofaciaux à la morphologie et aux stratégies de contrôle de locuteurs cibles pour l'articulation de la parole / Adaptation of orofacial clones to the morphology and control strategies of target speakers for speech articulation

Valdés Vargas, Julian Andrés 28 June 2013
The capacity for speech production is learned and maintained by means of a perception-action loop that allows speakers to correct their own production according to the perceptual feedback received. This feedback is auditory and proprioceptive, but not visual. Thus, speech sounds can be complemented by displaying the articulators on a computer screen, including those that are usually hidden, such as the tongue or the velum, which constitutes augmented speech. This type of system has applications in fields such as speech therapy, phonetic correction and language acquisition. This work was carried out as part of the development of a visual articulatory feedback system, based on the morphology and articulatory strategies of a reference speaker, which automatically animates a 3D talking head from the speech sound. The motivation for this research was to adapt this system to several speakers. Thus, the twofold objective of this thesis was to acquire knowledge about inter-speaker variability, and to propose models to adapt a reference clone, composed of models of the speech articulators (lips, tongue, velum, etc.), to other speakers who may have different morphologies and articulatory strategies. In order to build articulatory models for different vocal tract contours, we first acquired data covering the articulatory space of the French language. Midsagittal Magnetic Resonance Images (MRI) of eleven French-speaking speakers pronouncing 63 articulations were collected. One of the main contributions of this study is a database that is more detailed and larger than those available in the literature.
For several speakers, this database contains the tracings of all the vocal tract articulators, for vowels and consonants, whereas previous studies in the literature are mainly based on vowels. The vocal tract contours visible in the MRI were traced by hand following the same protocol for all speakers. In order to acquire knowledge about inter-speaker variability, we characterized our speakers in terms of the articulatory strategies of the different articulators, such as the tongue, lips and velum. We found that each speaker has his or her own strategy for producing sounds that are considered equivalent from the point of view of spoken communication. The variability of the tongue, lips and velum was decomposed into a series of principal movements by means of principal component analysis (PCA). We noticed that these movements are performed in different proportions depending on the speaker. For example, for a given displacement of the jaw, the tongue may globally move in a proportion that depends on the speaker. We also noticed that protrusion, lip opening, the influence of jaw movement on the lips, and the articulatory strategy of the velum can also vary according to the speaker. For example, some speakers fold the velum against the tongue to produce the consonant /ʁ/. These results also constitute an important contribution to knowledge of inter-speaker variability in speech production. In order to extract a set of articulatory patterns common to different speakers in speech production (normalisation), we based our approach on linear models built from articulatory data.
Multilinear decomposition methods were applied to the contours of the tongue, lips and velum ... / The capacity to produce speech is learned and maintained by means of a perception-action loop that allows speakers to correct their own production as a function of the perceptual feedback received. This self-feedback is auditory and proprioceptive, but not visual. Thus, speech sounds may be complemented by augmented speech systems, i.e. speech accompanied by the virtual display of the shapes of speech articulators on a computer screen, including those that are typically hidden, such as the tongue or velum. This kind of system has applications in domains such as speech therapy, phonetic correction or language acquisition in the framework of Computer Aided Pronunciation Training (CAPT). This work was conducted as part of the development of a visual articulatory feedback system, based on the morphology and articulatory strategies of a reference speaker, which automatically animates a 3D talking head from the speech sound. The motivation of this research was to make this system suitable for several speakers. Thus, the twofold objective of this thesis was to acquire knowledge about inter-speaker variability, and to propose vocal tract models to adapt a reference clone, composed of models of the speech articulators' contours (lips, tongue, velum, etc.), to other speakers who may have different morphologies and different articulatory strategies. In order to build articulatory models of various vocal tract contours, we first acquired data that cover the whole articulatory space of the French language. Midsagittal Magnetic Resonance Images (MRI) of eleven French speakers, pronouncing 63 articulations, were collected.
One of the main contributions of this study is a database that is more detailed and larger than those in the literature, containing information on several vocal tract contours, speakers and consonants, whereas previous studies in the literature are mostly based on vowels. The vocal tract contours visible in the MRI were outlined by hand following the same protocol for all speakers. In order to acquire knowledge about inter-speaker variability, we characterised our speakers in terms of the articulatory strategies of various vocal tract contours, such as the tongue, lips and velum. We observed that each speaker has his/her own strategy to achieve sounds that are considered equivalent, among different speakers, for speech communication purposes. By means of principal component analysis (PCA), the variability of the tongue, lips and velum contours was decomposed into a set of principal movements. We noticed that these movements are performed in different proportions depending on the speaker. For instance, for a given displacement of the jaw, the tongue may globally move in a proportion that depends on the speaker. We also noticed that lip protrusion, lip opening, the influence of the jaw movement on the lips, and the velum's articulatory strategy can also vary according to the speaker. For example, some speakers roll up their uvulas against the tongue to produce the consonant /ʁ/ in vocalic contexts. These findings also constitute an important contribution to the knowledge of inter-speaker variability in speech production. In order to extract a set of common articulatory patterns that different speakers employ when producing speech sounds (normalisation), we based our approach on linear models built from articulatory data. Multilinear decomposition methods were applied to the contours of the tongue, lips and velum.
The evaluation of our models was based on two criteria: the variance explanation and the Root Mean Square Error (RMSE) between the original and recovered articulatory coordinates. Models were also assessed using a leave-one-out cross-validation procedure ...
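The two evaluation criteria named above, variance explanation and reconstruction RMSE, together with leave-one-out cross-validation, can be sketched for an ordinary single-speaker PCA contour model. Each row of `X` is assumed to be one articulation's contour flattened into a coordinate vector; this is plain PCA via SVD, not the multilinear decomposition used in the thesis.

```python
import numpy as np


def fit_pca(X: np.ndarray, n_components: int):
    """Return (mean, components, explained-variance ratios) for the first n_components."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)   # variance explained by each principal movement
    return mean, vt[:n_components], explained[:n_components]


def reconstruct(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project contours onto the retained components and map them back to coordinates."""
    weights = (X - mean) @ components.T
    return mean + weights @ components


def rmse(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(np.mean((a - b) ** 2)))


def leave_one_out_rmse(X: np.ndarray, n_components: int) -> float:
    """Fit on all articulations but one, reconstruct the held-out one, and average the RMSE."""
    errors = []
    for i in range(len(X)):
        train = np.delete(X, i, axis=0)
        mean, comps, _ = fit_pca(train, n_components)
        held_out = X[i:i + 1]
        errors.append(rmse(held_out, reconstruct(held_out, mean, comps)))
    return float(np.mean(errors))
```

The leave-one-out score is the more honest of the two error figures, because the in-sample RMSE always shrinks as components are added, whereas the held-out error shows whether the extra components generalize to unseen articulations.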
227

Frequency-Tuning and Dynamic Simulation of Electrostatically Actuated Beams

Mittal, Saurabh January 2014
The resonance frequency of electrostatically actuated micromachined beams can be tuned substantially by applying a DC voltage bias, first by decreasing the frequency until the onset of pull-in and then by increasing it by the virtue of contact. With the objective of modeling and designing the micromechanical structures after pull-in, a semi-analytical method was developed to determine the length of the contact between the beam and the substrate. The semi-analytical method which is validated on the straight beams is extended for the folded beam structures. This method provides a tool to the microsystem designer to quickly evaluate the deformed configuration of the folded beams after pull-in without the time-intensive contact analysis. This tool is used to design the micro‐speaker elements suitable for emitting low frequency sounds. Multiple instabilities after the pull-in were numerically observed and it was shown that the resonant frequency of an L-shaped beam can be varied in different frequency bands. The speaker element can emit any frequency in a given range, as the resonant frequency of the beam structures can be tuned both before and after pull-in. Operating the speaker element at resonance maximizes the efficiency of the speaker design because the amplitude of vibration is maximum at the resonance frequency. Furthermore, the interplay between the torsional and bending loads is used to minimize the out-of-plane deflection under self weight. A selection criterion is employed to choose a beam structure with optimum stiffness and natural frequency. Beam-based micro-speaker element designs with single and multi-layered suspended structures are proposed. Practical considerations such as volume displacement, mode shapes and dynamic coupling are discussed, on the basis of which design guidelines for a speaker element are proposed. 
Squeeze-film effects and the nonlinearity due to midplane stretching are integrated into the transient analysis model to analyze their effect on the stroke of a beam operating at resonance. A comparison between various speaker elements is presented.
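The downward tuning before pull-in can be sketched with the textbook single-degree-of-freedom parallel-plate model. In this model, a spring k and mass m face an electrode across a gap g, and the electrostatic force is ε0AV²/(2(g - x)²). This lumped model is a swapped-in approximation and not the thesis's semi-analytical beam method. It does not capture the stiffening by contact after pull-in, and all parameter values are hypothetical. The effective stiffness k_eff = k - ε0AV²/(g - x)³ falls as the bias rises, which lowers f = √(k_eff/m)/(2π) until pull-in at x = g/3.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity in F/m


def pull_in_voltage(k, gap, area):
    """Pull-in voltage of the parallel-plate model: sqrt(8 k g^3 / (27 eps0 A))."""
    return math.sqrt(8.0 * k * gap**3 / (27.0 * EPS0 * area))


def static_deflection(k, gap, area, v, iters=200):
    """Stable deflection x in [0, g/3] solving k x = eps0 A V^2 / (2 (g - x)^2).

    Below pull-in the residual increases on this interval, so bisection
    finds the unique stable root. Only valid for 0 <= v <= pull-in voltage."""
    if v > pull_in_voltage(k, gap, area):
        raise ValueError("bias beyond pull-in: contact is not modeled here")

    def residual(x):
        return k * x - EPS0 * area * v**2 / (2.0 * (gap - x) ** 2)

    lo, hi = 0.0, gap / 3.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def resonant_frequency(k, mass, gap, area, v):
    """Small-signal resonance f = sqrt(k_eff / m) / (2 pi), where electrostatic
    spring softening gives k_eff = k - eps0 A V^2 / (g - x)^3."""
    x = static_deflection(k, gap, area, v)
    k_eff = k - EPS0 * area * v**2 / (gap - x) ** 3
    return math.sqrt(max(k_eff, 0.0) / mass) / (2.0 * math.pi)


# Hypothetical parameters: 1 N/m spring, 1e-9 kg mass, 2 um gap,
# 100 um x 100 um electrode (pull-in near 5.2 V for these values).
K, MASS, GAP, AREA = 1.0, 1e-9, 2e-6, 1e-8
for bias in (0.0, 2.0, 4.0):
    print(f"{bias:.1f} V -> {resonant_frequency(K, MASS, GAP, AREA, bias):.1f} Hz")
```

Raising the bias toward pull-in drives k_eff, and therefore the frequency, toward zero. The post-contact increase described in the abstract would require a contact model such as the thesis's semi-analytical method.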
228

Verifikace osob podle hlasu bez extrakce příznaků / Speaker Verification without Feature Extraction

Lukáč, Peter January 2021 (has links)
Speaker verification is a field that is constantly being modernized and improved to meet the requirements of application areas such as authorization systems and forensic analysis. Improvements are driven by advances in deep learning, new training and test datasets, and various speaker verification challenges and workshops. In this thesis, we examine speaker verification models without feature extraction. Using raw audio waveforms as model inputs simplifies input processing. This reduces computational and memory requirements, as well as the number of feature-extraction hyperparameters that affect the results. Currently, models without feature extraction do not match the results of models with feature extraction. We experiment with modern techniques on baseline models and try to improve their accuracy. These experiments significantly improved the results of the baseline models, but we still did not reach the results of the improved model with feature extraction. However, the improvement is sufficient to build a fusion with that model. Finally, we discuss the results achieved and propose improvements based on them.
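A verification trial and the score-level fusion mentioned above can be sketched as follows. This minimal sketch rests on assumptions that the abstract does not state. Embeddings are compared with cosine similarity, and the raw-waveform and feature-based systems are fused linearly with placeholder parameters (`weight` and `bias` are hypothetical and would normally be trained on held-out trials). Accuracy is summarized by the equal error rate (EER), a metric commonly reported in speaker verification.

```python
import math


def cosine_score(emb_a, emb_b):
    """Cosine similarity of two speaker embeddings; higher means more likely the same speaker."""
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    return dot / (norm_a * norm_b)


def fuse_scores(raw_score, feat_score, weight=0.5, bias=0.0):
    """Hypothetical linear score fusion of a raw-waveform system and a
    feature-based system. `weight` and `bias` are placeholders; in practice
    they would be fitted on held-out trials (e.g. with logistic regression)."""
    return weight * raw_score + (1.0 - weight) * feat_score + bias


def accept(score, threshold):
    """Accept the claimed identity if the score reaches the decision threshold."""
    return score >= threshold


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: sweep the threshold over the observed scores and return
    the mean of the miss and false-alarm rates where they are closest."""
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(miss - false_alarm) < best_gap:
            best_gap, best_eer = abs(miss - false_alarm), (miss + false_alarm) / 2.0
    return best_eer


# Toy embeddings standing in for raw-waveform and feature-based model outputs.
enroll_raw, test_raw = [0.2, 0.9, 0.1], [0.25, 0.85, 0.05]
enroll_feat, test_feat = [0.7, 0.1, 0.7], [0.6, 0.2, 0.75]
fused = fuse_scores(cosine_score(enroll_raw, test_raw), cosine_score(enroll_feat, test_feat))
print(fused, accept(fused, threshold=0.8))
```

Fusing at the score level keeps the two systems independent. Only the fusion parameters need training once both systems are fixed.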
229

Kdy kdo mluví? / Speaker Diarization

Tomášek, Pavel January 2011 (has links)
This work addresses the task of speaker diarization. The goal is to implement a system that can decide "who spoke when". The individual components of the implementation are described. The main parts are feature extraction, voice activity detection, speaker segmentation and clustering, and finally postprocessing. The work also reports the results of the implemented system on test data, together with a description of the evaluation. The test data come from the NIST RT Evaluations 2005-2007, and the lowest error rate achieved on this dataset is 18.52% DER. The results are compared with the diarization system implemented by Marijn Huijbregts from the Netherlands, who worked on the same data in 2009 and reached 12.91% DER.
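The diarization error rate (DER) sums missed speech, false-alarm speech and speaker confusion, then divides by the total reference speech time. The frame-level Python sketch below assumes one speaker per frame, no forgiveness collar and no overlapping speech. It is therefore a simplification of the NIST RT scoring behind the reported figures, not a replacement for it. Hypothesis cluster labels are arbitrary, so the sketch maps them one-to-one onto reference speakers by brute force, which is only practical for a few speakers.

```python
from itertools import permutations


def diarization_error_rate(reference, hypothesis):
    """Frame-level DER = (missed + false alarm + confusion) / reference speech frames.

    `reference` and `hypothesis` are equal-length lists of speaker labels,
    one per frame, with None marking non-speech. Simplified: no collar,
    no overlapping speech.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("label sequences must have equal length")
    total_speech = sum(r is not None for r in reference)
    if total_speech == 0:
        raise ValueError("reference contains no speech")

    pairs = list(zip(reference, hypothesis))
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_speakers = list({r for r in reference if r is not None})
    hyp_speakers = list({h for h in hypothesis if h is not None})
    # Pad with None so surplus hypothesis clusters can remain unmapped.
    candidates = ref_speakers + [None] * len(hyp_speakers)
    best_correct = 0
    for assignment in permutations(candidates, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, assignment))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / total_speech


ref = ["A", "A", "A", "B", "B", None]
hyp = ["s1", "s1", "s2", "s2", "s2", "s2"]
print(f"DER = {diarization_error_rate(ref, hyp):.2%}")  # DER = 40.00%
```

In the toy example, one false-alarm frame plus one confused frame over five reference speech frames gives 40% DER. For larger speaker counts, an optimal assignment algorithm (e.g. Hungarian) would replace the brute-force search.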
230

Intersession Variability Compensation in Language and Speaker Identification

Hubeika, Valiantsina January 2008 (has links)
Channel and session variability is a very important problem in speaker recognition. Numerous scientific papers currently present several techniques for channel compensation. Channel compensation can be implemented in the model domain as well as in the feature and score domains. A relatively new and powerful technique is eigenchannel adaptation for GMMs (Gaussian Mixture Models). A drawback of this method is that it cannot be applied to other classifiers, such as SVMs (Support Vector Machines), GMMs with a different number of Gaussian components, or speech recognition using hidden Markov models (HMMs). One solution is an approximation of this method: eigenchannel adaptation in the feature domain. This thesis presents both techniques, model-domain and feature-domain eigenchannel adaptation, in speaker recognition systems. After good results were achieved in speaker recognition, the contribution of these techniques was investigated for an acoustic language recognition system covering 14 languages. In this task, not only channel variability but also speaker variability has an undesirable effect. Results are presented on data defined for the 2006 speaker recognition evaluation and the 2007 language recognition evaluation, both organized by the U.S. National Institute of Standards and Technology (NIST).
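Feature-domain eigenchannel compensation can be sketched as follows: from each frame o_t, subtract the channel offset weighted by the UBM component posteriors, giving ô_t = o_t - Σ_c γ_c(t) U_c x. This minimal Python sketch assumes a diagonal-covariance GMM and an already-trained eigenchannel matrix split into per-component blocks U_c. It also assumes that a channel-factor vector x has already been estimated for the recording; that estimation step is omitted. The function names and toy values are hypothetical, and this is not the thesis's implementation.

```python
import math


def gmm_posteriors(frame, weights, means, variances):
    """Occupation probabilities gamma_c(t) of each diagonal-covariance UBM
    component for one feature frame, computed with a stable log-sum-exp."""
    log_liks = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for o, m, s2 in zip(frame, mu, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * s2) + (o - m) ** 2 / s2)
        log_liks.append(ll)
    top = max(log_liks)
    exps = [math.exp(ll - top) for ll in log_liks]
    total = sum(exps)
    return [e / total for e in exps]


def compensate_frame(frame, posteriors, u_blocks, x):
    """Feature-domain eigenchannel compensation of one frame:
    o_hat = o - sum_c gamma_c * (U_c @ x).

    u_blocks[c] is the D x R block of the eigenchannel matrix for component c
    (assumed already trained) and x is the R-dimensional channel-factor vector
    of the recording (assumed already estimated)."""
    out = list(frame)
    for gamma, u_c in zip(posteriors, u_blocks):
        for d, row in enumerate(u_c):
            out[d] -= gamma * sum(u * xr for u, xr in zip(row, x))
    return out


# Toy usage: 2 components, 2-dimensional features, rank-1 channel subspace.
weights = [0.5, 0.5]
means = [[-1.0, 0.0], [1.0, 0.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
u_blocks = [[[0.3], [0.1]], [[-0.2], [0.4]]]  # hypothetical eigenchannel blocks
x = [0.5]  # hypothetical channel factor for this recording

frame = [0.2, 0.7]
gamma = gmm_posteriors(frame, weights, means, variances)
print(compensate_frame(frame, gamma, u_blocks, x))
```

Because the compensated frames are ordinary feature vectors, any downstream classifier (SVM, a GMM of a different size, or an HMM) can consume them. This is the motivation the abstract gives for the feature-domain approximation.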
