  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
141

Repairing Swedish Automatic Speech Recognition / Korrigering av Automatisk Taligenkänning för Svenska

Rehn, Karla January 2021
The quality of automatic speech recognition has increased dramatically in the last few years, but performance for low- and middle-resource languages such as Swedish is still far from optimal. In this project, KB-BERT, a neural language model trained on large written corpora, is used to improve the quality of Swedish transcriptions. The language model is inserted as a repair module after the automatic speech recognition step, with the aim of repairing the original output into a transcription that more closely resembles the ground truth. The repair uses a sequence-to-sequence translation approach in which KB-BERT is used to initialize the model. Two automatic speech recognition models are used to transcribe the speech: one developed in this project using the Kaldi framework, and Microsoft's Azure Speech to Text platform. The performance of the translator is evaluated on four datasets, three consisting of read speech and one of spontaneous speech. The spontaneous speech dataset and one of the read speech datasets include both native and non-native speakers. Performance is measured with three metrics: word error rate, a weighted word error rate, and a semantic similarity measure. The repairs improve the transcriptions of two of the read speech datasets significantly, decreasing the word error rate from 13.69% to 3.05% and from 36.23% to 21.17%. On the spontaneous speech data, the repairs lower the word error rate only slightly, from 44.38% to 44.06%, and they fail on the last read speech dataset, where the word error rate increases instead. The lower performance on the latter is likely due to a lack of data.
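Since the results above are reported as word error rates, a minimal sketch of how that metric is commonly computed may help. This is an illustration only, not the thesis's evaluation code: it assumes whitespace tokenization and unit costs for substitutions, insertions, and deletions, and the function names `word_error_rate` and `relative_reduction` are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumes whitespace tokenization and unit edit costs (an assumption,
    not necessarily the setup used in the thesis).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped, relative to its start value."""
    return (before - after) / before


# One deletion ("här") and one substitution ("test" -> "fest") over 5 words.
print(word_error_rate("det här är ett test", "det är ett fest"))  # 0.4

# Relative reduction for the first read speech dataset reported above.
print(relative_reduction(13.69, 3.05))
```

The relative reduction puts the reported numbers on a common scale: the drop from 13.69% to 3.05% is a large relative improvement, while the drop from 44.38% to 44.06% on spontaneous speech is well under one percent in relative terms.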
142

Degradation in Performance of Lanthanum Strontium Manganite Based Solid Oxide Fuel Cell Cathodes Under Accelerated Testing

Cooper, Celeste Eaton 29 June 2017
No description available.
143

Použití rekurentních neuronových sítí pro automatické rozpoznávání řečníka, jazyka a pohlaví / Neural networks for automatic speaker, language, and sex identification

Do, Ngoc January 2016
Title: Neural networks for automatic speaker, language, and sex identification
Author: Bich-Ngoc Do
Department: Institute of Formal and Applied Linguistics
Supervisor: Ing. Mgr. Filip Jurek, Ph.D., Institute of Formal and Applied Linguistics and Dr. Marco Wiering, Faculty of Mathematics and Natural Sciences, University of Groningen
Abstract: Speaker recognition is a challenging task with applications in many areas, such as access control and forensic science. In recent years, the deep learning paradigm and its branch, deep neural networks, have emerged as powerful machine learning techniques and achieved state-of-the-art results in many fields of natural language processing and speech technology. The aim of this work is therefore to explore the capability of one deep neural network model, the recurrent neural network, in speaker recognition. Our proposed systems are evaluated on the TIMIT corpus using a speaker identification task. Under the same test conditions, our systems could not surpass the reference systems, due to the sparsity of validation data. In general, our experiments show that the best system configuration is a combination of MFCCs with their dynamic features and a recurrent neural network model. We also experiment with recurrent neural networks and convolutional neural...
144

Architecture et contrôle du patinage d'un véhicule mono et multi-source de puissance / Architecture and slipping control of a mono and multi-source vehicle

Chapuis, Cédric 13 November 2012
Recent technical progress in batteries and the tightening of environmental standards have led to renewed interest in hybrid and electric vehicles. The possibility of using several power sources within a single vehicle calls traditional vehicle architectures into question and motivates the study of multi-source architectures. After a state-of-the-art review of architectures and torque transmission systems, the prototype vehicle of the VELROUE project is presented; it is later used as the test platform. Next, wheel-slip control is studied for the rear wheels of the VELROUE vehicle, which is equipped with an internal combustion engine on the front axle and two electric motors connected to the rear wheels. Different vehicle models are then described in order to analyze energy transfers within the system using the Bond Graph tool, to synthesize slip control laws, and to simulate the vehicle's behavior to validate the anti-slip functions (ASR). A PID-type controller is first introduced to serve as a reference. The main contribution of this thesis is the synthesis and implementation of nonlinear controllers, based either on feedback linearization or on flatness theory. The models used for control synthesis derive from classical assumptions for the driving situations considered: longitudinal dynamics, vertical (heave) dynamics, and pitch on a double bicycle model. A control strategy is also developed to ensure driver safety, reduce hardware requirements, and improve driving comfort. Finally, the nonlinear controllers are tested in simulation and then validated experimentally on the VELROUE vehicle. They are compared in terms of energy, performance, complexity, and cost. The techniques developed for ASR are also extended to regenerative braking phases (MSR), which is another original contribution of this work.
145

Joint Evaluation Of Multiple Speech Patterns For Speech Recognition And Training

Nair, Nishanth Ulhas 01 1900
Improving speech recognition performance in the presence of noise and interference continues to be a challenging problem. Automatic Speech Recognition (ASR) systems work well when the test and training conditions match. In real-world environments, there is often a mismatch between testing and training conditions. Various factors, such as additive noise, acoustic echo, and speaker accent, affect speech recognition performance. Since ASR is a statistical pattern recognition problem, if the test patterns are unlike anything used to train the models, errors are bound to occur because of feature vector mismatch. Various approaches to robustness have been proposed in the ASR literature, contributing mainly to two topics: (i) reducing the variability in the feature vectors or (ii) modifying the statistical model parameters to suit the noisy condition. While some of those techniques are quite effective, we would like to examine robustness from a different perspective. Considering the analogy of human communication over telephones, it is quite common to ask the person speaking to us to repeat certain portions of their speech because we did not understand them. This happens more often in the presence of background noise, where the intelligibility of speech is affected significantly. Although the exact nature of how humans decode multiple repetitions of speech is not known, it is quite possible that we use the combined knowledge of the multiple utterances to decode the unclear part of speech. The majority of ASR algorithms do not address this issue, except in very specific cases such as pronunciation modeling. We recognize that under very high noise conditions or bursty error channels, such as in packet communication where packets get dropped, it would be beneficial to take the approach of repeated utterances for robust ASR.
In this thesis, we have formulated a set of algorithms for joint evaluation/decoding to recognize noisy test utterances, and we use the same formulation for selective training of Hidden Markov Models (HMMs), again for robust performance. We first address joint recognition of multiple speech patterns given that they belong to the same class. We formulated this problem considering the patterns as isolated words. If there are K test patterns (K ≥ 2) of a word by a speaker, we show that it is possible to improve the speech recognition accuracy over independent single-pattern evaluation of test speech, for both clean and noisy speech. We also find the state sequence that best represents the K patterns. This formulation can also be extended to connected word recognition or continuous speech recognition. Next, we consider the benefits of joint multi-pattern likelihood for HMM training. In the usual HMM training, all the training data is used to arrive at the best possible parametric model. However, the training data may not all be genuine and may therefore contain labeling errors, noise corruptions, or plain outlier exemplars. Such outliers result in poorer models and affect speech recognition performance. It is therefore important to train selectively so that the outliers receive a lower weight. Giving a lower weight to an entire outlier pattern has been addressed before in the speech recognition literature. However, it is possible that only some portions of a training pattern are corrupted, so it is important that only the corrupted portions of speech, and not the entire pattern, are given a lower weight during HMM training. Since HMM training uses multiple patterns of speech from each class, we show that joint evaluation methods can be used to train HMMs selectively, such that only the corrupted portions of speech are given a lower weight and not the entire speech pattern.
Thus, we have addressed all three main tasks of an HMM, jointly exploiting the availability of multiple patterns belonging to the same class. We tested the new algorithms on Isolated Word Recognition for both clean and noisy speech. A significant improvement in speech recognition performance is obtained, especially for speech affected by transient/burst noise.
146

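Etude du comportement d'un alliage chromino-formeur comme matériau d'interconnecteur pour l'Electrolyse à Haute Température / Study of the behavior of a chromia-forming alloy as an interconnect material for High Temperature Electrolysis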

Guillou, Sébastien 01 December 2011
In High Temperature Electrolysis (HTE) systems, the material chosen for the interconnect must have good corrosion resistance in air and in H2/H2O mixtures at 800 °C, and must maintain good conductivity over long periods. In this context, the first objective of this work was to evaluate a commercial ferritic alloy (K41X) as an interconnect material for HTE applications. To this end, oxidation tests were carried out in furnaces and in a thermobalance to obtain oxidation kinetics, and resistivity measurements were made to evaluate the ASR (Area Specific Resistance) parameter at 800 °C. The second objective was to provide more fundamental insight into the oxidation mechanisms of chromia-forming alloys, in particular in H2-H2O mixtures, through dedicated tests and characterizations (photoelectrochemistry, isotopic tracing, long-duration tests). This dual strategy is also applied to the study of a coating solution (obtained by MOCVD) based on the perovskite oxide LaCrO3, whose high conductivity is particularly attractive for HTE applications. The study thus also sheds light on the role of lanthanum as a reactive element, whose effect is often discussed in the literature. In both environments at 800 °C, the oxide scale formed is a duplex Cr2O3/(Mn,Cr)3O4 layer, covered in the H2-H2O mixture by a thin layer of the spinel oxide Mn2TiO4. In air, the growth mechanism determined here is cationic, in agreement with the literature. The presence of a LaCrO3 coating does not change this mechanism but slows the growth kinetics of the scale during the first few hundred hours. In addition, the coating improves the adhesion and conductivity of the oxide scale. In the H2-H2O mixture, the growth mechanism proves to be anionic. The presence of the coating slows the oxidation kinetics. Although of similar thickness, the oxide scales formed in air have a resistivity one order of magnitude lower than that measured in H2-H2O. The high resistivity of the alloy in H2-H2O is shown to be linked to the presence, in the oxide scale, of protons originating from the water vapor. The coating nevertheless does not improve the conductivity in H2-H2O.
147

Contribution to the requalification of alkali silica reaction (ASR) damaged structures : assessment of the ASR advancement in aggregates by alkali silica reaction / Contribution à la requalification des structures endommagées par l’alcali réaction : evaluation de l’avancement de l’alcali réaction dans les granulats

Gao, Xiao Xiao 16 December 2010
To answer the questions of owners of structures affected by alkali-silica reaction (ASR), this work focuses on one part of a global methodology, originally proposed by the LMDC and EDF, that aims to reassess the mechanical behavior of ASR-damaged constructions. To this end, the chemical advancement of ASR in aggregates recovered from the affected structures must be evaluated. This work therefore focuses on quantifying the potentially reactive silica content of aggregates using two approaches: indirectly, by an expansion test, and directly, by chemical methods. The manuscript is organized around the following points:
• A relevant and rapid expansion test on mortars to link the reactive silica content to the measured expansion. The experimental conditions chosen, a 1 mol/l NaOH solution and a storage temperature of 60°C, are used to test different aggregate sizes, aggregate types, and specimen sizes.
• A fast chemical method of selective dissolution to measure directly the reactive silica available for ASR. Acid and basic methods are tested and compared; the HF / HF+HCl method is found to be the most effective.
• A chemo-mechanical model to analyze the effects of aggregate size and specimen size and to evaluate the chemical advancement of the reaction.
Finally, a methodology is proposed to calculate the kinetic constant of the reaction in the framework of structural requalification.
Key words: alkali-silica reaction (ASR), chemical advancement, reactive silica, expansion test, chemical test, chemo-mechanical model, kinetic constant, selective dissolution
148

Integrace hlasových technologií na mobilní platformy / Integration of Voice Technologies on Mobile Platforms

Černičko, Sergij January 2013
The goal of this thesis is to become familiar with the methods and techniques used in speech processing, to describe the current state of research and development in speech technology, to design and implement a server-based speech recognizer that uses BSAPI, and to integrate a client that uses the server for speech recognition into the mobile dictionaries of the Lingea company.
149

Automatic Annotation of Speech: Exploring Boundaries within Forced Alignment for Swedish and Norwegian / Automatisk Anteckning av Tal: Utforskning av Gränser inom Forced Alignment för Svenska och Norska

Biczysko, Klaudia January 2022
In Automatic Speech Recognition, there is an extensive need for time-aligned data. Manual speech segmentation has been shown to be more laborious than manual transcription, especially when dealing with tens of hours of speech. Forced alignment is a technique for matching a signal with its orthographic transcription with respect to the duration of linguistic units. Most forced aligners, however, are language-dependent and trained on English data, whereas under-resourced languages lack both the resources to develop the acoustic model required for an aligner and manually aligned data. An alternative to training new models is cross-language forced alignment, in which an aligner trained on one language is used to align data in another language. This thesis aimed to evaluate state-of-the-art forced alignment algorithms available for Swedish and to test whether a Swedish model could be applied to aligning Norwegian. Three approaches to forced alignment were employed: (1) one forced aligner based on Dynamic Time Warping and text-to-speech synthesis (Aeneas), (2) two forced aligners based on Hidden Markov Models, namely the Munich AUtomatic Segmentation System (WebMAUS) and the Montreal Forced Aligner (MFA), and (3) the Connectionist Temporal Classification (CTC) segmentation algorithm with two pre-trained and fine-tuned Swedish Wav2Vec2 models. First, small speech test sets for Norwegian and Swedish, covering different degrees of spontaneity in the speech, were created and manually aligned to produce gold-standard alignments. Second, the performance of the aligners on the Swedish dataset was evaluated against the gold standard. Finally, it was tested whether Swedish forced aligners could be applied to aligning Norwegian data. The performance of the aligners was assessed by measuring the difference between the boundaries set in the gold standard and those of the alignment under comparison.
The accuracy was estimated by calculating the proportion of alignments whose boundary difference fell below a particular threshold proposed in the literature. The CTC segmentation algorithm with Wav2Vec2 (VoxRex) was found to outperform the other forced alignment systems. The differences between the alignments of the two Wav2Vec2 models suggest that the training data may have a larger influence on the alignments than the architecture of the algorithm. At lower thresholds, the traditional HMM approach outperformed the deep learning models. Finally, the thesis demonstrated promising results for cross-language forced alignment, using Swedish models to align related languages such as Norwegian.
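The evaluation described above, the proportion of boundaries that deviate from the gold standard by less than a threshold, can be sketched as follows. This is a hedged illustration rather than the thesis's own code: the function name `boundary_accuracy`, the example timestamps, and the 0.02 s threshold are all assumptions made for the example, and the sketch assumes the two alignments contain the same boundaries in the same order.

```python
from typing import Sequence


def boundary_accuracy(
    gold: Sequence[float],
    predicted: Sequence[float],
    threshold: float,
) -> float:
    """Share of predicted boundaries within `threshold` seconds of the gold ones.

    Assumes gold and predicted list the same boundaries in the same order
    (an assumption of this sketch).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one boundary is required")
    within = sum(
        1 for g, p in zip(gold, predicted) if abs(g - p) < threshold
    )
    return within / len(gold)


# Hypothetical boundary times in seconds, for illustration only.
gold_boundaries = [0.10, 0.50, 1.00, 1.40]
aligner_boundaries = [0.11, 0.55, 1.00, 1.43]

# Two of the four boundaries deviate by less than 0.02 s.
print(boundary_accuracy(gold_boundaries, aligner_boundaries, threshold=0.02))  # 0.5
```

Computing the same score at several thresholds gives a tolerance curve, which is one way to observe that one aligner can lead at tight thresholds while another leads at looser ones.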
150

Signal Processing Methods for Reliable Extraction of Neural Responses in Developmental EEG

Kumaravel, Velu Prabhakar 27 February 2023
Studying newborns in the first days of life, before they have experienced the world, provides remarkable insights into the neurocognitive predispositions that humans are endowed with. First, it helps us to improve our current knowledge of the development of a typical brain. Second, it potentially opens new pathways for earlier diagnosis of several developmental neurocognitive disorders such as Autism Spectrum Disorder (ASD). While most studies investigating early cognition in the literature are purely behavioural, there has recently been an increasing number of neuroimaging studies in newborns and infants. Electroencephalography (EEG) is one of the most suitable neuroimaging techniques for investigating neurocognitive functions in human newborns because it is non-invasive and quick and easy to mount on the head. Since EEG offers a versatile design with a custom number of channels/electrodes, an ergonomic wearable solution could help study newborns outside clinical settings, for example in their homes. Newborn EEG data differ from adult EEG data in two main respects: 1) in experimental designs investigating stimulus-related neural responses, the collected data are extremely short because of the reduced attention span of newborns; 2) the data are heavily contaminated with noise caused by uncontrollable movement artifacts. Since EEG processing methods for adults are not adapted to very short data lengths and usually deal with well-defined, stereotyped artifacts, they are unsuitable for newborn EEG. As a result, researchers clean the data manually, which is a subjective and time-consuming task. This thesis is specifically dedicated to developing novel (semi-)automated signal processing methods for noise removal and for extracting reliable neural responses specific to this population. Solutions are proposed both for high-density EEG in traditional lab-based research and for wearable EEG in clinical applications.
To this end, the thesis first presents novel signal processing methods applied to newborn EEG: 1) Local Outlier Factor (LOF) for detecting and removing bad/noisy channels; 2) Artifact Subspace Reconstruction (ASR) for detecting and removing or correcting bad/noisy segments. Then, based on these algorithms and other preprocessing functionalities, a robust preprocessing pipeline, Newborn EEG Artifact Removal (NEAR), is proposed. Notably, this is the first time LOF has been explored for EEG bad channel detection, despite being a popular outlier detection technique for other kinds of data such as the Electrocardiogram (ECG). Although ASR is already an established artifact removal algorithm, originally developed for mobile adult EEG, this thesis explores the possibility of adapting ASR to short newborn EEG data, which is the first work of its kind. NEAR is validated on simulated, real newborn, and infant EEG datasets. We used the SEREEGA toolbox to simulate neurologically plausible synthetic data and contaminated a certain number of channels and segments with artifacts commonly manifested in developmental EEG. We used newborn EEG data (n = 10, age range: between 1 and 4 days) recorded in our lab with a frequency-tagging paradigm. The chosen paradigm uses visual stimuli to investigate the cortical bases of facelike pattern processing, and the results were published in 2019. To test NEAR's performance on an older population, with an event-related potential (ERP) design and with data recorded in another lab, we also evaluated NEAR on EEG data recorded from 9-month-old infants (n = 14) with an ERP paradigm. The experimental paradigm for these datasets uses auditory stimuli to investigate the electrophysiological evidence for understanding maternal speech, and the results were published in 2012. Since the authors of these independent studies employed manual artifact removal, the neural responses they obtained serve as ground truth for validating NEAR's artifact removal performance.
For comparative evaluation, we considered the performance of two state-of-the-art pipelines designed for older infants. Results show that NEAR is successful in recovering the neural responses (specific to the EEG paradigm and the stimuli) compared to the other pipelines. In sum, this thesis presents a set of methods for artifact removal and extraction of stimulus-related neural responses specifically adapted to newborn and infant EEG data that will hopefully contribute to strengthening the reliability and reproducibility of developmental cognitive neuroscience studies, both in research laboratories and in clinical applications.
