1. Perceptual Binaural Speech Enhancement in Noisy Environments. Dong, Rong, 02 1900.
Speech enhancement in multi-speaker babble remains an enormous challenge. In this study, we developed a binaural speech enhancement system to extract information pertaining to a target speech signal embedded in a noisy background, for use in future hearing-aid systems. The principle underlying the proposed system is to simulate the perceptual auditory segregation process carried out in the normal human auditory system. Based on spatial location, pitch, and onset cues, the system identifies and enhances the time-frequency regions that constitute the target speech.
The proposed system is capable of dealing with a wide variety of noise intrusions, including competing speech signals and multi-speaker babble. It also works under mild reverberation conditions. Systematic evaluation shows that the system achieves a substantial improvement in the intelligibility of the target signal while largely suppressing the unwanted background signal. / Thesis / Master of Applied Science (MASc)
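The time-frequency selection described above can be sketched as an ideal-binary-mask operation. This is a minimal illustration, not the thesis's actual algorithm: it assumes separate target and noise magnitude spectrograms are available (in the real system, the mask would be estimated from spatial, pitch, and onset cues), and all names are hypothetical.

```python
import math

def binary_mask_enhance(target_mag, noise_mag, mixture, snr_threshold_db=0.0):
    """Per time-frequency bin: keep the mixture value only where the local
    target-to-noise ratio exceeds the threshold (in dB), else zero it out.
    Inputs are equally sized 2-D lists indexed [frame][frequency_bin]."""
    eps = 1e-12  # avoid log(0) on silent bins
    enhanced, mask = [], []
    for t_row, n_row, m_row in zip(target_mag, noise_mag, mixture):
        mask_row, out_row = [], []
        for t, n, m in zip(t_row, n_row, m_row):
            local_snr_db = 20.0 * math.log10((t + eps) / (n + eps))
            keep = 1.0 if local_snr_db > snr_threshold_db else 0.0
            mask_row.append(keep)
            out_row.append(keep * m)
        mask.append(mask_row)
        enhanced.append(out_row)
    return enhanced, mask
```

Bins dominated by the target pass through unchanged; bins dominated by noise are suppressed, which is the basic mechanism behind the intelligibility gains such masking systems report.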

2. Robust speech features for speech recognition in hostile environments. Toh, Aik, January 1900.
Speech recognition systems have improved in robustness in recent years with respect to both speaker and acoustical variability. Nevertheless, it is still a challenge to deploy speech recognition systems in real-world applications that are exposed to diverse and significant levels of noise. Robustness and recognition accuracy are the essential criteria in determining the extent to which a speech recognition system can be deployed in real-world applications, and they are the central concerns of this research. This work develops techniques and extensions to extract robust features from speech and achieve substantial speech recognition performance. The robustness issue is approached through front-end processing, in particular robust feature extraction. The author proposes a unified framework for robust features and presents a comprehensive evaluation of robustness in speech features. The framework addresses three distinct approaches: robust feature extraction, temporal information inclusion, and normalization strategies. The author discusses the issue of robust feature selection primarily in the spectral and cepstral context. Several enhancements and extensions are explored for the purpose of robustness, including a computationally efficient approach proposed for moment normalization. In addition, a simple back-end approach is incorporated to improve recognition performance in reverberant environments. Speech features in this work are evaluated in three distinct environments that occur in real-world scenarios. The thesis also discusses the effect of noise on speech features and their parameters. The author has established that statistical properties play an important role in mismatches. The significance of the research is strengthened by the evaluation of robust approaches in more than one scenario and by comparison with the performance of state-of-the-art features.
The contributions and limitations of each robust feature in all three environments are highlighted. The novelty of the work lies in the diverse hostile environments in which speech features are evaluated for robustness. The author has obtained recognition accuracy of more than 98.5% for channel distortion. Recognition accuracy greater than 90.0% has also been maintained for a reverberation time of 0.4 s and for additive babble noise at an SNR of 10 dB. The thesis delivers comprehensive research on robust speech features for speech recognition in hostile environments, supported by significant experimental results. Several observations, recommendations, and relevant issues associated with robust speech features are presented.
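One widely used normalization strategy of the kind surveyed in this work is cepstral mean and variance normalization (CMVN), which removes per-utterance first- and second-moment mismatch introduced by the channel. The sketch below illustrates the general technique only, not the author's specific moment-normalization method:

```python
import math

def cmvn(frames):
    """Cepstral mean and variance normalization over one utterance.
    frames: list of equal-length feature vectors, one per frame.
    Each coefficient is shifted to zero mean and scaled to unit variance
    across the utterance, removing stationary channel offsets and scaling."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [sum((f[d] - means[d]) ** 2 for f in frames) / n
                 for d in range(dim)]
    # Guard against constant coefficients (zero variance).
    stds = [math.sqrt(v) if v > 0 else 1.0 for v in variances]
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]
```

Because a stationary convolutive channel adds a constant offset in the cepstral domain, subtracting the utterance mean cancels it, which is one statistical-property mismatch the thesis identifies as important.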

3. En undersökning och jämförelse av två röststyrningsramverk för Android i bullriga miljöer / An examination and comparison of two speech recognition frameworks for Android in noisy environments. Sandström, Rasmus; Renngård, Jonas, January 2017.
Voice control is a technology that most people encounter or use daily. It can be used to interpret voice commands and execute tasks based on the command pronounced. According to previous studies, precision problems arise when voice control technologies are used in noisy environments. This study was conducted as an experiment in which the precision of two voice control frameworks for Android was examined. The purpose was to examine the precision of these two frameworks in order to support decision-making for an organisation that has developed an application to be used by midwives in low- and middle-income countries. Two prototypes were developed using the voice control frameworks PocketSphinx and iSpeech, and their precision was tested in three different surroundings, with sound levels of 25, 60, and 80 dB. The results show that the number of correctly registered voice commands decreases considerably depending on the sound level at which the frameworks are tested. The framework that correctly registered the most voice commands was PocketSphinx, but even it had a large margin of error.
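The evaluation procedure, counting correctly registered commands per sound level, can be sketched as follows. This is an illustrative scoring helper, not the study's actual test harness; the command strings are hypothetical.

```python
def precision_by_level(trials):
    """trials: list of (sound_level_db, expected_command, recognized_command).
    Returns {sound_level_db: fraction of commands correctly registered}."""
    correct, total = {}, {}
    for level, expected, recognized in trials:
        total[level] = total.get(level, 0) + 1
        if recognized == expected:
            correct[level] = correct.get(level, 0) + 1
    return {level: correct.get(level, 0) / total[level] for level in total}
```

Running such a tally per framework at 25, 60, and 80 dB yields the kind of per-level precision comparison the study reports for PocketSphinx and iSpeech.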

4. Analyse de l’environnement sonore pour le maintien à domicile et la reconnaissance d’activités de la vie courante, des personnes âgées / Sound analysis of the environment for healthcare and recognition of daily life activities for the elderly. Robin, Maxime, 17 April 2018.
The average age of the French and European population is increasing; this observation brings new technical and societal challenges. Older people are the most fragile and vulnerable, especially with respect to domestic accidents and falls in particular. This is why many projects to assist the elderly (technical, academic, and commercial) have emerged in recent years. This thesis work was carried out under a Cifre agreement, jointly between the company KRG Corporate and the BMBI laboratory (Biomechanics and Bioengineering) of the UTC (Université de Technologie de Compiègne). Its purpose is to propose a sensor for recognizing sounds and activities of daily life, with the aim of expanding and improving the tele-assistance system already marketed by the company. Several speech recognition and speaker recognition methods have already been proven in the field of sound recognition, including GMM (Gaussian Mixture Model), SVM-GSL (Support Vector Machine with a GMM-supervector linear kernel), and HMM (Hidden Markov Model) techniques. In the same way, we proposed to use i-vectors for sound recognition; i-vectors are used in particular in speaker recognition, and have recently revolutionized that field. We then broadened our scope and used deep learning, which currently gives very good results in classification across all domains. We first used neural networks to reinforce the i-vectors, and then as our exclusive classification system. The methods mentioned above were also tested under noisy and then real conditions. These different experiments gave us very satisfactory recognition rates, with neural networks reinforcing i-vectors, and neural networks alone, being the most accurate systems, showing a very significant improvement over the various systems derived from speech and speaker recognition.
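The GMM baseline mentioned above can be illustrated, in heavily simplified form, with a single diagonal-covariance Gaussian per sound class, classifying a feature frame by maximum log-likelihood. This is a one-component stand-in for the full GMM and i-vector systems of the thesis; the class names and feature values are hypothetical.

```python
import math

def fit_gaussian(frames):
    """Fit a diagonal Gaussian (a one-component GMM) to one class's frames."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so a degenerate class cannot break the density.
    variances = [max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-6)
                 for d in range(dim)]
    return means, variances

def log_likelihood(frame, model):
    """Diagonal-Gaussian log-density of one frame under one class model."""
    means, variances = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, means, variances))

def classify(frame, models):
    """models: {class_name: (means, variances)}; pick the max-likelihood class."""
    return max(models, key=lambda c: log_likelihood(frame, models[c]))
```

A real system would use many mixture components per class, fit by EM over cepstral features, and the i-vector and neural approaches replace this scoring stage entirely, which is where the reported accuracy gains come from.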

5. Effet d’une exposition à long-terme à un milieu bruité sur l’audiogramme et les propriétés fonctionnelles des neurones du cortex auditif primaire / Effects of long-term exposure to a noisy environment on the audiogram and functional properties of neurons in the primary auditory cortex. Occelli, Florian, 30 November 2015.
Over the last few years, studies have described alarming effects of exposure to artificial acoustic environments on the functional properties of neurons in the auditory system. The aim of this project was to determine whether very long-term exposure at a sound intensity that legislation does not recognize as causing permanent or temporary hearing loss (80 dB SPL, 8 h/day) induces changes in the audiograms and in the functional properties of neurons in the primary auditory cortex. Adult female rats (Sprague Dawley) were exposed for 3 to 18 months (depending on the group) to an acoustic environment mimicking the daily sound environments surrounding a large part of the population, whose effects have never been studied over such durations. The originality of this project lies in analyzing the effects at all levels of the auditory system, from the peripheral level (via ABRs) to the central level (cortical electrophysiology), as well as the possible consequences at the behavioral level. A new perceptual learning task was developed to assess the effects of the exposure. During aging, our data showed a decrease in behavioral performance, a gradual impairment of ABR thresholds, and impairments in several parameters of the neural responses, such as (i) response latency, (ii) response duration, (iii) the ability to detect a silence in a vocalization, (iv) the ability to follow an amplitude modulation, and (v) the reproducibility of responses to a vocalization. The main effect of exposure to the noisy environment was the appearance of a Temporary Threshold Shift (TTS) after 6 to 12 months of exposure, which completely disappeared within three weeks. Surprisingly, this long-lasting TTS had apparently no notable consequence on ABR thresholds, evoked cortical activity, or the animals' discrimination performance.
These results encourage us to be quite cautious in generalizing conclusions drawn from exposures to artificial noisy environments.