21

Long-range discrimination of individual vocal signatures by a songbird: from propagation constraints to neural substrate

Mouterde, Solveig 24 June 2014 (has links)
In communication systems, one of the biggest challenges is that the information encoded by the emitter is always modified before reaching the receiver, who has to process this altered information in order to recover the intended message. In acoustic communication particularly, the transmission of sound through the environment is a major source of signal degradation, caused by attenuation, absorption, and reflections, all of which decrease the signal level relative to the background noise. How animals deal with the need to exchange information in spite of these constraining conditions has been the subject of many studies, focused at either the emitter's or the receiver's level. However, a more integrated approach to auditory scene analysis has seldom been taken, and it is needed to address this process in its full complexity.
The goal of my research was to use a transversal approach to study how birds adapt to the constraints of long-distance communication, by investigating information coding at the emitter's level, the propagation-induced degradation of the acoustic signal, and the discrimination of this degraded information by the receiver at both the behavioral and neural levels. Taking into account the everyday issues faced by animals in their natural environment, and using stimuli and paradigms that reflected the behavioral relevance of these challenges, was the cornerstone of my approach. Focusing on the information about individual identity in the distance calls of zebra finches (Taeniopygia guttata), I investigated how the individual vocal signature is encoded, degraded, and finally discriminated, from the emitter to the receiver. This study shows that the individual signature of zebra finches is very resistant to propagation-induced degradation, and that the most individualized acoustic parameters vary depending on distance. Testing female birds in operant conditioning experiments, I showed that they are expert at discriminating between the degraded vocal signatures of two males, and that they improve substantially when they can train over increasing distances. Finally, I showed that this impressive discrimination ability also exists at the neural level: we found a population of neurons in the avian auditory forebrain that discriminates individual voices at various degrees of propagation-induced degradation, without prior familiarization or training. The finding of such high-level auditory processing in the primary auditory cortex opens a new range of investigations at the interface of neural processing and behavior.
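To give a rough sense of the propagation constraints at stake (a first-order textbook approximation, not the thesis's propagation model): under spherical spreading a signal loses about 6 dB per doubling of distance, on top of excess attenuation from absorption and scattering. A minimal sketch, with an assumed source level and an assumed excess-attenuation coefficient:

```python
import math

def received_level_db(source_level_db: float, distance_m: float,
                      ref_distance_m: float = 1.0,
                      excess_attenuation_db_per_m: float = 0.01) -> float:
    """Approximate received level: spherical spreading (-6 dB per doubling
    of distance) plus a linear excess-attenuation term standing in for
    atmospheric absorption and scattering. Both constants are illustrative."""
    spreading_loss = 20.0 * math.log10(distance_m / ref_distance_m)
    return source_level_db - spreading_loss - excess_attenuation_db_per_m * distance_m

# A call emitted at 70 dB SPL (re 1 m), heard at increasing distances:
for d in (2, 64, 256):
    print(f"{d:>4} m: {received_level_db(70.0, d):5.1f} dB SPL")
```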
22

Streaming audio classification for smart home environments

溫景堯, Wen, Jing Yao Unknown Date (has links)
Humans receive sounds such as language and music through audition; together with vision, it is among the most important aspects of human perception. Computational auditory scene analysis (CASA) defines a possible direction for closing the gap between computer audition and human perception, based on the correlations between properties of the ear and mental perception established in the psychology of hearing. In this research, we develop and integrate methods for real-time streaming audio classification in smart home environments, based on the principles of the psychology of hearing as well as techniques from image processing and pattern recognition. The work has three major parts. The first is audio processing, which converts environmental sounds into signals the computer can process and enhance. The second uses CASA principles to design a framework for audio signal enhancement, segmentation, and event description by means of computer vision and image processing techniques, describing audio events with image features. The third defines a distance on the image feature vectors and uses a K-Nearest Neighbor (KNN) classifier to accomplish real-time recognition and classification of audio events common in smart home environments. Experimental results show that the proposed approach is effective, achieving recognition rates of 80-90% for eight types of sounds common in home environments; under noise and other acoustic interference, recognition remains around 70%.
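As an illustration of the classification stage described above, here is a minimal sketch of KNN-based audio event classification in Python. The feature extractor is a crude stand-in for the thesis's CASA-inspired image descriptors, and all names and parameters are hypothetical:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def spectral_features(clip: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Crude per-clip descriptor: spectral centroid, bandwidth, and eight
    coarse log-energy bands (a stand-in for image-based descriptors)."""
    spec = np.abs(np.fft.rfft(clip))
    freqs = np.fft.rfftfreq(len(clip), d=1.0 / sr)
    power = spec ** 2 + 1e-12
    centroid = np.sum(freqs * power) / np.sum(power)
    bandwidth = np.sqrt(np.sum(((freqs - centroid) ** 2) * power) / np.sum(power))
    band_energy = np.array([b.mean() for b in np.array_split(np.log(power), 8)])
    return np.concatenate([[centroid, bandwidth], band_energy])

def fit_knn(train_clips, train_labels, k: int = 5) -> KNeighborsClassifier:
    # train_clips: list of 1-D waveforms; train_labels: e.g. "doorbell", "glass_break"
    X = np.stack([spectral_features(c) for c in train_clips])
    return KNeighborsClassifier(n_neighbors=k).fit(X, train_labels)

def classify(model: KNeighborsClassifier, clip: np.ndarray) -> str:
    return model.predict(spectral_features(clip)[None, :])[0]
```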
23

Investigation of noise in hospital emergency departments

Mahapatra, Arun Kiran 08 November 2011 (has links)
The hospital sound environment is complex. Emergency Departments (EDs), in particular, have proven to be hectic work environments populated with diverse sound sources. Medical equipment, alarms, and communication events generate noise that can interfere with staff concentration and communication. In this study, sound measurements and analyses were conducted in a total of six hospitals: three civilian hospitals in Atlanta, Georgia, and Dublin, Ohio, and three Washington, DC-area hospitals in the Military Health System (MHS). The equivalent, minimum, and maximum sound pressure levels were recorded over twenty-four hours in several locations in each ED, with shorter 15-30 minute measurements performed in other areas. Acoustic descriptors such as spectral content, level distributions, and speech intelligibility were examined, and hospital staff's perception of these acoustic qualities was evaluated through subjective surveys. Noise levels in both work areas and patient rooms were found to be excessive. Additionally, speech intelligibility measurements and survey results show that background noise presents a significant obstacle to effective communication between staff members and patients. Compared to previous studies, this study examines a wider range of acoustic metrics, together with the corresponding staff perceptions, to form a more precise and accurate depiction of the ED sound environment.
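For reference, the equivalent (Leq), minimum, and maximum sound pressure levels mentioned above can be computed from a calibrated pressure waveform as follows; a minimal sketch, assuming non-overlapping 1-second windows:

```python
import numpy as np

P_REF = 20e-6  # reference pressure, 20 µPa

def level_stats(pressure: np.ndarray, sr: int, window_s: float = 1.0):
    """Leq, Lmin, and Lmax from a calibrated pressure waveform (Pa).
    Leq = 10*log10(mean(p^2)/p_ref^2) over the whole record; min/max are
    taken over the per-window levels."""
    n = int(sr * window_s)
    frames = pressure[: len(pressure) // n * n].reshape(-1, n)
    frame_power = np.mean(frames ** 2, axis=1)
    spl = 10.0 * np.log10(frame_power / P_REF ** 2)
    leq = 10.0 * np.log10(np.mean(frame_power) / P_REF ** 2)
    return leq, spl.min(), spl.max()
```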
24

Acoustic Space Mapping: A Machine Learning Approach to Sound Source Separation and Localization

Deleforge, Antoine 26 November 2013 (has links)
In this thesis, we address the long-studied problem of binaural (two-microphone) sound source separation and localization through supervised learning. To achieve this, we develop a new paradigm referred to as acoustic space mapping, at the crossroads of binaural perception, robot hearing, audio signal processing, and machine learning. The proposed approach consists of learning a link between the auditory cues perceived by the system and the position of the emitting sound source in another modality of the system, such as the visual space or the motor space. We propose new experimental protocols to automatically gather large training sets that associate such data. The obtained datasets are then used to reveal some fundamental intrinsic properties of acoustic spaces, and lead to the development of a general family of probabilistic models for locally-linear high- to low-dimensional space mapping. We show that these models unify several existing regression and dimensionality-reduction techniques, while encompassing a large number of new models that generalize previous ones. The properties and inference of these models are thoroughly detailed, and the prominent advantage of the proposed methods over state-of-the-art techniques is established on different space-mapping applications, beyond the scope of auditory scene analysis.
We then show how the proposed methods can be probabilistically extended to tackle the long-known cocktail party problem, i.e., accurately localizing one or several sound sources emitting at the same time in a real-world environment, and separating the mixed signals. We show that the resulting techniques perform these tasks with unequaled accuracy. This demonstrates the important role of learning and puts forward the acoustic space mapping paradigm as a promising tool for robustly addressing the most challenging problems in computational binaural audition.
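To make the mapping idea concrete, below is a toy sketch of a locally-linear high- to low-dimensional regressor: partition the cue space, then fit one affine map per region. This is a crude stand-in for the probabilistic models developed in the thesis, not their actual formulation:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

class LocallyLinearMap:
    """Toy locally-linear mapping from high-dimensional auditory cues to
    low-dimensional source positions: k-means partitioning plus one affine
    regressor per region. Illustrative only."""

    def __init__(self, n_regions: int = 8):
        self.n_regions = n_regions

    def fit(self, cues: np.ndarray, positions: np.ndarray):
        # cues: (n_samples, cue_dim); positions: (n_samples, out_dim)
        self.kmeans = KMeans(n_clusters=self.n_regions, n_init=10).fit(cues)
        labels = self.kmeans.labels_
        self.regressors = [
            LinearRegression().fit(cues[labels == k], positions[labels == k])
            for k in range(self.n_regions)
        ]
        return self

    def predict(self, cues: np.ndarray) -> np.ndarray:
        labels = self.kmeans.predict(cues)
        out = np.empty((len(cues), self.regressors[0].coef_.shape[0]))
        for k in range(self.n_regions):
            if np.any(labels == k):
                out[labels == k] = self.regressors[k].predict(cues[labels == k])
        return out
```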
25

Toward sequential segregation of speech sounds based on spatial cues

David, Marion 13 November 2014 (has links)
In a context of competing sound sources, auditory scene analysis aims to draw an accurate and useful representation of the perceived sounds.
Solving such a scene consists of grouping the sound events that come from the same source and segregating them from the other sounds. This PhD work intended to further our understanding of how the human auditory system processes these complex acoustic environments, with a particular emphasis on the potential influence of spatial cues on perceptual stream segregation. All the studies conducted during this PhD endeavoured to rely on realistic configurations. In a real environment, the diffraction and reflection properties of the room and the head distort sounds depending on the source and receiver positions, a phenomenon named colouration. Speech-shaped noises, as a first approximation of speech sounds, were used to evaluate the effect of this colouration on stream segregation. The results showed that the slight monaural spectral differences induced by head and room colouration can induce segregation. Moreover, this segregation was enhanced by adding the binaural cues associated with a given position (ITD, ILD). In particular, a second study suggested that the monaural intensity variations across time at each ear were more relevant for stream segregation than the interaural level differences. The results also indicated that the percept of lateralization associated with a given ITD helped segregation when the lateralization was salient enough; besides, the ITD per se could also favour segregation. The natural ability to perceptually solve an auditory scene is relevant for speech intelligibility. The main idea was therefore to replicate the first experiments with speech items instead of frozen noises. A characteristic of running speech is the high degree of acoustical variability used to convey information. Thus, as a first step, we investigated the robustness of stream segregation based on a frequency difference to variability in that same acoustical cue. The second step was to evaluate the fundamental frequency difference that makes it possible to separate speech items. Indeed, given the limited effects measured in the first two experiments, it was assumed that spatial cues might be relevant for stream segregation only in interaction with another, stronger cue such as an F0 difference. The results of these preliminary experiments showed, first, that introducing large spectral variability within pure-tone streams can lead to a complicated percept, presumably consisting of multiple streams; second, that a fundamental frequency difference of between 3 and 5 semitones makes it possible to separate speech items. These results will be used to design the next experiment, which investigates how an ambiguous percept can be biased toward segregation by introducing spatial cues.
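For orientation, a pitch interval of n semitones corresponds to a frequency ratio of 2^(n/12), so the 3-5 semitone F0 difference reported above amounts to a ratio of roughly 1.19-1.33. A small sketch (the 200 Hz reference F0 is an illustrative value, not from the thesis):

```python
def semitone_ratio(n_semitones: float) -> float:
    """Frequency ratio corresponding to a pitch interval in semitones."""
    return 2.0 ** (n_semitones / 12.0)

# F0 differences of 3-5 semitones relative to an assumed 200 Hz reference:
for st in (3, 4, 5):
    r = semitone_ratio(st)
    print(f"{st} semitones -> ratio {r:.3f} ({200 * r:.0f} Hz vs 200 Hz)")
```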
26

Harmonic Sound Source Separation in Monaural Music Signals

Goel, Priyank January 2013 (has links) (PDF)
Sound source separation refers to separating sound signals according to their sources from a given observed mixture. Sounds from individual sources are more efficient to code and far easier to analyze and manipulate separately than in a mixture. This thesis deals with the problem of source separation in monaural recordings of harmonic musical instruments. A substantial body of literature is surveyed and presented, since sound source separation has been attempted by many researchers over many decades through various approaches. A prediction-driven approach is presented first, inspired by the old-plus-new heuristic that humans use in auditory scene analysis. In this approach, the signals from different sources are predicted using a general model, and these predictions are then reconciled with the observed sound to obtain the separated signals. This approach failed for real-world recordings, in which the spectra of the source signals change very dynamically. Considering this dynamic nature, an approach is proposed that uses the covariance matrix of the harmonic amplitudes. The overlapping and non-overlapping harmonics of the notes are first identified using the pitch of the notes, and notes are matched on the basis of their covariance profiles. The second-order properties of a note's overlapping harmonics are estimated from the covariance matrix of a matching note, and the full harmonics are then reconstructed using these second-order characteristics. The technique performed well on sound samples taken from the RWC Musical Instrument database.
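One plausible concrete reading of the covariance-based reconstruction step (an assumption on my part, not the thesis's stated algorithm) is Gaussian conditioning: estimate the amplitudes of the overlapped harmonics from the non-overlapped ones, using the mean and covariance of the matching note's harmonic amplitudes:

```python
import numpy as np

def infer_overlapped(amps_obs: np.ndarray, obs_idx: np.ndarray,
                     hid_idx: np.ndarray, mean: np.ndarray,
                     cov: np.ndarray) -> np.ndarray:
    """Conditional mean E[hidden | observed] of a multivariate Gaussian:
    amps_obs are the non-overlapped harmonic amplitudes; mean/cov come
    from a matching clean note. Indices select harmonics by position."""
    S_oo = cov[np.ix_(obs_idx, obs_idx)]   # observed-observed block
    S_ho = cov[np.ix_(hid_idx, obs_idx)]   # hidden-observed block
    resid = amps_obs - mean[obs_idx]
    return mean[hid_idx] + S_ho @ np.linalg.solve(S_oo, resid)
```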
27

Auditory foreground and background decomposition: New perspectives gained through methodological diversification

Thomaßen, Sabine 11 April 2022 (has links)
A natural auditory scene contains many sound sources, each of which produces complex sounds. These sounds overlap, reach our ears at the same time, and change constantly. To still be able to follow the sound source of interest, the auditory system must decide where each individual tone belongs and integrate this information over time. For well-controlled investigations of the mechanisms behind this challenging task, sound sources need to be simulated in the lab. This is mostly done with sine tones arranged in certain spectrotemporal patterns: the vast majority of studies simply interleave two sub-sequences of sine tones. Participants report how they perceive these sequences, or they perform a task whose performance measure provides hints about how the scene was perceived. While many important insights have been gained with this procedure, the questions it can address are limited, and the commonly used response methods are partly susceptible to distortions or are only indirect measures. The present thesis enlarged both the complexity of the tone sequences and the diversity of perceptual measures used in investigations of auditory scene analysis, changes intended to open up new questions and give new perspectives on our knowledge about auditory scene analysis. In detail, the thesis established three-tone sequences as a tool for specific investigations of perceptual foreground and background processing in complex auditory scenes. In addition, it modified an already established approach for indirect measures of auditory perception in a way that enables detailed and unambiguous investigations of background processing. Finally, a new response method was developed: a no-report method for auditory perception, using eye movements as a measurement tool, that might also serve to validate subjective report measures. With the aid of these methodological improvements, the thesis shows that auditory foreground formation is more complex than previously assumed, since listeners hold more than one auditory source in the foreground without being forced to do so. It also shows that the auditory system prefers a limited number of specific source configurations, probably to avoid combinatorial explosion. Finally, the thesis indicates that the formation of the perceptual background is also quite complex, since the auditory system holds in parallel perceptual organization alternatives that had been assumed to be mutually exclusive. Thus, both the foreground and the background follow different rules than expected on the basis of two-tone sequences. One finding, however, seems to hold for both kinds of sequences: the impact of the tone pattern on subjective perception is marginal, be it in two- or three-tone sequences.
Regarding the no-report method for auditory perception, the thesis shows that eye movements and the reported auditory foreground formations were in good agreement, and it seems that this approach indeed has the potential to become a first no-report measure for auditory perception.
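To illustrate the stimuli involved, here is a minimal sketch of a repeating three-tone (ABC) sine sequence of the kind described above; the frequencies, durations, and gaps are illustrative choices, not the thesis's parameters:

```python
import numpy as np

def tone(freq_hz: float, dur_s: float, sr: int = 44100) -> np.ndarray:
    """Sine tone with 5-ms on/off ramps to avoid spectral splatter."""
    t = np.arange(int(sr * dur_s)) / sr
    y = np.sin(2 * np.pi * freq_hz * t)
    ramp = np.linspace(0.0, 1.0, int(0.005 * sr))
    y[: len(ramp)] *= ramp
    y[-len(ramp):] *= ramp[::-1]
    return y

def triplet_sequence(freqs_hz=(400.0, 504.0, 635.0), tone_s=0.1,
                     gap_s=0.02, n_triplets=20, sr=44100) -> np.ndarray:
    """Repeating ABC triplets of sine tones separated by silent gaps."""
    gap = np.zeros(int(sr * gap_s))
    triplet = np.concatenate([np.concatenate([tone(f, tone_s, sr), gap])
                              for f in freqs_hz])
    return np.tile(triplet, n_triplets)
```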
28

Deep CASA for Robust Pitch Tracking and Speaker Separation

Liu, Yuzhou January 2019 (has links)
No description available.
29

Deep learning methods for reverberant and noisy speech enhancement

Zhao, Yan 15 September 2020 (has links)
No description available.
30

Supervised Speech Separation Using Deep Neural Networks

Wang, Yuxuan 21 May 2015 (has links)
No description available.
