31 |
Monaural Speech Segregation in Reverberant Environments
Jin, Zhaozhang, 27 September 2010
No description available.
|
32 |
A biologically inspired approach to the cocktail party problem
Chou, Kenny F., 19 May 2020
At a cocktail party, one can choose to scan the room for conversations of interest, attend to a specific conversation partner, switch between conversation partners, or not attend to anything at all. The ability of the normal-functioning auditory system to flexibly listen in complex acoustic scenes plays a central role in solving the cocktail party problem (CPP). In contrast, certain demographics (e.g., individuals with hearing impairment or older adults) are unable to solve the CPP, leading to psychological ailments and reduced quality of life. Since the normal auditory system still outperforms machines in solving the CPP, an effective solution may be found by mimicking the normal-functioning auditory system.
Spatial hearing likely plays an important role in CPP-processing in the auditory system. This thesis details the development of a biologically based approach to the CPP by modeling specific neural mechanisms underlying spatial tuning in the auditory cortex. First, we modeled bottom-up, stimulus-driven mechanisms using a multi-layer network model of the auditory system. To convert spike trains from the model output into audible waveforms, we designed a novel reconstruction method based on the estimation of time-frequency masks. We showed that our reconstruction method produced sounds with significantly higher intelligibility and quality than previous reconstruction methods. We also evaluated the algorithm's performance using a psychoacoustic study, and found that it provided the same amount of benefit to normal-hearing listeners as a current state-of-the-art acoustic beamforming algorithm.
Finally, we modeled top-down, attention-driven mechanisms that allowed the network to operate flexibly in different regimes, e.g., monitor the acoustic scene, attend to a specific target, and switch between attended targets. The model explains previous experimental observations and proposes candidate neural mechanisms underlying flexible listening in cocktail-party scenarios. The strategies proposed here could benefit hearing-assistive devices for CPP processing (e.g., hearing aids), whose users would gain from switching between different modes of listening in different social situations.
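As an illustration of the time-frequency mask-based reconstruction mentioned in this abstract, here is a minimal sketch in Python (NumPy/SciPy): an ideal ratio mask is applied to a noisy mixture in the STFT domain and inverted back to a waveform. The synthetic signals and the mask computed from a known target are assumptions made to keep the example self-contained; in the thesis the mask is estimated from the network's spike-train output, which is not modeled here.

```python
# Sketch only: ideal-ratio-mask reconstruction in the STFT domain.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(fs) / fs                              # one second of audio
target = 0.5 * np.sin(2 * np.pi * 440 * t)          # stand-in "speech" target
noise = 0.3 * np.random.randn(fs)                   # interfering sound
mixture = target + noise

nperseg = 512
_, _, T = stft(target, fs, nperseg=nperseg)
_, _, M = stft(mixture, fs, nperseg=nperseg)

# Ideal ratio mask: fraction of each T-F unit's energy belonging to the target.
mask = np.clip(np.abs(T) ** 2 / (np.abs(M) ** 2 + 1e-12), 0.0, 1.0)

# Apply the mask to the mixture spectrogram and invert back to a waveform.
_, enhanced = istft(mask * M, fs, nperseg=nperseg)

n = min(len(enhanced), len(target))
snr_in = 10 * np.log10(np.sum(target**2) / np.sum(noise**2))
snr_out = 10 * np.log10(np.sum(target[:n]**2) / np.sum((enhanced[:n] - target[:n]) ** 2))
print(f"SNR before masking: {snr_in:.1f} dB, after masking: {snr_out:.1f} dB")
```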
|
33 |
Long-range discrimination of individual vocal signatures by a songbird: from propagation constraints to neural substrate
Mouterde, Solveig, 24 June 2014
In communication systems, one of the biggest challenges is that the information encoded by the emitter is always modified before reaching the receiver, who has to process this altered information in order to recover the intended message. In acoustic communication in particular, the transmission of sound through the environment is a major source of signal degradation, caused by attenuation, absorption, and reflections, all of which decrease the level of the signal relative to the background noise. How animals cope with the need to exchange information in spite of these constraining conditions has been the subject of many studies, focusing either on the emitter or on the receiver. However, a more integrated approach to auditory scene analysis has seldom been taken, and is needed to address the complexity of this process.
The goal of my research was to use a transversal approach to study how birds adapt to the constraints of long-distance communication, by investigating information coding at the emitter's level, the propagation-induced degradation of the acoustic signal, and the discrimination of this degraded information by the receiver at both the behavioral and neural levels. Taking into account the everyday issues faced by animals in their natural environment, and using stimuli and paradigms that reflect the behavioral relevance of these challenges, was the cornerstone of my approach. Focusing on the individual identity information carried by the distance calls of zebra finches (Taeniopygia guttata), I investigated how the individual vocal signature is encoded, degraded, and finally discriminated, from the emitter to the receiver. This study shows that the individual signature of zebra finches is very resistant to propagation-induced degradation, and that the most individualized acoustic parameters vary depending on distance. Testing female birds in operant conditioning experiments, I showed that they are experts at discriminating between the degraded vocal signatures of two males, and that they improve substantially when trained over increasing distances. Finally, I showed that this impressive discrimination ability also exists at the neural level: we found a population of neurons in the avian auditory forebrain that discriminate individual voices at various degrees of propagation-induced degradation, without prior familiarization or training. The finding of such high-level auditory processing in the primary auditory cortex opens a new range of investigations at the interface of neural processing and behavior.
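To make the discrimination task concrete, here is a hedged sketch (not the thesis's actual analysis) of classifying calls from two individuals by their acoustic feature vectors with a nearest-centroid rule, with propagation degradation approximated as added feature noise; all feature values are synthetic placeholders.

```python
# Sketch: discriminating two individuals' calls from acoustic feature vectors.
import numpy as np

rng = np.random.default_rng(0)
n_feat = 4                                   # e.g. spectral and temporal call parameters
male_A = rng.normal([2.1, 0.8, 5.0, 1.2], 0.3, size=(50, n_feat))
male_B = rng.normal([1.6, 1.1, 4.2, 1.5], 0.3, size=(50, n_feat))

def nearest_centroid(train_A, train_B, test):
    """Assign each test call to the closer class centroid (Euclidean distance)."""
    cA, cB = train_A.mean(axis=0), train_B.mean(axis=0)
    dA = np.linalg.norm(test - cA, axis=1)
    dB = np.linalg.norm(test - cB, axis=1)
    return np.where(dA < dB, "A", "B")

# Approximate propagation degradation as added feature noise and test robustness.
for noise in (0.0, 0.2, 0.5):
    test_A = male_A + rng.normal(0, noise, male_A.shape)
    test_B = male_B + rng.normal(0, noise, male_B.shape)
    preds = nearest_centroid(male_A, male_B, np.vstack([test_A, test_B]))
    truth = np.array(["A"] * 50 + ["B"] * 50)
    print(f"noise={noise:.1f}  accuracy={np.mean(preds == truth):.2f}")
```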
|
34 |
Orange orchard scene analysis with image segmentation and pattern recognition
Cavani, Felipe Alves, 05 November 2007
Automated systems are commonly used in industry to optimize production, and in agro-industry they are used for the same purpose. Among these systems, those that employ computer vision stand out, as it has been used for crop inspection, mechanized harvesting, vehicle and robot guidance, and other applications. In the present work, computer vision techniques were used to segment and classify elements present in images of orange orchards. A modular architecture was adopted in which the image is first segmented automatically and the resulting segments are then classified; within this architecture, the segmentation algorithm and the classifier can be replaced without compromising the flexibility of the implemented system. Experiments were carried out on a database of 658 images acquired under different illumination conditions during the period when the fruit was ripe. These experiments evaluated, in the context of the developed architecture, the JSEG segmentation algorithm, feature vectors derived from the RGB and HSV color spaces, and three classifiers: a Bayesian classifier, a naive Bayes classifier, and a multilayer perceptron. Finally, class maps were constructed, with the probability distribution functions estimated by the Figueiredo-Jain algorithm. The results show that the segmentation algorithm is adequate for the purposes of this work, that the Bayesian classifier is more practical than the multilayer perceptron classifier, and that the architecture is suitable for recognizing scenes acquired in orange orchards.
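As a rough illustration of the classification stage only (the JSEG segmentation and the Figueiredo-Jain estimation are not reproduced), the sketch below describes each image segment by its mean RGB and HSV values and labels it with a Gaussian naive Bayes classifier; the segment data and the class names (fruit, leaf, sky) are invented placeholders.

```python
# Sketch: classifying image segments by mean RGB+HSV features with naive Bayes.
import numpy as np
from matplotlib.colors import rgb_to_hsv
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)

def segment_features(rgb_pixels):
    """Mean RGB + mean HSV of one segment's pixels (values in [0, 1])."""
    mean_rgb = rgb_pixels.mean(axis=0)
    mean_hsv = rgb_to_hsv(rgb_pixels).mean(axis=0)
    return np.concatenate([mean_rgb, mean_hsv])

def fake_segment(base_rgb, n=200):
    """Synthetic stand-in for the pixels of one segmented region."""
    return np.clip(rng.normal(base_rgb, 0.05, size=(n, 3)), 0, 1)

# Hypothetical classes for an orange-orchard scene.
classes = {"fruit": [0.95, 0.55, 0.10], "leaf": [0.20, 0.45, 0.15], "sky": [0.60, 0.75, 0.95]}
X, y = [], []
for label, base in classes.items():
    for _ in range(30):
        X.append(segment_features(fake_segment(base)))
        y.append(label)

clf = GaussianNB().fit(np.array(X), np.array(y))
test = segment_features(fake_segment([0.90, 0.50, 0.12]))
print("predicted segment class:", clf.predict(test.reshape(1, -1))[0])
```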
|
36 |
Streaming audio classification for smart home environments
Wen, Jing Yao (溫景堯), Unknown Date
Humans receive sounds such as speech and music through audition, and hearing and vision are regarded as the two most important aspects of human perception. Computational auditory scene analysis (CASA) uses the relationships, established in the psychology of hearing, between properties of the ear and auditory perception to define a possible direction for bringing machine hearing closer to human perception. The goal of this research is to apply the principles of the psychology of hearing, together with image processing and pattern recognition techniques, to design the corresponding audio enhancement, segmentation, and description steps, and to achieve real-time audio classification in a smart home environment through similarity computation.
This research consists of three parts. The first is audio processing, which converts sounds from the environment into signals that the computer can process and enhance; the second applies CASA principles to design image processing that reproduces the audio processing results in the image domain and describes audio events with image features; the third defines a distance between image features and uses a K-Nearest Neighbor (KNN) classifier to recognize and classify, in real time, audio events common in smart home environments. Experimental results show that the proposed approach is quite effective, reaching a recognition rate of 80-90% for eight types of sounds common in home environments, and the recognition rate remains around 70% in the presence of noise or other interfering sounds.
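A minimal sketch of the recognition stage is given below, assuming spectrogram-derived feature vectors and a K-Nearest Neighbor classifier; the synthetic "doorbell" and "alarm" tones are placeholders, and the CASA-inspired enhancement and segmentation steps of this work are not reproduced.

```python
# Sketch: spectrogram features + KNN for audio event classification.
import numpy as np
from scipy.signal import spectrogram
from sklearn.neighbors import KNeighborsClassifier

fs = 8000
rng = np.random.default_rng(2)

def tone_event(freq, dur=0.5):
    """Synthetic audio event: a noisy tone standing in for a real recording."""
    t = np.arange(int(fs * dur)) / fs
    return np.sin(2 * np.pi * freq * t) + 0.05 * rng.standard_normal(t.size)

def features(x):
    """Mean log-energy per frequency band: a crude image-like description."""
    _, _, S = spectrogram(x, fs, nperseg=256)
    return np.log(S + 1e-10).mean(axis=1)

# Two hypothetical home-audio classes, e.g. a doorbell-like tone vs. an alarm-like tone.
X = [features(tone_event(f)) for f in [600] * 20 + [1500] * 20]
y = ["doorbell"] * 20 + ["alarm"] * 20

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([features(tone_event(620))])[0])   # expected: doorbell
```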
|
37 |
Investigation of noise in hospital emergency departments
Mahapatra, Arun Kiran, 08 November 2011
The hospital sound environment is complex. Emergency Departments (EDs), in particular, have proven to be hectic work environments populated with diverse sound sources. Medical equipment, alarms, and communication events generate noise that can interfere with staff concentration and communication. In this study, sound measurements and analyses were conducted in six hospitals total: three civilian hospitals in Atlanta, Georgia and Dublin, Ohio, as well as three Washington, DC-area hospitals in the Military Health System (MHS). The equivalent, minimum, and maximum sound pressure levels were recorded over twenty-four hours in several locations in each ED, with shorter 15-30 minute measurements performed in other areas. Acoustic descriptors, such as spectral content, level distributions, and speech intelligibility were examined. The perception of these acoustic qualities by hospital staff was also evaluated through subjective surveys. It was found that noise levels in both work areas and patient rooms were excessive. Additionally, speech intelligibility measurements and survey results show that background noise presents a significant obstacle in effective communication between staff members and patients. Compared to previous studies, this study looks at a wider range of acoustic metrics and the corresponding perceptions of staff in order to form a more precise and accurate depiction of the ED sound environment.
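For reference, the equivalent sound pressure level (Leq) reported in measurements like these is an energy average of the sampled levels rather than an arithmetic mean. The short sketch below illustrates the computation on made-up SPL readings.

```python
# Sketch: equivalent sound level (Leq) from a series of SPL samples in dB.
import numpy as np

spl_samples_db = np.array([58.0, 62.5, 71.0, 65.3, 80.2, 55.1])  # hypothetical readings

# Energy-average the levels: convert to linear power, average, convert back to dB.
leq = 10 * np.log10(np.mean(10 ** (spl_samples_db / 10)))
print(f"Lmin = {spl_samples_db.min():.1f} dB, Lmax = {spl_samples_db.max():.1f} dB, "
      f"Leq = {leq:.1f} dB")
# The loudest readings dominate: Leq is well above the arithmetic mean of the samples.
```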
|
38 |
Spatio-temporal data interpolation for dynamic scene analysis
Kim, Kihwan, 06 January 2012
Analysis and visualization of dynamic scenes is often constrained by the amount of spatio-temporal information available from the environment. In most scenarios, we have to account for incomplete information and sparse motion data, requiring us to employ interpolation and approximation methods to fill in the missing information. Scattered data interpolation and approximation techniques have been widely used for completing surfaces and images from incomplete input data. We introduce approaches for such data interpolation and approximation from limited sensors into the domain of analyzing and visualizing dynamic scenes. Data from dynamic scenes is subject to constraints due to the spatial layout of the scene and/or the configurations of the video cameras in use. Such constraints include: (1) sparsely available cameras observing the scene, (2) limited fields of view provided by the cameras in use, (3) incomplete motion at a specific moment, and (4) varying frame rates due to different exposures and resolutions.
In this thesis, we formulate these forms of incompleteness in the scene as spatio-temporal uncertainties and propose solutions for resolving them by applying scattered data approximation in the spatio-temporal domain.
The main contributions of this research are as follows. First, we provide an efficient framework to visualize large-scale dynamic scenes from distributed static videos. Second, we adapt Radial Basis Function (RBF) interpolation to the spatio-temporal domain to estimate the global motion tendency. The tendency, represented by a dense flow field, is used to optimally pan and tilt a video camera. Third, we propose a method to represent motion trajectories using stochastic vector fields. Gaussian Process Regression (GPR) is used to generate a dense vector field together with the certainty of each vector in the field. The generated stochastic fields are used for recognizing motion patterns under varying frame rates and incomplete input videos. Fourth, we show that the stochastic vector-field representation can also be used to model the global tendency and detect regions of interest in dynamic scenes with camera motion. We evaluate and demonstrate our approaches in several applications for visualizing virtual cities, automating sports broadcasting, and recognizing traffic patterns in surveillance videos.
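As an illustration of the RBF interpolation step, the sketch below turns a handful of invented 2-D motion vectors into a dense flow field using SciPy's RBFInterpolator (available in SciPy 1.7 and later); it is a generic example of the technique, not the thesis's implementation.

```python
# Sketch: interpolating sparse motion vectors into a dense flow field with RBFs.
import numpy as np
from scipy.interpolate import RBFInterpolator

# Sparse observations: (x, y) positions and the motion vector measured there.
points = np.array([[10, 10], [80, 15], [20, 70], [90, 85], [50, 50]], dtype=float)
vectors = np.array([[1.0, 0.2], [0.8, 0.5], [0.2, 1.0], [-0.3, 0.9], [0.5, 0.6]])

rbf = RBFInterpolator(points, vectors, kernel="thin_plate_spline")

# Evaluate on a coarse grid covering a 100 x 100 image.
gx, gy = np.meshgrid(np.arange(0, 100, 10), np.arange(0, 100, 10))
grid = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
dense_flow = rbf(grid).reshape(gx.shape + (2,))

# The dominant direction of the interpolated field could then drive pan/tilt control.
print("mean flow direction:", dense_flow.reshape(-1, 2).mean(axis=0))
```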
|
39 |
Multilayer background modeling under occlusions for spatio-temporal scene analysis
Azmat, Shoaib, 21 September 2015
This dissertation presents an efficient multilayer background modeling approach to distinguish among midground objects: objects whose presence spans time scales between the extremes of short-term ephemeral appearances (foreground) and long-term stationary persistence (background). Traditional background modeling separates a given scene into foreground and background regions. However, the real world can be much more complex than this simple classification, and object appearance events often occur over varying time scales. There are situations in which objects appear in the scene at different points in time and become stationary; these objects can occlude one another, change positions, or be removed from the scene. Inability to handle such scenarios involving midground objects results in errors such as ghost objects, missed detection of occluding objects, aliasing caused by objects that have left the scene but are not removed from the model, and spurious detection of new objects when existing objects are displaced. Modeling temporal layers of multiple objects allows us to overcome these errors and enables the surveillance and summarization of scenes containing multiple midground objects.
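A conceptual sketch of the foreground/midground/background distinction (not the dissertation's algorithm) is given below: each pixel is labeled by how long its current appearance has persisted, with arbitrary persistence thresholds chosen for illustration.

```python
# Sketch: labeling pixels by the persistence of their current appearance.
import numpy as np

class MultilayerSketch:
    """Toy per-pixel model: short-lived changes are foreground, persistent
    changes become midground, and long-stable pixels form the background."""
    def __init__(self, shape, mid_frames=5, bg_frames=50, tol=10.0):
        self.model = np.zeros(shape)               # current appearance estimate
        self.age = np.zeros(shape, dtype=int)      # frames that appearance has lasted
        self.mid, self.bg, self.tol = mid_frames, bg_frames, tol

    def update(self, frame):
        match = np.abs(frame - self.model) < self.tol
        self.age = np.where(match, self.age + 1, 0)          # reset age on a change
        self.model = np.where(match, self.model, frame)      # adopt the new appearance
        labels = np.full(frame.shape, "foreground", dtype=object)
        labels[self.age >= self.mid] = "midground"
        labels[self.age >= self.bg] = "background"
        return labels

# A bright box "appears" at frame 10 and stays: it is foreground at first,
# then is promoted to a midground layer once it has persisted long enough.
model = MultilayerSketch((4, 4))
for t in range(30):
    frame = np.zeros((4, 4))
    if t >= 10:
        frame[1:3, 1:3] = 200.0
    labels = model.update(frame)
print(labels[2, 2])    # -> midground
```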
|
40 |
Multimodal fusions for human detection by a mobile robot
Labourey, Quentin, 19 May 2017
In this work, we consider the case of a mobile indoor robot whose objective is to detect the humans present in its environment and to position itself with respect to them, in order to better perceive their state. To do so, the robot is equipped with various sensors (an RGB-Depth camera, microphones, and a laser rangefinder). This thesis contains contributions of various natures:
Classification of sound events in indoor environments: The proposed classification method relies on a small taxonomy intended to distinguish markers of human presence. Belief functions are used to account for the uncertainty of the classification and to label a sound as "unknown".
Audiovisual fusion for tracking successive speakers in a conversation: A speaker detection method is proposed for the case of a stationary robot witnessing a social interaction, based on probabilistic audiovisual fusion. The method was tested on videos acquired by the robot.
Navigation dedicated to human detection using multimodal fusion: Using information from heterogeneous sensors, the robot autonomously searches for humans in a known environment. The information is fused into a multimodal perception grid, which allows the robot to choose its next destination by means of a state machine based on priority levels of the perceived information. This system was implemented and tested on a Q.bo robot.
Credibilist modeling of the environment for navigation: The construction of the multimodal perception grid is improved with a fusion mechanism based on the theory of belief functions.
This enables the robot to maintain an evidential grid over time, containing the perceived information together with its uncertainty. This system was first evaluated in simulation and then on a Q.bo robot.
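As an illustration of the kind of belief-function fusion used in an evidential grid, the sketch below combines two hypothetical mass functions for a single cell with Dempster's rule of combination over the frame {human, free}; the sensor masses are invented and the full grid and robot pipeline are not reproduced.

```python
# Sketch: Dempster's rule of combination for one evidential grid cell.
from itertools import product

def dempster(m1, m2):
    """Combine two mass functions defined on subsets of {'H', 'F'} (frozensets)."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb               # mass assigned to contradictory subsets
    # Normalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

H, F, HF = frozenset("H"), frozenset("F"), frozenset("HF")

# Hypothetical evidence for one grid cell: camera-based detection vs. laser scan.
camera = {H: 0.6, F: 0.1, HF: 0.3}   # fairly confident a human occupies the cell
laser  = {H: 0.3, F: 0.2, HF: 0.5}   # mostly uncertain

fused = dempster(camera, laser)
for subset, mass in fused.items():
    print("".join(sorted(subset)), round(mass, 3))
```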
|