11 |
Multi-view Object Segmentation / Segmentation multi-vues d'objet
Djelouah, Abdelaziz 17 March 2015
There has been growing interest in multi-camera systems, and many works have tried to tackle computer vision problems in this particular configuration. The general objective is to propose new multi-view oriented methods instead of applying limited monocular approaches independently for each viewpoint. The work in this thesis is an attempt to gain a better understanding of the multi-view object segmentation problem and to propose an alternative approach making maximum use of the available information from different viewpoints. 
Multiple view segmentation consists of segmenting objects simultaneously in several views. Classic monocular segmentation approaches reason on a single image and do not benefit from the presence of several viewpoints. A key issue in that respect is to ensure propagation of segmentation information between views while minimizing complexity and computational cost. In this work, we first investigate the idea that examining measurements at the projections of a sparse set of 3D points is sufficient to achieve this goal. The proposed algorithm softly assigns each of these 3D samples to the scene background if it projects on the background region in at least one view, or to the foreground if it projects on a foreground region in all views. A complete probabilistic framework is proposed to estimate foreground/background color models, and the method is tested on various datasets from the state of the art. Two different extensions of the sparse 3D sampling segmentation framework are proposed in two scenarios. In the first, we show the flexibility of the sparse sampling framework by using variational inference to integrate Gaussian mixture models as appearance models. In the second scenario, we propose a study of how to incorporate depth measurements in multi-view segmentation. We present a quantitative evaluation showing that typical color-based segmentation robustness issues, due to color-space ambiguity between foreground and background, can be at least partially mitigated by using depth, and that multi-view color-depth segmentation also improves over monocular color-depth segmentation strategies. The various tests also showed the limitations of the proposed 3D sparse sampling approach, which motivated a new method based on a richer description of image regions using superpixels. This model, which expresses more subtle relationships of the problem through a graph construction linking superpixels and 3D samples, is one of the contributions of this work. 
In this new framework, time-related information is also integrated. With static views, results compete with state-of-the-art methods but are achieved with significantly fewer viewpoints. Results on videos demonstrate the benefit of segmentation propagation through geometric and temporal cues. Finally, the last part of the thesis explores the possibilities of tracking in uncalibrated multi-view scenarios. A summary of existing methods in this field is presented, in both mono-camera and multi-camera scenarios. We investigate the potential of using self-similarity matrices to describe and compare motion in the context of multi-view tracking.
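The assignment rule for 3D samples described in this abstract (background if any view sees background, foreground only if every view sees foreground) can be illustrated with a minimal Python sketch. This is not the thesis's probabilistic model: the function names, the independence assumption between views, and the 0.5 threshold are all assumptions made here for illustration.

```python
from math import prod


def foreground_probability(view_fg_probs):
    """Soft 'foreground' score for one 3D sample.

    view_fg_probs holds one value in [0, 1] per view: the (hypothetical)
    probability that the sample's projection lands on foreground there.
    """
    # Foreground needs agreement from ALL views, so the per-view terms are
    # multiplied (assumes views are independent; an assumption of this sketch).
    return prod(view_fg_probs)


def label_sample(view_fg_probs, threshold=0.5):
    """Hard label derived from the soft score; the threshold is illustrative."""
    score = foreground_probability(view_fg_probs)
    return "foreground" if score >= threshold else "background"


if __name__ == "__main__":
    # A sample seen as foreground in every view stays foreground.
    print(label_sample([0.9, 0.95, 0.9]))
    # A single view voting background is enough to reject the sample.
    print(label_sample([0.9, 0.95, 0.05]))
```

The product makes a single low per-view probability dominate, which mirrors the "background in at least one view" condition stated in the abstract.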
|
12 |
Electroencephalographic measures of auditory perception in dynamic acoustic environments
McMullan, Amanda R January 2013
We are capable of effortlessly parsing a complex scene presented to us. In order to do
this, we must segregate objects from each other and from the background. While this
process has been extensively studied in vision science, it remains relatively less
understood in auditory science. This thesis sought to characterize the neuroelectric
correlates of auditory scene analysis using electroencephalography. Chapter 2 determined
components evoked by first-order energy boundaries and second-order pitch boundaries.
Chapter 3 determined components evoked by first-order and second-order discontinuous
motion boundaries. Both of these chapters focused on analysis of event-related potential
(ERP) waveforms and time-frequency analysis. In addition, these chapters investigated
the contralateral nature of a negative ERP component. These results extend the current
knowledge of auditory scene analysis by providing a starting point for discussing and
characterizing first-order and second-order boundaries in an auditory scene. / x, 90 leaves : col. ill. ; 29 cm
|
13 |
Boundary extension in the auditory domain
Hutchison, Joanna Lynn. January 2007
Thesis (Ph.D.)--Texas Christian University, 2007. / Title from dissertation title page (viewed Jul. 27, 2007). Includes abstract. Includes bibliographical references.
|
15 |
Sequential organization in computational auditory scene analysis
Shao, Yang 21 September 2007
No description available.
|
16 |
Audio Source Separation Using Perceptual Principles for Content-Based Coding and Information Management
Melih, Kathy January 2004
The information age has brought with it a dual problem. In the first place, the ready access to mechanisms to capture and store vast amounts of data in all forms (text, audio, image and video), has resulted in a continued demand for ever more efficient means to store and transmit this data. In the second, the rapidly increasing store demands effective means to structure and access the data in an efficient and meaningful manner. In terms of audio data, the first challenge has traditionally been the realm of audio compression research that has focused on statistical, unstructured audio representations that obfuscate the inherent structure and semantic content of the underlying data. This has only served to further complicate the resolution of the second challenge resulting in access mechanisms that are either impractical to implement, too inflexible for general application or too low level for the average user. Thus, an artificial dichotomy has been created from what is in essence a dual problem. The founding motivation of this thesis is that, although the hypermedia model has been identified as the ideal, cognitively justified method for organising data, existing audio data representations and coding models provide little, if any, support for, or resemblance to, this model. It is the contention of the author that any successful attempt to create hyperaudio must resolve this schism, addressing both storage and information management issues simultaneously. In order to achieve this aim, an audio representation must be designed that provides compact data storage while, at the same time, revealing the inherent structure of the underlying data. Thus it is the aim of this thesis to present a representation designed with these factors in mind. Perhaps the most difficult hurdle in the way of achieving the aims of content-based audio coding and information management is that of auditory source separation. 
The MPEG committee has noted this requirement during the development of its MPEG-7 standard; however, the mechanics of "how" to achieve auditory source separation were left as an open research question. This same committee proposed that MPEG-7 would "support descriptors that can act as handles referring directly to the data, to allow manipulation of the multimedia material." While meta-data tags are a partial solution to this problem, these cannot allow manipulation of audio material down to the level of individual sources when several simultaneous sources exist in a recording. In order to achieve this aim, the data themselves must be encoded in a manner that allows these descriptors to be formed. Thus, content-based coding is obviously required. In the case of audio, this is impossible to achieve without effecting auditory source separation. Auditory source separation is the concern of computational auditory scene analysis (CASA). However, the findings of CASA research have traditionally been restricted to a limited domain. To date, the only real application of CASA research to what could loosely be classified as information management has been in the area of signal enhancement for automatic speech recognition systems. In these systems, a CASA front end serves as a means of separating the target speech from the background "noise". As such, the design of a CASA-based approach, as presented in this thesis, to one of the most significant challenges facing audio information management research represents a significant contribution to the field of information management. Thus, this thesis unifies research from three distinct fields in an attempt to resolve some specific and general challenges faced by all three. It describes an audio representation that is based on a sinusoidal model from which low-level auditory primitive elements are extracted. 
The use of a sinusoidal representation is somewhat contentious, with the modern trend in CASA research tending toward more complex approaches in order to resolve issues relating to co-incident partials. However, the choice of a sinusoidal representation has been validated by the demonstration of a method to resolve many of these issues. The majority of the thesis contributes several algorithms to organise the low-level primitives into low-level auditory objects that may form the basis of nodes or link anchor points in a hyperaudio structure. Finally, preliminary investigations into the representation's suitability for coding and information management tasks are outlined as directions for future research.
|
17 |
Study of ASA Algorithms
Ardam, Nagaraju January 2010
Hearing aid devices are used to help people with hearing impairment. The number of people who require hearing aid devices has possibly been constant over the years; however, the number of people who now have access to hearing aid devices is increasing rapidly. Hearing aid devices must be small, consume very little power, and be fairly accurate, even though it is normally more important to the user that the hearing aids look good (are discreet). Once a hearing aid device is prescribed to the user, she/he needs to train and adjust the device to compensate for the individual impairment. Within the framework of this project, we are researching hearing aid devices that can be trained by the hearing-impaired person her-/himself. This project is about finding a suitable noise cancellation algorithm for the hearing aid device. We consider several types of algorithms, such as microphone array signal processing, Independent Component Analysis (ICA) based on a double microphone, called Blind Source Separation (BSS), and the DRNPE algorithm. We ran these current, sophisticated, and robust algorithms in certain noise backgrounds, such as cocktail noise, street, public places, train, and babble situations, to test their efficiency. The BSS algorithm performed well in some situations and gave average results in others, whereas one microphone gave steady results in all situations. The output is good enough to listen to the targeted audio. The functionality and performance of the proposed algorithm are evaluated with different non-stationary noise backgrounds. From the performance results it can be concluded that, by using the proposed algorithm, we are able to reduce the noise to a certain level. SNR, system delay, minimum error, and audio perception are the vital parameters considered to evaluate the performance of the algorithms. Based on these parameters, an algorithm is suggested for hearing aids. / Hearing-Aid
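SNR is listed above as one of the evaluation parameters. The abstract does not say how it was computed, so the following is only a generic sketch of one common definition: the ratio of clean-signal power to residual-noise power, in decibels. The function name and the choice to treat `processed - clean` as the noise are assumptions of this example.

```python
import math


def snr_db(clean, processed):
    """Signal-to-noise ratio in dB, treating (processed - clean) as noise."""
    if not clean or len(clean) != len(processed):
        raise ValueError("signals must be non-empty and of equal length")
    # Mean power of the reference (clean) signal.
    signal_power = sum(x * x for x in clean) / len(clean)
    # Mean power of whatever differs from the reference after processing.
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, processed)) / len(clean)
    if noise_power == 0:
        return math.inf  # processed output matches the reference exactly
    if signal_power == 0:
        return -math.inf  # silent reference: any residual is pure noise
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    processed = [1.1, -0.9, 1.1, -0.9]  # constant 0.1 residual error
    print(round(snr_db(clean, processed), 2))
```

With a unit-power reference and a residual of amplitude 0.1, the power ratio is about 100, so the sketch reports roughly 20 dB.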
|
18 |
Speech-on-speech masking in a front-back dimension and analysis of binaural parameters in rooms using MLS methods
Aaronson, Neil L. January 2008
Thesis (Ph. D.)--Michigan State University. Dept. of Physics, 2008. / Title from PDF t.p. (viewed on July 22, 2009) Includes bibliographical references (p. 236-243). Also issued in print.
|
19 |
Psychophysical and Neural Correlates of Auditory Attraction and Aversion
January 2014
abstract: This study explores the psychophysical and neural processes associated with the perception of sounds as either pleasant or aversive. The underlying psychophysical theory is based on auditory scene analysis, the process through which listeners parse auditory signals into individual acoustic sources. The first experiment tests and confirms that a self-rated pleasantness continuum reliably exists for 20 various stimuli (r = .48). In addition, the pleasantness continuum correlated with the physical acoustic characteristics of consonance/dissonance (r = .78), which can facilitate auditory parsing processes. The second experiment uses an fMRI block design to test blood oxygen level dependent (BOLD) changes elicited by a subset of 5 exemplar stimuli chosen from Experiment 1 that are evenly distributed over the pleasantness continuum. Specifically, it tests and confirms that the pleasantness continuum produces systematic changes in brain activity for unpleasant acoustic stimuli beyond what occurs with pleasant auditory stimuli. Results revealed that the combination of two positively and two negatively valenced experimental sounds compared to one neutral baseline control elicited BOLD increases in the primary auditory cortex, specifically the bilateral superior temporal gyrus, and left dorsomedial prefrontal cortex; the latter being consistent with a frontal decision-making process common in identification tasks. The negatively-valenced stimuli yielded additional BOLD increases in the left insula, which typically indicates processing of visceral emotions. The positively-valenced stimuli did not yield any significant BOLD activation, consistent with consonant, harmonic stimuli being the prototypical acoustic pattern of auditory objects that is optimal for auditory scene analysis. 
Both the psychophysical findings of Experiment 1 and the neural processing findings of Experiment 2 support the view that consonance is an important dimension of sound, one that is processed in a manner that aids auditory parsing and functional representation of acoustic objects and that is a principal feature of pleasing auditory stimuli. / Dissertation/Thesis / Masters Thesis Psychology 2014
|
20 |
Natural Correlations of Spectral Envelope and their Contribution to Auditory Scene Analysis
January 2017
abstract: Auditory scene analysis (ASA) is the process through which listeners parse and organize their acoustic environment into relevant auditory objects. ASA functions by exploiting natural regularities in the structure of auditory information. The current study investigates spectral envelope and its contribution to the perception of changes in pitch and loudness. Experiment 1 constructs a perceptual continuum of twelve f0- and intensity-matched vowel phonemes (i.e. a pure timbre manipulation) and reveals spectral envelope as a primary organizational dimension. The extremes of this dimension are i (as in “bee”) and Ʌ (“bun”). Experiment 2 measures the strength of the relationship between produced f0 and the previously observed phonetic-pitch continuum at three different levels of phonemic constraint. Scat performances and, to a lesser extent, recorded interviews were found to exhibit changes in accordance with the natural regularity; specifically, f0 changes were correlated with the phoneme pitch-height continuum. The more constrained case of lyrical singing did not exhibit the natural regularity. Experiment 3 investigates participant ratings of pitch and loudness as stimuli vary in f0, intensity, and the phonetic-pitch continuum. Psychophysical functions derived from the results reveal that moving from i to Ʌ is equivalent to a .38 semitone decrease in f0 and a .75 dB decrease in intensity. Experiment 4 examines the potentially functional aspect of the pitch, loudness, and spectral envelope relationship. Detection thresholds of stimuli in which all three dimensions change congruently (f0 increase, intensity increase, Ʌ to i) or incongruently (no f0 change, intensity increase, i to Ʌ) are compared using an objective version of the method of limits. 
Congruent changes did not provide a detection benefit over incongruent changes; however, when the contribution of phoneme change was removed, congruent changes did offer a slight detection benefit, as in previous research. While this relationship does not offer a detection benefit at threshold, there is a natural regularity for humans to produce phonemes at higher f0s according to their relative position on the pitch height continuum. Likewise, humans have a bias to detect pitch and loudness changes in phoneme sweeps in accordance with the natural regularity. / Dissertation/Thesis / Doctoral Dissertation Psychology 2017
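For readers unfamiliar with the units in this abstract, the reported equivalences (a .38 semitone decrease in f0 and a .75 dB decrease in intensity) can be converted to plain ratios using the standard definitions of the semitone and the decibel. The sketch below is only a unit conversion; the function names and the 200 Hz example value are illustrative and do not come from the dissertation.

```python
def semitones_to_frequency_ratio(semitones):
    """Equal-tempered semitone shift expressed as a frequency ratio."""
    return 2.0 ** (semitones / 12.0)


def db_to_intensity_ratio(db):
    """Level change in dB expressed as an intensity (power) ratio."""
    return 10.0 ** (db / 10.0)


if __name__ == "__main__":
    f0_ratio = semitones_to_frequency_ratio(-0.38)   # about 0.978
    intensity_ratio = db_to_intensity_ratio(-0.75)   # about 0.841
    # Illustrative only: a 200 Hz fundamental would drop to roughly 195.7 Hz.
    print(round(200.0 * f0_ratio, 1), round(intensity_ratio, 3))
```

In other words, the reported shift corresponds to about a 2% lower fundamental frequency and about a 16% lower intensity.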
|