Global ETD Search

31	Level of detail for granular audio-graphic rendering : representation, implementation, and user-based evaluation
Ding, Hui 30 September 2013
Real-time simulation of complex audio-visual scenes remains challenging because the rendering processes of the two modalities are technically independent yet perceptually related. Because auditory and visual perception may depend on each other, optimizations of graphics and sound rendering such as level of detail (LOD) should be considered jointly rather than as separate issues. In both audition and vision, for instance, people have perceptual limits on the quality they can observe. Perceptually driven LOD techniques for graphics have advanced greatly over the past decades, but LOD is rarely considered in crossmodal evaluation and rendering. This thesis concentrates on the crossmodal evaluation of the perception of audio-visual LOD rendering using psychophysical methods, on the basis of which a functional and general method may eventually be applied to optimize the rendering.
The first part of the thesis is an overview of our research. We review various LOD approaches and discuss the issues involved, especially from a crossmodal perceptual perspective. We also discuss the main results on the design, rendering and applications of highly detailed interactive audio and graphical scenes from the ANR Topophonie project, within which the thesis took place. A study of psychophysical methods for evaluating audio-visual perception is also presented to provide a solid grounding in experimental design. In the second part, we study the perception of image artifacts in audio-visual LOD rendering. A series of experiments was designed to investigate how the additional audio modality affects the visual detection of artifacts produced by impostor-based LOD. The third part presents the novel extended X3D that we designed for audio-visual LOD modeling. In the fourth part, we present the design and evaluation of the refined crossmodal LOD system. Audio-visual perception of the crossmodal LOD system was evaluated through a series of psychophysical experiments. Our main contribution is a further understanding of crossmodal LOD, with some new observations, explored through perceptual experiments and analysis. The results of our work can eventually serve as empirical evidence and guidelines for a perceptually driven crossmodal LOD system.
[INFO:INFO_OH] Computer Science/Other; [INFO:INFO_OH] Informatique/Autre; Level of detail for graphics; Level of detail for sound; Crossmodal perception; Audio-visual perception; Perceptual experiments; Psychophysical methods; HCI
32	Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data / Architectures neuronales profondes pour l'apprentissage de représentation multimodales de données multimédias
Vukotic, Verdran 26 September 2017
This thesis develops deep neural architectures for analyzing textual or visual content, or a combination of the two. In general, the work exploits the ability of neural networks to learn abstract representations. The main contributions are the following: 1) Recurrent networks for spoken language understanding: different network architectures are compared on this task with respect to their ability to model the observations as well as the dependencies among the labels to be predicted. 2) Image and motion prediction: we propose an architecture that learns a representation of an image depicting a human action in order to predict how the motion evolves in a video; the originality of the proposed model lies in its ability to predict images at an arbitrary distance in a video. 3) Bidirectional multimodal encoders: the major result of the thesis is a bidirectional network that translates one modality into another, making it possible to represent several modalities jointly. The approach was studied mainly for structuring video collections, in the context of international evaluations in which it established itself as the state of the art. 4) Adversarial networks for multimodal fusion: the thesis proposes using generative adversarial architectures to learn multimodal representations while offering the possibility of visualizing the representations in image space.
/ This dissertation discusses the thesis that deep neural networks are well suited to analyzing visual, textual, and fused visual and textual content. The work evaluates the ability of deep neural networks to learn multimodal representations automatically, in either an unsupervised or a supervised manner, and makes the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared on this task with the aim of modeling both the input context and the output label dependencies. 2) Action prediction from single images: we propose an architecture that predicts human actions from a single image. The architecture is evaluated on videos, using only one frame as input. 3) Bidirectional multimodal encoders: the main contribution of the thesis is a neural architecture that translates from one modality to the other and back, and offers an improved multimodal representation space in which the initially disjoint representations can be translated and fused. This enables improved fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks on the task of video hyperlinking, where it defined the state of the art. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations; in addition to providing multimodal representations, generative adversarial networks make it possible to visualize the learned model directly in the image domain.
Autoencodeurs; Apprentissage de représentations; Deep neural networks; Embedding; Continuous representation; Multimedia; Multimodal; Computer vision; Spoken language understanding; Crossmodal; Generative adversarial networks; Autoencoders; 006.4
33	Fear Processing in Dental Phobia during Crossmodal Symptom Provocation: An fMRI Study
Hilbert, Kevin, Evens, Ricarda, Maslowski, Nina Isabel, Wittchen, Hans-Ulrich, Lüken, Ulrike 09 July 2014
While previous studies have successfully identified the core neural substrates of the animal subtype of specific phobia, only sparse and inconsistent research is available for dental phobia. These inconsistencies might partly relate to the fact that visual stimuli were typically employed. The current study aimed to investigate the influence of stimulus modality on neural fear processing in dental phobia. Thirteen dental phobics (DP) and thirteen healthy controls (HC) underwent a block-design functional magnetic resonance imaging (fMRI) symptom provocation paradigm encompassing both visual and auditory stimuli. Drill sounds and matched neutral sine tones served as auditory stimuli, and dentist scenes and matched neutral videos as visual stimuli. Group comparisons showed increased activation in the insula, anterior cingulate cortex, orbitofrontal cortex, and thalamus in DP compared to HC during auditory but not visual stimulation. In contrast, no differential autonomic reactions were observed in DP. The present results are largely comparable to brain areas identified in animal phobia, but also point towards a potential downregulation of autonomic outflow by neural fear circuits in this disorder. The findings extend our knowledge of the neural correlates of dental phobia and may help to understand the neural underpinnings of the clinical and physiological characteristics of the disorder.
Zahnbehandlungsphobie; Dentalphobie; cross-modales Symptom; fMRI-Studie; fMRT; funktionelle Magnetresonanztomographie; TU Dresden; Publikationsfonds; Dental Phobia; Crossmodal Symptom; fMRI Study; Technical University Dresden; Publication funds; ddc:610; ddc:570; rvk:XA 10000; rvk:WA 15000
34	Nature of crossmodal plasticity in the blind brain and interplay with sight restoration
Dormal, Giulia 06 1900
Thesis carried out under joint supervision (cotutelle) with the Université catholique de Louvain.
This Ph.D. work was dedicated to the study of experience-dependent brain plasticity associated with visual deprivation and restoration. In two cross-sectional studies using functional magnetic resonance imaging in a group of participants with congenital or early blindness (and in a control group of sighted participants), we attempted to characterize how the occipital cortex, typically devoted to vision, reorganizes itself to process different auditory stimuli. We demonstrate that after early visual deprivation, distinct regions of the occipital cortex display a functional preference for specific non-visual attributes, maintaining a functional specialization similar to the one that characterizes the sighted brain. These studies shed new light on the roles played by intrinsic constraints on the one hand, and by experience on the other, in shaping the modality tuning and functional tuning of the occipital cortex. On the one hand, occipital responses to auditory stimulation (crossmodal plasticity) highlight the ability of the occipital cortex to reorient its preferential tuning towards the preserved sensory modalities as a function of experience. On the other hand, the presence of specialized cognitive modules in the occipital cortex, similar to those observed in the sighted, highlights the intrinsic constraints imposed on such plasticity.
In a longitudinal single-case study, we further explored how the neuroplastic changes associated with blindness may interact with newly reacquired visual input following partial visual restoration in adulthood. We performed pre- and post-surgery measurements in a sight-recovery patient, combining behavioral, neurostructural and neurofunctional methods to jointly investigate the evolution of crossmodal reorganization and visual recovery over time. We demonstrate that the functional and structural changes found in the visually deprived occipital cortex can only partially reverse following sight restoration in adulthood. Altogether, our findings demonstrate the striking adaptability of the occipital cortex in the face of drastic changes in visual experience.
cécité; plasticité transmodale; système ventral-dorsal; restauration visuelle; blindness; crossmodal plasticity; ventral-dorsal systems; sight restoration; functional magnetic resonance imaging
35	Level of detail for granular audio-graphic rendering : representation, implementation, and user-based evaluation / Niveau de détail pour le rendu audio-graphique granulaire : la représentation, l’implémentation, l’évaluation basée sur les utilisateurs
Ding, Hui 30 September 2013
Real-time simulation of complex audio-visual scenes remains challenging because the rendering processes of the two modalities are technically independent yet perceptually related. Because auditory and visual perception may depend on each other, optimizations of graphics and sound rendering such as level of detail (LOD) should be considered jointly rather than as separate issues. In both audition and vision, for instance, people have perceptual limits on the quality they can observe. Perceptually driven LOD techniques for graphics have advanced greatly over the past decades, but LOD is rarely considered in crossmodal evaluation and rendering. This thesis concentrates on the crossmodal evaluation of the perception of audio-visual LOD rendering using psychophysical methods, on the basis of which a functional and general method may eventually be applied to optimize the rendering.
The first part of the thesis is an overview of our research. We review various LOD approaches and discuss the issues involved, especially from a crossmodal perceptual perspective. We also discuss the main results on the design, rendering and applications of highly detailed interactive audio and graphical scenes from the ANR Topophonie project, within which the thesis took place. A study of psychophysical methods for evaluating audio-visual perception is also presented to provide a solid grounding in experimental design. In the second part, we study the perception of image artifacts in audio-visual LOD rendering. A series of experiments was designed to investigate how the additional audio modality affects the visual detection of artifacts produced by impostor-based LOD. The third part presents the novel extended X3D that we designed for audio-visual LOD modeling. In the fourth part, we present the design and evaluation of the refined crossmodal LOD system. Audio-visual perception of the crossmodal LOD system was evaluated through a series of psychophysical experiments. Our main contribution is a further understanding of crossmodal LOD, with some new observations, explored through perceptual experiments and analysis. The results of our work can eventually serve as empirical evidence and guidelines for a perceptually driven crossmodal LOD system.
Niveau de détail graphique; Niveau de détail sonore; Perception crossmodale; Perception audiovisuelle; Expérience perceptuelle; Méthodes psychophysiques; IHM; Level of detail for graphics; Level of detail for sound; Crossmodal perception; Audio-visual perception; Perceptual experiments; Psychophysical methods; HCI
36	Etude des corrélats cérébraux sous-tendant les processus associatifs impliqués dans l'identification des personnes
Joassin, Frédéric 29 March 2006
Human beings are equipped with multiple sensory channels through which they apprehend the world. A fundamental requirement for our adaptation is the ability to establish relations between the different pieces of information that our senses perceive. This ability is particularly crucial in social interactions, since we must constantly integrate visual information (such as faces), auditory information (such as voices) and verbal information (such as speech or names) into a unified representation in order to identify the people we interact with. Given that the processing of the main kinds of information that allow us to identify people (faces, voices and surnames) is underpinned by the activation of specific and mutually distinct brain regions, the question is how the brain operates to create a unified representation of the people we know.
In the first part of this thesis, we review studies that have attempted to identify the brain regions involved in the processing (perception and recognition) of each type of information. The first chapter is devoted to the cerebral correlates of face processing, the second to those involved in processing proper names, and the third to those involved in processing voices. Each of these processes is approached from the perspectives of cognitive neuropsychology, functional brain imaging and electrophysiology. A fourth, theoretical chapter is devoted to the associative processes between these three types of information, and shows that few studies have directly examined the brain activity specific to retrieving associations between pieces of information relating to personal identity.
The experimental approach is addressed in the second part. The four studies described there build on the results of Campanella et al. (2001), who used PET to examine the brain regions activated by the retrieval of associations between faces and proper names. Using the subtractive method, which consists of subtracting two unimodal conditions from a bimodal condition, these authors showed activation of a network of brain areas lateralized to the left hemisphere and including, notably, the inferior parietal lobule, interpreted as a region of multimodal convergence where the different pieces of information perceived by the subjects are integrated. The subtractive method is used in all the experiments described in this section. The first study, which applies the same methodology to the electrophysiological method of event-related potentials, aims to examine the time course of the activity observed by Campanella et al. (2001). The second event-related potential study aims to examine whether the activity observed in the two aforementioned studies is specific to associative processes between faces and proper names, or whether it reflects more general processes that link any visual object to its name. Studies 3 and 4 aim to determine whether the left inferior parietal lobule is involved in integrating exclusively visual stimuli, or whether it is involved in the binding of any type of person-related information, whatever the presentation modality. In this context, study 3 examines the cerebral correlates involved in retrieving associations between faces and voices. Study 4 examines this question using functional magnetic resonance imaging.
The last part of the thesis is devoted to interpreting the results of the four experiments as a whole. Emphasis is placed on the onset latency of the waves specific to the associative conditions, which appear at the same time as the activity specific to the processing of each type of information, and on the role of the left inferior parietal gyrus in integrating the representations of the different attributes by which we identify people.
Crossmodal processes; People recognition; Faces; Voices; Processus associatifs; Processus crossmodaux; Reconnaissance des personnes; Visages; Voix; Noms propres; Potentiels évoqués; Proper names; Event-related potentials
37	Quel son spatialisé pour la vidéo 3D ? : influence d'un rendu Wave Field Synthesis sur l'expérience audio-visuelle 3D / Which spatialized sound for 3D video ? : influence of a Wave Field Synthesis rendering on 3D audio-visual experience
Moulin, Samuel 03 April 2015
The digital entertainment industry has been undergoing a major evolution with the recent spread of stereoscopic 3D video. It is now common to experience 3D at the cinema, on television, in video games, and so on. In this context, video catches most of the attention, but what about the accompanying audio rendering? Today, the most widely used sound reproduction technologies are based on lateralization effects of greater or lesser extent (stereophony, 5.1 surround systems). It is nevertheless natural to ask whether a new audio technology adapted to this new visual dimension, depth, is needed. Several alternative technologies appear able to render 3D sound environments (binaural technologies, Ambisonics, Wave Field Synthesis). Using them could potentially improve the user's quality of experience, both in the feeling of realism, through improved audio-visual spatial congruence, and in the sensation of immersion. To test this hypothesis, we set up a 3D audio-visual rendering system that couples a stereoscopic 3D visual presentation with Wave Field Synthesis sound rendering. Three research axes were then studied:
1/ Depth perception using unimodal or bimodal presentations. To what extent can the audio-visual system render the depth of visual, sound, and audio-visual objects? The experiments show that Wave Field Synthesis can render virtual sound sources perceived at different distances. Moreover, visual and audio-visual objects are localized more accurately than sound-only objects.
2/ Crossmodal integration in the depth dimension. How can the perception of congruence be guaranteed when audio-visual stimuli are spatially misaligned? The extent of the integration window was studied at different visual object distances; in other words, for a given visual stimulus position, we studied where sound objects should be placed to yield the perception of a single unified audio-visual stimulus.
3/ 3D audio-visual quality of experience. What does sound depth rendering contribute to the 3D audio-visual quality of experience? We first assessed the current quality of experience, using consumer sound systems for playing back 5.1 soundtracks (5.1 surround system, headphones, soundbar) in combination with 3D videos. We then studied the impact of sound depth rendering using the proposed audio-visual system (3D video combined with Wave Field Synthesis).
Wave Field Synthesis; Vidéo stéréoscopique; Perception de la distance; Perception audio-visuelle; Intégration multimodale; Qualité d'expérience; Wave Field Synthesis; Stereoscopic-3D video; Distance perception; Audio-visual perception; Crossmodal integration; Quality of experience; 153
39	Fear Processing in Dental Phobia during Crossmodal Symptom Provocation: An fMRI Study Hilbert, Kevin, Evens, Ricarda, Maslowski, Nina Isabel, Wittchen, Hans-Ulrich, Lüken, Ulrike 09 July 2014 While previous studies successfully identified the core neural substrates of the animal subtype of specific phobia, research on dental phobia is sparse and inconsistent. These inconsistencies might partly relate to the fact that visual stimuli were typically employed. The current study aimed to investigate the influence of stimulus modality on neural fear processing in dental phobia. Thirteen dental phobics (DP) and thirteen healthy controls (HC) completed a block-design functional magnetic resonance imaging (fMRI) symptom provocation paradigm encompassing both visual and auditory stimuli. Drill sounds and matched neutral sine tones served as auditory stimuli, and dentist scenes and matched neutral videos as visual stimuli. Group comparisons showed increased activation in the insula, anterior cingulate cortex, orbitofrontal cortex, and thalamus in DP compared to HC during auditory but not visual stimulation. In contrast, no differential autonomic reactions were observed in DP. The present results are largely comparable to the brain areas identified in animal phobia, but also point towards a potential downregulation of autonomic outflow by neural fear circuits in this disorder. The findings extend our knowledge of the neural correlates of dental phobia and may help to understand the neural underpinnings of the clinical and physiological characteristics of the disorder. info:eu-repo/classification/ddc/610 ddc:610 info:eu-repo/classification/ddc/570 ddc:570
40	Sensory Integration under Natural Conditions: a Theoretical, Physiological and Behavioral Approach Onat, Selim 02 September 2011 We can claim to understand a system in its totality only when we know how it behaves under its natural operating conditions. However, in the face of the complexity of the world, science can only evolve by simplifications, which paradoxically hide a good deal of the very mechanisms we are interested in. On the other hand, the scientific enterprise is tightly bound to advances in technology, which inevitably influence the manner in which scientific experiments are conducted. As a result, experimental conditions that would have been impossible to bring into the laboratory no more than 20 years ago are today within our reach. This thesis investigates neuronal integrative processes by using a variety of theoretical and experimental techniques whose common denominator is the approximation of ecologically relevant conditions within the laboratory. The working hypothesis of this thesis is that neurons and neuronal systems, in the sensory and higher cortices, are specifically adapted, as a result of evolutionary processes, to the sensory signals most likely to be received under ecologically relevant conditions. In order to conduct the present study along this line, we first recorded movies with the help of two microcameras carried by cats exploring a natural environment. This resulted in a database of binocular natural movies that was used in our theoretical and experimental studies. In a theoretical study, we aimed to understand the principles of binocular disparity encoding in terms of spatio-temporal statistical properties of natural movies in conjunction with simple mathematical expressions governing the activity levels of simulated neurons. 
In an unsupervised learning scheme, we used the binocular movies as input to a neuronal network and obtained receptive fields that represent these movies optimally with respect to the temporal stability criterion. Many distinctive aspects of the binocular coding in complex cells, such as the phase and position encoding of disparity and the existence of unbalanced ocular contributions, were seen to emerge as the result of this optimization process. We therefore conclude that the encoding of binocular disparity by complex cells can be understood in terms of an optimization process that regulates the activities of neurons receiving ecologically relevant information. Next, we aimed to physiologically characterize the responses of the visual cortex to ecologically relevant stimuli in their full complexity and compare these to the responses evoked by artificial, conventional laboratory stimuli. To achieve this, a state-of-the-art recording method, voltage-sensitive dye imaging, was used. This method captures spatio-temporal activity patterns within the millisecond range across large cortical portions spanning many pinwheels and orientation columns. It is therefore very well suited to provide a faithful picture of the cortical state in its full complexity. Drifting bar stimuli evoked two major sets of components, one coding for the position and the other for the orientation of the grating. Responses to natural stimuli involved more complex dynamics, which were locked to the motion present in the natural movies. In response to drifting gratings, the cortical state was initially dominated by a strong excitatory wave. This initial spatially widespread hyper-excitatory state had a detrimental effect on feature selectivity. In contrast, natural movies only rarely induced such high activity levels, and the onset of inhibition cut short a further increase in activation level. 
A 30% increase in movie contrast was estimated to be necessary in order to produce activity levels comparable to those evoked by gratings. These results show that the operating regime within which natural movies are processed differs remarkably from that of gratings. Moreover, it remains to be established to what extent the cortical state under artificial conditions represents a valid state from which to make inferences concerning operationally more relevant input. The primary visual cortex contains a dense web of neuronal connections linking distant neurons. However, the flow of information within this local network is to a large extent unknown under natural stimulation conditions. To functionally characterize these long-range intra-areal interactions, we also presented natural movies locally through either one or two apertures and analyzed the effects of the distant visual stimulation on the local activity levels. The distant patch had a net facilitatory effect on the local activity levels. Furthermore, the degree of facilitation depended on the congruency between the two simultaneously presented movie patches. Taken together, our results indicate that ecologically relevant stimuli are processed within a distinct operating regime characterized by moderate levels of excitation and/or high levels of inhibition, where facilitatory cooperative interactions form the basis of integrative processes. To gain better insight into the motion locking phenomenon and test the generalizability of the local cooperative processes toward larger scale interactions, we resorted to the unequalled temporal resolution of EEG and conducted a multimodal study. Inspired by the temporal properties of our natural movies, we designed a dynamic multimodal stimulus that was either congruent or incongruent across visual and auditory modalities. 
In the visual areas, the dynamic stimulation elicited neuronal oscillations with frequencies well above the frequency content of the stimuli, and the strength of these oscillations was coupled to the stimuli's motion profile. Furthermore, the coupling was found to be stronger when the auditory and visual streams were congruent. These results show that motion locking, which had so far been observed in cats, is a phenomenon that also exists in humans. Moreover, the presence of long-range multimodal interactions indicates that, in addition to local intra-areal mechanisms ensuring the integration of local information, the central nervous system embodies an architecture that also enables the integration of information on much larger scales spread across different modalities. Any characterization of integrative phenomena at the neuronal level needs to be supplemented by its effects at the behavioral level. We therefore tested whether we could find any evidence of integration of different sources of information at the behavioral level using natural stimuli. To this end, we presented images of natural scenes to human subjects and evaluated the effect of simultaneously played localized natural sounds on their eye movements. The behavior during multimodal conditions was well approximated by a linear combination of the behavior under unimodal conditions. This is a strong indication that both streams of information are integrated in a joint multimodal saliency map before the final motor command is produced. The results presented here validate the possibility and the utility of using natural stimuli in experimental settings. It is clear that the ecological relevance of the experimental conditions is crucial in order to elucidate complex neuronal mechanisms resulting from evolutionary processes. 
In the future, better insight into the nervous system will be possible only when the complexity of our experiments matches the complexity of the mechanisms we are interested in. visual cortex neuronal dynamics feature integration crossmodal integration eye movements multisensory processing EEG voltage-sensitive dye imaging psychophysics overt attention natural stimuli electrophysiology singular value decomposition optical imaging ddc:500
