31 |
Rastreamento de componentes conexas em vídeo 3D para obtenção de estruturas tridimensionais / Tracking of connected components from 3D video in order to obtain tridimensional structures
Pires, David da Silva (17 August 2007)
This document presents an MSc thesis on the development of a data integration system that generates three-dimensional structures from 3D video. The work extends a recently proposed real-time 3D video system. That system, composed of a video camera and a projector, obtains range images of objects by projecting slides with a coloured stripe pattern. This procedure captures, in real time, both the texture and the 2.5D model of the objects, using a technique called structured light. The data are acquired at 30 frames per second and are of high quality: 640 x 480 pixels for texture and 90 x 240 points (on average) for geometry. The extension proposed in this thesis aims to obtain the three-dimensional model of the objects in a scene by registering the data (texture and geometry) of the sampled frames. The current work is thus an intermediate step in a larger project that aims at complete model reconstruction from only a few images obtained from different viewpoints. Such reconstruction should reduce the incidence of occlusion points (very common in the original results) and make it possible to adapt the whole system to moving and deformable objects; in its current state, the system is robust only for static, rigid objects. To the best of our knowledge, no existing method has fully solved this problem.
This text describes the work developed so far: a method for detection, tracking and spatial matching of connected components in a 3D video. The video image information (texture) is combined with three-dimensional positions (geometry) in order to align surface portions seen in subsequent frames. This is a key step in 3D video, which can be exploited in applications such as compression, geometric integration and scene reconstruction. Our approach consists of detecting salient features in both image and world spaces for subsequent alignment of texture and geometry. Registration is performed with the ICP (Iterative Closest Point) algorithm, introduced by Besl and McKay in 1992. Successful experimental results corroborating the method are presented.
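As an illustration of the registration step named in the abstract above, the following is a minimal sketch of point-to-point ICP. It is not the thesis implementation: it is simplified to 2D so that the optimal rotation has a closed form, it uses brute-force nearest-neighbour search, and every function name (`best_rigid_2d`, `apply_rigid`, `icp_2d`) and the sample point set are hypothetical. Each iteration pairs every source point with its closest target point, then solves for the rigid motion that best aligns the pairs in the least-squares sense.

```python
import math


def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src points onto dst."""
    n = len(src)
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n
    cy_d = sum(p[1] for p in dst) / n
    dot = cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        xs, ys, xd, yd = xs - cx_s, ys - cy_s, xd - cx_d, yd - cy_d
        dot += xs * xd + ys * yd      # cosine component of the alignment
        cross += xs * yd - ys * xd    # sine component of the alignment
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation carries the rotated source centroid onto the target centroid.
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, tx, ty


def apply_rigid(points, theta, tx, ty):
    """Rotate points by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def icp_2d(source, target, iterations=30, tol=1e-12):
    """Iteratively align source to target; returns (aligned points, mean squared error)."""
    current = list(source)
    prev_err = float("inf")
    err = prev_err
    for _ in range(iterations):
        # Correspondence step: closest target point for every current point.
        matches = [min(target, key=lambda q: sq_dist(p, q)) for p in current]
        # Alignment step: best rigid motion for these correspondences.
        theta, tx, ty = best_rigid_2d(current, matches)
        current = apply_rigid(current, theta, tx, ty)
        err = sum(sq_dist(p, q) for p, q in zip(current, matches)) / len(current)
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return current, err


# Hypothetical data: an L-shaped target and a slightly rotated, shifted copy of it.
target = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
source = apply_rigid(target, 0.1, 0.1, -0.05)
aligned, final_err = icp_2d(source, target)
```

The same two-step loop carries over to 3D range data; there the closed-form angle is usually replaced by an SVD- or quaternion-based solution for the rotation, and the nearest-neighbour search by a spatial index such as a k-d tree.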
|
32 |
Quel son spatialisé pour la vidéo 3D ? : influence d'un rendu Wave Field Synthesis sur l'expérience audio-visuelle 3D / Which spatialized sound for 3D video ? : influence of a Wave Field Synthesis rendering on 3D audio-visual experience
Moulin, Samuel (03 April 2015)
The digital entertainment industry has been undergoing a major evolution for several years with the spread of stereoscopic 3D video. It is now common to watch stereoscopic content in cinemas, on television, in video games, and so on. 3D imaging has evolved considerably, but what about the accompanying sound reproduction? Most of the time, the sound that accompanies 3D video relies on lateralization effects of varying extent (stereophony, 5.1 surround systems). It is nevertheless natural to ask whether audio should be adapted to the new visual dimension: depth. Several technologies appear able to render 3D sound environments (binaural technologies, Ambisonics, Wave Field Synthesis). Using them could improve the user's quality of experience, both in realism, through better audio-visual spatial congruence, and in the sensation of immersion. To test this hypothesis, we set up a 3D audio-visual rendering system that couples stereoscopic 3D images with Wave Field Synthesis sound rendering. Three research axes were then studied:
1/ Depth perception with unimodal or bimodal presentation. How well can the audio-visual system render the depth of sound, visual, and audio-visual objects? The experiments show that Wave Field Synthesis can render virtual sound sources perceived at different distances. Moreover, visual and audio-visual objects are localized more accurately than sound-only objects.
2/ Crossmodal integration in the depth dimension. How can the perception of congruence be guaranteed when audio-visual stimuli are spatially misaligned? We measured the extent of the audio-visual spatial integration window at different visual object distances; that is, for a given visual stimulus position, where sound objects must be placed to be perceived as a single unified audio-visual stimulus.
3/ 3D audio-visual quality of experience. What does sound depth rendering contribute to the 3D audio-visual quality of experience? We first assessed today's quality of experience when 3D video is combined with a 5.1 soundtrack played over consumer systems (5.1 surround system, headphones, soundbar). We then studied the contribution of sound depth rendering using the proposed audio-visual system (3D video with Wave Field Synthesis).
|
36 |
3D video browser / 3D Video Browser
Míchal, Vít (January 2009)
The aim of this project is to design and build an application for visualizing interconnected video data. The visualization takes place in 3D space and seeks to exploit its advantages, such as depth perception. This document surveys the different categories of spatial user interfaces and presents three possible designs for the user interface and its controls. Implementation details and the usability tests carried out are also described. The application is implemented in C++ using Open Inventor. The document closes with an evaluation of the results and of the tests.
|
37 |
3D Video Playback : A modular cross-platform GPU-based approach for flexible multi-view 3D video rendering
Andersson, Håkan (January 2010)
The evolution of depth-perception visualization technologies, emerging format standardization work, and research in multi-view 3D video and imagery all point to the need for flexible 3D video visualization. The wide variety of available 3D display types and visualization techniques for multi-view video, together with the high throughput requirements of high-definition video, calls for a real-time 3D video playback solution that takes advantage of hardware-accelerated graphics while providing a high degree of flexibility through format configuration and cross-platform interoperability. A modular, component-based software solution is proposed, using FFmpeg for video demultiplexing and decoding, OpenGL and GLUT for hardware-accelerated graphics, and POSIX threads for increased CPU utilization. The solution has been verified to have sufficient throughput to display 1080p video at the native video frame rate on the experimental system, which is considered a standard high-end desktop PC using only commercial hardware. To evaluate the performance of the proposed solution, a number of throughput evaluation metrics have been introduced that measure average frame rate as a function of video bit rate, video resolution, and number of views. The results indicate that the GPU is the primary bottleneck in a multi-view lenticular rendering system and that multi-view rendering performance degrades as the number of views increases. This is a result of current GPU square-matrix texture cache architectures, which lead to texture lookup access times that follow random memory access patterns when the number of views is high. The proposed solution has been found to have low CPU efficiency, i.e. low CPU hardware utilization, and it is recommended to investigate the performance gains of scalable multithreading techniques. It is also recommended to investigate the gains of introducing video frame buffering in video memory, or of moving more calculations to the CPU in order to increase GPU performance.
|
38 |
Objective assessment of stereoscopic video quality of 3DTV / Évaluation objective de la qualité vidéo en TV 3D relief
Khaustova, Darya (30 January 2015)
The minimum requirement for any 3D (stereoscopic) system is to guarantee the visual comfort of viewers. Visual comfort is one of the three primary perceptual attributes of 3D quality of experience (QoE), and it can be linked directly to the technical parameters of a 3D system. The goal of this thesis is therefore to characterize objectively the impact of these parameters on human perception for stereoscopic quality monitoring. The first part of the thesis investigates whether viewers' visual attention should be considered when designing an objective 3D quality metric. First, visual attention in 2D and 3D is compared using simple test patterns. The conclusions of this first experiment are validated using complex scenes with crossed and uncrossed disparities. In addition, we explore the impact on visual attention of visual discomfort caused by excessive disparities. The second part of the thesis is dedicated to the design of an objective model of 3D video QoE based on human perceptual thresholds and acceptability level. We also explore the possibility of using the proposed model as a new subjective scale. To validate the model, subjective experiments are conducted with fully controlled still and moving stereoscopic images containing different types of view asymmetries. Performance is evaluated by comparing objective predictions with subjective scores for various levels of view discrepancy that might provoke visual discomfort.
|