31 |
3D Instance Segmentation of Cluttered Scenes: A Comparative Study of 3D Data Representations. Konradsson, Albin; Bohman, Gustav. January 2021.
This thesis compares instance segmentation methods that operate on point clouds with methods that operate on depth images, specifically their performance on cluttered scenes of irregular objects in an industrial environment. Recent work by Wang et al. [1] suggests potential benefits of a point cloud representation when applying deep learning to data from 3D cameras. However, little work has been done to enable quantifiable comparisons between methods based on different representations, particularly on industrial data. Generating synthetic data provides accurate grayscale, depth map, and point cloud representations for a large number of scenes, and can thus be used to compare methods regardless of data type. The datasets in this work are created with a tool provided by SICK. They simulate postal packages on a conveyor belt scanned by a LiDAR, closely resembling a common industry application. Two datasets are generated: one of low complexity, containing only boxes, and one of higher complexity, containing a combination of boxes and several types of irregularly shaped parcels. State-of-the-art instance segmentation methods are selected based on their performance on existing benchmarks: PointGroup by Jiang et al. [2], which uses point clouds, and Mask R-CNN by He et al. [3], which uses images. The results support the view that a point cloud representation can be advantageous over depth images: PointGroup performs better on the chosen metric on both datasets. On low-complexity scenes the inference times of the two methods are similar, but on higher-complexity scenes Mask R-CNN is significantly faster.
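The comparison above hinges on the two representations being interconvertible: a depth image plus camera intrinsics determines a point cloud. A minimal sketch of this pinhole back-projection, where the intrinsics (`fx`, `fy`, `cx`, `cy`) are made-up values for illustration, not parameters from the thesis:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters) to an N x 3 point cloud
    using the pinhole camera model. Zero depth marks invalid pixels."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# A flat surface 2 m from the camera becomes a planar point cloud.
depth = np.full((4, 4), 2.0)
cloud = depth_to_point_cloud(depth, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(cloud.shape)  # (16, 3)
```

A point-cloud network such as PointGroup consumes the `(N, 3)` array directly, while an image network such as Mask R-CNN consumes the original `(H, W)` grid.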
|
32 |
Robust Learning of a depth map for obstacle avoidance with a monocular stabilized flying camera / Apprentissage robuste d'une carte de profondeur pour l'évitement d'obstacle dans le cas des caméras volantes, monoculaires et stabilisées. Pinard, Clément. 24 June 2019.
Consumer unmanned aerial vehicles (UAVs) are essentially flying cameras. They have democratized aerial footage, but with their growing success, safety has become a central concern. This work aims at improving UAV safety through obstacle avoidance while keeping the flight smooth for the user. Under the weight and cost constraints of this product context, we use only one stabilized camera. For their robustness in computer vision and their capacity to solve complex tasks, we chose convolutional neural networks (CNNs). Our strategy is based on incrementally learning tasks of increasing complexity, the first step of which is to construct a depth map from the stabilized camera. This thesis focuses on the ability of CNNs to be trained for this task. For stabilized footage, the depth map is closely linked to optical flow. We therefore adapt FlowNet, a CNN known for optical flow estimation, to output depth directly from two stabilized frames; the resulting network is called DepthNet. This approach succeeds on synthetic footage with supervised training, but is not robust enough to be used directly on real videos. Consequently, we consider self-supervised training on real videos, based on differentiable image reprojection. As this training method for CNNs is rather novel in the literature, a thorough study is needed in order not to depend too much on heuristic parameters. Finally, we develop a depth fusion algorithm to use DepthNet efficiently on real videos: multiple frame pairs are fed to DepthNet in order to obtain a wide depth sensing range.
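The link between depth and optical flow that DepthNet learns can be written in closed form for the simplest stabilized case: a camera translating purely forward. This is a first-order sketch of that geometric relation under an assumed pinhole model, not the network itself:

```python
def depth_from_radial_flow(r, flow_r, tz):
    """For a stabilized camera translating forward by tz (meters),
    a point at radius r (pixels) from the focus of expansion moves
    radially outward by approximately flow_r ~= r * tz / Z, so the
    depth can be recovered as Z ~= r * tz / flow_r."""
    return r * tz / flow_r

# A point 10 px from the focus of expansion that shifts outward by
# 0.05 px while the camera advances 0.10 m lies roughly 20 m away.
z = depth_from_radial_flow(r=10.0, flow_r=0.05, tz=0.10)
print(z)  # 20.0
```

The relation also explains the fusion step: a single frame pair resolves only a narrow band of depths well, so feeding DepthNet several pairs with different displacements widens the usable range.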
|
33 |
Antal tvärsektioners påverkan på djupmodeller producerade av SeaFloor HydroLite™ enkelstråligt ekolod: en jämförelse mot djupmodeller producerade av Kongsberg EM 2040P MKII flerstråligt ekolod / The influence of the number of cross-sections on depth models produced by the SeaFloor HydroLite™ single beam echosounder: a comparison against depth models produced by the Kongsberg EM 2040P MKII multi beam echosounder. Hägg, Linnéa; Stenberg Jönsson, Simon. January 2023.
Hydroacoustic measurements have been conducted for almost two hundred years. They are comparable to topographic measurements on land and show the appearance of lake or sea floors. Today echosounders are used: the instrument sends sound waves into the water and measures the time it takes for the sound to bounce off the bottom and return, after which the depth can be calculated from the speed of sound. For single beam echosounder surveys, cross-sections are recommended as a control of the data; multi beam echosounders instead use the overlap between survey lines as control. This study examines how the number of cross-sections affects depth maps created with the SeaFloor HydroLite™ single beam echosounder, and how those depth maps differ from depth maps produced by the Kongsberg EM 2040P MKII multi beam echosounder. The study area covers 1820 m² and is located at Forsbacka harbor in Storsjön, Gävle municipality. A minimum overlap of 50% was used for the multi beam survey, and five main lines and seven cross-sections were measured with the single beam echosounder. Depth maps with different numbers of cross-sections were created in Surfer 10 from the single beam data and compared against maps made from the multi beam data, both to assess the differences between the systems and to assess how the number of cross-sections affects the single beam depth maps. With the multi beam echosounder as reference, adding cross-sections reduced the RMS value by 1 cm and the standard uncertainty by 2 cm. The comparison between the two echosounder systems showed depth differences of around 10 cm. The conclusions of this study are that cross-sections improve the quality of the depth maps only marginally for an even and uniform bottom topography, but serve an important function in validating the quality of the survey data. Furthermore, the SeaFloor HydroLite™ meets Order 1b at depths of about one to four meters if the requirement for full bottom coverage is disregarded. The SeaFloor HydroLite™ produces a general overview depth map, while the depth models from the Kongsberg EM 2040P MKII show more detail.
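The travel-time principle behind both echosounders reduces to a one-line formula: the pulse covers the distance twice, so depth is half the travel time times the sound speed. The sound speed and transducer draft below are nominal assumptions for illustration, not values from this survey:

```python
def echo_depth(travel_time_s, sound_speed_ms=1480.0, transducer_draft_m=0.0):
    """Depth from the two-way travel time of an echosounder ping:
    depth = v * t / 2, plus the transducer's assumed depth below
    the waterline (its draft)."""
    return sound_speed_ms * travel_time_s / 2.0 + transducer_draft_m

# A ping returning after 4 ms in ~1480 m/s water implies ~2.96 m depth.
d = echo_depth(0.004)
print(d)  # 2.96
```

In practice the sound speed varies with temperature and salinity, which is why sound velocity profiles are measured as part of a survey.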
|
34 |
Исследование задачи построения карты глубины изображения с помощью сверточной нейронной сети / Study of the problem of constructing an image depth map using a convolutional neural network. Bakulin, S. A. January 2024.
Object of the study: algorithms for estimating image depth. Subject of the study: methods for training and optimizing the construction of a depth map from a single image. Objective of the work: optimization of an algorithm for constructing an image depth map based on a deep neural network. The study included a comparison of basic model architectures, analysis and visualization of the obtained results, measurement of the performance of the various architectures, and observation of anomalous cases during training and operation of the model. The work demonstrates an algorithm for constructing, training, and optimizing a convolutional neural network for image depth estimation. Areas of practical application: image depth estimation algorithms are used in unmanned vehicle control, 3D scene reconstruction, AR/VR, navigation systems, medicine, and animation.
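Comparing depth estimation architectures requires agreed-upon metrics. A hedged sketch of two measures commonly used in the single-image depth literature (the dissertation's exact metrics are not stated here, so this is illustrative):

```python
import numpy as np

def depth_metrics(pred, gt):
    """RMSE and threshold accuracy (fraction of pixels where
    max(pred/gt, gt/pred) < 1.25), two metrics commonly reported
    for single-image depth estimation."""
    rmse = float(np.sqrt(np.mean((pred - gt) ** 2)))
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = float(np.mean(ratio < 1.25))
    return rmse, delta1

gt = np.array([1.0, 2.0, 4.0, 8.0])
pred = np.array([1.1, 2.0, 5.5, 8.0])
rmse, delta1 = depth_metrics(pred, gt)
print(round(delta1, 2))  # 0.75
```

The ratio-based accuracy is scale-sensitive in a different way than RMSE, which is why both are usually reported together.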
|
35 |
Reconstrução tridimensional para objetos de herança virtual / Three-dimensional reconstruction for virtual heritage objects. Miranda, Hardy José Santos de. 28 May 2018.
At first, a new technology can markedly boost engagement with a subject, which may induce meaningful learning, but the boost fades as soon as the interaction becomes common or even repetitive. Once the technology feels natural to the user, it no longer relies on novelty and becomes a tool. Computer-generated imagery (CGI) went through exactly this decades ago, but it is constantly iterated upon and therefore needs frequent reassessment. As CGI developed, three-dimensional imagery stopped being an overly complicated format, as the necessary hardware and concepts made their way into everyday objects such as smartphones, webcams, cameras, and 3D mesh generation apps. Its use for museological purposes became clear in the field of cultural heritage, for archiving and communication. To verify the viability of a low-cost, easy-to-use solution aimed at novice users, different types of non-destructive, surface-based reconstruction methods were analyzed, assessing the quality of the resulting mesh in terms of precision, traceability, and compatibility. To this end, a method with a set of metrics was proposed that can be applied to determine the usability of a reconstructed object for a specific purpose. Four archaeological artifacts were scanned using video photogrammetry and depth video, and compared against laser-scanned surrogates. After analyzing the scans of the same objects with these different methods, the conclusion is that photogrammetry can quickly generate a highly detailed model, but with several distortions, while the depth camera produced smoother surfaces and a higher incidence of errors. Ultimately, each method offers multiple possibilities for materialization, depending on the objective, the resolution, and how detailed the object must be in order to be correctly understood.
|
36 |
MPEG Z/Alpha and high-resolution MPEG / MPEG Z/Alpha och högupplösande MPEG-video. Ziegler, Gernot. January 2003.
Technical progress has yielded practicable camera systems for the acquisition of so-called depth maps: images with depth information. Images and movies with depth information open the door to new types of applications in computer graphics and vision, which implies that they will need to be processed in ever-increasing volumes. Increased depth image processing creates demand for a standardized data format for the exchange of image data with depth information, both still and animated, and software to convert acquired depth data to such video formats is highly necessary. This diploma thesis sheds light on many of the issues that come with this new set of tasks, spanning from data acquisition, through readily available software for data encoding, to possible future applications. Further, a software architecture fulfilling all of the mentioned demands is presented. The encoder is a collection of UNIX programs that generate MPEG Z/Alpha, an MPEG-2 based video format that contains, besides MPEG-2's standard data streams, one extra data stream storing image depth information (and transparency). The decoder suite, called TexMPEG, is a C library for in-memory decompression of MPEG Z/Alpha. Much effort has been put into decoder parallelization, and TexMPEG is capable of decoding multiple video streams not only in parallel internally, but also with inherent frame synchronization between concurrently decoded MPEG videos.
|
37 |
Algorithmes, architecture et éléments optiques pour l'acquisition embarquée d'images totalement focalisées et annotées en distance / Algorithms, architecture and optical components for embedded all-in-focus and distance-annotated image acquisition systems. Emberger, Simon. 13 December 2017.
Acquiring the depth of a scene in addition to its image is a desirable feature for many applications that depend on the near environment. The state of the art in depth extraction offers many methods, but very few are well adapted to miniaturized embedded systems: some are too cumbersome because of their optical system, others require a delicate calibration or reconstruction methods that are difficult to implement on embedded hardware. In this thesis we focus on methods with low hardware complexity, in order to propose an algorithmic and optical solution for a sensor that extracts the depth of the scene, provides a relevance evaluation of this measurement, and produces all-in-focus images. We show that Depth from Focus (DfF) algorithms are the best adapted to these constraints. The method consists in acquiring a cube of multi-focus images of the same scene at different focusing distances. The images are analyzed in order to annotate each zone of the scene with an index related to its estimated depth, and this index is then used to build an image that is sharp everywhere. We worked on the sharpness criterion in order to propose low-complexity solutions, based only on additions and comparisons and therefore easily portable to a hardware architecture. The proposed solution performs a bidirectional local contrast analysis and then combines the most relevant depth estimates at the end of processing. It comes in three variants with increasingly strict complexity restrictions, and thus increasing suitability for embedded use. For each method, depth and confidence maps are produced, as well as an all-in-focus image composed of elements taken from the entire multi-focus cube. These approaches are compared, in quality and in complexity, with other state-of-the-art methods of similar complexity, and a hardware architecture is proposed for implementing the most promising solution. The design of these algorithms raises the problem of image quality: a clear evolution of contrast and an invariant scene are essential during the capture of the multi-focus cube. A very often neglected effect in this type of approach is the parasitic zoom caused by the motion of the lens responsible for the focus variation. This "focal zoom" weakens the scene-invariance assumption and causes artifacts in all three outputs: the depth map, the confidence map, and the all-in-focus image. The search for optics adapted to DfF is therefore a second line of this work. We evaluated industrial liquid lenses and experimental nematic liquid crystal modal lenses designed during this thesis, comparing these technologies in terms of speed, image quality, intensity of the induced focal zoom, supply voltage, and finally the quality of the extracted depth maps and reconstructed all-in-focus images. The lens and the algorithm best suited to this embedded DfF problem were then evaluated by porting them to a CPU-GPU development platform that acquires depth maps, confidence maps, and all-in-focus images in real time.
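The addition-and-comparison sharpness analysis can be sketched in a few lines: score each pixel's local contrast per focal slice, take the argmax over slices as the depth index, and use the score margin as a crude confidence. The margin-based confidence is an assumption for illustration, not the thesis's exact criterion:

```python
import numpy as np

def depth_from_focus(stack):
    """Given a focal stack (n_focus, H, W), score each pixel's sharpness
    with a local contrast measure built only from additions and absolute
    differences, then pick the focus index that maximizes it. The margin
    between the best and second-best score serves as a crude confidence."""
    scores = np.zeros_like(stack)
    # horizontal + vertical absolute gradients as the contrast measure
    scores[:, :, :-1] += np.abs(np.diff(stack, axis=2))
    scores[:, :-1, :] += np.abs(np.diff(stack, axis=1))
    depth_index = scores.argmax(axis=0)
    sorted_scores = np.sort(scores, axis=0)
    confidence = sorted_scores[-1] - sorted_scores[-2]
    return depth_index, confidence

# Slice 1 is sharp (a step edge); slices 0 and 2 are flat (defocused).
stack = np.stack([np.full((2, 4), 5.0),
                  np.array([[0.0, 0.0, 9.0, 9.0]] * 2),
                  np.full((2, 4), 5.0)])
idx, conf = depth_from_focus(stack)
print(idx[0, 1])  # 1
```

The depth index per pixel then selects which slice contributes to the all-in-focus composite, and low-margin pixels can be flagged as unreliable in the confidence map.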
|
39 |
Detekce objektů pomocí Kinectu / Object Detection Using Kinect. Řehánek, Martin. January 2012.
The release of the Kinect device opened new possibilities by making image depth simple to use in image processing. The aim of this thesis is to propose a method for object detection and recognition in a depth map. The well-known Bag of Words method and a descriptor based on the Spin Image method are used for object recognition; the Spin Image method is one of several existing approaches to depth maps described in this thesis. Objects are detected with the sliding window technique, which is improved and sped up by exploiting the depth information.
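The depth-based speed-up can be made concrete: under a pinhole model, depth fixes the expected pixel size of an object at each image position, so the detector can evaluate one window scale per position instead of a full scale pyramid. The object height and the focal length (a typical value for a Kinect) are assumed parameters, not values from the thesis:

```python
def window_size_from_depth(depth_m, object_height_m=0.3, focal_px=525.0):
    """With depth available, the pinhole model gives the expected pixel
    height of an object at each image location: h_px = f * H / Z. A
    sliding-window detector can then test a single window scale per
    position, which is the speed-up that depth enables."""
    return focal_px * object_height_m / depth_m

# A 0.3 m tall object seen 1.5 m away spans about 105 px.
print(window_size_from_depth(1.5))  # 105.0
```

Positions whose depth is inconsistent with any plausible object size can be skipped entirely, pruning the search further.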
|
40 |
Codage de carte de profondeur par déformation de courbes élastiques / Coding of depth maps by elastic deformations of curves. Calemme, Marco. 20 September 2016.
In multiple-view video plus depth, depth maps can be represented as grayscale images, and the corresponding temporal sequence can be treated as a standard grayscale video sequence. However, depth maps have properties different from natural images: they present large areas of smooth surfaces separated by sharp edges. Arguably the most important information lies in the object contours; as a consequence, an interesting approach consists in performing a lossless coding of the contour map, possibly followed by a lossy coding of per-object depth values. In this context, we propose a new technique for the lossless coding of object contours based on the elastic deformation of curves. A continuous evolution of elastic deformations between two reference contour curves can be modelled, and an elastically deformed version of the reference contours can be sent to the decoder at an extremely small coding cost and used as side information to improve the lossless coding of the actual contour. Once the main discontinuities have been captured by the contour description, the depth field inside each region is rather smooth. We proposed and tested two different techniques for coding the depth field inside each region: the first performs the shape-adaptive wavelet transform followed by the shape-adaptive version of SPIHT, while the second predicts the depth field from its subsampled version and the set of coded contours. It is generally recognized that high-quality view rendering at the receiver side is possible only by preserving the contour information, since distortions on edges during the encoding step would cause a noticeable degradation of the synthesized view and of the 3D perception. We investigated this claim by conducting a subjective quality assessment comparing an object-based technique and a hybrid block-based technique for the coding of depth maps.
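The second per-region technique, prediction from a subsampled version, can be sketched as bilinear interpolation plus a residual for the encoder to code. This is an illustrative stand-in under that simplification, not the thesis's actual predictor (which also exploits the coded contours):

```python
import numpy as np

def predict_from_subsampled(depth, factor):
    """Keep every `factor`-th depth sample and predict the full field
    by bilinear interpolation; the encoder then only has to code the
    (ideally small) residual between the field and its prediction."""
    coarse = depth[::factor, ::factor]
    ch, cw = coarse.shape
    ys = np.arange(depth.shape[0]) / factor
    xs = np.arange(depth.shape[1]) / factor
    y0 = np.minimum(ys.astype(int), ch - 1); y1 = np.minimum(y0 + 1, ch - 1)
    x0 = np.minimum(xs.astype(int), cw - 1); x1 = np.minimum(x0 + 1, cw - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    pred = ((1 - wy) * (1 - wx) * coarse[np.ix_(y0, x0)]
            + (1 - wy) * wx * coarse[np.ix_(y0, x1)]
            + wy * (1 - wx) * coarse[np.ix_(y1, x0)]
            + wy * wx * coarse[np.ix_(y1, x1)])
    return pred, depth - pred

# A smooth (linear) depth ramp is predicted exactly: zero residual.
depth = np.tile(np.arange(7.0), (7, 1))
pred, res = predict_from_subsampled(depth, 2)
print(np.abs(res).max())  # 0.0
```

The example illustrates why the scheme suits depth maps: within a contour-bounded region the field is smooth, so the residual energy left after prediction is small.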
|