  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Low complexity multiview video coding

Khattak, Shadan January 2014 (has links)
3D video is a technology that has seen tremendous attention in recent years. Multiview Video Coding (MVC) is an extension of the popular H.264 video coding standard and is commonly used to compress 3D videos. It offers an improvement of 20% to 50% in compression efficiency over simulcast encoding of multiview videos with the conventional H.264 standard. However, two important problems are associated with it: (i) its superior compression performance comes at the cost of significantly higher computational complexity, which hampers the real-world realization of the MVC encoder in applications such as 3D live broadcasting and interactive Free Viewpoint Television (FTV), and (ii) compressed 3D videos can suffer from packet loss during transmission, which can degrade the viewing quality of the 3D video at the decoder. This thesis aims to solve these problems by presenting techniques to reduce the computational complexity of the MVC encoder and by proposing a consistent error concealment technique for frame losses in 3D video transmission. The thesis first analyses the complexity of the MVC encoder. It then proposes two novel techniques to reduce the complexity of motion and disparity estimation. The first method reduces the complexity of the disparity estimation process by exploiting the relationship between temporal levels, macroblock types and search ranges, while the second exploits the geometrical relationship between motion and disparity vectors in stereo frames. These two methods are then combined with other state-of-the-art methods in a unique framework where the gains add up. Experimental results show that the proposed low-complexity framework can reduce the encoding time of the standard MVC encoder by over 93% while maintaining similar compression efficiency. The addition of new View Synthesis Prediction (VSP) modes to the MVC encoding framework improves the compression efficiency of MVC.
However, testing additional modes comes at the cost of increased encoding complexity. To reduce this complexity, the thesis next proposes a Bayesian early mode decision technique for a VSP-enhanced MVC coder. It exploits the statistical similarities between the RD costs of the VSP SKIP mode in neighbouring views to terminate the mode decision process early. Results indicate that the proposed technique can reduce the encoding time of the enhanced MVC coder by over 33% at similar compression efficiency levels. Finally, compressed 3D videos are usually broadcast to a large number of users, where transmission errors can lead to frame losses that degrade the video quality at the decoder. A simple reconstruction of the lost frames can lead to an inconsistent reconstruction of the 3D scene, which may negatively affect the viewing experience. In order to solve this problem, the thesis proposes a consistency model for recovering frames lost during transmission. The proposed consistency model is used to evaluate inter-view and temporal consistencies while selecting candidate blocks for concealment. Experimental results show that the proposed technique recovers lost frames with higher consistency and better quality than two standard error concealment methods and a baseline technique based on the boundary matching algorithm.
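The Bayesian early mode decision described in this abstract can be illustrated with a minimal sketch. The function below is a hypothetical threshold test, not the thesis's actual algorithm: it terminates the mode search after SKIP when the current SKIP RD cost is consistent with the SKIP RD cost statistics of co-located macroblocks in an already-encoded neighbouring view. The margin parameter `k` is an assumption introduced here for illustration.

```python
def early_skip_decision(skip_rd_cost, neighbour_skip_costs, k=1.0):
    """Return True if the mode search can stop after evaluating SKIP.

    skip_rd_cost: RD cost of the SKIP mode for the current macroblock.
    neighbour_skip_costs: SKIP RD costs of co-located macroblocks in the
        already-encoded neighbouring view (assumed available).
    k: safety margin in standard deviations (a tuning parameter,
        not taken from the thesis).
    """
    if not neighbour_skip_costs:
        return False  # no statistics yet: fall back to full mode search
    n = len(neighbour_skip_costs)
    mean = sum(neighbour_skip_costs) / n
    var = sum((c - mean) ** 2 for c in neighbour_skip_costs) / n
    threshold = mean + k * var ** 0.5
    # SKIP cost low enough relative to neighbouring-view statistics:
    # skip testing the remaining (more expensive) modes.
    return skip_rd_cost <= threshold
```

In a real encoder the decision would feed back into the RD optimization loop; here it simply reports whether the remaining modes need evaluation.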
12

Orbital angular momentum encoding/decoding of 2D images for scalable multiview colour displays

Chu, Jiaqi January 2018 (has links)
Three-dimensional (3D) displays project images that give 3D perception and mimic real-world objects. Among the rich variety of 3D displays, multiview displays take advantage of light's various degrees of freedom and provide 3D perception by projecting 2D subsamples of a 3D object. More 2D subsamples are required to project images with smoother parallax and a more realistic sensation. As an additional degree of freedom with a theoretically unlimited state space, orbital angular momentum (OAM) modes may be an alternative to conventional multiview approaches and can potentially project more images. This research explores the possibility of encoding/decoding off-axis points in 2D images with OAM modes, the development of the optical system, and the design and development of a multiview colour display architecture. The first part of the research explores encoding/decoding off-axis points with OAM modes. Conventionally, OAM modes are used to encode/decode on-axis information only. Analysis of on-axis OAM beams referenced to off-axis points suggests representing off-axis displacements as a set of expanded OAM components. At the current stage, off-axis points within an effective coding area can be encoded/decoded with the chosen OAM modes for multiplexing. Experimentally, a 2D image is encoded/decoded with an OAM mode. When the encoding and decoding OAM modes match, the image is reconstructed; when they do not, a dark region with zero intensity appears instead. The dark region delimits the effective coding area for multiplexing. The final part of the research develops a multiview colour display. Based on the understanding of the off-axis representation as a set of different OAM components and experimental tests of the optical system, three 1 mm monochromatic images are encoded, multiplexed and projected. Having studied wavelength effects on OAM coding, the initial architecture is updated to a scalable colour display using four wavelengths.
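The OAM encoding/decoding principle behind this abstract can be sketched with a spiral phase mask. The code below is a minimal numerical illustration, not the thesis's optical system: a field encoded with topological charge l is recovered when decoded with the conjugate mask of the same charge, while a mismatched charge leaves a residual vortex whose phase singularity, after propagation, gives the on-axis dark region mentioned above.

```python
import numpy as np

def oam_phase_mask(size, l):
    """Spiral phase mask exp(i * l * theta) for topological charge l."""
    y, x = np.mgrid[-size // 2:size // 2, -size // 2:size // 2]
    theta = np.arctan2(y, x)
    return np.exp(1j * l * theta)

# Encoding with charge l and decoding with the conjugate mask of the same
# charge cancels the spiral phase, so the encoded information is recovered.
encoded = oam_phase_mask(256, 3)
decoded = encoded * np.conj(oam_phase_mask(256, 3))
assert np.allclose(decoded, 1.0)          # matched charges: phase cancelled

# A mismatched decoding charge leaves a residual vortex exp(i*(l1-l2)*theta);
# the amplitude is unchanged, but the leftover phase singularity is what
# produces the dark on-axis region after propagation.
residual = encoded * np.conj(oam_phase_mask(256, 1))
assert np.allclose(np.abs(residual), 1.0)
```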
13

Learning a Multiview Weighted Majority Vote Classifier: Using PAC-Bayesian Theory and Boosting

Goyal, Anil 23 October 2018 (has links)
With the tremendous generation of data, we increasingly collect data from different information sources with heterogeneous properties, so it is important to take these representations, or views, of the data into account. This machine learning problem is referred to as multiview learning. It is useful in many application domains; in medical imaging, for example, the human brain can be represented with different sets of features such as MRI, t-fMRI, EEG, etc. In this thesis, we focus on supervised multiview learning, where we see multiview learning as a combination of different view-specific classifiers or views. From this standpoint, it is natural to tackle multiview learning within the PAC-Bayesian framework, a tool from statistical learning theory for studying models expressed as majority votes. One of its advantages is that it directly captures the trade-off between the accuracy and the diversity of the voters, which is central to multiview learning. The first contribution of this thesis extends classical PAC-Bayesian theory (with a single view) to multiview learning (with at least two views). To do this, we consider a two-level hierarchy of distributions: over the view-specific voters and over the views themselves. Based on this strategy, we derive PAC-Bayesian generalization bounds (both probabilistic and expected risk bounds) for multiview learning. From a practical point of view, we design two multiview learning algorithms based on our two-level PAC-Bayesian strategy. The first is a one-step boosting-based multiview learning algorithm called PB-MVBoost. It iteratively learns the weights over the views by optimizing the multiview C-Bound, which controls the trade-off between the accuracy and the diversity of the views. The second is a late fusion approach in which the predictions of the view-specific classifiers are combined using the PAC-Bayesian algorithm CqBoost proposed by Roy et al. Finally, we show that minimizing the classification error of the multiview weighted majority vote is equivalent to minimizing Bregman divergences. This allows us to derive a parallel-update optimization algorithm (referred to as MωMvC2) to learn our multiview weighted majority vote.
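The two-level hierarchy over view-specific voters and views can be sketched as a nested weighted majority vote. This is a schematic illustration of the structure only, not PB-MVBoost itself; in practice both levels of weights would be learned (e.g. by optimizing the multiview C-Bound), whereas here they are placeholders.

```python
def multiview_majority_vote(x, views, view_weights):
    """Two-level multiview weighted majority vote.

    views: one list of (classifier, weight) pairs per view.
    view_weights: one weight per view.
    Labels are assumed to be +1/-1.
    """
    total = 0.0
    for voters, rho in zip(views, view_weights):
        # First level: weighted vote of the view-specific classifiers.
        view_vote = sum(w * h(x) for h, w in voters)
        # Second level: weighted vote over the views.
        total += rho * view_vote
    return 1 if total >= 0 else -1
```

For instance, two views whose inner votes lean positive yield a positive final prediction; the view weights `rho` let a reliable view dominate a noisy one.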
14

Edge-aided virtual view rendering for multiview video plus depth

Muddala, Suryanarayana Murthy, Sjöström, Mårten, Olsson, Roger, Tourancheau, Sylvain January 2013 (has links)
Depth-Image-Based Rendering (DIBR) of virtual views is a fundamental method in three-dimensional (3D) video applications for producing different perspectives from texture and depth information, in particular from the multiview-plus-depth (MVD) format. Artifacts are still present in virtual views as a consequence of imperfect rendering with existing DIBR methods. In this paper, we propose an alternative DIBR method for MVD. In the proposed method we introduce edge pixels and interpolate pixel values in the virtual view using the actual projected coordinates from two adjacent views, by which cracks and disocclusions are automatically filled. In particular, we propose a method to merge pixel information from the two adjacent views in the virtual view before the interpolation: we apply a weighted averaging of projected pixels within the range of one pixel in the virtual view. We compared virtual view images rendered by the proposed method to the corresponding images rendered by state-of-the-art methods. Objective metrics demonstrated an advantage of the proposed method for most of the investigated media contents. Subjective test results showed a preference for different methods depending on the media content, and the test could not demonstrate a significant difference between the proposed method and the state-of-the-art methods.
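The merging step can be sketched as follows. This is an illustrative reading of the abstract, not the authors' exact formulation: projected pixels from the two adjacent views that land within one pixel of a virtual-view grid position are merged by distance-weighted averaging, and positions that receive no contribution remain holes for later filling.

```python
def merge_projected_pixels(samples, grid_x):
    """Merge projected pixels near one virtual-view grid position.

    samples: list of (projected_x, value) pairs gathered from both
        adjacent views after warping (a simplified 1D illustration).
    grid_x: integer pixel position in the virtual view.
    """
    num = den = 0.0
    for px, value in samples:
        d = abs(px - grid_x)
        if d < 1.0:                    # within the one-pixel range
            w = 1.0 - d                # closer projections weigh more
            num += w * value
            den += w
    # None marks a hole (crack/disocclusion) to be filled afterwards.
    return num / den if den > 0 else None
```

Projections at equal distances from the grid position thus contribute equally, which is what automatically closes sub-pixel cracks between the two warped views.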
15

Open Source 3D Reconstruction

Mierle, Keir 25 July 2008 (has links)
A new open source 3D reconstruction and evaluation pipeline is presented, with a thorough description of the algorithms employed. A new evaluation framework is introduced that makes it easy to compare state-of-the-art multiview reconstruction algorithms. The evaluation framework also includes tools for creating data sets with ground truth. The source code is available under the GPL, a first for a complete end-to-end reconstruction system.
17

Bildbaserad rendering: Implementation och jämförelse av två algoritmer [Image-based rendering: implementation and comparison of two algorithms]

Härdling, Peter January 2010 (has links)
This thesis compares two algorithms for image-based rendering. Both algorithms use two images recorded in the MultiView-plus-depth format to render new intermediate views of a three-dimensional scene. Because the two-dimensional images are extended with a depth value for each pixel, image warping can be performed as a perspective projection of each pixel individually to its new position. During rendering, issues such as measurement errors in the original images and occlusions have to be handled. Algorithm I handles them partly by smoothing the joints between the contributions of the two images to the novel view. Algorithm II divides the images into layers, in which layers considered safe have priority over layers judged more risky. The algorithms have been implemented in Matlab, and algorithm II has been modified by extending its layer priority rules to more complex scenes. Algorithm II has proven better at preserving details in the rendered views and maintains a more even rendering speed. It also yields higher and more consistent PSNR values, but slightly lower values under MSSIM. The additional steps have also increased rendering times by up to 40% compared to algorithm I. The author suggests areas for further development of algorithm II; for example, the algorithm should be tested further to determine whether the thresholds used are general or must be adapted to different scenes.
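The PSNR figure of merit used in the comparison above is the standard definition; the sketch below states it for images given as flat pixel lists, for brevity.

```python
import math

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    ref, test: flat lists of pixel values.
    peak: maximum possible pixel value (255 for 8-bit images).
    """
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return float('inf')              # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

A higher PSNR means the rendered view is numerically closer to the reference, which is why "higher and more consistent PSNR values" indicates better and steadier rendering quality; MSSIM, by contrast, is a structural similarity measure and can rank the algorithms differently.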
18

Depth-Assisted Semantic Segmentation, Image Enhancement and Parametric Modeling

Zhang, Chenxi 01 January 2014 (has links)
This dissertation addresses the problem of employing 3D depth information in solving a number of traditionally challenging computer vision/graphics problems. Humans perceive depth in the 3D world, which enables them to reconstruct layouts, recognize objects, and understand the geometric and semantic structure of the visual world. It is therefore worthwhile to explore how 3D depth information can be used by computer vision systems to mimic these abilities. This dissertation employs 3D depth information to solve vision/graphics problems in three areas: scene understanding, image enhancement, and 3D reconstruction and modeling. For scene understanding, we present a framework for semantic segmentation and object recognition on urban video sequences using only dense depth maps recovered from the video. Five view-independent 3D features that vary with object class are extracted from the dense depth maps and used to segment and recognize different object classes in street scene images. We demonstrate a scene parsing algorithm that uses only dense 3D depth information and outperforms approaches based on sparse 3D or 2D appearance features. For image enhancement, we present a framework that overcomes the imperfections of personal photographs of tourist sites using the rich information provided by large-scale internet photo collections (IPCs). By augmenting personal 2D images with 3D information reconstructed from IPCs, we address a number of traditionally challenging image enhancement tasks and achieve high-quality results with simple and robust algorithms. For 3D reconstruction and modeling, we focus on parametric modeling of flower petals, the most distinctive part of a plant. Their complex structure, severe occlusions and wide variations make the reconstruction of their 3D models a challenging task. We overcome these challenges by combining data-driven modeling techniques with domain knowledge from botany. Taking a 3D point cloud of an input flower scanned from a single view, each segmented petal is fitted with a scale-invariant morphable petal shape model constructed from individually scanned 3D exemplar petals. Novel constraints based on botanical studies are incorporated into the fitting process to realistically reconstruct occluded regions and maintain correct 3D spatial relations. The main contribution of the dissertation is the intelligent use of 3D depth information to solve traditionally challenging vision/graphics problems. By developing advanced algorithms that run automatically or with minimal user interaction, this dissertation demonstrates that the 3D depth computed behind multiple images contains rich information about the visual world and can therefore be used to recognize and understand the semantic meaning of scenes, efficiently enhance and augment single 2D images, and reconstruct high-quality 3D models.
19

Cubic-Panorama Image Dataset Analysis for Storage and Transmission

Salehi Doolabi, Saeed 23 April 2013 (has links)
This thesis concerns systems for virtual presence in remote locations, a field referred to as telepresence. Recent image-based representations such as Google Maps' Street View provide a familiar example. Several research questions are open: such image-based representations are huge in size, so efficient compression for storage is essential, and since users are usually located in remote areas, efficient transmission of the visual information is another issue of great importance. In this work, real-world images are used in preference to computer graphics representations, mainly for the photorealism they provide and to avoid the high computational cost of simulating large-scale environments. The cubic format is selected for the panoramas in this thesis. A major feature of the cubic-panorama image datasets captured in this work is the assumption of static scenes, and the major issues of the system are compression efficiency and random access for storage, as well as computational complexity for transmission upon remote users' requests. First, to enable smooth navigation across different viewpoints, a method for aligning cubic-panorama image datasets using the geometry of the scene is proposed and tested. Feature detection and camera calibration are incorporated and, unlike the existing method, which is limited to a pair of panoramas, our approach is applicable to datasets with a large number of panoramic images, with no need for extra numerical estimation. Second, the problem of cubic-panorama image dataset compression is addressed in a number of ways. Two state-of-the-art approaches, namely the standardized H.264 scheme and a wavelet-based codec named Dirac, are used and compared for the application of virtual navigation in image-based representations of real-world environments. Different frame prediction structures and group-of-pictures lengths are investigated and compared for this new type of visual data. At this stage, based on the results obtained, an efficient prediction structure and bitstream syntax are proposed that exploit features of the data and satisfy the major requirements of the system. Third, we propose novel methods to address the important issue of disparity estimation. A client-server scheme is assumed, in which a remote user requests information at each navigation step. For the compression stage, a fast method building on our work on scene geometry, the proposed prediction structure, and the cubic format of the panoramas is used to estimate disparity vectors efficiently. For the transmission stage, a new transcoding scheme is introduced and a number of frame-format conversion scenarios are addressed towards the goal of free navigation. Different navigation scenarios, including forward and backward navigation as well as user pan, tilt, and zoom, are addressed. In all the aforementioned cases, results are compared visually, through error images and videos, as well as with objective measures. Altogether, our work facilitates free navigation within the captured panoramic image datasets and can be incorporated into emerging state-of-the-art cubic-panorama image dataset compression/transmission schemes.
20

Multiterminal Video Coding: From Theory to Application

Zhang, Yifu 2012 August 1900 (has links)
Multiterminal (MT) video coding is a practical application of MT source coding theory. On the theory side, two problems associated with achievable rate regions are investigated in this thesis: a new sufficient condition for Berger-Tung (BT) sum-rate tightness, and the sum-rate loss for quadratic Gaussian MT source coding. Practical code design for ideal Gaussian sources with a quadratic distortion measure is also achieved for cases with more than two sources, with minor rate loss compared to the theoretical limits. However, when the theory is applied in practice, the performance of MT video coding has been unsatisfactory owing to the difficulty of exploiting the correlation between different camera views. In this dissertation, we present an MT video coding scheme under the H.264/AVC framework. In this scheme, depth camera information can optionally be sent to the decoder separately as another source sequence. With the help of depth information at the decoder, inter-view correlation can be exploited much more effectively, and the compression performance improves accordingly. With the depth information, joint estimation from decoded frames and side information at the decoder also becomes available to improve the quality of the reconstructed video frames. Experimental results show that, compared to separate encoding, up to 9.53% of the bit rate can be saved by the proposed MT scheme using decoder depth information, while up to 5.65% can be saved by the scheme without depth camera information. Comparisons to joint video coding schemes are also provided.
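The reported savings follow the usual definition of bit-rate reduction relative to separate (simulcast) encoding. The rates in the example below are illustrative placeholders chosen to reproduce the quoted percentage, not measurements from the dissertation.

```python
def bitrate_saving_percent(rate_separate, rate_joint):
    """Bit-rate saving of joint MT coding relative to separate encoding."""
    return 100.0 * (rate_separate - rate_joint) / rate_separate

# Illustrative numbers only: a joint rate of 90.47 units against a
# separate-encoding rate of 100 units corresponds to the 9.53% saving
# reported for the scheme with decoder depth information.
print(round(bitrate_saving_percent(100.0, 90.47), 2))  # prints 9.53
```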
