Global ETD Search

1	Instant HDR-NeRF: Fast Learning Of High Dynamic Range View Synthesis With Unknown Exposure Settings Nguyen, Nam 01 June 2024 We propose Instant High Dynamic Range Neural Radiance Fields (Instant HDR-NeRF), a method of learning high dynamic range (HDR) view synthesis from a set of low dynamic range (LDR) views with unknown and varying exposure and white balance in a matter of minutes. Our method can render novel HDR views without ground-truth supervision, and novel LDR views in different exposure settings, including those that match the ground-truth LDR views. The key to our method is to model the physical process of the camera with two implicit MLPs: a radiance field and a monotonically increasing tone-mapper. Built upon Instant Neural Graphics Primitives (Instant-NGP), the radiance field encodes the scene geometry and radiance (from 0 to ∞), and outputs the densities and the radiance at locations along the camera ray. The monotonically increasing tone-mapper models the camera response function (CRF), that is, how radiance hitting the camera sensor becomes a pixel value (from 0 to 255). The radiance at each location is combined with the learnable exposure parameters, which are optimized separately for each color band and for each image. A quantitative evaluation on benchmark datasets shows that our method outperforms prior HDR novel view synthesis methods in LDR rendering quality and training speed. To the best of our knowledge, our method is also the first HDR radiance field that successfully recovers the ground-truth CRF with a low average error rate of 3.70%, while co-learning geometry, radiance, and exposures all at the same time through implicit functions. In practical applications, our method can produce high-fidelity 3D reconstructions of real-world scenes from images of varying exposure settings, which is particularly useful for casual capture, where fixed settings aren’t guaranteed. 
The tone-mapper MLP can be easily controlled to simulate auto-exposure effects, making it useful in filming and video games. Furthermore, the HDR radiance maps produced by our method can be edited and tone-mapped according to user preferences. Novel View Synthesis Neural Radiance Fields Computer Vision Artificial Intelligence Deep Learning
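The camera model described in the abstract above (densities and radiance along a ray, per-channel exposure, then a monotonic tone-mapper producing values in 0 to 255) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the thesis learns the tone-mapper as an MLP, whereas `toy_crf` is a hypothetical fixed gamma curve, and tone-mapping each sample before compositing is one plausible ordering rather than the confirmed one. The compositing weights follow the standard volume-rendering quadrature.

```python
import math


def composite_weights(densities, deltas):
    """Standard volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    weights = []
    transmittance = 1.0  # fraction of light that has survived up to this sample
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def toy_crf(x, gamma=2.2):
    """Hypothetical monotonic CRF stand-in: clamp to [0, 1], gamma-encode, scale to [0, 255]."""
    clamped = min(max(x, 0.0), 1.0)
    return 255.0 * clamped ** (1.0 / gamma)


def render_ldr_pixel(densities, deltas, radiances, exposure):
    """Render one LDR pixel.

    radiances: one (r, g, b) HDR radiance tuple per sample along the ray.
    exposure:  one (e_r, e_g, e_b) scale for this image, one value per color band.
    """
    weights = composite_weights(densities, deltas)
    pixel = []
    for c in range(3):
        # Assumed ordering: scale by exposure, tone-map per sample, then composite.
        value = sum(w * toy_crf(rad[c] * exposure[c])
                    for w, rad in zip(weights, radiances))
        pixel.append(value)
    return pixel
```

Because `toy_crf` is non-decreasing and the weights are non-negative, raising the exposure of an image can only brighten the rendered pixel, which is the property a monotonic tone-mapper is meant to guarantee.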
2	ENHANCING PRIVACY AND IMMERSIVENESS IN VIDEO CONFERENCING: PORTRAIT VIDEO SEGMENTATION AND SPATIAL VIEW INTERPOLATION Weichen Xu (20307963) 10 January 2025 In recent years, video conferencing technology has gained widespread use, driving the demand for enhanced virtual communication experiences. However, surveys show that users are highly concerned about information leaks from their video backgrounds and often report difficulties maintaining attention during long meetings. This dissertation focuses on improving privacy and immersion in video conferencing. To improve segmentation quality and inter-frame segmentation consistency, sources of temporal guidance are benchmarked. Furthermore, we develop a lightweight deep neural network with efficient temporal guidance for real-time portrait video segmentation. The method achieves a good balance between processing time and segmentation quality, making it well suited to real-time applications such as background blurring and replacement. To promote immersiveness in video conferencing, we propose a cost-effective telepresence system that delivers more immersive viewing experiences. The system integrates multi-view capture, spatial view interpolation, and view rendering. Leveraging a fully synthetic multi-view portrait dataset, the quality of the spatial view interpolation method is significantly improved. Additionally, we introduce an effective multi-stage network which significantly reduces the computation cost of generating multiple interpolated views at finer scales without sacrificing image quality. Furthermore, a simplified system with only two camera inputs is explored, which utilizes pose information to assist spatial view interpolation. The proposed telepresence system offers an immersive multi-angle viewing experience. Signal processing video conferencing system telepresence systems video segmentation view interpolation solution novel view synthesis
3	Cartographie RGB-D dense pour la localisation visuelle temps-réel et la navigation autonome / Dense RGB-D mapping for real-time localisation and autonomous navigation Meilland, Maxime 28 March 2012 In an autonomous navigation context, precise localisation of the vehicle is important to ensure reliable navigation. Low-cost sensors such as GPS systems are inaccurate and inefficient in urban areas, and therefore the use of such sensors alone is not well suited for autonomous navigation. On the other hand, camera sensors provide a dense photometric measure that can be processed to obtain both localisation and mapping information. In the robotics community, this problem is well known as Simultaneous Localisation and Mapping (SLAM) and it has been studied for the last thirty years. In general, SLAM algorithms are incremental and prone to drift, so such methods may not be efficient in large-scale environments for real-time localisation. Clearly, an a-priori 3D model simplifies the localisation and navigation tasks since it allows the structure and motion estimation problems to be decoupled. Indeed, the map can be computed offline during a learning phase, whilst the localisation can be handled in real-time using a single camera and the pre-computed model. Classic global 3D model representations are usually inaccurate and photometrically inconsistent. Alternatively, it is proposed to use a dense ego-centric model that represents, as closely as possible, real sensor measurements. This representation is composed of a graph of locally accurate spherical panoramas augmented with dense depth information (RGB+D), which makes it possible to map large environments. These augmented panoramas allow varying viewpoints to be generated through novel view synthesis. 
To localise a camera navigating locally inside the graph, we use the panoramas together with a direct image registration technique. The proposed localisation method is accurate, robust to outliers and can handle large illumination changes between the stored model and the images perceived by the camera. Finally, autonomous navigation in urban environments is performed using the learnt model, with only a single camera to compute localisation. SLAM Navigation Localisation Visual Tracking Novel View Synthesis Mapping
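How a spherical panorama with dense depth can yield a new viewpoint is sketched below. This is a minimal illustration, not the thesis's implementation: it assumes an equirectangular pixel layout, a pure translation between the two viewpoints, and hypothetical function names. Each pixel is lifted to a 3D point using its depth, shifted into the new camera frame, and projected back onto the sphere.

```python
import math


def pixel_to_point(u, v, depth, width, height):
    """Lift an equirectangular pixel (u, v) with known depth to a 3D point."""
    lon = (u / width) * 2.0 * math.pi - math.pi      # longitude in [-pi, pi)
    lat = math.pi / 2.0 - (v / height) * math.pi     # latitude in [-pi/2, pi/2]
    x = depth * math.cos(lat) * math.sin(lon)
    y = depth * math.sin(lat)
    z = depth * math.cos(lat) * math.cos(lon)
    return (x, y, z)


def point_to_pixel(point, width, height):
    """Project a 3D point back to equirectangular coordinates; also return its range."""
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    lat = math.asin(y / r)
    u = (lon + math.pi) / (2.0 * math.pi) * width
    v = (math.pi / 2.0 - lat) / math.pi * height
    return (u, v, r)


def warp_pixel(u, v, depth, translation, width, height):
    """Where a panorama pixel lands in a view whose centre is displaced by `translation`."""
    p = pixel_to_point(u, v, depth, width, height)
    q = tuple(pc - tc for pc, tc in zip(p, translation))  # point in the new camera frame
    return point_to_pixel(q, width, height)
```

For example, `warp_pixel(512, 256, 4.0, (0.0, 0.0, 1.0), 1024, 512)` moves the viewpoint one unit towards a point straight ahead: the pixel stays at the image centre and its range drops from 4 to 3. A direct registration method would compare intensities at such warped positions while optimizing the camera pose.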
5	Multi-scale Methods for Omnidirectional Stereo with Application to Real-time Virtual Walkthroughs Brunton, Alan P 28 November 2012 This thesis addresses a number of problems in computer vision, image processing, and geometry processing, and presents novel solutions to these problems. The overarching theme of the techniques presented here is a multi-scale approach, leveraging mathematical tools to represent images and surfaces at different scales, and methods that can be adapted from one type of domain (e.g., the plane) to another (e.g., the sphere). The main problem addressed in this thesis is known as stereo reconstruction: reconstructing the geometry of a scene or object from two or more images of that scene. We develop novel algorithms to do this, which work for both planar and spherical images. By developing a novel way to formulate the notion of disparity for spherical images, we are able to effectively adapt our algorithms from planar to spherical images. Our stereo reconstruction algorithm is based on a novel application of distance transforms to multi-scale matching. We use matching information aggregated over multiple scales, and enforce consistency between these scales using distance transforms. We then show how multiple spherical disparity maps can be efficiently and robustly fused using visibility and other geometric constraints. We then show how the reconstructed point clouds can be used to synthesize, in real-time, a realistic sequence of novel views, that is, images from points of view not captured in the input images. Along the way to this result, we address some related problems. For example, multi-scale features can be detected in spherical images by convolving those images with a filterbank, generating an overcomplete spherical wavelet representation of the image from which the multi-scale features can be extracted. Convolution of spherical images is much more efficient in the spherical harmonic domain than in the spatial domain. 
Thus, we develop a GPU implementation for fast spherical harmonic transforms and frequency domain convolutions of spherical images. This tool can also be used to detect multi-scale features on geometric surfaces. When we have a point cloud of a surface of a particular class of object, whether generated by stereo reconstruction or by some other modality, we can use statistics and machine learning to more robustly estimate the surface. If we have at our disposal a database of surfaces of a particular type of object, such as the human face, we can compute statistics over this database to constrain the possible shape a new surface of this type can take. We show how a statistical spherical wavelet shape prior can be used to efficiently and robustly reconstruct a face shape from noisy point cloud data, including stereo data. multi-scale wavelets stereo reconstruction omnidirectional vision real-time novel view synthesis real-time virtual walkthroughs spherical parameterizations spherical harmonics GPU programming
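The efficiency claim about the spherical harmonic domain can be made concrete with a small sketch. It assumes orthonormal spherical harmonics, a zonal (rotationally symmetric) kernel, and convolution defined as the integral of f(y) h(x·y) over the sphere; under those assumptions each output coefficient is the input coefficient times sqrt(4π/(2l+1)) times the kernel's degree-l zonal coefficient. The function names and the dictionary layout are hypothetical, and the thesis's GPU implementation may use a different normalization.

```python
import math


def convolve_zonal(f_coeffs, h_zonal):
    """Convolve a spherical function with a zonal kernel in the harmonic domain.

    f_coeffs: dict mapping (l, m) to the coefficient f_lm (real or complex).
    h_zonal:  list where h_zonal[l] is the kernel's zonal coefficient h_l0.
    """
    out = {}
    for (l, m), f_lm in f_coeffs.items():
        if l < len(h_zonal):
            scale = math.sqrt(4.0 * math.pi / (2 * l + 1)) * h_zonal[l]
        else:
            scale = 0.0  # kernel treated as band-limited beyond its last degree
        out[(l, m)] = scale * f_lm
    return out


def delta_kernel(max_degree):
    """Zonal coefficients of a Dirac delta at the pole: h_l0 = sqrt((2l+1) / (4*pi))."""
    return [math.sqrt((2 * l + 1) / (4.0 * math.pi))
            for l in range(max_degree + 1)]
```

Convolving with `delta_kernel` returns the input coefficients unchanged, which is a convenient sanity check. The work is one multiplication per coefficient, whereas a spatial-domain convolution evaluates an integral over the sphere for every output sample.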
