Global ETD Search

1	A NeRF for All Seasons
Michael Donald Gableman (16632723), 08 August 2023

Building on Shadow NeRF and Sat-NeRF, it is possible to take the solar angle into account in a NeRF-based framework that is trained on satellite images and renders a scene from a novel viewpoint. Our work extends those contributions and shows how to make the renderings season-specific. Our main challenge was creating a Neural Radiance Field (NeRF) that could render seasonal features independently of viewing angle and solar angle while still being able to render shadows. We teach our network to render seasonal features by introducing one more input variable: the time of year. However, the small training datasets typical of satellite imagery can introduce ambiguities when shadows are present in the same location in every image of a particular season. We add terms to the loss function that discourage the network from using seasonal features to account for shadows. We show the performance of our network on eight Areas of Interest containing images captured by the Maxar WorldView-3 satellite. This evaluation includes tests of our framework's ability to accurately render novel views, generate height maps, predict shadows, and specify seasonal features independently of shadows. Our ablation studies justify the choices made for network design parameters. This work also introduces a novel approach to space carving that merges multiple features and consistency metrics at different spatial scales to create a higher-quality digital surface model than is possible using standard RGB features.

Keywords: Computer vision; Satellite data; Digital Surface Model (DSM); Neural Radiance Fields; Seasonal variability; Space carving
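The abstract says only that the time of year is added as one more network input; it does not say how that input is encoded. The following is a minimal sketch, assuming a cyclic sin/cos encoding (our assumption, chosen so that 31 December and 1 January map to nearby codes) concatenated with a standard NeRF frequency encoding of 3D position. All function names are hypothetical, and the thesis's shadow-disentangling loss terms are not reproduced because their form is not given here.

```python
import numpy as np


def positional_encoding(x, num_freqs):
    """NeRF-style frequency encoding of each coordinate of x: [..., D] -> [..., 2*D*F]."""
    freqs = 2.0 ** np.arange(num_freqs)            # 1, 2, 4, ...
    angles = x[..., None] * freqs * np.pi          # [..., D, F]
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)


def season_encoding(day_of_year, days_per_year=365.25):
    """Hypothetical cyclic code: day of year -> point on the unit circle, shape [..., 2]."""
    theta = 2.0 * np.pi * np.asarray(day_of_year, dtype=float) / days_per_year
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)


def build_network_input(xyz, day_of_year, num_freqs=4):
    """Concatenate encoded sample positions [N, 3] with the season code of their image."""
    pos = positional_encoding(np.asarray(xyz, dtype=float), num_freqs)
    season = np.broadcast_to(season_encoding(day_of_year), pos.shape[:-1] + (2,))
    return np.concatenate([pos, season], axis=-1)
```

With a cyclic code, the network sees the end and the start of the year as neighbours, which a raw day number (1 vs. 365) would not convey.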
2	Arboreal Radiance Fields: Investigating NeRF-Based Orthophotos in Forestry
Lissmats, Olof, January 2024

This thesis explores the potential of Neural Radiance Fields (NeRF) for generating orthophotos in forestry applications. Traditional orthophoto production methods, such as those implemented in Pix4D, require high image overlap and extensive data collection. NeRF, a recent 3D scene reconstruction technique, may reduce these requirements by effectively reconstructing scenes from images with lower overlap. This study compares orthophotos produced by NeRF and Pix4D at various degrees of image overlap, evaluating geometric accuracy, image quality, and robustness to data variations. The findings indicate that NeRF can produce orthophotos from low-overlap images with geometric accuracy comparable to Pix4D orthophotos produced from high-overlap images, though with some loss of image sharpness. These results suggest potential cost savings and operational efficiencies in forestry, offering a viable alternative to traditional photogrammetric techniques.

Keywords: NeRF; Neural Radiance Fields; Orthophotos; Orthophotography; Pix4D
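The abstract does not describe how the orthophoto is extracted from the trained NeRF. One common approach, assumed here rather than taken from the thesis, is to render with parallel, vertically downward rays, one per ground pixel, so the result has no perspective distortion. The sketch below only builds those rays, assuming a scene frame with z up and y pointing north; `orthographic_rays`, the ground sampling distance `gsd`, and the start height `z_top` are hypothetical names, and each ray would then be passed to whatever NeRF renderer is in use.

```python
import numpy as np


def orthographic_rays(x_min, x_max, y_min, y_max, gsd, z_top):
    """Hypothetical helper: one straight-down ray per orthophoto pixel.

    gsd is the ground sampling distance (pixel size in scene units); z_top is
    a height above the canopy where the rays start. Returns origins and
    directions, each of shape [rows, cols, 3], with row 0 at y_max (north-up).
    """
    cols = int(round((x_max - x_min) / gsd))
    rows = int(round((y_max - y_min) / gsd))
    xs = x_min + (np.arange(cols) + 0.5) * gsd    # pixel centres, west to east
    ys = y_max - (np.arange(rows) + 0.5) * gsd    # pixel centres, north to south
    gx, gy = np.meshgrid(xs, ys)                  # default 'xy' indexing: [rows, cols]
    origins = np.stack([gx, gy, np.full_like(gx, z_top)], axis=-1)
    directions = np.zeros_like(origins)
    directions[..., 2] = -1.0                     # every ray points straight down
    return origins, directions
```

Because all rays are parallel, a pixel's position in the output depends only on its ground coordinates, not on where any training camera was, which is the defining property of an orthophoto.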
3	Instant HDR-NeRF: Fast Learning Of High Dynamic Range View Synthesis With Unknown Exposure Settings
Nguyen, Nam, 01 June 2024

We propose Instant High Dynamic Range Neural Radiance Fields (Instant HDR-NeRF), a method that learns high dynamic range (HDR) view synthesis from a set of low dynamic range (LDR) views with unknown and varying exposure and white balance, in as little as a few minutes. Our method can render novel HDR views without ground-truth supervision, as well as novel LDR views at different exposure settings, including those that match the ground-truth LDR views. The key to our method is modeling the physical process of the camera with two implicit MLPs: a radiance field and a monotonically increasing tone-mapper. Built upon Instant Neural Graphics Primitives (Instant-NGP), the radiance field encodes the scene geometry and radiance (from 0 to ∞) and outputs the density and radiance at locations along each camera ray. The monotonically increasing tone-mapper models the camera response function (CRF), which maps the radiance reaching the camera sensor to a pixel value (from 0 to 255). The radiance at each location is combined with learnable exposure parameters, which are optimized separately for each color band and each image. A quantitative evaluation on benchmark datasets shows that our method outperforms prior HDR novel view synthesis methods in LDR rendering quality and training speed. To the best of our knowledge, our method is also the first HDR radiance field that recovers the ground-truth CRF, with a low average error of 3.70%, while co-learning geometry, radiance, and exposures simultaneously through implicit functions. In practice, our method can produce high-fidelity 3D reconstructions of real-world scenes from images taken at varying exposure settings, which is particularly useful for casual capture, where fixed settings are not guaranteed. The tone-mapper MLP can easily be controlled to simulate auto-exposure effects, making it useful in filming and video games. Furthermore, the HDR radiance maps produced by our method can be edited and tone-mapped according to user preferences.

Keywords: Novel View Synthesis; Neural Radiance Fields; Computer Vision; Artificial Intelligence; Deep Learning
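The camera model described in this abstract (HDR radiance, scaled by a per-image, per-channel exposure, then passed through a monotonically increasing tone curve to an 8-bit value) can be sketched as follows. This is an illustrative NumPy sketch, not the Instant HDR-NeRF implementation. It assumes that exposure acts multiplicatively (so it adds in log space), that a single tone curve is shared by all three channels, and that monotonicity is enforced by keeping input-path weights positive via softplus and using increasing activations. `MonotonicToneMapper` and `to_pixel` are hypothetical names.

```python
import numpy as np


def softplus(x):
    """Numerically stable log(1 + exp(x)); always positive."""
    return np.logaddexp(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class MonotonicToneMapper:
    """Tiny one-hidden-layer MLP that is increasing in its scalar input.

    Weights on the input path pass through softplus, so they are positive;
    combined with increasing activations (sigmoid), the output can only grow
    as the input grows. The final sigmoid maps the output into (0, 255).
    """

    def __init__(self, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(size=hidden)   # raw params, made positive in __call__
        self.b1 = rng.normal(size=hidden)
        self.w2 = rng.normal(size=hidden)
        self.b2 = 0.0

    def __call__(self, log_exposed_radiance):
        x = np.asarray(log_exposed_radiance, dtype=float)[..., None]
        h = sigmoid(x * softplus(self.w1) + self.b1)    # [..., hidden]
        y = h @ softplus(self.w2) + self.b2              # [...]
        return 255.0 * sigmoid(y)


def to_pixel(radiance, log_exposure, tone_mapper):
    """Map HDR radiance [..., 3] to LDR pixel values [..., 3].

    log_exposure has shape [3]: one learnable value per colour channel for a
    given image (assumed multiplicative, hence added in log space).
    """
    log_r = np.log(np.maximum(radiance, 1e-8))   # avoid log(0)
    return tone_mapper(log_r + log_exposure)
```

Because every weight on the input path is positive and every activation is increasing, the curve stays increasing for any parameter values, so gradient updates cannot break the monotonicity constraint.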
4	Humans in the wild: NeRFs for Dynamic Scenes Modeling from In-the-Wild Monocular Videos with Humans
Alessandro, Sanvito, January 2023

Recent advances in computer vision have led to the emergence of Neural Radiance Fields (NeRFs), a powerful tool for reconstructing photorealistic 3D scenes, even in dynamic settings. However, these methods struggle with human subjects, especially when the subject is partially occluded or not fully observable, resulting in inaccurate reconstructions of geometry and texture. To address this issue, this thesis evaluates state-of-the-art human modeling with implicit representations under partial observability of the subject. We then propose and test several novel methods to improve the generalization of these models, including symmetry- and Signed Distance Function (SDF)-driven losses and leveraging prior knowledge from multiple subjects via a pre-trained model. Our results demonstrate that the proposed methods significantly improve reconstruction accuracy, both quantitatively and qualitatively, even in challenging "in-the-wild" situations. Our approach opens new opportunities for applications such as asset generation for video games and movies, and improved simulations of autonomous driving scenarios built from abundant in-the-wild monocular videos. In summary, our research significantly improves on state-of-the-art human modeling with implicit representations, with important implications for 3D computer vision (CV), neural rendering, and their applications across industries.

Keywords: Clothed Human Reconstruction; Neural Rendering; Neural Radiance Fields; Scene Reconstruction; Computer Vision; Computer and Information Sciences
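The abstract mentions symmetry- and SDF-driven losses but does not give their form. The sketch below is one plausible formulation, not the thesis's: it assumes the body is expressed in a canonical frame whose left-right (sagittal) plane is x = 0, mirrors each query point across that plane, and penalizes disagreement between the predicted signed distances at the point and at its mirror image. `reflect` and `symmetry_loss` are hypothetical names, and `sdf_fn` stands in for whatever network predicts signed distance.

```python
import numpy as np


def reflect(points, normal=(1.0, 0.0, 0.0), offset=0.0):
    """Mirror points [..., 3] across the plane n . p = offset (n normalised here)."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    d = points @ n - offset                 # signed distance to the plane
    return points - 2.0 * d[..., None] * n


def symmetry_loss(sdf_fn, points, normal=(1.0, 0.0, 0.0), offset=0.0):
    """Mean absolute difference between the SDF at points and at their mirror images.

    Zero when the predicted surface is exactly mirror-symmetric about the plane.
    """
    mirrored = reflect(points, normal, offset)
    return float(np.mean(np.abs(sdf_fn(points) - sdf_fn(mirrored))))
```

A loss like this lets a well-observed side of the body inform an occluded one; since clothing and pose are rarely perfectly symmetric, it would typically be weighted as a soft prior rather than enforced exactly.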
5	Pose Estimation using Implicit Functions and Uncertainty in 3D
Blomstedt, Frida, January 2023

3D human pose estimation is a large area within computer vision, with many applications. A common approach is to first estimate the pose in 2D, producing a confidence heatmap, and then estimate the 3D pose from the most likely 2D estimates. However, this can cause problems when the pose estimates are uncertain and the estimate of one point is far from its true position, for example when a limb is occluded. This thesis adapts the Neural Radiance Fields (NeRF) method to 2D confidence heatmaps in order to create an implicit representation of the uncertainty in 3D, thereby attempting to use as much of the 2D information as possible. The adapted method was evaluated on the Human3.6M dataset, and the results show that it outperforms a simple triangulation baseline, especially when the 2D estimate is far from the true pose.

Keywords: pose estimation; neural radiance fields; nerf; computer vision; machine learning
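The "simple triangulation baseline" is not specified in the abstract. A standard choice, assumed here for illustration, is linear (DLT) triangulation of each joint from the argmax of its per-view confidence heatmap. The sketch also shows why such a baseline is fragile: each view contributes only its single most confident pixel, so the rest of the heatmap, including any secondary mode near the true joint, is discarded. Function names are hypothetical.

```python
import numpy as np


def heatmap_argmax(heatmap):
    """Return (u, v) = (column, row) of the most confident pixel in a 2D heatmap."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return float(col), float(row)


def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from N >= 2 views.

    projections: sequence of 3x4 camera matrices P_i
    points_2d:   sequence of (u, v) pixel coordinates, one per view
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        P = np.asarray(P, dtype=float)
        # u = (P[0] . X) / (P[2] . X)  =>  (u * P[2] - P[0]) . X = 0, same for v
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                  # right singular vector minimising ||A X||, ||X|| = 1
    return X[:3] / X[3]         # homogeneous -> Euclidean
```

If one view's argmax lands on the wrong limb, its two rows still carry full weight in the least-squares system, which matches the failure mode the abstract describes for occluded joints.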
6	Advances in generative models for dynamic scenes
Castrejon Subira, Lluis Enric, 05 1900

Neural networks are a type of machine learning (ML) model that solves complex artificial intelligence (AI) tasks without requiring handcrafted data representations. Although they have achieved impressive results in speech, image, and language processing tasks, neural networks still struggle to solve dynamic scene understanding tasks. Furthermore, training neural networks usually requires large amounts of manually annotated data, which can be an expensive and time-consuming process. This thesis consists of four articles proposing generative models for dynamic scenes. Generative modeling is an area of ML that investigates how to learn the mechanisms by which data is produced. The main motivation for generative models is to learn useful data representations without labels, as a by-product of approximating the data generation process. Generative models are also useful for a wide range of applications, such as image super-resolution, voice synthesis, and text summarization.

The first article focuses on improving the performance of earlier variational autoencoders (VAEs) for video prediction, the task of generating future frames of a dynamic scene given some previously observed frames. VAEs are a family of latent variable models that can be used to sample data points. Compared to other generative models, VAEs are easy to train and tend to cover all modes of the data, but they often produce lower-quality results. In video prediction, VAEs were the first models able to produce multiple plausible future outcomes given a context, an advance over previous models because, for most dynamic scenes, the future is not a deterministic function of the past. However, the first VAEs for video prediction produced results with visible artifacts and could not handle complex, realistic datasets. In this article, we identify some of the limiting factors of these models and propose a solution to mitigate the impact of each. With these modifications, we show that VAEs for video prediction can achieve significantly higher-quality results than previous baselines and can be used to model autonomous driving scenes.

In the second article, we propose a new cascaded model for video generation based on generative adversarial networks (GANs). After the success of VAEs in video prediction, GANs were shown to produce higher-quality video samples for class-conditional video generation. However, GANs require very large batch sizes and high-capacity models, which makes training them for video generation computationally expensive in both memory and training time. We propose to split the generative process into a cascade of submodels, each solving a smaller generative problem. This split significantly reduces the computational requirements while retaining sample quality, and we show that the model scales to very large datasets and video resolutions.

In the third article, we design a model based on the premise that a scene is composed of different objects, but that frame transitions (also called dynamic rules) are shared among objects. To implement this modeling assumption, the model first extracts the different entities in a frame and then learns to update each object's representation from one frame to the next by choosing among several possible transitions, all shared among objects. We show that, when such a model is learned, the transition rules are semantically grounded and can be applied to objects not seen during training. Further, we can use the model to predict multimodal future observations of a dynamic scene by choosing different transitions.

In the last article, we propose a generative model based on 3D rendering techniques that can generate scenes with multiple objects. We design an inference mechanism to learn representations that can be rendered with our model, and we optimize this inference mechanism and the renderer jointly. We show that the model has an interpretable representation in which semantic changes to the scene representation are reflected in the rendered output. Furthermore, we show that, as a by-product of training, our model learns to segment the objects in a scene without annotations, and that the learned representation can be used to solve dynamic scene understanding tasks by inferring the representation of each observation.

Keywords: Neural networks; Deep learning; Video generation; Generative models; Variational autoencoders; Generative adversarial networks; Video prediction; Neural radiance fields
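Two building blocks shared by most VAEs, including the video-prediction VAEs discussed in the first article, are the reparameterization trick and the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior. The following is a generic NumPy sketch of those two pieces under that standard assumption; it is not the architecture from the thesis, and the function names are illustrative.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, diag(exp(log_var))) as mu + sigma * eps, eps ~ N(0, I).

    Writing the sample this way keeps it a differentiable function of mu and
    log_var (in an autodiff framework), which is what makes VAEs trainable.
    """
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over the last axis."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)
```

Drawing several latent samples for the same context and decoding each one is what lets a VAE produce multiple plausible futures rather than a single averaged prediction.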
