1. Occlusion Management in Conventional and Head-Mounted Display Visualization through the Relaxation of the Single Viewpoint/Timepoint Constraint. Meng-Lin Wu (6916283), 16 August 2019
In conventional computer graphics and visualization, images are synthesized following the planar pinhole camera (PPC) model. The PPC approximates physical imaging devices such as cameras and the human eye, which sample the scene with linear rays that originate from a single viewpoint, i.e. the pinhole. In addition, the PPC takes a snapshot of the scene, sampling it at a single instant in time, or timepoint, for each image. Images synthesized with these single-viewpoint and single-timepoint constraints are familiar to the user, as they emulate images captured with cameras or perceived by the human visual system. However, visualization using the PPC model suffers from the limitation of occlusion, when a region of interest (ROI) is not visible due to obstruction by other data. The conventional solution to the occlusion problem is to rely on the user to change the view interactively to gain line of sight to the scene ROIs. This approach of sequential navigation has the shortcomings of (1) inefficiency, as navigation is wasted when circumventing an occluder does not reveal an ROI, (2) inefficacy, as a moving or transient ROI can hide or disappear before the user reaches it, or as scene understanding requires visualizing multiple distant ROIs in parallel, and (3) user confusion, as back-and-forth navigation for systematic scene exploration can hinder spatio-temporal awareness.

In this thesis we propose a novel paradigm for handling occlusions in visualization based on generalizing an image to incorporate samples from multiple viewpoints and multiple timepoints. The image generalization is implemented at the camera model level, by removing the same-timepoint restriction and by removing the linear-ray restriction, allowing for curved rays that are routed around occluders to reach distant ROIs. The paradigm offers the opportunity to greatly increase the information bandwidth of images, which we have explored in the context of both desktop and head-mounted display visualization, as needed in virtual and augmented reality applications. The challenges of multi-viewpoint, multi-timepoint visualization are (1) routing the non-linear rays to find all ROIs or to reach all known ROIs, (2) making the generalized image easy to parse by enforcing spatial and temporal continuity and non-redundancy, (3) rendering the generalized images quickly, as required by interactive applications, and (4) developing algorithms and user interfaces for the intuitive navigation of the compound cameras with tens of degrees of freedom. We have addressed these challenges (1) by developing a multiperspective visualization framework based on a hierarchical camera model with PPC and non-PPC leaves, (2) by routing rays with multiple inflection points with direction coherence, which enforces visualization continuity, and without intersection, which enforces non-redundancy, (3) by designing our hierarchical camera model to provide closed-form projection, which enables porting generalized image rendering to the traditional and highly efficient projection-followed-by-rasterization pipeline implemented by graphics hardware, and (4) by devising naturalistic user interfaces based on tracked head-mounted displays that allow deploying and retracting the additional perspectives intuitively and without simulator sickness.
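To make the closed-form projection idea concrete, the following toy sketch (an illustration only, not code from the thesis framework; the class name `TwoLeafCamera`, the viewpoint placement, and the left/right image split are all assumptions) composes two planar pinhole camera leaves into one compound camera, so that each output pixel still has a closed-form projection even though the image mixes two viewpoints:

```python
# Toy sketch of a compound camera with two PPC leaves; not the thesis code.
import numpy as np

def ppc_project(p, eye, f=1.0):
    """Canonical pinhole at `eye` looking down +z; returns image coords (u, v)."""
    x, y, z = np.asarray(p, dtype=float) - np.asarray(eye, dtype=float)
    if z <= 0:
        return None                       # point behind this pinhole
    return np.array([f * x / z, f * y / z])

class TwoLeafCamera:
    """Root camera whose left image half samples the primary viewpoint and
    whose right half samples a secondary viewpoint offset to see past an
    occluder. Both leaves are ordinary PPCs, so projection stays closed-form
    and compatible with a projection-then-rasterization pipeline."""

    def __init__(self, primary_eye, secondary_eye):
        self.primary_eye = primary_eye
        self.secondary_eye = secondary_eye

    def project(self, p):
        uv = ppc_project(p, self.primary_eye)
        if uv is not None and uv[0] < 0.0:            # left half -> primary leaf
            return ("primary", uv)
        uv2 = ppc_project(p, self.secondary_eye)
        if uv2 is not None:                           # right half -> secondary leaf
            return ("secondary", uv2 + np.array([0.5, 0.0]))
        return None

cam = TwoLeafCamera(primary_eye=(0.0, 0.0, 0.0), secondary_eye=(1.5, 0.0, 0.0))
print(cam.project((-0.5, 0.2, 4.0)))   # sampled by the primary viewpoint
print(cam.project((2.0, 0.0, 5.0)))    # e.g. an ROI hidden from the primary view
```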
2. Vision-based approaches for surgical activity recognition using laparoscopic and RGBD videos. Twinanda, Andru Putra, 27 January 2017
The main objective of this thesis is to address the problem of activity recognition in the operating room (OR). Activity recognition is an essential component in the development of context-aware systems, which will enable various applications, such as automated assistance during difficult procedures. Here, we focus on vision-based approaches, since cameras are a common source of information for observing the OR without disrupting the surgical workflow. Specifically, we propose to use two complementary video types: laparoscopic and OR-scene RGBD videos. We investigate how state-of-the-art computer vision approaches perform on these videos and propose novel deep learning approaches to carry out the tasks. To evaluate our proposed approaches, we generate large datasets of recordings of real surgeries. The results demonstrate that the proposed approaches outperform state-of-the-art methods in performing surgical activity recognition on these new datasets.
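As a rough, hedged illustration of the kind of frame-wise deep learning approach described above (this is not the architecture proposed in the thesis; the ResNet backbone, the number of phases, and the training loop are assumptions of this sketch), fine-tuning a pretrained CNN for surgical phase recognition in PyTorch could look like this:

```python
# Minimal frame-wise surgical phase classifier; illustrative sketch only.
import torch
import torch.nn as nn
from torchvision import models

NUM_PHASES = 7  # hypothetical number of surgical phases; set to match the dataset

class PhaseClassifier(nn.Module):
    def __init__(self, num_phases=NUM_PHASES):
        super().__init__()
        # ImageNet-pretrained backbone with its classifier head replaced
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        backbone.fc = nn.Linear(backbone.fc.in_features, num_phases)
        self.backbone = backbone

    def forward(self, frames):            # frames: (batch, 3, 224, 224)
        return self.backbone(frames)      # per-frame phase logits

model = PhaseClassifier()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# one hypothetical training step on a batch of laparoscopic frames
frames = torch.randn(4, 3, 224, 224)            # placeholder frame batch
labels = torch.randint(0, NUM_PHASES, (4,))     # placeholder phase labels
optimizer.zero_grad()
loss = criterion(model(frames), labels)
loss.backward()
optimizer.step()
```

In practice, per-frame predictions are typically smoothed or combined with a temporal model over the video; the sketch only shows the per-frame part.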
3. 3D Object Detection Using Virtual Environment Assisted Deep Network Training. Dale, Ashley S., 12 1900
Indiana University-Purdue University Indianapolis (IUPUI)

An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic data and real-world data, F1 scores improved in four of the five classes: the average maximum F1-score over all classes and all epochs for the networks trained with synthetic data is F1* = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1-score for synthetically trained networks is σ*_F1 = 0.015, compared to σ_F1 = 0.020 for the networks trained exclusively with real data. Various backgrounds in synthetic data were shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network, the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder and then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background.
4. 3D OBJECT DETECTION USING VIRTUAL ENVIRONMENT ASSISTED DEEP NETWORK TRAINING. Ashley S Dale (8771429), 07 January 2021
An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic data and real-world data, F1 scores improved in four of the five classes: the average maximum F1-score over all classes and all epochs for the networks trained with synthetic data is F1* = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1-score for synthetically trained networks is σ*_F1 = 0.015, compared to σ_F1 = 0.020 for the networks trained exclusively with real data. Various backgrounds in synthetic data were shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network, the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder and then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background.
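A hedged sketch of the latent-space analysis described in the abstract, using randomly generated stand-ins for the VAE latents (the latent dimensionality, sample counts, and the centroid-distance summary are assumptions, not the author's analysis code):

```python
# Project (placeholder) VAE latents with PCA and UMAP and compare the
# real-vs-synthetic groups; illustrative sketch only.
import numpy as np
from sklearn.decomposition import PCA
import umap  # pip install umap-learn

rng = np.random.default_rng(42)
latent_real = rng.normal(size=(500, 64))    # stand-in latents for real images
latent_synth = rng.normal(size=(500, 64))   # stand-in latents for synthetic images
latents = np.vstack([latent_real, latent_synth])
labels = np.array([0] * 500 + [1] * 500)    # 0 = real, 1 = synthetic

pca_2d = PCA(n_components=2).fit_transform(latents)
umap_2d = umap.UMAP(n_components=2, random_state=42).fit_transform(latents)

# A small gap between the group centroids relative to the spread of the
# embedding would be consistent with "no meaningful distinction" between
# real-world and synthetic data.
for name, emb in (("PCA", pca_2d), ("UMAP", umap_2d)):
    gap = np.linalg.norm(emb[labels == 0].mean(axis=0) - emb[labels == 1].mean(axis=0))
    spread = emb.std(axis=0).mean()
    print(f"{name}: centroid gap = {gap:.3f}, embedding spread = {spread:.3f}")
```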