1 |
3D Object Detection Using Virtual Environment Assisted Deep Network Training / Dale, Ashley S. / 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world
image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety
of configurations. When the MR-CNN architecture was initialized with MS COCO
weights and the heads were trained with a mix of synthetic data and real-world data,
F1 scores improved in four of the five classes: The average maximum F1-score of
all classes and all epochs for the networks trained with synthetic data is F1∗ = 0.91,
compared to F1 = 0.89 for the networks trained exclusively with real data, and the
standard deviation of the maximum mean F1-score for synthetically trained networks
is σ∗_F1 = 0.015, compared to σ_F1 = 0.020 for the networks trained exclusively with real
data. Various backgrounds in synthetic data were shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network, the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder and then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background.
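The summary statistics quoted above can be illustrated with a minimal sketch. It assumes one possible reading of "average maximum F1-score": take each class's best F1 over all epochs, then average those maxima across classes. The abstract does not spell out the exact aggregation, so the function names, the class labels, and the numbers below are all hypothetical and are not taken from the thesis.

```python
from statistics import mean, stdev


def f1_score(tp, fp, fn):
    """F1 from raw counts: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize_max_f1(f1_by_class):
    """Summarize per-epoch F1 scores.

    f1_by_class maps a class name to a list of F1 scores, one per epoch.
    Returns (best F1 per class, mean of those maxima, sample standard
    deviation of those maxima). This aggregation is an assumption.
    """
    best = {name: max(scores) for name, scores in f1_by_class.items()}
    values = list(best.values())
    return best, mean(values), stdev(values)


# Hypothetical per-epoch F1 scores for two made-up classes.
example = {
    "class_a": [0.70, 0.85, 0.90],
    "class_b": [0.60, 0.80, 0.78],
}
best, avg, spread = summarize_max_f1(example)
```

Running the same summary once for networks trained with synthetic data and once for networks trained only on real data would give a pair of means and spreads comparable in form to the F1∗ and σ values reported above.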
|
2 |
3D OBJECT DETECTION USING VIRTUAL ENVIRONMENT ASSISTED DEEP NETWORK TRAINING / Ashley S Dale (8771429) / 07 January 2021 (has links)
<div>
<div>
<div>
<p>An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic data and real-world data, F1 scores improved in four of the five classes: The average maximum F1-score of all classes and all epochs for the networks trained with synthetic data is F1∗ = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1-score for synthetically trained networks is σ∗<sub>F1</sub> = 0.015, compared to σ<sub>F1</sub> = 0.020 for the networks trained exclusively with real data. Various backgrounds in synthetic data were shown to have negligible impact
on F1 scores, opening the door to abstract backgrounds and minimizing the need for
intensive synthetic data fabrication. When the MR-CNN architecture was initialized
with MS COCO weights and depth data was included in the training data, the network
was shown to rely heavily on the initial convolutional input to feed features into
the network, the image depth channel was shown to influence mask generation, and
the image color channels were shown to influence object classification. A set of latent
variables for a subset of the synthetic dataset was generated with a Variational Autoencoder and then analyzed using Principal Component Analysis and Uniform Manifold
Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering
based on image background.
</p></div></div></div>
|