1 |
WiSDM: a platform for crowd-sourced data acquisition, analytics, and synthetic data generation. Choudhury, Ananya, 15 August 2016
Human behavior is a key factor influencing the spread of infectious diseases. Individuals adapt their daily routine and typical behavior during the course of an epidemic -- the adaptation is based on their perception of risk of contracting the disease and its impact. As a result, it is desirable to collect behavioral data before and during a disease outbreak. Such data can help in creating better computer models that can, in turn, be used by epidemiologists and policy makers to better plan and respond to infectious disease outbreaks. However, traditional data collection methods are not well suited to support the task of acquiring human behavior related information, especially as it pertains to epidemic planning and response.
Internet-based methods are an attractive complementary mechanism for collecting behavioral information. Systems such as Amazon Mechanical Turk (MTurk) and online survey tools provide simple ways to collect such information. This thesis explores new methods for information acquisition, especially of behavioral information, that leverage these recent technologies.
Here, we present the design and implementation of a crowd-sourced surveillance data acquisition system -- WiSDM. WiSDM is a web-based application and can be used by anyone with access to the Internet and a browser. Furthermore, it is designed to leverage online survey tools and MTurk; WiSDM can be embedded within MTurk in an iFrame. WiSDM has a number of novel features, including: (i) the ability to support a model-based abductive reasoning loop, a flexible and adaptive information acquisition scheme driven by causal models of epidemic processes; (ii) question routing, an important feature to increase data acquisition efficacy and reduce survey fatigue; and (iii) integrated surveys, interactive surveys that provide additional information on the survey topic and improve user motivation.
We evaluate the framework's performance using Apache JMeter and present our results. We also discuss three other extensions of WiSDM: the API Adapter, the Synthetic Data Generator, and WiSDM Analytics. The API Adapter is an ETL extension of WiSDM that extracts data from disparate data sources and loads it into the WiSDM database. The Synthetic Data Generator allows epidemiologists to build synthetic survey data using NDSSL's Synthetic Population as agents. WiSDM Analytics empowers users to perform analysis on the data by writing simple Python code using Versa APIs. We also propose a data model that is conducive to survey data analysis. / Master of Science
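The question-routing feature described above skips questions that are irrelevant given a respondent's earlier answers. The abstract gives no implementation details, so the following is a minimal hypothetical sketch; the schema and function names are invented for illustration, not WiSDM's actual API:

```python
# Hypothetical question-routing sketch: each question may carry a "show_if"
# predicate over earlier answers; questions whose predicate fails are skipped,
# shortening the survey and reducing survey fatigue.

def route(questions, answers):
    """Return the ids of questions the respondent should still see."""
    visible = []
    for q in questions:
        condition = q.get("show_if")
        if condition is None or condition(answers):
            visible.append(q["id"])
    return visible

questions = [
    {"id": "q1"},  # e.g. "Have you had flu-like symptoms this week?"
    {"id": "q2", "show_if": lambda a: a.get("q1") == "yes"},  # symptom details
    {"id": "q3", "show_if": lambda a: a.get("q1") == "no"},   # prevention habits
]

print(route(questions, {"q1": "yes"}))  # ['q1', 'q2']
```

In a real deployment the predicates would be stored declaratively (so the model-based reasoning loop could rewrite them), but the skip logic itself stays this simple.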
|
2 |
Video Analytics for Agricultural Applications. Shengtai Ju (19180429), 20 July 2024
<p dir="ltr">Agricultural applications often require human experts with domain knowledge to ensure compliance and improve productivity, which can be costly and inefficient. To tackle this problem, automated video systems can be implemented for agricultural tasks thanks to the ubiquity of cameras. In this thesis, we focus on designing and implementing video analytics systems for real applications in agriculture by combining both traditional image processing and recent advancements in computer vision. Existing research and available methods have been heavily focused on obtaining the best performance on large-scale benchmarking datasets, while neglecting applications to real-world problems. Our goal is to bridge the gap between state-of-the-art methods and real agricultural applications. More specifically, we design video systems for two tasks: monitoring turkey behavior for turkey welfare and handwashing action recognition for improved food safety. For monitoring turkeys, we implement a turkey detector, a turkey tracker, and a turkey head tracker by combining object detection and multi-object tracking. Furthermore, we detect turkey activities by incorporating motion information. For recognizing handwashing activities, we combine a hand extraction method that focuses on the hand regions with a neural network to build a hand image classifier. In addition, we apply a two-stream network with RGB and hand streams to further improve performance and robustness.</p><p dir="ltr">Besides designing a robust hand classifier, we explore how dataset attributes and distribution shifts can impact system performance. In particular, distribution shifts caused by changes in hand poses and shadow can cause a classifier’s performance to degrade sharply or break down beyond a certain point. To better explore the impact of hand poses and shadow and to mitigate the induced breakdown points, we generate synthetic data with desired variations to introduce controlled distribution shift. Experimental results show that the breakdown points are heavily impacted by pose and shadow conditions. In addition, we demonstrate strategies to mitigate significant performance degradation by using selective additional training data and adding synthetic shadow to images. By incorporating domain knowledge and understanding the applications, we can effectively design video analytics systems and apply advanced techniques in agricultural scenarios.</p>
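The shadow augmentation mentioned above, adding synthetic shadow to images to probe and mitigate distribution shift, can be sketched in a few lines. This is an illustrative simplification (a hard-edged rectangular shadow), not the thesis's actual augmentation pipeline:

```python
import numpy as np

def add_synthetic_shadow(image, top_left, bottom_right, strength=0.5):
    """Darken a rectangular region of an image to simulate a cast shadow.

    image: HxWx3 uint8 array; strength in (0, 1], where 1 yields a black shadow.
    A realistic pipeline would soften the edge with a blurred mask and vary the
    shadow shape; kept hard and rectangular here for brevity.
    """
    out = image.astype(np.float32)
    r0, c0 = top_left
    r1, c1 = bottom_right
    out[r0:r1, c0:c1] *= (1.0 - strength)  # attenuate all channels in the region
    return out.astype(np.uint8)

img = np.full((4, 4, 3), 200, dtype=np.uint8)
shadowed = add_synthetic_shadow(img, (0, 0), (2, 2), strength=0.5)
print(shadowed[0, 0, 0], shadowed[3, 3, 0])  # 100 200
```

Sweeping `strength` over a range of values gives a controlled distribution shift, which is the kind of knob needed to locate a classifier's breakdown point.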
|
3 |
Segmentace obrazových dat pomocí hlubokých neuronových sítí / Image Segmentation with Deep Neural Network. Pazderka, Radek, January 2019
This master's thesis is focused on segmentation of scenes from the traffic environment. The problem is solved with segmentation neural networks, which enable classification of every pixel in the image. A segmentation neural network was created that achieves better results than current state-of-the-art architectures. The work is also focused on segmentation of the top view of the road, for which no freely available annotated datasets exist. For this purpose, an automatic tool for generating synthetic datasets was created using the PC game Grand Theft Auto V. The work compares networks trained solely on synthetic data with networks trained on both real and synthetic data. Experiments prove that synthetic data can be used for segmentation of data from the real environment. A system that enables working with segmentation neural networks has also been implemented.
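The core operation of such a segmentation network, classifying every pixel, amounts to a per-pixel argmax over per-class score maps produced by the network. As a minimal sketch of that final step (the class names here are hypothetical, and the score maps would come from a trained network):

```python
import numpy as np

def segment(logits):
    """Turn per-class score maps into a label map.

    logits: (num_classes, H, W) array of class scores per pixel.
    Returns an (H, W) array where each entry is the winning class index.
    """
    return np.argmax(logits, axis=0)

# toy example: 3 classes over a 2x2 image
logits = np.zeros((3, 2, 2))
logits[1, 0, :] = 5.0   # top row scores highest for class 1 (e.g. "road")
logits[2, 1, :] = 3.0   # bottom row scores highest for class 2 (e.g. "vehicle")
labels = segment(logits)
print(labels.tolist())  # [[1, 1], [2, 2]]
```

Everything upstream of this step (the convolutional encoder-decoder producing the score maps) is where the architectures compared in the thesis differ.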
|
4 |
Generátor syntetické datové sady pro dopravní analýzu / Synthetic Data Set Generator for Traffic Analysis. Šlosár, Peter, January 2014
This Master's thesis deals with the design and development of tools for generating a synthetic dataset for traffic analysis purposes. The first part contains a brief introduction to vehicle detection and rendering methods. Blender and a set of scripts are used to create a highly customizable training image dataset and synthetic videos from a single photograph. Great care is taken to create very realistic output that is suitable for further processing in the field of traffic analysis. The produced images and videos are automatically and richly annotated. The achieved results are tested by training a sample car detector and evaluating it on real-life test data. The synthetic dataset outperforms real training datasets in this comparison of detection rates. The computational demands of the tools are evaluated as well. The final part sums up the contribution of this thesis and outlines some future extensions of the tools.
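One reason rendered data can be "automatically richly annotated" is that the generator knows each object's 3D pose, so 2D annotations fall out of the camera projection for free. The thesis drives Blender via its scripting API; as a tool-agnostic sketch of just the annotation geometry, using an idealized pinhole camera (the focal length and principal point below are made-up values):

```python
import numpy as np

def project_points(points_3d, focal, cx, cy):
    """Pinhole projection of Nx3 camera-space points (z > 0) to pixel coords."""
    pts = np.asarray(points_3d, dtype=float)
    u = focal * pts[:, 0] / pts[:, 2] + cx
    v = focal * pts[:, 1] / pts[:, 2] + cy
    return np.stack([u, v], axis=1)

def bbox_2d(points_3d, focal=500.0, cx=320.0, cy=240.0):
    """Axis-aligned 2D bounding box of a projected 3D object, as written
    to an annotation file: returns ([u_min, v_min], [u_max, v_max])."""
    uv = project_points(points_3d, focal, cx, cy)
    return uv.min(axis=0).tolist(), uv.max(axis=0).tolist()

# a unit cube centered 5 units in front of the camera
corners = [(x, y, z) for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (4.5, 5.5)]
print([round(c, 1) for c in bbox_2d(corners)[0]])  # [264.4, 184.4]
```

In Blender itself the same idea goes through `bpy` and the render camera's matrices rather than this hand-rolled projection.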
|
5 |
An empirical study on synthetic image generation techniques for object detectors. Arcidiacono, Claudio Salvatore, January 2018
Convolutional Neural Networks are a very powerful machine learning tool that has outperformed other techniques in image recognition tasks. The biggest drawback of this method is the massive amount of training data required, since producing training data for image recognition tasks is very labor intensive. To tackle this issue, different techniques have been proposed to generate synthetic training data automatically. These synthetic data generation techniques can be grouped into two categories: the first generates synthetic images using computer graphics software and CAD models of the objects to recognize; the second generates synthetic images by cutting the object from one image and pasting it onto another. Since both techniques have their pros and cons, it would be interesting for industry to investigate the two approaches in more depth. A common use case in industrial scenarios is detecting and classifying objects inside an image. Different objects belonging to classes relevant in industrial scenarios are often indistinguishable (for example, they are all the same component). For these reasons, this thesis aims to answer the research question “Among the CAD model generation technique, the cut-paste generation technique, and a combination of the two, which technique is more suitable for generating images for training object detectors in industrial scenarios?” To answer the research question, two synthetic image generation techniques belonging to the two categories are proposed. The proposed techniques are tailored for applications where all objects belonging to the same class are indistinguishable, but they can also be extended to other applications. The two synthetic image generation techniques are compared by measuring the performance of an object detector trained using synthetic images on a test dataset of real images.
The performance of the two synthetic data generation techniques used for data augmentation has also been measured. The empirical results show that the CAD model generation technique works significantly better than the cut-paste generation technique when synthetic images are the only source of training data (61% better), whereas the two generation techniques perform equally well as data augmentation techniques. Moreover, the empirical results show that the models trained using only synthetic images perform almost as well as the model trained using real images (7.4% worse) and that augmenting the dataset of real images with synthetic images improves the performance of the model (9.5% better).
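The cut-paste technique compared above composites a masked object crop onto a background and records the paste location as the label, which is why annotation comes for free. A minimal sketch of that composition step (a simplification of any real pipeline, which would also blend edges and randomize scale and rotation):

```python
import numpy as np

def cut_paste(background, obj, mask, row, col):
    """Paste a masked object crop onto a background at (row, col).

    background: HxWx3 uint8 image; obj: hxwx3 crop; mask: hxw boolean array
    marking object pixels. Returns the composite image and the pasted object's
    bounding box (row, col, h, w), which becomes the detector's training label.
    """
    out = background.copy()
    h, w = mask.shape
    region = out[row:row + h, col:col + w]  # view into the output image
    region[mask] = obj[mask]                # copy only object pixels
    return out, (row, col, h, w)

bg = np.zeros((6, 6, 3), dtype=np.uint8)
obj = np.full((2, 2, 3), 255, dtype=np.uint8)
mask = np.array([[True, True], [True, False]])
img, box = cut_paste(bg, obj, mask, 1, 1)
print(box)           # (1, 1, 2, 2)
print(img[1, 1, 0])  # 255
print(img[2, 2, 0])  # 0
```

The masked assignment is what distinguishes cut-paste from a naive rectangular paste: background pixels outside the object silhouette survive, so the composite looks less artificial.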
|
6 |
3D Object Detection Using Virtual Environment Assisted Deep Network Training. Dale, Ashley S., 12 1900
Indiana University-Purdue University Indianapolis (IUPUI) / An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic data and real-world data, F1 scores improved in four of the five classes: the average maximum F1-score of all classes and all epochs for the networks trained with synthetic data is F1* = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1-score for synthetically trained networks is σ*_F1 = 0.015, compared to σ_F1 = 0.020 for the networks trained exclusively with real data. Various backgrounds in synthetic data were shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network, the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder, then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background.
|
7 |
Detekce dopravních značek a semaforů / Detection of Traffic Signs and Lights. Oškera, Jan, January 2020
The thesis focuses on modern methods for the detection of traffic signs and traffic lights, both directly in traffic and in retrospective analysis. The main subject is convolutional neural networks (CNNs); the solution uses convolutional neural networks of the YOLO type. The main goal of this thesis is to optimize the speed and accuracy of the models as far as possible. Suitable datasets are examined; a number of datasets, composed of real and synthetic data, are used for training and testing. The data were preprocessed using the Yolo mark tool. Training of the models was carried out at a computer center belonging to the virtual organization MetaCentrum VO. To quantify detector quality, a program was created that statistically and graphically shows detection success using ROC curves and the COCO evaluation protocol. In this thesis I created a model that achieved an average success rate of up to 81%. The thesis shows the best choice of threshold across model versions, sizes, and IoU thresholds. Extensions for mobile phones in TensorFlow Lite and Flutter have also been created.
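The IoU threshold mentioned above decides whether a predicted box counts as a match for a ground-truth sign; both the COCO protocol and YOLO-style evaluation rest on this one computation. The standard intersection-over-union for axis-aligned boxes:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes.

    A detection is typically counted as correct when IoU with a ground-truth
    box exceeds a chosen threshold (COCO sweeps thresholds from 0.5 to 0.95).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # overlap height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 4))  # 0.1429
```

Raising the IoU threshold demands tighter localization, which is one axis of the speed/accuracy trade-off the thesis explores across model versions and sizes.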
|