Modern machine learning methods, utilising neural networks, require a lot of training data. Data gathering and preparation has thus become a major bottleneck in the machine learning pipeline and researchers often use large public datasets to conduct their research (such as the ImageNet [1] or MNIST [2] datasets). As these methods begin being used in industry, these challenges become apparent. In factories objects being produced are often unique and may even involve trade secrets and patents that need to be protected. Additionally, manufacturing may not have started yet, making real data collection impossible. In both cases a public dataset is unlikely to be applicable. One possible solution, investigated in this thesis, is synthetic data generation. Synthetic data generation using physically based rendering was tested for unsupervised anomaly detection on a 3D printed block. A small image dataset was gathered of the block as control and a data generation model was created using its CAD model, a resource most often available in industrial settings. The data generation model used randomisation to reduce the domain shift between the real and synthetic data. For testing the data, autoencoder models were trained, both on the real and synthetic data separately and in combination. The material of the block, a white painted surface, proved challenging to reconstruct and no significant difference between the synthetic and real data could be observed. The model trained on real data outperformed the models trained on synthetic and the combined data. However, the synthetic data combined with the real data showed promise with reducing some of the bias intentionally introduced in the real dataset. Future research could focus on creating synthetic data for a problem where a good anomaly detection model already exists, with the goal of transferring some of the synthetic data generation model (such as the materials) to a new problem. This would be of interest in industries where they produce many different but similar objects and could reduce the time needed when starting a new machine learning project.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-443243 |
Date | January 2021 |
Creators | Fröjdholm, Hampus |
Publisher | Uppsala universitet, Avdelningen för visuell information och interaktion |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | UPTEC F, 1401-5757 ; 21015 |
Page generated in 0.0026 seconds