31 |
Synthetic Data Generation and Training Pipeline for General Object Detection Using Domain Randomization
Arnestrand, Hampus; Mark, Casper (January 2024)
The development of high-performing object detection models requires extensive and varied datasets with accurately annotated images, a process that is traditionally labor-intensive and prone to errors. To address these challenges, this report explores the generation of synthetic data using domain randomization techniques to train object detection models. We present a pipeline that integrates synthetic data creation in Unity, and the training of YOLOv8 object detection models. Our approach uses the Unity Perception package to produce diverse and precisely annotated datasets, overcoming the domain gap typically associated with synthetic data. The pipeline was evaluated through a series of experiments, analyzing the impact of various parameters such as background textures, and training arguments on model performance. The results demonstrate that models trained with our synthetic data can achieve high accuracy and generalize well to real-world scenarios, offering a scalable and efficient alternative to manual data annotation.
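The randomization described above can be sketched as scene-parameter sampling. The parameter names and value ranges below are illustrative assumptions, not the thesis's actual Unity Perception randomizers:

```python
import random

def sample_scene_parameters(rng: random.Random) -> dict:
    """Sample one randomized scene configuration.

    Hypothetical parameters; in the thesis this role is played by
    randomizers in the Unity Perception package.
    """
    return {
        "background_texture": rng.choice(["noise", "gradient", "photo", "solid"]),
        "light_intensity": rng.uniform(0.2, 2.0),
        "object_yaw_deg": rng.uniform(0.0, 360.0),
        "camera_distance_m": rng.uniform(0.5, 3.0),
    }

# Each rendered training image gets its own sampled configuration.
rng = random.Random(42)
dataset_configs = [sample_scene_parameters(rng) for _ in range(1000)]
```

Sampling every parameter independently per image is what forces the detector to treat background and lighting as nuisance variables, which is the core idea behind closing the domain gap.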
|
32 |
Applying Simulation to the Problem of Detecting Financial Fraud
Lopez-Rojas, Edgar Alonso (January 2016)
This thesis introduces a financial simulation model covering two related financial domains: mobile payments and retail store systems. The problem we address in these domains is different types of fraud. We limit ourselves to isolated cases of relatively straightforward fraud; the ultimate aim, however, is to introduce our approach to using computer simulation for fraud detection and its applications in financial domains. Fraud is an important problem that impacts the whole economy. Currently, there is a lack of public research into the detection of fraud, one important reason being the lack of transaction data, which is often sensitive. To address this problem we present a mobile money Payment Simulator (PaySim) and a Retail Store Simulator (RetSim), which allow us to generate synthetic transactional data containing both normal customer behaviour and fraudulent behaviour. These are Multi-Agent-Based Simulations (MABS) calibrated using real data from financial transactions. We developed agents that represent the clients and merchants in PaySim and the customers and salesmen in RetSim. The normal behaviour was based on behaviour observed in data from the field and is codified in the agents as rules of transaction and interaction between clients and merchants, or customers and salesmen. Some of these agents were intentionally designed to act fraudulently, based on observed patterns of real fraud. We introduced known signatures of fraud into our model and simulations to test and evaluate our fraud detection methods. The resulting behaviour of the agents generates a synthetic log of all transactions produced by the simulation. This synthetic data can be used to further advance fraud detection research without leaking sensitive information about the underlying data or breaking any non-disclosure agreements.
Using statistics and social network analysis (SNA) on real data, we calibrated the relations between our agents and generated realistic synthetic data sets that were verified against the domain and validated statistically against the original source. We then used the simulation tools to model common fraud scenarios to ascertain exactly how effective detection techniques are, starting with the simplest form of statistical threshold detection, which is perhaps the most common in use. The preliminary results show that threshold detection is effective enough at keeping fraud losses at a set level, which suggests there is little economic room for improved fraud detection techniques. We also implemented other applications of the simulator tools, such as setting up a triage model and measuring the cost of fraud. This proved to be an important aid for managers who aim to prioritise fraud detection and want to know how much they should invest in it to keep losses below a desired limit under different experimented and expected fraud scenarios.
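As a rough illustration of the simplest detection method evaluated above, a fixed statistical threshold over transaction amounts might look like this. The field names are assumed for the example, not PaySim's actual schema:

```python
def flag_by_threshold(transactions, threshold):
    """Flag transactions whose amount exceeds a fixed threshold --
    the simplest form of statistical threshold detection."""
    return [t for t in transactions if t["amount"] > threshold]

def residual_fraud_loss(transactions, threshold):
    """Fraud losses remaining when flagged transactions are blocked:
    fraudulent amounts at or below the threshold slip through."""
    return sum(t["amount"] for t in transactions
               if t["is_fraud"] and t["amount"] <= threshold)
```

The second function captures the thesis's observation directly: the threshold caps the size of any single undetected fraudulent transaction, so total losses stay at a level the operator can set.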
|
33 |
Venture Capital Investment under Private Information
Narayanan, Meyyappan (January 2011)
Many venture capitalists (VCs) use the “VC method” of valuation where they use judgment to estimate a probability of successful exit while determining the ownership share to demand in exchange for investing in a venture. However, prior models are not aligned with the “VC method” because they do not consider private information about entrepreneurial characteristics, the primary drivers of the above probability, and consequently do not model judgment. The three main chapters of this thesis—one theoretical, one simulation, and one empirical study—examine the venture capital deal process in sync with the “VC method.”
Chapter 2 is theoretical and develops a principal-agent model of the venture capital deal process incorporating double-sided moral hazard and one-sided private information. The VC is never fully informed about the entrepreneur’s disutility of effort in spite of due diligence checks, so he or she adopts a belief about the latter’s performance in the funded venture to determine the offer. This study suggests that there exists a critical point in the VC’s belief (and correspondingly in the VC’s ownership share) that maximizes the total return to the two parties. It also uncovers optimal revision strategies for the VC to adopt if the offer is rejected, showing that the VC should develop a strong advisory capacity and minimize time constraints to facilitate investment.
Chapter 3 simulates venture capital deals as per the theoretical model and confirms the existence of critical points in the VC’s belief and ownership share that maximize the returns to the two parties and their total return. In particular, the VC’s return (in excess of his or her return from an alternate investment) peaks at a moderate ownership share for the VC. Since the entrepreneur’s private information would preclude the VC from knowing these critical points a priori, the VC should demand a moderate ownership share to stay close to such a peak. Using data from the simulations, we also generate predictions about the properties of the venture capital deal space, notably: (a) teamwork is crucial to financing; and (b) if the VC is highly confident about the entrepreneur’s performance, it works to the latter’s advantage. Chapter 4 reports the results from our survey of eight seasoned VCs affiliated with seven firms operating in Canada, the USA, and the UK, where our findings received a high degree of support.
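The "VC method" arithmetic referred to above can be sketched in a few lines. This is an illustrative variant that folds the judged success probability into the demanded share, not the thesis's exact model:

```python
def vc_ownership_share(investment, exit_value, p_success, target_multiple):
    """Ownership share a VC might demand under a simple 'VC method'.

    required_at_exit: what the investment must be worth at a successful
    exit to achieve the target multiple. Dividing by the
    probability-weighted exit value folds in the VC's judgment about
    the venture's chance of success (all parameters illustrative).
    """
    required_at_exit = investment * target_multiple
    return required_at_exit / (p_success * exit_value)
```

For example, a $2M investment targeting a 10x return against a $50M exit judged 50% likely implies demanding an 80% share; a higher judged probability of success lowers the demanded share, which is why the belief drives the offer.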
|
35 |
Generating Synthetic Schematics with Generative Adversarial Networks
Daley Jr, John (January 2020)
This study investigates synthetic schematic generation using conditional generative adversarial networks; specifically, the Pix2Pix algorithm was implemented for the experimental phase of the study. With the increase in deep neural networks’ capabilities and availability, there is a demand for large annotated datasets. This, in combination with increased privacy concerns, has led to the use of synthetic data generation. The synthetic images were evaluated using a survey: the generated blueprint images passed as genuine 40% of the time. This study confirms the ability of generative neural networks to produce synthetic blueprint images.
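The survey evaluation above reduces to a pass-rate computation. A sketch, assuming each response records whether the shown image was synthetic and whether the respondent judged it genuine:

```python
def fool_rate(responses):
    """Fraction of synthetic images judged genuine by respondents.

    Each response is a (is_synthetic, judged_genuine) pair; the
    response format is an assumption for illustration.
    """
    synthetic_judgements = [judged for is_syn, judged in responses if is_syn]
    return sum(synthetic_judgements) / len(synthetic_judgements)
```

With two of five synthetic blueprints judged genuine, the rate is 0.4, matching the 40% figure reported above.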
|
36 |
Synthetic Data Generation Using Transformer Networks / Text Generation with Transformer Networks: Creating Text from a Synthetic Tabular Dataset
Campos, Pedro (January 2021)
One of the areas propelled by the advancements in deep learning is Natural Language Processing. These continuous advancements allowed the emergence of new language models such as the Transformer [1], a deep learning model based on attention mechanisms that takes a sequence of symbols as input and outputs another sequence, attending to the input during generation. This model is often used in translation, text summarization and text generation, outperforming previously used methods such as Recurrent Neural Networks and Generative Adversarial Networks. The problem statement provided by the company Syndata for this thesis is related to this new architecture: given a tabular dataset, create a model based on the Transformer that can generate text fields considering the underlying context from the rest of the accompanying fields. In an attempt to accomplish this, Syndata had previously implemented a recurrent model; nevertheless, they are confident that a Transformer could perform better at this task. Their goal is to improve the existing solution with the implementation of a model based on the Transformer architecture. The implemented model should then be compared to the previous recurrent model and is expected to outperform it. Since there are not many published research articles in which Transformers are used for synthetic tabular data generation, this problem is fairly original. Four different models were implemented: a model based on the GPT architecture [2], an LSTM [3], a Bidirectional LSTM with an encoder-decoder structure, and the Transformer. The first two are autoregressive models and the latter two are sequence-to-sequence models with an encoder-decoder architecture.
We evaluated each of them on three different aspects: the distribution similarity between the real and generated datasets, how well each model was able to condition name generation on the information contained in the accompanying fields, and how much real data the model compromised after generation, which addresses a privacy-related issue. We found that encoder-decoder models such as the Transformer and the Bidirectional LSTM seem to perform better for this type of synthetic data generation, where the output (or the field to be predicted) has to be conditioned by the rest of the accompanying fields. They outperformed the GPT and RNN models in the aspects that matter most to Syndata: keeping customer data private and correctly conditioning the output on the information contained in the accompanying fields.
|
37 |
Learning from 3D generated synthetic data for unsupervised anomaly detection
Fröjdholm, Hampus (January 2021)
Modern machine learning methods, utilising neural networks, require a lot of training data. Data gathering and preparation have thus become a major bottleneck in the machine learning pipeline, and researchers often use large public datasets to conduct their research (such as the ImageNet [1] or MNIST [2] datasets). As these methods begin to be used in industry, the challenges become apparent. In factories, the objects being produced are often unique and may even involve trade secrets and patents that need to be protected. Additionally, manufacturing may not have started yet, making real data collection impossible. In both cases a public dataset is unlikely to be applicable. One possible solution, investigated in this thesis, is synthetic data generation. Synthetic data generation using physically based rendering was tested for unsupervised anomaly detection on a 3D printed block. A small image dataset of the block was gathered as control, and a data generation model was created using its CAD model, a resource most often available in industrial settings. The data generation model used randomisation to reduce the domain shift between the real and synthetic data. To test the data, autoencoder models were trained on the real and synthetic data separately and in combination. The material of the block, a white painted surface, proved challenging to reconstruct, and no significant difference between the synthetic and real data could be observed. The model trained on real data outperformed the models trained on the synthetic and the combined data. However, the synthetic data combined with the real data showed promise in reducing some of the bias intentionally introduced in the real dataset. Future research could focus on creating synthetic data for a problem where a good anomaly detection model already exists, with the goal of transferring parts of the synthetic data generation model (such as the materials) to a new problem.
This would be of interest in industries that produce many different but similar objects, and could reduce the time needed when starting a new machine learning project.
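Autoencoder-based anomaly detection of the kind described above hinges on reconstruction error: the model learns to reconstruct normal samples well, so poorly reconstructed inputs are flagged. A minimal scoring sketch (the thesis's models operate on images; plain vectors stand in here):

```python
def anomaly_score(original, reconstruction):
    """Mean squared reconstruction error between an input and the
    autoencoder's reconstruction; higher means more anomalous."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstruction)) / len(original)

def is_anomalous(original, reconstruction, threshold):
    """Flag the input when its reconstruction error exceeds a
    threshold calibrated on normal (defect-free) samples."""
    return anomaly_score(original, reconstruction) > threshold
```

The threshold is typically chosen from the error distribution of held-out normal samples, which is why this setup needs no labeled anomalies: it is unsupervised.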
|
38 |
Deep Learning for 3D Perception: Computer Vision and Tactile Sensing
Garcia-Garcia, Alberto (23 October 2019)
The care of dependent people (for reasons of aging, accidents, disabilities or illnesses) is one of the top-priority lines of research for the European countries, as stated in the Horizon 2020 goals. In order to minimize the cost and the intrusiveness of the therapies for care and rehabilitation, it is desirable that such care is administered at the patient’s home. The natural solution for this environment is an indoor mobile robotic platform. Such a robotic platform for home care needs to solve, to a certain extent, a set of problems that lie at the intersection of multiple disciplines, e.g., computer vision, machine learning, and robotics. At that crossroads, one of the most notable challenges (and the one we will focus on) is scene understanding: the robot needs to understand the unstructured and dynamic environment in which it navigates and the objects with which it can interact. To achieve full scene understanding, various tasks must be accomplished. In this thesis we will focus on three of them: object class recognition, semantic segmentation, and grasp stability prediction. The first refers to the process of categorizing an object into a set of classes (e.g., chair, bed, or pillow); the second goes one level beyond object categorization and aims to provide a per-pixel dense labeling of each object in an image; the third consists of determining whether an object that has been grasped by a robotic hand is in a stable configuration or will fall. This thesis presents contributions towards solving those three tasks using deep learning as the main tool for such recognition, segmentation, and prediction problems. All those solutions share one core observation: they all rely on three-dimensional data inputs to leverage that additional dimension and its spatial arrangement.
The four main contributions of this thesis are: first, we show a set of architectures and data representations for 3D object classification using point clouds; second, we carry out an extensive review of the state of the art of semantic segmentation datasets and methods; third, we introduce a novel synthetic and large-scale photorealistic dataset for solving various robotic and vision problems together; finally, we propose a novel method and representation to deal with tactile sensors and learn to predict grasp stability.
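Point clouds are unordered sets of 3D coordinates, so architectures that exploit spatial arrangement often first map them onto a regular grid. A minimal voxelization sketch, illustrative rather than the thesis's actual data representation:

```python
import math

def voxelize(points, voxel_size):
    """Map 3D points to the set of occupied voxel indices -- one common
    way to give an unordered point cloud the regular spatial structure
    that 3D convolutional networks expect."""
    return {
        tuple(math.floor(c / voxel_size) for c in p)
        for p in points
    }
```

The resulting occupancy set can be turned into a dense binary grid for 3D convolutions; the trade-off is resolution versus memory, since the grid grows cubically with resolution.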
|
39 |
Using Synthetic Data to Model Mobile User Interface Interactions
Jalal, Laoa (January 2023)
Usability testing within user interface (UI) design is a central part of assuring high-quality UI design that provides good user experiences across multiple user groups. The process of usability testing often requires extensive collection of user feedback, preferably across multiple user groups, to ensure an unbiased observation of the potential design flaws within the UI design. Attaining feedback from certain user groups has proven challenging, due to factors such as medical conditions that limit users’ possibilities to participate in the usability test. An absence of these hard-to-access groups can lead to designs that fail to consider their unique needs and preferences, which may result in a worse user experience for these individuals. In this thesis, we try to address the current gaps within data collection for usability tests by investigating whether the Generative Adversarial Network (GAN) framework can be used to generate high-quality synthetic user interactions of a particular UI gesture across multiple user groups. A collection of UI interactions from two user groups, namely the elderly and the young population, was conducted, where the UI interaction in focus was the drag-and-drop operation. The datasets, comprising both user groups, were trained on separate GANs, both using the doppelGANger architecture, and the generated synthetic data was evaluated based on its diversity, how well temporal correlations are preserved, and its performance compared to the real data when used in a classification task. The experiment results show that both GANs produce high-quality synthetic resemblances of the drag-and-drop operation, where the synthetic samples show both diversity and uniqueness when compared to the actual dataset. The synthetic datasets across both user groups also preserve statistical properties of the original dataset, such as the per-sample length distribution and the temporal correlations within the sequences.
Furthermore, the synthetic dataset shows, on average, similar performance across precision, recall and F1 scores compared to the actual dataset when used to train a classifier to distinguish between the elderly and younger populations’ drag-and-drop sequences. Further research regarding the use of multiple UI gestures, the use of a single GAN to generate UI interactions across multiple user groups, and a comparative study of different GAN architectures would provide valuable insight into unexplored potentials and possible limitations within this particular problem domain.
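The precision, recall and F1 scores used in the comparison above are computed from raw label pairs; a standard sketch for the binary case:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary classification metrics from parallel label lists.

    precision: of everything predicted positive, how much was right;
    recall: of everything actually positive, how much was found;
    F1: their harmonic mean.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Comparing these scores for a classifier trained on synthetic versus real sequences is a common "train-synthetic, test-real" check of whether the generated data carries the same discriminative signal as the original.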
|
40 |
Synthetic Data Generation for 6D Object Pose and Grasping Estimation
Martínez González, Pablo (16 March 2023)
Teaching a robot how to behave so that it becomes completely autonomous is not a simple task. When robotic systems become truly intelligent, interactions with them will feel natural and easy, but nothing could be further from the truth today. Making a robot understand and assimilate its surroundings is a huge task that the computer vision field tries to address, and deep learning techniques are bringing us closer, but at the cost of data. Synthetic data generation is the process of generating artificial data to train machine learning models. This data is generated using computer algorithms and simulations, and is designed to resemble real-world data as closely as possible. The use of synthetic data has become increasingly popular in recent years, particularly in the field of deep learning, due to the shortage of high-quality annotated real-world data and the high cost of collecting it. For that reason, in this thesis we address the task of facilitating the generation of synthetic data by creating a framework that leverages advances in modern rendering engines. In this context, the generated synthetic data can be used to train models for tasks such as 6D object pose estimation or grasp estimation. 6D object pose estimation refers to the problem of determining the position and orientation of an object in 3D space, while grasp estimation involves predicting the position and orientation of a robotic hand or gripper that can be used to pick up and manipulate the object. These are important tasks in robotics and computer vision, as they enable robots to perform complex manipulation and grasping tasks. In this work we propose a way of extracting grasping information from hand-object interactions in virtual reality, so that synthetic data can also boost research in that area. Finally, we use this synthetically generated data to test the proposal of applying 6D object pose estimation architectures to grasping-region estimation.
This idea is based on both problems sharing several underlying concepts, such as object detection and orientation. / This thesis has been funded by the Spanish Ministry of Education [FPU17/00166]
|