61

Inferring 3D trajectory from monocular data using deep learning / Inferens av 3D bana utifrån 2D data med djupa arkitekturer

Sellstedt, Victor January 2021 (has links)
Trajectory estimation, in the sense of reconstructing a 3D trajectory from a 2D one, is commonly achieved using stereo or multi-camera setups. Although the projection from 3D to 2D incurs significant information loss, some methods approach the problem monocularly to address the limitations of multi-camera systems, such as the requirement that points be observed by more than one camera. This report explores how deep learning can be applied to estimating golf balls' 3D trajectories from features computed on synthetically generated monocular data. Three neural network architectures for time series analysis, Long Short-Term Memory (LSTM), Bidirectional LSTM (BLSTM), and Temporal Convolutional Network (TCN), are compared against a simpler Multi-Layer Perceptron (MLP) baseline and against the theoretical stereo error. The results show varied performance across models, with median errors often significantly better than the means because a few predictions have very large errors. Overall, the BLSTM performed best of all models both quantitatively and qualitatively, in some ranges with a lower error than a stereo estimate with an assumed disparity error of 1. Although the proposed monocular approaches do not outperform a stereo system with a lower disparity error, they could be good alternatives where stereo solutions are not possible.
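The best-performing BLSTM can be sketched as a sequence-to-sequence regressor from per-frame 2D features to per-frame 3D positions. Below is a minimal PyTorch sketch; the layer sizes, feature dimensions, and training details are illustrative assumptions, not the thesis's actual configuration.

```python
import torch
import torch.nn as nn

class BLSTMTrajectoryRegressor(nn.Module):
    """Maps a sequence of per-frame 2D features to per-frame 3D positions."""
    def __init__(self, in_dim=2, hidden=128, out_dim=3):
        super().__init__()
        self.blstm = nn.LSTM(in_dim, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, out_dim)  # 2x: both directions

    def forward(self, x):          # x: (batch, time, in_dim)
        h, _ = self.blstm(x)       # h: (batch, time, 2*hidden)
        return self.head(h)        # (batch, time, out_dim)

# Toy training step on random tensors standing in for synthetic ball tracks.
model = BLSTMTrajectoryRegressor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 50, 2)   # 8 tracks, 50 frames, (u, v) image coordinates
y = torch.randn(8, 50, 3)   # ground-truth 3D positions from the simulator
opt.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
```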
62

Data Synthesis in Deep Learning for Object Detection / Syntetiskt Data i Djupinlärning för Objektdetektion

Haddad, Josef January 2021 (has links)
Deep neural networks typically require large amounts of labeled training data, but collecting such data can be expensive. Our study aims to reveal how training with synthetic data affects performance on real-world object detection tasks. This is achieved by synthesising annotated image data in the automotive domain using a car simulator, for the task of detecting cars in real-world images. We furthermore perform experiments in the aviation domain, where we combine synthetic images extracted from an airplane simulator with real-world data for detecting runways. In our experiments, the synthetic data sets are leveraged by pre-training a deep learning based object detector, which is then fine-tuned and evaluated on real-world data. We evaluate this approach on three real-world data sets across the two domains, and furthermore evaluate how classification performance scales as the amounts of synthetic and real-world data vary in the automotive domain. In the automotive domain, we additionally perform image-to-image translation, both from the synthetic to the real-world domain and the other way around, as a means of domain adaptation, to assess whether it further improves performance. The results show that adding synthetic data improves performance in the automotive domain and that pre-training with more synthetic data yields further improvements, but that the performance boost of adding more real-world data exceeds that of adding more synthetic data. We cannot conclude that using CycleGAN for domain adaptation further improves performance.
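The pre-train-then-fine-tune recipe the study describes can be sketched as two training phases on one detector. The detector choice (Faster R-CNN), the dummy batches, and all hyperparameters below are illustrative assumptions; the abstract does not specify them.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

def run_phase(model, loader, lr, epochs):
    """One training phase; loader yields (images, targets) in torchvision's
    detection format (lists of image tensors and of box/label dicts)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            loss = sum(model(images, targets).values())  # summed detection losses
            opt.zero_grad()
            loss.backward()
            opt.step()

def dummy_batches(n=2):
    """Stand-in batches; replace with DataLoaders over simulator / real images."""
    return [([torch.rand(3, 256, 256)],
             [{"boxes": torch.tensor([[10.0, 10.0, 80.0, 80.0]]),
               "labels": torch.tensor([1])}]) for _ in range(n)]

# num_classes=2: background + car. The architecture choice is an assumption.
model = fasterrcnn_resnet50_fpn(weights=None, num_classes=2)
run_phase(model, dummy_batches(), lr=1e-2, epochs=1)  # pre-train on synthetic
run_phase(model, dummy_batches(), lr=1e-3, epochs=1)  # fine-tune on real
```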
63

Synthetic Data Generation for the Financial Industry Using Generative Adversarial Networks / Generering av Syntetisk Data för Finansbranchen med Generativa Motstridande Nätverk

Ljung, Mikael January 2021 (has links)
Following the introduction of new laws and regulations to ensure data protection, such as GDPR and PIPEDA, interest in technologies to protect data privacy has increased. A promising research trajectory in this area is found in Generative Adversarial Networks (GAN), an architecture trained to produce data that reflects the statistical properties of its underlying dataset without compromising the integrity of the data subjects. Despite the technology's young age, prior research has made significant progress in the generation of so-called synthetic data, and current models can generate high-quality images. Owing to the architecture's success with images, it has been adapted to new domains, and this study examines its potential to synthesize financial tabular data. The study investigates a state-of-the-art model among tabular GANs, called CTGAN, together with two proposed ideas to enhance its generative ability. The results indicate that a modified training dynamic and a novel early stopping strategy improve the architecture's capacity to synthesize data. The generated data presents realistic features with clear influences from its underlying dataset, and conclusions inferred from subsequent analyses are similar to those based on the original data. Thus, the conclusion is that GANs have great potential to generate tabular data that can be considered a substitute for sensitive data, which could enable organizations to adopt more generous data sharing policies.
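CTGAN is available as an open-source package, and the early-stopping idea can be roughly approximated by selecting among candidate training budgets based on downstream utility. The sketch below is my own crude stand-in under those assumptions, not the thesis's actual strategy; the data and the utility metric are placeholders.

```python
import numpy as np
import pandas as pd
from ctgan import CTGAN  # pip install ctgan

# Toy stand-in for a sensitive financial table.
real = pd.DataFrame({
    "amount": np.random.lognormal(3, 1, 1000),
    "account_type": np.random.choice(["retail", "corporate"], 1000),
})
discrete_cols = ["account_type"]

def downstream_utility(synthetic, real):
    # Placeholder: a real check would train a model on the synthetic table
    # and score it on held-out real rows (train-on-synthetic, test-on-real).
    return -abs(synthetic["amount"].mean() - real["amount"].mean())

# Try several epoch budgets and keep the model whose samples score best.
best_model, best_score = None, -float("inf")
for epochs in (50, 100, 150):
    model = CTGAN(epochs=epochs, verbose=False)
    model.fit(real, discrete_cols)
    score = downstream_utility(model.sample(len(real)), real)
    if score > best_score:
        best_model, best_score = model, score
```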
64

Digital Platform Dynamics: Governance, Market Design and AI Integration

Ilango Guru Muniasamy (19149178) 17 July 2024 (has links)
<p dir="ltr">In my dissertation, I examine the dynamics of digital platforms, starting with the governance practices of established platforms, then exploring innovative design approaches, and finally the integration of advanced AI technologies in platforms. I structure this exploration into three essays: in the first essay, I discuss moderation processes in online communities; in the second, I propose a novel design for a blockchain-based green bond exchange; and in the third, I examine how AI-based decision-making platforms can be enhanced through synthetic data generation.</p><p dir="ltr">In my first essay, I investigate the role of moderation in online communities, focusing on its effect on users' participation in community moderation. Using data from a prominent online forum, I analyze changes in users' moderation actions (upvoting and downvoting of others' content) after they experience a temporary account suspension. While I find no significant change in their upvoting behavior, my results suggest that users downvote more after their suspension. Combined with findings on lower quality and conformity with the community while downvoting, the results suggest an initial increase in hostile moderation after suspension, although these effects dissipate over time. The short-term hostility post-suspension has the potential to negatively affect platform harmony, thus revealing the complexities of disciplinary actions and their unintended consequences.</p><p dir="ltr">In the second essay, I shift from established platforms to innovations in platform design, presenting a novel hybrid green bond exchange that integrates blockchain technology with thermodynamic principles to address market volatility and regulatory uncertainty. The green bond market, despite its high growth, faces issues like greenwashing, liquidity constraints, and limited retail investor participation. To tackle these challenges, I propose an exchange framework that uses blockchain for green bond tokenization, enhancing transparency and accessibility. By conceptualizing the exchange as a thermodynamic system, I ensure economic value is conserved and redistributed, promoting stability and efficiency. I include key mechanisms in the design to conserve value in the exchange and deter speculative trading. Through simulations, I demonstrate significant improvements in market stability, liquidity, and efficiency, highlighting the effectiveness of this interdisciplinary approach and offering a robust framework for future financial system development.</p><p dir="ltr">In the third essay, I explore the integration of advanced AI technologies, focusing on how large language models (LLMs) like GPT can be adapted for specialized fields such as education policy and decision-making. To address the need for high-quality, domain-specific training data, I develop a methodology that combines agent-based simulation (ABS) with synthetic data generation and GPT fine-tuning. This enhanced model provides accurate, contextually relevant, and interpretable insights for educational policy scenarios. My approach addresses challenges such as data scarcity, privacy concerns, and the need for diverse, representative data. Experiments show significant improvements in model performance and robustness, offering policymakers a powerful tool for exploring complex scenarios and making data-driven decisions. 
This research advances the literature on synthetic data in AI and agent-based modeling in education, demonstrating the adaptability of large language models to specialized domains.</p>
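The first essay's pre/post-suspension comparison can be illustrated with a paired test on per-user voting rates. The toy data below is generated on the spot and the schema is hypothetical; the dissertation's actual analysis presumably controls for confounds this sketch ignores.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
votes = pd.DataFrame({                       # one row per suspended user
    "down_pre":  rng.poisson(2.0, 500),      # daily downvotes before suspension
    "down_post": rng.poisson(2.5, 500),      # illustrative post-suspension rate
    "up_pre":    rng.poisson(5.0, 500),
    "up_post":   rng.poisson(5.0, 500),      # no change assumed for upvotes
})

down = stats.ttest_rel(votes["down_post"], votes["down_pre"])
up = stats.ttest_rel(votes["up_post"], votes["up_pre"])
print(f"downvoting change: t={down.statistic:.2f}, p={down.pvalue:.3f}")
print(f"upvoting change:   t={up.statistic:.2f}, p={up.pvalue:.3f}")
```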
65

Finer grained evaluation methods for better understanding of deep neural network representations

Bordes, Florian 08 1900 (has links)
Carefully designing benchmarks to evaluate the safety of Artificial Intelligence (AI) agents is a much-needed step to precisely know the limits of their capabilities and thus prevent potential damage they could cause if used beyond those limits. Researchers and engineers should be able to draw precise pictures of the failure modes of a given AI system and find ways to mitigate them. Drawing such portraits requires reliable tools and principles that are transparent, up to date, and easy for practitioners to use. Unfortunately, most of the benchmark tools used in research are outdated and quickly fall behind the fast pace of improvement in the capabilities of deep neural networks. In this thesis by articles, I focus on establishing more fine-grained evaluation methods and principles to gain a better understanding of deep neural networks and their limitations.
In the first article, I present the Representation Conditional Diffusion Model (RCDM), a state-of-the-art visualization method that can map any deep neural network representation, for example the activations of a given layer, to the image space. Using the latest advances in generative modeling, RCDM sheds light on what is learned by deep neural networks by allowing practitioners to visualize the richness of a given representation. In the second article, I (re)introduce Guillotine Regularization (GR), a trick long used in transfer learning, from a novel viewpoint grounded in self-supervised learning. We show that evaluating a model after removing a certain number of its last layers is important for ensuring better generalization across different downstream tasks. In the third article, I introduce the DejaVu score, which quantifies how much a model memorizes its training data. The score leverages partial information from a given image, such as a crop, and evaluates how much information about the entire image can be retrieved from that partial content alone. In the last article, I introduce the Photorealistic Unreal Graphics (PUG) datasets and benchmarks. In contrast to real data, for which obtaining annotations is often a costly and slow process, synthetic data offers complete control over the scene and its labeling. In this work, we leverage a powerful game engine that produces high-quality, photorealistic images to evaluate the robustness of pre-trained neural networks without additional fine-tuning.
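Guillotine Regularization, as summarized above, amounts to evaluating representations taken after removing the last few layers. Below is a minimal sketch of probing at several cut depths, with a stand-in backbone rather than any model from the articles.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(        # stand-in for a pretrained encoder + projector
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 128),
)

x = torch.randn(256, 512)        # frozen inputs for probe training
for cut in range(4):             # remove the last 0..3 modules
    trunk = backbone[:len(backbone) - cut]   # the "guillotine"
    with torch.no_grad():
        feats = trunk(x)         # an empty Sequential passes x through
    probe = nn.Linear(feats.shape[1], 10)    # linear probe, e.g. 10 classes
    # ...train `probe` on (feats, labels) and compare accuracy across cuts:
    # the best layer for a downstream task is often not the last one.
    print(f"cut={cut}: probing a {feats.shape[1]}-d representation")
```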
66

Classifying Metal Scrap Piles Using Synthetic Data : Evaluating image classification models trained on synthetic data / Klassificering av metallskrothögar med hjälp av syntetiska data

Pedersen, Stian Lockhart January 2024 (has links)
Modern deep learning models require large amounts of training data, and data acquisition can be challenging. Synthetic data provides an alternative to manually collecting real data, alleviating the problems associated with real data acquisition. For recycling processes, classifying metal scrap piles containing hazardous objects is important, since hazardous objects can be damaging and costly if handled incorrectly. Automatically detecting hazardous objects in metal scrap piles with image classification models requires large amounts of data, and metal scrap piles exhibit large variations in objects, textures, and lighting. Furthermore, data acquisition can be difficult in the recycling domain, where positive examples can be scarce and manual acquisition setups can be challenging. In this thesis, synthetic images of metal scrap piles in a recycling process are created for training image classification models to detect piles containing fire extinguishers or hydraulic cylinders. The synthetic images are created with physically based rendering and domain randomization, rendered with either rasterization or ray tracing engines. Ablation studies are conducted to investigate the effect of domain randomization. The performance of models trained on purely synthetic datasets is compared with that of models trained only on real images. Furthermore, photorealistic ray traced rendering is evaluated by comparing F1 scores between models trained on datasets created with rasterization and with ray tracing. The F1 scores show that models trained on purely synthetic data outperform those trained solely on real data when classifying images containing fire extinguishers or hydraulic cylinders. The ablation studies show that domain randomization of textures is beneficial both for classifying fire extinguishers and for classifying hydraulic cylinders in metal scrap piles. High dynamic range image lighting randomization does not provide benefits when classifying metal scrap piles containing fire extinguishers, suggesting that other lighting randomization techniques may be more effective. Models trained on images rendered with rasterization achieve higher F1 scores when classifying metal scrap piles containing fire extinguishers, whereas for hydraulic cylinders, images created with ray tracing achieve higher F1 scores. This thesis highlights the potential of synthetic data as an alternative to manually acquired real data, particularly in domains where data collection is challenging and time-consuming. The results show the effectiveness of domain randomization and physically based rendering techniques in creating realistic and diverse synthetic datasets.
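Domain randomization here means sampling scene parameters per rendered image. Below is a schematic sketch of the sampling step only, with parameter names and ranges invented for illustration; the integration with the actual rendering engine is not shown.

```python
import random

TEXTURES = ["rust", "steel", "painted", "noise_01"]
HDRI_MAPS = ["warehouse.hdr", "overcast.hdr", "sunset.hdr"]

def sample_scene_config():
    """One randomized scene configuration, to be consumed by the renderer."""
    return {
        "pile_texture": random.choice(TEXTURES),
        "hdri": random.choice(HDRI_MAPS),                 # image-based lighting
        "light_intensity": random.uniform(0.3, 2.0),
        "camera_yaw": random.uniform(-15, 15),
        "camera_pitch": random.uniform(-10, 10),
        "has_fire_extinguisher": random.random() < 0.5,   # the class label
    }

configs = [sample_scene_config() for _ in range(10_000)]
# Each config would be passed to the rendering engine (rasterization or
# ray tracing backend) to produce one labeled training image.
```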
67

Augmenting High-Dimensional Data with Deep Generative Models / Högdimensionell dataaugmentering med djupa generativa modeller

Nilsson, Mårten January 2018 (has links)
Data augmentation is a technique that can be performed in various ways to improve the training of discriminative models. Recent developments in deep generative models offer new ways of augmenting existing data sets. In this thesis, a framework for augmenting annotated data sets with deep generative models is proposed, together with a method for quantitatively evaluating the quality of the generated data sets. Using this framework, two data sets for pupil localization were generated with different generative models, including both well-established models and a novel model proposed for this purpose. The novel model was shown, both qualitatively and quantitatively, to generate the best data sets. A set of smaller experiments on standard data sets also revealed cases where this generative model could improve the performance of an existing discriminative model. The results indicate that generative models can be used to augment or replace existing data sets when training discriminative models.
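The proposed framework can be outlined as: train a generative model on the annotated set, sample synthetic examples with annotations, and mix them into the discriminative model's training data. A minimal sketch with stand-in tensors and a placeholder sampler follows; the shapes and mixing ratio are assumptions.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Stand-ins for the annotated real set (e.g., eye crops with pupil positions).
real_images = torch.rand(1000, 1, 32, 32)
real_labels = torch.rand(1000, 2)            # (x, y) pupil coordinates

def sample_generator(n):
    # Placeholder for a trained deep generative model's sampler; a real one
    # would return generated images together with matching annotations.
    return torch.rand(n, 1, 32, 32), torch.rand(n, 2)

synth_images, synth_labels = sample_generator(1000)

# Mix real and synthetic examples; the discriminative model trains as usual.
train_ds = ConcatDataset([TensorDataset(real_images, real_labels),
                          TensorDataset(synth_images, synth_labels)])
train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)
```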
68

Measuring the Utility of Synthetic Data : An Empirical Evaluation of Population Fidelity Measures as Indicators of Synthetic Data Utility in Classification Tasks / Mätning av Användbarheten hos Syntetiska Data : En Empirisk Utvärdering av Population Fidelity mätvärden som Indikatorer på Syntetiska Datas Användbarhet i Klassifikationsuppgifter

Florean, Alexander January 2024 (has links)
In the era of data-driven decision-making and innovation, synthetic data serves as a promising tool that bridges the need for vast datasets in machine learning (ML) and the imperative of data privacy. By simulating real-world data while preserving privacy, synthetic data generators have become increasingly prevalent instruments in AI and ML development. A key challenge with synthetic data lies in accurately estimating its utility. For this purpose, Population Fidelity (PF) measures, a category of metrics that evaluate how well the synthetic data mimics the general distribution of the original data, have shown to be good candidates. In this setting, we aim to answer: "How well are different population fidelity measures able to indicate the utility of synthetic data for machine learning based classification models?" We designed a reusable six-step experiment framework to examine the correlation between nine PF measures and the performance of four ML classification models across five datasets. The six-step approach includes data preparation, training, testing on original and synthetic datasets, and computation of the PF measures. The study reveals non-linear relationships between the PF measures and synthetic data utility. The general analysis, that is, the monotonic relationship between each PF measure and performance over all models, yielded at most moderate correlations, with the Cluster measure showing the strongest correlation. In the more granular, model-specific analysis, Random Forest showed strong correlations with three PF measures. The findings show that no PF measure correlates consistently and strongly enough across all models to be considered a universal estimator of model performance. This highlights the importance of context-aware application of PF measures and sets the stage for future research to expand the scope, including support for a wider range of data types and the integration of privacy evaluations into synthetic data assessment. Ultimately, this study contributes to the effective and reliable use of synthetic data, particularly in sensitive fields where data quality is vital.
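The final step of the six-step framework reduces to a rank correlation between each PF measure and downstream performance, which suits the monotonic (rather than linear) relationships reported. A minimal sketch with illustrative numbers standing in for the collected results:

```python
import pandas as pd
from scipy.stats import spearmanr

# Illustrative results table: each row would be one synthetic dataset, with
# its PF measure value and the downstream classifier's F1 on real test data.
results = pd.DataFrame({
    "cluster_measure": [0.81, 0.65, 0.92, 0.74, 0.58],
    "rf_f1":           [0.77, 0.61, 0.85, 0.70, 0.55],  # e.g., Random Forest
})

rho, p = spearmanr(results["cluster_measure"], results["rf_f1"])
print(f"Spearman rho={rho:.2f} (p={p:.3f})")  # monotonic association, not linear
```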
69

Training a Neural Network using Synthetically Generated Data / Att träna ett neuronnät med syntetisktgenererad data

Diffner, Fredrik, Manjikian, Hovig January 2020 (has links)
A major challenge in training machine learning models is gathering and labeling a sufficiently large training data set. A common solution is to use a synthetically generated data set to expand or replace a real data set. This paper examines the performance of a machine learning model trained on a synthetic data set versus the same model trained on real data. The approach was applied to the problem of character recognition using a machine learning model based on convolutional neural networks. A synthetic data set of 1,240,000 images and two real data sets, Chars74K and ICDAR 2003, were used. The result was that the model trained on the synthetic data set achieved an accuracy about 50% better than that of the same model trained on the real data set.
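Synthetic character images of this kind can be produced by rendering random glyphs with varied fonts, colors, and geometric jitter. A minimal PIL sketch follows; the font files and perturbation ranges are assumptions, as the abstract does not describe the generation pipeline.

```python
import random
import string
from PIL import Image, ImageDraw, ImageFont

FONTS = ["DejaVuSans.ttf", "DejaVuSerif.ttf"]   # assumed available font files

def synth_char(size=32):
    """Render one random labeled character image."""
    char = random.choice(string.ascii_letters + string.digits)
    bg = tuple(random.randint(0, 255) for _ in range(3))
    fg = tuple(random.randint(0, 255) for _ in range(3))
    img = Image.new("RGB", (size, size), bg)
    font = ImageFont.truetype(random.choice(FONTS), random.randint(16, 28))
    ImageDraw.Draw(img).text((random.randint(0, 8), random.randint(0, 4)),
                             char, fill=fg, font=font)
    img = img.rotate(random.uniform(-15, 15))   # mild geometric jitter
    return img, char                            # image and its label

image, label = synth_char()
```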
70

Complex Vehicle Modeling: A Data Driven Approach

Schoen, Alexander C. 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / This thesis proposes an artificial neural network (NN) model to predict fuel consumption in heavy vehicles. The model uses predictors derived from vehicle speed, mass, and road grade. These variables are readily available from the telematics devices that are becoming an integral part of connected vehicles. The model predictors are aggregated over a fixed distance traveled (i.e., a window) instead of a fixed time interval; 1 km windows were found to be most appropriate for the vocations studied in this thesis. Two vocations were studied: refuse and delivery trucks. The proposed NN model was compared to two traditional models: a parametric model similar to one found in the literature, and a linear regression model that uses the same features developed for the NN model. The confidence levels of the three models were calculated in order to evaluate their variances. It was found that the NN models produce lower point-wise error, but their stability is not as high as that of the regression models. To improve the variance of the NN models, an ensemble based on the average of five fold-wise models was created, with the mean training error used to correct the ensemble predictions. Finally, the confidence level of each model was analyzed in order to understand how much error is expected from each. The ensemble K-fold model predictions are more reliable than those of the single NN and have a narrower confidence interval than both the parametric and regression models.
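Two details of this abstract lend themselves to short sketches: aggregating telematics signals over fixed 1 km distance windows rather than time intervals, and averaging the five fold-wise models with a mean-training-error correction. A schematic version with generated stand-in data (column names and ranges are assumptions):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 3600  # one hour of 1 Hz telematics samples
log = pd.DataFrame({
    "speed": rng.uniform(5, 25, n),     # m/s
    "grade": rng.normal(0, 2, n),       # percent road grade
    "mass": np.full(n, 18_000.0),       # kg, constant over the trip
})

# Distance-based windows: cumulative distance in meters, bucketed per 1 km.
log["window"] = (log["speed"].cumsum() // 1000).astype(int)
features = log.groupby("window").agg(mean_speed=("speed", "mean"),
                                     mean_grade=("grade", "mean"),
                                     mass=("mass", "first"))

# Ensemble of the five fold-wise models: average their predictions, then
# apply the mean-training-error correction described in the abstract.
def ensemble_predict(models, X, mean_train_error=0.0):
    return np.mean([m.predict(X) for m in models], axis=0) - mean_train_error
```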
