Global ETD Search

1	Privacy-Preserving Synthetic Medical Data Generation with Deep Learning Torfi, Amirsina 26 August 2020 Deep learning models have demonstrated good performance in various domains such as Computer Vision and Natural Language Processing. However, the utilization of data-driven methods in healthcare raises privacy concerns, which limit collaborative research. A remedy to this problem is to generate and employ synthetic data. Existing methods for artificial data generation suffer from different limitations, such as being bound to particular use cases. Furthermore, their generalizability to real-world problems is disputed because of uncertainties in defining and measuring key realistic characteristics. Hence, there is a need to establish insightful metrics for measuring the validity of synthetic data, as well as quantitative criteria regarding privacy restrictions. We propose the use of Generative Adversarial Networks to help satisfy requirements for realistic characteristics and acceptable values of privacy metrics simultaneously. The present study makes several unique contributions to synthetic data generation in the healthcare domain. First, we propose a novel domain-agnostic metric to evaluate the quality of synthetic data. Second, by utilizing 1-D Convolutional Neural Networks, we devise a new approach to capturing the correlation between adjacent diagnosis records. Third, we employ Convolutional Autoencoders for creating a robust and compact feature space to handle the mixture of discrete and continuous data. Finally, we devise a privacy-preserving framework that enforces Rényi differential privacy as a new notion of differential privacy. / Doctor of Philosophy / Computer programs have been widely used for clinical diagnosis but are often designed with assumptions limiting their scalability and interoperability.
The recent proliferation of abundant health data, significant increases in computer processing power, and superior performance of data-driven methods are enabling a paradigm shift in healthcare technology. This involves the adoption of artificial intelligence methods, such as deep learning, to improve healthcare knowledge and practice. Despite the success of deep learning in many different domains, privacy challenges in the healthcare field make collaborative research difficult, as working with data-driven methods may jeopardize patients' privacy. To overcome these challenges, researchers propose to generate and utilize realistic synthetic data that can be used instead of real private data. Existing methods for artificial data generation are limited by being bound to special use cases. Furthermore, their generalizability to real-world problems is questionable. There is a need to establish valid synthetic data that overcomes privacy restrictions and can stand in for real-world data when training deep learning models for healthcare. We propose the use of Generative Adversarial Networks to simultaneously overcome the realism and privacy challenges associated with healthcare data. Deep learning healthcare synthetic data generation generative adversarial networks privacy.
2	Semantic Segmentation with Carla Simulator Malec, Stanislaw January 2021 Autonomous vehicles perform semantic segmentation to orient themselves, but training neural networks for semantic segmentation requires large amounts of labeled data. A hand-labeled real-life dataset requires considerable effort to create, so we instead turn to virtual simulators, where the segmentation labels are known, to generate large datasets virtually for free. This work investigates how effective synthetic datasets are in driving scenarios by collecting a dataset from a simulator and testing it against a real-life hand-labeled dataset. We show that mixing synthetic and real-life data gets a model up and running faster than traditional dataset collection methods, while achieving close to baseline performance. autonomous vehicles synthetic data generation semantic segmentation computer vision carla simulator Computer Sciences
3	Bayesian Variable Selection with Shrinkage Priors and Generative Adversarial Networks for Fraud Detection Issoufou Anaroua, Amina 01 January 2024 This research paper focuses on fraud detection in the financial industry using Generative Adversarial Networks (GANs) in conjunction with univariate and multivariate Bayesian Models with Shrinkage Priors (BMSP). The problem addressed is the need for accurate and advanced fraud detection techniques due to the increasing sophistication of fraudulent activities. The methodology applies BMSP for variable selection and uses GANs to generate synthetic fraud samples; fraud detection is then performed on the augmented dataset. Experimental results demonstrate that the BMSP GAN approach detects fraud with improved performance compared to other methods. The conclusions highlight the potential of GANs and BMSP for enhancing fraud detection capabilities and suggest directions for future research. Categorical Data Analysis Data Science
4	Energy-Efficient Private Forecasting on Health Data using SNNs / Energieffektiv privat prognos om hälsodata med hjälp av SNNs Di Matteo, Davide January 2022 Health monitoring devices, such as Fitbit, are gaining popularity both as wellness tools and as a source of information for healthcare decisions. Predicting such wellness goals accurately is critical for users to make informed lifestyle choices. The core objective of this thesis is to design and implement such a system that takes energy consumption and privacy into account. This research is modelled as a time-series forecasting problem that makes use of Spiking Neural Networks (SNNs) due to their proven energy-saving capabilities. Thanks to a design that closely mimics natural neural networks (such as the brain), SNNs have the potential to significantly outperform classic Artificial Neural Networks in terms of energy consumption and robustness. To test our hypotheses, we take previous research by Sonia et al. [1], in the same domain and with the same dataset, as our starting point; that work designs and implements a private forecasting system using Long Short-Term Memory (LSTM). Their study also implements and evaluates a clustering federated learning approach, which fits the highly distributed data well. The results obtained in their research act as a baseline against which we compare our results in terms of accuracy, training time, model size, and estimated energy consumed. Our experiments show that Spiking Neural Networks trade off accuracy (2.19x, 1.19x, 4.13x, and 1.16x greater Root Mean Square Error (RMSE) for macronutrients, calories burned, resting heart rate, and active minutes, respectively) for a smaller model (19% fewer parameters and 77% lighter in memory) and 43% faster training. Our model is estimated to consume 3.36 μJ per inference, which is much less than traditional Artificial Neural Networks (ANNs) [2].
The data recorded by health monitoring devices is widely distributed in the real world. Moreover, because the recorded information is so sensitive, there are many possible implications to consider. For these reasons, we apply the clustering federated learning implementation [1] to our use case. However, adopting such techniques can be challenging, since it is difficult to learn from irregular data sequences. We use a two-step streaming clustering approach to classify customers based on their eating and exercise habits. It has been shown that training different models for each group of users is useful, particularly in terms of training time; however, this is strongly dependent on the cluster size. Our experiments conclude that error and training time decrease if the clusters contain enough data to train the models. Finally, this study addresses the issue of data privacy by using state-of-the-art differential privacy. We apply ε-differential privacy to both our baseline model (trained on the whole dataset) and our federated learning based approach. With a differential privacy of ε = 0.1, our experiments report an increase in the measured average error (RMSE) of only 25%: specifically, +23.13%, +25.71%, +29.87%, and +21.57% for macronutrients (grams), calories burned (kCal), resting heart rate (beats per minute, bpm), and active minutes (minutes), respectively. Spiking neural networks differential privacy synthetic data generation smart health care fitness trackers. Computer and Information Sciences
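The abstract of entry 4 does not say which mechanism was used to obtain ε-differential privacy. Purely as a generic illustration of what a privacy budget such as ε = 0.1 means, and not as the thesis's method, the sketch below uses the classic Laplace mechanism to release a noisy mean. The function names, the clipping bounds, and the assumption that the number of records is public are all hypothetical choices made for this example.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b = sensitivity / epsilon used by the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_mean(values, lower, upper, epsilon, rng):
    """Release an epsilon-DP estimate of the mean of values clipped to [lower, upper].

    Assumes the number of records is public (hypothetical setting for this sketch).
    """
    clipped = [min(max(v, lower), upper) for v in values]
    # Replacing one record moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    return true_mean + laplace_noise(laplace_scale(sensitivity, epsilon), rng)
```

A smaller ε gives a larger noise scale: with sensitivity 1, ε = 0.1 yields a Laplace scale of 10, which is why strict privacy budgets tend to raise the measured error.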
5	Material Artefact Generation Rončka, Martin January 2019 Obtaining a sufficiently large, high-quality dataset of images with distinct artefacts is not always easy, whether because the data source offers too few samples or because annotations are complex to create. This is true, for example, of radiology and of mechanical engineering. To use modern, well-established machine learning methods for classification, segmentation, and defect detection, the dataset must be sufficiently large and balanced. With small datasets we face problems such as overfitting and weak data, which cause misclassification at the expense of under-represented classes. This thesis explores the use of generative networks to extend and balance a dataset with newly generated images. Using Conditional Generative Adversarial Networks (CGAN) and a heuristic annotation generator, we are able to generate a large number of new images of parts with defects. A dataset of screw threads was used for the generation experiments. Two further datasets were also used: ceramics and MRI scans (BraTS). On these two datasets, the effect of the generated data on training is evaluated, along with its benefit for improving classification and segmentation.
6	Porovnání přístupů ke generování umělých dat / Comparison of Approaches to Synthetic Data Generation Šejvlová, Ludmila January 2017 This diploma thesis deals with synthetic data and selected approaches to generating it, together with a practical data generation task. The goal of the thesis is to describe the selected approaches to data generation, capture their key advantages and disadvantages, and compare the individual approaches to each other. The practical part of the thesis describes generation of synthetic data for teaching knowledge discovery using databases. The thesis includes a basic description of synthetic data and thoroughly explains the process of their generation. The approaches selected for further examination are random data generation, the statistical approach, data generation languages, and the ReverseMiner tool. The thesis also describes the practical usage of synthetic data and the suitability of each approach for certain purposes. Within this thesis, educational data Hotel SD were created using the ReverseMiner tool. The data contain relations discoverable with SD (set-difference) GUHA-procedures.
7	Complex Vehicle Modeling: A Data Driven Approach Schoen, Alexander C. 12 1900 Indiana University-Purdue University Indianapolis (IUPUI) / This thesis proposes an artificial neural network (NN) model to predict fuel consumption in heavy vehicles. The model uses predictors derived from vehicle speed, mass, and road grade. These variables are readily available from telematics devices that are becoming an integral part of connected vehicles. The model predictors are aggregated over a fixed distance traveled (i.e., a window) instead of a fixed time interval. It was found that 1 km windows are most appropriate for the vocations studied in this thesis. Two vocations were studied: refuse and delivery trucks. The proposed NN model was compared to two traditional models. The first is a parametric model similar to one found in the literature. The second is a linear regression model that uses the same features developed for the NN model. The confidence levels of the models built with these three methods were calculated in order to evaluate the models' variances. It was found that the NN models produce lower point-wise error. However, the stability of the NN models is not as high as that of the regression models. In order to reduce the variance of the NN models, an ensemble based on the average of 5-fold models was created. Finally, the confidence level of each model is analyzed in order to understand how much error is expected from each model. The mean training error was used to correct the ensemble predictions for the five K-fold models. The ensemble K-fold model predictions are more reliable than those of the single NN and have a lower confidence interval than both the parametric and regression models.
Neural Network Prediction Fuel Consumption Improvement Ensemble Learning Refuse Truck Complex System Modeling Delivery Truck Vehicle Routing SAE J1321 Synthetic Data Generation Aerodynamic Speed Characteristic Acceleration Feature Importance Influence of Weights Machine Learning Point-wise Error Artificial Neural Network
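The distance-based windowing described in entry 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's feature pipeline: the sample layout `(cumulative_km, speed_kmh, fuel_l)`, the function name, and the two aggregated quantities are hypothetical, and the thesis's actual predictors (derived from speed, mass, and road grade) are not reproduced here.

```python
from collections import defaultdict


def aggregate_by_distance(samples, window_km=1.0):
    """Group (cumulative_km, speed_kmh, fuel_l) samples into fixed-distance windows.

    Returns one dict per non-empty window, ordered by distance travelled.
    """
    buckets = defaultdict(list)
    for cumulative_km, speed_kmh, fuel_l in samples:
        # Window index depends on distance travelled, not on elapsed time.
        buckets[int(cumulative_km // window_km)].append((speed_kmh, fuel_l))

    windows = []
    for index in sorted(buckets):
        rows = buckets[index]
        windows.append({
            "window_start_km": index * window_km,
            "mean_speed_kmh": sum(speed for speed, _ in rows) / len(rows),
            "fuel_l": sum(fuel for _, fuel in rows),
        })
    return windows
```

Because windows are keyed on distance, a truck idling at a stop contributes more samples to the same window rather than opening new ones, which is the main difference from fixed-time aggregation.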
8	[en] AN APPROACH BASED ON INTERACTIVE MACHINE LEARNING AND NATURAL INTERACTION TO SUPPORT PHYSICAL REHABILITATION / [pt] UMA ABORDAGEM BASEADA NO APRENDIZADO DE MÁQUINA INTERATIVO E INTERAÇÃO NATURAL PARA APOIO À REABILITAÇÃO FÍSICA JESSICA MARGARITA PALOMARES PECHO 10 August 2021 [en] Physiotherapy aims to improve people's physical functionality, seeking to mitigate the disabilities caused by injury, disorder, or disease. In this context, several computational technologies have been developed to support the rehabilitation process, such as end-user adaptable technologies. These technologies allow the physiotherapist to adapt applications and create activities with personalized characteristics according to the preferences and needs of each patient. This thesis proposes a low-cost approach based on interactive machine learning (iML) that aims to help physiotherapists create personalized activities for their patients easily and without the need for software coding, from just a few examples in RGB video (captured by a digital video camera). To this end, we take advantage of pose estimation based on deep learning to track, in real time, the key joints of the human body from image data. This data is processed as time series using the Dynamic Time Warping algorithm in conjunction with the K-Nearest Neighbors algorithm to create a machine learning model. Additionally, we use an anomaly detection algorithm to automatically assess movements. The architecture of our approach has two modules: one for the physiotherapist to present personalized examples from which the system creates a model to recognize these movements; another for the patient to perform the personalized movements while the system evaluates the patient.
We assessed the usability of our system with physiotherapists from five rehabilitation clinics. In addition, experts clinically evaluated our machine learning model. The results indicate that our approach contributes to automatically assessing patients' movements without direct monitoring by the physiotherapist, in addition to reducing the specialist's time required to train an adaptable system. [en] ANOMALY DETECTION [en] SYNTHETIC DATA GENERATION [en] PHYSICAL REHABILITATION [en] ADAPTIVE TECHNOLOGIES [en] INTERACTIVE MACHINE LEARNING
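The combination of Dynamic Time Warping and K-Nearest Neighbors mentioned in entry 8 can be sketched as below. This is an illustrative minimal version only, assuming one-dimensional sequences (for example a single joint angle over time) and an absolute-difference cost; the thesis works with tracked body joints, and its actual features, distance settings, and function names are not given in the abstract, so everything named here is hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def knn_classify(query, examples, k=1):
    """Label `query` by majority vote among its k DTW-nearest labelled examples.

    `examples` is a list of (sequence, label) pairs.
    """
    ranked = sorted(examples, key=lambda example: dtw_distance(query, example[0]))
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

DTW suits this setting because a patient may perform the same movement faster or slower than the recorded example; warping the time axis lets sequences of different lengths still match.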
9	Multivariate Time Series Data Generation using Generative Adversarial Networks : Generating Realistic Sensor Time Series Data of Vehicles with an Abnormal Behaviour using TimeGAN Nord, Sofia January 2021 Large datasets are a crucial requirement to achieve high performance, accuracy, and generalisation for any machine learning task, such as prediction or anomaly detection. However, it is not uncommon for datasets to be small or imbalanced, since gathering data can be difficult, time-consuming, and expensive. These difficulties arise when collecting vehicle sensor time series data, in particular when the vehicle behaves abnormally, and they may hinder the automotive industry in its development. Synthetic data generation has attracted growing interest among researchers in several fields as a way to handle the difficulties of data gathering. Among the methods explored for generating data, generative adversarial networks (GANs) have become a popular approach due to their wide application domain and successful performance. This thesis focuses on generating multivariate time series data that are similar to vehicle sensor readings of the air pressures in the brake system of vehicles with an abnormal behaviour, meaning there is a leakage somewhere in the system. A novel GAN architecture called TimeGAN was trained to generate such data and was then evaluated using both qualitative and quantitative evaluation metrics. Two versions of this model were tested and compared. The results showed that both models learnt the distribution and the underlying information within the features of the real data. The goal of the thesis was achieved, and the work can become a foundation for future work in this field. Time Series Data Generation Generative Adversarial Network Deep Neural Network Data Augmentation Synthetic Data Generation Computer and Information Sciences
10	Augmenting High-Dimensional Data with Deep Generative Models / Högdimensionell dataaugmentering med djupa generativa modeller Nilsson, Mårten January 2018 Data augmentation is a technique that can be performed in various ways to improve the training of discriminative models. The recent developments in deep generative models offer new ways of augmenting existing data sets. In this thesis, a framework for augmenting annotated data sets with deep generative models is proposed, together with a method for quantitatively evaluating the quality of the generated data sets. Using this framework, two data sets for pupil localization were generated with different generative models, including both well-established models and a novel model proposed for this purpose. The novel model was shown both qualitatively and quantitatively to generate the best data sets. A set of smaller experiments on standard data sets also revealed cases where this generative model could improve the performance of an existing discriminative model. The results indicate that generative models can be used to augment or replace existing data sets when training discriminative models. GAN GANs machine learning deep learning generative model generative models deep generative model deep generative models generative adversarial networks VAE VAEs variational autoencoder variational autoencoders autoencoder auto encoder encoder decoder computer vision eye tracking pupil localization pupil eyes eye synthetic data big data data generation synthetic data generation neural networks neural network high-dimensional data high-resolution images. Computer Sciences
