Global ETD Search

71	Measuring the Utility of Synthetic Data : An Empirical Evaluation of Population Fidelity Measures as Indicators of Synthetic Data Utility in Classification Tasks / Mätning av Användbarheten hos Syntetiska Data : En Empirisk Utvärdering av Population Fidelity mätvärden som Indikatorer på Syntetiska Datas Användbarhet i Klassifikationsuppgifter Florean, Alexander January 2024 In the era of data-driven decision-making and innovation, synthetic data serves as a promising tool that bridges the need for vast datasets in machine learning (ML) and the imperative necessity of data privacy. By simulating real-world data while preserving privacy, synthetic data generators have become increasingly prevalent instruments in AI and ML development. A key challenge with synthetic data lies in accurately estimating its utility. For this purpose, Population Fidelity (PF) measures, a category of metrics that evaluates how well the synthetic data mimics the general distribution of the original data, have been shown to be good candidates. In this setting, we aim to answer: "How well are different population fidelity measures able to indicate the utility of synthetic data for machine learning based classification models?" We designed a reusable six-step experiment framework to examine the correlation between nine PF measures and the performance of four ML classification models over five datasets. The six-step approach includes data preparation, training, testing on original and synthetic datasets, and PF measures computation. The study reveals non-linear relationships between the PF measures and synthetic data utility. The general analysis, meaning the monotonic relationship between the PF measure and performance over all models, yielded at most moderate correlations, where the Cluster measure showed the strongest correlation. In the more granular model-specific analysis, Random Forest showed strong correlations with three PF measures. 
The findings show that no PF measure correlates consistently and strongly enough across all models to be considered a universal estimator of model performance. This highlights the importance of context-aware application of PF measures and sets the stage for future research to expand the scope, including support for a wider range of data types and the integration of privacy evaluations into synthetic data assessment. Ultimately, this study contributes to the effective and reliable use of synthetic data, particularly in sensitive fields where data quality is vital. / I eran av datadriven beslutsfattning och innovation, fungerar syntetiska data som ett lovande verktyg som bryggar behovet av omfattande dataset inom maskininlärning (ML) och nödvändigheten för dataintegritet. Genom att simulera verklig data samtidigt som man bevarar integriteten, har generatorer av syntetiska data blivit allt vanligare verktyg inom AI och ML-utveckling. En viktig utmaning med syntetiska data är att noggrant uppskatta dess användbarhet. För detta ändamål har mått under kategorin Populations Fidelity (PF) visat sig vara goda kandidater, det är mätvärden som utvärderar hur väl syntetiska datan efterliknar den generella distributionen av den ursprungliga datan. Med detta i åtanke strävar vi att svara på följande: Hur väl kan olika population fidelity mätvärden indikera användbarheten av syntetisk data för maskininlärnings baserade klassifikationsmodeller? För att besvara frågan har vi designat ett återanvändbart sex-stegs experiment ramverk, för att undersöka korrelationen mellan nio PF-mått och prestandan hos fyra ML klassificeringsmodeller, på fem dataset. Sex-stegs strategin inkluderar datatillredning, träning, testning på både ursprungliga och syntetiska dataset samt beräkning av PF-mått. Studien avslöjar förekommandet av icke-linjära relationer mellan PF-måtten och användbarheten av syntetiska data. 
Den generella analysen, det vill säga den monotona relationen mellan PF-måttet och prestanda över alla modeller, visade som mest medelmåttiga korrelationer, där Cluster-måttet visade den starkaste korrelationen. I den mer detaljerade, modell-specifika analysen visade Random Forest starka korrelationer med tre PF-mått. Resultaten visar att inget PF-mått visar konsekvent hög korrelation över alla modeller för att betraktas som en universell indikator för modellprestanda. Detta understryker vikten av kontextmedveten tillämpning av PF-mått och banar väg för framtida forskning för att utöka omfånget, inklusive stöd för ett bredare utbud för data av olika typer och integrering av integritetsutvärderingar i bedömningen av syntetiska data. Därav, så bidrar denna studie till effektiv och tillförlitlig användning av syntetiska data, särskilt inom känsliga områden där datakvalitet är avgörande. Synthetic Data Machine Learning Population Fidelity Measures Utility Metrics Synthetic Data Quality Evaluation Classification Algorithms Utility Estimation Data Privacy Artificial Intelligence Experiment Framework Model Performance Assessment Syntetisk Data Maskininlärning Population Fidelity Mätvärden Användbarhetsmätvärden Kvalitetsutvärdering av Syntetisk Data Klassificeringsalgoritmer Användbarhetsutvärdering Dataintegritet Artificiell Intelligens AI Experiment Ramverk Utvärdering av Modellprestanda Computer Sciences Datavetenskap (datalogi)
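As an illustration of the kind of analysis the abstract for record 71 describes, the following minimal sketch measures a monotonic relationship between a PF measure and classifier performance with Spearman's rank correlation. The abstract does not name the correlation statistic used, so Spearman's coefficient is an assumption here, and the score lists are hypothetical values rather than results from the thesis.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Hypothetical example: one PF score and one F1 score per synthetic dataset.
pf_scores = [0.10, 0.25, 0.40, 0.55, 0.90]
f1_scores = [0.60, 0.64, 0.71, 0.70, 0.88]
rho = spearman(pf_scores, f1_scores)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based statistic is a natural fit when the relationship may be non-linear but monotonic, because it depends only on the ordering of the values and not on their spacing.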
72	Training a Neural Network using Synthetically Generated Data / Att träna ett neuronnät med syntetisktgenererad data Diffner, Fredrik, Manjikian, Hovig January 2020 A major challenge in training machine learning models is the gathering and labeling of a sufficiently large training data set. A common solution is to use a synthetically generated data set to expand or replace a real data set. This paper examines the performance of a machine learning model trained on a synthetic data set versus the same model trained on real data. This approach was applied to the problem of character recognition using a machine learning model that implements convolutional neural networks. A synthetic data set of 1’240’000 images and two real data sets, Char74k and ICDAR 2003, were used. The result was that the model trained on the synthetic data set achieved an accuracy that was about 50% better than the accuracy of the same model trained on the real data set. / Vid utvecklandet av maskininlärningsmodeller kan avsaknaden av ett tillräckligt stort dataset för träning utgöra ett problem. En vanlig lösning är att använda syntetiskt genererad data för att antingen utöka eller helt ersätta ett dataset med verklig data. Denna uppsats undersöker prestationen av en maskininlärningsmodell tränad på syntetisk data jämfört med samma modell tränad på verklig data. Detta applicerades på problemet att använda ett konvolutionärt neuralt nätverk för att tyda tecken i bilder från ”naturliga” miljöer. Ett syntetiskt dataset bestående av 1’240’000 samt två stycken dataset med tecken från bilder, Char74K och ICDAR2003, användes. Resultatet visar att en modell tränad på det syntetiska datasetet presterade ca 50% bättre än samma modell tränad på Char74K. Synthetic data set Generating synthetic data set Machine learning Deep Learning Convolutional Neural Networks Machine learning model Character recognition in natural images Char74k ICDAR2003 
Syntetiskt dataset Generera syntetiskt data Maskininlärning Maskininlärningsmodell Djuplärning Konvolutionära neurala nätverk teckenigenkänning i bilder Char74k ICDAR2003 Computer Sciences Datavetenskap (datalogi)
73	Complex Vehicle Modeling: A Data Driven Approach Schoen, Alexander C. 12 1900 Indiana University-Purdue University Indianapolis (IUPUI) / This thesis proposes an artificial neural network (NN) model to predict fuel consumption in heavy vehicles. The model uses predictors derived from vehicle speed, mass, and road grade. These variables are readily available from telematics devices that are becoming an integral part of connected vehicles. The model predictors are aggregated over a fixed distance traveled (i.e., a window) instead of a fixed time interval. It was found that 1 km windows are the most appropriate for the vocations studied in this thesis. Two vocations were studied: refuse and delivery trucks. The proposed NN model was compared to two traditional models. The first is a parametric model similar to one found in the literature. The second is a linear regression model that uses the same features developed for the NN model. The confidence levels of the models built with these three methods were calculated in order to evaluate the models' variances. It was found that the NN models produce lower point-wise error. However, the stability of the NN models is not as high as that of the regression models. In order to reduce the variance of the NN models, an ensemble based on the average of 5-fold models was created. Finally, the confidence level of each model is analyzed in order to understand how much error is expected from each model. The mean training error was used to correct the ensemble predictions for five K-Fold models. The ensemble K-fold model predictions are more reliable than those of the single NN and have a lower confidence interval than both the parametric and regression models. 
Neural Network Prediction Fuel Consumption Improvement Ensemble Learning Refuse Truck Complex System Modeling Delivery Truck Vehicle Routing SAE J1321 Synthetic Data Generation Aerodynamic Speed Characteristic Acceleration Feature Importance Influence of Weights Machine Learning Point-wise Error Artificial Neural Network
74	Generation of Synthetic Traffic Sign Images using Diffusion Models Carlson, Johanna, Byman, Lovisa January 2023 In the area of Traffic Sign Recognition (TSR), deep learning models are trained to detect and classify images of traffic signs. The amount of data available to train these models is often limited, and collecting more data is time-consuming and expensive. A possible complement to traditional data acquisition is to generate synthetic images with a generative machine learning model. This thesis investigates the use of denoising diffusion probabilistic models for generating synthetic data of one or multiple traffic sign classes when different amounts of real images are provided for that class or those classes. In the few-sample method, the number of real images used ranged from 1 to 1000, and zero images were used in the zero-shot method. The results from the few-sample method show that combining synthetic images with real images when training a traffic sign classifier increases the performance in 3 out of 6 investigated cases. The results indicate that the developed zero-shot method is useful if further refined, and could potentially enable the generation of realistic images of signs not seen in the training data. Machine Learning Computer Vision Diffusion Models Traffic Sign Recognition Traffic Sign Classification Synthetic Data Maskininlärning Datorseende Diffusionsmodeller Trafikskyltsigenkänning Trafikskyltsklassificering Syntetisk data
75	[en] AN APPROACH BASED ON INTERACTIVE MACHINE LEARNING AND NATURAL INTERACTION TO SUPPORT PHYSICAL REHABILITATION / [pt] UMA ABORDAGEM BASEADA NO APRENDIZADO DE MÁQUINA INTERATIVO E INTERAÇÃO NATURAL PARA APOIO À REABILITAÇÃO FÍSICA JESSICA MARGARITA PALOMARES PECHO 10 August 2021 [pt] A fisioterapia visa melhorar a funcionalidade física das pessoas, procurando atenuar as incapacidades causadas por alguma lesão, distúrbio ou doença. Nesse contexto, diversas tecnologias computacionais têm sido desenvolvidas com o intuito de apoiar o processo de reabilitação, como as tecnologias adaptáveis para o usuário final. Essas tecnologias possibilitam ao fisioterapeuta adequar aplicações e criarem atividades com características personalizadas de acordo com as preferências e necessidades de cada paciente. Nesta tese é proposta uma abordagem de baixo custo baseada no aprendizado de máquina interativo (iML - Interactive Machine Learning) que visa auxiliar os fisioterapeutas a criarem atividades personalizadas para seus pacientes de forma fácil e sem a necessidade de codificação de software, a partir de apenas alguns exemplos em vídeo RGB (capturadas por uma câmera de vídeo digital). Para tal, aproveitamos a estimativa de pose baseada em aprendizado profundo para rastrear, em tempo real, as articulações-chave do corpo humano a partir de dados da imagem. Esses dados são processados como séries temporais por meio do algoritmo Dynamic Time Warping em conjunto com o algoritmo K-Nearest Neighbors para criar um modelo de aprendizado de máquina. Adicionalmente, usamos um algoritmo de detecção de anomalias com o intuito de avaliar automaticamente os movimentos. A arquitetura de nossa abordagem possui dois módulos: um para o fisioterapeuta apresentar exemplos personalizados a partir dos quais o sistema cria um modelo para reconhecer esses movimentos; outro para o paciente executar os movimentos personalizados enquanto o sistema avalia o paciente. 
Avaliamos a usabilidade de nosso sistema com fisioterapeutas de cinco clínicas de reabilitação. Além disso, especialistas avaliaram clinicamente nosso modelo de aprendizado de máquina. Os resultados indicam que a nossa abordagem contribui para avaliar automaticamente os movimentos dos pacientes sem monitoramento direto do fisioterapeuta, além de reduzir o tempo necessário do especialista para treinar um sistema adaptável. / [en] Physiotherapy aims to improve the physical functionality of people, seeking to mitigate the disabilities caused by any injury, disorder or disease. In this context, several computational technologies have been developed in order to support the rehabilitation process, such as end-user adaptable technologies. These technologies allow the physiotherapist to adapt applications and create activities with personalized characteristics according to the preferences and needs of each patient. This thesis proposes a low-cost approach based on interactive machine learning (iML) that aims to help physiotherapists create personalized activities for their patients easily and without the need for software coding, from just a few examples in RGB video (captured by a digital video camera). To this end, we take advantage of pose estimation based on deep learning to track, in real time, the key joints of the human body from image data. This data is processed as time series using the Dynamic Time Warping algorithm in conjunction with the K-Nearest Neighbors algorithm to create a machine learning model. Additionally, we use an anomaly detection algorithm in order to automatically assess movements. The architecture of our approach has two modules: one for the physiotherapist to present personalized examples from which the system creates a model to recognize these movements; another for the patient to perform the personalized movements while the system evaluates the patient. 
We assessed the usability of our system with physiotherapists from five rehabilitation clinics. In addition, experts have clinically evaluated our machine learning model. The results indicate that our approach contributes to automatically assessing patients' movements without direct monitoring by the physiotherapist, in addition to reducing the specialist's time required to train an adaptable system. [pt] DETECCAO DE ANOMALIAS [pt] CRIACAO DE DADOS SINTETICOS [pt] REABILITACAO FISICA [pt] TECNOLOGIAS ADAPTAVEIS [pt] APRENDIZADO DE MAQUINA INTERATIVO [en] ANOMALY DETECTION [en] SYNTHETIC DATA GENERATION [en] PHYSICAL REHABILITATION [en] ADAPTATIVE TECHNOLOGIES [en] INTERACTIVE MACHINE LEARNING
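The combination of Dynamic Time Warping and K-Nearest Neighbors mentioned in the abstract for record 75 can be sketched roughly as follows. This is a simplified illustration and not the thesis implementation: it assumes one-dimensional series (for example a single joint angle per frame) instead of full pose data, and the example movements and labels ("raise", "hold", "half_raise") are hypothetical.

```python
from collections import Counter


def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats against b[j-1]
                                 cost[i][j - 1],      # b[j-1] repeats against a[i-1]
                                 cost[i - 1][j - 1])  # both series advance
    return cost[n][m]


def knn_predict(query, examples, k=1):
    """Label a query series by majority vote among its k DTW-nearest examples."""
    nearest = sorted(examples, key=lambda ex: dtw_distance(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled demonstrations, as a therapist might record them.
examples = [
    ([0, 1, 2, 3, 2, 1, 0], "raise"),
    ([0, 0, 0, 0, 0, 0], "hold"),
    ([0, 1, 1, 0], "half_raise"),
]

# The same "raise" movement performed more slowly by a patient.
query = [0, 0, 1, 2, 3, 3, 2, 1, 0]
print(knn_predict(query, examples))
```

DTW is a reasonable choice for this kind of data because two performances of the same movement at different speeds can still align with a small distance, which a frame-by-frame comparison would not allow.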
76	Tracking a ball during bounce and roll using recurrent neural networks / Följning av en boll under studs och rull med hjälp av återkopplande neurala nätverk Rosell, Felicia January 2018 In many types of sports, on-screen graphics such as a reconstructed ball trajectory can be displayed for spectators or players in order to increase understanding. One sub-problem of trajectory reconstruction is the tracking of ball positions, which is difficult due to the fast and often complex ball movement. Historically, physics-based techniques have been used to track ball positions, but this thesis investigates a recurrent neural network design applied to tracking bouncing golf balls. The network is trained and tested on synthetically created golf ball shots, created to imitate balls shot out from a golf driving range. It is found that the trained network succeeds in tracking golf balls during bounce and roll, with an error rate of under 11 %. / Grafik visad på en skärm, så som en rekonstruerad bollbana, kan användas i många typer av sporter för att öka en åskådares eller spelares förståelse. För att lyckas rekonstruera bollbanor behöver man först lösa delproblemet att följa en bolls positioner. Följning av bollpositioner är ett svårt problem på grund av den snabba och ofta komplexa bollrörelsen. Tidigare har fysikbaserade tekniker använts för att följa bollpositioner, men i den här uppsatsen undersöks en metod baserad på återkopplande neurala nätverk, för att följa en studsande golfbolls bana. Nätverket tränas och testas på syntetiskt skapade golfslag, där bollbanorna är skapade för att imitera golfslag från en driving range. Efter träning lyckades nätverket följa golfbollar under studs och rull med ett fel på under 11 %. 
machine learning ML recurrent neural networks RNN deep learning tracking golf bounce synthetic data maskininlärning ML recurrent neural networks RNN djupinlärning följning golf studs syntetiskt data Computer Sciences Datavetenskap (datalogi)
77	Synthetic Graph Generation at Scale : A novel framework for generating large graphs using clustering, generative models and node embeddings / Storskalig generering av syntetiska grafer : En ny arkitektur för att tillverka stora grafer med hjälp av klustring, generativa modeller och nodinbäddningar Hammarstedt, Johan January 2022 The field of generative graph models has seen increased popularity in recent years, as it allows us to model the underlying distribution of a network and thus recreate it. From allowing anonymization of sensitive information in social networks to data augmentation for rare diseases in the brain, the ability to generate synthetic data has multiple applications in various domains. However, most current methods face the bottleneck of trying to generate the entire adjacency matrix and are thus limited to graphs with fewer than tens of thousands of nodes. In contrast, large real-world graphs like social networks or transaction graphs can extend significantly beyond these boundaries. Furthermore, the current scalable approaches are predominantly based on stochasticity and do not capture local structures and communities. In this paper, we propose Graphwave Edge-Linking CELL or GELCELL, a novel three-step architecture for generating graphs at scale. First, instead of constructing the entire network, GELCELL partitions the data and generates each cluster separately, allowing for efficient and parallelizable training. Then, by encoding the nodes, it trains a classifier to predict the edges between the partitions to patch them together, creating a synthetic version of the original large graph. Although it does suffer from some limitations due to necessary constraints on the cluster sizes, the results showed that GELCELL, given optimized parameters, can produce graphs with reasonable accuracy on all data tested, with the largest having 400 000 nodes and 1 000 000 edges. 
/ Generativa grafmodeller har sett ökad popularitet under de senaste åren eftersom det möjliggör modellering av grafens underliggande distribution, och vi kan på så sätt återskapa liknande kopior. Förmågan att generera syntetisk data har ett flertal applikationsområden i en mängd av områden, allt från att möjligöra anonymisering av känslig data i sociala nätverk till att utöka mängden tillgänglig data av ovanliga hjärnsjukdomar. Dagens metoder har länge varit begränsade till grafer med under tiotusental noder, då dessa inte är tillräckligt skalbara, men grafer som sociala nätverk eller transaktionsgrafer kan sträcka sig långt utöver dessa gränser. Dessutom är de nuvarande skalbara tillvägagångssätten till största delen baserade på stokasticitet och fångar inte lokala strukturer och kluster. I denna rapport föreslår vi ”Graphwave EdgeLinking CELL” eller GELCELL, en trestegsarkitektur för att generera grafer i större skala. Istället för att återskapa hela grafen direkt så partitionerar GELCELL all datat och genererar varje kluster separat, vilket möjliggör både effektiv och parallelliserbar träning. Vi kan sedan koppla samman grafen genom att koda noderna och träna en modell för att prediktera länkarna mellan kluster och återskapa en syntetisk version av originalet. Metoden kräver vissa antaganden gällande max-storleken på dess kluster men är flexibel och kan rymma domänkännedom om en specifik graf i form av informerad parameterinställning. Trots detta visar resultaten på varierade träningsdata att GELCELL, givet optimerade parametrar, är kapabel att genera grafer med godtycklig precision upp till den största beprövade grafen med 400 000 noder och 1 000 000 länkar. Data Anonymization Graph Learning Generative Graph Modeling Graph Clustering Node Embedding Synthetic Data Dataanonymisering Grafinlärning Generativa graf-modeller Graf klustring Länk prediktion Nodinbäddning Syntetisk data Computer and Information Sciences Data- och informationsvetenskap
78	Gaze tracking using Recurrent Neural Networks : Hardware agnostic gaze estimation using temporal features, synthetic data and a geometric model Malmberg, Fredrik January 2022 Vision is an important tool for us humans and significant effort has been put into creating solutions that let us measure how we use it. The most common technique for measuring gaze direction is to use specialised hardware such as infrared eye trackers. Recently, several Convolutional Neural Network (CNN) based architectures have been suggested, yielding impressive results on single Red Green Blue (RGB) images. However, limited research has been done on whether using several sequential images can lead to improved tracking performance. Expanding this research to include low-frequency and low-quality RGB images can further open up the possibility of improving tracking performance for models using off-the-shelf hardware such as web cameras or smartphone cameras. GazeCapture is a well-known dataset used for training RGB-based CNN models, but it lacks sequences of images and natural eye movements. In this thesis, a geometric gaze estimation model is introduced and synthetic data is generated using Unity to create sequences of images with both RGB input data and ground-truth Point of Gaze (POG). To make these images appear more natural, domain adaptation is performed using a CycleGAN. The data is then used to train several different models to evaluate whether temporal information can increase accuracy. Even though the improvement when using a Gated Recurrent Unit (GRU) based temporal model is limited over simple sequence averaging, the network achieves smoother tracking than a single-image model while still offering faster updates over a saccade (eye movement) compared to averaging. This indicates that temporal features could improve accuracy. 
There are several promising future areas of related research that could further improve performance such as using real sequential data or further improving the domain adaptation of synthetic data. / Synen är ett viktigt sinne för oss människor och avsevärd energi har lagts ner på att skapa lösningar som låter oss mäta hur vi använder den. Det vanligaste sättet att göra detta idag är att använda specialiserad hårdvara baserad på infrarött ljus för ögonspårning. På senare tid har maskininlärning och modeller baserade på CNN uppnått imponerande resultat för enskilda RGB-bilder men endast begränsad forskning har gjorts kring huruvida användandet av en sekvens av högupplösta bilder kan öka prestandan för dessa modeller ytterligare. Genom att uttöka denna till bildserier med lägre frekvens och kvalitet kan det finnas möjligheter att förbättra prestandan för sekventiella modeller som kan använda data från standard-hårdvara såsom en webbkamera eller kameran i en vanlig telefon. GazeCapture är ett välkänt dataset som kan användas för att träna RGB-baserade CNN-modeller för enskilda bilder. Dock innehåller det inte bildsekvenser eller bilder som fångar naturliga ögonrörelser. För att hantera detta tränades de sekventiella modellerna i denna uppsats med data som skapats från 3D-modeller i Unity. För att den syntetiska datan skulle vara jämförbar med riktiga bilder anpassades den med hjälp av ett CycleGAN. Även om förbättringen som uppnåddes med sekventiella GRU-baserade modeller var begränsad jämfört med en modell som använde medelvärdet för sekvensen så uppnådde den tränade sekventiella modellen jämnare spårning jämfört med enbildsmodeller samtidigt som den uppdateras snabbare vid en sackad (ögonrörelse) än medelvärdesmodellen. Detta indikerar att den tidsmässiga information kan förbättra ögonspårning även för lågfrekventa bildserier med lägre kvalitet. 
Det finns ett antal intressanta områden att fortsätta undersöka för att ytterligare öka prestandan i liknande system som till exempel användandet av större mängder riktig sekventiell data eller en förbättrad domänanpassning av syntetisk data. Gaze Tracking Eye Tracking Computer Vision Transfer Learning Synthetic Data Domain Adaptation Sequential Models Blickspårning Ögonspårning Datorseende Transfer Learning Syntetisk Data Domain Adaptation Sekventiella Modeller Computer and Information Sciences Data- och informationsvetenskap
79	Methodik zur Erstellung von synthetischen Daten für das Qualitätsmanagement und der vorausschauenden Instandhaltung im Bereich der Innenhochdruck-Umformung (IHU) Reuter, Thomas, Massalsky, Kristin, Burkhardt, Thomas 28 November 2023 Unternehmen stehen zunehmend vor der Herausforderung, dem drohenden Wissensverlust durch demografischen Wandel und Mitarbeiterabgang zu begegnen. In Zeiten voranschreitender Digitalisierung gilt es, große Datenmengen beherrschbar und nutzbar zu machen, mit dem Ziel, einerseits die Ressourceneffizienz innerhalb des Unternehmens zu erhöhen und anderseits den Kunden zusätzliche Dienstleistungen anbieten zu können. Vor dem Hintergrund, ein effizientes Qualitätsmanagement und eine vorausschauende Instandhaltung mit ein und demselben System zu realisieren, sind zunächst technologische Kennzahlen und die Prozessführung zu bestimmen. Im Bereich der intelligenten Instandhaltung ist es jedoch nicht immer möglich, Fehlerzustände von physischen Anlagen im Serienbetrieb als Datensatz abzufassen. Das bewusste Zulassen von Fehlern unter realen Produktionsbedingungen könnte zu fatalen Ausfällen bis hin zur Zerstörung der Anlage führen. Auch das gezielte Erzeugen von Fehlern unter stark kontrollierten Bedingungen kann zeitaufwendig, kostenintensiv oder sogar undurchführbar sein.
80	Methodology for the creation of synthetic data for quality management and predictive maintenance in the field of hydroforming (IHU) Reuter, Thomas, Massalsky, Kristin, Burkhardt, Thomas 28 November 2023 Companies are increasingly challenged by the impending loss of knowledge due to demographic change and employee departures. In times of advancing digitalization, it is important to make large volumes of data manageable and usable, with the aim of increasing resource efficiency within the company on the one hand and being able to offer customers additional services on the other. With a view to implementing efficient quality management and predictive maintenance in one and the same system, technological key figures and process control must first be determined. In the field of intelligent maintenance, however, it is not always possible to record error states of physical systems in series operation as a data set. Deliberately allowing faults to occur under real production conditions could lead to fatal failures or even the destruction of the system. The targeted generation of faults under highly controlled conditions can also be time-consuming, cost-intensive, or even impractical.
