21

Generating a synthetic dataset for kidney transplantation using generative adversarial networks and categorical logit encoding

Bartocci, John Timothy 24 May 2021 (has links)
No description available.
22

Segmentation of x-ray images using deep learning trained on synthetic data / Segmentering av röntgenbilder genom djupinlärning tränad på syntetisk data

Larsson, Marcus January 2023 (has links)
Radiograph examinations play a critical role in various applications such as the detection of bone pathologies and lung cancer, despite the challenge of false negatives. The integration of Artificial Intelligence (AI) holds promise in enhancing image quality and assisting radiologists in their diagnostic processes. However, the scarcity of annotated high-quality data poses a significant hurdle in training AI models effectively. In this thesis, we propose a method for training deep learning models using synthetic data to achieve segmentation of X-ray images. Realistic simulated images were generated, enabling segmentation of anatomical structures, including the spine, ribs, scapula, clavicle, and lungs, on a test set composed of other simulated images. The foremost emphasis was placed on the segmentation of the spine, where we obtained a Dice score of 0.87. Significant advancements have also been made in the application of the model to real clinical images, demonstrating successful segmentation in certain instances. Further generalization of the model opens up numerous avenues for future exploration of deep learning in radiography.
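The Dice score reported above is a standard overlap measure between a predicted and a reference segmentation mask. A minimal sketch of how such a score can be computed for binary masks (illustrative only; the function and toy masks below are not taken from the thesis):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy example: two overlapping 4x4 masks.
pred = np.zeros((4, 4), dtype=int); pred[1:3, 1:4] = 1
target = np.zeros((4, 4), dtype=int); target[1:3, 0:3] = 1
print(round(dice_score(pred, target), 2))  # 0.67 for this toy pair
```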
23

Generating Directed & Weighted Synthetic Graphs using Low-Rank Approximations / Generering av Riktade & Viktade Syntetiska Grafer med Lågrangs-approximationer

Lundin, Erik January 2022 (has links)
Generative models for creating realistic synthetic graphs constitute a research area that is increasing in popularity, especially as the use of graph data is becoming increasingly common. Generating realistic synthetic graphs enables sharing of the information embedded in graphs without directly sharing the original graphs themselves. This can in turn contribute to an increase in knowledge within several domains where access to data is normally restricted, including the financial system and social networks. In this study, it is examined how existing generative models can be extended to be compatible with directed and weighted graphs, without limiting the models to generating graphs of a specific domain. Several models are evaluated, and all use low-rank approximations to learn structural properties of directed graphs. Additionally, it is evaluated how node embeddings can be used with a regression model to add realistic edge weights to directed graphs. The results show that the evaluated methods are capable of reproducing global statistics of the original directed graphs to a promising degree, while having no more than 52% overlap in terms of edges. The results also indicate that realistic directed and weighted graphs can be generated from directed graphs by predicting edge weights using pairs of node embeddings. However, the results vary depending on which node embedding technique is used.
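The two ingredients described above, low-rank approximation of a directed adjacency matrix and edge-weight prediction from pairs of node embeddings, can be illustrated schematically. The sketch below is a generic stand-in, not the models evaluated in the thesis; the rank, edge density, and the use of SVD factors as node embeddings are assumptions made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, rank = 100, 8

# Toy directed adjacency matrix (asymmetric, unweighted).
A = (rng.random((n, n)) < 0.05).astype(float)
np.fill_diagonal(A, 0)

# A truncated SVD gives a rank-k approximation whose entries can be
# squashed into edge probabilities for sampling a synthetic directed graph.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_hat = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]
probs = np.clip(A_hat, 0.0, 1.0)
A_synth = (rng.random((n, n)) < probs).astype(float)
np.fill_diagonal(A_synth, 0)

# The SVD factors double as out-/in-embeddings; pairs of them can feed a
# regressor that predicts a weight for each sampled directed edge.
src_emb = U[:, :rank] * s[:rank]
dst_emb = Vt[:rank, :].T
edges = np.argwhere(A_synth > 0)
pair_features = np.hstack([src_emb[edges[:, 0]], dst_emb[edges[:, 1]]])
print(int(A_synth.sum()), pair_features.shape)  # synthetic edge count, regression features
```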
24

Artificial Transactional Data Generation for Benchmarking Algorithms / Generering av artificiell transaktionsdata för att prestandamäta algoritmer

Lundgren, Veronica January 2023 (has links)
Modern retailers have been collecting more and more data over the past decades. The increased size of the collected data has led to higher demand for data analytics tools and expertise, which the Umeå-founded company Infobaleen provides. A recurring challenge when developing such tools is the data itself. Difficulties in finding relevant open data sets have led to a rise in the popularity of using synthetic data. By using artificially generated data, developers gain more control over the input when testing and presenting their work. However, most methods that exist today either depend on real-world data as input or produce results that look synthetic and are difficult to extend. In this thesis, I introduce a method specifically designed to generate synthetic transactional data stochastically. I first examined real-world data provided by Infobaleen to empirically determine suitable statistical distributions for my algorithm. I then modelled individual decision-making using points in an embedding space, where the distance between the points serves as a basis for individually unique probability weights. This solution creates data distributed similarly to real-world data and enables retroactive data enrichment using the same embeddings. The result is a data set that looks genuine to the human eye but is entirely synthetic. Infobaleen already generates data with this model when presenting its product to new potential customers or partners.
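The embedding-based sampling idea can be sketched as follows: customers and products are points in a shared space, and each customer's basket is drawn with probability weights that decay with embedding distance. All names, distributions, and parameter choices below are illustrative assumptions, not Infobaleen's implementation:

```python
import numpy as np

rng = np.random.default_rng(42)
n_customers, n_products, dim = 5, 20, 3

customers = rng.normal(size=(n_customers, dim))
products = rng.normal(size=(n_products, dim))

transactions = []
for c_id, c in enumerate(customers):
    # Closer products get exponentially larger probability weights.
    dists = np.linalg.norm(products - c, axis=1)
    weights = np.exp(-dists)
    probs = weights / weights.sum()
    n_items = rng.poisson(3) + 1  # basket size, assumed Poisson here
    basket = rng.choice(n_products, size=n_items, replace=True, p=probs)
    transactions.extend((c_id, int(p)) for p in basket)

print(transactions[:5])  # (customer_id, product_id) pairs
```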
25

GAN-Based Approaches for Generating Structured Data in the Medical Domain

Abedi, Masoud, Hempel, Lars, Sadeghi, Sina, Kirsten, Toralf 03 November 2023 (has links)
Modern machine and deep learning methods require large datasets to achieve reliable and robust results. This requirement is often difficult to meet in the medical field, due to data sharing limitations imposed by privacy regulations or the presence of a small number of patients (e.g., rare diseases). To address this data scarcity, novel generative models such as Generative Adversarial Networks (GANs) have been widely used to generate synthetic data that mimic real data by representing features that reflect health-related information without reference to real patients. In this paper, we consider several GAN models to generate synthetic data used for training binary (malignant/benign) classifiers, and compare their performances in terms of classification accuracy with cases where only real data are considered. We aim to investigate how synthetic data can improve classification accuracy, especially when a small amount of data is available. To this end, we have developed and implemented an evaluation framework where binary classifiers are trained on extended datasets containing both real and synthetic data. The results show improved accuracy for classifiers trained with generated data from more advanced GAN models, even when limited amounts of original data are available.
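The evaluation framework described above amounts to training the same classifier with and without synthetic samples and comparing accuracy on held-out real data. The sketch below illustrates that loop with a noise-jitter stand-in for a trained GAN and a logistic-regression classifier; none of the names or data are from the paper:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
rng = np.random.default_rng(0)

def synthesize(X_real, y_real, n_new):
    """Stand-in for a trained GAN: jitter real samples with Gaussian noise."""
    idx = rng.integers(0, len(X_real), size=n_new)
    noise = rng.normal(scale=0.1, size=(n_new, X_real.shape[1]))
    return X_real[idx] + noise, y_real[idx]

X_syn, y_syn = synthesize(X_tr, y_tr, n_new=300)

clf_real = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
clf_mixed = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_tr, X_syn]), np.concatenate([y_tr, y_syn]))

print("real only :", accuracy_score(y_te, clf_real.predict(X_te)))
print("real+synth:", accuracy_score(y_te, clf_mixed.predict(X_te)))
```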
26

Generate synthetic datasets and scenarios by learning from the real world

Berizzi, Paolo January 2021 (has links)
The modern paradigms of machine learning algorithms and artificial intelligence base their success on processing a large quantity of data. Nevertheless, data does not come for free, and it can sometimes be practically unfeasible to collect enough data to train machine learning models successfully. That is the main reason why synthetic data generation is of great interest to the research community. Generating realistic synthetic data can empower machine learning models with vast datasets that are difficult to collect in the real world. In autonomous vehicles, it would require thousands of hours of driving recordings for a machine learning model to learn how to drive a car in a safe and effective way. The use of synthetic data, on the other hand, makes it possible to simulate many different driving scenarios at a much lower cost. This thesis investigates the functioning of Meta-Sim, a synthetic data generator used to create datasets by learning from the real world. I evaluated the effects of replacing the stem of the Inception-V3 with the stem of the Inception-V4 as the feature extractor needed to process image data. Results showed similar behaviour of models that used the stem of the Inception-V4 instead of the Inception-V3. Slight differences were found when the models tried to simulate more complex images. In these cases, the models that used the stem of the Inception-V4 converged in fewer iterations than those that used the Inception-V3, demonstrating the superior behaviour of the Inception-V4. In the end, I showed that the Inception-V4 can be used to achieve state-of-the-art results in synthetic data generation. Moreover, in specific cases, I showed that the Inception-V4 can exceed the performance attained by Meta-Sim. The outcome motivates further research in the field to validate the results on a larger scale.
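Conceptually, the stem replacement studied here swaps the first block of the feature extractor while leaving the rest of the pipeline unchanged. The sketch below uses small placeholder stems rather than the actual Inception-V3/V4 definitions, purely to show the structure of such a swap:

```python
import torch
import torch.nn as nn

def make_stem(out_channels: int) -> nn.Sequential:
    """Placeholder stem; a real Inception-V3/V4 stem is far deeper."""
    return nn.Sequential(
        nn.Conv2d(3, out_channels, kernel_size=3, stride=2, padding=1),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d((8, 8)),
    )

class FeatureExtractor(nn.Module):
    def __init__(self, stem: nn.Module, feat_dim: int = 256):
        super().__init__()
        self.stem = stem                     # the swappable part
        self.head = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim))

    def forward(self, x):
        return self.head(self.stem(x))

x = torch.randn(2, 3, 299, 299)
v3_like = FeatureExtractor(make_stem(32))    # stand-in for an Inception-V3 stem
v4_like = FeatureExtractor(make_stem(64))    # stand-in for an Inception-V4 stem
print(v3_like(x).shape, v4_like(x).shape)
```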
27

The Application of Synthetic Signals for ECG Beat Classification

Brown, Elliot Morgan 01 September 2019 (has links)
A brief overview of electrocardiogram (ECG) properties and the characteristics of various cardiac conditions is given. Two different models are used to generate synthetic ECG signals. Domain knowledge is used to create synthetic examples of 16 different heart beat types with these models. Other techniques for synthesizing ECG signals are explored. Various machine learning models with different combinations of real and synthetic data are used to classify individual heart beats. The performance of the different methods and models are compared, and synthetic data is shown to be useful in beat classification.
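A common lightweight way to synthesize an ECG-like beat, loosely related to the dynamical models referenced above, is to compose the P-QRS-T complex from a sum of Gaussian bumps. The wave parameters below are illustrative assumptions, not those used in the thesis:

```python
import numpy as np

def synthetic_beat(n_samples: int = 300) -> np.ndarray:
    """One ECG-like beat built as a sum of Gaussian waves (P, Q, R, S, T)."""
    t = np.linspace(0.0, 1.0, n_samples)
    # (center, width, amplitude) for each wave; values are illustrative.
    waves = [(0.20, 0.025, 0.15),   # P
             (0.36, 0.010, -0.10),  # Q
             (0.40, 0.012, 1.00),   # R
             (0.44, 0.010, -0.25),  # S
             (0.70, 0.040, 0.30)]   # T
    beat = sum(a * np.exp(-((t - c) ** 2) / (2 * w ** 2)) for c, w, a in waves)
    return beat + np.random.normal(scale=0.01, size=n_samples)  # measurement noise

beat = synthetic_beat()
print(beat.shape, float(beat.max()))  # (300,) with an R-peak near 1.0
```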
28

Simulating to learn: using adaptive simulation to train, test and understand neural networks

Ruiz, Nataniel 10 September 2024 (has links)
Most machine learning models are trained and tested on fixed datasets that have been collected in the real world. This longstanding approach has some prominent weaknesses: (1) collecting and annotating real data is expensive; (2) real data might not cover all of the important rare scenarios that might be of interest; (3) it is impossible to finely control certain attributes of real data (e.g., lighting, pose, texture); and (4) testing on a similar distribution as the training data can give an incomplete picture of the capabilities and weaknesses of the model. In this thesis we propose approaches for training and testing machine learning models using adaptive simulation. Specifically, given a parametric image/video simulator, the causal parameters of a scene can be adapted to generate different data distributions. We present five different methods to train and test machine learning models by adapting the simulated data distribution: Learning to Simulate, One-at-a-Time Simulated Testing, Simulated Adversarial Testing, Simulated Adversarial Training, and Counterfactual Simulation Testing. We demonstrate these five approaches on vastly different real-world computer vision tasks, including semantic segmentation in traffic scenes, face recognition, body measurement estimation, and object recognition. We achieve state-of-the-art results in several different applications. We release three large public datasets for different domains. Our main discoveries include: (1) we can find biases of models by testing them using scenes where each causal parameter is varied independently; (2) our confidence in the performance of some models is inflated, since they fail when the data distribution is adversarially sampled; (3) we can bridge the simulation/real domain gap using counterfactual testing in order to compare different neural networks with different architectures; and (4) we can improve machine learning model performance by adapting the simulated data distribution, either (a) by learning the generative parameters to directly maximize performance on a validation set or (b) by adversarial optimization of the generative parameters. Finally, we present DreamBooth, a first exploration in the direction of controlling recently released diffusion models in order to achieve realistic simulation, which would improve the precision, performance, and impact of all the ideas presented in this thesis.
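The adaptive-simulation loop underlying these methods can be caricatured as a search over the simulator's causal parameters, scored by a model's validation performance. The thesis uses gradient-based and adversarial variants; the random-search sketch below, with a toy simulator and threshold classifier, is only a schematic stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(params, n=500):
    """Stand-in simulator: two generative parameters shift/scale a 2-class mixture."""
    shift, scale = params
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=shift * (2 * y - 1), scale=scale)  # one feature per sample
    return x, y

def train_and_validate(x, y, x_val, y_val):
    """Stand-in trainer: a threshold classifier scored by validation accuracy."""
    threshold = (x[y == 1].mean() + x[y == 0].mean()) / 2
    return float(np.mean((x_val > threshold).astype(int) == y_val))

# A fixed validation set standing in for real-world data.
x_val, y_val = simulate((1.0, 1.0), n=200)

best_params, best_acc = None, -1.0
for _ in range(50):  # random search over the simulator's generative parameters
    params = (rng.uniform(0.1, 3.0), rng.uniform(0.5, 2.0))
    x_sim, y_sim = simulate(params)
    acc = train_and_validate(x_sim, y_sim, x_val, y_val)
    if acc > best_acc:
        best_params, best_acc = params, acc

print(best_params, round(best_acc, 3))
```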
29

Practicality in Generative Modeling & Synthetic Data

Daniel Antonio Cardona (19339264) 07 August 2024 (has links)
As machine learning continues to grow and surprise us, its complexity grows as well. Indeed, many machine learning models have become black boxes. Yet, there is a prevailing need for practicality. This dissertation offers some practicality on generative modeling and synthetic data, a recently popular application of generative models. First, Lightweight Chained Universal Approximators (LiCUS) is proposed. Motivated by statistical sampling principles, LiCUS tackles a simplified generative task with its universal approximation property while having a minimal computational bottleneck. When compared to a generative adversarial network (GAN) and variational auto-encoder (VAE), LiCUS empirically yields synthetic data with greater utility for a classifier on the Modified National Institute of Standards and Technology (MNIST) dataset. Second, following on its potential for informative synthetic data, LiCUS undergoes an extensive synthetic data supplementation experiment. The experiment largely serves as an informative starting point for practical use of synthetic data via LiCUS. In addition, by proposing a gold standard of reserved data, the experimental results suggest that additional data collection may generally outperform models supplemented with synthetic data, at least when using LiCUS. Given that the experiment was conducted on two datasets, future research could involve further experimentation on a greater number and variety of datasets, such as images. Lastly, generative machine learning generally demands large datasets, which is not guaranteed in practice. To alleviate this demand, one could offer expert knowledge. This is demonstrated by applying an expert-informed Wasserstein GAN with gradient penalty (WGAN-GP) on network flow traffic from NASA's Operational Simulation for Small Satellites (NOS3). If one were to directly apply a WGAN-GP, it would fail to respect the physical limitations between satellite components and permissible communications amongst them. By arming a WGAN-GP with cyber-security software Argus, the informed WGAN-GP could produce permissible satellite network flows when given as little as 10,000 flows. In all, this dissertation illustrates how machine learning processes could be modified under a more practical lens and incorporate pre-existing statistical principles and expert knowledge.
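The gradient penalty in WGAN-GP pushes the norm of the critic's gradient on interpolated samples toward 1. A minimal PyTorch sketch of that term on tabular data (not the dissertation's code; the toy critic and data below are assumptions):

```python
import torch

def gradient_penalty(critic, real, fake, device="cpu"):
    """WGAN-GP penalty: E[(||grad critic(x_hat)||_2 - 1)^2] on interpolates."""
    batch_size = real.size(0)
    eps = torch.rand(batch_size, 1, device=device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(x_hat)
    grads = torch.autograd.grad(outputs=scores, inputs=x_hat,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

# Toy usage with a linear critic on 2-D tabular "flows".
critic = torch.nn.Linear(2, 1)
real = torch.randn(16, 2)
fake = torch.randn(16, 2)
print(gradient_penalty(critic, real, fake))
```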
30

A Comprehensive Approach to Evaluating Usability and Hyperparameter Selection for Synthetic Data Generation

Adriana Louise Watson (19180771) 20 July 2024 (has links)
Data is the key component of every machine-learning algorithm. Without sufficient quantities of quality data, the vast majority of machine learning algorithms fail to perform. Acquiring the data necessary to feed algorithms, however, is a universal challenge. Recently, synthetic data production methods have become increasingly relevant as a method of addressing a variety of data issues. Synthetic data allows researchers to produce supplemental data from an existing dataset. Furthermore, synthetic data anonymizes data without losing functionality. To advance the field of synthetic data production, however, measuring the quality of produced synthetic data is an essential step. Although there are existing methods for evaluating synthetic data quality, these methods tend to address only limited aspects of data quality. Furthermore, synthetic data evaluation varies immensely from one study to another, adding further challenge to the quality comparison process. Finally, although tools exist to automatically tune hyperparameters, these tools focus on traditional machine learning applications. Thus, identifying ideal hyperparameters for individual synthetic data generation use cases is also an ongoing challenge.
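A common way to compare generator hyperparameters, in line with the utility-oriented evaluation discussed above, is to score each configuration by training a downstream model on the synthetic data and testing it on real data. The sketch below uses a one-parameter noise-jitter generator as a stand-in; it is not the dissertation's evaluation suite:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=1)
X_real, X_test, y_real, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
rng = np.random.default_rng(1)

def generate(noise_scale, n_new=400):
    """Stand-in generator with a single hyperparameter (noise scale)."""
    idx = rng.integers(0, len(X_real), size=n_new)
    noise = rng.normal(scale=noise_scale, size=(n_new, X_real.shape[1]))
    return X_real[idx] + noise, y_real[idx]

# Score each hyperparameter by training on synthetic data and testing on real data.
scores = {}
for noise_scale in (0.01, 0.1, 0.5, 1.0, 2.0):
    X_syn, y_syn = generate(noise_scale)
    clf = LogisticRegression(max_iter=1000).fit(X_syn, y_syn)
    scores[noise_scale] = accuracy_score(y_test, clf.predict(X_test))

print(max(scores, key=scores.get), scores)
```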
