The development of new solutions incorporating artificial intelligence (AI) within the medical field is an area of great interest. However, access to comprehensive and diverse datasets is restricted due to the sensitive nature of the data. A potential solution is to generate synthetic datasets based on real medical data. Synthetic data could protect the privacy of the subjects while preserving the inherent information necessary for training AI models, and it could be generated in greater quantities than the real data available. This thesis project aims to generate reliable clinical trial subject data using a generative adversarial network (GAN). The main dataset is a mock clinical trial dataset consisting of multiple subject visits; an additional dataset containing authentic medical data is also used to give better insight into the model’s ability to learn underlying relationships. The thesis also investigates training strategies for simulating the temporal dimension and the missing values in the data. The GAN model used is an altered version of the Conditional Tabular GAN (CTGAN), made compatible with the preprocessed mock clinical trial data, and multiple model architectures and numbers of training epochs are examined. The results show great potential for GAN models on clinical trial datasets, especially for real-life data. One model, trained on the authentic dataset, generates near-perfect synthetic data with respect to column distributions and correlations between columns. The results also show that classification models trained on synthetic data and tested on real data have the potential to match the performance of classification models trained on real data. While the synthetic data replicates the missing values, no definitive conclusion can be drawn regarding the temporal characteristics, owing to the sparsity of the mock dataset and its lack of real correlations.
Although the results are promising, further experiments on authentic datasets with less sparsity are required.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-533754 |
Date | January 2024 |
Creators | Lindell, Linus |
Publisher | Uppsala universitet, Signaler och system |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | UPTEC F, 1401-5757 ; 24051 |