Global ETD Search


Generation of Synthetic Clinical Trial Subject Data Using Generative Adversarial Networks

The development of new solutions incorporating artificial intelligence (AI) within the medical field is an area of great interest. However, access to comprehensive and diverse datasets is restricted due to the sensitive nature of the data. A potential solution is to generate synthetic datasets based on real medical data. Synthetic data could protect the privacy of the subjects while preserving the information necessary for training AI models, and it could be generated in greater quantities than real data are available. This thesis project aims to generate reliable clinical trial subject data using a generative adversarial network (GAN). The main dataset is a mock clinical trial dataset consisting of multiple visits per subject; an additional dataset containing authentic medical data is also used to give better insight into the model's ability to learn underlying relationships. The thesis also investigates training strategies for simulating the temporal dimension and the missing values in the data. The GAN model is a modified version of the Conditional Tabular GAN (CTGAN), adapted to be compatible with the preprocessed clinical trial mock data, and several model architectures and numbers of training epochs are examined. The results show great potential for GAN models on clinical trial datasets, especially for real-life data. One model, trained on the authentic dataset, generates near-perfect synthetic data with respect to column distributions and correlations between columns. The results also show that classification models trained on synthetic data and tested on real data have the potential to match the performance of classification models trained on real data. While the synthetic data replicates the missing values, no definitive conclusion can be drawn regarding the temporal characteristics due to the sparsity of the mock dataset and the lack of real correlations in it.
Although the results are promising, further experiments on authentic datasets with less sparsity are required.
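The abstract reports that the synthetic data was judged on column distributions, correlations between columns, and reproduction of missing values, but it does not name the exact metrics used. The following is a minimal sketch, assuming numeric columns stored as Python lists with `None` marking a missing entry, of one common way to quantify those three properties: a two-sample Kolmogorov-Smirnov statistic per column, the mean absolute difference between pairwise Pearson correlations, and the gap in per-column missing rates. All names here (`fidelity_report`, `Table`, and the helpers) are hypothetical and are not taken from the thesis or from the CTGAN library; categorical columns would need a different comparison, such as the distance between category frequencies.

```python
import math
from typing import Dict, List, Optional

# Hypothetical table layout: column name -> list of numeric values, None = missing.
Table = Dict[str, List[Optional[float]]]


def observed(values: List[Optional[float]]) -> List[float]:
    """Drop missing entries."""
    return [v for v in values if v is not None]


def missing_rate(values: List[Optional[float]]) -> float:
    """Fraction of entries that are missing."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def ks_statistic(a: List[float], b: List[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs.

    Returns 0.0 if either sample is empty.
    """
    if not a or not b:
        return 0.0
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both CDFs past every value equal to x so ties are handled correctly.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    # Once one sample is exhausted the gap can only shrink, so d is final.
    return d


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def pairwise_corr(table: Table, c1: str, c2: str) -> float:
    """Correlation over rows where both columns are observed (pairwise-complete)."""
    pairs = [(p, q) for p, q in zip(table[c1], table[c2])
             if p is not None and q is not None]
    if len(pairs) < 2:
        return 0.0
    xs, ys = zip(*pairs)
    return pearson(list(xs), list(ys))


def fidelity_report(real: Table, synth: Table) -> dict:
    """Compare a synthetic table with the real one on three properties."""
    cols = sorted(set(real) & set(synth))
    ks = {c: ks_statistic(observed(real[c]), observed(synth[c])) for c in cols}
    miss = {c: abs(missing_rate(real[c]) - missing_rate(synth[c])) for c in cols}
    gaps = [abs(pairwise_corr(real, a, b) - pairwise_corr(synth, a, b))
            for k, a in enumerate(cols) for b in cols[k + 1:]]
    return {
        "ks": ks,                  # 0 = identical marginals, 1 = non-overlapping
        "missing_rate_gap": miss,  # absolute difference in missing fraction
        "mean_corr_gap": sum(gaps) / len(gaps) if gaps else 0.0,
    }
```

For example, with `real = {"age": [30.0, 40.0, 50.0, None], "dose": [1.0, 2.0, 3.0, 4.0]}` and `synth = {"age": [30.0, None, None, 50.0], "dose": [5.0, 6.0, 7.0, 8.0]}`, the `dose` KS statistic is 1.0 because the two samples do not overlap, and the `age` missing-rate gap is 0.25. Correlations are computed on pairwise-complete rows so that missing values do not silently distort them. The train-on-synthetic, test-on-real classification comparison mentioned in the abstract is a separate utility check and is not covered by this sketch.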

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-533754

Artificial intelligence

Artificial Neural Network

Generative Adversarial Network

Computer Science

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-533754
Date	January 2024
Creators	Lindell, Linus
Publisher	Uppsala universitet, Signaler och system
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	UPTEC F, 1401-5757 ; 24051
