Modern machine and deep learning methods require large datasets to achieve reliable
and robust results. This requirement is often difficult to meet in the medical field, due to data
sharing limitations imposed by privacy regulations or the presence of a small number of patients (e.g.,
rare diseases). To address this data scarcity, novel generative models
such as Generative Adversarial Networks (GANs) have been widely used to generate synthetic
data that mimic real data by representing features that reflect health-related information without
reference to real patients. In this paper, we consider several GAN models to generate synthetic data
used for training binary (malignant/benign) classifiers, and compare their performances in terms
of classification accuracy with cases where only real data are considered. We aim to investigate
how synthetic data can improve classification accuracy, especially when a small amount of data is
available. To this end, we have developed and implemented an evaluation framework where binary
classifiers are trained on extended datasets containing both real and synthetic data. The results show
improved accuracy for classifiers trained with generated data from more advanced GAN models,
even when limited amounts of original data are available.
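
The evaluation idea described above (train one binary classifier on real data only and another on real plus synthetic data, then compare accuracy on held-out real data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' framework: the toy data, the helper names (`make_real_data`, `sample_synthetic`, `train_logreg`, `accuracy`), and the sample sizes are all hypothetical, and a simple per-class Gaussian sampler stands in for a trained GAN.

```python
import numpy as np


def make_real_data(n, rng):
    """Toy two-class tabular data (0 = benign, 1 = malignant); illustrative only."""
    y = np.arange(n) % 2  # balanced labels so both classes are always present
    X = rng.normal(loc=y[:, None] * 1.5, scale=1.0, size=(n, 4))
    return X, y


def sample_synthetic(X, y, n_per_class, rng):
    """Stand-in generator (assumption): per-class Gaussian fit.

    In the setting of the abstract, a trained GAN would replace this function.
    """
    Xs, ys = [], []
    for label in (0, 1):
        Xc = X[y == label]
        mean, std = Xc.mean(axis=0), Xc.std(axis=0) + 1e-6
        Xs.append(rng.normal(mean, std, size=(n_per_class, X.shape[1])))
        ys.append(np.full(n_per_class, label))
    return np.vstack(Xs), np.concatenate(ys)


def train_logreg(X, y, lr=0.1, epochs=500):
    """Plain logistic regression fitted by batch gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def accuracy(w, X, y):
    """Fraction of correct predictions at a 0.5 probability threshold."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    pred = (Xb @ w > 0).astype(int)
    return float(np.mean(pred == y))


rng = np.random.default_rng(0)
X_train, y_train = make_real_data(20, rng)   # small "real" training set
X_test, y_test = make_real_data(500, rng)    # held-out "real" test set

# Extended dataset: real samples plus synthetic samples.
X_syn, y_syn = sample_synthetic(X_train, y_train, 100, rng)
X_ext = np.vstack([X_train, X_syn])
y_ext = np.concatenate([y_train, y_syn])

acc_real = accuracy(train_logreg(X_train, y_train), X_test, y_test)
acc_ext = accuracy(train_logreg(X_ext, y_ext), X_test, y_test)
print(f"real only: {acc_real:.3f}  real + synthetic: {acc_ext:.3f}")
```

Both classifiers are scored on the same held-out real data, so any difference in accuracy reflects the added synthetic samples rather than a change in the test distribution. Whether the extended dataset helps depends on the quality of the generator, which is the question the paper investigates.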
Identifier | oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:87885 |
Date | 03 November 2023 |
Creators | Abedi, Masoud, Hempel, Lars, Sadeghi, Sina, Kirsten, Toralf |
Publisher | MDPI |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/publishedVersion, doc-type:article, info:eu-repo/semantics/article, doc-type:Text |
Rights | info:eu-repo/semantics/openAccess |
Relation | 7075 |