Return to search

Genomic Data Augmentation with Variational Autoencoder

In order to treat cancer effectively, medical practitioners must predict pathological stages accurately, and machine learning methods can be employed to make such predictions. However, biomedical datasets, including genomic datasets, often have disproportionately more samples from people of European ancestry than people of other ethnic or racial groups, which can cause machine learning methods to perform better on the European samples than on the people of the under-represented groups. Data augmentation can be employed as a potential solution in order to artificially increase the number of samples from people of under-represented racial groups, and can in turn improve pathological stage predictions for future patients from such under-represented groups. Genomic data augmentation has been explored previously, for example using a Generative Adversarial Network, but to the best of our knowledge, the use of the variational autoencoder for the purpose of genomic data augmentation remains largely unexplored. Here we utilize a geometry-based variational autoencoder that models the latent space as a Riemannian manifold so that samples can be generated without the use of a prior distribution to show that the variational autoencoder can indeed be used to reliably augment genomic data. Using TCGA prostate cancer genotype data, we show that our VAE-generated data can improve pathological stage predictions on a test set of European samples. Because we only had European samples that were labeled in terms of pathological stage, we were not able to validate the African generated samples in this way, but we still attempt to show how such samples may be realistic. / Computer and Information Science

Identiferoai:union.ndltd.org:TEMPLE/oai:scholarshare.temple.edu:20.500.12613/9560
Date12 1900
CreatorsThyrum, Emily
ContributorsShi, Xinghua Mindy, Chen, Yuzhou, Carnevale, Vincenzo
PublisherTemple University. Libraries
Source SetsTemple University
LanguageEnglish
Detected LanguageEnglish
TypeThesis/Dissertation, Text
Format93 pages
RightsIN COPYRIGHT- This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available., http://rightsstatements.org/vocab/InC/1.0/
Relationhttp://dx.doi.org/10.34944/dspace/9522, Theses and Dissertations

Page generated in 0.002 seconds