Computational methods for identification of disease-associated variations in exome sequencing

The explosive growth in the ability to sequence DNA due to next-generation sequencing (NGS) technologies has brought an unprecedented ability to characterize an individual's exome inexpensively. This ability provides clinicians with additional tools to evaluate the likely genetic factors underlying heritable diseases. With this added capacity comes a need to identify relationships between the genetic variations observed in a patient and the disease with which the patient presents. This dissertation focuses on computational techniques to inform molecular diagnostics from NGS data. The techniques focus on three distinct domains in the characterization of disease-associated variants from exome sequencing.
First, strategies for producing complete and non-artifactual candidate variant lists are discussed. The process of converting patient DNA to a list of variants relative to the reference genome is very complex, and numerous modes of error may be introduced along the way. To address this, a Random Forest classifier was built to capture biases in a laboratory variant calling pipeline, and a C4.5 decision tree was built to enable discovery of thresholds for false positive reduction. Additionally, a strategy for augmenting exome capture experiments through evaluation of RNA-sequencing is discussed.
Second, a novel positive and unlabeled learning for prioritization (PULP) strategy is proposed to identify candidate variants most likely to be associated with a patient's disease. Using a number of publicly available data sources, PULP ranks genes according to how similar they are to previously discovered disease genes. This strategy is evaluated on a number of candidate lists from the literature, and is shown to significantly enrich ordered candidate variant lists for likely disease-associated variants.
Finally, the Training for Recognition and Integration of Phenotypes in Ocular Disease (TRIPOD) web utility is introduced as a means of simultaneously training and learning from clinicians about heritable ocular diseases. This tool currently contains a number of case studies documenting a wide range of diseases, and challenges trainees to virtually diagnose patients based on presented image data. Annotations by trainee and expert alike are used to construct rich phenotypic profiles for patients with known disease genotypes.
The strategies presented in this dissertation are specifically applicable to heritable retinal dystrophies, and have resulted in a number of improvements to the accurate molecular diagnosis of patient diseases. However, these works also provide a generalizable framework for disease-associated variant identification in any heritable, genetically heterogeneous disease, and represent the ongoing challenge of accurate diagnosis in the information age.

Identifier: oai:union.ndltd.org:uiowa.edu/oai:ir.uiowa.edu:etd-5520
Date: 01 December 2014
Creators: Wagner, Alex Handler
Contributors: Braun, Terry A.; Stone, Edwin M.
Publisher: University of Iowa
Source Sets: University of Iowa
Language: English
Detected Language: English
Type: dissertation
Format: application/pdf
Source: Theses and Dissertations
Rights: Copyright 2014 Alex Handler Wagner
