Global ETD Search

Return to search

Robust genotype classification using dynamic variable selection

Single nucleotide polymorphisms (SNPs) are DNA sequence variations, occurring when a single nucleotide –A, T, C or G – is altered. Arguably, SNPs account for more than 90% of human genetic variation. Dr. Tebbutt's laboratory has developed a highly redundant SNP genotyping assay consisting of multiple probes with signals from multiple channels for a single SNP, based on arrayed primer extension (APEX). The strength of this platform is its unique redundancy having multiple probes for a single SNP. Using this microarray platform, we have developed fully-automated genotype calling algorithms based on linear models for individual probe signals and using dynamic variable selection at the prediction level. The algorithms combine separate analyses based on the multiple probe sets to give a final confidence score for each candidate genotypes.

Our proposed classification model achieved an accuracy level of >99.4% with 100% call rate for the SNP genotype data which is comparable with existing genotyping technologies. We discussed the appropriateness of the proposed model related to other existing high-throughput genotype calling algorithms.

In this thesis we have explored three new ideas for classification with high dimensional data: (1) ensembles of various sets of predictors with built-in dynamic property; (2) robust classification at the prediction level; and (3) a proper confidence measure for dealing with failed predictor(s).

We found that a mixture model for classification provides robustness against outlying values of the explanatory variables. Furthermore, the algorithm chooses among different sets of explanatory variables in a dynamic way, prediction by prediction. We analyzed several data sets, including real and simulated samples to illustrate these features. Our model-based genotype calling algorithm captures the redundancy in the system considering all the underlying probe features of a particular SNP, automatically down-weighting any ‘bad data’ corresponding to image artifacts on the microarray slide or failure of a specific chemistry.

Though motivated by this genotyping application, the proposed methodology would apply to other classification problems where the explanatory variables fall naturally into groups or outliers in the explanatory variables require variable selection at the prediction stage for robustness. / Science, Faculty of / Statistics, Department of / Graduate

http://hdl.handle.net/2429/1602

Statistics

Bioinformatics

Microarray analysis

Identifer	oai:union.ndltd.org:UBC/oai:circle.library.ubc.ca:2429/1602
Date	11 1900
Creators	Podder, Mohua
Publisher	University of British Columbia
Source Sets	University of British Columbia
Language	English
Detected Language	English
Type	Text, Thesis/Dissertation
Format	2348006 bytes, application/pdf
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International, http://creativecommons.org/licenses/by-nc-nd/4.0/

Page generated in 0.0018 seconds

Robust genotype classification using dynamic variable selection

Description

Links & Downloads

Tags

Additional Fields