
Machine Learning Approaches to Biological Sequence and Phenotype Data Analysis

In this thesis, I present novel machine learning algorithms for understanding biology at a systems level by revealing how genes and their products function at different biological levels. At the sequence level, building on kernel Support Vector Machines (SVMs), I propose a learned random-walk kernel and a learned empirical-map kernel to identify protein remote homology from sequence data alone, together with a discriminative motif discovery algorithm that identifies the sequence motifs characterizing protein sequences' remote homology membership. The proposed approaches significantly outperform previous methods, especially on some challenging protein families. At the expression and protein level, using hierarchical Bayesian graphical models, I develop the first high-throughput computational predictive model that filters sequence-based predictions of microRNA targets by incorporating proteomic data for putative microRNA target genes, and I propose a second probabilistic model that explores the underlying mechanisms of microRNA regulation by combining the expression profiles of messenger RNAs and microRNAs. At the cellular level, I investigate how yeast genes manifest their functions in cell morphology by predicting gene function from the morphology data of yeast temperature-sensitive alleles. The resulting prediction models enable biologists to select yeast essential genes of interest and study their predicted novel functions.

Identifier: oai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/26209
Date: 17 February 2011
Creators: Min, Renqiang
Contributors: Bonner, Anthony; Zhang, Zhaolei
Source Sets: University of Toronto
Language: en_ca
Detected Language: English
Type: Thesis
