
Machine Learning Approaches for Identifying microRNA Targets and Conserved Protein Complexes

Much research has been directed toward understanding the roles of essential components in the cell, such as proteins, microRNAs, and genes. This dissertation focuses on two problems in bioinformatics research: microRNA-target prediction and the identification of conserved protein complexes across species. We define the two problems and develop novel approaches for solving them. MicroRNAs are short non-coding RNAs that regulate gene expression, and the first goal is to predict their targets. Existing methods rely on sequence features to predict targets. These features are neither sufficient nor necessary to identify functional target sites, and they ignore the cellular conditions in which microRNAs and mRNAs interact. We developed MicroTarget to predict microRNA-mRNA interactions using heterogeneous data sources. MicroTarget uses expression data to learn a candidate target set for each microRNA. Sequence data is then used to provide evidence of direct interactions and to rank the predicted targets. The predicted targets overlap with many of the experimentally validated ones, which indicates that using expression data helps in predicting microRNA targets accurately.

Protein complexes conserved across species specify processes that are core to the cell machinery. Methods devised to identify conserved complexes are severely limited by noise in protein-protein interaction (PPI) data. Underlying PPIs are physical interactions between protein domains that perform the necessary functions. Therefore, employing domains and domain-domain interactions (DDIs) gives a better view of protein interactions and functions. We developed DONA, a novel strategy for local network alignment. DONA maps proteins to their domains and uses DDIs to improve the network alignment. It constructs an alignment graph using a novel strategy and then uses this graph to discover conserved sub-networks. DONA outperforms existing methods in overlap with known protein complexes, achieving higher precision and recall. The aligned sub-networks also show better semantic similarity, computed with respect to both the biological process and the molecular function.

Ph. D.

Much research has been directed toward understanding the roles of essential components in the cell, such as proteins, microRNAs, and genes. Cellular processes involve a mixture of small molecules, and there is great interest in using different information sources to discover the interactions among these molecules. This dissertation focuses on two problems: microRNA-target prediction and the identification of conserved protein complexes across species. We define the two problems and develop novel approaches for solving them. MicroRNAs are a recently discovered class of non-coding RNAs. They play key roles in regulating the expression of as many as 30% of all mammalian protein-coding genes. MicroRNA regulatory activity has been implicated in a number of diseases, including cancer, heart disease, and neurological diseases. We developed MicroTarget to predict microRNA-gene interactions using heterogeneous data sources. The predicted target genes overlap with many of the experimentally validated ones.
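DONA's comparison against known protein complexes is reported in terms of precision and recall. One common way to compute these for predicted complexes is sketched here in Python. A predicted complex A and a known complex B are treated as matched when their overlap score |A ∩ B|² / (|A|·|B|) reaches a threshold, with 0.2 as a frequently used value. Precision is then the fraction of predicted complexes that match some known complex, and recall is the fraction of known complexes matched by some prediction. The abstract does not state DONA's exact matching criterion, so the score, the threshold, and the function names are assumptions.

```python
def overlap_score(a, b):
    """Assumed matching score between two non-empty complexes (sets of
    protein IDs): |A ∩ B|^2 / (|A| * |B|)."""
    shared = len(a & b)
    return shared * shared / (len(a) * len(b))


def precision_recall(predicted, known, threshold=0.2):
    """Precision: fraction of predicted complexes matching >= 1 known complex.
    Recall: fraction of known complexes matched by >= 1 predicted complex.
    Empty inputs yield 0.0 instead of dividing by zero."""
    def is_matched(complex_, references):
        return any(overlap_score(complex_, ref) >= threshold for ref in references)

    precision = sum(is_matched(p, known) for p in predicted) / len(predicted) if predicted else 0.0
    recall = sum(is_matched(k, predicted) for k in known) / len(known) if known else 0.0
    return precision, recall
```

Raising the threshold makes matching stricter, which can only lower (or keep) both numbers. Reporting both numbers guards against a method that scores well on one by predicting very many or very few complexes.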

Proteins carry out their tasks in the cell by interacting with each other. Protein complexes conserved among species specify the cell's core processes. We identify conserved complexes by constructing an alignment graph. This graph leverages the conservation of protein-protein interactions (PPIs) between species through domain conservation and domain-domain interactions (DDIs), in addition to PPI networks. By better integrating domain conservation and domain interactions, our system for identifying conserved protein complexes helps biologists use verified data to predict more reliable similarity relationships among species. All the test data sets and source code for this dissertation are available at:
https://bioinformatics.cs.vt.edu/~htorkey/Software.
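The alignment-graph idea can be made concrete with a simplified Python sketch. It illustrates the idea under stated assumptions and is not DONA's algorithm. A node pairs a protein from species 1 with a protein from species 2 that shares at least one domain with it. Two nodes are joined when the corresponding interaction is supported in both species. Here "supported" means either a direct PPI or a DDI between any domain of one protein and any domain of the other. Conserved sub-networks are taken to be connected components with at least `min_size` nodes. DONA's actual node and edge scoring is not described in this abstract, and every name here (`supported`, `alignment_graph`, `conserved_subnetworks`, `min_size`) is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def supported(u, v, ppi, domains, ddi):
    """Hypothetical rule: u and v interact if the PPI set contains the pair,
    or if any domain of u has a known DDI with any domain of v."""
    if frozenset((u, v)) in ppi:
        return True
    return any(frozenset((d, e)) in ddi
               for d in domains.get(u, ()) for e in domains.get(v, ()))


def alignment_graph(proteins1, ppi1, proteins2, ppi2, domains, ddi):
    """Nodes: (p, q) pairs across species that share at least one domain.
    Edges: between (p1, q1) and (p2, q2) when p1-p2 is supported in species 1
    and q1-q2 is supported in species 2."""
    nodes = [(p, q) for p in sorted(proteins1) for q in sorted(proteins2)
             if set(domains.get(p, ())) & set(domains.get(q, ()))]
    adj = defaultdict(set)
    for (p1, q1), (p2, q2) in combinations(nodes, 2):
        if p1 == p2 or q1 == q2:
            continue  # skip pairs that reuse a protein; self-interactions are ignored here
        if supported(p1, p2, ppi1, domains, ddi) and supported(q1, q2, ppi2, domains, ddi):
            adj[(p1, q1)].add((p2, q2))
            adj[(p2, q2)].add((p1, q1))
    return nodes, adj


def conserved_subnetworks(nodes, adj, min_size=3):
    """Connected components of the alignment graph with >= min_size nodes,
    found by iterative depth-first search."""
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if len(component) >= min_size:
            components.append(component)
    return components
```

Because of the DDI rule, an interaction missing from one species' PPI data, for instance because of the noise the abstract mentions, can still link two aligned pairs when the proteins' domains are known to interact. A real aligner would also score nodes and edges and use a more selective search than plain connected components.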

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/77536
Date: 27 April 2017
Creators: Torkey, Hanaa A.
Contributors: Computer Science; Heath, Lenwood S.; Zhang, Liqing; Grene, Ruth; Deng, Xinwei; ElHefnawi, Mahmoud M.
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Detected Language: English
Type: Dissertation
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
