
Using machine learning to predict long non-coding RNAs and exploring their evolutionary patterns and prevalence in plant transcriptomes

Long non-protein coding RNAs (lncRNAs) represent a diverse and enigmatic class of RNA. With roles associated with development and stress responses, these non-coding gene regulators are essential, yet they remain understudied in plants. Thus far, of just over 430 experimentally validated lncRNAs, only 13 are derived from plant systems, and many of these do not meet the classic criteria of the RNA class. Without a solid definition of what makes a lncRNA, and with few empirically validated transcripts, the methods currently available for prediction fall short. To address this deficiency in lncRNA research, we constructed and applied a machine learning-based lncRNA prediction protocol that does not impose predefined rules and utilises only experimentally confirmed lncRNAs in its training datasets.
Through model evaluation, we found that our novel lncRNA prediction tool had an estimated accuracy of over 96%. In a study that predicted lncRNAs from transcriptomes of evolutionarily diverse plant species, we determined that molecular features of lncRNAs display different phylogenetic signal patterns compared to protein-coding genes. Additionally, our analyses suggested that stress-resistant species express fewer lncRNAs than more stress-sensitive species. To expand on these results, we used the prediction tool in concert with a transcriptomic study of two natural accessions of the drought-tolerant species Eutrema salsugineum. These accessions were previously reported to show little physiological difference in a first drought but to differ significantly in a second; we instead demonstrated that the two ecotypes displayed vastly different transcriptomic responses, including the expression of lncRNAs, to a first and second drought treatment. In conclusion, the prediction tool can be applied to studies that further our knowledge of lncRNA evolution and can serve as an additional tool in classic transcriptomic studies. The suggested importance of lncRNAs in drought resistance, and evidence of their expression in two natural E. salsugineum accessions, merit further studies on the molecular and evolutionary mechanisms of these putatively regulatory transcripts. / Thesis / Doctor of Philosophy (PhD)

Identifier: oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/24319
Date: January 2019
Creators: Simopoulos, Caitlin
Contributors: Weretilnyk, Elizabeth; Golding, Brian; Biology
Source Sets: McMaster University
Language: English
Detected Language: English
Type: Thesis
