SYSTEMATICALLY LEARNING OF INTERNAL RIBOSOME ENTRY SITE AND PREDICTION BY MACHINE LEARNING
Junhui Wang (5930375) 15 May 2019
<p><a>Internal
ribosome entry sites (IRES) are segments of the mRNA found in untranslated
regions, which can recruit the ribosome and initiate translation independently
of the more widely used 5’ cap dependent translation initiation mechanism. IRES
play an important role in conditions where 5’ cap dependent
translation initiation has been blocked or repressed. They have been found to play
important roles in viral infection, cellular apoptosis, and response to other
external stimuli. It has been suggested that about 10% of mRNAs, both viral and
cellular, can utilize IRES. However, due to the limitations of the IRES bicistronic
assay, which is the gold standard for identifying IRES, relatively few IRES have
been definitively described and functionally validated compared to the
potential overall population. Viral and cellular IRES may be mechanistically
different, but this is difficult to analyze because the mechanistic differences
are still not very clearly defined. Identifying additional IRES is an important
step towards better understanding IRES mechanisms. Development of a new
bioinformatics tool that can accurately predict IRES from sequence would be a
significant step forward in identifying IRES-based regulation, and in
elucidating IRES mechanism. This dissertation systematically studies the
features which can distinguish IRES from nonIRES sequences. Sequence features
such as kmer words, and structural features such as predicted MFE of folding, Q<sub>MFE</sub>,
and sequence/structure triplets are evaluated as possible discriminative
features. These candidate features are incorporated into an IRES classifier based
on XGBoost, a machine learning model, to classify novel sequences as belonging to
IRES or nonIRES groups. The XGBoost model performs better than previous
predictors, with higher accuracy and lower computational time. The number of
features in the model has been greatly reduced, compared to previous
predictors, by adding global kmer and structural features. The trained XGBoost
model has been implemented as the first high-throughput bioinformatics tool for
IRES prediction, IRESpy. This website provides a public tool for all IRES
researchers and can be used in other genomics applications such as gene
annotation and analysis of differential gene expression.</a></p>
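As an illustration of the global kmer features mentioned in the abstract, the following is a minimal sketch of how such a feature vector might be computed from a sequence. The function name `kmer_features`, the choice of relative frequencies, and the example sequence are assumptions made for illustration only; the abstract does not specify how IRESpy computes or normalizes its kmer features.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; DNA input is converted below


def kmer_features(seq, k=2):
    """Return the relative frequency of every possible k-mer over ACGU.

    Hypothetical helper: normalizing to frequencies is an assumption,
    not a detail taken from the abstract.
    """
    seq = seq.upper().replace("T", "U")
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Windows containing characters outside ACGU (e.g. N) are ignored.
    total = sum(counts[km] for km in all_kmers)
    return {km: (counts[km] / total if total else 0.0) for km in all_kmers}


# Made-up example sequence, not a real IRES.
features = kmer_features("AUGGCUAUGG", k=2)
```

The result is a fixed-length dictionary (16 entries for k=2) that could, in principle, be combined with structural features such as predicted MFE before being passed to a classifier.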