Return to search

Application of a Naïve Bayes Classifier to Assign Polyadenylation Sites from 3' End Deep Sequencing Data: A Dissertation

Cleavage and polyadenylation of a precursor mRNA is important for transcription termination, mRNA stability, and regulation of gene expression. This process is directed by a multitude of protein factors and cis elements in the pre-mRNA sequence surrounding the cleavage and polyadenylation site. Importantly, the location of the cleavage and polyadenylation site helps define the 3’ untranslated region of a transcript, which is important for regulation by microRNAs and RNA binding proteins. Additionally, these sites have generally been poorly annotated. To identify 3’ ends, many techniques utilize an oligo-dT primer to construct deep sequencing libraries. However, this approach can lead to identification of artifactual polyadenylation sites due to internal priming in homopolymeric stretches of adenines. Previously, simple heuristic filters relying on the number of adenines in the genomic sequence downstream of a putative polyadenylation site have been used to remove these sites of internal priming. However, these simple filters may not remove all sites of internal priming and may also exclude true polyadenylation sites. Therefore, I developed a naïve Bayes classifier to identify putative sites from oligo-dT primed 3’ end deep sequencing as true or false/internally primed. Notably, this algorithm uses a combination of sequence elements to distinguish between true and false sites. Finally, the resulting algorithm is highly accurate in multiple model systems and facilitates identification of novel polyadenylation sites.

Identiferoai:union.ndltd.org:umassmed.edu/oai:escholarship.umassmed.edu:gsbs_diss-1658
Date29 April 2013
CreatorsSheppard, Sarah E.
PublishereScholarship@UMassChan
Source SetsUniversity of Massachusetts Medical School
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceMorningside Graduate School of Biomedical Sciences Dissertations and Theses
RightsCopyright is held by the author, with all rights reserved.

Page generated in 0.0026 seconds