Cleavage and polyadenylation of a precursor mRNA is important for transcription termination, mRNA stability, and regulation of gene expression. This process is directed by a multitude of protein factors and cis elements in the pre-mRNA sequence surrounding the cleavage and polyadenylation site. Importantly, the location of the cleavage and polyadenylation site helps define the 3’ untranslated region of a transcript, which is a key target of regulation by microRNAs and RNA binding proteins. Despite this, these sites have generally been poorly annotated. To identify 3’ ends, many techniques use an oligo-dT primer to construct deep sequencing libraries. However, this approach can identify artifactual polyadenylation sites because of internal priming in homopolymeric stretches of adenines. Previously, simple heuristic filters relying on the number of adenines in the genomic sequence downstream of a putative polyadenylation site have been used to remove these sites of internal priming. However, these simple filters may not remove all sites of internal priming and may also exclude true polyadenylation sites. Therefore, I developed a naïve Bayes classifier to label putative sites from oligo-dT primed 3’ end deep sequencing as true or false (internally primed). Notably, this algorithm uses a combination of sequence elements to distinguish between true and false sites. Finally, the resulting algorithm is highly accurate in multiple model systems and facilitates identification of novel polyadenylation sites.
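The kind of simple heuristic filter the abstract describes as prior practice can be sketched as follows. This is a minimal illustration only, not the classifier developed in the dissertation: the function names, the 20-base window, and the threshold of 12 adenines are all assumptions chosen for the example, and published filters differ in their exact rules.

```python
def count_downstream_adenines(genome_seq: str, site: int, window: int = 20) -> int:
    """Count adenines in the genomic bases immediately downstream of a site.

    `site` is the 0-based index of the last base before the putative
    cleavage point (a convention assumed here for illustration).
    """
    downstream = genome_seq[site + 1 : site + 1 + window].upper()
    return downstream.count("A")


def is_likely_internal_priming(
    genome_seq: str,
    site: int,
    window: int = 20,          # hypothetical window size
    min_adenines: int = 12,    # hypothetical threshold
) -> bool:
    """Flag a putative site when the downstream window is adenine-rich.

    An A-rich genomic stretch could have annealed to the oligo-dT primer,
    so the read may not reflect a real poly(A) tail.
    """
    return count_downstream_adenines(genome_seq, site, window) >= min_adenines


# Toy sequences (made up for the example, not real data).
a_rich = "GATTACA" + "A" * 15 + "CGTCG"
a_poor = "GATTACA" + "CGTACGTTGCATGCCGTAGC"

print(is_likely_internal_priming(a_rich, site=6))
print(is_likely_internal_priming(a_poor, site=6))
```

A single hard threshold like this makes the trade-off noted in the abstract visible: lowering `min_adenines` discards more true sites that happen to sit next to A-rich sequence, while raising it lets more internally primed sites through.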
Identifier | oai:union.ndltd.org:umassmed.edu/oai:escholarship.umassmed.edu:gsbs_diss-1658 |
Date | 29 April 2013 |
Creators | Sheppard, Sarah E. |
Publisher | eScholarship@UMassChan |
Source Sets | University of Massachusetts Medical School |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Morningside Graduate School of Biomedical Sciences Dissertations and Theses |
Rights | Copyright is held by the author, with all rights reserved. |