High-throughput sequencing data are rich in information and contain many off-target sequences (reads) that are often ignored but may be biologically relevant. Seed extension, a combination of reference-based and de novo assembly methods, can be used to extract this information, but it is time-consuming to implement because it requires that multiple seeds (sequences from one or many closely related species) be gathered in advance. A new tool is presented here, SeedSQrrL, that can automatically crawl the web to gather the seeds from the closest taxonomic relative for each gene and store them in a relational database. The seeds can then be used to create multiple seed extensions, which are later combined into a reference or used for downstream phylogenetic analysis. Patterns in the resulting gene trees can be searched for using the traditional methods of tree comparison (Robinson-Foulds topological distance and branch-length comparison methods). Currently, no open source tree pattern matching program exists that allows the user to modify algorithms and create their own custom pattern matching functions. I have worked on such a tool, called Treematcher, and it will be made available in the ETE Toolkit (a Python Environment for Tree Exploration). Three biological case studies will be included to demonstrate the capabilities of the two programs: 1) a custom function in Treematcher to perform a regular expression-like query, 2) SeedSQrrL will be used to isolate mitochondrial genes from snakes and chloroplast genes from angiosperms, and 3) a large case study of animals will be assembled. / A Dissertation submitted to the Department of Scientific Computing in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / April 2, 2018.
/ Automated Gene Reference Collection, Gene Tree Pattern Matching, High Throughput Sequence Analysis, NCBI Taxonomy, Open Source Software for Bioinformatics, Python / Includes bibliographical references. / Alan Lemmon, Professor Directing Dissertation; Michelle Arbeitman, University Representative; Anke Meyer-Baese, Committee Member; Peter Beerli, Committee Member; Dennis Slice, Committee Member.
Identifier | oai:union.ndltd.org:fsu.edu/oai:fsu.digital.flvc.org:fsu_657910 |
Contributors | Mechtley, Alisha (author), Lemmon, Alan R (professor directing dissertation), Arbeitman, Michelle N. (university representative), Meyer-Bäse, Anke (committee member), Beerli, Peter (committee member), Slice, Dennis E. (committee member), Florida State University (degree granting institution), College of Arts and Sciences (degree granting college), Department of Scientific Computing (degree granting department) |
Publisher | Florida State University |
Source Sets | Florida State University |
Language | English |
Detected Language | English |
Type | Text, text, doctoral thesis |
Format | 1 online resource (73 pages), computer, application/pdf |