STORI: selectable taxon ortholog retrieval iteratively

Speciation and gene duplication are fundamental evolutionary processes that enable biological innovation. For over a decade, biologists have endeavored to distinguish orthology (homology caused by speciation) from paralogy (homology caused by duplication). Disentangling orthology and paralogy is useful to diverse fields such as phylogenetics, protein engineering, and genome content comparison.

A common step in ortholog detection is the computation of Bidirectional Best Hits (BBH). However, we found this computation impractical for more than 24 Eukaryotic proteomes. To retrieve orthologs in less time than previous methods require, we developed a novel algorithm and implemented it as a suite of Perl scripts. This software, Selectable Taxon Ortholog Retrieval Iteratively (STORI), retrieves orthologous protein sequences for a set of user-defined proteomes and query sequences. While the time complexity of the BBH method is quadratic in the number of taxa, O(#taxa^2), we found that the average CPU time used by STORI may increase linearly with the number of taxa.
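
To make the quadratic cost of BBH concrete, the following is a minimal Python sketch of the general BBH idea, not of STORI's algorithm and not taken from the thesis. The function names, protein identifiers, and similarity scores are all hypothetical; in practice the scores would come from a sequence-search tool. Two proteins form a BBH pair when each is the other's top-scoring hit, and an all-versus-all run must repeat this for every pair of proteomes.

```python
from itertools import combinations


def best_hits(scores):
    """Map each query to its top-scoring subject.

    `scores` maps (query, subject) -> similarity score (higher is better).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which each protein is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


def proteome_pairs(taxa):
    """List every unordered pair of proteomes an all-versus-all BBH run compares."""
    return list(combinations(taxa, 2))


# Hypothetical toy scores for two tiny proteomes, A and B.
a_vs_b = {("a1", "b1"): 200.0, ("a1", "b2"): 50.0,
          ("a2", "b1"): 120.0, ("a2", "b2"): 90.0}
b_vs_a = {("b1", "a1"): 198.0, ("b1", "a2"): 119.0,
          ("b2", "a2"): 88.0, ("b2", "a1"): 45.0}

bbh = bidirectional_best_hits(a_vs_b, b_vs_a)
pair_count = len(proteome_pairs(range(24)))

print(bbh)         # a2's best hit is b1, but b1's best hit is a1, so only one pair
print(pair_count)  # n * (n - 1) / 2 proteome pairs for n = 24
```

In this sketch, 24 proteomes already require 276 pairwise comparisons, and doubling the number of taxa roughly quadruples that count, which is the O(#taxa^2) growth described above.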

To demonstrate one aspect of STORI’s usefulness, we used this software to infer the orthologous sequences of 26 ribosomal proteins (rProteins) from the large ribosomal subunit (LSU), for a set of 115 Bacterial and 94 Archaeal proteomes. Next, we used established tree-search methods to seek the most probable evolutionary explanation of these data. The current implementation of STORI runs on Red Hat Enterprise Linux 6.0 with installations of Moab 5.3.7, Perl 5 and several Perl modules. STORI is available at: <http://github.com/jgstern/STORI>.

http://hdl.handle.net/1853/53377

Ortholog prediction

Protein sequence retrieval

Modeling bacterial evolution

Fusobacteria

Molecular sequence data management

Bayesian inference phylogenomics

Identifier	oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/53377
Date	08 June 2015
Creators	Stern, Joshua Gallant
Contributors	Gaucher, Eric A.
Publisher	Georgia Institute of Technology
Source Sets	Georgia Tech Electronic Thesis and Dissertation Archive
Language	en_US
Detected Language	English
Type	Thesis, Dataset
Format	application/pdf
