Return to search

Explorations In Searching Compressed Nucleic Acid And Protein Sequence Databases And Their Cooperatively-Compressed Indices

Nucleic acid and protein databases such as GenBank are growing at a rate that perhaps eclipses even Moore’s Law of increase in computational power. This poses a problem for the biological sciences, which have become increasingly dependant on searching and manipulating these databases. It was once reasonably practical to perform exhaustive searches of these databases, for example using the algorithm described by Smith and Waterman, however it has been many years since this was the case. This has led to the development of a series of search algorithms, such as FASTA, BLAST and BLAT, that are each successively faster, but at similarly successive costs in terms of thoroughness.

Attempts have been made to remedy this problem by devising search algorithms that are both fast and thorough. An example is CAFE, which seeks to construct a search system with a sub-linear relationship between search time and database size, and argues that this property must be present for any search system to be successful in the long term.

This dissertation explores this notion by seeking to construct a search system that takes advantage of the growing redundancy in databases such as GenBank in order to reduce both the search time and the space required to store the databases and their indices, while preserving or increasing the thoroughness of the search.

The result is the creation and implementation of new genomic sequence search and alignment,
database compression, and index compression algorithms and systems that make progress toward resolving the problem of reducing search speed and space requirements while improving sensitivity. However, success is tempered by the need for databases with adequate local redundancy, and the computational cost of these algorithms when servicing un-batched queries.

Identiferoai:union.ndltd.org:ADTP/202476
Date January 2008
CreatorsGardner-Stephen, Paul Mark, paul.gardner-stephen@flinders.edu.au
PublisherFlinders University. Computer Science, Engineering & Mathematics
Source SetsAustraliasian Digital Theses Program
LanguageEnglish
Detected LanguageEnglish
Rightshttp://www.flinders.edu.au/disclaimer/), Copyright Paul Mark Gardner-Stephen

Page generated in 0.0019 seconds