Nucleic acid and protein databases such as GenBank are growing at a rate that perhaps eclipses even Moores Law of increase in computational power. This poses a problem for the biological sciences, which have become increasingly dependant on searching and manipulating these databases. It was once reasonably practical to perform exhaustive searches of these databases, for example using the algorithm described by Smith and Waterman, however it has been many years since this was the case. This has led to the development of a series of search algorithms, such as FASTA, BLAST and BLAT, that are each successively faster, but at similarly successive costs in terms of thoroughness.
Attempts have been made to remedy this problem by devising search algorithms that are both fast and thorough. An example is CAFE, which seeks to construct a search system with a sub-linear relationship between search time and database size, and argues that this property must be present for any search system to be successful in the long term.
This dissertation explores this notion by seeking to construct a search system that takes advantage of the growing redundancy in databases such as GenBank in order to reduce both the search time and the space required to store the databases and their indices, while preserving or increasing the thoroughness of the search.
The result is the creation and implementation of new genomic sequence search and alignment,
database compression, and index compression algorithms and systems that make progress toward resolving the problem of reducing search speed and space requirements while improving sensitivity. However, success is tempered by the need for databases with adequate local redundancy, and the computational cost of these algorithms when servicing un-batched queries.
Identifer | oai:union.ndltd.org:ADTP/202476 |
Date | January 2008 |
Creators | Gardner-Stephen, Paul Mark, paul.gardner-stephen@flinders.edu.au |
Publisher | Flinders University. Computer Science, Engineering & Mathematics |
Source Sets | Australiasian Digital Theses Program |
Language | English |
Detected Language | English |
Rights | http://www.flinders.edu.au/disclaimer/), Copyright Paul Mark Gardner-Stephen |
Page generated in 0.0016 seconds