Return to search

Algorithmic problems in strings with applications to the analysis of biological sequences

Recent advances in molecular biology have dramatically changed the way biological data analysis is performed [119, 92]. Next-Generation Sequenc- ing (NGS) technologies produce high-throughput data of highly controlled quality, hundreds of times faster and cheaper than a decade ago. Mapping of short reads to a reference sequence is a fundamental problem in NGS technologies. After finding an occurrence of a high quality fragment of the read, the rest must be approximately aligned, but a good alignment would not be expected to contain a large number of gaps (consecutive insertions or deletions). We present an alternative alignment algorithm which computes the optimal alignment with a bounded number of gaps. Another problem arising from NGS technologies is merging overlapping reads into a single string. We present a data structure which allows for the efficient computation of the overlaps between two strings as well as being applicable to other problems. Weighted strings are a representation of data that allows for a subtle representation of ambiguity in strings. In this document we present algorithms for other problems related to weighted strings: the computation of exact and approximate inverted repeats in weighted strings, computing repetitions and computing covers. We investigate the average-case complexity of wildcard matching. Wildcards can be used to model single nucleotide polymorphisms and so, efficient algorithms to search for strings with wildcards are necessary. In this document we investigate how efficient algorithms for this problem can be on average. There exist many organisms such as viruses, bacteria, eukaryotic cells, and archaea which have a circular DNA structure. If a biologist wishes to find occurrences of a particular virus in a carriers DNA sequence which may not be circular it must be possible to efficiently locate occurrences of circular strings. In this document we present a number of algorithms for circular string matching.

Identiferoai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:679776
Date January 2015
CreatorsBarton, Carl Samuel
ContributorsIliopoulos, Costas ; Crochemore, Maxime
PublisherKing's College London (University of London)
Source SetsEthos UK
Detected LanguageEnglish
TypeElectronic Thesis or Dissertation
Sourcehttp://kclpure.kcl.ac.uk/portal/en/theses/algorithmic-problems-in-strings-with-applications-to-the-analysis-of-biological-sequences(461c8961-c256-4ff8-97f7-c0718709367d).html

Page generated in 0.0014 seconds