Return to search

Fast and efficient algorithms on strings and sequences

The last few decades have witnessed the fascinating outgrowth of the field of String Algorithms. And, with the ever increasing mass of information and rapid pace of dissemination and sharing thereof, the importance and relevance of these algorithms can be expected to grow even further in near future. In this thesis, we have studied a number of interesting problems on strings and sequences and taken an effort to devise efficient algorithms to solve them. The problems we have studied here falls into two main categories of problems, namely the String Matching Problems and the Longest Common Subsequence Problems. The first String matching problem handled in this thesis is the pattern matching problems under the don't care paradigm. Don't care pattern matching problems have been studied since 1974 and the quest for improved algorithms is still on. We present efficient algorithms for different versions of this problem. We also present efficient algorithms for a more general problem, namely the pattern matching problem with degenerate strings. Next, we consider another well-studied problem namely the Swap Matching problem. We present a new graph-theoretic approach to model the problem, which opens a new and hitherto unexplored avenue to solve it. Then, using the model, we devise an efficient algorithm, which works particularly well for short patterns. Computer assisted music analysis and music information retrieval has a number of tasks that can be related to stringology. To this extent, we consider the problem of identifying rhythms in a musical text, which can be used in automatic song classification. Since there doesn't exist a complete agreement on the definitions of related features, we take an effort to present a new paradigm to model this problem and devise an efficient algorithm to solve it. Next, we consider a number ofrelatively new interesting pattern matching problems from indexing point of view. We present efficient indexing schemes for the problem of pattern matching in given intervals, the property matching problem and the circular pattern matching problem. We conclude our study of string matching problems by focusing on devising an efficient data structure to index gapped factors. In the second part of this thesis, we study the longest common subsequence (LCS) problems and variants thereof. We first introduce a number of new LCS variants by applying the notion of gap-constraints. The new variants are motivated by practical applications from computational molecular biology and we present efficient algorithms to solve them. Then, using the techniques developed while solving these variants, we present efficient algorithms for the classic LCS problem and for another relatively new variant, namely, the Constrained LCS (CLCS) problem. Finally, we efficiently solve the LCS and CLCS problems for degenerate strings.

Identiferoai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:617795
Date January 2007
CreatorsRahman, Mohammad Sohel
PublisherKing's College London (University of London)
Source SetsEthos UK
Detected LanguageEnglish
TypeElectronic Thesis or Dissertation

Page generated in 0.002 seconds