Return to search

Some Common Subsequence Problems of Multiple Sequences and Their Applications

The longest common subsequence (LCS) problem is a famous and classical problem in computer science and molecular biology. The common subsequence of multiple sequences shows the identical and similar parts in these sequences. This dissertation pays attention to the approximate algorithms for finding the LCS of $k$ input sequence ($k$-LCS problem), the merged LCS problem, and the mosaic LCS problem. These three problems try to hunt out the identical relationships among the $k$ sequences, the interleaving relationship between a target sequence and a merged sequence of a pair of sequences, and the mosaic relationship between a target sequence and a set of sequences, respectively.
Given $k$ input sequences, the $k$-LCS problem is to find the LCS which is common in all sequences. We first propose two $sigma$-approximate algorithms for the $k$-LCS problem with time complexities $O(sigma k n)$ and $O(sigma^{2} k n + sigma^{3} n)$ respectively, where $sigma$ and $n$ are the alphabet size and length of sequences, respectively. Experimental results show that our algorithms for 2-LCS could be a good filter to select the candidate sequences in database searching.
Given a target sequence $T$ and a pair of merging sequences $A$ and $B$, the merged LCS problem is to find the LCS of $T$ and the optimally merged sequence by merging $A$ and $B$ alternately. Its goal is to find a merging way for understanding the interleaving relationship of sequences. We first propose an algorithm with $O(n^{3})$ time for solving the problem, where $n$ is the sequence length. We further add the block information of input sequences in the blocked merged LCS problem. To solve the latter problem, we propose an algorithm with time complexity $O(n^{2}m_{b})$, where $m_{b}$ is the number of blocks. Based on the S-table technique, we can design an improved algorithm with $O(n^{2} + nm_{b}^{2})$ time.
Additionally, we desire to obtain the relationship between one sequence and a set of sequences. Given a target sequence $T$ and a set $S$ of source sequences, the mosaic LCS problem is to find the LCS of $T$ and a mosaic sequence $C$, composed of repeatable $k$ sequences in $S$. Based on the concept of break points in $T$, a divide and conquer algorithm is proposed with $O(n^2m|S|+ n^3log k)$ time, where $n$ and $m$ are the lengths of $T$ and the maximal length of sequences in $S$, respectively. Again, based on the S-table technique, an improved algorithm with $O(n(m+k)|S|)$ time is proposed by applying an efficient preprocessing.

Identiferoai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0714107-160907
Date14 July 2007
CreatorsHuang, Kuo-Si
ContributorsYaw-Ling Lin, Kun-Mao Chao, Chungnan Lee, Chuan Yi Tang, Jung-Hsien Chiang, Chang-Biau Yang, Shi-Jinn Horng, Rong-Jaye Chen
PublisherNSYSU
Source SetsNSYSU Electronic Thesis and Dissertation Archive
LanguageEnglish
Detected LanguageEnglish
Typetext
Formatapplication/pdf
Sourcehttp://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0714107-160907
Rightsoff_campus_withheld, Copyright information available at source archive

Page generated in 0.0026 seconds