1 |
A Genetic Algorithm for the Longest Common Subsequence of Multiple SequencesChiang, Chung-Han 06 January 2009 (has links)
Various approaches have been proposed for finding the longest
common subsequence (LCS) of two sequences. The time complexities
of these algorithms are usually $O(n^2)$ in the worst case, where
$n$ is the length of input sequences. However, these algorithms
would become infeasible when the input length, $n$, is very long.
Recently, the $k$-LCS $(k ≥ 2)$ problem has become more
attractive. Some algorithms have been proposed for solving the
problem, but the execution time required for solving the $k$-LCS
problem is still too long to be practical. In this thesis, we
propose a genetic algorithm for solving the $k$-LCS problem with
time complexity $O(Gpk(n + |P_j|))$, which $G$ is the number of
generations, $p$ is the number of template patterns, $k$ is the
number of input sequences, $n$ and $|P_j|$ are the length of input
sequences and the length of template patterns, respectively. As
our experimental results show, when $k$ is 20 and $n$ is 1000, the
performance ratio ($|CS|/|LCS|$) of our algorithm is greater than
0.8, where $|CS|$ denotes the length of the solution we find, and
$|LCS|$ represents the length of the real (optimal) LCS. Comparing
the performance ratios with Expansion Algorithm and BNMAS
Algorithm, our algorithm is much better than them when the number
of input sequences varies from 2 to 20 and the length of the input
sequences varies from 100 to 2000.
|
2 |
Some Common Subsequence Problems of Multiple Sequences and Their ApplicationsHuang, Kuo-Si 14 July 2007 (has links)
The longest common subsequence (LCS) problem is a famous and classical problem in computer science and molecular biology. The common subsequence of multiple sequences shows the identical and similar parts in these sequences. This dissertation pays attention to the approximate algorithms for finding the LCS of $k$ input sequence ($k$-LCS problem), the merged LCS problem, and the mosaic LCS problem. These three problems try to hunt out the identical relationships among the $k$ sequences, the interleaving relationship between a target sequence and a merged sequence of a pair of sequences, and the mosaic relationship between a target sequence and a set of sequences, respectively.
Given $k$ input sequences, the $k$-LCS problem is to find the LCS which is common in all sequences. We first propose two $sigma$-approximate algorithms for the $k$-LCS problem with time complexities $O(sigma k n)$ and $O(sigma^{2} k n + sigma^{3} n)$ respectively, where $sigma$ and $n$ are the alphabet size and length of sequences, respectively. Experimental results show that our algorithms for 2-LCS could be a good filter to select the candidate sequences in database searching.
Given a target sequence $T$ and a pair of merging sequences $A$ and $B$, the merged LCS problem is to find the LCS of $T$ and the optimally merged sequence by merging $A$ and $B$ alternately. Its goal is to find a merging way for understanding the interleaving relationship of sequences. We first propose an algorithm with $O(n^{3})$ time for solving the problem, where $n$ is the sequence length. We further add the block information of input sequences in the blocked merged LCS problem. To solve the latter problem, we propose an algorithm with time complexity $O(n^{2}m_{b})$, where $m_{b}$ is the number of blocks. Based on the S-table technique, we can design an improved algorithm with $O(n^{2} + nm_{b}^{2})$ time.
Additionally, we desire to obtain the relationship between one sequence and a set of sequences. Given a target sequence $T$ and a set $S$ of source sequences, the mosaic LCS problem is to find the LCS of $T$ and a mosaic sequence $C$, composed of repeatable $k$ sequences in $S$. Based on the concept of break points in $T$, a divide and conquer algorithm is proposed with $O(n^2m|S|+ n^3log k)$ time, where $n$ and $m$ are the lengths of $T$ and the maximal length of sequences in $S$, respectively. Again, based on the S-table technique, an improved algorithm with $O(n(m+k)|S|)$ time is proposed by applying an efficient preprocessing.
|
3 |
Metody vícenásobného zarovnávání nukleotidových sekvencí / Methods for multialignment of nucleotide sequencesTrněný, Ondřej January 2013 (has links)
To be able to understand characteristics and purpose of biological sequences correctly, it is crucial to have a possibility to sort and compare them. Because of this need and to extend existing knowledge pool, numerous methods were proposed. Especially in field of multiple sequences alignment. Methods for multiple sequences alignment may provide various valuable information about sequences which failed to show enough similarity in pairwise alignment. According to this, several algorithms were implemented in various computer applications which provide a way to analyse huge sets of data. One of those, the progressive alignment algorithm, is implemented as a part of this thesis
|
Page generated in 0.0794 seconds