• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 2
  • 1
  • Tagged with
  • 3
  • 3
  • 3
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A Genetic Algorithm for the Longest Common Subsequence of Multiple Sequences

Chiang, Chung-Han 06 January 2009 (has links)
Various approaches have been proposed for finding the longest common subsequence (LCS) of two sequences. The time complexities of these algorithms are usually $O(n^2)$ in the worst case, where $n$ is the length of input sequences. However, these algorithms would become infeasible when the input length, $n$, is very long. Recently, the $k$-LCS $(k ≥ 2)$ problem has become more attractive. Some algorithms have been proposed for solving the problem, but the execution time required for solving the $k$-LCS problem is still too long to be practical. In this thesis, we propose a genetic algorithm for solving the $k$-LCS problem with time complexity $O(Gpk(n + |P_j|))$, which $G$ is the number of generations, $p$ is the number of template patterns, $k$ is the number of input sequences, $n$ and $|P_j|$ are the length of input sequences and the length of template patterns, respectively. As our experimental results show, when $k$ is 20 and $n$ is 1000, the performance ratio ($|CS|/|LCS|$) of our algorithm is greater than 0.8, where $|CS|$ denotes the length of the solution we find, and $|LCS|$ represents the length of the real (optimal) LCS. Comparing the performance ratios with Expansion Algorithm and BNMAS Algorithm, our algorithm is much better than them when the number of input sequences varies from 2 to 20 and the length of the input sequences varies from 100 to 2000.
2

Some Common Subsequence Problems of Multiple Sequences and Their Applications

Huang, Kuo-Si 14 July 2007 (has links)
The longest common subsequence (LCS) problem is a famous and classical problem in computer science and molecular biology. The common subsequence of multiple sequences shows the identical and similar parts in these sequences. This dissertation pays attention to the approximate algorithms for finding the LCS of $k$ input sequence ($k$-LCS problem), the merged LCS problem, and the mosaic LCS problem. These three problems try to hunt out the identical relationships among the $k$ sequences, the interleaving relationship between a target sequence and a merged sequence of a pair of sequences, and the mosaic relationship between a target sequence and a set of sequences, respectively. Given $k$ input sequences, the $k$-LCS problem is to find the LCS which is common in all sequences. We first propose two $sigma$-approximate algorithms for the $k$-LCS problem with time complexities $O(sigma k n)$ and $O(sigma^{2} k n + sigma^{3} n)$ respectively, where $sigma$ and $n$ are the alphabet size and length of sequences, respectively. Experimental results show that our algorithms for 2-LCS could be a good filter to select the candidate sequences in database searching. Given a target sequence $T$ and a pair of merging sequences $A$ and $B$, the merged LCS problem is to find the LCS of $T$ and the optimally merged sequence by merging $A$ and $B$ alternately. Its goal is to find a merging way for understanding the interleaving relationship of sequences. We first propose an algorithm with $O(n^{3})$ time for solving the problem, where $n$ is the sequence length. We further add the block information of input sequences in the blocked merged LCS problem. To solve the latter problem, we propose an algorithm with time complexity $O(n^{2}m_{b})$, where $m_{b}$ is the number of blocks. Based on the S-table technique, we can design an improved algorithm with $O(n^{2} + nm_{b}^{2})$ time. Additionally, we desire to obtain the relationship between one sequence and a set of sequences. Given a target sequence $T$ and a set $S$ of source sequences, the mosaic LCS problem is to find the LCS of $T$ and a mosaic sequence $C$, composed of repeatable $k$ sequences in $S$. Based on the concept of break points in $T$, a divide and conquer algorithm is proposed with $O(n^2m|S|+ n^3log k)$ time, where $n$ and $m$ are the lengths of $T$ and the maximal length of sequences in $S$, respectively. Again, based on the S-table technique, an improved algorithm with $O(n(m+k)|S|)$ time is proposed by applying an efficient preprocessing.
3

Metody vícenásobného zarovnávání nukleotidových sekvencí / Methods for multialignment of nucleotide sequences

Trněný, Ondřej January 2013 (has links)
To be able to understand characteristics and purpose of biological sequences correctly, it is crucial to have a possibility to sort and compare them. Because of this need and to extend existing knowledge pool, numerous methods were proposed. Especially in field of multiple sequences alignment. Methods for multiple sequences alignment may provide various valuable information about sequences which failed to show enough similarity in pairwise alignment. According to this, several algorithms were implemented in various computer applications which provide a way to analyse huge sets of data. One of those, the progressive alignment algorithm, is implemented as a part of this thesis

Page generated in 0.0646 seconds