Spelling suggestions: "subject:"subsequence"" "subject:"subsequences""
1 |
A Survey of the Longest Common Subsequence Problem and Its Related ProblemsChen, Jian-bein 12 September 2005 (has links)
ABSTRACT
The longest common subsequence (LCS) problem is a classical problem both in com-
binational optimization and computational biology. During the past few decades,
a considerable number of studies have been focused on LCS and its related prob-
lems. In this thesis, we shall present a complete survey on LCS and its related
problems, and review some e¡Ócient algorithms for solving these problems. We shall
also give the de¡Ânition to each problem, present some theorems, and illustrate time
complexity and space complexity of the algorithms for solving these problems.
|
2 |
Algorithms for Scaled String Indexing and LCS VariantsPeng, Yung-Hsing 20 July 2010 (has links)
Related problems of string indexing and sequence analysis have been widely studied for a long time. Recently, researchers turn to consider extended versions of these problems, which provides more realistic applications. In this dissertation, we focus on three problems of recent interest, which are (1)the
indexing problem for scaled strings, (2)the merged longest common subsequence problem and its variant with blocks, and (3)the sequence alignment
problem with weighted constraints.
The indexing problem for scaled strings asks one to preprocess a text string T, so that the matched positions of a pattern string P in T, with some
scales £\ applied to P, can be reported efficiently. In this dissertation, we propose efficient algorithms for indexing real scaled strings, discretely scaled
strings, and proportionally scaled strings. Our indexing algorithms achieve either significant improvements to previous results, or the best known results. The merged longest common subsequence (merged LCS) problem aims to detect the interleaving relationship between sequences, which has important applications to genomic and signal comparison. In this dissertation, we propose improved algorithms for finding the merged LCS. Our
algorithms for finding the merged LCS are also more efficient than the previous results, especially for large alphabets. Finally, the sequence alignment problem with weighted constraints is a newly proposed problem in this dissertation. For this new problem, we first propose an efficient solution, and then show that the concept of weighted constraints can be further used to solve many constraint-related problems on sequences. Therefore, our results in this dissertation have significant contributions to the field of string indexing and sequence analysis.
|
3 |
A Genetic Algorithm for the Longest Common Subsequence of Multiple SequencesChiang, Chung-Han 06 January 2009 (has links)
Various approaches have been proposed for finding the longest
common subsequence (LCS) of two sequences. The time complexities
of these algorithms are usually $O(n^2)$ in the worst case, where
$n$ is the length of input sequences. However, these algorithms
would become infeasible when the input length, $n$, is very long.
Recently, the $k$-LCS $(k ≥ 2)$ problem has become more
attractive. Some algorithms have been proposed for solving the
problem, but the execution time required for solving the $k$-LCS
problem is still too long to be practical. In this thesis, we
propose a genetic algorithm for solving the $k$-LCS problem with
time complexity $O(Gpk(n + |P_j|))$, which $G$ is the number of
generations, $p$ is the number of template patterns, $k$ is the
number of input sequences, $n$ and $|P_j|$ are the length of input
sequences and the length of template patterns, respectively. As
our experimental results show, when $k$ is 20 and $n$ is 1000, the
performance ratio ($|CS|/|LCS|$) of our algorithm is greater than
0.8, where $|CS|$ denotes the length of the solution we find, and
$|LCS|$ represents the length of the real (optimal) LCS. Comparing
the performance ratios with Expansion Algorithm and BNMAS
Algorithm, our algorithm is much better than them when the number
of input sequences varies from 2 to 20 and the length of the input
sequences varies from 100 to 2000.
|
4 |
Aspects of random matrix theory: concentration and subsequence problemsXu, Hua 17 November 2008 (has links)
The present work studies some aspects of random matrix theory. Its first part is devoted to the asymptotics of random matrices with infinitely divisible, in particular heavy-tailed, entries. Its second part focuses on relations between limiting law in subsequence problems and spectra of random matrices.
|
5 |
An Approach for Solving the Constrained Longest Common Subsequence ProblemPeng, Chao-Li 28 August 2003 (has links)
A subsequence is obtained by deleting zero or more symbols from a given sequence. For given two sequences, the longest common subsequence problem is to find a common subsequence whose length is the longest. The constrained longest common subsequence (CLCS) problem is to find a longest common subsequence that contains a specific subsequence. Note a CLCS may be shorter than an LCS.
In this thesis, we propose a dynamic programming algorithm for solving the CLCS problem. The time complexity is O(pmn), where m and n are the lengths of the given sequences and p is the length of the constraint sequence. Our algorithm
can also be applied to solve the constrained sequence alignment problem, which is a sequence alignment problem with the requirement that some specific symbols must be aligned together.
|
6 |
Finding the Longest Increasing Subsequence of Every SubstringTseng, Chiou-Ting 27 August 2006 (has links)
Given a string S = {a1, a2, a3, ..., an}, the longest increasing subsequence (LIS) problem is to find a subsequence of the given string such that the subsequence
is increasing and its length is maximal. In a previous result, to find the longest increasing subsequences of each sliding window with a fixed size w of a given string
with length n can be solved in O(w log log n+OUTPUT) time, where O(w log log n+ w^2) time is taken for preprocessing and OUTPUT is the sum of all output lengths. In this thesis, we solve the problem for finding the longest increasing subsequence of every substring of S. With the straightforward implementation of the previous result, the time required for the preprocessing would be O(n^3). We modify the data structure used in the algorithm, hence the required preprocessing time is improved to O(n^2). The time required for the report stage is linear to the size of the output. In other words, our algorithm can find the LIS of every substring in O(n^2+OUTPUT) time. If the LIS's of all substrings are desired to be reported, since there are O(n^2) substrings totally in a given string with length n, our algorithm is optimal.
|
7 |
Some Common Subsequence Problems of Multiple Sequences and Their ApplicationsHuang, Kuo-Si 14 July 2007 (has links)
The longest common subsequence (LCS) problem is a famous and classical problem in computer science and molecular biology. The common subsequence of multiple sequences shows the identical and similar parts in these sequences. This dissertation pays attention to the approximate algorithms for finding the LCS of $k$ input sequence ($k$-LCS problem), the merged LCS problem, and the mosaic LCS problem. These three problems try to hunt out the identical relationships among the $k$ sequences, the interleaving relationship between a target sequence and a merged sequence of a pair of sequences, and the mosaic relationship between a target sequence and a set of sequences, respectively.
Given $k$ input sequences, the $k$-LCS problem is to find the LCS which is common in all sequences. We first propose two $sigma$-approximate algorithms for the $k$-LCS problem with time complexities $O(sigma k n)$ and $O(sigma^{2} k n + sigma^{3} n)$ respectively, where $sigma$ and $n$ are the alphabet size and length of sequences, respectively. Experimental results show that our algorithms for 2-LCS could be a good filter to select the candidate sequences in database searching.
Given a target sequence $T$ and a pair of merging sequences $A$ and $B$, the merged LCS problem is to find the LCS of $T$ and the optimally merged sequence by merging $A$ and $B$ alternately. Its goal is to find a merging way for understanding the interleaving relationship of sequences. We first propose an algorithm with $O(n^{3})$ time for solving the problem, where $n$ is the sequence length. We further add the block information of input sequences in the blocked merged LCS problem. To solve the latter problem, we propose an algorithm with time complexity $O(n^{2}m_{b})$, where $m_{b}$ is the number of blocks. Based on the S-table technique, we can design an improved algorithm with $O(n^{2} + nm_{b}^{2})$ time.
Additionally, we desire to obtain the relationship between one sequence and a set of sequences. Given a target sequence $T$ and a set $S$ of source sequences, the mosaic LCS problem is to find the LCS of $T$ and a mosaic sequence $C$, composed of repeatable $k$ sequences in $S$. Based on the concept of break points in $T$, a divide and conquer algorithm is proposed with $O(n^2m|S|+ n^3log k)$ time, where $n$ and $m$ are the lengths of $T$ and the maximal length of sequences in $S$, respectively. Again, based on the S-table technique, an improved algorithm with $O(n(m+k)|S|)$ time is proposed by applying an efficient preprocessing.
|
8 |
A Hybrid Algorithm for the Longest Common Subsequence of Multiple SequencesWeng, Hsiang-yi 19 August 2009 (has links)
The k-LCS problem is to find the longest common subsequence (LCS) of k input sequences. It is difficult while the number of input sequences is large.
In the past, researchers focused on finding the LCS of two sequences (2-LCS). However, there is no good algorithm for finding the optimal solution of k-LCS up to now. For solving the k-LCS problem, in this thesis, we first propose a mixed algorithm, which is a combination of a heuristic algorithm, genetic algorithm (GA) and ant colony optimization (ACO) algorithm.
Then, we propose an enhanced ACO (EACO) algorithm, composed of the heuristic algorithm and matching pair algorithm (MPA). In our experiments, we compare our algorithms with expansion algorithm, best next for maximal available symbol algorithm, GA and ACO algorithm. The experimental results on several sets of DNA and protein sequences show that our EACO algorithm outperforms other algorithms in the lengths of solutions.
|
9 |
The Distribution of the Length of the Longest Increasing Subsequence in Random Permutations of Arbitrary Multi-setsAl-Meanazel, Ayat 07 October 2015 (has links)
The distribution theory of runs and patterns has a long and rich history. In this dissertation we study the distribution of some run-related statistics in sequences and random permutations of arbitrary multi-sets. Using the finite Markov chain imbedding technique (FMCI), which was proposed by Fu and Koutras (1994), we proposed an alternative method to calculate the exact distribution of the total number of adjacent increasing and adjacent consecutive increasing subsequences in sequences.
Fu and Hsieh (2015) obtained the exact distribution of the length of the longest increasing subsequence in random permutations. To the best of our knowledge, little or no work has been done on the exact distribution of the length of the longest increasing subsequence in random permutations of arbitrary multi-sets. Here we obtained the exact distribution of the length of the longest increasing subsequence in random permutations of arbitrary multi-sets. We also obtain the the exact distribution of the length of the longest increasing subsequence for the set of all permutations of length N generated from {1,2,...,n}. / February 2016
|
10 |
Study of Parallel Algorithms Related to Subsequence Problems on the Sequent Multiprocessor SystemPothuru, Surendra 08 1900 (has links)
The primary purpose of this work is to study, implement and analyze the performance of parallel algorithms related to subsequence problems. The problems include string to string correction problem, to determine the longest common subsequence problem and solving the sum-range-product, 1 —D pattern matching, longest non-decreasing (non-increasing) (LNS) and maximum positive subsequence (MPS) problems. The work also includes studying the techniques and issues involved in developing parallel applications. These algorithms are implemented on the Sequent Multiprocessor System. The subsequence problems have been defined, along with performance metrics that are utilized. The sequential and parallel algorithms have been summarized. The implementation issues which arise in the process of developing parallel applications have been identified and studied.
|
Page generated in 0.0788 seconds