1 |
CSA-X: Modularized Constrained Multiple Sequence Alignment 2015 October 1900
Imposing additional constraints on multiple sequence alignment (MSA) algorithms can often produce more biologically meaningful alignments. Hence, various constrained multiple sequence alignment (CMSA) algorithms have been developed in the literature, in which researchers use anchor points, regular expressions, or context-free grammars to specify the constraints, and the resulting alignments are forced to align around the segments that match those constraints.
In this thesis, we propose CSA-X, a modularized program for constrained multiple sequence alignment that accepts constraints in the form of regular expressions. It is modular because it can use an arbitrary underlying multiple sequence alignment program to generate alignments. The name CSA-X refers to our proposed program in general, where the letter X is replaced with the name of the (non-constrained) multiple sequence alignment algorithm used as the underlying MSA engine. We compare the accuracy of our program with that of RE-MuSiC, another constrained multiple sequence alignment program that also uses regular expressions as constraints. In addition, comparisons are made to the underlying MSA programs (without constraints).
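The idea of forcing an alignment around regular-expression matches while delegating the actual alignment to a pluggable engine can be sketched as follows. This is only an illustration under stated assumptions, not the CSA-X implementation: the function names `constrained_align` and `pad_align` are hypothetical, only the first match of a single pattern per sequence is used, and the placeholder aligner simply right-pads with gaps where a real MSA program would be called.

```python
import re

def pad_align(seqs):
    """Placeholder aligner (hypothetical): right-pads with gaps.
    It stands in for a real underlying MSA program."""
    width = max(len(s) for s in seqs)
    return [s.ljust(width, "-") for s in seqs]

def constrained_align(seqs, pattern, align=pad_align):
    """Split every sequence around its first match of `pattern`,
    align the three blocks separately, and concatenate the results."""
    regex = re.compile(pattern)
    prefixes, anchors, suffixes = [], [], []
    for s in seqs:
        m = regex.search(s)
        if m is None:
            raise ValueError(f"constraint not found in {s!r}")
        prefixes.append(s[:m.start()])
        anchors.append(m.group())
        suffixes.append(s[m.end():])
    # Each block is aligned independently by the pluggable engine,
    # so the matched segments always end up in the same columns.
    blocks = [align(prefixes), align(anchors), align(suffixes)]
    return ["".join(parts) for parts in zip(*blocks)]

rows = constrained_align(["AAGCTT", "GCATTT"], r"GC[AT]")
print(rows)
```

Passing a different callable as `align` is what makes the sketch modular in the sense the abstract describes: the constraint handling does not depend on which alignment engine is plugged in.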
The BAliBASE 3.0 benchmark database is used to assess the performance of the proposed program CSA-X and of the other MSA and CMSA programs considered in this study. Based on the results presented herein, CSA-X outperforms RE-MuSiC and scores well against the underlying alignment programs. The results also show that well-chosen regular expression constraints, created from the least conserved regions of the correct alignments, improve alignment accuracy. In this study, ProbCons and T-Coffee are used as the underlying MSA programs in CSA-X, and the accuracy of the alignments is measured in terms of Q score and TC score. On average, CSA-X used with constraints identified from the least conserved regions of the correct alignments achieves Q scores that are 17.65% higher and TC scores that are 23.7% higher than those of RE-MuSiC. In fact, CSA-X with ProbCons (CSA-PC) achieves a higher score in over 97.9% of the cases for Q score, and over 96.4% of the cases for TC score. In addition, CSA-X with T-Coffee (CSA-TCOF) achieves a higher score in over 97.7% of the cases for Q score, and over 94.8% of the cases for TC score. Furthermore, CSA-X with regular expressions created from the least conserved regions of the correct alignments achieves higher accuracy scores than standalone ProbCons and T-Coffee. To measure the statistical significance of the CSA-X results, the Wilcoxon rank-sum test and the Wilcoxon signed-rank test are performed. These tests show that the CSA-X results for the least conserved regular expression constraint sets from the correct BAliBASE 3.0 alignments are significantly different from those of RE-MuSiC, ProbCons, and T-Coffee.
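For readers unfamiliar with the two accuracy measures, the sketch below uses their common definitions: the Q score is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and the TC score is the fraction of reference columns reproduced exactly in the test alignment. It is a simplified illustration, not the scoring program used in the thesis; in particular, it assumes both alignments list the same sequences in the same order, and it ignores refinements such as restricting the score to core blocks.

```python
from itertools import combinations

def columns(alignment):
    """Convert gapped rows into columns of residue indices (None marks a gap)."""
    counters = [0] * len(alignment)
    cols = []
    for j in range(len(alignment[0])):
        col = []
        for i, row in enumerate(alignment):
            if row[j] == "-":
                col.append(None)
            else:
                col.append(counters[i])
                counters[i] += 1
        cols.append(tuple(col))
    return cols

def aligned_pairs(cols):
    """Set of residue pairs that share a column."""
    pairs = set()
    for col in cols:
        residues = [(i, r) for i, r in enumerate(col) if r is not None]
        pairs.update(combinations(residues, 2))
    return pairs

def q_score(test, ref):
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref_pairs = aligned_pairs(columns(ref))
    return len(ref_pairs & aligned_pairs(columns(test))) / len(ref_pairs)

def tc_score(test, ref):
    """Fraction of reference columns (with at least two residues) found unchanged."""
    ref_cols = [c for c in columns(ref) if sum(r is not None for r in c) >= 2]
    test_cols = set(columns(test))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)

ref = ["AC-G", "ACTG"]
test = ["ACG-", "ACTG"]
print(q_score(test, ref), tc_score(test, ref))
```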
|
2 |
Gene annotation using Ab initio protein structure prediction : method development and application to major protein families / Bonneau, Richard A. January 2001
Thesis (Ph. D.)--University of Washington, 2001. / Vita. Includes bibliographical references (leaves 130-144).
|
3 |
Aligning multiple sequences adaptively / Ye, Yongtao, 叶永滔 January 2014
With the rapid development of genome sequencing, an ever-increasing number of molecular biology analyses, such as motif detection, phylogeny inference and structure prediction, rely on the construction of an accurate multiple sequence alignment (MSA). Although many methods have been developed during the last two decades, most of them may perform poorly on some types of inputs, in particular when families of sequences fall below thirty percent similarity. Therefore, this thesis introduces two effective approaches to improve the overall quality of multiple sequence alignment.
First, by considering the similarity of the input sequences, we propose an adaptive approach that computes better substitution matrices for each pair of sequences and then applies the progressive alignment method to align them. For example, for inputs with high similarity, we consider the whole sequences and align them with a global pair-Hidden Markov model, while for those with moderately low similarity, we may ignore the flanking regions and use local pair-Hidden Markov models to align them. To test the effectiveness of this approach, we have implemented a multiple sequence alignment tool called GLProbs and compared its performance with a dozen leading tools on three benchmark alignment databases; GLProbs' alignments have the best scores in almost all tests. We have also evaluated the practicability of GLProbs' alignments by applying the tool to three biological applications, namely phylogenetic tree reconstruction, protein secondary structure prediction and the detection of high-risk members for cervical cancer in the HPV-E6 family, and the results are very encouraging.
Second, building on our previous study, we propose another new tool, PnpProbs, which constructs better multiple sequence alignments through better handling of guide trees. It classifies input sequences into two types: normally related sequences and distantly related sequences. For normally related sequences, it uses an adaptive approach to construct the guide tree and, based on this guide tree, aligns the sequences progressively. To be more precise, it first estimates the input's discrepancy by computing the standard deviation of the sequences' percent identities, and based on this estimate, it chooses the best method to construct the guide tree. For distantly related sequences, PnpProbs abandons the guide tree; instead, it uses the non-progressive sequence annealing method to construct the multiple sequence alignment. By combining the strengths of the progressive and non-progressive methods, and with a better way to construct the guide tree, PnpProbs significantly improves the quality of multiple sequence alignments not only for general input sequences, but also for very distantly related ones.
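The discrepancy estimate described above, a standard deviation over pairwise percent identities, can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the rows are taken to be already aligned and of equal length, the threshold of 10.0 is arbitrary, and the labels `method_A` and `method_B` are hypothetical stand-ins, since the abstract does not name the guide-tree construction methods or the cutoff that PnpProbs uses.

```python
from itertools import combinations
from statistics import pstdev

def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length gapped strings,
    counted over columns where neither row has a gap."""
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    matches = sum(x == y for x, y in aligned)
    return 100.0 * matches / len(aligned)

def identity_spread(rows):
    """Population standard deviation of all pairwise percent identities."""
    ids = [percent_identity(a, b) for a, b in combinations(rows, 2)]
    return pstdev(ids)

def choose_guide_tree_method(rows, threshold=10.0):
    """Pick a (hypothetical) guide-tree method label from the spread.
    The threshold is an illustrative value, not one taken from PnpProbs."""
    return "method_A" if identity_spread(rows) < threshold else "method_B"

rows = ["ACGT", "ACGT", "ACGA"]
print(identity_spread(rows), choose_guide_tree_method(rows))
```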
With these encouraging empirical results, our software tools have gradually gained acceptance in the community. For example, GLProbs has been invited into and incorporated in the JAva Bioinformatics Analysis Web Services system (JABAWS). / published_or_final_version / Computer Science / Master / Master of Philosophy
|
4 |
MULTIPLE SEQUENCES ALIGNMENT FOR PHYLOGENETIC TREE CONSTRUCTION USING GRAPHICS PROCESSING UNITS / He, Jintai 01 January 2008
Sequence alignment has become a routine procedure in evolutionary biology for finding evolutionary relationships among primary sequences of DNA, RNA, and protein. The Smith-Waterman and Needleman-Wunsch algorithms perform local alignment and global alignment, respectively. Both are based on dynamic programming and guarantee optimal results, and both have been widely used for decades. However, their time and space requirements increase exponentially with the number of sequences. Here I present a novel approach that improves the performance of sequence alignment by using a graphics processing unit, which is capable of handling large amounts of data in parallel.
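As a reminder of the dynamic programming recurrence behind global alignment, here is a minimal CPU-only sketch of the Needleman-Wunsch score for two sequences. It is not the GPU method of the thesis; the scoring values (match 1, mismatch -1, linear gap penalty -1) are illustrative assumptions, and only the optimal score is returned, not the alignment itself.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b.
    Keeps only two rows of the DP table, so memory is O(len(b))."""
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]

print(needleman_wunsch_score("ACGT", "AGT"))
```

The Smith-Waterman local variant differs mainly in clamping each cell at zero and taking the maximum over the whole table. Cells on the same anti-diagonal of the table do not depend on one another, which is a common way to expose parallelism on hardware such as GPUs.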
|
5 |
Improving the quality of multiple sequence alignment / Lu, Yue 15 May 2009
Multiple sequence alignment is an important bioinformatics problem, with applications
in diverse types of biological analysis, such as structure prediction, phylogenetic analysis
and critical site identification. In recent years, newly developed methods have
substantially improved the quality of multiple sequence alignment, although constructing
accurate alignments remains a difficult task, especially for divergent sequences.
In this dissertation, we propose three new methods (PSAlign, ISPAlign, and NRAlign)
for further improving the quality of multiple sequence alignment.
In PSAlign, we propose an alternative formulation of multiple sequence alignment based
on the idea of finding a multiple alignment which preserves all the pairwise alignments
specified by edges of a given tree. In contrast with traditional NP-hard formulations, our
preserving alignment formulation can be solved in polynomial time without using a
heuristic, while still retaining very good performance when compared to traditional
heuristics.
In ISPAlign, by using additional hits from database search of the input sequences, a few
strategies have been proposed to significantly improve alignment accuracy, including the
construction of profiles from the hits while performing profile alignment, the inclusion
of high scoring hits into the input sequences, the use of intermediate sequence search to
link distant homologs, and the use of secondary structure information.
In NRAlign, we observe that it is possible to further improve alignment accuracy by
taking into account alignment of neighboring residues when aligning two residues, thus
making better use of horizontal information. By modifying existing multiple alignment
algorithms to make use of horizontal information, we show that this strategy is able to
consistently improve over existing algorithms on all the benchmarks that are commonly
used to measure alignment accuracy.
|
6 |
Biological sequence analyses theory, algorithms, and applications / Ma, Fangrui. January 2009
Thesis (Ph.D.)--University of Nebraska-Lincoln, 2009. / Title from title screen (site viewed October 13, 2009). PDF text: xv, 233 p. : ill. ; 4 Mb. UMI publication number: AAT 3360173. Includes bibliographical references. Also available in microfilm and microfiche formats.
|
7 |
A two-pronged approach to improve distant homology detection / Lee, Marianne M. January 2009
Thesis (Ph. D.)--Ohio State University, 2009. / Title from first page of PDF file. Includes bibliographical references (p. 91-100).
|
8 |
Sequence alignment : algorithm development and applications / Jiang, Tianwei. January 2009
Includes bibliographical references (p. 64-71).
|
9 |
The limits of progressive multiple sequence alignment / Sheneman, Lucas James. January 1900
Thesis (Ph. D., Bioinformatics and Computational Biology)--University of Idaho, August 2008. / Major professor: James A. Foster. Includes bibliographical references (leaves 90-94). Also available online (PDF file) by subscription or by purchasing the individual file.
|
10 |
Salvinorin A fragment synthesis and modeling studies / McGovern, Donna Lue, January 1900
Thesis (Ph. D.)--Virginia Commonwealth University, 2009. / Prepared for: Dept. of Medicinal Chemistry. Title from title-page of electronic thesis. Bibliography: leaves 126-138.
|