Global ETD Search

1	CSA-X: Modularized Constrained Multiple Sequence Alignment 2015 October 1900 Imposing additional constraints on multiple sequence alignment (MSA) algorithms can often produce more biologically meaningful alignments. Hence, various constrained multiple sequence alignment (CMSA) algorithms have been developed in the literature, in which researchers use anchor points, regular expressions, or context-free grammars to specify the constraints, and the alignments produced are forced to align around segments that match the constraints. In this thesis, we propose CSA-X, a modularized program for constrained multiple sequence alignment that accepts constraints in the form of regular expressions. It uses an arbitrary underlying multiple sequence alignment program to generate alignments, and is therefore modular. The name CSA-X refers to our proposed program generally, where the letter X is replaced by the name of the (non-constrained) multiple sequence alignment algorithm used as the underlying MSA engine. We compare the accuracy of our program with that of another constrained multiple sequence alignment program, RE-MuSiC, which similarly uses regular expressions for constraints. Comparisons are also made to the underlying MSA programs (without constraints). The BAliBASE 3.0 benchmark database is used to assess the performance of CSA-X and of the other MSA and CMSA programs considered in this study. Based on the results presented herein, CSA-X outperforms RE-MuSiC and scores well against the underlying alignment programs. The results also show that regular expression constraints, if chosen well (here, created from the least conserved regions of the correct alignments), improve alignment accuracy. In this study, ProbCons and T-Coffee are used as the underlying MSA programs in CSA-X, and the accuracy of the alignments is measured in terms of Q score and TC score. 
On average, CSA-X used with constraints identified from the least conserved regions of the correct alignments achieves results that are 17.65% higher for Q score and 23.7% higher for TC score than those of RE-MuSiC. In fact, CSA-X with ProbCons (CSA-PC) achieves a higher score in over 97.9% of the cases for Q score, and over 96.4% of the cases for TC score. Similarly, CSA-X with T-Coffee (CSA-TCOF) achieves a higher score in over 97.7% of the cases for Q score, and over 94.8% of the cases for TC score. Furthermore, CSA-X with regular expressions created from the least conserved regions of the correct alignments achieves higher accuracy scores than standalone ProbCons and T-Coffee. To measure the statistical significance of the CSA-X results, the Wilcoxon rank-sum test and the Wilcoxon signed-rank test are performed; these tests show that the CSA-X results for the least conserved regular expression constraint sets from the correct BAliBASE 3.0 alignments differ significantly from those of RE-MuSiC, ProbCons, and T-Coffee.
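The abstract above reports accuracy as Q score and TC score. As a minimal sketch, assuming the common simplified definitions (Q = fraction of residue pairs aligned in the reference that are also aligned in the test alignment; TC = fraction of reference columns reproduced exactly), the two measures could be computed as follows. The function names and the toy alignments are illustrative assumptions, not taken from the thesis, and benchmark tools may restrict scoring to core blocks or treat gapped columns differently.

```python
from itertools import combinations


def _columns(alignment):
    """Turn aligned rows (equal-length strings, '-' for gaps) into columns.

    Each column is a tuple holding, per sequence, the index of the residue
    placed in that column, or None where the sequence has a gap.
    """
    counters = [0] * len(alignment)
    cols = []
    for c in range(len(alignment[0])):
        col = []
        for r, row in enumerate(alignment):
            if row[c] == '-':
                col.append(None)
            else:
                col.append(counters[r])
                counters[r] += 1
        cols.append(tuple(col))
    return cols


def _pairs(cols):
    """Collect every aligned residue pair as (seq_a, pos_a, seq_b, pos_b)."""
    pairs = set()
    for col in cols:
        for a, b in combinations(range(len(col)), 2):
            if col[a] is not None and col[b] is not None:
                pairs.add((a, col[a], b, col[b]))
    return pairs


def q_score(test, reference):
    """Fraction of reference residue pairs that the test alignment also aligns."""
    ref_pairs = _pairs(_columns(reference))
    test_pairs = _pairs(_columns(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs)


def tc_score(test, reference):
    """Fraction of reference columns that appear unchanged in the test alignment."""
    ref_cols = _columns(reference)
    test_cols = set(_columns(test))
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)


# Toy data (hypothetical): the test alignment misplaces one gap.
reference = ["AC-GT", "ACAGT"]
predicted = ["ACG-T", "ACAGT"]
print(q_score(predicted, reference))   # 0.75
print(tc_score(predicted, reference))  # 0.6
```

Both functions assume the reference contains at least one aligned residue pair and one column; a production scorer would guard against empty references.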
2	Improving the quality of multiple sequence alignment Lu, Yue 15 May 2009 Multiple sequence alignment is an important bioinformatics problem, with applications in diverse types of biological analysis, such as structure prediction, phylogenetic analysis, and critical site identification. In recent years, newly developed methods have considerably improved the quality of multiple sequence alignment, although constructing accurate alignments remains difficult, especially for divergent sequences. In this dissertation, we propose three new methods (PSAlign, ISPAlign, and NRAlign) for further improving the quality of multiple sequence alignment. In PSAlign, we propose an alternative formulation of multiple sequence alignment based on the idea of finding a multiple alignment that preserves all the pairwise alignments specified by the edges of a given tree. In contrast with traditional NP-hard formulations, our preserving alignment formulation can be solved in polynomial time without using a heuristic, while still retaining very good performance when compared to traditional heuristics. In ISPAlign, we use additional hits from a database search of the input sequences and propose several strategies that significantly improve alignment accuracy, including the construction of profiles from the hits while performing profile alignment, the inclusion of high-scoring hits in the input sequences, the use of intermediate sequence search to link distant homologs, and the use of secondary structure information. In NRAlign, we observe that it is possible to further improve alignment accuracy by taking into account the alignment of neighboring residues when aligning two residues, thus making better use of horizontal information. 
By modifying existing multiple alignment algorithms to make use of horizontal information, we show that this strategy is able to consistently improve over existing algorithms on all the benchmarks that are commonly used to measure alignment accuracy. Multiple Sequence Alignment Algorithms Bioinformatics
3	Multiple sequence alignment augmented by expert user constraints Jin, Lingling 13 April 2010 Sequence alignment has become one of the most common tasks in bioinformatics. Most existing sequence alignment methods use general scoring schemes, but the resulting alignments are sometimes not completely relevant because they do not necessarily provide the desired information. It would be extremely difficult, if not impossible, to build every possible objective into an algorithm. Our goal is to allow a working biologist to augment a given alignment with additional information based on their knowledge and objectives. In this thesis, we formally define constraints and compatible constraint sets for an alignment, which require some positions of the sequences to be aligned together. Using this approach, one can align specific segments, such as domains within protein sequences, by inputting constraints (the positions of the segments on the sequences), and the algorithm will automatically find an optimal alignment in which the segments are aligned together. A necessary prerequisite for calculating an alignment is that the input constraints be compatible with each other, and we develop algorithms to check this condition for both pairwise and multiple sequence alignments. The algorithms are based on a depth-first search on a graph that is constructed from the constraints and the alignment. We then develop algorithms to perform pairwise and multiple sequence alignments satisfying these compatible constraints. In straightforward dynamic programming for pairwise sequence alignment satisfying a compatible constraint set, an optimal alignment corresponds to a path through the dynamic programming matrix; because we use only single-position constraints, a constraint can be represented as a point in the matrix, so a compatible constraint set is a set of points. 
We seek a new path, in place of the original path, that achieves the highest score among paths passing through all points of the compatible constraint set. The path is a concatenation of sub-paths, so only the scores in the sub-matrices need to be calculated. This means the time required to find the new path decreases as the number of constraints increases, and it also varies with the positions of the points. It can be further reduced by using information from the original alignment, which can offer a significant speed gain. We then use exact and progressive algorithms, which extend the pairwise algorithms, to find multiple sequence alignments satisfying a compatible constraint set. For exact algorithms on three sequences, where constraints are represented as lines, we discuss a method to force the optimal path to cross the constraint lines. For progressive algorithms, we use a set of pairwise alignments satisfying compatible constraints to construct multiple sequence alignments progressively. Because these are more complex, we leave some extensions as future work. multiple sequence alignment constraint compatible constraint set
4	Multiple sequence alignment augmented by expert user constraints Jin, Lingling 13 April 2010 Sequence alignment has become one of the most common tasks in bioinformatics. Most existing sequence alignment methods use general scoring schemes, but the resulting alignments are sometimes not completely relevant because they do not necessarily provide the desired information. It would be extremely difficult, if not impossible, to build every possible objective into an algorithm. Our goal is to allow a working biologist to augment a given alignment with additional information based on their knowledge and objectives. In this thesis, we formally define constraints and compatible constraint sets for an alignment, which require some positions of the sequences to be aligned together. Using this approach, one can align specific segments, such as domains within protein sequences, by inputting constraints (the positions of the segments on the sequences), and the algorithm will automatically find an optimal alignment in which the segments are aligned together. A necessary prerequisite for calculating an alignment is that the input constraints be compatible with each other, and we develop algorithms to check this condition for both pairwise and multiple sequence alignments. The algorithms are based on a depth-first search on a graph that is constructed from the constraints and the alignment. We then develop algorithms to perform pairwise and multiple sequence alignments satisfying these compatible constraints. In straightforward dynamic programming for pairwise sequence alignment satisfying a compatible constraint set, an optimal alignment corresponds to a path through the dynamic programming matrix; because we use only single-position constraints, a constraint can be represented as a point in the matrix, so a compatible constraint set is a set of points. 
We seek a new path, in place of the original path, that achieves the highest score among paths passing through all points of the compatible constraint set. The path is a concatenation of sub-paths, so only the scores in the sub-matrices need to be calculated. This means the time required to find the new path decreases as the number of constraints increases, and it also varies with the positions of the points. It can be further reduced by using information from the original alignment, which can offer a significant speed gain. We then use exact and progressive algorithms, which extend the pairwise algorithms, to find multiple sequence alignments satisfying a compatible constraint set. For exact algorithms on three sequences, where constraints are represented as lines, we discuss a method to force the optimal path to cross the constraint lines. For progressive algorithms, we use a set of pairwise alignments satisfying compatible constraints to construct multiple sequence alignments progressively. Because these are more complex, we leave some extensions as future work. multiple sequence alignment constraint compatible constraint set
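The idea of forcing a pairwise alignment path through constraint points by concatenating sub-paths can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the thesis's code: constraints are 0-based pairs (i, j) meaning that position i of the first sequence must align with position j of the second, compatibility is checked with a simple monotonicity test rather than the graph-based depth-first search the abstract describes, and the scoring scheme (linear gap penalty, match/mismatch values) is hypothetical. With a linear gap penalty the column scores are additive, so joining optimal sub-alignments gives an optimal constrained alignment.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return ''.join(reversed(out_a)), ''.join(reversed(out_b))


def is_compatible(constraints):
    """Pairwise constraints are compatible if they never cross or collide."""
    pts = sorted(constraints)
    return all(p[0] < q[0] and p[1] < q[1] for p, q in zip(pts, pts[1:]))


def constrained_align(a, b, constraints):
    """Align a and b so that every constrained pair shares a column."""
    if not is_compatible(constraints):
        raise ValueError("incompatible constraints")
    out_a, out_b = [], []
    pi = pj = 0
    for i, j in sorted(constraints):
        # Only the sub-matrix between consecutive constraint points is filled.
        sa, sb = needleman_wunsch(a[pi:i], b[pj:j])
        out_a += [sa, a[i]]
        out_b += [sb, b[j]]
        pi, pj = i + 1, j + 1
    sa, sb = needleman_wunsch(a[pi:], b[pj:])
    return ''.join(out_a + [sa]), ''.join(out_b + [sb])


row_a, row_b = constrained_align("ACGTT", "AGTCT", [(2, 1)])
print(row_a)
print(row_b)
```

Each sub-problem fills only its own sub-matrix, which is why more constraints mean less total work in this formulation.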
5	Multiple Sequence Alignment Using the Clustering Method Huang, Kuen-Feng 23 August 2001 Multiple sequence alignment (MSA) is a fundamental technique of molecular biology. Biological sequences are aligned with each other vertically in order to show the similarities and differences among them. Because of its importance, many algorithms have been proposed. With dynamic programming, finding the optimal alignment for a pair of sequences can be done in O(n^2) time, where n is the length of the two strings. Unfortunately, for the general optimization problem of aligning k sequences of length n, O(n^k) time is required. In this thesis, we first propose an efficient group alignment method to perform the alignment between two groups of sequences. We then propose a clustering method to build the tree topology for merging. The clustering method is based on the concept that the two sequences with the greatest distance between them should be split into two clusters. In our experiments, both the alignment quality and the running time of our algorithm are better than those of the NJ (neighbor joining) algorithm and the Clustal W algorithm. Affine gap penalty Bioinformatics Multiple Sequence Alignment
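The clustering concept in this abstract (put the two most distant sequences into different clusters) can be illustrated with a small divisive procedure. This is only a sketch under assumptions: the abstract does not say which distance is used or how the remaining sequences are assigned, so the code below uses plain edit distance (itself an O(n^2) dynamic program) and assigns each remaining sequence to the nearer of the two seed sequences. The names `edit_distance` and `split_tree` are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, keeping two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def split_tree(seqs):
    """Build a merge-tree topology as nested tuples by repeated splitting."""
    if len(seqs) == 1:
        return seqs[0]
    # The two most distant sequences seed the two clusters.
    x, y = max(combinations(range(len(seqs)), 2),
               key=lambda p: edit_distance(seqs[p[0]], seqs[p[1]]))
    left, right = [seqs[x]], [seqs[y]]
    for k, s in enumerate(seqs):
        if k in (x, y):
            continue
        # Assumption: each remaining sequence joins the nearer seed.
        if edit_distance(s, seqs[x]) <= edit_distance(s, seqs[y]):
            left.append(s)
        else:
            right.append(s)
    return (split_tree(left), split_tree(right))


tree = split_tree(["AAAA", "AAAT", "GGGG", "GGGC"])
print(tree)  # (('AAAA', 'AAAT'), ('GGGG', 'GGGC'))
```

The nested tuples give the order in which groups would be merged, from the leaves upward, by a group alignment step.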
6	Family of Hidden Markov Models and its applications to phylogenetics and metagenomics Nguyen, Nam-phuong Duc 24 October 2014 A Profile Hidden Markov Model (HMM) is a statistical model for representing a multiple sequence alignment (MSA). Profile HMMs are important tools for sequence homology detection and have been used in a wide range of bioinformatics applications including protein structure prediction, remote homology detection, and sequence alignment. Profile HMM methods result in accurate alignments on datasets with evolutionarily similar sequences; however, I will show that on datasets with evolutionarily divergent sequences, the accuracy of HMM-based methods degrades. My dissertation presents a new statistical model for representing an MSA by using a set of HMMs. The family of HMM (fHMM) approach uses multiple HMMs instead of a single HMM to represent an MSA. I present a new algorithm for sequence alignment using the fHMM technique. I show that using the fHMM technique for sequence alignment results in more accurate alignments than the single HMM approach. As sequence alignment is a fundamental step in many bioinformatics pipelines, improvements to sequence alignment result in improvements across many different fields. I show the applicability of fHMM to three specific problems: phylogenetic placement, taxonomic profiling and identification, and MSA estimation. In phylogenetic placement, the problem addressed is how to insert a query sequence into an existing tree. In taxonomic identification and profiling, the problems addressed are how to taxonomically classify a query sequence, and how to estimate a taxonomic profile on a set of sequences. Finally, both profile HMM and fHMM require a backbone MSA as input in order to align the query sequences. In MSA estimation, the problem addressed is how to estimate a "de novo" MSA without the use of an existing backbone alignment. 
For each problem, I present a software pipeline that implements the fHMM specifically for that domain: SEPP for phylogenetic placement, TIPP for taxonomic profiling and identification, and UPP for MSA estimation. I show that SEPP has improved accuracy compared to the single HMM approach. I also show that SEPP results in more accurate phylogenetic placements compared to existing placement methods, and SEPP is more computationally efficient, both in peak memory usage and running time. I show that TIPP more accurately classifies novel sequences compared to the single HMM approach, and TIPP estimates more accurate taxonomic profiles than leading methods on simulated metagenomic datasets. I show how UPP can estimate "de novo" alignments using fHMM. I present results that show UPP is more accurate and efficient than existing alignment methods, and estimates accurate alignments and trees on datasets containing both full-length and fragmentary sequences. Finally, I show that UPP can estimate a very accurate alignment on a dataset with 1,000,000 sequences in less than 2 days without the need of a supercomputer. / Computer Sciences / text Metagenomics Hidden Markov models Phylogenetics Multiple sequence alignment
7	Metaheuristic Multiple Sequence Alignment Optimisation Auer, Jens January 2004 The ability to tackle NP-hard problems has been greatly extended by the introduction of Metaheuristics (see Blum & Roli (2003) for a summary of most Metaheuristics), general problem-independent optimisation algorithms that extend the hill-climbing local search approach to escape local minima. One of these algorithms is Iterated Local Search (ILS) (Lourenco et al., 2002; Stützle, 1999a, p. 25ff), a recent, easy-to-implement but powerful algorithm with results comparable or superior to other state-of-the-art methods for many combinatorial optimisation problems, among them the Traveling Salesman Problem (TSP) and the Quadratic Assignment Problem (QAP). ILS iteratively samples local minima by modifying the current local minimum and restarting a local search procedure on this modified solution. This thesis will show how ILS can be implemented for MSA. After that, ILS will be evaluated and compared to other MSA algorithms using BAliBASE (Thomson et al., 1999), a set of manually refined alignments used in most recent publications of algorithms and in at least two MSA algorithm surveys. The runtime behaviour will be evaluated using runtime distributions. The quality of alignments produced by ILS is at least as good as that of the best algorithms available and significantly superior to that of previously published Metaheuristics for MSA, Tabu Search and Genetic Algorithm (SAGA). On average, ILS performed best in five out of eight test cases, second for one test set and third for the remaining two. A drawback of all iterative methods for MSA is the long runtime needed to produce good alignments. ILS needs considerably less runtime than Tabu Search and SAGA, but cannot compete with progressive or consistency-based methods, e.g. ClustalW or T-COFFEE. Computer science
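The ILS loop described in this abstract (perturb the current local minimum, rerun local search, decide whether to keep the result) can be sketched generically. The skeleton below is not the thesis's MSA implementation: it runs on a toy integer problem whose cost function, neighbourhood, and perturbation are all hypothetical, chosen only so that plain hill climbing gets stuck. For MSA, a solution would be an alignment and the cost a negated alignment score.

```python
import random


def local_search(s, cost, neighbours):
    """Steepest descent: move to the best neighbour until none improves."""
    while True:
        best = min(neighbours(s), key=cost, default=s)
        if cost(best) >= cost(s):
            return s
        s = best


def iterated_local_search(initial, cost, neighbours, perturb,
                          iterations=200, seed=0):
    """Generic ILS: perturb the current local minimum, then re-optimise."""
    rng = random.Random(seed)
    current = local_search(initial, cost, neighbours)
    best = current
    for _ in range(iterations):
        candidate = local_search(perturb(current, rng), cost, neighbours)
        # Acceptance criterion (one simple choice): keep it if not worse.
        if cost(candidate) <= cost(current):
            current = candidate
        if cost(current) < cost(best):
            best = current
    return best


# Toy problem: every multiple of 7 is a local minimum; 42 is the global one.
def toy_cost(x):
    return abs(x - 42) + (0 if x % 7 == 0 else 10)


def toy_neighbours(x):
    return [y for y in (x - 1, x + 1) if 0 <= y <= 100]


def toy_perturb(x, rng):
    # Random jump of up to 10 steps, clamped to the search range.
    return min(100, max(0, x + rng.randint(-10, 10)))


result = iterated_local_search(0, toy_cost, toy_neighbours, toy_perturb)
print(result, toy_cost(result))
```

Plain hill climbing started at 0 stays at 0 in this toy landscape, so any improvement in the printed cost comes from the perturbation step. Other acceptance criteria, such as occasionally accepting worse solutions, are common variations.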
8	Técnicas de otimização em alinhamentos múltiplos de sequência via Cadeias de Markov / Optimization techniques for multiple sequence alignments by Markov Chains Nóbrega, Juliano Farias da [UNESP] 29 February 2016 Recently, bioinformatics has become an indispensable resource for analyzing and interpreting the large amount of biological information generated by molecular biology and next-generation sequencers. Comparing these biosequences is the starting point for studying the evolution and differentiation of living organisms, and is one of the most important tasks in computational biology. This work presents an approach based on a Markov chain heuristic for optimizing a multiple alignment algorithm for biological sequences, providing higher-quality results without compromising the performance of the MUSCLE tool, which was chosen to support the work. Markov chains were chosen as the optimization technique because of their efficient applicability to a variety of problems, especially in computational biology: their probabilistic methodology makes the application computationally feasible, working around NP-hard problems while yielding significantly accurate results. Bioinformatics Markov models Multiple sequence alignment
9	Metaheuristic Multiple Sequence Alignment Optimisation Auer, Jens January 2004 The ability to tackle NP-hard problems has been greatly extended by the introduction of Metaheuristics (see Blum & Roli (2003) for a summary of most Metaheuristics), general problem-independent optimisation algorithms that extend the hill-climbing local search approach to escape local minima. One of these algorithms is Iterated Local Search (ILS) (Lourenco et al., 2002; Stützle, 1999a, p. 25ff), a recent, easy-to-implement but powerful algorithm with results comparable or superior to other state-of-the-art methods for many combinatorial optimisation problems, among them the Traveling Salesman Problem (TSP) and the Quadratic Assignment Problem (QAP). ILS iteratively samples local minima by modifying the current local minimum and restarting a local search procedure on this modified solution. This thesis will show how ILS can be implemented for MSA. After that, ILS will be evaluated and compared to other MSA algorithms using BAliBASE (Thomson et al., 1999), a set of manually refined alignments used in most recent publications of algorithms and in at least two MSA algorithm surveys. The runtime behaviour will be evaluated using runtime distributions. The quality of alignments produced by ILS is at least as good as that of the best algorithms available and significantly superior to that of previously published Metaheuristics for MSA, Tabu Search and Genetic Algorithm (SAGA). On average, ILS performed best in five out of eight test cases, second for one test set and third for the remaining two. A drawback of all iterative methods for MSA is the long runtime needed to produce good alignments. ILS needs considerably less runtime than Tabu Search and SAGA, but cannot compete with progressive or consistency-based methods, e.g. ClustalW or T-COFFEE. Computer Sciences
10	A Local Improvement Algorithm for Multiple Sequence Alignment Zhang, Xiaodong 04 April 2003 No description available. Multiple Sequence Alignment Algorithm Simulated Annealing NP-Complete
