Spelling suggestions: "subject:"sequencealignment"" "subject:"sequenzalignment""
31 |
An Approach for Solving the Constrained Longest Common Subsequence ProblemPeng, Chao-Li 28 August 2003 (has links)
A subsequence is obtained by deleting zero or more symbols from a given sequence. For given two sequences, the longest common subsequence problem is to find a common subsequence whose length is the longest. The constrained longest common subsequence (CLCS) problem is to find a longest common subsequence that contains a specific subsequence. Note a CLCS may be shorter than an LCS.
In this thesis, we propose a dynamic programming algorithm for solving the CLCS problem. The time complexity is O(pmn), where m and n are the lengths of the given sequences and p is the length of the constraint sequence. Our algorithm
can also be applied to solve the constrained sequence alignment problem, which is a sequence alignment problem with the requirement that some specific symbols must be aligned together.
|
32 |
Metaheuristic Multiple Sequence Alignment OptimisationAuer, Jens January 2004 (has links)
<p>The ability to tackle NP-hard problems has been greatly extended by the introduction of Metaheuristics (see Blum & Roli (2003)) for a summary of most Metaheuristics, general problem-independent optimisation algorithms extending the hill-climbing local search approach to escape local minima. One of these algorithms is Iterated Local Search (ILS) (Lourenco et al., 2002; Stützle, 1999a, p. 25ff), a recent easy to implement but powerful algorithm with results comparable or superior to other state-of-the-art methods for many combinatorial optimisation problems, among them the Traveling Salesman (TSP) and Quadratic Assignment Problem (QAP). ILS iteratively samples local minima by modifying the current local minimum and restarting</p><p>a local search porcedure on this modified solution. This thesis will show how ILS can be implemented for MSA. After that, ILS will be evaluated and compared to other MSA algorithms by BAliBASE (Thomson et al., 1999), a set of manually refined alignments used in most recent publications of algorithms and in at least two MSA algorithm surveys. The runtime-behaviour will be evaluated using runtime-distributions.</p><p>The quality of alignments produced by ILS is at least as good as the best algorithms available and significantly superiour to previously published Metaheuristics for MSA, Tabu Search and Genetic Algorithm (SAGA). On the average, ILS performed best in five out of eight test cases, second for one test set and third for the remaining two. A drawback of all iterative methods for MSA is the long runtime needed to produce good alignments. ILS needs considerably less runtime than Tabu Search and SAGA, but can not compete with progressive or consistency based methods, e. g. ClustalW or T-COFFEE.</p>
|
33 |
Comparative analyses of land plant plastid genomesCai, Zhengqiu 27 January 2011 (has links)
The availability of complete plastid genomes has been playing an important role in resolving phylogenetic relationships among the major clades of land plants and in improving our understanding of the evolution of genomic organization. The increased availability of complete genome sequences has enabled researchers to build large multi-gene datasets for phylogenetic and molecular evolutionary studies. In chapter 2 of this thesis a web-based multiple sequence web viewer and alignment tool (MSWAT) is developed to handle large amount of data generated from complete genome sequences for phylogenetic and evolutionary analyses. We expect that MSWAT will be of general interest to biologists who are building large data matrices for evolutionary analyses. The third chapter presents the sequenced plastid genomes of three magnoliids, Drimys (Canellales), Liriodendron (Magnoliales), and Piper (Piperales). Data from these genomes, in combination with 32 other angiosperm plastid genomes, were used to assess phylogenetic relationships of magnoliids to other angiosperms and to examine patterns of variation of GC content. Evolutionary comparisons of three new magnoliid plastid genome sequences, combined with other published angiosperm genomes, confirm that GC content is unevenly distributed across the genome by location, codon position, and functional group. Furthermore, phylogenetic analyses provide the strongest support so far for the hypothesis that the magnoliids are sister to a large clade that includes both monocots and eudicots. The fourth chapter presents the Trifolium subterraneum plastid genome sequence, which is unusual in genome size and organization relative to other angiosperm plastid genomes. The Trifolium plastid genome is an excellent model system to examine mechanisms of rearrangements and the evolution of repeats and unique DNA. / text
|
34 |
Probabilistic Approaches in Comparative Analysis of Biological Networks and SequencesSahraeian, Sayed 1983- 02 October 2013 (has links)
Comparative analysis of genomic data investigates the relationship of genome structure and function across different biological species to shed light on their similarities and differences. In this dissertation, we study two important problems in comparative genomics, namely comparative sequence analysis and comparative network analysis.
In the comparative sequence analysis, we study the multiple sequence alignment of protein and DNA sequences as well as the structural alignment of multiple RNA sequences. For closely related sequences, multiple sequence alignment can be efficiently performed through progressive techniques. However, for divergent sequences it is very challenging to predict an accurate alignment. Here, we introduce PicXAA, an efficient non-progressive technique for multiple protein and DNA sequence alignment. We also further extend PicXAA to PicXAA-R for structural alignment of RNA sequences. PicXAA and PicXAA-R greedily build up the alignment from sequence regions with high local similarity, thereby yielding an accurate global alignment that effectively captures local similarities among sequences.
As another important research area in comparative genomics, we also investigate the comparative network analysis problem. Translation of increasing number of large-scale biological networks into meaningful biological insights requires efficient computational techniques. One such example is network querying, which aims to identify subnetwork regions in a large target network that are similar to a given query network. Here, we introduce an efficient algorithm for querying large-scale biological networks, called RESQUE. RESQUE adopts a semi-Markov random walk model to probabilistically estimate the correspondence scores between nodes that belong to different networks. The target network is iteratively reduced based on the estimated correspondence scores until the best matching subnetwork emerges. The proposed network querying scheme is computationally efficient, can handle any network query with an arbitrary topology, and yields accurate querying results. We also extend the idea used in RESQUE to develop an efficient algorithm for alignment of multiple large-scale biological networks, called SMETANA. SMETANA outperforms state-of- the-art network alignment techniques, in terms of both computational efficiency and alignment accuracy.
The accomplished studies have enabled us to provide a coherent framework for probabilistic approach to comparative analysis of biological sequences and networks. Such a probabilistic framework helps us employ rigorous mathematical schemes to find accurate and efficient solutions to these problems.
|
35 |
Structural and functional studies of minor pseudopilins from the type 2 secretion system of Vibrio cholerae /Yáñez, Marissa Elena. January 2007 (has links)
Thesis (Ph. D.)--University of Washington, 2007. / Vita. Includes bibliographical references (leaves 175-194).
|
36 |
Methods and applications in DNA sequence alignments /Sherwood, Ellen, January 2007 (has links)
Diss. (sammanfattning) Stockholm : Karolinska institutet, 2007. / Härtill 5 uppsatser.
|
37 |
Orthology and protein domain architecture evolution /Hollich, Volker, January 2006 (has links)
Diss. (sammanfattning) Stockholm : Karol. inst., 2006. / Härtill 7 uppsatser.
|
38 |
Prediction of function shift in protein families /Abhiman, Saraswathi, January 2006 (has links)
Diss. (sammanfattning) Stockholm : Karolinska institutet, 2006. / Härtill 4 uppsatser.
|
39 |
Characterization and Analysis of a Novel Platform for Profiling the Antibody ResponseJanuary 2011 (has links)
abstract: Immunosignaturing is a new immunodiagnostic technology that uses random-sequence peptide microarrays to profile the humoral immune response. Though the peptides have little sequence homology to any known protein, binding of serum antibodies may be detected, and the pattern correlated to disease states. The aim of my dissertation is to analyze the factors affecting the binding patterns using monoclonal antibodies and determine how much information may be extracted from the sequences. Specifically, I examined the effects of antibody concentration, competition, peptide density, and antibody valence. Peptide binding could be detected at the low concentrations relevant to immunosignaturing, and a monoclonal's signature could even be detected in the presences of 100 fold excess naive IgG. I also found that peptide density was important, but this effect was not due to bivalent binding. Next, I examined in more detail how a polyreactive antibody binds to the random sequence peptides compared to protein sequence derived peptides, and found that it bound to many peptides from both sets, but with low apparent affinity. An in depth look at how the peptide physicochemical properties and sequence complexity revealed that there were some correlations with properties, but they were generally small and varied greatly between antibodies. However, on a limited diversity but larger peptide library, I found that sequence complexity was important for antibody binding. The redundancy on that library did enable the identification of specific sub-sequences recognized by an antibody. The current immunosignaturing platform has little repetition of sub-sequences, so I evaluated several methods to infer antibody epitopes. I found two methods that had modest prediction accuracy, and I developed a software application called GuiTope to facilitate the epitope prediction analysis. None of the methods had sufficient accuracy to identify an unknown antigen from a database. In conclusion, the characteristics of the immunosignaturing platform observed through monoclonal antibody experiments demonstrate its promise as a new diagnostic technology. However, a major limitation is the difficulty in connecting the signature back to the original antigen, though larger peptide libraries could facilitate these predictions. / Dissertation/Thesis / Ph.D. Molecular and Cellular Biology 2011
|
40 |
Scalable tools for high-throughput viral sequence analysisHossain, A. S. Md Mukarram January 2017 (has links)
Viral sequence data are increasingly being used to estimate evolutionary and epidemiological parameters to understand the dynamics of viral diseases. This thesis focuses on developing novel and improved computational methods for high-throughput analysis of large viral sequence datasets. I have developed a novel computational pipeline, Pipelign, to detect potentially unrelated sequences from groups of viral sequences during sequence alignment. Pipelign detected a large number of unrelated and mis-annotated sequences from several viral sequence datasets collected from GenBank. I subsequently developed ANVIL, a machine learning-based recombination detection and subtyping framework for pathogen sequences. ANVIL's performance was benchmarked using two large HIV datasets collected from the Los Alamos HIV Sequence Database and the UK HIV Drug Resistance Database, as well as on simulated data. Finally, I present a computational pipeline named Phlow, for rapid phylodynamic inference of heterochronous pathogen sequence data. Phlow is implemented with specialised and published analysis tools to infer important phylodynamic parameters from large datasets. Phlow was run with three empirical viral datasets and their outputs were compared with published results. These results show that Phlow is suitable for high-throughput exploratory phylodynamic analysis of large viral datasets. When combined, these three novel computational tools offer a comprehensive system for large scale viral sequence analysis addressing three important aspects: 1) establishing accurate evolutionary history, 2) recombination detection and subtyping, and 3) inferring phylodynamic history from heterochronous sequence datasets.
|
Page generated in 0.0896 seconds