Global ETD Search

11	Multiple sequence alignment pomocí genetických algoritmů / Multiple sequence alignment using genetic algorithms Pátek, Zdeněk January 2012 Title: Multiple sequence alignment using genetic algorithms Author: Zdeněk Pátek Department: Department of Software and Computer Science Education Supervisor: RNDr. František Mráz, CSc. Abstract: The thesis addresses the problem of multiple sequence alignment (MSA). It contains the specification of the proposed method MSAMS, which makes it possible to find motifs in biological sequences, to split the sequences into blocks using the motifs, to solve MSA on the blocks and finally to assemble the global alignment from the aligned blocks and motifs. Motif search and MSA are both solved using genetic algorithms. The thesis describes the implementation of the method, the configuration of its settings, benchmarking on the BAliBASE database and a comparison with the ClustalW program. Experimental results showed that MSAMS can discover better alignments than ClustalW. Keywords: multiple sequence alignment, motif finding, genetic algorithms, ClustalW
12	Multiple sequence alignment augmented by expert user constraints Jin, Lingling 13 April 2010 Sequence alignment has become one of the most common tasks in bioinformatics. Most existing sequence alignment methods use general scoring schemes, but the resulting alignments are sometimes not completely relevant because they do not necessarily provide the desired information. It would be extremely difficult, if not impossible, to build every possible objective into an algorithm. Our goal is to allow a working biologist to augment a given alignment with additional information based on their knowledge and objectives. In this thesis, we will formally define constraints and compatible constraint sets for an alignment, which require some positions of the sequences to be aligned together. Using this approach, one can align specific segments, such as domains within protein sequences, by inputting constraints (the positions of the segments on the sequences), and the algorithm will automatically find an optimal alignment in which the segments are aligned together. A necessary prerequisite for calculating an alignment is that the input constraints be compatible with each other, and we will develop algorithms to check this condition for both pairwise and multiple sequence alignments. The algorithms are based on a depth-first search on a graph that is derived from the constraints and the alignment. We then develop algorithms to perform pairwise and multiple sequence alignments satisfying these compatible constraints. Using straightforward dynamic programming for pairwise sequence alignment satisfying a compatible constraint set, an optimal alignment corresponds to a path through the dynamic programming matrix, and because we use only single-position constraints, a constraint can be represented as a point on the matrix, so a compatible constraint set is a set of points. 
We try to determine a new path, rather than the original path, that achieves the highest score among paths that go through all the points of the compatible constraint set. The path is a concatenation of sub-paths, so only the scores in the sub-matrices need to be calculated. This means the time required to get the new path decreases as the number of constraints increases, and it also varies as the positions of the points change. It can be further reduced by using the information from the original alignment, which can offer a significant speed gain. We then use exact and progressive algorithms, which are extensions of pairwise sequence alignment, to find multiple sequence alignments satisfying a compatible constraint set. With exact algorithms for three sequences, where constraints are represented as lines, we discuss a method to force the optimal path to cross the constraint lines. With progressive algorithms, we use a set of pairwise alignments satisfying compatible constraints to construct multiple sequence alignments progressively. Because they are more complex, we leave some extensions as future work. Keywords: multiple sequence alignment, constraint, compatible constraint set
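The sub-path idea in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's algorithm: the scoring values, the linear gap penalty and the function names (`nw_score`, `is_compatible`, `constrained_score`) are hypothetical, and compatibility is checked with a simple monotonicity test for the pairwise case rather than with the depth-first search on a graph that the abstract describes. Each constraint is a point `(i, j)` meaning that position `i` of the first sequence must be aligned with position `j` of the second.

```python
from typing import List, Tuple

# Illustrative scores only; these values are assumptions, not taken from the thesis.
MATCH, MISMATCH, GAP = 1, -1, -2


def nw_score(a: str, b: str) -> int:
    """Optimal global alignment score of a and b (linear gap penalty)."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * GAP]
        for j in range(1, len(b) + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            curr.append(max(prev[j - 1] + sub, prev[j] + GAP, curr[j - 1] + GAP))
        prev = curr
    return prev[-1]


def is_compatible(points: List[Tuple[int, int]], len_a: int, len_b: int) -> bool:
    """Pairwise case: points must lie inside the matrix and increase strictly in both coordinates."""
    pts = sorted(points)
    if any(not (0 <= i < len_a and 0 <= j < len_b) for i, j in pts):
        return False
    return all(i1 < i2 and j1 < j2 for (i1, j1), (i2, j2) in zip(pts, pts[1:]))


def constrained_score(a: str, b: str, points: List[Tuple[int, int]]) -> int:
    """Best score of an alignment whose path passes through every constraint point."""
    if not is_compatible(points, len(a), len(b)):
        raise ValueError("constraints are not compatible")
    total, pi, pj = 0, 0, 0
    for i, j in sorted(points):
        total += nw_score(a[pi:i], b[pj:j])            # free sub-matrix before the point
        total += MATCH if a[i] == b[j] else MISMATCH   # forced aligned column
        pi, pj = i + 1, j + 1
    return total + nw_score(a[pi:], b[pj:])            # free sub-matrix after the last point
```

Because only the sub-matrices between consecutive points are filled in, adding constraints shrinks the total area that has to be computed, which matches the abstract's remark that the running time decreases as the number of constraints grows.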
13	Comparison of Methods Used for Aligning Protein Sequences Madangopal, Sangeetha 05 December 2006 Comparing protein sequences is an essential procedure that has many applications in the field of bioinformatics. Recent advances in computational capabilities and algorithm design have simplified the comparison of protein sequences from several databases. Various algorithms have emerged that use state-of-the-art approaches to match protein sequences based on the structural and functional properties of the amino acids. The matching involves structural alignment, and this alignment may be global, comprising the whole length of the protein, or local, comprising sub-sequences of the proteins. Families of related proteins are found by clustering sequence alignments. The frequency distributions of the amino acids within these different clusters define the sequence profile. The best alignment algorithm uses these profiles. In this thesis, we have studied different profile alignment algorithms in which the cost function for comparing two profiles is changed. These are compared to the FFAS3 (Fold and Function Assignment) algorithm. Keywords: Sequence Alignment, Comparison Methods, Computer Sciences
15	Multiple Sequence Alignment Using the Clustering Method Huang, Kuen-Feng 23 August 2001 Multiple sequence alignment (MSA) is a fundamental technique of molecular biology. Biological sequences are aligned with each other vertically in order to show the similarities and differences among them. Due to its importance, many algorithms have been proposed. With dynamic programming, finding the optimal alignment for a pair of sequences can be done in O(n^2) time, where n is the length of the two strings. Unfortunately, for the general optimization problem of aligning k sequences of length n, O(n^k) time is required. In this thesis, we shall first propose an efficient group alignment method to perform the alignment between two groups of sequences. Then we shall propose a clustering method to build the tree topology for merging. The clustering method is based on the concept that the two sequences separated by the greatest distance should be split into two clusters. In our experiments, both the alignment quality and the required time of our algorithm are better than those of the NJ (neighbor joining) algorithm and the Clustal W algorithm. Keywords: Affine gap penalty, Bioinformatics, Multiple Sequence Alignment
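The quadratic bound for the pairwise case mentioned above comes from filling one cell per pair of positions. A minimal Python sketch follows, assuming a linear gap penalty and illustrative score values (the entry's keywords mention an affine gap penalty, which this sketch deliberately does not implement); the function name `nw_score` is hypothetical.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score of a and b.

    Every cell (i, j) of the (len(a)+1) x (len(b)+1) table is computed once
    from three neighbours, so the running time is O(len(a) * len(b)).
    Only the previous row is kept, so memory is O(len(b)).
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

Extending the same recurrence to k sequences needs a k-dimensional table with one cell per combination of positions, which is where the O(n^k) growth for exact multiple alignment comes from and why heuristics such as group alignment along a guide tree are used instead.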
16	Exploring microbial community structures and functions of activated sludge by high-throughput sequencing Ye, Lin, 叶林 January 2012 The major objectives of this study were to investigate the diversity and abundance of nitrifiers and to apply high-throughput sequencing technologies to analyze the overall microbial community structures and functions in wastewater treatment bioreactors. Specifically, this study was conducted: (1) to investigate the diversity and abundance of AOA, AOB and NOB in bioreactors, (2) to explore the bacterial communities in bioreactors using 454 pyrosequencing, and (3) to analyze the metagenomes of activated sludge using Illumina sequencing. A lab-scale nitrification bioreactor was operated for 342 days under low DO (0.15 to 0.5 mg/L) and high nitrogen loading (0.26 to 0.52 kg-N/(m^3·d)). T-RFLP and cloning analysis showed that there was only one dominant species each of AOA, AOB and NOB in the bioreactor. The amoA gene of the dominant AOA had a similarity of 89.3% with the isolated AOA species Nitrosopumilus maritimus SCM1. The AOB species detected in the bioreactor belonged to the genus Nitrosomonas. The abundance of AOB was more than 40 times larger than that of AOA. The percentage of NOB in total bacteria increased from not detectable to 30% when DO changed from 0.15 to 0.5 mg/L. Compared with traditional methods, pyrosequencing analysis of the bacteria in this bioreactor provided unprecedented information. A total of 494 bacterial OTUs were obtained at a 3% distance cutoff. Furthermore, 454 pyrosequencing was applied to investigate the bacterial communities of activated sludge samples from 14 WWTPs in Asia (mainland China, Hong Kong, and Singapore) and North America (Canada and the United States). The results revealed large numbers of OTUs in activated sludge, i.e. 1183 to 3567 OTUs in one sludge sample at a 3% distance cutoff. Clear geographical differences among these samples were observed. 
The AOB amoA genes in different WWTPs were found to be quite diverse, while the 16S rRNA genes were relatively conserved. To explore microbial community structures and functions in the above-mentioned lab-scale bioreactor and a full-scale bioreactor, over six gigabases of metagenomic sequence data and 150,000 paired-end reads of PCR amplicons were generated from the activated sludge in the two bioreactors on the Illumina HiSeq 2000 platform. Three kinds of sequences (16S rRNA amplicons, 16S rRNA gene tags and predicted genes) were used to conduct taxonomic assignment, and their applicability and reliability were compared. Specifically, based on 16S rRNA and amoA gene sequences, AOB were found to be more abundant than AOA in the two bioreactors. Furthermore, the analysis of the metabolic profiles and pathways indicated that the overall pathways in the two bioreactors were quite similar. However, the abundances of some specific genes in the two bioreactors were different. In addition, 454 pyrosequencing was also used to detect potentially pathogenic bacteria in environmental samples. The most abundant potentially pathogenic bacteria in the WWTPs were found to be affiliated with Aeromonas and Clostridium. Aeromonas veronii, Aeromonas hydrophila and Clostridium perfringens were the species most similar to the potentially pathogenic bacteria found in this study. Overall, the percentage of the sequences closely related to known pathogenic bacteria sequences was about 0.16% of the total sequences. Additionally, a Java application (BAND) was developed for graphical visualization of microbial abundance data. / Civil Engineering / Doctoral / Doctor of Philosophy Keywords: Sequence alignment (Bioinformatics), Microbial genomics - Data processing
17	Iterative de Bruijn graph assemblers for second-generation sequencing reads Peng, Yu, 彭煜 January 2012 The recent advance of second-generation sequencing technologies has made it possible to generate a vast amount of short read sequences from a DNA (cDNA) sample. Current short read assemblers make use of the de Bruijn graph, in which each vertex is a k-mer and each edge connecting vertex u and vertex v represents u and v appearing in a read consecutively, to produce contigs. There are three major problems for de Bruijn graph assemblers: (1) the branch problem, due to errors and repeats; (2) the gap problem, due to low or uneven sequencing depth; and (3) the error problem, due to sequencing errors. A proper choice of k value is a crucial tradeoff in de Bruijn graph assemblers: a low k value leads to fewer gaps but more branches; a high k value leads to fewer branches but more gaps. In this thesis, I first analyze the fundamental genome assembly problem and then propose an iterative de Bruijn graph assembler (IDBA), which iterates from low to high k values, to construct a de Bruijn graph with fewer branches and fewer gaps than any other de Bruijn graph assembler using a fixed k value. Then, second-generation sequencing data from metagenomic, single-cell and transcriptome samples are investigated. IDBA is then tailored with special treatments to handle the specific issues for each kind of data. For metagenomic sequencing data, a graph partition algorithm is proposed to separate the de Bruijn graph into dense components, which represent similar regions in subspecies from the same species, and multiple sequence alignment is used to produce a consensus of each component. For sequencing data with highly uneven depth, such as single-cell and metagenomic sequencing data, a method called local assembly is designed to reconstruct missing k-mers in low-depth regions. 
Then, based on the observation that short and relatively low-depth contigs are more likely to be erroneous, progressive depth on contigs is used to remove errors in both low-depth and high-depth regions iteratively. For transcriptome sequencing data, a variant of the progressive depth method is adopted to decompose the de Bruijn graph into components corresponding to transcripts from the same gene, and then the transcripts are found in each component by considering the support from reads and paired-end reads. Extensive experiments on both simulated and real data show that IDBA assemblers outperform the existing assemblers by constructing longer contigs with higher completeness and similar or better accuracy. The running time of IDBA assemblers is comparable to that of existing algorithms, while the memory cost is usually lower. / Computer Science / Doctoral / Doctor of Philosophy Keywords: Nucleotide sequence - Data processing, Sequence alignment (Bioinformatics)
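The graph definition and the k-value tradeoff described in the abstract above can be illustrated with a short Python sketch. It follows the abstract's definition (vertices are k-mers, an edge joins two k-mers that appear consecutively in a read) but is not IDBA itself: there is no iteration over k, no error removal and no contig construction, and the function names `de_bruijn` and `branch_count` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the set of k-mers that directly follow it in some read."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        # Consecutive k-mers start at positions i and i + 1 and overlap in k - 1 characters.
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
    return graph


def branch_count(graph: Dict[str, Set[str]]) -> int:
    """Number of vertices with more than one outgoing edge."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


# A read with the repeat "ACG": a small k cannot tell the two copies apart.
reads = ["ACGTACGA"]
low_k = branch_count(de_bruijn(reads, 2))   # "CG" is followed by both "GT" and "GA"
high_k = branch_count(de_bruijn(reads, 4))  # every 4-mer has a single successor
```

With k = 2 the repeat creates a branch, while k = 4 resolves it. The price of a larger k in real data is that fewer reads share full k-mers in low-depth regions, which is the gap problem the abstract mentions.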
18	EFFICIENT CONSTRUCTION OF ACCURATE MULTIPLE ALIGNMENTS AND LARGE-SCALE PHYLOGENIES Wheeler, Travis John January 2009 A central focus of computational biology is to organize and make use of vast stores of molecular sequence data. Two of the most studied and fundamental problems in the field are sequence alignment and phylogeny inference. The problem of multiple sequence alignment is to take a set of DNA, RNA, or protein sequences and identify related segments of these sequences. Perhaps the most common use of alignments of multiple sequences is as input for methods designed to infer a phylogeny, or tree describing the evolutionary history of the sequences. The two problems are circularly related: standard phylogeny inference methods take a multiple sequence alignment as input, while computation of a rudimentary phylogeny is a step in the standard multiple sequence alignment method. Efficient computation of high-quality alignments, and of high-quality phylogenies based on those alignments, are both open problems in the field of computational biology. The first part of the dissertation gives details of my efforts to identify a best-of-breed method for each stage of the standard form-and-polish heuristic for aligning multiple sequences; the result of these efforts is a tool, called Opal, that achieves state-of-the-art 84.7% accuracy on the BAliBASE alignment benchmark. The second part of the dissertation describes a new algorithm that dramatically increases the speed and scalability of a common method for phylogeny inference called neighbor-joining; this algorithm is implemented in a new tool, called NINJA, which is more than an order of magnitude faster than a very fast implementation of the canonical algorithm, for example building a tree on 218,000 sequences in under 6 days using a single-processor computer. Keywords: consistency, neighbor joining, phylogeny, sequence alignment, weighting
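For readers unfamiliar with neighbor-joining, the pair-selection step of the canonical algorithm can be sketched as follows. This is the textbook step only, not NINJA's accelerated search, which the abstract does not describe in detail. The function name `nj_pick_pair` is hypothetical, and the input is assumed to be a symmetric distance matrix with a zero diagonal and at least two taxa.

```python
from typing import List, Tuple


def nj_pick_pair(d: List[List[float]]) -> Tuple[int, int]:
    """Return the pair (i, j), i < j, that canonical neighbor-joining merges next.

    The pair minimises Q(i, j) = (n - 2) * d[i][j] - sum_k d[i][k] - sum_k d[j][k].
    Scanning all pairs costs O(n^2) per step; repeating it for n - 2 merges is
    what makes the canonical algorithm cubic overall.
    """
    n = len(d)
    row_sums = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    return best[1], best[2]


# Small five-taxon example; the distances are illustrative only.
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair = nj_pick_pair(D)
```

Note that the selected pair is not simply the closest one: taxa 3 and 4 have the smallest raw distance, but the row-sum correction favours joining taxa 0 and 1 first.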
19	The Performance of Sequence Alignment Algorithms Alimehr, Leila January 2013 This thesis deals with sequence alignment algorithms. A sequence alignment is a mutual arrangement of two or more sequences made in order to study their similarity and dissimilarity. Four decades after the seminal work by Needleman and Wunsch in 1970, these methods still need further exploration. We start with a review of sequence alignment and its generalization to multiple alignments, although the focus of this thesis is on the evaluation of the new alignment algorithms. The research presented herein examines the different algorithms that are based on dynamic programming. In the study of sequence alignment algorithms, two powerful techniques have been invented. According to the simulations, the new algorithms are shown to be extremely efficient for comparing DNA sequences. All the sequence alignment algorithms are compared in terms of distance. We use the programming language R for the implementation and simulation of the algorithms discussed in this thesis.
20	Combinatorial optimization and application to DNA sequence analysis Gupta, Kapil. January 2008 Thesis (Ph.D)--Industrial and Systems Engineering, Georgia Institute of Technology, 2009. / Committee Chair: Lee, Eva K.; Committee Member: Barnes, Earl; Committee Member: Fan, Yuhong; Committee Member: Johnson, Ellis; Committee Member: Yuan, Ming. Part of the SMARTech Electronic Thesis and Dissertation Collection.
