Global ETD Search

51	"Multiple Sequence Alignment Using External Sources Of Information" Yasin, Layal 28 January 2016 (has links) No description available. 510 PFAM Alignment PROSITE DIALIGN Domains Patterns Profiles Multiple sequence alignment Informatik (PPN619939052)
52	Lagrangian Relaxation - Solving NP-hard Problems in Computational Biology via Combinatorial Optimization Canzar, Stefan 14 November 2008 (has links) (PDF) This thesis is devoted to two $\mathcal{NP}$-complete combinatorial optimization problems arising in computational biology, the well-studied \emph{multiple sequence alignment} problem and the new formulated \emph{interval constraint coloring} problem. It shows that advanced mathematical programming techniques are capable of solving large scale real-world instances from biology to optimality. Furthermore, it reveals alternative methods that provide approximate solutions. In the first part of the thesis, we present a \emph{Lagrangian relaxation} approach for the multiple sequence alignment (MSA) problem. The multiple alignment is one common mathematical abstraction of the comparison of multiple biological sequences, like DNA, RNA, or protein sequences. If the weight of a multiple alignment is measured by the sum of the projected pairwise weights of all pairs of sequences in the alignment, then finding a multiple alignment of maximum weight is $\mathcal{NP}$-complete if the number of sequences is not fixed. The majority of the available tools for aligning multiple sequences implement heuristic algorithms; no current exact method is able to solve moderately large instances or instances involving sequences exhibiting a lower degree of similarity. We present a branch-and-bound (B\&B) algorithm for the MSA problem.\ignore{the multiple sequence alignment problem.} We approximate the optimal integer solution in the nodes of the B\&B tree by a Lagrangian relaxation of an ILP formulation for MSA relative to an exponential large class of inequalities, that ensure that all pairwise alignments can be incorporated to a multiple alignment. By lifting these constraints prior to dualization the Lagrangian subproblem becomes an \emph{extended pairwise alignment} (EPA) problem: Compute the longest path in an acyclic graph, that is penalized a charge for entering ``obstacles''. We describe an efficient algorithm that solves the EPA problem repetitively to determine near-optimal \emph{Lagrangian multipliers} via subgradient optimization. The reformulation of the dualized constraints with respect to additionally introduced variables improves the convergence rate dramatically. We account for the exponential number of dualized constraints by starting with an empty \emph{constraint pool} in the first iteration to which we add cuts in each iteration, that are most violated by the convex combination of a small number of preceding Lagrangian solutions (including the current solution). In this \emph{relax-and-cut} scheme, only inequalities from the constraint pool are dualized. The interval constraint coloring problem appears in the interpretation of experimental data in biochemistry. Monitoring hydrogen-deuterium exchange rates via mass spectroscopy is a method used to obtain information about protein tertiary structure. The output of these experiments provides aggregate data about the exchange rate of residues in overlapping fragments of the protein backbone. These fragments must be re-assembled in order to obtain a global picture of the protein structure. The interval constraint coloring problem is the mathematical abstraction of this re-assembly process. The objective of the interval constraint coloring problem is to assign a color (exchange rate) to a set of integers (protein residues) such that a set of constraints is satisfied. Each constraint is made up of a closed interval (protein fragment) and requirements on the number of elements in the interval that belong to each color class (exchange rates observed in the experiments). We introduce a polyhedral description of the interval constraint coloring problem, which serves as a basis to attack the problem by integer linear programming (ILP) methods and tools, which perform well in practice. Since the goal is to provide biochemists with all possible candidate solutions, we combine related solutions to equivalence classes in an improved ILP formulation in order to reduce the running time of our enumeration algorithm. Moreover, we establish the polynomial-time solvability of the two-color case by the integrality of the linear programming relaxation polytope $\mathcal{P}$, and also present a combinatorial polynomial-time algorithm for this case. We apply this algorithm as a subroutine to approximate solutions to instances with arbitrary but fixed number of colors and achieve an order of magnitude improvement in running time over the (exact) ILP approach. We show that the problem is $\mathcal{NP}$-complete for arbitrary number of colors, and we provide algorithms that, given an instance with $\mathcal{P}\neq\emptyset$, find a coloring that satisfies all the coloring requirements within $\pm 1$ of the prescribed value. In light of our $\mathcal{NP}$-completeness result, this is essentially the best one can hope for. Our approach is based on polyhedral theory and randomized rounding techniques. In practice, data emanating from the experiments are noisy, which normally causes the instance to be infeasible, and, in some cases, even forces $\mathcal{P}$ to be empty. To deal with this problem, the objective of the ILP is to minimize the total sum of absolute deviations from the coloring requirements over all intervals. The combinatorial approach for the two-color case optimizes the same objective function. Furthermore, we use this combinatorial method to compute, in a Lagrangian way, a bound on the minimum total error, which is exploited in a branch-and-bound manner to determine all optimal colorings. Alternatively, we study a variant of the problem in which we want to maximize the number of requirements that are satisfied. We prove that this variant is $\mathcal{APX}$-hard even in the two-color case and thus does not admit a polynomial time approximation scheme (PTAS) unless $\mathcal{P}=\mathcal{NP}$. Therefore, we slightly (by a factor of $(1+\epsilon)$) relax the condition on when a requirement is satisfied and propose a \emph{quasi-polynomial time approximation scheme} (QPTAS) which finds a coloring that ``satisfies'' the requirements of as many intervals as possible. [INFO:INFO_OH] Computer Science/Other Lagrangian Relaxation Computational Biology Multiple Sequence Alignment Interval Constrained Coloring
53	The genease activity of mung bean nuclease: fact or fiction? Kula, Nothemba January 2004 (has links) <p>The action of Mung Bean Nuclease (MBN) on DNA makes it possible to clone intact gene fragments from genes of the malaria parasite, Plasmodium. This &ldquo / genease&rdquo / activity has provided a foundation for further investigation of the coding elements of the Plasmodium genome. MBN has been reported to cleave genomic DNA of Plasmodium preferentially at positions before and after genes, but not within gene coding regions. This mechanism has overcome the difficulty encountered in obtaining genes with low expression levels because the cleavage mechanism of the enzyme yields sequences of genes from genomic DNA rather than mRNA. However, as potentially useful as MBN may be, evidence to support its genease activity comes from analysis of a limited number of genes. It is not clear whether this mechanism is specific to certain genes or species of Plasmodia or whether it is a general cleavage mechanism for Plasmodium DNA .There have also been some projects (Nomura et al., 2001 / van Lin, Janse, and Waters, 2000) which have identified MBN generated fragments which contain fragments of genes with both introns and exons, rather than the intact genes expected from MBN-digestion of genomic DNA, which raises concerns about the efficiency of the MBN mechanism in generating complete genes.</p> <p><br /> Using a large-scale, whole genome mapping approach, 7242 MBN generated genome survey sequences (GSSs) have been mapped to determine their position relative to coding sequences within the complete genome sequences of the human malaria parasite Plasmodium falciparum and the incomplete genome of a rodent malaria parasite Plasmodium berghei. The location of MBN cleavage sites was determined with respect to coding regions in orthologous genes, non-coding /intergenic regions and exon-intron boundaries in these two species of Plasmodium. The survey illustrates that for P. falciparum 79% of GSSs had at least one terminal mapping within an ortholog coding sequence and 85% of GSSs which overlapped coding sequence boundaries mapped within 50 bp of the start or end of the gene. Similarly, despite the partial nature of P.berghei genome sequence information, 73% of P.berghei GSSs had at least one terminal mapping within an ortholog coding sequence and 37% of these mapped between 0-50 bp of the start or end of the gene. This indicates that a larger percentage of cleavage sites in both P.falciparum and P.berghei were found proximal to coding regions. Furthermore, 86% of P.falciparum GSSs had at least one terminal mapping within a coding exon and 85% of GSSs which overlapped exon-intron boundaries mapped within 50bp of the exon start and end site. The fact that 11% of GSSs mapped completely to intronic regions, suggests that some introns contain specific cleavage sites sensitive to cleavage and this also indicates that MBN cleavage of Plasmodium DNA does not always yield complete exons.</p> <p><br /> Finally, the results presented herein were obtained from analysis of several thousand Plasmodium genes which have different coding sequences, in different locations on individual chromosomes/contigs in two different species of Plasmodium. Therefore it appears that the MBN mechanism is neither species specific nor is it limited to specific genes.</p> Exon Genome survey sequence Mung Bean Nuclease Nuclease cleavage site Plasmodium falciparum Plasmodium berghei Sequence Alignment.
54	Core column prediction for protein multiple sequence alignments DeBlasio, Dan, Kececioglu, John 19 April 2017 (has links) Background: In a computed protein multiple sequence alignment, the coreness of a column is the fraction of its substitutions that are in so-called core columns of the gold-standard reference alignment of its proteins. In benchmark suites of protein reference alignments, the core columns of the reference alignment are those that can be confidently labeled as correct, usually due to all residues in the column being sufficiently close in the spatial superposition of the known three-dimensional structures of the proteins. Typically the accuracy of a protein multiple sequence alignment that has been computed for a benchmark is only measured with respect to the core columns of the reference alignment. When computing an alignment in practice, however, a reference alignment is not known, so the coreness of its columns can only be predicted. Results: We develop for the first time a predictor of column coreness for protein multiple sequence alignments. This allows us to predict which columns of a computed alignment are core, and hence better estimate the alignment's accuracy. Our approach to predicting coreness is similar to nearest-neighbor classification from machine learning, except we transform nearest-neighbor distances into a coreness prediction via a regression function, and we learn an appropriate distance function through a new optimization formulation that solves a large-scale linear programming problem. We apply our coreness predictor to parameter advising, the task of choosing parameter values for an aligner's scoring function to obtain a more accurate alignment of a specific set of sequences. We show that for this task, our predictor strongly outperforms other column-confidence estimators from the literature, and affords a substantial boost in alignment accuracy. Multiple sequence alignment Core blocks Alignment accuracy Accuracy estimation Parameter advising Machine learning Regression
55	PARSES: A Pipeline for Analysis of RNA-Sequencing Exogenous Sequences Coco, Joseph 20 May 2011 (has links) RNA-Sequencing (RNA-Seq) has become one of the most widely used techniques to interrogate the transcriptome of an organism since the advent of next generation sequencing technologies [1]. A plethora of tools have been developed to analyze and visualize the transcriptome data from RNA-Seq experiments, solving the problem of mapping reads back to the host organism's genome [2] [3]. This allows for analysis of most reads produced by the experiments, but these tools typically discard reads that do not match well with the reference genome. This additional information could reveal important insight into the experiment and possible contributing factors to the condition under consideration. We introduce PARSES, a pipeline constructed from existing sequence analysis tools, which allows the user to interrogate RNA-Sequencing experiments for possible biological contamination or the presence of exogenous sequences that may shed light on other factors influencing an organism's condition. exogenous agents RNA-Seq contamination sequence alignment cancer etiology sequence assembly taxonomical classification cancer treatment
56	Identificação e caracterização de grupos de indivíduos segundo padrões de seqüências de atividades multidimensionais. / Identification and characterization of groups of individuals according to patterns of multidimensional activity sequences. Dalmaso, Ricardo Curvello 30 April 2009 (has links) O presente estudo procura identificar grupos homogêneos de indivíduos quanto aos padrões de seqüências de atividades diárias que estes realizam. As atividades são caracterizadas por múltiplos atributos, fazendo com que as seqüências sejam multidimensionais. Como atributos, ou características, são considerados a natureza da atividade realizada, ou motivo da viagem, e o período de realização da mesma, ambos separados em categorias. É estudado o efeito da inclusão da forma de acesso à atividade, ou modo de viagem, como uma terceira dimensão. Este atributo, entretanto, dados os resultados obtidos, não é utilizado nas análises finais. É também considerada a adoção de diferentes categorizações para a dimensão motivo. São usados dados da pesquisa Origem e Destino realizada em 1997, na Região Metropolitana de São Paulo. No trabalho são considerados os indivíduos com 12 anos ou mais, com pelo menos duas viagens diárias e com seqüência de viagens iniciada e terminada em sua residência, sem inconsistências internas. O número de indivíduos que atende a estes critérios é 49.616. A classificação, ou agrupamento, das seqüências de atividades em classes ou grupos é feita considerando uma medida de distância ou dissimilaridade calculada entre as seqüências, que é baseada no esforço necessário para igualá-las. Esta medida é chamada de OT-MDSAM (uni-dimensional Optimum Trajectories-based MultiDimensional Sequence Alignment Method). A partir da matriz de dissimilaridades é executado um processo estatístico de agrupamento hierárquico aglomerativo usando o Método de Ward. Os grupos de seqüências formados são analisados considerando características das próprias seqüências e atributos sóciodemográficas e econômicas dos indivíduos que os compõem, e usados em um modelo de segmentação do tipo árvore de decisão, usando o CHAID (Chi-square Automatic Interaction Detector). Resultados indicam que os grupos formados são bastante homogêneos quanto aos padrões de seqüências de atividades que representam e aos indivíduos associados a eles. / The main objective of the dissertation is to identify homogeneous groups of individuals, with regard to the daily activity/travel sequences performed in a weekday. Activities are characterized by multiple attributes, thus generating mutidimensional seguences. In this study, the nature of the activity (travel purpose) and the starting period of engagement in the activity (ending time of a trip) were the dimensions considered in the characterization of activities. Access mode to the activity was also considered as a third dimension, but the results had led to the decision not to include it in the final analysis. Alternative categorizations of the activity nature dimension were also studied, that resulted in further disaggregation than adopted in previous analyses of the same data. The study used data from the 1997 Origin-Destination household survey of the Sao Paulo Metropolitan Area. The analysis considered all individuals aged 12 or over that conducted two or more trips (starting and ending at home) on the survey day, resulting in a sample of 49,616 individuals. A sequence alignment method - OT-MDSUM (uni-dimensional Optimum Trajectories-based MultiDimensional Sequence Alignment Method) - was used to compare and calculate distances between pairs of different activity/travel sequences. These distances were then fed into a Ward hierarchical clustering algorithm to create classes of groups of activity/travel patterns. These groups were then analyzed according to the characteristics of the activity/travel sequences included and to the sociodemographic and economic characteristics of individuals who performed these patterns. The data were then utilized to develop a decision tree model using CHAID - Chi-Squared Automatic Interaction Detector, having the group of activity/travel sequences as the response variable and the characteristics of individuals and their families as independent variables. The results indicate that the groups formed through this procedure present a good degree of homogeneity regarding the activity patterns they represent and that they can be clearly associated to the characteristics of the individuals which perform these patterns. Activity sequences multidimensional sequence alignment Planejamento de transportes Transport planning Travel demand
57	Aplicação de estratégias híbridas em algoritmos de alinhamento múltiplo de sequências para ambientes de computação paralela e distribuída. / Application of hybrid strategies in multiple sequence alignments for parallel and distributed computing environments. Zafalon, Geraldo Francisco Donegá 11 November 2014 (has links) A Bioinformática tem se desenvolvido de forma intensa nos últimos anos. A necessidade de se processar os grandes conjuntos de sequências, sejam de nucleotídeos ou de aminoácidos, tem estimulado o desenvolvimento de diversas técnicas algorítmicas, de modo a tratar este problema de maneira factível. Os algoritmos de alinhamento de alinhamento múltiplo de sequências assumiram um papel primordial, tornando a execução de alinhamentos de conjuntos com mais de duas sequencias uma tarefa viável computacionalmente. No entanto, com o aumento vertiginoso tanto da quantidade de sequencias em um determinado conjunto, quanto do comprimento dessas sequencias, a utilização desses algoritmos de alinhamento múltiplo, sem o acoplamento de novas estratégias, tornou-se algo impraticável. Consequentemente, a computação de alto desempenho despontou como um dos recursos a serem utilizados, através da paralelização de diversas estratégias para sua execução em grandes sistemas computacionais. Além disso, com a contínua expansão dos conjuntos de sequências, outras estratégias de otimização passaram a ser agregadas aos algoritmos de alinhamento múltiplo paralelos. Com isso, o desenvolvimento de ferramentas para alinhamento múltiplo de sequencias baseadas em abordagens híbridas destaca-se, atualmente, como a solução com melhor aceitação. Assim, no presente trabalho, pode-se verificar o desenvolvimento de uma estratégia híbrida para os algoritmos de alinhamento múltiplo progressivos, cuja utilização e amplamente difundida, em Bioinformática. Nesta abordagem, conjugou-se a paralelização e o particionamento dos conjuntos de sequências, na fase de construção da matriz de pontuação, e a otimização das fases de construção da árvore filogenética e de alinhamento múltiplo, através dos algoritmos de colônia de formigas e simulated annealling paralelo, respectivamente. / Bioinformatics has been developed in a fast way in the last years. The need for processing large sequences sets, either nucleotides or aminoacids, has stimulated the development of many algorithmic techniques, to solve this problem in a feasible way. Multiple sequence alignment algorithms have played an important role, because with the reduced computational complexity provided by them, it is possible to perform alignments with more than two sequences. However, with the fast growing of the amount and length of sequences in a set, the use of multiple alignment algorithms without new optimization strategies became almost impossible. Therefore, high performance computing has emerged as one of the features being used, through the parallelization of many strategies for execution in large computational systems. Moreover, with the continued expansion of sequences sets, other optimization strategies have been coupled with parallel multiple sequence alignments. Thus, the development of multiple sequences alignment tools based on hybrid strategies has been considered the solution with the best results. In this work, we present the development of a hybrid strategy to progressive multiple sequence alignment, where its using is widespread in Bioinformatics. In this approach, we have aggregated the parallelization and the partitioning of sequences sets in the score matrix calculation stage, and the optimization of the stages of the phylogenetic tree reconstruction and multiple alignment through ant colony and parallel simulated annealing algorithms, respectively. Algoritmos de otimização Alinhamento múltiplo de sequências Bioinformática Bioinformatics Multiple sequence alignment Optimization algorithms Parallel processing Processamento paralelo
58	Aplicação de algoritmos genéricos multi-objetivo para alinhamento de seqüências biológicas. / Multi-objective genetic algorithms applied to protein sequence alignment. Ticona, Waldo Gonzalo Cancino 26 February 2003 (has links) O alinhamento de seqüências biológicas é uma operação básica em Bioinformática, já que serve como base para outros processos como, por exemplo, a determinação da estrutura tridimensional das proteínas. Dada a grande quantidade de dados presentes nas seqüencias, são usadas técnicas matemáticas e de computação para realizar esta tarefa. Tradicionalmente, o Problema de Alinhamento de Seqüências Biológicas é formulado como um problema de otimização de objetivo simples, onde alinhamento de maior semelhança, conforme um esquema de pontuação, é procurado. A Otimização Multi-Objetivo aborda os problemas de otimização que possuem vários critérios a serem atingidos. Para este tipo de problema, existe um conjunto de soluções que representam um "compromiso" entre os objetivos. Uma técnica que se aplica com sucesso neste contexto são os Algoritmos Evolutivos, inspirados na Teoria da Evolução de Darwin, que trabalham com uma população de soluções que vão evoluindo até atingirem um critério de convergência ou de parada. Este trabalho formula o Problema de Alinhamento de Seqüências Biológicas como um Problema de Otimização Multi-Objetivo, para encontrar um conjunto de soluções que representem um compromisso entre a extensão e a qualidade das soluções. Aplicou-se vários modelos de Algoritmos Evolutivos para Otimização Multi-Objetivo. O desempenho de cada modelo foi avaliado por métricas de performance encontradas na literatura. / The Biological Sequence Alignment is a basic operation in Bioinformatics since it serves as a basis for other processes, i.e. determination of the protein's three-dimensional structure. Due to the large amount of data involved, mathematical and computational methods have been used to solve this problem. Traditionally, the Biological Alignment Sequence Problem is formulated as a single optimization problem. Each solution has a score that reflects the similarity between sequences. Then, the optimization process looks for the best score solution. The Multi-Objective Optimization solves problems with multiple objectives that must be reached. Frequently, there is a solution set that represents a trade-off between the objectives. Evolutionary Algorithms, which are inspired by Darwin's Evolution Theory, have been applied with success in solving this kind of problems. This work formulates the Biological Sequence Alignment as a Multi-Objective Optimization Problem in order to find a set of solutions that represent a trade-off between the extension and the quality of the solutions. Several models of Evolutionary Algorithms for Multi-Objetive Optimization have been applied and were evaluated using several performance metrics found in the literature. algoritmos evolutivos alinhamento de seqüências evolutionary algorithms multi-objective optimization otimização multi-objetivo sequence alignment
59	Marker extractions in DNA sequences using sub-sequence segmentation tree. January 2005 (has links) Hung Wah Johnson. / Thesis submitted in: August 2004. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (leaves 116-121). / Abstracts in English and Chinese. / Abstract --- p.i / Acknowledgement --- p.iv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Motivation --- p.1 / Chapter 1.2 --- Problem Statement --- p.3 / Chapter 1.3 --- Outline of the thesis --- p.6 / Chapter 2 --- Background --- p.8 / Chapter 2.1 --- Biological Background --- p.8 / Chapter 2.2 --- Sequence Alignments --- p.9 / Chapter 2.2.1 --- Pairwise Sequences Alignment --- p.11 / Chapter 2.2.2 --- Multiple Sequences Alignment --- p.15 / Chapter 2.3 --- Neighbor Joining Tree --- p.16 / Chapter 2.4 --- Marker Extractions --- p.18 / Chapter 2.5 --- Neural Network --- p.19 / Chapter 2.6 --- Conclusion --- p.22 / Chapter 3 --- Related Work --- p.23 / Chapter 3.1 --- FASTA --- p.23 / Chapter 3.2 --- Suffix Tree --- p.25 / Chapter 4 --- Sub-Sequence Segmentation Tree --- p.28 / Chapter 4.1 --- Introduction --- p.28 / Chapter 4.2 --- Problem Statement --- p.29 / Chapter 4.3 --- Design --- p.33 / Chapter 4.4 --- Time and space complexity analysis --- p.38 / Chapter 4.4.1 --- Performance Evaluation --- p.40 / Chapter 4.5 --- Summary --- p.48 / Chapter 5 --- Applications: Global Sequences Alignment --- p.51 / Chapter 5.1 --- Introduction --- p.51 / Chapter 5.2 --- Problem Statement --- p.53 / Chapter 5.3 --- Pairwise Alignment --- p.53 / Chapter 5.3.1 --- Algorithm --- p.53 / Chapter 5.3.2 --- Time and Space Complexity Analysis --- p.64 / Chapter 5.4 --- Multiple Sequences Alignment --- p.67 / Chapter 5.4.1 --- The Clustalw Algorithm --- p.68 / Chapter 5.4.2 --- MSA Using SSST --- p.70 / Chapter 5.4.3 --- Time and Space Complexity Analysis --- p.70 / Chapter 5.5 --- Experiments --- p.71 / Chapter 5.5.1 --- Experiment Setting --- p.72 / Chapter 5.5.2 --- Experimental Results --- p.72 / Chapter 5.6 --- Summary --- p.80 / Chapter 6 --- Applications: Marker Extractions --- p.81 / Chapter 6.1 --- Introduction --- p.81 / Chapter 6.2 --- Problem Statement --- p.82 / Chapter 6.3 --- The Multiple Sequence Alignment Approach --- p.85 / Chapter 6.3.1 --- Design --- p.85 / Chapter 6.4 --- Reference Sequence Alignment Approach --- p.88 / Chapter 6.4.1 --- Design --- p.90 / Chapter 6.5 --- Time and Space Complexity Analysis --- p.95 / Chapter 6.6 --- Experiments --- p.95 / Chapter 6.7 --- Summary --- p.99 / Chapter 7 --- HBV Application Framework --- p.101 / Chapter 7.1 --- Motivations --- p.101 / Chapter 7.2 --- The Procedure Flow of the Application --- p.102 / Chapter 7.2.1 --- Markers Extractions --- p.103 / Chapter 7.2.2 --- Rules Training and Prediction --- p.103 / Chapter 7.3 --- Results --- p.105 / Chapter 7.3.1 --- Clustering --- p.106 / Chapter 7.3.2 --- Classification --- p.107 / Chapter 7.4 --- Summary --- p.110 / Chapter 8 --- Conclusions --- p.112 / Chapter 8.1 --- Contributions --- p.112 / Chapter 8.2 --- Future Works --- p.114 / Chapter 8.2.1 --- HMM Learning --- p.114 / Chapter 8.2.2 --- Splice Sites Learning --- p.114 / Chapter 8.2.3 --- Faster Algorithm for Multiple Sequences Alignment --- p.115 / Bibliography --- p.121 Nucleotide sequence--Methodology Hepatitis B virus Sequence Analysis, DNA--methods Sequence Alignment--methods Hepatitis B virus
60	Aplicação de estratégias híbridas em algoritmos de alinhamento múltiplo de sequências para ambientes de computação paralela e distribuída. / Application of hybrid strategies in multiple sequence alignments for parallel and distributed computing environments. Geraldo Francisco Donegá Zafalon 11 November 2014 (has links) A Bioinformática tem se desenvolvido de forma intensa nos últimos anos. A necessidade de se processar os grandes conjuntos de sequências, sejam de nucleotídeos ou de aminoácidos, tem estimulado o desenvolvimento de diversas técnicas algorítmicas, de modo a tratar este problema de maneira factível. Os algoritmos de alinhamento de alinhamento múltiplo de sequências assumiram um papel primordial, tornando a execução de alinhamentos de conjuntos com mais de duas sequencias uma tarefa viável computacionalmente. No entanto, com o aumento vertiginoso tanto da quantidade de sequencias em um determinado conjunto, quanto do comprimento dessas sequencias, a utilização desses algoritmos de alinhamento múltiplo, sem o acoplamento de novas estratégias, tornou-se algo impraticável. Consequentemente, a computação de alto desempenho despontou como um dos recursos a serem utilizados, através da paralelização de diversas estratégias para sua execução em grandes sistemas computacionais. Além disso, com a contínua expansão dos conjuntos de sequências, outras estratégias de otimização passaram a ser agregadas aos algoritmos de alinhamento múltiplo paralelos. Com isso, o desenvolvimento de ferramentas para alinhamento múltiplo de sequencias baseadas em abordagens híbridas destaca-se, atualmente, como a solução com melhor aceitação. Assim, no presente trabalho, pode-se verificar o desenvolvimento de uma estratégia híbrida para os algoritmos de alinhamento múltiplo progressivos, cuja utilização e amplamente difundida, em Bioinformática. Nesta abordagem, conjugou-se a paralelização e o particionamento dos conjuntos de sequências, na fase de construção da matriz de pontuação, e a otimização das fases de construção da árvore filogenética e de alinhamento múltiplo, através dos algoritmos de colônia de formigas e simulated annealling paralelo, respectivamente. / Bioinformatics has been developed in a fast way in the last years. The need for processing large sequences sets, either nucleotides or aminoacids, has stimulated the development of many algorithmic techniques, to solve this problem in a feasible way. Multiple sequence alignment algorithms have played an important role, because with the reduced computational complexity provided by them, it is possible to perform alignments with more than two sequences. However, with the fast growing of the amount and length of sequences in a set, the use of multiple alignment algorithms without new optimization strategies became almost impossible. Therefore, high performance computing has emerged as one of the features being used, through the parallelization of many strategies for execution in large computational systems. Moreover, with the continued expansion of sequences sets, other optimization strategies have been coupled with parallel multiple sequence alignments. Thus, the development of multiple sequences alignment tools based on hybrid strategies has been considered the solution with the best results. In this work, we present the development of a hybrid strategy to progressive multiple sequence alignment, where its using is widespread in Bioinformatics. In this approach, we have aggregated the parallelization and the partitioning of sequences sets in the score matrix calculation stage, and the optimization of the stages of the phylogenetic tree reconstruction and multiple alignment through ant colony and parallel simulated annealing algorithms, respectively. Algoritmos de otimização Alinhamento múltiplo de sequências Bioinformática Processamento paralelo Bioinformatics Multiple sequence alignment Optimization algorithms Parallel processing

Search results