Global ETD Search

1	Bit-parallel and SIMD alignment algorithms for biological sequence analysis Loving, Joshua 21 November 2017 (has links) High-throughput next-generation sequencing techniques have hugely decreased the cost and increased the speed of sequencing, resulting in an explosion of sequencing data. This motivates the development of high-efficiency sequence alignment algorithms. In this thesis, I present multiple bit-parallel and Single Instruction Multiple Data (SIMD) algorithms that greatly accelerate the processing of biological sequences. The first chapter describes the BitPAl bit-parallel algorithms for global alignment with general integer scoring, which assigns integer weights for match, mismatch, and insertion/deletion. The bit-parallel approach represents individual cells in an alignment scoring matrix as bits in computer words and emulates the calculation of scores by a series of logic operations. Bit-parallelism has previously been applied to other pattern matching problems, producing fast algorithms. In timed tests, we show that BitPAl runs 7 - 25 times faster than a standard iterative algorithm. The second part involves two approaches to alignment with substitution scoring, which assigns a potentially different substitution weight to every pair of alphabet characters, better representing the relative rates of different mutations. The first approach extends the existing BitPAl method. The second approach is a new SIMD algorithm that uses partial sums of adjacent score differences. I present a simple partial sum method as well as one that uses parallel scan for additional acceleration. Results demonstrate that these algorithms are significantly faster than existing SIMD dynamic programming algorithms. Finally, I describe two extensions to the partial sums algorithm. The first adds support for affine gap penalty scoring. Affine gap scoring represents the biological likelihood that it is more likely for gaps to be continuous than to be distributed throughout a region by introducing a gap opening penalty and a gap extension penalty. The second extension is an algorithm that uses the partial sums method to calculate the tandem alignment of a pattern against a text sequence using a single pattern copy. Next generation sequencing data provides a wealth of information to researchers. Extracting that information in a timely manner increases the utility and practicality of sequence analysis algorithms. This thesis presents a family of algorithms which provide alignment scores in less time than previous algorithms. Bioinformatics Bit-parallel Next generation sequencing analysis Pattern matching Sequence alignment SIMD
2	On An Architecture For A Parallel Finite Field Multiplier With Low Complexity Based On Composite Fields Kindap, Nihal 01 September 2004 (has links) (PDF) In this thesis, a bit parallel architecture for a parallel finite field multiplier with low complexity in composite fields GF((2n)m) with k = n &middot / m (k 32) is investigated. The architecture has lower complexity when the Karatsuba-Ofman algorithm is applied for certain k. Using particular primitive polynomials for composite fields improves the complexities. We demonstrated for the values m = 2, 4, 8 in details. This thesis is based on the paper &ldquo / A New Architecture for a Parallel Finite Field Multiplier with Low Complexity Based on Composite Fields &rdquo / by Christof Paar. The whole purpose of this thesis is to understand and present a detailed description of the results of the paper of Paar.
3	Filtros para a busca e extração de padrões aproximados em cadeias biológicas / Filter Algorithms for Approximate Patterns Matching and Extraction from Biological Strings Soares Neto, Domingos 10 September 2008 (has links) Esta dissertação de mestrado aborda formulações computacionais e algoritmos para a busca e extração de padrões em cadeias biológicas. Em particular, o presente texto concentra-se nos dois problemas a seguir, considerando-os sob as distâncias de Hamming e Levenshtein: a) como determinar os locais nos quais um dado padrão ocorre de modo aproximado em uma cadeia fornecida; b) como extrair padrões que ocorram de modo aproximado em um número significativo de cadeias de um conjunto fornecido. O primeiro problema, para o qual já existem diversos algoritmos polinomiais, tem recebido muita atenção desde a década de 60, e ganhou novos ares com o advento da biologia computacional, nos idos dos anos 80, e com a popularização da Internet e seus mecanismos de busca: ambos os fenômenos trouxeram novos obstáculos a serem superados, em razão do grande volume de dados e das bastante justas restrições de tempo inerentes a essas aplicações. O segundo problema, de surgimento um pouco mais recente, é intrinsicamente desafiador, em razão de sua complexidade computacional, do tamanho das entradas tratadas nas aplicações mais comuns e de sua dificuldade de aproximação. Também é de chamar a atenção o seu grande potencial de aplicação. Neste trabalho são apresentadas formulações adequadas dos problemas abordados, assim como algoritmos e estruturas de dados essenciais ao seu estudo. Em especial, estudamos a extremamente versátil árvore dos sufixos, assim como uma de suas generalizações e sua estrutura irmã: o vetor dos sufixos. Grande parte do texto é dedicada aos filtros baseados em q-gramas para a busca aproximada de padrões e algumas de suas mais recentes variações. Estão cobertos os algoritmos bit-paralelos de Myers e Baeza-Yates-Gonnet para a busca de padrões; os algoritmos de Sagot para a extração de padrões; os algoritmos de filtragem de Ukkonen, Jokinen-Ukkonen, Burkhardt-Kärkkäinen, entre outros. / This thesis deals with computational formulations and algorithms for the extraction and search of patterns from biological strings. In particular, the present text focuses on the following problems, both considered under Hamming and Levenshtein distances: 1. How to find the positions where a given pattern approximatelly occurs in a given string; 2. How to extract patterns which approximatelly occurs in a certain number of strings from a given set. The first problem, for which there are many polinomial time algorithms, has been receiving a lot of attention since the 60s and entered a new era of discoveries with the advent of computational biology, in the 80s, and the widespread of the Internet and its search engines: both events brought new challenges to be faced by virtue of the large volume of data usually held by such applications and its time constraints. The second problem, much younger, is very challenging due to its computational complexity, approximation hardness and the size of the input data usually held by the most common applications. This problem is also very interesting due to its potential of application. In this work we show computational formulations, algorithms and data structures for those problems. We cover the bit-parallel algorithms of Myers, Baeza-Yates-Gonnet and the Sagots algorithms for patterns extraction. We also cover here the oustanding versatile suffix tree, its generalised version, and a similar data structure: the suffix array. A significant part of the present work focuses on q-gram based filters designed to solve the approximate pattern search problem. More precisely, we cover the filter algorithms of Ukkonen, Jokinen-Ukkonen and Burkhardt-Kärkkäinen, among others. algoritmos bit-paralelos algoritmos de filtragem approximate string matching árvores dos sufixos bit-parallel algorithms busca aproximada de padrões extração de padrões filter algorithms motifs motifs patterns extraction q-gramas q-grams suffix array suffix tree vetor dos sufixos
4	Experimentos Computacionais com ImplementaÃÃes de Conjunto por EndereÃamento Direto e o Problema de Conjunto Independente MÃximo / Computational Experiments with Set Implementations by Direct Addressing and the Maximum Independent Set Problem Marcio Costa Santos 13 September 2013 (has links) CoordenaÃÃo de AperfeiÃoamento de Pessoal de NÃvel Superior / A utilizaÃÃo de vetores de bits Ã prÃtica corrente na representaÃÃo de conjuntos por endereÃamento direto com o intuito de reduzir o espaÃo de memÃria necessÃrio e melhorar o desempenho de aplicaÃÃes com uso de tÃcnicas de paralelismo em bits. Nesta dissertaÃÃo, examinamos implementaÃÃes para representaÃÃo de conjuntos por endereÃamento direto. A estrutura bÃsica nessas implementaÃÃes Ã o vetor de bits. No entanto, alÃm dessa estrutura bÃsica, implementamos tambÃm duas variaÃÃes. A primeira delas consiste em uma estratificaÃÃo de vetores de bits, enquanto a segunda emprega uma tabela de dispersÃo. As operaÃÃes associadas Ãs estruturas implementadas sÃo a inclusÃo ou remoÃÃo de um elemento do conjunto e a uniÃo ou interseÃÃo de dois conjuntos. Especial atenÃÃo Ã dada ao uso de paralelismo em bits nessas operaÃÃes. As implementaÃÃes das diferentes estruturas nesta dissertaÃÃo utilizam uma interface e uma implementaÃÃo abstrata comuns, nas quais as operaÃÃes sÃo especificadas e o paralelismo em bits Ã explorado. A diferenÃa entre as implementaÃÃes estÃ apenas na estrutura utilizada. Uma comparaÃÃo experimental Ã realizada entre as diferentes estruturas utilizando algoritmos enumerativos para o problema de conjunto independente mÃximo. Duas abordagens sÃo utilizadas na implementaÃÃo de algoritmos enumerativos para o problema de conjunto independente mÃximo, ambas explorando o potencial de paralelismo em bits na representaÃÃo do grafo e na operaÃÃo sobre subconjuntos de vÃrtices. A primeira delas Ã um algoritmo do tipo {em branch-and-boound} proposto na literatura e a segunda emprega o mÃtodo das bonecas russas. Em ambos os casos, o uso de paralelismo em bits proporciona ganhos de eficiÃncia quando empregado no cÃlculo de limites inferiores baseados em cobertura por cliques. Resultados de experimentos computacionais sÃo apresentados como forma de comparaÃÃo entre os dois algoritmos e como forma de avaliaÃÃo das estruturas implementadas. Esses resultados permitem concluir que o algoritmo baseado no mÃtodo das bonecas russas Ã mais eficiente quanto ao tempo de execuÃÃo e quanto ao consumo de memÃria. AlÃm disso, os resultados experimentais mostram tambÃm que o uso de estratificaÃÃo e tabelas de dispersÃo permitem ainda maior eficiÃncia no caso de grafos com muito vÃrtices e poucas arestas. / The use of bit vectors is a usual practice for represent sets by direct addressing with the aim of reduce memory consumed and improve efficiency of applications with the use of bit parallel techniques. In this text, we study implementations for represent sets by direct addressed. The basic structure in this implementations is the bit vector. Besides that basic implementation, we implement two variations also. The first one is a stratification of the bit vector, while the second uses a hash table. The operations linked to the implemented structure are include and remove an element and the union and intersection of two sets. Especial attention is given to the use of bit parallel in this condition. The implementation of the different structures in this work use an base interface and a base abstract class, where the operations are defined and the bit parallel is used. An experimental comparative between this structures is carry out using enumerative algorithms for the maximum stable set problem. Two approaches are used in the implementation of the enumerative algorithms for the maximum stable set problem, both using the bit parallel in the representation of the graph and on the operations with subsets of vertices. The first one is a known branch-and-bound algorithm and the second uses the Russian dolls method. In both cases, the use of bit parallel improve efficiency when the lower bounds are calculated based in a clique cover of the vertices. The results of computational experiments are presented as comparison between the two algorithms and as an assessment of the structures implemented. These results show that the algorithm based on the method Russian Dolls is more efficient regarding runtime and the memory consumed. Furthermore, the experimental results also show that the use stratification and hash tables also allow more efficiency in the case of sparse graphs. Conjunto independente em grafos MÃtodo de branch-and-bound MÃtodo das Bonecas Russas Stable set of graphs Direct addressed and bit parallel Russian dolls method Branch-and-bound
5	Filtros para a busca e extração de padrões aproximados em cadeias biológicas / Filter Algorithms for Approximate Patterns Matching and Extraction from Biological Strings Domingos Soares Neto 10 September 2008 (has links) Esta dissertação de mestrado aborda formulações computacionais e algoritmos para a busca e extração de padrões em cadeias biológicas. Em particular, o presente texto concentra-se nos dois problemas a seguir, considerando-os sob as distâncias de Hamming e Levenshtein: a) como determinar os locais nos quais um dado padrão ocorre de modo aproximado em uma cadeia fornecida; b) como extrair padrões que ocorram de modo aproximado em um número significativo de cadeias de um conjunto fornecido. O primeiro problema, para o qual já existem diversos algoritmos polinomiais, tem recebido muita atenção desde a década de 60, e ganhou novos ares com o advento da biologia computacional, nos idos dos anos 80, e com a popularização da Internet e seus mecanismos de busca: ambos os fenômenos trouxeram novos obstáculos a serem superados, em razão do grande volume de dados e das bastante justas restrições de tempo inerentes a essas aplicações. O segundo problema, de surgimento um pouco mais recente, é intrinsicamente desafiador, em razão de sua complexidade computacional, do tamanho das entradas tratadas nas aplicações mais comuns e de sua dificuldade de aproximação. Também é de chamar a atenção o seu grande potencial de aplicação. Neste trabalho são apresentadas formulações adequadas dos problemas abordados, assim como algoritmos e estruturas de dados essenciais ao seu estudo. Em especial, estudamos a extremamente versátil árvore dos sufixos, assim como uma de suas generalizações e sua estrutura irmã: o vetor dos sufixos. Grande parte do texto é dedicada aos filtros baseados em q-gramas para a busca aproximada de padrões e algumas de suas mais recentes variações. Estão cobertos os algoritmos bit-paralelos de Myers e Baeza-Yates-Gonnet para a busca de padrões; os algoritmos de Sagot para a extração de padrões; os algoritmos de filtragem de Ukkonen, Jokinen-Ukkonen, Burkhardt-Kärkkäinen, entre outros. / This thesis deals with computational formulations and algorithms for the extraction and search of patterns from biological strings. In particular, the present text focuses on the following problems, both considered under Hamming and Levenshtein distances: 1. How to find the positions where a given pattern approximatelly occurs in a given string; 2. How to extract patterns which approximatelly occurs in a certain number of strings from a given set. The first problem, for which there are many polinomial time algorithms, has been receiving a lot of attention since the 60s and entered a new era of discoveries with the advent of computational biology, in the 80s, and the widespread of the Internet and its search engines: both events brought new challenges to be faced by virtue of the large volume of data usually held by such applications and its time constraints. The second problem, much younger, is very challenging due to its computational complexity, approximation hardness and the size of the input data usually held by the most common applications. This problem is also very interesting due to its potential of application. In this work we show computational formulations, algorithms and data structures for those problems. We cover the bit-parallel algorithms of Myers, Baeza-Yates-Gonnet and the Sagots algorithms for patterns extraction. We also cover here the oustanding versatile suffix tree, its generalised version, and a similar data structure: the suffix array. A significant part of the present work focuses on q-gram based filters designed to solve the approximate pattern search problem. More precisely, we cover the filter algorithms of Ukkonen, Jokinen-Ukkonen and Burkhardt-Kärkkäinen, among others. algoritmos bit-paralelos algoritmos de filtragem árvores dos sufixos busca aproximada de padrões extração de padrões motifs q-gramas vetor dos sufixos approximate string matching bit-parallel algorithms filter algorithms motifs patterns extraction q-grams suffix array suffix tree

1

Page generated in 0.0449 seconds