1

Exploiting Multigrain Parallelism in Pairwise Sequence Search on Emergent CMP Architectures

Aji, Ashwin Mandayam 25 August 2008 (has links)
Emerging hybrid multi-core and many-core compute platforms deliver unprecedented performance within a single chip and are making rapid strides toward the commodity processor market; they are widely expected to replace the multi-core processors in existing High-Performance Computing (HPC) infrastructures such as large-scale clusters, grids, and supercomputers. Meanwhile, in bioinformatics, the size of genomic databases is doubling every 12 months, so novel approaches to parallelizing sequence search algorithms have become increasingly important. This thesis takes a significant step toward bridging the gap between software and hardware by presenting an efficient and scalable model that accelerates a popular sequence alignment algorithm by exploiting the multigrain parallelism exposed by emerging multiprocessor architectures. Specifically, we parallelize the Smith-Waterman dynamic programming algorithm both within and across multiple Cell Broadband Engines and within an nVIDIA GeForce general-purpose graphics processing unit (GPGPU).

Cell Broadband Engine: We parallelize the Smith-Waterman algorithm within a Cell node by performing a blocked data decomposition of the dynamic programming matrix, followed by pipelined execution of the blocks across the synergistic processing elements (SPEs) of the Cell. We also introduce novel optimization methods that fully utilize the vector processing power of the SPE. As a result, we achieve near-linear scalability, or near-constant efficiency, for up to 16 SPEs on dual-Cell QS20 blades, and our design scales readily to more cores if available. We further extend this design to accelerate the Smith-Waterman algorithm across nodes on both the IBM QS20 and the PlayStation 3 Cell cluster platforms, achieving a maximum speedup of 44 over execution on a single Cell node. We then introduce an analytical model that accurately estimates the execution times of parallel sequence alignments, and of wavefront algorithms in general, on Cell cluster platforms. Lastly, we contribute and evaluate TOSS, a Throughput-Oriented Sequence Scheduler, which leverages the performance prediction model and dynamically partitions the available processing elements to align multiple sequences simultaneously. This scheme aligns more sequences per unit time, with an improvement of 33.5% over a naive first-come, first-served (FCFS) scheduler.

nVIDIA GPGPU: We parallelize the Smith-Waterman algorithm on the GPGPU by optimizing the code in stages, including optimal data layout strategies, coalesced memory accesses, and blocked data decomposition techniques. Results show that our methods provide a maximum speedup of 3.6 on the nVIDIA GPGPU over a naive implementation of Smith-Waterman. / Master of Science
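For context across the entries in this listing, the linear-gap form of the Smith-Waterman recurrence (a standard statement; the individual theses may use affine gap penalties, which the abstracts do not specify) is:

```latex
H_{i,j} = \max\left\{\, 0,\;\; H_{i-1,j-1} + s(a_i, b_j),\;\; H_{i-1,j} - g,\;\; H_{i,j-1} - g \,\right\},
\qquad H_{i,0} = H_{0,j} = 0,
```

where s is the substitution score and g the gap penalty. Each cell depends only on its left, upper, and upper-left neighbors, so every cell on an anti-diagonal can be computed independently; this is the wavefront parallelism that the blocked, pipelined SPE design above exploits.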
2

Smith-Waterman Sequence Alignment For Massively Parallel High-Performance Computing Architectures

Steinfadt, Shannon Irene 19 April 2010 (has links)
No description available.
3

MR-CUDASW - GPU accelerated Smith-Waterman algorithm for medium-length (meta)genomic data

2014 November 1900 (has links)
The idea of using a graphics processing unit (GPU) for more than graphics output has been around for quite some time in scientific communities. However, it is only recently that its benefits for a range of compute-intensive tasks in bioinformatics and the life sciences have been recognized. This thesis investigates improving the performance of the overlap-determination stage of an Overlap-Layout-Consensus (OLC)-based assembler by using a GPU-based implementation of the Smith-Waterman algorithm. An existing GPU-accelerated sequence alignment algorithm is adapted and extended to reduce its completion time. A number of improvements and changes are made to the original software: the workload distribution, query profile construction, and thread scheduling techniques implemented by the original program are replaced by custom methods specifically designed to handle medium-length reads. Accordingly, this algorithm is the first highly parallel solution specifically optimized to process medium-length nucleotide reads (DNA/RNA) from modern sequencing machines (e.g., Ion Torrent). Results show that the software reaches up to 82 GCUPS (giga cell updates per second) on a single-GPU graphics card in commodity desktop hardware, making it the fastest GPU-based implementation of the Smith-Waterman algorithm tailored to medium-length nucleotide reads. Although designed for running Smith-Waterman on medium-length nucleotide sequences, the program also shows great potential for improving heterogeneous computing with CUDA-enabled GPUs in general, and it is expected to contribute to other research problems that require sensitive pairwise alignment applied to a large number of reads. Our results show that it is possible to improve the performance of bioinformatics algorithms by taking full advantage of the compute resources of the underlying commodity hardware; these results are especially encouraging since GPU performance is growing faster than that of multi-core CPUs.
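To put the 82 GCUPS figure in perspective (the workload below is an illustrative assumption, not one reported in the thesis): GCUPS counts dynamic programming cell updates per second, so scanning a hypothetical 400-base read against 10^9 database residues takes roughly

```latex
t \approx \frac{|q| \times |d|}{\text{GCUPS} \times 10^9}
= \frac{400 \times 10^{9}}{82 \times 10^{9}} \approx 4.9\ \text{s},
```

since the full dynamic programming matrix has |q| × |d| cells.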
4

Desenvolvimento de hardware reconfigurável dedicado para suporte ao alinhamento de seqüencias (Development of dedicated reconfigurable hardware to support sequence alignment)

Silva, Fábio Vinícius Pinto e 17 September 2007 (has links)
Dissertação (mestrado)—Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2007. / Finding and visualizing similarities between DNA sequences deepens our knowledge of the genomes of organisms in molecular biology. With the number of sequences available for querying in some databases growing exponentially, a challenge arises for computer science: building computing systems with enough performance to compare genomic sequences in time frames useful for research, and at a viable cost. Heuristic solutions are frequently used because of the large computational time required by exact solutions, which currently have quadratic time complexity on conventional computers, making their practical use difficult for sequences of realistic length. The main objective of this work is to make exact algorithms for genomic sequence comparison practical by accelerating the computation of their results. A systolic array of processing elements in reconfigurable hardware is proposed, exploiting the potential parallelism of the Smith-Waterman dynamic programming algorithm to reduce its time complexity from quadratic to linear. A solution is also proposed to mitigate the communication bottleneck expected of a naive implementation. In addition to the proposed system, the FPGA prototype is described, including an analysis of the performance obtained.
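The quadratic-to-linear claim is easiest to see in software form. Below is a minimal sketch (plain sequential Python with illustrative scoring parameters, not the thesis's hardware design) that computes the matrix anti-diagonal by anti-diagonal, making explicit that all cells on one anti-diagonal are independent: a systolic array assigns one processing element per query position and updates an entire anti-diagonal per clock step, so an m×n matrix finishes in O(m+n) steps instead of O(m·n).

```python
# Sketch: Smith-Waterman computed anti-diagonal by anti-diagonal.
# Cells on anti-diagonal d = i + j depend only on diagonals d-1 and
# d-2, so the inner loop's iterations are mutually independent and
# can all run in parallel (one hardware PE per query position).
def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=2):
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for d in range(2, m + n + 1):                      # anti-diagonal i + j
        for i in range(max(1, d - n), min(m, d - 1) + 1):  # independent cells
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # diagonal d - 2
                          H[i - 1][j] - gap,     # diagonal d - 1
                          H[i][j - 1] - gap)     # diagonal d - 1
            best = max(best, H[i][j])
    return best

print(smith_waterman_wavefront("ACACACTA", "AGCACACA"))  # best local score
```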
5

A Framework for the Design and Analysis of High-Performance Applications on FPGAs using Partial Reconfiguration

Anderson, Richard D 12 August 2016 (has links)
The field-programmable gate array (FPGA) is a dynamically reconfigurable digital logic chip used to implement custom hardware. The large densities of modern FPGAs and their capability for on-the-fly reconfiguration have made the FPGA a viable alternative to fixed-logic hardware chips such as the ASIC. In high-performance computing, FPGAs are used as co-processors to speed up computationally intensive processes or as autonomous systems that realize a complete hardware application. However, due to the limited capacity of FPGA logic resources, denser FPGAs must be purchased if more logic resources are required to realize all the functions of a complex application. Alternatively, partial reconfiguration (PR) can be used to swap idle components of the application with active components on demand. This research uses PR to swap components to improve the performance of an application given the limited logic resources of smaller but economical FPGAs; the swap is called "resource-sharing PR". In a pipelined design of multiple hardware modules (pipeline stages), resource-sharing PR relieves pipeline bottlenecks by reconfiguring other pipeline stages, typically those idle while waiting for data from a bottleneck, into an additional parallel bottleneck module. The target pipeline of this research is a two-stage "slow-to-fast" pipeline in which the data traversing the pipeline transitions from a relatively slow bottleneck stage to a fast stage. A two-stage pipeline combining FPGA-based hardware implementations of two well-known bioinformatics search algorithms, the X! Tandem algorithm and the Smith-Waterman algorithm, is implemented for this research; the implemented pipeline demonstrates these characteristics. The experimental results show that, in a database of unknown peptide spectra, when matching spectra with 388 or more peaks, performing resource-sharing PR to instantiate a parallel X! Tandem module is worth the cost of PR. In addition, from timings gathered during the experiments, a general formula was derived for determining the value of performing PR upon a fast module.
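The thesis's derived formula is not given in the abstract, but the shape of the trade-off can be sketched with back-of-envelope arithmetic (all symbols here are illustrative assumptions, not the thesis's notation). With N items flowing through a pipeline whose bottleneck stage takes t_A per item, throughput is limited to 1/t_A; reconfiguring the idle fast stage into a second bottleneck module, at a one-time reconfiguration cost t_PR, pays off roughly when

```latex
t_{PR} + \frac{N\, t_A}{2} \;<\; N\, t_A
\quad\Longleftrightarrow\quad
N \;>\; \frac{2\, t_{PR}}{t_A},
```

i.e., PR is worthwhile once the remaining workload is large enough to amortize the reconfiguration overhead, consistent with the abstract's observation that PR pays off only above a workload threshold.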
6

Acceleration of CFD and Data Analysis Using Graphics Processors

Khajeh Saeed, Ali 01 February 2012 (has links)
Graphics processing units function well as high-performance computing devices for scientific computing. Their non-standard processor architecture and high memory bandwidth allow graphics processing units (GPUs) to provide some of the best performance in terms of FLOPS per dollar. Recently these capabilities became accessible for general-purpose computation through the CUDA programming environment on NVIDIA GPUs and the ATI Stream Computing environment on ATI GPUs. Many applications in computational science are constrained by memory access speeds and can be accelerated significantly by using GPUs as the compute engine, giving the personal desktop computer a processing capacity that competes with supercomputers. Graphics processing units represent an energy-efficient architecture for high-performance computing in flow simulations and many other fields. This document reviews the graphics processing unit, its features, and its limitations.
7

AN ANALYSIS OF SUBSTANCE USE RELATED LYRICS IN TWITTER SPACE

Luo, Waylon Wolf 14 November 2022 (has links)
No description available.
8

A PAIRWISE COMPARISON OF DNA SEQUENCE ALIGNMENT USING AN OPENMP IMPLEMENTATION OF THE SWAMP PARALLEL SMITH-WATERMAN ALGORITHM

Cuevas, Tristan Lee 22 April 2015 (has links)
No description available.
9

Generalizing the Utility of Graphics Processing Units in Large-Scale Heterogeneous Computing Systems

Xiao, Shucai 03 July 2013 (has links)
Today, heterogeneous computing systems are widely used to meet the increasing demand for high-performance computing. These systems commonly use powerful and energy-efficient accelerators to augment general-purpose processors (i.e., CPUs). The graphics processing unit (GPU) is one such accelerator. Originally designed solely for graphics processing, GPUs have evolved into programmable processors that can deliver massive parallel processing power for general-purpose applications. Built from SIMD (Single Instruction, Multiple Data) components, the current GPU architecture is well suited to data-parallel applications in which the execution of each task is independent. With the delivery of programming models such as the Compute Unified Device Architecture (CUDA) and the Open Computing Language (OpenCL), programming GPUs has become much easier than before. However, developing and optimizing an application on a GPU is still a challenging task, even for well-trained computing experts. Such programming tasks become even more challenging in large-scale heterogeneous systems, particularly in the context of utility computing, where GPU resources are used as a service. These challenges are largely due to limitations in current programming models: (1) no intra- and inter-GPU cooperative mechanisms are natively supported; (2) current programming models support only the use of locally installed GPUs; and (3) to use GPUs on another node, application programs must explicitly call application programming interface (API) functions for data communication.

To reduce the mapping effort and better utilize GPU resources, we investigate generalizing the utility of GPUs in large-scale heterogeneous systems with GPUs as accelerators. We generalize the utility of GPUs through transparent virtualization, which enables applications to view all GPUs in the system as if they were installed locally; as a result, all GPUs in the system can be used as local GPUs. Moreover, GPU virtualization is a key capability for supporting the notion of "GPU as a service." Specifically, we propose the Virtual OpenCL (VOCL) framework for the transparent virtualization of GPUs. To achieve good performance, we optimize and extend the framework in three respects: (1) we optimize VOCL by reducing the data transfer overhead between the local and remote nodes; (2) we propose GPU synchronization to reduce the overhead of switching back and forth when multiple kernel launches are needed for data communication across different compute units on a GPU; and (3) we extend VOCL to support live virtual GPU migration for quick system maintenance and load rebalancing across GPUs.

With the above optimizations and extensions, we thoroughly evaluate VOCL along three dimensions: (1) the performance improvement from each of our optimization strategies; (2) the overhead of using remote GPUs, measured with several microbenchmark suites as well as a few real-world applications; and (3) the overhead and the benefit of live virtual GPU migration. Our experimental results indicate that VOCL can generalize the utility of GPUs in large-scale systems at a reasonable virtualization and migration cost. / Ph. D.
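The core idea of transparent virtualization — every GPU in the system appearing local to the application — can be illustrated with a minimal forwarding proxy. This is a hedged sketch of the general pattern, not VOCL's implementation; the class, method names, and wire format are assumptions:

```python
# Sketch of transparent accelerator virtualization: a client-side
# proxy intercepts API calls and forwards them to the node that
# physically hosts the GPU, so the caller cannot tell whether the
# device is local or remote. A real framework like VOCL must also
# ship data buffers; as the abstract notes, minimizing that transfer
# overhead between local and remote nodes is where the cost lies.
import json
import socket

class RemoteGPUProxy:
    """Forwards accelerator API calls to a remote node over a socket."""

    def __init__(self, host: str, port: int):
        self.conn = socket.create_connection((host, port))

    def call(self, api_name: str, *args):
        # Serialize the call as a length-prefixed JSON message.
        request = json.dumps({"api": api_name, "args": args}).encode()
        self.conn.sendall(len(request).to_bytes(4, "big") + request)
        # Read the length-prefixed reply and return the decoded result.
        length = int.from_bytes(self._recv_exact(4), "big")
        return json.loads(self._recv_exact(length))

    def _recv_exact(self, n: int) -> bytes:
        buf = b""
        while len(buf) < n:
            chunk = self.conn.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("remote GPU node disconnected")
            buf += chunk
        return buf
```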
10

Searching Biological Sequence Databases Using Distributed Adaptive Computing

Pappas, Nicholas Peter 06 February 2003 (has links)
Genetic research projects can require enormous computing power to process the vast quantities of data available. Further, DNA sequencing projects are generating data at an exponential rate that outpaces the development of microprocessor technology; thus, new, faster methods and techniques for processing this data are needed. One common type of processing involves searching a sequence database for the sequences most similar to a query. Here we present a distributed database search system that utilizes adaptive computing technologies. The search is performed using the Smith-Waterman algorithm, a common sequence comparison algorithm. To reduce the total search time, an initial search is performed using a version of the algorithm, implemented in adaptive computing hardware, designed to carry out this first pass efficiently; a final search is then performed using the complete algorithm. This two-stage search, employing adaptive and distributed hardware, achieves a performance increase of several orders of magnitude over comparable processor-based systems. / Master of Science
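In software terms, the two-stage scheme is a filter-then-verify pipeline. Here is a minimal sketch under assumed interfaces (the thesis's first stage runs in adaptive hardware; `quick_score` and `full_smith_waterman` are hypothetical stand-ins, and `keep` is an illustrative cutoff):

```python
# Sketch of a two-stage database search: a cheap first pass ranks
# every database sequence, and the exact (more expensive) algorithm
# is rerun only on the most promising candidates.
from typing import Callable, Sequence

def two_stage_search(query: str,
                     database: Sequence[str],
                     quick_score: Callable[[str, str], float],
                     full_smith_waterman: Callable[[str, str], float],
                     keep: int = 100):
    # Stage 1: fast, approximate scoring of the whole database
    # (in the thesis, an FPGA implementation tuned for this pass).
    ranked = sorted(database,
                    key=lambda seq: quick_score(query, seq),
                    reverse=True)
    # Stage 2: complete Smith-Waterman on only the top candidates.
    finalists = ranked[:keep]
    return sorted(((full_smith_waterman(query, seq), seq)
                   for seq in finalists), reverse=True)
```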
