1

Efficient solutions for bioinformatics applications using GPUs

Liu, Chi-man, 廖志敏 January 2015 (has links)
Over the past few years, DNA sequencing technology has been advancing at such a fast pace that computer hardware and software can hardly meet the ever-increasing demand for sequence analysis. A natural approach to boosting analysis efficiency is parallelization, which divides the problem into smaller ones that are solved simultaneously on multiple execution units. Common architectures such as multi-core CPUs and clusters can increase throughput to some extent, but their hardware setup and maintenance costs are prohibitive. Fortunately, the newly emerged general-purpose GPU programming paradigm gives us a low-cost alternative for parallelization. This thesis presents GPU-accelerated algorithms for several problems in bioinformatics, along with implementations that demonstrate their power in handling enormous volumes of sequencing data; the GPU imposes totally different limitations and calls for different optimization techniques than the CPU. The first tool presented is SOAP3-dp, a DNA short-read aligner highly optimized for speed. Prior to SOAP3-dp, the fastest short-read aligner was its predecessor SOAP2, which was capable of aligning 1 million 100-bp reads in 5 minutes. SOAP3-dp beats this record by aligning the same volume in only 10 seconds. The key to unlocking this unprecedented speed is the revamped BWT engine underlying SOAP3-dp: all data structures and associated operations have been tailor-made for the GPU to achieve optimized performance. Experiments show that SOAP3-dp not only excels in speed, but also outperforms other aligners in both alignment sensitivity and accuracy. The next tools are for constructing data structures, namely the Burrows-Wheeler transform (BWT) and de Bruijn graphs (DBGs), to facilitate genome assembly of short reads, especially large metagenomics data. The BWT index for a set of short reads has recently found use in string-graph assemblers [44], as it provides a succinct way of representing huge string graphs that would otherwise exceed the main-memory limit. Constructing the BWT index for a million reads is by itself not an easy task, let alone optimizing it for the GPU. Another class of assemblers, the DBG-based assemblers, also faces the same problem. This thesis presents construction algorithms for both the BWT and DBGs in succinct form. In our experiments, we constructed the succinct DBG for a metagenomics data set with over 200 gigabases in 3 hours, and the resulting DBG consumed only 31.2 GB of memory. We also constructed the BWT index for 10 million 100-bp reads in 40 minutes using 4 quad-core machines. Lastly, we introduce a SNP detection tool, iSNPcall, which detects SNPs from a set of reads. Given a set of user-supplied annotated SNPs, iSNPcall focuses only on alignments covering these SNPs, which greatly accelerates the detection of SNPs at the prescribed loci. The annotated SNPs also help us easily distinguish sequencing errors from authentic SNP alleles. This is in contrast to the traditional de novo method, which aligns reads onto the reference genome and then filters out inauthentic mismatches probabilistically. In comparisons across several applications, iSNPcall was found to give higher accuracy than the de novo method, especially for samples with low coverage. / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
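The BWT index at the core of this abstract can be illustrated with a minimal CPU sketch. The Python code below builds the Burrows-Wheeler transform of a toy sequence and counts pattern occurrences with FM-index-style backward search; it is only an illustration of the underlying data structure, not SOAP3-dp's GPU implementation, and the function names are assumptions.

```python
# Minimal sketch of the BWT and backward search that BWT-based aligners build on.
def bwt(text):
    """Build the BWT of `text` (terminated with '$') via sorted rotations."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def backward_search(bwt_str, pattern):
    """Count occurrences of `pattern` using LF-mapping over the BWT."""
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    first = {}          # first[c]: number of characters lexicographically < c
    total = 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    def occ(ch, i):      # occurrences of ch in bwt_str[:i], naive for clarity
        return bwt_str[:i].count(ch)
    lo, hi = 0, len(bwt_str)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ(ch, lo)
        hi = first[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo

print(bwt("ACGTACGT"))                           # BWT string of a toy read
print(backward_search(bwt("ACGTACGT"), "ACG"))   # -> 2
```

A production aligner would replace the naive occurrence counting with sampled occurrence tables laid out for coalesced GPU memory access.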
2

Modular-MPC accelerators for cost-effective 3D visualisation based on surface rendering

Gong, Weizhong January 1997 (has links)
No description available.
3

High throughput mass spectrometry based peptide identification search engine by GPUs

Li, You 02 November 2015 (has links)
Mass spectrometry (MS)-based protein and peptide identification has become a solid method in proteomics. In high-throughput proteomics research, the "shotgun" method has been widely applied. Database searching is currently the main method of tandem mass spectrometry-based protein identification in shotgun proteomics. The most widely used traditional search engines search spectra against a database of identified protein sequences. A search engine is evaluated by both its efficiency and its effectiveness. With the development of proteomics, both the scale and the complexity of the related data are increasing steadily. As a result, existing search engines face serious challenges. First, the sizes of protein sequence databases are ever increasing: from IPI.Human.v3.22 to IPI.Human.v3.49, the number of protein sequences increased by nearly one third. Second, the increasing demand for searches against semi-specific or non-specific peptides results in a search space that is approximately 10 to 100 times larger. Finally, post-translational modifications (PTMs) produce exponentially more modified peptides; the Unimod database (http://www.unimod.org) currently includes more than 1000 types of PTMs. We analyzed the entire identification workflow and made three observations. First, most search engines spend 50% to 90% of their total time in the scoring module, the most widely used of which is the spectrum dot product (SDP)-based scoring module. Second, nearly half of the scoring operations are redundant, costing time without increasing effectiveness. Third, more than half of the spectra cannot be identified via a database search alone, but the identified spectra are related to the unidentified ones, which can be clustered by their distances. Based on the above observations, we designed and implemented a new search engine for protein and peptide identification that includes three key modules. First, a GPU-based parallel index system organizes the protein database and the spectra with no redundant data, low search computation complexity, and no limitation on the scale of the protein database. Second, a graphics processing unit (GPU)-based SDP module adopts GPUs to accelerate the most time-consuming step in the process. Third, a k-means-based spectrum-clustering module assigns the unidentified spectra to the identified spectra for further analysis. As general-purpose high-performance parallel hardware, GPUs are promising platforms for accelerating database searches in the protein identification process. Our parallel index system accelerated the entire identification process two to five times with no loss of effectiveness and achieved around 80% of linear speedup on a cluster; the index system can also be easily adopted by other search engines. We also designed and implemented a parallel SDP-based scoring module on GPUs that makes efficient use of GPU registers and shared memory; a single GPU was 30 to 60 times faster than the central processing unit (CPU)-based version. We also implemented our algorithm on a GPU cluster and achieved approximately linear acceleration. In addition, the k-means-based spectrum-clustering module on GPUs can assign the unidentified spectra to the identified spectra at 20 times the speed of the normal k-means spectrum-clustering algorithm.
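The scoring hot spot identified in this abstract can be sketched concisely. The NumPy snippet below computes a binned spectrum dot product (SDP) between an experimental and a theoretical spectrum; the binning scheme, parameter values, and names are assumptions for illustration, not the thesis's GPU kernel.

```python
import numpy as np

def bin_spectrum(peaks, bin_width=0.5, max_mz=2000.0):
    """Convert (m/z, intensity) peaks into a fixed-length, unit-norm intensity vector."""
    vec = np.zeros(int(max_mz / bin_width))
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if idx < len(vec):
            vec[idx] += intensity
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def sdp_score(experimental, theoretical):
    """Normalized spectrum dot product (cosine similarity) between two binned spectra."""
    return float(np.dot(experimental, theoretical))

exp_spec = bin_spectrum([(175.1, 300.0), (262.1, 120.0), (375.2, 80.0)])
theo_spec = bin_spectrum([(175.1, 1.0), (262.1, 1.0), (390.2, 1.0)])
print(sdp_score(exp_spec, theo_spec))
```

On a GPU, each candidate peptide's SDP would be computed by an independent thread block, which is what makes this step attractive to offload.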
4

Development and applications of GPU based medical image registration

Gruslys, Audrūnas January 2014 (has links)
No description available.
5

Dynamic warp formation : exploiting thread scheduling for efficient MIMD control flow on SIMD graphics hardware

Fung, Wilson Wai Lun 11 1900 (has links)
Recent advances in graphics processing units (GPUs) have resulted in massively parallel hardware that is easily programmable and widely available in commodity desktop computer systems. GPUs typically use single-instruction, multiple-data (SIMD) pipelines to achieve high performance with minimal overhead for control hardware. Scalar threads running the same computing kernel are grouped together into SIMD batches, sometimes referred to as warps. While SIMD is ideally suited for simple programs, recent GPUs include control flow instructions in the GPU instruction set architecture and programs using these instructions may experience reduced performance due to the way branch execution is supported by hardware. One solution is to add a stack to allow different SIMD processing elements to execute distinct program paths after a branch instruction. The occurrence of diverging branch outcomes for different processing elements significantly degrades performance using this approach. In this thesis, we propose dynamic warp formation and scheduling, a mechanism for more efficient SIMD branch execution on GPUs. It dynamically regroups threads into new warps on the fly following the occurrence of diverging branch outcomes. We show that a realistic hardware implementation of this mechanism improves performance by an average of 47% for an estimated area increase of 8%. / Applied Science, Faculty of / Electrical and Computer Engineering, Department of / Graduate
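The regrouping idea can be sketched in a few lines of Python: threads that diverged at a branch are grouped by their next program counter and packed into new, more fully populated warps instead of running with masked-off lanes. The warp size and data structures here are simplified assumptions, not the hardware design proposed in the thesis.

```python
WARP_SIZE = 4

def form_warps(threads):
    """Group ready threads by their next program counter, then pack them into warps."""
    by_pc = {}
    for tid, pc in threads:
        by_pc.setdefault(pc, []).append(tid)
    warps = []
    for pc, tids in by_pc.items():
        for i in range(0, len(tids), WARP_SIZE):
            warps.append((pc, tids[i:i + WARP_SIZE]))
    return warps

# Eight threads diverge at a branch: half take PC 0x20, half take PC 0x40.
# Static warps of four would each run at 50% SIMD utilization; regrouping
# by PC yields two full warps instead of four half-empty ones.
threads = [(0, 0x20), (1, 0x40), (2, 0x20), (3, 0x40),
           (4, 0x20), (5, 0x40), (6, 0x20), (7, 0x40)]
for pc, lanes in form_warps(threads):
    print(hex(pc), lanes)
```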
6

Computer vision applications on graphics processing units

Ohmer, Julius Fabian January 2007 (has links)
Over the last few years, commodity Graphics Processing Units (GPUs) have evolved from fixed graphics pipeline processors into more flexible and powerful data-parallel processors. These stream processors are capable of sustaining computation rates greater than ten times that of a single-core CPU. GPUs are inexpensive and are becoming ubiquitous in a wide variety of computer architectures, including desktop and laptop computers, PDAs and cell phones. This research investigates possible ways to use modern GPUs for real-time computer vision and pattern classification tasks. Special attention is paid to algorithms where the power of the CPU is a limiting factor. This is in particular the case for real-time tracking algorithms on video streams, where many candidate regions must be evaluated at once to allow stable tracking of features; they impose a high computational burden on sequential processing units such as the CPU. The implementations presented in this thesis target standard PC platforms rather than expensive dedicated hardware, to allow a broad variety of users to benefit from powerful computer vision applications. In particular, this thesis covers the following topics: 1. First, we present a framework for computer vision on the GPU, which is used as a foundation for the implementation of computer vision methods. 2. We continue with a discussion of GPU-based implementations of kernel methods, including Support Vector Machines and Kernel PCA. 3. Finally, we propose GPU-accelerated implementations of two tracking algorithms: the first uses geometric templates in a gradient vector field, and the second is a color-based approach in a particle filter framework. Both are able to track objects in a video stream. The thesis concludes with a final discussion of the presented methods, proposes directions for further research, and briefly presents the features of the next generation of GPUs.
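The kernel methods mentioned in topic 2 reduce to dense, data-parallel kernel (Gram) matrix computations, which is why they map well to GPUs. The NumPy sketch below computes an RBF kernel matrix as a CPU stand-in for that data-parallel core; it is an illustration under assumed parameters, not the thesis's implementation.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=0.5):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2), computed for all pairs at once."""
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # guard tiny negatives from round-off
    return np.exp(-gamma * sq_dists)

X = np.random.default_rng(0).normal(size=(5, 3))
K = rbf_kernel_matrix(X)
print(K.shape, np.allclose(K, K.T))  # (5, 5) True
```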
7

GPU accelerated sequence alignment

Zhao, Kaiyong 15 November 2016 (has links)
DNA sequence alignment is a fundamental task in gene information processing: it searches for the location of a string (usually newly collected DNA data) in existing, huge DNA sequence databases. Due to the huge amount of newly generated DNA data and the complexity of approximate string matching, sequence alignment has become a time-consuming process, and reducing the alignment time has become a significant research problem. Several string-alignment algorithms based on hash comparison, suffix arrays, and the BWT have been proposed for DNA sequence alignment. Although these algorithms have reached O(N) running time, they still cannot meet the increasing demand when running on traditional CPUs. Recently, GPUs have been widely accepted as efficient accelerators for many scientific and commercial applications. A typical GPU has thousands of processing cores, which can speed up repetitive computations significantly compared to multi-core CPUs. However, sequence alignment is a computation with intensive data access, i.e., it is memory-bound: access to GPU memory and I/O has a more significant influence on performance than the computing capability of the GPU cores. By analyzing GPU memory and I/O characteristics, this thesis develops novel parallel algorithms for DNA sequence alignment applications. The thesis consists of six parts. The first two parts cover basic knowledge of DNA sequence alignment and GPU computing. The third part investigates the performance of data access on different types of GPU memory. The fourth part describes a parallel method to accelerate short-read sequence alignment based on the BWT algorithm. The fifth part proposes a parallel algorithm for accelerating BLASTN, one of the most popular sequence alignment tools, and shows how multi-threaded control and multiple GPU cards can accelerate the BLASTN algorithm significantly. The sixth part concludes the whole thesis. To summarize, by analyzing the layout of GPU memory and the behavior of multithreaded data access, this thesis derives an effective optimization method for sequence alignment on GPUs. The outcomes can help practitioners in bioinformatics improve their working efficiency by significantly reducing sequence alignment time.
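Of the string-matching strategies this abstract contrasts, hash-based seeding is the simplest to sketch. The Python snippet below builds a k-mer index over a toy reference and collects candidate alignment positions for a read; the k-mer length and data structure are assumptions for illustration only, not the thesis's algorithm.

```python
def build_kmer_index(reference, k=4):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index

def seed_positions(read, index, k=4):
    """Collect candidate alignment start positions by looking up the read's k-mers."""
    candidates = set()
    for i in range(len(read) - k + 1):
        for pos in index.get(read[i:i + k], []):
            candidates.add(pos - i)  # shift back to the read's putative start
    return sorted(p for p in candidates if p >= 0)

ref = "ACGTACGTGGACGTTACG"
idx = build_kmer_index(ref)
print(seed_positions("ACGTTACG", idx))  # candidate start positions in ref
```

Each candidate position would then be verified by an exact or approximate extension step, the part that is memory- rather than compute-bound on a GPU.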
8

Runtime Adaptation for Autonomic Heterogeneous Computing

Scogland, Thomas R. 12 December 2014 (has links)
Heterogeneity is increasing across all levels of computing, with the rise of accelerators such as GPUs, FPGAs, and other coprocessors in everything from cell phones to supercomputers. More quietly, it is increasing with the rise of NUMA systems, hierarchical caching, OS noise, and a myriad of other factors. As heterogeneity becomes a fact of life, efficiently managing heterogeneous compute resources is becoming a critical, and ever more complex, task. The focus of this dissertation is to lay the foundation for an autonomic system for heterogeneous computing, employing runtime adaptation to improve performance portability and performance consistency while maintaining or increasing programmability. We investigate heterogeneity arising from a myriad of factors, grouped into the dimensions of locality and capability. This work has resulted in runtime schedulers capable of automatically detecting and mitigating heterogeneity in physically homogeneous systems through MPI, in adaptive coscheduling for physically heterogeneous accelerator-based systems, and in a synthesis of the two that addresses multiple levels of heterogeneity as a coherent whole. We also discuss our current work toward the next generation of fine-grained scheduling and synchronization across heterogeneous platforms in the design of a highly scalable and portable concurrent queue for many-core systems. Each component addresses aspects of the urgent need for automated management of the extreme and ever-expanding complexity introduced by heterogeneity. / Ph. D.
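As a rough illustration of the runtime adaptation theme, the Python sketch below rebalances a work split between two simulated devices based on measured throughput; the devices, timing model, and update rule are assumptions for demonstration, not the dissertation's scheduler.

```python
import time

def run_chunk(device_speed, items):
    """Pretend to process `items` work items; return the modeled elapsed time."""
    elapsed = items / device_speed
    time.sleep(min(elapsed, 0.01))  # keep the demo fast
    return elapsed

def adaptive_split(total_items, iterations=5):
    ratio = 0.5  # fraction of work sent to device A
    for _ in range(iterations):
        a_items = int(total_items * ratio)
        b_items = total_items - a_items
        t_a = run_chunk(device_speed=400.0, items=a_items)  # e.g. an accelerator
        t_b = run_chunk(device_speed=100.0, items=b_items)  # e.g. a CPU
        # Rebalance toward the device that processed its share faster.
        throughput_a = a_items / t_a if t_a > 0 else 0.0
        throughput_b = b_items / t_b if t_b > 0 else 0.0
        ratio = throughput_a / (throughput_a + throughput_b)
        print(f"fraction of work sent to device A: {ratio:.2f}")
    return ratio

adaptive_split(total_items=10_000)  # converges toward the 4:1 speed ratio
```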
