1 |
Efficient solutions for bioinformatics applications using GPUs. Liu, Chi-man, 廖志敏. January 2015
Over the past few years, DNA sequencing technology has been advancing at such a fast pace that computer hardware and software can hardly meet the ever-increasing demand for sequence analysis. A natural approach to boost analysis efficiency is parallelization, which divides the problem into smaller ones that are to be solved simultaneously on multiple execution units. Common architectures such as multi-core CPUs and clusters can increase the throughput to some extent, but the hardware setup and maintenance costs are prohibitive. Fortunately, the newly emerged general-purpose GPU programming paradigm gives us a low-cost alternative for parallelization.
This thesis presents GPU-accelerated algorithms for several problems in bioinformatics, along with implementations that demonstrate their power in handling enormous amounts of data. Designing such algorithms is challenging because the GPU has totally different limitations and optimization techniques from the CPU.
The first tool presented is SOAP3-dp, a DNA short-read aligner highly optimized for speed. Before SOAP3-dp, the fastest short-read aligner was its predecessor SOAP2, which could align 1 million 100-bp reads in 5 minutes. SOAP3-dp beats this record by aligning the same volume in only 10 seconds. The key to this unprecedented speed is the revamped BWT engine underlying SOAP3-dp: all data structures and associated operations have been tailor-made for the GPU to achieve optimized performance. Experiments show that SOAP3-dp not only excels in speed but also outperforms other aligners in both alignment sensitivity and accuracy.
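BWT-based aligners locate reads through backward search over an FM-index. The sketch below shows that textbook exact-match search on the CPU in plain Python; it is not SOAP3-dp's GPU engine, and the naive rotation-sorting construction is only suitable for toy inputs. The function names and the `$` sentinel are illustrative assumptions.

```python
def bwt(text):
    """BWT of text (must end with the unique, smallest sentinel '$'),
    built by sorting all rotations; fine for toy inputs only."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def fm_index(b):
    """C[c]: number of characters in b smaller than c.
    occ[c][i]: number of occurrences of c in b[:i]."""
    alphabet = sorted(set(b))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += b.count(c)
    occ = {c: [0] * (len(b) + 1) for c in alphabet}
    for i, ch in enumerate(b):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ

def count_occurrences(pattern, b, C, occ):
    """Backward search: read the pattern right to left, narrowing the
    interval [lo, hi) of sorted rotations prefixed by the current suffix."""
    lo, hi = 0, len(b)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `bwt("ACGTACG$")` gives `"GT$AACCG"`, and searching it for `"ACG"` returns 2. Each read's search is independent of every other read's, which is what makes batching millions of reads onto GPU threads attractive.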
The next tools construct data structures, namely the Burrows-Wheeler transform (BWT) and de Bruijn graphs (DBGs), that facilitate genome assembly of short reads, especially for large metagenomics data sets. The BWT index of a set of short reads has recently found use in string-graph assemblers [44], as it provides a succinct representation of huge string graphs that would otherwise exceed main memory. Constructing the BWT index for a million reads is not an easy task by itself, let alone optimizing it for the GPU. DBG-based assemblers, another class of assemblers, face the same problem. This thesis presents construction algorithms for both the BWT and DBGs in succinct form. In our experiments, we constructed the succinct DBG for a metagenomics data set with over 200 gigabases in 3 hours, and the resulting DBG consumed only 31.2 GB of memory. We also constructed the BWT index for 10 million 100-bp reads in 40 minutes using 4 quad-core machines.
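To make the DBG concrete: nodes are (k-1)-mers and every distinct k-mer in the reads is an edge from its prefix to its suffix. The sketch below builds that graph with an ordinary hash map. It is deliberately not succinct (the thesis's point is a compact representation), and it ignores reverse complements; `build_dbg` is a hypothetical name.

```python
from collections import defaultdict

def build_dbg(reads, k):
    """De Bruijn graph as {(k-1)-mer: sorted list of successor (k-1)-mers}.
    Each distinct k-mer contributes one edge kmer[:-1] -> kmer[1:]."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(succ) for node, succ in graph.items()}
```

With reads `["ACGTA", "CGTAC"]` and `k = 3`, the k-mers ACG, CGT, GTA and TAC form the cycle AC → CG → GT → TA → AC. A hash map like this one stores every (k-1)-mer string explicitly. That is why memory, rather than time, becomes the limit on metagenomic data, and why a succinct encoding matters.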
Lastly, we introduce a SNP detection tool, iSNPcall, which detects SNPs from a set of reads. Given a set of user-supplied annotated SNPs, iSNPcall focuses only on alignments covering these SNPs, which greatly accelerates SNP detection at the prescribed loci. The annotated SNPs also make it easy to distinguish sequencing errors from authentic SNP alleles. This contrasts with the traditional de novo method, which aligns reads onto the reference genome and then filters out inauthentic mismatches according to probabilistic criteria. In comparisons on several applications, iSNPcall gave higher accuracy than the de novo method, especially for low-coverage samples. / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
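The idea of checking only annotated loci can be illustrated with allele counting. The sketch below is a simplified illustration, not iSNPcall's actual algorithm. It assumes gap-free alignments given as `(start, sequence)` pairs, SNPs given as `{position: (ref, alt)}`, and a hypothetical `min_fraction` genotype threshold. Any base matching neither annotated allele is counted as a likely sequencing error.

```python
def count_alleles(alignments, snps):
    """alignments: list of (start, sequence), 0-based, no indels (assumed).
    snps: dict position -> (ref_allele, alt_allele).
    Returns dict position -> {"ref": n, "alt": n, "other": n}."""
    counts = {pos: {"ref": 0, "alt": 0, "other": 0} for pos in snps}
    for start, seq in alignments:
        for pos, (ref, alt) in snps.items():
            offset = pos - start
            if 0 <= offset < len(seq):  # read covers this annotated locus
                base = seq[offset]
                if base == ref:
                    counts[pos]["ref"] += 1
                elif base == alt:
                    counts[pos]["alt"] += 1
                else:
                    # neither annotated allele: most likely a sequencing error
                    counts[pos]["other"] += 1
    return counts

def call_genotype(c, min_fraction=0.2):
    """Hypothetical rule: genotype from the alt-allele fraction among
    reads supporting either annotated allele; None if no support."""
    total = c["ref"] + c["alt"]
    if total == 0:
        return None
    alt_frac = c["alt"] / total
    if alt_frac < min_fraction:
        return "ref/ref"
    if alt_frac > 1 - min_fraction:
        return "alt/alt"
    return "ref/alt"
```

Because only the annotated alleles count as evidence, a third base at a known locus can be discarded outright. A de novo caller would have to weigh that base probabilistically.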
|
2 |
High throughput mass spectrometry based peptide identification search engine by GPUs. Li, You. 02 November 2015
Mass spectrometry (MS)-based protein and peptide identification has become a solid method in proteomics. In high-throughput proteomics research, the “shotgun” method has been widely applied. Database searching is currently the main method of tandem mass spectrometry-based protein identification in shotgun proteomics. The most widely used traditional search engines search spectra against a database of identified protein sequences, and search engines are evaluated for their efficiency and effectiveness. With the development of proteomics, both the scale and the complexity of the related data are increasing steadily. As a result, existing search engines face serious challenges. First, protein sequence databases keep growing: from IPI.Human.v3.22 to IPI.Human.v3.49, the number of protein sequences increased by nearly one third. Second, the increasing demand for searches against semispecific or nonspecific peptides results in a search space that is approximately 10 to 100 times larger. Finally, posttranslational modifications (PTMs) produce exponentially more modified peptides; the Unimod database (http://www.unimod.org) currently includes more than 1000 types of PTMs. We analyzed the entire identification workflow and made three observations. First, most search engines spend 50% to 90% of their total time in the scoring module, the most widely used of which is the spectrum dot product (SDP)-based scoring module. Second, nearly half of the scoring operations are redundant: they cost time but do not increase effectiveness. Third, more than half of the spectra cannot be identified via a database search alone, but the identified spectra are related to the unidentified ones, which can be clustered by their distances. Based on these observations, we designed and implemented a new search engine for protein and peptide identification that includes three key modules.
First, a GPU-based parallel index system organizes the protein database and the spectra with no redundant data, low search computation complexity, and no limit on the scale of the protein database. Second, the graphics processing unit (GPU)-based SDP module uses GPUs to accelerate the most time-consuming step in the process. Third, a k-means-based spectrum-clustering module classifies the unidentified spectra to the identified spectra for further analysis. As general-purpose high-performance parallel hardware, GPUs are promising platforms for accelerating database searches in the protein identification process. The parallel index system we designed accelerated the entire identification process two to five times with no loss of effectiveness, and achieved around 80% of linear speedup on the cluster. The index system can also be easily adopted by other search engines. We also designed and implemented a parallel SDP-based scoring module on GPUs that makes efficient use of GPU registers and shared memory. A single GPU was 30 to 60 times faster than the central processing unit (CPU)-based version. We also implemented our algorithm on a GPU cluster and achieved approximately linear acceleration. In addition, a GPU-based k-means spectrum-clustering module can classify the unidentified spectra to the identified spectra 20 times faster than the normal k-means spectrum-clustering algorithm.
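A common form of spectrum dot product scoring bins peak m/z values and takes the normalized dot product (cosine) of the two intensity vectors. The sketch below follows that generic formulation; the thesis's exact binning, intensity weighting and peak generation are not given here. The 1.0 Da bin width, the sparse-dict vectors and the function names are all assumptions.

```python
import math

def bin_spectrum(peaks, bin_width=1.0):
    """peaks: list of (mz, intensity). Sum intensities into m/z bins."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz // bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    return vec

def sdp_score(exp_peaks, theo_peaks, bin_width=1.0):
    """Normalized dot product of two binned spectra, in [0, 1]."""
    a = bin_spectrum(exp_peaks, bin_width)
    b = bin_spectrum(theo_peaks, bin_width)
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_candidate(exp_peaks, candidates, bin_width=1.0):
    """candidates: dict peptide -> theoretical peak list (hypothetical input).
    Every (spectrum, candidate) score is independent of the others."""
    return max(candidates, key=lambda p: sdp_score(exp_peaks, candidates[p], bin_width))
```

On a GPU, the natural mapping gives each (spectrum, candidate) pair or tile its own thread or block. That independence is what the scoring module can exploit, and removing redundant pairs through an index directly cuts the number of these calls.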
|
3 |
Development and applications of GPU based medical image registration. Gruslys, Audrūnas. January 2014
No description available.
|
4 |
Computer vision applications on graphics processing units. Ohmer, Julius Fabian. January 2007
Over the last few years, commodity Graphics Processing Units (GPUs) have evolved from fixed graphics pipeline processors into more flexible and powerful data-parallel processors. These stream processors can sustain computation rates more than ten times that of a single-core CPU. GPUs are inexpensive and are becoming ubiquitous in a wide variety of computer architectures, including desktop and laptop computers, PDAs and cell phones. This research investigates possible ways to use modern GPUs for real-time computer vision and pattern classification tasks. Special attention is paid to algorithms where the power of the CPU is a limiting factor. This is particularly the case for real-time tracking algorithms on video streams, where many candidate regions must be evaluated at once to allow stable tracking of features; such algorithms impose a high computational burden on sequential processing units such as the CPU. The implementations proposed in this thesis target standard PC platforms rather than expensive dedicated hardware, so that a broad variety of users can benefit from powerful computer vision applications. In particular, this thesis covers the following topics: 1. A framework for computer vision on the GPU, which serves as the foundation for implementing computer vision methods. 2. GPU-based implementations of kernel methods, including Support Vector Machines and Kernel PCA. 3. GPU-accelerated implementations of two tracking algorithms: the first uses geometric templates in a gradient vector field, and the second is a color-based approach in a particle filter framework. Both are able to track objects in a video stream. The thesis concludes with a discussion of the presented methods, proposes directions for further research, and briefly presents the features of the next generation of GPUs.
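Kernel methods suit data-parallel hardware because their core cost is filling a kernel (Gram) matrix, and every entry can be computed independently. The plain-Python sketch below uses a Gaussian (RBF) kernel, one common choice assumed here for illustration. It is not the thesis's GPU implementation, and the SVM coefficients in the usage example are made up.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def gram_matrix(points, gamma):
    """K[i][j] = k(x_i, x_j). No entry depends on another, so a GPU can
    assign one thread (or one tile of a block) per entry."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]

def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """SVM decision value f(x) = sum_i c_i * k(s_i, x) + b, where each
    c_i = alpha_i * y_i comes from an already-trained model."""
    return sum(c * rbf_kernel(s, x, gamma)
               for s, c in zip(support_vectors, dual_coefs)) + bias
```

For instance, with support vectors `[[0, 0], [2, 2]]`, coefficients `[1.0, -1.0]`, bias 0 and `gamma = 1.0`, the point `[0, 0]` gets a positive decision value and `[2, 2]` a negative one. Kernel PCA reuses the same Gram matrix, so one data-parallel kernel-evaluation routine serves both methods.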
|
5 |
GPU accelerated sequence alignment / Zhao Kaiyong. Zhao, Kaiyong. 15 November 2016
DNA sequence alignment is a fundamental task in gene information processing: it searches for the location of a string (usually newly collected DNA data) in huge existing DNA sequence databases. Because of the huge amount of newly generated DNA data and the complexity of approximate string matching, sequence alignment is a time-consuming process, so reducing alignment time is a significant research problem. Several string alignment algorithms based on hash comparison, suffix arrays and the BWT have been proposed for DNA sequence alignment. Although these algorithms run in O(N) time, they still cannot meet the increasing demand when running on traditional CPUs. Recently, GPUs have been widely accepted as efficient accelerators for many scientific and commercial applications. A typical GPU has thousands of processing cores that can speed up repetitive computations significantly compared to multi-core CPUs. However, sequence alignment is a data-access-intensive computation, i.e., it is memory-bound: access to GPU memory and I/O influences performance more than the computing capability of the GPU cores does. By analyzing GPU memory and I/O characteristics, this thesis develops novel parallel algorithms for DNA sequence alignment applications. This thesis consists of six parts. The first two parts explain basic knowledge of DNA sequence alignment and GPU computing. The third part investigates the performance of data access on different types of GPU memory. The fourth part describes a parallel method to accelerate short-read sequence alignment based on the BWT algorithm. The fifth part proposes a parallel algorithm for accelerating BLASTN, one of the most popular sequence alignment programs, and shows how multi-threaded control and multiple GPU cards can accelerate BLASTN significantly. The sixth part concludes the thesis.
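Hash-based aligners in the BLASTN family start from short exact "seed" matches found through a k-mer lookup table and then extend them. The sketch below shows that seed-and-extend idea in its simplest form. The small `k`, the function names, and the exact-match extension are assumptions for illustration. Real BLASTN extends seeds with scored, X-drop style extension rather than requiring exact matches.

```python
from collections import defaultdict

def build_kmer_index(database, k):
    """Hash table: k-mer -> list of start positions in the database."""
    index = defaultdict(list)
    for i in range(len(database) - k + 1):
        index[database[i:i + k]].append(i)
    return index

def find_seeds(query, index, k):
    """All (query_pos, db_pos) pairs whose k-mers match exactly."""
    hits = []
    for q in range(len(query) - k + 1):
        for d in index.get(query[q:q + k], []):  # .get avoids inserting keys
            hits.append((q, d))
    return hits

def extend_exact(query, database, q, d, k):
    """Grow a seed left and right along its diagonal while bases match;
    returns (query_start, db_start, length)."""
    left = 0
    while (q - left - 1 >= 0 and d - left - 1 >= 0
           and query[q - left - 1] == database[d - left - 1]):
        left += 1
    right = k
    while (q + right < len(query) and d + right < len(database)
           and query[q + right] == database[d + right]):
        right += 1
    return q - left, d - left, left + right
```

With database `"TTACGTACGGA"`, query `"CGTACG"` and `k = 4`, the seed `(0, 3)` extends to the full 6-base match at database position 3. The lookups are scattered, data-dependent reads into a large table. This is the memory-bound access pattern the abstract describes, where memory layout matters more than arithmetic throughput.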
To summarize, by analyzing the layout of GPU memory and comparing data access under multi-threaded access patterns, this thesis derives an effective optimization method for sequence alignment on the GPU. The outcomes can help bioinformatics practitioners improve their working efficiency by significantly reducing sequence alignment time.
|
6 |
Application of stream processing to hydraulic network solvers. 24 October 2011
M.Ing. / The aim of this research was to investigate the use of stream processing on the graphics processing unit (GPU) and to apply it to the hydraulic modelling of a water distribution system. The stream processing model was programmed and compared with programming on the conventional sequential platform, namely the CPU. The GPU has been widely adopted as a parallel processor in many non-graphics applications, and the benefits of parallel processing in these fields have been significant. GPUs can perform from billions to trillions of floating-point operations per second using programmable shader programs. These advances in GPU architecture have been driven by the gaming industry and its demand for better gaming experiences, and the computational performance of the GPU is now much greater than that of CPUs. Hydraulic modelling has become vital to the construction of new water distribution systems, because water distribution networks are very complex and nonlinear in nature, and modelling can anticipate and prevent problems in a system without physically building it. The hydraulic model used was the Gradient Method, which is the model used in the EPANET software package. The Gradient Method produces a linear system that is both symmetric and positive-definite. EPANET currently uses the Cholesky method to solve the linear equations produced by the Gradient Method. Thus, a linear solution method had to be selected that suits both parallel processing on the GPU and use as a hydraulic network solver. The Conjugate Gradient algorithm was selected as an ideal candidate because it works well with the hydraulic solver and could be converted into a parallel algorithm on the GPU.
The Conjugate Gradient Method is one of the best-known iterative techniques for solving sparse symmetric positive-definite linear systems. It was implemented in both the sequential programming model and the stream processing model, using the CPU and the GPU respectively, on two different computer systems. The Cholesky method was also implemented in the sequential programming model on both systems, and the Cholesky and Conjugate Gradient methods were compared to evaluate them relative to each other. The findings show that stream processing on the parallel GPU architecture can be used to perform general-purpose algorithms. The results further affirmed that iterative linear solution methods should only be used for large linear systems.
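The Conjugate Gradient iteration for a symmetric positive-definite system Ax = b can be sketched as follows. This is the standard unpreconditioned algorithm on a small dense matrix in plain Python, not the thesis's GPU stream-processing code. The tolerance and iteration cap are arbitrary choices.

```python
def matvec(A, x):
    """Dense matrix-vector product A @ x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    n = len(b)
    max_iter = max_iter or 10 * n
    x = [0.0] * n
    r = list(b)          # residual b - A x, with x = 0
    p = list(r)          # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)               # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                    # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

For A = [[4, 1], [1, 3]] and b = [1, 2], the result approaches x = [1/11, 7/11]. Each iteration is one matrix-vector product plus a few vector updates and inner products, all of which parallelize well. In contrast, a Cholesky factorization has sequential dependencies between its steps. The per-iteration parallel work only pays off once the system is large, which is consistent with the conclusion above.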
|
7 |
A multiple-precision integer arithmetic library for GPUs and its applications. Zhao, Kaiyong. 01 January 2011
No description available.
|
8 |
Performance Analysis of Hybrid CPU/GPU Environments. Smith, Michael Shawn. 01 January 2010
We present two metrics to help performance analysts gain a unified view of application performance in a hybrid environment: GPU Computation Percentage and GPU Load Balance. We analyze the metrics using a matrix multiplication benchmark suite and a real scientific application. We also extend an experiment management system to support GPU performance data and to calculate and store our GPU Computation Percentage and GPU Load Balance metrics.
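The abstract names the two metrics but does not define them. Purely as an illustration of what such metrics might compute, the sketch below assumes two hypothetical definitions that are not taken from the thesis. GPU Computation Percentage is taken as mean per-GPU busy time over wall-clock time. GPU Load Balance is taken as mean busy time over the maximum busy time across GPUs.

```python
def gpu_computation_percentage(gpu_busy_times, wall_time):
    """HYPOTHETICAL definition: average fraction of the run's wall-clock
    time during which each GPU was executing kernels, as a percentage."""
    if wall_time <= 0 or not gpu_busy_times:
        raise ValueError("need a positive wall time and at least one GPU")
    mean_busy = sum(gpu_busy_times) / len(gpu_busy_times)
    return 100.0 * mean_busy / wall_time

def gpu_load_balance(gpu_busy_times):
    """HYPOTHETICAL definition: mean / max busy time across GPUs.
    1.0 means perfectly balanced; lower values mean some GPUs sit idle
    while the busiest one finishes."""
    peak = max(gpu_busy_times)
    if peak <= 0:
        return 1.0
    return (sum(gpu_busy_times) / len(gpu_busy_times)) / peak
```

Under these assumed definitions, two GPUs busy for 4.0 s and 2.0 s in an 8.0 s run give 37.5% and a load balance of 0.75. A low first number points at host-side or transfer time. A low second number points at uneven work distribution across GPUs.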
|
9 |
A recurrent neural network implementation using the graphics processing unit / Moore, Christopher. January 1900
Thesis (M.S.)--Oregon State University, 2010. / Printout. Includes bibliographical references (leaves 103-104). Also available on the World Wide Web.
|
10 |
Run-time loop parallelization with efficient dependency checking on GPU-accelerated platforms. Zhang, Chenggang, 张呈刚. January 2011
General-Purpose computing on Graphics Processing Units (GPGPU) has attracted a lot of attention recently. Exciting results have been reported in using GPUs to accelerate applications in various domains such as scientific simulations, data mining, bioinformatics and computational finance. However, so far GPUs have only been able to accelerate data-parallel loops with statically analyzable parallelism. Loops with dynamic parallelism (e.g., with array accesses through subscripted subscripts), an important pattern in many general-purpose applications, cannot be parallelized on GPUs using existing technologies.
Run-time loop parallelization using Thread Level Speculation (TLS) has been proposed in the literature to parallelize loops with statically un-analyzable dependencies. However, most existing TLS systems are designed for multiprocessor/multi-core CPUs. GPUs differ fundamentally from CPUs in both hardware architecture and execution model, so previous TLS designs either do not work or are inefficient when ported to GPUs. This thesis presents GPU-TLS, a runtime system designed to support speculative loop parallelization on GPUs. The design of GPU-TLS addresses several key problems encountered when adapting TLS to GPUs: (1) To reduce the possibility of mis-speculation, a deferred-update memory versioning scheme is adopted to avoid mis-speculations caused by inter-iteration WAR and WAW dependencies. A technique named intra-warp value forwarding is proposed to respect some inter-iteration RAW dependencies, which further reduces the mis-speculation possibility. (2) An incremental speculative execution scheme is designed to exploit partial parallelism within loops. This avoids excessive re-executions and reduces the mis-speculation penalty. (3) Dependency checking among thousands of speculative GPU threads incurs large overhead and can easily become the performance bottleneck. To lower the overhead, we design several efficient dependency checking schemes, named PRW+BDC, SW, SR, SRW+EDC, and SRW+LDC. (4) We devise a novel parallel commit scheme to avoid the overhead incurred by the serial commit phase in most existing TLS designs.
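A loop with subscripted subscripts, such as `a[w[i]] = a[r[i]] + 1`, only reveals its dependencies at run time. The sequential Python sketch below simulates the general idea of speculation with deferred updates, a RAW check, and incremental re-execution. It is a simplified model, not GPU-TLS itself. It does not model intra-warp value forwarding, warps, or any of the named checking schemes, and `speculative_run` is a hypothetical name.

```python
def speculative_run(a, write_idx, read_idx):
    """Simulate TLS for: for i: a[write_idx[i]] = a[read_idx[i]] + 1.
    Returns (final array, number of speculative rounds)."""
    a = list(a)
    n = len(write_idx)
    start, rounds = 0, 0
    while start < n:
        rounds += 1
        # Speculative phase: every remaining iteration "runs in parallel"
        # against committed state and buffers its write (deferred update),
        # so later writes can never clobber earlier reads (no WAR hazard).
        buffered = [(write_idx[i], a[read_idx[i]] + 1) for i in range(start, n)]
        # Dependency check: iteration j mis-speculated if it read a location
        # that an earlier iteration of this round writes (RAW violation).
        written, first_bad = set(), n
        for j in range(start, n):
            if read_idx[j] in written:
                first_bad = j
                break
            written.add(write_idx[j])
        # Commit the violation-free prefix in program order (resolves WAW),
        # then re-execute only from the first violating iteration onward.
        for j in range(start, first_bad):
            loc, val = buffered[j - start]
            a[loc] = val
        start = first_bad  # always advances: the first iteration never violates
    return a, rounds
```

A fully serial chain such as `speculative_run([0, 0, 0, 0], [1, 2, 3, 0], [0, 1, 2, 3])` needs one round per iteration and returns `([4, 1, 2, 3], 4)`. A loop whose only conflict is a late RAW dependency commits most of its work in the first round. Committing the valid prefix instead of discarding the whole round is the partial-parallelism benefit described in point (2).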
We have carried out extensive experiments on two platforms with different NVIDIA GPUs, using both a synthetic loop that can simulate loops with different characteristics and several loops from real-life applications. Testing results show that the proposed intra-warp value forwarding and eager dependency checking techniques improve performance for almost all kinds of loop patterns. We observe that, compared with the other dependency checking schemes, SR and SW achieve better performance in most cases. The proposed parallel commit scheme is also shown to be especially useful for loops with large write sets and few inter-iteration WAW dependencies. Overall, GPU-TLS achieves speedups ranging from 5x to 105x for loops with dynamic parallelism. / published_or_final_version / Computer Science / Master / Master of Philosophy
|