11 |
Parallel algorithms for real-time peptide-spectrum matching. Zhang, Jian, 16 December 2010.
Tandem mass spectrometry is a powerful experimental tool used in molecular biology to determine the composition of protein mixtures, and it has become a standard technique for protein identification. Due to the rapid development of mass spectrometry technology, instruments can now produce large numbers of mass spectra for peptide identification, and the increasing data size demands efficient software tools to perform that identification.
In a tandem mass spectrometry experiment, peptide ion selection algorithms generally select only the most abundant peptide ions for further fragmentation, so low-abundance proteins in a sample rarely get identified. To address this problem, researchers developed the notion of a 'dynamic exclusion list', which records newly selected peptide ions and ensures that they are not selected again for a certain time. Other peptide ions thus get more opportunities to be selected, allowing peptides of lower abundance to be identified.
A better method, however, is to also feed the identification results back into the dynamic exclusion list, which requires a real-time peptide identification algorithm.
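The exclusion-list idea above can be sketched in a few lines of C. This is a minimal illustration, not the instrument vendors' actual implementation; the struct layout, tolerance, and window parameters are assumptions for the example.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

#define DEL_CAPACITY 64

/* One exclusion entry: a precursor m/z and the scan number at which
 * the exclusion expires. */
typedef struct {
    double mz;
    int expires_at_scan;
} ExclusionEntry;

typedef struct {
    ExclusionEntry entries[DEL_CAPACITY];
    size_t count;
    double mz_tolerance;   /* m/z window treated as "the same ion" */
    int exclusion_window;  /* scans for which a selected ion stays excluded */
} DynamicExclusionList;

/* Returns true if the ion may be selected at this scan; if so, it is
 * added to the list so it will be skipped for the next
 * `exclusion_window` scans. */
bool del_try_select(DynamicExclusionList *del, double mz, int scan) {
    for (size_t i = 0; i < del->count; i++) {
        if (del->entries[i].expires_at_scan > scan &&
            fabs(del->entries[i].mz - mz) <= del->mz_tolerance)
            return false;  /* recently selected: skip this ion */
    }
    /* Reuse an expired slot if possible, otherwise append. */
    size_t slot = del->count;
    for (size_t i = 0; i < del->count; i++) {
        if (del->entries[i].expires_at_scan <= scan) { slot = i; break; }
    }
    if (slot == del->count) {
        if (del->count == DEL_CAPACITY) return true; /* list full: select anyway */
        del->count++;
    }
    del->entries[slot].mz = mz;
    del->entries[slot].expires_at_scan = scan + del->exclusion_window;
    return true;
}
```

An ion selected at scan 1 is then rejected within the tolerance window until the exclusion expires, which is exactly what gives low-abundance ions their chance.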
In this thesis, we introduce methods to improve the speed of peptide identification so that the dynamic exclusion list approach can use peptide identification results without affecting the throughput of the instrument. Our work is based on RT-PSM, a real-time program for peptide-spectrum matching with statistical significance. We profile RT-PSM and find that the peptide-spectrum scoring module is the most time-consuming portion.
Guided by the profiling results, we parallelize the peptide-spectrum scoring algorithm, proposing two parallel algorithms based on different technologies. The first parallelizes peptide-spectrum matching using SIMD instructions; we implemented and tested it on the Intel SSE architecture, and the results show an 18-fold speedup on the entire process. The second is developed using NVIDIA CUDA technology: we describe two CUDA kernels based on different algorithms, compare their performance, and integrate the more efficient one into RT-PSM. Time measurements show a 190-fold speedup on the scoring module and a 26-fold speedup on the entire process. Profiling the CUDA version again confirms that the scoring module has been optimized to the point where it is no longer the most time-consuming module in RT-PSM.
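To make the SIMD idea concrete: spectrum scoring kernels of this kind typically reduce to a dot product between an observed spectrum and a theoretical candidate spectrum, both binned into float vectors, and SSE processes four bins per instruction. The sketch below is a generic illustration of that pattern, not the actual RT-PSM scoring function.

```c
#include <xmmintrin.h>  /* SSE intrinsics */
#include <stddef.h>

/* Hypothetical scoring kernel: similarity between two binned spectra
 * as a dot product, vectorized four lanes at a time with SSE.
 * `n` is assumed to be a multiple of 4; unaligned loads are used so
 * the caller need not align its buffers. */
float spectrum_dot_sse(const float *observed, const float *theoretical, size_t n) {
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4) {
        __m128 a = _mm_loadu_ps(&observed[i]);
        __m128 b = _mm_loadu_ps(&theoretical[i]);
        acc = _mm_add_ps(acc, _mm_mul_ps(a, b));
    }
    /* Horizontal sum of the four partial accumulators. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Because each candidate peptide requires one such pass over the spectrum, vectorizing this inner loop is where a SIMD speedup on the scoring module comes from.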
In addition, we evaluate the feasibility of using a metric index to reduce the number of candidate peptides. We describe our evaluation methods and show that general metric indexing methods are unlikely to be feasible for RT-PSM.
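For readers unfamiliar with metric indexing, the pruning it would offer rests on the triangle inequality: with distances from every candidate to a pivot precomputed, a candidate whose pivot distance differs from the query's by more than the search radius cannot be a match, so its expensive distance need not be computed. A minimal sketch of that test (the distance function and pivot scheme here are illustrative assumptions):

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Triangle-inequality pruning, the core of most metric indexes:
 * if |d(q,p) - d(c,p)| > r for pivot p, then d(q,c) > r and the
 * candidate c can be skipped.  Returns how many candidates survive. */
size_t prune_by_pivot(double dist_q_pivot, const double *dist_c_pivot,
                      size_t n, double r, bool *survives) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        survives[i] = fabs(dist_q_pivot - dist_c_pivot[i]) <= r;
        if (survives[i]) kept++;
    }
    return kept;
}
```

The thesis's negative result suggests that for spectrum scoring this bound prunes too few candidates to pay for its overhead.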
|
12 |
Software and Hardware Integration of a Programmable Floating- and Fixed-Point Vertex Shader. Chen, Li-Yao, 02 September 2010.
The OpenGL ES 2.0 programmable 3D graphics pipeline is the current standard for embedded graphics processor designs. The programmable vertex shader replaces the geometry operations of the earlier fixed-function pipeline and provides more flexible APIs for more realistic animation effects. In this thesis, we introduce the OpenGL ES 2.0 specification and the design of a programmable vertex shader architecture and instruction set. In particular, we focus on the integration issues encountered when the vertex shader is combined with other hardware components and software during a complete SoC design, and we verify the vertex shader on an FPGA with a demonstration.
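The core per-vertex operation that a programmable vertex shader performs in place of the fixed-function transform stage is a 4x4 model-view-projection matrix applied to each vertex position. A plain-C sketch of that operation, for orientation only; the thesis's shader executes the equivalent with its own instruction set, in floating or fixed point:

```c
/* A homogeneous vertex position and the transform a vertex shader
 * applies to it: out = M * v, one dot product per output component. */
typedef struct { float x, y, z, w; } Vec4;

Vec4 transform_vertex(float m[4][4], Vec4 v) {
    Vec4 out;
    out.x = m[0][0]*v.x + m[0][1]*v.y + m[0][2]*v.z + m[0][3]*v.w;
    out.y = m[1][0]*v.x + m[1][1]*v.y + m[1][2]*v.z + m[1][3]*v.w;
    out.z = m[2][0]*v.x + m[2][1]*v.y + m[2][2]*v.z + m[2][3]*v.w;
    out.w = m[3][0]*v.x + m[3][1]*v.y + m[3][2]*v.z + m[3][3]*v.w;
    return out;
}
```

Supporting both floating- and fixed-point variants of exactly this computation is what drives the shader's datapath and instruction-set design.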
|
13 |
Design and Analysis of a Low-Power/Low-Cost MP3 Audio Decoder System. Lin, Yi-Ting, 09 September 2004.
In embedded systems, multimedia applications have become more important than ever, and such products appear more and more often. In addition, handheld devices are increasingly popular, so these products must usually be cheaper than competing ones, and their power consumption matters more. Our design therefore cannot focus only on performance: low power and low cost have become among the most important factors.
The main contribution of this thesis is that, for the MP3 multimedia application, we analyzed, estimated, and optimized our hardware and software to achieve low power and low cost. In the software part, we applied optimization techniques to the compiled assembly code. In the hardware part, we analyzed the MP3 decoding algorithm, found its critical parts, and implemented them in hardware, aiming for the highest acceleration at the smallest hardware cost. We hope this research establishes a foundation for developing a special-purpose application platform.
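One standard software-side trick in low-cost audio decoders of this kind is replacing floating-point arithmetic with fixed-point arithmetic, which removes the need for an FPU entirely. A minimal Q15 sketch of the idea (the format choice and helper names are illustrative, not taken from the thesis):

```c
#include <stdint.h>

/* Q15 fixed point: values in [-1, 1) stored as 16-bit integers
 * scaled by 2^15.  A multiply is a 32-bit product shifted back
 * down, which maps to a single multiply-accumulate on most DSPs. */
typedef int16_t q15_t;

static inline q15_t q15_mul(q15_t a, q15_t b) {
    return (q15_t)(((int32_t)a * (int32_t)b) >> 15);
}

/* Conversion helpers for illustration. */
static inline q15_t q15_from_double(double x) { return (q15_t)(x * 32768.0); }
static inline double q15_to_double(q15_t x)   { return x / 32768.0; }
```

Filter coefficients and samples in the decoder's synthesis stages can then be kept in integer registers throughout, trading a small, bounded precision loss for lower power and area.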
|
14 |
Improving energy efficiency of reliable massively-parallel architectures. Krimer, Evgeni, 12 July 2012.
While transistor size continues to shrink with every technology generation, increasing the number of transistors on a die, the accompanying reduction in energy consumption is less significant. Furthermore, newer technologies introduce fabrication challenges that result in uncertainties in transistor and wire properties. To ensure correctness, design margins are therefore introduced, resulting in significantly sub-optimal energy efficiency. While increasing parallelism and using gating methods help reduce energy consumption, ultimately more radical changes to the architecture and better integration of architectural and circuit techniques will be necessary. This dissertation explores one such approach, combining a highly efficient massively-parallel processor architecture with a design methodology that reduces energy by trimming design margins.

Using a massively-parallel GPU-like (graphics processing unit) baseline architecture, we discuss the different components of process variation and design microarchitectural approaches that support efficient margin reduction. We evaluate our design using a cycle-based GPU simulator, describe the conditions under which efficiency improvements can be obtained, and explore the benefits of decoupling across a wide range of parameters. We architected a test chip, which was fabricated, and show that these mechanisms work.

We also discuss why previously developed related approaches fall short when process variation is very large, as in low-voltage operation or as expected for future VLSI technologies, and we develop and evaluate a new approach specifically for high-variation scenarios.

To summarize, this work addresses the emerging challenges of modern massively-parallel architectures, including energy-efficient, reliable operation under high process variation. We believe these results are essential for breaking through the energy wall and continuing to improve the efficiency of future generations of massively-parallel architectures.
|
15 |
Dynamic warp formation: exploiting thread scheduling for efficient MIMD control flow on SIMD graphics hardware. Fung, Wilson Wai Lun, 11 1900.
Recent advances in graphics processing units (GPUs) have resulted in massively parallel hardware that is easily programmable and widely available in commodity desktop computer systems. GPUs typically use single-instruction, multiple-data (SIMD) pipelines to achieve high performance with minimal overhead for control hardware. Scalar threads running the same computing kernel are grouped together into SIMD batches, sometimes referred to as warps. While SIMD is ideally suited for simple programs, recent GPUs include control flow instructions in the GPU instruction set architecture and programs using these instructions may experience reduced performance due to the way branch execution is supported by hardware. One solution is to add a stack to allow different SIMD processing elements to execute distinct program paths after a branch instruction. The occurrence of diverging branch outcomes for different processing elements significantly degrades performance using this approach. In this thesis, we propose dynamic warp formation and scheduling, a mechanism for more efficient SIMD branch execution on GPUs. It dynamically regroups threads into new warps on the fly following the occurrence of diverging branch outcomes. We show that a realistic hardware implementation of this mechanism improves performance by an average of 47% for an estimated area increase of 8%.
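The regrouping step can be sketched as a small scheduling routine: after a branch, each thread has a next PC, and threads are packed so that every newly formed warp contains only threads at the same PC, restoring SIMD efficiency. This is a toy model of the mechanism, not the hardware design from the thesis; warp size and capacities are illustrative.

```c
#include <stddef.h>

#define WARP_SIZE 4
#define MAX_THREADS 16

/* Dynamic warp formation, simplified: greedily place each thread
 * into an open warp whose PC matches its next PC, opening a new
 * warp when none has room.  Returns the number of warps formed and
 * records each thread's warp index in warp_of[t]. */
size_t form_warps(const int *next_pc, size_t n_threads, int *warp_of) {
    int warp_pc[MAX_THREADS];    /* PC of each open warp */
    int warp_fill[MAX_THREADS];  /* threads already placed in it */
    size_t n_warps = 0;
    for (size_t t = 0; t < n_threads; t++) {
        size_t w;
        for (w = 0; w < n_warps; w++) {
            if (warp_pc[w] == next_pc[t] && warp_fill[w] < WARP_SIZE)
                break;           /* join an open warp at the same PC */
        }
        if (w == n_warps) {      /* no matching warp with room: open one */
            warp_pc[w] = next_pc[t];
            warp_fill[w] = 0;
            n_warps++;
        }
        warp_fill[w]++;
        warp_of[t] = (int)w;
    }
    return n_warps;
}
```

With a stack-based reconvergence scheme, eight threads split evenly across two branch targets would execute as four half-empty warp passes; regrouped as above, they run as two full warps.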
|
17 |
Graphical user support at the instruction level for the development of massively parallel programs [Grafische Benutzungsunterstützung auf Befehlsebene für die Entwicklung massivparalleler Programme]. Toussaint, Frederic, January 2007.
Also published as: Karlsruhe, Universität, dissertation, 2007.
|
18 |
The Hermitian eigenvalue problem: implementation aspects for fixed-point SIMD DSPs [Das hermitesche Eigenwertproblem: Implementierungsaspekte für Festkomma-SIMD-DSPs]. Schäfer, Frank, January 2007.
Also published as: Dresden, Technische Universität, dissertation, 2007.
|
20 |
An M-SIMD Intelligent Memory. Rangan, Krishna Kumar, 11 October 2001.
No description available.
|