Global ETD Search

11	Parallel algorithms for real-time peptide-spectrum matching Zhang, Jian 16 December 2010 Tandem mass spectrometry is a powerful experimental tool used in molecular biology to determine the composition of protein mixtures. It has become a standard technique for protein identification. Due to the rapid development of mass spectrometry technology, instruments can now produce a large number of mass spectra for peptide identification. The increasing data size demands efficient software tools to perform peptide identification. In a tandem mass spectrometry experiment, peptide ion selection algorithms generally select only the most abundant peptide ions for further fragmentation. Because of this, the low-abundance proteins in a sample rarely get identified. To address this problem, researchers developed the notion of a 'dynamic exclusion list', which maintains a list of newly selected peptide ions and ensures that these ions are not selected again for a certain time. In this way, other peptide ions get more opportunity to be selected and identified, allowing for identification of peptides of lower abundance. However, a better method is to also incorporate the identification results into the 'dynamic exclusion list' approach. To do this, a real-time peptide identification algorithm is required. In this thesis, we introduce methods to improve the speed of peptide identification so that the 'dynamic exclusion list' approach can use the peptide identification results without affecting the throughput of the instrument. Our work is based on RT-PSM, a real-time program for peptide-spectrum matching with statistical significance. We profile the speed of RT-PSM and find that the peptide-spectrum scoring module is the most time-consuming portion. Guided by the profiling results, we introduce methods to parallelize the peptide-spectrum scoring algorithm. In this thesis, we propose two parallel algorithms using different technologies.
First, we introduce parallel peptide-spectrum matching using SIMD instructions. We implemented and tested the parallel algorithm on the Intel SSE architecture. The test results show that an 18-fold speedup on the entire process is obtained. The second parallel algorithm is developed using NVIDIA CUDA technology. We describe two CUDA kernels based on different algorithms and compare their performance. The more efficient algorithm is integrated into RT-PSM. The time measurements show that a 190-fold speedup on the scoring module and a 26-fold speedup on the entire process are obtained. We profile the CUDA version again to show that the scoring module has been optimized to the point where it is no longer the most time-consuming module in the CUDA version of RT-PSM. In addition, we evaluate the feasibility of creating a metric index to reduce the number of candidate peptides. We describe evaluation methods and show that general indexing methods are not likely to be feasible for RT-PSM. Bioinformatics SIMD Parallel GPU Computer Science
12	Software and Hardware Integration of a Programmable Floating- and Fixed-Point Vertex Shader Chen, Li-Yao 02 September 2010 The OpenGL ES 2.0 programmable 3D graphics pipeline is the current standard for embedded graphics processor designs. The programmable vertex shader replaces the geometry operations of the previous fixed-function graphics pipeline and provides more flexible APIs for more realistic animation effects. In this thesis, we introduce the OpenGL ES 2.0 specification and the design of a programmable vertex shader architecture and instruction set. In particular, we focus on the integration issues encountered when the vertex shader is integrated with other hardware components and software during the SoC design, and we verify the vertex shader on an FPGA with a demonstration. Integration SOC Programmable SIMD Vertex Shader
13	Design and Analysis of low power/low cost MP3 Audio Decoder System Lin, Yi-Ting 09 September 2004 In embedded systems, multimedia applications are more important than before, and such products are appearing more often. In addition, handheld devices are increasingly popular; these products are usually cheaper than others, and power consumption is a greater concern for them. Our design therefore cannot focus on performance alone: low power and low cost have become some of the most important factors. The main contribution of this thesis is that, for the MP3 multimedia application, we analyzed, estimated, and optimized our hardware and software to achieve low power and low cost. On the software side, we applied optimization techniques to our compiled assembly code. On the hardware side, we analyzed the MP3 decoding algorithm, identified the critical parts, and implemented them in hardware, aiming for the highest acceleration at the smallest hardware cost. We hope that this research lays a foundation for developing application-specific platforms. multimedia application IMDCT DCT SIMD MP3 decode
14	Improving energy efficiency of reliable massively-parallel architectures Krimer, Evgeni 12 July 2012 While transistor size continues to shrink with every technology generation, increasing the number of transistors on a die, the reduction in energy consumption is less significant. Furthermore, newer technologies introduce fabrication challenges that result in uncertainties in transistor and wire properties. Therefore, to ensure correctness, design margins are introduced, resulting in significantly sub-optimal energy efficiency. While increasing parallelism and the use of gating methods contribute to reducing energy consumption, ultimately, more radical changes to the architecture and better integration of architectural and circuit techniques will be necessary. This dissertation explores one such approach, combining a highly efficient massively parallel processor architecture with a design methodology that reduces energy by trimming design margins. Using a massively parallel GPU-like (graphics processing unit) baseline architecture, we discuss the different components of process variation and design microarchitectural approaches that support efficient margin reduction. We evaluate our design using a cycle-based GPU simulator, describe the conditions under which efficiency improvements can be obtained, and explore the benefits of decoupling across a wide range of parameters. We architect a test chip that was fabricated and show that these mechanisms work. We also discuss why previously developed related approaches fall short when process variation is very large, such as in low-voltage operation or as expected for future VLSI technology. We therefore develop and evaluate a new approach specifically for high-variation scenarios. To summarize, in this work we address the emerging challenges of modern massively parallel architectures, including energy-efficient, reliable operation and high process variation.
We believe that the results of this work are essential for breaking through the energy wall and for continuing to improve the efficiency of future generations of massively parallel architectures. SIMD Energy-efficiency Process variation GPU GPGPU
15	Dynamic warp formation : exploiting thread scheduling for efficient MIMD control flow on SIMD graphics hardware Fung, Wilson Wai Lun 11 1900 Recent advances in graphics processing units (GPUs) have resulted in massively parallel hardware that is easily programmable and widely available in commodity desktop computer systems. GPUs typically use single-instruction, multiple-data (SIMD) pipelines to achieve high performance with minimal overhead for control hardware. Scalar threads running the same computing kernel are grouped together into SIMD batches, sometimes referred to as warps. While SIMD is ideally suited for simple programs, recent GPUs include control flow instructions in the GPU instruction set architecture and programs using these instructions may experience reduced performance due to the way branch execution is supported by hardware. One solution is to add a stack to allow different SIMD processing elements to execute distinct program paths after a branch instruction. The occurrence of diverging branch outcomes for different processing elements significantly degrades performance using this approach. In this thesis, we propose dynamic warp formation and scheduling, a mechanism for more efficient SIMD branch execution on GPUs. It dynamically regroups threads into new warps on the fly following the occurrence of diverging branch outcomes. We show that a realistic hardware implementation of this mechanism improves performance by an average of 47% for an estimated area increase of 8%. GPU SIMD Control flow Graphics processing unit
17	Grafische Benutzungsunterstützung auf Befehlsebene für die Entwicklung massivparalleler Programme Toussaint, Frederic January 2007 Also published as: doctoral dissertation, Universität Karlsruhe, 2007.
18	Das hermetische Eigenwertproblem Implementierungsaspekte für Festkomma-SIMD-DSPs Schäfer, Frank January 2007 Also published as: doctoral dissertation, Technische Universität Dresden, 2007.
19	Dynamic warp formation : exploiting thread scheduling for efficient MIMD control flow on SIMD graphics hardware Fung, Wilson Wai Lun 11 1900 Recent advances in graphics processing units (GPUs) have resulted in massively parallel hardware that is easily programmable and widely available in commodity desktop computer systems. GPUs typically use single-instruction, multiple-data (SIMD) pipelines to achieve high performance with minimal overhead for control hardware. Scalar threads running the same computing kernel are grouped together into SIMD batches, sometimes referred to as warps. While SIMD is ideally suited for simple programs, recent GPUs include control flow instructions in the GPU instruction set architecture and programs using these instructions may experience reduced performance due to the way branch execution is supported by hardware. One solution is to add a stack to allow different SIMD processing elements to execute distinct program paths after a branch instruction. The occurrence of diverging branch outcomes for different processing elements significantly degrades performance using this approach. In this thesis, we propose dynamic warp formation and scheduling, a mechanism for more efficient SIMD branch execution on GPUs. It dynamically regroups threads into new warps on the fly following the occurrence of diverging branch outcomes. We show that a realistic hardware implementation of this mechanism improves performance by an average of 47% for an estimated area increase of 8%. / Applied Science, Faculty of / Electrical and Computer Engineering, Department of / Graduate GPU SIMD Control flow Graphics processing unit
20	A M-SIMD Intelligent Memory Rangan, Krishna Kumar 11 October 2001 No description available. processing in memory M-SIMD parallel processing
