241 |
Fully Automatic Upper Airway Segmentation and Surfacing on a GPU from Cone-beam CT Volumes
Farrell, Michael L. (January 2009)
No description available.
|
242 |
GPU-based Parallel Computing for Nonlinear Finite Element Deformation Analysis
Mafi, Ramin (04 1900)
Computer-based surgical simulation and non-rigid medical image registration in image-guided interventions are examples of applications that would benefit from real-time deformation simulation of soft tissues. The physics of deformation for biological soft tissue is best described by nonlinear continuum-mechanics-based models, which can then be discretized by the Finite Element Method (FEM) for a numerical solution. The computational complexity of nonlinear FEM-based models has limited their use in real-time applications. The data-parallel nature and arithmetic intensity of nonlinear FEM models make them well suited to massive parallelization, which is needed to meet the response-time requirements of such applications.

This thesis is concerned with the computational aspects of complex nonlinear deformation analysis problems, with an emphasis on the speed of response using a parallel computing philosophy. It proposes a fast, accurate, and scalable Graphics Processing Unit (GPU)-based implementation of the total Lagrangian FEM using implicit time integration for dynamic nonlinear deformation analysis. This is a general formulation, valid for large deformations and strains, that can account for material nonlinearities. A penalty method is used to satisfy the physical boundary constraints due to contact between deformable objects. The proposed set of optimized GPU kernels for computing the FEM matrices achieves more than 100 GFLOPS on a GTX 470 GPU device. The use of a novel vector assembly kernel and memory optimization strategies results in a performance gain of up to 25 GFLOPS in the PCG computations. / Doctor of Philosophy (PhD)
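The abstract does not show the kernels themselves, but the penalty contact approach it mentions is easy to illustrate. Below is a minimal CUDA sketch of a penalty contact force against a rigid half-space obstacle; all names (the SoA array layout, the plane parameters n and d, the stiffness) are hypothetical stand-ins for illustration, not the thesis' actual interface:

```cuda
#include <cuda_runtime.h>

// Penalty contact against a rigid half-space (plane n . x = d): any node
// that penetrates the plane receives a restoring force proportional to the
// penetration depth. One thread per node.
__global__ void penaltyContactForces(const float3* pos, float3* force,
                                     int numNodes, float3 n, float d,
                                     float stiffness)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numNodes) return;

    float3 p = pos[i];
    // Signed distance to the plane; negative means the node has penetrated.
    float dist = n.x * p.x + n.y * p.y + n.z * p.z - d;
    if (dist < 0.0f) {
        float mag = -stiffness * dist;   // grows linearly with penetration
        force[i].x += mag * n.x;
        force[i].y += mag * n.y;
        force[i].z += mag * n.z;
    }
}
```

In a total Lagrangian FEM loop, a force like this would simply be accumulated into the external-force vector before each implicit solve.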
|
243 |
Energy-Efficient Interactive Ray Tracing of Static Scenes on Programmable Mobile GPUs
Lohrmann, Peter J (11 January 2007)
Mobile technology is improving in quality and capability faster now than ever before. When first introduced, cell phones were strictly used to make voice calls; now they play satellite radio and MP3s, stream television, offer GPS and navigation capabilities, and carry multi-megapixel video cameras. In the near future, cell phones will have programmable graphics processing units (GPUs) that will let users play games similar to those currently available for top-of-the-line game consoles. Personal digital assistants provide full email, scheduling, and internet-browsing capabilities in addition to the features offered on cell phones. Underlying all this mobile technology and entertainment is a battery whose technology has barely tripled in the past 15 years, compared to available disk capacity, which has increased over 1,000-fold.

Ray tracing is a rendering technique used to generate photorealistic images that include reflections, refraction, and participating media, and it can fairly easily be extended with photon mapping for indirect illumination and caustics. In recent years, ray tracing has been implemented on the GPU using various acceleration structures to facilitate rendering. Until now, all studies have used build time and achievable frame rates to determine which acceleration structure is best for ray tracing. We present the first results comparing both CPU and GPU ray tracing using various acceleration structures in terms of energy consumption. By exploring per-pixel costs, we provide insight into the energy consumption and frame rates that can be expected on cell phones and other mobile devices at currently available screen resolutions.

Our results show that the choice of processing unit has the greatest effect on the energy and time costs of ray tracing, followed by the size of the viewport; the choice of acceleration structure has the least impact on efficiency. For mobile devices with a programmable GPU, whether a cell phone, PDA, or laptop computer, a bounding volume hierarchy implemented on the GPU is the most energy-efficient acceleration structure for ray tracing. On cellular phones with smaller screen resolutions, a CPU-based kd-tree implementation is the most energy-efficient.
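Since the bounding volume hierarchy comes out as the most energy-efficient GPU structure in these results, a sketch of what GPU BVH traversal involves may be helpful. This is a generic, hypothetical illustration (the node and triangle layouts, helper names, and the per-ray traversal stack are all assumptions), not the implementation studied in the thesis:

```cuda
#include <cuda_runtime.h>

// Hypothetical layouts: interior nodes have triCount == 0, leaves > 0.
struct BVHNode { float3 lo, hi; int left, right, firstTri, triCount; };
struct Tri     { float3 v0, e1, e2; };  // vertex 0 plus the two edge vectors

__device__ float3 sub(float3 a, float3 b) { return make_float3(a.x-b.x, a.y-b.y, a.z-b.z); }
__device__ float3 crs(float3 a, float3 b) { return make_float3(a.y*b.z-a.z*b.y, a.z*b.x-a.x*b.z, a.x*b.y-a.y*b.x); }
__device__ float  dt (float3 a, float3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Moller-Trumbore ray/triangle test; returns hit distance or -1.
__device__ float hitTri(Tri t, float3 o, float3 d)
{
    float3 p = crs(d, t.e2);
    float det = dt(t.e1, p);
    if (fabsf(det) < 1e-8f) return -1.0f;
    float inv = 1.0f / det;
    float3 s = sub(o, t.v0);
    float u = dt(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return -1.0f;
    float3 q = crs(s, t.e1);
    float v = dt(d, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return -1.0f;
    float tt = dt(t.e2, q) * inv;
    return tt > 1e-4f ? tt : -1.0f;
}

// Slab test against an AABB; rejects boxes farther than the best hit so far.
__device__ bool hitBox(float3 lo, float3 hi, float3 o, float3 invD, float tMax)
{
    float t0 = 1e-4f, t1 = tMax;
    float a = (lo.x - o.x) * invD.x, b = (hi.x - o.x) * invD.x;
    t0 = fmaxf(t0, fminf(a, b)); t1 = fminf(t1, fmaxf(a, b));
    a = (lo.y - o.y) * invD.y; b = (hi.y - o.y) * invD.y;
    t0 = fmaxf(t0, fminf(a, b)); t1 = fminf(t1, fmaxf(a, b));
    a = (lo.z - o.z) * invD.z; b = (hi.z - o.z) * invD.z;
    t0 = fmaxf(t0, fminf(a, b)); t1 = fminf(t1, fmaxf(a, b));
    return t0 <= t1;
}

// One thread per primary ray; a small per-thread stack drives the traversal.
__global__ void trace(const BVHNode* nodes, const Tri* tris,
                      const float3* rayO, const float3* rayD,
                      int* hitOut, int numRays)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= numRays) return;
    float3 o = rayO[r], d = rayD[r];
    float3 invD = make_float3(1.0f/d.x, 1.0f/d.y, 1.0f/d.z);
    float tBest = 1e30f;
    int best = -1;

    int stack[32], sp = 0;
    stack[sp++] = 0;                         // start at the root
    while (sp > 0) {
        BVHNode n = nodes[stack[--sp]];
        if (!hitBox(n.lo, n.hi, o, invD, tBest)) continue;
        if (n.triCount > 0) {                // leaf: test its triangles
            for (int i = 0; i < n.triCount; ++i) {
                float t = hitTri(tris[n.firstTri + i], o, d);
                if (t > 0.0f && t < tBest) { tBest = t; best = n.firstTri + i; }
            }
        } else {                             // interior: push both children
            stack[sp++] = n.left;
            stack[sp++] = n.right;
        }
    }
    hitOut[r] = best;                        // index of nearest hit, or -1
}
```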
|
244 |
Interaktivní zpracování objemových dat / Interactive Processing of Volumetric Data
Kolomazník, Jan (January 2018)
Title: Interactive Processing of Volumetric Data
Author: Jan Kolomazník
Department: Department of Software and Computer Science Education
Supervisor: RNDr. Josef Pelikán, Department of Software and Computer Science Education
Abstract: Interactive visualization and segmentation of volumetric data are quite limited, due to the increased complexity of the task and the size of the input data compared to two-dimensional processing. A special interactive segmentation workflow is presented, based on minimal graph-cut search. The overall execution time is lowered by implementing all the computational steps on the GPU, which required the design of massively parallel algorithms (using thousands of threads). To lower the computational burden even further, the graph is constructed over image subregions computed by a parallel watershed transformation. Cellular automata were chosen as a suitable formalism for a range of massively parallel algorithms, and a set of cellular-automaton extensions was defined that allows efficient mapping and computation on the GPU. Several variants of the parallel watershed transformation are then defined in the form of cellular automata. A novel form of 2D transfer function is presented to improve direct volume visualization of the input data, suited for discriminating image features by their shape and...
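As a rough illustration of the cellular-automaton formalism the abstract describes, the sketch below shows one relaxation step of a watershed-style labeling as a CUDA kernel. The buffer names, the 4-neighborhood rule, and the convergence flag are assumptions made for illustration; the thesis defines its own, more general CA extensions:

```cuda
#include <cuda_runtime.h>

// One cellular-automaton step on a 2D image (row-major): each cell adopts
// the label of its lowest 4-neighbor if that neighbor lies below the cell.
// Iterating this to a fixed point floods labels downhill, watershed-style.
__global__ void caWatershedStep(const float* height, const int* labelIn,
                                int* labelOut, int w, int h, int* changed)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    int idx = y * w + x;

    float best = height[idx];
    int bestLabel = labelIn[idx];
    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    for (int k = 0; k < 4; ++k) {
        int nx = x + dx[k], ny = y + dy[k];
        if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
        int nIdx = ny * w + nx;
        if (height[nIdx] < best) {        // steepest lower neighbor wins
            best = height[nIdx];
            bestLabel = labelIn[nIdx];
        }
    }
    labelOut[idx] = bestLabel;
    if (bestLabel != labelIn[idx]) atomicOr(changed, 1);
}
```

The host would ping-pong labelIn/labelOut and repeat the step until the changed flag stays zero, i.e. until the automaton reaches a fixed point.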
|
245 |
Distributed OpenCL: a platform for distributed, heterogeneous computing for domain scientists
Dillon, William H. (William Hall) (29 May 2012)
It is possible to purchase, for as little as $10,000, a cluster of computers with the capability to rival the supercomputers of only a few years ago. Now, users who have little to no experience developing distributed applications or managing a cluster are in a position to do so. To allow domain scientists to effectively utilize these resources, Distributed OpenCL (DOCL) was developed. DOCL is an easy-to-use foundation for peer-to-peer distributed computation on small to medium clusters. It is assumed that the end user is a domain scientist, familiar with model development in environments such as Matlab, though inexperienced with distributed computation or parallel programming. The scope of this work includes the definition of a peer-to-peer protocol for discovering and establishing relationships with every node within a multicast domain, using the concepts of Zero-Configuration Networking, multicast DNS, and DNS Service Discovery. A problematic edge case of multicast DNS is detailed, along with a mitigation technique. An XML schema is also described for basic peer communication, cluster management, and inventory. A system for scheduling algorithm tasks on the cluster of heterogeneous compute devices was developed, including an automatic computation- and communication-cost measurement system. Finally, a graphical programming language was designed and implemented that allows non-expert programmers and modelers to develop new applications in a straightforward, accessible way. / Graduation date: 2012
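The scheduling component pairs measured compute and communication costs with task placement. A minimal host-side sketch of that idea (plain C++, compilable under nvcc; the Device and Task fields are entirely hypothetical, and DOCL's real cost model and protocol are defined in the thesis itself):

```cuda
#include <vector>
#include <limits>

struct Device { double gflops;  double linkMBps; };  // measured capabilities
struct Task   { double flop;    double bytesMoved; };

// Pick the device that minimizes estimated compute + transfer time.
int placeTask(const Task& t, const std::vector<Device>& devices)
{
    int best = -1;
    double bestCost = std::numeric_limits<double>::max();
    for (size_t i = 0; i < devices.size(); ++i) {
        double compute  = t.flop       / (devices[i].gflops   * 1e9);
        double transfer = t.bytesMoved / (devices[i].linkMBps * 1e6);
        double cost = compute + transfer;
        if (cost < bestCost) { bestCost = cost; best = (int)i; }
    }
    return best;
}
```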
|
246 |
Accurate Residual-distribution Schemes for Accelerated Parallel Architectures
Guzik, Stephen Michael Jan (12 August 2010)
Residual-distribution methods offer several potential benefits over classical methods, such as a means of applying upwinding in a multi-dimensional manner and a multi-dimensional positivity property. While it is apparent that residual-distribution methods also offer higher accuracy than finite-volume methods on similar meshes, few studies have directly compared the performance of the two approaches in a systematic and quantitative manner. In this study, comparisons between residual distribution and finite volume are made for steady-state smooth and discontinuous flows of gas dynamics, governed by hyperbolic conservation laws, to illustrate the strengths and deficiencies of the residual-distribution method. Deficiencies which reduce the accuracy are analyzed and a new nonlinear scheme is proposed that closely reproduces or surpasses the accuracy of the best linear residual-distribution scheme. The accuracy is further improved by extending the scheme to fourth order using established finite-element techniques. Finally, the compact stencil, arithmetic workload, and data parallelism of the fourth-order residual-distribution scheme are exploited to accelerate parallel computations on an architecture consisting of both CPU cores and a graphics processing unit. Numerical experiments are used to assess the gains to efficiency and possible monetary savings that may be provided by accelerated architectures.
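For readers unfamiliar with the residual-distribution idea, the sketch below shows the classical linear LDA scheme for steady scalar advection on a single triangle: the element residual is computed, then split among the triangle's nodes by upwind-biased weights. This is a textbook illustration of the residual/distribution split only (host-side C++, assuming counter-clockwise vertices and a nonzero advection speed), not one of the thesis' schemes, which are nonlinear and fourth-order accurate:

```cuda
#include <algorithm>

struct Tri2D { double x[3], y[3], u[3]; };  // vertex coordinates and solution

// Distribute the element residual of a . grad(u) over one triangle;
// phiOut[i] is the share sent to node i under the LDA scheme.
void ldaDistribute(const Tri2D& t, double ax, double ay, double phiOut[3])
{
    double k[3], sumKp = 0.0, phi = 0.0;
    for (int i = 0; i < 3; ++i) {
        int j = (i + 1) % 3, m = (i + 2) % 3;
        // Inward normal of the edge opposite node i, scaled by edge length.
        double nx = t.y[j] - t.y[m];
        double ny = t.x[m] - t.x[j];
        k[i] = 0.5 * (ax * nx + ay * ny);   // upwind parameter k_i
        phi += k[i] * t.u[i];               // element residual phi_T
        sumKp += std::max(k[i], 0.0);
    }
    if (sumKp == 0.0) { phiOut[0] = phiOut[1] = phiOut[2] = 0.0; return; }
    for (int i = 0; i < 3; ++i)             // LDA weights: k_i^+ / sum_j k_j^+
        phiOut[i] = (std::max(k[i], 0.0) / sumKp) * phi;
}
```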
|
247 |
Parallel Sorting on the Heterogeneous AMD Fusion Accelerated Processing Unit
Delorme, Michael Christopher (18 March 2013)
We explore efficient parallel radix sort for the AMD Fusion Accelerated Processing Unit (APU). Two challenges arise: efficiently partitioning data between the CPU and GPU, and allocating data among the APU's memory regions. Our coarse-grained implementation utilizes both the GPU and CPU by sharing data at the beginning and end of the sort. Our fine-grained implementation utilizes the APU's integrated memory system to share data throughout the sort. Both implementations outperform the current state-of-the-art GPU radix sort from NVIDIA. We therefore demonstrate that the CPU can be used effectively to speed up radix sort on the APU.
Our fine-grained implementation slightly outperforms our coarse-grained implementation, demonstrating the benefit of the APU's integrated architecture. That benefit is currently hindered by limitations in the APU's architecture and programming model; we believe it will grow once these limitations are addressed in future generations of the APU.
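To make the structure of a radix sort pass concrete, here is a minimal CUDA sketch of the per-block digit histogram that typically opens each pass (4-bit digits; a prefix scan and scatter, not shown, complete the pass). The names are hypothetical, the sketch assumes at least RADIX threads per block, and it deliberately omits the APU-specific data placement that is the thesis' actual contribution:

```cuda
#include <cuda_runtime.h>

#define RADIX_BITS 4
#define RADIX      (1 << RADIX_BITS)

// Each block counts digit occurrences in its tile using shared memory,
// then writes one histogram row, laid out digit-major so a later scan
// over blockHist (gridDim.x * RADIX entries) yields scatter offsets.
__global__ void digitHistogram(const unsigned int* keys, int n, int shift,
                               unsigned int* blockHist)
{
    __shared__ unsigned int hist[RADIX];
    if (threadIdx.x < RADIX) hist[threadIdx.x] = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        unsigned int digit = (keys[i] >> shift) & (RADIX - 1);
        atomicAdd(&hist[digit], 1u);
    }
    __syncthreads();

    if (threadIdx.x < RADIX)
        blockHist[threadIdx.x * gridDim.x + blockIdx.x] = hist[threadIdx.x];
}
```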
|
249 |
CUDA Performance Analyzer
Dasgupta, Aniruddha (05 April 2011)
GPGPU computing using CUDA is rapidly gaining ground today. GPGPU has been brought to the masses through the ease of use of CUDA and the ubiquity of graphics cards that support it. Although CUDA has a low learning curve for programmers familiar with standard programming languages like C, extracting optimal performance from it through optimization and hand-tuning is not a trivial task. This is because, in the case of GPGPU, an optimization strategy rarely affects the program in isolation: many optimizations affect different aspects for better or worse, creating tradeoffs that must be handled carefully to achieve good performance. Optimizing an application for CUDA is therefore difficult, and the performance gain might not be commensurate with the coding effort put in.
I propose to simplify the process of optimizing CUDA programs using a CUDA Performance Analyzer. The analyzer is based on an analytical model of CUDA-compatible GPUs. The model characterizes the different aspects of the GPU's compute unified architecture and can predict the expected performance of a CUDA program. It also gives insight into the performance bottlenecks of the CUDA implementation, which hints at what optimizations need to be applied to improve performance. Based on the model, one can also predict the performance of the application once those optimizations are applied to the CUDA implementation. This enables a CUDA programmer to test out different optimization strategies without putting in a lot of coding effort.
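A back-of-the-envelope version of such an analytical model is easy to state: estimate the time a kernel needs on each hardware resource and take the maximum, which also names the bottleneck an optimization should target. The sketch below (host-side C++; every field is a hypothetical input, and the real analyzer models the architecture in far more detail) captures that idea:

```cuda
// Simplified roofline-style estimate, not the thesis' actual model.
struct KernelProfile {
    double flops;          // total floating-point operations
    double bytes;          // total DRAM traffic
};
struct GpuSpec {
    double peakGflops;     // peak compute throughput
    double peakGBps;       // peak memory bandwidth
};

// Predicted runtime: the kernel is bound by whichever resource it saturates.
double predictSeconds(const KernelProfile& k, const GpuSpec& g)
{
    double computeTime = k.flops / (g.peakGflops * 1e9);
    double memoryTime  = k.bytes / (g.peakGBps   * 1e9);
    // The larger term identifies the bottleneck worth optimizing.
    return computeTime > memoryTime ? computeTime : memoryTime;
}
```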
|
250 |
Comparación del uso de GPGPU y cluster de multicore en problemas con alta demanda computacional / Comparing GPGPU and Multicore Clusters on Problems with High Computational Demand
Montes de Oca, Erica (January 2012)
This undergraduate thesis investigates and studies the GPU shared-memory platform and multicore clusters for solving problems with high computational demand. Solutions to the chosen problem are presented in sequential, shared-memory parallel, message-passing parallel, hybrid parallel, and GPU-parallel versions in order to compare their performance. The quality of the solutions is analyzed in terms of execution time and speedup, and an analysis of energy consumption is introduced.
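The comparison metrics the thesis relies on are standard and compact enough to state directly. The sketch below (host-side C++, with hypothetical field names) computes speedup, parallel efficiency, and an energy figure from measured runs:

```cuda
// Metrics for comparing a parallel run against the sequential baseline.
struct RunResult { double seconds; int processingUnits; double avgWatts; };

double speedup(const RunResult& seq, const RunResult& par)
{
    return seq.seconds / par.seconds;          // how much faster than sequential
}

double efficiency(const RunResult& seq, const RunResult& par)
{
    return speedup(seq, par) / par.processingUnits;  // per-core utilization
}

double joules(const RunResult& r)              // energy = average power x time
{
    return r.avgWatts * r.seconds;
}
```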
|