201 |
Random Forests for CUDA GPUs
Lapajne, Mikael Hellborg; Slat, Daniel January 2010
Context. Machine learning is a complex and resource-consuming process that requires a lot of computing power. With the constant growth of information, the need for efficient, high-performance algorithms is increasing. Today's commodity graphics cards are parallel multiprocessors with high computing capacity at an attractive price and are usually pre-installed in new PCs. They provide an additional resource for machine learning applications. The Random Forest learning algorithm, which has been shown to be competitive within machine learning, has good potential for a performance increase through parallelization. Objectives. In this study we implement and review a revised Random Forest algorithm for GPU execution using CUDA. Methods. Previous work in the area was reviewed by studying articles from several sources, including Compendex, Inspec, IEEE Xplore, ACM Digital Library and Springer Link. Additional information on GPU architecture and implementation-specific details was obtained mainly from Nvidia's documentation and the Nvidia developer forums. The implemented algorithm was benchmarked against two state-of-the-art CPU implementations of the Random Forest algorithm, with regard to both the time consumed for training and classification and the classification accuracy. Results. Benchmark measurements for the three algorithms are presented, showing their performance on two publicly available data sets. Conclusion. We conclude that our implementation is able to outperform its competitors under the right conditions. We also conclude that this holds only for certain data sets, depending on their size. Moreover, we conclude that there is potential for further improvement of the algorithm, both in performance and in adaptation to a wider range of real-world applications. / Mikael: +46768539263, Daniel: +46703040693
|
202 |
Advanced Real-time Post-Processing using GPGPU techniques
Lönroth, Per; Unger, Mattias January 2008
Post-processing techniques are used to change a rendered image as a last step before presentation. They include, but are not limited to, operations such as changes of saturation or contrast, as well as more advanced effects like depth-of-field and tone mapping. Depth-of-field effects are created by changing the focus in an image; the parts close to the focus point are perfectly sharp while the rest of the image has a variable amount of blurriness. The effect is widely used in photography and movies as a depth cue and has in recent years also been introduced into computer games. Today’s graphics hardware opens new possibilities in terms of computation capacity. Shaders and GPGPU languages can be used to perform massively parallel operations on graphics hardware and are well suited for game developers. This thesis presents the theoretical background of some of the recent and most valuable depth-of-field algorithms and describes the implementation of various solutions in the shader domain as well as with GPGPU techniques. The main objective is to analyze various depth-of-field approaches with respect to their visual quality and to how their performance scales when different techniques are used.
|
203 |
A Skeleton Programming Library for Multicore CPU and Multi-GPU Systems
Enmyren, Johan January 2010
This report presents SkePU, a C++ template library which provides a simple and unified interface for specifying data-parallel computations with the help of skeletons on GPUs using CUDA and OpenCL. The interface is also general enough to support other architectures, and SkePU implements both a sequential CPU and a parallel OpenMP back end. It also supports multi-GPU systems. Benchmarks show that copying data between the host and the GPU is often a bottleneck. Therefore a container which uses lazy memory copying has been implemented to avoid unnecessary memory transfers. SkePU was evaluated with small benchmarks and a larger application, a Runge-Kutta ODE solver. The results show that skeletal parallel programming is indeed a viable approach for GPU computing and that a generalized interface for multiple back ends is also reasonable. The best performance gains are achieved when the computation load is large compared to memory I/O (the lazy memory copying can help to achieve this). SkePU offers good performance on a more complex and realistic task such as ODE solving, with run times up to ten times faster when using a GPU back end than with a sequential solver running on a fast CPU. SkePU does, however, have some disadvantages: there is some overhead in using the library, which can be seen in the dot product and LibSolve benchmarks. Although the overhead is not large, it is there, and if performance is of the utmost importance, a hand-coded solution would be best. Not all calculations can be expressed in terms of skeletons either; for such problems, specialized routines must still be created.
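The lazy memory copying mentioned in this abstract can be illustrated with a small sketch. It is not SkePU's C++ container: the "device" is simulated by a second Python list, and the class name LazyVector, its methods, and the transfers counter are hypothetical names chosen for this illustration. The idea shown is only that a copy happens when the side being accessed holds stale data, so consecutive device-side operations do not trigger repeated transfers.

```python
class LazyVector:
    """Hypothetical lazily copied vector; the 'device' is a second Python list."""

    def __init__(self, data):
        self.host = list(data)
        self.device = None
        self.host_valid = True      # host copy holds the latest data
        self.device_valid = False   # device copy holds the latest data
        self.transfers = 0          # number of simulated host<->device copies

    def _to_device(self):
        # Copy host -> device only if the device copy is stale.
        if not self.device_valid:
            self.device = list(self.host)
            self.device_valid = True
            self.transfers += 1

    def _to_host(self):
        # Copy device -> host only if the host copy is stale.
        if not self.host_valid:
            self.host = list(self.device)
            self.host_valid = True
            self.transfers += 1

    def map_on_device(self, fn):
        """Simulated device-side map; leaves the result on the device."""
        self._to_device()
        self.device = [fn(x) for x in self.device]
        self.host_valid = False

    def __getitem__(self, i):
        """Host-side read; copies back only when the host copy is stale."""
        self._to_host()
        return self.host[i]
```

In this sketch, two consecutive map_on_device calls cost one upload in total, and the download happens only at the first host-side read afterwards.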
|
204 |
GPU accelerated Nonlinear Soft Tissue Deformation
Kottravel, Sathish January 2012
There are two types of structures in the human body: solid organs and hollow, membrane-like organs. The brain, the liver, and other soft tissues such as tendons, muscles, and cartilage are examples of solid organs. The colon and blood vessels are examples of hollow organs. The two types differ greatly in structure and mechanical behavior. Deformation of these structures is an important phenomenon in medical simulation. The primary focus of this project is the deformation of soft tissues, which usually undergo large deformations. The deformation of an organ can be considered its mechanical response during medical simulation, and it can be modeled using continuum mechanics and the finite element method (FEM). Irrespective of the methods and models chosen, the primary goal of any such system is to provide a real-time response with sufficient realism and accurate information. One example is a medical training system that uses haptic feedback. Many models have been developed over the past two decades, but very few consider the nonlinear material and geometric behavior of solid organs. One that does is the total Lagrangian explicit dynamics (TLED) algorithm, a finite element formulation proposed by Miller in 2007. It is discussed here from an implementation point of view, together with the use of GPU acceleration (which its partly parallel nature permits) for both pre-processing and the actual computation.
|
205 |
Mikrovågssimulering med realtidsljus : Realtids-ray tracing i CUDA / Microwave simulation with real-time lighting: real-time ray tracing in CUDA
Haggren, Simon January 2010
This work investigates the possibilities of simulating microwaves in a closed system. The system is implemented using an existing technique called ray tracing. Ray tracing is a lighting technique that simulates the movement of photons between a light source and an observer in the environment to be lit, and then illuminates the areas that are hit in order to render an image. Photons and microwaves have similar properties, since both are electromagnetic radiation of different wavelengths. Ray tracing is a demanding algorithm, because many calculations must be performed for each photon at every update. The algorithm has therefore been implemented with CUDA, a library from Nvidia that makes it possible to use the GPU as a general-purpose computing system. This suits this type of problem well, since the GPU's architecture is designed for multiple, parallel computations.
|
206 |
Implementation och prestandaanalys av radarsignalbehandlingsalgoritmer på GPU / Implementation and performance analysis of radar signal processing algorithms on GPU
Nilsson, Mikael January 2014
This thesis evaluates whether it is possible to use one or more GPUs to perform radar signal processing under real-time conditions in a pulse-Doppler radar system. A chain of radar signal processing algorithms used for detection has been implemented with CUDA and then analyzed for performance, with a focus on low execution time. Two CFAR detection algorithms, CA-CFAR and OS-CFAR, are included in the analysis. For the CFAR algorithms, several alternatives were formulated and implemented to evaluate how they can best be adapted to run on a GPU. The performance analysis of the implemented algorithms shows that it is possible for the intended system to use graphics cards to perform the radar signal processing in real time. Implementation solutions are presented for both CA-CFAR and OS-CFAR that meet the system's timing requirements, in some cases by a good margin. The lowest execution times were obtained when certain compromises were made in the flexibility of the algorithms. For CA-CFAR, the lowest execution times were obtained when a summed-area table was used for the threshold computation. For OS-CFAR, the lowest execution times were measured when a rank comparison was performed instead of a full sort. The performance analysis also shows that the implementation can be scaled up efficiently to use more than one GPU.
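The two CFAR shortcuts named in this abstract (a summed-area table for the CA-CFAR threshold, and a rank comparison in place of a full sort for OS-CFAR) can be sketched on the CPU in plain Python. This is an illustration under stated assumptions, not the thesis's CUDA implementation: all function names, the square window layout with guard and training cells, and the scaling factor alpha are hypothetical, and cells too close to the image border are not handled.

```python
def summed_area_table(img):
    """Return a (rows+1) x (cols+1) table with sat[r][c] = sum of img[:r][:c]."""
    rows, cols = len(img), len(img[0])
    sat = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0.0
        for c in range(cols):
            row_sum += img[r][c]
            sat[r + 1][c + 1] = sat[r][c + 1] + row_sum
    return sat


def box_sum(sat, r0, c0, r1, c1):
    """Sum over rows r0..r1-1 and columns c0..c1-1 using four table lookups."""
    return sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]


def ca_cfar(img, sat, r, c, guard, train, alpha):
    """CA-CFAR test: cell > alpha * mean of the training cells.

    Training cells are the outer square window minus the inner guard window.
    Assumes (r, c) lies at least guard + train cells away from every border.
    """
    outer = guard + train
    total = box_sum(sat, r - outer, c - outer, r + outer + 1, c + outer + 1)
    inner = box_sum(sat, r - guard, c - guard, r + guard + 1, c + guard + 1)
    n_train = (2 * outer + 1) ** 2 - (2 * guard + 1) ** 2
    noise = (total - inner) / n_train
    return img[r][c] > alpha * noise


def os_cfar_sorted(cut, ref, k, alpha):
    """OS-CFAR with a full sort: compare against the k-th smallest reference cell."""
    return cut > alpha * sorted(ref)[k - 1]


def os_cfar_rank(cut, ref, k, alpha):
    """Same decision without sorting: count scaled reference cells below the cell."""
    return sum(1 for x in ref if alpha * x < cut) >= k
```

The rank version gives the same decision as the sorted version because the cell under test exceeds alpha times the k-th smallest reference value exactly when at least k scaled reference values lie below it. Each comparison is independent of the others, which is one plausible reason such a formulation maps well to a GPU.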
|
207 |
Adaptation of algorithms for underwater sonar data processing to GPU-based systems
Sundin, Patricia January 2013
In this master's thesis, algorithms for acoustic simulations in underwater environments are ported to GPUs. The GPU parallel computing platforms used are CUDA, OpenCL and SkePU. The purpose of the thesis is to adapt the ported algorithms and evaluate their performance on two modern NVIDIA GPUs, the Tesla K20 and the Quadro K5000. Several optimizations described in the existing literature on GPU processing (e.g. use of shared memory, coalesced memory accesses) are implemented, and multiple versions of each algorithm are created to study their trade-offs. Evaluation on the two GPUs showed that different versions of the same algorithm have different performance characteristics, and that the best-performing version can outperform the original algorithm executing on 8 CPUs. A performance comparison between the CUDA, OpenCL and SkePU versions of one algorithm is also made.
|
208 |
Enforcing Security Policies On GPU Computing Through The Use Of Aspect-Oriented Programming Techniques
Albassam, Bader 29 June 2016
This thesis presents a new security policy enforcer designed for securing parallel computation on CUDA GPUs. We show how the very features that make a GPGPU desirable have already been used in existing exploits, reinforcing the need for security protections on a GPGPU. An aspect weaver was designed for CUDA with the goal of using aspect-oriented programming for security policy enforcement. Empirical testing verified the ability of our aspect weaver to enforce various policies. Furthermore, a performance analysis was carried out to demonstrate that using this policy enforcer has no significant performance impact compared with manual insertion of policy code. Finally, future research goals are presented through a plan of work. We hope that this thesis will provide long-term research goals to guide the field of GPU security.
|
209 |
Distribuovaný systém kryptoanalýzy / Distributed systems for cryptoanalysys
Zelinka, Miloslav Unknown Date
This work deals with cryptanalysis, computing power, and the distribution of that power. It describes methods of distributing computing power for the needs of cryptanalysis. It further focuses on other methods that speed up the breaking of cryptographic algorithms, particularly where hash functions are concerned. The work explains the relatively new term cloud computing and its subsequent use in cryptography, followed by examples of its practical use. It also deals with the possibility of using grid computing for cryptanalysis. The last part of the work is the design of a system that uses cloud computing to break access passwords.
|
210 |
Implementation of Fast Real-Time Control of Unstable Modes in Fusion Plasma Devices
Lundberg, Martin January 2017
In recent years, multi-core graphics processing units (GPUs) have been increasingly used by researchers for purposes other than rendering graphics. This thesis presents the implementation of GPU computing for real-time control of plasma instabilities known as resistive wall modes at the EXTRAP T2R fusion plasma device. An NVIDIA GPU is installed in the device's plasma control system. Using the CUDA parallel computing platform, PID and LQG control algorithms are developed for the GPU. It is shown that computation times decrease by up to 80 % for the LQG algorithm and 33 % for the PID algorithm if computations in the control system are shifted from the central processing unit (CPU) to the GPU. The gains from GPU utilization are limited by latencies introduced by the CPU-GPU interaction. To better exploit the potential of the GPU, a zero-copy method is proposed, in which the GPU is allowed to perform read and write operations on CPU memory.
|