141

Simulation de fluides, approche lagrangienne / Fluid simulation, a Lagrangian approach

Wattez, Adrien January 2014 (has links)
With the spread of computer graphics in the entertainment industry, demand for increasingly realistic fluid-simulation scenes has grown strongly over the last two decades. We propose numerous elements relevant to fluid simulation, focused mainly on the Lagrangian approach (particle-based methods). This work therefore studies and develops techniques for reproducing fluid behaviour that build on the particle nature of the fluid. Algorithms from recent years deliver significant performance gains, allowing us to obtain real-time simulations of incompressible fluids. The use of piecewise-constant kernels, a new numerical computation tool, within Lagrangian fluid simulations is also addressed. With the continuing growth of computing power and advances such as GPGPU programming, we also show how an efficient neighbour search can greatly increase computational performance.
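The efficient neighbour search mentioned above is typically built on a uniform spatial grid. The CUDA kernel below is a minimal sketch of that idea, under the common assumption (not taken from the thesis itself) that particles have been pre-sorted by cell index and that cellStart/cellEnd arrays mark each cell's slice of the sorted array:

```cuda
// Sketch: uniform-grid neighbour search for SPH particles. Assumes positions
// are pre-sorted by cell index, cellStart/cellEnd mark each cell's slice of
// the sorted array, and the grid spacing equals the smoothing radius h.
#include <cuda_runtime.h>

__global__ void countNeighbours(const float3* pos,
                                const int* cellStart, const int* cellEnd,
                                int* counts, int n, float h, int dim)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float3 p = pos[i];
    int cx = (int)(p.x / h), cy = (int)(p.y / h), cz = (int)(p.z / h);
    int found = 0;
    // Only the 27 surrounding cells can hold particles within radius h.
    for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                int x = cx + dx, y = cy + dy, z = cz + dz;
                if (x < 0 || y < 0 || z < 0 || x >= dim || y >= dim || z >= dim)
                    continue;
                int c = (z * dim + y) * dim + x;   // linearised cell index
                for (int j = cellStart[c]; j < cellEnd[c]; ++j) {
                    float rx = p.x - pos[j].x, ry = p.y - pos[j].y, rz = p.z - pos[j].z;
                    if (rx * rx + ry * ry + rz * rz < h * h) ++found;
                }
            }
    counts[i] = found;   // a second pass would evaluate the SPH kernels here
}
```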
142

Improving Visualisation of Large Multi-Variate Datasets: New Hardware-Based Compression Algorithms and Rendering Techniques

Chernoglazov, Alexander Igorevich January 2012 (has links)
Spectral computed tomography (CT) is a novel medical imaging technique that involves simultaneously counting photons at several energy levels of the x-ray spectrum to obtain a single multi-variate dataset. Visualisation of such data poses significant challenges due to its extremely large size and the need for interactive performance for scientific and medical end-users. This thesis explores the properties of spectral CT datasets and presents two algorithms for GPU-accelerated real-time rendering from compressed spectral CT data formats. In addition, we describe an optimised implementation of a volume raycasting algorithm on modern GPU hardware, tailored to the visualisation of spectral CT data.
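For orientation, the following is a minimal CUDA sketch of front-to-back volume raycasting, the algorithm family the thesis optimises. Orthographic rays along z and a raw density volume are simplifying assumptions; the thesis's renderer additionally applies transfer functions per energy bin and reads from its compressed spectral CT formats.

```cuda
// Sketch: minimal front-to-back volume raycasting. One thread per pixel,
// orthographic rays along +z through a dim^3 density volume (assumed layout).
#include <cuda_runtime.h>

__global__ void raycast(const float* vol, float* image, int dim, float dt)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= dim || py >= dim) return;
    float color = 0.f, alpha = 0.f;
    // March the ray, compositing front to back; stop early once opaque.
    for (int z = 0; z < dim && alpha < 0.99f; ++z) {
        float d = vol[(z * dim + py) * dim + px];   // sample density
        float a = 1.f - expf(-d * dt);              // opacity from density
        color += (1.f - alpha) * a * d;             // emission weighted by visibility
        alpha += (1.f - alpha) * a;
    }
    image[py * dim + px] = color;
}
```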
143

High performance bioinformatics and computational biology on general-purpose graphics processing units

Ling, Cheng January 2012 (has links)
Bioinformatics and Computational Biology (BCB) is a relatively new multidisciplinary field which brings together many aspects of the fields of biology, computer science, statistics, and engineering. Bioinformatics extracts useful information from biological data and makes it more intuitive and understandable by applying principles of information sciences, while computational biology harnesses computational approaches and technologies to answer biological questions. Recent years have seen an explosion in the size of biological data at a rate which outpaces the growth in computational power of mainstream computer technologies, namely general-purpose processors (GPPs). The aim of this thesis is to explore the use of off-the-shelf Graphics Processing Unit (GPU) technology in the high-performance, efficient implementation of BCB applications, in order to meet the demands of biological data growth at affordable cost. The thesis presents detailed designs and implementations of GPU solutions for a number of BCB algorithms in two widely used BCB applications, namely biological sequence alignment and phylogenetic analysis. Biological sequence alignment can be used to infer information about a newly discovered biological sequence from well-known sequences through similarity comparison. Phylogenetic analysis, on the other hand, is concerned with investigating the evolution of and relationships among organisms, and has many uses in the fields of systems biology and comparative genomics. In molecular-based phylogenetic analysis, the relationship between species is estimated by inferring the common history of their genes, and phylogenetic trees are then constructed to illustrate the evolutionary relationships among genes and organisms. However, both biological sequence alignment and phylogenetic analysis are computationally expensive applications, as their computing and memory requirements grow polynomially or even worse with the size of sequence databases.
The thesis first presents a multi-threaded parallel design of the Smith-Waterman (SW) algorithm alongside an implementation on NVIDIA GPUs. A novel technique is put forward to remove the restriction on the length of the query sequence found in previous GPU-based implementations of the SW algorithm. Based on this implementation, the difference between the two main task-parallelization approaches (inter-task and intra-task parallelization) is presented. The resulting GPU implementation matches the speed of existing GPU implementations while providing more flexibility, i.e. flexible sequence lengths in real-world applications. It also outperforms an equivalent GPP-based implementation by 15x-20x. The thesis then presents the first reported multi-threaded design and GPU implementation of the Gapped BLAST with Two-Hit method algorithm, which is widely used for aligning biological sequences heuristically; this achieved speed-ups of up to 3x compared to the most optimised GPP implementations. Next, the thesis presents a multi-threaded design and GPU implementation of a Neighbor-Joining (NJ)-based method for phylogenetic tree construction and multiple sequence alignment (MSA), achieving an 8x-20x speed-up compared to an equivalent GPP implementation based on the widely used ClustalW software. The NJ method, however, only gives one possible tree, which depends strongly on the evolutionary model used. A more advanced method uses maximum likelihood (ML) for scoring phylogenies with Markov Chain Monte Carlo (MCMC)-based Bayesian inference. The latter is the subject of another multi-threaded design and GPU implementation presented in this thesis, which achieved a 4x-8x speed-up compared to an equivalent GPP implementation based on the widely used MrBayes software. Finally, the thesis presents a general evaluation of the designs and implementations achieved in this work as a step towards the evaluation of GPU technology in BCB computing, in the context of other computer technologies including GPPs and Field Programmable Gate Array (FPGA) technology.
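To make the inter-task parallelization concrete, here is a hedged CUDA sketch in which one thread computes one pairwise Smith-Waterman score using a linear gap penalty and a rolling DP row. MAXLEN, the scoring constants, and the padded sequence layout are illustrative assumptions, not the thesis's implementation (which, among other things, removes the query-length restriction):

```cuda
// Sketch: inter-task Smith-Waterman scoring, one thread per sequence pair,
// linear gap penalty, score only (no traceback). Assumes each sequence is
// padded into a MAXLEN slot and qLen/sLen <= MAXLEN; match = +2,
// mismatch = -1, gap = -1 are illustrative constants.
#include <cuda_runtime.h>
#define MAXLEN 128

__global__ void swScore(const char* queries, const char* subjects,
                        const int* qLen, const int* sLen,
                        int* best, int nPairs)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= nPairs) return;
    const char* q = queries + t * MAXLEN;
    const char* s = subjects + t * MAXLEN;
    int prev[MAXLEN + 1];                  // previous DP row, H(i-1, *)
    for (int j = 0; j <= MAXLEN; ++j) prev[j] = 0;
    int score = 0;
    for (int i = 1; i <= qLen[t]; ++i) {
        int diag = 0, left = 0;            // H(i-1, j-1) and H(i, j-1)
        for (int j = 1; j <= sLen[t]; ++j) {
            int up = prev[j];              // H(i-1, j)
            int h = diag + (q[i-1] == s[j-1] ? 2 : -1);
            h = max(h, up - 1);            // gap in the query
            h = max(h, left - 1);          // gap in the subject
            h = max(h, 0);                 // local-alignment floor
            diag = up; prev[j] = h; left = h;
            score = max(score, h);
        }
    }
    best[t] = score;
}
```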
144

Jämförelse av GPGPU-ramverk och AES-metoder / Comparison of GPGPU frameworks and AES methods: which GPGPU framework and which AES method should be recommended for AES encryption with GPGPU

Berggren, Emil, Gustafson, Tobias January 2017 (has links)
Abstract Background – Today's processors are approaching the limit of how high their clock frequencies can go. Instead of running faster, they are therefore built with multiple cores so that several threads can execute in parallel. Aside from the CPU, however, the graphics processor (GPU) holds a large amount of computing power that sits unused for long stretches while the computer is active. A single GPU core is weaker than a CPU core, but a GPU can run thousands more threads simultaneously, which has led to the development of frameworks for general-purpose computation on the GPU, called GPGPU frameworks. The profit lies in algorithms with high parallelization potential; one such algorithm is AES, one of the most widely used encryption algorithms today and also considered one of the most secure. Purpose – With GPU acceleration, AES encryption can run faster than with a traditional CPU solution. To make the GPU acceleration as effective as possible, this thesis investigates which AES method and which GPGPU framework should be chosen during development.
Method – Two literature studies were conducted to determine which AES methods and which GPGPU frameworks are viable for GPU acceleration of AES. Experiments were then conducted to determine which of the selected GPGPU frameworks is the most effective with the AES method judged most suitable. Findings – The literature studies conclude that the CTR method is preferable among the AES methods, owing to its parallelization potential and strong security properties, and that only two current GPGPU frameworks satisfy the derived criteria: CUDA and OpenCL. The experiments then establish that CUDA is the more effective GPGPU framework for AES-CTR on the tested graphics card, a GTX 560, because OpenCL is more constrained by data-transfer bandwidth than CUDA. Implications – CUDA is faster than OpenCL at larger file sizes because OpenCL is more limited by data-transfer speed on a GTX 560. Limitations – The experiments were conducted on a single Nvidia graphics card, a hardware constraint that follows from CUDA only running on Nvidia hardware. Because Nvidia's tools no longer support debugging or profiling of OpenCL, the OpenCL test results could not be verified with external tools. Keywords – Processor, GPGPU, AES, CTR, OpenCL, CUDA, GPGPU framework
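CTR's suitability for GPGPU comes from the fact that every 16-byte counter block is encrypted independently, so blocks map one-to-one onto GPU threads. The sketch below shows only that structure; blockCipher() is a placeholder stand-in and is not AES (a real implementation would substitute an AES-128 round function, often T-table based in the GPU literature):

```cuda
// Sketch: the data-parallel structure of AES-CTR on a GPU. Each thread
// encrypts one 16-byte counter block and XORs the keystream into the data.
#include <cuda_runtime.h>
typedef unsigned char u8;
typedef unsigned long long u64;

__device__ void blockCipher(u8 out[16], const u8 in[16], const u8 key[16])
{
    for (int i = 0; i < 16; ++i)           // placeholder mixing, NOT real AES
        out[i] = in[i] ^ key[i] ^ (u8)(i * 37);
}

__global__ void ctrEncrypt(u8* data, const u8* key, u64 nonce, u64 nBlocks)
{
    u64 b = blockIdx.x * (u64)blockDim.x + threadIdx.x;
    if (b >= nBlocks) return;
    u8 ctr[16], ks[16];
    // Counter block = nonce (8 bytes) || block index (8 bytes).
    for (int i = 0; i < 8; ++i) {
        ctr[i]     = (u8)(nonce >> (8 * i));
        ctr[8 + i] = (u8)(b     >> (8 * i));
    }
    blockCipher(ks, ctr, key);              // keystream block, independent per thread
    for (int i = 0; i < 16; ++i)
        data[b * 16 + i] ^= ks[i];          // XOR keystream into plaintext
}
```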
145

Fast Parallel Machine Learning Algorithms for Large Datasets Using Graphic Processing Unit

Li, Qi 30 November 2011 (has links)
This dissertation deals with developing parallel processing algorithms for graphics processing units (GPUs) in order to solve machine learning problems for large datasets. In particular, it contributes to the development of fast GPU-based algorithms for calculating distance (i.e. similarity, affinity, closeness) matrices. It also presents the algorithm and implementation of a fast parallel Support Vector Machine (SVM) using the GPU. These application tools are developed using the Compute Unified Device Architecture (CUDA), a popular software framework for General-Purpose Computing on GPUs (GPGPU). Distance calculation is at the core of many machine learning algorithms, because the closer a query is to some samples (i.e. observations, records, entries), the more likely the query belongs to the class of those samples. K-Nearest Neighbors Search (k-NNS) is a popular and powerful distance-based tool for solving classification problems, and it is the prerequisite for training local-model-based classifiers. Fast distance calculation can significantly improve the speed of these classifiers, and GPUs can be very handy for accelerating it. Several GPU-based sorting algorithms are also included to sort the distance matrix and find the k-nearest neighbors. The speed of these sorting algorithms varies depending on the input sequences; the GPUKNN tool proposed in this dissertation utilizes the GPU-based distance computation algorithm and automatically picks the most suitable sorting algorithm according to the characteristics of the input dataset. Every machine learning tool has its own pros and cons. The advantage of SVM is its high classification accuracy, which makes it possibly the best classification tool; however, as with many other machine learning algorithms, SVM training slows down considerably as the size of the input dataset increases. The GPU version of parallel SVM based on parallel Sequential Minimal Optimization (SMO) implemented in this dissertation is proposed to reduce the time cost of both the training and predicting phases. This GPUSVM implementation is original. It utilizes many parallel processing techniques to accelerate and minimize the kernel evaluations, which are considered the most time-consuming operations in SVM. Although the many-core architecture of the GPU performs best under data-level parallelism, multi-task (i.e. task-level) processing is also integrated into the application to improve the speed of tasks such as multiclass classification and cross-validation. Furthermore, the procedure for finding worst violators is distributed across multiple blocks in the CUDA model, reducing the time cost of each SMO iteration during the training phase. These violators are shared among different tasks in multiclass classification and cross-validation to avoid duplicate kernel computations. The performance results show that the achieved speedups in both the training and predicting phases range from one to three orders of magnitude compared to the state-of-the-art LIBSVM software on some well-known benchmark datasets.
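The core distance-matrix primitive can be sketched in CUDA as follows: one thread computes one (query, sample) Euclidean distance. This brute-force version is illustrative only; production implementations tile the data through shared memory or recast the computation as a matrix product for bandwidth reasons.

```cuda
// Sketch: brute-force pairwise Euclidean distance matrix on the GPU,
// the building block behind GPU-accelerated k-NN search. Row-major
// layouts for queries (nQ x dims) and samples (nS x dims) are assumed.
#include <cuda_runtime.h>

__global__ void distanceMatrix(const float* queries, const float* samples,
                               float* dist, int nQ, int nS, int dims)
{
    int q = blockIdx.y * blockDim.y + threadIdx.y;   // query index
    int s = blockIdx.x * blockDim.x + threadIdx.x;   // sample index
    if (q >= nQ || s >= nS) return;
    float acc = 0.f;
    for (int d = 0; d < dims; ++d) {
        float diff = queries[q * dims + d] - samples[s * dims + d];
        acc += diff * diff;
    }
    dist[q * nS + s] = sqrtf(acc);   // one entry of the nQ x nS matrix
}
```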
146

GPU implementace algoritmů irradiance a radiance caching / GPU implementation of the irradiance and radiance caching algorithms

Bulant, Martin January 2015 (has links)
The objective of this work is to create software implementing two algorithms for global illumination computation. Irradiance and radiance caching are to be implemented in the CUDA framework on a graphics card (GPU). A parallel GPU implementation should dramatically improve the algorithms' speed compared to a CPU implementation. The software is written on top of an existing framework for global illumination computation, which allows the work to focus on the algorithm implementation alone. This work should speed up the testing of new and existing methods for global illumination computation, because saving and reusing intermediate results can benefit other algorithms as well.
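For context, the reuse test at the heart of irradiance caching is Ward's record weight: a cached record contributes at a shade point only if its weight exceeds a threshold, which is what makes saving and reusing intermediate results pay off. The device function below is a generic sketch of that formula; the record layout and threshold handling are assumptions, not this thesis's code.

```cuda
// Sketch: Ward's irradiance-cache weight. A record is reusable at shade
// point p with normal n when recordWeight(...) > 1/kappa for an assumed
// user error threshold kappa.
#include <cuda_runtime.h>

struct CacheRecord {
    float3 pos, normal;
    float  Ri;   // harmonic mean distance to surfaces seen from the record
};

__device__ float recordWeight(float3 p, float3 n, const CacheRecord& r)
{
    float dx = p.x - r.pos.x, dy = p.y - r.pos.y, dz = p.z - r.pos.z;
    float dist = sqrtf(dx * dx + dy * dy + dz * dz);
    float ndot = n.x * r.normal.x + n.y * r.normal.y + n.z * r.normal.z;
    float denom = dist / r.Ri + sqrtf(fmaxf(0.f, 1.f - ndot));
    return denom > 0.f ? 1.f / denom : 1e30f;   // larger weight = better match
}
```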
147

Riešenie problému globálnej optimalizácie využitím GPU / Employing GPUs in Global Optimization Problems

Hošala, Michal January 2014 (has links)
The global optimization problem -- i.e., the problem of finding the global extreme points of a given function on a restricted domain of values -- appears in many real-world applications. Improving the efficiency of this task can reduce application latency or provide a more precise result, since the task is usually solved by an approximative algorithm. This thesis focuses on the practical aspects of global optimization algorithms, especially in the domain of algorithmic trading data analysis. Successful global optimization solvers already exist for CPUs, but they are quite time-demanding. The main objective of this thesis is to design a global optimization (GO) solver that utilizes the raw computational power of GPU devices. Despite the fact that GPUs have significantly more computational cores than CPUs, parallelizing a known serial algorithm is often quite challenging due to the specific execution model and the memory architecture constraints of existing GPU architectures. The thesis therefore explores multiple approaches to the problem and presents their experimental results.
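A common GPU building block for such solvers is batch evaluation of the objective over many candidate points followed by a parallel reduction to the best value. The CUDA sketch below shows this pattern with a placeholder one-dimensional objective; the solvers explored in the thesis are considerably more elaborate.

```cuda
// Sketch: evaluate a placeholder objective at n candidate points and reduce
// to each block's minimum. Launch as
//   evalAndReduce<<<blocks, 256, 256 * sizeof(float)>>>(lo, hi, n, blockBest);
// blockDim.x must be a power of two; the host takes the min over blockBest[].
#include <cuda_runtime.h>

__device__ float objective(float x)            // illustrative objective only
{
    return x * x + 10.f * __sinf(5.f * x);
}

__global__ void evalAndReduce(float lo, float hi, int n, float* blockBest)
{
    extern __shared__ float best[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float x = lo + (hi - lo) * i / (n - 1);    // i-th candidate point
    best[threadIdx.x] = (i < n) ? objective(x) : 3.4e38f;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {   // tree reduction for the min
        if (threadIdx.x < s)
            best[threadIdx.x] = fminf(best[threadIdx.x], best[threadIdx.x + s]);
        __syncthreads();
    }
    if (threadIdx.x == 0) blockBest[blockIdx.x] = best[0];
}
```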
148

Dynamická simulace tuhých těles na programovatelných GPU / Dynamic simulation of rigid bodies using programmable GPUs

Cséfalvay, Szabolcs January 2011 (has links)
The goal of this work is to create a program which simulates the dynamics of rigid bodies and their systems using GPGPU techniques, with an emphasis on speed and stability. The result is a physics engine that uses the CUDA architecture. It runs entirely on the GPU and handles collision detection, collision response, and forces such as friction, gravity, and contact forces. It supports spheres, rods (which are similar to cylinders), springs, boxes, and planes. It is also possible to construct compound objects by connecting basic primitives.
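As a minimal illustration of the per-step work such an engine does, the kernel below applies gravity and a penalty-spring response for all-pairs sphere contacts. The constants are illustrative, and a real engine prunes candidate pairs with a broad phase (such as the uniform grid sketched earlier) rather than testing all pairs.

```cuda
// Sketch: all-pairs sphere-sphere contact forces, one thread per body.
// Penalty-spring response with stiffness k; unit mass and equal radii assumed.
#include <cuda_runtime.h>

__global__ void sphereCollisions(const float3* pos, float3* force,
                                 int n, float radius, float k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float3 f = make_float3(0.f, -9.81f, 0.f);      // gravity
    for (int j = 0; j < n; ++j) {                  // brute-force narrow phase
        if (j == i) continue;
        float dx = pos[i].x - pos[j].x;
        float dy = pos[i].y - pos[j].y;
        float dz = pos[i].z - pos[j].z;
        float d  = sqrtf(dx * dx + dy * dy + dz * dz);
        float overlap = 2.f * radius - d;
        if (overlap > 0.f && d > 1e-6f) {          // spheres interpenetrate
            float s = k * overlap / d;             // spring force along the normal
            f.x += s * dx; f.y += s * dy; f.z += s * dz;
        }
    }
    force[i] = f;   // an integrator kernel would then update velocity and position
}
```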
150

Using Graphical Processors to Implement Radio Base Station Control Plane Functions / Implementera radiobasstationers kontrollplans funktioner med grafikprocessor

Ringman, Noak January 2019 (has links)
Today more devices are being connected to the Internet via mobile networks, and with more devices in mobile networks, the workload on radio base stations increases. Radio base stations must be energy-efficient and cheap, which makes high-performance central processing units (CPUs) a poor alternative for meeting the increasing workload. An alternative could be a graphics processing unit (GPU), whose different hardware architecture is more suitable for data-parallel problems. This thesis investigates the parallelisation possibilities in the user-equipment handling part of radio base stations, with the aim of using a GPU to exploit that parallelism. The investigation found mixed pipeline and data parallelism in user-equipment handling, a form of parallelism suitable for GPU execution. The tasks that handle user equipment were divided into smaller communication-free sub-tasks, and batches of user-equipment sub-tasks were collected and offloaded to a GPU. A peak throughput gain of 62.2 times over the single-threaded CPU was achieved, but with a latency impact of more than an order of magnitude; across all workloads, latency was at least 1.24 times higher for the GPU implementations than for the CPU implementations. A radio base station with many more user equipment than those existing today was also simulated; for this base station, a gain of 14.0 times over the single-threaded CPU was achieved, while latency increased by 2.4 times. To really benefit from a GPU implementation, the number of user equipment, i.e. the load, must be higher than in existing radio base stations today.
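The batch-and-offload pattern the thesis measures can be sketched as follows: independent per-user-equipment sub-tasks are gathered on the host, moved to the GPU in one transfer, processed by one kernel launch, and copied back. The UeTask fields and the per-task work are placeholders; the latency penalty reported above comes largely from the extra transfers and the batching delay.

```cuda
// Sketch: batching independent per-UE sub-tasks and offloading them to the
// GPU in one transfer + one kernel launch. Error checking omitted for brevity.
#include <cuda_runtime.h>
#include <vector>

struct UeTask { int ueId; float payload; };   // placeholder task record

__global__ void processBatch(const UeTask* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i].payload * 2.f;  // stand-in for per-UE work
}

void offload(const std::vector<UeTask>& batch, std::vector<float>& results)
{
    int n = (int)batch.size();
    UeTask* dIn; float* dOut;
    cudaMalloc(&dIn, n * sizeof(UeTask));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dIn, batch.data(), n * sizeof(UeTask), cudaMemcpyHostToDevice);
    processBatch<<<(n + 255) / 256, 256>>>(dIn, dOut, n);
    results.resize(n);
    cudaMemcpy(results.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn); cudaFree(dOut);
}
```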
