Global ETD Search

221	Adéquation Algorithme Architecture et modèle de programmation pour l'implémentation d'algorithmes de traitement du signal et de l'image sur cluster multi-GPU Boulos, Vincent 18 December 2012 (has links) (PDF) Originally designed to offload graphics rendering tasks from the CPU, the GPU has become a massively parallel architecture suited to processing large volumes of data. Although it holds a significant market share in High-Performance Computing, an Algorithm-Architecture Matching (Adéquation Algorithme Architecture) approach is nevertheless required to implement an algorithm efficiently on a GPU. The contribution of this thesis is twofold. First, we present the significant gain provided by the optimized implementation of a granulometry algorithm (the order of magnitude drops from an hour to a minute for a volume of 1024³ voxels). An analytical model that establishes the performance variations of the granulometry application on GPU was also defined and could be extended to other regular algorithms. Second, a tool that eases the deployment of Signal and Image Processing applications on a multi-GPU cluster was developed. To that end, the programmer's scope of action is reduced to splitting the program into tasks and mapping them onto the compute elements (GPP or GPU). The notable improvement in the output throughput of a streaming application that computes visual saliency maps demonstrated the effectiveness of our tool for implementing a solution on a multi-GPU cluster. To enable dynamic load balancing, a task migration method was also incorporated into the tool. [SPI:OTHER] Engineering Sciences/Other Adéquation Algorithme Architecture GPGPU Modèle de programmation Architecture hétérogène Multi-GPU
222	Calcul hautes performances pour les formulations intégrales en électromagnétisme basses fréquences. Intégration, compression matricielle par ondelettes et résolution sur architecture GPGPU Rubeck, Christophe 18 December 2012 (has links) (PDF) Integral methods are particularly well suited to modelling electromagnetic systems because, unlike finite element methods, they do not require meshing inactive materials such as air. These models are therefore light in terms of the number of degrees of freedom. However, they are full-interaction methods that generate dense system matrices. These matrices are slow to compute in processor time and expensive to store in the computer's main memory. In this work we reduce computation times through parallelism, that is, the use of several processors, notably on graphics cards (GPGPU). We also reduce the memory storage cost through wavelet matrix compression (an algorithm close to image compression). Because this compression is lossy, we developed a criterion to control the error it introduces. The methods developed are applied to an electrostatic formulation for computing capacitances, but they are in principle also applicable to other formulations. [SPI:OTHER] Engineering Sciences/Other Calcul hautes performances Méthodes intégrales Compression matricielle par ondelettes Architecture GPGPU
223	Efficient graph algorithm execution on data-parallel architectures Bangalore Lakshminarayana, Nagesh 12 January 2015 (has links) Mechanisms for improving the execution efficiency of graph algorithms on data-parallel architectures were identified and proposed. The execution of graph algorithms on GPGPU architectures, the prevalent data-parallel architectures, was considered. Irregular and data-dependent accesses in graph algorithms were found to cause significant idle cycles in GPGPU cores. A prefetching mechanism was proposed that reduces idle cycles by prefetching a data-dependent access pattern found in graph algorithms. Storing prefetches in unused spare registers, in addition to storing them in the cache, was shown to make the prefetching mechanism more effective. The design of the cache hierarchy for graph algorithms was explored. First, an exclusive cache hierarchy was shown to be beneficial at the cost of increased traffic; a region-based exclusive cache hierarchy was shown to be similar in performance to an exclusive cache hierarchy while reducing on-chip traffic. Second, bypassing cache blocks at both the level one and level two caches was shown to be beneficial. Third, the use of fine-grained memory accesses (or cache sub-blocking) was shown to be beneficial. The combination of cache bypassing and fine-grained memory accesses was shown to be more beneficial than applying the two mechanisms individually. Finally, the impact of different implementation strategies on algorithm performance was evaluated for the breadth-first search algorithm using different input graphs, and heuristics for identifying the best-performing implementation for a given input graph were discussed. Graph algorithms Data-parallel architectures GPGPU architectures Prefetching Cache hierarchy Inclusion property Cache bypass Fine-grained accesses BFS characterization
224	Signal- och bildbehandling på moderna grafikprocessorer Pettersson, Erik January 2005 (has links) The modern graphics processing unit (GPU) is an extremely powerful unit, potentially many times more powerful than a modern microprocessor. Due to its increasing programmability, it has recently become possible to use it in computation-intensive applications outside its normal usage. This work investigates the possibilities and limitations of general-purpose programming on GPUs. The work mainly concentrates on signal and image processing, although many of the principles are applicable to other areas as well. A framework for image processing on GPUs is implemented, and a few computer vision algorithms are implemented and evaluated, among them stereo vision and optical flow. The results show that some applications can gain a substantial speedup when implemented correctly on the GPU, but others can be inefficient or extremely hard to implement.
GPU GPGPU image processing computer vision stereo vision optical flow
225	Optical Flow Computation on Compute Unified Device Architecture / Optiskt flödeberäkning med CUDA Ringaby, Erik January 2008 (has links) Graphics processors have progressed rapidly in recent years, largely because of the demands that computer games place on speed and image quality. Because of the graphics processor's special architecture, it is much faster at solving parallel problems than the normal processor. Due to its increasing programmability, it can be used for tasks other than those it was originally designed for. Even though graphics processors have been programmable for some time, it has been quite difficult to learn how to use them. CUDA enables the programmer to use C code, with a few extensions, to program NVIDIA's graphics processors and completely skip the traditional programming models. This thesis investigates whether the graphics processor can be used for calculations without knowledge of how the hardware mechanisms work. An image processing algorithm calculating the optical flow has been implemented. The results show that it is rather easy to implement programs using CUDA, but some knowledge of how the graphics processor works is required to achieve high performance. optical flow GPU GPGPU CUDA Engineering and Technology Teknik och teknologier
226	Physically-based fluid-particle system using DirectCompute for use in real-time games / Fysiskt baserade vätskepartikelsystem med DirectCompute för användning i realtidsspel Falkenby, Jesper Hansson January 2014 (has links) Context: Fluid-particle systems are seldom used in games; the apparent performance cost of simulating a fluid-particle system discourages developers from implementing one. The processing power delivered by a modern GPU enables the developer to implement complex particle systems such as fluid-particle systems. Writing efficient fluid-particle systems is key when striving for real-time fluid-particle simulations with good scalability. Objectives: This thesis aims to provide the reader with a well-performing and scalable fluid-particle system simulated in real time using a large number of particles. The fluid-particle system implements two different fluid physics models for diversity and comparison purposes. The performance of the fluid-particle system is then measured for each fluid physics model, and the results show how well the performance of a fluid-particle system might scale as the number of active particles in the simulation increases. Finally, a performance comparison of the particle scalability is made by completely excluding the fluid physics calculations and simulating the particles using only gravity as an affecting force, to demonstrate how taxing the fluid physics calculations are on the GPU. Methods: The fluid-particle system has been run using different simulation scenarios, where each scenario is defined by the number of active particles and the dimensions of the fluid-particle simulation space. The performance results from each scenario have then been saved and put into a collection of results for a given simulation space. Results: The results presented demonstrate how well the fluid-particle system actually scales when run on a modern GPU. 
The system reached over a million particles while still running at an acceptable frame rate for both of the fluid physics models. The results also show that performance is greatly reduced by simulating the particle system as a fluid-particle one instead of only running it with gravity applied. Conclusions: With the results presented, we are able to conclude that fluid-particle systems scale well with the number of active particles when run on a modern GPU. Many optimizations are needed to achieve a well-performing fluid-particle system, and when developing one you should be wary of the many performance pitfalls that come with it. / Fluid-based particle systems are seldom used in real-time games. These systems are very performance-demanding, to the point that they deter developers from implementing them in their real-time games. GPGPU gives developers the ability to implement complex particle systems, such as fluid-particle systems, and simulate them in real time. This thesis explores two different physics models that can be used for fluid simulation, and performance measurements are then carried out under varying conditions. Based on these measurements, conclusions can be drawn about how scalable a fluid-particle system is, that is, how performance drops in relation to the number of particles in the system. The conclusions drawn after all measurements have been carried out are that these systems scale well, but that there are many performance pitfalls to watch out for when developing a fluid-particle system. Fluid-particle system GPGPU DirectCompute fluid physics model Computer Sciences Datavetenskap (datalogi) Human Computer Interaction
227	Laser Triangulation Using Spacetime Analysis Benderius, Björn January 2007 (has links) In this thesis, spacetime analysis is applied to laser triangulation in an attempt to eliminate certain artifacts caused mainly by reflectance variations of the surface being measured. It is shown that spacetime analysis does eliminate these artifacts almost completely. It is also shown that, thanks to the spacetime analysis, the shape of the laser beam used is no longer critical, and that in some cases the laser could probably even be exchanged for a non-coherent light source. Furthermore, experiments running the derived algorithm on a GPU (Graphics Processing Unit) are conducted with very promising results. The thesis starts by deriving the theory needed for doing spacetime analysis in a laser triangulation setup, taking perspective distortions into account; then several experiments evaluating the method are conducted. laser triangulation camera geometry spacetime analysis range imaging parallelization gpgpu
228	GPU based particle system / GPU baserat partikel system Olsson, Martin Wexö January 2010 (has links) GPGPU (general-purpose computing on graphics processing units) is quite common in today's computer games for heavy simulation calculations such as game physics or particle systems. GPU programming is used not only in games but also in scientific research, for heavy calculations on molecular structures, protein folding and so on. The reason for using the GPU for these kinds of tasks is that it can give an application a very large speedup. Previous research shows that particle systems scale very well to the GPU architecture. When simulating a very large particle system, the GPU can run up to 79 times faster than the CPU. But for some very small particle systems the CPU proved to be faster. This research aims to compare the GPU and the CPU when simulating many smaller particle systems and to see what happens to the performance as the particle systems become smaller and smaller. GPU GPGPU OpenCL CL Particles Particle Systems Performance Simulation Computer Sciences Datavetenskap (datalogi) Human Computer Interaction
229	Accelerating computational diffusion MRI using Graphics Processing Units Fernandez, Moises Hernandez January 2017 (has links) Diffusion magnetic resonance imaging (dMRI) uniquely allows the human brain to be studied non-invasively and in vivo. Advances in dMRI offer new insight into tissue microstructure and connectivity, and the possibility of investigating the mechanisms and pathology of neurological diseases. The great potential of the technique relies on indirect inference, as modelling frameworks are necessary to map dMRI measurements to neuroanatomical features. However, this mapping can be computationally expensive, particularly given the trend of increasing dataset sizes and/or the increased complexity in biophysical modelling. Limitations on computing can restrict data exploration and even methodology development. A step forward is to take advantage of the power offered by recent parallel computing architectures, especially Graphics Processing Units (GPUs). GPUs are massively parallel processors that offer trillions of floating point operations per second, and have made possible the solution of computationally intensive scientific problems that were intractable before. However, they are not inherently suited for all types of problems, and bespoke computational frameworks need to be developed in many cases to take advantage of their full potential. In this thesis, we propose parallel computational frameworks for the analysis of dMRI using GPUs within different contexts. We show that GPU-based designs can offer accelerations of more than two orders of magnitude for a number of scientific computing tasks with different parallelisability requirements, ranging from biophysical modelling for tissue microstructure estimation to white matter tractography for connectome generation. We develop novel and efficient GPU-accelerated solutions, including a framework that automatically generates GPU parallel code from a user-specified biophysical model. 
We also present a parallel GPU framework for performing probabilistic tractography and generating whole-brain connectomes. Throughout the thesis, we discuss several strategies for parallelising scientific applications, and we show the great potential of the accelerations obtained, which change the perspective of what is computationally feasible.
230	Efektivní komunikace v multi-GPU systémech / Efficient Communication in Multi-GPU Systems Špeťko, Matej January 2018 (has links) After the introduction of CUDA by Nvidia, GPUs became devices capable of accelerating any general-purpose computation. GPUs are designed as parallel processors that possess huge computational power. Modern supercomputers are often equipped with GPU accelerators. Sometimes the performance of a single GPU is not enough for a scientific application, and it needs to scale over multiple GPUs. During the computation, the GPUs need to exchange partial results. This communication represents computational overhead, so it is important to research methods for effective communication between GPUs. This means less CPU involvement, lower latency and shared system buffers. This thesis focuses on inter-node and intra-node GPU-to-GPU communication using GPUDirect technologies from Nvidia and CUDA-Aware MPI. Subsequently, the k-Wave toolbox for simulating the propagation of acoustic waves is introduced. This application is accelerated using CUDA-Aware MPI. Peer-to-peer transfer support is also integrated into k-Wave using CUDA Inter-process Communication.
