141

Implementing method of moments on a GPGPU using Nvidia CUDA

Virk, Bikram 12 April 2010 (has links)
This thesis concentrates on the algorithmic aspects of the Method of Moments (MoM) and Locally Corrected Nyström (LCN) numerical methods in electromagnetics. The data dependencies in each step of the algorithms are analyzed in order to implement a parallel version that can harness the processing power of a General Purpose Graphics Processing Unit (GPGPU). The GPGPU programming model provided by NVIDIA's Compute Unified Device Architecture (CUDA) is described to introduce the software tools that enable C code to run on the GPGPU. Optimizations such as partial updates at every iteration, inter-block synchronization, and the use of shared memory yield an overall speedup of approximately 10. The study also brings out the strengths and weaknesses of implementing methods such as Crout's LU decomposition and triangular matrix inversion on a GPGPU architecture. The results suggest future directions of study for different algorithms and their effectiveness in a parallel processing environment. The performance data collected show how different features of the GPGPU architecture can be enhanced to yield higher speedups.
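The abstract names Crout's LU decomposition as one of the methods ported to the GPU. As a point of reference, a minimal serial sketch of Crout's algorithm is below; the thesis maps loops like these onto CUDA threads, and all names here are illustrative rather than taken from the thesis.

```python
def crout_lu(A):
    """Crout LU decomposition: A = L * U, with U having a unit diagonal.

    Serial reference sketch (no pivoting). In Crout's variant, L carries
    the diagonal and U's diagonal is fixed to 1.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for j in range(n):
        U[j][j] = 1.0
        for i in range(j, n):      # column j of L
            L[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(j))
        for i in range(j + 1, n):  # row j of U
            U[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(j))) / L[j][j]
    return L, U
```

The column-by-column structure is what makes the "partial update at every iteration" mentioned above attractive on a GPU: only the trailing submatrix needs touching at each step.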
142

Laser Triangulation Using Spacetime Analysis

Benderius, Björn January 2007 (has links)
In this thesis, spacetime analysis is applied to laser triangulation in an attempt to eliminate certain artifacts caused mainly by reflectance variations of the surface being measured. It is shown that spacetime analysis does eliminate these artifacts almost completely. It is also shown that, thanks to the spacetime analysis, the shape of the laser beam is no longer critical, and that in some cases the laser could probably even be exchanged for a non-coherent light source. Furthermore, experiments running the derived algorithm on a GPU (Graphics Processing Unit) are conducted, with very promising results.

The thesis starts by deriving the theory needed for performing spacetime analysis in a laser triangulation setup, taking perspective distortions into account; then several experiments evaluating the method are conducted.
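A core step in laser triangulation is locating the laser peak with sub-pixel (or, in spacetime analysis, sub-frame) precision. A minimal sketch of one common peak estimator, intensity-weighted center of mass, is below; the function name and interface are illustrative, not taken from the thesis.

```python
def peak_position(intensities):
    """Sub-pixel peak localization by intensity-weighted center of mass.

    Given intensity samples along one dimension (a pixel row, or the same
    pixel over time as the laser sweeps past), return the fractional index
    of the peak instead of just the brightest sample's index.
    """
    total = sum(intensities)
    if total == 0:
        return None  # no signal in this slice
    return sum(i * v for i, v in enumerate(intensities)) / total
```

Spacetime analysis applies this kind of estimate along the temporal axis, which is why the spatial shape of the beam stops being critical, as the abstract notes.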
143

Advanced Real-time Post-Processing using GPGPU techniques

Lönroth, Per, Unger, Mattias January 2008 (has links)
Post-processing techniques are used to change a rendered image as a last step before presentation and include, but are not limited to, operations such as changes of saturation or contrast, as well as more advanced effects like depth of field and tone mapping.

Depth-of-field effects are created by changing the focus in an image: the parts close to the focus point are perfectly sharp, while the rest of the image has a variable amount of blurriness. The effect is widely used in photography and film as a depth cue, and in recent years it has also been introduced into computer games.

Today's graphics hardware offers new possibilities in terms of computation capacity. Shaders and GPGPU languages can be used to perform massively parallel operations on graphics hardware and are well suited for game developers.

This thesis presents the theoretical background of some of the most recent and most valuable depth-of-field algorithms and describes the implementation of various solutions, both in the shader domain and using GPGPU techniques. The main objective is to analyze various depth-of-field approaches, looking at their visual quality and how the methods scale performance-wise under the different techniques.
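Most depth-of-field post-processes start the same way: mapping each pixel's scene depth to a circle-of-confusion (CoC) diameter, which then drives a variable-size blur. A sketch of that first step using the standard thin-lens model is below; this is a generic formulation, not a reproduction of any particular algorithm from the thesis, and parameter names are illustrative (all distances in metres).

```python
def circle_of_confusion(depth, focus_dist, focal_len, aperture):
    """Thin-lens circle-of-confusion diameter for a point at `depth`.

    Zero at the focus distance; grows as the point moves away from it.
    `aperture` is the lens aperture diameter, `focal_len` the focal length.
    """
    return aperture * focal_len * abs(depth - focus_dist) / (
        depth * (focus_dist - focal_len))
```

In a shader or GPGPU implementation this evaluation is trivially parallel per pixel; the interesting differences between the algorithms the thesis compares lie in how the resulting variable-radius blur is then performed.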
144

Using graphics processors to simulate diffusion processes with random Sierpiński carpets

Lang, Jens 20 May 2009 (has links) (PDF)
This thesis investigates an algorithm for random-walk simulations on fractal structures. Its purpose is the simulation of diffusion in porous materials. Specifically, the master-equation approach for simulating a random walk on Sierpiński carpets was implemented for GPGPUs (general purpose graphics processing units) in three different versions: In the first approach, the whole carpet is stored in a two-dimensional array. The second version stores only the walkable cells; this strategy saves memory, as Sierpiński carpets are generally sparse. The implementation was further improved by extending the carpet dynamically whenever the simulation reaches the border of the current region. The graphics processors used follow the SIMD principle, so it was additionally investigated whether optimizing the code for the SIMD architecture leads to performance improvements. The results show that execution time does indeed decrease when only the walkable cells are stored. It can be decreased further by dynamically extending the simulation region. Optimizations for the SIMD architecture, however, did not result in a reduced execution time.
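The master-equation approach evolves occupation probabilities rather than individual walkers. A serial sketch of one update step, using the sparse "store only the walkable cells" representation the thesis favours, is below; names and the exact update rule (blocked moves keep the walker in place) are illustrative assumptions, not taken from the thesis.

```python
def master_equation_step(prob, walkable):
    """One master-equation update of a random walk on a carpet.

    `prob` maps walkable cells (x, y) to occupation probability;
    `walkable` is the set of open cells. Each cell spreads its
    probability equally over its four neighbours; a move into a blocked
    cell returns its share to the source, conserving total probability.
    """
    new = {c: 0.0 for c in walkable}
    for (x, y), p in prob.items():
        share = p / 4.0
        for m in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            new[m if m in walkable else (x, y)] += share
    return new
```

On a GPU, each walkable cell becomes one thread; the sparse representation keeps both memory and thread count proportional to the open cells only.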
145

An enhanced GPU architecture for not-so-regular parallelism with special implications for database search

Narasiman, Veynu Tupil 27 June 2014 (has links)
Graphics Processing Units (GPUs) have become a popular platform for executing general purpose (i.e., non-graphics) applications. To run efficiently on a GPU, applications must be parallelized into many threads, each of which performs the same task but operates on different data (i.e., data parallelism). Previous work has shown that some applications experience significant speedup when executed on a GPU instead of a CPU. The applications that benefit most tend to have certain characteristics such as high computational intensity, regular control-flow and memory access patterns, and little to no communication among threads. However, not all parallel applications have these characteristics. Applications with a more balanced compute to memory ratio, divergent control flow, irregular memory accesses, and/or frequent communication (i.e., not-so-regular applications) will not take full advantage of the GPU's resources, resulting in performance far short of what could be delivered. The goal of this dissertation is to enhance the GPU architecture to better handle not-so-regular parallelism. This is accomplished in two parts. First, I analyze a diverse set of data parallel applications that suffer from divergent control-flow and/or significant stall time due to memory. I propose two microarchitectural enhancements to the GPU called the Large Warp Microarchitecture and Two-Level Warp Scheduling to address these problems respectively. When combined, these mechanisms increase performance by 19% on average. Second, I examine one of the most important and fundamental applications in computing: database search. Database search is an excellent example of an application that is rich in parallelism, but rife with not-so-regular characteristics. 
I propose enhancements to the GPU architecture, including new instructions that improve intra-warp thread communication and decision making, as well as a row-buffer locality hint bit to better handle the irregular memory access patterns of index-based tree search. These proposals improve performance by 21% for full table scans and 39% for index-based search. The result of this dissertation is an enhanced GPU architecture that better handles not-so-regular parallelism. This increases the scope of applications that run efficiently on the GPU, making it a more viable platform not only for current parallel workloads such as databases, but also for future and emerging parallel applications.
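The two-level warp scheduling idea can be illustrated with a small selection routine: warps are split into fetch groups, the scheduler round-robins within the active group, and it only moves to the next group once no warp in the current one is ready. This staggers long-latency memory stalls across groups instead of exposing them all at once. The interface below is a hypothetical sketch, not the dissertation's mechanism in detail.

```python
def select_warp(fetch_groups, ready, active):
    """Pick the next warp under two-level round-robin scheduling.

    `fetch_groups` is a list of lists of warp ids, `ready` maps warp id
    to whether it can issue this cycle, and `active` is the index of the
    currently active fetch group. Returns (warp, new_active_group), or
    (None, active) if every warp is stalled.
    """
    n = len(fetch_groups)
    for offset in range(n):
        idx = (active + offset) % n
        for w in fetch_groups[idx]:
            if ready[w]:
                return w, idx  # stay in (or switch to) this group
    return None, active
```

Because group 0's warps reach their memory stalls before group 1's warps are even scheduled, the misses of different groups overlap with each other's compute.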
146

N-dimensional computation on GPU

Bergeron, Arnaud 04 1900 (has links)
Scientific computing on GPUs (graphics processing units) has been on the rise for some time, particularly in machine learning. This thesis presents the implementation of an efficient multidimensional array data structure on the GPU. We begin with a review of existing packages that implement similar functionality and of the disadvantages of such a fragmented approach, focusing in particular on packages with a Python interface. We then describe techniques used to optimize execution, such as order reduction and automatic asynchronous computation. Finally, we present the usage of the module developed for this thesis. / The source code of the developed library accompanies this deposit in the state it was in at that time. A more recent version can be found on GitHub (http://github.com/abergeron).
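"Order reduction" for a strided multidimensional array means fusing adjacent dimensions whose strides line up, so an elementwise GPU kernel can index a lower-dimensional (ideally flat) array. A sketch of the idea is below; this is an illustration of the general technique, not the module's actual interface.

```python
def collapse_dims(shape, strides):
    """Fuse adjacent dimensions that are contiguous with each other.

    Two neighbouring dimensions can be merged when the outer stride
    equals the inner stride times the inner size. A C-contiguous array
    collapses all the way down to one dimension.
    """
    out_shape, out_strides = [shape[0]], [strides[0]]
    for size, stride in zip(shape[1:], strides[1:]):
        if out_strides[-1] == stride * size:  # contiguous with previous dim
            out_shape[-1] *= size
            out_strides[-1] = stride
        else:
            out_shape.append(size)
            out_strides.append(stride)
    return out_shape, out_strides
```

Fewer dimensions means fewer divisions and modulos per thread when converting a flat thread index back into array coordinates, which is where the speedup comes from.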
147

Software architectures for flexible radio: integration of heterogeneous computing units

HORREIN, Pierre-Henri 10 January 2012 (has links) (PDF)
Flexible radio makes it possible to envision wireless networks that are scalable, more efficient, and more intelligent. The term "flexible radio" is very general: it can refer to a software implementation of the operations, to an adaptable hardware implementation based on hardware accelerators, or to mixed implementations. It covers, in fact, every radio terminal implementation that is not fixed. The work carried out during this thesis revolves around two points. The first is the use of graphics processors for flexible radio, and more specifically for software-defined radio. These targets offer impressive raw computational throughput, based on a massively parallel architecture. The data parallelism exploited by these processors, however, differs from the task parallelism often exploited in software radio applications. Different approaches to resolving this mismatch are studied. The results obtained on this point allow a clear improvement in the computational throughput achievable with a software implementation, while freeing the processor for other tasks. The second point addressed in this study concerns the definition of an environment supporting the different implementation possibilities of flexible radio. This environment encompasses support for the heterogeneous platform and the management of applications on such platforms. Although still experimental, the first results obtained with the environment demonstrate its adaptability and make it usable for a variety of radio applications on heterogeneous platforms.
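The data-parallel versus task-parallel mismatch mentioned above is commonly bridged by batching: the sample stream feeding a task-parallel pipeline stage is buffered, and the GPU is handed large batches so each kernel launch has enough parallel work to amortise its cost. A minimal sketch of that buffering step, under our own naming rather than the thesis's, is:

```python
def batch_stream(samples, batch_size):
    """Group an incoming sample stream into fixed-size batches.

    Each yielded batch is sized for one GPU kernel launch; the final
    batch may be shorter if the stream ends mid-batch.
    """
    batch = []
    for s in samples:
        batch.append(s)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the remainder
```

The trade-off is latency: larger batches improve GPU throughput but delay each sample, which matters for real-time radio processing.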
149

Molecular Dynamics on a Grand Scale: Towards large-scale atomistic simulations of self-assembling biomolecular systems

Matthew Breeze Unknown Date (has links)
To explore progressively larger biomolecular systems, methods to model explicit solvent cheaply are required. In this work, the use of Graphics Processing Units (GPUs), found in commodity video cards, for solving the constraints, calculating the non-bonded forces, and generating the pair list in the case of the fully constrained three-site SPC water model is investigated. It was shown that the GPU implementation of the SPC constraint-solving algorithm SETTLE was overall 26% faster than a conventional implementation running on a Central Processing Unit (CPU) core. The non-bonded forces were calculated up to 17 times faster than on a CPU core. Using these two approaches, an overall speedup of around 4 times was found. The most successful implementation of the pair-list generation ran at 38% of the speed of a conventional grid-based implementation on a CPU core. In each investigation, the accuracy was shown to be sufficient using a variety of numerical and distributional tests. Thus, the use of GPUs as parallel processors for MD calculations is highly promising. Lastly, a method of calculating a constraint force analytically is presented.
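The grid-based pair-list generation the abstract compares against works by binning atoms into cells of the cutoff size, so candidate pairs need only be searched in neighbouring cells, turning an O(N²) scan into roughly O(N). A serial sketch of that approach is below; names are illustrative and this is the generic cell-list technique, not the thesis's code.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two 3-tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def build_pair_list(positions, cutoff):
    """Cell-list pair-list generation for an MD non-bonded cutoff.

    Bins atoms into cubic cells of side `cutoff`, then tests distances
    only against the 27 surrounding cells. Returns pairs (i, j), i < j,
    within the cutoff.
    """
    cells = {}
    for i, (x, y, z) in enumerate(positions):
        key = (int(x // cutoff), int(y // cutoff), int(z // cutoff))
        cells.setdefault(key, []).append(i)
    pairs = []
    for (cx, cy, cz), atoms in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in cells.get((cx + dx, cy + dy, cz + dz), []):
                        for i in atoms:
                            if i < j and _dist2(positions[i], positions[j]) <= cutoff ** 2:
                                pairs.append((i, j))
    return pairs
```

The irregular, data-dependent cell occupancy is precisely what makes this step a poor fit for the GPU, consistent with the 38%-of-CPU-speed result reported above.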
150

Accelerating digital forensic searching through GPGPU parallel processing techniques

Bayne, Ethan January 2017 (has links)
Background: String searching within a large corpus of data is a critical component of digital forensic (DF) analysis techniques such as file carving. The continuing increase in the capacity of consumer storage devices requires similar improvements in the performance of the string searching techniques employed by DF tools used to analyse forensic data. As string searching is a trivially parallelisable problem, general purpose graphics processing unit (GPGPU) approaches are a natural fit. Currently, only some of the research on GPGPU programming has been transferred to the field of DF, and that work used a closed-source GPGPU framework: Compute Unified Device Architecture (CUDA). Findings from these earlier studies were that the local storage devices from which forensic data are read present an insurmountable performance bottleneck. Aim: This research hypothesises that modern storage devices no longer present a performance bottleneck to the processing techniques currently used in the field, and proposes that an open-standards GPGPU framework, Open Computing Language (OpenCL), would be better suited to accelerating file carving, with wider compatibility across an array of modern GPGPU hardware. This research further hypothesises that a modern multi-string searching algorithm may be better adapted to fulfilling the requirements of DF investigation. Methods: This research presents a review of existing research and tools used to perform file carving and acknowledges related work within the field. To test the hypotheses, parallel file carving software was created using C# and OpenCL, employing both a traditional string searching algorithm and a modern multi-string searching algorithm to conduct an analysis of forensic data. A set of case studies that demonstrate and evaluate the potential benefits of adopting various methods of string searching on forensic data is given.
The research concludes with a final case study that evaluates the performance of file carving with the best-proposed string searching solution and compares the result with an existing file carving tool, Foremost. Results: The results establish that the parallelised OpenCL and Parallel Failureless Aho-Corasick (PFAC) algorithm solution delivers significant processing improvements on both single and multiple GPUs on modern hardware. In comparison to CPU approaches, GPGPU processing models were observed to minimise the time required to search for larger numbers of patterns. The results also show that employing PFAC delivers significant performance increases over the Boyer-Moore (BM) algorithm. The method employed to read data from storage devices was also seen to have a significant effect on the time required to perform string searching and file carving. Conclusions: Empirical testing shows that the proposed string searching method is more efficient than the widely adopted Boyer-Moore algorithms when applied to string searching and file carving. The developed OpenCL GPGPU processing framework was found to be more efficient than its CPU counterparts when searching for larger numbers of patterns within data. This research also refutes claims that file carving is limited solely by the performance of the storage device, and presents compelling evidence that performance is bound by the combination of the performance of the storage device and the processing technique employed.
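PFAC's key idea is to drop the failure links of classic Aho-Corasick: a trie walk is started at every text position and simply aborts on the first mismatch. That makes every start position independent, which is exactly what maps well onto one-thread-per-byte GPU execution. A serial sketch of the algorithm (not the thesis's OpenCL implementation) is:

```python
def build_trie(patterns):
    """Build the goto trie shared by all PFAC threads."""
    trie, out = [{}], [None]
    for p in patterns:
        node = 0
        for ch in p:
            if ch not in trie[node]:
                trie[node][ch] = len(trie)
                trie.append({})
                out.append(None)
            node = trie[node][ch]
        out[node] = p  # mark this node as the end of pattern p
    return trie, out

def pfac_search(text, trie, out):
    """Parallel Failureless Aho-Corasick, serialised for illustration.

    Each loop iteration plays the role of one GPU thread: walk the trie
    from its own start position and stop on the first mismatch.
    Returns (start_position, pattern) for every match.
    """
    hits = []
    for start in range(len(text)):  # on a GPU: one thread per position
        node = 0
        for i in range(start, len(text)):
            node = trie[node].get(text[i])
            if node is None:
                break  # failureless: abandon this start position
            if out[node] is not None:
                hits.append((start, out[node]))
    return hits
```

Most threads terminate after one or two characters, so despite the apparent redundancy the worst-case work per thread is bounded by the longest pattern, not the text length.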
