101

Multilevel multidimensional scaling on the GPU

Ingram, Stephen F. 05 1900 (has links)
We present Glimmer, a new multilevel visualization algorithm for multidimensional scaling designed to exploit modern graphics processing unit (GPU) hardware. We also present GPU-SF, a parallel, force-based subsystem used by Glimmer. Glimmer organizes input into a hierarchy of levels and recursively applies GPU-SF to combine and refine the levels. The multilevel nature of the algorithm helps avoid local minima while the GPU parallelism improves speed of computation. We propose a robust termination condition for GPU-SF based on a filtered approximation of the normalized stress function. We demonstrate the benefits of Glimmer in terms of speed, normalized stress, and visual quality against several previous algorithms for a range of synthetic and real benchmark datasets. We show that the performance of Glimmer on GPUs is substantially faster than a CPU implementation of the same algorithm. We also propose a novel texture paging strategy called distance paging for working with precomputed distance matrices too large to fit in texture memory. (Faculty of Science, Department of Computer Science, Graduate)
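The termination test described above hinges on the normalized stress function and a filtered approximation of it. The sketch below shows one way such a filtered convergence check could look; the filter constant `alpha` and threshold `eps` are illustrative assumptions, not the thesis's parameters:

```python
import numpy as np

def normalized_stress(delta, d):
    """Normalized stress: squared residual between input-space distances
    (delta) and embedding distances (d), over the squared input distances."""
    return np.sum((delta - d) ** 2) / np.sum(delta ** 2)

def converged(stress_history, alpha=0.1, eps=1e-4):
    """Low-pass filter the per-iteration stress and terminate when the
    filtered signal stops changing (alpha and eps are illustrative)."""
    s = stress_history[0]
    filtered = []
    for x in stress_history:
        s = alpha * x + (1 - alpha) * s   # exponential moving average
        filtered.append(s)
    return len(filtered) > 1 and abs(filtered[-1] - filtered[-2]) < eps
```

Filtering matters because force-based layouts produce a noisy per-iteration stress signal; a raw difference test would terminate too early on a single flat step.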
102

Dynamic warp formation : exploiting thread scheduling for efficient MIMD control flow on SIMD graphics hardware

Fung, Wilson Wai Lun 11 1900 (has links)
Recent advances in graphics processing units (GPUs) have resulted in massively parallel hardware that is easily programmable and widely available in commodity desktop computer systems. GPUs typically use single-instruction, multiple-data (SIMD) pipelines to achieve high performance with minimal overhead for control hardware. Scalar threads running the same computing kernel are grouped together into SIMD batches, sometimes referred to as warps. While SIMD is ideally suited for simple programs, recent GPUs include control flow instructions in the GPU instruction set architecture and programs using these instructions may experience reduced performance due to the way branch execution is supported by hardware. One solution is to add a stack to allow different SIMD processing elements to execute distinct program paths after a branch instruction. The occurrence of diverging branch outcomes for different processing elements significantly degrades performance using this approach. In this thesis, we propose dynamic warp formation and scheduling, a mechanism for more efficient SIMD branch execution on GPUs. It dynamically regroups threads into new warps on the fly following the occurrence of diverging branch outcomes. We show that a realistic hardware implementation of this mechanism improves performance by an average of 47% for an estimated area increase of 8%. (Faculty of Applied Science, Department of Electrical and Computer Engineering, Graduate)
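The regrouping idea can be sketched in a few lines: after a divergent branch, scalar threads are bucketed by their next program counter and repacked into full warps, so threads taking the same path execute together. This is a software model of the concept only, not the hardware implementation evaluated in the thesis:

```python
from collections import defaultdict

def form_warps(threads, warp_size=32):
    """Regroup scalar threads into new warps by their next program counter
    (PC). Each thread is a dict {"tid": ..., "pc": ...}; threads headed to
    the same PC are packed into warps of up to warp_size lanes."""
    by_pc = defaultdict(list)
    for t in threads:
        by_pc[t["pc"]].append(t["tid"])
    warps = []
    for pc, tids in by_pc.items():
        # split each PC bucket into warp_size-wide groups
        for i in range(0, len(tids), warp_size):
            warps.append({"pc": pc, "tids": tids[i:i + warp_size]})
    return warps
```

With the stack-based approach, a warp whose lanes split 50/50 at a branch runs at half occupancy down each path; repacking as above restores full warps when enough threads share a PC.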
103

Acceleration of Block-Aware Matrix Factorization on Heterogeneous Platforms

Somers, Gregory W. January 2016 (has links)
Block-structured matrices arise in several contexts in circuit simulation problems. These matrices typically inherit the pattern of sparsity from the circuit connectivity. However, they are also characterized by dense spots or blocks. Direct factorization of those matrices has emerged as an attractive approach if the host memory is sufficiently large to store the block-structured matrix. The approach proposed in this thesis aims to accelerate the direct factorization of general block-structured matrices by leveraging the power of multiple OpenCL accelerators such as Graphical Processing Units (GPUs). The proposed approach utilizes the notion of a Directed Acyclic Graph representing the matrix in order to schedule its factorization on multiple accelerators. This thesis also describes memory management techniques that enable handling large matrices while minimizing the amount of memory transfer over the PCIe bus between the host CPU and the attached devices. The results demonstrate that by using two GPUs the proposed approach can achieve a nearly optimal speedup when compared to a single GPU platform.
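The DAG-driven scheduling described above can be sketched as greedy list scheduling over a task graph: tasks become ready when their prerequisites complete, and each ready task is assigned to the least-loaded accelerator. The least-loaded heuristic is an illustrative assumption, not necessarily the thesis's exact policy:

```python
from collections import deque

def schedule(tasks, deps, n_devices=2):
    """Greedy list-scheduling of factorization tasks on multiple devices.
    tasks: {name: cost}; deps: {name: set of prerequisite names}.
    Returns a list of per-device task lists in execution order."""
    indeg = {t: len(deps.get(t, set())) for t in tasks}
    children = {t: [] for t in tasks}
    for t, ds in deps.items():
        for d in ds:
            children[d].append(t)
    ready = deque(t for t in tasks if indeg[t] == 0)
    device_time = [0.0] * n_devices
    plan = [[] for _ in range(n_devices)]
    while ready:
        t = ready.popleft()
        dev = device_time.index(min(device_time))  # least-loaded device
        plan[dev].append(t)
        device_time[dev] += tasks[t]
        for c in children[t]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    return plan
```

For a block factorization, the task names would stand for per-block factor/update operations and the edges for their data dependencies; independent blocks then naturally run on different GPUs.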
104

Interactive visualization of massive models using graphics cards

Toledo, Rodrigo 12 October 2007 (has links)
The goal of our work is to speed up visualization methods in order to obtain interactive rendering of massive models. This is especially challenging for applications whose usual data has a significant size (millions of polygons). These massive models are usually composed either of numerous small objects (such as an oil platform) or of very detailed geometry (such as high-quality natural models). We have reviewed the visualization literature from the scale-level point of view: scene (which concerns object visibility), macroscale (covering geometry rendering issues), mesoscale (characterized by introducing details in the final rendering) and microscale (responsible for reproducing microscopic lighting effects). We have focused our contributions on the macroscale level, introducing new surface representations, conversion algorithms and GPU-based primitives. We have classified massive models into two categories as follows: (I) Natural models: For over-tessellated objects, triangles represent both macro- and mesostructures. The main idea is to use a visualization algorithm that is adequate for mesostructure but applied to the complete object. We represent natural objects through geometry textures (a geometric representation for surfaces based on height maps), preserving rendering quality and presenting LOD speed-up. (II) Manufactured models: We have focused our work on industrial plant visualization, whose objects are mostly described by combining simple primitives. Usually, these primitives are tessellated before rendering. We suggest replacing them with our GPU implicit primitives that use their original equations. The benefits are: image quality (perfect silhouette and per-pixel depth), memory and rendering efficiency. We have also developed a reverse engineering algorithm to recover original geometric equations from polygonal meshes.
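The idea behind the GPU implicit primitives (evaluating the original equation per pixel instead of rendering a tessellated mesh) can be illustrated with a ray/sphere intersection; this is a sketch of the principle only, not the thesis's shader code:

```python
import math

def ray_sphere(origin, direction, center, radius):
    """Per-pixel ray/sphere test from the implicit equation
    |o + t*d - c|^2 = r^2. Solving the quadratic gives an exact hit
    distance t, hence a perfect silhouette and per-pixel depth with
    no tessellation. Returns None when the ray misses."""
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    dx, dy, dz = direction
    a = dx * dx + dy * dy + dz * dz
    b = 2 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4 * a * c
    if disc < 0:
        return None          # miss: the fragment would be discarded
    return (-b - math.sqrt(disc)) / (2 * a)   # nearest intersection
```

In a fragment shader the same quadratic would be solved per pixel on the primitive's bounding proxy, which is why the approach saves both memory (one equation instead of many triangles) and silhouette quality.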
105

Interactive fluid-structure interaction with many-core accelerators

Mawson, Mark January 2014 (has links)
The use of accelerator technology, particularly Graphics Processing Units (GPUs), for scientific computing has increased greatly over the last decade. While this technology allows larger and more complicated problems to be solved faster than before it also presents another opportunity: the real-time and interactive solution of problems. This work aims to investigate the progress that GPU technology has made towards allowing fluid-structure interaction (FSI) problems to be solved in real-time, and to facilitate user interaction with such a solver. A mesoscopic scale fluid flow solver is implemented on third generation nVidia ‘Kepler’ GPUs in two and three dimensions, and its performance studied and compared with existing literature. Following careful optimisation the solvers are found to be at least as efficient as existing work, reaching efficiencies of up to 93% of theoretical peak performance. These solvers are then coupled with a novel immersed boundary method, allowing boundaries defined at arbitrary coordinates to interact with the structured fluid domain through a set of singular forces. The limiting factor of the performance of this method is found to be the integration of forces and velocities over the fluid and boundaries; the arbitrary location of boundary markers makes the memory accesses during these integrations largely random, leading to poor utilisation of the available memory bandwidth. In sample cases, the efficiency of the method is found to be as low as 2.7%, although in most scenarios this inefficiency is masked by the fact that the time taken to evolve the fluid flow dominates the overall execution time of the solver. Finally, techniques to visualise the fluid flow in-situ are implemented, and used to allow user interaction with the solvers.
Initially this is achieved via keyboard and mouse to control the fluid properties and create boundaries within the fluid, and later by using an image based depth sensor to import real world geometry into the fluid. The work concludes that, for 2D problems, real-time interactive FSI solvers can be implemented on a single laptop-based GPU. In 3D the memory (both size and bandwidth) of the GPU limits the solver to relatively simple cases. Recommendations for future work to allow larger and more complicated test cases to be solved in real-time are then made to complete the work.
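The bottleneck identified in the abstract, scattered lattice reads at arbitrary marker positions, is easiest to see in the velocity-interpolation step of an immersed boundary method. Below is an illustrative 2D bilinear sketch, not the thesis's kernel:

```python
import numpy as np

def interpolate_velocity(u, markers):
    """Interpolate a lattice field u (2D array) onto arbitrarily placed
    boundary markers with bilinear weights. The four lattice reads per
    marker land at data-dependent addresses, which is what makes this
    step bandwidth-inefficient on a GPU."""
    out = []
    for x, y in markers:
        i, j = int(x), int(y)          # lower-left lattice node
        fx, fy = x - i, y - j          # fractional offsets
        v = ((1 - fx) * (1 - fy) * u[i, j]
             + fx * (1 - fy) * u[i + 1, j]
             + (1 - fx) * fy * u[i, j + 1]
             + fx * fy * u[i + 1, j + 1])
        out.append(v)
    return np.array(out)
```

Because marker positions are arbitrary, neighbouring GPU threads read non-contiguous lattice cells; the fluid-evolution kernels, by contrast, access memory in regular patterns, which is why the overall solver can still run in real time despite this step's low efficiency.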
106

Blaze-DEM : a GPU based large scale 3D discrete element particle transport framework

Govender, Nicolin January 2015 (has links)
Understanding the dynamic behavior of particulate materials is extremely important to many industrial processes, with a wide range of applications ranging from hopper flows in agriculture to tumbling mills in the mining industry. Thus simulating the dynamics of particulate materials is critical in the design and optimization of such processes. The mechanical behavior of particulate materials is complex and cannot be described by a closed form solution for more than a few particles. A popular and successful numerical approach in simulating the underlying dynamics of particulate materials is the discrete element method (DEM). However, the DEM is computationally expensive and computationally viable simulations are typically restricted to a few particles with realistic particle shape or a larger number of particles with an often oversimplified particle shape. It has been demonstrated for numerous applications that an accurate representation of the particle shape is essential to accurately capture the macroscopic transport of particulates. The most common approach to represent particle shape is by using a cluster of spheres to approximate the shape of a particle. This approach is computationally intensive as multiple spherical particles are required to represent a single non-spherical particle. In addition, spherical particles are, for certain applications, a poor approximation when sharp interfaces are essential to capture the bulk transport behavior. An advantage of this approach is that non-convex particles are handled with ease. Polyhedra represent the geometry of most convex particulate materials well and, when combined with appropriate contact models, exhibit transport behavior comparable to that of the actual system. However, detecting collisions between the polyhedra is computationally expensive, often limiting simulations to only a few thousand particles.
Driven by the demand for real-time graphics, the Graphical Processor Unit (GPU) offers cluster type performance at a fraction of the computational cost. The parallel nature of the GPU allows for a large number of simple independent processes to be executed in parallel. This results in a significant speed up over conventional implementations utilizing the Central Processing Unit (CPU) architecture, when algorithms are well aligned and optimized for the threading model of the GPU. This thesis investigates the suitability of the GPU architecture to simulate the transport of particulate materials using the DEM. The focus of this thesis is to develop a computational framework for the GPU architecture that can model (i) tens of millions of spherical particles and (ii) millions of polyhedral particles in a realistic time frame on a desktop computer using a single GPU. The contribution of this thesis is the development of a novel GPU computational framework, Blaze-DEM, that encompasses collision detection algorithms and various heuristics that are optimized for the parallel GPU architecture. This research has resulted in a new computational performance level being reached in DEM simulations for both spherical and polyhedral particles. (Thesis (PhD)--University of Pretoria, 2015. Mechanical and Aeronautical Engineering, Unrestricted)
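A standard GPU-friendly way to keep collision detection tractable at these particle counts is a uniform-grid broad phase that prunes the O(n²) pair test before any expensive narrow-phase (e.g. polyhedron/polyhedron) check. The sketch below illustrates the idea for spheres; it is a generic sketch, not Blaze-DEM's exact scheme:

```python
from collections import defaultdict

def broad_phase(centers, radii, cell=1.0):
    """Uniform-grid broad phase: hash each particle to a cell, then only
    test pairs whose cells are adjacent. Each independent cell maps
    naturally onto a GPU thread. Returns overlapping sphere pairs."""
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(centers):
        grid[(int(x // cell), int(y // cell), int(z // cell))].append(idx)
    pairs = set()
    for (cx, cy, cz), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                        for i in members:
                            if i < j:
                                pairs.add((i, j))
    # narrow phase: exact sphere-overlap test on the surviving pairs
    return sorted((i, j) for i, j in pairs
                  if sum((centers[i][k] - centers[j][k]) ** 2
                         for k in range(3))
                  <= (radii[i] + radii[j]) ** 2)
```

For polyhedra the narrow phase would be replaced by a face/edge/vertex contact test, which is exactly the expensive step the abstract says must be optimized for the GPU's threading model.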
107

Rendering of Large Scale Terrain

Marušič, Martin January 2010 (has links)
This thesis deals with the rendering of large-scale terrain. The first part describes the theory of terrain rendering and particular level-of-detail techniques. Three modern, intriguing algorithms are briefly described after this theoretical part. The main part of the work focuses on the Geometry Clipmaps algorithm along with its optimized version, GPU-Based Geometry Clipmaps. The implementation of this optimized algorithm is described in detail. The main advantage of this approach is the incremental update of vertex data, which allows offloading overhead from the CPU to the GPU. In the last chapter, the performance of the implementation is analysed using a simple benchmark.
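The clipmap structure behind the algorithm can be sketched numerically: the terrain is covered by nested rings centred on the viewer, and each coarser ring doubles its grid spacing, so the covered extent grows exponentially while the vertex count grows only linearly with the number of levels. The ring size and spacing below are illustrative values, not the thesis's configuration:

```python
def clipmap_levels(n_levels, finest_spacing=1.0, ring_size=255):
    """Describe nested clipmap levels: level l has grid spacing
    finest_spacing * 2**l and covers ring_size samples at that spacing.
    Each level holds the same number of vertices regardless of extent."""
    return [{"level": l,
             "spacing": finest_spacing * (2 ** l),
             "extent": ring_size * finest_spacing * (2 ** l)}
            for l in range(n_levels)]
```

The incremental update the abstract mentions exploits this layout: as the viewer moves, only the strip of samples entering each ring needs uploading, which is what lets the vertex data live and refresh on the GPU.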
108

Computational Enhancements for Direct Numerical Simulations of Statistically Stationary Turbulent Premixed Flames

Mukhadiyev, Nurzhan 05 1900 (has links)
Combustion at extreme conditions, such as a turbulent flame at high Karlovitz and Reynolds numbers, is still a vast and uncertain field for researchers. Direct numerical simulation of a turbulent flame is a superior tool to unravel detailed information that is not accessible to even the most sophisticated state-of-the-art experiments. However, the computational cost of such simulations remains a challenge even for modern supercomputers, as the physical size, the level of turbulence intensity, and the chemical complexity of the problems continue to increase. As a result, there is a strong demand for computational cost reduction methods as well as for acceleration of existing methods. The main scope of this work was the development of computational and numerical tools for high-fidelity direct numerical simulations of premixed planar flames interacting with turbulence. The first part of this work was the development of the KAUST Adaptive Reacting Flow Solver (KARFS). KARFS is a high-order compressible reacting flow solver using detailed chemical kinetics mechanisms, and it is capable of running on various types of heterogeneous computational architectures. In this work, it was shown that KARFS runs efficiently on both CPUs and GPUs. The second part of this work concerned numerical tools for direct numerical simulations of planar premixed flames, such as linear turbulence forcing and dynamic inlet control. Previous DNS of premixed turbulent flames injected velocity fluctuations at an inlet. Turbulence injected at the inlet decayed significantly before reaching the flame, which created a need to inject stronger fluctuations than actually required at the flame. A solution to this issue was to maintain turbulence strength on the way to the flame using turbulence forcing; therefore, linear turbulence forcing was implemented in KARFS to enhance turbulence intensity.
The linear turbulence forcing developed previously by other groups was corrected with a net-added-momentum removal mechanism to prevent mean velocity drift. In addition, dynamic inlet control was implemented, which retained the flame inside the domain even at very high fuel consumption fluctuations. The last part of this work was to implement a pseudospectral method in KARFS. Direct numerical simulations ultimately target real engine and turbine conditions, but such simulations are prohibitively computationally expensive. As an attempt to decrease this cost, a pseudospectral method for reacting turbulent flows was suggested and implemented in KARFS, achieving savings of approximately a factor of four in CPU hours.
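The net-momentum correction to linear forcing can be sketched in essentially one line: subtract the spatial mean of the force so that the forcing feeds the turbulent fluctuations without accelerating the mean flow. This is a constant-density sketch of the idea described above; the implementation in KARFS is necessarily more involved:

```python
import numpy as np

def linear_forcing(u, A=1.0):
    """Linear turbulence forcing f = A*u with net-added-momentum removal:
    the spatial mean of the force is subtracted, so the domain-averaged
    momentum added per step is zero and the mean velocity cannot drift."""
    f = A * np.asarray(u, dtype=float)
    return f - f.mean()
```

Without the subtraction, any nonzero mean in the velocity field would be amplified every step by the factor A, producing exactly the mean-velocity drift the correction prevents.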
109

Resource management and application customization for hardware accelerated systems

Tasoulas, Zois Gerasimos 01 June 2021 (has links)
Computational demands are continuously increasing, driven by the growing resource requirements of applications. In the era of big data, large-scale applications, and real-time applications, there is an enormous need for quick processing of large amounts of data. To meet these demands, computer systems have shifted towards multi-core solutions. Technology scaling has allowed the incorporation of ever larger numbers of transistors and cores into chips. Nevertheless, area constraints, power consumption limitations, and thermal dissipation limit the ability to design and sustain ever increasing chips. To overcome these limitations, system designers have turned towards the use of hardware accelerators. These accelerators can take the form of modules attached to each core of a multi-core system, forming a network-on-chip of cores with attached accelerators. Another form of hardware accelerator is the Graphics Processing Unit (GPU). GPUs can be connected through a host-device model with a general-purpose system and are used to off-load parts of a workload. Additionally, accelerators can be functionality-dedicated units: they can be part of a chip, and the main processor can offload specific workloads to the hardware accelerator unit. In this dissertation we present: (a) a microcoded synchronization mechanism for systems with hardware accelerators that provide distributed shared memory, (b) a Streaming Multiprocessor (SM) allocation policy for single-application execution on GPUs, (c) an SM allocation policy for concurrent applications that execute on GPUs, and (d) a framework to map neural network (NN) weights to approximate multiplier accuracy levels. The aforementioned mechanisms coexist in the resource management domain. Specifically, the methodologies introduce ways to boost system performance by using hardware accelerators. In tandem with improved performance, the methodologies explore and balance the trade-offs that the use of hardware accelerators introduces.
110

Dimensionality reduction for hyperspectral imagery

Yang, He 30 April 2011 (has links)
In this dissertation, dimensionality reduction for hyperspectral remote sensing imagery is investigated to alleviate practical application difficulties caused by high data dimension. Band selection and band clustering are applied for this purpose. Based on the availability of object prior information, supervised, semi-supervised, and unsupervised techniques are proposed. To take advantage of modern computational architectures, parallel implementations on clusters and graphics processing units (GPUs) are developed. The impact of dimensionality reduction on subsequent data analysis is also evaluated. Specific contributions are as follows.
1. A similarity-based unsupervised band selection algorithm is developed to select distinctive and informative bands, which outperforms other existing unsupervised band selection approaches in the literature.
2. An efficient supervised band selection method based on minimum estimated abundance covariance is developed, which outperforms other frequently used metrics. This new method does not need to conduct classification during the band selection process or examine original bands/band combinations as traditional approaches do.
3. An efficient semi-supervised band clustering method is proposed, which uses class signatures to conduct band partition. Compared to traditional unsupervised clustering, computational complexity is significantly reduced.
4. Parallel GPU implementations with computational cost saving strategies for the developed algorithms are designed to facilitate onboard processing.
5. As an application example, band selection results are used for urban land cover classification. With a few selected bands, classification accuracy can be greatly improved compared to using all the original bands or those from other frequently used dimensionality reduction methods.
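Similarity-based unsupervised band selection (contribution 1 above) can be sketched as a greedy search that starts from the two most dissimilar bands and repeatedly adds the band least similar to those already chosen. The correlation metric and greedy rule here are a generic sketch; the dissertation's actual similarity measure may differ:

```python
import numpy as np

def select_bands(cube, k):
    """Greedy similarity-based band selection. cube: (bands, pixels)
    array. Bands are mean-centred and normalised, pairwise |correlation|
    is the similarity, and each step adds the band whose worst-case
    similarity to the chosen set is smallest. Returns k band indices."""
    X = cube - cube.mean(axis=1, keepdims=True)
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-12
    corr = np.abs(X @ X.T)                    # pairwise |correlation|
    b = corr.shape[0]
    i, j = divmod(int(np.argmin(corr)), b)    # most dissimilar pair
    chosen = [i, j]
    while len(chosen) < k:
        rest = [r for r in range(b) if r not in chosen]
        nxt = min(rest, key=lambda r: corr[r, chosen].max())
        chosen.append(nxt)
    return chosen
```

The pairwise-correlation matrix is also the natural place for the GPU parallelism mentioned in contribution 4: it is a single dense matrix product over all band pairs.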
