  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Exemplar-based image inpainting on the GPU applied to 3D video conversion

Wallace, Ryan 22 February 2012 (has links)
My thesis investigates automation and optimizations for occlusion filling, a problem resulting from the generation of new viewpoints in the 3D video conversion process. Image inpainting is a popular topic in image processing research. Filling a region of an image in a visually pleasing manner is a difficult and computationally expensive task. Recently, the most successful methods have been exemplar-based, copying patches of the image from a specified source region into the region to be filled. These algorithms are designed to propagate both structure and texture into the fill region. However, they are brute-force algorithms and are generally implemented sequentially on the CPU. In this research, I have effectively mapped the costly portions of an exemplar-based image inpainting algorithm to the GPU. I produce equivalent inpainting results in less time by parallelizing the brute-force patch search portion of the algorithm. Furthermore, I compare the results with another recent, optimized inpainting algorithm, and apply both algorithms to the real-world problem of occlusion filling in a 3D video conversion pipeline. / Graduate
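To make the brute-force patch search mentioned in the abstract concrete, here is a minimal sequential sketch in plain Python. It is not the thesis's implementation: the function names, the grayscale list-of-lists image, the `mask` convention (1 = pixel to fill, 0 = known) and the sum-of-squared-differences cost are all assumptions chosen for illustration.

```python
def ssd(image, mask, ty, tx, sy, sx, size):
    """Sum of squared differences between the target patch at (ty, tx) and
    the source patch at (sy, sx), counting only target pixels that are
    already known (mask == 0)."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            if mask[ty + dy][tx + dx] == 0:
                d = image[ty + dy][tx + dx] - image[sy + dy][sx + dx]
                total += d * d
    return total


def patch_is_known(mask, y, x, size):
    """True if every pixel of the size x size patch at (y, x) is known."""
    return all(mask[y + dy][x + dx] == 0
               for dy in range(size) for dx in range(size))


def best_source_patch(image, mask, ty, tx, size):
    """Exhaustively test every fully known source patch and return
    (cost, sy, sx) for the cheapest one, or None if there is none."""
    height, width = len(image), len(image[0])
    best = None
    for sy in range(height - size + 1):
        for sx in range(width - size + 1):
            if not patch_is_known(mask, sy, sx, size):
                continue
            cost = ssd(image, mask, ty, tx, sy, sx, size)
            if best is None or cost < best[0]:
                best = (cost, sy, sx)
    return best
```

Each candidate's cost is computed independently of every other candidate, which is why this kind of search lends itself to a parallel implementation (for example one GPU thread per candidate followed by a minimum reduction).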
22

Interactive simulation and visualization of complex physics problems using the GPU

Zhao, Cailu 06 1900 (has links)
Physical simulations are generally very computationally intensive and require large, costly computing resources. Most such simulations are rarely interactive because the link between visualization, interaction, and simulation is too slow. The recent development of parallel Graphics Processing Units (GPUs) on graphics cards has enabled the development of real-time interactive simulators of complex physical phenomena. In this thesis, two GPU-based implementations of interactive physical simulations are presented: (1) visualization of the electron probability distribution of a hydrogen atom, and (2) visualization and simulation of a particle-based fluid dynamics model using smoothed particle hydrodynamics. These simulations were developed in the context of the Microscopic and Subatomic Visualization (MASAV) project as a demonstration of the GPU's capability to create realistic interactive physical simulators for scientific education. / Computer Graphics
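As a small illustration of the kind of quantity such a hydrogen-atom visualization evaluates per sample point, the sketch below computes the ground-state (1s) probability density. The abstract does not say which states the thesis renders, so the choice of the 1s state, the use of atomic units and the function names are assumptions.

```python
import math

BOHR_RADIUS = 1.0  # assumption: atomic units, so lengths are in Bohr radii


def density_1s(r, a0=BOHR_RADIUS):
    """Probability density |psi_1s|^2 at distance r from the nucleus."""
    return math.exp(-2.0 * r / a0) / (math.pi * a0 ** 3)


def radial_probability_1s(r, a0=BOHR_RADIUS):
    """Probability per unit radius of finding the electron at distance r,
    i.e. the density integrated over a spherical shell of radius r."""
    return 4.0 * math.pi * r * r * density_1s(r, a0)
```

The radial probability peaks at one Bohr radius and integrates to one over all radii. Because each sample depends only on its own position, evaluating the density on a 3D grid is an embarrassingly parallel job.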
23

MULTIPLE SEQUENCES ALIGNMENT FOR PHYLOGENETIC TREE CONSTRUCTION USING GRAPHICS PROCESSING UNITS

He, Jintai 01 January 2008 (has links)
Sequence alignment has become a routine procedure in evolutionary biology for finding evolutionary relationships between primary sequences of DNA, RNA, and protein. The Smith-Waterman and Needleman-Wunsch algorithms perform local and global alignment, respectively. Both are based on dynamic programming and guarantee optimal results, and both have been widely used for decades. However, their time and space requirements increase exponentially as the number of sequences increases. Here I present a novel approach that improves the performance of sequence alignment by using a graphics processing unit, which is capable of handling large amounts of data in parallel.
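For readers unfamiliar with the dynamic programming behind these algorithms, the following is a minimal pairwise Needleman-Wunsch scoring sketch. It is a sequential reference version, not the GPU method of the thesis, and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align the two symbols, gap in b, gap in a.
            score[i][j] = max(diagonal,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    return score[-1][-1]
```

Each cell depends only on its left, upper and upper-left neighbours, so all cells on the same anti-diagonal can be computed at the same time. That independence is a common starting point for parallel versions, though the abstract does not state which strategy the thesis uses.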
24

Modèles de programmation et d'exécution pour les architectures parallèles et hybrides. Applications à des codes de simulation pour la physique. / Programming models and execution models for parallel and hybrid architectures. Application to physics simulations.

Ospici, Matthieu 03 July 2013 (has links)
We focus on large parallel hybrid architectures based on a combination of general-purpose processors (e.g., Intel Xeon) and accelerators (Nvidia GPUs). Using these hybrid clusters efficiently for high-performance computing is central to our work. The heterogeneity of computing resources in hybrid clusters raises many issues when running large existing scientific applications on them. Two main issues are addressed in this thesis. The first concerns the sharing of accelerators among MPI applications, and the second concerns the programming and concurrent execution of code on CPUs and accelerators. Hybrid architectures are very heterogeneous: the ratio between the number of accelerators and the number of CPU cores varies from one cluster to another. We therefore first propose a concept of accelerator virtualization, which gives applications the illusion that the number of accelerators they can use is independent of the number of physical accelerators available in the hardware. An execution model based on the sharing of accelerators is put in place and exposes a more homogeneous hybrid architecture to applications. We also propose extensions to programming models based on MPI + threads to address the problem of concurrent execution between CPUs and accelerators.
We propose a model based on two types of threads (CPU threads and accelerator threads) to implement hybrid computations that exploit CPUs and accelerators simultaneously. In both cases, the deployment and execution of code on hybrid resources is critical. Consequently, we propose two software libraries, S_GPU 1 and S_GPU 2, designed to deploy and run computations on hybrid hardware. S_GPU 1 handles virtualization, and S_GPU 2 handles concurrent use of CPUs and accelerators. To observe the deployment and execution of code on complex GPU-based architectures, we integrated tracing mechanisms for analyzing the progress of programs that use our libraries. Our proposals were validated on two large scientific applications: BigDFT (ab initio simulation) and SPECFEM3D (seismic wave simulation). We adapted them to use S_GPU 1 (for BigDFT) and S_GPU 2 (for SPECFEM3D).
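The idea of decoupling the number of accelerators an application sees from the number physically present can be illustrated with a toy mapping. This is not the S_GPU API: the function names and the round-robin policy are hypothetical and serve only to show how several MPI-style ranks might share a few devices.

```python
def assign_virtual_accelerators(num_ranks, num_physical):
    """Give every rank its own virtual accelerator and map it round-robin
    onto one of the physical devices (hypothetical policy)."""
    if num_physical <= 0:
        raise ValueError("at least one physical accelerator is required")
    return {rank: rank % num_physical for rank in range(num_ranks)}


def sharing_load(mapping, num_physical):
    """Count how many ranks share each physical accelerator."""
    load = [0] * num_physical
    for physical in mapping.values():
        load[physical] += 1
    return load
```

With eight ranks and two devices, each device ends up shared by four ranks, while every rank still behaves as if it had an accelerator of its own.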
25

Accelerating Parallel Tasks by Optimizing GPU Hardware Utilization

Tsung-Tai Yeh (8775680) 29 April 2020 (has links)
Efficient GPU applications rely on programmers carefully structuring their code to fully utilize GPU resources. In general, programmers spend a significant amount of time optimizing their applications to run efficiently on domain-specific architectures. To reduce the burden on programmers to utilize GPUs fully, I create several hardware and software solutions that improve resource utilization on parallel processors without significant programmer intervention.
GPUs are increasingly being deployed in data centers to accelerate latency-driven applications, which exhibit a modest amount of data parallelism. Synchronous kernel execution in these applications cannot fully utilize the entire GPU. A GPU therefore contains multiple hardware queues that improve throughput by executing multiple kernels on a single device simultaneously when there are sufficient hardware resources. However, a GPU faces severe underutilization when the space in these queues has been exhausted, and the performance benefit vanishes with the decreased parallelism. In response, I proposed Pagoda, a GPU runtime system that virtualizes GPU hardware resources by using an OS-like daemon kernel called MasterKernel. Tasks (kernels) are spawned from the CPU onto Pagoda as they become available and are scheduled by the MasterKernel at warp granularity to increase GPU throughput for latency-driven applications. This work introduces several programming APIs to handle task spawning and synchronization and includes parallel task and warp scheduling policies to reduce runtime overhead.
Latency-driven applications have both high throughput demands and response time constraints. These applications may launch many kernels that do not fully utilize the GPU unless grouped into large batches. However, batching forces jobs to wait, which increases their latency. This wait time can be unacceptable when considering real-world arrival times of jobs. Moreover, the round-robin GPU kernel scheduler is oblivious to application deadlines. This deadline-blind scheduling policy makes it harder to ensure that kernels meet their QoS deadlines. To enhance the responsiveness of the GPU, I also proposed LAX, which includes an execution time estimate for jobs with one or many kernels. LAX adjusts the priorities of kernels dynamically based on their slack time to increase the number of jobs that complete by their real-time deadlines. LAX improves the responsiveness and throughput of GPUs.
It is well known that grouping threads into warps can create redundancy across scalar values in GPU vector registers. However, I also found that the layout of thread indices in multi-dimensional threadblocks (TBs) creates redundancy in the registers storing thread IDs. This redundancy propagates into dependent instructions that can be traced and identified statically. To remove redundant GPU instructions, I proposed DARSIE, which uses a per-kernel compiler finalization check based on TB dimensions to determine which instructions are redundant. Once identified, DARSIE hardware skips TB-redundant instructions before they are fetched. DARSIE uses a new multithreaded register renaming and instruction synchronization technique to share the values from redundant instructions among warps in each TB. Altogether, DARSIE decreases the number of executed instructions to improve GPU performance and energy efficiency.
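The thread-ID redundancy described in the last paragraph can be seen with a small host-side model. The sketch below is not DARSIE's analysis: it only assumes the CUDA convention that a 2D block linearizes threads as x + y * blockDim.x and packs consecutive linear IDs into 32-lane warps. The function names are hypothetical.

```python
WARP_SIZE = 32


def warp_thread_indices(block_dim_x, block_dim_y):
    """Return, for each warp of a 2D threadblock, the list of
    (threadIdx.x, threadIdx.y) pairs held by its lanes."""
    total = block_dim_x * block_dim_y
    warps = []
    for start in range(0, total, WARP_SIZE):
        lanes = []
        for linear in range(start, min(start + WARP_SIZE, total)):
            lanes.append((linear % block_dim_x, linear // block_dim_x))
        warps.append(lanes)
    return warps


def x_register_identical_across_warps(block_dim_x, block_dim_y):
    """True if the vector of threadIdx.x values is the same in every warp,
    i.e. the register holding threadIdx.x is redundant across warps."""
    warps = warp_thread_indices(block_dim_x, block_dim_y)
    x_vectors = [[x for x, _ in lanes] for lanes in warps]
    return all(vector == x_vectors[0] for vector in x_vectors)
```

For a 32 x 4 or 16 x 8 block every warp holds the same threadIdx.x vector, whereas a 48 x 2 block does not. This shows why a check of this kind has to look at the block dimensions.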
27

Scientific Computing on Streaming Processors

Menon, Sandeep 01 January 2008 (has links) (PDF)
High-performance streaming processors have achieved the distinction of being very efficient and cost-effective in terms of floating-point capacity, making them an attractive option for scientific algorithms that involve large arithmetic effort. Graphics Processing Units (GPUs) are an example of this new initiative to bring vector processing to desktop computers, and with the advent of 32-bit floating-point capabilities, these architectures provide a versatile platform for the efficient implementation of such algorithms. To exemplify this, the implementation of a Conjugate Gradient iterative solver for PDE solutions on unstructured two- and three-dimensional grids using such hardware is described. This would greatly benefit applications such as fluid-flow solvers, which seek efficient methods to solve large sparse systems. The implementation has also been successfully incorporated into an existing object-oriented CFD code, enabling the option of using these architectures as efficient math co-processors in the computational framework.
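As a reference for the Conjugate Gradient solver named in the abstract, here is a minimal unpreconditioned sketch for a symmetric positive-definite sparse system. It is a plain CPU version, not the thesis's streaming implementation. The sparse-row format (one `{column: value}` dict per row), the tolerance and the function names are assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(rows, x):
    """Sparse matrix-vector product; rows[i] maps column index to value."""
    return [sum(value * x[col] for col, value in row.items()) for row in rows]


def conjugate_gradient(rows, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0
    p = list(r)            # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(rows, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

The work per iteration is one sparse matrix-vector product, two inner products and three vector updates, all of which are data-parallel operations of the kind streaming processors handle well.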
28

Algorithmes de visualisation des incertitudes en géomodélisation sur GPU / GPU-Based uncertainty visualization algorithms for geomodeling

Viard, Thomas 05 October 2010 (has links)
In geosciences, most of the subsurface is inaccessible to direct observation. Consequently, only local or imprecise data are available when building or updating a geological model; uncertainties are therefore central to geomodeling. Inverse problem theory and stochastic simulation methods provide a framework for generating large sets of plausible representations of the subsurface, also termed realizations. In practice, however, the size of the set of realizations severely limits further interpretation or processing of the geological model.
This thesis aims to provide geologists with visualization algorithms that allow them to explore, analyze, and communicate the spatial uncertainties associated with large sets of realizations. Our contributions are: (1) we propose a set of techniques dedicated to petrophysical uncertainty visualization, based on GPU programming and an architecture that guarantees their interoperability; (2) we propose two techniques dedicated to structural uncertainty visualization that handle both geometrical and topological uncertainties (e.g., the existence of a surface or its interactions with other surfaces); (3) we assess the quality of our uncertainty visualization algorithms through two user studies, which focus respectively on the perception of static and animated methods. These studies shed new light on how uncertainty should be represented.
29

GPU-aware Component-based Development for Embedded Systems

Campeanu, Gabriel January 2016 (has links)
Nowadays, more and more embedded systems are equipped with, e.g., various sensors that produce large amounts of data. One of the challenges for traditional (CPU-based) embedded systems is to process this considerable amount of data while delivering the performance level demanded by embedded applications. A solution comes from the use of a specialized processing unit such as a Graphics Processing Unit (GPU). A GPU can process large amounts of data thanks to its parallel processing architecture, delivering improved performance compared to a CPU. A characteristic of the GPU is that it cannot work alone; the CPU must trigger all its activities. Today, taking advantage of the latest technology breakthroughs, we can benefit from GPU technology in the context of embedded systems by using heterogeneous CPU-GPU embedded systems. Component-based development has proven to be a promising methodology for handling software complexity. Through component models, which describe the specification of components and their interaction, the methodology has been successfully used in the embedded systems domain. The existing component models, designed to handle CPU-based embedded systems, face challenges in developing embedded systems with GPU capabilities. For example, current solutions realize the communication between components with GPU capabilities via the system RAM. This introduces an undesired overhead that negatively affects system performance. This Licentiate thesis presents methods and techniques that address the component-based development of embedded systems with GPU capabilities. More concretely, we provide means for component models to explicitly address GPU-aware component-based development by using specific artifacts. For example, the overhead introduced by the traditional way of communicating via RAM is reduced by inserting automatically generated adapters that facilitate direct component communication over the GPU memory.
Another contribution of the thesis is a method for allocating components over the system hardware. The proposed solution offers alternative options for optimizing the total system performance and balancing various system properties (e.g., memory usage, GPU load). To validate our proposed solutions, we use an underwater robot demonstrator equipped with GPU hardware. / Ralf 3
30

Traitement STAP en environnement hétérogène. Application à la détection radar et implémentation sur GPU / STAP processing in heterogeneous environment. Application to radar detection and implementation on GPU

Degurse, Jean-François 15 January 2014 (has links)
Space-time adaptive processing (STAP) jointly exploits the spatial and temporal dimensions of the signals received by an antenna array, whereas conventional antenna processing exploits only the spatial dimension to perform filtering. STAP is particularly effective at filtering ground echoes received by airborne radars, for which there is a direct relation between the direction of arrival and the Doppler frequency. However, although the principles of STAP are now well understood, its practical implementation in a real environment runs into difficulties that remain unresolved in the context of operational radar. The first obstacle, addressed in the first part of this thesis, is theoretical: it consists of defining procedures to estimate the clutter covariance matrix from a selection of representative training data, in a context of both non-homogeneous clutter and a sometimes high density of targets of interest. The second obstacle is technological and lies in the physical implementation of the algorithms, owing to their heavy computational load. This point, crucial in airborne operations, is explored in the second part of the thesis through an analysis of the feasibility of implementing the heaviest stages of a STAP algorithm on a GPU.
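For orientation, the covariance estimation step mentioned in the abstract is usually written in the textbook sample-matrix-inversion form below. This is the standard formulation, not necessarily the estimator developed in the thesis. The symbols are assumptions: $\mathbf{x}_k$ are the $K$ training snapshots, $\mathbf{s}$ is the space-time steering vector of the hypothesized target, and $\mathbf{x}$ is the snapshot under test.

```latex
\hat{\mathbf{R}} = \frac{1}{K}\sum_{k=1}^{K} \mathbf{x}_k \mathbf{x}_k^{H},
\qquad
\mathbf{w} = \frac{\hat{\mathbf{R}}^{-1}\mathbf{s}}{\mathbf{s}^{H}\hat{\mathbf{R}}^{-1}\mathbf{s}},
\qquad
y = \mathbf{w}^{H}\mathbf{x}
```

The estimate $\hat{\mathbf{R}}$ is only meaningful when the training snapshots are representative of the clutter in the cell under test, which is why heterogeneous clutter and dense targets make the selection of training data difficult. Forming and inverting $\hat{\mathbf{R}}$ for every cell is also the main source of the computational load.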
