31

Um estudo do uso eficiente de programas em placas gráficas / A case study on the efficient use of programs on GPUs

Ikeda, Patricia Akemi 20 September 2011 (has links)
Initially designed for graphics processing, graphics cards (GPUs) have evolved into high-performance general-purpose parallel coprocessors. Because of the enormous potential they offer to many research and commercial areas, NVIDIA pioneered the field by launching the CUDA architecture (compatible with many of its cards), an environment that exploits this computational power while making programming easier. To take full advantage of a GPU's capacity, certain practices must be followed; one of them is to keep the hardware as busy as possible. This work proposes a practical and extensible tool that helps the programmer choose the best configuration to achieve that goal.
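In CUDA terms, keeping the hardware busy usually comes down to choosing a launch configuration (threads per block, and the resulting occupancy) that suits the kernel. The sketch below shows, under assumed conditions, how such a tool could query a kernel's occupancy with the CUDA occupancy API; the `saxpy` kernel and the candidate block sizes are illustrative, not taken from the thesis.

```cuda
#include <cstdio>
#include <initializer_list>
#include <cuda_runtime.h>

// Illustrative kernel; any __global__ function could be analyzed the same way.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    // Ask the runtime which block size maximizes occupancy for this kernel.
    int minGridSize = 0, bestBlockSize = 0;
    cudaOccupancyMaxPotentialBlockSize(&minGridSize, &bestBlockSize, saxpy,
                                       /*dynamicSMemSize=*/0, /*blockSizeLimit=*/0);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // For comparison, report the occupancy of a few candidate block sizes.
    for (int blockSize : {128, 256, bestBlockSize}) {
        int numBlocksPerSM = 0;
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&numBlocksPerSM, saxpy, blockSize, 0);
        float occupancy = (float)(numBlocksPerSM * blockSize) / prop.maxThreadsPerMultiProcessor;
        printf("block size %4d -> %d resident blocks/SM, occupancy %.0f%%\n",
               blockSize, numBlocksPerSM, occupancy * 100.0f);
    }
    return 0;
}
```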
32

Enhancements to Reconstruction Techniques in Computed Tomography Using High Performance Computing

Eliuk, Steven N Unknown Date
No description available.
33

Fusion: abstrações linguísticas sobre Java para programação paralela heterogênea sobre GPGPUs / Fusion: linguistic abstractions on Java for parallel programming on heterogeneous GPGPUs

Anderson Boettge Pinheiro 22 May 2013 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Graphics processing units (GPUs) have established themselves in recent years as general-purpose accelerators for the performance-critical sections of programs with strict execution-time requirements. GPUs are one of several kinds of general-purpose computational accelerators that have been incorporated into high-performance computing platforms, alongside MIC (Many Integrated Cores) devices and FPGAs (Field Programmable Gate Arrays). Despite the emphasis on research into new parallel algorithms capable of exploiting the massive parallelism offered by GPGPU devices, initiatives on new programming abstractions that make it simpler to describe these algorithms on GPGPUs, without sacrificing efficiency, are still incipient. The programmer still needs specific knowledge of the architectural peculiarities of these devices, as well as programming techniques that even experienced parallel programmers do not currently master. In recent years NVIDIA, the company that has dominated the architectural evolution of GPGPU devices, launched the Kepler architecture, including support for the Hyper-Q and Dynamic Parallelism (DP) extensions, which open new opportunities for expressing parallel programming patterns on such devices.
This dissertation proposes new parallel programming abstractions on top of a Java-based object-oriented language, in order to express heterogeneous multicore/manycore parallel computations in which the GPU device is shared by a set of parallel threads running on the host processor, at a higher level of abstraction than existing alternatives while still giving the programmer full control over the device's resources. The design of the abstractions of this proposed language, hereinafter called Fusion, builds on the expressiveness offered by the Kepler architecture.
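Hyper-Q and Dynamic Parallelism are the Kepler features the abstract refers to: Hyper-Q provides independent hardware work queues so that several host threads can feed the GPU concurrently, and Dynamic Parallelism lets a kernel launch child kernels from the device. Below is a minimal, generic CUDA C++ illustration of Dynamic Parallelism, not code in the proposed Fusion language; kernel names and sizes are assumptions, and it must be compiled for compute capability 3.5 or later with relocatable device code (e.g. `nvcc -arch=sm_35 -rdc=true`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: processes one segment of the data.
__global__ void childKernel(float *data, int offset, int len) {
    int i = offset + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < offset + len) data[i] *= 2.0f;
}

// Parent kernel: each thread sizes the work for its segment and launches a
// child grid directly from the device (Dynamic Parallelism).
__global__ void parentKernel(float *data, int segLen) {
    int seg = blockIdx.x * blockDim.x + threadIdx.x;
    int offset = seg * segLen;
    int threads = 128;
    int blocks = (segLen + threads - 1) / threads;
    childKernel<<<blocks, threads>>>(data, offset, segLen);
}

int main() {
    const int numSegments = 8, segLen = 1 << 14;
    float *d_data;
    cudaMalloc(&d_data, numSegments * segLen * sizeof(float));
    cudaMemset(d_data, 0, numSegments * segLen * sizeof(float));

    // One parent thread per segment; each spawns its own child grid.
    parentKernel<<<1, numSegments>>>(d_data, segLen);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    printf("done\n");
    return 0;
}
```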
34

Geração de Malhas por Refinamento Adaptativo Usando GPU / Mesh generation by adaptive refinement using GPU

Ricardo Lenz Cesar 24 April 2009 (has links)
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) / The high performance of the GPU and the growing use of its programming mechanisms have encouraged many virtual-reality graphics applications to better exploit the potential of this device to reach higher levels of realism. Work has emerged focusing on refining the silhouette of geometric meshes, seeking to better express the surface of the three-dimensional objects being represented. The refinement applied may be, for example, a smoothing of an avatar's coarse mesh by interpolating a curved surface over its faces. The basic idea is to build an adaptive discretization of the object's mesh and then generate a new silhouette using that discretization. Previous methods are analyzed, and improvements are presented that together form the proposed method. The performance obtained is superior thanks to a better exploitation of GPU parallelism, and the proposed technique works well enough with existing meshes, without requiring new models to be designed for it.
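As a rough, assumed illustration of the kind of per-face work such a pipeline parallelizes on the GPU, the sketch below flags faces whose normals are nearly perpendicular to the view direction (i.e., faces near the silhouette) so that only they would be selected for subdivision. The criterion, data layout, and threshold are illustrative guesses, not the thesis's actual refinement rules.

```cuda
#include <cuda_runtime.h>

// One thread per face: flag faces whose normal is nearly perpendicular to the
// view direction (faces near the silhouette) for adaptive refinement.
__global__ void flagSilhouetteFaces(const float3 *faceNormals, int numFaces,
                                    float3 viewDir, float threshold,
                                    int *refineFlags) {
    int f = blockIdx.x * blockDim.x + threadIdx.x;
    if (f >= numFaces) return;
    const float3 n = faceNormals[f];
    float d = fabsf(n.x * viewDir.x + n.y * viewDir.y + n.z * viewDir.z);
    refineFlags[f] = (d < threshold) ? 1 : 0;   // 1 = subdivide this face
}

int main() {
    const int numFaces = 1 << 16;
    float3 *d_normals;
    int *d_flags;
    cudaMalloc(&d_normals, numFaces * sizeof(float3));
    cudaMalloc(&d_flags, numFaces * sizeof(int));
    cudaMemset(d_normals, 0, numFaces * sizeof(float3));   // placeholder normals

    float3 viewDir = make_float3(0.0f, 0.0f, 1.0f);
    int threads = 256, blocks = (numFaces + threads - 1) / threads;
    flagSilhouetteFaces<<<blocks, threads>>>(d_normals, numFaces, viewDir, 0.1f, d_flags);
    cudaDeviceSynchronize();

    cudaFree(d_normals);
    cudaFree(d_flags);
    return 0;
}
```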
36

Simulation des réseaux à grande échelle sur les architectures de calculs hétérogènes / Large-scale network simulation over heterogeneous computing architecture

Ben Romdhanne, Bilel 16 December 2013 (has links)
Simulation is an essential step in the evaluation of networked systems. Given the growing complexity of emerging wireless networks, the scalability and efficiency of the simulation tools are key to obtaining meaningful results. Discrete-event simulation is recognized as the model that scales best over parallel and distributed architectures, yet existing software architectures do not exploit recent hardware advances such as multicore processors and graphics coprocessors. The main goal of this thesis is to propose new mechanisms and optimizations that enable efficient, scalable parallel simulation on heterogeneous computing nodes combining multicore CPUs and GPUs. To address efficiency, we change the event representation from a one-to-one mapping (event-descriptor) to a one-to-many mapping (group of events-descriptor): events that differ only in their data are described by a single entry, which reduces scheduling cost and maximizes the number of events that can be executed in parallel. On top of this, a hybrid scheduler dispatches events to the most appropriate computing target (CPU or GPU) based on the event descriptor and the current load, obtained through a feedback mechanism, so that hardware utilization is maximized. Results show a gain of about 100x in simulation time compared to an equivalent CPU-only execution. To address scalability, we propose a new distributed simulation architecture, the general-purpose coordinator-master-worker (GP-CMW) model, which tackles distributed and parallel simulation jointly at different levels. The performance of a distributed simulation built on GP-CMW approaches the maximal theoretical efficiency in a homogeneous deployment, and its scalability is validated on the largest European GPU-based supercomputer.
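The central idea — a single descriptor shared by a group of events that differ only in their data, dispatched in batches to the CPU or the GPU depending on batch size and load — can be sketched roughly as follows. This is an assumed, simplified host-side dispatcher written for illustration; the event model, handlers, and load-feedback mechanism of the actual simulator are considerably more elaborate.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// All events in a group share one descriptor (handler id); only the data differ.
struct EventGroup {
    int handlerId;
    std::vector<float> payloads;   // one payload per event in the group
};

// GPU handler: processes every event of a group in parallel (one thread each).
__global__ void gpuHandler(const float *payloads, float *results, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) results[i] = payloads[i] * 0.5f + 1.0f;   // stand-in event logic
}

// CPU handler for small groups, where kernel-launch overhead would dominate.
static void cpuHandler(const std::vector<float> &payloads, std::vector<float> &results) {
    results.resize(payloads.size());
    for (size_t i = 0; i < payloads.size(); ++i) results[i] = payloads[i] * 0.5f + 1.0f;
}

// Hybrid dispatch: large groups go to the GPU, small ones stay on the CPU.
static void dispatch(const EventGroup &g, std::vector<float> &results, int gpuThreshold) {
    int n = (int)g.payloads.size();
    if (n < gpuThreshold) { cpuHandler(g.payloads, results); return; }

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, g.payloads.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256, blocks = (n + threads - 1) / threads;
    gpuHandler<<<blocks, threads>>>(d_in, d_out, n);

    results.resize(n);
    cudaMemcpy(results.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in); cudaFree(d_out);
}

int main() {
    EventGroup big{0, std::vector<float>(100000, 1.0f)};
    EventGroup small{0, std::vector<float>(16, 1.0f)};
    std::vector<float> r1, r2;
    dispatch(big, r1, 1024);    // executed on the GPU
    dispatch(small, r2, 1024);  // executed on the CPU
    printf("%f %f\n", r1[0], r2[0]);
    return 0;
}
```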
37

Zpracování dat z elektronového mikroskopu pomocí GPU / Employing GPU to Process Data from Electron Microscope

Bali, Michal January 2021 (has links)
Electron backscatter diffraction (EBSD) is a common tool used by physicists to examine crystalline materials; it is based on taking pictures of the material's microstructure with an electron microscope. To determine additional characteristics of the studied specimen, a specific variant called high-resolution EBSD has been proposed (and partially adopted). The technique takes several subregions of the images captured by the EBSD camera and uses cross-correlation to measure the deformation of the obtained patterns. The usability of this method is limited by its relatively high computational complexity, which makes it unusable for the analysis of larger specimen surfaces. At the same time, the processing of individual subregions and images is independent, which makes it well suited to the parallelism provided by modern GPUs. In this thesis, we describe the technique used to process the EBSD data in detail, analyze it, and implement the most computationally demanding parts using CUDA. Compared to a reference Python implementation, we measured a speedup of 30-40 times in double precision and up to 270 times in single precision.
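A standard way to measure the shift between two image subregions on the GPU is FFT-based cross-correlation: transform both patterns, multiply one spectrum by the complex conjugate of the other, transform back, and locate the correlation peak. The sketch below outlines this with cuFFT (link with `-lcufft`); the pattern size is assumed, and it omits the windowing, sub-pixel peak interpolation, and shift-to-deformation mapping that an HR-EBSD pipeline such as the one described here would need.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

// Element-wise A * conj(B), scaled by 1/N to normalize the inverse FFT.
__global__ void multiplyConj(const cufftComplex *a, const cufftComplex *b,
                             cufftComplex *out, int n, float scale) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    cufftComplex x = a[i], y = b[i];
    out[i].x = (x.x * y.x + x.y * y.y) * scale;   // Re(a * conj(b))
    out[i].y = (x.y * y.x - x.x * y.y) * scale;   // Im(a * conj(b))
}

int main() {
    const int nx = 64, ny = 64, n = nx * ny;      // assumed subregion size

    cufftComplex *d_ref, *d_def, *d_corr;
    cudaMalloc(&d_ref, n * sizeof(cufftComplex));
    cudaMalloc(&d_def, n * sizeof(cufftComplex));
    cudaMalloc(&d_corr, n * sizeof(cufftComplex));
    cudaMemset(d_ref, 0, n * sizeof(cufftComplex));   // placeholder patterns
    cudaMemset(d_def, 0, n * sizeof(cufftComplex));

    cufftHandle plan;
    cufftPlan2d(&plan, nx, ny, CUFFT_C2C);

    // Forward FFT of both patterns (in place).
    cufftExecC2C(plan, d_ref, d_ref, CUFFT_FORWARD);
    cufftExecC2C(plan, d_def, d_def, CUFFT_FORWARD);

    // Cross-power spectrum, then inverse FFT gives the cross-correlation surface.
    int threads = 256, blocks = (n + threads - 1) / threads;
    multiplyConj<<<blocks, threads>>>(d_ref, d_def, d_corr, n, 1.0f / n);
    cufftExecC2C(plan, d_corr, d_corr, CUFFT_INVERSE);

    // The peak of |d_corr| gives the shift between the two subregions
    // (peak search and sub-pixel refinement omitted here).

    cufftDestroy(plan);
    cudaFree(d_ref); cudaFree(d_def); cudaFree(d_corr);
    printf("cross-correlation computed\n");
    return 0;
}
```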
38

GPU-Based Acceleration on ACEnet for FDTD Method of Electromagnetic Field Analysis

Sun, Dachuan 21 November 2013 (has links)
Graphics Processing Unit (GPU) programming techniques have been applied to a range of scientific and engineering computations. In computational electromagnetics, use of the GPU has increased dramatically since the release of NVIDIA's Compute Unified Device Architecture (CUDA), a powerful and simple-to-use programming environment that makes GPU computing easily accessible to developers not specialized in computer graphics. The focus of recent research has been on problems concerning the Finite-Difference Time-Domain (FDTD) simulation of electromagnetic (EM) fields. Traditional FDTD methods can run slowly because of the large memory and CPU requirements of modeling electrically large structures, so acceleration methods such as parallel programming are needed. The FDTD algorithm is well suited to multi-threaded parallel computation on a GPU, and for complex structures and procedures, high-performance GPU algorithms are crucial. In this work, we present the implementation of GPU programming to accelerate computations for EM engineering problems. The speed-up is demonstrated through several simulations with inexpensive GPUs and ACEnet, and the attainable efficiency is illustrated with numerical results. Using C, CUDA C, MATLAB GPU, and ACEnet, we compare serial and parallel algorithms, computations with and without GPU and CUDA, different types of GPUs, and personal computers versus ACEnet. A maximum speed-up of 26.77 times is achieved, which could be further boosted by future hardware. The acceleration in run time will make many investigations possible and will pave the way for studies of large-scale computational electromagnetic problems that were previously impractical. This is a field that definitely invites more in-depth studies. / This is the thesis of my Master of Applied Science work at Dalhousie University.
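FDTD maps naturally onto the GPU because every grid cell is updated independently at each time step from its neighbours' values. Below is a minimal sketch of a 2D TMz update pair (Ez from Hx/Hy, then Hx/Hy from Ez) in CUDA; the grid size, the normalized update coefficients, and the absence of sources and absorbing boundaries are simplifying assumptions for illustration, not the thesis's simulation setup.

```cuda
#include <cuda_runtime.h>

// 2D TMz FDTD update on an NX x NY Yee grid (normalized units, no sources or
// absorbing boundaries -- illustration only).
#define NX 512
#define NY 512
#define IDX(i, j) ((i) * NY + (j))

__global__ void updateE(float *Ez, const float *Hx, const float *Hy, float ce) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i <= 0 || j <= 0 || i >= NX || j >= NY) return;
    Ez[IDX(i, j)] += ce * ((Hy[IDX(i, j)] - Hy[IDX(i - 1, j)])
                         - (Hx[IDX(i, j)] - Hx[IDX(i, j - 1)]));
}

__global__ void updateH(const float *Ez, float *Hx, float *Hy, float ch) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i >= NX - 1 || j >= NY - 1) return;
    Hx[IDX(i, j)] -= ch * (Ez[IDX(i, j + 1)] - Ez[IDX(i, j)]);
    Hy[IDX(i, j)] += ch * (Ez[IDX(i + 1, j)] - Ez[IDX(i, j)]);
}

int main() {
    float *Ez, *Hx, *Hy;
    size_t bytes = NX * NY * sizeof(float);
    cudaMalloc(&Ez, bytes); cudaMalloc(&Hx, bytes); cudaMalloc(&Hy, bytes);
    cudaMemset(Ez, 0, bytes); cudaMemset(Hx, 0, bytes); cudaMemset(Hy, 0, bytes);

    dim3 block(16, 16), grid((NX + 15) / 16, (NY + 15) / 16);
    for (int step = 0; step < 1000; ++step) {      // leapfrog time stepping
        updateE<<<grid, block>>>(Ez, Hx, Hy, 0.5f);
        updateH<<<grid, block>>>(Ez, Hx, Hy, 0.5f);
    }
    cudaDeviceSynchronize();

    cudaFree(Ez); cudaFree(Hx); cudaFree(Hy);
    return 0;
}
```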
39

Techniques for Shared Resource Management in Systems with Throughput Processors

Ausavarungnirun, Rachata 01 May 2017 (has links)
The continued growth of the computational capability of throughput processors has made them the platform of choice for a wide variety of high performance computing applications. Graphics Processing Units (GPUs) are a prime example of throughput processors that can deliver high performance for applications ranging from typical graphics applications to general-purpose data parallel (GPGPU) applications. However, this success has been accompanied by new performance bottlenecks throughout the memory hierarchy of GPU-based systems. This dissertation identifies and eliminates performance bottlenecks caused by major sources of interference throughout the memory hierarchy.

Specifically, we provide an in-depth analysis of inter- and intra-application as well as inter-address-space interference that significantly degrades the performance and efficiency of GPU-based systems. To minimize such interference, we introduce changes to the memory hierarchy for systems with GPUs that allow the memory hierarchy to be aware of both CPU and GPU applications' characteristics. We introduce mechanisms to dynamically analyze different applications' characteristics and propose four major changes throughout the memory hierarchy.

First, we introduce Memory Divergence Correction (MeDiC), a cache management mechanism that mitigates intra-application interference in GPGPU applications by allowing the shared L2 cache and the memory controller to be aware of the GPU's warp-level memory divergence characteristics. MeDiC uses this warp-level memory divergence information to give more cache space and more memory bandwidth to the warps that benefit most from such resources. Our evaluations show that MeDiC significantly outperforms multiple state-of-the-art caching policies proposed for GPUs.

Second, we introduce the Staged Memory Scheduler (SMS), an application-aware CPU-GPU memory request scheduler that mitigates inter-application interference in heterogeneous CPU-GPU systems. SMS takes a fundamentally new approach to memory controller design that decouples the memory controller into three significantly simpler structures, each with a separate task, which operate together to greatly improve both system performance and fairness. Our three-stage memory controller first groups requests based on row-buffer locality; this grouping allows the second stage to focus on inter-application scheduling decisions. These two stages enforce high-level policies regarding performance and fairness, so the last stage is simple logic that deals only with low-level DRAM commands and timing. SMS is also configurable: it allows the system software to trade off the quality of service provided to CPU versus GPU applications. Our evaluations show that SMS not only reduces inter-application interference caused by the GPU, thereby improving heterogeneous system performance, but also provides better scalability and power efficiency than multiple state-of-the-art memory schedulers.

Third, we redesign the GPU memory management unit to efficiently handle new problems caused by the massive address translation parallelism present in GPU computation units in multi-GPU-application environments. Running multiple GPGPU applications concurrently induces significant inter-core thrashing on the shared address translation/protection units, e.g., the shared Translation Lookaside Buffer (TLB), a new phenomenon that we call inter-address-space interference. To reduce this interference, we introduce Multi Address Space Concurrent Kernels (MASK). MASK introduces TLB-awareness throughout the GPU memory hierarchy and introduces TLB- and cache-bypassing techniques to increase the effectiveness of a shared TLB.

Finally, we introduce Mosaic, a hardware-software cooperative technique that further increases the effectiveness of the TLB by modifying the memory allocation policy in the system software. Mosaic introduces a high-throughput method to support large pages in multi-GPU-application environments. The key idea is to ensure that memory allocation preserves address space contiguity, so that pages can be coalesced without any data movement. Our evaluations show that the MASK-Mosaic combination provides a simple mechanism that eliminates the performance overhead of address translation in GPUs without significant changes to GPU hardware, thereby greatly improving GPU system performance.

The key conclusion of this dissertation is that a combination of GPU-aware cache and memory management techniques can effectively mitigate memory interference on current and future GPU-based systems as well as other types of throughput processors.
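MeDiC, SMS, MASK, and Mosaic are hardware (or hardware/OS) mechanisms and cannot be reproduced in application code, but the warp-level memory divergence that MeDiC targets is visible from the programmer's side. In the assumed sketch below, both kernels do the same arithmetic; the first issues coalesced accesses, while in the second the threads of a warp touch widely separated addresses, so their requests split across many cache lines, some hitting and some missing, and the warp stalls until its slowest request returns — the behaviour MeDiC manages in hardware.

```cuda
#include <cuda_runtime.h>

// Coalesced: consecutive threads of a warp read consecutive addresses, so the
// whole warp is typically served by one (or a few) memory transactions.
__global__ void coalescedRead(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}

// Divergent: threads of the same warp read addresses far apart (large stride),
// so one load instruction fans out into many transactions with mixed cache
// hits and misses, and the warp waits for the slowest of them.
__global__ void divergentRead(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[(size_t)i * stride % n] * 2.0f;
}

int main() {
    const int n = 1 << 22;
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    int threads = 256, blocks = (n + threads - 1) / threads;
    coalescedRead<<<blocks, threads>>>(d_in, d_out, n);
    divergentRead<<<blocks, threads>>>(d_in, d_out, n, 1031);  // prime stride
    cudaDeviceSynchronize();

    cudaFree(d_in); cudaFree(d_out);
    return 0;
}
```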
40

Evaluating a GPU-based TRNG in an entropy-starved virtual Linux environment

Plesiuk, Christopher 14 April 2016 (has links)
A secure system requires cryptography, and effective cryptography requires high-quality system entropy. Within a virtualized Linux environment, the quality and amount of system entropy can be overestimated, and such environments can also have difficulty generating entropy data. To address these problems, my thesis investigates and evaluates exposing a unique true random number generator via an entropy-sharing tool called Entropy Broker. Entropy Broker distributes entropy data generated by the true random number generator to several virtualized Linux guest systems to increase the entropy of each system and, in turn, increase the security of their cryptographic libraries. Entropy Broker and the true random number generator are evaluated against the Linux pseudo-random number generator, the Haveged pseudo-random number generator, and an on-chip random number generator developed by Intel. / May 2016
