131

Extração de informações de desempenho em GPUs NVIDIA / Performance Information Extraction on NVIDIA GPUs

Santos, Paulo Carlos Ferreira dos 15 March 2013 (has links)
The recent growth in the use of Graphics Processing Units (GPUs) in performance-driven scientific applications has created the need to optimize the programs that run on them. A suitable tool for this task is a performance model, which in turn benefits from the existence of a performance-information extraction tool for GPUs. This work covers the creation of a microbenchmark generator for PTX instructions that also gathers information about the GPU's hardware characteristics. The microbenchmark results were validated using a simplified model that obtained errors between 6.11% and 16.32% on five test kernels. We also identify the sources of imprecision in the microbenchmark results. The tool was used to analyze the performance profile of the instructions and to identify groups with similar behavior. We also tested how GPU pipeline performance depends on the executed instruction sequence and verified the compiler's optimization for this case. We conclude that microbenchmarking with PTX instructions is feasible and proved effective for building performance models and for detailed analysis of instruction behavior.
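The abstract summarizes the microbenchmark generator without showing it. Purely as an illustration of the technique, here is a minimal CUDA sketch of a PTX instruction-latency microbenchmark: it times a dependent chain of add.u32 instructions using the SM's %clock special register. The instruction choice, the chain length of 256, and the single-thread launch are assumptions for illustration, not details from the thesis.

```cuda
#include <cstdio>

// Times a dependent chain of PTX add.u32 instructions with the SM's %clock
// register. Per-instruction latency is roughly (stop - start) / N.
__global__ void ptx_latency_kernel(unsigned *cycles, unsigned *sink) {
    unsigned start, stop;
    unsigned x = threadIdx.x;
    asm volatile("mov.u32 %0, %%clock;" : "=r"(start));
    #pragma unroll
    for (int i = 0; i < 256; ++i)
        asm volatile("add.u32 %0, %0, %1;" : "+r"(x) : "r"(i)); // dependent chain
    asm volatile("mov.u32 %0, %%clock;" : "=r"(stop));
    *cycles = stop - start;
    *sink = x;  // keep the chain live so the compiler cannot remove it
}

int main() {
    unsigned *d_cycles, *d_sink, h_cycles;
    cudaMalloc(&d_cycles, sizeof(unsigned));
    cudaMalloc(&d_sink, sizeof(unsigned));
    ptx_latency_kernel<<<1, 1>>>(d_cycles, d_sink);  // one thread isolates latency
    cudaMemcpy(&h_cycles, d_cycles, sizeof(unsigned), cudaMemcpyDeviceToHost);
    printf("~%.2f cycles per add.u32\n", h_cycles / 256.0);
    return 0;
}
```

Sweeping the instruction inside the chain (mul.lo.u32, fused multiply-add, shared-memory loads, and so on) is how such a generator would build the per-instruction performance profiles and behavior groups the abstract mentions.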
132

Modèles de programmation et d'exécution pour les architectures parallèles et hybrides. Applications à des codes de simulation pour la physique. / Programming and execution models for parallel and hybrid architectures, with applications to physics simulation codes

Ospici, Matthieu 03 July 2013 (has links) (PDF)
In this thesis we focus on large hybrid parallel architectures, that is, parallel architectures combining general-purpose processors (e.g., Intel Xeon) and accelerator processors (NVIDIA GPUs). The efficient exploitation of these hybrid clusters for high-performance computing is at the heart of our work. The heterogeneity of compute resources within hybrid clusters raises many issues when one wants to exploit them efficiently with large existing scientific applications. Two main problems are addressed. The first concerns the sharing of accelerators among MPI applications; the second concerns programming and the concurrent execution of code on CPUs and accelerators. Hybrid architectures are highly heterogeneous: depending on the architecture, the ratio between the number of accelerators and the number of CPU cores varies widely. We therefore first propose a notion of accelerator virtualization, which gives applications the illusion that they can use a number of accelerators that is not tied to the number of physical accelerators present in the hardware. An execution model based on accelerator sharing is put in place, exposing a more homogeneous hybrid architecture to applications. We also propose extensions to MPI/thread programming models to address the problem of concurrent execution on CPUs and accelerators, based on two types of threads, CPU threads and accelerator threads, which allow hybrid computations to exploit CPUs and accelerators simultaneously. In both cases, deploying and executing code on the hybrid resources is crucial. We therefore propose two software libraries, S_GPU 1 and S_GPU 2, whose role is to deploy and execute computations on hybrid hardware: S_GPU 1 handles virtualization, and S_GPU 2 the concurrent exploitation of CPUs and accelerators. To observe the deployment and execution of code on complex GPU-based architectures, we integrated tracing mechanisms that allow the execution of programs using our libraries to be analyzed. Our proposals were validated on two large scientific applications, BigDFT (ab initio simulation) and SPECFEM3D (seismic wave simulation), which we adapted to use S_GPU 1 (BigDFT) and S_GPU 2 (SPECFEM3D).
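The S_GPU interfaces themselves are not given in the abstract. The sketch below only illustrates the core idea of accelerator virtualization, letting more ranks than physical GPUs each "own" an accelerator, using plain CUDA runtime calls; the round-robin policy and the rank loop standing in for MPI processes are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Map a logical rank onto one of the physical GPUs. Several ranks share a
// device, but each behaves as if it had a dedicated accelerator.
void bind_rank_to_gpu(int rank) {
    int n_gpus = 0;
    cudaGetDeviceCount(&n_gpus);
    if (n_gpus == 0) return;          // no accelerator present
    cudaSetDevice(rank % n_gpus);     // hypothetical round-robin sharing policy
}

int main() {
    for (int rank = 0; rank < 8; ++rank) {   // stand-in for 8 MPI ranks
        bind_rank_to_gpu(rank);
        int dev = -1;
        cudaGetDevice(&dev);
        printf("rank %d -> physical GPU %d\n", rank, dev);
    }
    return 0;
}
```

A real virtualization layer such as S_GPU presumably also has to interleave the sharing ranks' kernels and transfers safely, which is where an explicit execution model and the tracing mechanisms mentioned above become essential.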
133

Accelerating Java on Embedded GPU

P. Joseph, Iype 10 March 2014 (has links)
Multicore CPUs (Central Processing Units) and GPUs (Graphics Processing Units) are omnipresent in today's market-leading smartphones and tablets. With CPUs and GPUs getting more complex, maximizing hardware utilization is becoming problematic. The challenges of GPGPU (general-purpose computing using GPUs) on embedded platforms differ from those on the desktop because of memory and computational limitations. This thesis evaluates the performance and energy efficiency achieved by offloading Java applications to an embedded GPU. Existing solutions in the literature address the techniques and benefits of offloading Java to desktop- or server-grade GPUs, not to embedded GPUs. Our research focuses on providing a framework for accelerating Java programs on embedded GPUs. The experiments were conducted on a Freescale i.MX6Q SabreLite board, which combines a quad-core ARM Cortex-A9 CPU with a Vivante GC2000 GPU supporting the OpenCL 1.1 Embedded Profile. We successfully accelerated Java code and reduced energy consumption using two approaches: JNI-OpenCL and JOCL, a popular Java binding for OpenCL. Both approaches can be readily adopted by embedded Java programmers on other platforms to exploit the computational power of GPUs. Our results show up to an 8x increase in performance and a 3x decrease in energy consumption compared to CPU-only execution of the Java program. To the best of our knowledge, this is the first work on accelerating Java on an embedded GPU.
134

Détection d’évènements impulsionnels en environnement radioélectrique perturbé : application à l’observation des pulsars intermittents avec un système temps réel de traitement du signal / Impulsive event detection in a disturbed radio environment: application to the observation of intermittent pulsars with a real-time signal-processing system

Ait Allal, Dalal 16 November 2012 (has links)
This work addresses the detection of intermittent impulsive events from pulsars. Pulsars are highly magnetized, rapidly rotating neutron stars that emit a radio beam sweeping through space like a lighthouse beam, and they are detectable with specific instrumentation. In recent years, new classes of pulsars with extreme features have been discovered, in particular with individual pulses that are more intense and more irregular than the average; these must be detected in real time in a radio environment disrupted by telecommunication signals. This study presents radio-frequency interference (RFI) mitigation algorithms adapted to this context. Several RFI mitigation methods are presented and compared; two of them were selected and compared using Monte Carlo simulations, with a set of parameters simulating the pulsar and a BPSK signal of varying power and duration. For the search for new pulsars, an alternative method called SIPSFAR is proposed, combining real-time search capability with robustness against RFI. It is based on the 2D Fourier transform and the Radon transform. A theoretical comparative study contrasted the sensitivity of this new method with that of the method commonly used by radio astronomers. SIPSFAR was implemented on a GTX285 GPU and tested on a large sky survey carried out with the Nançay radio telescope; the results led to a further statistical comparison on real data.
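SIPSFAR itself is not reproduced in the abstract. As a hedged sketch of its first stage only, the following CUDA code computes the 2D FFT of a time-frequency plane with cuFFT; a dispersed pulse, roughly a line in that plane, concentrates into a line through the origin of the 2D spectrum, which the Radon transform (not shown) can then integrate along candidate angles. The plane dimensions and the placeholder input are assumptions.

```cuda
#include <cufft.h>
#include <cuda_runtime.h>
#include <vector>

int main() {
    const int NT = 1024, NF = 512;   // time samples x frequency channels (assumed sizes)
    std::vector<cufftComplex> h_plane(NT * NF, cufftComplex{1.0f, 0.0f});  // placeholder data

    cufftComplex *d_plane;
    cudaMalloc(&d_plane, sizeof(cufftComplex) * NT * NF);
    cudaMemcpy(d_plane, h_plane.data(), sizeof(cufftComplex) * NT * NF,
               cudaMemcpyHostToDevice);

    cufftHandle plan;
    cufftPlan2d(&plan, NT, NF, CUFFT_C2C);                // 2D complex-to-complex FFT
    cufftExecC2C(plan, d_plane, d_plane, CUFFT_FORWARD);  // in place, on the GPU
    cudaDeviceSynchronize();

    // Next stage (not shown): Radon transform of |spectrum|^2 to sum energy
    // along candidate dispersion angles and pick detection peaks.

    cufftDestroy(plan);
    cudaFree(d_plane);
    return 0;
}
```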
135

Multi-scale Feature-Preserving Smoothing of Images and Volumes on GPU / Lissage multi-échelle sur GPU des images et volumes avec préservation des détails

Jibai, Nassim 24 May 2012 (has links)
Two-dimensional images and three-dimensional volumes have become a staple ingredient of our artistic, cultural, and scientific appetite. Images capture and immortalize an instant, such as a natural scene photographed with a camera, and can also capture details inside biological subjects through CT (computed tomography) scans, X-rays, ultrasound, and similar modalities. Three-dimensional volumes of objects are likewise of high interest in medical imaging, engineering, and the analysis of cultural heritage. They are produced by tomographic reconstruction, a technique that combines a large series of 2D scans captured from multiple views; typically, penetrating radiation is used to obtain each 2D scan: X-rays for CT scans, radio-frequency waves for MRI (magnetic resonance imaging), electron-positron annihilation for PET scans, etc. Unfortunately, acquisition is affected by noise from several factors. Noise in two-dimensional images can be caused by low-light illumination, electronic defects, a low radiation dose, and mispositioning of the tool or object. Noise in three-dimensional volumes also comes from a variety of sources: the limited number of views, lack of sensor sensitivity, high contrasts, the reconstruction algorithms employed, etc. Requiring noiseless acquisition is unrealistic, so it is desirable to reduce or eliminate noise at the earliest possible stage in the pipeline. However, removing noise while preserving the sharp features of an image or volume remains a challenging task. We propose a multi-scale method to smooth 2D images and 3D tomographic data while preserving features at a specified scale. Our algorithm is controlled by a single user parameter: the minimum scale of features to be preserved. Any variation smaller than the specified scale is treated as noise and smoothed, while discontinuities such as corners, edges, and detail at larger scales are preserved. We demonstrate that our smoothed data produces clean images and clean contour surfaces of volumes under standard surface-extraction algorithms, and we compare our results with those of previous approaches. Our method is inspired by anisotropic diffusion: we compute diffusion tensors from local continuous histograms of gradients around each pixel in images and around each voxel in volumes. Since the smoothing runs entirely on the GPU, it is extremely fast.
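The thesis builds diffusion tensors from local gradient histograms; as a much simpler stand-in that shows the general mechanism, here is a scalar, Perona-Malik-style diffusion step in CUDA. The edge-stopping function g and the threshold k are standard textbook choices, not the authors'.

```cuda
// One explicit step of scalar feature-preserving diffusion: smooth where
// neighbor differences are small, diffuse little across strong edges.
__device__ float g(float d, float k) {
    return 1.0f / (1.0f + (d / k) * (d / k));   // near zero across strong edges
}

__global__ void diffusion_step(const float *src, float *dst,
                               int w, int h, float k, float dt) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= w - 1 || y >= h - 1) return;

    float c  = src[y * w + x];
    float dn = src[(y - 1) * w + x] - c;   // differences to the 4 neighbors
    float ds = src[(y + 1) * w + x] - c;
    float de = src[y * w + x + 1] - c;
    float dw = src[y * w + x - 1] - c;

    dst[y * w + x] = c + dt * (g(dn, k) * dn + g(ds, k) * ds +
                               g(de, k) * de + g(dw, k) * dw);
}
```

The tensor-based version described in the abstract replaces the scalar g with a direction-dependent tensor built from the local gradient histogram, which is what allows features to be preserved at a chosen scale rather than at a fixed contrast.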
136

Face perception in videos: contributions to a visual saliency model and its implementation on GPUs / La perception des visages en vidéos : contributions à un modèle de saillance visuelle et son application sur les GPU

Rahman, Anis Ur 12 April 2013 (has links)
The studies conducted in this thesis focus on the role of faces in visual attention. We seek to better understand the influence of faces in videos on eye movements, in order to propose a visual saliency model for gaze prediction. Throughout the thesis we concentrate on the question: how do people explore dynamic visual scenes, how can the different visual features be modeled to mimic observers' eye movements, and, in particular, what is the influence of faces? To answer these questions we analyze the influence of faces on gaze during free viewing of videos (with no instruction or particular task), as well as the effects of the number, location, and size of faces. Faces in a dynamic scene, as in still images, clearly modify eye movements. Based on these findings, we propose a saliency model in which faces are an important higher-level feature, extracted in parallel alongside classical low-level features (orientations and spatial frequencies, object-motion amplitude). Finally, to move processing closer to real time, we developed a multi-GPU implementation of the visual saliency model, demonstrating a speedup of more than 132x compared to a multithreaded CPU implementation.
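The abstract does not give the fusion rule; the kernel below merely illustrates the final step of such a model, combining static, dynamic, and face conspicuity maps into one saliency map, with an assumed weighted sum and illustrative weights.

```cuda
// Hypothetical fusion of three conspicuity maps into a saliency map,
// one thread per pixel. The weights ws/wd/wf are illustrative only.
__global__ void fuse_saliency(const float *stat, const float *dyn,
                              const float *face, float *out, int n,
                              float ws, float wd, float wf) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = ws * stat[i] + wd * dyn[i] + wf * face[i];
}
```

Because each map is computed independently before fusion, this last stage is trivially parallel, which is one reason such models distribute well across multiple GPUs.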
138

Análise de técnicas de implementação paralela para treinamento de redes neurais em GPU / Analysis of parallel implementation techniques for training neural networks on GPU

Gurgel, Sáskya Thereza Alves 31 January 2014 (has links)
With the growing volume of available data and the pressing need to turn it into information and knowledge, techniques are needed that can analyze this data in a timely and efficient manner. Neural networks provide a form of data analysis capable of classifying data and predicting information about it. However, the naturally parallel computational model of neural networks demands implementation techniques with high processing power. The continuing evolution of parallel hardware offers environments with ever-increasing computational power, and the GPU is hardware capable of processing parallel implementations efficiently and at steadily decreasing cost. This work therefore presents a technique for the parallel implementation of neural networks with GPU processing, and carries out a comparative analysis between different implementation techniques found in the literature and the technique proposed here.
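As a minimal sketch of the kind of GPU mapping such training techniques rely on (not the implementation evaluated in the thesis), here is a dense-layer forward pass in CUDA, with one thread per output neuron and a sigmoid activation; the memory layout and activation function are assumptions.

```cuda
#include <math.h>

// Forward pass of one fully connected layer: y = sigmoid(W x + b).
// W is row-major, out_dim x in_dim; one thread computes one output neuron.
__global__ void layer_forward(const float *W, const float *x, const float *b,
                              float *y, int in_dim, int out_dim) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= out_dim) return;
    float s = b[j];
    for (int i = 0; i < in_dim; ++i)
        s += W[j * in_dim + i] * x[i];   // dot product of one weight row with the input
    y[j] = 1.0f / (1.0f + expf(-s));     // sigmoid activation
}
```

Backpropagation parallelizes along the same lines, with per-neuron threads for the error terms and per-weight threads for the gradient updates, which is typically where the speedups measured in such comparisons come from.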
139

Nanowires de InP: cálculo do espectro de absorção via método k.p / InP nanowires: absorption spectrum calculation via k.p method

Tiago de Campos 25 July 2013 (has links)
In recent years, advances in semiconductor growth techniques have allowed the fabrication of high-quality isolated nanostructures with quantum confinement along the lateral directions. These quasi-one-dimensional structures, known as nanowires (NWs), have vast technological applications, such as chemical and biological nanosensors, photodetectors, and lasers. Applications involving NWs require an understanding of their optical and electronic properties, so a deeper theoretical study is needed. The aim of this work is to calculate theoretically the absorption power of InP NWs, comparing results for the zincblende (ZB) and wurtzite (WZ) crystal phases along their equivalent growth directions. We use a formulation of the k.p method that describes both crystal phases with a single Hamiltonian, together with the envelope-function approximation and a plane-wave expansion. The absorption power was calculated from transitions between the valence and conduction bands using Fermi's golden rule. Although the k.p method is computationally far cheaper than its ab initio counterparts, the matrices involved in the calculations can exceed a billion elements. To handle such matrices, we implemented an iterative eigensolver, LOBPCG (Locally Optimal Block Preconditioned Conjugate Gradient), exploiting the processing power available on current GPUs. The new solver showed considerable gains over direct diagonalization methods in tests with confinement along a single direction; the lack of an adequate preconditioner limits its use for NWs. The absorption calculations for ZB NWs showed an anisotropy of more than 90% in the absorption spectrum, while WZ NWs showed two distinct anisotropy regimes, governed by the appearance of an optically forbidden state at the top of the valence band. In summary, the results obtained with the proposed theoretical model reproduce the optical properties reported in the literature, including the optically forbidden state observed in other WZ systems with strong quantum confinement.
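For reference, the transition rate invoked by the abstract is the standard Fermi's golden rule expression (textbook form, not copied from the thesis):

```latex
% Transition rate from a valence state |v> to a conduction state |c>
% under the light-matter perturbation H', at photon energy \hbar\omega:
W_{v \to c} \;=\; \frac{2\pi}{\hbar}\,
    \bigl|\langle c \,|\, H' \,|\, v \rangle\bigr|^{2}\,
    \delta\!\bigl(E_{c} - E_{v} - \hbar\omega\bigr)
```

Evaluating this rate over the k.p eigenstates for the two light polarizations (parallel and perpendicular to the wire axis) is how the polarization anisotropy reported above is obtained.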
140

Transformations de programme automatiques et source-à-source pour accélérateurs matériels de type GPU / Source-to-Source Automatic Program Transformations for GPU-like Hardware Accelerators

Amini, Mehdi 13 December 2012 (has links)
Since the beginning of the 2000s, the raw performance of processor cores has stopped increasing exponentially. Modern graphics processing units (GPUs) have been designed as arrays of hundreds or thousands of compute units. Their compute capacity quickly led them to be diverted from their original display function and used as accelerators for general-purpose computation. However, programming a GPU efficiently for computations other than 3D rendering remains challenging. The current jungle in the hardware ecosystem is mirrored in the software world, with more and more programming models, new languages, and different APIs, but no one-size-fits-all solution has emerged. This thesis proposes a compiler-based solution to partially answer the three "P" properties: Performance, Portability, and Programmability. The goal is to automatically transform a sequential program into an equivalent program accelerated with a GPU. A prototype, Par4All, is implemented and validated with numerous experiments. Programmability and portability are enforced by definition, and while the performance may not match what an expert programmer could obtain, it has been measured as excellent over a wide range of kernels and applications. A survey of GPU architectures and of trends in language and framework design is presented. Data movement between the host and the accelerator is managed without involving the developer: an algorithm is proposed to optimize communication by sending data to the GPU as early as possible and keeping it there as long as it is not required by the host. Loop-transformation techniques for kernel code generation are employed, and even well-known ones must be adapted to the specific constraints imposed by GPUs; they are combined in a coherent and flexible way and dynamically scheduled within the compilation flow of an interprocedural compiler. Preliminary work on extending the approach toward multiple GPUs is also presented.
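Par4All's generated code is not shown in the abstract; the sketch below only illustrates the communication pattern it describes (upload early, keep data resident across successive kernels, download only when the host needs it) using standard CUDA streams. kernel_a/kernel_b and all sizes are placeholders.

```cuda
#include <cuda_runtime.h>

__global__ void kernel_a(float *d, int n) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) d[i] *= 2.0f; }
__global__ void kernel_b(float *d, int n) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) d[i] += 1.0f; }

int main() {
    const int n = 1 << 20;
    float *h, *d;
    cudaMallocHost(&h, n * sizeof(float));   // pinned host memory enables async copies
    cudaMalloc(&d, n * sizeof(float));

    cudaStream_t s;
    cudaStreamCreate(&s);
    cudaMemcpyAsync(d, h, n * sizeof(float), cudaMemcpyHostToDevice, s);  // upload once, early
    kernel_a<<<(n + 255) / 256, 256, 0, s>>>(d, n);   // data stays resident on the GPU
    kernel_b<<<(n + 255) / 256, 256, 0, s>>>(d, n);   // no intermediate round-trip to the host
    cudaMemcpyAsync(h, d, n * sizeof(float), cudaMemcpyDeviceToHost, s);  // download only at the end
    cudaStreamSynchronize(s);

    cudaStreamDestroy(s);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```

The naive alternative, copying the buffer back and forth around each kernel, doubles the PCIe traffic for no benefit; deciding automatically when a buffer can stay resident is precisely the communication-optimization problem the thesis addresses.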
