Global ETD Search

31	Automatic code generation and optimization of multi-dimensional stencil computations on distributed-memory architectures / Génération automatique de code et optimisation de calculs stencils sur des architectures à mémoire distribuée Saied, Mariem 25 September 2018 In this work, we present Dido, an implicitly parallel domain-specific language (DSL) that captures high-level stencil abstractions and automatically generates high-performance parallel stencil code for distributed-memory architectures. The generated code uses ORWL as its communication and synchronization backend. We show that Dido greatly improves programmer productivity without sacrificing performance. Dido supports a wide range of stencil computations and real-world stencil-based applications. We show that the well-structured code generated by Dido lends itself to a variety of optimizations, and we study the performance of two of them. 
We also combine Dido's code generation technique with the polyhedral loop optimizer Pluto to increase data locality and improve intra-node data reuse. We present experiments that demonstrate the efficiency and scalability of the generated code, which outperforms hand-crafted ORWL and MPI implementations. Domain-specific language Ordered read write locks Stencil computations Polyhedral model Distributed memory 004
32	Performance Optimization of Stencil Computations on Modern SIMD Architectures Henretty, Thomas Steel January 2014 No description available. Computer Science stencil SIMD PDE DLT SDSL domain specific language stream alignment conflict split tiling nested split tiling hybrid split tiling
33	Evaluating the OpenACC API for Parallelization of CFD Applications Pickering, Brent Phillip 06 September 2014 Directive-based programming of graphics processing units (GPUs) has recently appeared as a viable alternative to using specialized low-level languages such as CUDA C and OpenCL for general-purpose GPU programming. This technique, which uses directive or pragma statements to annotate source code written in traditional high-level languages, is designed to permit a unified code base to serve multiple computational platforms and to simplify the transition of legacy codes to new architectures. This work analyzes the popular OpenACC programming standard, as implemented by the PGI compiler suite, in order to evaluate its utility and performance potential in computational fluid dynamics (CFD) applications. Of particular interest is the handling of stencil algorithms, which are an important component of finite-difference and finite-volume numerical methods. To this end, the process of applying the OpenACC Fortran API to a preexisting finite-difference CFD code is examined in detail, and all modifications that must be made to the original source in order to run efficiently on the GPU are noted. Optimization techniques for OpenACC are also explored, and it is demonstrated that tuning the code for a particular accelerator architecture can result in performance increases of over 30%. There are also some limitations and programming restrictions imposed by the API: it is observed that certain useful features of modern Fortran (2003/2008) are effectively disabled within OpenACC regions. Finally, a combination of OpenACC and OpenMP directives is used to create a truly cross-platform Fortran code that can be compiled for either CPU or GPU hardware. 
The performance of the OpenACC code is measured on several contemporary NVIDIA GPU architectures, and a comparison between double- and single-precision arithmetic shows that, where reduced precision can be tolerated, it can lead to significant speedups. To assess the performance gains relative to a typical CPU implementation, the execution time for a standard benchmark case (lid-driven cavity) is used as a reference. The OpenACC version is compared against the identical Fortran code recompiled to use OpenMP on multicore CPUs, as well as a highly optimized C++ version of the code that uses hardware-aware programming techniques to attain higher performance on the Intel Xeon platforms being tested. Low-level optimizations specific to these architectures are analyzed, and it is observed that the stencil access pattern required by the structured-grid CFD code sometimes leads to performance-degrading conflict misses in the hardware-managed CPU caches. The GPU code, which primarily uses software-managed caching, is found to be free from these issues. Overall, it is observed that the OpenACC GPU code compares favorably against even the best optimized CPU version: using a single NVIDIA K20x GPU, the Fortran+OpenACC code is seen to outperform the optimized C++ version by 20% and the Fortran+OpenMP version by more than 100%, with both CPU codes running on a 16-core Xeon workstation. / Master of Science graphics processing unit (GPU) computational fluid dynamics (CFD) directive-based programming parallel programming OpenACC Fortran 2003 stencil code finite-volume method
34	Neeuklidovské vykreslování ve VR / Non-Euclidean Rendering in VR Bobuľa, Matej January 2021 The main goal of this master's thesis is to research different approaches to rendering geometries and spaces in virtual reality, and to examine the terms non-Euclidean geometry and non-Euclidean space, their origins, and the principles used in the video game industry to simulate such geometries or spaces. Based on this research, an optimal API is selected for implementing such an application. The application is designed to run on desktop computers with the Microsoft Windows operating system. At its core, the application is a video game in which the player's main goal is to complete every level. Each level is designed to represent some form of non-Euclidean geometry or space.
35	Implementation trade-offs for FPGA accelerators / Compromis pour l'implémentation d'accélérateurs sur FPGA Deest, Gaël 14 December 2017 Hardware acceleration is the use of custom hardware architectures to perform some computations faster or more efficiently than on general-purpose hardware. Accelerators have traditionally been used mostly in resource-constrained environments, such as embedded systems, where resource efficiency was paramount. Over the last fifteen years, with the end of empirical scaling laws, they have also made their way into datacenters and high-performance computing environments. FPGAs constitute a convenient implementation platform for such accelerators, allowing subtle, application-specific trade-offs between all performance metrics (throughput/latency, area, energy, accuracy, etc.). However, identifying good trade-offs is a challenging task, as the design space is usually extremely large. 
This thesis proposes design methodologies to address this problem. First, we focus on performance-accuracy trade-offs in the context of floating-point to fixed-point conversion. Using fixed-point arithmetic instead of floating-point is an effective way to reduce hardware resource usage, but comes at a price in numerical accuracy. The validity of a fixed-point implementation can be assessed either with numerical simulations or with analytical models derived from the algorithm. Compared to simulation-based methods, analytical approaches enable more exhaustive design space exploration and can thus increase the quality of the final architecture. However, they are currently applicable only to limited sets of algorithms. In the first part of this thesis, we extend such techniques to multi-dimensional linear filters, such as image processing kernels. Our technique is implemented as a source-level analysis using techniques from the polyhedral compilation toolset, and validated against simulations with real-world input. In the second part of this thesis, we focus on iterative stencil computations, a naturally arising pattern found in many scientific and embedded applications. Because of this diversity, there is no single best architecture for stencils: each algorithm has unique computational features (update formula, dependences) and each application has different performance constraints and requirements. To address this problem, we propose a family of hardware accelerators for stencils, featuring carefully chosen design knobs, along with simple performance models to drive the exploration. Our architecture is implemented as an HLS-optimized code generation flow, and performance is measured with actual execution on the board. We show that these models can be used to identify the most interesting design points for each use case. 
FPGA Hardware Accelerators Wordlength Optimization Floating-Point to Fixed-Point Conversion Accuracy Analysis Iterative Stencil Computations High-Level Synthesis Performance Models
36	Etude de matrices de filtres Fabry Pérot accordables en technologie MOEMS intégré 3D : Application à l’imagerie multispectrale / Array of tunable Fabry Perot filters in 3D MOEMS integration technology : Application to multispectral imaging Bertin, Hervé 23 July 2013 Multispectral imaging is used to improve target detection and identification in monitoring applications. It consists of analyzing images of the same scene recorded simultaneously in several spectral bands by means of filtering. This thesis investigates the feasibility of building an array of four 3D-integrated Fabry-Perot (FP) filters that are tunable in the visible to near-infrared range by electrostatic actuation. The fixed mirrors of the FP filters are ZnS/YF₃ multilayers deposited on a borosilicate wafer, and the movable mirrors are PECVD SiNH/SiOH multilayer membranes clamped in a very compact movable structure micromachined in a Si wafer. A third glass wafer is used for filter packaging. The optical performance of the FP filters has been optimized by taking into account the asymmetry and the reflection phase shift of the mirrors, and the movable structure has been modeled by finite element analysis, notably to minimize its deformation during actuation. The critical steps of the movable-mirror fabrication process in Si or SOI technology have been developed: i) the fabrication and release, by DRIE and XeF₂ etching, of 8- or 12-layer membranes with a residual stress tunable by annealing and a reflectance close to 50% over a broad wavelength range (570-900 nm); ii) the control, using temporary patterns, of the simultaneous deep etching of patterns with different widths and depths; and iii) various patterning techniques on highly structured surfaces based on shadow masks (with mechanical alignment) or laminated photosensitive dry films. These results open the way towards the full realization of an array of 3D-integrated FP filters. MOEMS Multispectral imaging 3D integration Membrane Dielectric mirror Residual stress Flatness DRIE XeF₂ Shadow mask Wafer alignment Dry film
37	Synthèse et caractérisation de matériaux mésoporeux à base d'oxyde de vanadium pour l'oxydation de composés organiques / Synthesis and Characterization of Vanadium-containing Mesoporous Silica and its Application in the Catalysis of Oxidation Reaction Zheng, Yuting 02 November 2014 Vanadium-based materials are widely used as catalysts for oxidation of organic compounds. 
The catalytic properties of vanadium catalysts for oxidation are closely related to the state and stability of the vanadium species. Therefore, a series of vanadium-containing MCM-41 silicas was designed and developed in this study, and their catalytic application to oxidation reactions was evaluated. In the first part of the work, the chemical anchoring effect of Al(III) or Ti(IV) heteroatoms on the dispersion of V(V) in MCM-41 type silica was investigated using a quantitative analysis of diffuse reflectance UV-visible spectra. The prepared materials were characterized by X-ray diffraction (XRD), N2 sorption measurements, electron paramagnetic resonance (EPR) spectroscopy, UV-visible spectroscopy and Raman spectroscopy. UV-visible spectra of hydrated and dehydrated samples evidenced the coexistence of several V(V) species with different oligomerization and hydration levels. The global blue shift of the band in the presence of Al(III) or Ti(IV) additives was assigned to a higher proportion of less clustered and isolated V(V) species. The stronger beneficial effect of Ti on the vanadium dispersion is consistent with a higher stability of the X-O-V bridges moving from X = Si to X = Al and Ti. In the second part, new mesoporous silica materials containing vanadium species were synthesized using the molecular stencil patterning technique. Molecular stencil patterning was developed specifically for silica templated with ionic surfactants, which serve as masking agents so that different functions can be sequentially immobilized by covalent bonding (grafting). This molecular surface engineering was shown to improve the dispersion of vanadium species according to thermogravimetric analysis (TGA), nuclear magnetic resonance (NMR) spectroscopy, infrared (IR) spectroscopy and UV-visible spectroscopy. As in the first part, the incorporated titanium species again served to immobilize the vanadium species. 
The V/Ti ratio should be less than 1 to limit the formation of clusters of vanadium species. Lastly, the vanadium-containing materials were applied to the liquid-phase oxidation of cyclohexane into cyclohexanol (A) and cyclohexanone (K). A mixture of these two products is often called K/A oil in industrial chemical production. K/A oil is widely used as a raw material for adipic acid and caprolactam in the nylon industry. The catalysis results showed that adding titanium chemical anchors, combined with the molecular stencil patterning (MSP) technique, improves the catalytic properties of vanadium-containing heterogeneous catalysts. In conclusion, the dispersion and stability of vanadium active sites have been improved in new syntheses of vanadium-containing MCM-41 type silica by combining anchoring heteroatoms with the molecular stencil patterning technique. This novel design leads to better catalytic performance in oxidation reactions, in correlation with the structural and physical characteristics of the material. Vanadium Mesoporous silica MCM-41 UV-visible spectroscopy Molecular stencil patterning technique Leaching Oxidation of cyclohexane
38	Etude de matrices de filtres Fabry Pérot accordables en technologie MOEMS intégré 3D : Application à l'imagerie multispectrale Bertin, Hervé 23 July 2013 Multispectral imaging improves target detection and recognition in surveillance applications. It consists of analyzing images of the same scene acquired simultaneously in several spectral bands by means of filtering. This thesis investigates the feasibility of building an array of four 3D-integrated Fabry-Perot (FP) filters that are tunable by electrostatic actuation in the visible to near-infrared range. The fixed mirrors of the FP filters are ZnS/YF₃ multilayers deposited on a borosilicate wafer, and the movable mirrors are PECVD SiNH/SiOH multilayer membranes clamped to a very compact movable structure micromachined in a silicon wafer. The optical performance of the FP filters was optimized by taking into account the asymmetry and the reflection phase shift of the mirrors. The movable structure was modeled by finite element analysis to minimize its deformation during actuation. The critical steps of the movable-mirror fabrication processes in Si or SOI technology were developed: i) the fabrication and release, by DRIE and XeF₂ deep etching, of multilayer membranes with a residual stress adjusted by annealing and a reflectance close to 50% over a wide spectral range; ii) control of the DRIE etch rates using temporary patterns, allowing simultaneous etching of patterns of varying width and depth; and iii) patterning on highly structured surfaces using mechanically aligned stencils or photosensitive dry films. This work paves the way toward the complete realization of an array of 3D-integrated FP filters. 
MOEMS Multispectral imaging 3D integration Membrane Dielectric mirror Residual stress Flatness DRIE XeF₂ Stencil mask Wafer alignment Dry films
39	An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations Pananilath, Irshad Muhammed January 2014 The Lattice-Boltzmann method (LBM), a promising new particle-based simulation technique for complex and multiscale fluid flows, has seen tremendous adoption in recent years in computational fluid dynamics. Even with a state-of-the-art LBM solver such as Palabos, a user still has to write the program manually using the library-supplied primitives. We propose an automated code generator for a class of LBM computations with the objective of achieving high performance on modern architectures. Tiling is a very important loop transformation used to improve the performance of stencil computations by exploiting locality and parallelism. In the first part of the work, we explore diamond tiling, a new tiling technique that exploits the inherent ability of most stencils to allow tile-wise concurrent start. This enables perfect load balance during execution and reduces the frequency of synchronization required. Few studies have looked at time tiling for LBM codes. We exploit a key similarity between stencils and LBM to enable polyhedral optimizations and, in turn, time tiling for LBM. Besides polyhedral transformations, we also describe a number of other complementary transformations and post-processing steps necessary to obtain good parallel and SIMD performance on modern architectures. We also characterize the performance of LBM with the Roofline performance model. Experimental results for standard LBM simulations such as Lid Driven Cavity, Flow Past Cylinder, and Poiseuille Flow show that our scheme consistently outperforms Palabos, on average by 3x, while running on 16 cores of an Intel Xeon Sandy Bridge system. We also obtain a very significant improvement of 2.47x over the native production compiler on the SPEC LBM benchmark. 
Lattice-Boltzmann Computations Computational Fluid Dynamics Tiling Stencil Computations Single Instruction Multiple Data (SIMD) Parallel Computers Parallel Processing Loop Transformations Lattice-Boltzmann Method (LBM) Lattice Boltzmann Method Lattice-Boltzmann Equation Computer Science
40	Code Optimization on GPUs Hong, Changwan 30 October 2019 No description available. Computer Science GPU performance modeling optimization SpMV SpMM SDDMM sparse matrix graph processing tiling multicore manycore matrix multiplication tensor stencil SIMD data locality CSR parallel load balance shared memory graph analytics
