About: The Global ETD Search service is a free service for researchers to find electronic theses and dissertations (ETDs). It is provided by the Networked Digital Library of Theses and Dissertations (NDLTD), and its metadata is collected from universities around the world.

Compiling the parallel programming language NestStep to the CELL processor

Holm, Magnus, January 2010
The goal of this project is to create a source-to-source compiler that translates NestStep code into C code. The compiler's job is to replace NestStep constructs with a series of function calls to the NestStep runtime system. NestStep is a parallel programming language extension based on the Bulk Synchronous Parallel (BSP) model. It adds constructs for parallel programming on top of an imperative programming language; for this project, only the constructs extending the C language are relevant. The generated code compiles into an executable program that runs on the Cell Broadband Engine (Cell BE) multicore processor. The NestStep runtime system had already been ported to the Cell BE and was available from the start of this project.
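In the BSP model, a computation proceeds in supersteps. In each superstep every processor computes locally and buffers outgoing messages, and a barrier then makes those messages visible in the next superstep. The Python sketch below simulates that model sequentially to show the superstep and barrier semantics that the generated runtime calls have to implement. It does not use NestStep syntax or the NestStep runtime API, and `run_bsp` and the superstep functions are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

# A superstep receives (pid, local_state, inbox) and returns
# (new_local_state, outgoing messages as (dest_pid, value) pairs).
Superstep = Callable[[int, int, List[int]], Tuple[int, List[Tuple[int, int]]]]


def run_bsp(nprocs: int, init: List[int], steps: List[Superstep]) -> List[int]:
    """Sequentially simulate BSP supersteps (hypothetical helper, not NestStep)."""
    state = list(init)
    inboxes: Dict[int, List[int]] = {p: [] for p in range(nprocs)}
    for step in steps:
        outgoing: Dict[int, List[int]] = {p: [] for p in range(nprocs)}
        # Local computation phase: messages are buffered, not yet delivered.
        for p in range(nprocs):
            state[p], msgs = step(p, state[p], inboxes[p])
            for dest, value in msgs:
                outgoing[dest].append(value)
        # Barrier: buffered messages become visible in the next superstep.
        inboxes = outgoing
    return state


def send_to_root(pid: int, x: int, inbox: List[int]) -> Tuple[int, List[Tuple[int, int]]]:
    return x, [(0, x)]


def reduce_at_root(pid: int, x: int, inbox: List[int]) -> Tuple[int, List[Tuple[int, int]]]:
    return (sum(inbox) if pid == 0 else x), []


print(run_bsp(4, [1, 2, 3, 4], [send_to_root, reduce_at_root]))  # [10, 2, 3, 4]
```

The global sum takes two supersteps because processor 0 cannot see the values sent to it until after the barrier.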

Recompiling DSP applications to x86 using LLVM IR

Stenberg, David, January 2014
This thesis describes the design and implementation of x86-64p, a prototype LLVM compiler backend that compiles code written for a DSP architecture, FADER, into executables for the x86-64 architecture. The prototype takes LLVM IR generated for the FADER architecture and produces x86-64 executables that emulate the properties of the DSP architecture, such as its multiple address spaces, its big-endianness and its support for fixed-point arithmetic. The backend is compared to a previous solution, C-Emu, which converts the DSP code into ordinary C code that is then compiled with a standard x86-64 compiler. The two solutions are compared in terms of correctness, debuggability and performance. The prototype handles code containing low-level architectural assumptions better than C-Emu. However, the added emulation reduces the debuggability and performance of the generated executables: we measured a runtime overhead of up to a factor of two compared to C-Emu. We also present some possible solutions to these issues.
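An emulating backend has to reproduce the DSP's data semantics on a little-endian x86-64 host. The Python sketch below shows two of these conversions. It assumes a signed 16-bit big-endian word and the Q15 fixed-point format, which are illustrative choices: the abstract does not state FADER's word size or fixed-point formats.

```python
import struct


def load_be16(memory: bytes, addr: int) -> int:
    """Read a signed 16-bit big-endian word, independent of host endianness."""
    (value,) = struct.unpack_from(">h", memory, addr)
    return value


def store_be16(memory: bytearray, addr: int, value: int) -> None:
    """Write a signed 16-bit word in big-endian byte order."""
    struct.pack_into(">h", memory, addr, value)


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values with rounding and saturation to [-32768, 32767]."""
    product = (a * b + (1 << 14)) >> 15  # rescale the Q30 product back to Q15
    return max(-32768, min(32767, product))


mem = bytearray(4)
store_be16(mem, 0, 0x1234)
print(mem[0] == 0x12, load_be16(mem, 0) == 0x1234)  # True True
print(q15_mul(16384, 16384))  # 0.5 * 0.5 = 0.25, i.e. 8192
```

Saturation matters in the case -1.0 * -1.0. The exact result, +1.0, cannot be represented in Q15, so it is clamped to 32767.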

Fast and flexible compilation techniques for effective speculative polyhedral parallelization

Martinez Caamaño, Juan Manuel, 29 September 2016
In this thesis, we present our contributions to APOLLO, an automatic parallelization compiler that combines polyhedral optimization with thread-level speculation to optimize dynamic codes on the fly. Using an online profiling phase and a speculative model of the target code's memory behavior, Apollo selects an optimization and generates the corresponding code. While the optimized code executes, Apollo continuously verifies the validity of the speculative model. The main contribution of this thesis is a code generation mechanism that can instantiate any polyhedral transformation at runtime without incurring a major time overhead. We call this mechanism Code-Bones; it is currently used in Apollo and provides significant performance benefits compared to other approaches.
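The following Python sketch illustrates the speculation cycle described above for one simple case. A few profiled iterations of a one-dimensional loop are used to fit an affine model of the accessed addresses, and the remaining iterations check that model, reporting a rollback on the first mismatch. This is a deliberate simplification written for illustration. It is not Apollo's or Code-Bones' implementation, and `fit_affine` and `run_speculatively` are hypothetical names.

```python
from typing import Optional, Sequence, Tuple


def fit_affine(samples: Sequence[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Fit addr = a * i + b from profiled (i, addr) pairs; None if not affine."""
    if len(samples) < 2:
        return None
    (i0, x0), (i1, x1) = samples[0], samples[1]
    if i1 == i0 or (x1 - x0) % (i1 - i0) != 0:
        return None
    a = (x1 - x0) // (i1 - i0)
    b = x0 - a * i0
    # Every profiled sample must agree with the fitted model.
    if all(x == a * i + b for i, x in samples):
        return a, b
    return None


def run_speculatively(addresses: Sequence[int], profile_len: int = 3) -> str:
    """Profile a few iterations, then verify the model on the remaining ones."""
    model = fit_affine(list(enumerate(addresses[:profile_len])))
    if model is None:
        return "no affine model: run original code"
    a, b = model
    for i in range(profile_len, len(addresses)):
        if addresses[i] != a * i + b:
            return f"misspeculation at iteration {i}: roll back"
    return f"speculation valid: addr = {a}*i + {b}"


print(run_speculatively([100, 108, 116, 124, 132]))  # speculation valid: addr = 8*i + 100
print(run_speculatively([100, 108, 116, 124, 999]))  # misspeculation at iteration 4: roll back
```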

SAT Compilation for Constraints over Structured Finite Domains

Bau, Alexander, 22 March 2017
A constraint is a formula in first-order logic expressing a relation between values of various domains. Constructing a propositional encoding is a successfully applied technique for solving constraints, and it benefits from the substantial progress made in the development of modern SAT solvers. However, propositional encodings are generally created by writing a problem-specific generator program or by crafting them manually. This is often a time-consuming and error-prone process, especially for constraints over complex domains. This thesis therefore introduces the constraint solver CO4, which automatically generates propositional encodings for constraints over structured finite domains. The constraints are written in a syntactic subset of the functional programming language Haskell. This subset enables the specification of expressive and concise constraints by supporting user-defined algebraic data types, pattern matching and polymorphic types, as well as higher-order and recursive functions. CO4 transforms a constraint written in this high-level language into a propositional formula. After an external SAT solver has found a satisfying assignment for the variables in the generated formula, a solution in the domain of discourse is derived from it. This approach is applicable even to finite restrictions of recursively defined algebraic data types. The thesis describes all aspects of CO4 in detail: the language used for specifying constraints, the solving process and its correctness, and exemplary applications of CO4.
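The encode, solve and decode cycle described above can be sketched in Python for a very small case. A finite domain (a three-value enumeration, as a Haskell `data Color = Red | Green | Blue` would define) is encoded with one-hot Boolean variables. A graph-coloring constraint becomes CNF clauses, and the satisfying assignment is decoded back into domain values. The sketch makes two assumptions. One-hot encoding is a standard textbook technique and is not necessarily CO4's encoding. The brute-force search stands in for the external SAT solver and is only feasible for tiny formulas.

```python
from itertools import product
from typing import Dict, List, Optional

Clause = List[int]  # DIMACS-style literals: +v means v is true, -v means v is false


def exactly_one(vs: List[int]) -> List[Clause]:
    """One-hot constraint: at least one of vs, and pairwise at most one."""
    clauses: List[Clause] = [list(vs)]
    for i in range(len(vs)):
        for j in range(i + 1, len(vs)):
            clauses.append([-vs[i], -vs[j]])
    return clauses


def brute_force_sat(nvars: int, cnf: List[Clause]) -> Optional[Dict[int, bool]]:
    """Stand-in for an external SAT solver; tries every assignment."""
    for bits in product([False, True], repeat=nvars):
        assign = {v + 1: bits[v] for v in range(nvars)}
        if all(any(assign[abs(lit)] == (lit > 0) for lit in c) for c in cnf):
            return assign
    return None


COLORS = ["Red", "Green", "Blue"]  # the finite domain
EDGES = [(0, 1), (1, 2), (0, 2)]   # constraint: adjacent nodes get different colors
NODES = 3


def var(node: int, color: int) -> int:
    """Propositional variable meaning 'node has color'."""
    return node * len(COLORS) + color + 1


cnf: List[Clause] = []
for n in range(NODES):
    cnf += exactly_one([var(n, c) for c in range(len(COLORS))])
for u, w in EDGES:
    for c in range(len(COLORS)):
        cnf.append([-var(u, c), -var(w, c)])

model = brute_force_sat(NODES * len(COLORS), cnf)
# Decode the satisfying assignment back into the domain of discourse.
solution = None
if model is not None:
    solution = [next(COLORS[c] for c in range(len(COLORS)) if model[var(n, c)])
                for n in range(NODES)]
print(solution)
```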

Design and Implementation of a Compiler for an XML-based Hardware Description Language to Support Energy Optimization

Yang, Ming-Jie, January 2017
GPU-based heterogeneous system architectures are popular because they combine the strengths of CPUs and GPUs. Developing high-performance, power-efficient software for such architectures requires taking both hardware and software specifications into account, which makes the development process more complicated. Architecture Description Languages (ADLs) were introduced to simplify this process. By modeling the components of a target architecture in a structured format, they let programmers adapt their software to the platforms they use. XPDL is a modular and extensible XML-based platform description language designed mainly to support optimization. The purposes of this thesis are to design a query API (Application Programming Interface) and to develop a compiler that translates XPDL descriptors into libraries implementing this API. Together they are intended to support programmers in developing adaptive, high-performance and energy-optimized software. The compiler's workflow is as follows. First, the toolchain validates the XPDL descriptors against XML Schema Definitions (XSDs). Second, it parses the descriptors into DOM trees and transforms them into XPDL model trees. Next, the compiler links all XPDL model trees together, producing the intermediate representation (IR). Then, any unspecified node values, that is, unknown attributes, are handled by a microbenchmark generator and executor. Finally, the code generator produces libraries that expose the API according to the information in the IR. A few code examples are discussed to show how the API can be used to develop performance-adaptive applications on heterogeneous systems.
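Three stages of the workflow above can be sketched in Python with the standard `xml.etree.ElementTree` module: parsing a descriptor into a model, filling unspecified values through a microbenchmark, and exposing a query API over the result. Several parts are assumptions made for illustration. The element and attribute names and the `"?"` marker for unknown values are invented, not actual XPDL syntax. The microbenchmark is stubbed out. XSD validation and model-tree linking are omitted, because the standard library does not validate against XSDs.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

Model = Dict[str, Dict[str, str]]

# Hypothetical XPDL-like descriptor: element and attribute names are invented,
# and "?" marks a value the descriptor leaves unspecified.
DESCRIPTOR = """
<system name="node0">
  <cpu name="cpu0" cores="8" frequency_mhz="2400"/>
  <gpu name="gpu0" sm_count="?" memory_mb="8192"/>
</system>
"""


def parse_model(xml_text: str) -> Model:
    """Parse a descriptor into a flat model: component name -> attributes."""
    root = ET.fromstring(xml_text.strip())
    return {child.get("name", child.tag): dict(child.attrib) for child in root}


def fill_unknowns(model: Model, benchmark: Callable[[str, str], str]) -> Model:
    """Replace unspecified ('?') values by running a (stubbed) microbenchmark."""
    for component, attrs in model.items():
        for key, value in attrs.items():
            if value == "?":
                attrs[key] = benchmark(component, key)
    return model


def generate_query_api(model: Model) -> Callable[[str, str], str]:
    """Expose a minimal query function over the completed model."""
    def query(component: str, attribute: str) -> str:
        return model[component][attribute]
    return query


def fake_benchmark(component: str, attribute: str) -> str:
    # Hypothetical stand-in for generating and executing a real microbenchmark.
    return "40"


query = generate_query_api(fill_unknowns(parse_model(DESCRIPTOR), fake_benchmark))
print(query("gpu0", "sm_count"), query("cpu0", "cores"))  # 40 8
```

The thesis generates libraries rather than an in-process closure, so `generate_query_api` only stands in for that code-generation step.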

A high-level parallel development framework for graphic accelerators: an implementation based on OPENMP

Noaje, Gabriel, 07 March 2013
Graphics cards (GPUs), initially used for graphics processing, have a highly parallel architecture. Innovations in both hardware architecture and programming languages opened up the new domain of GPGPU, where GPUs are used as accelerators for general-purpose HPC applications. Our main objective is to facilitate the use of these new architectures for high-performance computing needs, and our research follows two main directions. The first direction concerns automatic code transformation from high-level code into equivalent low-level code capable of running on accelerators. To this end, we implemented a code transformer that handles parallel “for” loops (single or nested) in OpenMP code and converts them into equivalent CUDA code. The output is in a human-readable form that allows further optimization. Moreover, the future of HPC lies in distributed architectures based on hybrid nodes, and specific programming schemes are needed to let users benefit from such multi-GPU nodes. We conducted a comparative study, which revealed that OpenMP threads are the most adequate way to control multiple graphics cards and to manage communications efficiently within a multi-GPU node.
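The core of such an OpenMP-to-CUDA transformation is mapping each loop iteration to one GPU thread and guarding against threads past the loop bound. The toy Python transformer below shows that mapping for the simplest case only: one non-nested `#pragma omp parallel for` loop of the form `for (int i = 0; i < n; i++)` with a single-statement body. It is an illustrative sketch, not the thesis's transformer. The regex-based matching, the caller-supplied parameter list and the block size of 256 are all assumptions. A real tool would also infer kernel parameters and generate host-side memory transfers.

```python
import re

# Matches only the simplest loop shape:
#   #pragma omp parallel for
#   for (int i = 0; i < n; i++)
#       <single statement>;
LOOP = re.compile(
    r"#pragma omp parallel for\s*\n"
    r"\s*for\s*\(\s*int\s+(\w+)\s*=\s*0\s*;\s*\1\s*<\s*(\w+)\s*;\s*\1\+\+\s*\)\s*\n"
    r"\s*(.+;)"
)


def omp_to_cuda(src: str, name: str, params: str, block: int = 256) -> str:
    """Rewrite one OpenMP parallel-for into a CUDA kernel plus its launch."""
    m = LOOP.search(src)
    if m is None:
        raise ValueError("unsupported loop form")
    idx, bound, body = m.groups()
    # Argument names for the launch: last token of each parameter, without '*'.
    args = ", ".join(p.split()[-1].lstrip("*") for p in params.split(","))
    return (
        f"__global__ void {name}({params}) {{\n"
        f"    int {idx} = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per iteration\n"
        f"    if ({idx} < {bound}) {{  // guard the last, partially filled block\n"
        f"        {body}\n"
        f"    }}\n"
        f"}}\n"
        f"// launch: {name}<<<({bound} + {block - 1}) / {block}, {block}>>>({args});\n"
    )


SRC = """#pragma omp parallel for
for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
"""
print(omp_to_cuda(SRC, "vec_add", "const float *a, const float *b, float *c, int n"))
```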

Software engineering abstractions for a numerical linear algebra library

Song, Zixu, January 2012
This thesis aims at building a numerical linear algebra library with appropriate software engineering abstractions. Three areas of knowledge are involved: Numerical Linear Algebra (NLA), Software Engineering and Compiler Optimisation Techniques. Numerical simulation is widely used in many distinct disciplines to help scientists understand and discover the world. Solutions to frequently occurring numerical problems have been implemented as subroutines, which were then grouped into libraries for ease of use. Designing, implementing and maintaining an NLA library requires a great deal of work, which is why the other two areas, software engineering and compiler optimisation techniques, become relevant. Generally speaking, both try to divide a system into smaller, manageable concerns, allowing the programmer to deal with fewer concerns at a time. Band matrix operations are proposed as a new level of abstraction that simplifies the library implementation and enhances extensibility for future functionality upgrades. Iteration Space Partitioning (ISP) is applied to make the performance of this generalised band-matrix implementation comparable to that of the specialised implementations for dense and triangular matrices. The ISP optimisation can either be programmed using the pointcut-advice model of Aspect-Oriented Programming or be integrated into a compiler. This naturally leads to a comparison of these two techniques for solving one fundamental problem. The thesis shows that the software engineering properties of a library, such as modularity and extensibility, can be improved by using the appropriate level of abstraction, while performance is either not sacrificed at all or the performance loss is limited. In other words, the perceived trade-off between high-level abstraction and fast execution is less significant than previously assumed.
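The band-matrix abstraction can be illustrated with a matrix-vector product in Python. Here `kl` is the number of sub-diagonals and `ku` the number of super-diagonals. With `kl = ku = n - 1` the same routine covers a dense matrix, and with `ku = 0` it covers a lower-triangular one. The sketch assumes BLAS/LAPACK-style band storage (`A[i][j]` kept at `ab[ku + i - j][j]`), which the abstract does not specify. The `max`/`min` loop bounds are what iteration space partitioning would split into separate, condition-free loop nests. The sketch keeps them inline and does not perform ISP itself.

```python
from typing import List


def to_band(a: List[List[float]], kl: int, ku: int) -> List[List[float]]:
    """Pack a square matrix into band storage: A[i][j] -> ab[ku + i - j][j]."""
    n = len(a)
    ab = [[0.0] * n for _ in range(kl + ku + 1)]
    for i in range(n):
        for j in range(max(0, i - kl), min(n - 1, i + ku) + 1):
            ab[ku + i - j][j] = a[i][j]
    return ab


def band_matvec(ab: List[List[float]], n: int, kl: int, ku: int,
                x: List[float]) -> List[float]:
    """y = A x for an n-by-n band matrix with kl sub- and ku super-diagonals."""
    y = [0.0] * n
    for i in range(n):
        # Only columns inside the band are visited; zeros outside it are skipped.
        for j in range(max(0, i - kl), min(n - 1, i + ku) + 1):
            y[i] += ab[ku + i - j][j] * x[j]
    return y


tri = [[1, 2, 0], [3, 4, 5], [0, 6, 7]]     # tridiagonal: kl = ku = 1
low = [[1, 0, 0], [2, 3, 0], [4, 5, 6]]     # lower triangular: kl = 2, ku = 0
print(band_matvec(to_band(tri, 1, 1), 3, 1, 1, [1, 1, 1]))  # [3.0, 12.0, 13.0]
print(band_matvec(to_band(low, 2, 0), 3, 2, 0, [1, 1, 1]))  # [1.0, 5.0, 15.0]
```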

Improving Stability and Parameter Selection of Data Processing Programs

Lee, Wen-Chuan (8206287), 07 January 2020
Data-processing programs are becoming increasingly important in the big-data era. However, two notable problems with these programs may lead to sub-optimal data-processing results. On the one hand, these programs contain a large number of floating-point computations. Because floating-point representations have limited precision, errors are introduced, propagated and accumulated over a series of computations, making the results unreliable. We call this problem floating-point instability. On the other hand, these programs are heavily parameterized. Since no universal optimal parameter configuration exists for all possible inputs, program parameters must be carefully chosen and tuned for each input; otherwise, the result is sub-optimal. Manual tuning is infeasible because the number of parameters and the range of each parameter's values can be large.

We address these two challenges in this dissertation. For the floating-point instability problem, we develop a novel runtime technique that captures the different output variations arising in the presence of instability. It transforms every floating-point value into a vector of multiple values. The additional values in the vector are obtained by introducing artificial errors that are upper bounds of the actual errors, and the propagation of the artificial errors models the propagation of the actual errors. When the values in a vector lead to discrete execution differences (e.g., following different paths), the execution is forked to capture the resulting output variations.

For parameterized data-processing programs, we develop a white-box program tuning framework that tunes the program's parameter configuration to obtain the optimal data-processing result for each program input.

To further reduce the parameter configuration overhead, we propose the first general framework for injecting artificial intelligence (AI) into a program, so that the intelligent program can directly predict the parameter configuration for each incoming input. However, as in many other ML/AI applications, the crucial challenge lies in feature selection, i.e., selecting the feature variables used to predict the target parameters specified by the users. We therefore propose a novel approach that combines program analysis and statistical analysis to select better program feature variables. Better features lead to better target-parameter prediction and, in turn, better results.
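The value-vector idea for floating-point instability can be sketched in Python (3.9 or later, for `math.nextafter`). Each value is carried as three variants: exact, nudged up and nudged down. After every operation the perturbed variants are moved to the adjacent float. This one-ulp step exceeds the at most half-ulp rounding error of a single round-to-nearest operation. A comparison whose outcome differs across variants marks a point where the execution would fork. The class name `MultiFloat`, the three-variant layout and the one-ulp perturbation are illustrative assumptions, not the dissertation's implementation. The parameter-tuning part is not covered.

```python
from __future__ import annotations

import math
from typing import Callable, List, Set


class MultiFloat:
    """A float carried as three variants: exact, perturbed up, perturbed down."""

    def __init__(self, values: List[float]):
        self.values = values

    @staticmethod
    def of(x: float) -> MultiFloat:
        return MultiFloat([x, math.nextafter(x, math.inf), math.nextafter(x, -math.inf)])

    def _apply(self, other: MultiFloat | float,
               op: Callable[[float, float], float]) -> MultiFloat:
        rhs = other.values if isinstance(other, MultiFloat) else [other] * 3
        r = [op(a, b) for a, b in zip(self.values, rhs)]
        # Inject artificial errors after each operation so they propagate
        # through later computations like real rounding errors.
        return MultiFloat([r[0], math.nextafter(r[1], math.inf),
                           math.nextafter(r[2], -math.inf)])

    def __add__(self, other: MultiFloat | float) -> MultiFloat:
        return self._apply(other, lambda a, b: a + b)

    def __sub__(self, other: MultiFloat | float) -> MultiFloat:
        return self._apply(other, lambda a, b: a - b)

    def __mul__(self, other: MultiFloat | float) -> MultiFloat:
        return self._apply(other, lambda a, b: a * b)

    def less_than(self, bound: float) -> Set[bool]:
        """Branch outcomes over all variants; two outcomes mean the run would fork."""
        return {v < bound for v in self.values}


print((MultiFloat.of(0.1) * 3.0).less_than(0.3))          # unstable branch: both outcomes
print((MultiFloat.of(2.0) * 2.0 + 1.0).less_than(100.0))  # stable branch: {True}
```

In the first case the exact product `0.1 * 3.0` rounds to slightly above `0.3`, while the down-perturbed variant falls below it. The branch is therefore flagged as unstable.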

Architecture Information for LLVM Compiler Optimizations

Svoboda, Jan, January 2020
This thesis deals with the automatic extraction of processor architecture information from the CodAL language. The extracted information serves as the basis for a cost model used by the optimizer of the LLVM compiler. As part of the work, a new system was created that builds the cost model, converts it into C++ code and compiles it into a dynamic library. The compiler loads this library at runtime and uses it to make more accurate decisions about the benefits of individual optimizations. The result is an average 14% reduction in the machine-code size of programs and up to a 68% improvement in the performance of the generated code.
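The step of converting the cost model into C++ code can be illustrated with a small Python code generator. It emits a C++ lookup function from a table of per-opcode costs. Several parts of this sketch are assumptions. The abstract does not show CodAL syntax or the shape of the cost model, so the sketch starts from an already extracted `opcode -> cost` table. The function name `instr_cost`, its `extern "C"` signature and the default cost are hypothetical, and how the thesis hooks the library into LLVM is not described here. The emitted file could then be built into a shared library, for example with `c++ -shared -fPIC cost_model.cpp -o libcostmodel.so` on Linux.

```python
from typing import Dict


def emit_cost_model(costs: Dict[str, int], default: int = 1) -> str:
    """Emit C++ source for a cost-lookup function from per-opcode costs."""
    lines = [
        "#include <cstring>",
        "",
        "// Generated: per-opcode costs extracted from an architecture model.",
        'extern "C" unsigned instr_cost(const char *opcode) {',
    ]
    # Sorting keeps the generated file deterministic across runs.
    for opcode, cost in sorted(costs.items()):
        lines.append(f'    if (std::strcmp(opcode, "{opcode}") == 0) return {cost}u;')
    lines.append(f"    return {default}u;  // opcode not described in the model")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical costs, e.g. cycle counts taken from a processor description.
print(emit_cost_model({"mul": 3, "add": 1, "div": 12}))
```

A linear chain of `strcmp` calls keeps the generated code trivially correct. A real generator would more likely switch on LLVM opcode enums than compare strings.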

Development board with 32-bit ARM-based processor

Jůn, Lukáš, January 2009
This thesis gives a detailed description of 32-bit ARM-based processors. The reader is introduced to each of the ARM processor families and to the options for creating applications for these CPUs, which are commonly developed in C/C++. The text also deals with development environments, tools that make developing new applications easier. Finally, the thesis contains the complete design and description of a development board with an Atmel AT91SAM7S64 MCU, including sample source code.
