111

An LLVM Back-end for REPLICA : Code Generation for a Multi-core VLIW Processor with Chaining / Ett LLVM Back-end för REPLICA : Kodgenerering för en Flerkärnig VLIW-processor med Kedjade Instruktioner

Åkesson, Daniel January 2012
REPLICA is a PRAM-NUMA hybrid architecture with support for instruction-level parallelism as a VLIW architecture. REPLICA can also chain instructions, so that the output of an earlier instruction can be used as input to a later instruction in the same execution step. The REPLICA project plans to develop a new C-based programming language, compilers, and libraries to speed up the development of parallel programs. As part of the REPLICA project, we have developed an LLVM back-end that can be used to generate code for the REPLICA architecture. We have also created a simple optimization algorithm to make better use of REPLICA's support for instruction-level parallelism. Some changes to Clang, LLVM's front-end for C/C++/Objective-C, were also necessary so that we could use assembler inlining in our REPLICA programs. Clang compiles C code to LLVM's internal representation, and LLVM with our REPLICA back-end transforms that representation into MBTAC assembler.
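The chaining mechanism this abstract describes can be illustrated with a small sketch. Everything below is a hypothetical illustration of the idea, not the actual MBTAC instruction set: the slot format, the `CHAIN` pseudo-register, and the opcode names are invented for this example.

```python
def run_step(bundle, regs):
    """Execute one VLIW execution step. Each slot is (dest, op, src1, src2).
    A source may name the pseudo-register 'CHAIN', which holds the result
    of the previous slot in the *same* step -- the chained value."""
    chain = None
    for dest, op, a, b in bundle:
        x = chain if a == "CHAIN" else regs[a]
        y = chain if b == "CHAIN" else regs[b]
        if op == "add":
            chain = x + y
        elif op == "mul":
            chain = x * y
        else:
            raise ValueError("unknown opcode: " + op)
        regs[dest] = chain
    return regs

# (r1 + r2) * r3 computed in a single execution step via chaining
regs = {"r1": 2, "r2": 3, "r3": 4}
regs = run_step([("t", "add", "r1", "r2"),
                 ("r4", "mul", "CHAIN", "r3")], regs)
```

Without chaining, the multiply would have to wait for the add to retire in an earlier step; with chaining, both fit in one step.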
112

Webové prostředí pro výuku paralelního programování / Web Platform for Parallel Programming Tutorials

Buzek, Emanuel January 2017
This thesis presents a novel approach to introducing programmers to parallel and distributed computing. The main objective of this work is to develop an online coding environment containing tutorials in the form of simple parallel programming tasks. The online application simulates and visualizes multiple agents that cooperate on a task in a virtual environment. These agents are programmed in a custom procedural language similar to JavaScript, and a significant part of this thesis focuses on the design of that language. The client-side compiler is built using tools similar to Bison and Flex. The parallel simulator supports different scheduling algorithms, including a lock-step mode simulating computation on a GPU. An important aspect of the platform is extensibility; therefore, the tutorials and the packages for the programming language can be added as plug-ins. The final part of this thesis is dedicated to the implementation of sample packages and tutorials, which demonstrate that the key goals of this thesis have been accomplished.
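The lock-step scheduling mode mentioned above can be sketched in a few lines. This is a hypothetical simplification of such a simulator, not the thesis's implementation: each round, every agent runs one step against a snapshot of the shared state, and all writes are applied together afterwards, mimicking SIMT execution on a GPU.

```python
def lockstep_run(agents, steps):
    """Run all agents in lock-step: per round, each agent executes exactly
    one step against an identical snapshot of shared state, and the
    buffered writes are applied together at the end of the round."""
    state = {"counter": 0}
    for _ in range(steps):
        snapshot = dict(state)                     # all agents see the same state
        writes = [agent(snapshot) for agent in agents]
        for w in writes:                           # apply buffered writes together
            state["counter"] += w.get("counter", 0)
    return state

# two identical agents that each increment the shared counter once per round
agents = [lambda s: {"counter": 1}, lambda s: {"counter": 1}]
final = lockstep_run(agents, steps=3)
```

Buffering writes until the end of the round is what makes the execution deterministic regardless of agent order, which is exactly the property a tutorial environment wants.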
113

An Estelle-C compiler for automatic protocol implementation

Chan, Robin Isaac Man-Hang January 1987
Over the past few years, much experience has been gained in semi-automatic protocol implementation using an existing Estelle-C compiler developed at the University of British Columbia. However, with the continual evolution of the Estelle language, that compiler is now obsolete. The present study found syntactic and semantic differences between the Estelle language as implemented by the existing compiler and that specified in the latest ISO document substantial enough to warrant the construction of a new Estelle-C compiler. The result is a new compiler which translates Estelle, as defined in the second version of ISO Draft Proposal 9074, into the programming language C. The new Estelle-C compiler addresses issues such as dynamic reconfiguration of modules and maintenance of priority relationships among nested modules. A run-time environment capable of supporting the new Estelle features is also presented. The implementation strategy used in the new Estelle-C compiler is illustrated using the alternating bit protocol found in the ISO Draft Proposal 9074 document. / Faculty of Science, Department of Computer Science
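The alternating bit protocol used as the illustration above is small enough to sketch directly. The following is a deliberately toy model, not the Estelle specification or the compiler's generated C: the lossy channel drops the first copy of every frame, the sender retransmits on a (modelled) timeout, and both sides flip their sequence bit after a successful delivery.

```python
def alternating_bit_transfer(messages, drop_first_of_each=True):
    """Toy alternating-bit protocol over a lossy channel. The sender
    retransmits each frame until the receiver accepts it under the
    current sequence bit; both sides then flip the bit."""
    received, bit = [], 0
    for msg in messages:
        dropped_once = False
        while True:
            # channel model: lose the first copy of every frame
            if drop_first_of_each and not dropped_once:
                dropped_once = True
                continue                 # frame lost; sender times out and resends
            # receiver: frame carries the expected bit, so accept and ack
            received.append((bit, msg))
            break
        bit ^= 1                         # alternate the sequence bit
    return received

out = alternating_bit_transfer(["a", "b"])
```

Even in this stripped-down form, the sketch shows the protocol's essential invariant: every message is eventually delivered exactly once, tagged with an alternating 0/1 sequence bit.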
114

Code optimisation using discrete optimisation techniques.

Dopler, Tristan Didier 29 May 2008
The topic of this dissertation is the optimisation of computer programs, as they are being compiled, using discrete optimisation techniques. The techniques introduced aim to optimise the runtime performance of programs executing on certain types of processors. A very important theme of this dissertation is the movement of complexity from the processor to the compiler, so both computer architecture and compilers are important supporting topics. The data output of the compiler is processed, using information about the processor, to produce execution information, which is the goal of this dissertation. Concepts related to instruction-level parallelism are covered in two parts. The first part discusses implicit parallelism, where parallel instruction scheduling is performed by the processor. The second part discusses explicit parallelism, where the compiler schedules the instructions. Explicit parallelism is attractive because it allows processor design to be simplified, resulting in multiple benefits. Scheduling the instructions to execute while adhering to resource limitations is the focus of the rest of the dissertation. In order to find optimal schedules, the problem is modelled as a mathematical program. Expressing instructions, instruction dependencies and resource limitations as a mathematical program is discussed in detail, and several algorithms are introduced. Several aspects prevent the mathematical programs from being solved in their initial state, so additional techniques are introduced. A heuristic algorithm is introduced for scheduling instructions in a resource-limited environment; its primary use is to reduce the computational complexity of the problem, but it can also generate good schedules on its own. Finally, a practical implementation of a compiler that applies the introduced techniques is described, along with experimental results generated from a series of test programs, illustrating the complete process and the computational complexity of the algorithms employed. / Advisor: Prof. T.H.C. Smith
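A heuristic of the kind described, scheduling instructions greedily under a resource limit, can be sketched as classic list scheduling. This is a generic illustration of the technique, not the dissertation's algorithm: each cycle, it issues up to a fixed number of instructions whose dependencies have all completed.

```python
def list_schedule(deps, latency, slots_per_cycle):
    """Greedy list scheduling. `deps` maps each instruction to the set of
    instructions it depends on; `latency` gives each instruction's
    latency in cycles. Returns the issue cycle of every instruction."""
    done_at, cycle, remaining = {}, 0, set(deps)
    while remaining:
        # an instruction is ready once all its predecessors have finished
        ready = [i for i in remaining
                 if all(d in done_at and done_at[d] + latency[d] <= cycle
                        for d in deps[i])]
        for i in sorted(ready)[:slots_per_cycle]:   # resource limit per cycle
            done_at[i] = cycle
            remaining.remove(i)
        cycle += 1
    return done_at

# a -> c and b -> c, two issue slots per cycle, unit latencies:
sched = list_schedule({"a": set(), "b": set(), "c": {"a", "b"}},
                      {"a": 1, "b": 1, "c": 1}, slots_per_cycle=2)
```

A heuristic like this runs in polynomial time, whereas the exact mathematical-programming formulation is NP-hard in general, which is why the dissertation uses the heuristic to prune the search.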
115

Uma coleção de estudos de caso sobre o uso da linguagem Halide de domínio-específico em processamento de imagens e arquiteturas paralelas / A collection of case studies of using the Halide domain-specific language for image processing tasks in parallel architectures

Oliveira, Victor Matheus de Araujo, 1988- 24 August 2018
Advisor: Roberto de Alencar Lotufo / Dissertação (mestrado) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação, 2013 / Abstract: A recent development in the field of Domain-Specific Languages (DSLs) is programming languages that can target both multi-core CPUs and accelerators like GPUs. We use Halide, a Domain-Specific Language suited for image processing tasks that claims to be a simpler and more efficient (performance-wise) way of expressing imaging algorithms than the traditional alternatives. In order to show both the potential and the limitations of the Halide language, we carry out several case studies with algorithms we believe are representative of key categories in today's image processing, especially in the area of image manipulation and editing. We compare the performance and simplicity of Halide implementations with multi-threaded, vectorized C++ (for multi-core architectures) and OpenCL (for CPUs and GPUs). We show that there are problems in the current implementation of the DSL and that some imaging algorithms cannot be efficiently expressed in the language, which limits its practical application. Nevertheless, in the cases where it is possible, Halide has performance similar to OpenCL and is much simpler to develop for. We find that Halide is appropriate for a large class of image manipulation algorithms and is a step in the right direction toward an easier way to use GPUs in imaging applications.
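Halide's central idea, separating the *algorithm* (what to compute) from the *schedule* (in what order to compute it), can be illustrated in plain Python. This sketch is not Halide code: both functions implement the same 3-point blur, but the second evaluates it tile by tile, as Halide would after a `split`/`tile` scheduling directive, and must produce bit-identical results.

```python
def blur1d(img):
    """The *algorithm*: a 3-point average with clamped borders."""
    n = len(img)
    return [(img[max(i - 1, 0)] + img[i] + img[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]

def blur1d_tiled(img, tile=4):
    """The same algorithm under a different *schedule*: computed tile by
    tile. Only the evaluation order changes; the values may not."""
    n, out = len(img), [0.0] * len(img)
    for start in range(0, n, tile):
        for i in range(start, min(start + tile, n)):
            out[i] = (img[max(i - 1, 0)] + img[i] + img[min(i + 1, n - 1)]) / 3.0
    return out

img = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

In Halide the schedule (tiling, vectorization, parallelization) is a handful of directives applied to an unchanged algorithm, which is exactly the productivity gain the dissertation measures against hand-written OpenCL.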
116

Leave the Features: Take the Cannoli

Catanio, Jonathan Joseph 01 June 2018
Programming languages like Python, JavaScript, and Ruby are becoming increasingly popular due to their dynamic capabilities. These languages are often much easier to learn than statically type-checked languages such as C++ or Rust. Unfortunately, these dynamic languages come at the cost of losing compile-time optimizations. Python is arguably the most popular language among data scientists and researchers in the artificial intelligence and machine learning communities. As this research becomes increasingly popular, and the problems these researchers face become increasingly computationally expensive, questions are being raised about the performance of languages like Python. Language features found in Python, more specifically dynamic typing and run-time modification of object attributes, preclude common static-analysis optimizations that often yield improved performance. This thesis attempts to quantify the cost of dynamic features in Python, namely the run-time modification of objects and scope as well as the dynamic type system. We introduce Cannoli, a Python 3.6.5 compiler that enforces restrictions on the language to enable opportunities for optimization. The Python code is compiled into an intermediate representation, Rust, which is further compiled and optimized by the Rust pipeline. We show that the analyzed features cause a significant reduction in performance, and we quantify the cost of these features for language designers to consider.
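The kind of restriction discussed above can be demonstrated with standard Python itself (this is an analogy, not Cannoli's mechanism): an ordinary object accepts new attributes at any time, so a compiler cannot fix attribute offsets statically, while `__slots__` freezes the attribute set at class creation, much closer to what a restricting compiler can exploit.

```python
class Dynamic:
    """Ordinary Python object: attributes live in a per-instance dict and
    can be added or removed at run time, defeating static layout."""
    def __init__(self):
        self.x = 1

class Restricted:
    """With __slots__ the attribute set is fixed at class creation,
    so attribute access can compile down to a fixed offset."""
    __slots__ = ("x",)
    def __init__(self):
        self.x = 1

d = Dynamic()
d.y = 2                      # legal: the object's layout changes at run time

r = Restricted()
try:
    r.y = 2                  # rejected: the layout is fixed
    mutated = True
except AttributeError:
    mutated = False
```

The performance question the thesis asks is, in effect, how much is paid for every program because the `Dynamic` behaviour must be assumed everywhere.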
117

An Empirical Study of Alias Analysis Techniques

Tran, Andrew T 01 June 2018
As software projects become larger and more complex, software optimization at that scale is only feasible through automated means. One component of software optimization is alias analysis, which attempts to determine which variables in a program refer to the same area in memory, and which is used to relocate instructions to improve performance without interfering with program execution. Several alias analyses have been proposed over the past few decades, with varying degrees of precision and of time and space complexity, but few studies have compared these techniques with one another or measured them against actual program data to confirm their accuracy. Normally this is out of scope for alias analyses, because these processes are static and can rely only upon the input source code. We address these limitations by instrumenting several benchmarks and combining their data with commonly used alias analyses to objectively measure the accuracy of those analyses. We also gather further program statistics to determine which programs are the most suitable for evaluating subsequent alias analysis techniques.
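The static-versus-dynamic comparison described above can be sketched in miniature. Both functions below are hypothetical stand-ins, not the study's tooling: a deliberately crude "static analysis" that flags two variables as possibly aliasing if they share an assignment source, and an "instrumentation" check that records whether they actually refer to the same object in one execution.

```python
def may_alias_static(assignments, p, q):
    """Crude static may-alias sketch over straight-line assignments
    (dst, src): p and q may alias if their source sets intersect."""
    sources = {}
    for dst, src in assignments:
        sources.setdefault(dst, set())
        sources[dst] |= sources.get(src, set()) | {src}
    return bool(sources.get(p, set()) & sources.get(q, set()))

def do_alias_dynamic(env, p, q):
    """Ground truth from instrumentation: do the two variables actually
    refer to the same object in this execution?"""
    return env[p] is env[q]

assigns = [("a", "obj1"), ("b", "obj1"), ("c", "obj2")]
obj1, obj2 = [1], [2]
env = {"a": obj1, "b": obj1, "c": obj2}

pairs = [("a", "b"), ("a", "c"), ("b", "c")]
static_hits = [p for p in pairs if may_alias_static(assigns, *p)]
true_hits = [p for p in pairs if do_alias_dynamic(env, *p)]
```

Comparing `static_hits` against `true_hits` over many runs is, in spirit, the study's accuracy measurement: dynamic data bounds how conservative the static answer really is.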
118

Supported Programming for Beginning Developers

Gilbert, Andrew 01 March 2019
Testing code is important, but writing test cases can be time consuming, particularly for beginning programmers who are already struggling to write an implementation. We present TestBuilder, a system for test case generation which uses an SMT solver to generate inputs to reach specified lines in a function, and asks the user what the expected outputs would be for those inputs. The resulting test cases check the correctness of the output, rather than merely ensuring the code does not crash. Further, by querying the user for expectations, TestBuilder encourages the programmer to think about what their code ought to do, rather than assuming that whatever it does is correct. We demonstrate, using mutation testing of student projects, that tests generated by TestBuilder perform better than merely compiling the code using Python’s built-in compile function, although they underperform the tests students write when required to achieve 100% test coverage.
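The TestBuilder workflow, find an input that reaches a chosen line, then ask the user what the output *should* be, can be sketched without an SMT solver. This is a toy stand-in: where TestBuilder would solve path constraints, the sketch brute-forces small integer inputs; the function names and the candidate range are illustrative only.

```python
def find_input_reaching(f, target_branch, candidates):
    """Search for an input that drives `f` down `target_branch`.
    TestBuilder uses an SMT solver for this; here we brute-force.
    `f` must return (result, branch_taken)."""
    for x in candidates:
        _, branch = f(x)
        if branch == target_branch:
            return x
    return None

def classify(n):
    """Code under test, instrumented to report which branch it took."""
    if n < 0:
        return "negative", "then"
    return "non-negative", "else"

x = find_input_reaching(classify, "then", range(-3, 4))
# TestBuilder would now ask the user: "what should classify(-3) return?"
expected = "negative"        # user-supplied oracle, not derived from the code
```

The key point survives the simplification: because the expected output comes from the user rather than from running the code, the generated test checks correctness instead of merely ratifying current behaviour.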
119

ALTREP Data Representation ve FastR / ALTREP Data Representation in FastR

Marek, Pavel January 2020
R is a programming language and a tool used mostly in the statistics and data-analysis domains, with a rich package-based extension system. GNU-R, the standard interpreter of R, introduced in version 3.5.0 a new native API (ALTREP) for developers of R extensions. The goal of this thesis is to implement this API in FastR, an interpreter of R based on GraalVM and Truffle, and to explore options for optimizing FastR in the context of this API. The motivation is to increase the number of extensions that can be installed and run on FastR.
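One of ALTREP's flagship uses, compact integer sequences such as `1:n` stored as metadata rather than as a materialized vector, can be sketched in a few lines. This Python class is an illustration of the idea only, not the ALTREP C API or FastR's implementation:

```python
class CompactSeq:
    """Sketch of an ALTREP-style compact sequence: 1:n is stored as
    (start, length); elements are computed on demand, and a full buffer
    is allocated only if a consumer genuinely needs one."""
    def __init__(self, start, length):
        self.start, self.length = start, length
        self._materialized = None

    def elt(self, i):
        """O(1) element access with no allocation."""
        return self.start + i

    def materialize(self):
        """Fall back to a real vector only when a consumer demands it."""
        if self._materialized is None:
            self._materialized = [self.start + i for i in range(self.length)]
        return self._materialized

s = CompactSeq(1, 1_000_000)
third = s.elt(2)             # no million-element buffer was ever allocated
```

The interpreter-side benefit is exactly what the thesis targets in FastR: extensions that only read elements through the API never force the compact representation to materialize.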
120

Techniques for Managing Irregular Control Flow on GPUs

Jad Hbeika 25 June 2020
GPGPU is a highly multithreaded throughput architecture that can deliver high speed-ups for regular applications while remaining energy efficient. In recent years there has been much focus on tuning irregular applications and/or the GPU architecture to achieve similar benefits for irregular applications, as well as efforts to extract data parallelism from task-parallel applications. In this work we tackle both problems.

The first part of this work tackles the problem of control divergence on GPUs. The GPGPU SIMT execution model is ineffective for workloads with irregular control flow, because GPGPUs serialize the execution of divergent paths, which leads to a loss of thread-level parallelism (TLP). Previous works focused on creating new warps based on the control path threads follow, creating different warps for the different paths, or running multiple narrower warps in parallel. While all previous solutions showed speedups for irregular workloads, they imposed some performance loss on regular workloads. In this work we propose a more fine-grained approach to exploit intra-warp convergence: rather than threads executing the same code path, opcode-convergent threads execute the same instruction, but with potentially different operands. Based on this new definition, we find that divergent control blocks within a warp exhibit substantial opcode convergence. We build a compiler that analyzes divergent blocks and identifies the common streams of opcodes, and we modify the GPU architecture so that these common instructions are executed as convergent instructions. Using software simulation, we achieve a 17% speedup over a baseline GPGPU for irregular workloads and incur no performance loss on regular workloads.

In the second part we suggest techniques for extracting data parallelism from irregular, task-parallel applications in order to take advantage of the massive parallelism provided by the GPU. Our technique divides each task into multiple sub-tasks, each performing less work and touching a smaller memory footprint. Our framework performs locality-aware scheduling that minimizes the memory footprint of each warp (a set of threads executing in lock-step). We evaluate our framework on three task-parallel benchmarks and show that we can achieve significant speedups over optimized GPU code.
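The compiler analysis described in the first part, finding the common stream of opcodes across two divergent paths, is essentially a longest-common-subsequence problem over opcodes (operands are allowed to differ). The sketch below illustrates that analysis in isolation; the instruction tuples and opcode names are invented examples, not the dissertation's IR.

```python
def common_opcode_stream(path_a, path_b):
    """Longest common subsequence over *opcodes* of two divergent paths.
    Instructions in this stream could issue as convergent within the
    warp; the remainder stays serialized per path."""
    a = [op for op, _ in path_a]
    b = [op for op, _ in path_b]
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            dp[i][j] = (1 + dp[i + 1][j + 1] if a[i] == b[j]
                        else max(dp[i + 1][j], dp[i][j + 1]))
    out, i, j = [], 0, 0          # recover one maximal common opcode stream
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

then_path = [("mul", "r1,r2"), ("add", "r3,r1"), ("st", "r3")]
else_path = [("add", "r4,r5"), ("st", "r4")]
shared = common_opcode_stream(then_path, else_path)
```

Here both paths end with an `add` followed by a `st`, so those two instructions could execute convergently across the whole warp, with only the leading `mul` serialized.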
