• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 4
  • 1
  • Tagged with
  • 5
  • 5
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Compiling the parallel programming language NestStep to the CELL processor

Holm, Magnus January 2010 (has links)
<p>The goal of this project is to create a source-to-source compiler which will translate NestStep code to C code. The compiler's job is to replace NestStep constructs with a series of function calls to the NestStep runtime system. NestStep is a parallel programming language extension based on the BSP model. It adds constructs for parallel programming on top of an imperative programming language. For this project, only constructs extending the C language are relevant. The output code will compile to form an executable program that runs on the multicore processor Cell Broadband Engine (Cell BE). The NestStep runtime system has been ported to the Cell BE and is available from start of this project.</p>
2

Automated Recognition of Algorithmic Patterns in DSP Programs

Shafiee Sarvestani, Amin January 2011 (has links)
We introduce an extensible knowledge based tool for idiom (pattern) recognition in DSP(digital signal processing) programs. Our tool utilizesfunctionality provided by the Cetus compiler infrastructure fordetecting certain computation patterns that frequently occurin DSP code. We focus on recognizing patterns for for-loops andstatements in their bodies as these often are the performance criticalconstructs in DSP applications for which replacementby highly optimized, target-specific parallel algorithms will bemost profitable. For better structuring and efficiency of patternrecognition, we classify patterns by different levels of complexitysuch that patterns in higher levels are defined in terms of lowerlevel patterns.The tool works statically on the intermediate representation(IR). It traverses the abstract syntax tree IR in post-orderand applies bottom-up pattern matching, at each IR nodeutilizing information about the patterns already matched for itschildren or siblings.For better extensibility and abstraction,most of the structuralpart of recognition rules is specified in XML form to separatethe tool implementation from the pattern specifications.Information about detected patterns will later be used foroptimized code generation by local algorithm replacement e.g. for thelow-power high-throughput multicore DSP architecture ePUMA.
3

Compiling the parallel programming language NestStep to the CELL processor

Holm, Magnus January 2010 (has links)
The goal of this project is to create a source-to-source compiler which will translate NestStep code to C code. The compiler's job is to replace NestStep constructs with a series of function calls to the NestStep runtime system. NestStep is a parallel programming language extension based on the BSP model. It adds constructs for parallel programming on top of an imperative programming language. For this project, only constructs extending the C language are relevant. The output code will compile to form an executable program that runs on the multicore processor Cell Broadband Engine (Cell BE). The NestStep runtime system has been ported to the Cell BE and is available from start of this project.
4

Improving performance of sequential code through automatic parallelization / Prestandaförbättring av sekventiell kod genom automatisk parallellisering

Sundlöf, Claudius January 2018 (has links)
Automatic parallelization is the conversion of sequential code into multi-threaded code with little or no supervision. An ideal implementation of automatic parallelization would allow programmers to fully utilize available hardware resources to deliver optimal performance when writing code. Automatic parallelization has been studied for a long time, with one result being that modern compilers support vectorization without any input. In the study, contemporary parallelizing compilers are studied in order to determine whether or not they can easily be used in modern software development, and how code generated by them compares to manually parallelized code. Five compilers, ICC, Cetus, autoPar, PLUTO, and TC Optimizing Compiler are included in the study. Benchmarks are used to measure speedup of parallelized code, these benchmarks are executed on three different sets of hardware. The NAS Parallel Benchmarks (NPB) suite is used for ICC, Cetus, and autoPar, and PolyBench for the previously mentioned compilers in addition to PLUTO and TC Optimizing Compiler. Results show that parallelizing compilers outperform serial code in most cases, with certain coding styles hindering the capability of them to parallelize code. In the NPB suite, manually parallelized code is outperformed by Cetus and ICC for one benchmark. In the PolyBench suite, PLUTO outperforms the other compilers to a great extent, producing code not only optimized for parallel execution, but also for vectorization. Limitations in code generated by Cetus and autoPar prevent them from being used in legacy projects, while PLUTO and TC do not offer fully automated parallelization. ICC was found to offer the most complete automatic parallelization solution, although offered speedups were not as great as ones offered by other tools. / Automatisk parallellisering innebär konvertering av sekventiell kod till multitrådad kod med liten eller ingen tillsyn. En idealisk implementering av automatisk parallellisering skulle låta programmerare utnyttja tillgänglig hårdvara till fullo för att uppnå optimal prestanda när de skriver kod. Automatisk parallellisering har varit ett forskningsområde under en längre tid, och har resulterat i att moderna kompilatorer stöder vektorisering utan någon insats från programmerarens sida. I denna studie studeras samtida parallelliserande kompilatorer för att avgöra huruvida de lätt kan integreras i modern mjukvaruutveckling, samt hur kod som dessa kompilatorer genererar skiljer sig från manuellt parallelliserad kod. Fem kompilatorer, ICC, Cetus, autoPar, PLUTO, och TC Optimizing Compiler inkluderas i studien. Benchmarks används för att mäta speedup av paralleliserad kod. Dessa benchmarks exekveras på tre skiljda hårdvaruuppsättningar. NAS Parallel Benchmarks (NPB) används som benchmark för ICC, Cetus, och autoPar, och PolyBench för samtliga kompilatorer i studien. Resultat visar att parallelliserande kompilatorer genererar kod som presterar bättre än sekventiell kod i de flesta fallen, samt att vissa kodstilar begränsar deras möjlighet att parallellisera kod. I NPB så presterar kod parallelliserad av Cetus och ICC bättre än manuellt parallelliserad kod för en benchmark. I PolyBench så presterar PLUTO mycket bättre än de andra kompilatorerna och producerar kod som inte endast är optimerad för parallell exekvering, utan också för vektorisering. Begränsningar i kod genererad av Cetus och autoPar förhindrar användningen av dessa redskap i etablerade projekt, medan PLUTO och TC inte är kapabla till fullt automatisk parallellisering. Det framkom att ICC erbjuder den mest kompletta lösningen för automatisk parallellisering, men möjliga speedups var ej på samma nivå som för de andra kompilatorerna.
5

Infraestrutura de compilação para a implementação de aceleradores em FPGA

Rettore, Paulo Henrique Lopes 23 November 2012 (has links)
Made available in DSpace on 2016-06-02T19:06:00Z (GMT). No. of bitstreams: 1 4747.pdf: 5016839 bytes, checksum: ca7594d5895754f4ee9eb215e548c3cc (MD5) Previous issue date: 2012-11-23 / Financiadora de Estudos e Projetos / In recent years, performance improvements in sequential microprocessors have been limited by physical and technological factors. For this reason, alternative approaches for high performance execution have gained importance. One of them is based in the use of reconfigurable hardware, implemented using FPGAs. However, conventional methods for programming those devices are notoriously complex, usually based on hardware description languages such as VHDL and Verilog. This work presents the development of a compilation framework to support the translation of a loop, described in C language, into its corresponding version for synthesis in reconfigurable hardware. The optimized execution is based on the loop pipelining technique, which requires advanced compiler support. That is achieved by using the Cetus compiler, enhanced by a number of modifications, and thus used as a basis for the semi-automatic generation of custom-hardware accelerators. In order to guide the compiler developments and validate its basic functionalities, two study cases were considered: one based on finite state machines as the method of choice for hardware modelling (EC-1), and another based on the LALP domain specific language. In both cases, the proposed compilation framework have shown to be a facilitator element for the development of high performance custom-hardware. / O aumento no desempenho de processadores sequenciais tem sido limitado severamente por fatores físicos e tecnológicos nos últimos anos. Dessa forma, abordagens alternativas para a execução com alto desempenho ganharam maior importância nos últimos anos. Uma delas baseia-se na utilização de hardware customizado, implementado utilizando-se FPGAs. Entretanto, os métodos convencionais para programação desses dispositivos são notoriamente complexos, normalmente baseados em linguagens como VHDL e Verilog. Este trabalho apresenta o desenvolvimento de um framework de compilação para auxiliar a transformação de um loop, escrito em linguagem C, em sua versão para hardware customizado. A execução otimizada baseia-se na técnica de loop pipelining, a qual exige suporte avançado de compilação. Este é conseguido utilizando o compilador Cetus, que após uma série de modificações, pode ser utilizado como base para a geração semi-automática de aceleradores em hardware customizado. Como forma de guiar o desenvolvimento do compilador e validar suas funcionalidades básicas, dois casos de estudo foram considerados: um baseado na utilização de máquinas de estados finitos como método para a modelagem de hardware (EC-1), e outro baseado na linguagem de domínio específico LALP (EC-2). Em ambos os casos, o framework de compilação proposto mostrou-se útil como elemento facilitador ao desenvolvimento de hardware customizado de alto desempenho.

Page generated in 0.0291 seconds