71

LALP: uma linguagem para exploração do paralelismo de loops em computação reconfigurável / LALP: a language for parallelism of loops exploitation in reconfigurable computing

Menotti, Ricardo 23 June 2010
Reconfigurable computing is becoming increasingly important in embedded and high-performance computing systems. It allows performance levels close to those obtained with application-specific integrated circuits (ASICs), while still keeping design and implementation flexibility. However, programming these devices efficiently requires hardware development expertise and mastery of hardware description languages (HDLs) such as VHDL or Verilog. Attempts to furnish a high-level compilation flow (e.g., from C programs) still have to address open issues before broader efficient results can be obtained. Many efforts to achieve a direct mapping of algorithms into hardware concentrate on loops, since they represent the most computationally intensive regions of many application codes. A particularly useful technique for this purpose is loop pipelining, which is usually adapted from software pipelining techniques. The application of this technique is strongly related to instruction scheduling, which often prevents an optimized use of the resources present in modern FPGAs. This thesis describes an alternative approach to the direct mapping of loops described in high-level languages onto FPGAs. Unlike other approaches, this technique is not derived from software pipelining techniques. Control is distributed over the operations, so a finite state machine is not needed to control their order, which allows efficient hardware implementations. A hardware block is specified in LALP, a domain-specific language specially designed to support the application of these techniques. While the language syntax resembles C, it contains certain constructs that allow programmer interventions to enforce or relax data dependences as needed, and so optimize the performance of the generated hardware.
72

Paralelização de programas sisal para sistemas MPI / Parallelization of sisal programs for MPI systems

Nakashima, Raul Junji 15 March 1996
This work implements a method for the partial parallelization of programs written in the functional language SISAL, producing programs that call routines from MPI (Message Passing Interface) libraries. We transform SISAL programs by partitioning the parallel forall loop with the slice method (slicing of the index range); the generated code is an SPMD (Single Program Multiple Data) program in the master/slave style. The proposal was validated by compiling some simple SISAL programs and comparing the results of the original programs with those of the modified versions.
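The slicing of a forall index range described above can be illustrated with a minimal Python sketch. It does not use MPI and is not the thesis's generated code: the names slice_bounds, forall_body, run_slice and master are hypothetical, the loop body is a placeholder, and the workers are simulated by ordinary function calls in rank order.

```python
def slice_bounds(lo, hi, workers):
    """Split the inclusive index range [lo, hi] into `workers` contiguous slices."""
    total = hi - lo + 1
    base, extra = divmod(total, workers)
    bounds = []
    start = lo
    for rank in range(workers):
        # The first `extra` ranks take one additional index each.
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size - 1))
        start += size
    return bounds


def forall_body(i):
    # Hypothetical loop body standing in for the body of a SISAL forall.
    return i * i


def run_slice(bounds):
    # What each worker would compute on its own slice of the index range.
    lo, hi = bounds
    return [forall_body(i) for i in range(lo, hi + 1)]


def master(lo, hi, workers):
    # The master hands out slices and concatenates partial results in rank order.
    parts = [run_slice(b) for b in slice_bounds(lo, hi, workers)]
    return [x for part in parts for x in part]
```

In a real SPMD program every process would run the same code and select its slice from its own rank; here the master simply iterates over the slices, which is enough to check that the partitioned loop yields the same result as the unpartitioned one.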
73

Software-assisted data prefetching algorithms.

January 1995
by Chi-sum, Ho.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1995.
Includes bibliographical references (leaves 110-113).
Contents:
Abstract
Acknowledgement
Chapter 1. Introduction
  1.1 Overview
  1.2 Cache Memories
  1.3 Improving Cache Performance
  1.4 Improving System Performance
  1.5 Organization of the dissertation
Chapter 2. Related Work
  2.1 Cache Performance
  2.2 Non-Blocking Cache
  2.3 Cache Prefetching
    2.3.1 Hardware Prefetching
    2.3.2 Software-assisted Prefetching
    2.3.3 Improving Cache Effectiveness
  2.4 Other Techniques to Reduce and Hide Memory Latencies
    2.4.1 Register Preloading
    2.4.2 Write Policies
    2.4.3 Small Specialized Cache
    2.4.4 Program Transformation
Chapter 3. Stride CAM Prefetching
  3.1 Introduction
  3.2 Architectural Model
    3.2.1 Compiler Support
    3.2.2 Hardware Support
    3.2.3 Model Details
  3.3 Optimization Issues
    3.3.1 Eliminating Redundant Prefetching
    3.3.2 Code Motion
    3.3.3 Burst Mode
    3.3.4 Stride CAM Overflow
    3.3.5 Effects of Loop Optimizations
  3.4 Practicability
    3.4.1 Evaluation Methodology
    3.4.2 Prefetch Accuracy
    3.4.3 Stride CAM Size
    3.4.4 Software Overhead
Chapter 4. Stride Register Prefetching
  4.1 Motivation
  4.2 Architectural Model
    4.2.1 Stride Register
    4.2.2 Compiler Support
    4.2.3 Prefetch Bits
    4.2.4 Operation Details
  4.3 Practicability and Optimizations
    4.3.1 Practicability on NASA7 Benchmark Programs
    4.3.2 Optimization Issues
  4.4 Comparison Between Stride CAM and Stride Register Models
Chapter 5. Small Software-Driven Array Cache
  5.1 Introduction
  5.2 Cache Pollution in MXM
  5.3 Architectural Model
    5.3.1 Operation Details
  5.4 Effectiveness of Array Cache
Chapter 6. Conclusion
  6.1 Conclusion
  6.2 Future Research: An Extension of the Stride CAM Model
    6.2.1 Background
    6.2.2 Reference Address Series
    6.2.3 Extending the Stride CAM Model
    6.2.4 Prefetch Overhead
Bibliography
Appendix
Chapter A. Simulation Results - Stride CAM Model
  A.1 Execution Time: A.1.1 BTRIX; A.1.2 CFFT2D; A.1.3 CHOLSKY; A.1.4 EMIT; A.1.5 GMTRY; A.1.6 MXM; A.1.7 VPENTA
  A.2 Memory Delay: A.2.1 BTRIX; A.2.2 CFFT2D; A.2.3 CHOLSKY; A.2.4 EMIT; A.2.5 GMTRY; A.2.6 MXM; A.2.7 VPENTA
  A.3 Overhead: A.3.1 BTRIX; A.3.2 CFFT2D; A.3.3 CHOLSKY; A.3.4 EMIT; A.3.5 GMTRY; A.3.6 MXM; A.3.7 VPENTA
  A.4 Hit Ratio: A.4.1 BTRIX; A.4.2 CFFT2D; A.4.3 CHOLSKY; A.4.4 EMIT; A.4.5 GMTRY; A.4.6 MXM; A.4.7 VPENTA
Chapter B. Simulation Results - Array Cache
Chapter C. NASA7 Benchmark
  C.1 BTRIX
  C.2 CFFT2D
    C.2.1 cfft2d1
    C.2.2 cfft2d2
  C.3 CHOLSKY
  C.4 EMIT
  C.5 GMTRY
  C.6 MXM
  C.7 VPENTA
74

ML4JIT- um arcabouço para pesquisa com aprendizado de máquina em compiladores JIT. / ML4JIT - a framework for research on machine learning in JIT compilers.

Mignon, Alexandre dos Santos 27 June 2017
Determining the best set of optimizations to apply to a program has been the focus of research on compiler optimization for decades. In general, the set of optimizations is defined manually by the compiler developers and applied to all programs. Supervised machine learning techniques have been used to develop code optimization heuristics; they aim to determine the best set of optimizations with minimal human intervention. This work presents ML4JIT, a framework for research on machine learning in JIT compilers for the Java language. The framework supports research into tuning the optimizations specifically for each method of a program. Experiments were performed to validate the framework, with the objective of verifying whether its use reduced the compilation time of the methods and the execution time of the program.
75

Compiler-assisted Adaptive Software Testing

Petsios, Theofilos January 2018
Modern software is becoming increasingly complex and is plagued with vulnerabilities that are constantly exploited by attackers. The vast numbers of bugs found in security-critical systems and the diversity of errors presented in commercial off-the-shelf software require effective, scalable testing frameworks. Unfortunately, the current testing ecosystem is heavily fragmented, with the majority of toolchains targeting limited classes of errors and applications without offering provably strong guarantees. With software codebases continuously becoming more diverse and complex, the large-scale deployment of monolithic, non-adaptive analysis engines is likely to increase the aforementioned fragmentation. Instead, modern software testing requires adaptive, hybrid techniques that target errors selectively. This dissertation argues that adopting context-aware analyses will enable us to set the foundations for retargetable testing frameworks while further increasing the accuracy and extensibility of existing toolchains. To this end, we initially examine how compiler analyses can become context-aware, prioritizing certain errors over others of the same type. As a use case of our proposed approach, we extend a state-of-the-art compiler's integer error detection pipeline to suppress reports of benign errors by up to 89% in real-world workloads, while allowing for reporting of serious errors. Subsequently, we demonstrate how compiler-based instrumentation can be utilized by feedback-driven evolutionary fuzzers to provide multifaceted analyses targeting broader classes of bugs. In this direction, we present differential diversity (δ-diversity), we propose a generic methodology for offering state-aware guidance in feedback-driven frameworks, and we demonstrate how to retrofit state-of-the-art fuzzers to target broader classes of errors. 
We provide two such prototype implementations: NEZHA, the first differential generic fuzzer capable of handling logic bugs, and SlowFuzz, the first generic fuzzer targeting complexity vulnerabilities. We applied both prototypes to production software and demonstrated their effectiveness. NEZHA discovered hundreds of logic discrepancies across a wide variety of applications (SSL/TLS libraries, parsers, etc.), while SlowFuzz successfully generated inputs triggering slowdowns in complex, real-world software, including zip parsers, regular expression libraries, and hash table implementations.
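The idea of differential testing behind logic-discrepancy detection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is plain random testing, not NEZHA's δ-diversity guidance, and the two toy parsers (parse_int_strict, parse_int_lenient) and the driver find_discrepancies are hypothetical names invented for this example.

```python
import random


def parse_int_strict(text):
    # First implementation: accepts only an optional '-' followed by ASCII digits.
    body = text[1:] if text.startswith("-") else text
    if body and all(c in "0123456789" for c in body):
        return int(text)
    return None


def parse_int_lenient(text):
    # Second implementation: defers to Python's int(), which also tolerates
    # surrounding whitespace, a leading '+', and '_' between digits.
    try:
        return int(text)
    except ValueError:
        return None


def find_discrepancies(impl_a, impl_b, trials=2000, seed=0):
    # Feed the same random inputs to both implementations and keep
    # every input on which their observable behaviour differs.
    rng = random.Random(seed)
    alphabet = "0123456789-+ _"
    found = []
    for _ in range(trials):
        length = rng.randint(1, 4)
        text = "".join(rng.choice(alphabet) for _pos in range(length))
        if impl_a(text) != impl_b(text):
            found.append(text)
    return found
```

Calling find_discrepancies(parse_int_strict, parse_int_lenient) returns the inputs on which the two parsers disagree, for example strings with a leading '+' or surrounding spaces. No crash is needed to flag them, which is why this style of testing can surface logic bugs.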
77

Compiling Evaluable Functions in the Godel Programming Language

Shapiro, David 30 January 1996
We present an extension of the Godel logic programming language code generator which compiles user-defined functions. These functions may be used as arguments in predicate or goal clauses. They are defined in extended Godel as rewrite rules. A translation scheme is introduced to convert function definitions into predicate clauses for compilation. This translation scheme and the compilation of functional arguments both employ leftmost-innermost narrowing. As function declarations are indistinguishable from constructor declarations, a function detection method is implemented. The ultimate goal of this research is the implementation of extended Godel using needed narrowing. The work presented here is an intermediate step in creating a functional-logic language which expands the expressiveness of logic programming and streamlines its execution.
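One common way to turn functional arguments into predicate calls is flattening: each nested function application becomes a goal with an extra result variable, emitted leftmost-innermost first. The Python sketch below illustrates that idea only; it is not the thesis's translation scheme. Terms are modelled as tuples, the helper names flatten and translate_goal are hypothetical, and the set of function symbols is passed in explicitly, whereas the thesis detects functions automatically.

```python
import itertools


def flatten(term, functions, fresh, goals):
    """Return a variable, constant, or constructor term standing for `term`,
    appending one predicate goal per function application to `goals`."""
    if isinstance(term, str):
        return term  # variable or constant: nothing to evaluate
    name, *args = term
    # Arguments are processed left to right, and before the enclosing call,
    # so goals are emitted in leftmost-innermost order.
    flat_args = [flatten(a, functions, fresh, goals) for a in args]
    if name not in functions:
        return (name, *flat_args)  # constructor term: rebuilt, not called
    result = f"_R{next(fresh)}"  # fresh variable holding the function's value
    goals.append((name, *flat_args, result))
    return result


def translate_goal(goal, functions):
    # Replace every functional argument of `goal` by a result variable and
    # prepend the predicate calls that compute those variables.
    fresh = itertools.count(1)
    goals = []
    pred, *args = goal
    flat_args = [flatten(a, functions, fresh, goals) for a in args]
    goals.append((pred, *flat_args))
    return goals
```

For instance, with add and double declared as functions, the goal p(add(double(X), Y)) becomes the conjunction double(X, _R1), add(_R1, Y, _R2), p(_R2). Under left-to-right resolution the inner call is solved first.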
78

A Parallelizing Compiler Based on Partial Evaluation

Surati, Rajeev 01 July 1993
We constructed a parallelizing compiler that uses partial evaluation to produce efficient parallel object code from very high-level, data-independent source programs. On several important scientific applications, the compiler attains parallel performance equivalent to or better than the best observed results from manual restructuring of code. This is the first attempt to capitalize on partial evaluation's ability to expose low-level parallelism. New static scheduling techniques are used to exploit the fine-grained parallelism of the computations. The compiler maps the computation graph resulting from partial evaluation onto the Supercomputer Toolkit, a parallel computer with eight VLIW processors.
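How partial evaluation can expose fine-grained parallelism may be easier to see on a toy case. The Python sketch below runs a data-independent dot product on symbolic placeholders, so the loop disappears and only a graph of arithmetic operations remains. It is an illustration under assumptions, not the compiler described above: Node, specialize_dot, levels and evaluate are hypothetical names, and no scheduling onto real processors is attempted.

```python
def dot(xs, ys):
    # Data-independent source program: control flow depends only on the length.
    total = 0
    for x, y in zip(xs, ys):
        total = total + x * y
    return total


class Node:
    """Symbolic value that records the operation producing it."""

    def __init__(self, graph, op, inputs):
        self.graph, self.op, self.inputs = graph, op, inputs
        self.index = len(graph)
        graph.append(self)

    def __add__(self, other):
        return Node(self.graph, "add", (self, other))

    def __radd__(self, other):
        # `other` is a known constant; 0 + x is simplified away at specialization time.
        return self if other == 0 else Node(self.graph, "add", (other, self))

    def __mul__(self, other):
        return Node(self.graph, "mul", (self, other))


def specialize_dot(n):
    # Run `dot` on placeholders: the result is a loop-free computation graph.
    graph = []
    xs = [Node(graph, "input", (f"x{i}",)) for i in range(n)]
    ys = [Node(graph, "input", (f"y{i}",)) for i in range(n)]
    return graph, dot(xs, ys)


def levels(graph):
    # Depth of each node; nodes with equal depth have no mutual dependence.
    depth = {}
    for node in graph:  # nodes were appended in dependency order
        deps = [depth[i.index] for i in node.inputs if isinstance(i, Node)]
        depth[node.index] = 1 + max(deps) if deps else 0
    return depth


def evaluate(graph, env):
    # Interpret the residual graph on concrete inputs to check it against `dot`.
    values = {}
    for node in graph:
        if node.op == "input":
            values[node.index] = env[node.inputs[0]]
        else:
            a, b = (values[i.index] if isinstance(i, Node) else i for i in node.inputs)
            values[node.index] = a + b if node.op == "add" else a * b
    return values
```

All multiplications end up at the same depth in the graph, so a static scheduler could issue them in parallel. The chain of additions stays sequential in this sketch because no reassociation is attempted.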
79

Compiler Techniques For Code Size And Power Reduction For Embedded Processors

Sarvani, V V N S 06 1900
No description available.
80

A library for doing polyhedral operations

Wilde, Doran K. 06 December 1993
Polyhedra are geometric representations of linear systems of equations and inequalities. Since polyhedra are used to represent the iteration domains of nested loop programs, procedures for operating on polyhedra can be used for loop transformations and other program restructuring transformations needed in parallelizing compilers. Thus a need for a library of polyhedral operations has recently been recognized in the parallelizing compiler community. Polyhedra are also used to define the domains of variables in systems of affine recurrence equations (SARE). ALPHA is a language based on the SARE formalism in which all variables are declared over polyhedral domains consisting of finite unions of polyhedra. This thesis describes a library of polyhedral functions that was developed to support the ALPHA language environment and that is general enough to satisfy the needs of researchers working on parallelizing compilers. It describes the data structures used to represent domains, gives the motivations for the major design decisions made in creating the library, and presents the algorithms used for polyhedral operations. A new algorithm for recursively generating the face lattice of a polyhedron is also presented. The library has been written and tested, and has been in use since the first quarter of 1993. It is used by research facilities in Europe and Canada that do research in parallelizing compilers and systolic array synthesis. The library is freely distributed by ftp. / Graduation date: 1994
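To make the link between polyhedra and loop iteration domains concrete, here is a minimal Python sketch. It is not the API or the data structures of the library described above: a polyhedron is modelled simply as a list of (coeffs, const) pairs, each meaning coeffs . x + const >= 0, a domain is a list of polyhedra (a finite union), and the function names contains, intersect, in_domain and integer_points are hypothetical.

```python
from itertools import product


def contains(polyhedron, point):
    # A point lies in the polyhedron when it satisfies every inequality.
    return all(
        sum(c * x for c, x in zip(coeffs, point)) + const >= 0
        for coeffs, const in polyhedron
    )


def intersect(p, q):
    # Intersection of two polyhedra: the conjunction of their constraints.
    return p + q


def in_domain(domain, point):
    # A domain is a finite union of polyhedra.
    return any(contains(p, point) for p in domain)


def integer_points(domain, box):
    # Enumerate the integer points of a domain inside a bounding box given
    # as one inclusive (lo, hi) pair per dimension.
    ranges = (range(lo, hi + 1) for lo, hi in box)
    return [pt for pt in product(*ranges) if in_domain(domain, pt)]


# Iteration domain of: for i in 0..3: for j in 0..i
triangle = [((1, 0), 0), ((-1, 0), 3), ((0, 1), 0), ((1, -1), 0)]
# Half-space j <= 1
low_rows = [((0, -1), 1)]
```

Enumerating integer_points([triangle], [(0, 3), (0, 3)]) yields exactly the (i, j) pairs that the triangular loop nest visits, and intersecting with low_rows restricts the domain to iterations with j <= 1. Enumeration over a bounding box is chosen here only for brevity; a real polyhedral library works on the constraint representation directly.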
