
Compiler Directed Codesign for FPGA-based Embedded Systems

Hauff, Martin Anthony, marty@extendabilities.com.au January 2008
As embedded systems designers increasingly turn to programmable logic technologies in place of off-the-shelf microprocessors, there is a growing interest in the development of optimised custom processing cores that can be designed on a per-application basis. FPGAs blur the traditional distinction between hardware and software and offer the promise of application-specific hardware acceleration. But realising this in a general sense requires a significant departure from traditional embedded systems development flows. Whereas off-the-shelf processors have a fixed architecture, the same cannot be said of purpose-built FPGA-based processors. With this freedom comes the challenge of empirically determining the optimal boundary point between hardware and software. The fluidity of the hardware/software partition also poses an interesting challenge for compiler developers. This thesis presents a tool and methodology that addresses these codesign challenges in a new way. Described as 'compiler-directed codesign', it makes use of a suitably modified compiler to help direct the development of a custom processor core on a per-application basis. Exposing the compiler's internal representation of the compiled target program gives visibility into the instructions and hardware resources that the compiler most demands. This information is then used to inform further processor development and to determine the optimal partition between hardware and software. At each design iteration, the machine model is updated to reflect the available hardware resources, the compiler is rebuilt, and the target application is compiled once again. By including the compiler 'in the loop' of custom processor design, developers can accurately quantify the impact on performance caused by the addition or removal of specific hardware resources and iteratively converge on an optimal solution. Compiler-directed codesign has advantages over existing codesign methodologies because it offers both a concrete point from which to begin the partitioning process and rapid, quantifiable feedback on the merits of different partitioning choices. When applied to an Adaptive PCM Encoder/Decoder case study, the technique yielded a custom processor core that was between 36% and 73% smaller, consumed between 11% and 19% less memory, and performed up to 10x faster than comparable general-purpose FPGA-based processor cores. The conclusion of this work is that a suitably modified compiler can serve a valuable role in directing hardware/software partitioning on a per-application basis.
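
The iterative flow described in this abstract lends itself to a simple driver-loop illustration. The sketch below is a hypothetical reconstruction, not the thesis's actual tooling: the command names (build_compiler, appcc, cycle_sim) and the machine-model format are invented stand-ins.

```python
# Hypothetical sketch of the compiler-in-the-loop iteration: rebuild the
# compiler against a candidate machine model, recompile the application,
# measure, and keep the best hardware/software partition. All command
# names and file formats are invented for illustration.
import subprocess

def evaluate(machine_model: str) -> float:
    """Rebuild the compiler for one machine model, recompile the target
    application, and return a performance figure (e.g., cycle count)."""
    subprocess.run(["build_compiler", "--machine-model", machine_model], check=True)
    subprocess.run(["appcc", "-O2", "app.c", "-o", "app.elf"], check=True)
    sim = subprocess.run(["cycle_sim", "app.elf"],
                         capture_output=True, text=True, check=True)
    return float(sim.stdout)

def codesign_loop(candidate_models: list[str]) -> str:
    """Each candidate model adds or removes specific hardware resources;
    the measured impact of each change directs the next iteration."""
    best_model, best_cycles = None, float("inf")
    for model in candidate_models:
        cycles = evaluate(model)
        if cycles < best_cycles:
            best_model, best_cycles = model, cycles
    return best_model
```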

Debugging Equation-Based Languages in OpenModelica Environment

Sjöholm, Klas January 2009
The need for debugging tools for declarative programming languages has increased due to the rapid development of modeling and simulation tools. Declarative equation-based programming languages have the problem of equation systems being over- or under-constrained: the system has more equations than variables, or more variables than equations, respectively, making it unsolvable. In this study a static debugger is implemented in the OpenModelica compiler for the equation-based programming language Modelica, to make it easier for the programmer or modeler to locate the equations causing the over- or under-constrained system. The debugging techniques used by the debugger were developed by Peter Bunus. These techniques are able to detect unconstrained systems of equations and propose solutions by identifying the minimal set of equations that should be removed, or the variables that should be added to an equation, to make the system solvable. In this study the debugging techniques for detecting and resolving over-constrained systems of equations are shown to be suitable for the programming language Modelica in the OpenModelica compiler.
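
Bunus's techniques rest on a structural view of the model: equations and variables form a bipartite graph, and a maximum matching that leaves equations unmatched signals over-constraint, while unmatched variables signal under-constraint. A minimal Python sketch of that core check (an illustration of the idea, not the OpenModelica implementation):

```python
# Structural check behind equation-system debugging: equations and the
# variables they mention form a bipartite graph. Equations left unmatched
# by a maximum matching are redundant (over-constraint); variables left
# unmatched lack a defining equation (under-constraint). Sketch only,
# not the OpenModelica implementation.

def max_matching(incidence: dict) -> dict:
    """Kuhn's augmenting-path matching; incidence maps each equation to
    the set of variables it mentions. Returns variable -> equation."""
    match_var = {}

    def augment(eq, seen):
        for var in incidence[eq]:
            if var not in seen:
                seen.add(var)
                if var not in match_var or augment(match_var[var], seen):
                    match_var[var] = eq
                    return True
        return False

    for eq in incidence:
        augment(eq, set())
    return match_var

def diagnose(incidence: dict) -> None:
    match_var = max_matching(incidence)
    over = set(incidence) - set(match_var.values())           # equations to remove
    under = set().union(*incidence.values()) - set(match_var)  # need an equation
    print("over-constraining equations:", over or "none")
    print("unconstrained variables:", under or "none")

# x is constrained twice by e1/e2 while only e3 involves y:
diagnose({"e1": {"x"}, "e2": {"x"}, "e3": {"x", "y"}})
# -> over-constraining equations: {'e2'}; unconstrained variables: none
```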

An optimising SMV to CLP(B) compiler

Asplund, Mikael January 2005
This thesis describes an optimising compiler for translating from SMV to CLP(B). The optimisation is aimed at reducing the number of required variables in order to decrease the size of the resulting BDDs; a partitioning of the transition relation is also performed. The compiler uses an internal representation of an FSM that is built up from the SMV description. A number of rewrite steps are performed on the problem description, such as encoding to a Boolean domain and performing the optimisations. The variable-reduction heuristic is based on finding sub-circuits that are suitable for reduction, and a state-space search is performed on those groups. An evaluation of the results shows that in some cases the compiler is able to greatly reduce the size of the resulting BDDs.
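
The "encoding to a Boolean domain" rewrite step can be illustrated in isolation: each finite-domain SMV variable becomes a vector of Boolean variables via log encoding. A sketch under those assumptions (not the compiler's actual code):

```python
# Log-encoding a finite-domain state variable into Boolean variables,
# the kind of rewrite performed before emitting CLP(B) and building BDDs.
# Illustrative sketch, not the compiler described in the thesis.
from math import ceil, log2

def boolean_encode(name: str, domain: list) -> dict:
    """Map each domain value to a bit vector over ceil(log2(|domain|))
    fresh Boolean variables (LSB first)."""
    width = max(1, ceil(log2(len(domain))))
    return {
        "bits": [f"{name}_b{i}" for i in range(width)],
        "encoding": {value: [(idx >> i) & 1 for i in range(width)]
                     for idx, value in enumerate(domain)},
    }

print(boolean_encode("state", ["idle", "busy", "done"]))
# Three values fit in two Boolean variables:
# idle=[0,0], busy=[1,0], done=[0,1] -- one variable fewer than one-hot.
```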

Compiling the parallel programming language NestStep to the CELL processor

Holm, Magnus January 2010
The goal of this project is to create a source-to-source compiler that translates NestStep code to C code. The compiler's job is to replace NestStep constructs with a series of function calls to the NestStep runtime system. NestStep is a parallel programming language extension based on the BSP model; it adds constructs for parallel programming on top of an imperative programming language. For this project, only constructs extending the C language are relevant. The output code compiles to form an executable program that runs on the multicore processor Cell Broadband Engine (Cell BE). The NestStep runtime system had already been ported to the Cell BE and was available from the start of this project.
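
What the translation looks like in miniature: a parallel construct is replaced by bracketing calls into the runtime system. The 'step { ... }' handling and the runtime-call names below are illustrative placeholders, not the actual NestStep syntax or API.

```python
# Toy source-to-source rewrite in the spirit of the project: a parallel
# construct in the input is replaced by calls into a runtime system.
# The construct syntax and runtime-call names are placeholders, not the
# real NestStep language or runtime interface.
import re

def lower_steps(source: str) -> str:
    """Rewrite 'step { body }' into runtime bracketing calls; assumes
    non-nested blocks for brevity."""
    pattern = re.compile(r"step\s*\{(?P<body>[^{}]*)\}")
    return pattern.sub(
        r"NestStep_step_begin();\g<body>NestStep_step_end();", source)

print(lower_steps("int main(void) { step { x = compute(); } return 0; }"))
# -> int main(void) { NestStep_step_begin(); x = compute(); NestStep_step_end(); return 0; }
```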

Transactions Everywhere

Kuszmaul, Bradley C., Leiserson, Charles E. 01 1900
Arguably, one of the biggest deterrents for software developers who might otherwise choose to write parallel code is that parallelism makes their lives more complicated. Perhaps the most basic problem inherent in the coordination of concurrent tasks is the enforcing of atomicity so that the partial results of one task do not inadvertently corrupt another task. Atomicity is typically enforced through locking protocols, but these protocols can introduce other complications, such as deadlock, unless restrictive methodologies are adopted in their use. We have recently begun a research project focusing on transactional memory [18] as an alternative mechanism for enforcing atomicity, since it allows the user to avoid many of the complications inherent in locking protocols. Rather than viewing transactions as infrequent occurrences in a program, as has generally been done in the past, we have adopted the point of view that all user code should execute in the context of some transaction. Making this viewpoint viable requires the development of two key technologies: effective hardware support for scalable transactional memory, and linguistic and compiler support. This paper describes our preliminary research results on making “transactions everywhere” a practical reality. / Singapore-MIT Alliance (SMA)
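
As a concrete picture of the "transactions everywhere" viewpoint, the Python sketch below wraps every user-level update of shared state in a transaction. Atomicity is faked with one global lock purely for illustration; real transactional memory executes such regions optimistically and rolls back on conflict rather than serializing them.

```python
# "Transactions everywhere" in miniature: all user code that touches
# shared state runs inside some transaction. Atomicity is faked here
# with a single global lock; actual transactional memory would run
# these regions optimistically and roll back on conflict.
import threading

_tx_lock = threading.RLock()

class transaction:
    """Context manager marking an atomic region."""
    def __enter__(self):
        _tx_lock.acquire()
    def __exit__(self, exc_type, exc, tb):
        _tx_lock.release()
        return False  # propagate exceptions; abort/retry is not modeled

balance = 0

def deposit(amount):
    global balance
    with transaction():          # partial updates never become visible
        balance += amount

threads = [threading.Thread(target=deposit, args=(1,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert balance == 8              # no lost updates despite concurrency
```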

Run-time optimization of adaptive irregular applications

Yu, Hao 15 November 2004
Compared to traditional compile-time optimization, run-time optimization can offer significant performance improvements when parallelizing and optimizing adaptive irregular applications, because it performs program analysis and adaptive optimizations during program execution. Run-time techniques can succeed where static techniques fail because they exploit the characteristics of input data, programs' dynamic behaviors, and the underlying execution environment. When optimizing adaptive irregular applications for parallel execution, a common observation is that the effectiveness of the optimizing transformations depends on programs' input data and their dynamic phases. This dissertation presents a set of run-time optimization techniques that match the characteristics of programs' dynamic memory access patterns with the appropriate optimization (parallelization) transformations. First, we present a general adaptive algorithm selection framework to automatically and adaptively select at run-time the best-performing, functionally equivalent algorithm for each of its execution instances. The selection process is based on off-line, automatically generated prediction models and on characteristics of the algorithm's input data, collected and analyzed dynamically. In this dissertation, we specialize this framework for automatic selection of reduction algorithms. In this research, we identified a small set of machine-independent, high-level characterization parameters and then deployed an off-line, systematic experimental process to generate prediction models. These models, in turn, match the parameters to the best optimization transformations for a given machine. The technique has been evaluated thoroughly in terms of applications, platforms, and programs' dynamic behaviors. Specifically, for reduction algorithm selection, the selected algorithm performs within 2% of the optimal and is on average 60% better than "Replicated Buffer," the default parallel reduction algorithm specified by the OpenMP standard. To reduce the overhead of speculative run-time parallelization, we have developed an adaptive run-time parallelization technique that dynamically chooses efficient shadow structures to record a program's dynamic memory access patterns for parallelization. This technique complements the original speculative run-time parallelization technique, the LRPD test, in parallelizing loops with sparse memory accesses. The techniques presented in this dissertation have been implemented in an optimizing research compiler and can be viewed as effective building blocks for comprehensive run-time optimization systems, e.g., feedback-directed optimization systems and dynamic compilation systems.
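
A schematic rendering of the adaptive selection idea for reductions: cheap run-time characterization of the access pattern feeds a predictor that picks among functionally equivalent reduction algorithms. The characterization parameters and decision thresholds below are invented for illustration; only "Replicated Buffer" is named in the abstract itself.

```python
# Schematic run-time algorithm selection for parallel reductions: collect
# cheap characteristics of the dynamic access pattern, then let a model
# (trained off-line) pick the algorithm for this execution instance.
# Parameter names and thresholds are illustrative inventions.

def characterize(indices, array_size):
    touched = set(indices)
    return {
        "sparsity": 1.0 - len(touched) / array_size,        # untouched fraction
        "contention": len(indices) / max(1, len(touched)),  # repeats per element
    }

def select_reduction(f):
    """Stand-in for the off-line-generated prediction model."""
    if f["sparsity"] > 0.9:
        return "selective_privatization"  # replicate only touched elements
    if f["contention"] > 4.0:
        return "replicated_buffer"        # full private copy per thread
    return "local_write"                  # mostly-exclusive direct updates

f = characterize(indices=[3, 3, 7, 3], array_size=1000)
print(select_reduction(f))  # sparse accesses -> selective_privatization
```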

Design and remote control of a Gantry mechanism for the SCARA robot

Surinder Pal, 15 May 2009
Remote experimentation and control have led researchers to develop new technologies as well as implement existing techniques. The multidisciplinary nature of research in electromechanical systems has led to the synergy of mechanical engineering, electrical engineering, and computer science. This work describes the design of a model of a gantry mechanism that maneuvers a web-cam. The user virtually controls the position of the gantry mechanism's end-effector through a graphical user interface accessed over the Internet. In order to reduce the unbalanced vibrations of the gantry mechanism, we investigate the development of an input-shaping algorithm. A model of the gantry mechanism is built and controlled over the Internet to view experimentation with the SCARA robot. The system performance is studied by comparing inputs, such as distances and angles, with outputs, and methods to improve the performance are suggested.
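
Input shaping of the kind investigated here typically convolves the command with a short impulse sequence tuned to the mechanism's natural frequency and damping. A standard zero-vibration (ZV) shaper sketch in Python follows; the gantry's frequency and damping values are assumptions, not measured data from this work.

```python
# Standard zero-vibration (ZV) input shaper, the classic starting point
# for vibration-reducing input shaping: the command is convolved with
# two impulses timed half a damped period apart so the second impulse
# cancels the oscillation excited by the first.
# The natural frequency and damping ratio below are assumed values.
import math

def zv_shaper(omega_n, zeta):
    """Return (time, amplitude) impulses for natural frequency omega_n
    (rad/s) and damping ratio zeta."""
    k = math.exp(-zeta * math.pi / math.sqrt(1.0 - zeta ** 2))
    t2 = math.pi / (omega_n * math.sqrt(1.0 - zeta ** 2))  # half damped period
    return [(0.0, 1.0 / (1.0 + k)), (t2, k / (1.0 + k))]

def shaped(command, impulses, t):
    """Convolve a command signal with the shaper's impulse sequence."""
    return sum(a * command(t - ti) for ti, a in impulses if t >= ti)

step = lambda t: 1.0 if t >= 0.0 else 0.0
impulses = zv_shaper(omega_n=12.0, zeta=0.05)  # assumed gantry dynamics
print([round(shaped(step, impulses, t), 3) for t in (0.0, 0.1, 0.3, 0.5)])
# -> [0.539, 0.539, 1.0, 1.0]: the step is split into two staged moves.
```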

NoGAP: Novel Generator of Accelerators and Processors

Karlström, Per Axel January 2010
ASIPs are needed to handle the future demand for flexible yet high-performance embedded computing. The flexibility of ASIPs makes them preferable over fixed-function ASICs, and a well-designed ASIP has a power consumption comparable to ASICs. However, the cost associated with ASIP design is a limiting factor for wider adoption. A number of tools have been proposed that promise to ease this design process, but all current state-of-the-art tools constrain the designer through a template-based design process, which blocks design freedom and limits the I/O bandwidth of the template. We have therefore proposed the Novel Generator of Accelerators and Processors (NoGAP), a design automation tool for ASIP and accelerator design that puts very few limits on what can be designed, yet supports the designer by automating many of the tedious and error-prone tasks associated with ASIP design. This thesis presents NoGAP and its key concepts, such as NoGAP-CL, the language used to implement processors in NoGAP. It also describes NoGAP's key technologies, which include automatic bus and wire sizing, instruction decoder and pipeline management, generation of PC-FSMs, generation of assemblers, and generation of cycle-accurate simulators. We have so far proven NoGAP's strengths in three extensive case studies: a pipelined floating-point datapath, a simple RISC processor, and an advanced RISC-style DSP. All these case studies point to the same conclusion: NoGAP speeds up development time, clarifies complex pipeline architectures, retains design flexibility, and, most importantly, incurs little performance penalty compared to hand-optimized RTL code. We believe that the work presented in this thesis shows that NoGAP, using our proposed novel approach to microarchitecture design, can have a significant impact on both academic and industrial hardware design. To the best of our knowledge, NoGAP is the first system to demonstrate that a template-free processor construction framework can be developed and can generate high-performance hardware solutions.
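
To make one of those tedious, error-prone tasks concrete, automatic bus and wire sizing reduces to propagating value ranges and deriving minimal widths. A toy sketch of the idea (not NoGAP's actual algorithm):

```python
# Toy version of automatic bus/wire sizing: a wire must be just wide
# enough for the largest value its producer can drive, i.e.
# ceil(log2(max_value + 1)) bits. An illustration of the task NoGAP
# automates, not its actual algorithm.
from math import ceil, log2

def bus_width(max_value: int) -> int:
    """Minimum wire count to carry unsigned values 0..max_value."""
    return max(1, ceil(log2(max_value + 1)))

def size_datapath(producer_ranges: dict) -> dict:
    """Map each named wire to its required width, given the maximum
    value the producing unit can output."""
    return {wire: bus_width(v) for wire, v in producer_ranges.items()}

print(size_datapath({"alu_out": 65535, "pc_next": 1023, "zero_flag": 1}))
# -> {'alu_out': 16, 'pc_next': 10, 'zero_flag': 1}
```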

Automatic Task Formation Techniques for the Multi-level Computing Architecture

Stewart, Kirk 30 July 2008
The Multi-Level Computing Architecture (MLCA) is a multiprocessor system-on-chip architecture designed for multimedia applications. It provides a programming model that simplifies the process of writing parallel applications by eliminating the need for explicit synchronization. However, developers must still invest effort to design applications that fully exploit the MLCA’s multiprocessing capabilities. We present a set of compiler techniques to streamline the process of developing applications for the MLCA. We present an algorithm to automatically partition a sequential application into tasks that can be executed in parallel. We also present code generation algorithms to translate annotated, sequential C code to the MLCA’s programming model. We provide an experimental evaluation of these techniques, performed with a prototype compiler based upon the open-source ORC compiler and integrated with the MLCA Optimizing Compiler. This evaluation shows that the performance of automatically generated code compares favourably to that of manually written code.
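
The core of task formation can be pictured as dependence-driven grouping: statements whose read/write sets conflict end up in the same task, and disjoint groups become tasks that can run in parallel. A deliberately simplified sketch (union-find over access sets, not the MLCA compiler's actual algorithm):

```python
# Simplified sketch of automatic task formation: statements that share
# written data are forced into the same task; disjoint groups become
# independent tasks the architecture can run in parallel. Not the MLCA
# compiler's actual algorithm, just the underlying idea.

def form_tasks(statements):
    """Each statement is (reads, writes). Statements conflict if one
    writes what another reads or writes; conflicting statements are
    merged into one task via union-find."""
    parent = list(range(len(statements)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (ri, wi) in enumerate(statements):
        for j, (rj, wj) in enumerate(statements[:i]):
            if wi & (rj | wj) or wj & ri:   # true/anti/output dependence
                parent[find(i)] = find(j)

    tasks = {}
    for i in range(len(statements)):
        tasks.setdefault(find(i), []).append(i)
    return list(tasks.values())

stmts = [({"a"}, {"b"}), ({"b"}, {"c"}), ({"x"}, {"y"})]
print(form_tasks(stmts))   # [[0, 1], [2]] -- statement 2 is independent
```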

Directive-based General-purpose GPU Programming

Han, Tian Yi David 19 January 2010
Graphics Processing Units (GPUs) have become a competitive accelerator for non-graphics applications, mainly driven by improvements in GPU programmability. Although the Compute Unified Device Architecture (CUDA) is a simple C-like interface for programming NVIDIA GPUs, porting applications to CUDA remains a challenge for average programmers. In particular, CUDA places on the programmer the burden of packaging GPU code in separate functions, of explicitly managing data transfer between the host and GPU memories, and of manually optimizing the utilization of the GPU memory. We have designed hiCUDA, a high-level directive-based language for CUDA programming. It allows programmers to perform these tedious tasks in a simpler manner, applying them directly to the sequential code. We have also prototyped a compiler that translates a hiCUDA program to a CUDA program and can handle real-world applications. Experiments using seven standard CUDA benchmarks show that the simplicity hiCUDA provides comes at no expense to performance.
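
The directive-based approach in miniature: the programmer marks a loop in the sequential code, and the compiler outlines it into a kernel plus the implied memory management. The pragma spelling and the emitted calls below are generic placeholders, not hiCUDA's actual directive syntax.

```python
# Miniature of what a directive-based GPU compiler does: outline a loop
# marked in sequential code into a kernel and the data transfers the
# directive implies. The directive spelling and generated code are
# generic placeholders, not hiCUDA's actual syntax.

def outline_kernel(loop_body: str, array: str, n: str) -> str:
    """Emit CUDA-like code for a 1-D loop marked '#pragma gpu kernel'."""
    return (
        f"__global__ void k(float *{array}, int {n}) {{\n"
        f"    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        f"    if (i < {n}) {{ {loop_body} }}\n"
        f"}}\n"
        f"/* host side implied by the directive:\n"
        f"   allocate d_{array}, copy {array} in,\n"
        f"   k<<<({n} + 255) / 256, 256>>>(d_{array}, {n}),\n"
        f"   copy d_{array} back, free */"
    )

print(outline_kernel("a[i] *= 2.0f;", "a", "n"))
```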
