61

Specialising dynamic techniques for implementing the Ruby programming language

Seaton, Christopher Graham January 2015 (has links)
The Ruby programming language is dynamically typed, uses dynamic and late-bound dispatch for all operators, method calls and many control structures, and provides extensive metaprogramming and introspective tooling functionality. Unlike other languages where these features are available, in Ruby their use is not avoided, and key parts of the Ruby ecosystem use them extensively, even for inner-loop operations. This makes a high-performance implementation of Ruby problematic. Existing implementations either do not attempt to dynamically optimise Ruby programs, or achieve relatively limited success in optimising Ruby programs containing these features.

One way that the community has worked around the limitations of existing Ruby implementations is to write extension modules in the C programming language. These are statically compiled and then dynamically linked into the Ruby implementation. Compared to equivalent Ruby, this C code is often more efficient for computationally intensive work. However, the interface that these C extensions provide is defined by the non-optimising reference implementation of Ruby. Implementations which want to optimise by using different internal representations must do extensive copying to provide the same interface. This then limits the performance of the C extensions in those implementations. This leaves Ruby in the difficult position where not only is the language hard to implement efficiently, but the previous workaround for that problem, C extensions, also limits efforts to improve performance.

This thesis describes an implementation of the Ruby programming language which embraces the Ruby language and optimises specifically for Ruby as it is used in practice. It provides a high-performance implementation of Ruby's dynamic features at the same time as providing a high-performance implementation of C extensions. The implementation provides a high level of compatibility with existing Ruby implementations and does not limit the available features in order to achieve high performance.

Common to all the techniques described in this thesis is the concept of specialisation. The conventional approach taken to optimise a dynamic language such as Ruby is to profile the program as it runs. Feedback from the profiling can then be used to specialise the program for the data and control flow it is actually experiencing. This thesis extends and advances that idea by specialising for conditions beyond normal data and control flow. Programs that call a method, or look up a variable or constant, by dynamic name rather than literal syntax can be specialised for the dynamic name by generalising inline caches. Debugging and introspective tooling are implemented by specialising the code for debug conditions such as the presence of a breakpoint or an attached tracing tool. C extensions are interpreted and dynamically optimised rather than being statically compiled, and the interface which the C code is programmed against is provided as an abstraction over the underlying implementation, which can then independently specialise.

The techniques developed in this thesis have a significant impact on the performance of both synthetic benchmarks and kernels from real-world Ruby programs. The implementation of Ruby which has been developed achieves an order of magnitude or better increase in performance compared to the next-best implementation. In many cases the techniques are 'zero-overhead', in that the generated machine code is exactly the same when the most dynamic features of Ruby are used as when only static features are used.
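To make the idea of generalising inline caches concrete, the C++ sketch below (with hypothetical names; this is not the thesis's actual data structure) caches dispatch targets keyed on both the receiver's class and the dynamically supplied method name, so that a metaprogrammed call such as Ruby's `send(name, ...)` hits a fast path once the observed names stabilise.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// A minimal sketch of an inline cache generalised over the *method name*,
// so that dynamic calls such as obj.send(name) can be specialised for the
// names actually observed at a call site.
struct Object;
using Method = std::function<int(Object&, const std::vector<int>&)>;

struct Shape {                       // stand-in for a Ruby class / object shape
    std::unordered_map<std::string, Method> methods;
};

struct Object {
    const Shape* shape;
};

class DynamicSendSite {              // one cache per send(...) call site
public:
    int call(Object& receiver, const std::string& name,
             const std::vector<int>& args) {
        // Fast path: both the receiver's shape and the dynamic name match
        // an earlier observation, so dispatch without any lookup.
        for (const auto& entry : cache_) {
            if (entry.shape == receiver.shape && entry.name == name)
                return entry.target(receiver, args);
        }
        // Slow path: do the full lookup once, then remember the result.
        // (A real implementation would also handle missing methods.)
        const Method& target = receiver.shape->methods.at(name);
        if (cache_.size() < kMaxEntries)
            cache_.push_back({receiver.shape, name, target});
        return target(receiver, args);
    }
private:
    struct Entry { const Shape* shape; std::string name; Method target; };
    static constexpr size_t kMaxEntries = 8;   // give up once megamorphic
    std::vector<Entry> cache_;
};

int main() {
    Shape point;
    point.methods["x"] = [](Object&, const std::vector<int>&) { return 3; };
    Object p{&point};
    DynamicSendSite site;
    return site.call(p, "x", {});    // first call populates the cache
}
```

The cache gives up after a small number of distinct (class, name) pairs, mirroring how polymorphic inline caches fall back to a generic lookup when a call site becomes megamorphic.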
62

VPI PROLOG compiler project report

Deighan, John 26 January 2010 (has links)
see document / Master of Science
63

Compile-time and Run-time Optimizations for Enhancing Locality and Parallelism on Multi-core and Many-core Systems

Baskaran, Muthu Manikandan 05 November 2009 (has links)
No description available.
64

Design and Analysis of an Instrumenting Profiler for WebAssembly

Gifford, Chandler 01 June 2019 (has links) (PDF)
This thesis presents the design, implementation, and analysis of WasmProf, an instrumenting profiler for WebAssembly programs. WebAssembly is a compiled language designed for use on the web that, at the time of this writing, is still being actively developed. At present, performance analysis for WebAssembly programs mostly consists of browsers’ built-in sampling profilers. These profilers work well in many cases but only give a statistical estimation of the distribution of function calls and are, therefore, not well-suited for more fine-grained analysis. The WasmProf instrumenting profiler fills this analysis gap. WasmProf is capable of tracking the number of calls made and the time spent in every function called within the profiled program. Analysis of WasmProf demonstrates performance equivalent to or slightly better than similar tools that perform instrumentation and dynamic analysis on WebAssembly programs.
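The core mechanism of an instrumenting profiler can be sketched briefly: every function entry increments a per-function call counter and starts a timer, and every exit accumulates the elapsed time. The C++ below is a hypothetical illustration of that idea only; WasmProf itself instruments WebAssembly programs rather than native source, and its actual data structures are not shown here.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <unordered_map>

// Per-function counters of the kind an instrumenting profiler maintains:
// how many times each function was entered and how long it ran in total.
struct FunctionStats {
    long long calls = 0;
    std::chrono::nanoseconds total{0};
};

static std::unordered_map<std::string, FunctionStats> g_stats;

// RAII guard inserted (conceptually) at every function entry point.
// Time is inclusive of callees in this simplified version.
class ProfileScope {
public:
    explicit ProfileScope(std::string name)
        : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {
        ++g_stats[name_].calls;
    }
    ~ProfileScope() {
        g_stats[name_].total += std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start_);
    }
private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

long long fib(int n) {
    ProfileScope scope("fib");            // instrumentation point
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

int main() {
    fib(25);
    for (const auto& [name, s] : g_stats)
        std::printf("%s: %lld calls, %lld us\n", name.c_str(), s.calls,
                    (long long)std::chrono::duration_cast<
                        std::chrono::microseconds>(s.total).count());
}
```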
65

CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-Core Architectures

Martinez Arroyo, Gabriel Ernesto 02 September 2011 (has links)
The use of graphics processing units (GPUs) in high-performance parallel computing continues to become steadily more prevalent, often as part of a heterogeneous system. For years, CUDA has been the de facto programming environment for nearly all general-purpose GPU (GPGPU) applications. In spite of this, the framework is available only on NVIDIA GPUs, traditionally requiring reimplementation in other frameworks in order to utilize additional multi- or many-core devices. On the other hand, OpenCL provides an open and vendor-neutral programming environment and run-time system. With implementations available for CPUs, GPUs, and other types of accelerators, OpenCL therefore holds the promise of a "write once, run anywhere" ecosystem for heterogeneous computing. Given the many similarities between CUDA and OpenCL, manually porting a CUDA application to OpenCL is almost straightforward, albeit tedious and error-prone. In response to this issue, we created CU2CL, an automated CUDA-to-OpenCL source-to-source translator with a novel design that makes clever reuse of the Clang compiler framework. Currently, the CU2CL translator covers the primary constructs found in the CUDA Runtime API, and we have successfully translated several applications from the CUDA SDK and Rodinia benchmark suite. CU2CL's translation times are reasonable, allowing for many applications to be translated at once. The number of manual changes required after executing our translator on CUDA source is minimal, with some applications compiling and working with no changes at all. The performance of our automatically translated applications via CU2CL is on par with their manually ported counterparts. / Master of Science
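The kind of correspondence such a translator must apply can be illustrated with a small lookup table pairing common CUDA constructs with their OpenCL counterparts. This is an illustrative subset expressed as C++ data, not CU2CL's actual rewrite rules, which build on the Clang framework and handle many context-dependent cases (memory-copy direction, launch configuration, error handling).

```cpp
#include <map>
#include <string>

// A toy rewrite table of the kind a CUDA-to-OpenCL source translator must
// apply. Real translation works on the compiler's view of the source rather
// than on raw strings, and several entries below need context to translate.
const std::map<std::string, std::string> kCudaToOpenCL = {
    // kernel-side qualifiers and built-ins
    {"__global__",   "__kernel"},
    {"__shared__",   "__local"},
    {"threadIdx.x",  "get_local_id(0)"},
    {"blockIdx.x",   "get_group_id(0)"},
    {"blockDim.x",   "get_local_size(0)"},
    // host-side runtime API (simplified; argument lists differ substantially)
    {"cudaMalloc",   "clCreateBuffer"},
    {"cudaFree",     "clReleaseMemObject"},
    {"cudaMemcpy",   "clEnqueueWriteBuffer / clEnqueueReadBuffer"},
    {"kernel<<<grid, block>>>(...)", "clEnqueueNDRangeKernel(...)"},
};
```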
66

A Neural Network Configuration Compiler Based on the Adaptrode Neuronal Model

McMichael, Lonny D. (Lonny Dean) 12 1900 (has links)
A useful compiler has been designed that takes a high-level neural network specification and constructs a low-level configuration file explicitly specifying all network parameters and connections. The neural network model for which this compiler was designed is the adaptrode neuronal model, and the configuration file created can be used by the Adnet simulation engine to perform network experiments. The specification language is very flexible and provides a general framework from which almost any network wiring configuration may be created. While the compiler was created for the specialized adaptrode model, the wiring specification algorithms could also be used to specify the connections in other types of networks.
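The compiler's central job, expanding a compact wiring declaration into an explicit list of connections, can be illustrated with a small C++ sketch. The specification structures and names below are invented for illustration and are much simpler than the actual Adnet configuration format; the point is only how a "fully connected" declaration expands into individual connection records.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical high-level declaration: "connect every unit of one layer to
// every unit of another". The real specification language is far richer.
struct LayerSpec      { std::string name; int units; };
struct FullConnection { std::string from, to; };

// One explicit connection in the low-level configuration file.
struct Connection { std::string from_unit, to_unit; };

std::vector<Connection> expand(const std::vector<LayerSpec>& layers,
                               const std::vector<FullConnection>& rules) {
    auto find = [&](const std::string& n) {
        for (const auto& l : layers) if (l.name == n) return l;
        return LayerSpec{n, 0};
    };
    std::vector<Connection> out;
    for (const auto& r : rules) {
        LayerSpec a = find(r.from), b = find(r.to);
        for (int i = 0; i < a.units; ++i)
            for (int j = 0; j < b.units; ++j)
                out.push_back({a.name + "[" + std::to_string(i) + "]",
                               b.name + "[" + std::to_string(j) + "]"});
    }
    return out;
}

int main() {
    auto conns = expand({{"input", 3}, {"hidden", 2}},
                        {{"input", "hidden"}});
    for (const auto& c : conns)
        std::printf("%s -> %s\n", c.from_unit.c_str(), c.to_unit.c_str());
}
```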
67

Towards a Complete Formal Semantics of Rust

White, Alexa 01 March 2021 (has links) (PDF)
Rust is a relatively new programming language with a unique memory model designed to provide the ease of use of a high-level language as well as the power and control of a low-level language while preserving memory safety. In order to prove the safety and correctness of Rust and to provide analysis tools for its use cases, it is necessary to construct a formal semantics of the language. Existing efforts to construct such a semantic model are limited in their scope and none to date have successfully captured the complete functionality of the language. This thesis focuses on K-Rust, a semantics of Rust implemented in a rewrite-based semantic framework called K, and expands it to include a larger subset of the Rust language. The K framework allows Rust programs to be executed by the defined semantic model, and the implementation is tested with several Rust programs by comparing the results of execution to the Rust compiler itself.
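A rewrite-based semantics of the kind the K framework supports defines execution as the repeated application of rules that transform a configuration until no rule applies. The C++ sketch below applies that idea to a toy arithmetic language; it is an analogy for how K-style rules drive execution, not a fragment of K-Rust or of the K framework itself.

```cpp
#include <cstdio>
#include <memory>

// A toy term language: integer literals and addition. A rewrite-based
// semantics repeatedly applies rules such as  Add(Lit a, Lit b) => Lit(a+b)
// until the term is a value, mirroring in miniature how a K-style semantics
// executes a program by rewriting its configuration.
struct Term {
    enum Kind { Lit, Add } kind;
    int value;                          // used when kind == Lit
    std::shared_ptr<Term> lhs, rhs;     // used when kind == Add
};

std::shared_ptr<Term> lit(int v) {
    return std::make_shared<Term>(Term{Term::Lit, v, nullptr, nullptr});
}
std::shared_ptr<Term> add(std::shared_ptr<Term> a, std::shared_ptr<Term> b) {
    return std::make_shared<Term>(Term{Term::Add, 0, std::move(a), std::move(b)});
}

// One small-step rewrite: an Add of two literals becomes a literal.
// Returns true if a rule applied somewhere in the term.
bool step(std::shared_ptr<Term>& t) {
    if (t->kind != Term::Add) return false;
    if (step(t->lhs) || step(t->rhs)) return true;
    if (t->lhs->kind == Term::Lit && t->rhs->kind == Term::Lit) {
        t = lit(t->lhs->value + t->rhs->value);
        return true;
    }
    return false;
}

int main() {
    auto prog = add(lit(1), add(lit(2), lit(3)));
    while (step(prog)) {}                        // rewrite to normal form
    std::printf("result = %d\n", prog->value);   // prints 6
}
```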
68

JITed: A Framework for JIT Education in the Classroom

Watts, Caleb 01 December 2021 (has links) (PDF)
The study of programming languages is a rich field within computer science, incorporating both the abstract theoretical portions of computer science and the platform-specific details. Topics studied in programming languages, chiefly compilers or interpreters, are permanent fixtures in programming that students will interact with throughout their careers. These systems are, however, considerably complicated, as they must cover a wide range of functionality in order to enable languages to be created and run. The process of educating students thus requires that the demanding workload of creating one of the systems be balanced against the time and resources present in a university classroom setting. Systems that build upon these fundamental systems can fall out of reach when the number of prerequisite concepts, and thus classes, is taken into account. Among these is the study of just-in-time (JIT) compilers, which marry the processes of interpreters and compilers for the purposes of a flexible and fast runtime. The purpose of this thesis is to present JITed, a framework within which JIT compilers can be developed with a time commitment and workload befitting a classroom setting, specifically one as short as ten weeks. A JIT compiler requires the development of both an interpreter and a compiler. This poses a problem, as classes teaching compilers and interpreters typically feature the construction of one of those systems as their term project, making the construction of both within the time span usually allotted for a single system infeasible. To remedy this, JITed features a prebuilt interpreter that provides the runtime environment necessary for the compiler portion of a JIT compiler to be built. JITed includes an interface for students to provide both their own compiler and the functionality to determine which portions of code should be compiled. The framework allows important concepts of both compilers in general and JIT compilers in particular to be taught in a reasonable timeframe.
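The division of labour described above, in which the framework supplies the interpreter and runtime while students supply the compiler and the compile-or-interpret policy, can be sketched as a small interface. All names, signatures and thresholds below are hypothetical, not JITed's actual API.

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical sketch of the seam between a framework-provided interpreter
// and a student-written JIT compiler.
struct Function { std::string name; /* bytecode, etc. */ };
using CompiledCode = std::function<int(int)>;

// The two pieces a student supplies.
struct StudentBackend {
    std::function<bool(const Function&, int call_count)> should_compile;  // policy
    std::function<CompiledCode(const Function&)> compile;                 // translation
};

// The framework's dispatcher: interpret until the policy says to compile.
class Runtime {
public:
    explicit Runtime(StudentBackend backend) : backend_(std::move(backend)) {}

    int call(const Function& f, int arg) {
        int count = ++calls_[f.name];
        auto it = compiled_.find(f.name);
        if (it != compiled_.end()) return it->second(arg);        // compiled fast path
        if (backend_.should_compile(f, count)) {
            compiled_[f.name] = backend_.compile(f);
            return compiled_[f.name](arg);
        }
        return interpret(f, arg);                                  // interpreted slow path
    }
private:
    int interpret(const Function&, int arg) { return arg + 1; }    // stand-in interpreter
    StudentBackend backend_;
    std::unordered_map<std::string, int> calls_;
    std::unordered_map<std::string, CompiledCode> compiled_;
};

int main() {
    StudentBackend student{
        [](const Function&, int count) { return count >= 3; },          // hot after 3 calls
        [](const Function&) { return CompiledCode([](int x) { return x + 1; }); }};
    Runtime rt(student);
    Function inc{"inc"};
    for (int i = 0; i < 5; ++i) std::printf("%d\n", rt.call(inc, i));
}
```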
69

Customising compilers for customisable processors

Murray, Alastair Colin January 2012 (has links)
The automatic generation of instruction set extensions to provide application-specific acceleration for embedded processors has been a productive area of research in recent years. There have been incremental improvements in the quality of the algorithms that discover and select which instructions to add to a processor. The use of automatic algorithms, however, results in instructions which are radically different from those found in conventional, human-designed RISC or CISC ISAs. This has resulted in a gap between the hardware's capabilities and the compiler's ability to exploit them. This thesis proposes and investigates the use of a high-level compiler pass that uses graph-subgraph isomorphism checking to exploit these complex instructions. Operating in a separate pass permits techniques to be applied that are uniquely suited for mapping complex instructions, but unsuitable for conventional instruction selection. The existing, mature compiler back-end can then handle the remainder of the compilation. With this method, the high-level pass was able to use 1965 different automatically produced instructions to obtain an initial average speed-up of 1.11x over 179 benchmarks evaluated on a hardware-verified cycle-accurate simulator. This result was improved following an investigation of how the produced instructions were being used by the compiler. It was established that the models the automatic tools were using to develop instructions did not take account of how well the compiler could realistically use them. Adding additional parameters to the search heuristic to account for compiler issues increased the speed-up from 1.11x to 1.24x. An alternative approach using a re-designed hardware interface was also investigated, and this achieved a speed-up of 1.26x while reducing hardware and compiler complexity. A complementary, high-level method of exploiting dual memory banks was created to increase memory bandwidth to accommodate the increased data-processing bandwidth provided by extension instructions. Finally, the compiler was considered for use in a non-conventional role where, rather than generating code, it is used to apply source-level transformations prior to the generation of extension instructions and thus affect the shape of the instructions that are generated.
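The mapping step can be made concrete with a deliberately small example: the pass searches a program's dataflow for an occurrence of an extension instruction's pattern (here, a multiply-accumulate) and replaces it with a single custom operation. The C++ sketch matches over expression trees with invented names; the thesis's pass performs full graph-subgraph isomorphism checking over dataflow graphs rather than this simple tree rewrite.

```cpp
#include <cstdio>
#include <memory>

// Expression nodes for a toy dataflow tree.
struct Node {
    enum Op { Var, Add, Mul, MulAcc } op;
    std::shared_ptr<Node> a, b, c;     // MulAcc uses all three: a*b + c
    char name = '?';                   // for Var
};
using P = std::shared_ptr<Node>;

P var(char n)   { auto x = std::make_shared<Node>(); x->op = Node::Var; x->name = n; return x; }
P add(P l, P r) { auto x = std::make_shared<Node>(); x->op = Node::Add; x->a = l; x->b = r; return x; }
P mul(P l, P r) { auto x = std::make_shared<Node>(); x->op = Node::Mul; x->a = l; x->b = r; return x; }

// Rewrite every  (x*y) + z  subtree into a single MulAcc node, mimicking the
// selection of a custom multiply-accumulate extension instruction.
void select_mulacc(P& n) {
    if (!n) return;
    select_mulacc(n->a); select_mulacc(n->b); select_mulacc(n->c);
    if (n->op == Node::Add && n->a && n->a->op == Node::Mul) {
        auto fused = std::make_shared<Node>();
        fused->op = Node::MulAcc;
        fused->a = n->a->a; fused->b = n->a->b; fused->c = n->b;
        n = fused;
    }
}

void print(const P& n) {
    switch (n->op) {
        case Node::Var: std::printf("%c", n->name); break;
        case Node::Add: std::printf("("); print(n->a); std::printf(" + "); print(n->b); std::printf(")"); break;
        case Node::Mul: std::printf("("); print(n->a); std::printf(" * "); print(n->b); std::printf(")"); break;
        case Node::MulAcc: std::printf("mulacc("); print(n->a); std::printf(", "); print(n->b); std::printf(", "); print(n->c); std::printf(")"); break;
    }
}

int main() {
    P e = add(mul(var('a'), var('b')), var('c'));   // a*b + c
    select_mulacc(e);
    print(e); std::printf("\n");                    // mulacc(a, b, c)
}
```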
70

Machine learning based mapping of data and streaming parallelism to multi-cores

Wang, Zheng January 2011 (has links)
Multi-core processors are now ubiquitous and are widely seen as the most viable means of delivering performance with increasing transistor densities. However, this potential can only be realised if the application programs are suitably parallel. Applications can either be written in parallel from scratch or converted from existing sequential programs. Regardless of how applications are parallelised, the code must be efficiently mapped onto the underlying platform to fully exploit the hardware's potential. This thesis addresses the problem of finding the best mappings of data and streaming parallelism: two types of parallelism that exist in broad and important domains such as scientific, signal processing and media applications. Despite significant progress having been made over the past few decades, state-of-the-art mapping approaches still largely rely upon hand-crafted, architecture-specific heuristics. Developing a heuristic by hand, however, often requires months of development time. As multi-core designs become increasingly diverse and complex, manually tuning a heuristic for a wide range of architectures is no longer feasible. What are needed are innovative techniques that can automatically scale with advances in multi-core technologies. In this thesis two distinct areas of computer science, namely parallel compiler design and machine learning, are brought together to develop new compiler-based mapping techniques. Using machine learning, it is possible to automatically build high-quality mapping schemes, which adapt to evolving architectures, with little human involvement. First, two techniques are proposed to find the best mapping of data parallelism. The first technique predicts whether parallel execution of a data parallel candidate is profitable on the underlying architecture. On a typical multi-core platform, it achieves almost the same (and sometimes a better) level of performance when compared to the manually parallelised code developed by independent experts. For a profitable candidate, the second technique predicts how many threads should be used to execute the candidate across different program inputs. The second technique achieves, on average, over 96% of the maximum available performance on two different multi-core platforms. Next, a new approach is developed for partitioning stream applications. This approach predicts the ideal partitioning structure for a given stream application. Based on the prediction, a compiler can rapidly search the program space (without executing any code) to generate a good partition. It achieves, on average, a 1.90x speedup over the already tuned partitioning scheme of a state-of-the-art streaming compiler.
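The flavour of the machine-learning-based approach can be conveyed with a deliberately tiny model: features of a parallel candidate are extracted at compile time, and a model trained offline on measured programs predicts a profitable thread count for an unseen program. The features, training data and 1-nearest-neighbour predictor in the C++ below are invented for illustration; the thesis uses substantially more sophisticated features and models.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Hypothetical compile-time features of a data-parallel loop.
// (In practice features should be normalised before computing distances.)
struct Features { double log_iterations, work_per_iteration, footprint_kb; };

// A training example pairs features with the best thread count found by
// exhaustive measurement on the target machine (offline, done once).
struct Sample { Features f; int best_threads; };

double dist(const Features& a, const Features& b) {
    return std::sqrt(std::pow(a.log_iterations - b.log_iterations, 2) +
                     std::pow(a.work_per_iteration - b.work_per_iteration, 2) +
                     std::pow(a.footprint_kb - b.footprint_kb, 2));
}

// 1-nearest-neighbour prediction: reuse the thread count of the most
// similar program seen during training.
int predict_threads(const std::vector<Sample>& training, const Features& f) {
    int best = 1;
    double best_d = 1e300;
    for (const auto& s : training) {
        double d = dist(s.f, f);
        if (d < best_d) { best_d = d; best = s.best_threads; }
    }
    return best;
}

int main() {
    std::vector<Sample> training = {
        {{4.0,  2.0,   64.0},  1},   // tiny loop: not worth parallelising
        {{12.0, 8.0,  512.0},  4},
        {{20.0, 16.0, 4096.0}, 8},
    };
    Features unseen{18.0, 12.0, 2048.0};
    std::printf("predicted threads: %d\n", predict_threads(training, unseen));
}
```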
