• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 3
  • 1
  • Tagged with
  • 17
  • 3
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

High performance optimizations in runtime speculative parallelization for multicore architectures

Yiapanis, Paraskevas January 2013 (has links)
Thread-Level Speculation (TLS) overcomes limitations intrinsic with conservative compile-time auto-parallelizing tools by extracting parallel threads optimistically and only ensuring absence of data dependence violations at runtime. A significant barrier for adopting TLS (implemented in software) is the overheads associated with maintaining speculative state. Previous TLS limit studies observe that on future multi-core systems it is likely to have more cores idle than those which traditional TLS would be able to harness. This thesis describes a novel compact version management data structure optimized for space overhead when using a small number of TLS threads. Furthermore, two novel software runtime parallelization systems were developed that utilize this compact data structure. The first one, MiniTLS, is optimized for fast recovery in the case of misspeculations by parallelizing the recovery procedure. The second one, Lector, is optimizedfor performance by using lightweight helper threads, along with TLS threads, to establish whether speculation can be withdrawn avoiding that way any speculative overheads. Facilitated by the novel compact representation, MiniTLS reduces the space overhead over state-of-the-art software TLS systems between 96% on 2 threads and 40% on 32 threads. MiniTLS and Lector were applied to seven Java benchmarks performing on average 7x and 8.2x faster, respectively, against the sequential versions and on average 1.7x faster than the current state-of-the-art in software TLS for 32 threads.
2

An executable formal semantics of PHP with applications to program analysis

Filaretti, Daniele January 2015 (has links)
Nowadays, many important activities in our lives involve the web. However, the software and protocols on which web applications are based were not designed with the appropriate level of security in mind. Many web applications have reached a level of complexity for which testing, code reviews and human inspection are no longer sufficient quality-assurance guarantees. Tools that employ static analysis techniques are needed in order to explore all possible execution paths through an application and guarantee the absence of undesirable behaviours. To make sure that an analysis captures the properties of interest, and to navigate the trade-offs between efficiency and precision, it is necessary to base the design and the development of static analysis tools on a firm understanding of the language to be analysed. When this underlying knowledge is missing or erroneous, tools can't be trusted no matter what advanced techniques they use to perform their task. In this Thesis, we introduce KPHP, the first executable formal semantics of PHP, one of the most popular languages for server-side web programming. Then, we demonstrate its practical relevance by developing two verification tools, of increasing complexity, on top of it - a simple verifier based on symbolic execution and LTL model checking and a general purpose, fully configurable and extensible static analyser based on Abstract Interpretation. Our LTL-based tool leverages the existing symbolic execution and model checking support offered by K, our semantics framework of choice, and constitutes a first proof-of-concept of the usefulness of our semantics. Our abstract interpreter, on the other hand, represents a more significant and novel contribution to the field of static analysis of dynamic scripting languages (PHP in particular). Although our tool is still a prototype and therefore not well suited for handling large real-world codebases, we demonstrate how our semantics-based, principled approach to the development of verification tools has lead to the design of static analyses that outperform existing tools and approaches, both in terms of supported language features, precision, and breadth of possible applications.
3

Verification and validation of JavaScript

Xiong, Wei January 2013 (has links)
JavaScript is a prototype-based, dynamically typed language with scope chains and higher-order functions. Third party web applications embedded in web pages rely on JavaScript to run inside every browser. Because of its dynamic nature, a JavaScript program is easily exploited by malicious manipulations and safety breach attacks. Therefore, it is highly desirable when developing a JavaScript application to be able to verify that it meets its expected specification and that it is safe. One of the challenges in achieving this objective is that it is hard to statically keep track of the heap-manipulating JavaScript program due to the mutability of data structures. This thesis focuses on developing a verification framework for both functional correctness and safety of JavaScript programs that involve heap-based data structures. Two automated inference-based verification frameworks are constructed based upon a variant of separation logic. The first framework defines a suitable subset of JavaScript, together with a set of operational semantics rules, a specification language and a set of inference rules. Furthermore, an axiomatic framework is presented to discover both pre/post-conditions of a JavaScript program. Hoare-style specification {Pre}prog{Post}, where program prog contains the language statements. The problem of verifying program can be reduced to the problem of proving that the execution of the statements meets the derived specification language. The second framework increases the expressiveness of the subset language to include this that can cause safety issues in JavaScript programs. It revises the operational rules and inference rules to manipulate the newly added feature. Furthermore, a safety verification algorithm is defined. Both verification frameworks have been proved sound, and the results ob- tained from evaluations validate the feasibility and precision of proposed approaches. The outcomes of this thesis confirm that it is possible to anal- yse heap-manipulating JavaScript programs automatically and precisely to discover unsafe programs.
4

Architectural patterns for Parallel Programming : models for performance estimation

Ortega-Arjona, J. L. January 2007 (has links)
Parallel Programming relies on the coordination of computing resources, so that they simultaneously work towards a common objective. Achieving this requires extra effort from the software designer, because of the increased complexity involved. Furthermore, as Parallel Programming is considered a means to improve performance, the software designer has to consider sophisticated and cost-effective practices and techniques for performance measurement and analysis. In particular, it is of great interest to obtain performance information during design stages and before implementation, since this enables the software developer to select the organisation of computations and communications between components. The Architectural Performance Modelling Method is presented as a criteria for selecting the organisation of a parallel program based on estimating its probable per formance. By considering a parallel program as an instance of a software architecture, it can be described in terms of interacting software components. Such components can be classified depending on their particular objective and their rate of change, for example, as components associated with the hardware and software environment (or Platform), components representing the fundamental structural organisation for execution and communication (or Coordination), and so on. The performance of a parallel program can be estimated as the result of the contribution of each one of those kinds of components. An Architectural Performance Model is based on selecting from the Architectural Patterns for Parallel Programming (descriptions of coodinations commonly used in Parallel Programming), a component simulator (representing a simulation of a processing component's behaviour), and a performance analysis of parallel applications (in which the information on system performance is examined). Parallel programs simulated using the Architectural Performance Modelling Method range from a complete parallel pro gram to a partially implemented program design. The simulation of parallel systems, using the information about the problem to be solved, the available resources, and architectural patterns describing overall coordinations of the parallel programs, makes it possible to identify the best performing architectural solution for the system being built.
5

Improving high performance computing using code generation and compilation techniques

Bercea, Gheorghe-Teodor January 2017 (has links)
In an ideal world, scientific applications would be expressed as high-level compositions of abstractions that encapsulate parallelism and deliver near-optimal performance with low maintainability costs. The alternative, where such abstractions are unavailable, is for application programmers to control execution using an appropriate explicitly parallel programming model. In this thesis we explore both approaches, represented by the Firedrake framework and the OpenMP programming model respectively. We also explore how OpenMP can support high level abstractions such as Firedrake. Firedrake is designed as a composition of domain-specific abstractions for solving partial differential equations via the finite element method. We extend Firedrake with support for extruded meshes frequently used in geophysical simulations. We introduce algorithms for numbering and iterating over any discretization supported by an extruded mesh. Starting with version 4.0, OpenMP computations, previously intended exclusively for the CPU, can be offloaded to accelerators and coprocessors. We introduce code generation schemes for offloading single and nested OpenMP parallel constructs in the CLANG/LLVM toolchain. The schemes map OpenMP directives to the hardware model of the accelerator enabling the programmer to use OpenMP in a prescriptive way. Performance is evaluated on the extruded mesh extensions to Firedrake as well as on LULESH, a widely ported proxy application intended to be representative of an important portion of Department of Energy’s scientific codes. In the case of Firedrake, performance is shown to reach significant percentages of theoretical hardware limits. For OpenMP, the runtime is compared against hand-optimized implementations employing the accelerator-specific CUDA C/C++ language extensions. The additions to the Firedrake framework combine both approaches into a single toolchain containing a newly introduced OpenMP 4.0 Firedrake backend with functionality equivalent to all existing Firedrake backends. OpenMP 4.0 is used as a single representation for both CPU and GPU platforms thus simplifying the application of target-specific optimisations. The OpenMP 4.0 backend improves maintainability through code reuse and will deliver gains in portability as offloading support in CLANG advances.
6

Lightweight speculative support for aggressive auto-parallelisation tools

Powell, Daniel Christopher January 2015 (has links)
With the recent move to multi-core architectures it has become important to create the means to exploit the performance made available to us by these architectures. Unfortunately parallel programming is often a difficult and time-intensive process, even to expert programmers. Auto-parallelisation tools have aimed to fill the performance gap this has created, but static analysis commonly employed by such tools are unable to provide the performance improvements required due to lack of information at compile-time. More recent aggressive parallelisation tools use profiled-execution to discover new parallel opportunities, but these tools are inherently unsafe. They require either manual confirmation that their changes are safe, completely ruling out auto-parallelisation, or they rely upon speculative execution such as software thread-level speculation (SW-TLS) to confirm safe execution at runtime. SW-TLS schemes are currently very heavyweight and often fail to provide speedups for a program. Performance gains are dependent upon suitable parallel opportunities, correct selection and configuration, and appropriate execution platforms. Little research has been completed into the automated implemention of SW-TLS programs. This thesis presents an automated, machine-learning based technique to select and configure suitable speculation schemes when appropriate. This is performed by extracting metrics from potential parallel opportunities and using them to determine if a loop is suitable for speculative execution and if so, which speculation policy should be used. An extensive evaluation of this technique is presented, verifying that SW-TLS configuration can indeed be automated and provide reliable performance gains. This work has shown that on an 8-core machine, up to 7.75X and a geometric mean of 1.64X speedups can be obtained through automatic configuration, providing on average 74% of the speedup obtainable through manual configuration. Beyond automated configuration, this thesis explores the idea that many SW-TLS schemes focus too heavily on recovery from detecting a dependence violation. Doing so often results in worse than sequential performance for many real-world applications, therefore this work hypothesises that for many highly-likely parallel candidates, discovered through aggressive parallelisation techniques, would benefit from a simple dependence check without the ability to roll back. Dependence violations become extremely expensive in this scenario, however this would be incredibly rare. With a thorough evaluation of the technique this thesis confirms the hypothesis whilst achieving speedups of up to 22.53X, and a geometric mean of 2.16X on a 32-core machine. In a competitive scheduling scenario performance loss can be restricted to at least sequential speeds, even when a dependence has been detected. As a means to lower costs further this thesis explores other platforms to aid in the execution of speculative error checking. Introduced is the use of a GPU to offload some of the costs to during execution that confirms that using an auxiliary device is a legitimate means to obtain further speedup. Evaluation demonstrates that doing so can achieve up to 14.74X and a geometric mean of 1.99X speedup on a 12-core hyperthreaded machine. Compared to standard CPU-only techniques this performs slightly slower with a geometric mean of 0.96X speedup, however this is likely to improve with upcoming GPU designs. With the knowledge that GPU’s can be used to reduce speculation costs, this thesis also investigates their use to speculatively improve execution times also. Presented is a novel SW-TLS scheme that targets GPU-based execution for use with aggressive auto-parallelisers. This scheme is executed using a competitive scheduling model, ensuring performance is no lower than sequential execution, whilst being able to provide speedups of up to 99X and on average 3.2X over sequential. On average this technique outperformed static analysis alone by a factor of 7X and achieved approximately 99% of the speedup obtained from manual parallel implementations and outperformed the state-of-the-art in GPU SW-TLS by a factor of 1.45.
7

Pre-emptive type checking in dynamically typed programs

Grech, Neville January 2013 (has links)
With the rise of languages such as JavaScript, dynamically typed languages have gained a strong foothold in the programming language landscape. These languages are very well suited for rapid prototyping and for use with agile programming methodologies. However, programmers would benefit from the ability to detect type errors in their code early, without imposing unnecessary restrictions on their programs. Here we describe a new type inference system that identifies potential type errors through a flow-sensitive static analysis. This analysis is invoked at a very late stage, after the compilation to bytecode and initialisation of the program. It computes for every expression the variable’s present (from the values that it has last been assigned) and future (with which it is used in the further program execution) types, respectively. Using this information, our mechanism inserts type checks at strategic points in the original program. We prove that these checks, inserted as early as possible, preempt type errors earlier than existing type systems. We further show that these checks do not change the semantics of programs that do not raise type errors. Preemptive type checking can be added to existing languages without the need to modify the existing runtime environment. We show this with an implementation for the Python language and demonstrate its effectiveness on a number of benchmarks.
8

Verification of message passing concurrent systems

D'Osualdo, Emanuele January 2015 (has links)
This dissertation is concerned with the development of fully-automatic methods of verification, for message-passing based concurrent systems. In the first part of the thesis we focus on Erlang, a dynamically typed, higher-order functional language with pattern-matching algebraic data types extended with asynchronous message-passing. We define a sound parametric control-flow analysis for Erlang, which we use to bootstrap the construction of an abstract model that we call Actor Communicating System (ACS). ACS are given semantics by means of Vector Addition Systems (VAS), which have rich decidable properties. We exploit VAS model checking algorithms to prove properties of Erlang programs such as unreachability of error states, mutual exclusion, or bounds on mailboxes. To assess the approach empirically, we constructed Soter, a prototype implementation of the verification method, thereby obtaining the first fully-automatic, infinite-state model checker for a core concurrent fragment of Erlang. The second part of the thesis addresses one of the major sources of imprecision in the ACS abstraction: process identities. To study the problem of algorithmically verifying models where process identities are accurately represented we turn to the π-calculus, a process algebra based around the notion of name and mobility. The full π-calculus is Turing-powerful so we focus on the depth-bounded fragment introduced by Roland Meyer, which enjoys decidability of some verification problems. The main obstacle in using depth-bounded terms as a target abstract model, is that depth-boundedness of arbitrary π-terms is undecidable. We therefore consider the problem of identifying a fragment of depth-bounded π-calculus for which membership is decidable. We define the first such fragment by means of a novel type system for the π-calculus. Typable terms are ensured to be depth-bounded. Both type-checking and type inference are shown to be decidable. The constructions are based on the novel notion of Τ-compatibility, which imposes a hierarchy between names. The type system's main goal is proving that this hierarchy is preserved under reduction, even in the presence of unbounded name creation and mobility.
9

Automated detection of structured coarse-grained parallelism in sequential legacy applications

Edler Von Koch, Tobias Joseph Kastulus January 2014 (has links)
The efficient execution of sequential legacy applications on modern, parallel computer architectures is one of today’s most pressing problems. Automatic parallelization has been investigated as a potential solution for several decades but its success generally remains restricted to small niches of regular, array-based applications. This thesis investigates two techniques that have the potential to overcome these limitations. Beginning at the lowest level of abstraction, the binary executable, it presents a study of the limits of Dynamic Binary Parallelization (Dbp), a recently proposed technique that takes advantage of an underlying multicore host to transparently parallelize a sequential binary executable. While still in its infancy, Dbp has received broad interest within the research community. This thesis seeks to gain an understanding of the factors contributing to the limits of Dbp and the costs and overheads of its implementation. An extensive evaluation using a parameterizable Dbp system targeting a Cmp with light-weight architectural Tls support is presented. The results show that there is room for a significant reduction of up to 54% in the number of instructions on the critical paths of legacy Spec Cpu2006 benchmarks, but that it is much harder to translate these savings into actual performance improvements, with a realistic hardware-supported implementation achieving a speedup of 1.09 on average. While automatically parallelizing compilers have traditionally focused on data parallelism, additional parallelism exists in a plethora of other shapes such as task farms, divide & conquer, map/reduce and many more. These algorithmic skeletons, i.e. high-level abstractions for commonly used patterns of parallel computation, differ substantially from data parallel loops. Unfortunately, algorithmic skeletons are largely informal programming abstractions and are lacking a formal characterization in terms of established compiler concepts. This thesis develops compiler-friendly characterizations of popular algorithmic skeletons using a novel notion of commutativity based on liveness. A hybrid static/dynamic analysis framework for the context-sensitive detection of skeletons in legacy code that overcomes limitations of static analysis by complementing it with profiling information is described. A proof-of-concept implementation of this framework in the Llvm compiler infrastructure is evaluated against Spec Cpu2006 benchmarks for the detection of a typical skeleton. The results illustrate that skeletons are often context-sensitive in nature. Like the two approaches presented in this thesis, many dynamic parallelization techniques exploit the fact that some statically detected data and control flow dependences do not manifest themselves in every possible program execution (may-dependences) but occur only infrequently, e.g. for some corner cases, or not at all for any legal program input. While the effectiveness of dynamic parallelization techniques critically depends on the absence of such dependences, not much is known about their nature. This thesis presents an empirical analysis and characterization of the variability of both data dependences and control flow across program runs. The cBench benchmark suite is run with 100 randomly chosen input data sets to generate whole-program control and data flow graphs (Cdfgs) for each run, which are then compared to obtain a measure of the variance in the observed control and data flow. The results show that, on average, the cumulative profile information gathered with at least 55, and up to 100, different input data sets is needed to achieve full coverage of the data flow observed across all runs. For control flow, the figure stands at 46 and 100 data sets, respectively. This suggests that profile-guided parallelization needs to be applied with utmost care, as misclassification of sequential loops as parallel was observed even when up to 94 input data sets are used.
10

Parallel computation techniques for virtual acoustics and physical modelling synthesis

Webb, Craig Jonathan January 2014 (has links)
The numerical simulation of large-scale virtual acoustics and physical modelling synthesis is a computationally expensive process. Time stepping methods, such as finite difference time domain, can be used to simulate wave behaviour in models of three-dimensional room acoustics and virtual instruments. In the absence of any form of simplifying assumptions, and at high audio sample rates, this can lead to simulations that require many hours of computation on a standard Central Processing Unit (CPU). In recent years the video game industry has driven the development of Graphics Processing Units (GPUs) that are now capable of multi-teraflop performance using highly parallel architectures. Whilst these devices are primarily designed for graphics calculations, they can also be used for general purpose computing. This thesis explores the use of such hardware to accelerate simulations of three-dimensional acoustic wave propagation, and embedded systems that create physical models for the synthesis of sound. Test case simulations of virtual acoustics are used to compare the performance of workstation CPUs to that of Nvidia’s Tesla GPU hardware. Using representative multicore CPU benchmarks, such simulations can be accelerated in the order of 5X for single precision and 3X for double precision floating-point arithmetic. Optimisation strategies are examined for maximising GPU performance when using single devices, as well as for multiple device codes that can compute simulations using billions of grid points. This allows the simulation of room models of several thousand cubic metres at audio rates such as 44.1kHz, all within a useable time scale. The performance of alternative finite difference schemes is explored, as well as strategies for the efficient implementation of boundary conditions. Creating physical models of acoustic instruments requires embedded systems that often rely on sparse linear algebra operations. The performance efficiency of various sparse matrix storage formats is detailed in terms of the fundamental operations that are required to compute complex models, with an optimised storage system achieving substantial performance gains over more generalised formats. An integrated instrument model of the timpani drum is used to demonstrate the performance gains that are possible using the optimisation strategies developed through this thesis.

Page generated in 0.1281 seconds