1. Ensuring performance and correctness for legacy parallel programs
McPherson, Andrew John, January 2015
Modern computers are based on manycore architectures, with multiple processors on a single silicon chip. In this environment, programmers must exploit parallelism to make full use of the available cores, either within a single chip, normally using shared-memory programming, or at a larger scale on a cluster of chips, normally using message-passing. Legacy programs written in either paradigm face issues when run on modern manycore architectures. For message-passing the problem is one of performance: clusters built from manycores necessarily introduce tiered topologies that unaware programs may fail to exploit. For shared-memory it is one of correctness: modern systems employ more relaxed memory consistency models, under which legacy programs were not designed to operate. Solutions to this correctness problem exist, but they are necessarily conservative and therefore introduce a performance problem. This thesis addresses these problems, largely through compile-time analysis and transformation. The first technique proposed is a method for statically determining the communication graph of an MPI program, which is then used to optimise process placement on a cluster of CMPs. Using the 64-process versions of the NAS parallel benchmarks, we see an average 28% (7%) improvement in communication localisation over by-rank scheduling for 8-core (12-core) CMP-based clusters, representing the maximum possible improvement. Secondly, we move to the shared-memory paradigm, identifying and proving necessary conditions for a read to be an acquire. These conditions can improve solutions in several application areas, two of which we then explore. We apply our acquire signatures to the problem of fence placement for legacy well-synchronised programs. By applying our signatures, we can reduce the number of fences placed by an average of 62%, leading to a speedup of up to 2.64x over an existing practical technique.
Finally, we develop a dynamic synchronisation detection tool known as SyncDetect. This proof-of-concept tool leverages our acquire signatures to detect ad hoc synchronisations in running programs more accurately, and reports their locations in the source code to the programmer. The tool aims to assist programmers both with the notoriously difficult problem of parallel debugging and with manually porting legacy programs to more modern (relaxed) memory consistency models.
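The communication-aware placement idea from the first contribution can be illustrated with a small sketch. This is a hypothetical greedy heuristic, not the thesis's actual algorithm (which derives the communication graph statically from the MPI source): given a matrix of per-pair communication volumes, it packs heavily communicating ranks onto the same CMP node so that more traffic stays on-chip. All function names are illustrative.

```python
# Hypothetical sketch: greedy communication-aware placement of MPI ranks
# onto CMP nodes. The greedy heuristic is illustrative only; the thesis
# builds the communication graph by static analysis and may place
# processes differently.

def greedy_placement(comm, cores_per_node):
    """Group ranks so that heavily communicating pairs share a node.

    comm: symmetric matrix, comm[i][j] = bytes exchanged between ranks i, j.
    Returns a list of nodes, each a list of rank ids.
    """
    n = len(comm)
    unplaced = set(range(n))
    nodes = []
    while unplaced:
        # Seed a new node with the unplaced rank carrying the most traffic.
        seed = max(unplaced, key=lambda r: sum(comm[r]))
        node = [seed]
        unplaced.remove(seed)
        while len(node) < cores_per_node and unplaced:
            # Add the rank that talks most to ranks already on this node.
            best = max(unplaced, key=lambda r: sum(comm[r][m] for m in node))
            node.append(best)
            unplaced.remove(best)
        nodes.append(node)
    return nodes

def local_volume(comm, nodes):
    """Bytes of communication that stay within a single node."""
    return sum(comm[i][j]
               for node in nodes
               for i in node for j in node if i < j)
```

On a 4-rank example where ranks 0 and 2 (and 1 and 3) exchange most of the data, this placement localises their traffic, whereas by-rank scheduling onto 2-core nodes would send it across the network.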
2. Analysis and parameter prediction of compiler transformation for graphics processors
Magni, Alberto, January 2016
In the last decade, graphics processors (GPUs) have been extensively used to solve computationally intensive problems, and a variety of GPU architectures from different hardware manufacturers have shipped within the space of a few years. OpenCL has been introduced as the standard cross-vendor programming framework for GPU computing. Writing and optimising OpenCL applications is a challenging task: the programmer has to take care of many low-level details. The task is harder still when the goal is to improve performance across a wide range of devices, since OpenCL does not guarantee performance portability. In this thesis we focus on the analysis and portability of compiler optimisations. We describe the implementation of a portable compiler transformation, thread-coarsening, which increases the amount of work carried out by a single thread running on the GPU, with the goal of reducing the number of redundant instructions executed by the parallel application. The first contribution is a technique for analysing the performance improvements and degradations caused by the transformation: we study how hardware performance counters change when coarsening is applied, and thereby identify the root causes of the resulting variations in execution time. As a second contribution, we study the relative performance of coarsening over multiple input sizes, showing that the speedups given by coarsening are stable for problem sizes larger than a threshold we call the saturation point. We exploit the existence of the saturation point to speed up iterative compilation. The last contribution of the work is a machine learning technique that automatically selects a coarsening configuration that improves performance. The technique is based on an iterative model built using a neural network, which is trained once per GPU model and used across several programs.
To demonstrate the flexibility of our techniques, all of our experiments were run on multiple GPU models from different vendors.
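The thread-coarsening transformation described above can be sketched in miniature. The real transformation rewrites OpenCL kernels inside the compiler; here, as an illustrative assumption, a "kernel" is modelled as a per-thread Python function, so the stride-based merge of several original threads into one is visible. The factor and stride scheme shown is one common coarsening shape, not necessarily the exact one the thesis implements.

```python
# Hypothetical sketch of thread coarsening on a vector-add "kernel".
# One coarsened thread performs the work of `factor` original threads,
# striding by the reduced launch size. Assumes n is divisible by factor.

def kernel(tid, a, b, out):
    # Fine-grained kernel: one thread handles one element.
    out[tid] = a[tid] + b[tid]

def coarsened_kernel(tid, a, b, out, factor, num_threads):
    # Coarsened kernel: `factor` elements per thread, strided so that
    # neighbouring threads still touch neighbouring elements.
    for k in range(factor):
        i = tid + k * num_threads
        out[i] = a[i] + b[i]

def launch(n, factor=4):
    a = list(range(n))
    b = [2 * x for x in range(n)]
    out = [0] * n
    num_threads = n // factor          # fewer threads are launched
    for tid in range(num_threads):
        coarsened_kernel(tid, a, b, out, factor, num_threads)
    return out
```

The redundant-instruction saving the abstract mentions comes from loop overhead and common subexpressions being shared across the `factor` merged iterations, which this toy model does not capture but the control flow mirrors.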
3. Superoptimisation: provably optimal code generation using answer set programming
Crick, Thomas, January 2009
No description available.
4. Parametric Potential-Outcome Survival Models for Causal Inference
Gong, Zhaojing, January 2008
Estimating causal effects in clinical trials is often complicated by treatment noncompliance and missing outcomes. In time-to-event studies, estimation is further complicated by censoring, a type of missing outcome whose mechanism may be non-ignorable. While new estimators have recently been proposed to account for noncompliance and missing outcomes, few studies have specifically considered time-to-event outcomes, for which even the intention-to-treat (ITT) estimator is potentially biased when estimating causal effects of assigned treatment.
In this thesis, we develop a series of parametric potential-outcome (PPO) survival models for the analysis of randomised controlled trials (RCTs) with time-to-event outcomes and noncompliance. Both ignorable and non-ignorable censoring mechanisms are considered. We approach model-fitting from a likelihood-based perspective, using the EM algorithm to locate maximum likelihood estimates. We are not aware of any previous work that addresses these complications jointly. In addition, we give new formulations of the average causal effect (ACE) and the complier average causal effect (CACE) suited to survival analysis. To illustrate the likelihood-based method proposed in this thesis, the HIP breast cancer trial data (Baker 1998; Shapiro 1988) were re-analysed using two specific PPO-survival models, the Weibull-based and log-normal-based PPO-survival models, which assume that the failure-time and censoring-time distributions both follow Weibull or log-normal distributions respectively. Furthermore, an extended PPO-survival model is derived, which permits investigation of causal effects after adjusting for certain pre-treatment covariates. This is an important contribution to the potential-outcomes, survival and RCT literature. For comparison, the Frangakis-Rubin (F-R) model (Frangakis 1999) was also applied to the HIP breast cancer trial data; to date, the F-R model had not been applied to any time-to-event data in the literature.
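The basic building block of a Weibull-based model with censoring can be sketched as follows. This is only the standard right-censored Weibull log-likelihood, shown as a minimal, assumed-form illustration: the thesis's PPO-survival models additionally mix over latent compliance classes and are fitted with EM, none of which appears here.

```python
# Hypothetical sketch: Weibull log-likelihood for right-censored
# time-to-event data. An observed failure at t contributes log f(t);
# a right-censored observation contributes log S(t) = -(t/scale)^shape.

import math

def weibull_loglik(times, events, shape, scale):
    """times: observed times; events[i] = 1 if failure, 0 if censored."""
    ll = 0.0
    for t, d in zip(times, events):
        z = (t / scale) ** shape
        if d:
            # log density: log(k/lam) + (k-1) log(t/lam) - (t/lam)^k
            ll += math.log(shape / scale) \
                  + (shape - 1) * math.log(t / scale) - z
        else:
            # log survival function
            ll += -z
    return ll
```

With shape = scale = 1 the Weibull reduces to the unit exponential, so a failure at t = 1 and a censoring at t = 2 contribute -1 and -2 respectively, a handy sanity check. An EM fit for the full PPO model would maximise a weighted sum of such terms over the latent compliance classes.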