Global ETD Search

141	Runtime Verification and Debugging of Concurrent Software Zhang, Lu 29 July 2016 (has links) Our reliance on software has been growing fast over the past decades as the pervasive use of computer and software penetrated not only our daily life but also many critical applications. As the computational power of multi-core processors and other parallel hardware keeps increasing, concurrent software that exploit these parallel computing hardware become crucial for achieving high performance. However, developing correct and efficient concurrent software is a difficult task for programmers due to the inherent nondeterminism in their executions. As a result, concurrency related software bugs are among the most troublesome in practice and have caused severe problems in recent years. In this dissertation, I propose a series of new and fully automated methods for verifying and debugging concurrent software. They cover the detection, prevention, classification, and repair of some important types of bugs in the implementation of concurrent data structures and client-side web applications. These methods can be adopted at various stages of the software development life cycle, to help programmers write concurrent software correctly as well as efficiently. / Ph. D. Concurrency Verification Debugging Program Repair Quasi Linearizability Concurrent Data Structure Web Application JavaScript
142	Dynamic Invariant Generation for Concurrent Programs Chattopadhyay, Arijit 23 June 2014 (has links) We propose a fully automated and dynamic method for generating likely invariants from multithreaded programs and then leveraging these invariants to infer atomic regions and diagnose concurrency errors in the software code. Although existing methods for dynamic invariant generation perform reasonably well on sequential programs, for multithreaded programs, their effectiveness often reduces dramatically in terms of both the number of invariants that they can generate and the likelihood of them being true invariants. We solve this problem by developing a new dynamic invariant generator, which consists of a new LLVM based code instrumentation tool, an INSPECT based thread interleaving explorer, and a customized inference engine inside Daikon. We have evaluated the resulting system on public domain multithreaded C/C++ benchmarks. Our experiments show that the new method is effective in generating high-quality invariants. Furthermore, the state and transition invariants generated by our new method have been proved useful both in error diagnosis and in identifying likely atomic regions in the concurrent software code. / Master of Science Concurrency Likely Invariant Dynamic Invariant Generation Partial Order Reduction Error Diagnosis Atomic Region
143	Scalable and Energy Efficient Execution Methods for Multicore Systems Li, Dong 16 February 2011 (has links) Multicore architectures impose great pressure on resource management. The exploration spaces available for resource management increase explosively, especially for large-scale high end computing systems. The availability of abundant parallelism causes scalability concerns at all levels. Multicore architectures also impose pressure on power management. Growth in the number of cores causes continuous growth in power. In this dissertation, we introduce methods and techniques to enable scalable and energy efficient execution of parallel applications on multicore architectures. We study strategies and methodologies that combine DCT and DVFS for the hybrid MPI/OpenMP programming model. Our algorithms yield substantial energy saving (8.74% on average and up to 13.8%) with either negligible performance loss or performance gain (up to 7.5%). To save additional energy for high-end computing systems, we propose a power-aware MPI task aggregation framework. The framework predicts the performance effect of task aggregation in both computation and communication phases and its impact in terms of execution time and energy of MPI programs. Our framework provides accurate predictions that lead to substantial energy saving through aggregation (64.87% on average and up to 70.03%) with tolerable performance loss (under 5%). As we aggregate multiple MPI tasks within the same node, we have the scalability concern of memory registration for high performance networking. We propose a new memory registration/deregistration strategy to reduce registered memory on multicore architectures with helper threads. We investigate design polices and performance implications of the helper thread approach. Our method efficiently reduces registered memory (23.62% on average and up to 49.39%) and avoids memory registration/deregistration costs for reused communication memory. Our system enables the execution of application input sets that could not run to the completion with the memory registration limitation. / Ph. D. Performance Modeling and Analysis Multicore Processors Power-Aware Computing Concurrency Throttling High-Performance Computing
144	Improving the Efficiency of Parallel Applications on Multithreaded and Multicore Systems Curtis-Maury, Matthew 15 April 2008 (has links) The scalability of parallel applications executing on multithreaded and multicore multiprocessors is often quite limited due to large degrees of contention over shared resources on these systems. In fact, negative scalability frequently occurs such that a non-negligable performance loss is observed through the use of more processors and cores. In this dissertation, we present a prediction model for identifying efficient operating points of concurrency in multithreaded scientific applications in terms of both performance as a primary objective and power secondarily. We also present a runtime system that uses live analysis of hardware event rates through the prediction model to optimize applications dynamically. We discuss a dynamic, phase-aware performance prediction model (DPAPP), which combines statistical learning techniques, including multivariate linear regression and artificial neural networks, with runtime analysis of data collected from hardware event counters to locate optimal operating points of concurrency. We find that the scalability model achieves accuracy approaching 95%, sufficiently accurate to identify improved concurrency levels and thread placements from within real parallel scientific applications. Using DPAPP, we develop a prediction-driven runtime optimization scheme, called ACTOR, which throttles concurrency so that power consumption can be reduced and performance can be set at the knee of the scalability curve of each parallel execution phase in an application. ACTOR successfully identifies and exploits program phases where limited scalability results in a performance loss through the use of more processing elements, providing simultaneous reductions in execution time by 5%-18% and power consumption by 0%-11% across a variety of parallel applications and architectures. Further, we extend DPAPP and ACTOR to include support for runtime adaptation of DVFS, allowing for the synergistic exploitation of concurrency throttling and DVFS from within a single, autonomically-acting library, providing improved energy-efficiency compared to either approach in isolation. / Ph. D. power-aware computing high-performance computing performance prediction multicore processors runtime adaptation concurrency throttling
145	Prediction Models for Multi-dimensional Power-Performance Optimization on Many Cores Shah, Ankur Savailal 28 May 2008 (has links) Power has become a primary concern for HPC systems. Dynamic voltage and frequency scaling (DVFS) and dynamic concurrency throttling (DCT) are two software tools (or knobs) for reducing the dynamic power consumption of HPC systems. To date, few works have considered the synergistic integration of DVFS and DCT in performance-constrained systems, and, to the best of our knowledge, no prior research has developed application-aware simultaneous DVFS and DCT controllers in real systems and parallel programming frameworks. We present a multi-dimensional, online performance prediction framework, which we deploy to address the problem of simultaneous runtime optimization of DVFS, DCT, and thread placement on multi-core systems. We present results from an implementation of the prediction framework in a runtime system linked to the Intel OpenMP runtime environment and running on a real dual-processor quad-core system as well as a dual-processor dual-core system. We show that the prediction framework derives near-optimal settings of the three power-aware program adaptation knobs that we consider. Our overall runtime optimization framework achieves significant reductions in energy (12.27% mean) and ED² (29.6% mean), through simultaneous power savings (3.9% mean) and performance improvements (10.3% mean). Our prediction and adaptation framework outperforms earlier solutions that adapt only DVFS or DCT, as well as one that sequentially applies DCT then DVFS. Further, our results indicate that prediction-based schemes for runtime adaptation compare favorably and typically improve upon heuristic search-based approaches in both performance and energy savings. / Master of Science concurrency throttling power-aware computing runtime adaptation performance prediction high-performance computing Multicore processors
146	Improving Performance of Highly-Programmable Concurrent Applications by Leveraging Parallel Nesting and Weaker Isolation Levels Niles, Duane Francis Jr. 15 July 2015 (has links) The recent development of multi-core computer architectures has largely affected the creation of everyday applications, requiring the adoption of concurrent programming to significantly utilize the divided processing power of computers. Applications must be split into sections able to execute in parallel, without any of these sections conflicting with one another, thereby necessitating some form of synchronization to be declared. The most commonly used methodology is lock-based synchronization; although, to improve performance the most, developers must typically form complex, low-level implementations for large applications, which can easily create potential errors or hindrances. An abstraction from database systems, known as transactions, is a rising concurrency control design aimed to circumvent the challenges with programmability, composability, and scalability in lock-based synchronization. Transactions execute their operations speculatively and are capable of being restarted (or rolled back) when there exist conflicts between concurrent actions. As such issues can occur later in the lifespans of transactions, entire rollbacks are not that effective for performance. One particular method, known as nesting, was created to counter that drawback. Nesting is the act of enclosing transactions within other transactions, essentially dividing the work into pieces called sub-transactions. These sub-transactions can roll back without affecting the entire main transaction, although general nesting models only allow one sub-transaction to perform work at a time. The first main contribution in this thesis is SPCN, an algorithm that parallelizes nested transactions while automatically processing any potential conflicts that may arise, eliminating the burden of additional processing from the application developers. Two versions of SPCN exist: Strict, which enforces the sub-transactions' work to be made visible in a serialized order; and Relaxed, which allows sub-transactions to distribute their information immediately as they finish (therefore invalidation may occur after-the-fact and must be handled). Despite the additional logic required by SPCN, it outperforms traditional closed nesting by 1.78x at the lowest and 3.78x at the highest in the experiments run. Another method to alter transactional execution and boost performance is to relax the rules of visibility for parallel operations (known as their isolation). Depending on the application, correctness is not broken even if some transactions see external work that may later be undone due to a rollback, or if an object is written while another transaction is using an older instance of its data. With lock-based synchronization, developers would have to explicitly design their application with varying amounts of locks, and different lock organizations or hierarchies, to change the strictness of the execution. With transactional systems, the processing performed by the system itself can be set to utilize different rulings, which can change the performance of an application without requiring it to be largely redesigned. This notion leads to the second contribution in this thesis: AsR, or As-Serializable transactions. Serializability is the general form of isolation or strictness for transactions in many applications. In terms of execution, its definition is equivalent to only one transaction running at a time in a given system. Many transactional systems use their own internal form of locking to create Serializable executions, but it is typically too strict for many applications. AsR transactions allow the internal processing to be relaxed while additional meta-data is used external to the system, without requiring any interaction from the developer or any changes to the given application. AsR transactions offer multiple orders of magnitude more in throughput in highly-contentious scenarios, due to their capability to outlast traditional levels of isolation. / Master of Science Concurrency Transactional Memory Parallel Nesting Distributed Systems Databases Transactions Serializability Weaker Isolation Levels
147	Modeling and Runtime Systems for Coordinated Power-Performance Management Li, Bo 28 January 2019 (has links) Emergent systems in high-performance computing (HPC) expect maximal efficiency to achieve the goal of power budget under 20-40 megawatts for 1 exaflop set by the Department of Energy. To optimize efficiency, emergent systems provide multiple power-performance control techniques to throttle different system components and scale of concurrency. In this dissertation, we focus on three throttling techniques: CPU dynamic voltage and frequency scaling (DVFS), dynamic memory throttling (DMT), and dynamic concurrency throttling (DCT). We first conduct an empirical analysis of the performance and energy trade-offs of different architectures under the throttling techniques. We show the impact on performance and energy consumption on Intel x86 systems with accelerators of Intel Xeon Phi and a Nvidia general-purpose graphics processing unit (GPGPU). We show the trade-offs and potentials for improving efficiency. Furthermore, we propose a parallel performance model for coordinating DVFS, DMT, and DCT simultaneously. We present a multivariate linear regression-based approach to approximate the impact of DVFS, DMT, and DCT on performance for performance prediction. Validation using 19 HPC applications/kernels on two architectures (i.e., Intel x86 and IBM BG/Q) shows up to 7% and 17% prediction error correspondingly. Thereafter, we develop the metrics for capturing the performance impact of DVFS, DMT, and DCT. We apply the artificial neural network model to approximate the nonlinear effects on performance impact and present a runtime control strategy accordingly for power capping. Our validation using 37 HPC applications/kernels shows up to a 20% performance improvement under a given power budget compared with the Intel RAPL-based method. / Ph. D. / System efficiency on high-performance computing (HPC) systems is the key to achieving the goal of power budget for exascale supercomputers. Techniques for adjusting the performance of different system components can help accomplish this goal by dynamically controlling system performance according to application behaviors. In this dissertation, we focus on three techniques: adjusting CPU performance, memory performance, and the number of threads for running parallel applications. First, we profile the performance and energy consumption of different HPC applications on both Intel systems with accelerators and IBM BG/Q systems. We explore the trade-offs of performance and energy under these techniques and provide optimization insights. Furthermore, we propose a parallel performance model that can accurately capture the impact of these techniques on performance in terms of job completion time. We present an approximation approach for performance prediction. The approximation has up to 7% and 17% prediction error on Intel x86 and IBM BG/Q systems respectively under 19 HPC applications. Thereafter, we apply the performance model in a runtime system design for improving performance under a given power budget. Our runtime strategy achieves up to 20% performance improvement to the baseline method. Parallel Performance Modeling Dynamic Voltage and Frequency Scaling Dynamic Memory Throttling Dynamic Concurrency Throttling Shared-Memory Systems
148	Explicit-State Model Checking of Concurrent x86-64 Assembly Bharadwaj, Abhijith Ananth 10 July 2020 (has links) The thesis presents xavier, a novel tool-set for model checking of concurrent x86-64 assembly programs, via Partial Order Reduction (POR). xavier{} presents a realistic platform for systematically exploring and analyzing the state-space of concurrent x86 assembly programs, with the aim of detecting bugs via assertion failures in mainstream programs. Recently, a number of state-of-the-art model checking solutions have been introduced to efficiently explore the state-space of concurrent programs, using POR algorithms. However, such solutions are inefficient while analyzing stateful programming languages, such as the x86 assembly language, due to their low level of abstraction. To this end, xavier{} makes two contributions: i) a novel order-sensitivity based POR algorithm, that is applicable to concurrent x86 assembly, ii) an x86 machine-model that can accurately perform relaxed-consistency emulation of concurrent x86 assembly, without the need for any translations. We demonstrate the applicability of xavier{} through an evaluation on several classical mutual-exclusion benchmarks and mainstream benchmarks from the Userspace Read-Copy-Update (URCU) concurrency library, where the benchmarks range from $250-3700$ lines of x86 assembly. The framework is the first that supports systematic model checking of concurrent x86 assembly programs, and the effectiveness of xavier{} is demonstrated by reproducing a concurrency issue of threads accessing intermediate states in the URCU library, which stems from an assumption violation. / Master of Science / Sound verification of multi-threaded programs necessitate a systematic analysis of program state-spaces that result from thread interactions. Consequently, model-checking cite{godefroid1997model, Clarke2018} has been one of the prominent methods used to tackle the verification of multi-threaded programs. However, existing model-checking solutions are inefficient while analyzing stateful programming languages, such as the x86 assembly language, due to the solutions' higher level of abstraction. Therefore, the thesis presents xavier, a novel tool-set and a realistic platform for systematically exploring and analyzing the state-space of mainstream concurrent x86 assembly programs, with the aim of detecting bugs via assertion failures. To this end, xavier{} makes two contributions: i) a novel order-sensitivity based Partial Order Reduction algorithm, which efficiently explores the state space of concurrent x86 assembly, ii) an x86 machine-model that can accurately emulate the execution of concurrent x86 assembly, without the need for any translations. We demonstrate the applicability of xavier{} through an evaluation on several classical mutual-exclusion and mainstream benchmarks from the Userspace Read-Copy-Update (URCU) concurrency library, where the benchmarks range from $250-3700$ lines of x86 assembly. Moreover, we demonstrate the effectiveness of xavier{} by reproducing a concurrency issue in the URCU library, which manifests as a result of an assumption violation. Partial Order Reduction x86 Machine-Code Verification Software Model Checking Concurrency
149	Analysis Techniques for Concurrent Programming Languages Tamarit Muñoz, Salvador 02 September 2013 (has links) Los lenguajes concurrentes est an cada d a m as presentes en nuestra sociedad, tanto en las nuevas tecnolog as como en los sistemas utilizados de manera cotidiana. M as a un, dada la actual distribuci on de los sistemas y su arquitectura interna, cabe esperar que este hecho siga siendo una realidad en los pr oximos a~nos. En este contexto, el desarrollo de herramientas de apoyo al desarrollo de programas concurrentes se vuelve esencial. Adem as, el comportamiento de los sistemas concurrentes es especialmente dif cil de analizar, por lo que cualquier herramienta que ayude en esta tarea, a un cuando sea limitada, ser a de gran utilidad. Por ejemplo, podemos encontrar herramientas para la depuraci on, an alisis, comprobaci on, optimizaci on, o simpli caci on de programas. Muchas de ellas son ampliamente utilizadas por los programadores hoy en d a. El prop osito de esta tesis es introducir, a trav es de diferentes lenguajes de programaci on concurrentes, t ecnicas de an alisis que puedan ayudar a mejorar la experiencia del desarrollo y publicaci on de software para modelos concurrentes. En esta tesis se introducen tanto an alisis est aticos (aproximando todas las posibles ejecuciones) como din amicos (considerando una ejecuci on en concreto). Los trabajos aqu propuestos di eren lo su ciente entre s para constituir ideas totalmente independientes, pero manteniendo un nexo com un: el hecho de ser un an alisis para un lenguaje concurrente. Todos los an alisis presentados han sido de nidos formalmente y se ha probado su correcci on, asegurando que los resultados obtenidos tendr an el grado de abilidad necesario en sistemas que lo requieran, como por ejemplo, en sistemas cr ticos. Adem as, se incluye la descripci on de las herramientas software que implementan las diferentes ideas propuestas. Esto le da al trabajo una utilidad m as all a del marco te orico, permitiendo poner en pr actica y probar con ejemplos reales los diferentes an alisis. Todas las ideas aqu presentadas constituyen, por s mismas, propuestas aplicables en multitud de contextos y problemas actuales. Adem as, individualmente sirven de punto de partida para otros an alisis derivados, as como para la adaptaci on a otros lenguajes de la misma familia. Esto le da un valor a~nadido a este trabajo, como bien atestiguan algunos trabajos posteriores que ya se est an bene ciando de los resultados obtenidos en esta tesis. / Concurrent languages are increasingly present in our society, both in new technologies and in the systems used on a daily basis. Moreover, given the current systems distribution and their internal architecture, one can expect that this remains so in the coming years. In this context, the development of tools to support the implementation of concurrent programs becomes essential. Futhermore, the behavior of concurrent systems is particularly difficult to analyse, so that any tool that helps in this task, even if in a limited way, will be very useful. For example, one can find tools for debugging, analysis, testing, optimisation, or simplification of programs, which are widely used by programmers nowadays. The purpose of this thesis is to introduce, through various concurrent programming languages, some analysis techniques that can help to improve the experience of the software development and release for concurrent models. This thesis introduces both static (approximating all possible executions) and dynamic (considering a specific execution) analysis. The topics considered here differ enough from each other to be fully independent. Nevertheless, they have a common link: they can be used to analyse properties of a concurrent programming language. All the analyses presented here have been formally defined and their correctness have been proved, ensuring that the results will have the reliability degree which is needed for some systems (for instance, for critical systems). It also includes a description of the software tools that implement the different ideas proposed. This gives the work a usefulness well beyond the theoretical aspect, allowing us to put it in practice and to test the different analyses with real-world examples All the ideas here presented are, by themselves, approaches that can be applied in many current contexts and problems. Moreover, individually they serve as a starting point for other derived analysis, as well as for the adaptation to other languages of the same family. This gives an added value to this work, a fact confirmed by some later works that are already benefiting from the results obtained in this thesis. / Tamarit Muñoz, S. (2013). Analysis Techniques for Concurrent Programming Languages [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/31651 LENGUAJES Y SISTEMAS INFORMATICOS
150	Targeted Client Synthesis for Detecting Concurrency Bugs Samak, Malavika January 2016 (has links) (PDF) Detecting concurrency bugs can be challenging due to the intricacies associated with their manifestation. These intricacies correspond to identifying the methods that need to be invoked concurrently, the inputs passed to these methods and the interleaving of the threads that cause the erroneous behavior. Neither fuzzing-based testing techniques nor over-approximate static analyses are well positioned to detect subtle concurrency defects while retaining high accuracy alongside satisfactory coverage. While dynamic analysis techniques have been proposed to overcome some of the challenges in detecting concurrency bugs, we observe that their success is critically dependent on the availability of eﬀective multithreaded clients. Without a priori knowledge of the defects, manually constructing defect-revealing multithreaded clients is non-trivial. In this thesis, we design an approach to address the problem of automatically generate clients for detecting concurrency bugs in multithreaded libraries. The key insight underlying our design is that a subset of the properties observed when the defects manifest in a concur-rent execution can also be observed in a sequential execution. The input to our approach is a library implementation and a sequential testsuite, and the output is a set of multithreaded clients that can be used to reveal defects in the input library implementation. Dynamic defect detectors can execute the clients and analyze the resulting traces to report various kinds of defects including deadlocks, data races and atomicity violations. Furthermore, the clients can also be used by testing frameworks to report assertion violations. We propose two variants of our design – (a) path-agnostic client generation, and (b) path-aware client generation. The path-agnostic client generation process helps in detection of potential bugs present in the paths executed by the input sequential testsuite. It does not attempt to explore newer paths by satisfying path conditions either by modifying the input or by scheduling the threads appropriately. The generated clients are used to expose deadlocks, data races and atomicity violations. Our analysis analyzes the execution traces obtained from executing the input sequential clients and produces a concurrent client program that drives shared objects via library methods calls to states conducive for triggering deadlocks, data races or atomicity violations. For path-aware client generation, our approach explores newer paths that are not covered by the input sequential testsuite to generate clients. For this purpose, we design a directed, iterative and scalable engine that combines the strengths of static and dynamic analysis to help synthesize both multithreaded clients and schedules that violate complex correctness conditions expressed by the developer. Apart from the library implementation and the sequential testsuite as input, this engine also accepts a specification of correctness as input. Then, it iteratively refines each client from the input sequential testsuite to generate an ex-ecution that can break the input specification. Each step of the iterative process includes statically identifying sub-goals towards the goal of failing the specification, generating a plan toward meeting these goals, and merging of the paths traversed dynamically with the plan computed statically via constraint solving to generate a new client. The engine reports full reproduction scenarios, guaranteed to be true, for the bugs it finds. We have implemented prototypes that incorporate the aforementioned ideas and validated them by applying them on 29 well-tested concurrent classes from popular Java libraries, including the latest version of JDK. We are able to automatically generate clients that helped expose more than 300 concurrency bugs including deadlocks, data races, atomicity violations and assertion violations. We reported many previously unknown bugs to the developers of these libraries resulting in either fixes to the code or changes to the documentation pertaining to the thread-safe behavior of the relevant classes. On average, the time taken to analyze a class and generate clients for it is less than two minutes. We believe that the demonstrated eﬀectiveness of our prototypes in helping expose deep bugs in popular Java libraries makes the design, proposed in this thesis, a vital cog in the future development and deployment of dynamic concurrency bug detectors. Concurrency Bugs Dynamic Analysis Program Synthesis Deadlock Detection Bytecode Instrumentation Targeted Client Synthesis Dynamic Deadlock Detection Multithreaded Client Synthesis Atomicity Violations Shared Memory Dynamic Concurrency Bug Detectors Racy Tests Bug Detection Computer Science

Search results