• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 125
  • 23
  • 13
  • 9
  • 8
  • 3
  • 3
  • 2
  • 2
  • 1
  • 1
  • 1
  • Tagged with
  • 253
  • 78
  • 53
  • 50
  • 44
  • 42
  • 39
  • 37
  • 35
  • 32
  • 32
  • 30
  • 29
  • 27
  • 25
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
141

Runtime Verification and Debugging of Concurrent Software

Zhang, Lu 29 July 2016 (has links)
Our reliance on software has been growing fast over the past decades as the pervasive use of computer and software penetrated not only our daily life but also many critical applications. As the computational power of multi-core processors and other parallel hardware keeps increasing, concurrent software that exploit these parallel computing hardware become crucial for achieving high performance. However, developing correct and efficient concurrent software is a difficult task for programmers due to the inherent nondeterminism in their executions. As a result, concurrency related software bugs are among the most troublesome in practice and have caused severe problems in recent years. In this dissertation, I propose a series of new and fully automated methods for verifying and debugging concurrent software. They cover the detection, prevention, classification, and repair of some important types of bugs in the implementation of concurrent data structures and client-side web applications. These methods can be adopted at various stages of the software development life cycle, to help programmers write concurrent software correctly as well as efficiently. / Ph. D.
142

Dynamic Invariant Generation for Concurrent Programs

Chattopadhyay, Arijit 23 June 2014 (has links)
We propose a fully automated and dynamic method for generating likely invariants from multithreaded programs and then leveraging these invariants to infer atomic regions and diagnose concurrency errors in the software code. Although existing methods for dynamic invariant generation perform reasonably well on sequential programs, for multithreaded programs, their effectiveness often reduces dramatically in terms of both the number of invariants that they can generate and the likelihood of them being true invariants. We solve this problem by developing a new dynamic invariant generator, which consists of a new LLVM based code instrumentation tool, an INSPECT based thread interleaving explorer, and a customized inference engine inside Daikon. We have evaluated the resulting system on public domain multithreaded C/C++ benchmarks. Our experiments show that the new method is effective in generating high-quality invariants. Furthermore, the state and transition invariants generated by our new method have been proved useful both in error diagnosis and in identifying likely atomic regions in the concurrent software code. / Master of Science
143

Scalable and Energy Efficient Execution Methods for Multicore Systems

Li, Dong 16 February 2011 (has links)
Multicore architectures impose great pressure on resource management. The exploration spaces available for resource management increase explosively, especially for large-scale high end computing systems. The availability of abundant parallelism causes scalability concerns at all levels. Multicore architectures also impose pressure on power management. Growth in the number of cores causes continuous growth in power. In this dissertation, we introduce methods and techniques to enable scalable and energy efficient execution of parallel applications on multicore architectures. We study strategies and methodologies that combine DCT and DVFS for the hybrid MPI/OpenMP programming model. Our algorithms yield substantial energy saving (8.74% on average and up to 13.8%) with either negligible performance loss or performance gain (up to 7.5%). To save additional energy for high-end computing systems, we propose a power-aware MPI task aggregation framework. The framework predicts the performance effect of task aggregation in both computation and communication phases and its impact in terms of execution time and energy of MPI programs. Our framework provides accurate predictions that lead to substantial energy saving through aggregation (64.87% on average and up to 70.03%) with tolerable performance loss (under 5%). As we aggregate multiple MPI tasks within the same node, we have the scalability concern of memory registration for high performance networking. We propose a new memory registration/deregistration strategy to reduce registered memory on multicore architectures with helper threads. We investigate design polices and performance implications of the helper thread approach. Our method efficiently reduces registered memory (23.62% on average and up to 49.39%) and avoids memory registration/deregistration costs for reused communication memory. Our system enables the execution of application input sets that could not run to the completion with the memory registration limitation. / Ph. D.
144

Improving the Efficiency of Parallel Applications on Multithreaded and Multicore Systems

Curtis-Maury, Matthew 15 April 2008 (has links)
The scalability of parallel applications executing on multithreaded and multicore multiprocessors is often quite limited due to large degrees of contention over shared resources on these systems. In fact, negative scalability frequently occurs such that a non-negligable performance loss is observed through the use of more processors and cores. In this dissertation, we present a prediction model for identifying efficient operating points of concurrency in multithreaded scientific applications in terms of both performance as a primary objective and power secondarily. We also present a runtime system that uses live analysis of hardware event rates through the prediction model to optimize applications dynamically. We discuss a dynamic, phase-aware performance prediction model (DPAPP), which combines statistical learning techniques, including multivariate linear regression and artificial neural networks, with runtime analysis of data collected from hardware event counters to locate optimal operating points of concurrency. We find that the scalability model achieves accuracy approaching 95%, sufficiently accurate to identify improved concurrency levels and thread placements from within real parallel scientific applications. Using DPAPP, we develop a prediction-driven runtime optimization scheme, called ACTOR, which throttles concurrency so that power consumption can be reduced and performance can be set at the knee of the scalability curve of each parallel execution phase in an application. ACTOR successfully identifies and exploits program phases where limited scalability results in a performance loss through the use of more processing elements, providing simultaneous reductions in execution time by 5%-18% and power consumption by 0%-11% across a variety of parallel applications and architectures. Further, we extend DPAPP and ACTOR to include support for runtime adaptation of DVFS, allowing for the synergistic exploitation of concurrency throttling and DVFS from within a single, autonomically-acting library, providing improved energy-efficiency compared to either approach in isolation. / Ph. D.
145

Prediction Models for Multi-dimensional Power-Performance Optimization on Many Cores

Shah, Ankur Savailal 28 May 2008 (has links)
Power has become a primary concern for HPC systems. Dynamic voltage and frequency scaling (DVFS) and dynamic concurrency throttling (DCT) are two software tools (or knobs) for reducing the dynamic power consumption of HPC systems. To date, few works have considered the synergistic integration of DVFS and DCT in performance-constrained systems, and, to the best of our knowledge, no prior research has developed application-aware simultaneous DVFS and DCT controllers in real systems and parallel programming frameworks. We present a multi-dimensional, online performance prediction framework, which we deploy to address the problem of simultaneous runtime optimization of DVFS, DCT, and thread placement on multi-core systems. We present results from an implementation of the prediction framework in a runtime system linked to the Intel OpenMP runtime environment and running on a real dual-processor quad-core system as well as a dual-processor dual-core system. We show that the prediction framework derives near-optimal settings of the three power-aware program adaptation knobs that we consider. Our overall runtime optimization framework achieves significant reductions in energy (12.27% mean) and ED² (29.6% mean), through simultaneous power savings (3.9% mean) and performance improvements (10.3% mean). Our prediction and adaptation framework outperforms earlier solutions that adapt only DVFS or DCT, as well as one that sequentially applies DCT then DVFS. Further, our results indicate that prediction-based schemes for runtime adaptation compare favorably and typically improve upon heuristic search-based approaches in both performance and energy savings. / Master of Science
146

Improving Performance of Highly-Programmable Concurrent Applications by Leveraging Parallel Nesting and Weaker Isolation Levels

Niles, Duane Francis Jr. 15 July 2015 (has links)
The recent development of multi-core computer architectures has largely affected the creation of everyday applications, requiring the adoption of concurrent programming to significantly utilize the divided processing power of computers. Applications must be split into sections able to execute in parallel, without any of these sections conflicting with one another, thereby necessitating some form of synchronization to be declared. The most commonly used methodology is lock-based synchronization; although, to improve performance the most, developers must typically form complex, low-level implementations for large applications, which can easily create potential errors or hindrances. An abstraction from database systems, known as transactions, is a rising concurrency control design aimed to circumvent the challenges with programmability, composability, and scalability in lock-based synchronization. Transactions execute their operations speculatively and are capable of being restarted (or rolled back) when there exist conflicts between concurrent actions. As such issues can occur later in the lifespans of transactions, entire rollbacks are not that effective for performance. One particular method, known as nesting, was created to counter that drawback. Nesting is the act of enclosing transactions within other transactions, essentially dividing the work into pieces called sub-transactions. These sub-transactions can roll back without affecting the entire main transaction, although general nesting models only allow one sub-transaction to perform work at a time. The first main contribution in this thesis is SPCN, an algorithm that parallelizes nested transactions while automatically processing any potential conflicts that may arise, eliminating the burden of additional processing from the application developers. Two versions of SPCN exist: Strict, which enforces the sub-transactions' work to be made visible in a serialized order; and Relaxed, which allows sub-transactions to distribute their information immediately as they finish (therefore invalidation may occur after-the-fact and must be handled). Despite the additional logic required by SPCN, it outperforms traditional closed nesting by 1.78x at the lowest and 3.78x at the highest in the experiments run. Another method to alter transactional execution and boost performance is to relax the rules of visibility for parallel operations (known as their isolation). Depending on the application, correctness is not broken even if some transactions see external work that may later be undone due to a rollback, or if an object is written while another transaction is using an older instance of its data. With lock-based synchronization, developers would have to explicitly design their application with varying amounts of locks, and different lock organizations or hierarchies, to change the strictness of the execution. With transactional systems, the processing performed by the system itself can be set to utilize different rulings, which can change the performance of an application without requiring it to be largely redesigned. This notion leads to the second contribution in this thesis: AsR, or As-Serializable transactions. Serializability is the general form of isolation or strictness for transactions in many applications. In terms of execution, its definition is equivalent to only one transaction running at a time in a given system. Many transactional systems use their own internal form of locking to create Serializable executions, but it is typically too strict for many applications. AsR transactions allow the internal processing to be relaxed while additional meta-data is used external to the system, without requiring any interaction from the developer or any changes to the given application. AsR transactions offer multiple orders of magnitude more in throughput in highly-contentious scenarios, due to their capability to outlast traditional levels of isolation. / Master of Science
147

Modeling and Runtime Systems for Coordinated Power-Performance Management

Li, Bo 28 January 2019 (has links)
Emergent systems in high-performance computing (HPC) expect maximal efficiency to achieve the goal of power budget under 20-40 megawatts for 1 exaflop set by the Department of Energy. To optimize efficiency, emergent systems provide multiple power-performance control techniques to throttle different system components and scale of concurrency. In this dissertation, we focus on three throttling techniques: CPU dynamic voltage and frequency scaling (DVFS), dynamic memory throttling (DMT), and dynamic concurrency throttling (DCT). We first conduct an empirical analysis of the performance and energy trade-offs of different architectures under the throttling techniques. We show the impact on performance and energy consumption on Intel x86 systems with accelerators of Intel Xeon Phi and a Nvidia general-purpose graphics processing unit (GPGPU). We show the trade-offs and potentials for improving efficiency. Furthermore, we propose a parallel performance model for coordinating DVFS, DMT, and DCT simultaneously. We present a multivariate linear regression-based approach to approximate the impact of DVFS, DMT, and DCT on performance for performance prediction. Validation using 19 HPC applications/kernels on two architectures (i.e., Intel x86 and IBM BG/Q) shows up to 7% and 17% prediction error correspondingly. Thereafter, we develop the metrics for capturing the performance impact of DVFS, DMT, and DCT. We apply the artificial neural network model to approximate the nonlinear effects on performance impact and present a runtime control strategy accordingly for power capping. Our validation using 37 HPC applications/kernels shows up to a 20% performance improvement under a given power budget compared with the Intel RAPL-based method. / Ph. D. / System efficiency on high-performance computing (HPC) systems is the key to achieving the goal of power budget for exascale supercomputers. Techniques for adjusting the performance of different system components can help accomplish this goal by dynamically controlling system performance according to application behaviors. In this dissertation, we focus on three techniques: adjusting CPU performance, memory performance, and the number of threads for running parallel applications. First, we profile the performance and energy consumption of different HPC applications on both Intel systems with accelerators and IBM BG/Q systems. We explore the trade-offs of performance and energy under these techniques and provide optimization insights. Furthermore, we propose a parallel performance model that can accurately capture the impact of these techniques on performance in terms of job completion time. We present an approximation approach for performance prediction. The approximation has up to 7% and 17% prediction error on Intel x86 and IBM BG/Q systems respectively under 19 HPC applications. Thereafter, we apply the performance model in a runtime system design for improving performance under a given power budget. Our runtime strategy achieves up to 20% performance improvement to the baseline method.
148

Explicit-State Model Checking of Concurrent x86-64 Assembly

Bharadwaj, Abhijith Ananth 10 July 2020 (has links)
The thesis presents xavier, a novel tool-set for model checking of concurrent x86-64 assembly programs, via Partial Order Reduction (POR). xavier{} presents a realistic platform for systematically exploring and analyzing the state-space of concurrent x86 assembly programs, with the aim of detecting bugs via assertion failures in mainstream programs. Recently, a number of state-of-the-art model checking solutions have been introduced to efficiently explore the state-space of concurrent programs, using POR algorithms. However, such solutions are inefficient while analyzing stateful programming languages, such as the x86 assembly language, due to their low level of abstraction. To this end, xavier{} makes two contributions: i) a novel order-sensitivity based POR algorithm, that is applicable to concurrent x86 assembly, ii) an x86 machine-model that can accurately perform relaxed-consistency emulation of concurrent x86 assembly, without the need for any translations. We demonstrate the applicability of xavier{} through an evaluation on several classical mutual-exclusion benchmarks and mainstream benchmarks from the Userspace Read-Copy-Update (URCU) concurrency library, where the benchmarks range from $250-3700$ lines of x86 assembly. The framework is the first that supports systematic model checking of concurrent x86 assembly programs, and the effectiveness of xavier{} is demonstrated by reproducing a concurrency issue of threads accessing intermediate states in the URCU library, which stems from an assumption violation. / Master of Science / Sound verification of multi-threaded programs necessitate a systematic analysis of program state-spaces that result from thread interactions. Consequently, model-checking cite{godefroid1997model, Clarke2018} has been one of the prominent methods used to tackle the verification of multi-threaded programs. However, existing model-checking solutions are inefficient while analyzing stateful programming languages, such as the x86 assembly language, due to the solutions' higher level of abstraction. Therefore, the thesis presents xavier, a novel tool-set and a realistic platform for systematically exploring and analyzing the state-space of mainstream concurrent x86 assembly programs, with the aim of detecting bugs via assertion failures. To this end, xavier{} makes two contributions: i) a novel order-sensitivity based Partial Order Reduction algorithm, which efficiently explores the state space of concurrent x86 assembly, ii) an x86 machine-model that can accurately emulate the execution of concurrent x86 assembly, without the need for any translations. We demonstrate the applicability of xavier{} through an evaluation on several classical mutual-exclusion and mainstream benchmarks from the Userspace Read-Copy-Update (URCU) concurrency library, where the benchmarks range from $250-3700$ lines of x86 assembly. Moreover, we demonstrate the effectiveness of xavier{} by reproducing a concurrency issue in the URCU library, which manifests as a result of an assumption violation.
149

Targeted Client Synthesis for Detecting Concurrency Bugs

Samak, Malavika January 2016 (has links) (PDF)
Detecting concurrency bugs can be challenging due to the intricacies associated with their manifestation. These intricacies correspond to identifying the methods that need to be invoked concurrently, the inputs passed to these methods and the interleaving of the threads that cause the erroneous behavior. Neither fuzzing-based testing techniques nor over-approximate static analyses are well positioned to detect subtle concurrency defects while retaining high accuracy alongside satisfactory coverage. While dynamic analysis techniques have been proposed to overcome some of the challenges in detecting concurrency bugs, we observe that their success is critically dependent on the availability of effective multithreaded clients. Without a priori knowledge of the defects, manually constructing defect-revealing multithreaded clients is non-trivial. In this thesis, we design an approach to address the problem of automatically generate clients for detecting concurrency bugs in multithreaded libraries. The key insight underlying our design is that a subset of the properties observed when the defects manifest in a concur-rent execution can also be observed in a sequential execution. The input to our approach is a library implementation and a sequential testsuite, and the output is a set of multithreaded clients that can be used to reveal defects in the input library implementation. Dynamic defect detectors can execute the clients and analyze the resulting traces to report various kinds of defects including deadlocks, data races and atomicity violations. Furthermore, the clients can also be used by testing frameworks to report assertion violations. We propose two variants of our design – (a) path-agnostic client generation, and (b) path-aware client generation. The path-agnostic client generation process helps in detection of potential bugs present in the paths executed by the input sequential testsuite. It does not attempt to explore newer paths by satisfying path conditions either by modifying the input or by scheduling the threads appropriately. The generated clients are used to expose deadlocks, data races and atomicity violations. Our analysis analyzes the execution traces obtained from executing the input sequential clients and produces a concurrent client program that drives shared objects via library methods calls to states conducive for triggering deadlocks, data races or atomicity violations. For path-aware client generation, our approach explores newer paths that are not covered by the input sequential testsuite to generate clients. For this purpose, we design a directed, iterative and scalable engine that combines the strengths of static and dynamic analysis to help synthesize both multithreaded clients and schedules that violate complex correctness conditions expressed by the developer. Apart from the library implementation and the sequential testsuite as input, this engine also accepts a specification of correctness as input. Then, it iteratively refines each client from the input sequential testsuite to generate an ex-ecution that can break the input specification. Each step of the iterative process includes statically identifying sub-goals towards the goal of failing the specification, generating a plan toward meeting these goals, and merging of the paths traversed dynamically with the plan computed statically via constraint solving to generate a new client. The engine reports full reproduction scenarios, guaranteed to be true, for the bugs it finds. We have implemented prototypes that incorporate the aforementioned ideas and validated them by applying them on 29 well-tested concurrent classes from popular Java libraries, including the latest version of JDK. We are able to automatically generate clients that helped expose more than 300 concurrency bugs including deadlocks, data races, atomicity violations and assertion violations. We reported many previously unknown bugs to the developers of these libraries resulting in either fixes to the code or changes to the documentation pertaining to the thread-safe behavior of the relevant classes. On average, the time taken to analyze a class and generate clients for it is less than two minutes. We believe that the demonstrated effectiveness of our prototypes in helping expose deep bugs in popular Java libraries makes the design, proposed in this thesis, a vital cog in the future development and deployment of dynamic concurrency bug detectors.
150

Framework for Real-time collaboration on extensive Data Types using Strong Eventual Consistency

Masson, Constantin 12 1900 (has links)
La collaboration en temps réel est un cas spécial de collaboration où les utilisateurs travaillent sur le même élément simultanément et sont au courant des modifications des autres utilisateurs en temps réel. Les données distribuées doivent rester disponibles et consistant tout en étant répartis sur plusieurs systèmes physiques. "Strong Consistency" est une approche qui crée un ordre total des opérations en utilisant des mécanismes tel que le "locking". Cependant, cela introduit un "bottleneck". Ces dix dernières années, les algorithmes de concurrence ont été étudiés dans le but de garder la convergence de tous les replicas sans utiliser de "locking" ni de synchronisation. "Operational Trans- formation" et "Conflict-free Replicated Data Types (CRDT)" sont utilisés dans ce but. Cependant, la complexité de ces stratégies les rend compliquées à intégrer dans des logicielles conséquents, comme les éditeurs de modèles, spécialement pour des data structures complexes comme les graphes. Les implémentations actuelles intègrent seulement des data linéaires tel que le texte. Dans ce mémoire, nous présentons CollabServer, un framework pour construire des environnements de collaboration. Il a une implémentation de CRDTs pour des data structures complexes tel que les graphes et donne la possibilité de construire ses propres data structures. / Real-time collaboration is a special case of collaboration where users work on the same artefact simultaneously and are aware of each other’s changes in real-time. Shared data should remain available and consistent while dealing with its physically distributed aspect. Strong Consistency is one approach that enforces a total order of operations using mechanisms, such as locking. This however introduces a bottleneck. In the last decade, algorithms for concurrency control have been studied to keep convergence of all replicas without locking or synchronization. Operational Transformation and Conflict free Replicated Data Types (CRDT) are widely used to achieve this purpose. However, the complexity of these strategies makes it hard to integrate in large software, such as modeling editors, especially for complex data types like graphs. Current implementations only integrate linear data, such as text. In this thesis, we present CollabServer, a framework to build collaborative environments. It features a CRDTs implementation for complex data types such as graphs and gives possibility to build other data structures.

Page generated in 0.0558 seconds