141

Prediction Models for Multi-dimensional Power-Performance Optimization on Many Cores

Shah, Ankur Savailal 28 May 2008 (has links)
Power has become a primary concern for HPC systems. Dynamic voltage and frequency scaling (DVFS) and dynamic concurrency throttling (DCT) are two software tools (or knobs) for reducing the dynamic power consumption of HPC systems. To date, few works have considered the synergistic integration of DVFS and DCT in performance-constrained systems, and, to the best of our knowledge, no prior research has developed application-aware simultaneous DVFS and DCT controllers in real systems and parallel programming frameworks. We present a multi-dimensional, online performance prediction framework, which we deploy to address the problem of simultaneous runtime optimization of DVFS, DCT, and thread placement on multi-core systems. We present results from an implementation of the prediction framework in a runtime system linked to the Intel OpenMP runtime environment and running on a real dual-processor quad-core system as well as a dual-processor dual-core system. We show that the prediction framework derives near-optimal settings of the three power-aware program adaptation knobs that we consider. Our overall runtime optimization framework achieves significant reductions in energy (12.27% mean) and ED² (29.6% mean), through simultaneous power savings (3.9% mean) and performance improvements (10.3% mean). Our prediction and adaptation framework outperforms earlier solutions that adapt only DVFS or DCT, as well as one that sequentially applies DCT then DVFS. Further, our results indicate that prediction-based schemes for runtime adaptation compare favorably and typically improve upon heuristic search-based approaches in both performance and energy savings. / Master of Science
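To make the adaptation loop concrete, the hedged Python sketch below shows how an online controller could pick the (frequency, thread count) pair that minimizes predicted ED². The toy timing and power models, the knob values, and all function names are illustrative assumptions, not the prediction models developed in the thesis.

```python
import itertools

# Hypothetical knob settings; real systems expose these via DVFS governors and
# the OpenMP runtime. Values are illustrative only.
FREQS_GHZ = [1.2, 1.6, 2.0, 2.4]
THREAD_COUNTS = [1, 2, 4, 8]

def predict_time(freq, threads, profile):
    """Toy timing model: compute work scales with freq*threads,
    memory-bound work scales only with frequency (an assumption)."""
    return profile["compute"] / (freq * threads) + profile["memory"] / freq

def predict_power(freq, threads, profile):
    """Toy power model: dynamic power grows with freq^3 and active threads."""
    return profile["base_power"] + 0.5 * threads * freq ** 3

def best_config(profile):
    """Return the (freq, threads) pair minimizing predicted ED^2 = E * T^2."""
    def ed2(cfg):
        f, t = cfg
        time = predict_time(f, t, profile)
        energy = predict_power(f, t, profile) * time
        return energy * time ** 2
    return min(itertools.product(FREQS_GHZ, THREAD_COUNTS), key=ed2)

# Made-up profile of one parallel phase; a real runtime would refit these
# terms from hardware counters at each phase boundary.
phase = {"compute": 10.0, "memory": 4.0, "base_power": 20.0}
print(best_config(phase))
```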
142

Exploring the Boundaries of Operating System in the Era of Ultra-fast Storage Technologies

Ramanathan, Madhava Krishnan 24 May 2023 (has links)
Storage hardware is evolving at a rapid pace to keep up with the exponential rise in data consumption. Recently, ultra-fast storage technologies such as nanosecond-scale byte-addressable Non-Volatile Memory (NVM) and microsecond-scale SSDs have been commercialized. However, the OS storage stack has not evolved fast enough to keep up with this new ultra-fast storage hardware. Hence, the latency of user-kernel context switches caused by system calls and hardware interrupts is no longer negligible, as it was presumed to be in the era of slower, high-latency hard disks. Further, the OS storage stack was not designed with multi-core scalability in mind, so as CPU core counts keep increasing, the OS storage stack, particularly the Virtual Filesystem (VFS) and filesystem layers, increasingly becomes a scalability bottleneck. Applications can bypass the kernel completely (a kernel-bypass storage stack) to prevent the storage stack from becoming a performance and scalability bottleneck, but this comes at the cost of programmability, isolation, safety, and reliability. Moreover, scalability bottlenecks in the filesystem cannot be addressed by simply moving the filesystem to userspace. Overall, while designing a kernel-bypass storage stack looks obvious and promising, several critical challenges in programmability, performance, scalability, safety, and reliability need to be addressed before the traditional OS storage stack can be bypassed. This thesis proposes a series of kernel-bypass storage techniques designed particularly for fast memory-centric storage. First, it proposes a scalable persistent transactional memory (PTM) programming model to address the programmability and multi-core scalability challenges. Next, it proposes techniques to make the PTM memory-safe and fault-tolerant. Further, it proposes a kernel-bypass programming framework for porting legacy DRAM-based in-memory database applications to persistent memory-centric storage. Finally, it explores an application-driven approach to the CPU-side and storage-side bottlenecks in deep learning model training by proposing a kernel-bypass programming framework that moves compute closer to the storage. Overall, the techniques proposed in this thesis provide a strong foundation for applications to adopt and exploit emerging ultra-fast storage technologies without being bottlenecked by the traditional OS storage stack. / Doctor of Philosophy / Storage hardware is evolving at a rapid pace to keep up with the exponential rise in data consumption. Recently, ultra-fast storage technologies such as nanosecond-scale byte-addressable Non-Volatile Memory (NVM) and microsecond-scale SSDs have been commercialized. The Operating System (OS) has traditionally been the gateway through which applications access and manage storage hardware. Unfortunately, an OS storage stack designed for slower storage technologies (e.g., hard disk drives) becomes a performance, scalability, and programmability bottleneck for these emerging ultra-fast storage technologies. This has created a large gap between storage hardware advancements and the system software support for them. Consequently, applications are constrained by the limitations of the OS storage stack when they try to exploit these emerging storage technologies.
In this thesis, we propose a series of novel kernel-bypass storage stack designs that address the performance, scalability, and programmability limitations of the conventional OS storage stack. The kernel-bypass storage stack proposed in this thesis is carefully designed with ultra-fast modern storage hardware in mind. Application developers can leverage the kernel-bypass techniques proposed in this thesis to develop new applications, or to port legacy applications, that use emerging ultra-fast storage technologies without being constrained by the limitations of the conventional OS storage stack.
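As a rough illustration of what a PTM-style programming model can look like from the application side, here is a hedged Python sketch of a transaction that buffers writes and publishes them atomically on commit. The class names, the dictionary-backed "persistent" heap, and the commit protocol are assumptions for exposition only; they are not the thesis's kernel-bypass implementation.

```python
# Hypothetical PTM-style interface (illustration only, not the thesis's design):
# a transaction either applies all of its buffered writes to the simulated
# persistent heap on commit, or none of them if it aborts.
class PersistentHeap:
    def __init__(self):
        self._store = {}              # stand-in for memory-mapped NVM

    def transaction(self):
        return _Tx(self)

class _Tx:
    def __init__(self, heap):
        self.heap, self.writes = heap, {}

    def __enter__(self):
        return self

    def read(self, key):
        # Read-your-own-writes first, then fall back to committed state.
        return self.writes.get(key, self.heap._store.get(key))

    def write(self, key, value):
        self.writes[key] = value      # buffered until commit

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.heap._store.update(self.writes)   # commit: publish atomically
        return False                  # on abort, buffered writes are dropped

heap = PersistentHeap()
with heap.transaction() as tx:
    tx.write("balance", (tx.read("balance") or 0) + 100)
print(heap._store["balance"])         # 100
```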
143

Explicit-State Model Checking of Concurrent x86-64 Assembly

Bharadwaj, Abhijith Ananth 10 July 2020 (has links)
The thesis presents xavier, a novel toolset for model checking of concurrent x86-64 assembly programs via Partial Order Reduction (POR). xavier provides a realistic platform for systematically exploring and analyzing the state space of concurrent x86 assembly programs, with the aim of detecting bugs via assertion failures in mainstream programs. Recently, a number of state-of-the-art model checking solutions have been introduced that efficiently explore the state space of concurrent programs using POR algorithms. However, such solutions are inefficient when analyzing stateful languages such as x86 assembly, because of the languages' low level of abstraction. To this end, xavier makes two contributions: i) a novel order-sensitivity-based POR algorithm that is applicable to concurrent x86 assembly, and ii) an x86 machine model that can accurately perform relaxed-consistency emulation of concurrent x86 assembly without the need for any translation. We demonstrate the applicability of xavier through an evaluation on several classical mutual-exclusion benchmarks and on mainstream benchmarks from the Userspace Read-Copy-Update (URCU) concurrency library, where the benchmarks range from 250 to 3,700 lines of x86 assembly. The framework is the first to support systematic model checking of concurrent x86 assembly programs, and the effectiveness of xavier is demonstrated by reproducing a concurrency issue in the URCU library, in which threads access intermediate states, stemming from an assumption violation. / Master of Science / Sound verification of multi-threaded programs necessitates a systematic analysis of the program state spaces that result from thread interactions. Consequently, model checking [godefroid1997model; Clarke2018] has been one of the prominent methods used to tackle the verification of multi-threaded programs. However, existing model checking solutions are inefficient when analyzing stateful languages such as x86 assembly, because the solutions operate at a higher level of abstraction. Therefore, the thesis presents xavier, a novel toolset and realistic platform for systematically exploring and analyzing the state space of mainstream concurrent x86 assembly programs, with the aim of detecting bugs via assertion failures. To this end, xavier makes two contributions: i) a novel order-sensitivity-based Partial Order Reduction algorithm, which efficiently explores the state space of concurrent x86 assembly, and ii) an x86 machine model that can accurately emulate the execution of concurrent x86 assembly without the need for any translation. We demonstrate the applicability of xavier through an evaluation on several classical mutual-exclusion benchmarks and on mainstream benchmarks from the Userspace Read-Copy-Update (URCU) concurrency library, where the benchmarks range from 250 to 3,700 lines of x86 assembly. Moreover, we demonstrate the effectiveness of xavier by reproducing a concurrency issue in the URCU library that manifests as a result of an assumption violation.
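For intuition about the state-space exploration that POR prunes, the hedged sketch below exhaustively enumerates the interleavings of two toy "threads" and reports the schedules that violate an assertion. It is deliberately naive Python, not xavier's order-sensitivity algorithm or its x86 machine model.

```python
# Naive exhaustive exploration of thread interleavings (illustration only).
# Each "thread" is a list of atomic steps operating on a shared dict; a POR
# algorithm would prune schedules that merely reorder independent steps.
T1 = [lambda s: s.__setitem__("x", 1),
      lambda s: s.__setitem__("y", s["x"] + 1)]
T2 = [lambda s: s.__setitem__("x", 2)]

def interleavings(a, b):
    """Yield every schedule that preserves each thread's program order."""
    if not a:
        yield list(b); return
    if not b:
        yield list(a); return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

violations = 0
for schedule in interleavings(T1, T2):
    state = {"x": 0, "y": 0}
    for step in schedule:
        step(state)
    if state["y"] != state["x"] + 1:   # the assertion under test
        violations += 1
print("violating schedules:", violations)   # 1 of 3 interleavings fails
```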
144

Dynamic Invariant Generation for Concurrent Programs

Chattopadhyay, Arijit 23 June 2014 (has links)
We propose a fully automated and dynamic method for generating likely invariants from multithreaded programs and then leveraging these invariants to infer atomic regions and diagnose concurrency errors in the software code. Although existing methods for dynamic invariant generation perform reasonably well on sequential programs, for multithreaded programs their effectiveness often drops dramatically, both in the number of invariants they can generate and in the likelihood of those invariants being true. We solve this problem by developing a new dynamic invariant generator, which consists of a new LLVM-based code instrumentation tool, an INSPECT-based thread-interleaving explorer, and a customized inference engine inside Daikon. We have evaluated the resulting system on public-domain multithreaded C/C++ benchmarks. Our experiments show that the new method is effective in generating high-quality invariants. Furthermore, the state and transition invariants generated by our new method have proved useful both in error diagnosis and in identifying likely atomic regions in concurrent software code. / Master of Science
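The following hedged Python sketch illustrates the general Daikon-style idea of likely-invariant inference: propose candidate invariants and keep the ones that hold in every observed sample. The candidate templates and the sample data are made up; the thesis's instrumentation, interleaving exploration, and inference engine are far more elaborate.

```python
# Minimal likely-invariant inference sketch (assumed, not the thesis's engine):
# propose candidate invariants over observed program states and keep those
# that hold in every sample, ideally collected across many thread interleavings.
CANDIDATES = {
    "x >= 0":   lambda s: s["x"] >= 0,
    "x <= y":   lambda s: s["x"] <= s["y"],
    "y == 2*x": lambda s: s["y"] == 2 * s["x"],
}

def likely_invariants(samples):
    return [name for name, check in CANDIDATES.items()
            if all(check(s) for s in samples)]

# Samples as they might be logged by an instrumented multithreaded run.
samples = [{"x": 1, "y": 2}, {"x": 3, "y": 6}, {"x": 0, "y": 0}]
print(likely_invariants(samples))   # ['x >= 0', 'x <= y', 'y == 2*x']
```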
145

Improving Performance of Highly-Programmable Concurrent Applications by Leveraging Parallel Nesting and Weaker Isolation Levels

Niles, Duane Francis Jr. 15 July 2015 (has links)
The recent development of multi-core computer architectures has largely changed how everyday applications are built, requiring the adoption of concurrent programming to fully utilize the divided processing power of computers. Applications must be split into sections that can execute in parallel without conflicting with one another, which necessitates some form of synchronization. The most commonly used methodology is lock-based synchronization; however, to get the most performance, developers typically must build complex, low-level implementations for large applications, which can easily introduce errors or hindrances. An abstraction from database systems, known as transactions, is a rising concurrency-control design aimed at circumventing the challenges of programmability, composability, and scalability in lock-based synchronization. Transactions execute their operations speculatively and can be restarted (or rolled back) when conflicts between concurrent actions arise. Because such conflicts can occur late in a transaction's lifespan, rolling back the entire transaction is not very effective for performance. One particular method, known as nesting, was created to counter that drawback. Nesting is the act of enclosing transactions within other transactions, essentially dividing the work into pieces called sub-transactions. These sub-transactions can roll back without affecting the entire main transaction, although general nesting models allow only one sub-transaction to perform work at a time. The first main contribution of this thesis is SPCN, an algorithm that parallelizes nested transactions while automatically handling any conflicts that may arise, removing the burden of additional processing from application developers. Two versions of SPCN exist: Strict, which forces the sub-transactions' work to become visible in a serialized order, and Relaxed, which allows sub-transactions to publish their results immediately as they finish (so invalidation may occur after the fact and must be handled). Despite the additional logic SPCN requires, it outperforms traditional closed nesting by 1.78x to 3.78x in our experiments. Another way to alter transactional execution and boost performance is to relax the rules of visibility for parallel operations (known as their isolation). Depending on the application, correctness is not broken even if some transactions see external work that may later be undone by a rollback, or if an object is written while another transaction is using an older version of its data. With lock-based synchronization, developers would have to explicitly design their application with varying numbers of locks, and different lock organizations or hierarchies, to change the strictness of the execution. With transactional systems, the processing performed by the system itself can be configured with different rules, changing the performance of an application without requiring a major redesign. This notion leads to the second contribution of this thesis: AsR, or As-Serializable transactions. Serializability is the common form of isolation, or strictness, for transactions in many applications; in terms of execution, it is equivalent to only one transaction running at a time in a given system.
Many transactional systems use their own internal form of locking to create serializable executions, but this is typically stricter than many applications need. AsR transactions allow the internal processing to be relaxed while additional metadata is maintained outside the system, without requiring any interaction from the developer or any changes to the given application. AsR transactions deliver orders-of-magnitude higher throughput in highly contentious scenarios, owing to their ability to outlast traditional levels of isolation. / Master of Science
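The hedged sketch below illustrates the nesting idea only: a sub-transaction that hits a conflict retries on its own while the parent's committed work survives. The retry loop, the simulated conflict rate, and the class names are illustrative assumptions, not SPCN's Strict or Relaxed algorithms.

```python
import random

# Closed-nesting illustration: a conflicting sub-transaction retries on its
# own, without rolling back the parent's already-committed sub-transactions.
class ParentTransaction:
    def __init__(self):
        self.committed = []            # effects published by sub-transactions

    def run_subtx(self, work, max_retries=20):
        for _ in range(max_retries):
            result, conflicted = work()
            if not conflicted:
                self.committed.append(result)   # only this piece commits
                return result
            # Conflict: restart just this sub-transaction; parent state survives.
        raise RuntimeError("sub-transaction retried too many times")

def flaky_update():
    # Simulate a conflict with a concurrent thread 30% of the time.
    return "update", random.random() < 0.3

parent = ParentTransaction()
for _ in range(3):
    parent.run_subtx(flaky_update)
print(parent.committed)                # three results despite intermediate conflicts
```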
146

Efficient Symbolic Execution of Concurrent Software

Guo, Shengjian 26 April 2019 (has links)
Concurrent software has been widely adopted in computer systems owing to its highly efficient computation. However, testing and verifying concurrent software remain challenging tasks. This is not only because of the non-deterministic thread interferences, which are hard to reason about, but also because of the large state space caused by the simultaneous explosion of paths and interleavings. That is, the number of program paths in each thread may be exponential in the number of branch conditions, and the number of thread interleavings may be exponential in the number of concurrent operations. This dissertation presents a set of new methods, built upon symbolic execution, a program analysis technique that systematically explores program state space, for testing concurrent programs. By modeling both functional and non-functional properties of the programs as assertions, these new methods efficiently analyze the viable behaviors of the given concurrent programs. The first method is assertion-guided symbolic execution, a state-space reduction technique that identifies and eliminates executions that are redundant with respect to the explored interleavings. The second method is incremental symbolic execution, which generates test inputs only for the program behaviors influenced by small code changes between two program versions. The third method is SYMPLC, a technique with domain-specific reduction strategies for generating tests for multitasking Programmable Logic Controller (PLC) programs written in languages specified by the IEC 61131-3 standard. The last method is adversarial symbolic execution, a technique for detecting concurrency-related side-channel information leaks by analyzing the cache-timing behaviors of a concurrent program during symbolic execution. This dissertation evaluates the proposed methods on a diverse set of both synthesized programs and real-world applications. The experimental results show that these techniques can significantly outperform state-of-the-art symbolic execution tools for concurrent software. / Doctor of Philosophy / Software testing is a technique that runs software as a black box on computer hardware multiple times, with different inputs per run, to check whether the software's behavior conforms to the functionality its developers designed. Nowadays, programmers increasingly develop multithreaded and multitasking software, e.g., web browsers and web servers, to utilize highly efficient multiprocessor hardware. This approach significantly improves software performance, since a large computing job can be decomposed into a set of small jobs that are then distributed to concurrently running threads (tasks). However, testing multithreaded (multitask) software is extremely challenging. The most critical problem is the inherent non-determinism: executing sequential software on the same input data always produces the same output, but running multithreaded (multitask) software multiple times, even on the same input data, may yield different output in each run. The root cause is that concurrent threads (tasks) may interleave their progress at any time, so the internal execution order may be altered unexpectedly, causing runtime errors. Meanwhile, finding such faults is difficult, since the number of possible interleavings can grow exponentially in the number of concurrent thread (task) operations. This dissertation proposes four methods to test multithreaded/multitask software efficiently.
The first method summarizes already-tested program behaviors to avoid future testing runs that cannot lead to new faults. The second method tests only the program behaviors that are affected by program changes. The third method tests multitask Programmable Logic Controller (PLC) programs by excluding testing runs that are infeasible with respect to the PLC semantics. The last method tests non-functional program properties through systematic concurrency analysis. This dissertation evaluates these methods on a diverse set of benchmarks. The experimental results show that the proposed methods significantly outperform state-of-the-art techniques for concurrent software analysis.
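As a minimal, hedged illustration of symbolic execution itself (not of the four methods above), the sketch below forks on a single branch, records each path condition, and flags the path that violates an assertion; a real engine would hand each path condition to an SMT solver to derive a concrete failing input. The toy program and string-encoded constraints are assumptions for exposition.

```python
# Toy symbolic execution of:  if x > 10: y = 1 else: y = 0; assert y == 0
# Each explored path carries a path condition (here just strings); a real
# engine would pass the condition to an SMT solver to get a concrete input.
def explore():
    paths = []
    for branch, cond in (("then", "x > 10"), ("else", "x <= 10")):
        path_condition = [cond]
        y = 1 if branch == "then" else 0         # effect of the branch body
        verdict = "assertion holds" if y == 0 else "ASSERTION FAILS"
        paths.append((path_condition, verdict))
    return paths

for condition, verdict in explore():
    print(" and ".join(condition), "->", verdict)
# The path with condition "x > 10" violates the assertion.
```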
147

Runtime Verification and Debugging of Concurrent Software

Zhang, Lu 29 July 2016 (has links)
Our reliance on software has grown fast over the past decades, as the pervasive use of computers and software has penetrated not only our daily lives but also many critical applications. As the computational power of multi-core processors and other parallel hardware keeps increasing, concurrent software that exploits this parallel hardware becomes crucial for achieving high performance. However, developing correct and efficient concurrent software is a difficult task for programmers because of the inherent nondeterminism in its execution. As a result, concurrency-related software bugs are among the most troublesome in practice and have caused severe problems in recent years. In this dissertation, I propose a series of new and fully automated methods for verifying and debugging concurrent software. They cover the detection, prevention, classification, and repair of some important types of bugs in the implementation of concurrent data structures and client-side web applications. These methods can be adopted at various stages of the software development life cycle to help programmers write concurrent software correctly as well as efficiently. / Ph. D.
148

Modeling and Runtime Systems for Coordinated Power-Performance Management

Li, Bo 28 January 2019 (has links)
Emerging systems in high-performance computing (HPC) must maximize efficiency to achieve the Department of Energy's goal of a 20-40 megawatt power budget for one exaflop. To optimize efficiency, these systems provide multiple power-performance control techniques that throttle different system components and the scale of concurrency. In this dissertation, we focus on three throttling techniques: CPU dynamic voltage and frequency scaling (DVFS), dynamic memory throttling (DMT), and dynamic concurrency throttling (DCT). We first conduct an empirical analysis of the performance and energy trade-offs of different architectures under these throttling techniques. We show the impact on performance and energy consumption of Intel x86 systems with Intel Xeon Phi accelerators and an NVIDIA general-purpose graphics processing unit (GPGPU), and we show the trade-offs and the potential for improving efficiency. Furthermore, we propose a parallel performance model for coordinating DVFS, DMT, and DCT simultaneously. We present a multivariate linear-regression-based approach to approximate the impact of DVFS, DMT, and DCT on performance for performance prediction. Validation using 19 HPC applications/kernels on two architectures (Intel x86 and IBM BG/Q) shows up to 7% and 17% prediction error, respectively. Thereafter, we develop metrics for capturing the performance impact of DVFS, DMT, and DCT. We apply an artificial neural network model to approximate the nonlinear effects on performance and present a corresponding runtime control strategy for power capping. Our validation using 37 HPC applications/kernels shows up to a 20% performance improvement under a given power budget compared with the Intel RAPL-based method. / Ph. D. / System efficiency on high-performance computing (HPC) systems is the key to meeting the power budget goal for exascale supercomputers. Techniques for adjusting the performance of different system components can help accomplish this goal by dynamically controlling system performance according to application behavior. In this dissertation, we focus on three techniques: adjusting CPU performance, adjusting memory performance, and adjusting the number of threads for running parallel applications. First, we profile the performance and energy consumption of different HPC applications on both Intel systems with accelerators and IBM BG/Q systems. We explore the trade-offs of performance and energy under these techniques and provide optimization insights. Furthermore, we propose a parallel performance model that accurately captures the impact of these techniques on performance in terms of job completion time, and we present an approximation approach for performance prediction. The approximation has up to 7% and 17% prediction error on Intel x86 and IBM BG/Q systems respectively across 19 HPC applications. Thereafter, we apply the performance model in a runtime system design that improves performance under a given power budget. Our runtime strategy achieves up to a 20% performance improvement over the baseline method.
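To illustrate the flavor of the multivariate linear-regression step, the hedged sketch below fits execution time against (frequency, memory-throttle level, thread count) with ordinary least squares and predicts an untried configuration. The sample data, coefficients, and feature set are invented for illustration and are not the model or measurements from the dissertation.

```python
import numpy as np

# Rows of X: [CPU frequency (GHz), memory throttle level, thread count].
# All numbers below are invented for illustration.
X = np.array([
    [2.4, 0, 8], [2.4, 2, 8], [1.6, 0, 8],
    [1.6, 2, 4], [2.0, 1, 4], [1.2, 3, 2],
], dtype=float)
t = np.array([10.1, 11.9, 13.0, 16.2, 13.8, 24.5])   # observed runtimes (s)

# Fit t ~ b0 + b1*freq + b2*mem + b3*threads by ordinary least squares.
A = np.hstack([np.ones((X.shape[0], 1)), X])
coef, *_ = np.linalg.lstsq(A, t, rcond=None)

def predict(freq, mem_throttle, threads):
    """Predict runtime for an untried (DVFS, DMT, DCT) configuration."""
    return coef @ np.array([1.0, freq, mem_throttle, threads])

print(predict(2.0, 0, 8))
```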
149

Targeted Client Synthesis for Detecting Concurrency Bugs

Samak, Malavika January 2016 (has links) (PDF)
Detecting concurrency bugs can be challenging due to the intricacies associated with their manifestation. These intricacies include identifying the methods that need to be invoked concurrently, the inputs passed to these methods, and the interleaving of the threads that causes the erroneous behavior. Neither fuzzing-based testing techniques nor over-approximate static analyses are well positioned to detect subtle concurrency defects while retaining high accuracy alongside satisfactory coverage. While dynamic analysis techniques have been proposed to overcome some of the challenges in detecting concurrency bugs, we observe that their success is critically dependent on the availability of effective multithreaded clients. Without a priori knowledge of the defects, manually constructing defect-revealing multithreaded clients is non-trivial. In this thesis, we design an approach to the problem of automatically generating clients for detecting concurrency bugs in multithreaded libraries. The key insight underlying our design is that a subset of the properties observed when the defects manifest in a concurrent execution can also be observed in a sequential execution. The input to our approach is a library implementation and a sequential test suite, and the output is a set of multithreaded clients that can be used to reveal defects in the input library implementation. Dynamic defect detectors can execute the clients and analyze the resulting traces to report various kinds of defects, including deadlocks, data races, and atomicity violations. Furthermore, the clients can also be used by testing frameworks to report assertion violations. We propose two variants of our design: (a) path-agnostic client generation and (b) path-aware client generation. The path-agnostic client generation process helps detect potential bugs present in the paths executed by the input sequential test suite. It does not attempt to explore newer paths by satisfying path conditions, either by modifying the input or by scheduling the threads appropriately. The generated clients are used to expose deadlocks, data races, and atomicity violations. Our analysis processes the execution traces obtained from executing the input sequential clients and produces a concurrent client program that drives shared objects, via library method calls, to states conducive to triggering deadlocks, data races, or atomicity violations. For path-aware client generation, our approach explores newer paths that are not covered by the input sequential test suite. For this purpose, we design a directed, iterative, and scalable engine that combines the strengths of static and dynamic analysis to help synthesize both multithreaded clients and schedules that violate complex correctness conditions expressed by the developer. Apart from the library implementation and the sequential test suite, this engine also accepts a specification of correctness as input. It then iteratively refines each client from the input sequential test suite to generate an execution that can break the input specification. Each step of the iterative process includes statically identifying sub-goals toward the goal of failing the specification, generating a plan for meeting these goals, and merging the dynamically traversed paths with the statically computed plan via constraint solving to generate a new client. The engine reports full reproduction scenarios, guaranteed to be true, for the bugs it finds.
We have implemented prototypes that incorporate the aforementioned ideas and validated them by applying them to 29 well-tested concurrent classes from popular Java libraries, including the latest version of the JDK. We are able to automatically generate clients that helped expose more than 300 concurrency bugs, including deadlocks, data races, atomicity violations, and assertion violations. We reported many previously unknown bugs to the developers of these libraries, resulting in either fixes to the code or changes to the documentation pertaining to the thread-safe behavior of the relevant classes. On average, the time taken to analyze a class and generate clients for it is less than two minutes. We believe that the demonstrated effectiveness of our prototypes in helping expose deep bugs in popular Java libraries makes the design proposed in this thesis a vital cog in the future development and deployment of dynamic concurrency bug detectors.
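The hedged sketch below shows the flavor of a generated multithreaded client: it drives library methods observed in a sequential test from two threads and checks a simple correctness specification. The Counter class, the schedule, and the specification are hypothetical; the prototypes described above synthesize clients far more systematically.

```python
import threading

# A "generated client" that drives library methods from two threads and checks
# a simple correctness specification. Counter is a hypothetical, deliberately
# racy stand-in for a library class under test.
class Counter:
    def __init__(self):
        self.n = 0

    def incr(self):
        tmp = self.n          # non-atomic read-modify-write
        self.n = tmp + 1

def generated_client(lib, method_names, repeat=100_000):
    """Invoke the chosen methods concurrently, then return the final state."""
    def worker():
        for _ in range(repeat):
            for name in method_names:
                getattr(lib, name)()
    threads = [threading.Thread(target=worker) for _ in range(2)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return lib.n

counter = Counter()
observed = generated_client(counter, ["incr"])
# Lost updates are possible under some interleavings, so 'observed' may be
# smaller than expected, exposing the atomicity violation.
print("expected", 2 * 100_000, "observed", observed)
```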
150

Framework for Real-time collaboration on extensive Data Types using Strong Eventual Consistency

Masson, Constantin 12 1900 (has links)
Real-time collaboration is a special case of collaboration in which users work on the same artefact simultaneously and are aware of each other's changes in real time. Shared data should remain available and consistent while being distributed across several physical systems. Strong Consistency is one approach, which enforces a total order of operations using mechanisms such as locking; however, this introduces a bottleneck. Over the last decade, concurrency-control algorithms have been studied with the goal of keeping all replicas convergent without locking or synchronization. Operational Transformation and Conflict-free Replicated Data Types (CRDTs) are widely used for this purpose. However, the complexity of these strategies makes them hard to integrate into large software such as modeling editors, especially for complex data types like graphs; current implementations only support linear data such as text. In this thesis, we present CollabServer, a framework for building collaborative environments. It features a CRDT implementation for complex data types such as graphs and makes it possible to build other data structures.
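As a hedged illustration of the CRDT idea underlying CollabServer (not its actual implementation), the Python sketch below shows a grow-only graph whose replicas merge by set union; because merges are commutative, associative, and idempotent, replicas converge without locking, which is the essence of strong eventual consistency. The class and method names are assumptions for exposition.

```python
# Minimal CRDT sketch: a grow-only graph where each replica applies local adds
# and merges remote state by union. Merges commute, so replicas that have seen
# the same updates converge regardless of message order.
class GGraph:
    def __init__(self):
        self.vertices, self.edges = set(), set()

    def add_vertex(self, v):
        self.vertices.add(v)

    def add_edge(self, u, v):
        if u in self.vertices and v in self.vertices:
            self.edges.add((u, v))

    def merge(self, other):
        self.vertices |= other.vertices
        self.edges |= other.edges

# Two replicas edit concurrently, then exchange state in either order.
a, b = GGraph(), GGraph()
a.add_vertex("n1"); a.add_vertex("n2"); a.add_edge("n1", "n2")
b.add_vertex("n3")
a.merge(b); b.merge(a)
assert a.vertices == b.vertices and a.edges == b.edges   # replicas converge
print(sorted(a.vertices), sorted(a.edges))
```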
