51

Discovering heap anomalies in the wild

Jump, Maria Eva 01 February 2010
Programmers increasingly rely on managed languages (e.g., Java and C#) to develop applications faster and with fewer bugs. Managed languages encourage allocating objects in the heap and rely on automatic memory management (garbage collection) to reclaim objects the program can no longer access. With more objects in the heap, the heap encodes more program state than ever before and offers new opportunities for optimization and analysis. This dissertation shows how to efficiently leverage the managed runtime to perform dynamic heap analysis. Previous heap analysis approaches significantly slow down programs, require special hardware, and/or increase memory consumption by 75% or more. We present two synergistic techniques—dynamic object sampling (DOS) and heap summarization (HSG)—that mine program state embedded in the heap efficiently enough to use in production and effectively enough to improve performance, find bugs, and increase program understanding. We use these techniques to address three problems: (1) Performance of managed languages. Because some objects live for a long time, they incur disproportionate collection costs. We optimize these costs with dynamic pretenuring, which uses DOS to accurately predict allocation-site survival rates and uses these predictions to improve performance. (2) Finding bugs. Memory leaks in managed languages occur when a program inadvertently maintains references to objects that it no longer needs. Along with degrading performance and causing program crashes, memory leaks produce systematic heap growth. We introduce Cork, which uses the simplest type of HSG, a class points-from summary graph (CPFG), to detect systematic heap growth. Cork quickly identifies growing data structures in three popular benchmarks (fop, jess, and jbb2000) while adding an average of only 2.3% to total time. Additionally, we use Cork to debug a reported memory leak in Eclipse. (3) Program understanding. Static analysis has long sought to summarize the shape of dynamic data structures to aid program verification and understanding, but it only works on small programs. We introduce ShapeUp, which instead characterizes recursive data structures dynamically by discovering data-structure shape and degree invariants at runtime. ShapeUp uses DOS and a class field-wise summary graph (CFSG) to track in- and out-degree invariants of data-structure nodes. We show how ShapeUp automatically identifies recursive data structures and likely shape invariants. Finally, we monitor discovered shape invariants to detect when a data structure becomes malformed. In summary, this dissertation is the first to leverage the managed runtime to perform dynamic heap analysis both accurately and efficiently. Our results show that the heap contains an enormous amount of program state and that there is much potential for dynamically mining heap characteristics for optimization, debugging, and program understanding.
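Purely as an illustration of the sampling-plus-summarization idea, the following Java sketch counts sampled allocations per class across summary intervals and flags classes whose sampled volume keeps growing, a crude stand-in for the live-object growth that Cork measures inside the collector. All names (SampleProfile, onAllocation, summarize) and the sampling interval are hypothetical, not Cork's API.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of dynamic object sampling feeding a growth summary.
// Hypothetical stand-in for Cork's collector-integrated CPFG: we only
// count sampled allocations per class across fixed intervals.
public class SampleProfile {
    private static final int SAMPLE_INTERVAL = 128;   // sample 1 of every 128 allocations
    private int allocsSinceSample = 0;
    private final Map<Class<?>, long[]> counts = new HashMap<>(); // {previous, current}

    // Called from instrumented allocation sites; cheap in the common case.
    void onAllocation(Object obj) {
        if (++allocsSinceSample < SAMPLE_INTERVAL) return;
        allocsSinceSample = 0;
        counts.computeIfAbsent(obj.getClass(), c -> new long[2])[1]++;
    }

    // Called at each summary point (e.g., after a GC): report classes whose
    // sampled volume grew relative to the previous interval, then roll over.
    void summarize() {
        for (Map.Entry<Class<?>, long[]> e : counts.entrySet()) {
            long[] c = e.getValue();
            if (c[1] > c[0]) {
                System.out.printf("possible growth: %s (%d -> %d samples)%n",
                        e.getKey().getName(), c[0], c[1]);
            }
            c[0] = c[1];   // previous := current
            c[1] = 0;      // reset current interval
        }
    }
}
```

A collector-integrated implementation would instead build the summary from the class points-from graph during garbage collection, which is what lets Cork distinguish live growth from allocation churn.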
52

Effective cooperative scheduling of task-parallel applications on multiprogrammed parallel architectures

Varisteas, Georgios January 2015
Emerging architecture designs include tens of processing cores on a single chip die, and the number of cores is expected to reach the hundreds within a few years. However, most common parallel workloads cannot fully utilize such systems. They expose fluctuating parallelism and do not scale up indefinitely, as there is usually a point after which synchronization costs outweigh the gains of parallelism. The combination of these issues suggests that large-scale systems will either be multiprogrammed or have their unneeded resources powered off. Multiprogramming leads to hardware resource contention and, as a result, application performance degradation, even when there are enough resources, due to negative sharing effects and increased bus traffic. Most often this degradation is quite unbalanced between co-runners, as some applications dominate the hardware over others. Current operating systems blindly provide applications with access to as many resources as they ask for. This leads to over-committing the system with too many threads, memory contention, and increased bus traffic. Because applications have no insight into system-wide resource demands, most parallel workloads create as many threads as there are available cores; if every co-running application does the same, the system ends up with N times as many threads as cores. Threads then need to time-share cores, so the continuous context switching and cache-line evictions generate considerable overhead. This thesis proposes a novel solution across all software layers that achieves throughput optimization and uniform performance degradation of co-running applications. Through a novel, fully automated approach (DVS and Palirria), task-parallel applications can accurately quantify their available parallelism online, generating a meaningful metric to pass to the operating system as parallelism feedback. A second component in the operating system scheduler (Pond) uses this feedback from all co-runners to effectively partition available resources. The proposed two-level scheduling scheme ultimately has each co-runner degrade its performance by the same factor, relative to how it would execute with unrestricted, isolated access to the same hardware. We call this fair scheduling, departing from the traditional notion of equal opportunity, which causes uneven degradation; in some experiments at least one application degraded its performance ten times less than its co-runners.
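As a hedged sketch of the second-level scheduler's job, the following Java snippet partitions cores proportionally to the parallelism feedback reported by each co-runner. DVS and Palirria derive that feedback from work-stealing statistics; here it is simply taken as input, and all names are illustrative rather than the thesis's actual interfaces.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the second-level scheduler: grant cores in proportion to each
// co-runner's reported parallelism feedback. Names are illustrative.
public class CorePartitioner {
    private final int totalCores;

    public CorePartitioner(int totalCores) { this.totalCores = totalCores; }

    // feedback: application id -> estimated exploitable parallelism (>= 1)
    public Map<String, Integer> partition(Map<String, Integer> feedback) {
        long demand = feedback.values().stream().mapToLong(Integer::longValue).sum();
        Map<String, Integer> grants = new HashMap<>();
        for (Map.Entry<String, Integer> e : feedback.entrySet()) {
            // proportional share, but every runnable application keeps one core;
            // a real allocator must also reconcile rounding against totalCores
            long share = Math.max(1, (long) totalCores * e.getValue() / Math.max(1, demand));
            grants.put(e.getKey(), (int) share);
        }
        return grants;
    }
}
```

Proportional grants are what make the degradation uniform: an application that reports twice the exploitable parallelism receives roughly twice the cores, so each co-runner loses the same factor relative to isolated execution.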
53

Modeling humans as peers and supervisors in computing systems through runtime models

Zhong, Christopher January 1900
Doctor of Philosophy / Department of Computing and Information Sciences / Scott A. DeLoach / There is a growing demand for more effective integration of humans and computing systems, specifically in multiagent and multirobot systems. There are two aspects to consider in human integration: (1) the ability to control an arbitrary number of robots (particularly heterogeneous robots) and (2) integrating humans as peers in computing systems instead of just users or supervisors. With traditional supervisory control of multirobot systems, the number of robots that a human can manage effectively is between four and six [17]. A limitation of traditional supervisory control is that the human must interact individually with each robot, which limits the upper bound on the number of robots that a human can control effectively. In this work, I define the concept of "organizational control" together with an autonomous mechanism that performs task allocation and other low-level housekeeping duties, significantly reducing the need for the human to interact with individual robots. Humans are very versatile and robust in the types of tasks they can accomplish. Failures in computing systems, by contrast, are common, so redundancies are included to mitigate the chance of failure; when all redundancies have failed, the computing system can no longer accomplish its tasks. One way to further reduce the chance of system failure is to integrate humans as peer "agents" in the computing system. As part of the system, humans can be assigned tasks that would otherwise have been impossible to complete due to failures.
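The organizational-control idea of treating the human as one more capable agent can be suggested with a minimal Java sketch, under the assumption of a simple capability-matching allocator; the Agent, Robot, HumanAgent, and Organization types below are invented for illustration and are not the dissertation's framework.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of organizational control: tasks are matched to any agent that
// possesses the required capability; a human is just another Agent.
interface Agent {
    boolean hasCapability(String capability);
    void assign(String task);
}

class Robot implements Agent {
    private final List<String> capabilities;
    Robot(List<String> capabilities) { this.capabilities = capabilities; }
    public boolean hasCapability(String c) { return capabilities.contains(c); }
    public void assign(String task) { System.out.println("robot executes " + task); }
}

class HumanAgent implements Agent {
    public boolean hasCapability(String c) { return true; } // humans are versatile
    public void assign(String task) { System.out.println("ask operator to " + task); }
}

class Organization {
    private final List<Agent> agents = new ArrayList<>();
    void register(Agent a) { agents.add(a); }

    // Allocate a task to the first capable agent; registering robots before
    // the human lets the human absorb only tasks no robot can perform.
    boolean allocate(String task, String requiredCapability) {
        for (Agent a : agents) {
            if (a.hasCapability(requiredCapability)) { a.assign(task); return true; }
        }
        return false;
    }
}
```

Because the allocator, not the human, matches tasks to agents, the human no longer interacts with each robot individually, which is what removes the four-to-six robot ceiling of supervisory control.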
54

Insightful Performance Analysis of Many-Task Runtimes through Tool-Runtime Integration

Chaimov, Nicholas 06 September 2017
Future supercomputers will require application developers to expose much more parallelism than current applications do. To assist developers in structuring their applications so that this is possible, new programming models and libraries, the many-task runtimes, are emerging to allow the expression of orders of magnitude more parallelism than existing models. This dissertation describes the challenges that these emerging many-task runtimes place on performance analysis, and proposes deep integration between runtimes and performance tools as a means of producing correct, insightful, and actionable performance results. I show how tool-runtime integration can be used to aid programmer understanding of performance characteristics and to provide online performance feedback to the runtime for Unified Parallel C (UPC), High Performance ParalleX (HPX), Apache Spark, the Open Community Runtime, and the OpenMP runtime.
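One form such tool-runtime integration can take is a task-lifecycle callback interface exposed by the runtime, as in the hedged Java sketch below. The TaskObserver and InstrumentedRuntime names are hypothetical; real integrations hook the runtime's own scheduler, but the pattern of bracketing each task with begin/end events so that time is attributed to tasks rather than worker threads is the same.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of deep tool-runtime integration: the runtime exposes task
// lifecycle hooks, so a tool attributes time to tasks rather than to
// whichever worker thread happened to run them. Names are illustrative.
interface TaskObserver {
    void taskBegin(long taskId, String label, long workerThread);
    void taskEnd(long taskId, long workerThread);
}

class InstrumentedRuntime {
    private final List<TaskObserver> observers = new CopyOnWriteArrayList<>();
    void register(TaskObserver o) { observers.add(o); }

    // The worker loop calls this around every task it executes.
    void runTask(long taskId, String label, Runnable body) {
        long worker = Thread.currentThread().getId();
        for (TaskObserver o : observers) o.taskBegin(taskId, label, worker);
        try {
            body.run();
        } finally {
            for (TaskObserver o : observers) o.taskEnd(taskId, worker);
        }
    }
}
```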
55

MPI Performance Engineering with the MPI Tools Information Interface

Ramesh, Srinivasan 06 September 2018
The desire for high performance on scalable parallel systems is increasing the complexity of MPI implementations and the need to tune them. The MPI Tools Information Interface (MPI_T), introduced in the MPI 3.0 standard, provides an opportunity for performance tools and external software to introspect and understand MPI runtime behavior at a deeper level and detect scalability issues. The interface also provides a mechanism to fine-tune the performance of the MPI library dynamically at runtime. This thesis describes the motivation, design, and challenges involved in developing an MPI performance engineering infrastructure using MPI_T for two performance toolkits, the TAU Performance System and Caliper. I validate the design of the infrastructure for TAU by developing optimizations for production and synthetic applications, and I show that the MPI_T runtime introspection mechanism in Caliper enables meaningful analysis of performance data. This thesis includes previously published co-authored material.
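MPI_T itself is a C API defined by the MPI standard, so the following Java sketch is only a conceptual analogue of the pattern the thesis relies on: a runtime exports readable performance variables and writable control variables, and a tool-side policy reads the former to tune the latter. Every type, variable name, and threshold here is invented for illustration.

```java
import java.util.Map;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Conceptual analogue of MPI_T-style introspection, not the MPI_T C API.
class IntrospectableRuntime {
    private final Map<String, LongSupplier> perfVars;  // read-only counters/gauges
    private final Map<String, LongConsumer> ctrlVars;  // writable tuning knobs

    IntrospectableRuntime(Map<String, LongSupplier> p, Map<String, LongConsumer> c) {
        this.perfVars = p;
        this.ctrlVars = c;
    }

    long readPerfVar(String name) { return perfVars.get(name).getAsLong(); }
    void writeCtrlVar(String name, long v) { ctrlVars.get(name).accept(v); }
}

// A tool-side tuning loop: if a backlog metric keeps growing, adjust a knob.
class AutotuningPolicy {
    void step(IntrospectableRuntime rt) {
        long backlog = rt.readPerfVar("unexpected_msgq_len"); // hypothetical metric
        if (backlog > 1000) {
            rt.writeCtrlVar("eager_threshold", 4096);          // hypothetical knob
        }
    }
}
```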
56

Multi-tasking scheduling for heterogeneous systems

Wen, Yuan January 2017
Heterogeneous platforms play an increasingly important role in modern computer systems, combining high performance with low power consumption. From mobiles to supercomputers, an increasing number of computer systems are heterogeneous. The best-known heterogeneous systems, CPU+GPU platforms, have been widely used in recent years, and as they become more mainstream, serving multiple tasks from multiple users is an emerging challenge. A good scheduler can greatly improve performance, but indiscriminately allocating tasks based on availability leads to poor results. Because modern GPUs have a large number of hardware resources, most tasks cannot efficiently utilize all of them; concurrent task execution on the GPU is a promising solution, yet indiscriminately running tasks in parallel causes a slowdown. This thesis focuses on scheduling OpenCL kernels. A runtime framework is developed to determine where to schedule OpenCL kernels: it predicts the best-fit device using a machine-learning-based classifier, then schedules each kernel accordingly to either the CPU or the GPU. To improve GPU utilization, a kernel-merging approach is proposed: kernels are merged if their predicted co-execution provides better performance than sequential execution, with a machine-learning-based classifier finding the best kernel pairs for co-execution on the GPU. Finally, a runtime framework is developed that schedules kernels separately on either the CPU or the GPU, and runs kernels in pairs when their co-execution improves performance. The approaches developed in this thesis significantly improve system performance and outperform all existing techniques.
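The two scheduling decisions described above can be sketched as follows, assuming a trained classifier is available behind a simple interface; the feature encoding, the Classifier interface, and the greedy pairing below are illustrative simplifications of the thesis's machine-learning approach, not its implementation.

```java
import java.util.ArrayList;
import java.util.List;

enum Device { CPU, GPU }

// Abstracted trained model: device prediction and co-execution speedup.
interface Classifier {
    Device bestDevice(double[] kernelFeatures);
    double predictedCoExecSpeedup(double[] a, double[] b); // vs. running sequentially
}

class KernelScheduler {
    private final Classifier model;
    KernelScheduler(Classifier model) { this.model = model; }

    // Partition incoming kernels (as feature vectors) by best-fit device.
    List<double[]> gpuQueue(List<double[]> kernels) {
        List<double[]> gpu = new ArrayList<>();
        for (double[] k : kernels)
            if (model.bestDevice(k) == Device.GPU) gpu.add(k);
        return gpu;
    }

    // Greedily pair GPU kernels whose predicted co-execution is profitable.
    List<double[][]> mergePairs(List<double[]> gpu) {
        List<double[][]> schedule = new ArrayList<>();
        for (int i = 0; i + 1 < gpu.size(); i += 2) {
            double[] a = gpu.get(i), b = gpu.get(i + 1);
            if (model.predictedCoExecSpeedup(a, b) > 1.0) {
                schedule.add(new double[][]{a, b});   // run merged
            } else {
                schedule.add(new double[][]{a});      // run alone
                schedule.add(new double[][]{b});
            }
        }
        if (gpu.size() % 2 == 1) schedule.add(new double[][]{gpu.get(gpu.size() - 1)});
        return schedule;
    }
}
```

The key design point is the speedup guard: merging only when the model predicts co-execution beats back-to-back execution is what prevents the slowdown that indiscriminate parallel execution causes.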
57

Stagioni: Temperature management to enable near-sensor processing for performance, fidelity, and energy-efficiency of vision and imaging workloads

January 2019
Vision processing on traditional architectures is inefficient due to energy-expensive off-chip data movement. Many researchers advocate pushing processing close to the sensor to substantially reduce data movement. However, continuous near-sensor processing raises the sensor temperature, impairing the fidelity of imaging and vision tasks. This work characterizes the thermal implications of using 3D-stacked image sensors with near-sensor vision processing units. The characterization reveals that near-sensor processing reduces system power but degrades image quality: for reasonable image fidelity, the sensor temperature must stay below a threshold that is situationally determined by application needs. Fortunately, the characterization also identifies opportunities -- unique to the needs of near-sensor processing -- to regulate temperature based on dynamic visual task requirements and to rapidly increase capture quality on demand. Based on the characterization, this work proposes and investigates two thermal management strategies, stop-capture-go and seasonal migration, for imaging-aware thermal management. It presents the parameters that govern the policy decisions and explores the trade-offs between system power and policy overhead. The evaluation shows that these dynamic thermal management strategies can unlock the energy-efficiency potential of near-sensor processing with minimal performance impact and without compromising image fidelity.
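A minimal sketch of the stop-capture-go policy follows, assuming a per-frame temperature reading and hypothetical halt/resume thresholds with hysteresis; seasonal migration would instead move the vision workload off-sensor when sustained processing heats the stack.

```java
// Sketch of stop-capture-go: halt near-sensor processing when the sensor
// approaches the fidelity threshold, let it cool, then resume. Thresholds
// and the temperature source are hypothetical.
class ThermalGovernor {
    private final double haltAtCelsius;    // fidelity threshold for the current task
    private final double resumeAtCelsius;  // lower bound; hysteresis avoids oscillation
    private boolean processingEnabled = true;

    ThermalGovernor(double haltAt, double resumeAt) {
        this.haltAtCelsius = haltAt;
        this.resumeAtCelsius = resumeAt;
    }

    // Called once per frame with the current sensor temperature; the caller
    // gates near-sensor work on the return value (capture continues regardless).
    boolean onFrame(double sensorTempCelsius) {
        if (processingEnabled && sensorTempCelsius >= haltAtCelsius) {
            processingEnabled = false;   // stop: capture-only, processing off
        } else if (!processingEnabled && sensorTempCelsius <= resumeAtCelsius) {
            processingEnabled = true;    // go: sensor has cooled enough
        }
        return processingEnabled;
    }
}
```

The gap between the two thresholds is the policy parameter trading image fidelity against how often processing toggles, i.e., the policy overhead discussed above.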
58

Design and implementation of a fluid dynamics solver for many-core architectures / Design of generic modular solutions for PDE solvers for modern architectures

Genet, Damien 12 December 2014
Numerical simulation is now an integral part of engineering analysis. Whether designing the profile of a vehicle or predicting the outcome of an oil drilling operation, simulation has become a complement to theory and experiment, reducing the cost of engineering design processes, and it must produce precise results in a minimum of time. To achieve a high level of precision, one needs to increase the resolution of the computational domain; to keep obtaining results in reasonable time, computations must then be accelerated through high performance computing (HPC), exploiting the complex architecture of modern supercomputers with their deep hierarchies of compute units. Under these constraints, and others such as genericity over meshes, solution order, and numerical methods, we developed a new platform, AeroSol. This thesis presents the mathematical background of the two classes of numerical schemes implemented in the platform, the continuous and discontinuous finite element methods. We then present the design choices made in the platform and study a sub-problem, the assembly operation over a runtime system, which also arises in multi-frontal methods in linear algebra and in simulation applications that assemble a linear system. Finally, we conclude with an assessment of the AeroSol platform and outline possible directions for its evolution.
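The assembly operation discussed above reduces to scatter-adding small element matrices into a global system through local-to-global index maps. Below is a minimal sequential Java sketch using two 1D linear elements; a platform like AeroSol performs this concurrently over a runtime system and with sparse storage, so this only shows the data movement.

```java
// Sketch of finite element assembly: each element contributes a small dense
// matrix that is scattered into the global system via an index map.
class Assembler {
    // global: n x n matrix (dense here for brevity; sparse in practice)
    static void assembleElement(double[][] global, double[][] local, int[] dofMap) {
        for (int i = 0; i < local.length; i++)
            for (int j = 0; j < local.length; j++)
                global[dofMap[i]][dofMap[j]] += local[i][j]; // scatter-add
    }

    public static void main(String[] args) {
        double[][] global = new double[3][3];
        // two 1D linear elements over nodes (0,1) and (1,2); node 1
        // accumulates contributions from both elements
        double[][] local = {{1, -1}, {-1, 1}};
        assembleElement(global, local, new int[]{0, 1});
        assembleElement(global, local, new int[]{1, 2});
        System.out.println(java.util.Arrays.deepToString(global));
    }
}
```

The shared node is exactly where the concurrency challenge lies: when elements are assembled in parallel over a runtime system, contributions to the same global entry must be accumulated without races.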
59

The JCop language specification: Version 1.0, April 2012

Appeltauer, Malte, Hirschfeld, Robert January 2012
Program behavior that relies on contextual information, such as physical location or network accessibility, is common in today's applications, yet its representation is not sufficiently supported by programming languages. With context-oriented programming (COP), such context-dependent behavioral variations can be explicitly modularized and dynamically activated. In general, COP can be used to manage any context-specific behavior, but its contemporary realizations limit the control of dynamic adaptation. This, in turn, limits the interaction of COP's adaptation mechanisms with widely used architectural styles, such as event-based, mobile, and distributed programming. The JCop programming language extends Java with language constructs for context-oriented programming and additionally provides a domain-specific aspect language for declarative control over runtime adaptations. Implementations redesigned with these constructs are more concise and better modularized than their plain-COP counterparts. JCop's main features have been described in our previous publications; however, a complete language specification has not been presented so far. This report presents the entire JCop language, including the syntax and semantics of its new language constructs.
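As a plain-Java approximation of what JCop builds into the language, the sketch below mimics dynamically scoped layer activation with an explicit stack; JCop expresses the same semantics with dedicated layer syntax and its declarative aspect language, so none of the identifiers here are actual JCop constructs.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Plain-Java mimic of context-oriented layer activation: behavior
// variations are selected by which "layers" are active at call time.
class Layers {
    static final Deque<String> active = new ArrayDeque<>();

    // Dynamically scoped activation, loosely like a COP 'with' block.
    static void with(String layer, Runnable body) {
        active.push(layer);
        try { body.run(); } finally { active.pop(); }
    }
}

class StatusDisplay {
    String render() {
        // layer-dependent behavioral variation
        if (Layers.active.contains("offline")) return "status: cached data";
        return "status: live data";
    }

    public static void main(String[] args) {
        StatusDisplay d = new StatusDisplay();
        System.out.println(d.render());                        // live data
        Layers.with("offline", () -> System.out.println(d.render())); // cached data
    }
}
```

What JCop adds beyond this hand-rolled pattern is precisely the declarative control the abstract describes: activation conditions are stated once in the aspect language instead of being threaded through every call site.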
60

RUMBA: Runtime Monitoring and Behavioral Analysis Framework for Java Software Systems

Ashkan, Azin January 2007
A goal of runtime monitoring is to observe software execution to determine whether it complies with its intended behavior. Monitoring allows one to analyze and recover from detected faults, providing prevention activities against catastrophic failure. Although runtime monitoring has been in use for many years, there is renewed interest in its application, largely because of the increasing complexity and ubiquitous nature of software systems. To address this demand for runtime monitoring and behavioral analysis of software systems, we present the RUMBA framework. It utilizes a synergy between static and dynamic analyses to evaluate whether a program's behavior complies with specified properties during its execution. The framework comprises three steps, namely: (i) Extracting Architecture, in which reverse engineering techniques are used to extract two meta-models of a Java system, utilizing UML-compliant and graph representations of the system model; (ii) Seeding Objectives, in which the information required for filtering runtime events is obtained from properties defined in OCL (Object Constraint Language) as specifications for the behavioral analysis; and (iii) Runtime Monitoring and Analysis, in which the behavior of the system is monitored according to the output of the previous stages and then analyzed against the objective properties. The first two stages are static, while the third is dynamic. A prototype of the framework has been developed in the Java programming language. We have performed a set of empirical studies to assess the techniques introduced in this thesis, and we have evaluated the efficiency of the RUMBA framework in terms of processor and memory utilization for the case study applications.
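The third, dynamic stage can be pictured as an event filter plus a property check, as in the hedged Java sketch below; the Event model and the predicates are invented examples, whereas RUMBA derives its filters and properties from OCL constraints over the extracted meta-models.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// An intercepted runtime event; fields are an invented minimal model.
record Event(String className, String method, Object receiver) {}

// Sketch of the monitoring stage: filter events against seeded objectives,
// then check the objective property and record violations for analysis.
class Monitor {
    private final Predicate<Event> objectiveFilter;   // from the seeding stage
    private final Predicate<Event> property;          // must hold on filtered events
    private final List<Event> violations = new ArrayList<>();

    Monitor(Predicate<Event> filter, Predicate<Event> property) {
        this.objectiveFilter = filter;
        this.property = property;
    }

    // Called by instrumentation on every intercepted runtime event.
    void onEvent(Event e) {
        if (!objectiveFilter.test(e)) return;          // ignore irrelevant events
        if (!property.test(e)) violations.add(e);      // record for later analysis
    }

    List<Event> violations() { return violations; }
}
```

Filtering before checking is what keeps the dynamic stage cheap: only events that the static stages marked as relevant ever reach the property evaluation.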
