Global ETD Search

1	Scaling managed runtime systems for future multicore hardware Ha, Jung Woo 27 August 2010 (has links) The exponential improvement in single processor performance has recently come to an end, mainly because clock frequency has reached its limit due to power constraints. Thus, processor manufacturers are choosing to enhance computing capabilities by placing multiple cores into a single chip, which can improve performance given parallel software. This paradigm shift to chip multiprocessors (also called multicore) requires scalable parallel applications that execute tasks on each core, otherwise the additional cores are worthless. Making an application scalable requires more than simply parallelizing the application code itself. Modern applications are written in managed languages, which require automatic memory management, type and memory abstractions, dynamic analysis and just-in-time (JIT) compilation. These managed runtime systems monitor and interact frequently with the executing application. Hence, the managed runtime itself must be scalable, and the instrumentation that monitors the application should not perturb its scalability. While multicore hardware forces a redesign of managed runtimes for scalability, it also provides opportunities when applications do not fully utilize all of the cores. Using available cores for concurrent helper threads that enhance the software, with debugging, security, and software support will make the runtime itself more capable and more scalable. This dissertation presents two novel techniques that improve the scalability of managed runtimes by utilizing unused cores. The first technique is a concurrent dynamic analysis framework that provides a low-overhead buffering mechanism called Cache-friendly Asymmetric Buffering (CAB) that quickly offloads data from the application to helper threads that perform specific dynamic analyses. Our framework minimizes application instrumentation overhead, prevents microarchitectural side-effects, and supports a variety of dynamic analysis clients, ranging from call graph and path profiling to cache simulation. The use of this framework ensures that helper threads perturb the performance of application as little as possible. Our second technique is concurrent trace-based just-in-time compilation, which exploits available cores for the JavaScript runtime. The JavaScript language limits applications to a single-thread, so extra cores are worthless unless they are used by the runtime components. We redesigned a production trace-based JIT compiler to run concurrently with the interpreter, and our technique is the first to improve both responsiveness and throughput in a trace-based JIT compiler. This thesis presents the design and implementation of both techniques and shows that they improve scalability and core utilization when running applications in managed runtimes. Industry is already adopting our approaches, which demonstrates the urgency of the scalable runtime problem and the utility of these techniques. / text Scalability Multicore Managed Language Runtime System Parallelism
2	Towards an improved memory model for Java Kotrajaras, Vishnu January 2002 (has links) No description available. 005
3	A Template-Based Java Code Generator for OpenModelica and MetaModelica Munisamy, Manokar January 2014 (has links) The current OpenModelica Complier (OMC) translates Modelica models into executable Ccodethrough several stages. The Code Generator is the final stage of the compiler whichgenerates target C-code from the optimized sorted equations. Recently, the Code Generator inOMC has been rewritten using the OpenModelica text template language. This gives a moreconcise and easier to understand code generator. Modeling and simulation is becomingincreasingly used in several application areas. There is demand for the OpenModelicaComplier (OMC) to generate code in languages like C#, CSharp, XML, JAVA and so on. Inthis thesis work, we implement a Java code generator to translate the internal equation-basedmodels in OpenModelica and its extension MetaModelica into a Java code representation. Tocreate the Java code generator we used the OpenModelica text template language, also calledSusan. This work is an important step on the way to finalize a full version of a Java CodeGenerator for the OpenModelica Complier (OMC). Java Code Generator Modelica OpenModelica Java Runtime System OpenModelica Complier
4	Power, Performance and Energy Models and Systems for Emergent Architectures Song, Shuaiwen 10 April 2013 (has links) Massive parallelism combined with complex memory hierarchies and heterogeneity in high-performance computing (HPC) systems form a barrier to efficient application and architecture design. The performance achievements of the past must continue over the next decade to address the needs of scientific simulations. However, building an exascale system by 2022 that uses less than 20 megawatts will require significant innovations in power and performance efficiency. A key limitation of past approaches is a lack of power-performance policies allowing users to quantitatively bound the effects of power management on the performance of their applications and systems. Existing controllers and predictors use policies fixed by a knowledgeable user to opportunistically save energy and minimize performance impact. While the qualitative effects are often good and the aggressiveness of a controller can be tuned to try to save more or less energy, the quantitative effects of tuning and setting opportunistic policies on performance and power are unknown. In other words, the controller will save energy and minimize performance loss in many cases but we have little understanding of the quantitative effects of controller tuning. This makes setting power-performance policies a manual trial and error process for domain experts and a black art for practitioners. To improve upon past approaches to high-performance power management, we need to quantitatively understand the effects of power and performance at scale. In this work, I have developed theories and techniques to quantitatively understand the relationship between power and performance for high performance systems at scale. For instance, our system-level, iso-energy-efficiency model analyzes, evaluates and predicts the performance and energy use of data intensive parallel applications on multi-core systems. This model allows users to study the effects of machine and application dependent characteristics on system energy efficiency. Furthermore, this model helps users isolate root causes of energy or performance inefficiencies and develop strategies for scaling systems to maintain or improve efficiency. I have also developed methodologies which can be extended and applied to model modern heterogeneous architectures such as GPU-based clusters to improve their efficiency at scale. / Ph. D. High performance computing power-aware computing runtime system
5	Runtime Systems and Scheduling Support for High-End CPU-GPU Architectures Trichy Ravi, Vignesh 27 June 2012 (has links) No description available. Computer Science Heterogneous Architecture Multicore CPU GPU Runtime System Job Scheduling Dynamic Work Distribution Performance
6	Effective Fusion and Separation of Distribution, Fault-Tolerance, and Energy-Efficiency Concerns Kwon, Young Woo 03 July 2014 (has links) As software applications are becoming increasingly distributed and mobile, their design and implementation are characterized by distributed software architectures, possibility of faults, and the need for energy awareness. Thus, software developers should be able to simultaneously reason about and handle the concerns of distribution, fault-tolerance, and energy-efficiency. Being closely intertwined, these concerns can introduce significant complexity into the design and implementation of modern software. In other words, to develop reliable and energy-efficient applications, software developers must understand how distribution, fault-tolerance, and energy-efficiency interplay with each other and how to implement these concerns while keeping the complexity in check. This dissertation addresses five technical issues that stand on the way of engineering reliable and energy-efficient software: (1) how can developers select and parameterize middleware to achieve the requisite levels of performance, reliability, and energy-efficiency? (2) how can one streamline the process of implementing and reusing fault tolerance functionality in distributed applications? (3) can automated techniques be developed to help transition centralized applications to using cloud-based services efficiently and reliably? (4) how can one leverage cloud-based resources to improve the energy-efficiency of mobile applications? (5) how can middleware be adapted to improve the energy-efficiency of distributed mobile applications operated over heterogeneous mobile networks? To address these issues, this research studies the concerns of distribution, fault-tolerance, and energy-efficiency as well as their interaction. It also develops novel approaches, techniques, and tools that effectively fuse and separate these concerns as required by particular software development scenarios. The specific innovations include (1) a systematic assessment of the performance, conciseness, complexity, reliability, and energy consumption of middleware mechanisms for accessing remote functionality, (2) a declarative approach to hardening distributed applications with resiliency against partial failure, (3) cloud refactoring, a set of automated program transformations for transitioning to using cloud-based services efficiently and reliably, (4) a cloud offloading approach that improves the energy-efficiency of mobile applications without compromising their reliability, (5) a middleware mechanism that optimizes energy consumption by adapting execution patterns dynamically in response to fluctuations in network conditions. / Ph. D. distributed computing fault-tolerance energy-efficiency middleware mobile applications program transformation runtime system dynamic adaptation refactoring
7	Memory Management and Garbage Collection Algorithms for Java-Based Prolog Zhou, Qinan 08 1900 (has links) Implementing a Prolog Runtime System in a language like Java which provides its own automatic memory management and safety features such as built--in index checking and array initialization requires a consistent approach to memory management based on a simple ultimate goal: minimizing total memory management time and extra space involved. The total memory management time for Jinni is made up of garbage collection time both for Java and Jinni itself. Extra space is usually requested at Jinni's garbage collection. This goal motivates us to find a simple and practical garbage collection algorithm and implementation for our Prolog engine. In this thesis we survey various algorithms already proposed and offer our own contribution to the study of garbage collection by improvements and optimizations for some classic algorithms. We implemented these algorithms based on the dynamic array algorithm for an all--dynamic Prolog engine (JINNI 2000). The comparisons of our implementations versus the originally proposed algorithm allow us to draw informative conclusions on their theoretical complexity model and their empirical effectiveness. Memory management prolog runtime system garbage collection, algorithm Memory management (Computer science) Garbage collection (Computer science) Prolog (Computer program language)
8	Programming Idioms and Runtime Mechanisms for Distributed Pervasive Computing Adhikari, Sameer 13 October 2004 (has links) The emergence of pervasive computing power and networking infrastructure is enabling new applications. Still, many milestones need to be reached before pervasive computing becomes an integral part of our lives. An important missing piece is the middleware that allows developers to easily create interesting pervasive computing applications. This dissertation explores the middleware needs of distributed pervasive applications. The main contributions of this thesis are the design, implementation, and evaluation of two systems: D-Stampede and Crest. D-Stampede allows pervasive applications to access live stream data from multiple sources using time as an index. Crest allows applications to organize historical events, and to reason about them using time, location, and identity. Together they meet the important needs of pervasive computing applications. D-Stampede supports a computational model called the thread-channel graph. The threads map to computing devices ranging from small to high-end processing elements. Channels serve as the conduits among the threads, specifically tuned to handle time-sequenced streaming data. D-Stampede allows the dynamic creation of threads and channels, and for the dynamic establishment (and removal) of the plumbing among them. The Crest system assumes a universe that consists of participation servers and event stores, supporting a set of applications. Each application consists of distributed software entities working together. The participation server helps the application entities to discover each other for interaction purposes. Application entities can generate events, store them at an event store, and correlate events. The entities can communicate with one another directly, or indirectly through the event store. We have qualitatively and quantitatively evaluated D-Stampede and Crest. The qualitative aspect refers to the ease of programming afforded by our programming abstractions for pervasive applications. The quantitative aspect measures the cost of the API calls, and the performance of an application pipeline that uses the systems. Event reasoning Temporal data Middleware Runtime system Pervasive computing Distributed computing Ubiquitous computing Middleware
9	Adapting the polytope model for dynamic and speculative parallelization / Adaptation du modèle polyhédrique à la parallélisation dynamique et spéculatice Jimborean, Alexandra 14 September 2012 (has links) Dans cette thèse, nous décrivons la conception et l'implémentation d'une plate-forme logicielle de spéculation de threads, ou fils d'exécution, appelée VMAD, pour "Virtual Machine for Advanced Dynamic analysis and transformation", et dont la fonction principale est d'être capable de paralléliser de manière spéculative un nid de boucles séquentiel de différentes façons, en ré-ordonnançant ses itérations. La transformation à appliquer est sélectionnée au cours de l'exécution avec pour objectifs de minimiser le nombre de retours arrières et de maximiser la performance. Nous effectuons des transformations de code en appliquant le modèle polyédrique que nous avons adapté à la parallélisation spéculative au cours de l'exécution. Pour cela, nous construisons au préalable un patron de code qui est "patché" par notre "runtime", ou support d'exécution logiciel, selon des informations de profilage collectées sur des échantillons du temps d'exécution. L'adaptabilité est assurée en considérant des tranches de code de tailles différentes, qui sont exécutées successivement, chacune étant parallélisée différemment, ou exécutée en séquentiel, selon le comportement des accès à la mémoire observé. Nous montrons, sur plusieurs programmes que notre plate-forme offre de bonnes performances, pour des codes qui n'auraient pas pu être traités efficacement par les systèmes spéculatifs de threads proposés précédemment. / In this thesis, we present a Thread-Level Speculation (TLS) framework whose main feature is to speculatively parallelize a sequential loop nest in various ways, to maximize performance. We perform code transformations by applying the polyhedral model that we adapted for speculative and runtime code parallelization. For this purpose, we designed a parallel code pattern which is patched by our runtime system according to the profiling information collected on some execution samples. We show on several benchmarks that our framework yields good performance on codes which could not be handled efficiently by previously proposed TLS systems. Programmation parallèle Speculative parallelization Runtime system Compiler Polyhedral model Dynamic optimizations Loops Partial parallelism LLVM Automatic parallelization 005
10	Efficient in-situ workflows for time-critical applications on heterogeneous ecosystems Item Feng Li (16627272) 21 July 2023 (has links) <p>In-situ workflows are a special class of scientific workflows, where different component applications (such as simulation, visualization, analysis) run concurrently, and data flows continuously between components during the whole workflow lifetime. Traditionally, simulations write large amounts of output data to persistent storage, which are later read for future analysis/visualization. In comparison, in-situ workflows allow analysis/visualization components to consume simulation data while the simulations are still running and thus reduce the I/O overhead. There are recent research works that focus on providing data transport libraries to help compose a group of applications into an integral in-situ workflow. However, only a few ``performance-oriented'' studies exist for in-situ workflows, and most of these works focus on workflows with simple structures (e.g., single producer and single consumer), also without consideration of heterogeneous environments for in-situ workflows. Being able to efficiently utilize heterogeneous computing resources such as multiple Clouds and HPCs can significantly accelerate real-world in-situ workflows, and benefit applications that require both significant computation power and real-time outputs(e.g., identifying abnormal patterns in fluid dynamics). The goal of this dissertation is to provide resource planning algorithms and runtime support, to improve in-situ workflow performance on heterogeneous environments.</p> <p><br></p> <p>This dissertation first investigates the emerging applications of in-situ workflows, which usually include parallel simulation, visualization, and analysis components. Two representative real-world in-situ workflows are studied in details-- a real-time CFD machine learning/visualization workflow and a wildfire spreading workflow. These workflows showcase the capability of in-situ workflows: e.g., decoupled and accelerated computation and fast near-real-time response time, however, there is a lack of resource planning and runtime support for general in-situ workflows. For resource planning, I first formulate the optimization problem, and then design and implement a heuristic algorithm called ``SNL'' (Scheduled-Neighbor-Lookup). SNL considers the pipelined execution pattern of in-situ workflows, and guides the resource planning of complex in-situ workflows to achieve higher workflow throughput. For the runtime support, I design and implement the ``INSTANT'' runtime framework, a runtime framework to configure, plan, launch, and monitor in-situ workflows for distributed computing environments. INSTANT provides intuitive interfaces to compose abstract in-situ workflows, manages in-site and cross-site data transfers with ADIOS2, and supports resource planning using profiled performance data. Experiments with the two use cases show that INSTANT can efficiently streamline the orchestration of complex in-situ workflows, and the resource planning capability allows INSTANT to plan and carry out fast workflow execution at different computing resource availabilities.</p> Distributed systems and algorithms High performance computing in-situ workflows scientific workflows high-performance computing resource planning runtime system

Search results