181 |
Transactions Everywhere
Kuszmaul, Bradley C., Leiserson, Charles E. 01 1900
Arguably, one of the biggest deterrents for software developers who might otherwise choose to write parallel code is that parallelism makes their lives more complicated. Perhaps the most basic problem inherent in the coordination of concurrent tasks is the enforcing of atomicity so that the partial results of one task do not inadvertently corrupt another task. Atomicity is typically enforced through locking protocols, but these protocols can introduce other complications, such as deadlock, unless restrictive methodologies in their use are adopted. We have recently begun a research project focusing on transactional memory [18] as an alternative mechanism for enforcing atomicity, since it allows the user to avoid many of the complications inherent in locking protocols. Rather than viewing transactions as infrequent occurrences in a program, as has generally been done in the past, we have adopted the point of view that all user code should execute in the context of some transaction. Making this viewpoint viable requires the development of two key technologies: effective hardware support for scalable transactional memory, and linguistic and compiler support. This paper describes our preliminary research results on making “transactions everywhere” a practical reality. / Singapore-MIT Alliance (SMA)
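The deadlock hazard of locking protocols, and the kind of restrictive discipline that avoids it, can be illustrated with a minimal Python sketch. The `Account` class and the `transfer` and `run_demo` functions are hypothetical names chosen for illustration and do not come from the paper. The sketch uses ordinary locks acquired in a fixed global order; it does not implement transactional memory.

```python
import threading


class Account:
    """Hypothetical shared object guarded by its own lock."""

    def __init__(self, ident, balance):
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Restrictive methodology: always acquire the two locks in a fixed
    # global order (by ident). Without this rule, two opposite transfers
    # could each hold one lock and wait forever for the other.
    first, second = sorted((src, dst), key=lambda acct: acct.ident)
    with first.lock, second.lock:
        # Both balances change atomically with respect to other transfers.
        src.balance -= amount
        dst.balance += amount


def run_demo(rounds=1000):
    a, b = Account(1, 100), Account(2, 100)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    workers = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return a.balance, b.balance
```

Every caller must know and respect the ordering rule, which is the sort of burden the abstract argues transactions would remove.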
|
182 |
On-the-fly Race Detection for Programs with Recursive Spawn-Sync Parallelism
He, Yuxiong, Wang, Junqing 01 1900
Detecting data races is very important for debugging shared-memory parallel programs, because data races result in unintended nondeterministic execution of the program. We propose a dynamic on-the-fly race detection mechanism called the Parallel Nondeterminator to check for determinacy races during the parallel execution of a program with recursive spawn-sync parallelism. A modified version of the Nested Region Labeling scheme is developed for the concurrency relationship test in the spawn-sync parallel structure. Through the identification of the Least Common Ancestor in the spawn tree, the Parallel Nondeterminator only needs to keep two read access records and one write access record for each shared location. The work and critical path of the instrumented code are analyzed, as well as its time complexity and space requirements. Let N denote the maximum depth of the recursion in the parallel program. The worst-case time added to each spawn and sync operation is O(N), and the time required to monitor any shared memory location is O(lg N). Moreover, the Parallel Nondeterminator is able to execute the race detection code without loss of parallelism of the original program. In summary, the Parallel Nondeterminator represents a provably efficient strategy for detecting data races in shared-memory parallel programs. / Singapore-MIT Alliance (SMA)
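The role of the least common ancestor in the concurrency test can be sketched in Python under simplifying assumptions. The label encoding below (a path of `(parent_kind, child_index)` steps through a series-parallel parse tree) and the `RaceDetector` class are hypothetical and are not the paper's modified Nested Region Labeling scheme. The sketch also assumes accesses are reported in serial execution order, and it keeps every reader per location rather than the two read records the abstract describes.

```python
def logically_parallel(a, b):
    """Return True if strands a and b may run concurrently.

    A strand label is the path from the root of the parse tree to the
    strand: a tuple of (parent_kind, child_index) steps, where
    parent_kind is "S" (series) or "P" (parallel).
    """
    for (kind, idx_a), (_, idx_b) in zip(a, b):
        if idx_a != idx_b:
            # The paths diverge here, so this parent is the least common
            # ancestor; the strands are parallel only under a "P" node.
            return kind == "P"
    return False  # identical labels denote the same strand


class RaceDetector:
    """Simplified shadow memory: last writer and all readers per location."""

    def __init__(self):
        self.writer = {}
        self.readers = {}
        self.races = []

    def read(self, loc, strand):
        last_writer = self.writer.get(loc)
        if last_writer is not None and logically_parallel(last_writer, strand):
            self.races.append((loc, "write-read"))
        self.readers.setdefault(loc, []).append(strand)

    def write(self, loc, strand):
        last_writer = self.writer.get(loc)
        if last_writer is not None and logically_parallel(last_writer, strand):
            self.races.append((loc, "write-write"))
        for reader in self.readers.get(loc, []):
            if logically_parallel(reader, strand):
                self.races.append((loc, "read-write"))
        self.writer[loc] = strand
```

For example, two strands spawned under the same "P" node that both write `x` are reported as a write-write race, while a strand that runs after the sync (a later child of an "S" node) can read `x` without a report.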
|
183 |
Experience with Acore: Implementing GHC with Actors
Palmucci, Jeff, Waldsburger, Carl, Duis, David, Krause, Paul 01 August 1990
This paper presents a concurrent interpreter for a general-purpose concurrent logic programming language, Guarded Horn Clauses (GHC). Unlike typical implementations of GHC in logic programming languages, the interpreter is implemented in the Actor language Acore. The primary motivation for this work was to probe the strengths and weaknesses of Acore as a platform for developing sophisticated programs. The GHC interpreter provided a rich testbed for exploring Actor programming methodology. The interpreter is a pedagogical investigation of the mapping of GHC constructs onto the Actor model. Since we opted for simplicity over optimization, the interpreter is somewhat inefficient.
|
184 |
Enhancing MPI with modern networking mechanisms in cluster interconnects
Yu, Weikuan, January 2006
Thesis (Ph. D.)--Ohio State University, 2006. / Title from first page of PDF file. Includes bibliographical references (p. 161-168).
|
185 |
E-AMOM: An Energy-Aware Modeling and Optimization Methodology for Scientific Applications on Multicore Systems
Lively, Charles 2012 May 1900
Power consumption is an important constraint in achieving efficient execution on High Performance Computing Multicore Systems. As the number of cores available on a chip continues to increase, the importance of power consumption will continue to grow. To achieve improved performance on multicore systems, scientific applications must make use of efficient methods for reducing power consumption and must further be refined to achieve reduced execution time.
In this dissertation, we introduce a performance modeling framework, E-AMOM, to enable improved execution of scientific applications on parallel multicore systems under a limited power budget. We develop models for each application based upon hardware performance counters. Our models use different performance counters for each application and for each performance component (runtime, system power consumption, CPU power consumption, and memory power consumption); the counters are selected via our performance-tuned principal component analysis method. Models developed through E-AMOM provide insight into the characteristics of each application that affect each performance component on a parallel multicore system. Our models are more than 92% accurate across both Hybrid (MPI/OpenMP) and MPI implementations for six scientific applications.
E-AMOM includes an optimization component that uses our models to apply run-time Dynamic Voltage and Frequency Scaling (DVFS) and Dynamic Concurrency Throttling to reduce the power consumption of the scientific applications. Further, we optimize our applications based upon insights provided by the performance models to reduce their runtime. Our methods and techniques save up to 18% in energy consumption for Hybrid (MPI/OpenMP) and MPI scientific applications and reduce the runtime of the applications by up to 11% on parallel multicore systems.
|
186 |
Relaxing Concurrency Control in Transactional Memory
Aydonat, Utku 05 January 2012
Transactional memory (TM) systems have gained considerable popularity in the last decade, driven by the increased demand for tools that ease parallel programming. TM eliminates the need for user-locks that protect accesses to shared data. It offers performance close to that of fine-grain locking with the programming simplicity of coarse-grain locking. Today’s TM systems implement the two-phase-locking (2PL) algorithm, which aborts transactions every time a conflict occurs. 2PL is a simple algorithm that provides fast transactional operations. However, it limits concurrency in applications with high contention because it increases the rate of aborts. We propose the use of a more relaxed concurrency control algorithm to provide better concurrency. This algorithm is based on the conflict-serializability (CS) model. Unlike 2PL, it allows some transactions to commit successfully even when they make conflicting accesses. We implement this algorithm both in a software TM system and in a simulator of a hardware TM system. Our evaluation using TM benchmarks shows that the algorithm improves the performance of applications with long transactions and high abort rates. Performance is improved by up to 299% in the software TM, and up to 66% in the hardware simulator. We argue that these improvements come with little additional complexity and require no changes to the transactional programming model. This makes our implementation feasible
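The difference between aborting on every conflict and accepting any conflict-serializable execution can be illustrated with the textbook precedence-graph test, sketched here in Python. This is not the thesis's algorithm or implementation; the schedule encoding and the function names `precedence_graph` and `is_conflict_serializable` are assumptions made for the example.

```python
from collections import defaultdict


def precedence_graph(schedule):
    """Build edges Ti -> Tj for each pair of conflicting accesses.

    schedule is a list of (txn, op, item) tuples in execution order,
    with op equal to "r" (read) or "w" (write). Two accesses conflict
    if they come from different transactions, touch the same item, and
    at least one of them is a write.
    """
    edges = defaultdict(set)
    for i, (t1, op1, item1) in enumerate(schedule):
        for t2, op2, item2 in schedule[i + 1:]:
            if t1 != t2 and item1 == item2 and "w" in (op1, op2):
                edges[t1].add(t2)
    return edges


def is_conflict_serializable(schedule):
    """A schedule is conflict-serializable iff its precedence graph is acyclic."""
    edges = precedence_graph(schedule)
    visiting, done = set(), set()

    def has_cycle(node):
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: the graph contains a cycle
        visiting.add(node)
        found = any(has_cycle(nxt) for nxt in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    transactions = {txn for txn, _, _ in schedule}
    return not any(has_cycle(txn) for txn in transactions)


# T1 and T2 conflict on x, yet the schedule is equivalent to T1 then T2.
conflicting_but_serializable = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "r", "y")]
# Conflicts on x and y point in opposite directions, forming a cycle.
not_serializable = [("T1", "r", "x"), ("T2", "w", "x"), ("T2", "w", "y"), ("T1", "w", "y")]
```

In the first schedule a policy that aborts on any conflict would abort one transaction, whereas a conflict-serializability check shows both could commit.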
|
188 |
Habanero-Scala: A Hybrid Programming model integrating Fork/Join and Actor models
Imam, Shams 24 July 2013
This study presents a hybrid concurrent programming model combining the previously developed Fork-Join model (FJM) and Actor model (AM). With the advent of multi-core computers, there is a renewed interest in programming models that reduce the burden of reasoning about and writing efficient concurrent programs. The proposed hybrid model shows how the divide-and-conquer approach of the FJM and the no-shared-mutable-state, event-driven philosophy of the AM can be combined to solve certain classes of problems more efficiently and productively than either model individually. The hybrid model adds actor creation and coordination into the FJM, while also enabling parallelization within actors. This study uses the Habanero-Java and Scala programming languages as the base for the FJM and AM respectively, and provides an implementation of the hybrid model as an extension of the Scala language called Habanero-Scala. The hybrid model adds to the foundations of parallel programs, and to the tools available to the programmer to aid productivity and performance while developing parallel software.
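The idea of parallelization within an actor can be sketched in Python, assuming a thread-backed mailbox and a thread pool for the fork/join step. This is not Habanero-Scala's API: the `SumActor` class and its `send` and `stop` methods are hypothetical, the fork/join step is a flat split rather than recursive divide-and-conquer, and in CPython the pool shows the structure of the computation rather than a real speedup.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class SumActor:
    """Hypothetical actor: no shared mutable state, one message at a time."""

    def __init__(self, workers=4):
        self.mailbox = queue.Queue()
        self.workers = workers
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.thread = threading.Thread(target=self._run)
        self.thread.start()

    def send(self, numbers, reply_to):
        # Messages are the only way to interact with the actor.
        self.mailbox.put((numbers, reply_to))

    def stop(self):
        self.mailbox.put(None)  # sentinel message ends the event loop
        self.thread.join()
        self.pool.shutdown()

    def _run(self):
        while True:
            message = self.mailbox.get()
            if message is None:
                break
            numbers, reply_to = message
            # Fork: split the work into chunks handled by pool tasks.
            size = max(1, len(numbers) // self.workers)
            chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]
            # Join: wait for every partial sum before replying.
            partials = list(self.pool.map(sum, chunks))
            reply_to.put(sum(partials))
```

A caller creates a reply queue, sends a list of numbers, and blocks on the reply; the actor processes one message at a time while the work inside each message is split across the pool.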
|
189 |
Architectural support for high-performing hardware transactional memory systems
Lupon Navazo, Marc 23 December 2011
Parallel programming presents an efficient solution to exploit future multicore processors. Unfortunately, traditional programming models depend on the programmer’s skill in synchronizing concurrent threads, which makes the development of parallel software a hard and error-prone task. In addition, current synchronization techniques serialize the execution of critical sections that conflict in shared memory and thus limit the scalability of multithreaded applications.
Transactional Memory (TM) has emerged as a promising programming model that solves the trade-off between high performance and ease of use. In TM, the system is in charge of scheduling transactions (atomic blocks of instructions) and guaranteeing that they are executed in isolation, which simplifies writing parallel code and, at the same time, enables high concurrency when atomic regions access different data. Among all forms of TM environments, Hardware TM (HTM) systems are the only ones that offer fast execution, at the cost of adding dedicated logic to the processor.
Existing HTM systems suffer considerable delays when they execute complex transactional workloads, especially when they deal with large and contending transactions, because they lack adaptability. Furthermore, most HTM implementations are ad hoc and require cumbersome hardware structures to be effective, which complicates the feasibility of the design. This thesis makes several contributions to the design and analysis of low-cost HTM systems that yield good performance for any kind of TM program.
Our first contribution, FASTM, introduces a novel mechanism to elegantly manage speculative (and already validated) versions of transactional data by slightly modifying the on-chip memory engine. This approach permits fast recovery when a transaction that fits in private caches is discarded. At the same time, it keeps non-speculative values in software, which allows in-place memory updates. Thus, FASTM neither suffers from capacity issues nor slows down when it has to undo transactional modifications.
Our second contribution includes two different HTM systems that integrate deferred resolution of conflicts in a conventional multicore processor, which reduces the complexity of the system with respect to previous proposals. The first one, FUSETM, combines different-mode transactions under a unified infrastructure to gracefully handle resource overflow. As a result, FUSETM brings fast transactional computation without requiring additional hardware or extra communication at the end of speculative execution. The second one, SPECTM, introduces a two-level data versioning mechanism to resolve conflicts in a speculative fashion even in the case of overflow.
Our third and last contribution presents a couple of truly flexible HTM systems that can dynamically adapt their underlying mechanisms according to the characteristics of the program. DYNTM records statistics of previously executed transactions to select the best-suited strategy each time a new instance of a transaction starts. SWAPTM takes a different approach: it tracks information about the current transactional instance to change its priority level at runtime. Both alternatives outperform existing proposals that employ fixed transactional policies, especially in applications with phase changes.
|
190 |
ParModelica: Extending the Algorithmic Subset of Modelica with Explicit Parallel Language Constructs for Multi-core Simulation
Gebremedhin, Mahder January 2011
In today’s world of high-tech manufacturing and computer-aided design, simulation of models is at the heart of the whole manufacturing process. Trying to represent and study the variables of real-world models using simulation programs can turn out to be a very expensive and time-consuming task. On the other hand, advancements in modern multi-core CPUs and general-purpose GPUs promise remarkable computational power. Properly utilizing this computational power can reduce simulation time. To this end, modern modeling environments provide different optimization and parallelization options to take advantage of the available computational power. Some of these parallelization approaches are based on automatically extracting parallelism with the help of a compiler. Another approach is to provide model programmers with the necessary language constructs to express any potential parallelism in their models. This second approach is taken in this thesis work. The OpenModelica modeling and simulation environment for the Modelica language has been extended with new language constructs for explicitly stating parallelism in algorithms. This slightly extended algorithmic subset of Modelica is called ParModelica. The new extensions allow models written in ParModelica to be translated to optimized OpenCL code, which can take advantage of the computational power of available multi-core CPUs and general-purpose GPUs.
|