Spelling suggestions: "subject:"arallel programming"" "subject:"arallel erogramming""
311 |
Architecture synthesis for adaptive multiprocessor systems on chipIshebabi, Harold January 2010 (has links)
This thesis presents methods for automated synthesis of flexible chip multiprocessor systems from parallel programs targeted at FPGAs to exploit both task-level parallelism and architecture customization. Automated synthesis is necessitated by the complexity of the design space. A detailed description of the design space is provided in order to determine which parameters should be modeled to facilitate automated synthesis by optimizing a cost function, the emphasis being placed on inclusive modeling of parameters from application, architectural and physical subspaces, as well as their joint coverage in order to avoid pre-constraining the design space. Given a parallel program and a set of an IP library, the automated synthesis problem is to simultaneously (i) select processors (ii) map and schedule tasks to them, and (iii) select one or several networks for inter-task communications such that design constraints and optimization objectives are met. The research objective in this thesis is to find a suitable model for automated synthesis, and to evaluate methods of using the model for architectural optimizations. Our contributions are a holistic approach for the design of such systems, corresponding models to facilitate automated synthesis, evaluation of optimization methods using state of the art integer linear and answer set programming, as well as the development of synthesis heuristics to solve runtime challenges. / Aktuelle Technologien erlauben es komplexe Multiprozessorsysteme auf einem Chip mit Milliarden von Transistoren zu realisieren. Der Entwurf solcher Systeme ist jedoch zeitaufwendig und schwierig. Diese Arbeit befasst sich mit der Frage, wie On-Chip Multiprozessorsysteme ausgehend von parallelen Programmen automatisch synthetisiert werden können. Die Implementierung der Multiprozessorsysteme auf rekonfigurierbaren Chips erlaubt es die gesamte Architektur an die Struktur eines vorliegenden parallelen Programms anzupassen. Auf diese Weise ist es möglich die aktuellen technologischen Unzulänglichkeiten zu umgehen, insbesondere die nicht weitersteigende Taktfrequenzen sowie den langsamen Zugriff auf Datenspeicher.
Eine Automatisierung des Entwurfs von Multiprozessorsystemen ist notwendig, da der Entwurfsraum von Multiprozessorsystemen zu groß ist, um vom Menschen überschaut zu werden.
In einem ersten Ansatz wurde das Syntheseproblem mittels linearer Gleichungen modelliert, die dann durch lineare Programmierungswerkzeuge gelöst werden können. Ausgehend von diesem Ansatz wurde untersucht, wie die typischerweise langen Rechenzeiten solcher Optimierungsmethoden durch neuere Methode aus dem Gebiet der Erfüllbarkeitsprobleme der Aussagenlogik minimiert werden können. Dabei wurde die Werkzeugskette Potassco verwendet, in der lineare Programme direkt in Logikprogramme übersetzt werden können. Es wurde gezeigt, dass dieser zweite Ansatz die Optimierungszeit um bis zu drei Größenordnungen beschleunigt. Allerdings lassen sich große Syntheseprobleme auf diese weise wegen Speicherbegrenzungen nicht lösen.
Ein weiterer Ansatz zur schnellen automatischen Synthese bietet die Verwendung von Heuristiken. Es wurden im Rahmen diese Arbeit drei Heuristiken entwickelt, die die Struktur des vorliegenden Syntheseproblems ausnutzen, um die Optimierungszeit zu minimieren. Diese Heuristiken wurden unter Berücksichtigung theoretischer Ergebnisse entwickelt, deren Ursprung in der mathematische Struktur des Syntheseproblems liegt. Dadurch lassen sich optimale Architekturen in kurzer Zeit ermitteln.
Die durch diese Dissertation offen gewordene Forschungsarbeiten sind u. a. die Berücksichtigung der zeitlichen Reihenfolge des Datenaustauschs zwischen parallelen Tasks, die Optimierung des logik-basierten Ansatzes, die Integration von Prozessor- und Netzwerksimulatoren zur funktionalen Verifikation synthetisierter Architekturen, sowie die Entwicklung geeigneter Architekturkomponenten.
|
312 |
Programming models for speculative and optimistic parallelism based on algorithmic propertiesCledat, Romain 24 August 2011 (has links)
Today's hardware is becoming more and more parallel. While embarrassingly parallel codes, such as high-performance computing ones, can readily take advantage of this increased number of cores, most other types of code cannot easily scale using traditional data and/or task parallelism and cores are therefore left idling resulting in lost opportunities to improve performance. The opportunistic computing paradigm, on which this thesis rests, is the idea that computations should dynamically adapt to and exploit the opportunities that arise due to idling resources to enhance their performance or quality.
In this thesis, I propose to utilize algorithmic properties to develop programming models that leverage this idea thereby providing models that increase and improve the parallelism that can be exploited. I exploit three distinct algorithmic properties: i) algorithmic diversity, ii) the semantic content of data-structures, and iii) the variable nature of results in certain applications.
This thesis presents three main contributions: i) the N-way model which leverages algorithmic diversity to speed up hitherto sequential code, ii) an extension to the N-way model which opportunistically improves the quality of computations and iii) a framework allowing the programmer to specify the semantics of data-structures to improve the performance of optimistic parallelism.
|
313 |
Optimistic semantic synchronizationSreeram, Jaswanth 06 October 2011 (has links)
Within the last decade multi-core processors have become increasingly commonplace with the
power and performance demands of modern real-world programs acting to accelerate this trend. The
rapid advancements in designing and adoption of such architectures mean that there is a serious need
for programming models that allow the development of correct parallel programs that execute
efficiently on these processors. A principle problem in this regard is that of efficiently synchronizing
concurrent accesses to shared memory. Traditional solutions to this problem are either inefficient but
provide programmability (coarse-grained locks) or are efficient but are not composable and very hard
to program and verify (fine-grained locks). Optimistic Transactional Memory systems provide many of
the composability and programmabillity advantages of coarse-grained locks and good theoretical
scaling but several studies have found that their performance in practice for many programs remains
quite poor primarily because of the high overheads of providing safe optimism. Moreover current
transactional memory models remain rigid - they are not suited for expressing some of the complex
thread interactions that are prevalent in modern parallel programs. Moreover, the synchronization
achieved by these transactional memory systems is at the physical or memory level.
This thesis advocates a position that memory synchronization problem for threads should be
modeled and solved in terms of synchronization of underlying program values which have semantics
associated with them. It presents optimistic synchronization techniques that address the semantic
synchronization requirements of a parallel program instead.
These techniques include methods to 1) enable optimistic transactions to recover from
expensive sharing conflicts without discarding all the work made possible by the optimism 2) enable a
hybrid pessimistic-optimistic form of concurrency control that lowers overheads 3) make
synchronization value-aware and semantics-aware 4) enable finer grained consistency rules (than
allowed by traditional optimistic TM models) therefore avoiding conflicts that do not enforce any
semantic property required by the program. In addition to improving the expressibility of specific
synchronization idioms all these techniques are also effective in improving parallel performance. This
thesis formulates these techniques in terms of their purpose, the extensions to the language, the
compiler as well as to the concurrency control runtime necessary to implement them. It also briefly
presents an experimental evaluation of each of them on a variety of modern parallel workloads. These
experiments show that these techniques significantly improve parallel performance and scalability over
programs using state-of-the-art optimistic synchronization methods.
|
314 |
Zero-sided communication : challenges in implementing time-based channels using the MPI/RT specificationNeelamegam, Jothi P. January 2002 (has links)
Thesis (M.S.)--Mississippi State University. Department of Computer Science. / Title from title screen. Includes bibliographical references.
|
315 |
Integrating algorithmic and systemic load balancing strategies in parallel scientific applicationsGhafoor, Sheikh Khaled, January 2003 (has links)
Thesis (M.S.)--Mississippi State University. Department of Computer Science and Engineering. / Title from title screen. Includes bibliographical references.
|
316 |
Pricing of American Options by Adaptive Tree Methods on GPUsLundgren, Jacob January 2015 (has links)
An assembled algorithm for pricing American options with absolute, discrete dividends using adaptive lattice methods is described. Considerations for hardware-conscious programming on both CPU and GPU platforms are discussed, to provide a foundation for the investigation of several approaches for deploying the program onto GPU architectures. The performance results of the approaches are compared to that of a central processing unit reference implementation, and to each other. In particular, an approach of designating subtrees to be calculated in parallel by allowing multiple calculation of overlapping elements is described. Among the examined methods, this attains the best performance results in a "realistic" region of calculation parameters. A fifteen- to thirty-fold improvement in performance over the CPU reference implementation is observed as the problem size grows sufficiently large.
|
317 |
Software Engineering Best Practices for Parallel Computing Developmentpatney, vikas January 2010 (has links)
In today’s computer age, the numerical simulations are replacing the traditional laboratory experiments. Researchers around the world are using advanced computer software and multiprocessor computer technology to perform experiments, and analyse these simulation results to advance in their respective endeavours. With a wide variety of tools and technologies available, it could be a tedious and time taking task for a non-computer science researcher to choose appropriate methodologies for developing simulation software The research of this thesis addresses the use of Message Passing Interface (MPI) using object-oriented programming techniques and discusses the methodologies suitable to scientific computing, also, propose a customized software engineering development model.
|
318 |
Speculation in Parallel and Distributed Event Processing SystemsBrito, Andrey 09 August 2010 (has links) (PDF)
Event stream processing (ESP) applications enable the real-time processing of continuous flows of data. Algorithmic trading, network monitoring, and processing data from sensor networks are good examples of applications that traditionally rely upon ESP systems. In addition, technological advances are resulting in an increasing number of devices that are network enabled, producing information that can be automatically collected and processed. This increasing availability of on-line data motivates the development of new and more sophisticated applications that require low-latency processing of large volumes of data.
ESP applications are composed of an acyclic graph of operators that is traversed by the data. Inside each operator, the events can be transformed, aggregated, enriched, or filtered out. Some of these operations depend only on the current input events, such operations are called stateless. Other operations, however, depend not only on the current event, but also on a state built during the processing of previous events. Such operations are, therefore, named stateful.
As the number of ESP applications grows, there are increasingly strong requirements, which are often difficult to satisfy. In this dissertation, we address two challenges created by the use of stateful operations in a ESP application: (i) stateful operators can be bottlenecks because they are sensitive to the order of events and cannot be trivially parallelized by replication; and (ii), if failures are to be tolerated, the accumulated state of an stateful operator needs to be saved, saving this state traditionally imposes considerable performance costs.
Our approach is to evaluate the use of speculation to address these two issues. For handling ordering and parallelization issues in a stateful operator, we propose a speculative approach that both reduces latency when the operator must wait for the correct ordering of the events and improves throughput when the operation in hand is parallelizable. In addition, our approach does not require that user understand concurrent programming or that he or she needs to consider out-of-order execution when writing the operations.
For fault-tolerant applications, traditional approaches have imposed prohibitive performance costs due to pessimistic schemes. We extend such approaches, using speculation to mask the cost of fault tolerance.
|
319 |
Adaptive transaction scheduling for transactional memory systemsYoo, Richard M. 01 April 2008 (has links)
Transactional memory systems are expected to enable parallel
programming at lower programming complexity, while delivering improved performance over traditional lock-based systems. Nonetheless, there are certain situations where transactional memory systems could actually perform worse. Transactional memory systems can outperform locks only
when the executing workloads contain sufficient parallelism. When the workload lacks inherent parallelism, launching excessive transactions can adversely degrade performance. These situations will actually become dominant in future workloads when large-scale transactions are frequently executed.
In this thesis, we propose a new paradigm called adaptive transaction scheduling to address this issue. Based on the parallelism feedback from applications, our adaptive transaction scheduler dynamically dispatches and controls the number of concurrently executing transactions. In our case study, we show that our low-cost mechanism not only guarantees that hardware transactional memory systems perform no worse than a single global lock, but also significantly improves performance for both hardware and software transactional memory systems.
|
320 |
Data flow implementations of a lucid-like programming language / by Andrew Lawrence WendelbornWendelborn, Andrew Lawrence January 1985 (has links)
Bibliography: leaves [238]-244 / xi, 244 leaves : ill ; 30 cm. / Title page, contents and abstract only. The complete thesis in print form is available from the University Library. / Thesis (Ph.D.)--University of Adelaide, Dept. of Computer Science, 1985
|
Page generated in 0.061 seconds