1

Entwurf und Realisierung eines vierfach MIMD-Prozessor-Mikrochips für eine Anwendung in der Hochenergiephysik [Design and realization of a quad MIMD processor microchip for an application in high-energy physics]

Lesser, Falk. January 2002 (has links)
Mannheim, Univ., Diss., 2002.
2

Entwurf und Realisierung eines vierfach MIMD-Prozessor-Mikrochips für eine Anwendung in der Hochenergiephysik [Design and realization of a quad MIMD processor microchip for an application in high-energy physics]

Lesser, Falk. Unknown Date (has links) (PDF)
Universität Mannheim, Diss., 2002.
3

Parallel load-balancing on message passing architectures

Muniz, Francisco Junqueira January 1994 (has links)
No description available.
4

Task-parallel extension of a data-parallel language

Macielinski, Damien D. 28 October 1994 (has links)
Two prevalent models of parallel programming are data parallelism and task parallelism. Data parallelism is the simultaneous application of a single operation to a data set. This model fits best with regular computations. Task parallelism is the simultaneous application of possibly different operations to possibly different data sets. This fits best with irregular computations. Efficient solution of some problems requires both regular and irregular computations. Implementing efficient and portable parallel solutions to these problems requires a high-level language that can accommodate both task and data parallelism. We have extended the data-parallel language Dataparallel C to include task parallelism so that programmers may now use data and task parallelism within the same program. The extension permits the nesting of data-parallel constructs inside a task-parallel framework. We present a banded linear system to analyze the benefits of our language extensions. / Graduation date: 1995
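The abstract describes nesting data-parallel constructs inside a task-parallel framework; Dataparallel C itself is not shown in this listing. Below is a minimal sketch of the same idea using OpenMP in plain C; the function names and the choice of OpenMP are illustrative assumptions, not the thesis's actual language extension.

```c
/* Minimal sketch, not the Dataparallel C extension itself: two tasks run
 * concurrently (task parallelism), and each task applies one operation
 * across a whole array (data parallelism). Compile with -fopenmp. */
#include <omp.h>
#include <stddef.h>

static void scale(double *x, size_t n, double a) {
    /* Data-parallel part: the same operation over one data set. */
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i)
        x[i] *= a;
}

void scale_both(double *x, size_t nx, double *y, size_t ny) {
    /* Task-parallel framework: different operations on different data sets,
     * each task free to use data parallelism internally. Nested parallelism
     * must be enabled for the inner loops to use extra threads. */
    omp_set_max_active_levels(2);
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        scale(x, nx, 2.0);
        #pragma omp task
        scale(y, ny, 0.5);
        #pragma omp taskwait
    }
}
```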
5

Scheduling non-uniform parallel loops on MIMD computers

Liu, Jie 22 September 1993 (has links)
Parallel loops are one of the main sources of parallelism in scientific applications, and many parallel loops do not have a uniform iteration execution time. To achieve good performance for such applications on a parallel computer, iterations of a parallel loop have to be assigned to processors in such a way that each processor has roughly the same amount of work in terms of execution time. A parallel computer with a large number of processors tends to have distributed memory. To run a parallel loop on a distributed-memory machine, data distribution also needs to be considered. This research investigates the scheduling of non-uniform parallel loops on both shared-memory and distributed-memory parallel computers. We present Safe Self-Scheduling (SSS), a new scheduling scheme that combines the advantages of both static and dynamic scheduling schemes. SSS has two phases: a static scheduling phase and a dynamic self-scheduling phase that together reduce the scheduling overhead while achieving a well-balanced workload. The techniques introduced in SSS can be used by other self-scheduling schemes. The static scheduling phase further improves the performance by maintaining a high cache hit ratio resulting from increased affinity of iterations to processors. SSS is also very well suited for distributed-memory machines. We introduce methods to duplicate data on a number of processors. The methods eliminate data movement during computation and increase the scalability of problem size. We discuss a systematic approach to implement a given self-scheduling scheme on a distributed-memory machine. We also show a multilevel scheduling scheme to self-schedule parallel loops on a distributed-memory machine with a large number of processors to eliminate the bottleneck resulting from a central scheduler. We propose a method using abstractions to automate both self-scheduling methods and data distribution methods in parallel programming environments. The abstractions are tested using CHARM, a real parallel programming environment. Methods are also developed to tolerate processor faults caused by both physical failure and reassignment of processors by the operating system during the execution of a parallel loop. We tested the techniques discussed using simulations and real applications. Good results have been obtained on both shared-memory and distributed-memory parallel computers. / Graduation date: 1994
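The static-then-dynamic structure of SSS can be pictured with a small pthreads sketch. The 50% static share, the chunk size, and the stand-in loop body below are assumptions made for illustration, not the tuning rules derived in the thesis.

```c
/* Sketch of a two-phase scheme in the spirit of Safe Self-Scheduling:
 * each worker first runs a statically assigned block of iterations,
 * then pulls chunks of the remaining iterations from a shared counter.
 * The static fraction (50%) and CHUNK size are illustrative choices.
 * Compile with -pthread. */
#include <pthread.h>
#include <stdatomic.h>

#define N_ITER   10000
#define N_THREAD 4
#define CHUNK    16

static volatile double sink;
static void iteration(long i) {          /* stand-in non-uniform loop body */
    double x = 0.0;
    for (long k = 0; k <= i % 64; ++k) x += 0.5 * k;
    sink = x;
}

static atomic_long next_iter;            /* first unclaimed dynamic iteration */

static void *worker(void *arg) {
    long id = (long)arg;

    /* Static phase: roughly half the iterations, split evenly. */
    long static_total = N_ITER / 2;
    long per = static_total / N_THREAD;
    long lo = id * per;
    long hi = (id == N_THREAD - 1) ? static_total : lo + per;
    for (long i = lo; i < hi; ++i) iteration(i);

    /* Dynamic self-scheduling phase: grab chunks until the loop is done. */
    for (;;) {
        long start = atomic_fetch_add(&next_iter, CHUNK);
        if (start >= N_ITER) break;
        long end = start + CHUNK < N_ITER ? start + CHUNK : N_ITER;
        for (long i = start; i < end; ++i) iteration(i);
    }
    return NULL;
}

int main(void) {
    atomic_store(&next_iter, N_ITER / 2);
    pthread_t t[N_THREAD];
    for (long id = 0; id < N_THREAD; ++id)
        pthread_create(&t[id], NULL, worker, (void *)id);
    for (long id = 0; id < N_THREAD; ++id)
        pthread_join(t[id], NULL);
    return 0;
}
```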
6

SIMD-MIMD cocktail in a hybrid memory glass: shaken, not stirred

Zarubin, Mikhail, Damme, Patrick, Krause, Alexander, Habich, Dirk, Lehner, Wolfgang 23 November 2021 (has links)
Hybrid memory systems consisting of DRAM and NVRAM offer a great opportunity for column-oriented data systems to persistently store and to efficiently process columnar data completely in main memory. While vectorization (SIMD) of query operators is state-of-the-art to increase the single-thread performance, it has to be combined with thread-level parallelism (MIMD) to satisfy growing needs for higher performance and scalability. However, it is not well investigated how such a SIMD-MIMD interplay could be leveraged efficiently in hybrid memory systems. On the one hand, we deliver an extensive experimental evaluation of typical workloads on columnar data in this paper. We reveal that the choice of the most performant SIMD version differs greatly for both memory types. Moreover, we show that the throughput of concurrent queries can be boosted (up to 2x) when combining various SIMD flavors in a multi-threaded execution. On the other hand, to enable that optimization, we propose an adaptive SIMD-MIMD cocktail approach incurring only a negligible runtime overhead.
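The "cocktail" idea of running different SIMD flavours in different threads can be pictured as per-column kernel dispatch. The DRAM/NVRAM flag, the kernel choice, and the function names below are assumptions for illustration only, not the paper's adaptive selection mechanism.

```c
/* Illustration only: choosing a SIMD flavour per column, as a worker
 * thread might. The rule "DRAM column -> AVX2, NVRAM column -> scalar"
 * is an assumed placeholder. Compile with -mavx2. Assumes n is a
 * multiple of 8 and the 32-bit lane accumulators do not overflow. */
#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

static int64_t sum_avx2(const int32_t *col, size_t n) {
    __m256i acc = _mm256_setzero_si256();
    for (size_t i = 0; i < n; i += 8)
        acc = _mm256_add_epi32(acc, _mm256_loadu_si256((const __m256i *)(col + i)));
    int32_t lane[8];
    _mm256_storeu_si256((__m256i *)lane, acc);
    int64_t s = 0;
    for (int i = 0; i < 8; ++i) s += lane[i];
    return s;
}

static int64_t sum_scalar(const int32_t *col, size_t n) {
    int64_t s = 0;
    for (size_t i = 0; i < n; ++i) s += col[i];
    return s;
}

typedef int64_t (*sum_kernel)(const int32_t *, size_t);

/* Each worker thread would pick its kernel per column it processes. */
sum_kernel pick_kernel(int column_in_dram) {
    return column_in_dram ? sum_avx2 : sum_scalar;
}
```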
7

Analysis of a coordination framework for mapping coarse-grain applications to distributed systems

Schaefer, Linda Ruth 01 January 1991 (has links)
A paradigm is presented for the parallelization of coarse-grain engineering and scientific applications. The coordination framework provides structure and an organizational strategy for a parallel solution in a distributed environment. Three categories of primitives which define the coordination framework are presented: structural, transformational, and operational. The prototype of the paradigm presented in this thesis is the first step towards a programming development tool. This tool will allow non-specialist programmers to parallelize existing sequential solutions through the distribution, synchronization and collection of tasks. The distributed-control, multidimensional-pipeline characteristics of the paradigm provide advantages which include load balancing through the use of self-directed workers, a simplified communication scheme ideally suited for infrequent task interaction, a simple programmer interface, and the ability of the programmer to use already existing code. Results for the parallelization of SPICE3Cl in a distributed system of fifteen SUN 3 workstations with one fileserver demonstrate linear speedup with slopes ranging from 0.7 to 0.9. A high-level abstraction of the system is presented in the form of a closed, single-class queuing network model. Using the Mean Value Analysis solution technique from queuing network theory, an expression for total execution time is obtained and is shown to be consistent with the well-known Amdahl's Law. Our expression is in fact a refinement of Amdahl's Law which realistically captures the limitations of the system. We show that the portion of time spent executing serial code which cannot be enhanced by parallelization is a function of N, the number of workers in the system. Experiments reveal the critical nature of the communication scheme and the synchronization of the paradigm. Investigation of the synchronization center indicates that as N increases, visitations to the center increase and degrade system performance. Experimental data provides the information needed to characterize the impact of visitations on the performance of the system. This characterization provides a mechanism for optimizing the speedup of an application. It is shown that the model replicates the system as well as predicts speedup over an extended range of processors, task count, and task size.
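For reference, classical Amdahl's Law as mentioned in the abstract is given below; the second line is only a schematic of the described refinement, in which the effective serial fraction grows with the number of workers N (the exact functional form is derived in the thesis and not reproduced here).

```latex
% Classical Amdahl's Law with a fixed serial fraction s:
S(N) = \frac{1}{\,s + \frac{1 - s}{N}\,}

% Schematic of the refinement described above: the serial fraction is a
% function s(N) that grows with N as synchronization-centre visitations
% increase, so speedup degrades beyond what the classical form predicts:
S(N) = \frac{1}{\,s(N) + \frac{1 - s(N)}{N}\,}
```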
8

Implementation of Data Parallel Primitives on MIMD Shared Memory Systems

Mortensen, Christian January 2019 (has links)
This thesis presents an implementation of a multi-threaded C library for performing data parallel computations on MIMD shared memory systems, with support for user-defined operators and one-dimensional sparse arrays. Multi-threaded parallel execution was achieved by the use of POSIX threads, and the library exposes several functions for performing data parallel computations directly on arrays. The implemented functions were based on a set of primitives that many data parallel programming languages have in common. The individual scalability of the primitives varied greatly, with most of them only gaining a significant speedup when executed on two cores, followed by a significant drop-off in speedup as more cores were added. An exception to this was the reduction primitive, however, which managed to achieve near-optimal speedup in most tests. The library proved unviable for expressing algorithms requiring more than one or two primitives in sequence due to the overhead that each of them causes.
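As a concrete picture of the one primitive that scaled well, here is a minimal sketch of a reduction over a dense array using POSIX threads; the function name reduce_sum and the one-chunk-per-thread layout are assumptions for illustration, not the library's actual API (which also supports user-defined operators and sparse arrays).

```c
/* Minimal sketch of a data-parallel reduction primitive over a dense
 * array using POSIX threads. Compile with -pthread. */
#include <pthread.h>
#include <stdlib.h>

typedef struct { const double *data; size_t begin, end; double partial; } chunk_t;

static void *sum_chunk(void *arg) {
    chunk_t *c = arg;
    double s = 0.0;
    for (size_t i = c->begin; i < c->end; ++i) s += c->data[i];
    c->partial = s;
    return NULL;
}

/* Split the array into one contiguous chunk per thread, reduce each
 * chunk in parallel, then combine the partial sums sequentially. */
double reduce_sum(const double *data, size_t n, int nthreads) {
    pthread_t tid[nthreads];
    chunk_t chunk[nthreads];
    size_t per = n / nthreads;
    for (int t = 0; t < nthreads; ++t) {
        chunk[t].data  = data;
        chunk[t].begin = (size_t)t * per;
        chunk[t].end   = (t == nthreads - 1) ? n : (size_t)(t + 1) * per;
        pthread_create(&tid[t], NULL, sum_chunk, &chunk[t]);
    }
    double total = 0.0;
    for (int t = 0; t < nthreads; ++t) {
        pthread_join(tid[t], NULL);
        total += chunk[t].partial;
    }
    return total;
}
```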
9

Algorithmique parallèle : réseaux d'automates, architectures systoliques, machines SIMD et MIMD [Parallel algorithmics: automata networks, systolic architectures, SIMD and MIMD machines]

Robert, Yves 06 January 1986 (has links) (PDF)
No description available.
10

Design and implementation of massively parallel fine-grained processor arrays

Walsh, Declan January 2015 (has links)
This thesis investigates the use of massively parallel fine-grained processor arrays to increase computational performance. As processors move towards multi-core processing, more energy-efficient processors can be designed by increasing the number of processor cores on a single chip rather than increasing the clock frequency of a single processor. This can be done by making the individual cores less complex while placing more of them on the chip. Using this philosophy, a processor core can be reduced in complexity, area, and speed to form a very small processor which can still perform basic arithmetic operations. Due to its small area occupation, such a core can be replicated and scaled to form a large-scale parallel processor array offering significant performance. Following this design methodology, two fine-grained parallel processor arrays are designed, each aiming for a small area occupation per processor so that a larger array can be implemented over a given area. To demonstrate scalability and performance, a SIMD parallel processor array is designed for implementation on an FPGA, where each processor can be implemented using four ‘slices’ of a Xilinx FPGA. With such small area utilization, a large fine-grained processor array can be implemented on these FPGAs. A 32 × 32 processor array is implemented and fast processing is demonstrated using image processing tasks. An event-driven MIMD parallel processor array is also designed which occupies a small amount of area and can be scaled up to form much larger arrays. The event-driven approach allows a processor to enter an idle mode when no events are occurring locally, reducing power consumption, and to switch back to operational mode when events are detected. The processor core is designed with a multi-bit data path and ALU and contains its own instruction memory, making the array a multi-core processor array. With area occupation the primary concern, the processor is relatively simple and connects with its four nearest direct neighbours. A small 8 × 8 prototype chip is implemented in a 65 nm CMOS technology process; it operates at a clock frequency of 80 MHz and offers a peak performance of 5.12 GOPS, which can be scaled up in larger arrays. An application of the event-driven processor array is demonstrated using a simulation model of the processor: an event-driven algorithm performs distributed control of a distributed manipulator simulator by separating objects based on their physical properties.
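The quoted peak figure is consistent with each of the 8 × 8 = 64 processors completing one operation per 80 MHz clock cycle (an assumption about how the figure is counted):

```latex
64 \times 80\,\text{MHz} \times 1\ \tfrac{\text{op}}{\text{cycle}}
  = 5.12 \times 10^{9}\ \text{ops/s}
  = 5.12\ \text{GOPS}
```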
