191

Installation Experiences and Comparative Measurements of Various MPI Implementations on a Dual Xeon Cluster

Trautmann, Sven 02 July 2003 (has links) (PDF)
Workshop Mensch-Computer-Vernetzung
192

Communication/Computation Overlap in MPI

Hoefler, Torsten 04 January 2006 (has links) (PDF)
This talk discusses optimized collective algorithms and the benefits of leveraging independent hardware entities in a pipelined manner. The resulting approach overlaps computation with communication to achieve this. Several examples are given.
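
As an illustration (not code from the talk itself), the overlap pattern it describes can be sketched with standard non-blocking MPI point-to-point calls; the buffer names and the computation are placeholders:

```c
#include <mpi.h>

/* Minimal sketch of communication/computation overlap: start a
 * non-blocking send, do independent work while the network hardware
 * makes progress, then wait for completion. */
void overlap_example(double *buf, int count, int peer,
                     double *work, int n, MPI_Comm comm)
{
    MPI_Request req;
    MPI_Isend(buf, count, MPI_DOUBLE, peer, /*tag=*/0, comm, &req);

    /* Computation that does not touch 'buf' overlaps the transfer. */
    for (int i = 0; i < n; i++)
        work[i] = work[i] * 2.0 + 1.0;

    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* transfer complete here */
}
```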
193

Integration of a New InfiniBand Interface into the Existing InfiniBand MPICH2 Software

Mosch, Marek 25 April 2006 (has links) (PDF)
Design of a unified API for using Mellanox V-API and OpenIB verbs, based on C preprocessor macros, and integration of this API into the existing MPICH2 CH3 device for InfiniBand.
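
The abstract gives no code; a minimal sketch of the preprocessor-macro approach it describes might look as follows. The macro name UNIFIED_POST_SEND and the vapi_shim_post_send() helper are invented for illustration; only ibv_post_send() is a real OpenIB verbs function.

```c
/* Compile-time selection between two InfiniBand APIs via C
 * preprocessor macros, in the spirit the abstract describes. */
#ifdef HAVE_OPENIB
  #include <infiniband/verbs.h>
  /* The unified name expands directly to the OpenIB verbs call. */
  #define UNIFIED_POST_SEND(qp, wr, bad_wr) \
          ibv_post_send((qp), (wr), (bad_wr))
#else
  /* Hypothetical shim that would translate the unified work request
   * into Mellanox V-API descriptors (implementation not shown). */
  #define UNIFIED_POST_SEND(qp, wr, bad_wr) \
          vapi_shim_post_send((qp), (wr), (bad_wr))
#endif
```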
194

Analysis and Optimization of the Packet Scheduler in Open MPI

Lichei, Andre 13 November 2006 (has links) (PDF)
We compare well-known measurement methods for LogGP parameters and discuss their accuracy and network contention. Based on this, a new, theoretically exact measurement method that does not saturate the network is derived and explained in detail. The applicability of our method is shown for the low-level communication API of Open MPI across several interconnection networks. Based on the LogGP model, we developed a low-overhead packet scheduling algorithm. It can handle different types of interconnects with different characteristics and produces schedules that are very close to the optimum for both small and large messages. The efficiency of the algorithm for small messages is shown for an Open MPI implementation. The implementation uses the LogGP benchmark to obtain the LogGP parameters of the available interconnects and can thus adapt to any given system.
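
For reference, the LogGP model referred to above predicts the time to transfer a k-byte message as roughly T(k) = o_s + (k - 1)G + L + o_r. The scheduling loop below is our illustration of how such a cost function could drive fragment placement across interconnects, not the Open MPI scheduler itself:

```c
#include <stddef.h>

/* LogGP parameters of one interconnect. 'g' (per-message gap) is part
 * of the model but not used in this simplified per-message formula. */
typedef struct {
    double L;   /* end-to-end latency */
    double o;   /* per-message CPU overhead (send == recv here) */
    double g;   /* minimum gap between consecutive messages */
    double G;   /* gap per byte (inverse bandwidth) */
} loggp_t;

static double loggp_time(const loggp_t *p, size_t k)
{
    return p->o + (double)(k - 1) * p->G + p->L + p->o;
}

/* Choose, among n interconnects, the one predicted to deliver a
 * fragment of 'bytes' bytes first, given each link's next-free time. */
static int pick_link(const loggp_t nic[], const double free_at[],
                     int n, size_t bytes)
{
    int best = 0;
    double best_t = free_at[0] + loggp_time(&nic[0], bytes);
    for (int i = 1; i < n; i++) {
        double t = free_at[i] + loggp_time(&nic[i], bytes);
        if (t < best_t) { best_t = t; best = i; }
    }
    return best;
}
```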
195

Development of an Optimized Collective Component

Mosch, Marek 24 September 2007 (has links) (PDF)
This diploma thesis deals with the development of a collective component for the MPI-2 implementation Open MPI. The component is to contain optimized algorithms for the Myrinet network based on the low-level communication protocol GM.
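
As one example of the kind of algorithm such a collective component typically contains, here is a generic binomial-tree broadcast sketched with MPI point-to-point calls; the thesis targets the GM low-level API, which is not shown here:

```c
#include <mpi.h>

/* Binomial-tree broadcast from rank 0: in each round, every rank that
 * already holds the data forwards it to the partner 'dist' ranks away,
 * halving the number of uninformed ranks per round. */
void binomial_bcast(void *buf, int count, MPI_Datatype dt, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    for (int dist = 1; dist < size; dist <<= 1) {
        if (rank < dist) {                 /* already has the data */
            int peer = rank + dist;
            if (peer < size)
                MPI_Send(buf, count, dt, peer, 0, comm);
        } else if (rank < 2 * dist) {      /* receives this round */
            MPI_Recv(buf, count, dt, rank - dist, 0, comm,
                     MPI_STATUS_IGNORE);
        }
    }
}
```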
196

Optimized Implementation of Selected Collective Operations Exploiting the Hardware Parallelism of the InfiniBand Network

Franke, Maik 24 September 2007 (has links) (PDF)
The goal of this work is an optimized implementation of the reduction operations MPI_Reduce(), MPI_Allreduce(), MPI_Scan(), and MPI_Reduce_scatter() defined in the MPI-1 standard for the InfiniBand network, with particular emphasis on special InfiniBand operations and hardware parallelism. InfiniBand makes it possible to separate communication cleanly from computation, which allows the two to overlap during a reduction. The performance of collective communication operations is one of the deciding factors in the overall performance of an MPI application, and current MPI implementations access the InfiniBand network through their point-to-point components. This work therefore accesses the network directly from a collective component, which avoids overhead and makes it possible to tune the algorithms to this specific network. Various algorithms for the four reduction operations are presented, their theoretical performance is analyzed with the LogfP and LogGP models, and selected algorithms are implemented as a prototype within the Open MPI framework. Finally, the performance of the different algorithms is compared with existing MPI implementations (e.g., MVAPICH).
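
The overlap idea described above can be sketched generically as a segmented, pipelined reduction: while one segment is being combined and forwarded, the receive for the next segment is already posted. This is an illustration with MPI calls under assumed sum semantics, not the thesis' InfiniBand implementation:

```c
#include <stdlib.h>
#include <mpi.h>

/* Segmented, pipelined sum-reduction along a chain of ranks; the
 * result ends up on rank 0. Double-buffering lets the next receive
 * proceed while the current segment is combined and forwarded. */
void chain_reduce_sum(double *buf, int count, int seg, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int right = rank + 1, left = rank - 1;

    double *tmp[2];
    tmp[0] = malloc((size_t)seg * sizeof(double));
    tmp[1] = malloc((size_t)seg * sizeof(double));
    MPI_Request rreq = MPI_REQUEST_NULL;

    int nseg = (count + seg - 1) / seg;
    if (right < size)  /* prepost the receive for segment 0 */
        MPI_Irecv(tmp[0], seg, MPI_DOUBLE, right, 0, comm, &rreq);

    for (int s = 0; s < nseg; s++) {
        int off = s * seg;
        int len = (count - off < seg) ? count - off : seg;

        if (right < size) {
            MPI_Wait(&rreq, MPI_STATUS_IGNORE);
            if (s + 1 < nseg)  /* overlap: prepost the next receive */
                MPI_Irecv(tmp[(s + 1) & 1], seg, MPI_DOUBLE, right,
                          s + 1, comm, &rreq);
            for (int i = 0; i < len; i++)
                buf[off + i] += tmp[s & 1][i];
        }
        if (left >= 0)  /* forward the combined segment down the chain */
            MPI_Send(buf + off, len, MPI_DOUBLE, left, s, comm);
    }
    free(tmp[0]); free(tmp[1]);
}
```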
197

Extending the InfiniBand Support of netgauge

Dietze, Stefan 25 February 2009 (has links) (PDF)
This work deals with extending the InfiniBand module of netgauge. Non-blocking communication functions are added to the module, and the implementation of these functions and the algorithms used are described. Furthermore, the measurement results are evaluated and compared with those of the blocking functions, considering the bandwidth and latency of 1:1 communication. The measurements were performed on both InfiniPath and Mellanox hardware.
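
A minimal sketch of the kind of 1:1 measurement described: half the round-trip time of a ping-pong gives latency, and message size over one-way time gives bandwidth. netgauge's actual harness (warm-up rounds, statistics, the non-blocking variants) is more elaborate; the iteration count and message size below are arbitrary:

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 1000;
    const int bytes = 1 << 20;          /* 1 MiB message */
    char *buf = malloc((size_t)bytes);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {                 /* ping */
            MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {          /* pong */
            MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double oneway = (MPI_Wtime() - t0) / (2.0 * iters);
    if (rank == 0)
        printf("one-way time %.2f us, bandwidth %.2f MB/s\n",
               oneway * 1e6, bytes / oneway / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}
```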
198

Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InfiniBand HCA

Mittenzwey, Nico 30 June 2009 (has links) (PDF)
This thesis analysed the QLogic InfiniPath QLE7140 HCA and its onload architecture and compared the results to the Mellanox InfiniHost III Lx HCA, which uses an offload architecture. As expected, the QLogic InfiniPath QLE7140 HCA outperforms the Mellanox InfiniHost III Lx HCA in terms of latency and bandwidth on our test system in various test scenarios. The benchmarks showed that sending messages with multiple threads in parallel can increase the bandwidth greatly, while bi-directional sends cut the effective bandwidth of one HCA by up to 30%. Different all-reduce algorithms were evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI in different scenarios. The thesis also demonstrated that multicast algorithms for InfiniBand can be implemented easily by using the RDMA-CM API.
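
One classic algorithm that such all-reduce comparisons cover is recursive doubling, sketched below for a sum reduction; this generic version assumes a power-of-two number of ranks and a caller-provided scratch buffer:

```c
#include <mpi.h>

/* Recursive-doubling all-reduce (sum): in round k each rank exchanges
 * its current partial result with the partner whose rank differs in
 * bit k, so every rank holds the full result after log2(P) rounds. */
void allreduce_recursive_doubling(double *buf, double *tmp, int count,
                                  MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    for (int mask = 1; mask < size; mask <<= 1) {
        int peer = rank ^ mask;
        MPI_Sendrecv(buf, count, MPI_DOUBLE, peer, 0,
                     tmp, count, MPI_DOUBLE, peer, 0,
                     comm, MPI_STATUS_IGNORE);
        for (int i = 0; i < count; i++)
            buf[i] += tmp[i];
    }
}
```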
199

Analysis and Adaption of Graph Mapping Algorithms for Regular Graph Topologies

Rinke, Sebastian 01 September 2009 (has links) (PDF)
The Message Passing Interface (MPI) standard defines virtual topologies that can be applied to systems of cooperating processes. Besides providing a more convenient namespace, virtual topologies can be used to optimize the placement of MPI processes and thereby reduce communication time: the processes, together with their main communication paths, form a graph that has to be mapped cost-efficiently onto the graph representing the actual communication network. In this context, this work analyses and compares state-of-the-art task mapping strategies with respect to running time and the quality of their solutions to the MPI mapping problem. The focus is on generic strategies that can be used for arbitrary process/network topologies, although the topologies of interest here are regular ones in which the number of processes exceeds the number of processors in the underlying physical network. Additionally, different measures of mapping quality are discussed, and a close correspondence is shown between the most appropriate measure, the weighted edge cut, and program execution time. To investigate how mapping quality affects MPI program execution time, some mapping strategies have been incorporated into Open MPI. Finally, benchmark results show that optimized process-to-processor mappings can improve program execution time by up to 60% compared to the default mapping used in many MPI implementations (linear mapping). The findings in this work can serve as a reference not only for MPI implementors but also for researchers investigating static process-to-processor mappings in general.
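
The virtual-topology hook the abstract refers to is part of MPI-1. The sketch below describes a four-process ring as a graph; with reorder set to 1 the MPI library is free to renumber processes so the ring maps well onto the physical network, which is where mapping strategies like those studied here plug in:

```c
#include <mpi.h>

/* Build a 4-process ring as an MPI graph topology.
 * index[i] holds the cumulative node degree up to node i;
 * edges[] is the flattened adjacency list. */
void make_ring_topology(MPI_Comm *ring_comm)
{
    int index[4] = { 2, 4, 6, 8 };
    int edges[8] = { 1, 3,  0, 2,  1, 3,  0, 2 };  /* neighbors of 0..3 */

    MPI_Graph_create(MPI_COMM_WORLD, 4, index, edges,
                     /*reorder=*/1, ring_comm);
}
```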
200

A hybrid MPI/OpenMP parallelization of the adaptive integral method for multi-core clusters

Wei, Fangzhou 02 August 2011 (has links)
A hybrid of message passing and shared memory techniques is presented for scalable parallelization of the adaptive integral method (AIM), an FFT-based algorithm, on clusters of identical multi-core processors. The proposed hybrid MPI/OpenMP parallelization scheme is based on a nested one-dimensional (1-D) slab decomposition of the 3-D auxiliary uniform grid and the associated AIM calculations: if there are M processors and T cores per processor, the scheme (i) divides the uniform grid into M slabs and MT sub-slabs, (ii) assigns each slab/sub-slab and the associated operations to one of the processors/cores, and (iii) uses MPI for inter-processor data communication and OpenMP for intra-processor data exchange. The MPI/OpenMP parallel AIM is used to accelerate the method-of-moments (MoM) solution of combined-field integral equations pertinent to the analysis of scattering from perfectly conducting surfaces. The scalability and efficiency of the implementation are investigated theoretically and verified numerically by solving benchmark scattering problems on a (near) petaflop supercomputing cluster of quad-core processors. The timing and speedup results on up to 1024 processors show that the proposed hybrid MPI/OpenMP parallelization exhibits better strong scalability (fixed problem size speedup) than pure MPI parallelization when multiple cores are used on each processor.
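
A minimal sketch of the nested 1-D slab decomposition described above: MPI splits the grid's plane range into M slabs (one per process) and OpenMP splits each slab into sub-slabs across cores. grid_op() is a placeholder for the per-plane AIM work; since all MPI calls stay outside the parallel region, MPI_THREAD_FUNNELED support suffices:

```c
#include <mpi.h>

/* Process the planes of a 3-D uniform grid with the nested 1-D slab
 * scheme: inter-process partitioning via MPI ranks, intra-process
 * partitioning via OpenMP worksharing. */
void process_grid(int nz_total, void (*grid_op)(int z))
{
    int m, M;
    MPI_Comm_rank(MPI_COMM_WORLD, &m);
    MPI_Comm_size(MPI_COMM_WORLD, &M);

    /* This process' slab: planes [z0, z1) of the uniform grid. */
    int z0 = (int)((long long)nz_total * m / M);
    int z1 = (int)((long long)nz_total * (m + 1) / M);

    /* Cores split the slab into sub-slabs; inter-process exchange
     * would use MPI, intra-process exchange shared memory. */
    #pragma omp parallel for schedule(static)
    for (int z = z0; z < z1; z++)
        grid_op(z);
}
```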
