191

Installation Experiences and Comparative Measurements of Various MPI Implementations on a Dual Xeon Cluster

Trautmann, Sven 02 July 2003 (has links) (PDF)
Workshop Mensch-Computer-Vernetzung
192

Communication/Computation Overlap in MPI

Hoefler, Torsten 04 January 2006 (has links) (PDF)
This talk discusses optimized collective algorithms and the benefits of leveraging independent hardware entities in a pipelined manner. The resulting approach overlaps computation with communication to achieve this. Several examples are given.
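
As an illustration (not code from the talk itself), the overlap pattern it describes can be sketched with standard non-blocking MPI point-to-point calls; the buffer names and the computation are placeholders:

```c
#include <mpi.h>

/* Minimal sketch of communication/computation overlap: start a
 * non-blocking send, do independent work while the network hardware
 * makes progress, then wait for completion. */
void overlap_example(double *buf, int count, int peer,
                     double *work, int n, MPI_Comm comm)
{
    MPI_Request req;
    MPI_Isend(buf, count, MPI_DOUBLE, peer, /*tag=*/0, comm, &req);

    /* Computation that does not touch 'buf' overlaps the transfer. */
    for (int i = 0; i < n; i++)
        work[i] = work[i] * 2.0 + 1.0;

    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* transfer complete here */
}
```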
193

Integration of a New InfiniBand Interface into the Existing InfiniBand MPICH2 Software

Mosch, Marek 25 April 2006 (has links) (PDF)
Design of a unified API for using Mellanox V-API and OpenIB verbs, based on C preprocessor macros, and integration of this API into the existing MPICH2 CH3 device for InfiniBand.
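
The abstract gives no code; a minimal sketch of the preprocessor-macro approach it describes might look as follows. The macro name UNIFIED_POST_SEND and the vapi_shim_post_send() helper are invented for illustration; only ibv_post_send() is a real OpenIB verbs function.

```c
/* Compile-time selection between two InfiniBand APIs via C
 * preprocessor macros, in the spirit the abstract describes. */
#ifdef HAVE_OPENIB
  #include <infiniband/verbs.h>
  /* The unified name expands directly to the OpenIB verbs call. */
  #define UNIFIED_POST_SEND(qp, wr, bad_wr) \
          ibv_post_send((qp), (wr), (bad_wr))
#else
  /* Hypothetical shim that would translate the unified work request
   * into Mellanox V-API descriptors (implementation not shown). */
  #define UNIFIED_POST_SEND(qp, wr, bad_wr) \
          vapi_shim_post_send((qp), (wr), (bad_wr))
#endif
```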
194

Analysis and Optimization of the Packet Scheduler in Open MPI

Lichei, Andre 13 November 2006 (has links) (PDF)
We compare well-known measurement methods for LogGP parameters and discuss their accuracy and network contention. Based on this, a new, theoretically exact measurement method that does not saturate the network is derived and explained in detail. The applicability of our method is shown for the low-level communication API of Open MPI across several interconnection networks. Based on the LogGP model, we developed a low-overhead packet scheduling algorithm. It can handle different types of interconnects with different characteristics and produces schedules that are very close to the optimum for both small and large messages. The efficiency of the algorithm for small messages is shown for an Open MPI implementation. The implementation uses the LogGP benchmark to obtain the LogGP parameters of the available interconnects and can thus adapt to any given system.
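
For reference, the LogGP model referred to above predicts the time to transfer a k-byte message as roughly T(k) = o_s + (k - 1)G + L + o_r. The scheduling loop below is our illustration of how such a cost function could drive fragment placement across interconnects, not the Open MPI scheduler itself:

```c
#include <stddef.h>

/* LogGP parameters of one interconnect. 'g' (per-message gap) is part
 * of the model but not used in this simplified per-message formula. */
typedef struct {
    double L;   /* end-to-end latency */
    double o;   /* per-message CPU overhead (send == recv here) */
    double g;   /* minimum gap between consecutive messages */
    double G;   /* gap per byte (inverse bandwidth) */
} loggp_t;

static double loggp_time(const loggp_t *p, size_t k)
{
    return p->o + (double)(k - 1) * p->G + p->L + p->o;
}

/* Choose, among n interconnects, the one predicted to deliver a
 * fragment of 'bytes' bytes first, given each link's next-free time. */
static int pick_link(const loggp_t nic[], const double free_at[],
                     int n, size_t bytes)
{
    int best = 0;
    double best_t = free_at[0] + loggp_time(&nic[0], bytes);
    for (int i = 1; i < n; i++) {
        double t = free_at[i] + loggp_time(&nic[i], bytes);
        if (t < best_t) { best_t = t; best = i; }
    }
    return best;
}
```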
195

Development of an Optimized Collective Component

Mosch, Marek 24 September 2007 (has links) (PDF)
This diploma thesis deals with the development of a collective component for the MPI-2 implementation Open MPI. The component is to contain optimized algorithms for the Myrinet network based on the low-level communication protocol GM.
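
As one example of the kind of algorithm such a collective component typically contains, here is a generic binomial-tree broadcast sketched with MPI point-to-point calls; the thesis targets the GM low-level API, which is not shown here:

```c
#include <mpi.h>

/* Binomial-tree broadcast from rank 0: in each round, every rank that
 * already holds the data forwards it to the partner 'dist' ranks away,
 * halving the number of uninformed ranks per round. */
void binomial_bcast(void *buf, int count, MPI_Datatype dt, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    for (int dist = 1; dist < size; dist <<= 1) {
        if (rank < dist) {                 /* already has the data */
            int peer = rank + dist;
            if (peer < size)
                MPI_Send(buf, count, dt, peer, 0, comm);
        } else if (rank < 2 * dist) {      /* receives this round */
            MPI_Recv(buf, count, dt, rank - dist, 0, comm,
                     MPI_STATUS_IGNORE);
        }
    }
}
```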
196

Optimized Implementation of Selected Collective Operations Exploiting the Hardware Parallelism of the InfiniBand Network

Franke, Maik 24 September 2007 (has links) (PDF)
The goal of this work is an optimized implementation of the reduction operations MPI_Reduce(), MPI_Allreduce(), MPI_Scan(), and MPI_Reduce_scatter() defined in the MPI-1 standard for the InfiniBand network, with particular emphasis on special InfiniBand operations and hardware parallelism. InfiniBand makes it possible to separate communication cleanly from computation, which allows the two to overlap during a reduction. The performance of collective communication operations is one of the deciding factors in the overall performance of an MPI application, and current MPI implementations access the InfiniBand network through their point-to-point components. This work therefore accesses the network directly from a collective component, which avoids overhead and makes it possible to tune the algorithms to this specific network. Various algorithms for the four reduction operations are presented, their theoretical performance is analyzed with the LogfP and LogGP models, and selected algorithms are implemented as a prototype within the Open MPI framework. Finally, the performance of the different algorithms is compared with existing MPI implementations (e.g., MVAPICH).
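
The overlap idea described above can be sketched generically as a segmented, pipelined reduction: while one segment is being combined and forwarded, the receive for the next segment is already posted. This is an illustration with MPI calls under assumed sum semantics, not the thesis' InfiniBand implementation:

```c
#include <stdlib.h>
#include <mpi.h>

/* Segmented, pipelined sum-reduction along a chain of ranks; the
 * result ends up on rank 0. Double-buffering lets the next receive
 * proceed while the current segment is combined and forwarded. */
void chain_reduce_sum(double *buf, int count, int seg, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int right = rank + 1, left = rank - 1;

    double *tmp[2];
    tmp[0] = malloc((size_t)seg * sizeof(double));
    tmp[1] = malloc((size_t)seg * sizeof(double));
    MPI_Request rreq = MPI_REQUEST_NULL;

    int nseg = (count + seg - 1) / seg;
    if (right < size)  /* prepost the receive for segment 0 */
        MPI_Irecv(tmp[0], seg, MPI_DOUBLE, right, 0, comm, &rreq);

    for (int s = 0; s < nseg; s++) {
        int off = s * seg;
        int len = (count - off < seg) ? count - off : seg;

        if (right < size) {
            MPI_Wait(&rreq, MPI_STATUS_IGNORE);
            if (s + 1 < nseg)  /* overlap: prepost the next receive */
                MPI_Irecv(tmp[(s + 1) & 1], seg, MPI_DOUBLE, right,
                          s + 1, comm, &rreq);
            for (int i = 0; i < len; i++)
                buf[off + i] += tmp[s & 1][i];
        }
        if (left >= 0)  /* forward the combined segment down the chain */
            MPI_Send(buf + off, len, MPI_DOUBLE, left, s, comm);
    }
    free(tmp[0]); free(tmp[1]);
}
```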
197

Extending the InfiniBand Support of netgauge

Dietze, Stefan 25 February 2009 (has links) (PDF)
This work deals with extending the InfiniBand module of netgauge. Non-blocking communication functions are added to the module, and the implementation of these functions and the algorithms used are described. Furthermore, the measurement results are evaluated and compared with those of the blocking functions, considering the bandwidth and latency of 1:1 communication. The measurements were performed on both InfiniPath and Mellanox hardware.
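
A minimal sketch of the kind of 1:1 measurement described: half the round-trip time of a ping-pong gives latency, and message size over one-way time gives bandwidth. netgauge's actual harness (warm-up rounds, statistics, the non-blocking variants) is more elaborate; the iteration count and message size below are arbitrary:

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 1000;
    const int bytes = 1 << 20;          /* 1 MiB message */
    char *buf = malloc((size_t)bytes);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {                 /* ping */
            MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {          /* pong */
            MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double oneway = (MPI_Wtime() - t0) / (2.0 * iters);
    if (rank == 0)
        printf("one-way time %.2f us, bandwidth %.2f MB/s\n",
               oneway * 1e6, bytes / oneway / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}
```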
198

Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InfiniBand HCA

Mittenzwey, Nico 30 June 2009 (has links) (PDF)
This thesis analysed the QLogic InfiniPath QLE7140 HCA and its onload architecture and compared the results to the Mellanox InfiniHost III Lx HCA, which uses an offload architecture. As expected, the QLogic InfiniPath QLE7140 HCA outperforms the Mellanox InfiniHost III Lx HCA in terms of latency and bandwidth on our test system in various test scenarios. The benchmarks showed that sending messages with multiple threads in parallel can increase the bandwidth greatly, while bi-directional sends cut the effective bandwidth of one HCA by up to 30%. Different all-reduce algorithms were evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI in different scenarios. The thesis also demonstrated that multicast algorithms for InfiniBand can be implemented easily by using the RDMA-CM API.
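
One classic algorithm that such all-reduce comparisons cover is recursive doubling, sketched below for a sum reduction; this generic version assumes a power-of-two number of ranks and a caller-provided scratch buffer:

```c
#include <mpi.h>

/* Recursive-doubling all-reduce (sum): in round k each rank exchanges
 * its current partial result with the partner whose rank differs in
 * bit k, so every rank holds the full result after log2(P) rounds. */
void allreduce_recursive_doubling(double *buf, double *tmp, int count,
                                  MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    for (int mask = 1; mask < size; mask <<= 1) {
        int peer = rank ^ mask;
        MPI_Sendrecv(buf, count, MPI_DOUBLE, peer, 0,
                     tmp, count, MPI_DOUBLE, peer, 0,
                     comm, MPI_STATUS_IGNORE);
        for (int i = 0; i < count; i++)
            buf[i] += tmp[i];
    }
}
```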
199

Analysis and Adaption of Graph Mapping Algorithms for Regular Graph Topologies

Rinke, Sebastian 01 September 2009 (has links) (PDF)
The Message Passing Interface (MPI) standard defines virtual topologies that can be applied to systems of cooperating processes. Besides providing a more convenient namespace, virtual topologies can be used to optimize the placement of MPI processes and thereby reduce communication time: the processes, together with their main communication paths, form a graph that has to be mapped cost-efficiently onto the graph representing the actual communication network. In this context, this work analyses and compares state-of-the-art task mapping strategies with respect to running time and the quality of their solutions to the MPI mapping problem. The focus is on generic strategies that can be used for arbitrary process/network topologies, although the topologies of interest here are regular ones in which the number of processes exceeds the number of processors in the underlying physical network. Additionally, different measures of mapping quality are discussed, and a close correspondence is shown between the most appropriate measure, the weighted edge cut, and program execution time. To investigate how mapping quality affects MPI program execution time, some mapping strategies have been incorporated into Open MPI. Finally, benchmark results show that optimized process-to-processor mappings can improve program execution time by up to 60% compared to the default mapping used in many MPI implementations (linear mapping). The findings in this work can serve as a reference not only for MPI implementors but also for researchers investigating static process-to-processor mappings in general.
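
The virtual-topology hook the abstract refers to is part of MPI-1. The sketch below describes a four-process ring as a graph; with reorder set to 1 the MPI library is free to renumber processes so the ring maps well onto the physical network, which is where mapping strategies like those studied here plug in:

```c
#include <mpi.h>

/* Build a 4-process ring as an MPI graph topology.
 * index[i] holds the cumulative node degree up to node i;
 * edges[] is the flattened adjacency list. */
void make_ring_topology(MPI_Comm *ring_comm)
{
    int index[4] = { 2, 4, 6, 8 };
    int edges[8] = { 1, 3,  0, 2,  1, 3,  0, 2 };  /* neighbors of 0..3 */

    MPI_Graph_create(MPI_COMM_WORLD, 4, index, edges,
                     /*reorder=*/1, ring_comm);
}
```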
200

A hybrid MPI/OpenMP parallelization of the adaptive integral method for multi-core clusters

Wei, Fangzhou 02 August 2011 (has links)
A hybrid of message passing and shared memory techniques is presented for scalable parallelization of the adaptive integral method (AIM), an FFT-based algorithm, on clusters of identical multi-core processors. The proposed hybrid MPI/OpenMP parallelization scheme is based on a nested one-dimensional (1-D) slab decomposition of the 3-D auxiliary uniform grid and the associated AIM calculations: if there are M processors and T cores per processor, the scheme (i) divides the uniform grid into M slabs and MT sub-slabs, (ii) assigns each slab/sub-slab and the associated operations to one of the processors/cores, and (iii) uses MPI for inter-processor data communication and OpenMP for intra-processor data exchange. The MPI/OpenMP parallel AIM is used to accelerate the method-of-moments (MoM) solution of combined-field integral equations pertinent to the analysis of scattering from perfectly conducting surfaces. The scalability and efficiency of the implementation are investigated theoretically and verified numerically by solving benchmark scattering problems on a (near) petaflop supercomputing cluster of quad-core processors. The timing and speedup results on up to 1024 processors show that the proposed hybrid MPI/OpenMP parallelization exhibits better strong scalability (fixed problem size speedup) than pure MPI parallelization when multiple cores are used on each processor.
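
A minimal sketch of the nested 1-D slab decomposition described above: MPI splits the grid's plane range into M slabs (one per process) and OpenMP splits each slab into sub-slabs across cores. grid_op() is a placeholder for the per-plane AIM work; since all MPI calls stay outside the parallel region, MPI_THREAD_FUNNELED support suffices:

```c
#include <mpi.h>

/* Process the planes of a 3-D uniform grid with the nested 1-D slab
 * scheme: inter-process partitioning via MPI ranks, intra-process
 * partitioning via OpenMP worksharing. */
void process_grid(int nz_total, void (*grid_op)(int z))
{
    int m, M;
    MPI_Comm_rank(MPI_COMM_WORLD, &m);
    MPI_Comm_size(MPI_COMM_WORLD, &M);

    /* This process' slab: planes [z0, z1) of the uniform grid. */
    int z0 = (int)((long long)nz_total * m / M);
    int z1 = (int)((long long)nz_total * (m + 1) / M);

    /* Cores split the slab into sub-slabs; inter-process exchange
     * would use MPI, intra-process exchange shared memory. */
    #pragma omp parallel for schedule(static)
    for (int z = z0; z < z1; z++)
        grid_op(z);
}
```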
