  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Fast Barrier Synchronization for InfiniBand

Hoefler, Torsten 04 January 2006
Barrier synchronization is crucial for many parallel systems. This talk introduces different synchronization mechanisms and demonstrates new approaches that exploit special hardware properties of InfiniBand to lower barrier latency.
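As an illustration of what a barrier mechanism looks like, the sketch below simulates a dissemination barrier, a classic software barrier algorithm. The abstract does not say which mechanisms the talk covers, so this choice is an assumption, and the function name `simulate_dissemination_barrier` is hypothetical. The simulation only tracks which arrivals each process has learned about; it does not touch InfiniBand hardware.

```python
def simulate_dissemination_barrier(num_procs):
    """Simulate a dissemination barrier (illustrative, not from the talk).

    In round r, process i notifies process (i + 2**r) mod num_procs.
    Returns (rounds, all_informed), where all_informed is True if every
    process has learned that every other process reached the barrier.
    """
    # known[i] is the set of processes whose arrival process i knows about.
    known = [{i} for i in range(num_procs)]
    # ceil(log2(num_procs)) rounds, computed with integer arithmetic.
    rounds = (num_procs - 1).bit_length()
    for r in range(rounds):
        distance = 1 << r
        # All notifications in a round are sent concurrently, so each
        # sender forwards what it knew at the start of the round.
        snapshot = [set(s) for s in known]
        for i in range(num_procs):
            partner = (i + distance) % num_procs
            known[partner] |= snapshot[i]
    everyone = set(range(num_procs))
    all_informed = all(s == everyone for s in known)
    return rounds, all_informed


if __name__ == "__main__":
    for p in (1, 2, 5, 8, 13):
        print(p, simulate_dissemination_barrier(p))
```

The number of rounds grows logarithmically with the process count, which is why per-round latency, the quantity a hardware-assisted approach would target, dominates barrier cost.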
42

Integration einer neuen InfiniBand-Schnittstelle in die vorhandene InfiniBand MPICH2 Software [Integration of a New InfiniBand Interface into the Existing InfiniBand MPICH2 Software]

Mosch, Marek 25 April 2006
Design of a unified API for using the Mellanox V-API and OpenIB Verbs, based on C preprocessor macros, and integration of this API into the existing MPICH2 CH3 device for InfiniBand.
43

Enhancing an InfiniBand driver by utilizing an efficient malloc/free library supporting multiple page sizes

Rex, Robert 18 September 2006
Despite high-speed interconnects such as InfiniBand, communication overhead for parallel applications remains high, especially in High-Performance Computing (HPC). Using large page frames, called hugepages in Linux, can speed up the critical step of registering communication buffers with the network adapter. An InfiniBand driver was modified accordingly. Hugepages not only reduce communication costs but can also noticeably improve computation time, e.g. through fewer TLB misses. To avoid the effort of rewriting applications, a preload library was implemented that uses large page frames transparently. This work also presents benchmark results for these components, showing performance improvements of up to 10%.
44

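Optimierte Implementierung ausgewählter kollektiver Operationen unter Ausnutzung der Hardwareparallelität des InfiniBand Netzwerkes [Optimized Implementation of Selected Collective Operations Exploiting the Hardware Parallelism of the InfiniBand Network]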

Franke, Maik 30 April 2007
The goal of this thesis is an optimized implementation, for the InfiniBand network, of the reduction operations defined in the MPI-1 standard: MPI_Reduce(), MPI_Allreduce(), MPI_Scan(), and MPI_Reduce_scatter(). Particular emphasis is placed on special InfiniBand operations and on hardware parallelism. InfiniBand makes it possible to separate communication operations cleanly from computation, which allows the two kinds of operations to overlap during the reduction. The potential of this method is to be evaluated both with theoretical models and in practice, through a prototype implementation within the Open MPI framework. The final result is to be compared with existing implementations (e.g. MVAPICH). / The performance of collective communication operations is one of the deciding factors in the overall performance of an MPI application. Current MPI implementations use the point-to-point components to access the InfiniBand network. This thesis therefore attempts to improve the performance of a collective component by accessing the InfiniBand network directly, which should avoid overhead and make it possible to tune the algorithms to this specific network. Various algorithms for the MPI_Reduce, MPI_Allreduce, MPI_Scan and MPI_Reduce_scatter operations are presented. The theoretical performance of the algorithms is analyzed with the LogfP and LogGP models. Selected algorithms are implemented as part of an Open MPI collective component. Finally, the performance of different algorithms and different MPI implementations is compared.
45

Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InfiniBand HCA

Mittenzwey, Nico 31 March 2009
This thesis analysed the QLogic InfiniPath QLE7140 HCA and its onload architecture and compared the results to the Mellanox InfiniHost III Lx HCA, which uses an offload architecture. As expected, the QLogic InfiniPath QLE7140 HCA can outperform the Mellanox InfiniHost III Lx HCA in latency and bandwidth on our test system in various test scenarios. The benchmarks showed that sending messages with multiple threads in parallel can greatly increase bandwidth, while bi-directional sends cut the effective bandwidth for one HCA by up to 30%. Different all-reduce algorithms were evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform those already implemented in Open MPI in several scenarios. The thesis also demonstrated that multicast algorithms for InfiniBand can be implemented easily using the RDMA-CM API.
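To make the idea of comparing all-reduce algorithms under LogGP concrete, here is a minimal sketch. It uses the standard LogGP cost of one k-byte message, o + (k-1)G + L + o, and applies it to two textbook algorithms, recursive doubling and ring. These are not necessarily the algorithms studied in the thesis. The parameter values are invented for illustration, and the model ignores the reduction computation and the gap parameter g, and treats a pairwise exchange as costing one message time.

```python
import math
from dataclasses import dataclass


@dataclass
class LogGP:
    L: float  # network latency
    o: float  # CPU overhead per send or receive
    g: float  # gap between consecutive messages (unused in this sketch)
    G: float  # gap per byte for long messages
    P: int    # number of processes


def message_time(m, k):
    """LogGP time to deliver one k-byte message: o + (k-1)G + L + o."""
    return 2 * m.o + (k - 1) * m.G + m.L


def recursive_doubling_allreduce(m, k):
    """ceil(log2 P) rounds, each exchanging the full k-byte vector."""
    rounds = (m.P - 1).bit_length()
    return rounds * message_time(m, k)


def ring_allreduce(m, k):
    """2(P-1) steps (reduce-scatter, then allgather) of about k/P bytes."""
    chunk = math.ceil(k / m.P)
    return 2 * (m.P - 1) * message_time(m, chunk)


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary time units, not measured values.
    model = LogGP(L=5.0, o=1.0, g=2.0, G=0.01, P=8)
    for size in (8, 1_000_000):
        print(size,
              recursive_doubling_allreduce(model, size),
              ring_allreduce(model, size))
```

With these made-up parameters, recursive doubling is cheaper for the 8-byte message and the ring is cheaper for the 1 MB message, which shows how the preferred algorithm can change with the scenario.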
46

Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects

Potluri, Sreeram 18 September 2014
No description available.
47

Designing High Performance and Scalable Unified Communication Runtime (UCR) for HPC and Big Data Middleware

Jose, Jithin 30 December 2014
No description available.
48

High Performance Network I/O in Virtual Machines over Modern Interconnects

Huang, Wei 12 September 2008
No description available.
49

Large-Message Nonblocking Allgather and Broadcast Offload via BlueField-2 DPU

Sarkauskas, Nicholas Robert 09 August 2022
No description available.
50

Data services: bringing I/O processing to petascale

Abbasi, Mohammad Hasan 08 July 2011
The increasing size of high performance computing systems, and the associated increase in the volume of generated data, has resulted in an I/O bottleneck for these applications. This bottleneck is further exacerbated by the imbalance between the growth of processing capability and that of storage capability, due mainly to the power and cost requirements of scaling the storage. This thesis introduces data services, a new abstraction that provides significant benefits for data-intensive applications. Data services combine low-overhead data movement with flexible placement of data manipulation operations to address the I/O challenges of leadership-class scientific applications. The impact of asynchronous data movement on application runtime is minimized by using novel server-side data movement schedulers to avoid contention-related jitter in application communication. Additionally, the JITStager component is presented. Using dynamic code generation and flexible code placement, the JITStager allows data services to be executed as a pipeline extending from the application to storage. This thesis shows that data services can add new functionality to the application without a significant negative impact on performance.
