Global ETD Search

41	Fast Barrier Synchronization for InfiniBand Hoefler, Torsten 04 January 2006 Barrier Synchronization is crucial for many parallel systems. This talk introduces different synchronization mechanisms and demonstrates new approaches to leverage special hardware properties of InfiniBand to lower the Barrier latency. info:eu-repo/classification/ddc/004 ddc:004 MPI <Schnittstelle> Parallelrechner Barrier InfiniBand MPI_Barrier Open MPI
42	Integration einer neuen InfiniBand-Schnittstelle in die vorhandene InfiniBand MPICH2 Software [Integration of a New InfiniBand Interface into the Existing InfiniBand MPICH2 Software] Mosch, Marek 25 April 2006 Design of a uniform API for using Mellanox V-API and OpenIB Verbs, based on C preprocessor macros, and integration of this API into the existing MPICH2-CH3 device for InfiniBand. info:eu-repo/classification/ddc/004 ddc:004 API MPI <Schnittstelle> InfiniBand MPICH2 OpenIB V-API Verbs
43	Enhancing an InfiniBand driver by utilizing an efficient malloc/free library supporting multiple page sizes Rex, Robert 18 September 2006 Despite the use of high-speed interconnects such as InfiniBand, the communication overhead for parallel applications, especially in High-Performance Computing (HPC), is still high. Using large page frames, called hugepages in Linux, can speed up the critical step of registering communication buffers with the network adapter. An InfiniBand driver was modified accordingly. Hugepages not only reduce communication costs but can also noticeably improve computation time, e.g. through fewer TLB misses. To avoid the cost of rewriting applications, a preload library was implemented that uses large page frames transparently. This work also presents benchmark results for these components, with performance improvements of up to 10%. info:eu-repo/classification/ddc/000 ddc:000 Cluster <Rechnernetz> Hochleistungsrechnen LINUX Rechnernetz HPC Hugepages InfiniBand
44	Optimierte Implementierung ausgewählter kollektiver Operationen unter Ausnutzung der Hardwareparallelität des InfiniBand Netzwerkes [Optimized Implementation of Selected Collective Operations Exploiting the Hardware Parallelism of the InfiniBand Network] Franke, Maik 30 April 2007 The goal of this thesis is an optimized implementation, for the InfiniBand network, of the reduction operations MPI_Reduce(), MPI_Allreduce(), MPI_Scan(), and MPI_Reduce_scatter() defined in the MPI-1 standard. Particular emphasis is placed on special InfiniBand operations and on hardware parallelism. InfiniBand allows communication operations to be clearly separated from computation, which makes it possible to overlap the two within a reduction. The potential of this method is to be examined both in a theoretical model and in practice, through a prototype implementation within the Open MPI framework. The final result is to be compared with existing implementations (e.g. MVAPICH). / The performance of collective communication operations is one of the deciding factors in the overall performance of an MPI application. Current implementations of MPI use the point-to-point components to access the InfiniBand network. This work therefore attempts to improve the performance of a collective component by accessing the InfiniBand network directly, which should avoid overhead and make it possible to tune the algorithms to this specific network. Various algorithms for the MPI_Reduce, MPI_Allreduce, MPI_Scan and MPI_Reduce_scatter operations are presented. The theoretical performance of the algorithms is analyzed with the LogfP and LogGP models. Selected algorithms are implemented as part of an Open MPI collective component. Finally, the performance of different algorithms and different MPI implementations is compared. info:eu-repo/classification/ddc/004 ddc:004 Cluster Hochleistungsrechnen InfiniBand Kollektive Operationen LogfP Modell MPI_Allreduce MPI_Reduce MPI_Reduce_scatter MPI_Scan Open MPI
45	Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InfiniBand HCA Mittenzwey, Nico 31 March 2009 This thesis analysed the QLogic InfiniPath QLE7140 HCA and its onload architecture and compared the results to the Mellanox InfiniHost III Lx HCA, which uses an offload architecture. As expected, the QLogic InfiniPath QLE7140 HCA can outperform the Mellanox InfiniHost III Lx HCA in terms of latency and bandwidth on our test system in various test scenarios. The benchmarks showed that sending messages with multiple threads in parallel can increase the bandwidth greatly, while bi-directional sends cut the effective bandwidth for one HCA by up to 30%. Different all-reduce algorithms were evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI for different scenarios. The thesis also demonstrated that multicast algorithms for InfiniBand can be implemented easily by using the RDMA-CM API. info:eu-repo/classification/ddc/004 ddc:004 Hochleistungsrechnen Parallelrechner InfiniBand MPI_Allreduce Netzwerk OFED Open MPI PSM RDMA-CM
46	Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects Potluri, Sreeram 18 September 2014 No description available. Computer Science Heterogeneous Clusters GPU MIC Many-core Architectures MPI PGAS One-sided Communication Runtimes InfiniBand RDMA Overlap HPC Applications
47	Designing High Performance and Scalable Unified Communication Runtime (UCR) for HPC and Big Data Middleware Jose, Jithin 30 December 2014 No description available. Computer Science MPI PGAS Unified Runtime OpenSHMEM Unified Parallel C Memcached HBase InfiniBand Clusters RDMA Runtime Design Hybrid Programming
48	High Performance Network I/O in Virtual Machines over Modern Interconnects Huang, Wei 12 September 2008 No description available. Computer Science Network I/O Virtual Machines InfiniBand OS-bypass VMM-bypass Migration RDMA shared memory communication
49	Large-Message Nonblocking Allgather and Broadcast Offload via BlueField-2 DPU Sarkauskas, Nicholas Robert 09 August 2022 No description available. Computer Engineering Computer Science
50	Data services: bringing I/O processing to petascale Abbasi, Mohammad Hasan 08 July 2011 The increasing size of high performance computing systems and the associated increase in the volume of generated data have resulted in an I/O bottleneck for these applications. This bottleneck is further exacerbated by the imbalance in the growth of processing capability compared to storage capability, due mainly to the power and cost requirements of scaling the storage. This thesis introduces data services, a new abstraction which provides significant benefits for data intensive applications. Data services combine low overhead data movement with flexible placement of data manipulation operations to address the I/O challenges of leadership class scientific applications. The impact of asynchronous data movement on application runtime is minimized by utilizing novel server side data movement schedulers to avoid contention related jitter in application communication. Additionally, the JITStager component is presented. Utilizing dynamic code generation and flexible code placement, the JITStager allows data services to be executed as a pipeline extending from the application to storage. It is shown in this thesis that data services can add new functionality to the application without having a significant negative impact on performance. I/O Scientific computing Data services I/O frameworks Infiniband Cray Supercomputing Data HPC Petaflops computers High performance computing Electronic data processing Computer science
