  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
311

Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InfiniBand HCA

Mittenzwey, Nico 31 March 2009 (has links)
This thesis analysed the QLogic InfiniPath QLE7140 HCA and its onload architecture and compared the results to the Mellanox InfiniHost III Lx HCA, which uses an offload architecture. As expected, the QLogic InfiniPath QLE7140 HCA outperformed the Mellanox InfiniHost III Lx HCA in terms of latency and bandwidth on our test system in various test scenarios. The benchmarks showed that sending messages with multiple threads in parallel can increase the bandwidth greatly, while bi-directional sends cut the effective bandwidth for one HCA by up to 30%. Different all-reduce algorithms were evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI in different scenarios. The thesis also demonstrated that multicast algorithms for InfiniBand can be implemented easily using the RDMA-CM API.
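As an illustrative aside (not part of the thesis), the kind of LogGP-based comparison described above can be sketched as follows; the parameter values and the two algorithm cost formulas are textbook approximations, not measurements or algorithms from the thesis:

```cpp
#include <cmath>
#include <cstdio>

// LogGP parameters (hypothetical placeholder values, not measured on any HCA).
struct LogGP {
    double L;  // network latency (us)
    double o;  // per-message CPU overhead (us)
    double g;  // gap between consecutive messages (us); unused in this simplified cost
    double G;  // gap per byte (us/byte)
};

// Cost of one point-to-point message of m bytes under LogGP:
// sender overhead + per-byte gap + wire latency + receiver overhead.
static double msg_cost(const LogGP& p, double m) {
    return p.o + (m - 1.0) * p.G + p.L + p.o;
}

// Recursive doubling: log2(P) rounds, full vector exchanged each round.
static double allreduce_recursive_doubling(const LogGP& p, int P, double m) {
    return std::log2(static_cast<double>(P)) * msg_cost(p, m);
}

// Binomial-tree reduce followed by binomial-tree broadcast: 2 * log2(P) rounds.
static double allreduce_reduce_bcast(const LogGP& p, int P, double m) {
    return 2.0 * std::log2(static_cast<double>(P)) * msg_cost(p, m);
}

int main() {
    const LogGP p{1.3, 0.4, 0.5, 0.0004};     // hypothetical parameters
    const double sizes[] = {8.0, 1024.0, 65536.0};
    for (double m : sizes) {
        std::printf("m=%8.0f B  rec-dbl=%9.2f us  red+bcast=%9.2f us\n",
                    m, allreduce_recursive_doubling(p, 16, m),
                    allreduce_reduce_bcast(p, 16, m));
    }
    return 0;
}
```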
312

Analysis and Adaption of Graph Mapping Algorithms for Regular Graph Topologies

Rinke, Sebastian 22 April 2009 (has links)
The Message Passing Interface (MPI) standard defines virtual topologies that can be applied to systems of cooperating processes. Besides providing a more convenient namespace, these can be used to optimize the placement of MPI processes in order to reduce communication time. That is, the processes and their main communication paths form a graph that has to be mapped cost-efficiently onto the graph representing the actual communication network. In this context, this work analyses and compares state-of-the-art task mapping strategies with respect to running time and the quality of their solutions to the MPI mapping problem. In particular, the focus is on generic strategies that can be used for arbitrary process/network topologies, although the topologies of interest here are regular ones, where the number of processes is greater than the number of processors in the underlying physical network. Additionally, different measures of mapping quality are discussed, and a close correspondence between the most appropriate one, the weighted edge cut, and program execution time is shown. In order to investigate how mapping quality affects MPI program execution time, some mapping strategies have been incorporated into Open MPI. Finally, benchmark results show that optimized process-to-processor mappings can improve program execution time by up to 60% compared to the default mapping in many MPI implementations (linear mapping). The findings in this work can serve as a reference not only for MPI implementors, but also for researchers investigating static process-to-processor mappings in general.
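As context for the mapping mechanism described above, here is a minimal sketch of how an MPI application can hand its communication graph to the library and allow rank reordering via the standard MPI_Dist_graph_create_adjacent call; the ring neighbourhood and edge weights are purely illustrative, not taken from the thesis:

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Illustrative ring topology: each process talks to its two neighbours,
    // with edge weights standing in for communication volume.
    std::vector<int> neighbours = {(rank + size - 1) % size, (rank + 1) % size};
    std::vector<int> weights    = {10, 10};

    MPI_Comm graph_comm;
    // reorder = 1 permits the MPI library to choose a process-to-processor
    // placement that reduces the weighted edge cut, if it implements one.
    MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                   2, neighbours.data(), weights.data(),
                                   2, neighbours.data(), weights.data(),
                                   MPI_INFO_NULL, /*reorder=*/1, &graph_comm);

    int new_rank;
    MPI_Comm_rank(graph_comm, &new_rank);
    // new_rank may differ from rank if the implementation reordered processes.

    MPI_Comm_free(&graph_comm);
    MPI_Finalize();
    return 0;
}
```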
313

Runtime MPI Correctness Checking with a Scalable Tools Infrastructure

Hilbrich, Tobias 08 June 2015 (has links)
Increasing computational demand of simulations motivates the use of parallel computing systems. At the same time, this parallelism poses challenges to application developers. The Message Passing Interface (MPI) is a de-facto standard for distributed memory programming in high performance computing. However, its use also enables complex parallel programming errors such as races, communication errors, and deadlocks. Automatic tools can assist application developers in the detection and removal of such errors. This thesis considers tools that detect such errors during an application run and advances them towards a combination of precise checks (neither false positives nor false negatives) and scalability. This includes novel hierarchical checks that provide scalability, as well as a formal basis for a distributed deadlock detection approach. At the same time, the development of parallel runtime tools is challenging and time consuming, especially if scalability and portability are key design goals. Current tool development projects often create similar tool components, while component reuse remains low. To provide a perspective towards more efficient tool development that simplifies scalable implementations, component reuse, and tool integration, this thesis proposes an abstraction for a parallel tools infrastructure along with a prototype implementation. This abstraction overcomes the use of multiple interfaces for different types of tool functionality, which limits flexible component reuse. Thus, this thesis advances runtime error detection tools and uses their redesign and their increased scalability requirements to apply and evaluate a novel tool infrastructure abstraction. The new abstraction ultimately allows developers to focus on their tool functionality rather than on developing or integrating common tool components. The use of such an abstraction in a wide range of parallel runtime tool development projects could greatly increase component reuse, thus decreasing tool development time and cost. An application study with up to 16,384 application processes demonstrates the applicability of both the proposed runtime correctness concepts and the proposed tools infrastructure.
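For illustration only (not an example from the thesis), the following two-process program shows the kind of error such runtime checkers report: both ranks block in a receive before either reaches its send, so neither receive can ever be matched and the program deadlocks:

```cpp
#include <mpi.h>

// Both ranks block in MPI_Recv before either reaches its MPI_Send, which is
// a guaranteed deadlock that a runtime correctness tool can report together
// with the involved call sites. Assumes exactly two processes.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int peer = 1 - rank;
    int send_buf = rank, recv_buf = -1;

    MPI_Recv(&recv_buf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(&send_buf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```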
314

Automated Beam Hardening Correction for Myocardial Perfusion Imaging using Computed Tomography

Levi, Jacob 23 May 2019 (has links)
No description available.
315

Design and evaluation of a plain MPI-based cluster execution backend for the SkePU 3 skeleton programming framework

Zeijlon, Alexander January 2023 (has links)
SkePU 3 is a framework for parallel program execution that uses higher-order functions called skeletons, which provide a layer of abstraction between user code and the parallel implementations it provides through its backends. The backend that enables SkePU to run on an HPC cluster suffers a slowdown of a factor of two, which reduces the viability of SkePU as an alternative for HPC and, as such, warrants an investigation. Programs written in SkePU are sequential-looking, single-source C++ programs in which skeleton calls can transparently execute on multiple different types of processing units, such as CPU cores, GPUs and clusters, using different backends. In this thesis, a strategy for improving the performance of SkePU on clusters is presented, and with it, the design and implementation of a new cluster backend that is simpler and more closely integrated with the non-cluster SkePU code base. Runtime measurements show that the new cluster backend achieves a relative speedup of about a factor of two, which effectively eliminates the slowdown.
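To illustrate the skeleton abstraction in general terms (this is deliberately not the SkePU API, just a toy stand-in), a higher-order map skeleton that dispatches to a chosen backend might look like this:

```cpp
#include <cstddef>
#include <vector>

// Toy illustration of the skeleton idea (NOT the SkePU API): user code calls a
// higher-order "map" skeleton, and the skeleton decides which backend applies
// the element-wise function. A real framework would dispatch to CPU threads,
// GPUs, or an MPI-based cluster backend here.
enum class Backend { Sequential, Cluster };

template <typename Func, typename T>
std::vector<T> map_skeleton(Func f, const std::vector<T>& in,
                            Backend backend = Backend::Sequential) {
    std::vector<T> out(in.size());
    switch (backend) {
        case Backend::Sequential:
            for (std::size_t i = 0; i < in.size(); ++i) out[i] = f(in[i]);
            break;
        case Backend::Cluster:
            // A cluster backend would partition `in` across ranks, apply f
            // locally, and gather the result; omitted in this sketch.
            for (std::size_t i = 0; i < in.size(); ++i) out[i] = f(in[i]);
            break;
    }
    return out;
}

int main() {
    std::vector<float> v = {1.f, 2.f, 3.f};
    auto doubled = map_skeleton([](float x) { return 2.f * x; }, v);
    return doubled.size() == 3 ? 0 : 1;
}
```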
316

Overlapping of Communication and Computation and Early Binding: Fundamental Mechanisms for Improving Parallel Performance on Clusters of Workstations

Dimitrov, Rossen Petkov 12 May 2001 (has links)
This study considers software techniques for improving performance on clusters of workstations and approaches for designing message-passing middleware that facilitate scalable, parallel processing. Early binding and overlapping of communication and computation are identified as fundamental approaches for improving parallel performance and scalability on clusters. Currently, cluster computers using the Message-Passing Interface for interprocess communication are the predominant choice for building high-performance computing facilities, which makes the findings of this work relevant to a wide audience from the areas of high-performance computing and parallel processing. The performance-enhancing techniques studied in this work are presently underutilized in practice because of the lack of adequate support by existing message-passing libraries and are also rarely considered by parallel algorithm designers. Furthermore, commonly accepted methods for performance analysis and evaluation of parallel systems omit these techniques and focus primarily on more obvious communication characteristics such as latency and bandwidth. This study provides a theoretical framework for describing early binding and overlapping of communication and computation in models for parallel programming. This framework defines four new performance metrics that facilitate new approaches for performance analysis of parallel systems and algorithms. This dissertation provides experimental data that validate the correctness and accuracy of the performance analysis based on the new framework. The theoretical results of this performance analysis can be used by designers of parallel system and application software for assessing the quality of their implementations and for predicting the effective performance benefits of early binding and overlapping. This work presents MPI/Pro, a new MPI implementation that is specifically optimized for clusters of workstations interconnected with high-speed networks. This MPI implementation emphasizes features such as persistent communication, asynchronous processing, low processor overhead, and independent message progress. These features are identified as critical for delivering maximum performance to applications. The experimental section of this dissertation demonstrates the capability of MPI/Pro to facilitate software techniques that result in significant application performance improvements. Specific demonstrations with Virtual Interface Architecture and TCP/IP over Ethernet are offered.
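As a generic illustration of the two techniques named above, the sketch below uses standard MPI persistent requests for early binding and nonblocking progress for overlap; the buffer sizes, iteration count, and compute kernel are placeholders, and it assumes exactly two processes:

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 1 << 20;
    std::vector<double> send_buf(N, static_cast<double>(rank)), recv_buf(N, 0.0);
    int peer = 1 - rank;                 // assumes two processes

    // Early binding: the send/receive parameters are bound to persistent
    // requests once, before the iteration loop.
    MPI_Request reqs[2];
    MPI_Send_init(send_buf.data(), N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Recv_init(recv_buf.data(), N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    for (int iter = 0; iter < 10; ++iter) {
        MPI_Startall(2, reqs);           // kick off communication

        // Overlap: computation that does not touch the buffers runs while the
        // messages are (ideally) progressed by the MPI library or the NIC.
        double local = 0.0;
        for (int i = 0; i < 1000; ++i) local += i * 1e-3;
        (void)local;

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }

    MPI_Request_free(&reqs[0]);
    MPI_Request_free(&reqs[1]);
    MPI_Finalize();
    return 0;
}
```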
317

Incorporating Fault-Tolerant Features into Message-Passing Middleware

Batchu, Rajanikanth Reddy 10 May 2003 (has links)
The popularity of MPI-based middleware and applications has led to their wide deployment. Such systems, however, are not inherently reliable and cannot tolerate external faults. This thesis presents a novel model-based approach for exploiting application features and other characteristics to categorize and create Application Execution Models (AEMs). This work realizes MPI/FT(tm), a middleware derived by selective incorporation of fault-tolerant features into MPI/Pro(tm) for two relevant AEMs. This thesis proves the following hypothesis: it is possible to successfully complete select MPI applications even in the presence of external faults, and such fault tolerance can be achieved with acceptable performance overhead. This work defines parameters to measure the impact of this middleware on performance through fault-free and fault-injected overheads. The hypothesis is validated through experimentation and measurement of sample MPI applications for two AEMs.
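The MPI/FT internals are not reproduced here; as a generic, hedged illustration of letting an application react to faults rather than abort, the sketch below switches MPI_COMM_WORLD from the default abort-on-error behaviour to MPI_ERRORS_RETURN and checks a return code:

```cpp
#include <mpi.h>
#include <cstdio>

// Generic illustration (not the MPI/FT middleware): replace the default
// MPI_ERRORS_ARE_FATAL handler with MPI_ERRORS_RETURN so that a failing
// operation returns an error code the application can act on.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = rank;
    int rc = MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rc != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len = 0;
        MPI_Error_string(rc, msg, &len);
        std::fprintf(stderr, "rank %d: broadcast failed: %s\n", rank, msg);
        // A fault-tolerant middleware would now attempt recovery instead of
        // simply reporting the error.
    }

    MPI_Finalize();
    return 0;
}
```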
318

Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects

Potluri, Sreeram 18 September 2014 (has links)
No description available.
319

Designing High Performance and Scalable Unified Communication Runtime (UCR) for HPC and Big Data Middleware

Jose, Jithin 30 December 2014 (has links)
No description available.
320

Designing Scalable and Efficient I/O Middleware for Fault-Resilient High-Performance Computing Clusters

Raja Chandrasekar, Raghunath January 2014 (has links)
No description available.
