1. Performance Evaluation of High Performance Parallel I/O
Dhandapani, Mangayarkarasi, 02 August 2003
Performance of the I/O subsystem plays a significant role in parallel applications that need to access large amounts of data. In such applications, I/O performance is expected to be scalable and balanced with respect to communication and CPU performance. MPI-IO, part of the MPI-2 standard, has many implementations. The available client-side parallel architectures differ widely in their approach to achieving high performance. This thesis hypothesizes that, for a given underlying file system, these client-side parallel architectures differ in how effectively they deliver overall parallel application performance, and that improving performance for different workload characteristics requires different designs. The hypothesis is validated by developing appropriate metrics and analyzing the results of the experiments.
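The abstract does not name the metrics it develops. Purely as an illustration, the following minimal Python sketch computes two common summaries of a parallel I/O run: aggregate bandwidth and load imbalance across processes. The `IoRecord` type and both functions are hypothetical and are not the thesis's metrics.

```python
from dataclasses import dataclass


@dataclass
class IoRecord:
    """Hypothetical timing of one process's I/O phase (seconds) and bytes moved."""
    start: float
    end: float
    nbytes: int


def aggregate_bandwidth(records: list[IoRecord]) -> float:
    """Aggregate bandwidth in bytes/second across all processes.

    Elapsed time runs from the earliest start to the latest end, so a single
    slow process lowers the result for the whole run.
    """
    if not records:
        raise ValueError("no I/O records")
    total_bytes = sum(r.nbytes for r in records)
    elapsed = max(r.end for r in records) - min(r.start for r in records)
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return total_bytes / elapsed


def load_imbalance(records: list[IoRecord]) -> float:
    """Ratio of the slowest to the mean per-process I/O time (1.0 = balanced)."""
    times = [r.end - r.start for r in records]
    mean = sum(times) / len(times)
    return max(times) / mean
```

A ratio well above 1.0 indicates that some processes wait on others, which is one way an I/O subsystem can fail to stay balanced with the rest of the application.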
2. Principal Design Criteria Influencing the Performance of a Portable, High Performance Parallel I/O Implementation
Rajaram, Kumaran, 11 May 2002
MPI-IO, the parallel I/O functionality of MPI-2, is a portable interface designed specifically to achieve high performance. This thesis proposes fundamental design criteria that influence the performance of portable, high-performance I/O middleware. It hypothesizes that overlapping I/O with computation, and agglomerating I/O requests based on an application's access pattern, improve the performance of a portable parallel I/O implementation. The work included the development of MercutIO, a complete, portable, high-performance MPI-IO implementation. MercutIO achieves portability through the Bulldog Abstract File System, a portable, efficient, non-collective I/O interface also developed in this thesis work. A new data access model based on non-blocking semantics is presented. The thesis also introduces two new I/O metrics (degree of overlapping and degree of non-contiguity) and parallel I/O benchmarks that are essential for evaluating the performance of a parallel I/O implementation.
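Request agglomeration can be illustrated with a minimal Python sketch that coalesces (offset, length) requests whose extents overlap or lie within a small gap of each other, so that fewer, larger file system calls are issued. The function `agglomerate` and its `max_gap` parameter are hypothetical; this is not MercutIO's or the Bulldog Abstract File System's actual code.

```python
def agglomerate(requests: list[tuple[int, int]], max_gap: int = 0) -> list[tuple[int, int]]:
    """Merge (offset, length) requests whose extents overlap or are separated
    by at most max_gap bytes. Returns merged (offset, length) pairs sorted by offset."""
    merged: list[list[int]] = []  # each entry is [start, end) as a mutable pair
    for offset, length in sorted(requests):
        if length <= 0:
            continue  # ignore empty requests
        end = offset + length
        if merged and offset <= merged[-1][1] + max_gap:
            # Extend the current extent; with max_gap > 0 the gap bytes are
            # covered too and would have to be discarded (or preserved) later.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([offset, end])
    return [(start, end - start) for start, end in merged]
```

With `max_gap > 0`, the merged extents also cover the holes between requests, trading extra data movement for fewer requests; this is the same trade-off made by data sieving.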
3. Adapting Remote Direct Memory Access Based File System to Parallel Input-/Output
Velusamy, Vijay, 13 December 2003
Traditional file access interfaces rely on ubiquitous transports that severely restrict performance and prove insufficient for adaptation to parallel Input/Output (I/O). Remote Direct Memory Access (RDMA) based approaches aim to move data between process address spaces with streamlined mediation and reduced operating-system involvement, using synchronization semantics that differ from those of ubiquitous transports. This thesis studies the adaptability of RDMA-based transports to parallel I/O. Combining RDMA semantics with parallel I/O reduces overhead, by overlapping communication with computation, and increases bandwidth. Although parallel I/O tends to increase latency in certain cases, the use of RDMA techniques mitigates this effect.
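The benefit of overlapping communication with computation can be estimated with a simple two-stage pipeline model, sketched below in Python under idealized assumptions: a constant per-block compute time `t_comp` and transfer time `t_comm`, perfect overlap, and no setup or contention costs. The function names are illustrative, and the model is not taken from the thesis.

```python
def serial_time(n_blocks: int, t_comp: float, t_comm: float) -> float:
    """Each block is computed and then transferred; nothing overlaps."""
    return n_blocks * (t_comp + t_comm)


def overlapped_time(n_blocks: int, t_comp: float, t_comm: float) -> float:
    """Two-stage pipeline: the transfer of block i runs while block i+1 is
    computed, so in steady state only the slower stage is paid per block.
    Assumes ideal overlap with no setup or contention costs."""
    if n_blocks == 0:
        return 0.0
    return t_comp + t_comm + (n_blocks - 1) * max(t_comp, t_comm)
```

For 10 blocks with `t_comp = 1.0` and `t_comm = 0.5`, the serial schedule takes 15.0 time units and the overlapped one 10.5, because all but the last transfer are hidden behind computation. Real gains are smaller whenever the transfer competes with computation for memory bandwidth or CPU time.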
4. Improving Performance And Programmer Productivity For I/O-intensive High Performance Computing Applications
Sehrish, Saba, 01 January 2010
Due to the explosive growth in the size of scientific data sets, data-intensive computing is an emerging trend in computational science. HPC applications generate and process large amounts of data, ranging from terabytes (TB) to petabytes (PB). This growth raises the question of which parallel programming framework can efficiently process such large data sets. In this work, we study the applicability of two programming models (MPI/MPI-IO and MapReduce) to a variety of I/O-intensive HPC applications, ranging from simulations to analytics. We identify several limitations of these existing programming models, related to both performance and programmer productivity, when they are used for I/O-intensive applications, and we propose new frameworks to improve both for emerging I/O-intensive applications. The Message Passing Interface (MPI) is widely used for writing HPC applications, and MPI/MPI-IO allows fine-grained control over data assignment and task distribution. At the programming framework level, various optimizations have been proposed to improve the performance of MPI/MPI-IO function calls. These optimizations are exposed to programmers as various function options, so writing efficient code requires knowing exactly how to use each optimization function, which limits programmer productivity. We propose an abstraction called Reduced Function Set Abstraction (RFSA) for MPI-IO that reduces the number of I/O functions and automates the selection of the appropriate I/O function when writing HPC simulation applications. The purpose of RFSA is to hide the performance optimization functions from application developers and relieve them from choosing a specific function. The proposed set of functions relies on a selection algorithm to choose among the most common optimizations provided by MPI-IO.
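The abstract does not spell out RFSA's selection algorithm. The following hypothetical Python sketch only illustrates the general idea of choosing an MPI-IO write routine from a description of the access pattern. `AccessPattern`, `select_write`, and the selection rules are assumptions; only the returned routine names are standard MPI-IO functions.

```python
from dataclasses import dataclass


@dataclass
class AccessPattern:
    """Hypothetical description of one write phase."""
    contiguous: bool       # each process writes one contiguous block
    shared_file: bool      # all processes write the same file
    overlap_compute: bool  # the caller has work to do while the write proceeds


def select_write(p: AccessPattern) -> str:
    """Return the name of an MPI-IO write routine (illustrative rules only)."""
    if p.shared_file and not p.contiguous:
        # Interleaved, non-contiguous pieces of a shared file are the case
        # where collective buffering (two-phase I/O) tends to help most.
        return "MPI_File_write_at_all"
    if p.overlap_compute:
        # Non-blocking independent write; completion needs MPI_Wait or MPI_Test.
        return "MPI_File_iwrite_at"
    return "MPI_File_write_at"
```

A real selector would also need to account for file views, datatypes, and atomicity, which the abstract lists among the functions RFSA merges or automates.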
Additionally, many application scientists are looking to integrate data-intensive computing into computation-intensive High Performance Computing facilities, particularly for data analytics. We have observed several scientific applications that must migrate their data from an HPC storage system to a data-intensive one. Because the data semantics of HPC storage and data-intensive systems differ, the migrated data must then be further refined and reorganized before existing data-intensive tools such as MapReduce can analyze it effectively. This reorganization requires at least two complete scans through the data set and at least one MapReduce program to prepare the data before analysis. Running multiple MapReduce phases causes significant overhead in the form of excessive I/O, because every MapReduce program needed to complete the analysis performs a distributed read and write on the file system. Our contribution is to extend MapReduce to eliminate the multiple scans and to reduce the number of pre-processing MapReduce programs. Our novel framework, MapReduce with Access Patterns (MRAP), adds expressiveness to the MapReduce language, allowing users to specify the logical semantics of their data so that 1) the data can be analyzed without running multiple pre-processing MapReduce programs, and 2) the data can be simultaneously reorganized as it is migrated to the data-intensive file system. We also provide a scheduling mechanism to further improve the performance of these applications.
The main contributions of this thesis are:
1) In RFSA, we implement a selection algorithm for I/O functions such as read/write, merge a set of functions for data types and file views, and optimize the atomicity function by automating the locking mechanism. By running different parallel I/O benchmarks on both medium-scale clusters and NERSC supercomputers, we show improved programmer productivity (35.7% on average). This approach incurs an overhead of 2-5% for one particular optimization and shows a performance improvement of 17% when an application requires a combination of different optimizations.
2) We provide an augmented MapReduce system (MRAP), which consists of an API and corresponding optimizations, i.e., data restructuring and scheduling. We demonstrate up to 33% throughput improvement in one real application (read-mapping in bioinformatics) and up to 70% in an I/O kernel of another application (halo catalog analytics). Our scheduling scheme shows a performance improvement of 18% for an I/O kernel of a third application (QCD analytics).
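To illustrate how declaring an access pattern can remove a separate reorganization pass, here is a hypothetical Python sketch. A strided layout (records of `record_size` bytes starting at `offset` and repeating every `stride` bytes) is read directly into a map function. `StridedPattern`, `records`, and `run_map` are invented for illustration and are not MRAP's API; the single in-process pass stands in for a distributed map and reduce.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class StridedPattern:
    """Hypothetical access-pattern spec: logical records of record_size bytes
    start at offset and repeat every stride bytes."""
    offset: int
    record_size: int
    stride: int


def records(data: bytes, p: StridedPattern) -> Iterator[bytes]:
    """Yield each logical record straight from the raw file contents."""
    if p.stride <= 0 or p.record_size <= 0:
        raise ValueError("stride and record_size must be positive")
    pos = p.offset
    while pos + p.record_size <= len(data):
        yield data[pos:pos + p.record_size]
        pos += p.stride


def run_map(data: bytes, p: StridedPattern,
            mapper: Callable[[bytes], list[tuple[str, int]]]) -> dict[str, int]:
    """Apply mapper to every logical record and sum the values per key
    (a single local pass standing in for map plus reduce)."""
    out: dict[str, int] = {}
    for rec in records(data, p):
        for key, value in mapper(rec):
            out[key] = out.get(key, 0) + value
    return out
```

For example, with `data = b"AAxxBBxxAAxx"` and `StridedPattern(offset=0, record_size=2, stride=4)`, the mapper sees only the records `b"AA"`, `b"BB"`, and `b"AA"`, so no preliminary job is needed to strip out the interleaved `xx` bytes.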