Global ETD Search

1	Principal Design Criteria Influencing the Performance of a Portable, High Performance Parallel I/O Implementation Rajaram, Kumaran 11 May 2002 (has links) MPI-IO, the parallel I/O functionality of MPI-2, is a portable interface designed specifically to achieve high-performance. This thesis proposes fundamental design criteria influencing the performance of a portable high performance I/O middleware. This thesis hypothesizes that overlap of I/O and computation and agglomeration of I/O requests based on an application's access pattern improve the performance of a portable parallel I/O implementation. The work included the development of MercutIO, a complete, portable, high performance MPI-IO implementation. MercutIO achieves portability through the Bulldog Abstract File System, a portable, efficient non-collective I/O interface, also developed in this thesis work. A new data access model based on non-blocking semantics is presented here. Two new I/O metrics (degree of overlapping and degree of non-contiguity) as well as parallel I/O benchmarks essential in the performance appraisal of a parallel I/O implementation are introduced in this thesis. MPI-IO MPI-2 Parallel I/O MercutIO BAFS
2	Adapting Remote Direct Memory Access Based File System to Parallel Input-/Output Velusamy, Vijay 13 December 2003 (has links) Traditional file access interfaces rely on ubiquitous transports that impose severe restrictions on performance and prove insufficient for adaptation to parallel Input/Output (I/O). Remote Direct Memory Access based (RDMA-based) approaches are aimed at moving data between different process address spaces with streamlined mediation and reduced involvement of the operating system using synchronization semantics that are different from ubiquitous transports. This thesis studies the adaptability of RDMA-based transports to parallel I/O. Combining RDMA semantics with parallel I/O leads to overhead reduction by overlapping communication and computation and by bandwidth enhancement. Although parallel I/O tends to increase latency in certain cases, use of RDMA techniques mitigate on this effect. DAFS RDMA MercutIO Parallel I/O MPI-IO MPI-2
3	A Scalable Architecture for Simplifying Full-Range Scientific Data Analysis Kendall, Wesley James 01 December 2011 (has links) According to a recent exascale roadmap report, analysis will be the limiting factor in gaining insight from exascale data. Analysis problems that must operate on the full range of a dataset are among the most difficult. Some of the primary challenges in this regard come from disk access, data managment, and programmability of analysis tasks on exascale architectures. In this dissertation, I have provided an architectural approach that simplifies and scales data analysis on supercomputing architectures while masking parallel intricacies to the user. My architecture has three primary general contributions: 1) a novel design pattern and implmentation for reading multi-file and variable datasets, 2) the integration of querying and sorting as a way to simplify data-parallel analysis tasks, and 3) a new parallel programming model and system for efficiently scaling domain-traversal tasks. The design of my architecture has allowed studies in several application areas that were not previously possible. Some of these include large-scale satellite data and ocean flow analysis. The major driving example is of internal-model variability assessments of flow behavior in the GEOS-5 atmospheric modeling dataset. This application issued over 40 million particle traces for model comparison (the largest parallel flow tracing experiment to date), and my system was able to scale execution up to 65,536 processes on an IBM BlueGene/P system. Large-Data Analysis Visualization MapReduce Parallel I/O Parallel Processing Systems Architecture
4	Scalability Analysis and Optimization for Large-Scale Deep Learning Pumma, Sarunya 03 February 2020 (has links) Despite its growing importance, scalable deep learning (DL) remains a difficult challenge. Scalability of large-scale DL is constrained by many factors, including those deriving from data movement and data processing. DL frameworks rely on large volumes of data to be fed to the computation engines for processing. However, current hardware trends showcase that data movement is already one of the slowest components in modern high performance computing systems, and this gap is only going to increase in the future. This includes data movement needed from the filesystem, within the network subsystem, and even within the node itself, all of which limit the scalability of DL frameworks on large systems. Even after data is moved to the computational units, managing this data is not easy. Modern DL frameworks use multiple components---such as graph scheduling, neural network training, gradient synchronization, and input pipeline processing---to process this data in an asynchronous uncoordinated manner, which results in straggler processes and consequently computational imbalance, further limiting scalability. This thesis studies a subset of the large body of data movement and data processing challenges that exist in modern DL frameworks. For the first study, we investigate file I/O constraints that limit the scalability of large-scale DL. We first analyze the Caffe DL framework with Lightning Memory-Mapped Database (LMDB), one of the most widely used file I/O subsystems in DL frameworks, to understand the causes of file I/O inefficiencies. Based on our analysis, we propose LMDBIO---an optimized I/O plugin for scalable DL that addresses the various shortcomings in existing file I/O for DL. Our experimental results show that LMDBIO significantly outperforms LMDB in all cases and improves overall application performance by up to 65-fold on 9,216 CPUs of the Blues and Bebop supercomputers at Argonne National Laboratory. Our second study deals with the computational imbalance problem in data processing. For most DL systems, the simultaneous and asynchronous execution of multiple data-processing components on shared hardware resources causes these components to contend with one another, leading to severe computational imbalance and degraded scalability. We propose various novel optimizations that minimize resource contention and improve performance by up to 35% for training various neural networks on 24,576 GPUs of the Summit supercomputer at Oak Ridge National Laboratory---the world's largest supercomputer at the time of writing of this thesis. / Doctor of Philosophy / Deep learning is a method for computers to automatically extract complex patterns and trends from large volumes of data. It is a popular methodology that we use every day when we talk to Apple Siri or Google Assistant, when we use self-driving cars, or even when we witnessed IBM Watson be crowned as the champion of Jeopardy! While deep learning is integrated into our everyday life, it is a complex problem that has gotten the attention of many researchers. Executing deep learning is a highly computationally intensive problem. On traditional computers, such as a generic laptop or desktop machine, the computation for large deep learning problems can take years or decades to complete. Consequently, supercomputers, which are machines with massive computational capability, are leveraged for deep learning workloads. The world's fastest supercomputer today, for example, is capable of performing almost 200 quadrillion floating point operations every second. While that is impressive, for large problems, unfortunately, even the fastest supercomputers today are not fast enough. The problem is not that they do not have enough computational capability, but that deep learning problems inherently rely on a lot of data---the entire concept of deep learning centers around the fact that the computer would study a huge volume of data and draw trends from it. Moving and processing this data, unfortunately, is much slower than the computation itself and with the current hardware trends it is not expected to get much faster in the future. This thesis aims at making deep learning executions on large supercomputers faster. Specifically, it looks at two pieces associated with managing data: (1) data reading---how to quickly read large amounts of data from storage, and (2) computational imbalance---how to ensure that the different processors on the supercomputer are not waiting for each other and thus wasting time. We first analyze each performance problem to identify the root cause of it. Then, based on the analysis, we propose several novel techniques to solve the problem. With our optimizations, we are able to significantly improve the performance of deep learning execution on a number of supercomputers, including Blues and Bebop at Argonne National Laboratory, and Summit---the world's fastest supercomputer---at Oak Ridge National Laboratory. Scalable Deep Learning Data Movement Investigation Parallel I/O Optimization Straggler Processes Computational Imbalance Resource Contention
5	Computational Flood Modeling and Visual Analysis Johnson, Donald W 07 May 2016 (has links) This dissertation introduces FESM (Flood Event Simulation Model), a Geographic Information System (GIS) tool designed for use on gaged river systems that can be used to guide logistic support during disaster events. FESM rapidly generates flood predictions using elevation data from real-world sensors or generated by other models. Verification and validation data for FESM are provided. In order to construct a visualization system for interacting with FESM outputs, single buffer and dual buffer techniques for moving massive datasets to the GPU for processing using OpenCL were rigorously tested and timed, and an analysis of the costs/benefits of using buffers or images was conducted. Finally, DRO (Dynamic Raster Overlay), a visualization system for analysis of datasets composed of multiple overlapping flood maps is introduced, and expert feedback is provided on the effectiveness of DRO with selected case studies. Multi Surface Display Visual Analytics OpenCL Parallel I/O Framebuffer Operations Bitmap Operations Parrallel Processing Flood Mapping\|GIS Flood Inundation
6	An in-situ visualization approach for parallel coupling and steering of simulations through distributed shared memory files / Une approche de visualisation in-situ pour le couplage parallèle et le pilotage de simulations à travers des fichiers en mémoire distribuée partagée Soumagne, Jérôme 14 December 2012 (has links) Les codes de simulation devenant plus performants et plus interactifs, il est important de suivre l'avancement d'une simulation in-situ, en réalisant non seulement la visualisation mais aussi l'analyse des données en même temps qu'elles sont générées. Suivre l'avancement ou réaliser le post-traitement des données de simulation in-situ présente un avantage évident par rapport à l'approche conventionnelle consistant à sauvegarder—et à recharger—à partir d'un système de fichiers; le temps et l'espace pris pour écrire et ensuite lire les données à partir du disque est un goulet d'étranglement significatif pour la simulation et les étapes consécutives de post-traitement. Par ailleurs, la simulation peut être arrêtée, modifiée, ou potentiellement pilotée, conservant ainsi les ressources CPU.Nous présentons dans cette thèse une approche de couplage faible qui permet à une simulation de transférer des données vers un serveur de visualisation via l'utilisation de fichiers en mémoire. Nous montrons dans cette étude comment l'interface, implémentée au-dessus d'un format hiérarchique de données (HDF5), nous permet de réduire efficacement le goulet d'étranglement introduit par les I/Os en utilisant des stratégies efficaces de communication et de configuration des données. Pour le pilotage, nous présentons une interface qui permet non seulement la modification de simples paramètres, mais également le remaillage complet de grilles ou des opérations impliquant la régénérationde grandeurs numériques sur le domaine entier de calcul d'être effectués. Cette approche, testée et validée sur deux cas-tests industriels, est suffisamment générique pour qu'aucune connaissance particulière du modèle de données sous-jacent ne soit requise. / As simulation codes become more powerful and more interactive, it is increasingly desirable to monitor a simulation in-situ, performing not only visualization but also analysis of the incoming data as it is generated. Monitoring or post-processing simulation data in-situ has obvious advantage over the conventional approach of saving to—and reloading data from—the file system; the time and space it takes to write and then read the data from disk is a significant bottleneck for both the simulation and subsequent post-processing steps. Furthermore, the simulation may be stopped, modified, or potentially steered, thus conserving CPU resources. We present in this thesis a loosely coupled approach that enables a simulation to transfer data to a visualization server via the use of in-memory files. We show in this study how the interface, implemented on top of a widely used hierarchical data format (HDF5), allows us to efficiently decrease the I/O bottleneck by using efficient communication and data mapping strategies. For steering, we present an interface that allows not only simple parameter changes but also complete re-meshing of grids or operations involving regeneration of field values over the entire computational domain to be carried out. This approach, tested and validated on two industrial test cases, is generic enough so that no particular knowledge of the underlying model is required. I/Os Parallèles Mémoire Partagée Distribuée Modèles de Communication Modélisation Informatique Modèles de Données Synchronisation Parallel I/O Distributed Shared Memory Communication Models Computational Modeling Data Models Synchronization
7	Evaluating I/O scheduling techniques at the forwarding layer and coordinating data server accesses / Avaliação de técnicas de escalonamento de E/S na camada de encaminhamento e coordenação de acesso aos servidores de dados Bez, Jean Luca January 2016 (has links) Em ambientes de Computação de Alto Desempenho, as aplicações científicas dependem dos Sistemas de Arquivos Paralelos (SAP) para obter desempenho de Entrada/Saída (E/S), especialmente ao lidar com grandes quantidades de dados. No entanto, E/S ainda é um gargalo para um número crescente de aplicações, devido à diferença histórica entre a velocidade de processamento e de acesso aos dados. Para aliviar a concorrência causada por milhares de nós que acessam um número significativamente menor de servidores SAP, normalmente nós intermediários de E/S são adicionados entre os nós de processamento e o sistema de arquivos. Cada nó intermediário encaminha solicitações de vários clientes para o sistema, uma configuração que dá a este componente a oportunidade de executar otimizações como o escalonamento de requisições de E/S. O objetivo desta dissertação é avaliar diferentes algoritmos de escalonamento, na camada de encaminhamento de E/S, cuja finalidade é melhorar o padrão de acesso das aplicações, agregando e reordenando requisições para evitar padrões que são conhecidos por prejudicar o desempenho. Demonstramos que os escalonadores FIFO (First In, First Out), HBRR (Handle-Based Round-Robin), TO (Time Order), SJF (Shortest Job First) e MLF (Multilevel Feedback) são apenas parcialmente eficazes porque o padrão de acesso não é o principal fator que afeta o desempenho na camada de encaminhamento de E/S, especialmente para requisições de leitura Um novo algoritmo de escalonamento chamado TWINS é proposto para coordenar o acesso de nós intermediários de E/S aos servidores de dados do sistema de arquivos paralelo. Nossa abordagem reduz a concorrência nos servidores de dados, um fator previamente demonstrado como reponsável por afetar negativamente o desempenho. O algoritmo proposto é capaz de melhorar o tempo de leitura de arquivos compartilhados em até 28% se comparado a outros algoritmos de escalonamento e em até 50% se comparado a não fazer o encaminhamento de requisições de E/S. / In High Performance Computing (HPC) environments, scientific applications rely on Parallel File Systems (PFS) to obtain Input/Output (I/O) performance especially when handling large amounts of data. However, I/O is still a bottleneck for an increasing number of applications, due to the historical gap between processing and data access speed. To alleviate the concurrency caused by thousands of nodes accessing a significantly smaller number of PFS servers, intermediate I/O nodes are typically employed between processing nodes and the file system. Each intermediate node forwards requests from multiple clients to the parallel file system, a setup which gives this component the opportunity to perform optimizations like I/O scheduling. The objective of this dissertation is to evaluate different scheduling algorithms, at the I/O forwarding layer, that work to improve concurrent access patterns by aggregating and reordering requests to avoid patterns known to harm performance. We demonstrate that the FIFO (First In, First Out), HBRR (Handle- Based Round-Robin), TO (Time Order), SJF (Shortest Job First) and MLF (Multilevel Feedback) schedulers are only partially effective because the access pattern is not the main factor that affects performance in the I/O forwarding layer, especially for read requests. A new scheduling algorithm, TWINS, is proposed to coordinate the access of intermediate I/O nodes to the parallel file system data servers. Our approach decreases concurrency at the data servers, a factor previously proven to negatively affect performance. The proposed algorithm is able to improve read performance from shared files by up to 28% over other scheduling algorithms and by up to 50% over not forwarding I/O requests. Processamento paralelo Computacao cientifica : Alto desempenho High performance I/O Parallel file systems Parallel I/O I/O forwarding I/O scheduling Access coordination
8	Evaluating I/O scheduling techniques at the forwarding layer and coordinating data server accesses / Avaliação de técnicas de escalonamento de E/S na camada de encaminhamento e coordenação de acesso aos servidores de dados Bez, Jean Luca January 2016 (has links) Em ambientes de Computação de Alto Desempenho, as aplicações científicas dependem dos Sistemas de Arquivos Paralelos (SAP) para obter desempenho de Entrada/Saída (E/S), especialmente ao lidar com grandes quantidades de dados. No entanto, E/S ainda é um gargalo para um número crescente de aplicações, devido à diferença histórica entre a velocidade de processamento e de acesso aos dados. Para aliviar a concorrência causada por milhares de nós que acessam um número significativamente menor de servidores SAP, normalmente nós intermediários de E/S são adicionados entre os nós de processamento e o sistema de arquivos. Cada nó intermediário encaminha solicitações de vários clientes para o sistema, uma configuração que dá a este componente a oportunidade de executar otimizações como o escalonamento de requisições de E/S. O objetivo desta dissertação é avaliar diferentes algoritmos de escalonamento, na camada de encaminhamento de E/S, cuja finalidade é melhorar o padrão de acesso das aplicações, agregando e reordenando requisições para evitar padrões que são conhecidos por prejudicar o desempenho. Demonstramos que os escalonadores FIFO (First In, First Out), HBRR (Handle-Based Round-Robin), TO (Time Order), SJF (Shortest Job First) e MLF (Multilevel Feedback) são apenas parcialmente eficazes porque o padrão de acesso não é o principal fator que afeta o desempenho na camada de encaminhamento de E/S, especialmente para requisições de leitura Um novo algoritmo de escalonamento chamado TWINS é proposto para coordenar o acesso de nós intermediários de E/S aos servidores de dados do sistema de arquivos paralelo. Nossa abordagem reduz a concorrência nos servidores de dados, um fator previamente demonstrado como reponsável por afetar negativamente o desempenho. O algoritmo proposto é capaz de melhorar o tempo de leitura de arquivos compartilhados em até 28% se comparado a outros algoritmos de escalonamento e em até 50% se comparado a não fazer o encaminhamento de requisições de E/S. / In High Performance Computing (HPC) environments, scientific applications rely on Parallel File Systems (PFS) to obtain Input/Output (I/O) performance especially when handling large amounts of data. However, I/O is still a bottleneck for an increasing number of applications, due to the historical gap between processing and data access speed. To alleviate the concurrency caused by thousands of nodes accessing a significantly smaller number of PFS servers, intermediate I/O nodes are typically employed between processing nodes and the file system. Each intermediate node forwards requests from multiple clients to the parallel file system, a setup which gives this component the opportunity to perform optimizations like I/O scheduling. The objective of this dissertation is to evaluate different scheduling algorithms, at the I/O forwarding layer, that work to improve concurrent access patterns by aggregating and reordering requests to avoid patterns known to harm performance. We demonstrate that the FIFO (First In, First Out), HBRR (Handle- Based Round-Robin), TO (Time Order), SJF (Shortest Job First) and MLF (Multilevel Feedback) schedulers are only partially effective because the access pattern is not the main factor that affects performance in the I/O forwarding layer, especially for read requests. A new scheduling algorithm, TWINS, is proposed to coordinate the access of intermediate I/O nodes to the parallel file system data servers. Our approach decreases concurrency at the data servers, a factor previously proven to negatively affect performance. The proposed algorithm is able to improve read performance from shared files by up to 28% over other scheduling algorithms and by up to 50% over not forwarding I/O requests. Processamento paralelo Computacao cientifica : Alto desempenho High performance I/O Parallel file systems Parallel I/O I/O forwarding I/O scheduling Access coordination
9	Evaluating I/O scheduling techniques at the forwarding layer and coordinating data server accesses / Avaliação de técnicas de escalonamento de E/S na camada de encaminhamento e coordenação de acesso aos servidores de dados Bez, Jean Luca January 2016 (has links) Em ambientes de Computação de Alto Desempenho, as aplicações científicas dependem dos Sistemas de Arquivos Paralelos (SAP) para obter desempenho de Entrada/Saída (E/S), especialmente ao lidar com grandes quantidades de dados. No entanto, E/S ainda é um gargalo para um número crescente de aplicações, devido à diferença histórica entre a velocidade de processamento e de acesso aos dados. Para aliviar a concorrência causada por milhares de nós que acessam um número significativamente menor de servidores SAP, normalmente nós intermediários de E/S são adicionados entre os nós de processamento e o sistema de arquivos. Cada nó intermediário encaminha solicitações de vários clientes para o sistema, uma configuração que dá a este componente a oportunidade de executar otimizações como o escalonamento de requisições de E/S. O objetivo desta dissertação é avaliar diferentes algoritmos de escalonamento, na camada de encaminhamento de E/S, cuja finalidade é melhorar o padrão de acesso das aplicações, agregando e reordenando requisições para evitar padrões que são conhecidos por prejudicar o desempenho. Demonstramos que os escalonadores FIFO (First In, First Out), HBRR (Handle-Based Round-Robin), TO (Time Order), SJF (Shortest Job First) e MLF (Multilevel Feedback) são apenas parcialmente eficazes porque o padrão de acesso não é o principal fator que afeta o desempenho na camada de encaminhamento de E/S, especialmente para requisições de leitura Um novo algoritmo de escalonamento chamado TWINS é proposto para coordenar o acesso de nós intermediários de E/S aos servidores de dados do sistema de arquivos paralelo. Nossa abordagem reduz a concorrência nos servidores de dados, um fator previamente demonstrado como reponsável por afetar negativamente o desempenho. O algoritmo proposto é capaz de melhorar o tempo de leitura de arquivos compartilhados em até 28% se comparado a outros algoritmos de escalonamento e em até 50% se comparado a não fazer o encaminhamento de requisições de E/S. / In High Performance Computing (HPC) environments, scientific applications rely on Parallel File Systems (PFS) to obtain Input/Output (I/O) performance especially when handling large amounts of data. However, I/O is still a bottleneck for an increasing number of applications, due to the historical gap between processing and data access speed. To alleviate the concurrency caused by thousands of nodes accessing a significantly smaller number of PFS servers, intermediate I/O nodes are typically employed between processing nodes and the file system. Each intermediate node forwards requests from multiple clients to the parallel file system, a setup which gives this component the opportunity to perform optimizations like I/O scheduling. The objective of this dissertation is to evaluate different scheduling algorithms, at the I/O forwarding layer, that work to improve concurrent access patterns by aggregating and reordering requests to avoid patterns known to harm performance. We demonstrate that the FIFO (First In, First Out), HBRR (Handle- Based Round-Robin), TO (Time Order), SJF (Shortest Job First) and MLF (Multilevel Feedback) schedulers are only partially effective because the access pattern is not the main factor that affects performance in the I/O forwarding layer, especially for read requests. A new scheduling algorithm, TWINS, is proposed to coordinate the access of intermediate I/O nodes to the parallel file system data servers. Our approach decreases concurrency at the data servers, a factor previously proven to negatively affect performance. The proposed algorithm is able to improve read performance from shared files by up to 28% over other scheduling algorithms and by up to 50% over not forwarding I/O requests. Processamento paralelo Computacao cientifica : Alto desempenho High performance I/O Parallel file systems Parallel I/O I/O forwarding I/O scheduling Access coordination

Search results