71 |
Exploring Long-Term Psychological Distress Resulting from Abusive Supervision. Corser, Peter, 23 August 2022 (has links)
No description available.
|
72 |
Novel Methods for Improving Performance and Reliability of Flash-Based Solid State Storage System. Guo, Jiayang, 29 May 2018 (has links)
No description available.
|
73 |
HIGH PERFORMANCE I/O ARCHITECTURES AND FILE SYSTEMS FOR INTERNET SERVERS. WANG, JUN, 16 September 2002 (has links)
No description available.
|
74 |
USER-SPACE, CUSTOM FILE SYSTEM FOR PROXY SERVERS. DHAR, MEGHNA, 02 September 2003 (has links)
No description available.
|
75 |
Big Vector: An External Memory Algorithm and Data Structure. Upadhyay, Abhyudaya, 16 October 2015 (has links)
No description available.
|
76 |
A resource management framework for cloud computing. Li, Min, 06 May 2014 (has links)
The cloud computing paradigm is realized through large-scale distributed resource management and computation platforms such as MapReduce, Hadoop, Dryad, and Pregel. These platforms enable quick and efficient development of a large range of applications that can be sustained at scale in a fault-tolerant fashion. Two key technologies, namely resource virtualization and feature-rich enterprise storage, are further driving the widespread adoption of virtualized cloud environments. Many challenges arise when designing resource management techniques for both native and virtualized data centers. First, parameter tuning of MapReduce jobs for efficient resource utilization is a daunting and time-consuming task. Second, while the MapReduce model is designed for and leverages information from native clusters to operate efficiently, the emergence of virtual cluster topologies overlays or hides the actual network information. This leads to two resource selection and placement anomalies: (i) loss of data locality, and (ii) loss of job locality. Consequently, jobs may be placed physically far from their associated data or related jobs, which adversely affects overall performance. Finally, the extant resource provisioning approach leads to significant wastage, as enterprise cloud providers have to consider and provision for peak loads instead of the average load (which is often many times lower).
In this dissertation, we design and develop a resource management framework to address the above challenges. We first design an innovative resource scheduler, CAM, aimed at MapReduce applications running in virtualized cloud environments. Based on a flow-network algorithm, CAM reconciles both data and VM resource allocation with a variety of competing constraints, such as storage utilization, changing CPU load, and network link capacities. Additionally, our platform exposes the typically hidden lower-level topology information to the MapReduce job scheduler, which enables it to make optimal task assignments. Second, we design an online performance tuning system, mrOnline, which monitors MapReduce job execution, tunes parameters based on collected statistics, and provides fine-grained control over parameter configuration changes to the user. To this end, we employ a gray-box smart hill-climbing algorithm that leverages MapReduce runtime statistics and effectively converges to a desirable configuration within a single iteration. Finally, we target enterprise applications in virtualized environments, where a network-attached, centralized storage system is typically deployed. We design a new protocol to share primary data de-duplication information available at the storage server with the client. This enables better client-side cache utilization and reduces server-client network traffic, which leads to higher overall performance. Based on the protocol, a workload-aware VM management strategy is further introduced to decrease the load on the storage server and enhance I/O efficiency for clients. / Ph. D.
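The abstract describes mrOnline's gray-box smart hill-climbing only at a high level; the sketch below is a simplified illustration of plain hill climbing over a discrete parameter space, not the dissertation's algorithm. The parameter names, value ranges, and the run_job measurement hook are hypothetical placeholders.

import random

# Hypothetical tunable MapReduce parameters and candidate values; a real tuner
# would adjust live Hadoop configuration keys and measure runtime statistics.
PARAM_SPACE = {
    "io.sort.mb": [100, 200, 400, 800],
    "mapreduce.task.io.sort.factor": [10, 25, 50, 100],
    "mapreduce.reduce.shuffle.parallelcopies": [5, 10, 20],
}

def run_job(config):
    """Placeholder: submit a job with `config` and return its runtime in seconds."""
    raise NotImplementedError

def hill_climb(measure=run_job, restarts=3):
    """Greedy hill climbing: starting from a random configuration, try changing
    one parameter at a time and keep any change that lowers the measured runtime."""
    best_cfg, best_cost = None, float("inf")
    for _ in range(restarts):
        cfg = {k: random.choice(v) for k, v in PARAM_SPACE.items()}
        cost = measure(cfg)
        improved = True
        while improved:
            improved = False
            for key, values in PARAM_SPACE.items():
                for val in values:
                    if val == cfg[key]:
                        continue
                    candidate = dict(cfg, **{key: val})
                    c = measure(candidate)
                    if c < cost:
                        cfg, cost, improved = candidate, c, True
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost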
|
77 |
Scalability Analysis and Optimization for Large-Scale Deep Learning. Pumma, Sarunya, 03 February 2020 (links)
Despite its growing importance, scalable deep learning (DL) remains a difficult challenge. Scalability of large-scale DL is constrained by many factors, including those deriving from data movement and data processing. DL frameworks rely on large volumes of data being fed to the computation engines for processing. However, current hardware trends show that data movement is already one of the slowest components in modern high performance computing systems, and this gap is only going to widen in the future. This includes data movement from the filesystem, within the network subsystem, and even within the node itself, all of which limit the scalability of DL frameworks on large systems. Even after data is moved to the computational units, managing it is not easy. Modern DL frameworks use multiple components---such as graph scheduling, neural network training, gradient synchronization, and input pipeline processing---to process this data in an asynchronous, uncoordinated manner, which results in straggler processes and consequently computational imbalance, further limiting scalability. This thesis studies a subset of the large body of data movement and data processing challenges that exist in modern DL frameworks.
For the first study, we investigate file I/O constraints that limit the scalability of large-scale DL. We first analyze the Caffe DL framework with Lightning Memory-Mapped Database (LMDB), one of the most widely used file I/O subsystems in DL frameworks, to understand the causes of file I/O inefficiencies. Based on our analysis, we propose LMDBIO---an optimized I/O plugin for scalable DL that addresses the various shortcomings in existing file I/O for DL. Our experimental results show that LMDBIO significantly outperforms LMDB in all cases and improves overall application performance by up to 65-fold on 9,216 CPUs of the Blues and Bebop supercomputers at Argonne National Laboratory.
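The abstract summarizes LMDBIO's results rather than its design, so the following is only a hedged sketch of a naive data-parallel LMDB read, the kind of file I/O whose behavior the analysis examines; it is not LMDBIO. The py-lmdb and mpi4py bindings serve as stand-ins, and the database path is a placeholder.

import lmdb                    # py-lmdb binding
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

# Placeholder path; a real run would point at the training database.
env = lmdb.open("/path/to/train_lmdb", readonly=True, lock=False)

records = []
with env.begin() as txn:
    cursor = txn.cursor()
    # Naive sharding for illustration: rank r keeps every nprocs-th record
    # while scanning the whole memory-mapped database.
    for i, (key, value) in enumerate(cursor):
        if i % nprocs == rank:
            records.append((key, value))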
Our second study deals with the computational imbalance problem in data processing. For most DL systems, the simultaneous and asynchronous execution of multiple data-processing components on shared hardware resources causes these components to contend with one another, leading to severe computational imbalance and degraded scalability. We propose various novel optimizations that minimize resource contention and improve performance by up to 35% for training various neural networks on 24,576 GPUs of the Summit supercomputer at Oak Ridge National Laboratory---the world's largest supercomputer at the time this thesis was written. / Doctor of Philosophy / Deep learning is a method for computers to automatically extract complex patterns and trends from large volumes of data. It is a popular methodology that we use every day when we talk to Apple Siri or Google Assistant, when we use self-driving cars, or even when we witnessed IBM Watson being crowned champion of Jeopardy! While deep learning is integrated into our everyday life, it is a complex problem that has attracted the attention of many researchers.
Executing deep learning is a highly computationally intensive problem. On traditional computers, such as a generic laptop or desktop machine, the computation for large deep learning problems can take years or decades to complete. Consequently, supercomputers, which are machines with massive computational capability, are leveraged for deep learning workloads. The world's fastest supercomputer today, for example, is capable of performing almost 200 quadrillion floating point operations every second. While that is impressive, for large problems, unfortunately, even the fastest supercomputers today are not fast enough. The problem is not that they do not have enough computational capability, but that deep learning problems inherently rely on a lot of data---the entire concept of deep learning centers around the fact that the computer would study a huge volume of data and draw trends from it. Moving and processing this data, unfortunately, is much slower than the computation itself and with the current hardware trends it is not expected to get much faster in the future.
This thesis aims at making deep learning executions on large supercomputers faster. Specifically, it looks at two pieces associated with managing data: (1) data reading---how to quickly read large amounts of data from storage, and (2) computational imbalance---how to ensure that the different processors on the supercomputer are not waiting for each other and thus wasting time. We first analyze each performance problem to identify the root cause of it. Then, based on the analysis, we propose several novel techniques to solve the problem. With our optimizations, we are able to significantly improve the performance of deep learning execution on a number of supercomputers, including Blues and Bebop at Argonne National Laboratory, and Summit---the world's fastest supercomputer---at Oak Ridge National Laboratory.
|
78 |
On the Use of Containers in High Performance Computing. Abraham, Subil, 09 July 2020 (has links)
The lightweight, portable, and flexible nature of containers is driving their widespread adoption in cloud solutions. Data analysis and deep learning applications have especially benefited from containerized solutions. As such data analysis is also being utilized in the high performance computing (HPC) domain, the need for container support in HPC has become paramount. However, container adoption in HPC faces crucial performance and I/O challenges. One obstacle is that, while container solutions for HPC exist, they have not been thoroughly investigated, especially with respect to their impact on the crucial I/O throughput needs of HPC. To this end, this paper provides a first-of-its-kind empirical analysis of state-of-the-art representative container solutions (Docker, Podman, Singularity, and Charliecloud) in HPC environments, especially how containers interact with HPC storage systems. We present the design of an analysis framework that is deployed on all nodes in an HPC environment and captures CPU, memory, network, and file I/O statistics from the nodes and the storage system. We garner key insights from our analysis, e.g., Charliecloud outperforms the other container solutions in terms of container start-up time, while Singularity and Charliecloud are equivalent in I/O throughput. But this comes at a cost, as Charliecloud invokes the most metadata and I/O operations on the underlying Lustre file system. By identifying such optimization opportunities, we can enhance the performance of containers atop HPC and help the aforementioned applications. / Master of Science / Containers are a technology that allows applications to be packaged along with their ideal environment, all the way down to their preferred operating system. This lets an application run anywhere that supports containers without a large hit to application performance. Hence, containers have seen wide adoption in the cloud. These qualities have also made them very appealing for scientific research in national labs. Modern research heavily relies on the power of computing to model, simulate, and test the behavior of real-world entities, often making use of large amounts of data and utilizing machine learning and deep learning. Doing this often requires the high performance computing power found in supercomputers. In most cases, scientists just want to be able to write their code and expect it to work. Their applications might depend on other software that forms part of their standard toolkit, which they would also expect to be installed in the supercomputing environment. This may not always be the case, taking the scientist's focus away from their work in order to ensure their requirements are set up in the supercomputing environment, which might require extensive cooperation with the operations team responsible for the supercomputers. Containers easily solve this problem because they can package everything together. However, the use of containers in these environments has not been extensively tested, especially for applications that are very heavy on the analysis of large quantities of data. To fill this gap, this work analyzes the performance of several state-of-the-art container technologies (Docker, Podman, Singularity, Charliecloud), with a particular focus on their interaction with the Lustre data storage systems widely used in supercomputing environments.
As part of this work, we design an analysis setup that captures the behavior of various aspects of the high performance computing environment, such as CPU, memory, and network usage, as well as data movement, while using containers to run data-heavy applications. We garner important insights about their performance that can help inform the best choice of container technology for a given environment and the kind of application that needs to be run.
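The abstract names the node-side metrics the framework collects; as a hedged sketch only (not the thesis's framework), the snippet below samples CPU, memory, network, and disk I/O counters on a single node with the psutil library while a containerized workload runs. The interval, duration, and output path are illustrative, and the real framework also captures storage-system-side metrics.

import csv
import time
import psutil  # cross-platform system statistics library

def sample_node_stats(interval_s=1.0, duration_s=60, out_path="node_stats.csv"):
    """Periodically record CPU, memory, network, and disk I/O counters on one node."""
    fields = ["timestamp", "cpu_percent", "mem_percent",
              "net_bytes_sent", "net_bytes_recv",
              "disk_read_bytes", "disk_write_bytes"]
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        end = time.time() + duration_s
        while time.time() < end:
            net = psutil.net_io_counters()
            disk = psutil.disk_io_counters()
            writer.writerow({
                "timestamp": time.time(),
                "cpu_percent": psutil.cpu_percent(interval=None),
                "mem_percent": psutil.virtual_memory().percent,
                "net_bytes_sent": net.bytes_sent,
                "net_bytes_recv": net.bytes_recv,
                "disk_read_bytes": disk.read_bytes,
                "disk_write_bytes": disk.write_bytes,
            })
            time.sleep(interval_s)

if __name__ == "__main__":
    sample_node_stats()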
|
79 |
Holistic Performance Analysis of Multi-layer I/O in Parallel Scientific Applications. Tschüter, Ronny, 18 February 2021 (has links)
Efficient usage of file systems poses a major challenge for highly scalable parallel applications. The performance of even the most sophisticated I/O subsystems lags behind the compute capabilities of current processors. To improve the utilization of I/O subsystems, several libraries, such as HDF5, facilitate the implementation of parallel I/O operations. These libraries abstract from low-level I/O interfaces (for instance, POSIX I/O) and may internally interact with additional I/O libraries. While improving usability, I/O libraries also add complexity and impede the analysis and optimization of application I/O performance.
This thesis proposes a methodology to investigate application I/O behavior in detail. In contrast to existing approaches, this methodology captures I/O activities on multiple layers of the I/O software stack, correlates these activities across all layers explicitly, and identifies interactions between the layers. This allows users to identify inefficiencies at individual layers of the I/O software stack as well as to detect possible conflicts in the interplay between these layers. To this end, a monitoring infrastructure observes an application and records information about its I/O activities during execution. This work describes options to monitor applications and generate event logs reflecting their behavior. Additionally, it introduces concepts to store information about I/O activities in event logs that preserve hierarchical relations between I/O operations across all layers of the I/O software stack.
In combination with the introduced methodology for multi-layer I/O performance analysis, this work provides the foundation for application I/O tuning by exposing patterns in the usage of I/O routines. This contribution includes the definition of I/O access patterns observable in the event logs of parallel scientific applications. These access patterns originate either directly from the application or from utilized I/O libraries. The introduced patterns reflect inefficiencies in the usage of I/O routines or reveal optimization strategies for I/O accesses. Software developers can use these patterns as a guideline for performance analysis to investigate the I/O behavior of their applications and verify the effectiveness of internal optimizations applied by high-level I/O libraries.
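As a hedged illustration of one such pattern check (not a pattern definition taken from the thesis), the sketch below scans a per-process list of (offset, size) read events and flags runs of many small, strictly sequential reads that could be aggregated into fewer, larger requests. The thresholds are arbitrary.

from typing import List, Tuple

def find_small_sequential_reads(events: List[Tuple[int, int]],
                                small_bytes: int = 4096,
                                min_run: int = 8):
    """Report runs of consecutive small reads where each read starts exactly
    where the previous one ended; returns half-open index ranges."""
    runs, start = [], None
    for i in range(1, len(events)):
        prev_off, prev_sz = events[i - 1]
        off, sz = events[i]
        contiguous = (off == prev_off + prev_sz)
        small = (sz <= small_bytes and prev_sz <= small_bytes)
        if contiguous and small:
            if start is None:
                start = i - 1
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    if start is not None and len(events) - start >= min_run:
        runs.append((start, len(events)))
    return runs

# Example: 16 sequential 1 KiB reads are flagged as one aggregatable run.
demo = [(i * 1024, 1024) for i in range(16)]
print(find_small_sequential_reads(demo))   # [(0, 16)]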
After focusing on the analysis of individual applications, this work widens the scope to investigations of coordinated sequences of applications by introducing a top-down approach for performance analysis of entire scientific workflows. The approach provides summarized performance metrics covering different workflow perspectives, from general overview to individual jobs and their job steps. These summaries allow users to identify inefficiencies and determine the responsible job steps. In addition, the approach utilizes the methodology for performance analysis of applications using multi-layer I/O to record detailed performance data about job steps, enabling a fine-grained analysis of the associated execution to exactly pinpoint performance issues. The introduced top-down performance analysis methodology presents a powerful tool for comprehensive performance analysis of complex workflows.
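As an illustration only (not the thesis's tooling), the short sketch below rolls hypothetical per-job-step records up into job-level and workflow-level summaries, mirroring the top-down perspectives described above; all names and fields are made up.

from collections import defaultdict

# Hypothetical job-step records: (job_id, step_id, runtime_s, io_time_s)
steps = [
    ("preprocess", 0, 120.0, 40.0),
    ("simulate",   0, 900.0, 30.0),
    ("simulate",   1, 880.0, 35.0),
    ("analyze",    0, 300.0, 150.0),
]

def summarize(steps):
    """Aggregate step records per job, then roll jobs up into a workflow summary."""
    jobs = defaultdict(lambda: {"runtime_s": 0.0, "io_time_s": 0.0, "steps": 0})
    for job_id, _step, runtime, io_time in steps:
        jobs[job_id]["runtime_s"] += runtime
        jobs[job_id]["io_time_s"] += io_time
        jobs[job_id]["steps"] += 1
    workflow = {
        "runtime_s": sum(j["runtime_s"] for j in jobs.values()),
        "io_time_s": sum(j["io_time_s"] for j in jobs.values()),
    }
    return workflow, dict(jobs)

workflow_summary, per_job = summarize(steps)
# Job steps with a high io_time_s share are candidates for the fine-grained
# multi-layer I/O analysis described above.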
On top of their theoretical formulation, this thesis provides implementations of all proposed methodologies. For this purpose, an established performance monitoring infrastructure is extended with features to record I/O activities. These contributions complement existing functionality and provide holistic performance analysis for parallel scientific applications, covering computation, communication, and I/O operations. Evaluations with synthetic case studies, benchmarks, and real-world applications demonstrate the effectiveness of the proposed methodologies. The results of this work are distributed as open-source software. For instance, the measurement infrastructure, including the improvements introduced in this thesis, is available for download and used in computing centers worldwide. Furthermore, research projects already employ the outcomes of this work.
|
80 |
Towards Data-Driven I/O Load Balancing in Extreme-Scale Storage Systems. Banavathi Srinivasa, Sangeetha, 15 June 2017 (has links)
Storage systems used for supercomputers and high performance computing (HPC) centers exhibit load imbalance and resource contention. This is mainly due to two factors: the bursty nature of scientific application I/O, and the complex, distributed I/O path that lacks centralized arbitration and control. For example, the extant Lustre parallel storage system, which forms the backend storage for many HPC centers, comprises numerous components, all connected in custom network topologies, and serves the varying demands of a large number of users and applications. Consequently, some storage servers can become more loaded than others, creating bottlenecks and reducing overall application I/O performance. Existing solutions focus on per-application load balancing and thus are not effective due to the lack of a global view of the system.
In this thesis, we adopt a data-driven quantitative approach to load balance the I/O servers at extreme scale. To this end, we design a global mapper on the Lustre Metadata Server (MDS), which gathers runtime statistics collected from key storage components on the I/O path and applies Markov chain modeling and a dynamic maximum flow algorithm to decide where data should be placed in a load-balanced fashion. Evaluation using a realistic system simulator shows that our approach yields better load balancing, which in turn can help yield higher end-to-end performance. / Master of Science / Critical jobs such as meteorological prediction are run at exascale supercomputing facilities like the Oak Ridge Leadership Computing Facility (OLCF). It is necessary for these centers to provide an optimally running infrastructure to support these critical workloads. The amount of data being produced and processed is increasing rapidly, requiring these High Performance Computing (HPC) centers to design systems that can support the growing volume of data.
Lustre is a parallel filesystem that is deployed in HPC centers. As a hierarchical filesystem, Lustre comprises a distributed layer of Object Storage Servers (OSSs) that are responsible for I/O on the Object Storage Targets (OSTs). Lustre employs a traditional capacity-based Round-Robin approach for file placement on the OSTs. This results in only a small fraction of the OSTs being used. The traditional Round-Robin approach also increases the load on the same set of OSSs, which results in decreased performance. Thus, it is imperative to have a better load-balanced file placement algorithm that can evenly distribute load across all OSSs and OSTs in order to meet the future demands of data storage and processing.
We approach the problem of load imbalance by splitting the system into two views: the filesystem and the applications. We first collect the current usage statistics of the filesystem by means of a distributed monitoring tool. We then predict the applications' I/O request pattern by employing a Markov chain model. Finally, we make use of both these components to design a load balancing algorithm that evens out the load on both the OSSs and the OSTs.
We evaluate our algorithm on a custom-built simulator that simulates the behavior of the actual filesystem.
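The abstract only outlines the approach; the sketch below is a hedged illustration of its two ingredients using made-up request classes and server loads: a Markov chain that predicts the next I/O request class from the current one, and a placement step that assigns new file stripes to the least-loaded OSTs. The greedy rule here is only a stand-in for the dynamic maximum-flow algorithm the thesis actually uses.

import numpy as np

# Hypothetical I/O request classes and a transition matrix that would be
# estimated from observed application request sequences.
classes = ["small", "medium", "large"]
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])   # P[i][j] = Pr(next = j | current = i)

def predict_next(current: str) -> str:
    """Most likely next request class under the Markov model."""
    i = classes.index(current)
    return classes[int(np.argmax(P[i]))]

def place_stripes(ost_load, num_stripes):
    """Greedy load-balanced placement: assign each new stripe to the OST with
    the lowest current load, then account for the added load."""
    ost_load = dict(ost_load)
    chosen = []
    for _ in range(num_stripes):
        ost = min(ost_load, key=ost_load.get)
        chosen.append(ost)
        ost_load[ost] += 1          # unit cost per stripe for illustration
    return chosen

print(predict_next("small"))                                # 'small'
print(place_stripes({"ost0": 3, "ost1": 2, "ost2": 9}, 3))  # ['ost1', 'ost0', 'ost1']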
|