Global ETD Search

1	Measuring, modeling, and optimizing counterintuitive performance phenomena in power-scalable, parallel systems Chang, Hung-Ching 09 April 2015 The demands of exascale computing systems and applications have pushed for a rapid, continual design paradigm coupled with increasing design complexities from the interaction between the application, the middleware, and the underlying system hardware, which forms a breeding ground for inefficiency. This work seeks to improve system efficiency by exposing the root causes of unexpected performance slowdowns (e.g., lower performance at higher processor speeds) that occur more frequently in power-scalable systems where raw processor speed varies. More precisely, we perform an exhaustive empirical study that conclusively shows that increasing processor speed often reduces performance and wastes energy. Our experimental work shows that the frequency of occurrence and magnitude of slowdowns grow with clock frequency and parallelism, indicating that such slowdowns will increasingly be observed with trends in processor and system design. Performance speedups at lower frequencies (or slowdowns at higher frequencies) have been anecdotally observed in the prevailing literature since 2004, but no research has explained or exploited this phenomenon. This work conclusively demonstrates that performance slowdowns during processor speedup phases can exceed 47% in common I/O workloads. Our hypothesis challenges (and ultimately debunks) a fundamental assumption in computer systems: faster processor speeds result in the same or better performance. In this work, with the use of code and kernel instrumentation, exhaustive experiments, and deep insight into the inner workings of the Linux I/O subsystem, I overcome the aforementioned challenges of variance, complexity, and nondeterminism and identify I/O resource contention as the root cause of the slowdowns during processor speedup.
Specifically, such contention arises in the Linux kernel when the journaling block device (JBD) interacts with the ext3/4 file system, which introduces file write delays and file synchronization delays. To fully explain how such I/O contention causes performance anomalies, I propose analytical models of resource contention among I/O threads to describe the root cause of the observed I/O slowdowns when processors speed up. To this end, I introduce LUC, a runtime system to limit the unintended consequences of power scaling, and demonstrate the effectiveness of the LUC system for two critical parallel transaction-oriented workloads: a mail server (varMail) and online transaction processing (oltp). / Ph. D. parallel and distributed processing I/O performance power runtime systems
2	Holistic Performance Analysis of Multi-layer I/O in Parallel Scientific Applications Tschüter, Ronny 18 February 2021 Efficient usage of file systems poses a major challenge for highly scalable parallel applications. The performance of even the most sophisticated I/O subsystems lags behind the compute capabilities of current processors. To improve the utilization of I/O subsystems, several libraries, such as HDF5, facilitate the implementation of parallel I/O operations. These libraries abstract from low-level I/O interfaces (for instance, POSIX I/O) and may internally interact with additional I/O libraries. While improving usability, I/O libraries also add complexity and impede the analysis and optimization of application I/O performance. This thesis proposes a methodology to investigate application I/O behavior in detail. In contrast to existing approaches, this methodology captures I/O activities on multiple layers of the I/O software stack, correlates these activities across all layers explicitly, and identifies interactions between multiple layers of the I/O software stack. This allows users to identify inefficiencies at individual layers of the I/O software stack as well as to detect possible conflicts in the interplay between these layers. To this end, a monitoring infrastructure observes an application and records information about its I/O activities during execution. This work describes options to monitor applications and generate event logs reflecting their behavior. Additionally, it introduces concepts to store information about I/O activities in event logs that preserve hierarchical relations between I/O operations across all layers of the I/O software stack. In combination with the introduced methodology for multi-layer I/O performance analysis, this work provides the foundation for application I/O tuning by exposing patterns in the usage of I/O routines.
This contribution includes the definition of I/O access patterns observable in the event logs of parallel scientific applications. These access patterns originate either directly from the application or from utilized I/O libraries. The introduced patterns reflect inefficiencies in the usage of I/O routines or reveal optimization strategies for I/O accesses. Software developers can use these patterns as a guideline for performance analysis to investigate the I/O behavior of their applications and verify the effectiveness of internal optimizations applied by high-level I/O libraries. After focusing on the analysis of individual applications, this work widens the scope to investigations of coordinated sequences of applications by introducing a top-down approach for performance analysis of entire scientific workflows. The approach provides summarized performance metrics covering different workflow perspectives, from general overview to individual jobs and their job steps. These summaries allow users to identify inefficiencies and determine the responsible job steps. In addition, the approach utilizes the methodology for performance analysis of applications using multi-layer I/O to record detailed performance data about job steps, enabling a fine-grained analysis of the associated execution to exactly pinpoint performance issues. The introduced top-down performance analysis methodology presents a powerful tool for comprehensive performance analysis of complex workflows. On top of their theoretical formulation, this thesis provides implementations of all proposed methodologies. For this purpose, an established performance monitoring infrastructure is enhanced by features to record I/O activities. These contributions complement existing functionality and provide a holistic performance analysis for parallel scientific applications covering computation, communication, and I/O operations. 
Evaluations with synthetic case studies, benchmarks, and real-world applications demonstrate the effectiveness of the proposed methodologies. The results of this work are distributed as open-source software. For instance, the measurement infrastructure, including the improvements introduced in this thesis, is available for download and is used in computing centers worldwide. Furthermore, research projects already employ the outcomes of this work. multi-layer I/O, performance analysis Mehrschichtige E/A, Leistungsanalyse info:eu-repo/classification/ddc/004 ddc:004
3	Performance of Disk I/O operations during the Live Migration of a Virtual Machine over WAN Vemulapalli, Revanth, Mada, Ravi Kumar January 2014 Virtualization is a technique that allows several virtual machines (VMs) to run on a single physical machine (PM) by adding a virtualization layer above the physical host's hardware. Many virtualization products allow a VM to be migrated from one PM to another without interrupting the services running on the VM. This is called live migration and offers many potential advantages, such as server consolidation, reduced energy consumption, disaster recovery, reliability, and efficient workflows such as "Follow-the-Sun". At present, the advantages of VM live migration are limited to Local Area Networks (LANs), as migrations over Wide Area Networks (WANs) offer lower performance due to IP address changes in the migrating VMs and also due to large network latency. For scenarios which require migrations, shared storage solutions like iSCSI (block storage) and NFS (file storage) are used to store the VM's disk to avoid the high latencies associated with disk state migration when private storage is used. When using iSCSI or NFS, all the disk I/O operations generated by the VM are encapsulated and carried to the shared storage over the IP network. The underlying WAN latency will affect the performance of applications requesting disk I/O from the VM. In this thesis, our objective was to determine the performance of shared and private storage when VMs are live migrated in networks with high latency, with WANs as the typical case. To achieve this objective, we used Iometer, a disk benchmarking tool, to investigate the I/O performance of iSCSI and NFS when used as shared storage for live migrating Xen VMs over emulated WANs. In addition, we configured the Distributed Replicated Block Device (DRBD) system to provide private storage for our VMs through incremental disk replication.
We then studied the I/O performance of the private storage solution in the context of live disk migration and compared it to the performance of shared storage based on iSCSI and NFS. The results from our testbed indicate that the DRBD-based solution should be preferred over the considered shared storage solutions because DRBD consumed less network bandwidth and had a lower maximum I/O response time. Virtualization XEN DRBD iSCSI NFS Iometer Distributed Replicated Block Device file storage block storage virtual machine VM I/O performance Computer Sciences Datavetenskap (datalogi) Telecommunications Telekommunikation
