Global ETD Search

281	An Adaptive Framework for Managing Heterogeneous Many-Core Clusters Rafique, Muhammad Mustafa 21 October 2011 (has links) The computing needs and the input and result datasets of modern scientific and enterprise applications are growing exponentially. To support such applications, High-Performance Computing (HPC) systems need to employ thousands of cores and innovative data management. At the same time, an emerging trend in designing HPC systems is to leverage specialized asymmetric multicores, such as IBM Cell and AMD Fusion APUs, and commodity computational accelerators, such as programmable GPUs, which exhibit excellent price to performance ratio as well as the much needed high energy efficiency. While such accelerators have been studied in detail as stand-alone computational engines, integrating the accelerators into large-scale distributed systems with heterogeneous computing resources for data-intensive computing presents unique challenges and trade-offs. Traditional programming and resource management techniques cannot be directly applied to many-core accelerators in heterogeneous distributed settings, given the complex and custom instruction sets architectures, memory hierarchies and I/O characteristics of different accelerators. In this dissertation, we explore the design space of using commodity accelerators, specifically IBM Cell and programmable GPUs, in distributed settings for data-intensive computing and propose an adaptive framework for programming and managing heterogeneous clusters. The proposed framework provides a MapReduce-based extended programming model for heterogeneous clusters, which distributes tasks between asymmetric compute nodes by considering workload characteristics and capabilities of individual compute nodes. The framework provides efficient data prefetching techniques that leverage general-purpose cores to stage the input data in the private memories of the specialized cores. We also explore the use of an advanced layered-architecture based software engineering approach and provide mixin-layers based reusable software components to enable easy and quick deployment of heterogeneous clusters. The framework also provides multiple resource management and scheduling policies under different constraints, e.g., energy-aware and QoS-aware, to support executing concurrent applications on multi-tenant heterogeneous clusters. When applied to representative applications and benchmarks, our framework yields significantly improved performance in terms of programming efficiency and optimal resource management as compared to conventional, hand-tuned, approaches to program and manage accelerator-based heterogeneous clusters. / Ph. D. Heterogeneous Computing High-Performance Computing Resource Sharing Resource Management and Scheduling Programming Models
282	Prediction Models for Multi-dimensional Power-Performance Optimization on Many Cores Shah, Ankur Savailal 28 May 2008 (has links) Power has become a primary concern for HPC systems. Dynamic voltage and frequency scaling (DVFS) and dynamic concurrency throttling (DCT) are two software tools (or knobs) for reducing the dynamic power consumption of HPC systems. To date, few works have considered the synergistic integration of DVFS and DCT in performance-constrained systems, and, to the best of our knowledge, no prior research has developed application-aware simultaneous DVFS and DCT controllers in real systems and parallel programming frameworks. We present a multi-dimensional, online performance prediction framework, which we deploy to address the problem of simultaneous runtime optimization of DVFS, DCT, and thread placement on multi-core systems. We present results from an implementation of the prediction framework in a runtime system linked to the Intel OpenMP runtime environment and running on a real dual-processor quad-core system as well as a dual-processor dual-core system. We show that the prediction framework derives near-optimal settings of the three power-aware program adaptation knobs that we consider. Our overall runtime optimization framework achieves significant reductions in energy (12.27% mean) and ED² (29.6% mean), through simultaneous power savings (3.9% mean) and performance improvements (10.3% mean). Our prediction and adaptation framework outperforms earlier solutions that adapt only DVFS or DCT, as well as one that sequentially applies DCT then DVFS. Further, our results indicate that prediction-based schemes for runtime adaptation compare favorably and typically improve upon heuristic search-based approaches in both performance and energy savings. / Master of Science concurrency throttling power-aware computing runtime adaptation performance prediction high-performance computing Multicore processors
283	Power Saving Analysis and Experiments for Large Scale Global Optimization Cao, Zhenwei 03 August 2009 (has links) Green computing, an emerging field of research that seeks to reduce excess power consumption in high performance computing (HPC), is gaining popularity among researchers. Research in this field often relies on simulation or only uses a small cluster, typically 8 or 16 nodes, because of the lack of hardware support. In contrast, System G at Virginia Tech is a 2592 processor supercomputer equipped with power aware components suitable for large scale green computing research. DIRECT is a deterministic global optimization algorithm, implemented in the mathematical software package VTDIRECT95. This thesis explores the potential energy savings for the parallel implementation of DIRECT, called pVTdirect, when used with a large scale computational biology application, parameter estimation for a budding yeast cell cycle model, on System G. Two power aware approaches for pVTdirect are developed and compared against the CPUSPEED power saving system tool. The results show that knowledge of the parallel workload of the underlying application is beneficial for power management. / Master of Science VTDIRECT95 power aware computing high performance computing DVFS large scale global optimization budding yeast problem
284	Enabling the use of Heterogeneous Computing for Bioinformatics Bijanapalli Chakri, Ramakrishna 02 October 2013 (has links) The huge amount of information in the encoded sequence of DNA and increasing interest in uncovering new discoveries has spurred interest in accelerating the DNA sequencing and alignment processes. The use of heterogeneous systems, that use different types of computational units, has seen a new light in high performance computing in recent years; However expertise in multiple domains and skills required to program these systems is causing an hindrance to bioinformaticians in rapidly deploying their applications into these heterogeneous systems. This work attempts to make an heterogeneous system, Convey HC-1, with an x86-based host processor and FPGA-based co-processor, accessible to bioinformaticians. First, a highly efficient dynamic programming based Smith-Waterman kernel is implemented in hardware, which is able to achieve a peak throughput of 307.2 Giga Cell Updates per Second (GCUPS) on Convey HC-1. A dynamic programming accelerator interface is provided to any application that uses Smith-Waterman. This implementation is also extended to General Purpose Graphics Processing Units (GP-GPUs), which achieved a peak throughput of 9.89 GCUPS on NVIDIA GTX580 GPU. Second, a well known graphical programming tool, LabVIEW is enabled as a programming tool for the Convey HC-1. A connection is established between the graphical interface and the Convey HC-1 to control and monitor the application running on the FPGA-based co-processor. / Master of Science Field programmable gate arrays Hardware Acceleration High Performance Computing DNA Alignment LabVIEW Heterogeneous Computing GP-GPUs
285	Scalable Data Management for Object-based Storage Systems Wadhwa, Bharti 19 August 2020 (has links) Parallel I/O performance is crucial to sustain scientific applications on large-scale High-Performance Computing (HPC) systems. Large scale distributed storage systems, in particular the object-based storage systems, face severe challenges for managing the data efficiently. Inefficient data management leads to poor I/O and storage performance in HPC applications and scientific workflows. Some of the main challenges for efficient data management arise from poor resource allocation, load imbalance in object storage targets, and inflexible data sharing between applications in a workflow. In addition, parallel I/O makes it challenging to shoehorn new interfaces, such as taking advantage of multiple layers of storage and support for analysis in the data path. Solving these challenges to improve performance and efficiency of object-based storage systems is crucial, especially for upcoming era of exascale systems. This dissertation is focused on solving these major challenges in object-based storage systems by providing scalable data management strategies. In the first part of the dis-sertation (Chapter 3), we present a resource contention aware load balancing tool (iez) for large scale distributed object-based storage systems. In Chapter 4, we extend iez to support Progressive File Layout for object-based storage system: Lustre. In the second part (Chapter 5), we present a technique to facilitate data sharing in scientific workflows using object-based storage, with our proposed tool Workflow Data Communicator. In the last part of this dissertation, we present a solution for transparent data management in multi-layer storage hierarchy of present and next-generation HPC systems.This dissertation shows that by intelligently employing scalable data management techniques, scientific applications' and workflows' flexibility and performance in object-based storage systems can be enhanced manyfold. Our proposed data management strategies can guide next-generation HPC storage systems' software design to efficiently support data for scientific applications and workflows. / Doctor of Philosophy / Large scale object-based storage systems face severe challenges to manage the data efficiently for HPC applications and workflows. These storage systems often manage and share data inflexibly, without considering the load imbalance and resource contention in the underlying multi-layer storage hierarchy. This dissertation first studies how resource contention and inflexible data sharing mechanisms impact HPC applications' storage and I/O performance; and then presents a series of efficient techniques, tools and algorithms to provide efficient and scalable data management for current and next-generation HPC storage systems Lustre Ceph High Performance Computing Parallel File Systems ParallelI/O Optimization Load Imbalance Resource Contention
286	On the Use of Containers in High Performance Computing Abraham, Subil 09 July 2020 (has links) The lightweight, portable, and flexible nature of containers is driving their widespread adoption in cloud solutions. Data analysis and deep learning applications have especially benefited from containerized solutions. As such data analysis is also being utilized in the high performance computing (HPC) domain, the need for container support in HPC has become paramount. However, container adoption in HPC face crucial performance and I/O challenges. One obstacle is that while there have been container solutions for HPC, such solutions have not been thoroughly investigated, especially from the aspect of their impact on the crucial I/O throughput needs of HPC. To this end, this paper provides a first-of-its-kind empirical analysis of state-of-the-art representative container solutions (Docker, Podman, Singularity, and Charliecloud) in HPC environments, especially how containers interact with the HPC storage systems. We present the design of an analysis framework that is deployed on all nodes in an HPC environment, and captures aspects such as CPU, memory, network, and file I/O statistics from the nodes and the storage system. We are able to garner key insights from our analysis, e.g., Charliecloud outperforms other container solutions in terms of container start-up time, while Singularity and Charliecloud are equivalent in I/O throughput. But this comes at a cost, as Charliecloud invokes the most metadata and I/O operations on the underlying Lustre file system. By identifying such optimization opportunities, we can enhance performance of containers atop HPC and help the aforementioned applications. / Master of Science / Containers are a technology that allow for applications to be packaged along with its ideal environment, all the way down to its preferred operating system. This allows an application to run anywhere that can support containers without a huge hit to the application performance. Hence containers have seen wide adoption for use in the cloud. These qualities have also made it very appealing for use in the world of scientific research in national labs. Modern research heavily relies on the power of computing in order to model, simulate, and test the behavior of real world entities, often making use of large amounts of data and utilizing machine learning and deep learning. Doing this often requires the high performance computing power found in supercomputers. In most cases, scientists just want to be able to write their code and expect it to just work. Their applications might depend on other source code that form part of their standard toolkit and would expect to also be installed in the supercomputing environment. This may not always be the case, taking the scientist's focus away from their work in order ensure their requirements are set up in the supercomputing environment which might require extensive cooperation with the operations team responsible for the supercomputers. Containers easily solve this problem because it can package everything together. However, the use of containers in these environments have not been extensively tested, especially for applications that are very heavy on the analysis of large quantities of data. To fill this gap, this work analyzes the performance of several state-of-the-art container technologies (Docker, Podman, Singularity, Charliecloud), with a particular focus on its interaction with the Lustre data storage systems widely used in supercomputing environments. As part of this work, we design an analysis setup that captures the behavior of various aspects of the high performance computing environment like CPU, memory, network usage and data movement while using containers to run data heavy applications. We garner important insights about their performance that can help inform the best choice of container technology given an environment and the kind of application that needs to be run. Container Performance High Performance Computing Parallel File Systems HPC Storage and I/O
287	Interpolants, Error Bounds, and Mathematical Software for Modeling and Predicting Variability in Computer Systems Lux, Thomas Christian Hansen 23 September 2020 (has links) Function approximation is an important problem. This work presents applications of interpolants to modeling random variables. Specifically, this work studies the prediction of distributions of random variables applied to computer system throughput variability. Existing approximation methods including multivariate adaptive regression splines, support vector regressors, multilayer perceptrons, Shepard variants, and the Delaunay mesh are investigated in the context of computer variability modeling. New methods of approximation using Box splines, Voronoi cells, and Delaunay for interpolating distributions of data with moderately high dimension are presented and compared with existing approaches. Novel theoretical error bounds are constructed for piecewise linear interpolants over functions with a Lipschitz continuous gradient. Finally, a mathematical software that constructs monotone quintic spline interpolants for distribution approximation from data samples is proposed. / Doctor of Philosophy / It is common for scientists to collect data on something they are studying. Often scientists want to create a (predictive) model of that phenomenon based on the data, but the choice of how to model the data is a difficult one to answer. This work proposes methods for modeling data that operate under very few assumptions that are broadly applicable across science. Finally, a software package is proposed that would allow scientists to better understand the true distribution of their data given relatively few observations. Approximation Theory Numerical Analysis High Performance Computing Computer Security Nonparametric Statistics Mathematical Software
288	Energy and Performance Models Enabling Design Space Exploration using Domain Specific Languages Umar, Mariam 25 May 2018 (has links) With the advent of exascale architectures maximizing performance while maintaining energy consumption within reasonable limits has become one of the most critical design constraints. This constraint is particularly significant in light of the power budget of 20 MWatts set by the U.S. Department of Energy for exascale supercomputing facilities. Therefore, understanding an application's characteristics, execution pattern, energy footprint, and the interactions of such aspects is critical to improving the application's performance as well as its utilization of the underlying resources. With conventional methods of analyzing performance and energy consumption trends scientists are forced to limit themselves to a manageable number of design parameters. While these modeling techniques have catered to the needs of current high-performance computing systems, the complexity and scale of exascale systems demands that large-scale design-space-exploration techniques are developed to enable comprehensive analysis and evaluations. In this dissertation we present research on performance and energy modeling of current high performance computing and future exascale systems. Our thesis is focused on the design space exploration of current and future architectures, in terms of their reconfigurability, application's sensitivity to hardware characteristics (e.g., system clock, memory bandwidth), application's execution patterns, application's communication behavior, and utilization of resources. Our research is aimed at understanding the methods by which we may maximize performance of exascale systems, minimize energy consumption, and understand the trade offs between the two. We use analytical, statistical, and machine-learning approaches to develop accurate, portable and scalable performance and energy models. We develop application and machine abstractions using Aspen (a domain specific language) to implement and evaluate our modeling techniques. As part of our research we develop and evaluate system-level performance and energy-consumption models that form part of an automated modeling framework, which analyzes application signatures to evaluate sensitivity of reconfigurable hardware components for candidate exascale proxy applications. We also develop statistical and machine-learning based models of the application's execution patterns on heterogeneous platforms. We also propose a communication and computation modeling and mapping framework for exascale proxy architectures and evaluate the framework for an exascale proxy application. These models serve as external and internal extensions to Aspen, which enable proxy exascale architecture implementations and thus facilitate design space exploration of exascale systems. / Ph. D. / Performance monitoring and modeling has been an extensively researched topic over the last decade. The traditional approaches of manually modeling performance and energy worked well for previous generation computers. With the prevalence of complex high-performance computers, clusters and the anticipation of future exascale architectures, the conventional modeling approaches will not be sufficient. A number of reasons limit the conventional modeling approaches, e.g, complexity of current and future architectures, increase in number of performance parameters to monitor, diversity in the architecture etc. This issue will worsen with the advent of exascale architectures that encompasses complex micro-architectures along with the increases in scale that have never been encountered in the computing industry before. In this dissertation, we focus on two primary aspects of performance and energy modeling in the context of current high performance computing and future exascale architectures. We focus on adapting conventional modeling approaches to comprise the properties of accuracy, scalability, portability and independence of architectures. Centered around performance and energy improvements, we also develop design space exploration techniques that study the effects of application performance improvement in terms of reconfigurable hardware. We also quantitatively measure the effects of application performance sensitivity with changing hardware configurations – using analytical and machine learning modeling techniques. We explore theoretical exascale architecture, and validate it for performance limits. We develop a communication and computation model for the proxy exascale architecture and test it for strong and weak scaling for co-design for molecular dynamics. Energy Modeling Performance Modeling Aspen Domain Specific Language Analytical Modeling High-Performance Computing Exascale Computing
289	Towards a Polyalgorithm for Land Use and Land Cover Change Detection Saxena, Rishu 23 February 2018 (has links) Earth observation satellites (EOS) such as Landsat provide image datasets that can be immensely useful in numerous application domains. One way of analyzing satellite images for land use and land cover change (LULCC) is time series analysis (TSA). Several algorithms for time series analysis have been proposed by various groups in remote sensing; more algorithms (that can be adapted) are available in the general time series literature. However, in spite of an abundance of algorithms, the choice of algorithm to be used for analyzing an image stack is presently an open question. A concurrent issue is the prohibitive size of Landsat datasets, currently of the order of petabytes and growing. This makes them computationally unwieldy --- both in storage and processing. An EOS image stack typically consists of multiple images of a fixed area on the Earth's surface (same latitudes and longitudes) taken at different time points. Experiments on multicore servers indicate that carrying out meaningful time series analysis on one such interannual, multitemporal stack with existing state of the art codes can take several days. This work proposes using multiple algorithms to analyze a given image stack in a polyalgorithmic framework. A polyalgorithm combines several basic algorithms, each meant to solve the same problem, producing a strategy that unites the strengths and circumvents the weaknesses of constituent algorithms. The foundation of the proposed TSA based polyalgorithm is laid using three algorithms (LandTrendR, EWMACD, and BFAST). These algorithms are precisely described mathematically, and chosen to be fundamentally distinct from each other in design and in the phenomena they capture. Analysis of results representing success, failure, and parameter sensitivity for each algorithm is presented. Scalability issues, important for real simulations, are also discussed, along with scalable implementations, and speedup results. For a given pixel, Hausdorff distance is used to compare the distance between the change times (breakpoints) obtained from two different algorithms. Timesync validation data, a dataset that is based on human interpretation of Landsat time series in concert with historical aerial photography, is used for validation. The polyalgorithm yields more accurate results than EWMACD and LandTrendR alone, but counterintuitively not better than BFAST alone. This nascent work will be directly useful in land use and land cover change studies, of interest to terrestrial science research, especially regarding anthropogenic impacts on the environment, and in much broader applications such as health monitoring and urban transportation. / M. S. / Numerous manmade satellites circling around the Earth regularly take pictures (images) of the Earth’s surface from up above. These images naturally provide information regarding the land cover of any given piece of land at the moment of capture (for e.g., whether the land area in the picture is covered with forests or with agriculture or housing). Therefore, for a fixed land area, if a person looks at a chronologically arranged series of images, any significant changes in land use can be identified. Identifying such changes is of critical importance, especially in this era where deforestation, urbanization, and global warming are major concerns. The goal of this thesis is to investigate the design of methodologies (algorithms) that can efficiently and accurately use satellite images for answering questions regarding land cover trend and change. Experience shows that the state-of-the-art methodologies produce great results for the region they were originally designed on but their performance on other regions is unpredictable. In this work, therefore, a ‘polyalgorithm’ is proposed. A ‘polyalgorithm’ utilizes multiple simple methodologies and strategically combines them so that the outcome is better than the individual components. In this introductory work, three component methodologies are utilized; each component methodology is capable of capturing phenomenon different from the other two. Mathematical formulation of each component methodology is presented. Initial strategy for combining the three component algorithms is proposed. The outcomes of each component methodology as well the polyalgorithm are tested on human interpreted data. The strengths and limitations of each methodology are also discussed. Efficiency of the codes used for implementing the polyalgorithm is also discussed; this is important because the satellite data that needs to be processed is known to be huge (petabytes sized already and growing). This nascent work will be directly useful especially in understanding the impact of human activities on the environment. It will also be useful in other applications such as health monitoring and urban transportation. Remote sensing time series analysis event detection change detection big data scalability high performance computing.
290	Objective-Driven Strategies for HPC Job Scheduling Goponenko, Alexander V 01 January 2024 (has links) (PDF) As High-Performance Computing (HPC) becomes increasingly prevalent and resource-intensive, there is a growing need for the development of more efficient job schedulers, which play a crucial role in the performance of HPC clusters. This dissertation manifests a comprehensive approach to this complex issue, contributing to three major components of the problem: (1) metrics of job packing efficiency and fairness, (2) advanced scheduling algorithms, and (3) job resource utilization prediction techniques. To ensure high relevance of the results, this study emphasizes scheduling objectives. Therefore, scheduling quality metrics are investigated first, yielding a set of metrics that allow comparing alternative schedules and evaluating scheduling goals trade-offs. The set of metrics enables the first comprehensive analysis of effects of different scheduling improvement approaches on several aspects of scheduling quality, covering a variety of list scheduling algorithms as well as constraint programming optimization schedulers. The contribution to the third research area covers techniques to measure and estimate resource usage data. It reports a first-of-a-kind evaluation of various job runtime prediction techniques in improving scheduling quality, demonstrates an approach capable of estimating job parameters beyond the runtime, and explores measuring resources consumed by a job in an HPC cluster. The dissertation concludes with a practical demonstration of these concepts through an I/O-aware scheduling prototype that measures real-time resource utilization, autonomously determines job resource requirements the scheduler needs, and implements full-featured multi-resource backfill scheduling that accounts for the specific properties of the parallel file system bandwidth resource. The study exhibits the advantages of further reducing I/O congestion—beyond the capability of generic I/O-aware scheduling—and presents the Workload-adaptive scheduling strategy that attains such improvement. This approach features a “two-group” approximation technique to maintain efficient performance regardless of zero-throughput job availability. An evaluation conducted on a real HPC cluster demonstrates the effectiveness of the novel strategy. high-performance computing parallel job scheduling schedule quality constraint programming I/O-aware scheduling Slurm

Search results