1

An Analysis of Conventional & Heterogeneous Workloads on Production Supercomputing Resources

Berkhahn, Jonathan Allen, 06 June 2013
Cloud computing setups are a huge investment of resources and personnel to maintain. As the workload on a system is a major contributing factor to both the performance of the system and a representation of the needs of system users, a clear understanding of the workload is critical to organizations that support supercomputing systems. In this paper, we analyze traces from two production-level supercomputers to infer the characteristics of their workloads, and make observations as to the needs of supercomputer users based on them. We particularly focus on the usage of graphical processing units by domain scientists. Based on this analysis, we generate a synthetic workload that can be used for testing future systems, and make observations as to efficient resource provisioning. / Master of Science
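
A synthetic workload of the kind described above is typically produced by fitting statistical models to the observed traces and then sampling jobs from them. The sketch below is purely illustrative and is not the author's generator: the exponential inter-arrival model, lognormal runtimes, geometric node counts, GPU-request probability, and all parameter values are assumptions chosen only to show the sampling pattern.

    #include <random>
    #include <vector>
    #include <cstdio>

    struct Job { double arrival; double runtime; int nodes; bool uses_gpu; };

    int main() {
        std::mt19937 rng(42);
        // Assumed models and parameters, for illustration only.
        std::exponential_distribution<double> interarrival(1.0 / 300.0); // mean 300 s between jobs
        std::lognormal_distribution<double>   runtime(7.0, 1.5);         // job runtime in seconds
        std::geometric_distribution<int>      nodes(0.3);                // extra nodes beyond the first
        std::bernoulli_distribution           gpu(0.2);                  // probability a job requests GPUs

        std::vector<Job> workload;
        double t = 0.0;
        for (int i = 0; i < 1000; ++i) {
            t += interarrival(rng);
            workload.push_back({t, runtime(rng), 1 + nodes(rng), gpu(rng)});
        }
        for (const Job& j : workload)
            std::printf("%.1f %.1f %d %d\n", j.arrival, j.runtime, j.nodes, (int)j.uses_gpu);
    }
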
2

Reactive scheduling of DAG applications on heterogeneous and dynamic distributed computing systems

Hernandez, Jesus Israel, January 2008
Emerging technologies enable a set of distributed resources across a network to be linked together and used in a coordinated fashion to solve a particular parallel application at the same time. Such applications are often abstracted as directed acyclic graphs (DAGs), in which vertices represent application tasks and edges represent data dependencies between tasks. Effective scheduling mechanisms for DAG applications are essential to exploit the tremendous potential of computational resources. The core issues are that the availability and performance of resources, which are already by their nature heterogeneous, can be expected to vary dynamically, even during the course of an execution. In this thesis, we first consider the problem of scheduling DAG task graphs onto heterogeneous resources with changeable capabilities. We propose a list-scheduling heuristic approach, the Global Task Positioning (GTP) scheduling method, which addresses the problem by allowing rescheduling and migration of tasks in response to significant variations in resource characteristics. We observed from experiments with GTP that in an execution with relatively frequent migration, it may be that, over time, the results of some task have been copied to several other sites, and so a subsequent migrated task may have several possible sources for each of its inputs. Some of these copies may now be more quickly accessible than the original, due to dynamic variations in communication capabilities. To exploit this observation, we extended our model with a Copying Management (CM) function, resulting in a new version, the Global Task Positioning with copying facilities (GTP/c) system. The idea is to reuse such copies, in subsequent migration of placed tasks, in order to reduce the impact of migration cost on makespan. Finally, we believe that fault tolerance is an important issue in heterogeneous and dynamic computational environments as the availability of resources cannot be guaranteed. To address the problem of processor failure, we propose a rewinding mechanism which rewinds the progress of the application to a previous state, thereby preserving the execution in spite of the failed processor(s). We evaluate our mechanisms through simulation, since this allows us to generate repeatable patterns of resource performance variation. We use a standard benchmark set of DAGs, comparing performance against that of competing algorithms from the scheduling literature.
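
For readers unfamiliar with the setting, the sketch below shows a DAG task representation and one pass of a generic list-scheduling heuristic: tasks are visited in topological order and each is placed on the processor giving the earliest finish time. It illustrates the general class of heuristics discussed, not the GTP algorithm itself (which additionally reschedules and migrates tasks as resource performance changes); the task and processor structures are assumptions, and communication costs are ignored for brevity.

    #include <vector>
    #include <algorithm>
    #include <cstdio>

    struct Task { double work; std::vector<int> deps; };

    int main() {
        // Small DAG: task 2 depends on 0 and 1, task 3 depends on 2 (indices already topologically sorted).
        std::vector<Task> dag = { {10, {}}, {20, {}}, {15, {0, 1}}, {5, {2}} };
        std::vector<double> proc_speed = {1.0, 0.5};           // heterogeneous processor speeds
        std::vector<double> proc_free(proc_speed.size(), 0.0); // time at which each processor becomes free
        std::vector<double> finish(dag.size(), 0.0);

        for (std::size_t t = 0; t < dag.size(); ++t) {
            double ready = 0.0;                                // when all of this task's inputs are available
            for (int d : dag[t].deps) ready = std::max(ready, finish[d]);

            // Pick the processor that yields the earliest finish time for this task.
            std::size_t best = 0; double best_finish = 1e300;
            for (std::size_t p = 0; p < proc_speed.size(); ++p) {
                double start = std::max(ready, proc_free[p]);
                double f = start + dag[t].work / proc_speed[p];
                if (f < best_finish) { best_finish = f; best = p; }
            }
            proc_free[best] = best_finish;
            finish[t] = best_finish;
            std::printf("task %zu -> processor %zu, finishes at %.1f\n", t, best, best_finish);
        }
    }
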
3

Scalability Analysis of Synchronous Data-Parallel Artificial Neural Network (ANN) Learners

Sun, Chang, 14 September 2018
Artificial Neural Networks (ANNs) have been established as one of the most important algorithmic tools in the Machine Learning (ML) toolbox over the past few decades. ANNs' recent rise to widespread acceptance can be attributed to two developments: (1) the availability of large-scale training and testing datasets; and (2) the availability of new computer architectures for which ANN implementations are orders of magnitude more efficient. In this thesis, I present research on two aspects of the second development. First, I present a portable, open source implementation of ANNs in OpenCL and MPI. Second, I present performance and scaling models for ANN algorithms on state-of-the-art Graphics Processing Unit (GPU) based parallel compute clusters. / Master of Science
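
The synchronous data-parallel pattern named in the title is commonly expressed with MPI: each rank computes gradients on its shard of the data, the gradients are summed with an all-reduce, and every rank applies the same update so the model replicas stay identical. The fragment below is a minimal sketch of that general pattern, not the thesis implementation; the model size, learning rate, and dummy gradient computation are placeholders.

    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n_params = 1024;                 // placeholder model size
        std::vector<float> weights(n_params, 0.0f);
        std::vector<float> grad(n_params), grad_sum(n_params);
        const float lr = 0.01f;                    // placeholder learning rate

        for (int step = 0; step < 100; ++step) {
            // Each rank would compute gradients on its own shard of the training data;
            // a dummy value stands in for the forward/backward pass here.
            for (int i = 0; i < n_params; ++i) grad[i] = 0.001f * (rank + 1);

            // Synchronous step: sum gradients across all ranks.
            MPI_Allreduce(grad.data(), grad_sum.data(), n_params,
                          MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

            // Every rank applies the identical averaged update, keeping replicas in sync.
            for (int i = 0; i < n_params; ++i)
                weights[i] -= lr * grad_sum[i] / size;
        }
        MPI_Finalize();
    }
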
4

High-performance algorithms and software for large-scale molecular simulation

Liu, Xing, 08 June 2015
Molecular simulation is an indispensable tool in many different disciplines such as physics, biology, chemical engineering, materials science, drug design, and others. Performing large-scale molecular simulation is of great interest to biologists and chemists, because many important biological and pharmaceutical phenomena can only be observed in very large molecule systems and after sufficiently long time dynamics. On the other hand, molecular simulation methods usually have very steep computational costs, which limits current molecular simulation studies to relatively small systems. The gap between the scale of molecular simulation that existing techniques can handle and the scale of interest has become a major barrier for applying molecular simulation to real-world problems. Studying large-scale molecular systems with molecular simulation requires developing highly parallel simulation algorithms and constantly adapting them to rapidly changing high-performance computing architectures. However, many existing algorithms and codes for molecular simulation date from more than a decade ago and were designed for sequential computers or early parallel architectures; they may not scale efficiently and do not fully exploit features of today's hardware. Given the rapid evolution in computer architectures, the time has come to revisit these molecular simulation algorithms and codes. In this thesis, we demonstrate our approach to addressing the computational challenges of large-scale molecular simulation by presenting both high-performance algorithms and software for two important molecular simulation applications, Hartree-Fock (HF) calculations and hydrodynamics simulations, on highly parallel computer architectures. The algorithms and software presented in this thesis have been used by biologists and chemists to study problems that they were unable to solve using existing codes. The parallel techniques and methods developed in this work can also be applied to other molecular simulation applications.
5

Linear Programming Based Resource Management for Heterogeneous Computing Systems

Al-Azzoni, Issam, 05 1900
An emerging trend in computing is to use distributed heterogeneous computing (HC) systems to execute a set of tasks. Cluster computer systems, grids, and Desktop Grids are three popular kinds of HC systems. An important component of an HC system is its resource management system (RMS). The main responsibility of an RMS is assigning resources to tasks in order to satisfy certain performance requirements. For cluster computer systems, we propose a new mapping heuristic which requires less state information than current heuristics. For Desktop Grids, we propose a new scheduling policy that exploits knowledge of the effective computing power delivered by the machines and the distribution of their fault times in order to improve performance. Finally, for grids, we propose a new decentralized load balancing policy which dramatically cuts down the communication overhead incurred in state information update. The proposed resource management policies utilize the solution to a linear programming problem (LP) which maximizes the system capacity. Our simulation experiments show that these policies perform very competitively, especially in highly heterogeneous systems. / Thesis / Doctor of Philosophy (PhD)
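
As a rough illustration of the kind of linear program such policies solve, one common "maximize system capacity" formulation chooses allocation fractions of each machine devoted to each task class so as to maximize a sustainable arrival rate. The formulation below uses assumed notation and is a generic example, not necessarily the exact LP used in the thesis.

    \begin{aligned}
    \max\quad        & \lambda \\
    \text{s.t.}\quad & \sum_{j} \mu_{ij}\, x_{ij} \;\ge\; \lambda\, \alpha_i \quad \text{for each task class } i \\
                     & \sum_{i} x_{ij} \;\le\; 1 \quad \text{for each machine } j \\
                     & x_{ij} \;\ge\; 0
    \end{aligned}

Here x_ij is the fraction of machine j's capacity devoted to task class i, mu_ij is the rate at which machine j completes class-i tasks, and alpha_i is the fraction of arriving tasks belonging to class i; the maximized lambda is the arrival rate the system can sustain, i.e. its capacity.
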
6

HPSM: uma API em linguagem C++ para programas com laços paralelos com suporte a multi-CPUs e multi-GPUs / HPSM: a C++ API for parallel loop programs supporting multi-CPUs and multi-GPUs

Di Domenico, Daniel, 21 December 2016
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Parallel architectures have been ubiquitous for some time now. However, the same cannot be said of parallel programs, which are considerably more complex to write than ordinary programs. This is aggravated when the programming also involves accelerators, such as GPUs, which demand tools with very specific resources. In this setting, although there are programming models that ease the coding of parallel applications that exploit accelerators, we are not aware of APIs that allow writing programs with parallel loops that can be processed simultaneously by multiple CPUs and multiple GPUs. This work presents a high-level C++ API called HPSM, which aims to make the coding of parallel programs targeting multi-CPU and multi-GPU architectures easier and more efficient, the goal being to improve performance by combining both kinds of resources. HPSM is based on parallel loops and reductions implemented by three parallel back-ends: Serial, OpenMP, and StarPU. Our hypothesis is that scientific applications can exploit heterogeneous multi-CPU and multi-GPU processing to achieve better performance than using accelerators alone. Comparisons with other parallel programming interfaces showed that HPSM can reduce the size of a multi-CPU and multi-GPU program by more than 50%. The new API can affect program performance: experiments showed an overhead that varies by application, reaching at most 16.4%. The experimental results confirmed the hypothesis, since the N-Body, Hotspot, and CFD applications achieved gains using only CPUs and only GPUs, and also surpassed the performance obtained with accelerators (GPUs) alone by combining multi-CPU and multi-GPU.
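
HPSM's own interface is not reproduced here; as a point of reference, the OpenMP back-end it builds on expresses the same parallel-loop-plus-reduction pattern as in the sketch below. The example is illustrative only (array sizes and contents assumed); the point of an API like HPSM is that the same loop can also be dispatched to GPUs through the StarPU back-end without rewriting it.

    #include <omp.h>
    #include <vector>
    #include <cstdio>

    int main() {
        const std::size_t n = 1 << 20;
        std::vector<double> a(n, 1.0), b(n, 2.0);
        double dot = 0.0;

        // Parallel loop with a reduction: the basic pattern the HPSM API abstracts.
        #pragma omp parallel for reduction(+:dot)
        for (long i = 0; i < static_cast<long>(n); ++i)
            dot += a[i] * b[i];

        std::printf("dot = %f\n", dot);
    }
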
