1 |
Energy-efficient mechanisms for managing on-chip storage in throughput processors. Gebhart, Mark Alan, 05 July 2012.
Modern computer systems are power or energy limited. While the number of transistors per chip continues to increase, classic Dennard voltage scaling has come to an end. Therefore, architects must improve a design's energy efficiency to continue to increase performance at historical rates, while staying within a system's power limit. Throughput processors, which use a large number of threads to tolerate memory latency, have emerged as an energy-efficient platform for achieving high performance on diverse workloads and are found in systems ranging from cell phones to supercomputers. This work focuses on graphics processing units (GPUs), which contain thousands of threads per chip.
In this dissertation, I redesign the on-chip storage system of a modern GPU to improve energy efficiency. Modern GPUs contain very large register files that consume between 15% and 20% of the processor's dynamic energy. Most values written into the register file are only read a single time, often within a few instructions of being produced. To optimize for these patterns, we explore various designs for register file hierarchies. We study both a hardware-managed register file cache and a software-managed operand register file. We evaluate the energy tradeoffs in varying the number of levels and the capacity of each level in the hierarchy. Our most efficient design reduces register file energy by 54%.
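As a rough illustration of the access pattern that motivates such a hierarchy, the sketch below replays a tiny register trace against a small LRU cache; the trace, register names, and cache size are invented for illustration and this is not the dissertation's simulator.

```python
# Toy model of the access pattern behind a register file cache: most register
# values are read shortly after being written, so a small structure near the
# execution units can capture the bulk of register reads. Trace format,
# register names, and cache size are illustrative assumptions.
from collections import OrderedDict

def cache_hit_rate(trace, cache_entries=6):
    """trace: list of (op, register) tuples, where op is 'write' or 'read'."""
    cache = OrderedDict()   # register name -> None, kept in LRU order
    reads = hits = 0
    for op, reg in trace:
        if op == "read":
            reads += 1
            if reg in cache:
                hits += 1
                cache.move_to_end(reg)
            # on a miss, the value would come from the main register file
        else:
            # newly produced values are installed in the cache
            cache[reg] = None
            cache.move_to_end(reg)
            if len(cache) > cache_entries:
                cache.popitem(last=False)   # evict the least recently used entry
    return hits / reads if reads else 0.0

# Values written to r1..r3 are each read within a few instructions.
trace = [("write", "r1"), ("write", "r2"), ("read", "r1"),
         ("read", "r2"), ("write", "r3"), ("read", "r3")]
print(cache_hit_rate(trace))   # 1.0 for this short trace
```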
Beyond the register file, GPUs also contain on-chip scratchpad memories and caches. Traditional systems have a fixed partitioning between these three structures. Applications have diverse requirements, and often a single resource is most critical to performance. We propose to unify the register file, primary data cache, and scratchpad memory into a single structure that is dynamically partitioned on a per-kernel basis to match the application's needs.
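A minimal sketch of the per-kernel decision such a unified structure enables is shown below; the capacity budget and the allocation policy are assumptions chosen for illustration, not the design evaluated in the dissertation.

```python
# Illustrative per-kernel split of a unified on-chip SRAM budget among the
# register file, scratchpad (shared memory), and primary data cache. The
# budget and the policy (satisfy declared register and scratchpad needs,
# give the remainder to the cache) are assumptions for illustration only.
TOTAL_SRAM_KB = 256   # hypothetical per-core capacity

def partition(registers_kb, scratchpad_kb):
    if registers_kb + scratchpad_kb > TOTAL_SRAM_KB:
        raise ValueError("kernel demands exceed the unified SRAM budget")
    return {
        "registers":  registers_kb,
        "scratchpad": scratchpad_kb,
        "cache":      TOTAL_SRAM_KB - registers_kb - scratchpad_kb,
    }

# A scratchpad-heavy kernel and a cache-light kernel get different splits.
print(partition(registers_kb=128, scratchpad_kb=96))   # little left for cache
print(partition(registers_kb=64,  scratchpad_kb=0))    # most capacity to cache
```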
The techniques proposed in this dissertation improve the utilization of on-chip memory, a scarce resource for systems with a large number of hardware threads. Making more efficient use of on-chip memory both improves performance and reduces energy. Future efficient systems will be achieved by combining several such techniques that improve energy efficiency.
|
2 |
A Comprehensive Python Toolkit for Harnessing Cloud-Based High-Throughput Computing to Support Hydrologic Modeling Workflows. Christensen, Scott D., 01 February 2016.
Advances in water resources modeling are improving the information that can be supplied to support decisions that affect the safety and sustainability of society, but these advances result in models being more computationally demanding. To facilitate the use of cost-effective computing resources to meet the increased demand through high-throughput computing (HTC) and cloud computing in modeling workflows and web applications, I developed a comprehensive Python toolkit that provides the following features: (1) programmatic access to diverse, dynamically scalable computing resources; (2) a batch scheduling system to queue and dispatch the jobs to the computing resources; (3) data management for job inputs and outputs; and (4) the ability for jobs to be dynamically created, submitted, and monitored from the scripting environment. To compose this comprehensive computing toolkit, I created two Python libraries (TethysCluster and CondorPy) that leverage two existing software tools (StarCluster and HTCondor). I further facilitated access to HTC in web applications by using these libraries to create powerful and flexible computing tools for Tethys Platform, a development and hosting platform for web-based water resources applications. I tested this toolkit while collaborating with other researchers to perform several modeling applications that required scalable computing. These applications included a parameter sweep with 57,600 realizations of a distributed, hydrologic model; a set of web applications for retrieving and formatting data; a web application for evaluating the hydrologic impact of land-use change; and an operational, national-scale, high-resolution, ensemble streamflow forecasting tool. In each of these applications the toolkit was successful in automating the process of running the large-scale modeling computations in an HTC environment.
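The programmatic job life cycle the toolkit provides (create, submit, monitor) can be pictured with a small sketch; the HTCJob class, method names, executable, and file names below are hypothetical stand-ins for illustration, not the actual CondorPy, TethysCluster, StarCluster, or HTCondor interfaces.

```python
# Hypothetical sketch of the workflow pattern described above: jobs created,
# submitted, and monitored programmatically from a Python script. The class
# and method names are illustrative stand-ins, not a real toolkit's API.
import time

class HTCJob:
    """Minimal stand-in for a scheduler-backed job."""
    def __init__(self, name, executable, arguments, input_files=()):
        self.name = name
        self.executable = executable
        self.arguments = list(arguments)
        self.input_files = list(input_files)   # staged to the execute node
        self.status = "unsubmitted"

    def submit(self):
        # A real toolkit would render a submit description and hand it to the
        # scheduler (e.g. HTCondor); this stub only records the transition.
        self.status = "completed"

    def is_done(self):
        return self.status == "completed"

def run_parameter_sweep(parameter_sets, poll_seconds=0):
    """Create one job per parameter set, submit them all, and poll until done."""
    jobs = [HTCJob(f"run_{i}", "hydro_model", [str(p)], ["basin_input.nc"])
            for i, p in enumerate(parameter_sets)]
    for job in jobs:
        job.submit()
    while not all(job.is_done() for job in jobs):   # monitoring loop
        time.sleep(poll_seconds)
    return jobs

if __name__ == "__main__":
    finished = run_parameter_sweep([0.1, 0.2, 0.3])
    print(f"{len(finished)} jobs finished")
```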
|
3 |
Condor - Job Management System. Grabner, Rene, 27 June 2002.
This talk presents Condor as a job management system for compute clusters. Its operation is demonstrated and explained using an example. Particular attention is given to checkpointing and the migration of processes between nodes.
|
4 |
Sensitivity analysis of biochemical systems using high-throughput computing. Kent, Edward Lander, January 2013.
Mathematical modelling is playing an increasingly important role in helping us to understand biological systems. The construction of biological models typically requires the use of experimentally-measured parameter values. However, varying degrees of uncertainty surround virtually all parameters in these models. Sensitivity analysis is one of the most important tools for the analysis of models, and shows how the outputs of a model, such as concentrations and reaction fluxes, are dependent on the parameters which make up the input. Unfortunately, small changes in parameter values can lead to the results of a sensitivity analysis changing significantly. The results of such analyses must therefore be interpreted with caution, particularly if a high degree of uncertainty surrounds the parameter values. Global sensitivity analysis methods can help in such situations by allowing sensitivities to be calculated over a range of possible parameter values. However, these techniques are computationally expensive, particularly for larger, more detailed models. Software was developed to enable a number of computationally-intensive modelling tasks, including two global sensitivity analysis methods, to be run in parallel in a high-throughput computing environment. The use of high-throughput computing enabled the run time of these analyses to be drastically reduced, allowing models to be analysed to a degree that would otherwise be impractical or impossible. Global sensitivity analysis using high-throughput computing was performed on a selection of both theoretical and physiologically-based models. Varying degrees of parameter uncertainty were considered. These analyses revealed instances in which the results of a sensitivity analysis were valid, even under large degrees of parameter variation. Other cases were found for which only a slight change in parameter values could completely change the results of the analysis. Parameter uncertainties are a real problem in biological systems modelling. This work shows how, with the help of high-throughput computing, global sensitivity analysis can become a practical part of the modelling process.
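As a rough sketch of why these analyses suit high-throughput computing, the example below evaluates a toy two-parameter model at many sampled parameter sets in parallel and reports a crude correlation-based sensitivity indicator; the model, sampling ranges, and indicator are illustrative assumptions rather than the methods or models analysed in this work.

```python
# Simplified illustration of why global sensitivity analysis maps naturally
# onto high-throughput computing: the model is evaluated independently at many
# sampled parameter sets, so the evaluations can be farmed out as separate
# tasks (here, local worker processes stand in for HTC jobs).
from multiprocessing import Pool
import random

def model(params):
    """Stand-in for a biochemical model: returns a single output (a 'flux')."""
    k1, k2 = params
    return k1 / (k1 + k2)

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def global_sensitivity(n_samples=2000, seed=0):
    rng = random.Random(seed)
    samples = [(rng.uniform(0.1, 10.0), rng.uniform(0.1, 10.0))
               for _ in range(n_samples)]
    with Pool() as pool:          # each evaluation is an independent task
        outputs = pool.map(model, samples)
    # Crude indicator: how strongly each parameter correlates with the output.
    return [correlation([s[i] for s in samples], outputs) for i in range(2)]

if __name__ == "__main__":
    print(global_sensitivity())   # k1 correlates positively, k2 negatively
```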
|
5 |
Coordination of large-scale Distributed Computing Infrastructures based on Digital Television broadcast networks. Vieira, Diénert de Alencar, 28 February 2013.
The On-Demand Distributed Computing Infrastructure (OddCI) uses a broadcast communication network to allocate a large-scale set of processors for High Throughput Computing (HTC). An example of a broadcast network is a Digital TV system, whose signal is transmitted to thousands of receivers simultaneously. These receivers are machines with significant processing power, present in our homes in increasing numbers, and can be used as processing units. However, these potential processors are not completely dedicated: they must be ceded voluntarily and may fail (be turned off during use), which makes them highly volatile resources. In other words, there are no guarantees about how long they remain dedicated to a task. It is therefore necessary to use mechanisms that can deal with this volatility and optimize the collective availability of these devices.
This work investigates coordination heuristics for the OddCI architecture, seeking intelligent ways to allocate or release devices under the coverage of the broadcast network by sending collective messages, with the goal of controlling the number of allocated processors. To meet the established Service-Level Agreement (SLA), factors such as the resource population, its volatility, and the number of simultaneous requests, among others, are considered. The efficiency of the coordination heuristics was studied in a Digital TV network environment through simulation experiments. As results, we identified the most significant factors, their effects, and the restrictions in the evaluated scenarios. In the scenario where the backend infrastructure has limited capacity, the main factors were the size of the application image used by the instances and the number of concurrent instances; the most extreme case of 4 MB applications, 80 concurrent instances, and volatilities of 40% was met with 50% of the required parallelism. In the scenario where minimum makespan was the goal, the main factors were the volatility and the population (and the availability of devices); 50 concurrent instances were met with a reduction of only 15% in the required average throughput for the smallest population with volatilities of up to 40%, showing how favorable the results were in each scenario.
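As a rough illustration of the coordination problem, the sketch below shows one simple way a controller might size its next broadcast wake-up request given an expected volatility; the over-provisioning rule, safety margin, and numbers are assumptions for illustration, not the heuristics evaluated in the dissertation.

```python
# Minimal sketch of one coordination decision: because receivers are volatile,
# the controller over-provisions each broadcast wake-up request so that
# roughly the contracted degree of parallelism survives until the next
# coordination round.
import math

def devices_to_wake(target_parallelism, expected_volatility, safety_margin=1.1):
    """Number of receivers to address in the next collective (broadcast) message.

    expected_volatility: fraction of allocated receivers expected to drop out
    (e.g. be switched off) before the next coordination round.
    """
    survival_rate = 1.0 - expected_volatility
    if survival_rate <= 0:
        raise ValueError("expected volatility must be below 100%")
    return math.ceil(target_parallelism * safety_margin / survival_rate)

# With 40% volatility, sustaining 500 processors requires addressing far more
# than 500 receivers in each broadcast round.
print(devices_to_wake(target_parallelism=500, expected_volatility=0.4))   # 917
```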
|