291 |
Objective-Driven Strategies for HPC Job Scheduling. Goponenko, Alexander V. 01 January 2024
As High-Performance Computing (HPC) becomes increasingly prevalent and resource-intensive, there is a growing need for more efficient job schedulers, which play a crucial role in the performance of HPC clusters. This dissertation presents a comprehensive approach to this complex issue, contributing to three major components of the problem: (1) metrics of job packing efficiency and fairness, (2) advanced scheduling algorithms, and (3) job resource utilization prediction techniques.
To ensure high relevance of the results, this study emphasizes scheduling objectives. Therefore, scheduling quality metrics are investigated first, yielding a set of metrics that allow comparing alternative schedules and evaluating trade-offs between scheduling goals. The set of metrics enables the first comprehensive analysis of the effects of different scheduling improvement approaches on several aspects of scheduling quality, covering a variety of list scheduling algorithms as well as constraint programming optimization schedulers. The contribution to the third research area covers techniques to measure and estimate resource usage data. It reports a first-of-a-kind evaluation of how well various job runtime prediction techniques improve scheduling quality, demonstrates an approach capable of estimating job parameters beyond the runtime, and explores measuring the resources consumed by a job in an HPC cluster.
The dissertation concludes with a practical demonstration of these concepts through an I/O-aware scheduling prototype that measures real-time resource utilization, autonomously determines job resource requirements the scheduler needs, and implements full-featured multi-resource backfill scheduling that accounts for the specific properties of the parallel file system bandwidth resource. The study exhibits the advantages of further reducing I/O congestion—beyond the capability of generic I/O-aware scheduling—and presents the Workload-adaptive scheduling strategy that attains such improvement. This approach features a “two-group” approximation technique to maintain efficient performance regardless of zero-throughput job availability. An evaluation conducted on a real HPC cluster demonstrates the effectiveness of the novel strategy.
|
292 |
Advances in High Performance Computing Through Concurrent Data Structures and Predictive Scheduling. Lamar, Kenneth M. 01 January 2024
Modern High Performance Computing (HPC) systems are made up of thousands of server-grade compute nodes linked through a high-speed network interconnect. Each node has tens or even hundreds of CPU cores, with counts continuing to grow on newer HPC clusters. This results in a need to make use of millions of cores per cluster. Fully leveraging these resources is difficult. There is an active need to design software that scales and fully utilizes the hardware. In this dissertation, we address this gap with a dual approach, considering both intra-node (single node) and inter-node (across node) concerns. To aid in intra-node performance, we propose two novel concurrent data structures: a transactional vector and a persistent hash map. These designs have broad applicability in any multi-core environment but are particularly useful in HPC, which commonly features many cores per node. For inter-node performance, we propose a metrics-driven approach to improve scheduling quality, using predicted run times to backfill jobs more accurately and aggressively. This is augmented using application input parameters to further improve these run time predictions. Improved scheduling reduces the number of idle nodes in an HPC cluster, maximizing job throughput. We find that our data structures outperform the prior state-of-the-art while offering additional features. Our backfill technique likewise outperforms previous approaches in simulations, and our run time predictions were significantly more accurate than conventional approaches. Code for these works is freely available, and we have plans to deploy these techniques more broadly on real HPC systems in the future.
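To illustrate why run time predictions matter for backfilling, the following is a minimal sketch of conservative "EASY-style" backfill on a single node-count resource. It is not the dissertation's scheduler: the `Job` class, the `predicted_runtime` field, and the `easy_backfill` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int
    predicted_runtime: float  # hypothetical field: run time estimate in seconds


def easy_backfill(now, free_nodes, running, queue):
    """Return the names of queued jobs that may start at time `now`.

    running: list of (predicted_end_time, nodes) for jobs already executing.
    queue:   list of Job in priority order; queue[0] is the head job.
    """
    started = []
    queue = list(queue)
    running = list(running)
    # Start jobs in priority order for as long as they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((now + job.predicted_runtime, job.nodes))
        started.append(job.name)
    if not queue:
        return started
    # The head job is blocked: find the "shadow time", the earliest moment
    # at which enough nodes are predicted to be free for it.
    head = queue[0]
    avail = free_nodes
    shadow_time = float("inf")
    for end, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            shadow_time = end
            break
    spare = avail - head.nodes  # nodes left over once the head job starts
    # Backfill: a lower-priority job may start now only if it does not delay
    # the head job's reservation.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        finishes_in_time = now + job.predicted_runtime <= shadow_time
        if finishes_in_time or job.nodes <= spare:
            free_nodes -= job.nodes
            if not finishes_in_time:
                spare -= job.nodes  # it will still hold nodes at shadow time
            started.append(job.name)
    return started
```

With 4 free nodes, one running 6-node job predicted to end at t=100, and a queue of A (8 nodes), B (4 nodes, 200 s), C (2 nodes, 80 s), D (2 nodes, 500 s), the sketch starts C and D. If C's estimate is inflated to 150 s, C consumes the spare nodes and D can no longer be backfilled, which is the kind of loss that more accurate predictions avoid.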
|
293 |
Impact of data dependencies for real-time high performance computing. Hossain, M. Alamgir, Kabir, U., Tokhi, M.O. January 2002
No / This paper presents an investigation into the impact of data dependencies in real-time high performance sequential and parallel processing. An adaptive active vibration control algorithm is considered to demonstrate the impact of data dependencies in real-time computing. The algorithm is analysed in detail to explore the inherent data dependencies. To minimize the impact of data dependencies, an investigation into reducing memory access in sequential computing is provided. The impact of data dependencies with various interconnections is also explored and demonstrated in real-time parallel processing through a set of experiments.
|
294 |
Toward full-stack in silico synthetic biology: integrating model specification, simulation, verification, and biological compilation. Konur, Savas, Mierla, L.M., Fellermann, H., Ladroue, C., Brown, B., Wipat, A., Twycross, J., Dun, B.P., Kalvala, S., Gheorghe, Marian, Krasnogor, N. 02 August 2021
Yes / We present the Infobiotics Workbench (IBW), a user-friendly, scalable, and integrated computational environment for the computer-aided design of synthetic biological systems. It supports an iterative workflow that begins with specification of the desired synthetic system, followed by simulation and verification of the system in high-performance environments and ending with the eventual compilation of the system specification into suitable genetic constructs. IBW integrates modelling, simulation, verification and biocompilation features into a single software suite. This integration is achieved through a new domain-specific biological programming language, the Infobiotics Language (IBL), which tightly combines these different aspects of in silico synthetic biology into a full-stack integrated development environment. Unlike existing synthetic biology modelling or specification languages, IBL uniquely blends modelling, verification and biocompilation statements into a single file. This allows biologists to incorporate design constraints within the specification file rather than using decoupled and independent formalisms for different in silico analyses. This novel approach offers seamless interoperability across different tools as well as compatibility with SBOL and SBML frameworks and removes the burden of doing manual translations for standalone applications. We demonstrate the features, usability, and effectiveness of IBW and IBL using well-established synthetic biological circuits. / The work of S.K. is supported by EPSRC (EP/R043787/1). N.K., A.W., and B.B. acknowledge a Royal Academy of Engineering Chair in Emerging Technologies award and an EPSRC programme grant (EP/N031962/1).
|
295 |
Parallel programming on General Block Min Max Criterion. Lee, ChuanChe. 01 January 2006
The purpose of the thesis is to develop a parallel implementation of the General Block Min Max Criterion (GBMM). This thesis deals with two kinds of parallel overheads: Redundant Calculations Parallel Overhead (RCPO) and Communication Parallel Overhead (CPO).
|
296 |
Multi-level extensions for the fast and robust overlapping Schwarz preconditioners. Röver, Friederike. 14 June 2023
The GDSW preconditioner is a two-level overlapping Schwarz domain decomposition method with an energy-minimizing coarse space, whose parallel scalability is limited by the directly solved coarse problem. To improve parallel scalability, a multi-level extension was introduced here. For the case of scalar elliptic problems, a condition number bound was established. The parallel implementation was integrated into the open-source ShyLU/FROSch package of the Trilinos software library (http://trilinos.org) and tested on several of the world's most powerful supercomputers (JUQUEEN, Forschungszentrum Jülich; SuperMUC-NG, LRZ Garching; Theta, Argonne Leadership Computing Facility, Argonne National Laboratory, USA) for model problems (Laplace and linear elasticity). The goal of improved parallel scalability was achieved; the range of scalability was extended by more than an order of magnitude.
The largest computations used more than 200000 processor cores of the Theta supercomputer. In addition, the application of the GDSW preconditioner to a fully coupled nonlinear deformation-diffusion problem in chemomechanics was considered.
|
297 |
Parallel likelihood calculations for phylogenetic trees. Hayward, Peter. 12 1900
Thesis (MSc)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: Phylogenetic analysis is the study of evolutionary relationships among organisms.
To this end, phylogenetic trees, or evolutionary trees, are used to
depict the evolutionary relationships between organisms as reconstructed from
DNA sequence data. The likelihood of a given tree is commonly calculated
for many purposes including inferring phylogenies, sampling from the space of
likely trees and inferring other parameters governing the evolutionary process.
This is done using Felsenstein’s algorithm, a widely implemented dynamic
programming approach that reduces the computational complexity from exponential
to linear in the number of taxa. However, with the advent of efficient
modern sequencing techniques, the size of data sets is rapidly increasing beyond
current computational capability.
Parallel computing has been used successfully to address many similar
problems and is currently receiving attention in the realm of phylogenetic
analysis. Work has been done using data decomposition, where the likelihood
calculation is parallelised over DNA sequence sites. We propose an alternative
way of parallelising the likelihood calculation, which we call segmentation,
where the tree is broken down into subtrees and the likelihood of each subtree
is calculated concurrently over multiple processes. We introduce our proposed
system, which aims to drastically increase the size of trees that can be practically
used in phylogenetic analysis. Then, we evaluate the system on large
phylogenies which are constructed from both real and synthetic data, to show
that a larger decrease in run time is obtained when the system is used. / AFRIKAANSE OPSOMMING (English translation): Phylogenetic analysis is the study of evolutionary relationships between
organisms. Phylogenetic or evolutionary trees are used to depict the evolutionary
relationships between organisms, as recovered from DNA sequence data. The
likelihood of a given phylogeny is generally calculated and used for many
purposes, including inferring phylogenetic trees, sampling from a collection of
such possible trees, and inferring other important parameters of the evolutionary
process. This is accomplished using Felsenstein's algorithm, a well-known
approach that uses dynamic programming to reduce the computational complexity
from exponential to linear in the number of taxa. Nevertheless, the arrival of
modern, more efficient sequencing methods has resulted in larger data sets that
are rapidly exceeding existing computational capacity.
Parallel computing methods have already been applied successfully to solve many
similar problems, and they currently attract great interest in the sphere of
phylogenetic analysis. Work has been done using data decomposition, where the
likelihood calculation is parallelised over the DNA bases. We propose an
alternative method, which we call segmentation, to parallelise the likelihood
calculation by breaking the phylogenetic tree into subtrees and calculating the
likelihood of each subtree concurrently over multiple processing units. We
propose a system that aims to bring about a drastic increase in the size of the
trees that can be used in phylogenetic analysis. Our proposed system is then
evaluated on large phylogenetic trees constructed from real and synthetic data.
This shows that a larger decrease in run time is obtained when the system is in
use.
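The idea behind Felsenstein's algorithm and the segmentation approach can be sketched as follows. This is a minimal illustration, not the thesis's system: it assumes the Jukes-Cantor substitution model, a single DNA site, and a binary tree encoded as nested tuples `(left, left_branch_length, right, right_branch_length)` with leaves given as base letters. All function names are hypothetical, and a thread pool stands in for the multiple processes mentioned in the abstract.

```python
import math
from concurrent.futures import ThreadPoolExecutor

BASES = "ACGT"


def jc69(t):
    """Jukes-Cantor transition probabilities for branch length t: (same, different)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e


def combine(left_partial, t_left, right_partial, t_right):
    """Merge the conditional likelihoods of two children into their parent's."""
    out = []
    for s in range(4):  # s: base assumed at the parent node
        prod = 1.0
        for child, t in ((left_partial, t_left), (right_partial, t_right)):
            same, diff = jc69(t)
            prod *= sum((same if s == x else diff) * child[x] for x in range(4))
        out.append(prod)
    return out


def partial(node):
    """Conditional likelihoods of the data below `node`, one per base at `node`."""
    if isinstance(node, str):  # leaf: the observed base has likelihood 1
        return [1.0 if b == node else 0.0 for b in BASES]
    left, t_left, right, t_right = node
    return combine(partial(left), t_left, partial(right), t_right)


def site_likelihood(root):
    """Serial pruning: each node is visited once, so cost is linear in the taxa."""
    return sum(0.25 * v for v in partial(root))


def site_likelihood_segmented(root):
    """Same result, with the two root subtrees evaluated as independent tasks."""
    left, t_left, right, t_right = root
    with ThreadPoolExecutor(max_workers=2) as pool:
        f_left = pool.submit(partial, left)
        f_right = pool.submit(partial, right)
        merged = combine(f_left.result(), t_left, f_right.result(), t_right)
    return sum(0.25 * v for v in merged)
```

Because a subtree's conditional likelihoods depend only on the data below it, the two calls to `partial` share no state, which is the property that makes a subtree-level decomposition possible. In CPython, threads will not speed up this arithmetic; a real implementation would distribute subtrees over processes.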
|
298 |
A High Order Finite Difference Method for Simulating Earthquake Sequences in a Poroelastic Medium. Torberntsson, Kim, Stiernström, Vidar. January 2016
Induced seismicity (earthquakes caused by injection or extraction of fluids in Earth's subsurface) is a major, new hazard in the United States, the Netherlands, and other countries, with vast economic consequences if not properly managed. Addressing this problem requires development of predictive simulations of how fluid-saturated solids containing frictional faults respond to fluid injection/extraction. Here we present a numerical method for linear poroelasticity with rate-and-state friction faults. A numerical method for approximating the fully coupled linear poroelastic equations is derived using the summation-by-parts-simultaneous-approximation-term (SBP-SAT) framework. Well-posedness is shown for a set of physical boundary conditions in 1D and in 2D. The SBP-SAT technique is used to discretize the governing equations and to show semi-discrete stability. The correctness of the implementation is verified by rigorous convergence tests using the method of manufactured solutions, which show that the expected convergence rates are obtained for a problem with spatially variable material parameters. Mandel's problem and a line source problem are studied, where simulation results and convergence studies show satisfactory numerical properties. Furthermore, two problem setups involving fault dynamics and slip on faults triggered by fluid injection are studied, where the simulation results show that fluid injection can trigger earthquakes, with implications for induced seismicity. In addition, the results show that the scheme used for solving the fully coupled problem captures dynamics that would not be seen in an uncoupled model. Future improvements involve imposing Dirichlet boundary conditions using a different technique, extending the scheme to handle curvilinear coordinates and three spatial dimensions, as well as improving the high-performance code and extending the study of the fault dynamics.
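The summation-by-parts property that the SBP-SAT framework builds on can be checked numerically. The sketch below uses the standard second-order SBP first-derivative operator on a uniform 1D grid, not the high-order operators of the thesis, and the function names `sbp_operators`, `apply`, and `inner` are hypothetical. It verifies the discrete analogue of integration by parts, u^T H (D v) + (D u)^T H v = u_n v_n - u_0 v_0, which is what allows boundary terms to be controlled in a semi-discrete stability proof.

```python
def sbp_operators(n, h):
    """Second-order SBP first-derivative operator on n + 1 equally spaced points.

    Returns (H, D): H holds the diagonal of the quadrature (norm) matrix and
    D is the dense difference matrix as a list of rows.
    """
    m = n + 1
    H = [h] * m
    H[0] = H[-1] = h / 2                      # boundary quadrature weights
    D = [[0.0] * m for _ in range(m)]
    D[0][0], D[0][1] = -1 / h, 1 / h          # one-sided stencil, left boundary
    D[-1][-2], D[-1][-1] = -1 / h, 1 / h      # one-sided stencil, right boundary
    for i in range(1, n):
        D[i][i - 1], D[i][i + 1] = -0.5 / h, 0.5 / h  # central interior stencil
    return H, D


def apply(D, v):
    """Matrix-vector product D v."""
    return [sum(a * b for a, b in zip(row, v)) for row in D]


def inner(H, u, v):
    """Discrete inner product u^T H v."""
    return sum(w * a * b for w, a, b in zip(H, u, v))


n, h = 10, 0.1
x = [i * h for i in range(n + 1)]
u = [xi ** 2 for xi in x]
v = [3 * xi + 1 for xi in x]
H, D = sbp_operators(n, h)
lhs = inner(H, u, apply(D, v)) + inner(H, apply(D, u), v)
rhs = u[-1] * v[-1] - u[0] * v[0]   # boundary terms only
```

Here `lhs` and `rhs` agree to rounding error for any grid vectors `u` and `v`, because H D plus its transpose reduces to a matrix with only the two boundary entries -1 and 1 on its diagonal.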
|
299 |
HYBRID PARALLELIZATION OF THE NASA GEMINI ELECTROMAGNETIC MODELING TOOL. Johnson, Buxton L., Sr. 01 January 2017
Understanding, predicting, and controlling electromagnetic field interactions on and between complex RF platforms requires high fidelity computational electromagnetic (CEM) simulation. The primary CEM tool within NASA is GEMINI, an integral equation based method-of-moments (MoM) code for frequency domain electromagnetic modeling. However, GEMINI is currently limited in the size and complexity of problems that can be effectively handled. To extend GEMINI’s CEM capabilities beyond those currently available, primary research is devoted to integrating the MFDlib library developed at the University of Kentucky with GEMINI for efficient filling, factorization, and solution of large electromagnetic problems formulated using integral equation methods. A secondary research project involves the hybrid parallelization of GEMINI to speed up the impedance matrix filling process. This thesis discusses the research, development, and testing of the secondary research project on the High Performance Computing DLX Linux supercomputer cluster. Initial testing of GEMINI’s existing MPI parallelization establishes the benchmark for speedup and reveals performance issues subsequently solved by the NASA CEM Lab. Implementation of hybrid parallelization combines GEMINI’s existing coarse-level MPI parallelization with fine-level OpenMP threading. Simple and nested OpenMP threading are compared. Final testing documents the improvements realized by hybrid parallelization.
|
300 |
Contribution à la modélisation numérique de la propagation des ondes sismiques sur architectures multicœurs et hiérarchiques. Dupros, Fabrice. 13 December 2010
In seismic risk prevention, quantitative prediction of the propagation and amplification of seismic waves in complex geological structures is becoming essential. In this field, numerical simulation is predominant, and efficient use of high-performance computing techniques makes the large-scale modeling required for seismic risk assessment feasible. Several recent developments in the architecture of parallel machines require adapting the classical algorithms used for seismic modeling. Gains in processor power now come mainly from a growing number of compute cores, and multicore chips are now the basis of most multiprocessor architectures. This change also brings greater complexity in the physical organization of memory, which is generally structured around a deep NUMA (Non Uniform Memory Access) architecture. The contributions of this thesis are both algorithmic and numerical, and also address the interaction with runtime systems optimized for multicore architectures. The chosen solutions are validated at large scale using two seismic modeling examples. The first case is located in the Niigata-Chuetsu prefecture in Japan (event of 16 July 2007) and relies on the finite difference method. The second example uses the finite element method. A hypothetical earthquake in the Nice region is modeled, taking the nonlinear behavior of the soil into account. / One major goal of strong motion seismology is the estimation of damage in future earthquake scenarios. Simulation of large-scale seismic wave propagation is of great importance for efficient strong motion analysis and risk mitigation.
Being particularly CPU-intensive, this three-dimensional problem relies on high-performance computing technologies to make realistic simulation feasible on a regional scale at relatively high frequencies. Several evolutions at the chip level have an important impact on the performance of classical implementations of seismic applications. The trend in parallel computing is to increase the number of cores available at the shared-memory level, with possibly non-uniform memory access costs. The increasing number of cores per processor and the effort to overcome the limitations of classical symmetric multiprocessor (SMP) systems have made NUMA (Non Uniform Memory Access) architectures increasingly common as computing nodes. We therefore need to consider new approaches better suited to such parallel systems. This PhD work addresses both the algorithmic issues and the integration of efficient programming models for multicore architectures. The proposed contributions are validated with two large-scale examples. The first case is the modeling of the 2007 Niigata-Chuetsu, Japan earthquake based on the finite difference method. The second example considers a potential seismic event in the Nice sedimentary basin in the French Riviera. The finite element method is used and the nonlinear soil behavior is taken into account.
|