Global ETD Search

291	Objective-Driven Strategies for HPC Job Scheduling Goponenko, Alexander V 01 January 2024 As High-Performance Computing (HPC) becomes increasingly prevalent and resource-intensive, there is a growing need for more efficient job schedulers, which play a crucial role in the performance of HPC clusters. This dissertation presents a comprehensive approach to this complex issue, contributing to three major components of the problem: (1) metrics of job packing efficiency and fairness, (2) advanced scheduling algorithms, and (3) job resource utilization prediction techniques. To ensure high relevance of the results, this study emphasizes scheduling objectives. Scheduling quality metrics are therefore investigated first, yielding a set of metrics that allow comparing alternative schedules and evaluating trade-offs between scheduling goals. This set of metrics enables the first comprehensive analysis of the effects of different scheduling improvement approaches on several aspects of scheduling quality, covering a variety of list scheduling algorithms as well as constraint programming optimization schedulers. The contribution to the third research area covers techniques to measure and estimate resource usage data. It reports a first-of-a-kind evaluation of various job runtime prediction techniques in improving scheduling quality, demonstrates an approach capable of estimating job parameters beyond the runtime, and explores measuring the resources consumed by a job in an HPC cluster. The dissertation concludes with a practical demonstration of these concepts through an I/O-aware scheduling prototype. The prototype measures real-time resource utilization, autonomously determines the job resource requirements the scheduler needs, and implements full-featured multi-resource backfill scheduling that accounts for the specific properties of the parallel file system bandwidth resource.
The study shows the advantages of reducing I/O congestion further, beyond the capability of generic I/O-aware scheduling, and presents the Workload-adaptive scheduling strategy that attains such an improvement. This approach features a “two-group” approximation technique to maintain efficient performance regardless of the availability of zero-throughput jobs. An evaluation conducted on a real HPC cluster demonstrates the effectiveness of the novel strategy. high-performance computing parallel job scheduling schedule quality constraint programming I/O-aware scheduling Slurm
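Multi-resource backfilling can be illustrated with a minimal EASY-style sketch. This is not the dissertation's scheduler, and it is not Slurm's backfill plugin. It assumes two resources: nodes, and a hypothetical parallel-file-system bandwidth budget. It also assumes that every job carries an estimated runtime and that only the job at the head of the queue holds a reservation. The `Job` fields and the function names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    nodes: int
    bw: float   # hypothetical file-system bandwidth request (e.g. GB/s)
    est: float  # estimated runtime in seconds (user request or prediction)

def fits(job, free_nodes, free_bw):
    """True if both the node and the bandwidth request fit."""
    return job.nodes <= free_nodes and job.bw <= free_bw

def easy_backfill(now, queue, running, total_nodes, total_bw):
    """One EASY-backfill pass over two resources.

    running: list of (end_time, nodes, bw) for jobs already executing.
    Returns the names of queued jobs started at time `now`.
    """
    free_nodes = total_nodes - sum(n for _, n, _ in running)
    free_bw = total_bw - sum(b for _, _, b in running)
    running, queue, started = list(running), list(queue), []

    # 1. Start jobs strictly in queue order while the head job fits.
    while queue and fits(queue[0], free_nodes, free_bw):
        job = queue.pop(0)
        free_nodes -= job.nodes
        free_bw -= job.bw
        running.append((now + job.est, job.nodes, job.bw))
        started.append(job.name)
    if not queue:
        return started

    # 2. Reservation for the blocked head job: release running jobs in order
    #    of estimated end time until the head fits ("shadow time"); whatever
    #    is left over at that instant is "extra" capacity.
    head = queue[0]
    shadow, avail_nodes, avail_bw = now, free_nodes, free_bw
    for end, n, b in sorted(running):
        if fits(head, avail_nodes, avail_bw):
            break
        shadow, avail_nodes, avail_bw = end, avail_nodes + n, avail_bw + b
    extra_nodes = avail_nodes - head.nodes
    extra_bw = avail_bw - head.bw

    # 3. Backfill later jobs only if they cannot delay the reservation:
    #    they finish before the shadow time, or they use only extra capacity.
    for job in queue[1:]:
        if not fits(job, free_nodes, free_bw):
            continue
        ends_before_shadow = now + job.est <= shadow
        if ends_before_shadow or (job.nodes <= extra_nodes and job.bw <= extra_bw):
            free_nodes -= job.nodes
            free_bw -= job.bw
            if not ends_before_shadow:
                extra_nodes -= job.nodes
                extra_bw -= job.bw
            started.append(job.name)
    return started
```

Consider a machine with 10 nodes and 10 bandwidth units, and one running job (6 nodes, 2 units) that ends at t=100. In that case `easy_backfill(0.0, [Job("A", 8, 7.0, 50.0), Job("B", 2, 6.0, 200.0), Job("C", 2, 1.0, 50.0), Job("D", 2, 2.0, 200.0)], [(100.0, 6, 2.0)], 10, 10.0)` starts C and D but holds back B. B fits the idle nodes, but it would consume bandwidth reserved for A, which is exactly the kind of conflict a nodes-only backfill would miss.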
292	Advances in High Performance Computing Through Concurrent Data Structures and Predictive Scheduling Lamar, Kenneth M 01 January 2024 Modern High Performance Computing (HPC) systems are made up of thousands of server-grade compute nodes linked through a high-speed network interconnect. Each node has tens or even hundreds of CPU cores, with counts continuing to grow on newer HPC clusters. As a result, software must make use of millions of cores per cluster. Fully leveraging these resources is difficult, and there is an active need for software that scales and fully utilizes the hardware. In this dissertation, we address this gap with a dual approach, considering both intra-node (single-node) and inter-node (cross-node) concerns. To improve intra-node performance, we propose two novel concurrent data structures: a transactional vector and a persistent hash map. These designs have broad applicability in any multi-core environment but are particularly useful in HPC, which commonly features many cores per node. For inter-node performance, we propose a metrics-driven approach to improving scheduling quality, using predicted run times to backfill jobs more accurately and aggressively. This is augmented with application input parameters to further improve these run time predictions. Improved scheduling reduces the number of idle nodes in an HPC cluster, maximizing job throughput. We find that our data structures outperform the prior state of the art while offering additional features. Our backfill technique likewise outperforms previous approaches in simulations, and our run time predictions were significantly more accurate than conventional approaches. Code for these works is freely available, and we plan to deploy these techniques more broadly on real HPC systems in the future. concurrent data structures high performance computing transactions persistence backfill scheduling run time prediction
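Backfilling with predicted run times needs some predictor. The sketch below shows one simple history-based baseline: average the same user's last two actual runtimes, and cap the result at the user's requested walltime. It is an assumption chosen for illustration. It is not the dissertation's predictor, which additionally uses application input parameters. The class and method names are hypothetical.

```python
from collections import defaultdict, deque

class LastTwoRuntimePredictor:
    """Predict a job's runtime as the mean of the same user's last two runs,
    capped at the user's requested walltime."""

    def __init__(self):
        # per-user ring buffer holding at most the two most recent runtimes
        self.history = defaultdict(lambda: deque(maxlen=2))

    def predict(self, user, requested):
        past = self.history[user]
        if not past:
            return requested  # no history yet: fall back to the user's request
        return min(requested, sum(past) / len(past))

    def record(self, user, actual):
        """Call when a job finishes, with its measured runtime."""
        self.history[user].append(actual)
```

A prediction is only a scheduling hint: if a job outlives its predicted runtime, the scheduler has to extend that job's estimate (for example, up to the requested walltime) rather than kill it. Otherwise reservations computed from the prediction become invalid.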
293	Impact of data dependencies for real-time high performance computing. Hossain, M. Alamgir, Kabir, U., Tokhi, M.O. January 2002 No / This paper presents an investigation into the impact of data dependencies in real-time high performance sequential and parallel processing. An adaptive active vibration control algorithm is used to demonstrate the impact of data dependencies in real-time computing. The algorithm is analysed in detail to expose its inherent data dependencies. To minimize the impact of these dependencies, the paper investigates reducing memory access in sequential computing. The impact of data dependencies with various interconnections is also explored and demonstrated in real-time parallel processing through a set of experiments. Active vibration control Data dependency High performance computing Memory management Finite difference method Real-time control
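The paper's vibration-control algorithm is not reproduced here. As a hedged illustration of the two ideas in the abstract, the sketch uses a generic explicit finite-difference update for a 1D wave equation. That model is an assumption, not the paper's beam model. The sketch shows (a) the data-dependency pattern that decides what can run in parallel, and (b) one common way to cut memory traffic in the sequential version: rotating preallocated buffers instead of allocating and copying arrays every step. Function names are illustrative.

```python
import numpy as np

def step_into(out, prev, curr, lam2):
    """Explicit update u[t+1] from u[t] (curr) and u[t-1] (prev), fixed ends.

    Each interior point of u[t+1] depends on u[t] at i-1, i, i+1 and on
    u[t-1] at i. Points within one time step are independent of each other
    (parallel in space), but step t+1 cannot start before step t is done.
    """
    out[0] = out[-1] = 0.0
    out[1:-1] = (2.0 * curr[1:-1] - prev[1:-1]
                 + lam2 * (curr[2:] - 2.0 * curr[1:-1] + curr[:-2]))

def run_copying(u0, lam2, steps):
    """Naive version: allocates a new array and copies two arrays every step."""
    prev, curr = u0.copy(), u0.copy()
    for _ in range(steps):
        nxt = np.empty_like(curr)
        step_into(nxt, prev, curr, lam2)
        prev = curr.copy()
        curr = nxt.copy()
    return curr

def run_rotating(u0, lam2, steps):
    """Same arithmetic with three preallocated buffers whose roles rotate,
    so there is no per-step allocation or copying."""
    prev, curr, spare = u0.copy(), u0.copy(), np.empty_like(u0)
    for _ in range(steps):
        step_into(spare, prev, curr, lam2)
        prev, curr, spare = curr, spare, prev  # oldest buffer becomes scratch
    return curr
```

Both functions perform identical floating-point operations, so their results match bit for bit. The rotating version only avoids the extra memory traffic.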
294	Toward full-stack in silico synthetic biology: integrating model specification, simulation, verification, and biological compilation Konur, Savas, Mierla, L.M., Fellermann, H., Ladroue, C., Brown, B., Wipat, A., Twycross, J., Dun, B.P., Kalvala, S., Gheorghe, Marian, Krasnogor, N. 02 August 2021 Yes / We present the Infobiotics Workbench (IBW), a user-friendly, scalable, and integrated computational environment for the computer-aided design of synthetic biological systems. It supports an iterative workflow that begins with specification of the desired synthetic system, continues with simulation and verification of the system in high-performance environments, and ends with the eventual compilation of the system specification into suitable genetic constructs. IBW integrates modelling, simulation, verification and biocompilation features into a single software suite. This integration is achieved through a new domain-specific biological programming language, the Infobiotics Language (IBL), which tightly combines these different aspects of in silico synthetic biology into a full-stack integrated development environment. Unlike existing synthetic biology modelling or specification languages, IBL uniquely blends modelling, verification and biocompilation statements into a single file. This allows biologists to incorporate design constraints within the specification file rather than using decoupled and independent formalisms for different in silico analyses. This novel approach offers seamless interoperability across different tools as well as compatibility with the SBOL and SBML frameworks, and it removes the burden of doing manual translations for standalone applications. We demonstrate the features, usability, and effectiveness of IBW and IBL using well-established synthetic biological circuits. / The work of S.K. is supported by EPSRC (EP/R043787/1). N.K., A.W., and B.B.
acknowledge a Royal Academy of Engineering Chair in Emerging Technologies award and an EPSRC programme grant (EP/N031962/1). Synthetic biology Computational biology In silico Modelling Simulation Verification Biocompilation High performance computing SBOL SBML
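The abstract does not show IBL syntax, so none is reproduced here. As a generic illustration of the simulation step in such a workflow, the sketch below implements a minimal Gillespie stochastic simulation algorithm (SSA), a common choice for small gene circuits. The model is a hypothetical one-species birth-death process: constant-rate production and first-order degradation. The model, rate names and functions are assumptions. This is not IBW's simulator.

```python
import random

def gillespie_birth_death(k_prod, k_deg, t_end, seed=0):
    """Exact SSA for the reactions  0 -> X (rate k_prod)  and  X -> 0 (rate k_deg * x)."""
    rng = random.Random(seed)
    t, x = 0.0, 0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod
        a_deg = k_deg * x
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break  # no reaction can fire any more
        t += rng.expovariate(a_total)  # exponential waiting time to next reaction
        if t > t_end:
            break
        # pick which reaction fires, with probability proportional to its propensity
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts

def time_average(times, counts, t_end):
    """Time-weighted mean of the piecewise-constant trajectory on [0, t_end]."""
    total = 0.0
    for i, c in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += c * (t_next - times[i])
    return total / t_end
```

For this model, the stationary copy-number distribution is Poisson with mean `k_prod / k_deg`. A long run's time average should therefore sit close to that value, which makes a convenient sanity check.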
295	Parallel programming on General Block Min Max Criterion Lee, ChuanChe 01 January 2006 The purpose of the thesis is to develop a parallel implementation of the General Block Min Max Criterion (GBMM). This thesis deals with two kinds of parallel overheads: Redundant Calculations Parallel Overhead (RCPO) and Communication Parallel Overhead (CPO). Parallel programming (Computer science) High performance computing Computer algorithms Parallel algorithms Software Engineering
296	Multi-level extensions for the fast and robust overlapping Schwarz preconditioners Röver, Friederike 14 June 2023 The GDSW preconditioner is a two-level overlapping Schwarz domain decomposition method with an energy-minimizing coarse space. Its parallel scalability is limited by the coarse problem, which is solved directly. To improve parallel scalability, a multi-level extension was introduced here. For the case of scalar elliptic problems, a condition number bound was established. The parallel implementation was integrated into the open-source ShyLU/FROSch package of the Trilinos software library (http://trilinos.org). It was tested on model problems (Laplace and linear elasticity) on several of the world's most powerful supercomputers: JUQUEEN, Forschungszentrum Jülich; SuperMUC-NG, LRZ Garching; and Theta, Argonne Leadership Computing Facility, Argonne National Laboratory, USA. The intended goal of improved parallel scalability was achieved, and the range of scalability was extended by more than an order of magnitude. The largest computations used more than 200,000 processor cores of the Theta supercomputer. In addition, the application of the GDSW preconditioner to a fully coupled nonlinear deformation-diffusion problem in chemomechanics was considered. info:eu-repo/classification/ddc/510 ddc:510 Domain decomposition method High-performance computing Parallel computers Scalability
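To make the terminology concrete, here is a minimal one-level overlapping additive Schwarz preconditioner used inside preconditioned conjugate gradients (PCG) on a 1D Laplace model problem. It deliberately omits what the thesis is about. GDSW adds an energy-minimizing coarse space (the second level), and the multi-level extension treats that coarse problem recursively. Without a coarse level, iteration counts grow as the number of subdomains increases. Dense NumPy matrices and explicit local inverses are used only for brevity. All names are illustrative and unrelated to the FROSch API.

```python
import numpy as np

def laplacian_1d(n):
    """Finite-difference matrix of -u'' with Dirichlet ends (unit spacing)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def overlapping_subdomains(n, n_sub, overlap):
    """Index sets of n_sub contiguous blocks, each widened by `overlap` points per side."""
    size = n // n_sub
    subdomains = []
    for k in range(n_sub):
        lo = max(0, k * size - overlap)
        hi = n if k == n_sub - 1 else min(n, (k + 1) * size + overlap)
        subdomains.append(np.arange(lo, hi))
    return subdomains

def additive_schwarz(A, subdomains):
    """Return r -> sum_i R_i^T A_i^{-1} R_i r (one level, exact local solves)."""
    local = [(idx, np.linalg.inv(A[np.ix_(idx, idx)])) for idx in subdomains]

    def apply(r):
        z = np.zeros_like(r)
        for idx, A_inv in local:
            z[idx] += A_inv @ r[idx]  # overlapping contributions are summed
        return z

    return apply

def pcg(A, b, precond=None, tol=1e-10, maxiter=1000):
    """Preconditioned conjugate gradients; returns (x, iterations)."""
    precond = precond or (lambda r: r.copy())
    x = np.zeros_like(b)
    r = b - A @ x
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for it in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter
```

Usage: `x, its = pcg(A, b, additive_schwarz(A, overlapping_subdomains(100, 4, 2)))`. Comparing `its` with the count from the unpreconditioned `pcg(A, b)` shows the preconditioner's effect. The summed (rather than restricted) Schwarz variant is used because it keeps the preconditioner symmetric, which CG requires.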
297	Parallel likelihood calculations for phylogenetic trees Hayward, Peter 12 1900 Thesis (MSc)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: Phylogenetic analysis is the study of evolutionary relationships among organisms. To this end, phylogenetic trees, or evolutionary trees, are used to depict the evolutionary relationships between organisms as reconstructed from DNA sequence data. The likelihood of a given tree is commonly calculated for many purposes, including inferring phylogenies, sampling from the space of likely trees, and inferring other parameters governing the evolutionary process. This is done using Felsenstein’s algorithm, a widely implemented dynamic programming approach that reduces the computational complexity from exponential to linear in the number of taxa. However, with the advent of efficient modern sequencing techniques, the size of data sets is rapidly increasing beyond current computational capability. Parallel computing has been used successfully to address many similar problems and is currently receiving attention in the realm of phylogenetic analysis. Work has been done using data decomposition, where the likelihood calculation is parallelised over DNA sequence sites. We propose an alternative way of parallelising the likelihood calculation, which we call segmentation. In segmentation, the tree is broken down into subtrees, and the likelihood of each subtree is calculated concurrently over multiple processes. We introduce our proposed system, which aims to drastically increase the size of trees that can be practically used in phylogenetic analysis. We then evaluate the system on large phylogenies constructed from both real and synthetic data, showing that a larger decrease in run time is obtained when the system is used. / AFRIKAANS SUMMARY (translated): Phylogenetic analysis is the study of evolutionary relationships between organisms.
Phylogenetic or evolutionary trees are used to depict the evolutionary relationships between organisms, as recovered from DNA sequence data. The likelihood of a given phylogeny is generally calculated and used for many purposes, including inferring phylogenetic trees, sampling from a set of such possible trees, and inferring other important parameters of the evolutionary process. This is accomplished with Felsenstein's algorithm, a well-known approach that uses dynamic programming to reduce the computational complexity from exponential to linear in the number of taxa. Nevertheless, the advent of modern, more efficient sequencing methods has produced larger data sets that are rapidly exceeding existing computational capacity. Parallel computing methods have already been applied successfully to many similar problems, and they currently attract great interest in the field of phylogenetic analysis. Work has already been done using data decomposition, where the likelihood calculation is parallelised over the DNA bases. We propose an alternative method, which we call segmentation, to parallelise the likelihood calculation. The phylogenetic tree is broken up into subtrees, and the likelihood of each subtree is calculated concurrently over several processing units. We propose a system that aims to bring about a drastic increase in the size of the trees that can be used in phylogenetic analysis. Our proposed system is then evaluated on large phylogenetic trees constructed from real and synthetic data. This shows that a greater decrease in run time is obtained when the system is used. Phylogenetics Bioinformatics High-performance computing Parallel programming Dissertations -- Computer science Theses -- Computer science Dissertations -- Mathematics Theses -- Mathematics
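The likelihood calculation described above can be sketched as Felsenstein's pruning algorithm for a single alignment column. The sketch assumes the Jukes-Cantor substitution model. The model, the nested-tuple tree encoding and the function names are assumptions for illustration. `site_likelihood_segmented` evaluates the two root subtrees as independent tasks and then combines them. That mirrors the idea of segmentation in its simplest form, but it is not the thesis's system, and CPython threads will not speed up this pure-Python code. The point is only the decomposition.

```python
import math
from concurrent.futures import ThreadPoolExecutor

STATES = "ACGT"
PI = [0.25, 0.25, 0.25, 0.25]  # Jukes-Cantor equilibrium frequencies

def jc_prob(i, j, t):
    """P(state j at the end of a branch of length t | state i at its start)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def combine(left_partials, t_left, right_partials, t_right):
    """Pruning step: L(i) = [sum_j P_ij(t_l) L_l(j)] * [sum_j P_ij(t_r) L_r(j)]."""
    return [
        sum(jc_prob(i, j, t_left) * left_partials[j] for j in range(4))
        * sum(jc_prob(i, j, t_right) * right_partials[j] for j in range(4))
        for i in range(4)
    ]

def partials(node):
    """Conditional likelihoods of the leaves below `node`, one per ancestral state.

    A leaf is an observed nucleotide ("A", "C", "G" or "T"); an internal node is a
    tuple (left, t_left, right, t_right) with branch lengths in substitutions/site.
    """
    if isinstance(node, str):
        return [1.0 if s == node else 0.0 for s in STATES]
    left, t_left, right, t_right = node
    return combine(partials(left), t_left, partials(right), t_right)

def site_likelihood(tree):
    """Likelihood of one column: sum over root states of pi(i) * L_root(i)."""
    return sum(p * l for p, l in zip(PI, partials(tree)))

def site_likelihood_segmented(tree):
    """Same value, but the two root subtrees are computed as separate tasks."""
    left, t_left, right, t_right = tree
    with ThreadPoolExecutor(max_workers=2) as pool:
        left_future = pool.submit(partials, left)
        right_future = pool.submit(partials, right)
        root = combine(left_future.result(), t_left, right_future.result(), t_right)
    return sum(p * l for p, l in zip(PI, root))
```

Each node is visited once with a constant amount of work per node, which is why the cost is linear in the number of taxa. For a full alignment, per-site values are multiplied (in practice, log-likelihoods are summed). That site axis is what data decomposition parallelises, while segmentation splits the tree axis.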
298	A High Order Finite Difference Method for Simulating Earthquake Sequences in a Poroelastic Medium Torberntsson, Kim, Stiernström, Vidar January 2016 Induced seismicity (earthquakes caused by injection or extraction of fluids in the Earth's subsurface) is a major new hazard in the United States, the Netherlands, and other countries, with vast economic consequences if not properly managed. Addressing this problem requires the development of predictive simulations of how fluid-saturated solids containing frictional faults respond to fluid injection and extraction. Here we present a numerical method for linear poroelasticity with rate-and-state friction faults. A numerical method for approximating the fully coupled linear poroelastic equations is derived using the summation-by-parts simultaneous-approximation-term (SBP-SAT) framework. Well-posedness is shown for a set of physical boundary conditions in 1D and in 2D. The SBP-SAT technique is used to discretize the governing equations and to show semi-discrete stability. The correctness of the implementation is verified by rigorous convergence tests using the method of manufactured solutions, which show that the expected convergence rates are obtained for a problem with spatially variable material parameters. Mandel's problem and a line source problem are studied, and the simulation results and convergence studies show satisfactory numerical properties. Furthermore, two problem setups involving fault dynamics and slip on faults triggered by fluid injection are studied. The simulation results show that fluid injection can trigger earthquakes, which has implications for induced seismicity. In addition, the results show that the scheme used for solving the fully coupled problem captures dynamics that would not be seen in an uncoupled model.
Future improvements involve imposing Dirichlet boundary conditions using a different technique, extending the scheme to handle curvilinear coordinates and three spatial dimensions, as well as improving the high-performance code and extending the study of the fault dynamics. numerical analysis numerical methods numerical modeling SBP-SAT finite differences high performance computing geophysics geomechanics poroelasticity fault mechanics induced seismicity
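The thesis applies SBP-SAT to coupled poroelasticity. The sketch below shows only the two ingredients on the simplest possible problem, linear advection u_t + a u_x = 0 with a > 0, using the standard second-order diagonal-norm SBP operator. First, D = H^{-1} Q with Q + Q^T = diag(-1, 0, ..., 0, 1). Second, the inflow condition u(0, t) = 0 is imposed weakly through a SAT penalty of strength tau. For the semi-discrete system u_t = L u, the energy rate is d/dt (u^T H u) = u^T (H L + L^T H) u = (a - 2 tau) u_0^2 - a u_N^2. This is nonpositive whenever tau >= a/2. Function names and parameter values are illustrative.

```python
import numpy as np

def sbp_operators(n, h):
    """Second-order SBP first-derivative operator D = H^{-1} Q on n points."""
    H = h * np.eye(n)
    H[0, 0] = H[-1, -1] = h / 2.0            # diagonal norm (trapezoidal weights)
    Q = 0.5 * (np.eye(n, k=1) - np.eye(n, k=-1))
    Q[0, 0], Q[-1, -1] = -0.5, 0.5           # so that Q + Q^T = diag(-1, 0, ..., 0, 1)
    D = np.linalg.solve(H, Q)                # interior: central difference; ends: one-sided
    return D, H, Q

def advection_sat_operator(n, h, a, tau):
    """Semi-discrete operator L in u_t = L u for u_t + a u_x = 0 (a > 0),
    with u(0, t) = 0 imposed weakly by the penalty -tau * H^{-1} e_0 e_0^T u."""
    D, H, _ = sbp_operators(n, h)
    e0 = np.zeros(n)
    e0[0] = 1.0
    return -a * D - tau * np.linalg.solve(H, np.outer(e0, e0))
```

The discrete analogue of integration by parts, u^T H (D v) + (D u)^T H v = u_N v_N - u_0 v_0, is what makes the energy argument carry over from the continuous problem. The SAT term then only has to control the boundary contribution.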
299	HYBRID PARALLELIZATION OF THE NASA GEMINI ELECTROMAGNETIC MODELING TOOL Johnson, Buxton L., Sr. 01 January 2017 Understanding, predicting, and controlling electromagnetic field interactions on and between complex RF platforms requires high-fidelity computational electromagnetic (CEM) simulation. The primary CEM tool within NASA is GEMINI, an integral-equation-based method-of-moments (MoM) code for frequency-domain electromagnetic modeling. However, GEMINI is currently limited in the size and complexity of problems it can handle effectively. To extend GEMINI’s CEM capabilities, the primary research effort is devoted to integrating the MFDlib library, developed at the University of Kentucky, with GEMINI for efficient filling, factorization, and solution of large electromagnetic problems formulated using integral equation methods. A secondary research project involves the hybrid parallelization of GEMINI to speed up the impedance matrix filling process. This thesis discusses the research, development, and testing of the secondary research project on the High Performance Computing DLX Linux supercomputer cluster. Initial testing of GEMINI’s existing MPI parallelization establishes the benchmark for speedup and reveals performance issues that were subsequently solved by the NASA CEM Lab. The hybrid parallelization combines GEMINI’s existing coarse-level MPI parallelization with fine-level OpenMP threading. Simple and nested OpenMP threading are compared. Final testing documents the improvements realized by hybrid parallelization. computational electromagnetics method of moments electric field integral equation hybrid parallelization high performance computing Electromagnetics and Photonics
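The abstract does not show GEMINI's code. The sketch below only illustrates the two-level decomposition it describes: coarse-grained work split across MPI ranks, and fine-grained threads within each rank. It assumes the split is over contiguous blocks of impedance-matrix rows. Python threads stand in for OpenMP threads. A plain sequential loop stands in for MPI ranks, which in a real run would execute on separate nodes and each hold only their own rows. The row-wise split, the `entry` kernel and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def block_ranges(n, parts):
    """Split range(n) into `parts` contiguous [start, stop) blocks of near-equal size."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

def two_level_ranges(n_rows, n_ranks, n_threads):
    """Coarse level: rows per (MPI) rank. Fine level: rows per thread inside a rank."""
    return [
        [(lo + a, lo + b) for a, b in block_ranges(hi - lo, n_threads)]
        for lo, hi in block_ranges(n_rows, n_ranks)
    ]

def fill_rows(Z, start, stop, entry):
    """Fill rows [start, stop) of Z. Threads write disjoint rows, so no locking."""
    for m in range(start, stop):
        for col in range(Z.shape[1]):
            Z[m, col] = entry(m, col)

def fill_hybrid(n, n_ranks, n_threads, entry):
    """Emulate the hybrid fill in one process: ranks are looped over here, and
    each rank's row block is filled by a pool of worker threads."""
    Z = np.empty((n, n))
    for rank_blocks in two_level_ranges(n, n_ranks, n_threads):
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            futures = [pool.submit(fill_rows, Z, a, b, entry) for a, b in rank_blocks]
            for f in futures:
                f.result()  # re-raise any exception from a worker thread
    return Z
```

Usage, with a made-up kernel: `fill_hybrid(50, 3, 4, lambda m, n: 1.0 / (1.0 + abs(m - n)))`. Each matrix entry is independent of the others, which is what makes the fill phase a natural target for both levels of parallelism.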
300	Contribution à la modélisation numérique de la propagation des ondes sismiques sur architectures multicœurs et hiérarchiques [Contribution to the numerical modeling of seismic wave propagation on multicore and hierarchical architectures] Dupros, Fabrice 13 December 2010 For the prevention of earthquake-related risk, quantitative prediction of the propagation and amplification of seismic waves in complex geological structures is becoming essential. In this field, numerical simulation plays a predominant role. Efficient use of high-performance computing techniques makes it possible to carry out the large-scale modeling needed for seismic risk assessment. Several recent changes in the architecture of parallel machines require adapting the classical algorithms used for seismic modeling. Indeed, gains in processor power now come mainly from a growing number of compute cores, and multicore chips now form the basis of most multiprocessor architectures. This change also brings greater complexity in the physical organization of memory, which is generally built around a deep NUMA (Non-Uniform Memory Access) architecture. The contributions of this thesis are both algorithmic and numerical, and they also address the interplay with runtime systems optimized for multicore architectures. The chosen solutions are validated at large scale on two seismic modeling examples. The first case is set in the Niigata-Chuetsu area of Japan (event of 16 July 2007) and relies on the finite difference method. The second example uses the finite element method: a hypothetical earthquake in the Nice region is modeled, taking into account the nonlinear behavior of the soil.
/ One major goal of strong-motion seismology is the estimation of damage in future earthquake scenarios. Simulation of large-scale seismic wave propagation is of great importance for efficient strong-motion analysis and risk mitigation. This three-dimensional problem is particularly CPU-intensive, so it relies on high-performance computing technologies to make realistic simulation feasible on a regional scale at relatively high frequencies. Several changes at the chip level have a significant impact on the performance of classical implementations of seismic applications. The trend in parallel computing is to increase the number of cores available at the shared-memory level, possibly with non-uniform memory access costs. As the number of cores per processor grows and designs move beyond classical symmetric multiprocessor (SMP) systems, a growing number of computing nodes have NUMA (Non-Uniform Memory Access) architectures. We therefore need to consider new approaches better suited to such parallel systems. This PhD work addresses both the algorithmic issues and the integration of efficient programming models for multicore architectures. The proposed contributions are validated with two large-scale examples. The first case is the modeling of the 2007 Niigata-Chuetsu, Japan, earthquake based on the finite difference method. The second example considers a potential seismic event in the Nice sedimentary basin on the French Riviera, using the finite element method and taking the nonlinear soil behavior into account. High performance computing Seismic modeling NUMA architecture Multicore processor
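The thesis's finite-difference case is a 3D problem run on NUMA multicore nodes. The sketch below shows only the kind of kernel such codes are built around, reduced to 1D: a second-order staggered-grid velocity-stress scheme for v_t = sigma_x / rho and sigma_t = mu v_x. Velocity lives on integer grid points and stress on half points, and the two fields are updated in leapfrog fashion. The grid size, CFL number 0.5 and Gaussian initial pulse are arbitrary assumptions. NUMA-specific concerns such as thread pinning and first-touch data placement are not represented.

```python
import numpy as np

def simulate_1d_velocity_stress(nx=401, dx=1.0, rho=1.0, mu=1.0,
                                cfl=0.5, nt=200, width=10.0):
    """Leapfrog staggered-grid scheme for v_t = s_x / rho, s_t = mu * v_x.

    Returns the grid, the final velocity field, and the position where the
    right-going half of the initial pulse should be after nt steps.
    """
    c = np.sqrt(mu / rho)                  # wave speed
    dt = cfl * dx / c                      # stable for cfl <= 1
    x = np.arange(nx) * dx
    x0 = x[nx // 2]
    v = np.exp(-((x - x0) / width) ** 2)   # velocity at integer points i
    s = np.zeros(nx - 1)                   # stress at half points i + 1/2
    for _ in range(nt):
        # stress update from the velocity difference across each half cell
        s += dt * mu * (v[1:] - v[:-1]) / dx
        # velocity update at interior points from the stress difference;
        # v[0] and v[-1] stay fixed (rigid ends, never reached by the pulse)
        v[1:-1] += dt / rho * (s[1:] - s[:-1]) / dx
    return x, v, x0 + c * nt * dt
```

With zero initial stress, the initial pulse splits into two half-amplitude pulses travelling left and right at c = sqrt(mu / rho). Locating the right-hand peak is therefore a quick check that the scheme propagates waves at the correct speed.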
