  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
291

Advances in High Performance Computing Through Concurrent Data Structures and Predictive Scheduling

Lamar, Kenneth M 01 January 2024
Modern High Performance Computing (HPC) systems are made up of thousands of server-grade compute nodes linked through a high-speed network interconnect. Each node has tens or even hundreds of CPU cores, with counts continuing to grow on newer HPC clusters. This results in a need to make use of millions of cores per cluster. Fully leveraging these resources is difficult, and there is an active need to design software that scales and fully utilizes the hardware. In this dissertation, we address this gap with a dual approach, considering both intra-node (single node) and inter-node (across node) concerns. To aid intra-node performance, we propose two novel concurrent data structures: a transactional vector and a persistent hash map. These designs have broad applicability in any multi-core environment but are particularly useful in HPC, which commonly features many cores per node. For inter-node performance, we propose a metrics-driven approach to improve scheduling quality, using predicted run times to backfill jobs more accurately and aggressively. Application input parameters are used to further improve these run time predictions. Improved scheduling reduces the number of idle nodes in an HPC cluster, maximizing job throughput. We find that our data structures outperform the prior state of the art while offering additional features. Our backfill technique likewise outperforms previous approaches in simulations, and our run time predictions were significantly more accurate than conventional approaches. Code for these works is freely available, and we plan to deploy these techniques more broadly on real HPC systems in the future.
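To make the backfilling idea in the abstract above concrete, here is a minimal sketch of conventional EASY-style backfilling driven by predicted run times. It is not the dissertation's scheduler: the `Job` class, the `easy_backfill` function, and all parameter names are hypothetical, and the policy shown is the textbook one (reserve the blocked head job, then start later jobs only if they do not delay it).

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int        # nodes requested
    predicted: float  # predicted run time, in arbitrary time units


def easy_backfill(queue, free_nodes, running, now=0.0):
    """Return the names of queued jobs that may start at time `now`.

    queue:   jobs in arrival order (the first job is the queue head)
    running: list of (predicted_end_time, nodes) for jobs already executing
    """
    queue = list(queue)
    running = list(running)
    started = []

    # Start jobs in order for as long as the head of the queue fits.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((now + job.predicted, job.nodes))
        started.append(job.name)
    if not queue:
        return started

    # Reserve the blocked head job: find the "shadow time" at which enough
    # nodes are predicted to be free, and how many nodes are left over then.
    head, avail, shadow = queue[0], free_nodes, None
    for end, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            shadow = end
            break
    if shadow is None:
        return started  # head asks for more nodes than will ever be free
    spare = avail - head.nodes

    # Backfill: a later job may start now if it fits and cannot delay the head.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        if now + job.predicted <= shadow:
            free_nodes -= job.nodes          # done before the reservation
        elif job.nodes <= spare:
            free_nodes -= job.nodes          # uses nodes the head will not need
            spare -= job.nodes
        else:
            continue
        started.append(job.name)
    return started
```

In this sketch a job whose predicted run time overshoots the shadow time is refused even if it would in fact finish early, which is why more accurate predictions let a scheduler backfill more aggressively.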
292

Impact of data dependencies for real-time high performance computing.

Hossain, M. Alamgir, Kabir, U., Tokhi, M.O. January 2002
This paper presents an investigation into the impact of data dependencies in real-time high performance sequential and parallel processing. An adaptive active vibration control algorithm is considered to demonstrate the impact of data dependencies in real-time computing. The algorithm is analysed in detail to explore the inherent data dependencies. To minimize the impact of data dependencies, an investigation into reducing memory access in sequential computing is provided. The impact of data dependencies with various interconnections is also explored and demonstrated in real-time parallel processing through a set of experiments.
293

Toward full-stack in silico synthetic biology: integrating model specification, simulation, verification, and biological compilation

Konur, Savas, Mierla, L.M., Fellermann, H., Ladroue, C., Brown, B., Wipat, A., Twycross, J., Dun, B.P., Kalvala, S., Gheorghe, Marian, Krasnogor, N. 02 August 2021
We present the Infobiotics Workbench (IBW), a user-friendly, scalable, and integrated computational environment for the computer-aided design of synthetic biological systems. It supports an iterative workflow that begins with specification of the desired synthetic system, followed by simulation and verification of the system in high-performance environments and ending with the eventual compilation of the system specification into suitable genetic constructs. IBW integrates modelling, simulation, verification and biocompilation features into a single software suite. This integration is achieved through a new domain-specific biological programming language, the Infobiotics Language (IBL), which tightly combines these different aspects of in silico synthetic biology into a full-stack integrated development environment. Unlike existing synthetic biology modelling or specification languages, IBL uniquely blends modelling, verification and biocompilation statements into a single file. This allows biologists to incorporate design constraints within the specification file rather than using decoupled and independent formalisms for different in silico analyses. This novel approach offers seamless interoperability across different tools as well as compatibility with SBOL and SBML frameworks and removes the burden of doing manual translations for standalone applications. We demonstrate the features, usability, and effectiveness of IBW and IBL using well-established synthetic biological circuits. / The work of S.K. is supported by EPSRC (EP/R043787/1). N.K., A.W., and B.B. acknowledge a Royal Academy of Engineering Chair in Emerging Technologies award and an EPSRC programme grant (EP/N031962/1).
294

Parallel programming on General Block Min Max Criterion

Lee, ChuanChe 01 January 2006
The purpose of the thesis is to develop a parallel implementation of the General Block Min Max Criterion (GBMM). This thesis deals with two kinds of parallel overheads: Redundant Calculations Parallel Overhead (RCPO) and Communication Parallel Overhead (CPO).
295

Multi-level extensions for the fast and robust overlapping Schwarz preconditioners

Röver, Friederike 14 June 2023
The GDSW preconditioner is a two-level overlapping Schwarz domain decomposition method with an energy-minimizing coarse space, whose parallel scalability is limited by the directly solved coarse problem. To improve parallel scalability, a multi-level extension was introduced here. A condition number bound was established for the case of scalar elliptic problems. The parallel implementation was integrated into the open-source ShyLU/FROSch package of the Trilinos software library (http://trilinos.org) and tested on model problems (Laplace and linear elasticity) on several of the world's most powerful supercomputers (JUQUEEN, Forschungszentrum Jülich; SuperMUC-NG, LRZ Garching; Theta, Argonne Leadership Computing Facility, Argonne National Laboratory, USA). The goal of improved parallel scalability was achieved: the range of scalability was extended by more than an order of magnitude. The largest computations used more than 200,000 processor cores of the Theta supercomputer. In addition, the application of the GDSW preconditioner to a fully coupled nonlinear deformation-diffusion problem in chemomechanics was considered.
296

Parallel likelihood calculations for phylogenetic trees

Hayward, Peter 12 1900
Thesis (MSc)--Stellenbosch University, 2011. / Phylogenetic analysis is the study of evolutionary relationships among organisms. To this end, phylogenetic trees, or evolutionary trees, are used to depict the evolutionary relationships between organisms as reconstructed from DNA sequence data. The likelihood of a given tree is commonly calculated for many purposes, including inferring phylogenies, sampling from the space of likely trees and inferring other parameters governing the evolutionary process. This is done using Felsenstein's algorithm, a widely implemented dynamic programming approach that reduces the computational complexity from exponential to linear in the number of taxa. However, with the advent of efficient modern sequencing techniques, the size of data sets is rapidly increasing beyond current computational capability. Parallel computing has been used successfully to address many similar problems and is currently receiving attention in the realm of phylogenetic analysis. Work has been done using data decomposition, where the likelihood calculation is parallelised over DNA sequence sites. We propose an alternative way of parallelising the likelihood calculation, which we call segmentation, where the tree is broken down into subtrees and the likelihood of each subtree is calculated concurrently over multiple processes. We introduce our proposed system, which aims to drastically increase the size of trees that can be practically used in phylogenetic analysis. We then evaluate the system on large phylogenies constructed from both real and synthetic data, showing that a larger decrease in run time is obtained when the system is used.
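As an illustration of the dynamic programming recursion the abstract above refers to, the following is a minimal single-site sketch of Felsenstein's pruning algorithm. Everything here is an assumption made for illustration: the Jukes-Cantor substitution model, the nested-tuple tree encoding, and the function names `jc_prob`, `partial` and `site_likelihood` are not taken from the thesis.

```python
import math

BASES = "ACGT"


def jc_prob(i, j, t):
    """Jukes-Cantor probability of base i becoming base j over a branch of
    length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial(node):
    """Conditional likelihoods: for each base x, the probability of the
    observed tips below `node` given that `node` is in state x.

    A leaf is a one-letter string; an internal node is a pair
    ((left_child, left_branch_length), (right_child, right_branch_length)).
    """
    if isinstance(node, str):
        return [1.0 if b == node else 0.0 for b in BASES]
    (left, t_left), (right, t_right) = node
    # The two recursive calls are independent of each other, so each subtree
    # could in principle be evaluated by a separate process.
    lik_left, lik_right = partial(left), partial(right)
    return [
        sum(jc_prob(x, y, t_left) * lik_left[k] for k, y in enumerate(BASES))
        * sum(jc_prob(x, y, t_right) * lik_right[k] for k, y in enumerate(BASES))
        for x in BASES
    ]


def site_likelihood(tree):
    """Likelihood of one alignment site, with uniform base frequencies at the root."""
    return sum(0.25 * v for v in partial(tree))
```

For example, `site_likelihood(((("A", 0.1), ("C", 0.2)), 0.05), ("A", 0.3)))` evaluates a three-taxon tree. Each node is visited once, which is where the linear cost in the number of taxa comes from.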
297

A High Order Finite Difference Method for Simulating Earthquake Sequences in a Poroelastic Medium

Torberntsson, Kim, Stiernström, Vidar January 2016
Induced seismicity (earthquakes caused by injection or extraction of fluids in Earth's subsurface) is a major, new hazard in the United States, the Netherlands, and other countries, with vast economic consequences if not properly managed. Addressing this problem requires development of predictive simulations of how fluid-saturated solids containing frictional faults respond to fluid injection/extraction. Here we present a numerical method for linear poroelasticity with rate-and-state friction faults. A numerical method for approximating the fully coupled linear poroelastic equations is derived using the summation-by-parts-simultaneous-approximation-term (SBP-SAT) framework. Well-posedness is shown for a set of physical boundary conditions in 1D and in 2D. The SBP-SAT technique is used to discretize the governing equations and to show semi-discrete stability. The correctness of the implementation is verified by rigorous convergence tests using the method of manufactured solutions, which show that the expected convergence rates are obtained for a problem with spatially variable material parameters. Mandel's problem and a line source problem are studied, where simulation results and convergence studies show satisfactory numerical properties. Furthermore, two problem setups involving fault dynamics and slip on faults triggered by fluid injection are studied. The simulation results show that fluid injection can trigger earthquakes, which has implications for induced seismicity. In addition, the results show that the scheme used for solving the fully coupled problem captures dynamics that would not be seen in an uncoupled model. Future improvements involve imposing Dirichlet boundary conditions using a different technique, extending the scheme to handle curvilinear coordinates and three spatial dimensions, as well as improving the high-performance code and extending the study of the fault dynamics.
298

HYBRID PARALLELIZATION OF THE NASA GEMINI ELECTROMAGNETIC MODELING TOOL

Johnson, Buxton L., Sr. 01 January 2017
Understanding, predicting, and controlling electromagnetic field interactions on and between complex RF platforms requires high fidelity computational electromagnetic (CEM) simulation. The primary CEM tool within NASA is GEMINI, an integral equation based method-of-moments (MoM) code for frequency domain electromagnetic modeling. However, GEMINI is currently limited in the size and complexity of problems that can be effectively handled. To extend GEMINI's CEM capabilities beyond those currently available, primary research is devoted to integrating the MFDlib library developed at the University of Kentucky with GEMINI for efficient filling, factorization, and solution of large electromagnetic problems formulated using integral equation methods. A secondary research project involves the hybrid parallelization of GEMINI for the efficient speedup of the impedance matrix filling process. This thesis discusses the research, development, and testing of the secondary research project on the High Performance Computing DLX Linux supercomputer cluster. Initial testing of GEMINI's existing MPI parallelization establishes the benchmark for speedup and reveals performance issues subsequently solved by the NASA CEM Lab. Implementation of hybrid parallelization combines GEMINI's existing coarse-level MPI parallelization with fine-level OpenMP parallel threading. Simple and nested OpenMP threading are compared. Final testing documents the improvements realized by hybrid parallelization.
299

Contribution à la modélisation numérique de la propagation des ondes sismiques sur architectures multicœurs et hiérarchiques

Dupros, Fabrice 13 December 2010
In seismic risk prevention, quantitative prediction of the propagation and amplification of seismic waves in complex geological structures is becoming essential. Numerical simulation plays a dominant role in this field, and efficient use of high-performance computing techniques makes the large-scale modeling required for seismic risk assessment feasible. Several recent developments in the architecture of parallel machines require adapting the classical algorithms used for seismic modeling. Increases in processor power now come mainly from a growing number of compute cores, and multicore chips are now the basis of most multiprocessor architectures. This change also brings greater complexity in the physical organization of memory, which is generally built around a deep NUMA (Non-Uniform Memory Access) architecture. The contributions of this thesis are both algorithmic and numerical, and also address the interaction with runtime systems optimized for multicore architectures. The selected solutions are validated at large scale on two seismic modeling examples. The first is set in the Niigata-Chuetsu prefecture in Japan (event of 16 July 2007) and relies on the finite difference method. The second uses the finite element method: a hypothetical earthquake in the Nice region is modeled, taking the nonlinear behavior of the soil into account. / One major goal of strong motion seismology is the estimation of damage in future earthquake scenarios. Simulation of large-scale seismic wave propagation is of great importance for efficient strong motion analysis and risk mitigation. Being particularly CPU-consuming, this three-dimensional problem relies on high-performance computing technologies to make realistic simulation feasible on a regional scale at relatively high frequencies. Several evolutions at the chip level have an important impact on the performance of classical implementations of seismic applications. The trend in parallel computing is to increase the number of cores available at the shared-memory level, with possibly non-uniform memory access costs. The increasing number of cores per processor and the effort made to overcome the limitations of classical symmetric multiprocessor (SMP) systems make a growing number of NUMA (Non-Uniform Memory Access) architectures available as computing nodes. We therefore need to consider new approaches better suited to such parallel systems. This PhD work addresses both the algorithmic issues and the integration of efficient programming models for multicore architectures. The proposed contributions are validated with two large-scale examples. The first case is the modeling of the 2007 Niigata-Chuetsu, Japan earthquake, based on the finite difference method. The second example considers a potential seismic event in the Nice sedimentary basin on the French Riviera. The finite element method is used and the nonlinear soil behavior is taken into account.
300

Accélération matérielle pour l’imagerie sismique : modélisation, migration et interprétation / Hardware acceleration for seismic imaging : modeling, migration and interpretation

Abdelkhalek, Rached 20 December 2013
Seismic data generates a large volume of computation, from its creation (modeling of seismic acquisitions), through its processing (preprocessing and migration), to its use in extracting the geological information needed to identify and optimally exploit hydrocarbon reservoirs (interpretation). We show in this thesis that, at each of these stages, GPGPU-type accelerator technologies drastically reduce computation times while staying within a reasonable power consumption envelope. We present and analyze the factors underlying this performance. The importance of using suitable memory access patterns is particularly emphasized, since memory access is the main bottleneck for the algorithms considered. We report speedup factors of about 40 for seismic modeling by finite-difference solution of the wave equation (a basic building block of seismic modeling and imaging) and between 8 and 113 for the computation of seismic attributes. We demonstrate that hardware accelerators considerably widen the range of what is possible, both in seismic imaging (modeling of new types of large-scale acquisitions) and in interpretation (computation of complex attributes on a workstation, interactive parameterization of computations, etc.). / During the seismic imaging workflow, from seismic modeling to interpretation, processing seismic data requires a massive amount of computation. We show in this work that, at each stage of this workflow, hardware accelerators such as GPUs can help reduce the time required to process seismic data while staying at reasonable energy consumption levels. The key programming considerations needed to achieve good performance are described and discussed. The importance of adapted in-memory data access patterns is particularly emphasised, since data access is the main bottleneck for the considered algorithms. When using GPUs, speedup ratios of 40× are achieved for FDTD seismic modeling, and of 8× up to 113× for seismic attribute computation, compared to CPUs.
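The finite-difference wave propagation kernel mentioned in the abstract above can be illustrated with a deliberately small sketch: one explicit time step of the 1D scalar wave equation, second order in time and space, on the CPU. This is an assumed textbook scheme rather than the thesis's 3D GPU kernel, and the function name `wave_step` and its parameters are hypothetical.

```python
def wave_step(u_prev, u, c, dt, dx):
    """Advance the 1D wave equation u_tt = c^2 u_xx by one time step.

    u_prev, u: wavefield samples at the previous and current time steps
    c:         wave speed (constant here; a real model varies it per cell)
    Returns the wavefield at the next time step, with both ends held at zero.
    """
    r2 = (c * dt / dx) ** 2  # squared Courant number; this scheme needs c*dt/dx <= 1
    u_next = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        # Three-point stencil: each output sample reads its two neighbours,
        # so the cost is dominated by memory reads rather than arithmetic.
        u_next[i] = 2.0 * u[i] - u_prev[i] + r2 * (u[i + 1] - 2.0 * u[i] + u[i - 1])
    return u_next
```

Each output value needs only a handful of arithmetic operations per value read, which is consistent with the abstract's point that memory access, not computation, is the main bottleneck for this class of algorithm.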
