Spelling suggestions: "subject:"high performance computing."" "subject:"igh performance computing.""
291 |
Towards a Polyalgorithm for Land Use and Land Cover Change DetectionSaxena, Rishu 23 February 2018 (has links)
Earth observation satellites (EOS) such as Landsat provide image datasets that can be immensely useful in numerous application domains. One way of analyzing satellite images for land use and land cover change (LULCC) is time series analysis (TSA). Several algorithms for time series analysis have been proposed by various groups in remote sensing; more algorithms (that can be adapted) are available in the general time series literature. However, in spite of an abundance of algorithms, the choice of algorithm to be used for analyzing an image stack is presently an open question. A concurrent issue is the prohibitive size of Landsat datasets, currently of the order of petabytes and growing. This makes them computationally unwieldy --- both in storage and processing. An EOS image stack typically consists of multiple images of a fixed area on the Earth's surface (same latitudes and longitudes) taken at different time points. Experiments on multicore servers indicate that carrying out meaningful time series analysis on one such interannual, multitemporal stack with existing state of the art codes can take several days.
This work proposes using multiple algorithms to analyze a given image stack in a polyalgorithmic framework. A polyalgorithm combines several basic algorithms, each meant to solve the same problem, producing a strategy that unites the strengths and circumvents the weaknesses of constituent algorithms. The foundation of the proposed TSA based polyalgorithm is laid using three algorithms (LandTrendR, EWMACD, and BFAST). These algorithms are precisely described mathematically, and chosen to be fundamentally distinct from each other in design and in the phenomena they capture. Analysis of results representing success, failure, and parameter sensitivity for each algorithm is presented. Scalability issues, important for real simulations, are also discussed, along with scalable implementations, and speedup results. For a given pixel, Hausdorff distance is used to compare the distance between the change times (breakpoints) obtained from two different algorithms. Timesync validation data, a dataset that is based on human interpretation of Landsat time series in concert with historical aerial photography, is used for validation. The polyalgorithm yields more accurate results than EWMACD and LandTrendR alone, but counterintuitively not better than BFAST alone. This nascent work will be directly useful in land use and land cover change studies, of interest to terrestrial science research, especially regarding anthropogenic impacts on the environment, and in much broader applications such as health monitoring and urban transportation. / M. S. / Numerous manmade satellites circling around the Earth regularly take pictures (images) of the Earth’s surface from up above. These images naturally provide information regarding the land cover of any given piece of land at the moment of capture (for e.g., whether the land area in the picture is covered with forests or with agriculture or housing). Therefore, for a fixed land area, if a person looks at a chronologically arranged series of images, any significant changes in land use can be identified. Identifying such changes is of critical importance, especially in this era where deforestation, urbanization, and global warming are major concerns.
The goal of this thesis is to investigate the design of methodologies (algorithms) that can efficiently and accurately use satellite images for answering questions regarding land cover trend and change. Experience shows that the state-of-the-art methodologies produce great results for the region they were originally designed on but their performance on other regions is unpredictable. In this work, therefore, a ‘polyalgorithm’ is proposed. A ‘polyalgorithm’ utilizes multiple simple methodologies and strategically combines them so that the outcome is better than the individual components. In this introductory work, three component methodologies are utilized; each component methodology is capable of capturing phenomenon different from the other two. Mathematical formulation of each component methodology is presented. Initial strategy for combining the three component algorithms is proposed. The outcomes of each component methodology as well the polyalgorithm are tested on human interpreted data. The strengths and limitations of each methodology are also discussed. Efficiency of the codes used for implementing the polyalgorithm is also discussed; this is important because the satellite data that needs to be processed is known to be huge (petabytes sized already and growing). This nascent work will be directly useful especially in understanding the impact of human activities on the environment. It will also be useful in other applications such as health monitoring and urban transportation.
|
292 |
Objective-Driven Strategies for HPC Job SchedulingGoponenko, Alexander V 01 January 2024 (has links) (PDF)
As High-Performance Computing (HPC) becomes increasingly prevalent and resource-intensive, there is a growing need for the development of more efficient job schedulers, which play a crucial role in the performance of HPC clusters. This dissertation manifests a comprehensive approach to this complex issue, contributing to three major components of the problem: (1) metrics of job packing efficiency and fairness, (2) advanced scheduling algorithms, and (3) job resource utilization prediction techniques.
To ensure high relevance of the results, this study emphasizes scheduling objectives. Therefore, scheduling quality metrics are investigated first, yielding a set of metrics that allow comparing alternative schedules and evaluating scheduling goals trade-offs. The set of metrics enables the first comprehensive analysis of effects of different scheduling improvement approaches on several aspects of scheduling quality, covering a variety of list scheduling algorithms as well as constraint programming optimization schedulers. The contribution to the third research area covers techniques to measure and estimate resource usage data. It reports a first-of-a-kind evaluation of various job runtime prediction techniques in improving scheduling quality, demonstrates an approach capable of estimating job parameters beyond the runtime, and explores measuring resources consumed by a job in an HPC cluster.
The dissertation concludes with a practical demonstration of these concepts through an I/O-aware scheduling prototype that measures real-time resource utilization, autonomously determines job resource requirements the scheduler needs, and implements full-featured multi-resource backfill scheduling that accounts for the specific properties of the parallel file system bandwidth resource. The study exhibits the advantages of further reducing I/O congestion—beyond the capability of generic I/O-aware scheduling—and presents the Workload-adaptive scheduling strategy that attains such improvement. This approach features a “two-group” approximation technique to maintain efficient performance regardless of zero-throughput job availability. An evaluation conducted on a real HPC cluster demonstrates the effectiveness of the novel strategy.
|
293 |
Advances in High Performance Computing Through Concurrent Data Structures and Predictive SchedulingLamar, Kenneth M 01 January 2024 (has links) (PDF)
Modern High Performance Computing (HPC) systems are made up of thousands of server-grade compute nodes linked through a high-speed network interconnect. Each node has tens or even hundreds of CPU cores each, with counts continuing to grow on newer HPC clusters. This results in a need to make use of millions of cores per cluster. Fully leveraging these resources is difficult. There is an active need to design software that scales and fully utilizes the hardware. In this dissertation, we address this gap with a dual approach, considering both intra-node (single node) and inter-node (across node) concerns. To aid in intra-node performance, we propose two novel concurrent data structures: a transactional vector and a persistent hash map. These designs have broad applicability in any multi-core environment but are particularly useful in HPC, which commonly features many cores per node. For inter-node performance, we propose a metrics-driven approach to improve scheduling quality, using predicted run times to backfill jobs more accurately and aggressively. This is augmented using application input parameters to further improve these run time predictions. Improved scheduling reduces the number of idle nodes in an HPC cluster, maximizing job throughput. We find that our data structures outperform the prior state-of-the-art while offering additional features. Our backfill technique likewise outperforms previous approaches in simulations, and our run time predictions were significantly more accurate than conventional approaches. Code for these works is freely available, and we have plans to deploy these techniques more broadly on real HPC systems in the future.
|
294 |
Parallel programming on General Block Min Max CriterionLee, ChuanChe 01 January 2006 (has links)
The purpose of the thesis is to develop a parallel implementation of the General Block Min Max Criterion (GBMM). This thesis deals with two kinds of parallel overheads: Redundant Calculations Parallel Overhead (RCPO) and Communication Parallel Overhead (CPO).
|
295 |
Multi-level extensions for the fast and robust overlapping Schwarz preconditionersRöver, Friederike 14 June 2023 (has links)
Der GDSW-Vorkonditionierer ist ein zweistufiges überlappendes Schwarz-Gebietszerlegungsverfahren mit einem energieminimierenden Grobraum, dessen parallele Skalierbarkeit durch das direkt gelöste Grobproblem begrenzt ist. Zur Verbesserung der parallelen Skalierbarkeit wurde hier eine mehrstufige Erweiterung eingeführt. Für den Fall skalarer elliptischer Probleme wurde eine Konditionierungszahlschranke aufgestellt. Die parallele Implementierung wurde in das quelloffene ShyLU/FROSch Paket der Trilinos-Softwarebibliothek (http://trilinos.org) integriert und auf mehreren der leistungsstärksten Supercomputern der Welt (JUQUEEN, Forschungszentrum Jülich; SuperMUC-NG, LRZ Garching; Theta, Argonne Leadership Computing Facility, Argonne National Laboratory, USA) für Modellprobleme (Laplace und lineare Elastizität) getestet. Das angestrebte Ziel einer verbesserten parallelen Skalierbarkeit wurde erreicht, der Bereich der Skalierbarkeit wurde um mehr als eine Größenordnung erweitert.
Die größten Rechnungen verwendeten mehr als 200000 Prozessorkerne des Theta Supercomputers. Zudem wurde die Anwendung des GDSW-Vorkonditionierers auf ein vollständig gekoppeltes nichtlineare Deformations-Diffusions Problem in der Chemomechanik betrachtet.
|
296 |
Parallel likelihood calculations for phylogenetic treesHayward, Peter 12 1900 (has links)
Thesis (MSc)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: Phylogenetic analysis is the study of evolutionary relationships among organisms.
To this end, phylogenetic trees, or evolutionary trees, are used to
depict the evolutionary relationships between organisms as reconstructed from
DNA sequence data. The likelihood of a given tree is commonly calculated
for many purposes including inferring phylogenies, sampling from the space of
likely trees and inferring other parameters governing the evolutionary process.
This is done using Felsenstein’s algorithm, a widely implemented dynamic
programming approach that reduces the computational complexity from exponential
to linear in the number of taxa. However, with the advent of efficient
modern sequencing techniques the size of data sets are rapidly increasing beyond
current computational capability.
Parallel computing has been used successfully to address many similar
problems and is currently receiving attention in the realm of phylogenetic
analysis. Work has been done using data decomposition, where the likelihood
calculation is parallelised over DNA sequence sites. We propose an alternative
way of parallelising the likelihood calculation, which we call segmentation,
where the tree is broken down into subtrees and the likelihood of each subtree
is calculated concurrently over multiple processes. We introduce our proposed
system, which aims to drastically increase the size of trees that can be practically
used in phylogenetic analysis. Then, we evaluate the system on large
phylogenies which are constructed from both real and synthetic data, to show
that a larger decrease of run times are obtained when the system is used. / AFRIKAANSE OPSOMMING:Filogenetiese analise is die studie van evolusionêre verwantskappe tussen
organismes. Filogenetiese of evolusionêre bome word aangewend om die evolusionêre
verwantskappe, soos herwin vanuit DNS-kettings data, tussen organismes
uit te beeld. Die aanneemlikheid van ’n gegewe filogenie word oor die
algemeen bereken en aangewend vir menigte doeleindes, insluitende die afleiding
van filogenetiese bome, om te monster vanuit ’n versameling van sulke
moontlike bome en vir die afleiding van ander belangrike parameters in die evolusionêre
proses. Dit word vermag met behulp van Felsenstein se algoritme,
’n alombekende benaderingwyse wat gebruik maak van dinamiese programmering
om die berekeningskompleksiteit van eksponensieel na lineêr in die aantal
taxa, te herlei. Desnieteenstaande, het die koms van moderne, doeltreffender
orderingsmetodes groter datastelle tot gevolg wat vinnig besig is om bestaande
berekeningsvermoë te oorskry.
Parallelle berekeningsmetodes is reeds suksesvol toegepas om vele soortgelyke
probleme op te los, met groot belangstelling tans in die sfeer van filogenetiese
analise. Werk is al gedoen wat gebruik maak van data dekomposisie, waar
die aanneemlikheidsberekening oor die DNS basisse geparallelliseer word. Ons
stel ’n alternatiewe metode voor, wat ons segmentasie noem, om die aanneemlikheidsberekening
te parallelliseer, deur die filogenetiese boom op te breek in
sub-bome, en die aanneemlikheid van elke sub-boom gelyklopend te bereken
oor verskeie verwerkingseenhede. Ons stel ’n stelsel voor wat dit ten doel het
om ’n drastiese toename in die grootte van die bome wat gebruik kan word in
filogenetiese analise, teweeg te bring. Dan, word ons voorgestelde stelsel op
groot filogenetiese bome, wat vanaf werklike en sintetiese data gekonstrueer is,
evalueer. Dit toon aan dat ’n groter afname in looptyd verkry word wanneer
die stelsel in gebruik is.
|
297 |
A High Order Finite Difference Method for Simulating Earthquake Sequences in a Poroelastic MediumTorberntsson, Kim, Stiernström, Vidar January 2016 (has links)
Induced seismicity (earthquakes caused by injection or extraction of fluids in Earth's subsurface) is a major, new hazard in the United States, the Netherlands, and other countries, with vast economic consequences if not properly managed. Addressing this problem requires development of predictive simulations of how fluid-saturated solids containing frictional faults respond to fluid injection/extraction. Here we present a numerical method for linear poroelasticity with rate-and-state friction faults. A numerical method for approximating the fully coupled linear poroelastic equations is derived using the summation-by-parts-simultaneous-approximation-term (SBP-SAT) framework. Well-posedness is shown for a set of physical boundary conditions in 1D and in 2D. The SBP-SAT technique is used to discretize the governing equations and show semi-discrete stability and the correctness of the implementation is verified by rigorous convergence tests using the method of manufactured solutions, which shows that the expected convergence rates are obtained for a problem with spatially variable material parameters. Mandel's problem and a line source problem are studied, where simulation results and convergence studies show satisfactory numerical properties. Furthermore, two problem setups involving fault dynamics and slip on faults triggered by fluid injection are studied, where the simulation results show that fluid injection can trigger earthquakes, having implications for induced seismicity. In addition, the results show that the scheme used for solving the fully coupled problem, captures dynamics that would not be seen in an uncoupled model. Future improvements involve imposing Dirichlet boundary conditions using a different technique, extending the scheme to handle curvilinear coordinates and three spatial dimensions, as well as improving the high-performance code and extending the study of the fault dynamics.
|
298 |
HYBRID PARALLELIZATION OF THE NASA GEMINI ELECTROMAGNETIC MODELING TOOLJohnson, Buxton L., Sr. 01 January 2017 (has links)
Understanding, predicting, and controlling electromagnetic field interactions on and between complex RF platforms requires high fidelity computational electromagnetic (CEM) simulation. The primary CEM tool within NASA is GEMINI, an integral equation based method-of-moments (MoM) code for frequency domain electromagnetic modeling. However, GEMINI is currently limited in the size and complexity of problems that can be effectively handled. To extend GEMINI’S CEM capabilities beyond those currently available, primary research is devoted to integrating the MFDlib library developed at the University of Kentucky with GEMINI for efficient filling, factorization, and solution of large electromagnetic problems formulated using integral equation methods. A secondary research project involves the hybrid parallelization of GEMINI for the efficient speedup of the impedance matrix filling process. This thesis discusses the research, development, and testing of the secondary research project on the High Performance Computing DLX Linux supercomputer cluster. Initial testing of GEMINI’s existing MPI parallelization establishes the benchmark for speedup and reveals performance issues subsequently solved by the NASA CEM Lab. Implementation of hybrid parallelization incorporates GEMINI’s existing course level MPI parallelization with Open MP fine level parallel threading. Simple and nested Open MP threading are compared. Final testing documents the improvements realized by hybrid parallelization.
|
299 |
Contribution à la modélisation numérique de la propagation des ondes sismiques sur architectures multicœurs et hiérarchiquesDupros, Fabrice 13 December 2010 (has links)
En termes de prévention du risque associé aux séismes, la prédiction quantitative des phénomènes de propagation et d'amplification des ondes sismiques dans des structures géologiques complexes devient essentielle. Dans ce domaine, la simulation numérique est prépondérante et l'exploitation efficace des techniques de calcul haute performance permet d'envisager les modélisations à grande échelle nécessaires dans le domaine du risque sismique.Plusieurs évolutions récentes au niveau de l'architecture des machines parallèles nécessitent l'adaptation des algorithmes classiques utilisées pour la modélisation sismique. En effet, l'augmentation de la puissance des processeurs se traduit maintenant principalement par un nombre croissant de cœurs de calcul et les puces multicœurs sont maintenant à la base de la majorité des architectures multiprocesseurs. Ce changement correspond également à une plus grande complexité au niveau de l'organisation physique de la mémoire qui s'articule généralement autour d'une architecture NUMA (Non Uniform Memory Access pour accès mémoire non uniforme) de profondeur importante.Les contributions de cette thèse se situent à la fois au niveau algorithmique et numérique mais abordent également l'articulation avec les supports d'exécution optimisés pour les architectures multicœurs. Les solutions retenues sont validées à grande échelle en considérant deux exemples de modélisation sismique. Le premier cas se situe dans la préfecture de Niigata-Chuetsu au Japon (événement du 16 juillet 2007) et repose sur la méthode des différences finies. Le deuxième exemple met en œuvre la méthode des éléments finis. Un séisme hypothétique dans la région de Nice est modélisé en tenant compte du comportement non linéaire du sol. / One major goal of strong motion seismology is the estimation of damage in future earthquake scenarios. Simulation of large scale seismic wave propagation is of great importance for efficient strong motion analysis and risk mitigation. Being particularly CPU-consuming, this three-dimensional problem makes use of high-performance computing technologies to make realistic simulation feasible on a regional scale at relatively high frequencies.Several evolutions at the chip level have an important impact on the performance of classical implementation of seismic applications. The trend in parallel computing is to increase the number of cores available at the shared-memory level with possible non-uniform cost of memory accesses. The increasing number of cores per processor and the effort made to overcome the limitation of classical symmetric multiprocessors SMP systems make available a growing number of NUMA (Non Uniform Memory Access) architecture as computing node. We therefore need to consider new approaches more suitable to such parallel systems.This PhD work addresses both the algorithmic issues and the integration of efficient programming models for multicore architectures. The proposed contributions are validated with two large scale examples. The first case is the modeling of the 2007 Niigata-Chuetsu, Japan earthquake based on the finite differences numerical method. The second example considers a potential seismic event in the Nice sedimentary basin in the French Riviera. The finite elements method is used and the nonlinear soil behavior is taken into account.
|
300 |
Accélération matérielle pour l’imagerie sismique : modélisation, migration et interprétation / Hardware acceleration for seismic imaging : modeling, migration and interpretationAbdelkhalek, Rached 20 December 2013 (has links)
La donnée sismique depuis sa conception (modélisation d’acquisitions sismiques), dans sa phase de traitement (prétraitement et migration) et jusqu’à son exploitation pour en extraire les informations géologiques pertinentes nécessaires à l’identification et l’exploitation optimale des réservoirs d’hydrocarbures (interprétation), génère un volume important de calculs. Nous montrons dans ce travail de thèse qu’à chacune de ces étapes l’utilisation de technologies accélératrices de type GPGPU permet de réduire radicalement les temps de calcul tout en restant dans une enveloppe de consommation électrique raisonnable. Nous présentons et analysons les éléments sous-jacents à ces performances. L’importance de l’utilisation de motifs d’accès mémoire adéquats est particulièrement mise en exergue étant donné que l’accès à la mémoire représente le principal goulot d’étranglement pour les algorithmes abordés. Nous reportons des facteurs d’accélération de l’ordre de 40 pour la modélisation sismique par résolution de l’équation d’onde par différences finies (brique de base pour la modélisation et l’imagerie sismique) et entre 8 et 113 pour le calcul d’attributs sismiques. Nous démontrons que l’utilisation d’accélérateurs matériels élargit considérablement le champ du possible, aussi bien en imagerie sismique (modélisation de nouveaux types d’acquisitions à grande échelle) qu’en interprétation (calcul d’attributs complexes sur station de travail, paramétrage interactif des calculs, etc.). / During the seismic imaging workflow, from seismic modeling to interpretation, processingseismic data requires a massive amount of computation. We show in this work that, at eachstage of this workflow, hardware accelerators such as GPUs may help reducing the time requiredto process seismic data while staying at reasonable energy consumption levels.In this work, the key programming considerations needed to achieve good performance are describedand discussed. The importance of adapted in-memory data access patterns is particularlyemphasised since data access is the main bottleneck for the considered algorithms. When usingGPUs, speedup ratios of 40× are achieved for FDTD seismic modeling, and 8× up to 113× forseismic attribute computation compared to CPUs.
|
Page generated in 0.118 seconds