291

Characterization of Sparsity-aware Optimization Paths for Graph Traversal on FPGA

Gondhalekar, Atharva 25 May 2023 (has links)
Breadth-first search (BFS) is a fundamental building block in many graph-based applications, but it is difficult to optimize for a field-programmable gate array (FPGA) due to its irregular memory-access patterns. Prior work, based on hardware description languages (HDLs) and high-level synthesis (HLS), addresses the memory-access bottleneck of BFS by using techniques such as data alignment and compute-unit replication on FPGAs. The efficacy of such optimizations depends on factors such as the sparsity of the target graph datasets. Optimizations intended for sparse graphs may not work as effectively for dense graphs on an FPGA, and vice versa. This thesis presents two sets of FPGA optimization strategies for BFS, one for near-hypersparse graphs and the other designed for sparse to moderately dense graphs. For near-hypersparse graphs, a queue-based kernel that makes maximal use of local memory on the FPGA is implemented. For denser graphs, an array-based kernel with compute-unit replication is implemented. Across a diverse collection of graphs, our OpenCL optimization strategies for near-hypersparse graphs deliver a 5.7x to 22.3x speedup over a state-of-the-art OpenCL implementation when evaluated on an Intel Stratix 10 FPGA. The optimization strategies for sparse to moderately dense graphs deliver a 1.1x to 2.3x speedup over a state-of-the-art OpenCL implementation on the same FPGA. Finally, this work uses graph metrics such as average degree and Gini coefficient to observe the impact of graph properties on the performance of the proposed optimization strategies. / M.S. / A graph is a data structure that typically consists of two sets -- a set of vertices and a set of edges representing connections between the vertices. Graphs are used in a broad set of application domains such as the testing and verification of digital circuits, data mining of social networks, and analysis of road networks. In such application areas, breadth-first search (BFS) is a fundamental building block. BFS is used to identify the minimum number of edges that must be traversed from a source vertex to one or many destination vertices. In recent years, several attempts have been made to optimize the performance of BFS on reconfigurable architectures such as field-programmable gate arrays (FPGAs). However, the optimization strategies for BFS are not necessarily applicable to all types of graphs. Moreover, the efficacy of such optimizations oftentimes depends on the sparsity of the input graphs. To that end, this work presents optimization strategies for graphs with varying levels of sparsity. Furthermore, this work shows that by tailoring the BFS design to the sparsity of the input graph, significant performance improvements are obtained over state-of-the-art BFS implementations on an FPGA.
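The abstract above does not include code; as a minimal, generic illustration of the queue-based traversal pattern that such a kernel builds on, the C++ sketch below performs a level-synchronous BFS over a graph in CSR (compressed sparse row) form. The names (bfs_levels, row_ptr, col_idx) are illustrative assumptions rather than identifiers from the thesis; an actual FPGA kernel would replace the std::queue with on-chip local-memory buffers, as the abstract describes.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Level-synchronous BFS over a graph stored in CSR (compressed sparse row) form.
// Returns the BFS level (hop distance) of every vertex, or -1 if unreachable.
// This is a plain host-side sketch; an FPGA kernel would keep the frontier in
// on-chip local memory rather than in a std::queue.
std::vector<int> bfs_levels(const std::vector<int64_t>& row_ptr,
                            const std::vector<int32_t>& col_idx,
                            int32_t source) {
    const int32_t num_vertices = static_cast<int32_t>(row_ptr.size()) - 1;
    std::vector<int> level(num_vertices, -1);
    std::queue<int32_t> frontier;

    level[source] = 0;
    frontier.push(source);

    while (!frontier.empty()) {
        int32_t u = frontier.front();
        frontier.pop();
        // Visit all neighbors of u; CSR keeps them contiguous in col_idx.
        for (int64_t e = row_ptr[u]; e < row_ptr[u + 1]; ++e) {
            int32_t v = col_idx[e];
            if (level[v] == -1) {          // not yet discovered
                level[v] = level[u] + 1;   // one hop further than u
                frontier.push(v);
            }
        }
    }
    return level;
}
```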
292

Parallel programming on General Block Min Max Criterion

Lee, ChuanChe 01 January 2006 (has links)
The purpose of the thesis is to develop a parallel implementation of the General Block Min Max Criterion (GBMM). This thesis deals with two kinds of parallel overheads: Redundant Calculations Parallel Overhead (RCPO) and Communication Parallel Overhead (CPO).
293

Multi-level extensions for the fast and robust overlapping Schwarz preconditioners

Röver, Friederike 14 June 2023 (has links)
The GDSW preconditioner is a two-level overlapping Schwarz domain-decomposition method with an energy-minimizing coarse space, whose parallel scalability is limited by the directly solved coarse problem. To improve the parallel scalability, a multi-level extension is introduced here. For the case of scalar elliptic problems, a condition-number bound is established. The parallel implementation was integrated into the open-source ShyLU/FROSch package of the Trilinos software library (http://trilinos.org) and tested on several of the world's most powerful supercomputers (JUQUEEN, Forschungszentrum Jülich; SuperMUC-NG, LRZ Garching; Theta, Argonne Leadership Computing Facility, Argonne National Laboratory, USA) for model problems (Laplace and linear elasticity). The intended goal of improved parallel scalability was achieved, and the range of scalability was extended by more than an order of magnitude. The largest computations used more than 200,000 processor cores of the Theta supercomputer. In addition, the application of the GDSW preconditioner to a fully coupled nonlinear deformation-diffusion problem in chemo-mechanics was considered.
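For context, the two-level GDSW preconditioner referred to above is commonly written as the sum of a coarse solve and overlapping local solves. In standard notation (not quoted from this thesis), the additive form reads

\[
M_{\mathrm{GDSW}}^{-1} = \Phi\, A_0^{-1} \Phi^{T} + \sum_{i=1}^{N} R_i^{T} A_i^{-1} R_i,
\qquad A_0 = \Phi^{T} A \Phi,
\]

where \(R_i\) restricts to the \(i\)-th overlapping subdomain, \(A_i = R_i A R_i^{T}\), and the columns of \(\Phi\) contain the energy-minimizing coarse basis functions. The multi-level extension discussed in the thesis replaces the direct solve with \(A_0^{-1}\), which limits scalability, by a recursive application of the same construction.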
294

Parallel likelihood calculations for phylogenetic trees

Hayward, Peter 12 1900 (has links)
Thesis (MSc)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: Phylogenetic analysis is the study of evolutionary relationships among organisms. To this end, phylogenetic trees, or evolutionary trees, are used to depict the evolutionary relationships between organisms as reconstructed from DNA sequence data. The likelihood of a given tree is commonly calculated for many purposes, including inferring phylogenies, sampling from the space of likely trees, and inferring other parameters governing the evolutionary process. This is done using Felsenstein's algorithm, a widely implemented dynamic-programming approach that reduces the computational complexity from exponential to linear in the number of taxa. However, with the advent of efficient modern sequencing techniques, the sizes of data sets are rapidly increasing beyond current computational capability. Parallel computing has been used successfully to address many similar problems and is currently receiving attention in the realm of phylogenetic analysis. Work has been done using data decomposition, where the likelihood calculation is parallelised over DNA sequence sites. We propose an alternative way of parallelising the likelihood calculation, which we call segmentation, where the tree is broken down into subtrees and the likelihood of each subtree is calculated concurrently over multiple processes. We introduce our proposed system, which aims to drastically increase the size of trees that can be practically used in phylogenetic analysis. We then evaluate the system on large phylogenies constructed from both real and synthetic data, showing that a larger decrease in run times is obtained when the system is used.
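As a rough illustration of the per-subtree computation that the proposed segmentation scheme distributes across processes, the C++ sketch below implements Felsenstein's pruning recursion for a binary tree under a simple four-state (Jukes-Cantor) substitution model at a single site. The data structures and names are illustrative assumptions, not code from the thesis.

```cpp
#include <array>
#include <cmath>
#include <memory>

// Conditional likelihood vector for one tree node at a single DNA site:
// entry s is P(data below this node | node has state s), s in {A, C, G, T}.
using Partial = std::array<double, 4>;

struct Node {
    std::unique_ptr<Node> left, right;   // null for leaves
    double branch_length = 0.0;          // length of the branch above this node
    int observed_state = -1;             // leaf state (0..3), -1 for internal nodes
};

// Illustrative Jukes-Cantor transition probability P(child = j | parent = i, t).
double transition_prob(int i, int j, double t) {
    const double e = std::exp(-4.0 / 3.0 * t);
    return (i == j) ? 0.25 + 0.75 * e : 0.25 - 0.25 * e;
}

// Felsenstein's pruning algorithm: post-order computation of partials.
Partial compute_partial(const Node& node) {
    Partial p{};
    if (!node.left && !node.right) {               // leaf: indicator on observed state
        p[node.observed_state] = 1.0;
        return p;
    }
    const Partial pl = compute_partial(*node.left);
    const Partial pr = compute_partial(*node.right);
    for (int s = 0; s < 4; ++s) {
        double sum_l = 0.0, sum_r = 0.0;
        for (int c = 0; c < 4; ++c) {
            sum_l += transition_prob(s, c, node.left->branch_length) * pl[c];
            sum_r += transition_prob(s, c, node.right->branch_length) * pr[c];
        }
        p[s] = sum_l * sum_r;                       // children are independent given s
    }
    return p;
}

// Site likelihood at the root, assuming a uniform prior over the four states.
double site_likelihood(const Node& root) {
    const Partial p = compute_partial(root);
    return 0.25 * (p[0] + p[1] + p[2] + p[3]);
}
```

In the segmentation scheme described above, each subtree's call to compute_partial could run in a separate process, with only the subtree's root partial communicated upward to its parent.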
295

A High Order Finite Difference Method for Simulating Earthquake Sequences in a Poroelastic Medium

Torberntsson, Kim, Stiernström, Vidar January 2016 (has links)
Induced seismicity (earthquakes caused by injection or extraction of fluids in Earth's subsurface) is a major new hazard in the United States, the Netherlands, and other countries, with vast economic consequences if not properly managed. Addressing this problem requires the development of predictive simulations of how fluid-saturated solids containing frictional faults respond to fluid injection/extraction. Here we present a numerical method for linear poroelasticity with rate-and-state friction faults. A numerical method for approximating the fully coupled linear poroelastic equations is derived using the summation-by-parts simultaneous-approximation-term (SBP-SAT) framework. Well-posedness is shown for a set of physical boundary conditions in 1D and in 2D. The SBP-SAT technique is used to discretize the governing equations and to show semi-discrete stability. The correctness of the implementation is verified by rigorous convergence tests using the method of manufactured solutions, which show that the expected convergence rates are obtained for a problem with spatially variable material parameters. Mandel's problem and a line source problem are studied, where simulation results and convergence studies show satisfactory numerical properties. Furthermore, two problem setups involving fault dynamics and slip on faults triggered by fluid injection are studied; the simulation results show that fluid injection can trigger earthquakes, which has implications for induced seismicity. In addition, the results show that the scheme used for solving the fully coupled problem captures dynamics that would not be seen in an uncoupled model. Future improvements involve imposing Dirichlet boundary conditions using a different technique, extending the scheme to handle curvilinear coordinates and three spatial dimensions, as well as improving the high-performance code and extending the study of the fault dynamics.
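For readers unfamiliar with the framework, a first-derivative SBP operator is commonly defined (independently of this thesis) by the pair of properties

\[
D_1 = H^{-1} Q, \qquad Q + Q^{T} = B = \mathrm{diag}(-1, 0, \dots, 0, 1),
\]

where \(H\) is a symmetric positive-definite quadrature matrix, so that \(u^{T} H (D_1 v) + (D_1 u)^{T} H v = u_N v_N - u_0 v_0\) mimics integration by parts discretely. The SAT terms then impose boundary and interface conditions weakly as penalty terms, which is what allows discrete energy estimates, and hence the semi-discrete stability results mentioned above, to be derived.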
296

HYBRID PARALLELIZATION OF THE NASA GEMINI ELECTROMAGNETIC MODELING TOOL

Johnson, Buxton L., Sr. 01 January 2017 (has links)
Understanding, predicting, and controlling electromagnetic field interactions on and between complex RF platforms requires high-fidelity computational electromagnetics (CEM) simulation. The primary CEM tool within NASA is GEMINI, an integral-equation-based method-of-moments (MoM) code for frequency-domain electromagnetic modeling. However, GEMINI is currently limited in the size and complexity of problems that can be effectively handled. To extend GEMINI's CEM capabilities beyond those currently available, primary research is devoted to integrating the MFDlib library developed at the University of Kentucky with GEMINI for efficient filling, factorization, and solution of large electromagnetic problems formulated using integral equation methods. A secondary research project involves the hybrid parallelization of GEMINI for efficient speedup of the impedance-matrix filling process. This thesis discusses the research, development, and testing of the secondary research project on the High Performance Computing DLX Linux supercomputer cluster. Initial testing of GEMINI's existing MPI parallelization establishes the benchmark for speedup and reveals performance issues subsequently solved by the NASA CEM Lab. The hybrid parallelization combines GEMINI's existing coarse-level MPI parallelization with fine-level OpenMP parallel threading. Simple and nested OpenMP threading are compared. Final testing documents the improvements realized by hybrid parallelization.
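As a generic illustration of the hybrid pattern described above (not GEMINI's actual code), the C++ sketch below distributes contiguous blocks of matrix rows across MPI ranks and fills the rows within each block using OpenMP threads. The matrix dimension and the element-evaluation routine are placeholder assumptions.

```cpp
#include <algorithm>
#include <mpi.h>
#include <omp.h>
#include <vector>

// Placeholder for the (expensive) evaluation of a single impedance-matrix entry.
double evaluate_entry(int row, int col) {
    return 1.0 / (1.0 + row + col);   // stand-in for the MoM interaction integral
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 4096;                               // global matrix dimension (illustrative)
    const int rows_per_rank = (n + size - 1) / size;  // coarse-level MPI decomposition
    const int row_begin = rank * rows_per_rank;
    const int row_end   = std::min(n, row_begin + rows_per_rank);

    // Each rank owns a contiguous block of rows of the dense impedance matrix.
    std::vector<double> local_block(static_cast<size_t>(row_end - row_begin) * n);

    // Fine-level OpenMP threading over the locally owned rows.
    #pragma omp parallel for schedule(dynamic)
    for (int i = row_begin; i < row_end; ++i) {
        for (int j = 0; j < n; ++j) {
            local_block[static_cast<size_t>(i - row_begin) * n + j] = evaluate_entry(i, j);
        }
    }

    MPI_Finalize();
    return 0;
}
```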
297

Contribution à la modélisation numérique de la propagation des ondes sismiques sur architectures multicœurs et hiérarchiques / Contribution to the numerical modeling of seismic wave propagation on multicore and hierarchical architectures

Dupros, Fabrice 13 December 2010 (has links)
One major goal of strong-motion seismology is the estimation of damage in future earthquake scenarios. Simulation of large-scale seismic wave propagation is of great importance for efficient strong-motion analysis and risk mitigation. Being particularly CPU-consuming, this three-dimensional problem makes use of high-performance computing technologies to make realistic simulation feasible on a regional scale at relatively high frequencies. Several evolutions at the chip level have an important impact on the performance of classical implementations of seismic applications. The trend in parallel computing is to increase the number of cores available at the shared-memory level, with a possibly non-uniform cost of memory accesses. The increasing number of cores per processor and the effort made to overcome the limitations of classical symmetric multiprocessor (SMP) systems make a growing number of NUMA (Non-Uniform Memory Access) architectures available as compute nodes. We therefore need to consider new approaches more suitable to such parallel systems. This PhD work addresses both the algorithmic issues and the integration of efficient programming models for multicore architectures. The proposed contributions are validated with two large-scale examples. The first case is the modeling of the Niigata-Chuetsu, Japan earthquake of July 16, 2007, based on the finite-difference numerical method. The second example considers a potential seismic event in the Nice sedimentary basin in the French Riviera; the finite-element method is used, and the nonlinear soil behavior is taken into account.
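As a minimal, generic illustration of the kind of stencil kernel whose performance is sensitive to the deep NUMA hierarchies described above (not code from this thesis), the C++ sketch below advances a 2D scalar wave field by one explicit finite-difference time step with OpenMP; allocating and initializing the arrays with the same parallel loop structure ("first touch") is one common way to place memory pages near the threads that use them.

```cpp
#include <omp.h>
#include <vector>

// One explicit time step of the 2D scalar wave equation on an nx-by-ny grid,
// second-order in space and time. u_prev/u_curr/u_next are row-major fields.
void wave_step(const std::vector<double>& u_prev,
               const std::vector<double>& u_curr,
               std::vector<double>& u_next,
               int nx, int ny, double c, double dt, double h) {
    const double r2 = (c * dt / h) * (c * dt / h);   // squared Courant number
    #pragma omp parallel for collapse(2)
    for (int i = 1; i < nx - 1; ++i) {
        for (int j = 1; j < ny - 1; ++j) {
            const int k = i * ny + j;
            const double lap = u_curr[k - ny] + u_curr[k + ny]
                             + u_curr[k - 1]  + u_curr[k + 1]
                             - 4.0 * u_curr[k];
            u_next[k] = 2.0 * u_curr[k] - u_prev[k] + r2 * lap;
        }
    }
}
```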
298

Accélération matérielle pour l’imagerie sismique : modélisation, migration et interprétation / Hardware acceleration for seismic imaging : modeling, migration and interpretation

Abdelkhalek, Rached 20 December 2013 (has links)
During the seismic imaging workflow, from seismic modeling to interpretation, processing seismic data requires a massive amount of computation. We show in this work that, at each stage of this workflow, hardware accelerators such as GPUs may help reduce the time required to process seismic data while staying at reasonable energy-consumption levels. The key programming considerations needed to achieve good performance are described and discussed. The importance of adapted in-memory data access patterns is particularly emphasised, since data access is the main bottleneck for the considered algorithms. When using GPUs, speedup ratios of 40× are achieved for FDTD seismic modeling, and of 8× up to 113× for seismic attribute computation, compared to CPUs. We also show that the use of hardware accelerators considerably broadens the range of what is feasible, both in seismic imaging (modeling of new types of large-scale acquisitions) and in interpretation (computation of complex attributes on a workstation, interactive tuning of computations, etc.).
299

A New Method for Modeling Free Surface Flows and Fluid-structure Interaction with Ocean Applications

Lee, Curtis January 2016 (has links)
The computational modeling of ocean waves and ocean-faring devices poses numerous challenges. Among these are the need to stably and accurately represent both the fluid-fluid interface between water and air as well as the fluid-structure interfaces arising between solid devices and one or more fluids. As techniques are developed to stably and accurately balance the interactions between fluid and structural solvers at these boundaries, a similarly pressing challenge is the development of algorithms that are massively scalable and capable of performing large-scale three-dimensional simulations on reasonable time scales. This dissertation introduces two separate methods for approaching this problem, with the first focusing on the development of sophisticated fluid-fluid interface representations and the second focusing primarily on scalability and extensibility to higher-order methods.

We begin by introducing the narrow-band gradient-augmented level set method (GALSM) for incompressible multiphase Navier-Stokes flow. This is the first use of the high-order GALSM for a fluid flow application, and its reliability and accuracy in modeling ocean environments is tested extensively. The method demonstrates numerous advantages over the traditional level set method, among these a heightened conservation of fluid volume and the representation of subgrid structures.

Next, we present a finite-volume algorithm for solving the incompressible Euler equations in two and three dimensions in the presence of a flow-driven free surface and a dynamic rigid body. In this development, the chief concerns are efficiency, scalability, and extensibility (to higher-order and truly conservative methods). These priorities informed a number of important choices: the air phase is substituted by a pressure boundary condition in order to greatly reduce the size of the computational domain, a cut-cell finite-volume approach is chosen in order to minimize fluid volume loss and open the door to higher-order methods, and adaptive mesh refinement (AMR) is employed to focus computational effort and make large-scale 3D simulations possible. This algorithm is shown to produce robust and accurate results that are well-suited for the study of ocean waves and the development of wave energy conversion (WEC) devices. / Dissertation
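For orientation, the level-set representation underlying GALSM tracks the water-air interface as the zero contour of a function \(\phi\) advected by the flow; the gradient-augmented variant additionally evolves the gradient \(\psi = \nabla\phi\), which is what permits the subgrid resolution mentioned above. In standard, thesis-independent notation (with \(\nabla\mathbf{u}\) the velocity Jacobian):

\[
\frac{\partial \phi}{\partial t} + \mathbf{u} \cdot \nabla \phi = 0,
\qquad
\frac{\partial \psi}{\partial t} + (\mathbf{u} \cdot \nabla) \psi = -(\nabla \mathbf{u})^{T} \psi .
\]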
300

[en] SOLVING LARGE SYSTEMS OF LINEAR EQUATIONS ON MULTI-GPU CLUSTERS USING THE CONJUGATE GRADIENT METHOD IN OPENCL™ / [pt] RESOLUÇÃO DE SISTEMAS DE EQUAÇÕES LINEARES DE GRANDE PORTE EM CLUSTERS MULTI-GPU UTILIZANDO O MÉTODO DO GRADIENTE CONJUGADO EM OPENCL™

ANDRE LUIS CAVALCANTI BUENO 27 September 2013 (has links)
[en] The process of modeling problems in the engineering fields tends to produce substantially large systems of sparse linear equations. Extensive research has been done to devise methods to solve these systems. This thesis explores the computational potential of multiple GPUs, through the use of the OpenCL technology, aiming to tackle the solution of large systems of sparse linear equations. In the proposed methodology, the conjugate gradient method is subdivided into kernels, which are delegated to multiple GPUs. In order to achieve an efficient method, it was necessary to understand how the GPU architecture relates to the OpenCL technology.
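As a reference point for the method being parallelized (a plain host-side sketch, not the OpenCL multi-GPU implementation from the dissertation), the following C++ code applies the conjugate gradient method to a sparse symmetric positive-definite system stored in CSR format; in the dissertation's setting, the matrix-vector product, dot products, and vector updates would each become OpenCL kernels distributed across the GPUs.

```cpp
#include <cmath>
#include <vector>

// Sparse matrix-vector product y = A*x, with A in CSR format.
std::vector<double> spmv(const std::vector<int>& row_ptr,
                         const std::vector<int>& col_idx,
                         const std::vector<double>& val,
                         const std::vector<double>& x) {
    std::vector<double> y(x.size(), 0.0);
    for (size_t i = 0; i + 1 < row_ptr.size(); ++i)
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            y[i] += val[k] * x[col_idx[k]];
    return y;
}

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Unpreconditioned conjugate gradient for A x = b, A symmetric positive definite.
std::vector<double> conjugate_gradient(const std::vector<int>& row_ptr,
                                       const std::vector<int>& col_idx,
                                       const std::vector<double>& val,
                                       const std::vector<double>& b,
                                       int max_iter = 1000, double tol = 1e-10) {
    std::vector<double> x(b.size(), 0.0);
    std::vector<double> r = b;          // residual r = b - A*x, with x = 0
    std::vector<double> p = r;          // initial search direction
    double rs_old = dot(r, r);

    for (int it = 0; it < max_iter && std::sqrt(rs_old) > tol; ++it) {
        const std::vector<double> Ap = spmv(row_ptr, col_idx, val, p);
        const double alpha = rs_old / dot(p, Ap);
        for (size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];       // step along the search direction
            r[i] -= alpha * Ap[i];      // update the residual
        }
        const double rs_new = dot(r, r);
        const double beta = rs_new / rs_old;
        for (size_t i = 0; i < p.size(); ++i)
            p[i] = r[i] + beta * p[i];  // new conjugate search direction
        rs_old = rs_new;
    }
    return x;
}
```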
