Global ETD Search

221	Asynchronous Task-Based Parallelism in Seismic Imaging and Reservoir Modeling Simulations AlOnazi, Amani 26 August 2019 The components of high-performance systems continue to become more complex on the road to exascale. This complexity is exposed at the level of multi/many-core CPUs, accelerators (GPUs), interconnects (horizontal communication), and memory hierarchies (vertical communication). A crucial task is designing an algorithm and a programming model that scale to the same order as the HPC system size at multiple levels. This trend in HPC architecture affects memory-intensive applications more critically than compute-bound applications. Accomplishing this task involves adopting less synchronous forms of the mathematical algorithm, reducing synchronization in the computational implementation, introducing more SIMT-style concurrency at the finest level of the system hierarchy, and increasing arithmetic intensity as the bottleneck shifts from the number of floating-point operations to the number of memory accesses. This dissertation addresses these challenges in scientific simulation, focusing on the dominant kernels of memory-bound applications: sparse solvers in implicit modeling, and I/O in explicit reverse time migration in seismic imaging. We introduce asynchronous task-based parallelism into iterative algebraic preconditioners. We also introduce a task-based framework that hides the latency of I/O with computation. This dissertation targets two main applications in the oil and gas industry: reservoir simulation and seismic imaging simulation. It presents results on multi- and many-core systems and GPUs on four Top500 supercomputers: Summit, TSUBAME 3.0, Shaheen II, and Makman-2. We introduce asynchronous implementations of four major memory-bound kernels: Algebraic multigrid (MPI+OmpSs), tridiagonal solve (MPI+OpenMP), Additive Schwarz Preconditioned Inexact Newton (MPI+MPI), and Reverse Time Migration (StarPU/StarPU+MPI and CUDA).
Asynchronous Algorithms Task-based runtimes MPI+X approach Task-based RTM Asynchronous AMG
222	Analýza technologií pro distribuci výpočtu při lámání hesel / Analysis of Distributed Computing Technologies for Password Cracking Mráz, Patrik January 2019 The goal of this thesis is to analyze technologies for distributed computing in password cracking. Distribution is a key factor in the total time needed to crack a password, which can sometimes reach tens of years. The introductory section surveys password cracking in general, the types of attacks, and the most popular tools. Next we address GPU parallelization as well as the need for distributed computing across multiple computers. We look at several technologies, such as VirtualCL, BOINC, and MPI, and analyze their usability in password cracking. We examine each technology's performance, efficiency, scalability, and adaptability under pre-defined conditions. Part of this thesis is the design and implementation of distributed password cracking using MPI together with Hashcat, the self-proclaimed world's fastest password cracker.
223	Neblokující vstup/výstup pro projekt k-Wave / Non-Blocking Input/Output for the k-Wave Toolbox Kondula, Václav January 2020 This thesis deals with the implementation of a non-blocking I/O interface for the k-Wave project, which is designed for time-domain simulation of ultrasound propagation. The main focus is on large-domain simulations that, due to their high computing power requirements, must run on supercomputers and produce tens of GB of data in a single simulation step. In this thesis, I have designed and implemented a non-blocking interface for storing data using dedicated threads, which allows simulation calculations to overlap with disk operations in order to speed up the simulation. An acceleration of up to 33% was achieved compared to the current implementation of the k-Wave project, which, among other things, also reduces the cost of the simulation.
224	Instalace a konfigurace Octave výpočetního clusteru / Installation and configuration of Octave computation cluster Mikulka, Zdeněk January 2014 This diploma thesis contains a detailed design of a high-performance cluster, primarily focused on parallel computing in the Octave application. Each component of this cluster is described along with instructions for installation and configuration. The cluster is based on the GNU/Linux operating system and the Message Passing Interface. The design allows the cluster to be implemented on the computers of a classroom that is in active use for lessons.
225	Nutzung von MPI für parallele FEM-Systeme / Use of MPI for Parallel FEM Systems Grabowsky, L., Ermer, Th., Werner, J. 30 October 1998 The Message Passing Interface (MPI) standard gives developers of parallel applications a powerful tool for designing their software efficiently and largely independently of the details of the parallel system. As part of a project, the communication library of an existing FEM program was ported to the MPI mechanism. The results are summarized in the description of the Cubecom implementation given here. The second part of this work examines how the functionality available in MPI can also be used to carry out coupling-boundary communication with a uniform and efficient method. The efficiency of both the base implementation and the MPI-based coupling-boundary communication is examined, and an outlook on further possible applications is given. info:eu-repo/classification/ddc/004 ddc:004 MPI FEM MSC 65Y05 MSC 65N30
226	Erfahrungen bei der Installation und vergleichende Messungen zu verschiedenen MPI Implementierungen auf einem Dual Xeon Cluster / Installation Experiences and Comparative Measurements of Different MPI Implementations on a Dual Xeon Cluster Trautmann, Sven 02 July 2003 Workshop Mensch-Computer-Vernetzung info:eu-repo/classification/ddc/004 ddc:004 Cluster <computer network> MPI <interface> SMP
227	Developing a VIA-RPI for LAM Engler, Ralph, Wenzel, Tobias 30 January 2004 Development of an RPI (Request Progression Interface, a communication device) that uses VIA (Virtual Interface Architecture) instead of TCP on Ethernet networks. info:eu-repo/classification/ddc/004 ddc:004 MPI LAM RPI VIA
228	Erweiterung eines existierenden Infiniband Benchmarks / Extension of an Existing InfiniBand Benchmark Viertel, Carsten 01 June 2006 InfiniBand is increasingly used as the interconnect for clusters. This makes it necessary to adapt existing libraries for parallel programming languages to the new network as well as possible. Collective operations are an important component of parallel programming languages; they require sending a message from one node to many others, or from many nodes to a single one. To find out which connection types and operations are best suited to these collective operations, a benchmark was developed. The goal of this student research project is to extend this program, test it on a cluster, and evaluate the results. info:eu-repo/classification/ddc/004 ddc:004 Benchmark Cluster MPI InfiniBand Collective Operations Multicast
229	Erweiterung der Infinibandunterstützung von netgauge / Extension of the InfiniBand Support of netgauge Dietze, Stefan 25 February 2009 This work deals with extending the InfiniBand module of netgauge. The non-blocking communication functions are added to the module. The implementation of these functions and the algorithms used are discussed. Furthermore, the measurement results are evaluated and compared with those of the blocking functions, considering the bandwidth and latency of 1:1 communication. The measurements were carried out on both InfiniPath and Mellanox hardware. info:eu-repo/classification/ddc/000 ddc:000 Computer-mediated communication Latency <computer science> MPI Network Parallel processing
230	A Performance Evaluation of MPI Shared Memory Programming / En utvärdering av MPI shared memory - programmering med inriktning på prestanda Karlbom, David January 2016 The thesis investigates the Message Passing Interface (MPI) support for shared memory programming on modern hardware architectures with multiple Non-Uniform Memory Access (NUMA) domains. We investigate its performance in two case studies: matrix-matrix multiplication and Conway's Game of Life. We compare MPI shared memory performance, in terms of execution time and memory consumption, with the performance of implementations using OpenMP and MPI point-to-point communication, also called "MPI two-sided". We perform strong scaling tests in both case studies. We observe that the MPI two-sided implementation is 21% and 18% faster than the MPI shared and OpenMP implementations respectively in the matrix-matrix multiplication when using 32 processes. MPI shared uses less memory: compared to MPI two-sided, it uses 45% less. In Conway's Game of Life, we find that the MPI two-sided implementation is 10% and 82% faster than the MPI shared and OpenMP implementations respectively when using 32 processes. We also observe that not mapping virtual memory to a specific NUMA domain can lead to an increase in execution time of 64% when using 32 processes. The use of MPI shared is viable for intranode communication on modern hardware architectures with multiple NUMA domains.
MPI OpenMP Shared Memory HPC Parallel Programming NUMA Computer Sciences
