Global ETD Search

11	Profiling of RT-PICLS Code. Kelling, Jeffrey; Juckeland, Guido. January 2017. According to Ganglia metrics, the RT-PICLS code run by FWKT on the hypnos cluster was producing an unusually high system load. Since this may point to an I/O problem in the code, the code was analyzed more closely. Keywords: high-performance computing, profiling.
12	Concepts and Prototype for a Collective Offload Unit. Schneider, Timo; Eckelmann, Sven. 18 June 2012. Optimized implementations of blocking and nonblocking collective operations are essential for scalable high-performance applications. Offloading such collective operations into the communication layer can improve performance and the asynchronous progression of the operations. However, such offloading schemes must remain flexible enough to support user-defined (sparse neighbor) collective communication. In this work we propose a design for a collective offload unit. Our hardware design can execute dependency-graph-based representations of collective functions. To cope with scarce memory resources, we designed a new point-to-point messaging protocol that does not need to store information about unexpected messages. The offload unit proposed in this thesis could be integrated into high-performance networks such as EXTOLL. Our design achieves a clock frequency of 212 MHz on a Xilinx Virtex-6 FPGA while using less than 10% of the available logic slices and less than 30% of the available memory blocks. Owing to its specialization, the design accelerates important tasks of the message-passing framework, such as message matching, by a factor of two compared to a software implementation running on a CPU with a ten times higher clock speed. Keywords: high-performance computing (HPC), networks, communication, communication offload, GOAL, EXTOLL; ddc:006.
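The abstract says the offload unit executes collective functions expressed as dependency graphs. As a rough software analogue, the sketch below starts each operation only once all of its predecessors have completed (Kahn's topological ordering). It assumes a hypothetical `Op` record holding a list of dependency names; it is not the thesis's GOAL interpreter or hardware design, and the three-step forwarding schedule is illustrative only.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Op:
    """One schedule entry (hypothetical encoding, not the GOAL binary format)."""
    name: str
    kind: str  # e.g. "send", "recv" or "calc"; informational only here
    deps: list = field(default_factory=list)  # names of ops that must complete first


def execute_schedule(ops):
    """Return an execution order in which every op runs after all of its dependencies.

    Raises ValueError if the dependency graph has a cycle (the schedule could never finish).
    """
    remaining = {op.name: len(op.deps) for op in ops}  # unmet dependencies per op
    dependents = {op.name: [] for op in ops}           # reverse edges
    for op in ops:
        for dep in op.deps:
            dependents[dep].append(op.name)

    # Ops with no unmet dependencies are ready; a real unit would issue them now.
    ready = deque(name for name, count in remaining.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)  # "execute" the op, then release its dependents
        for nxt in dependents[name]:
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(ops):
        raise ValueError("schedule contains a dependency cycle")
    return order


# Hypothetical schedule: receive from rank 0, then forward to ranks 2 and 3.
schedule = [
    Op("recv_from_0", "recv"),
    Op("send_to_2", "send", ["recv_from_0"]),
    Op("send_to_3", "send", ["recv_from_0"]),
]
print(execute_schedule(schedule))  # ['recv_from_0', 'send_to_2', 'send_to_3']
```

A ready queue of this kind loosely mirrors the idea of keeping only currently runnable operations active, which matters when memory for pending state is scarce.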
13	Strategies and Methods for Exploiting the High-Performance Resources of Modern Computer Architectures for Finite Element Simulations and Their Realization in FEAST (Finite Element Analysis & Solution Tools). Becker, Christian. January 2007. Simultaneously: doctoral dissertation, University of Dortmund, 2007.
14	Methods and Tools for Measuring, Analyzing, and Evaluating the Performance of the Input/Output Units of Computer Systems. Versick, Daniel. January 2009. Simultaneously: doctoral dissertation, University of Rostock, 2009.
15	SCARC: A Generalized Domain Decomposition Multigrid Concept on Parallel Computers. Kilian, Susanne. Unknown date. Doctoral dissertation, University of Dortmund, 2002. Printed edition published by Logos Verlag, Berlin.
16	Dynamic Load Balancing for Middleware-Based Software Systems in Heterogeneous Computing Environments. Krüger, Thomas Moritz. Unknown date. Doctoral dissertation, Technische Hochschule Aachen, 2004.
17	Concepts and Prototype for a Collective Offload Unit. Schneider, Timo; Eckelmann, Sven. 15 December 2011. Abstract identical to entry 12.
Contents:
1. Task Description: 1.1 Theses
2. Introduction: 2.1 Motivation; 2.2 Outline of this Thesis; 2.3 Related Work (2.3.1 NIC-Based Packet Forwarding; 2.3.2 Hardware Barrier Implementations; 2.3.3 ConnectX-2 CORE-Direct Collective Offload Support; 2.3.4 Collective Offload Support in the Portals 4 API); 2.4 Group Operation Assembly Language (2.4.1 GOAL API; 2.4.2 Scratchpad Buffer; 2.4.3 Schedule Execution); 2.5 The EXTOLL Network; 2.6 Field Programmable Gate Arrays
3. Dealing with Constrained Resources: 3.1 Hardware Limitations; 3.2 Common Collective Functions in GOAL; 3.3 Schedule Representation for the Hardware GOAL Interpreter; 3.4 Executing Large Schedules Using a Small Amount of Memory (3.4.1 Limits of Previously Suggested Approaches; 3.4.2 Testing for Deadlocks in Schedules; 3.4.3 Transforming Process-Local Schedules into Global Schedules; 3.4.4 Predetermined Buffer Locations); 3.5 Queueing Active Operations in Hardware; 3.6 Designing a Low-Memory-Footprint Point-to-Point Protocol (3.6.1 Arrival Times; 3.6.2 Eager Protocol; 3.6.3 Rendezvous Protocol; 3.6.4 A Protocol without an Unexpected Queue); 3.7 Protocol Verification (3.7.1 Capabilities of the Model Checker SPIN; 3.7.2 Modeling the Protocol; 3.7.3 Limitations of the Basic Protocol)
4. The Matching Problem: 4.1 Matching on the Host CPU; 4.2 Implementation Methodology; 4.3 Matching Unit Interface; 4.4 Matching Unit Implementation (4.4.1 Slot Management Unit; 4.4.2 The Input Consumer; 4.4.3 The Output Generator; 4.4.4 The Matching Unit); 4.5 Slot Management Unit for Non-synchronous Transfers
5. The GOAL Interpreter: 5.1 Schedule Interpreter Design (5.1.1 The Active Queue; 5.1.2 The Dependency Resolver); 5.2 Transceiver Interface; 5.3 The Starter (5.3.1 Starting Operations; 5.3.2 Processing Incoming Packets; 5.3.3 Incoming Non-synchronous Packets; 5.3.4 Presorting the Active Queue; 5.3.5 Arbitration Units; 5.3.6 IN-Filter; 5.3.7 Outcommand Manager; 5.3.8 Non-synchronous Protocol; 5.3.9 Send Protocol; 5.3.10 Receive Protocol; 5.3.11 Local Operations on FPGA)
6. Evaluation: 6.1 Performance Analysis; 6.2 Future Work; 6.3 Conclusions
Bibliography
Keywords: HPC, communication offload, GOAL, EXTOLL; ddc:006.
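Chapter 4 (The Matching Problem) concerns pairing incoming messages with posted receives, the task the thesis accelerates in hardware. A minimal software version of that rule might look like the sketch below, assuming MPI-style (source, tag) envelopes with wildcards and a hypothetical `PostedRecv` record. It scans posted receives in posting order and returns the first match, or None where a conventional library would queue the message as unexpected; it is not the thesis's matching unit.

```python
from dataclasses import dataclass
from typing import List, Optional

ANY = None  # wildcard, playing the role of MPI_ANY_SOURCE / MPI_ANY_TAG


@dataclass
class PostedRecv:
    source: Optional[int]  # ANY matches every source rank
    tag: Optional[int]     # ANY matches every tag
    buffer_id: int         # hypothetical handle for the destination buffer


def matches(recv: PostedRecv, src: int, tag: int) -> bool:
    """True if an incoming (src, tag) envelope satisfies this posted receive."""
    return (recv.source is ANY or recv.source == src) and (recv.tag is ANY or recv.tag == tag)


def match_incoming(posted: List[PostedRecv], src: int, tag: int) -> Optional[PostedRecv]:
    """Remove and return the earliest-posted receive matching (src, tag).

    Returns None if nothing matches; a conventional MPI library would then
    store the message on an unexpected-message queue.
    """
    for i, recv in enumerate(posted):
        if matches(recv, src, tag):
            return posted.pop(i)
    return None


posted = [PostedRecv(3, 7, 0), PostedRecv(ANY, 7, 1), PostedRecv(ANY, ANY, 2)]
print(match_incoming(posted, 5, 7).buffer_id)  # 1 (source wildcard)
print(match_incoming(posted, 3, 7).buffer_id)  # 0 (exact match)
print(match_incoming(posted, 3, 9).buffer_id)  # 2 (full wildcard)
print(match_incoming(posted, 3, 9))            # None (would be unexpected)
```

The linear scan is what makes matching costly on a CPU as queues grow, which is one reason to move it into dedicated hardware.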
18	Reactive transport modeling at hillslope scale with high performance computing methods. He, Wenkui. 07 December 2016. Reactive transport modeling is an important approach for understanding water dynamics, mass transport, and biogeochemical processes from the hillslope to the catchment scale. It has a wide range of applications in fields such as water resource management, contaminated site remediation, and geotechnical engineering. Simulating reactive transport processes at the hillslope scale or larger is challenging: it involves interactions of complex physical and biogeochemical processes, huge computational expense, and difficulties with numerical precision and stability. The primary goal of this work is to develop a practical, accurate, and efficient tool that advances simulation techniques for reactive transport problems at the hillslope scale and beyond. The first part of the work deals with the simulation of water flow in saturated and unsaturated porous media; the capability and accuracy of different numerical approaches were analyzed and compared using benchmark tests. The second part introduces the coupling of the scientific software packages OpenGeoSys and IPhreeqc through a character-string-based interface. The accuracy and computational efficiency of the coupled tool were discussed based on three benchmarks. The results show that OGS#IPhreeqc provides sufficient numerical accuracy to simulate reactive transport problems involving both equilibrium and kinetic reactions in variably saturated porous media. The third part describes a parallelization algorithm based on the MPI (Message Passing Interface) grouping concept, which enables flexible allocation of computational resources between the geochemical reactions and the physical processes, such as groundwater flow and transport. The parallel performance of the approach was tested on three examples.
The results show that the new approach has advantages over conventional ones for geochemically dominated problems, especially when parallelization yields only limited benefit for solving flow or solute transport. The comparison between character-string-based and file-based coupling shows that the former produces less computational overhead on a distributed-memory system such as a computing cluster. The last part of the work applies OGS#IPhreeqc to simulate water dynamics and denitrification in the groundwater aquifer of a study site in Northern Germany. It demonstrates that OGS#IPhreeqc can simulate heterogeneous reactive transport problems at the hillslope scale within an acceptable time span. The model results show the importance of functional zones for natural attenuation processes. Keywords: high-performance computing, reactive transport modeling; ddc:550; rvk:ZI 6716.
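The abstract describes using the MPI grouping concept to allocate ranks separately to geochemical reactions and to flow/transport. The bookkeeping behind such a split can be sketched as below, under stated assumptions: the first `n_transport` ranks form the flow/transport group, the rest handle chemistry, and chemistry work is divided per mesh node in contiguous blocks. Both helpers are hypothetical and this is not the OGS#IPhreeqc implementation; in an actual MPI program each rank would pass its color to `MPI_Comm_split` to obtain a communicator for its group.

```python
def assign_groups(world_size, n_transport):
    """Return the group color of each rank: 0 = flow/transport, 1 = geochemistry.

    Assumption for illustration: the first n_transport ranks form the transport group.
    In MPI, each rank would pass its color to MPI_Comm_split.
    """
    if not 0 < n_transport < world_size:
        raise ValueError("each group needs at least one rank")
    return [0 if rank < n_transport else 1 for rank in range(world_size)]


def block_partition(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_parts)
    blocks, start = [], 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


# 8 ranks: 2 for flow/transport, 6 for chemistry over a hypothetical 20-node mesh.
colors = assign_groups(8, 2)
chem_blocks = block_partition(20, colors.count(1))
print(colors)                          # [0, 0, 1, 1, 1, 1, 1, 1]
print([len(b) for b in chem_blocks])   # [4, 4, 3, 3, 3, 3]
```

Changing `n_transport` shifts ranks between the two groups without touching the rest of the code, which is the kind of flexibility the abstract attributes to the grouping approach for chemistry-dominated problems.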
19	Enhancing an InfiniBand driver by utilizing an efficient malloc/free library supporting multiple page sizes. Rex, Robert. 23 October 2006. Even with high-speed interconnects such as InfiniBand, the communication overhead of parallel applications, especially in High-Performance Computing (HPC), is still high. Using large page frames (called hugepages in Linux) can speed up the crucial step of registering communication buffers with the network adapter. To exploit this, an InfiniBand driver was modified. Hugepages not only reduce communication costs but can also noticeably improve computation time, e.g., through fewer TLB misses. To avoid the effort of rewriting applications, a preload library was implemented that uses large page frames transparently. This work also presents benchmark results with these components, showing performance improvements of up to 10%. Keywords: HPC, hugepages, InfiniBand, cluster (computer network), high-performance computing, Linux, computer network; ddc:000.
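Why larger pages help can be seen from simple arithmetic: registering a buffer roughly means pinning it and recording one address translation per page, and each TLB entry likewise maps one page. Assuming a page-aligned buffer, the common x86-64 base page size of 4 KiB, and 2 MiB hugepages, the illustrative helper below counts the pages involved for a 64 MiB buffer.

```python
KIB = 1024
MIB = 1024 * KIB


def pages_needed(buffer_bytes, page_bytes):
    """Pages (hence translation entries) covering a page-aligned buffer of buffer_bytes."""
    return -(-buffer_bytes // page_bytes)  # ceiling division


buffer = 64 * MIB
small = pages_needed(buffer, 4 * KIB)   # 4 KiB base pages
huge = pages_needed(buffer, 2 * MIB)    # 2 MiB hugepages
print(small, huge, small // huge)       # 16384 32 512
```

With hugepages the adapter and the TLB each track 512 times fewer entries for the same buffer, which is the intuition behind both the registration and the computation-time gains described above; actual speedups depend on the hardware and workload.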
20	Analysis and Optimization of the Packet Scheduler in Open MPI. Lichei, Andre. 13 November 2006. We compare well-known measurement methods for LogGP parameters and discuss their accuracy and network contention. Based on this, we derive and explain in detail a new, theoretically exact measurement method that does not saturate the network. We show the applicability of our method for the low-level communication API of Open MPI across several interconnection networks. Based on the LogGP model, we developed a low-overhead packet scheduling algorithm. It can handle different types of interconnects with different characteristics and produces schedules that are very close to optimal for both small and large messages. The efficiency of the algorithm for small messages is shown for an Open MPI implementation. The implementation uses the LogGP benchmark to obtain the LogGP parameters of the available interconnects and can thus adapt to any given system. Keywords: LogGP, Modular Component Architecture, Open MPI, high-performance computing, computer science, parallel computers, scheduling; ddc:004.
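In the LogGP model, a single k-byte message costs o + (k-1)G + L + o: send overhead, per-byte gap, latency, and receive overhead. The sketch below uses that cost to split one message across several interconnects with a simple greedy heuristic (each fixed-size chunk goes to the interconnect that would finish earliest). It is an illustration under simplifying assumptions, namely independent interconnects with no shared CPU overhead and made-up parameter values in microseconds, not the scheduling algorithm developed in the thesis.

```python
from dataclasses import dataclass


@dataclass
class LogGP:
    L: float  # network latency (us)
    o: float  # per-message CPU overhead on each side (us)
    G: float  # gap per byte (us/byte), i.e. inverse bandwidth


def msg_time(p, k):
    """LogGP cost of one k-byte message: o + (k-1)G + L + o; zero if nothing is sent."""
    return 0.0 if k == 0 else 2 * p.o + p.L + (k - 1) * p.G


def split_message(nets, total, chunk):
    """Greedily assign chunk-sized pieces to the interconnect that would finish earliest.

    Assumes each interconnect carries its share as one message and that the
    interconnects do not compete for the CPU.
    """
    shares = [0] * len(nets)
    left = total
    while left > 0:
        piece = min(chunk, left)
        best = min(range(len(nets)), key=lambda i: msg_time(nets[i], shares[i] + piece))
        shares[best] += piece
        left -= piece
    return shares


# Made-up parameters: A has lower latency, B has twice the bandwidth.
A = LogGP(L=2.0, o=1.0, G=0.002)
B = LogGP(L=10.0, o=1.0, G=0.001)
print(split_message([A, B], 64, 64))                  # [64, 0]
print(split_message([A, B], 1_000_000, 100_000))      # [300000, 700000]
```

Small messages stay on the low-latency link, while large ones are spread so that both links finish at roughly the same time, one simple way of handling interconnects with different characteristics.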
