31

Erstellung einer einheitlichen Taxonomie für die Programmiermodelle der parallelen Programmierung / Development of a Unified Taxonomy for Parallel Programming Models

Nestmann, Markus 02 May 2017 (has links) (PDF)
Parallel programming makes it possible for programs to run concurrently on multiple CPU cores or CPUs. To make parallel programming easier, various languages (e.g. Erlang) and libraries (e.g. OpenMP) have been developed on top of parallel programming models (e.g. the Parallel Random Access Machine). If, for instance, a software architect has to choose a programming model for a project, several important criteria (e.g. dependencies on the hardware) must be taken into account. Overviews that distinguish and order the programming models according to these criteria make this search easier. Looking at existing overviews, however, one finds differences in the classification, the terminology used, and the programming models covered. This thesis addresses this deficit by first collecting and analyzing the existing taxonomies in a systematic literature review. Building on this, a unified taxonomy is constructed. With this taxonomy, an overview of the parallel programming models can be compiled; the overview is additionally enriched with information on each programming model's dependencies on the hardware architecture. A software architect (or project manager, software developer, ...) can thus make an informed decision and is not forced to analyze every programming model individually.
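The abstract does not spell out the taxonomy's dimensions, but an overview of this kind is essentially a small classification table. Purely as a hypothetical illustration (the dimensions, values, and example rows below are assumptions, not the taxonomy actually derived in the thesis), such an overview could be encoded and filtered like this:

```c
#include <stdio.h>

/* Hypothetical classification dimensions -- illustrative only,
 * not the taxonomy developed in the thesis. */
typedef enum { MEM_SHARED, MEM_DISTRIBUTED, MEM_HYBRID } memory_model_t;
typedef enum { PAR_DATA, PAR_TASK } parallelism_t;
typedef enum { HW_INDEPENDENT, HW_CPU, HW_GPU } hw_dependency_t;

typedef struct {
    const char      *name;        /* programming model / implementation          */
    memory_model_t   memory;
    parallelism_t    parallelism;
    hw_dependency_t  hardware;    /* dependency on the hardware architecture      */
} model_entry_t;

int main(void) {
    /* example rows a software architect might filter on */
    model_entry_t overview[] = {
        { "OpenMP", MEM_SHARED,      PAR_TASK, HW_CPU         },
        { "MPI",    MEM_DISTRIBUTED, PAR_TASK, HW_INDEPENDENT },
        { "CUDA",   MEM_SHARED,      PAR_DATA, HW_GPU         },
    };
    int n = sizeof overview / sizeof overview[0];

    /* filter: only models without a dependency on specific hardware */
    for (int i = 0; i < n; ++i)
        if (overview[i].hardware == HW_INDEPENDENT)
            printf("candidate: %s\n", overview[i].name);
    return 0;
}
```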
32

Parallele Genetische Algorithmen mit Anwendungen / Parallel Genetic Algorithms with Applications

Riedel, Marion 18 November 2002 (has links) (PDF)
The diploma thesis "Parallel Genetic Algorithms with Applications" deals with the parallelization of Genetic Algorithms for the creation of efficient optimization methods, especially for simulation-based application problems. First, an introduction to Genetic Algorithms is given, together with an overview of possible parallelization approaches and of results already published in the literature. This is followed by a detailed explanation of the conception and realization of the author's own Parallel Genetic Algorithms. The thesis is rounded off by a thorough description of the results of extensive test runs on the Chemnitzer Linux-Cluster (CLiC).
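The abstract does not name a specific parallelization scheme; a common choice for cluster hardware such as the CLiC is the island (coarse-grained) model, in which each process evolves its own subpopulation and periodically exchanges good individuals with a neighbour. The following MPI sketch of that model is an assumption for illustration, not the algorithms developed in the thesis (the toy objective, population size, and migration interval are made up):

```c
/* Island-model parallel GA sketch: each MPI rank evolves a local
 * population on a toy objective (minimize the sum of squares) and every
 * MIGRATE generations sends its best individual to the next rank in a ring. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define DIM 8
#define POP 32
#define GENS 200
#define MIGRATE 20

static double fitness(const double *x) {           /* lower is better */
    double s = 0.0;
    for (int i = 0; i < DIM; ++i) s += x[i] * x[i];
    return s;
}

static double frand(void) { return 2.0 * rand() / RAND_MAX - 1.0; }

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    srand(1234u + (unsigned)rank);                  /* different islands */

    double pop[POP][DIM];
    for (int p = 0; p < POP; ++p)
        for (int i = 0; i < DIM; ++i) pop[p][i] = frand();

    for (int g = 1; g <= GENS; ++g) {
        /* steady-state GA step: binary tournament selection, uniform
         * crossover, mutation, replace the worst individual if improved */
        int a = rand() % POP, b = rand() % POP;
        int p1 = fitness(pop[a]) < fitness(pop[b]) ? a : b;
        a = rand() % POP; b = rand() % POP;
        int p2 = fitness(pop[a]) < fitness(pop[b]) ? a : b;
        double child[DIM];
        for (int i = 0; i < DIM; ++i) {
            child[i] = (rand() % 2) ? pop[p1][i] : pop[p2][i]; /* crossover */
            if (rand() % 10 == 0) child[i] += 0.1 * frand();   /* mutation  */
        }
        int worst = 0;
        for (int q = 1; q < POP; ++q)
            if (fitness(pop[q]) > fitness(pop[worst])) worst = q;
        if (fitness(child) < fitness(pop[worst]))
            for (int i = 0; i < DIM; ++i) pop[worst][i] = child[i];

        if (g % MIGRATE == 0 && size > 1) {
            /* migration: exchange the local best along a ring of islands */
            int best = 0;
            for (int q = 1; q < POP; ++q)
                if (fitness(pop[q]) < fitness(pop[best])) best = q;
            double incoming[DIM];
            int to = (rank + 1) % size, from = (rank - 1 + size) % size;
            MPI_Sendrecv(pop[best], DIM, MPI_DOUBLE, to, 0,
                         incoming, DIM, MPI_DOUBLE, from, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            /* the immigrant replaces a random non-best individual */
            int victim = (best + 1 + rand() % (POP - 1)) % POP;
            for (int i = 0; i < DIM; ++i) pop[victim][i] = incoming[i];
        }
    }

    int best = 0;
    for (int q = 1; q < POP; ++q)
        if (fitness(pop[q]) < fitness(pop[best])) best = q;
    printf("rank %d: best fitness %g\n", rank, fitness(pop[best]));
    MPI_Finalize();
    return 0;
}
```

The ring topology keeps migration cheap; denser migration graphs or finer-grained (cellular) parallel GAs are equally possible variants of the same idea.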
33

Bibliotheken zur Entwicklung paralleler Algorithmen - Basisroutinen für Kommunikation und Grafik / Libraries for the Development of Parallel Algorithms - Basic Routines for Communication and Graphics

Pester, Matthias 04 April 2006 (has links) (PDF)
The purpose of this paper is to supply a summary of library subroutines and functions for parallel MIMD computers. The subroutines have been developed and continuously extended at the University of Chemnitz since the end of the eighties. In detail, they are concerned with vector operations, inter-processor communication, and simple graphical output to workstations. One of the most valuable features is the machine independence of the communication subroutines proposed in this paper for a hypercube topology of the parallel processors (except for a kernel of only two primitive system-dependent operations). They were implemented and tested on different hardware and operating systems, including PARIX for transputers and PowerPC, nCube, PVM, and MPI. The vector subroutines are optimized by the use of the C language and unrolled loops (BLAS1-like); hardware-optimized BLAS1 routines may be integrated. The paper includes hints for programmers on how to use the libraries with both Fortran and C programs.
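The library's actual calling conventions are not given in the abstract. As a hedged sketch of the kind of hypercube communication it provides, the classic dimension-exchange global sum looks as follows; MPI_Sendrecv stands in here for the library's two primitive send/receive operations:

```c
/* Dimension-exchange global sum on a hypercube of 2^d processes.
 * Illustrative only: the Chemnitz library's interface differs, and
 * MPI_Sendrecv replaces its primitive system-dependent kernel.
 * Run with a power-of-two number of ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)rank, recv;
    /* In dimension k, every process exchanges with the partner whose
     * rank differs in bit k, then accumulates -- log2(p) steps in total. */
    for (int bit = 1; bit < size; bit <<= 1) {
        int partner = rank ^ bit;
        MPI_Sendrecv(&local, 1, MPI_DOUBLE, partner, 0,
                     &recv,  1, MPI_DOUBLE, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        local += recv;
    }
    printf("rank %d: global sum = %g\n", rank, local);
    MPI_Finalize();
    return 0;
}
```

After log2(p) exchanges every process holds the same global result, which is exactly the property that makes the hypercube pattern attractive for small reduction and broadcast routines.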
34

Visualization Tools for 2D and 3D Finite Element Programs - User's Manual

Pester, Matthias 04 April 2006 (has links) (PDF)
This paper deals with the visualization of numerical results as a very convenient way to understand and evaluate a solution that has been computed as a set of millions of numerical values. One of the central research fields of the Chemnitz SFB 393 is the analysis of parallel numerical algorithms for large systems of linear equations arising from differential equations (e.g. in solid and fluid mechanics). When large problems are solved on massively parallel computers, it becomes increasingly impractical to write the numerical data from the distributed memory of the parallel computer to disk for later postprocessing. The algorithm developer, however, is interested in an on-line response from the running program. Both its visual and its numerical response can be evaluated by the user in order to decide how to interactively switch or adjust parameters that influence the solution process. The paper gives a survey of the current programmer and user interfaces that are used in our various 2D and 3D parallel finite element programs for the visualization of the solution.
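The actual graphics interfaces of the Chemnitz libraries are not reproduced in the abstract. A minimal sketch of one building block that such solution visualization rests on, mapping a nodal value to a colour on a simple blue-to-red scale, could look like this (the colour ramp and value range are assumptions, not the library's API):

```c
/* Map a scalar nodal value to an RGB colour on a blue-to-red scale --
 * a minimal building block of solution visualization, not the actual
 * interface of the Chemnitz visualization libraries. */
#include <stdio.h>

typedef struct { unsigned char r, g, b; } rgb;

/* linear blue (minimum) -> green (middle) -> red (maximum) ramp */
static rgb value_to_colour(double v, double vmin, double vmax) {
    double t = (v - vmin) / (vmax - vmin);
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    rgb c;
    if (t < 0.5) {                       /* blue -> green */
        c.r = 0;
        c.g = (unsigned char)(255.0 * (2.0 * t));
        c.b = (unsigned char)(255.0 * (1.0 - 2.0 * t));
    } else {                             /* green -> red */
        c.r = (unsigned char)(255.0 * (2.0 * t - 1.0));
        c.g = (unsigned char)(255.0 * (2.0 - 2.0 * t));
        c.b = 0;
    }
    return c;
}

int main(void) {
    /* example: colour a few nodal values between 0 and 100 */
    double values[] = { 0.0, 25.0, 50.0, 75.0, 100.0 };
    for (int i = 0; i < 5; ++i) {
        rgb c = value_to_colour(values[i], 0.0, 100.0);
        printf("value %6.1f -> rgb(%3u, %3u, %3u)\n",
               values[i], c.r, c.g, c.b);
    }
    return 0;
}
```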
35

Distributed Occlusion Culling for Realtime Visualization

Domaratius, Uwe 14 March 2007 (has links) (PDF)
This thesis describes the development of a distributed occlusion culling solution for complex generic scenes. Moving these calculations onto a second computer should decrease the load on the actual rendering system and therefore allow higher frame rates. The work includes an introduction to parallel rendering systems and a discussion of suitable culling algorithms. Based on these parts, a client-server system for occlusion culling is developed. The test results of a prototypical implementation form the last part of this thesis.
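The culling algorithm chosen in the thesis is not named in the abstract. One simple way such a server-side test can work is to rasterize a few large occluders into a coarse software depth buffer and then test each object's screen-space bounding rectangle against it, returning only the IDs of potentially visible objects. The sketch below assumes the screen-space rectangles and depths have already been computed; it is an illustrative stand-in, not the thesis's algorithm:

```c
/* Coarse software occlusion test: occluder rectangles are rasterized
 * into a small depth buffer; an object counts as potentially visible
 * if any pixel of its bounding rectangle is closer than the stored
 * occluder depth. Projection to screen space is assumed done. */
#include <stdio.h>

#define W 64
#define H 64

typedef struct { int x0, y0, x1, y1; float depth; int id; } rect;

static float zbuf[W * H];

static void clear_zbuf(void) {
    for (int i = 0; i < W * H; ++i) zbuf[i] = 1.0f;      /* far plane */
}

static void rasterize_occluder(const rect *r) {
    for (int y = r->y0; y <= r->y1; ++y)
        for (int x = r->x0; x <= r->x1; ++x)
            if (r->depth < zbuf[y * W + x]) zbuf[y * W + x] = r->depth;
}

static int is_visible(const rect *r) {
    for (int y = r->y0; y <= r->y1; ++y)
        for (int x = r->x0; x <= r->x1; ++x)
            if (r->depth < zbuf[y * W + x]) return 1;    /* not fully hidden */
    return 0;
}

int main(void) {
    clear_zbuf();

    /* one large occluder in the middle of the screen at depth 0.3 */
    rect occluder = { 8, 8, 55, 55, 0.3f, -1 };
    rasterize_occluder(&occluder);

    /* candidate objects with precomputed screen rectangles and depths */
    rect objects[] = {
        { 20, 20, 30, 30, 0.8f, 1 },   /* behind the occluder: culled   */
        { 20, 20, 30, 30, 0.1f, 2 },   /* in front of it: visible       */
        {  0,  0,  5,  5, 0.9f, 3 },   /* outside the occluder: visible */
    };
    for (int i = 0; i < 3; ++i)
        if (is_visible(&objects[i]))
            printf("object %d is potentially visible\n", objects[i].id);
    return 0;
}
```

In a client-server setup, the list of potentially visible IDs is what the culling server would send back to the rendering client each frame.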
36

Balanced Truncation Model Reduction of Large and Sparse Generalized Linear Systems

Badía, José M., Benner, Peter, Mayo, Rafael, Quintana-Ortí, Enrique S., Quintana-Ortí, Gregorio, Remón, Alfredo 26 November 2007 (has links) (PDF)
We investigate model reduction of large-scale linear time-invariant systems in generalized state-space form. We consider sparse state matrix pencils, including pencils with banded structure. The balancing-based methods employed here are composed of well-known linear algebra operations and have been recently shown to be applicable to large models by exploiting the structure of the matrices defining the dynamics of the system. In this paper we propose a modification of the LR-ADI iteration to solve large-scale generalized Lyapunov equations together with a practical convergence criterion, and several other implementation refinements. Using kernels from several serial and parallel linear algebra libraries, we have developed a parallel package for model reduction, SpaRed, extending the applicability of balanced truncation to sparse systems with up to $O(10^5)$ states. Experiments on an SMP parallel architecture consisting of Intel Itanium 2 processors illustrate the numerical performance of this approach and the potential of the parallel algorithms for model reduction of large-scale sparse systems.
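For context, the Gramians used in balanced truncation of a generalized system $E\dot{x} = Ax + Bu$, $y = Cx$ solve two generalized Lyapunov equations, and the LR-ADI iteration approximates their solutions by low-rank factors. The block below shows one common textbook form of the generalized low-rank ADI recurrence, written for real negative shifts $p_i$; the paper proposes a modification of this scheme, whose exact form is not reproduced here.

```latex
% Generalized Lyapunov equations for the controllability and
% observability Gramians P and Q of E\dot{x} = Ax + Bu, y = Cx:
\begin{align}
  A P E^{T} + E P A^{T} + B B^{T} &= 0, &
  A^{T} Q E + E^{T} Q A + C^{T} C &= 0.
\end{align}
% One common form of the generalized LR-ADI iteration producing a
% low-rank factor Z_m with P \approx Z_m Z_m^T (real shifts p_i < 0):
\begin{align}
  V_1 &= \sqrt{-2 p_1}\,(A + p_1 E)^{-1} B, \\
  V_i &= \sqrt{p_i / p_{i-1}}\,\bigl[\, V_{i-1}
         - (p_i + p_{i-1})\,(A + p_i E)^{-1} E\, V_{i-1} \,\bigr], \\
  Z_m &= [\, V_1, V_2, \ldots, V_m \,].
\end{align}
```

Each step requires only a sparse (or banded) solve with $A + p_i E$, which is what makes the approach feasible for systems with $O(10^5)$ states.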
37

Erweiterung der Infinibandunterstützung von netgauge / Extending the InfiniBand Support of netgauge

Dietze, Stefan 25 February 2009 (has links) (PDF)
This thesis deals with extending the InfiniBand module of netgauge. Non-blocking communication functions are added to the module, and the implementation of these functions and the algorithms used are described. Furthermore, the measurement results are evaluated and compared with the results of the blocking functions, considering the bandwidth and latency of 1:1 communication. The measurements were carried out on both InfiniPath and Mellanox hardware.
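netgauge's actual InfiniBand module works directly on the verbs layer and its interface is not shown in the abstract. Purely as an illustration of the 1:1 measurement principle, a non-blocking ping-pong written against MPI could look like this (message size, repetition count, and the use of MPI instead of native InfiniBand verbs are assumptions):

```c
/* Non-blocking 1:1 ping-pong between ranks 0 and 1. Rank 0 measures the
 * round-trip time and derives one-way latency and bandwidth. Illustration
 * only -- netgauge's real InfiniBand module does not use MPI. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps  = 1000;
    const int bytes = 1 << 20;                 /* assumed message size: 1 MiB */
    char *sbuf = malloc(bytes), *rbuf = malloc(bytes);
    MPI_Request req[2];

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; ++i) {
        if (rank == 0) {
            /* post the receive for the reply, send the ping, wait for both */
            MPI_Irecv(rbuf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req[0]);
            MPI_Isend(sbuf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req[1]);
            MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        } else if (rank == 1) {
            /* receive the ping, then send the pong back */
            MPI_Irecv(rbuf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req[0]);
            MPI_Wait(&req[0], MPI_STATUS_IGNORE);
            MPI_Isend(sbuf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req[1]);
            MPI_Wait(&req[1], MPI_STATUS_IGNORE);
        }
    }
    double dt = MPI_Wtime() - t0;

    if (rank == 0) {
        double rtt = dt / reps;
        printf("one-way time: %.3f us\n", 0.5 * rtt * 1e6);
        printf("bandwidth:    %.2f MB/s\n", bytes / (0.5 * rtt) / 1e6);
    }
    free(sbuf); free(rbuf);
    MPI_Finalize();
    return 0;
}
```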
38

Verwendung von Grafikprozessoren zur Simulation von Diffusionsprozessen mit zufälligen Sierpiński-Teppichen / Using Graphics Processors to Simulate Diffusion Processes on Random Sierpiński Carpets

Lang, Jens 20 May 2009 (has links) (PDF)
This thesis investigates an algorithm for random-walk simulations on fractal structures, intended for the simulation of diffusion in porous materials. Specifically, the master-equation approach for simulating a random walk on Sierpiński carpets was implemented for GPGPUs (general-purpose graphics processing units) in three different versions. In the first version the whole carpet is stored in a two-dimensional array. The second version stores only the walkable cells, which saves memory because Sierpiński carpets are generally sparse. The implementation was further improved by extending the simulation area dynamically whenever the simulation reaches its current border. The graphics processors used work according to the SIMD principle, so it was additionally investigated whether optimizing the code for SIMD execution yields shorter run times. The results show that the execution time does indeed decrease when only the walkable cells are stored, and it can be reduced further by dynamically extending the carpet. Optimizations for the SIMD architecture, however, did not reduce the execution time.
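The GPU kernels themselves are not reproduced in the abstract. As a serial reference sketch only, one discrete-time master-equation step on a Sierpiński carpet can be written as follows; the carpet level, lattice indexing, and the equal-splitting transition rule are assumptions for illustration:

```c
/* Serial reference sketch of the master-equation approach on a
 * Sierpinski carpet: the walker's probability mass in each walkable
 * cell is split equally among its walkable neighbours in every step.
 * The thesis implements this update on GPGPUs with sparse storage. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LEVEL 3
#define N 27                       /* 3^LEVEL cells per side */

/* standard carpet rule: a cell is a hole if, at any scale, both of its
 * base-3 digits equal 1 */
static int walkable(int x, int y) {
    for (int s = 0; s < LEVEL; ++s, x /= 3, y /= 3)
        if (x % 3 == 1 && y % 3 == 1) return 0;
    return 1;
}

static int idx(int x, int y) { return y * N + x; }

int main(void) {
    double *P = calloc(N * N, sizeof *P), *Pn = calloc(N * N, sizeof *Pn);
    P[idx(0, 0)] = 1.0;            /* walker starts in a corner cell */

    const int dx[4] = { 1, -1, 0, 0 }, dy[4] = { 0, 0, 1, -1 };
    for (int t = 0; t < 1000; ++t) {
        memset(Pn, 0, N * N * sizeof *Pn);
        for (int y = 0; y < N; ++y)
            for (int x = 0; x < N; ++x) {
                if (!walkable(x, y) || P[idx(x, y)] == 0.0) continue;
                /* collect the walkable neighbours of (x, y) */
                int nb[4], k = 0;
                for (int d = 0; d < 4; ++d) {
                    int xx = x + dx[d], yy = y + dy[d];
                    if (xx >= 0 && xx < N && yy >= 0 && yy < N &&
                        walkable(xx, yy))
                        nb[k++] = idx(xx, yy);
                }
                if (k == 0) { Pn[idx(x, y)] += P[idx(x, y)]; continue; }
                for (int d = 0; d < k; ++d)
                    Pn[nb[d]] += P[idx(x, y)] / k;   /* master equation */
            }
        double *tmp = P; P = Pn; Pn = tmp;
    }

    /* mean squared displacement from the start -- the usual quantity of
     * interest when studying anomalous diffusion on fractals */
    double msd = 0.0;
    for (int y = 0; y < N; ++y)
        for (int x = 0; x < N; ++x)
            msd += P[idx(x, y)] * (double)(x * x + y * y);
    printf("MSD after 1000 steps: %g\n", msd);

    free(P); free(Pn);
    return 0;
}
```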
39

Primal and Dual Interface Concentrated Iterative Substructuring Methods

Beuchler, Sven, Eibner, Tino, Langer, Ulrich 28 November 2007 (has links) (PDF)
This paper is devoted to the fast solution of interface concentrated finite element equations. The interface concentrated finite element schemes are constructed on the basis of a non-overlapping domain decomposition where a conforming boundary concentrated finite element approximation is used in every subdomain. Similar to data-sparse boundary element domain decomposition methods, the total number of unknowns per subdomain behaves like $O((H/h)^{d-1})$, where H, h, and d denote the usual scaling parameter of the subdomains, the average discretization parameter of the subdomain boundaries, and the spatial dimension, respectively. We propose and analyze primal and dual substructuring iterative methods which asymptotically exhibit the same or at least almost the same complexity as the number of unknowns. In particular, the so-called All-Floating Finite Element Tearing and Interconnecting solvers are highly parallel and very robust with respect to large coefficient jumps.
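To make "primal substructuring" concrete: with a non-overlapping decomposition, the unknowns split into interior (I) and interface ($\Gamma$) ones, and the primal iterative method works on the interface Schur complement system. The notation below is generic textbook notation for this setup, not taken verbatim from the paper.

```latex
% Block form of the assembled finite element system with interior (I)
% and interface (Gamma) unknowns, and the interface Schur complement
% system on which primal substructuring methods iterate:
\begin{equation}
  \begin{pmatrix} A_{II} & A_{I\Gamma} \\ A_{\Gamma I} & A_{\Gamma\Gamma} \end{pmatrix}
  \begin{pmatrix} u_I \\ u_\Gamma \end{pmatrix}
  =
  \begin{pmatrix} f_I \\ f_\Gamma \end{pmatrix},
  \qquad
  S\, u_\Gamma = g,
\end{equation}
\begin{equation}
  S = A_{\Gamma\Gamma} - A_{\Gamma I} A_{II}^{-1} A_{I\Gamma},
  \qquad
  g = f_\Gamma - A_{\Gamma I} A_{II}^{-1} f_I .
\end{equation}
```

Dual (FETI-type) methods instead tear the subdomains apart and enforce interface continuity with Lagrange multipliers, which is where the All-Floating variant mentioned above comes in.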
40

Comparison and End-to-End Performance Analysis of Parallel Filesystems

Kluge, Michael 20 September 2011 (has links) (PDF)
This thesis presents a contribution to the field of performance analysis for Input/Output (I/O) related problems, focusing on the area of High Performance Computing (HPC). Besides the compute nodes, High Performance Computing systems need a large number of supporting components that add their individual behavior to the overall performance characteristic of the whole system. File systems in such environments in particular have their own infrastructure: file operations are typically initiated at the compute nodes and proceed through a deep software stack until the file content arrives at the physical medium. A handful of shortcomings characterize the current state of the art for performance analyses in this area; they concern system-wide data collection, a comprehensive analysis approach for all collected data, a trace event analysis adjusted to I/O related problems, and methods to compare current with archived performance data. This thesis proposes to instrument all software and hardware layers to enhance the performance analysis for file operations. The additional information can be used to investigate performance characteristics of parallel file systems. To perform I/O analyses on HPC systems, a comprehensive approach is needed to gather related performance events, examine the collected data and, if necessary, replay relevant parts on different systems. One larger part of this thesis is dedicated to algorithms that reduce the amount of information found in trace files to the level needed for an I/O analysis. This reduction is based on the assumption that, for this type of analysis, all I/O events but only a subset of the synchronization events of a parallel program trace have to be considered. To extract an I/O pattern from an event trace, only those synchronization points that describe dependencies among different I/O requests are needed. Two algorithms are developed to remove negligible events from the event trace. Considering the related work on the analysis of parallel file systems, the inclusion of counter data from external sources, e.g. the infrastructure of a parallel file system, has been identified as a major milestone towards a holistic analysis approach. This infrastructure contains a large amount of valuable information that is essential for describing performance effects observed in applications. This thesis presents an approach to collect, process, and store these data, and discusses how to correctly merge the collected values with application traces. Here, a revised definition of the term "performance counter" is the first step, followed by a tree-based approach for combining raw values into secondary values. A visualization approach for I/O patterns closes another gap in the analysis process. Replaying I/O related performance events or event patterns can be done with a flexible I/O benchmark; the constraints for the development of such a benchmark are identified, as well as the overall architecture of a prototype implementation. Finally, different examples demonstrate the usage of the developed methods and show their potential. All examples are real use cases situated on the HRSK research complex and the 100GBit Testbed at TU Dresden. The I/O related parts of a bioinformatics application and a CFD application have been analyzed in depth and enhancements for both are proposed. An instance of a Lustre file system was deployed and tuned on the 100GBit Testbed by the extensive use of external performance counters.
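The two reduction algorithms themselves are not given in the abstract. As a simplified, assumed illustration of the idea only (keep every I/O event, and keep a synchronization event only if it can order an I/O request on one process relative to an I/O request on another), a filter over a flat event list might look like this:

```c
/* Simplified, illustrative trace reduction: keep all I/O events and
 * only those synchronization events that can order an I/O request on
 * one process relative to an I/O request on another. This is a toy
 * criterion standing in for the reduction algorithms developed in the
 * thesis, which are not reproduced in the abstract. */
#include <stdio.h>

typedef enum { EV_IO, EV_SYNC, EV_OTHER } ev_kind;

typedef struct {
    int     proc;     /* process that recorded the event       */
    int     peer;     /* partner process for EV_SYNC, else -1  */
    double  time;     /* timestamp                              */
    ev_kind kind;
} event;

#define NPROC 4

int main(void) {
    /* tiny hand-made trace: a sync connecting two I/O-active processes
     * is kept, a sync between processes without I/O is dropped */
    event trace[] = {
        { 0, -1, 1.0, EV_IO    },
        { 0,  1, 2.0, EV_SYNC  },   /* orders I/O on 0 before I/O on 1 */
        { 1, -1, 3.0, EV_IO    },
        { 2,  3, 4.0, EV_SYNC  },   /* neither 2 nor 3 performs I/O    */
        { 1, -1, 5.0, EV_OTHER },
    };
    int n = sizeof trace / sizeof trace[0];

    /* first and last I/O timestamp per process */
    double first_io[NPROC], last_io[NPROC];
    int has_io[NPROC] = { 0 };
    for (int i = 0; i < n; ++i)
        if (trace[i].kind == EV_IO) {
            int p = trace[i].proc;
            if (!has_io[p] || trace[i].time < first_io[p]) first_io[p] = trace[i].time;
            if (!has_io[p] || trace[i].time > last_io[p])  last_io[p]  = trace[i].time;
            has_io[p] = 1;
        }

    for (int i = 0; i < n; ++i) {
        const event *e = &trace[i];
        int keep = 0;
        if (e->kind == EV_IO) {
            keep = 1;
        } else if (e->kind == EV_SYNC && has_io[e->proc] && has_io[e->peer]) {
            /* the sync can order I/O if one side has I/O before it and
             * the other side has I/O after it */
            keep = (first_io[e->proc] <= e->time && last_io[e->peer] >= e->time)
                || (first_io[e->peer] <= e->time && last_io[e->proc] >= e->time);
        }
        if (keep)
            printf("keep  proc=%d time=%.1f kind=%d\n", e->proc, e->time, e->kind);
    }
    return 0;
}
```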
