61. Parallel implementation of curve reconstruction from noisy samples
Randrianarivony, Maharavo; Brunnett, Guido (6 April 2006)
This paper is concerned with approximating noisy
samples by non-uniform rational B-spline curves
with special emphasis on free knots. We show how to
set up the problem such that nonlinear optimization
methods can be applied efficiently. This involves
the introduction of penalizing terms in order to
avoid undesired knot positions. We report on our
implementation of the nonlinear optimization and we
show a way to implement the program in parallel.
Parallel performance results are reported: our experiments
on a parallel computer show that the program achieves
nearly linear speedup with an efficiency close to unity.
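The penalized free-knot formulation can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it fits a plain B-spline (no rational weights) to noisy samples and treats the interior knots as free optimization variables, with a hypothetical penalty term that discourages coalescing knots; the function names and the penalty form are assumptions.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.05, x.size)  # noisy samples

def objective(knots, penalty=1e-3):
    """Least-squares residual plus a penalty that keeps interior knots
    sorted, inside (0, 1), and away from each other."""
    t = np.sort(knots)
    gaps = np.diff(np.concatenate(([0.0], t, [1.0])))
    if np.any(gaps <= 1e-3):           # undesired knot positions
        return 1e6                     # large penalty instead of a crash
    try:
        spl = LSQUnivariateSpline(x, y, t, k=3)
        return spl.get_residual() + penalty * np.sum(1.0 / gaps)
    except Exception:
        return 1e6

t0 = np.linspace(0.1, 0.9, 5)          # initial interior knots
res = minimize(objective, t0, method="Nelder-Mead")
assert res.fun <= objective(t0)        # the free knots never hurt the fit
```

In the parallel setting described above, the expensive part, evaluating the residual for candidate knot vectors, is what would be distributed across processors.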

62. Parallel implementation of surface reconstruction from noisy samples
Randrianarivony, Maharavo; Brunnett, Guido (6 April 2006)
We consider the problem of reconstructing a surface from noisy samples by approximating the point set with non-uniform rational B-spline surfaces. We emphasize that the knot sequences should be treated as unknown variables alongside the control points and the weights, so that their optimal positions can be found. We show how to set up the free-knot problem such that constrained nonlinear optimization can be applied efficiently. We describe in detail a parallel implementation of our approach that gives almost linear speedup. Finally, we provide numerical results obtained on the Chemnitzer Linux Cluster supercomputer.

63. Task Pool Teams for Implementing Irregular Algorithms on Clusters of SMPs
Hippold, Judith; Rünger, Gudula (6 April 2006)
The characteristics of irregular algorithms make a parallel implementation difficult, especially on PC clusters or clusters of SMPs. These characteristics may include unpredictable accesses to dynamically changing data structures or strongly irregular coupling of computations. The resulting problems are an unknown load distribution and expensive irregular communication patterns for data accesses and redistributions. Thus the parallel implementation of irregular algorithms on distributed memory machines and clusters requires a special organizational mechanism for dynamic load balancing while keeping the communication and administration overhead low. We propose task pool teams for implementing irregular algorithms on clusters of PCs or SMPs. A task pool team combines multithreaded programming using task pools on single nodes with explicit message passing between different nodes. The dynamic load balancing mechanism of task pools is generalized to a dynamic load balancing scheme across all distributed nodes. We have implemented and compared several versions of task pool teams. As an application example, we use the hierarchical radiosity algorithm, which is based on dynamically growing quadtree data structures annotated with varying interaction lists that express the irregular coupling between the quadtrees. Experiments are performed on a PC cluster and a cluster of SMPs.
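The node-local half of a task pool team can be illustrated with a small thread pool. This is a hedged sketch of the general idea only: in the setting above each cluster node would additionally run explicit message passing (e.g. MPI) to the other nodes, which is omitted here, and all names are illustrative.

```python
import queue
import threading

class TaskPool:
    """A node-local task pool: several worker threads pull tasks from a
    shared queue, which balances load dynamically within one node."""

    def __init__(self, num_workers):
        self.tasks = queue.Queue()
        self.results = []
        self.lock = threading.Lock()
        self.workers = [threading.Thread(target=self._work)
                        for _ in range(num_workers)]

    def _work(self):
        while True:
            task = self.tasks.get()
            if task is None:            # poison pill: shut this worker down
                break
            out = task()
            with self.lock:             # results list is shared
                self.results.append(out)

    def run(self, tasks):
        for t in tasks:
            self.tasks.put(t)
        for _ in self.workers:          # one pill per worker
            self.tasks.put(None)
        for w in self.workers:
            w.start()
        for w in self.workers:
            w.join()
        return self.results

pool = TaskPool(4)
results = pool.run([(lambda i=i: i * i) for i in range(100)])
assert sorted(results) == [i * i for i in range(100)]
```

A full task pool team would add a communication thread per node that exchanges tasks and data with remote pools, generalizing this dynamic balancing across the cluster.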

64. Solving Linear-Quadratic Optimal Control Problems on Parallel Computers
Benner, Peter; Quintana-Ortí, Enrique S.; Quintana-Ortí, Gregorio (11 September 2006)
We discuss a parallel library of efficient algorithms for the solution of linear-quadratic optimal control problems involving large-scale systems with state-space dimension up to $O(10^4)$. We survey the numerical algorithms underlying the implementation of the chosen optimal control methods. The approaches considered here are based on invariant and deflating subspace techniques and avoid the explicit solution of the associated algebraic Riccati equations in case of possible ill-conditioning. Still, our algorithms can optionally compute the Riccati solution as well. The major computational task, finding spectral projectors onto the required invariant or deflating subspaces, is implemented using iterative schemes for the sign and disk functions. Experimental results demonstrate the numerical accuracy and the parallel performance of our approach on a cluster of Intel Itanium-2 processors.
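The sign-function technique mentioned above rests on the Newton iteration for the matrix sign function, whose limit yields spectral projectors: (I - sign(A))/2 projects onto the stable invariant subspace. A minimal serial sketch follows; the parallel, disk-function, and deflating-subspace variants of the paper are not shown.

```python
import numpy as np

def matrix_sign(a, tol=1e-12, max_iter=100):
    """Newton iteration X <- (X + X^{-1}) / 2 for the matrix sign function.
    sign(A) has eigenvalue +1 (-1) where A has positive (negative) real part."""
    x = a.copy()
    for _ in range(max_iter):
        x_new = 0.5 * (x + np.linalg.inv(x))
        if np.linalg.norm(x_new - x, 1) <= tol * np.linalg.norm(x_new, 1):
            return x_new
        x = x_new
    return x

# A matrix with eigenvalues 2 and -3; its sign function flips them to +1/-1.
t = np.array([[1.0, 2.0], [3.0, 4.0]])
a = t @ np.diag([2.0, -3.0]) @ np.linalg.inv(t)
s = matrix_sign(a)
expected = t @ np.diag([1.0, -1.0]) @ np.linalg.inv(t)
assert np.allclose(s, expected)
```

Since sign(A) is involutory (its square is the identity), the iterate also serves as a cheap self-check in practice.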

65. Distributed Occlusion Culling for Realtime Visualization
Domaratius, Uwe (18 December 2006)
This thesis describes the development of a distributed occlusion culling solution for complex
generic scenes. Moving the culling calculations onto a second computer should decrease
the load on the actual rendering system and therefore allow higher frame rates. This work
includes an introduction to parallel rendering systems and a discussion of suitable culling algorithms.
Based on these parts, a client-server system for occlusion culling is developed.
The test results of a prototypical implementation form the last part of this thesis.
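A conservative occlusion test of the kind such a culling server might evaluate can be sketched with a coarse depth buffer. This is an illustrative assumption, not the algorithm chosen in the thesis: an object is culled only if its entire screen-space bounding rectangle is provably behind already-drawn occluders.

```python
import numpy as np

class CoarseDepthBuffer:
    """A low-resolution depth buffer used as a conservative occlusion test:
    an object's screen-space bounding rectangle is culled only if every
    covered cell already holds a depth nearer than the object's nearest point."""

    def __init__(self, w, h):
        self.depth = np.full((h, w), np.inf)

    def rasterize(self, x0, y0, x1, y1, z):
        """Record an occluder covering the cell rectangle [x0,x1) x [y0,y1)."""
        region = self.depth[y0:y1, x0:x1]
        np.minimum(region, z, out=region)   # keep the nearest depth per cell

    def is_occluded(self, x0, y0, x1, y1, z_near):
        """Conservative: True only if the whole rectangle is behind occluders."""
        return bool(np.all(self.depth[y0:y1, x0:x1] < z_near))

buf = CoarseDepthBuffer(64, 64)
buf.rasterize(0, 0, 64, 64, z=5.0)                       # large occluder at depth 5
assert buf.is_occluded(10, 10, 20, 20, z_near=7.0)       # behind the occluder
assert not buf.is_occluded(10, 10, 20, 20, z_near=3.0)   # in front of it
```

In a client-server split, the rendering client would send object bounds and receive a visible/culled verdict, keeping the expensive depth queries off the rendering machine.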

66. Balanced Truncation Model Reduction of Large and Sparse Generalized Linear Systems
Badía, José M.; Benner, Peter; Mayo, Rafael; Quintana-Ortí, Enrique S.; Quintana-Ortí, Gregorio; Remón, Alfredo (26 November 2007)
We investigate model reduction of large-scale linear time-invariant systems in
generalized state-space form. We consider sparse state matrix pencils, including
pencils with banded structure. The balancing-based methods employed here are
composed of well-known linear algebra operations and have been recently shown to be
applicable to large models by exploiting the structure of the matrices defining
the dynamics of the system.
In this paper we propose a modification of the LR-ADI iteration to solve
large-scale generalized Lyapunov equations together with a practical
convergence criterion, and several other implementation refinements.
Using kernels from several serial and parallel linear algebra libraries,
we have developed a parallel package for model reduction, SpaRed, extending
the applicability of balanced truncation to sparse systems with up to
$O(10^5)$ states.
Experiments on an SMP parallel architecture consisting of Intel Itanium 2 processors
illustrate the numerical performance of this approach and the potential of the
parallel algorithms for model reduction of large-scale sparse systems.
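The LR-ADI iteration for the (non-generalized) Lyapunov equation A X + X A^T + B B^T = 0 can be sketched as follows. This uses the residual-factor formulation with real negative shifts and dense solves on a tiny symmetric example, whereas the paper targets the generalized, large sparse and banded case; the shift choice here is an idealized assumption for the toy problem.

```python
import numpy as np

def lr_adi(a, b, shifts):
    """Low-rank ADI for A X + X A^T + B B^T = 0 with real negative shifts.
    Returns Z with X ~ Z Z^T (residual-factor formulation)."""
    n = a.shape[0]
    w = b.copy()                            # low-rank residual factor
    cols = []
    for p in shifts:                        # each p < 0
        v = np.linalg.solve(a + p * np.eye(n), w)   # sparse solve in practice
        cols.append(np.sqrt(-2.0 * p) * v)
        w = w - 2.0 * p * v                 # update the residual factor
    return np.hstack(cols)

# Symmetric negative definite A (a small 1D Laplacian) and rank-1 B.
n = 10
a = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
b = np.ones((n, 1))
shifts = np.linalg.eigvalsh(a)              # ideal shifts for this toy case
z = lr_adi(a, b, shifts)
x = z @ z.T
residual = np.linalg.norm(a @ x + x @ a.T + b @ b.T)
assert residual < 1e-8 * np.linalg.norm(b @ b.T)
```

The practical convergence criterion proposed in the paper monitors exactly such a low-rank residual factor instead of forming X explicitly.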

67. Primal and Dual Interface Concentrated Iterative Substructuring Methods
Beuchler, Sven; Eibner, Tino; Langer, Ulrich (28 November 2007)
This paper is devoted to the fast solution of interface concentrated finite element
equations. The interface concentrated finite element schemes are constructed
on the basis of a non-overlapping domain decomposition where a conforming
boundary concentrated finite element approximation is used in every subdomain.
Similar to data-sparse boundary element domain decomposition methods,
the total number of unknowns per subdomain behaves like $O((H/h)^{d-1})$,
where H, h, and d denote the usual scaling parameter of the subdomains, the
average discretization parameter of the subdomain boundaries, and the spatial
dimension, respectively. We propose and analyze primal and dual substructuring
iterative methods which asymptotically exhibit the same or at least almost
the same complexity as the number of unknowns. In particular, the so-called
All-Floating Finite Element Tearing and Interconnecting solvers are
highly parallel and very robust with respect to large coefficient jumps.
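The substructuring principle, eliminating subdomain interiors onto the interface, can be illustrated on a toy 1D Poisson problem with two subdomains. This shows only the basic Schur-complement mechanics, not the interface-concentrated schemes or the All-Floating FETI solvers analyzed in the paper.

```python
import numpy as np

def poisson_matrix(n):
    """Standard 1D Laplacian stencil [-1, 2, -1] on n interior nodes."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

# Global problem: 2*m + 1 interior unknowns, one interface node in the middle
# separating two subdomains of m nodes each.
m = 7
n = 2 * m + 1
a = poisson_matrix(n)
f = np.ones(n)

ii = [m]                                   # interface index
dd = [i for i in range(n) if i != m]       # subdomain-interior indices
a_dd, a_di = a[np.ix_(dd, dd)], a[np.ix_(dd, ii)]
a_id, a_ii = a[np.ix_(ii, dd)], a[np.ix_(ii, ii)]

# Schur complement on the interface: S = A_II - A_ID A_DD^{-1} A_DI.
# The interior solves decouple per subdomain and run in parallel in practice.
s = a_ii - a_id @ np.linalg.solve(a_dd, a_di)
g = f[ii] - a_id @ np.linalg.solve(a_dd, f[dd])
u_i = np.linalg.solve(s, g)                # interface solve (iterative at scale)
u_d = np.linalg.solve(a_dd, f[dd] - a_di @ u_i)

u = np.empty(n)
u[ii], u[dd] = u_i, u_d
assert np.allclose(a @ u, f)               # matches the undecomposed solve
```

The iterative methods of the paper apply a Krylov solver to (preconditioned) interface systems of exactly this kind instead of forming S explicitly.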

68. Erstellung einer einheitlichen Taxonomie für die Programmiermodelle der parallelen Programmierung (Creation of a Unified Taxonomy for the Programming Models of Parallel Programming)
Nestmann, Markus (2 May 2017)
Parallel programming makes it possible for programs to execute concurrently on several CPU cores or CPUs. To ease parallel programming, various languages (e.g. Erlang) and libraries (e.g. OpenMP) have been developed on top of parallel programming models (e.g. the Parallel Random Access Machine). If, for example, a software architect has to choose a programming model for a project, several important criteria (e.g. dependencies on the hardware) must be taken into account. This search is made easier by overviews that distinguish and order the programming models according to these criteria. However, a look at existing overviews reveals differences in their classification, in the terminology used, and in the programming models covered. This thesis addresses this deficit: first, the existing taxonomies are collected and analyzed by means of a systematic literature review; on this basis, a unified taxonomy is constructed. With this taxonomy, an overview of the parallel programming models can be compiled, which is additionally extended with information on each programming model's dependencies on the hardware architecture. A software architect (or project manager, software developer, ...) can thus make an informed decision and is not forced to analyze every programming model individually.

69. Comparison and End-to-End Performance Analysis of Parallel Filesystems
Kluge, Michael (5 September 2011)
This thesis presents a contribution to the field of performance analysis for Input/Output (I/O) related problems, focusing on the area of High Performance Computing (HPC).
Besides the compute nodes, High Performance Computing systems need a large number of supporting components that add their individual behavior to the overall performance characteristics of the whole system. File systems in such environments in particular have their own infrastructure. File operations are typically initiated at the compute nodes and proceed through a deep software stack until the file content arrives at the physical medium. A handful of shortcomings characterize the current state of the art of performance analysis in this area: the lack of system-wide data collection, of a comprehensive analysis approach for all collected data, of a trace event analysis adjusted to I/O related problems, and of methods to compare current with archived performance data.
This thesis proposes to instrument all software and hardware layers to enhance the performance analysis of file operations. The additional information can be used to investigate performance characteristics of parallel file systems. To perform I/O analyses on HPC systems, a comprehensive approach is needed to gather related performance events, examine the collected data and, if necessary, replay relevant parts on different systems. One larger part of this thesis is dedicated to algorithms that reduce the amount of information found in trace files to the level needed for an I/O analysis. This reduction is based on the assumption that for this type of analysis all I/O events, but only a subset of all synchronization events of a parallel program trace, have to be considered. To extract an I/O pattern from an event trace, only those synchronization points are needed that describe dependencies among different I/O requests. Two algorithms are developed to remove negligible events from the event trace.
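A much simplified version of such a trace-reduction pass might look as follows. The event model and the retention rule are assumptions for illustration only and are far coarser than the two algorithms developed in the thesis: all I/O events are kept, computation events are dropped, and of the synchronization events only the most recent one before each I/O event survives, since only it can carry a dependency to that request.

```python
def reduce_trace(events):
    """Keep every I/O event, but of the synchronization events keep only
    the last one preceding an I/O event, i.e. the one that can express a
    dependency to that I/O request.  Events are (timestamp, process, kind)
    tuples with kind in {'io', 'sync', 'comp'}."""
    kept = []
    pending_sync = None
    for ev in sorted(events):               # globally time-ordered
        _, _, kind = ev
        if kind == 'io':
            if pending_sync is not None:    # last sync before this I/O matters
                kept.append(pending_sync)
                pending_sync = None
            kept.append(ev)
        elif kind == 'sync':
            pending_sync = ev               # later syncs supersede earlier ones
        # computation events are negligible for the I/O pattern
    return kept

trace = [(1, 0, 'io'), (2, 0, 'comp'), (3, 1, 'sync'), (4, 1, 'sync'),
         (5, 0, 'io'), (6, 1, 'comp')]
reduced = reduce_trace(trace)
assert reduced == [(1, 0, 'io'), (4, 1, 'sync'), (5, 0, 'io')]
```

Even this crude rule shrinks the trace while preserving the ordering information an I/O pattern analysis would need.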
Considering the related work on the analysis of parallel file systems, the inclusion of counter data from external sources, e.g. the infrastructure of a parallel file system, has been identified as a major milestone towards a holistic analysis approach. This infrastructure contains a large amount of valuable information that is essential to describe performance effects observed in applications. This thesis presents an approach to collect and subsequently process and store these data. Ways to correctly merge the collected values with application traces are discussed. Here, a revised definition of the term "performance counter" is the first step, followed by a tree-based approach to combine raw values into secondary values. A visualization approach for I/O patterns closes another gap in the analysis process.
Replaying I/O related performance events or event patterns can be done by a flexible I/O benchmark. The constraints for the development of such a benchmark are identified as well as the overall architecture for a prototype implementation.
Finally, different examples demonstrate the usage of the developed methods and show their potential. All examples are real use cases and are situated on the HRSK research complex and the 100GBit Testbed at TU Dresden. The I/O related parts of a Bioinformatics and a CFD application have been analyzed in depth and enhancements for both are proposed. An instance of a Lustre file system was deployed and tuned on the 100GBit Testbed by the extensive use of external performance counters.

70. Suchbasierte Algorithmen für das Scheduling unabhängiger paralleler Tasks (Search-Based Algorithms for Scheduling Independent Parallel Tasks)
Dietze, Robert (9 May 2022)
In parallel applications implemented according to the programming model of mixed parallelism, one can usually identify independent program parts (tasks) that can be executed both in parallel with each other and in parallel internally.
To reduce the execution time of such applications on a parallel system, a temporal and spatial assignment of these parallel tasks to the processors is needed, which can be determined with scheduling methods.
However, even scheduling dependent single-processor tasks onto a parallel system with two processors is NP-hard, which is why list-scheduling heuristics are frequently used to solve scheduling problems.
Scheduling independent parallel tasks is considerably more complex because of the many additional assignment possibilities and therefore requires dedicated solution methods.
Search-based methods, which use local or global search strategies to find solutions, are a promising approach to solving complex scheduling problems.
This thesis investigates to what extent such methods are suitable for scheduling independent parallel tasks onto heterogeneous systems consisting of multicore machines with different properties.
For this purpose, four search-based scheduling methods are developed and studied.
Specifically, two modifying and two incremental methods are presented, inspired by search algorithms such as A* search and by metaheuristics such as tabu search and simulated annealing.
In addition, a cost model in the form of parameterized runtime formulas is presented, with which the execution times of the parallel tasks on heterogeneous systems can be modeled.
The methods are compared with each other and with existing list-scheduling heuristics in runtime measurements on heterogeneous computing platforms.
As applications for the measurements, programs from the SPLASH-3 benchmark suite as well as a practical simulation application for component stress analysis are examined.
The results show that all four methods can achieve a significant reduction of the execution time compared to existing list-scheduling heuristics.
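The contrast between a list-scheduling heuristic and a search-based method can be illustrated on a simplified version of the problem, with sequential (rather than parallel) tasks on machines of different speeds. The cost model and the annealing schedule below are illustrative assumptions, not the thesis's methods.

```python
import math
import random

def makespan(assign, work, speed):
    """Maximum machine completion time; assign[i] is the machine of task i."""
    load = [0.0] * len(speed)
    for i, m in enumerate(assign):
        load[m] += work[i] / speed[m]
    return max(load)

def greedy(work, speed):
    """List-scheduling baseline: largest task first onto the machine
    that finishes it earliest."""
    assign = [0] * len(work)
    load = [0.0] * len(speed)
    for i in sorted(range(len(work)), key=lambda i: -work[i]):
        m = min(range(len(speed)), key=lambda m: load[m] + work[i] / speed[m])
        assign[i] = m
        load[m] += work[i] / speed[m]
    return assign

def anneal(work, speed, steps=5000, t0=1.0, seed=0):
    """Simulated annealing: move a random task to a random machine and
    accept worsening moves with a temperature-controlled probability."""
    rng = random.Random(seed)
    cur = greedy(work, speed)              # start from the heuristic schedule
    cur_cost = makespan(cur, work, speed)
    best, best_cost = list(cur), cur_cost
    for step in range(steps):
        t = t0 * (1.0 - step / steps) + 1e-9
        cand = list(cur)
        cand[rng.randrange(len(work))] = rng.randrange(len(speed))
        cost = makespan(cand, work, speed)
        if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / t):
            cur, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = cand, cost
    return best, best_cost

rng = random.Random(1)
work = [rng.uniform(1, 10) for _ in range(30)]
speed = [1.0, 1.5, 2.0, 3.0]               # heterogeneous machines
g = makespan(greedy(work, speed), work, speed)
_, s = anneal(work, speed)
assert s <= g   # the search starts from the greedy schedule and keeps the best
```

Scheduling genuinely parallel (moldable) tasks adds the choice of how many processors each task receives, which is the extra dimension the four methods of the thesis explore.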