About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
671

Adaptive Fault Tolerance Strategies for Large Scale Systems

George, Cijo January 2012 (has links) (PDF)
Exascale systems of the future are predicted to have a mean time between node failures (MTBF) of less than one hour. At such low MTBF, the number of processors available for the execution of a long running application can vary widely throughout the execution of the application. Employing traditional fault tolerance strategies like periodic checkpointing in these highly dynamic environments may not be effective because of the high number of application failures, resulting in a large amount of work lost due to rollbacks, apart from the increased recovery overheads. In this context, fault tolerance strategies are needed that can adapt to changing node availability and also help avoid a significant number of application failures. In this thesis, we present two adaptive fault tolerance strategies that make use of node failure prediction mechanisms to provide proactive fault tolerance for long running parallel applications on large scale systems.

The first part of the thesis deals with an adaptive fault tolerance strategy for malleable applications. We present ADFT, an adaptive fault tolerance framework for long running malleable applications to maximize application performance in the presence of failures. We first develop cost models that consider different factors, like the accuracy of node failure predictions and application scalability, for evaluating the benefits of various fault tolerance actions including checkpointing, live migration and rescheduling. Our adaptive framework then uses the cost models to make runtime decisions, dynamically selecting fault tolerance actions at different points of application execution to minimize application failures and maximize performance. Simulations with real and synthetic failure traces show that our approach outperforms existing fault tolerance mechanisms for malleable applications, yielding up to 23% improvement in work done by the application in the presence of failures, and is effective even for petascale and exascale systems.

In the second part of the thesis, we present a fault tolerance strategy using adaptive process replication that provides fault tolerance through partial replication of a set of application processes. This framework adaptively changes the set of replicated processes (the replicated set) periodically, based on node failure predictions, to avoid application failures. We have developed an MPI prototype implementation, PAREP-MPI, that allows dynamically changing the replicated set of processes for MPI applications. Experiments with real scientific applications on real systems have shown that the overhead of PAREP-MPI is minimal. We have shown using simulations with real and synthetic failure traces that our strategy involving adaptive process replication significantly outperforms existing mechanisms, providing up to 20% improvement in application efficiency even for exascale systems. Significant observations are also made which can drive future research efforts in fault tolerance for large and very large scale systems.
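The abstract does not spell out the cost models, so as a rough, hypothetical illustration of the kind of runtime decision such a framework makes, the following C sketch compares the expected overhead of periodic checkpointing (using Young's classic interval approximation) against prediction-driven live migration. All function names, parameter values and the migration model here are invented, not taken from ADFT.

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical cost-model sketch (NOT the ADFT model): pick the
 * cheaper fault tolerance action given a predicted node MTBF. */

/* Young's approximation for a near-optimal checkpoint interval,
 * given checkpoint cost and mean time between failures (seconds). */
double young_interval(double ckpt_cost, double mtbf) {
    return sqrt(2.0 * ckpt_cost * mtbf);
}

/* Fractional overhead of periodic checkpointing: checkpoint cost
 * amortized over each interval, plus expected rework (about half an
 * interval is lost per failure on average). */
double checkpoint_overhead(double ckpt_cost, double mtbf) {
    double t = young_interval(ckpt_cost, mtbf);
    return ckpt_cost / t + t / (2.0 * mtbf);
}

/* Fractional overhead when a failure predictor with the given recall
 * triggers live migrations: predicted failures become cheap
 * migrations, so checkpointing only has to cover the failures the
 * predictor misses (a stretched effective MTBF). */
double migration_overhead(double mig_cost, double ckpt_cost,
                          double mtbf, double recall) {
    double effective_mtbf = mtbf / (1.0 - recall);
    return mig_cost / mtbf + checkpoint_overhead(ckpt_cost, effective_mtbf);
}

int main(void) {
    double mtbf = 3600.0;     /* predicted MTBF: one hour */
    double ckpt_cost = 60.0;  /* checkpoint write: 60 s   */
    double mig_cost = 10.0;   /* live migration: 10 s     */
    double recall = 0.7;      /* predictor recall         */

    double c = checkpoint_overhead(ckpt_cost, mtbf);
    double m = migration_overhead(mig_cost, ckpt_cost, mtbf, recall);
    printf("checkpoint-only %.4f vs prediction+migration %.4f -> %s\n",
           c, m, m < c ? "migrate" : "checkpoint");
    return 0;
}
```

The point of the sketch is only that, once failure behaviour is predicted, choosing among actions reduces to comparing closed-form expected overheads at runtime.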
672

Parallel multi-GPU implementation of a 3D model of the innate immune system / Implementação paralela em um ambiente de múltiplas GPUs de um modelo 3D do sistema imune inato

Xavier, Micael Peters 26 August 2013 (has links)
Funded by CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior. / The development of computer systems that simulate the behavior of tissues or even whole organs is an extremely complex task. One of the many obstacles related to the development of such systems is the huge amount of computational resources needed to execute the simulations. For this reason, the use of strategies and methods that employ parallel computing is essential. This work focuses on the spatial-temporal simulation, in a three-dimensional section of tissue, of the behavior of some of the cells and molecules that constitute the human innate immune system (HIS). To reduce the time required to perform the simulation, multiple graphics processing units (GPUs) were used in a cluster environment. Despite the high communication cost imposed by the use of multiple GPUs, the approaches and techniques used in this work to implement the parallel versions of the simulator proved effective in reducing the simulation time.
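The abstract does not give implementation details, but the dominant communication cost in a multi-GPU cluster simulation of this kind typically comes from halo (ghost-zone) exchanges between neighbouring subdomains. The following C/MPI sketch shows that generic pattern for a 1D decomposition; buffer sizes and names are invented, and in a real multi-GPU code each rank would first copy its GPU boundary planes into these host buffers.

```c
#include <mpi.h>
#include <stdlib.h>

/* Generic 1D halo exchange between neighbouring ranks (illustrative
 * sketch only). Each rank owns one subdomain (one GPU) and trades one
 * boundary plane with each neighbour every time step. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int plane = 128 * 128;   /* boundary plane size (invented) */
    double *send_lo = calloc(plane, sizeof(double));
    double *send_hi = calloc(plane, sizeof(double));
    double *recv_lo = calloc(plane, sizeof(double));
    double *recv_hi = calloc(plane, sizeof(double));

    /* Ranks at the ends of the decomposition have no neighbour. */
    int lo = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int hi = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Paired send/receive with both neighbours, deadlock-free. */
    MPI_Sendrecv(send_lo, plane, MPI_DOUBLE, lo, 0,
                 recv_hi, plane, MPI_DOUBLE, hi, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(send_hi, plane, MPI_DOUBLE, hi, 1,
                 recv_lo, plane, MPI_DOUBLE, lo, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    free(send_lo); free(send_hi); free(recv_lo); free(recv_hi);
    MPI_Finalize();
    return 0;
}
```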
673

Mitteilungen des URZ 4/2003

Ziegler, Heik, Arnold, Clauß, Koppe, Petersen, Richter, Martin, Trapp, Fischer, 11 December 2003 (has links) (PDF)
"Mitteilungen des URZ" 4/2003
674

Mitteilungen des URZ 1/2005

Müller, Riedel, Ziegler, 17 March 2005 (has links) (PDF)
Information from the university computing centre (URZ), including the 2004 annual review of all URZ services
675

Mitteilungen des URZ 1/2007

Riedel, W., Trapp, H. 04 April 2007 (has links)
Information from the university computing centre (URZ), including the 2006 annual review of current URZ projects and services
676

Effective Automatic Computation Placement and Data Allocation for Parallelization of Regular Programs

Chandan, G January 2014 (has links) (PDF)
Scientific applications that operate on large data sets require huge amounts of computational power and memory. These applications are typically run on High Performance Computing (HPC) systems that consist of multiple compute nodes connected over a network interconnect such as InfiniBand. Each compute node has its own memory and does not share an address space with the other nodes. A significant amount of work has been done in the past two decades on parallelizing for distributed-memory architectures. A majority of this work went into developing compiler technologies such as High Performance Fortran (HPF) and partitioned global address space (PGAS) languages. However, several steps involved in achieving good performance remained manual. Hence, the approach currently used to obtain the best performance is to rely on highly tuned libraries such as ScaLAPACK. The objective of this work is to improve automatic compiler and runtime support for distributed-memory clusters for regular programs. Regular programs typically use arrays as their main data structure, and array accesses are affine functions of outer loop indices and program parameters. Many scientific applications, such as linear-algebra kernels, stencils, partial differential equation solvers, data-mining applications and dynamic programming codes, fall in this category.

In this work, we propose techniques for finding computation mappings and data allocations when compiling regular programs for distributed-memory clusters. Techniques for transformation and detection of parallelism relying on the polyhedral framework already exist; we propose automatic techniques to determine computation placements for the identified parallelism and the allocation of data. We model the problem of finding a good computation placement as a graph partitioning problem, with constraints to minimize both communication volume and load imbalance for the entire program. We show that our approach for computation mapping is more effective than those that can be developed using vendor-supplied libraries. Our approach for data allocation is driven by tiling of data spaces, along with a compiler-assisted runtime scheme to allocate and deallocate tiles on demand and reuse them. Experimental results on some sequences of BLAS calls demonstrate a mean speedup of 1.82× over versions written with ScaLAPACK. Besides enabling weak scaling for distributed memory, data tiling also improves locality for shared-memory parallelization; experimental results on a 32-core shared-memory SMP system show a mean speedup of 2.67× over code that is not data tiled.
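For context, a "regular" program in this sense looks like the following C stencil, in which every array subscript is an affine function of the loop indices and problem-size parameters. This illustrative example is not taken from the thesis, but it is exactly the class of loop nests that polyhedral techniques can analyse, tile and distribute automatically.

```c
/* 2D Jacobi-style stencil: every access, e.g. A[t % 2][i - 1][j], is
 * an affine function of the loop indices (t, i, j) and the parameters
 * (T, N). This regularity is what lets a polyhedral compiler compute
 * exact dependences, tile the iteration space, and derive computation
 * placements and per-node data allocations. */
#define T 100
#define N 1024
static double A[2][N][N];

void jacobi(void) {
    for (int t = 0; t < T; t++)
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                A[(t + 1) % 2][i][j] =
                    0.25 * (A[t % 2][i - 1][j] + A[t % 2][i + 1][j] +
                            A[t % 2][i][j - 1] + A[t % 2][i][j + 1]);
}
```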
677

Accelerated Deep Learning using Intel Xeon Phi

Viebke, André January 2015 (has links)
Deep learning, a sub-topic of machine learning inspired by biology, has recently attracted wide attention in industry and in the research community. State-of-the-art applications in areas such as computer vision and speech recognition are built using deep learning algorithms. In contrast to traditional algorithms, where the developer fully instructs the application what to do, deep learning algorithms instead learn from experience when performing a task. However, this learning requires training, which is computationally very demanding. High Performance Computing can help ease the burden through parallelization, thereby reducing the training time; this is essential to fully utilize the algorithms in practice. While numerous works targeting GPUs have investigated ways to speed up training, less attention has been paid to the Intel Xeon Phi coprocessor. In this thesis we present a parallelized implementation of a Convolutional Neural Network (CNN), a deep learning architecture, together with our proposed parallelization scheme, CHAOS. Additionally, a theoretical analysis and a performance model discuss the algorithm in detail and allow for predictions should even more threads become available in the future. The algorithm is evaluated on an Intel Xeon Phi 7120P, a Xeon E5-2695 v2 at 2.4 GHz, and a Core i5 661 at 3.33 GHz, using various network architectures and thread counts on the MNIST dataset. Findings show speedups of 103.5x, 99.9x and 100.4x for the large, medium and small architecture, respectively, with 244 threads compared to 1 thread on the coprocessor, and a 10.9x - 14.1x (large to small) speedup compared to the sequential version running on the Xeon E5. We managed to decrease training time from 7 days on the Core i5 and 31 hours on the Xeon E5 to 3 hours on the Intel Xeon Phi when training our large network for 15 epochs.
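To give a flavour of why CNN training parallelizes well on a many-core coprocessor, the sketch below shows a thread-parallel forward pass of one convolutional layer using OpenMP in C. This is a generic illustration, not the CHAOS scheme itself; all layer sizes and names are invented.

```c
#include <omp.h>

/* Thread-parallel forward pass of one convolutional layer: the
 * (map, row) iterations are independent, so they can be divided
 * among a coprocessor's hardware threads. Generic sketch only. */
#define IN   28                 /* input feature map width/height */
#define K     5                 /* kernel width/height            */
#define OUT  (IN - K + 1)       /* valid-convolution output size  */
#define MAPS 16                 /* number of output feature maps  */

void conv_forward(const float in[IN][IN],
                  const float w[MAPS][K][K],
                  const float bias[MAPS],
                  float out[MAPS][OUT][OUT]) {
    #pragma omp parallel for collapse(2)
    for (int m = 0; m < MAPS; m++)
        for (int y = 0; y < OUT; y++)
            for (int x = 0; x < OUT; x++) {
                float acc = bias[m];
                for (int ky = 0; ky < K; ky++)
                    for (int kx = 0; kx < K; kx++)
                        acc += w[m][ky][kx] * in[y + ky][x + kx];
                out[m][y][x] = acc;  /* activation omitted for brevity */
            }
}
```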
678

High performance lattice Boltzmann solvers on massively parallel architectures with applications to building aeraulics / Implantations hautes performances de la méthode de Boltzmann sur gaz réseau. Applications à l'aéraulique des bâtiments

Obrecht, Christian 11 December 2012 (has links)
With the advent of low-energy buildings, the need for accurate building performance simulations has significantly increased. However, for the time being, thermo-aeraulic effects are often taken into account through simplified or even empirical models, which fail to provide the expected accuracy. Resorting to computational fluid dynamics therefore seems unavoidable, but the required computational effort is in general prohibitive. The joint use of innovative approaches such as the lattice Boltzmann method (LBM) and massively parallel computing devices such as graphics processing units (GPUs) could help to overcome these limits. The present research work is devoted to exploring the potential of such a strategy.

The lattice Boltzmann method, which is based on a discretised version of the Boltzmann equation, is an explicit approach offering numerous attractive features: accuracy, stability, the ability to handle complex geometries, etc. It is therefore an interesting alternative to directly solving the Navier-Stokes equations using classic numerical analysis. From an algorithmic standpoint, the LBM is well suited to parallel implementation. The use of graphics processors to perform general purpose computations is increasingly widespread in high performance computing. These massively parallel circuits provide performance so far unrivalled at a rather moderate cost. Yet, due to numerous hardware-induced constraints, GPU programming is quite complex, and the possible benefits in performance depend strongly on the algorithmic nature of the targeted application. For the LBM, GPU implementations currently provide performance two orders of magnitude higher than a weakly optimised sequential CPU implementation.

The present thesis consists of a collection of nine articles published in international journals and proceedings of international conferences (the last one being under review). These contributions address the issues related to single-GPU implementations of the LBM and the optimisation of memory accesses, as well as multi-GPU implementations and the modelling of inter-GPU and inter-node communication. In addition, we outline several extensions to the LBM which appear essential for performing actual building thermo-aeraulic simulations. The test cases used to validate our codes demonstrate the strong potential of GPU LBM solvers in practice.
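For readers unfamiliar with the method, the following C sketch shows the textbook BGK collision step for the D2Q9 lattice, the core operation an LBM solver applies independently at every grid node each time step. It is a generic illustration, not code from the thesis.

```c
/* BGK collision for one D2Q9 node: relax each of the nine particle
 * distributions toward the local equilibrium. Standard textbook LBM. */
static const double w[9] = { 4.0/9,
    1.0/9, 1.0/9, 1.0/9, 1.0/9,
    1.0/36, 1.0/36, 1.0/36, 1.0/36 };
static const int ex[9] = { 0, 1, 0, -1, 0, 1, -1, -1, 1 };
static const int ey[9] = { 0, 0, 1, 0, -1, 1, 1, -1, -1 };

void bgk_collide(double f[9], double tau) {
    /* Recover macroscopic density and velocity from distributions. */
    double rho = 0.0, ux = 0.0, uy = 0.0;
    for (int i = 0; i < 9; i++) {
        rho += f[i];
        ux  += f[i] * ex[i];
        uy  += f[i] * ey[i];
    }
    ux /= rho; uy /= rho;

    /* Relax toward the second-order equilibrium distribution. */
    double usq = ux * ux + uy * uy;
    for (int i = 0; i < 9; i++) {
        double eu  = ex[i] * ux + ey[i] * uy;
        double feq = w[i] * rho *
                     (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * usq);
        f[i] -= (f[i] - feq) / tau;
    }
}
```

Because each node's collision depends only on local data, the method maps naturally onto one GPU thread per node, which is what makes the two-orders-of-magnitude GPU speedups mentioned above achievable.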
679

An object oriented and high performance platform for aerothermodynamics simulation

Lani, Andrea 04 December 2008 (has links)
This thesis presents the author's contribution to the design and implementation of COOLFluiD, an object oriented software platform for the high performance simulation of multi-physics phenomena on unstructured grids. In this context, the final goal has been to provide a reliable tool for handling high speed aerothermodynamic applications. To this end, we introduce a number of design techniques that have been developed in order to provide the framework with flexibility and reusability, allowing developers to easily integrate new functionalities such as arbitrary mesh-based data structures, numerical algorithms (space discretizations, time stepping schemes, linear system solvers, etc.) and physical models. Furthermore, we describe the parallel algorithms that we have implemented in order to efficiently read/write generic computational meshes involving millions of degrees of freedom and partition them in a scalable way: benchmarks on HPC clusters with up to 512 processors show their effective suitability for large scale computing. Several systems of partial differential equations, characterizing flows in conditions of thermal and chemical equilibrium (with fixed and variable elemental fractions) and, particularly, nonequilibrium (multi-temperature models), have been integrated in the framework. In order to simulate such flows, we have developed two state-of-the-art flow solvers:
1. a parallel implicit 2D/3D steady and unsteady cell-centered Finite Volume (FV) solver for arbitrary systems of PDEs on hybrid unstructured meshes;
2. a parallel implicit 2D/3D steady vertex-centered Residual Distribution (RD) solver for arbitrary systems of PDEs on meshes with simplex elements (triangles and tetrahedra).
The FV code has been extended to handle all the available physical models, in regimes ranging from incompressible to hypersonic. As far as the RD code is concerned, the strictly conservative variant of the RD method, denominated CRD, has been applied for the first time in the literature to solve high speed viscous flows in thermochemical nonequilibrium, yielding some outstanding preliminary results on a challenging double cone flow simulation. All the developments have been validated on real-life test cases of current interest to the aerospace community. A quantitative comparison with experimental measurements and/or the literature has been performed wherever possible.
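The plugin-style flexibility described above is easiest to picture with a small sketch. The following C fragment emulates, in a deliberately simplified form, a registry of interchangeable solver components; COOLFluiD itself is written in C++ and is far richer, and every name below is invented for illustration.

```c
#include <stdio.h>

/* Minimal sketch of a pluggable-solver registry: the framework binds
 * a space discretisation and a time stepper at runtime, so new
 * methods can be added without touching the driver loop. */
typedef struct {
    const char *name;
    void (*assemble)(void);   /* space discretisation */
    void (*advance)(void);    /* time stepping scheme  */
} SolverMethod;

static void fv_assemble(void) { printf("FV: assemble cell fluxes\n"); }
static void rd_assemble(void) { printf("RD: distribute residuals\n"); }
static void imp_advance(void) { printf("implicit time step\n"); }

int main(void) {
    SolverMethod methods[] = {
        { "FiniteVolume",         fv_assemble, imp_advance },
        { "ResidualDistribution", rd_assemble, imp_advance },
    };
    for (int i = 0; i < 2; i++) {   /* driver loop is method-agnostic */
        printf("running %s\n", methods[i].name);
        methods[i].assemble();
        methods[i].advance();
    }
    return 0;
}
```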
680

Jahresbericht 2014 zur kooperativen DV-Versorgung (2014 annual report on cooperative IT provision)

16 November 2017 (has links) (PDF)
No description available.
