71 |
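Impacto del subsistema de comunicación en el rendimiento de los computadores paralelos: desde el hardware hasta las aplicaciones / Impact of the communication subsystem on the performance of parallel computers: from hardware to applications
Puente Varona, Valentín 20 February 2000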
Despite the explosive growth in the computational capacity of conventional computers, driven mainly by the rapid evolution of processors, many problems of considerable importance still cannot be tackled satisfactorily. The most feasible way to address such problems is to use parallel computers. This thesis studies the interconnection network of parallel computers and contributes effective solutions for improving its performance. Improvements are proposed for the critical elements of the network: the routers and the topology itself. The new proposals derived from this work are:
· An effective routing mechanism with lower cost. This idea was used by IBM in the IBM BlueGene/L supercomputer.
· Improved internal management of the routers at a bounded cost.
· Storage architectures for the routers with a favorable cost-performance ratio.
· A new layout of the interconnection network that notably improves its topological properties compared with those commonly used.
|
72 |
Checkpointing Algorithms for Parallel Computers
Kalaiselvi, S 02 1900
Checkpointing is a technique widely used in parallel/distributed computers for rollback error recovery. Checkpointing is defined as the coordinated saving of process state information at specified time instances. Checkpoints help in restoring the computation from the latest saved state, in case of failure. In addition to fault recovery, checkpointing has applications in fault detection, distributed debugging and process migration.
Checkpointing in uniprocessor systems is easy because there is a single clock and events occur with respect to this clock. There is a clear demarcation between events that happen before a checkpoint and events that happen after it. In parallel computers a large number of computers coordinate to solve a single problem. Since there might be multiple streams of execution, checkpoints have to be introduced along all these streams simultaneously. The absence of a global clock necessitates explicit coordination to obtain a consistent global state.
Events occurring in a distributed system can be partially ordered using Lamport's happens-before relation. The happens-before relation -> is a partial ordering that identifies dependent and concurrent events occurring in a distributed system.
It is defined as follows:
· If two events a and b happen in the same process, and if a happens before b, then a -> b.
· If a is the sending event of a message and b is the receiving event of the same message, then a -> b.
· If neither a -> b nor b -> a, then a and b are said to be concurrent.
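The rules above can be sketched in a few lines of Python. This is only an illustration, not code from the thesis: the function names `happens_before` and `concurrent` and the event labels are hypothetical, and the sketch also applies transitivity (if a -> b and b -> c then a -> c), which is part of Lamport's standard definition although it is not listed above.

```python
from itertools import product


def happens_before(process_events, messages):
    """Return the set of ordered pairs (a, b) such that a -> b.

    process_events: dict mapping a process name to its events in local order.
    messages: iterable of (send_event, receive_event) pairs.
    """
    events = [e for seq in process_events.values() for e in seq]
    order = set()
    # Rule 1: events of the same process follow their local order.
    for seq in process_events.values():
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                order.add((a, b))
    # Rule 2: a send event happens before the matching receive event.
    order.update(messages)
    # Transitive closure (Warshall's algorithm, k is the outermost index).
    for k, i, j in product(events, repeat=3):
        if (i, k) in order and (k, j) in order:
            order.add((i, j))
    return order


def concurrent(a, b, order):
    """Rule 3: distinct events are concurrent if neither precedes the other."""
    return a != b and (a, b) not in order and (b, a) not in order


# Hypothetical run: P sends a message at p1 that Q receives at q2,
# and Q sends a message at q3 that R receives at r1.
order = happens_before(
    {"P": ["p1", "p2"], "Q": ["q1", "q2", "q3"], "R": ["r1"]},
    [("p1", "q2"), ("q3", "r1")],
)
```

In this run `p1 -> r1` follows only through transitivity, while `p2` and `q1` are concurrent, so checkpoints taken at those two events could belong to the same consistent global state.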
A consistent global state may have concurrent checkpoints. In the first chapter of the thesis we discuss issues regarding ordering of events in a parallel computer, need for coordination among checkpoints and other aspects related to checkpointing. Checkpointing locations can either be identified statically or dynamically. The static approach assumes that a representation of a program to be checkpointed is available with information that enables a programmer to specify the places where checkpoints are to be taken. The dynamic approach identifies the checkpointing locations at run time. In this thesis, we have proposed algorithms for both static and dynamic checkpointing. The main contributions of this thesis are as follows:
1. Parallel computers that are being built now have faster communication and hence more efficient clock synchronisation compared to those built a few years ago. Based on efficient clock synchronisation protocols, the clock drift in current machines can be maintained within a few microseconds. We have proposed a dynamic checkpointing algorithm for parallel computers assuming bounded clock drifts.
2. The shared memory paradigm is convenient for programming while the message passing paradigm is easy to scale. Distributed Shared Memory (DSM) systems combine the advantages of both paradigms and can be visualized easily on top of a network of workstations. IEEE has recently proposed an interconnect standard called Scalable Coherent Interface (SCI) to configure computers as a Distributed Shared Memory system. A periodic dynamic checkpointing algorithm has been proposed in the thesis for a DSM system which uses the SCI standard.
3. When information about a parallel program is available one can make use of this knowledge to perform efficient checkpointing. A static checkpointing approach based on task graphs is proposed for parallel programs. The proposed task graph based static checkpointing approach has been implemented on a Parallel Virtual Machine (PVM) platform.
We now give a gist of the chapters of the thesis. Chapter 2 gives a classification of existing checkpointing algorithms. The chapter surveys algorithms that have been reported in the literature for checkpointing parallel/distributed systems. Notably, most of the algorithms published for checkpointing message passing systems are based on the seminal article by Chandy & Lamport. A large number of checkpointing algorithms have been published that relax the assumptions made in that article and extend its features to minimise the overheads of coordination and context saving.
Checkpointing algorithms for shared memory systems primarily extend cache coherence protocols to maintain a consistent memory. All of them assume that the main memory is safe for storing the context. Recently, algorithms have been published for distributed shared memory systems which extend the cache coherence protocols used in shared memory systems. They also, however, include methods for storing the status of distributed memory in stable storage. Chapter 2 concludes with brief comments on the desirable features of a checkpointing algorithm.
In Chapter 3, we develop a dynamic checkpointing algorithm for message passing systems assuming that the clock drift of processors in the system is bounded. Efficient clock synchronisation protocols have been implemented on recent parallel computers owing to the fact that communication between processors is very fast. Based on efficient clock synchronisation protocols, clock skew can be limited to a few microseconds. The algorithm proposed in the thesis uses clocks for checkpoint coordination and vector counts for identifying messages to be logged. The algorithm is a periodic, distributed algorithm. We prove correctness of the algorithm and compare it with similar clock based algorithms.
Distributed Shared Memory (DSM) systems provide the benefit of ease of programming in a scalable system. The recently proposed IEEE Scalable Coherent Interface (SCI) standard facilitates the construction of scalable coherent systems. In Chapter 4 we discuss a checkpointing algorithm for an SCI based DSM system. SCI maintains cache coherence in hardware using a distributed cache directory which scales with the number of processors in the system. SCI recommends a two phase transaction protocol for communication. Our algorithm is a two phase centralised coordinated algorithm. Phase one initiates checkpoints and the checkpointing activity is completed in phase two. The correctness of the algorithm is established theoretically. The chapter concludes with a discussion of the features of SCI exploited by the checkpointing algorithm proposed in the thesis.
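The general shape of a two phase centralised coordinated checkpoint can be sketched as below. This is a generic illustration under stated assumptions, not the SCI-specific algorithm of the thesis: the `Process` class, its `healthy` flag and the `coordinated_checkpoint` function are hypothetical, the coordinator is a plain function call rather than a message exchange, and in-flight messages and cache state are ignored.

```python
class Process:
    """A toy process whose context is a dictionary (hypothetical model)."""

    def __init__(self, name, state, healthy=True):
        self.name = name
        self.state = state
        self.healthy = healthy   # assumed flag: can this process checkpoint now?
        self.tentative = None    # checkpoint taken in phase one
        self.committed = None    # last checkpoint made permanent in phase two

    def take_tentative(self):
        """Phase one: save a tentative copy of the context and acknowledge."""
        if not self.healthy:
            return False
        self.tentative = dict(self.state)
        return True

    def commit(self):
        """Phase two (success): make the tentative checkpoint permanent."""
        self.committed, self.tentative = self.tentative, None

    def abort(self):
        """Phase two (failure): discard the tentative checkpoint."""
        self.tentative = None


def coordinated_checkpoint(processes):
    """Run both phases; return True if a new global checkpoint was committed."""
    acks = [p.take_tentative() for p in processes]  # phase one
    if all(acks):
        for p in processes:                         # phase two: commit everywhere
            p.commit()
        return True
    for p in processes:                             # phase two: roll back everywhere
        p.abort()
    return False


procs = [Process("p0", {"x": 1}), Process("p1", {"x": 2})]
first = coordinated_checkpoint(procs)    # every process acknowledges
procs[0].state["x"] = 10
procs[1].healthy = False
second = coordinated_checkpoint(procs)   # one process refuses, nothing changes
```

Because a checkpoint becomes permanent only after every process has acknowledged phase one, the previously committed global checkpoint stays intact when the second attempt fails.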
In Chapter 5, a static checkpointing algorithm is developed assuming that the program to be executed on a parallel computer is given as a directed acyclic task graph. We assume that estimates of the time to execute each task in the task graph are given. Given the times at which checkpoints are to be taken, the algorithm identifies a set of edges where checkpointing tasks can be placed, ensuring that they form a consistent global checkpoint. The proposed algorithm eliminates coordination overhead at run time. It significantly reduces the context saving overhead by taking checkpoints along edges of the task graph. The algorithm is used as a preprocessing step before scheduling the tasks to processors. The algorithm complexity is O(km), where m is the number of edges in the graph and k the maximum number of global checkpoints to be taken.
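One way to picture checkpointing along task graph edges is as a cut of the graph. The sketch below is not the algorithm of the thesis: it assumes every task starts as early as its predecessors allow, and for each checkpoint time it selects the edges that leave a task already finished and enter a task not yet finished. The names `earliest_finish` and `checkpoint_edges` and the example graph are hypothetical. With k checkpoint times and m edges the selection loop does O(km) work, in line with the complexity quoted above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def earliest_finish(duration, edges):
    """Earliest finish time of each task, assuming no processor contention."""
    preds = {task: set() for task in duration}
    for u, v in edges:
        preds[v].add(u)
    finish = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds[task]), default=0)
        finish[task] = start + duration[task]
    return finish


def checkpoint_edges(duration, edges, times):
    """For each checkpoint time, return the edges crossing the cut."""
    finish = earliest_finish(duration, edges)
    cuts = []
    for t in times:
        # Tasks finished by time t are closed under predecessors, so no edge
        # runs from an unfinished task back to a finished one.
        cuts.append([(u, v) for u, v in edges if finish[u] <= t < finish[v]])
    return cuts


# Hypothetical diamond-shaped task graph with estimated execution times.
duration = {"A": 2, "B": 3, "C": 1, "D": 2}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
cuts = checkpoint_edges(duration, edges, times=[2, 4])
```

At time 4 the finished tasks are A and C, so the data carried on edges A->B and C->D is what would have to be saved for that global checkpoint.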
The static algorithm is implemented on a parallel computer with a PVM environment as it is widely available and portable. The task graph of a program can be constructed manually or through program development tools. Our implementation is a collection of preprocessing and run time routines. The preprocessing routines operate on the task graph information to generate a set of edges to be checkpointed for each global checkpoint and write the information on disk. The run time routines save the context along the marked edges. In case of recovery, the recovery algorithms read the information from stable storage and reconstruct the context. The limitation of our static checkpointing algorithm is that it can operate only on deterministic task graphs. To demonstrate the practical feasibility of the proposed approach, case studies of checkpointing some parallel programs are included in the thesis.
We conclude the thesis with a summary of proposed algorithms and possible directions to continue research in the area of checkpointing.
|
73 |
Adaptive finite element simulation of flow and transport applications on parallel computers
Kirk, Benjamin Shelton 28 August 2008
Not available / text
|
74 |
Adaptive finite element simulation of flow and transport applications on parallel computers
Kirk, Benjamin Shelton, 1978- 23 August 2011
Not available / text
|
75 |
Técnicas de paralelização em GPGPU aplicadas em algoritmo para remoção de ruído multiplicativo / GPGPU parallelization techniques applied to an algorithm for multiplicative noise removal
Gulo, Carlos Alex Sander Juvêncio [UNESP] 17 October 2012
Previous issue date: 2012-10-17 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Driven by the evolution of processors, high performance computing has contributed to progress in several areas of scientific research that require advanced computation, such as image processing and augmented reality. To fully exploit the processing capacity of the chosen architecture and to reduce processing time, parallel computing techniques must be applied. However, such hardware is expensive, which motivates the search for alternatives. Multicore processor architectures and General Purpose Computing on Graphics Processing Units (GPGPU) are low-cost options, as they were designed to provide infrastructure for high performance computing and to serve real-time applications. With the improvement of multicomputer, multiprocessor and, more recently, GPGPU technologies, the parallelization of image processing techniques has gained prominence because it reduces the processing time of complex methods applied to high-resolution images. This work presents a study of, and an approach to, the GPGPU parallelization, using the CUDA architecture, of an image smoothing method based on a variational model proposed by Jin and Yang (2011), and its application to high-resolution images. In the experiments, the optimized CUDA implementation achieved a speedup of up to fifteen times in image processing time over the sequential algorithm, which may make the method viable in a variety of real-time applications.
|
76 |
Técnicas de paralelização em GPGPU aplicadas em algoritmo para remoção de ruído multiplicativo / GPGPU parallelization techniques applied to an algorithm for multiplicative noise removal
Gulo, Carlos Alex Sander Juvêncio. January 2012
Advisor: Antonio Carlos Sementille / Committee: José Remo Ferreira Brega / Committee: Edgard A. Lamounier Junior / Abstract: Driven by the evolution of processors, high performance computing has contributed to progress in several areas of scientific research that require advanced computation, such as image processing and augmented reality. To fully exploit the processing capacity of the chosen architecture and to reduce processing time, parallel computing techniques must be applied. However, such hardware is expensive, which motivates the search for alternatives. Multicore processor architectures and General Purpose Computing on Graphics Processing Units (GPGPU) are low-cost options, as they were designed to provide infrastructure for high performance computing and to serve real-time applications. With the improvement of multicomputer, multiprocessor and, more recently, GPGPU technologies, the parallelization of image processing techniques has gained prominence because it reduces the processing time of complex methods applied to high-resolution images. This work presents a study of, and an approach to, the GPGPU parallelization, using the CUDA architecture, of an image smoothing method based on a variational model proposed by Jin and Yang (2011), and its application to high-resolution images. In the experiments, the optimized CUDA implementation achieved a speedup of up to fifteen times in image processing time over the sequential algorithm, which may make the method viable in a variety of real-time applications. / Master's degree
|
77 |
Técnicas de computação paralela aplicadas ao método das características em sistemas hidráulicos = Parallel computing applied to method of characteristics in hydraulic systems
Nascimento Júnior, Orlando Saraiva, 1981- 22 August 2018
Advisors: Vitor Rafael Coluci, Lubienska Cristina Lucas Jaquiê Ribeiro / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Tecnologia / Previous issue date: 2013 / Abstract: A hydraulic installation is a set of hydromechanical devices and pipes whose function is to transport a fluid. The flow is controlled by operating the hydromechanical devices. Investigating the impact of these operations on a hydraulic installation can prevent physical damage to the system (such as pipe rupture). One way to investigate these effects is through simulation. Simulation makes it possible to study a hydraulic system that, after a device operation, leaves a steady condition (initial steady state), passes through a transient state, and then settles into a new steady condition (final steady state). During the hydraulic transient, overpressure and underpressure waves form inside the pipes and can cause damage. One of the most widely accepted methods for simulating hydraulic transients is the method of characteristics, which transforms the partial differential equations describing the phenomenon into a set of ordinary differential equations. Depending on the size of the hydraulic system (number and length of pipes, number of electromechanical devices, etc.), the computational cost of obtaining the transient behavior can be high. In this work, we apply parallel computing techniques on graphics cards used for general-purpose processing (GPU) and on multi-core processors (OpenMP) to accelerate hydraulic transient calculations. We simulated a hydraulic system consisting of a reservoir, a valve and a pipe and determined the performance gain as a function of the pipe size. The OpenMP technique provided speedups of up to 3.3×, whereas the GPU technique provided speedups of 17×. Graphics cards therefore proved very attractive for accelerating hydraulic transient simulations with the method of characteristics / Master's / Technology and Innovation / Master in Technology
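To make the parallel structure of the method of characteristics concrete, here is a minimal sequential sketch of one time step for a reservoir, a pipe and a valve. It is not the dissertation's code: it uses the textbook compatibility equations along the C+ and C- characteristics, the names `pipe_constants` and `moc_step` and all numerical values are hypothetical, and the valve is assumed to close instantaneously. Every interior node reads only values from the previous time step, which is why the interior loop is a natural candidate for an OpenMP parallel loop or a GPU kernel with one thread per node.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def pipe_constants(wave_speed, diameter, friction, dx):
    """Return the impedance B and resistance R of one pipe reach."""
    area = math.pi * diameter ** 2 / 4.0
    B = wave_speed / (G * area)
    R = friction * dx / (2.0 * G * diameter * area ** 2)
    return B, R


def moc_step(H, Q, B, R, h_reservoir):
    """Advance heads H and flows Q by one time step (dt = dx / wave_speed)."""
    n = len(H)
    H_new, Q_new = [0.0] * n, [0.0] * n
    # Interior nodes: iterations are independent because only old values are read.
    for i in range(1, n - 1):
        cp = H[i - 1] + B * Q[i - 1] - R * Q[i - 1] * abs(Q[i - 1])  # C+ from the left
        cm = H[i + 1] - B * Q[i + 1] + R * Q[i + 1] * abs(Q[i + 1])  # C- from the right
        H_new[i] = 0.5 * (cp + cm)
        Q_new[i] = (cp - cm) / (2.0 * B)
    # Upstream boundary: reservoir with constant head, combined with C-.
    cm = H[1] - B * Q[1] + R * Q[1] * abs(Q[1])
    H_new[0] = h_reservoir
    Q_new[0] = (h_reservoir - cm) / B
    # Downstream boundary: valve assumed fully closed (zero flow), combined with C+.
    cp = H[n - 2] + B * Q[n - 2] - R * Q[n - 2] * abs(Q[n - 2])
    H_new[n - 1] = cp
    Q_new[n - 1] = 0.0
    return H_new, Q_new


# Hypothetical frictionless pipe: 11 nodes, 50 m reservoir head, 0.1 m^3/s flow.
B, R = pipe_constants(wave_speed=1000.0, diameter=0.5, friction=0.0, dx=10.0)
H0, Q0 = [50.0] * 11, [0.1] * 11
H1, Q1 = moc_step(H0, Q0, B, R, h_reservoir=50.0)
```

In the frictionless case the head at the valve rises after the first step by a V0 / g (wave speed times initial velocity over gravity), the classical Joukowsky surge, which gives a quick sanity check for the sketch.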
|
78 |
An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations
Pananilath, Irshad Muhammed January 2014
The Lattice-Boltzmann method (LBM), a promising new particle-based simulation technique for complex and multiscale fluid flows, has seen tremendous adoption in recent years in computational fluid dynamics. Even with a state-of-the-art LBM solver such as Palabos, users still have to write their programs manually using the library-supplied primitives. We propose an automated code generator for a class of LBM computations with the objective of achieving high performance on modern architectures.
Tiling is a very important loop transformation used to improve the performance of stencil computations by exploiting locality and parallelism. In the first part of the work, we explore diamond tiling, a new tiling technique to exploit the inherent ability of most stencils to allow tile-wise concurrent start. This enables perfect load-balance during execution and reduces the frequency of synchronization required.
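As a reminder of what tiling does to a stencil loop nest, the sketch below applies plain rectangular tiling to one sweep of a 4-point averaging stencil. It is only an illustration of the basic transformation, not the diamond tiling studied in this work and not output of the proposed code generator; the functions `sweep` and `sweep_tiled`, the tile size and the test grid are hypothetical. In a compiled language the tiled order improves cache reuse, and independent tiles can be assigned to different cores.

```python
def sweep(grid):
    """One Jacobi-style sweep over the interior points, in row-major order."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out


def sweep_tiled(grid, tile=4):
    """The same sweep with the iteration space cut into tile x tile blocks."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for ii in range(1, rows - 1, tile):          # loops over tiles
        for jj in range(1, cols - 1, tile):
            for i in range(ii, min(ii + tile, rows - 1)):      # loops inside a tile
                for j in range(jj, min(jj + tile, cols - 1)):
                    out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                        + grid[i][j - 1] + grid[i][j + 1])
    return out


# Hypothetical 9 x 10 input grid.
grid = [[float((3 * i + 5 * j) % 7) for j in range(10)] for i in range(9)]
reference = sweep(grid)
```

Both versions read only the old grid, so they visit the same points in a different order and produce identical results. Tiling across time steps as well is what requires the more careful tile shapes, such as diamonds, discussed here.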
Few studies have looked at time tiling for LBM codes. We exploit a key similarity between stencils and LBM to enable polyhedral optimizations and in turn time tiling for LBM. Besides polyhedral transformations, we also describe a number of other complementary transformations and post processing necessary to obtain good parallel and SIMD performance on modern architectures. We also characterize the performance of LBM with the Roofline performance model.
Experimental results for standard LBM simulations like Lid Driven Cavity, Flow Past Cylinder, and Poiseuille Flow show that our scheme consistently outperforms Palabos, on average by 3x, while running on 16 cores of an Intel Xeon Sandy Bridge system. We also obtain a very significant improvement of 2.47x over the native production compiler on the SPECLBM benchmark.
|
79 |
Development and validation of distributed reactive control systems / Développement et validation de systèmes de contrôle réactifs distribués
Meuter, Cédric 14 March 2008
A reactive control system is a computer system reacting to certain stimuli emitted by its environment in order to maintain it in a desired state. Distributed reactive control systems are generally composed of several processes, running in parallel on one or more computers, communicating with one another to perform the required control task. By their very nature, distributed reactive control systems are hard to design. Their distributed nature and/or the communication scheme used can introduce subtle unforeseen behaviours. When dealing with critical applications, such as plane control systems or traffic light control systems, those unintended behaviours can have disastrous consequences. It is therefore essential for the designer to ensure that this does not happen. For that purpose, rigorous and systematic techniques can (and should) be applied as early as possible in the development process. In that spirit, this work aims at providing the designer with the necessary tools to facilitate the development and validation of such distributed reactive control systems. In particular, we show how a dedicated language called dSL (Distributed Supervision Language) can be used to ease the development process. We also study how validation techniques such as model-checking and testing can be applied in this context. / Doctorate in Sciences / info:eu-repo/semantics/nonPublished
|
80 |
Some Novel Static Interconnection Networks For Parallel Computers
Sebastian, M P 07 1900
No description available.
|