211

Controle de granularidade com threads em programas MPI dinâmicos / Controlling granularity of dynamic MPI programs with threads

Lima, João Vicente Ferreira, January 2009
In recent years, the growing demand for high performance has favored the emergence of ever more efficient architectures and algorithms. The popularity of distributed platforms raises new issues in the development of parallel algorithms, such as communication, heterogeneity, and dynamic availability of resources. These issues can result in applications whose workload is known only at runtime; irregularity in the algorithm or in the input data can also affect the workload. A parallel application can address these issues with dynamic algorithms, using programming techniques that define the work of a task and allow resources to be used on demand. Granularity, the ratio of computation to communication, captures practical execution concerns and is an important factor in the performance of dynamic algorithms. Implementing granularity control is difficult and depends on support from the programming environment, yet programming environments tend to have large, complicated interfaces that hinder their use in HPC. This work proposes a library (libSpawn) that adds granularity control to dynamic MPI applications. The library controls granularity by mapping tasks to processes or threads according to three parameters: the cores of the architecture, the load, and operating-system resources. The times obtained with processes and with libSpawn show significant gains on synthetic benchmarks used by other programming environments. The current implementation nonetheless has shortcomings that produce anomalous times, although these are insignificant compared to the times obtained with processes.
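As a rough illustration of the decision such a library makes, the sketch below runs a task as a local thread when the node has spare cores and otherwise spawns a new MPI process. This is a minimal sketch: the decision rule and the names task_fn, local_load, and ./task_worker are hypothetical, since the abstract does not describe libSpawn's actual interface.

```c
/* Sketch of granularity control: thread when the node has capacity,
 * MPI process otherwise. Assumes MPI was initialized with thread
 * support (MPI_THREAD_MULTIPLE). All names here are illustrative. */
#include <mpi.h>
#include <pthread.h>

static void *task_fn(void *arg) {
    /* ... task body ... */
    return NULL;
}

void run_task(int ncores, double local_load) {
    if (local_load < (double)ncores) {
        /* Cheap path: run the task as a thread on the local node. */
        pthread_t t;
        pthread_create(&t, NULL, task_fn, NULL);
        pthread_join(t, NULL);
    } else {
        /* Node is saturated: spawn a new MPI process elsewhere. */
        MPI_Comm child;
        MPI_Comm_spawn("./task_worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&child);
    }
}
```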
213

Computação paralela em cluster de GPU aplicado a problema da engenharia nuclear / Parallel computing on a GPU cluster applied to a nuclear engineering problem

MORAES, Sérgio Ricardo dos Santos, 2012
Cluster computing has been widely used as a relatively low-cost alternative for parallel processing in scientific applications. With the Message-Passing Interface (MPI) standard, development became even more accessible and widespread in the scientific community. A more recent trend is the use of Graphics Processing Units (GPUs), powerful coprocessors able to perform hundreds of instructions in parallel, reaching a processing capacity hundreds of times that of a CPU. However, a conventional PC generally hosts no more than two GPUs. Hence, this work develops and evaluates a low-cost hybrid parallel approach to a typical nuclear engineering problem. The idea is to combine cluster parallelism (MPI) with GPU programming (CUDA, Compute Unified Device Architecture) to simulate neutron transport through a shielding slab using the Monte Carlo method. Using a cluster of four quad-core computers with two GPUs each, programs were developed with the MPI and CUDA technologies. Experiments with configurations from 1 to 8 GPUs were executed and compared with one another and with the sequential (non-parallel) program. A speedup of about 2,000 times was observed when comparing the 8-GPU parallel version with the sequential version. The results are discussed and analyzed to highlight the gains and possible limitations of the proposed approach.
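The hybrid structure described, one MPI rank per GPU with a final reduction of the per-GPU tallies, might be organized as in the sketch below. The kernel name transport_histories and the tally layout are assumptions; the thesis's actual Monte Carlo code is not given in the abstract.

```c
/* Sketch: MPI+CUDA Monte Carlo skeleton, one rank bound to one GPU,
 * histories split across ranks, tallies combined with MPI_Reduce. */
#include <mpi.h>
#include <cuda_runtime.h>

__global__ void transport_histories(unsigned long n, double *tally) {
    /* ... track n neutron histories, accumulate into tally ... */
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size, ndev;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    cudaGetDeviceCount(&ndev);
    cudaSetDevice(rank % ndev);          /* bind each rank to one GPU */

    unsigned long total = 100000000UL;   /* total histories (example) */
    unsigned long share = total / size;  /* split across ranks */

    double *d_tally, local = 0.0, global = 0.0;
    cudaMalloc(&d_tally, sizeof(double));
    cudaMemset(d_tally, 0, sizeof(double));
    transport_histories<<<256, 256>>>(share, d_tally);
    cudaMemcpy(&local, d_tally, sizeof(double), cudaMemcpyDeviceToHost);

    /* Combine the per-GPU tallies into a single estimate at rank 0. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) { /* ... report global result ... */ }

    cudaFree(d_tally);
    MPI_Finalize();
    return 0;
}
```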
214

Mappingstrategien für Kommunikatoren / Mapping strategies for communicators

Ermer, Thomas, 12 September 2005
This work addresses questions of efficient communication in parallel FEM systems. By careful partitioning of the domain under consideration and its distribution over the available processors, one can try to minimize the communication load, e.g., with the program system chaco. Data exchange that is as parallel as possible is attempted by arranging the communication steps into so-called link levels. Starting from MPI-based coupling-boundary communication, this thesis presents a split algorithm that attempts to map the coupling-boundary data of large communicators onto smaller sub-communicators and thereby minimize the number of data packets to be transferred.
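The mapping onto smaller sub-communicators rests on MPI's standard communicator-splitting call. A minimal sketch, assuming a fixed group size as the (hypothetical) grouping rule:

```c
/* Sketch: splitting a large communicator into smaller sub-communicators.
 * The grouping rule (color = rank / group_size) is illustrative, not the
 * thesis's actual mapping. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int group_size = 4;             /* processes per sub-communicator */
    int color = rank / group_size;  /* ranks 0-3 -> group 0, 4-7 -> 1, ... */

    MPI_Comm sub;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &sub);

    int sub_rank;
    MPI_Comm_rank(sub, &sub_rank);
    printf("world rank %d -> group %d, sub rank %d\n", rank, color, sub_rank);

    /* Coupling-boundary exchange would now run inside 'sub'. */
    MPI_Comm_free(&sub);
    MPI_Finalize();
    return 0;
}
```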
215

Supercomputing over Cloud using the Quicksort algorithm

Mattamadugu, Lakshmi Narashima Seshendra; Pathan, Ashfaq Abdullah Khan, January 2012
Context: Cloud computing has advanced in recent years and is attracting attention as a convenient source of computational power. Slowly, the Cloud is opening new possibilities for the scientific community to build High Performance Computing platforms. Despite the wide benefits the Cloud offers, the question on everyone's mind is whether the Cloud is a feasible platform for HPC applications. This thesis evaluates the performance of the Amazon Cloud using a sorting benchmark. Objectives: 1. To survey previous work on HPC applications ported to Cloud environments in various fields, and to assess the problems and challenges of HPC in the Cloud. 2. To study how to implement a parallel Quicksort efficiently so as to obtain good speedup. 3. To develop a parallel Quicksort and measure its performance in terms of speedup when deployed in the Cloud. Methods: Two research methods were used: a Systematic Literature Review (SLR) and a quantitative methodology. Papers for the SLR were drawn from the academic databases IEEE Xplore, Inspec, ACM Digital Library, and SpringerLink. Results: The systematic review identified 12 HPC applications, 9 problems, and 5 challenges in the Cloud, as well as an efficient way to implement parallel Quicksort on the Cloud. The experiments yielded low speedup in the Cloud environment. Conclusions: Many HPC applications deployed in the Cloud so far were identified, along with their problems and challenges. The Message Passing Interface (MPI) was chosen as the method to develop and implement the parallel Quicksort in the Cloud. From the experimental results, we believe the Cloud is not a suitable platform for HPC applications.
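One plausible shape for such an implementation is to scatter the data, quicksort each chunk locally, and merge the sorted chunks at the root, as in the hedged sketch below; the abstract does not specify the thesis's exact algorithm.

```c
/* Sketch: scatter / local quicksort / gather-and-merge over MPI.
 * Assumes n is divisible by the number of ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1 << 20;                /* total elements */
    int chunk = n / size;
    int *data = NULL, *local = malloc(chunk * sizeof(int));

    if (rank == 0) {                      /* root generates the input */
        data = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++) data[i] = rand();
    }
    MPI_Scatter(data, chunk, MPI_INT, local, chunk, MPI_INT, 0, MPI_COMM_WORLD);

    qsort(local, chunk, sizeof(int), cmp_int);   /* local quicksort */

    MPI_Gather(local, chunk, MPI_INT, data, chunk, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        /* k-way merge of the sorted chunks (simple min-scan for clarity). */
        int *out = malloc(n * sizeof(int)), *pos = calloc(size, sizeof(int));
        for (int i = 0; i < n; i++) {
            int best = -1;
            for (int k = 0; k < size; k++)
                if (pos[k] < chunk && (best < 0 ||
                    data[k * chunk + pos[k]] < data[best * chunk + pos[best]]))
                    best = k;
            out[i] = data[best * chunk + pos[best]++];
        }
        printf("sorted %d ints on %d ranks\n", n, size);
        free(out); free(pos); free(data);
    }
    free(local);
    MPI_Finalize();
    return 0;
}
```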
216

Evaluation of Parallel Programming Standards for Embedded High Performance Computing

James Emmanuel Roy, Muggalla; Garimella, Pradeep, January 2010
The aim of this project is to evaluate parallel programming standards for embedded high performance computing. There is a huge demand for high computational speed and performance in present-day radar signal processing, so more processors are needed to obtain sufficient performance. One way of getting high performance is to divide the work over multiple processors, while keeping communication overhead low and speedup good. This has been done using the parallel programming standards OpenMP and MPI. We applied both to a radar signal benchmark that resembles many tasks in radar signal processing. For OpenMP, a SUNFIRE E2900 shared-memory system was used; for MPI, a SUNFIRE E2900 cluster of 8 nodes running Sun HPC ClusterTools v5 was used. The OpenMP program shows fairly good speedup up to 5 processors; thereafter an increase in communication overhead is observed. MPI showed low communication overhead at first, but this advantage diminished as the number of processors increased. OpenMP and MPI behave similarly: beyond a certain number of processors, efficiency declines and communication overhead grows. According to our results, OpenMP is relatively easy to use compared to MPI; with MPI, it is up to the programmer to make explicit calls in order to parallelize.
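The ease-of-use contrast drawn here can be made concrete by writing the same reduction both ways. The dot product below is an illustrative stand-in for the radar benchmark, which the abstract does not detail.

```c
/* Sketch: the same reduction in OpenMP (one pragma) and in MPI
 * (explicit decomposition and communication calls). */
#include <mpi.h>
#include <omp.h>

double dot_openmp(const double *a, const double *b, int n) {
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)   /* compiler splits the loop */
    for (int i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

double dot_mpi(const double *a, const double *b, int n, MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int lo = rank * n / size, hi = (rank + 1) * n / size; /* manual split */
    double local = 0.0, global = 0.0;
    for (int i = lo; i < hi; i++) local += a[i] * b[i];
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);
    return global;
}
```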
217

Instalace a konfigurace Octave výpočetního clusteru / Installation and configuration of Octave computation cluster

Mikulka, Zdeněk, January 2014
This diploma thesis contains a detailed design of a high-performance cluster, primarily aimed at parallel computing in the Octave application. Each component of the cluster is described, along with instructions for its installation and configuration. The cluster is based on the GNU/Linux operating system and the Message Passing Interface. The design allows the cluster to be deployed on classroom computers even while lessons are in progress.
218

Optimizing MPI Collective Communication by Orthogonal Structures

Kühnemann, Matthias; Rauber, Thomas; Rünger, Gudula, 28 June 2007
Many parallel applications from scientific computing use MPI collective communication operations to collect or distribute data. Since the execution times of these communication operations increase with the number of participating processors, scalability problems can occur. In this article, we show for different MPI implementations how the execution time of collective communication operations can be significantly improved by a restructuring based on orthogonal processor structures with two or more levels. As platforms, we consider a dual-Xeon cluster, a Beowulf cluster, and a Cray T3E with different MPI implementations. We show that the execution time of operations like MPI_Bcast or MPI_Allgather can be reduced by 40% and 70% on the dual-Xeon cluster and the Beowulf cluster, respectively. But also on a Cray T3E a significant improvement can be obtained by a careful selection of the processor groups. We demonstrate that the optimized communication operations can be used to reduce the execution time of data-parallel implementations of complex application programs without any other change to the computation and communication structure. Furthermore, we investigate how the execution time of orthogonal realizations can be modeled using runtime functions. In particular, we consider the modeling of two-phase realizations of communication operations. We present runtime functions for the modeling and verify that these runtime functions can predict the execution time both for communication operations in isolation and in the context of application programs.
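A two-phase orthogonal realization of a broadcast can be sketched with row and column communicators over a process grid; the number of columns c below is an illustrative assumption, since choosing the processor groups well is precisely what the paper measures.

```c
/* Sketch: two-phase "orthogonal" broadcast. The p processes are viewed
 * as a grid with c columns; the root's row is served first, then each
 * column. */
#include <mpi.h>

void bcast_orthogonal(void *buf, int count, MPI_Datatype type,
                      int c, MPI_Comm comm) {
    int rank;
    MPI_Comm_rank(comm, &rank);
    int row = rank / c, col = rank % c;

    MPI_Comm row_comm, col_comm;
    MPI_Comm_split(comm, row, col, &row_comm);   /* same-row processes */
    MPI_Comm_split(comm, col, row, &col_comm);   /* same-column processes */

    /* Phase 1: the global root (grid position 0,0) broadcasts along row 0. */
    if (row == 0) MPI_Bcast(buf, count, type, 0, row_comm);
    /* Phase 2: every row-0 process broadcasts down its own column. */
    MPI_Bcast(buf, count, type, 0, col_comm);

    MPI_Comm_free(&row_comm);
    MPI_Comm_free(&col_comm);
}
```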
219

Correlates of the Scales of a Modified Screening Version of the Multidimensional Pain Inventory with Depression and Anxiety on a Chronic Pain Sample

Walker, Katherine Elise, 05 1900
This correlational study investigated the relationship between changes in the psychosocial scales of the MPI Screener Patient Report Card (Clark, 1996) and changes in depression and anxiety in a sample of chronic pain patients who completed a 4-week outpatient interdisciplinary treatment program located in a large regional medical center. Race, gender, and primary pain diagnosis were additional predictors. The data analyzed came from an existing patient outcome database (N = 203). Five research assumptions were examined using ten separate (five pre- and five post-treatment) hierarchical multiple regression analyses. Statistical significance was found in the pre- and post-treatment analyses with the BDI-II (Beck, Steer, & Brown, 1996) and BAI (Beck & Steer, 1993) as predictors of the criteria Pain Interference, Emotional Distress, Life Control, and Total Function.
220

Asynchronous Task-Based Parallelism in Seismic Imaging and Reservoir Modeling Simulations

AlOnazi, Amani, 26 August 2019
The components of high-performance systems continue to become more complex on the road to exascale. This complexity is exposed at the level of multi/many-core CPUs, accelerators (GPUs), interconnects (horizontal communication), and memory hierarchies (vertical communication). A crucial task is designing an algorithm and a programming model that scale to the same order as the HPC system size at multiple levels. This trend in HPC architecture affects memory-intensive applications more critically than compute-bound applications. Accomplishing this task involves adopting less synchronous forms of the mathematical algorithm, reducing synchronization in the computational implementation, introducing more SIMT-style concurrency at the finest level of the system hierarchy, and increasing arithmetic intensity as the bottleneck shifts from the number of floating-point operations to the number of memory accesses. This dissertation addresses these challenges in scientific simulation, focusing on the dominant kernels of memory-bound applications: sparse solvers in implicit modeling, and I/O in explicit reverse time migration in seismic imaging. We introduce asynchronous task-based parallelism into iterative algebraic preconditioners. We also introduce a task-based framework that hides the latency of I/O with computation. This dissertation targets two main applications in the oil and gas industry: reservoir simulation and seismic imaging simulation. It presents results on multi- and many-core systems and GPUs on four Top500 supercomputers: Summit, TSUBAME 3.0, Shaheen II, and Makman-2. We introduce asynchronous implementations of four major memory-bound kernels: algebraic multigrid (MPI+OmpSs), tridiagonal solve (MPI+OpenMP), Additive Schwarz Preconditioned Inexact Newton (MPI+MPI), and Reverse Time Migration (StarPU/StarPU+MPI and CUDA).
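The I/O-hiding idea reduces to prefetching the next data block while computing on the current one. The dissertation does this with task-based runtimes such as StarPU; the double-buffered pthread version below is a simplified stand-in, and the input file name is hypothetical.

```c
/* Sketch: hiding I/O latency behind computation with double buffering.
 * An I/O thread reads block k+1 while the main thread computes on block k. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define BLOCK (1 << 20)

struct io_job { FILE *f; double *dst; size_t nread; };

static void *read_block(void *p) {            /* runs on the I/O thread */
    struct io_job *j = p;
    j->nread = fread(j->dst, sizeof(double), BLOCK, j->f);
    return NULL;
}

static void compute(const double *buf, size_t n) {
    (void)buf; (void)n;                       /* ... process one block ... */
}

int main(void) {
    FILE *f = fopen("wavefield.bin", "rb");   /* hypothetical input file */
    if (!f) return 1;
    double *buf[2] = { malloc(BLOCK * sizeof(double)),
                       malloc(BLOCK * sizeof(double)) };

    struct io_job job = { f, buf[0], 0 };
    read_block(&job);                 /* prime the first buffer synchronously */
    size_t n = job.nread;

    for (int cur = 0; n > 0; cur ^= 1) {
        pthread_t io;
        job.dst = buf[cur ^ 1];       /* prefetch the next block ... */
        pthread_create(&io, NULL, read_block, &job);
        compute(buf[cur], n);         /* ... while computing on this one */
        pthread_join(io, NULL);
        n = job.nread;                /* 0 at EOF terminates the loop */
    }

    fclose(f);
    free(buf[0]); free(buf[1]);
    return 0;
}
```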
