211

Controle de granularidade com threads em programas MPI dinâmicos / Controlling granularity of dynamic MPI programs with threads

Lima, João Vicente Ferreira January 2009 (has links)
In recent years, the growing demand for high performance has driven the emergence of increasingly efficient architectures and algorithms. The popularity of distributed platforms raises new issues in the development of parallel algorithms, such as communication, heterogeneity, and resource dynamism. These issues can result in applications whose workload is known only at runtime; irregularity in the algorithm or in the input data can also affect the workload. A parallel application can address these issues with dynamic algorithms, using programming techniques that define the work of a task and allow resources to be used on demand. Granularity, the ratio of computation to communication, reflects practical execution concerns and is an important factor in the performance of dynamic algorithms. Implementing granularity control is difficult and depends on support from the programming environment, yet programming environments tend to have large, complicated interfaces that hinder their use in HPC. This work proposes a library (libSpawn) that adds granularity control to dynamic MPI applications. The library controls granularity by mapping tasks to processes or threads according to three parameters: the cores of the architecture, the load, and operating-system resources. The execution times obtained with processes and with libSpawn show significant gains on synthetic benchmarks used by other programming environments. Nevertheless, shortcomings in the current implementation produce anomalous times, although these are negligible compared with the times obtained with processes.
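The libSpawn interface itself is not given in the abstract. Purely as a hedged illustration of the two MPI-level mechanisms such a library has to choose between, the sketch below runs a task either in a thread of the current process or in a freshly spawned MPI process, depending on core count and system load. The decision rule, the task_fn function and the ./worker executable are hypothetical, not the actual libSpawn API.

```c
/* A hedged sketch, not the actual libSpawn API: run a task in a local thread
 * when the node still has idle cores, otherwise create it as a separate MPI
 * process with MPI_Comm_spawn. task_fn() and "./worker" are placeholders;
 * getloadavg() is the glibc/BSD call. */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void *task_fn(void *arg) {            /* task body when run as a thread */
    printf("task %ld executed as a thread\n", (long)arg);
    return NULL;
}

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    long cores = sysconf(_SC_NPROCESSORS_ONLN);  /* parameter 1: core count  */
    double load[1] = {0.0};
    getloadavg(load, 1);                         /* parameter 2: system load */

    if (load[0] < (double)cores) {
        /* Idle cores available: keep the task light-weight, run it as a thread. */
        pthread_t t;
        pthread_create(&t, NULL, task_fn, (void *)1L);
        pthread_join(t, NULL);
    } else {
        /* Node is busy: create the task as a new MPI process instead.
         * The spawned "./worker" would call MPI_Init and MPI_Comm_get_parent. */
        MPI_Comm child;
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&child);
    }

    MPI_Finalize();
    return 0;
}
```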
213

Computação paralela em cluster de GPU aplicado a problema da engenharia nuclear / Parallel computing on a GPU cluster applied to a nuclear engineering problem

MORAES, Sérgio Ricardo dos Santos 04 1900 (has links)
Cluster computing has been widely used as a relatively low-cost alternative for parallel processing in scientific applications. With the Message-Passing Interface (MPI) standard, development has become even more accessible and widespread in the scientific community. A more recent trend is the use of Graphics Processing Units (GPUs), powerful co-processors able to perform hundreds of instructions in parallel and reach a processing capacity hundreds of times that of a CPU. A conventional PC, however, generally does not hold more than two GPUs. This work therefore proposes the development and evaluation of a low-cost hybrid parallel approach to a typical nuclear engineering problem. The idea is to combine cluster-level parallelism (MPI) with GPU programming (CUDA, Compute Unified Device Architecture) to simulate neutron transport through a shielding slab using the Monte Carlo method. On a cluster of four quad-core computers with two GPUs each, programs were developed with the MPI and CUDA technologies. Experiments with several configurations, from 1 to 8 GPUs, were executed and compared with each other and with the sequential (non-parallel) program. A reduction in processing time of about 2,000 times was observed when comparing the 8-GPU parallel version with the sequential version. The results are discussed and analysed to highlight the gains and possible limitations of the proposed approach.
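The thesis code is not reproduced here. The hedged sketch below shows only the MPI side of such a hybrid scheme: histories are split evenly across ranks, each rank is assumed to drive one of the node's two GPUs, and the per-rank tallies are combined with MPI_Reduce. run_histories_on_gpu() is a hypothetical stand-in for the CUDA kernel launch, implemented as a dummy stub so the sketch compiles.

```c
/* Hedged sketch of the MPI layer of a hybrid MPI+CUDA Monte Carlo run.
 * The CUDA part is represented by a dummy stub; a real code would launch
 * a kernel tracking neutron histories through the shielding slab. */
#include <mpi.h>
#include <stdio.h>

/* Hypothetical stand-in for the GPU kernel launch: tracks n histories on
 * the given device and returns the transmitted-neutron tally. */
static long run_histories_on_gpu(long n, int device) {
    (void)device;           /* placeholder only, so the sketch compiles */
    return n / 2;           /* dummy tally */
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long total_histories = 100000000L;      /* illustrative value  */
    long local = total_histories / size;          /* even split per rank */
    if (rank == size - 1) local += total_histories % size;

    /* With two GPUs per node and two ranks per node, rank % 2 picks the GPU. */
    long local_tally = run_histories_on_gpu(local, rank % 2);

    long total_tally = 0;
    MPI_Reduce(&local_tally, &total_tally, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("transmission probability ~ %g\n",
               (double)total_tally / (double)total_histories);

    MPI_Finalize();
    return 0;
}
```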
214

Mappingstrategien für Kommunikatoren / Mapping strategies for communicators

Ermer, Thomas 12 September 2005 (has links) (PDF)
This work addresses questions of efficient communication in parallel FEM systems. By carefully partitioning the domain under consideration and distributing it over the available processors, one can try to minimize the communication load, e.g. with the chaco program system. A data exchange that is as parallel as possible is attempted by arranging the communication steps into so-called link levels. Starting from the MPI-based coupling-boundary communication, this thesis presents a split algorithm that attempts to map the coupling-boundary data of large communicators onto smaller sub-communicators and thus to minimize the number of data packets to be transferred.
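The split algorithm itself is not reproduced in the abstract. As a hedged illustration of the MPI mechanism it builds on, the sketch below carves a large communicator into smaller sub-communicators with MPI_Comm_split; the group size of 4 and the colouring rule are arbitrary examples, not the mapping strategy from the thesis.

```c
/* Hedged illustration: splitting a large communicator into smaller
 * sub-communicators so that coupling-boundary exchanges involve fewer
 * processes. Group size and colouring are arbitrary examples. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int group_size = 4;            /* processes per sub-communicator */
    int color = rank / group_size;       /* which sub-communicator         */
    int key   = rank;                    /* keep the original rank order   */

    MPI_Comm sub;
    MPI_Comm_split(MPI_COMM_WORLD, color, key, &sub);

    int sub_rank, sub_size;
    MPI_Comm_rank(sub, &sub_rank);
    MPI_Comm_size(sub, &sub_size);

    /* Boundary data would now be exchanged inside `sub` instead of the
     * full communicator, reducing the number of messages per exchange. */
    printf("world rank %d -> group %d, local rank %d of %d\n",
           rank, color, sub_rank, sub_size);

    MPI_Comm_free(&sub);
    MPI_Finalize();
    return 0;
}
```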
215

Supercomputing over Cloud using the Quicksort algorithm

Mattamadugu, Lakshmi Narashima Seshendra, Pathan, Ashfaq Abdullah Khan January 2012 (has links)
Context: Cloud computing has advanced in recent years and is attracting attention as a convenient source of computational power. The Cloud is gradually opening new possibilities for the scientific community to build High Performance Computing platforms. Despite the wide benefits the Cloud offers, the question on everyone’s mind is “whether the Cloud is a feasible platform for HPC applications”. This thesis evaluates the performance of the Amazon Cloud using a sorting benchmark. Objectives: 1. To investigate previous work on HPC applications ported to Cloud environments in various fields, and to assess the problems and challenges of HPC in the Cloud. 2. To study how to implement a parallel Quicksort efficiently so as to obtain good speedup. 3. To develop a parallel Quicksort and measure its speedup when deployed in the Cloud. Methods: Two research methods were used: a Systematic Literature Review (SLR) and a quantitative experiment. Research papers for the SLR were drawn from the academic databases IEEE Xplore, Inspec, ACM Digital Library and SpringerLink. Results: The systematic review identified 12 HPC applications, 9 problems and 5 challenges related to the Cloud, as well as an efficient way to implement the parallel Quicksort on the Cloud. The experiments showed low speedup in the Cloud environment. Conclusions: Many HPC applications deployed in the Cloud were identified, along with their problems and challenges. The Message Passing Interface (MPI) was chosen as the method to develop and implement the parallel Quicksort in the Cloud. Based on the experimental results, we believe the Cloud is not a suitable platform for HPC applications.
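The thesis implementation is not shown in the abstract. The hedged sketch below illustrates one common, simple way to structure a parallel sort with MPI: scatter the data, sort each block locally with qsort (quicksort in most C libraries), then gather the sorted blocks at the root and merge them. A more scalable variant (hypercube quicksort or samplesort) is probably closer to what was actually benchmarked.

```c
/* Hedged sketch of a simple MPI parallel sort: scatter, local quicksort,
 * gather and merge at the root. Assumes N is divisible by the rank count. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1024                                   /* illustrative problem size */

static int cmp_int(const void *a, const void *b) {
    return (*(const int *)a > *(const int *)b) - (*(const int *)a < *(const int *)b);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = N / size;
    int *data = NULL, *sorted = NULL;
    if (rank == 0) {                              /* root creates random input */
        data = malloc(N * sizeof(int));
        sorted = malloc(N * sizeof(int));
        for (int i = 0; i < N; i++) data[i] = rand() % 10000;
    }

    int *local = malloc(chunk * sizeof(int));
    MPI_Scatter(data, chunk, MPI_INT, local, chunk, MPI_INT, 0, MPI_COMM_WORLD);

    qsort(local, chunk, sizeof(int), cmp_int);    /* local quicksort */

    MPI_Gather(local, chunk, MPI_INT, data, chunk, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {                              /* merge the sorted blocks */
        int *pos = calloc(size, sizeof(int));
        for (int i = 0; i < N; i++) {
            int best = -1;
            for (int p = 0; p < size; p++)
                if (pos[p] < chunk &&
                    (best < 0 || data[p * chunk + pos[p]] < data[best * chunk + pos[best]]))
                    best = p;
            sorted[i] = data[best * chunk + pos[best]++];
        }
        printf("min %d max %d\n", sorted[0], sorted[N - 1]);
        free(pos); free(data); free(sorted);
    }
    free(local);
    MPI_Finalize();
    return 0;
}
```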
216

Evaluation of Parallel Programming Standards for Embedded High Performance Computing

James Emmanuel Roy, Muggalla, Garimella, Pradeep January 2010 (has links)
The aim of this project is to evaluate parallel programming standards for embedded high performance computing. There is a huge demand for computational speed and performance in present-day radar signal processing, so more processors are needed to obtain sufficient performance. One way of getting high performance is to divide the work over multiple processors, while keeping communication overhead low and speedup good. This has been done using the parallel programming standards OpenMP and MPI. We apply these standards to a radar signal benchmark that resembles many tasks in radar signal processing. For OpenMP, a shared-memory SUNFIRE E2900 system is used; for MPI, a SUNFIRE E2900 cluster with 8 nodes running SUN HPC ClusterTools v5 is used. The OpenMP program shows fairly good speedup up to 5 processors; thereafter an increase in communication overhead is observed. MPI shows low communication overhead at first, but its efficiency decreases as the number of processors grows. Both OpenMP and MPI behave similarly: beyond a certain number of processors, efficiency drops and communication overhead rises. According to our results, OpenMP is relatively easy to use compared with MPI; with MPI it is up to the programmer to make explicit calls in order to parallelize.
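The radar benchmark itself is not available here. As a hedged illustration of the programming-effort difference the abstract describes, the sketch below computes the same dot product once with a single OpenMP directive and once with explicit MPI work division and reduction (compile with e.g. mpicc -fopenmp).

```c
/* Hedged illustration of OpenMP vs MPI programming effort, using a dot
 * product instead of the radar benchmark. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000

static double x[N], y[N];

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* OpenMP: shared memory, one directive does the work distribution. */
    double omp_sum = 0.0;
    #pragma omp parallel for reduction(+:omp_sum)
    for (int i = 0; i < N; i++)
        omp_sum += x[i] * y[i];

    /* MPI: the programmer splits the index range and reduces explicitly. */
    int chunk = N / size;
    int lo = rank * chunk;
    int hi = (rank == size - 1) ? N : lo + chunk;
    double local = 0.0, mpi_sum = 0.0;
    for (int i = lo; i < hi; i++)
        local += x[i] * y[i];
    MPI_Reduce(&local, &mpi_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("OpenMP: %g  MPI: %g\n", omp_sum, mpi_sum);

    MPI_Finalize();
    return 0;
}
```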
217

Instalace a konfigurace Octave výpočetního clusteru / Installation and configuration of Octave computation cluster

Mikulka, Zdeněk January 2014 (has links)
This diploma thesis contains a detailed design of a high-performance cluster, primarily aimed at parallel computing in the Octave application. Each component of the cluster is described, together with instructions for its installation and configuration. The cluster is based on the GNU/Linux operating system and the Message Passing Interface, and the design allows it to be deployed on classroom computers that are also in use for lessons.
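The thesis's installation steps are not reproduced in the abstract. Below is a hedged example of the kind of minimal MPI check commonly used to verify that a freshly configured GNU/Linux + MPI cluster works before adding the Octave layer; the hostfile name in the run command is an assumption.

```c
/* Hedged example: minimal MPI check that every node of the cluster can be
 * reached (run with e.g. `mpirun -np 8 --hostfile hosts ./check`). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    printf("rank %d of %d running on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}
```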
218

Optimizing MPI Collective Communication by Orthogonal Structures

Kühnemann, Matthias, Rauber, Thomas, Rünger, Gudula 28 June 2007 (has links) (PDF)
Many parallel applications from scientific computing use MPI collective communication operations to collect or distribute data. Since the execution times of these communication operations increase with the number of participating processors, scalability problems might occur. In this article, we show for different MPI implementations how the execution time of collective communication operations can be significantly improved by a restructuring based on orthogonal processor structures with two or more levels. As platforms, we consider a dual Xeon cluster, a Beowulf cluster and a Cray T3E with different MPI implementations. We show that the execution time of operations like MPI_Bcast or MPI_Allgather can be reduced by 40% and 70% on the dual Xeon cluster and the Beowulf cluster, respectively. A significant improvement can also be obtained on the Cray T3E by a careful selection of the processor groups. We demonstrate that the optimized communication operations can be used to reduce the execution time of data-parallel implementations of complex application programs without any other change to the computation and communication structure. Furthermore, we investigate how the execution time of the orthogonal realization can be modeled using runtime functions. In particular, we consider the modeling of two-phase realizations of communication operations. We present runtime functions for the modeling and verify that these runtime functions can predict the execution time both for communication operations in isolation and in the context of application programs.
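The paper's group-selection strategy is not reproduced here. The hedged sketch below shows only the basic two-level idea behind such orthogonal structures: the processes are viewed as a logical grid, a broadcast first travels down the root's column to one leader per row, and each leader then broadcasts along its row. The 4-column grid shape is an arbitrary example.

```c
/* Hedged sketch of a two-phase ("orthogonal") broadcast over an r x c
 * process grid built with MPI_Comm_split. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int c = 4;                   /* columns of the grid (arbitrary example) */
    int row = rank / c, col = rank % c;

    MPI_Comm row_comm, col_comm;
    MPI_Comm_split(MPI_COMM_WORLD, row, rank, &row_comm);   /* same row    */
    MPI_Comm_split(MPI_COMM_WORLD, col, rank, &col_comm);   /* same column */

    int data = (rank == 0) ? 42 : -1;  /* world rank 0 is the broadcast root */

    /* Phase 1: broadcast down column 0, i.e. to one leader per row. */
    if (col == 0)
        MPI_Bcast(&data, 1, MPI_INT, 0, col_comm);

    /* Phase 2: each row leader (local rank 0) broadcasts along its row. */
    MPI_Bcast(&data, 1, MPI_INT, 0, row_comm);

    printf("rank %d received %d\n", rank, data);

    MPI_Comm_free(&row_comm);
    MPI_Comm_free(&col_comm);
    MPI_Finalize();
    return 0;
}
```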
219

Correlates of the Scales of a Modified Screening Version of the Multidimensional Pain Inventory with Depression and Anxiety on a Chronic Pain Sample

Walker, Katherine Elise 05 1900 (has links)
This correlational study investigated the relationship between changes in the psychosocial scales of the MPI Screener Patient Report Card (Clark, 1996) and changes in depression and anxiety in a sample of chronic pain patients who completed a 4-week outpatient interdisciplinary treatment program located in a large regional medical center. Race, gender, and primary pain diagnosis were additional predictors. Data came from an existing patient outcome database (N = 203). Five research assumptions were examined using ten separate (five pre-treatment and five post-treatment) hierarchical multiple regression analyses. Statistical significance was found in the pre- and post-treatment analyses with the predictors BDI-II (Beck, Steer, & Brown, 1996) and BAI (Beck & Steer, 1993) on the criterion variables Pain Interference, Emotional Distress, Life Control, and Total Function.
220

Dense and sparse parallel linear algebra algorithms on graphics processing units

Lamas Daviña, Alejandro 13 November 2018 (has links)
One line of development in the field of supercomputing is the use of special-purpose processors to speed up certain types of computations. In this thesis we study the use of graphics processing units as computational accelerators and apply them to the field of linear algebra. In particular, we work with the SLEPc library to solve large-scale eigenvalue problems and to apply matrix functions in scientific applications. SLEPc is a parallel library based on the MPI standard and is developed with the premise of being scalable, i.e. of allowing larger problems to be solved by increasing the number of processing units. We address the linear eigenvalue problem, Ax = lambda x in its standard form, using iterative techniques, in particular Krylov methods, with which we compute a small portion of the eigenvalue spectrum. This type of algorithm is based on generating a subspace of reduced size (m) onto which the large problem of dimension n is projected, with m << n. Once the problem has been projected, it is solved by direct methods, which provide approximations to the eigenvalues of the initial problem. The operations used in the expansion of the subspace vary depending on whether the desired eigenvalues lie in the exterior or in the interior of the spectrum. When searching for exterior eigenvalues, the expansion is done by matrix-vector multiplications. We perform this operation on the GPU, either by using libraries or by writing functions that exploit the structure of the matrix. For eigenvalues in the interior of the spectrum, the expansion requires solving linear systems of equations. In this thesis we implement several GPU algorithms for solving linear systems of equations for the specific case of matrices with a block-tridiagonal structure. In the computation of matrix functions we distinguish between the direct application of a function to a matrix, f(A), and the action of a matrix function on a vector, f(A)b. The first case involves a dense computation that limits the size of the problem. The second allows us to work with large sparse matrices, and to solve it we also make use of Krylov methods. The expansion of the subspace is again done by matrix-vector multiplications, and we use GPUs in the same way as when solving eigenvalue problems. In this case the projected problem starts with size m, but it grows by m on each restart of the method. The projected problem is solved by applying a matrix function directly. We have implemented several algorithms to compute the matrix square root and the matrix exponential, in which the use of GPUs speeds up the computation. / Lamas Daviña, A. (2018). Dense and sparse parallel linear algebra algorithms on graphics processing units [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/112425 / TESIS
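The projection scheme described in this abstract can be summarized with the standard Krylov/Arnoldi relations below (textbook material, not formulas taken from the thesis itself): the subspace is built by repeated matrix-vector products, which is the operation offloaded to the GPU, and the small projected matrix H_m is what the direct methods then handle.

```latex
% Standard relations for the Krylov projection sketched above (requires amsmath):
% subspace definition, Arnoldi decomposition, Ritz approximation, and the
% usual Krylov approximation of f(A)b.
\begin{align*}
  \mathcal{K}_m(A, v_1) &= \operatorname{span}\{v_1,\, A v_1,\, \dots,\, A^{m-1} v_1\},
      \qquad m \ll n, \\
  A V_m &= V_m H_m + h_{m+1,m}\, v_{m+1} e_m^{\mathsf{T}}, \\
  H_m y &= \theta y \;\;\Longrightarrow\;\; (\theta,\, V_m y) \approx (\lambda,\, x), \\
  f(A)\, b &\approx \beta\, V_m\, f(H_m)\, e_1, \qquad \beta = \|b\|_2,\; v_1 = b/\beta .
\end{align*}
```

Here the columns of V_m form an orthonormal basis of the subspace, H_m = V_m^T A V_m is the small projected matrix, and the pairs (theta, V_m y) are the Ritz approximations to eigenpairs of A.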
