Global ETD Search

141	A Runtime Framework for Regular and Irregular Message-Driven Parallel Applications on GPU Systems Rengasamy, Vasudevan January 2014 The effective use of GPUs for accelerating applications depends on a number of factors, including effective asynchronous use of heterogeneous resources, reducing data transfer between CPU and GPU, increasing occupancy of GPU kernels, overlapping data transfers with computations, reducing GPU idling, and kernel optimizations. Overcoming these challenges requires considerable effort on the part of application developers. Most optimization strategies are proposed and tuned specifically for individual applications. Message-driven execution with over-decomposition of tasks constitutes an important model for parallel programming and provides multiple benefits, including communication-computation overlap and reduced idling on resources. Charm++ is one such message-driven language; it employs over-decomposition of tasks, computation-communication overlap and a measurement-based load balancer to achieve high CPU utilization. This research has developed an adaptive runtime framework for efficient execution of Charm++ message-driven parallel applications on GPU systems. In the first part of our research, we developed a runtime framework, G-Charm, with the focus primarily on optimizing regular applications. At runtime, G-Charm automatically combines multiple small GPU tasks into a single larger kernel, which reduces the number of kernel invocations while improving CUDA occupancy. G-Charm also enables reuse of existing data in GPU global memory, and performs GPU memory management and dynamic scheduling of tasks across CPU and GPU in order to reduce idle time. To combine the partial results obtained from the computations performed on CPU and GPU, G-Charm allows the user to specify an operator with which the partial results are combined at runtime. 
We also perform compile-time code generation to reduce programming overhead. For Cholesky factorization, a regular parallel application, G-Charm provides a 14% improvement over a highly tuned implementation. In the second part of our research, we extended our runtime to overcome the challenges presented by irregular applications, such as aperiodic generation of tasks, irregular memory access patterns and varying workloads during application execution. We developed models for deciding the number of tasks that can be combined into a kernel based on the rate of task generation and the GPU occupancy of the tasks. For irregular applications, data reuse results in uncoalesced GPU memory access. We evaluated the effect of altering the global memory access pattern in improving coalesced access. We have also developed adaptive methods for hybrid execution on CPU and GPU, in which we consider the varying workloads while scheduling tasks across the CPU and GPU. We demonstrate that our dynamic strategies result in an 8-38% reduction in execution times for an N-body simulation application and a molecular dynamics application over the corresponding static strategies that are suited to regular applications. Graphics Processing Unit (GPU) Parallel Programming (Computer Science) Parallel Programming Models Parallel Programming Frameworks Charm++ (Computer Program Language) HybridAPI-GPU Management Framework G-Charm Framework Accelerator Based Computing Cholesky Factorization Computer Science
142	3D Multi-parameters Full Waveform Inversion for challenging 3D elastic land targets / Inversion sismique 3D des formes d'onde complètes pour des cibles terrestres complexes Trinh, Phuong-Thu 24 September 2018 Seismic imaging of the subsurface from land data is very difficult because of the 3D complexity of the near surface. In this zone, seismic waves, which arrive as a compact packet of often overlapping phases, are dominated by elastic and viscoelastic effects, coupled with free-surface effects that generate dispersive, high-amplitude surface waves. The interaction of seismic waves with a more or less complex topography, in the presence of strong near-surface heterogeneities, induces significant wave conversions and strong scattering of energy. It is therefore necessary to take into account both an accurate three-dimensional representation of the topography and correct physics that describes wavefield propagation in the subsurface at the level of accuracy required by seismic imaging. In this manuscript, we present an efficient, self-contained and therefore flexible full waveform inversion (FWI) strategy for building velocity models from land seismic data, particularly in so-called foothill environments with strong velocity variations. We propose an efficient formulation of this problem based on a time-domain spectral element method on a deformed Cartesian grid, in which topography variations are represented by a detailed description of their geometry through high-order interpolation. 
Wavefield propagation is characterized by anisotropic linear elasticity and by isotropic attenuation of the medium; this second approximation appears sufficient for the crustal imaging considered in this work. The numerical implementation of the forward problem includes efficient matrix-vector products for solving the elastodynamic equations, which form a second-order hyperbolic differential system, for the three-dimensional geometries encountered in seismic exploration. Explicit expressions for the gradients of the misfit function between the data and the predictions are provided and include the contributions of density, elastic parameters and attenuation coefficients. These expressions require the incident field from the source at the same propagation time as the adjoint field. To achieve this, while the adjoint field is computed backward from the final time, the incident field is recomputed on the fly from its final state, from previously saved boundary conditions and from some intermediate states, without storage on hard disks. The gradient is therefore estimated from quantities held in main memory. Two levels of parallelism are implemented, one over the sources and the other over the domain decomposition for each source, which is necessary to tackle realistic three-dimensional configurations. The gradient is preconditioned with a so-called Bessel filter, using an efficient differential implementation built on the same spatial discretization as the forward problem and formulated with a spectral element approach, which yields a symmetric linear system solved by an iterative conjugate gradient technique. 
In addition, a nonlinear constraint on the ratio of compressional to shear velocities is introduced into the optimization process at no extra cost; this constraint proves necessary for handling data in the presence of low velocities near the free surface. Multi-parameter elastic inversion in a foothill setting is first illustrated with synthetic data examples, which highlight the difficulties of such a reconstruction…. / Seismic imaging of onshore targets is very challenging due to complex 3D near-surface effects. In such areas, the seismic wavefield is dominated by elastic and visco-elastic effects such as highly energetic and dispersive surface waves. The interaction of elastic waves with the rough topography and shallow heterogeneities leads to significant converted and scattered energy, implying that both an accurate 3D geometry representation and correct wave propagation physics are required for reliable structured imaging. In this manuscript, we present an efficient and flexible full waveform inversion (FWI) strategy for velocity model building on land, specifically in foothill areas. Viscoelastic FWI is a challenging task for current acquisition deployment at the crustal scale. We propose an efficient formulation based on a time-domain spectral element method (SEM) on a flexible Cartesian-based mesh, in which the topography variation is represented by an accurate high-order geometry interpolation. The wave propagation is described by anisotropic elasticity and isotropic attenuation physics. The numerical implementation of the forward problem includes efficient matrix-vector products for solving second-order elastodynamic equations, even for completely deformed 3D geometries. 
Complete misfit gradient expressions, including the attenuation contribution spread into density, elastic parameters and attenuation factors, are given in a consistent way. Combined recomputation of the adjoint and forward fields from the final state and previously saved boundary values allows the gradients to be estimated with no I/O effort. Two-level parallelism is implemented over sources and domain decomposition, which is necessary for realistic 3D configurations. The gradient preconditioning is performed by a so-called Bessel filter, using an efficient differential implementation based on the SEM discretization on the forward mesh instead of the often-used, costly convolution approach. A non-linear model constraint on the ratio of compressional and shear velocities is introduced into the optimization process at no extra cost. The challenges of elastic multi-parameter FWI in complex land areas are highlighted through synthetic and real data applications. A 3D synthetic inverse-crime illustration is considered on a subset of the SEAM phase II Foothills model with 4 lines of 20 sources, providing a complete 3D illumination. As the data is dominated by surface waves, it is mainly sensitive to the S-wave velocity. We propose a two-step data-windowing strategy, focusing on early body waves before considering the entire wavefield, including surface waves. The use of this data hierarchy together with the structurally-based Bessel preconditioning makes it possible to accurately reconstruct both P- and S-wavespeeds. The designed inversion strategy is combined with a low-to-high frequency hierarchy and successfully applied to the pseudo-2D dip-line survey of the SEAM II Foothill dataset. Under the limited illumination of a 2D acquisition, the model constraint on the ratio of P- and S-wavespeeds plays an important role in mitigating the ill-posedness of the multi-parameter inversion process. 
By also considering surface waves, we manage to exploit the maximum amount of information in the observed data to obtain a reliable estimate of the model parameters, both in the near surface and in the deeper part. The developed FWI framework and workflow are finally applied to a real foothill dataset. The application is challenging due to the sparse acquisition design, particularly noisy recordings and complex underlying structures. Additional prior information such as log data is considered to assist the FWI design. The preliminary results, relying only on body waves, are shown to improve the kinematic fit and to follow the expected geological interpretation. Model quality control through data-fit analysis and uncertainty studies helps to identify artifacts in the inverted models. Physics Geophysics Applied mathematics Numerical method Parallel programming Seismic modeling and imaging 520
143	EVOLUTION OF THE COST EFFECTIVE, HIGH PERFORMANCE GROUND SYSTEMS: A QUANTITATIVE APPROACH Hazra, Tushar K., Stephenson, Richard A., Troendly, Gregory M. October 1994 International Telemetering Conference Proceedings / October 17-20, 1994 / Town & Country Hotel and Conference Center, San Diego, California / In recent years of small-satellite space access missions, the trend has been towards designing low-cost ground control centers to maintain the space/ground cost ratio. The use of personal computers (PCs) in combination with high-speed transputer modules as embedded parallel processors provides a relatively affordable, highly versatile, and reliable desktop workstation upon which satellite telemetry systems can be built to meet the ever-growing challenges of space missions today and in the future. This paper presents the feasibility of cost-effective, high-performance ground systems, together with a quantitative analysis in terms of performance, speedup, efficiency, and the compatibility of the architecture with commercial off-the-shelf (COTS) tools, and finally introduces an operational high-performance, low-cost ground system to illustrate the concept. Ground Systems Modern Computer Architecture Embedded Systems Parallel Programming Benchmarks Performance Evaluation
144	Parallel algorithms for electromagnetic moment method formulations Davidson, David Bruce December 1991 Thesis (PhD) -- Stellenbosch University, 1991. / ENGLISH ABSTRACT: This dissertation investigates the moment method solution of electromagnetic radiation and scattering problems using parallel computers. In particular, electromagnetically large problems with arbitrary geometries are considered. Such problems require a large number of unknowns to obtain adequate approximate solutions, and make great computational demands. This dissertation considers in detail the efficient exploitation of the potential offered by parallel computers for solving such problems, and in particular the class of local memory Multiple Instruction, Multiple Data systems. A brief history of parallel computing is presented. Methods for quantifying the efficiency of parallel algorithms are reviewed. The use of pseudo-code for documenting algorithms is discussed and a pseudo-code notation is defined that is used in later chapters. A new parallel conjugate gradient algorithm, suitable for the solution of general systems of linear equations with complex values, is presented. A method is described to handle efficiently the Hermitian transpose of the matrix required by the algorithm. Careful attention is paid to the theoretical analysis of the algorithm's parallel properties (in particular, speed-up and efficiency). Pseudo-code is presented for the algorithms. Timing results for a moment method code, running on a transputer array and using this conjugate gradient solver, are presented and compared to the theoretical predictions. A parallel LU algorithm is described and documented in pseudo-code. A new graphical description of the algorithm is presented that simplifies the identification of the parallelism and the analysis of the algorithm. The use of formal methods for extracting parallelism via the use of invariants is presented and new examples given. 
The speed-up and efficiency of the algorithm are analyzed theoretically, using new methods that are simpler than those described in the literature. Techniques for optimizing the efficiency of parallel algorithms are introduced, and illustrated with pseudo-code. New parallel forward and backward substitution algorithms using the data distribution required for the parallel LU algorithm are described, and documented with pseudo-code. Results obtained with an Occam 2 moment method code running on a transputer array using these parallel LU solver and substitution algorithms are presented and compared with the theoretical predictions. PARNEC, a new Occam 2 implementation of the thin-wire core of NEC2, is discussed. The basic theory of NEC2 is reviewed. Problems with early attempts at combining Occam and FORTRAN are reported. Methodologies for re-coding an old code written in an unstructured language in a modern structured language are discussed. Methods of parallelizing the matrix generation are discussed. The accuracy of large moment method formulations is investigated, as is the effect of machine precision on the solutions. The use of the biconjugate gradient method to accelerate convergence is briefly considered and rejected. The increased size of problem that can be handled by PARNEC, running on a transputer array, is demonstrated. Conclusions are drawn regarding the contributions of this dissertation to the development of efficient parallel electromagnetic moment method algorithms. / AFRIKAANS SUMMARY: This dissertation investigates the moment method solution of electromagnetic radiation and scattering problems by means of multiprocessors. In particular, electromagnetically large problems with arbitrary geometries are considered. Such problems require a large number of unknowns to obtain an adequate approximate solution, and make great computational demands. 
This dissertation considers in detail the efficient exploitation of the potential that multiprocessors offer for such problems, in particular the class of local memory Multiple Instruction, Multiple Data systems. A brief history of multiprocessors is given. Methods for quantifying the efficiency of multiprocessors are reviewed. The use of pseudo-code for documenting algorithms is discussed and a pseudo-code notation is defined that is used in later chapters. A new parallel conjugate gradient algorithm, suitable for the solution of general systems of linear equations, is presented. A method is described to handle efficiently the Hermitian transpose of the matrix required by the algorithm. Careful attention is paid to the theoretical analysis of the algorithm's parallel properties (in particular, speed-up and efficiency). Pseudo-code is presented for the algorithms. Run-time results for a moment method program running on a transputer array are given and compared with the theoretical predictions. A parallel LU algorithm is described and documented in pseudo-code. A new graphical description of the algorithm, which simplifies the identification of parallelism and the analysis of the algorithm, is given. The use of formal methods for extracting parallelism by means of invariants is shown and new examples are given. The speed-up and efficiency of the algorithm are analyzed theoretically, using new methods that are simpler than those described in the literature. Techniques for optimizing the efficiency of parallel algorithms are introduced, and illustrated with pseudo-code. New parallel forward and backward substitution algorithms that use the data distribution of the parallel LU algorithm are described, and documented with pseudo-code. 
Results obtained with an Occam 2 moment method program that runs on a transputer array and uses the parallel LU and substitution algorithms are given and compared with theoretical predictions. PARNEC, a new Occam 2 implementation of the thin-wire core of NEC2, is discussed. The basic theory of NEC2 is summarized. Problems with early attempts to combine Occam and FORTRAN are reported. Methods for rewriting an old program, written in an unstructured language, in a modern structured language are discussed. Methods of parallelizing the matrix generation are discussed. The accuracy of large moment method formulations is investigated, as is the effect of machine precision on the solutions. The use of the biconjugate gradient method to accelerate convergence is briefly considered and rejected. The increased problem size that can be handled with PARNEC on a transputer array is demonstrated. Conclusions are drawn regarding the contributions of this dissertation to the development of efficient parallel electromagnetic moment method algorithms. Parallel programming (Computer science) Algorithms Dissertations -- Engineering
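As a rough illustration of the kind of solver entry 144 describes, the serial Python sketch below applies conjugate gradients to the normal equations A^H A x = A^H b (often called CGNR), which is one standard way to use a conjugate gradient iteration on a general complex system. It is only an assumption that this resembles the dissertation's algorithm, since the abstract does not give the formulation, and the function names (`matvec`, `hermitian_matvec`, `cgnr`) are hypothetical. The Hermitian transpose is applied by walking the rows of A, so the transposed matrix is never stored.

```python
def matvec(a, x):
    """Return A x for a dense matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def hermitian_matvec(a, y):
    """Return A^H y without forming the transposed matrix."""
    out = [0j] * len(a[0])
    for row, yi in zip(a, y):
        for j, aij in enumerate(row):
            out[j] += complex(aij).conjugate() * yi
    return out


def cgnr(a, b, tol=1e-12, max_iter=200):
    """Conjugate gradients on the normal equations A^H A x = A^H b."""
    x = [0j] * len(a[0])
    r = [complex(bi) for bi in b]      # residual b - A x, with x = 0
    z = hermitian_matvec(a, r)         # gradient direction A^H r
    p = list(z)
    zz = sum(abs(v) ** 2 for v in z)
    zz0 = zz
    for _ in range(max_iter):
        if zz <= tol * tol * zz0:      # relative stopping test
            break
        w = matvec(a, p)
        alpha = zz / sum(abs(v) ** 2 for v in w)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * wi for ri, wi in zip(r, w)]
        z = hermitian_matvec(a, r)
        zz_new = sum(abs(v) ** 2 for v in z)
        p = [zi + (zz_new / zz) * pi for zi, pi in zip(z, p)]
        zz = zz_new
    return x
```

In a distributed-memory setting, the two matrix-vector products and the inner products are the steps that would need communication; this sketch leaves that aspect out entirely.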
145	Load balancing of irregular parallel applications on heterogeneous computing environments Janjic, Vladimir January 2012 Large-scale heterogeneous distributed computing environments (such as Computational Grids and Clouds) offer the promise of access to a vast amount of computing resources at a relatively low cost. To ease application development and deployment on such complex environments, high-level parallel programming languages exist that need to be supported by sophisticated runtime systems. One of the main problems that these runtime systems need to address is dynamic load balancing, which ensures that no resources in the environment are underutilised or overloaded with work. This thesis deals with the problem of obtaining good speedups for irregular applications on heterogeneous distributed computing environments. It focuses on work-stealing techniques that can be used for load balancing during the execution of irregular applications. It specifically addresses two problems that arise during work-stealing: where thieves should look for work during the application execution and how victims should respond to steal attempts. In particular, we describe and implement a new Feudal Stealing algorithm, and we also describe and implement new granularity-driven task selection policies in the SCALES simulator, a work-stealing simulator developed for this thesis. In addition, we present a comprehensive evaluation of the Feudal Stealing algorithm and the granularity-driven task selection policies using simulations of a large class of regular and irregular parallel applications on a wide range of computing environments. We show how the Feudal Stealing algorithm and the granularity-driven task selection policies bring significant improvements in speedups of irregular applications, compared to state-of-the-art work-stealing algorithms. 
Furthermore, in addition to the implementation in SCALES, we present the implementation of the task selection policies in the Grid-GUM runtime system [AZ06] for Glasgow Parallel Haskell (GpH) [THLPJ98], and we evaluate this implementation on a large set of synthetic applications. 004.01
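The two questions raised in entry 145 (where a thief looks for work, and what a victim hands over) can be made concrete with a toy simulation. The Python sketch below is a generic random-victim work-stealing loop and is not the Feudal Stealing algorithm or the SCALES simulator, neither of which is specified in the abstract. The function name `simulate_work_stealing` and the policy details (random victim choice, thief takes the oldest task, victim keeps at least one task) are assumptions chosen for illustration.

```python
import random
from collections import deque


def simulate_work_stealing(n_tasks, n_workers, seed=0):
    """Toy round-based simulation; all tasks start on worker 0."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(n_workers)]
    queues[0].extend(range(n_tasks))
    executed = [[] for _ in range(n_workers)]   # task ids run by each worker
    steps = 0
    while any(queues):
        steps += 1
        for w in range(n_workers):
            if queues[w]:
                # Owner policy: run the newest local task (LIFO end).
                executed[w].append(queues[w].pop())
            else:
                # Thief policy: pick a random victim.
                victim = rng.randrange(n_workers)
                # Victim policy: give up the oldest task, keep at least one.
                if victim != w and len(queues[victim]) > 1:
                    queues[w].append(queues[victim].popleft())
    return executed, steps
```

Calling `simulate_work_stealing(16, 4)` returns the per-worker task lists and the number of rounds taken; swapping in a different victim-selection or task-selection rule only changes the two policy lines marked in the comments.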
146	An adaptive software transactional memory support for multi-core programming Chan, Kinson., 陳傑信. January 2009 published_or_final_version / Computer Science / Master / Master of Philosophy Transaction systems (Computer systems) Memory management (Computer science) Parallel programming (Computer science) Algorithms.
147	Erstellung einer einheitlichen Taxonomie für die Programmiermodelle der parallelen Programmierung Nestmann, Markus 02 May 2017 Parallel programming makes it possible for programs to run concurrently on multiple CPU cores or CPUs. To make parallel programming easier, various languages (e.g. Erlang) and libraries (e.g. OpenMP) have been developed on top of parallel programming models (e.g. the Parallel Random Access Machine). If, for example, a software architect wants to choose a programming model for a project, several important criteria (e.g. dependencies on the hardware) must be considered. This search is made easier by overviews that distinguish and organize the programming models according to these criteria. Existing overviews, however, differ in their classification, in the terms they use and in the programming models they list. This thesis addresses that deficit by first collecting and analyzing the existing taxonomies through a Systematic Literature Review. Building on this, a unified taxonomy is created. With this taxonomy, an overview of the parallel programming models can be produced. This overview is additionally extended with information on each programming model's dependencies on the hardware architecture. The software architect (or project manager, software developer, ...) can thus make an informed decision and is not forced to analyze every programming model individually. Parallele Programmiermodelle Taxonomie Parallel Programming Models Taxonomy Overview ddc:000 Taxonomie Parallelverarbeitung Modell
148	Study of Parallel Algorithms Related to Subsequence Problems on the Sequent Multiprocessor System Pothuru, Surendra 08 1900 The primary purpose of this work is to study, implement and analyze the performance of parallel algorithms related to subsequence problems. The problems include the string-to-string correction problem, the longest common subsequence problem, and the sum-range-product, 1-D pattern matching, longest non-decreasing (non-increasing) subsequence (LNS) and maximum positive subsequence (MPS) problems. The work also includes studying the techniques and issues involved in developing parallel applications. These algorithms are implemented on the Sequent Multiprocessor System. The subsequence problems have been defined, along with the performance metrics that are utilized. The sequential and parallel algorithms have been summarized. The implementation issues that arise in the process of developing parallel applications have been identified and studied. subsequence problems algorithms computer science multiprocessors Parallel programming (Computer science) Computer algorithms.
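For readers unfamiliar with the longest common subsequence problem named in entry 148, the Python sketch below shows the textbook sequential dynamic program. It is not taken from the thesis, whose parallel algorithms are not described in the abstract, and the function name `lcs_length` is hypothetical. Each table cell depends only on its left, upper and upper-left neighbours, so cells on the same anti-diagonal are independent, which is the usual starting point for parallel versions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)          # DP row for the previous element of a
    for ch in a:
        cur = [0]                      # column 0: empty prefix of b
        for j, bj in enumerate(b, 1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)            # extend a match
            else:
                cur.append(max(prev[j], cur[j - 1]))   # drop one element
        prev = cur
    return prev[-1]
```

Keeping only two rows reduces memory from O(len(a) * len(b)) to O(len(b)), at the cost of not being able to recover the subsequence itself.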
149	Object-oriented parallel paradigms 17 March 2015 M.Sc. (Computer Science) / This report is primarily concerned with highlighting findings of research recently undertaken towards completing the requirements for the M.Sc. degree of 1994 at the Rand Afrikaans University (RAU). The research aims to investigate what benefits (if any) exist in Object-Oriented Parallel Systems. The area of research revolves around the Object-Oriented Parallel Paradigm (OOPP), which is currently under development by the author. One primary aim of this research is to investigate numerous current trends in Object-Oriented Parallel Systems and Language Developments, with the objective of providing an indication as to whether the Object-Oriented methodology can be (or has been) successfully married with existing Parallel Processing mechanisms. New benefits may come about while attempting to combine these methodologies, and this expectation will also be reflected upon. The Object-Oriented methodology allows a system designer to approach a problem with a good degree of problem space understanding, while Parallel Processing allows the system designer to create extremely fast algorithms for solving problems amenable to Parallel Processing techniques. The question we attempt to answer is whether the Object-Oriented methodology can be successfully married to the Parallel Processing field (whilst maintaining a high degree of the benefits encountered in both methodologies) so as to gain the best of both worlds. Certain papers have laid claim to their proposed system encompassing both the Object-Oriented methodology and the Parallel Processing methodology. In view of this fact, we shall furthermore examine papers to see if any of these systems are candidates for successfully marrying Object-Oriented and Parallel Processing into one homogeneous body. Criticism will be given on the shortcomings of unsuccessful candidates. 
Based on the findings of the research, the report will culminate in the proposal of the Object-Oriented Parallel Paradigm (OOPP). OOPP will speculate on the most probable features that system designers can expect to see in an almost ideal Object-Oriented Parallel System. It is very important at this stage to mention that, at its current state of development, OOPP is only a paradigm; thus OOPP should be viewed merely as an abstract model intended to establish a solid foundation for building more formal Object-Oriented Parallel Methodologies. Furthermore, OOPP is intended to be suitable for present-day systems and amenable (possibly with a few minor adjustments) to future systems. The author trusts OOPP to generate sufficient interest to warrant further research being commissioned. In this event, OOPP should be expected to undergo modifications and enhancements... Object-oriented databases Parallel programming (Computer science)
150	Desenvolvimento de modelos para predição de desempenho de programas paralelos MPI. / Development of Performance Prediction Models for MPI Parallel Programs Laine, Jean Marcos 27 January 2003 Many factors can influence the performance of an MPI (Message Passing Interface) parallel program. Among them are the amount of data processed, the number of nodes involved in solving the problem, the characteristics of the interconnection network and the type of switch used, among others. For this reason, predicting the performance of parallel programs that use message passing is not a trivial task. In order to model and predict the behavior of such programs, our work was developed on the basis of a methodology for the performance analysis and prediction of MPI parallel programs. First, we propose a graphical model, called DPGraph+, to represent the application code. Next, we develop analytical models, using curve-fitting techniques, to represent the behavior of repetition structures composed of communication primitives and/or local computation. In addition, we build models to predict the behavior of master/slave applications. During the performance analysis and prediction activities, we implemented some functions to automate tasks and ease our work. Finally, we modeled and estimated the performance of two different versions of a matrix multiplication program in order to validate the proposed models. The results of the predictions for the matrix multiplication programs were satisfactory. In most of the predicted cases, the errors were below 6%, confirming the validity and accuracy of the models. / Many factors can influence the performance of an MPI (Message Passing Interface) parallel program. 
Among these factors are the amount of data, the number of nodes, the characteristics of the network and the type of switch, among others. Consequently, performance prediction is not an easy task. The work was developed based on a methodology for the analysis and performance prediction of MPI parallel programs. First, we proposed a graphical model, named DPGraph+, to represent the code of applications. Next, we developed analytical models, applying curve-fitting techniques, to represent the behavior of repetition structures composed of communication primitives and/or local computations. In addition, we elaborated models to predict master/slave applications. To support the performance prediction activities, some functions were developed to automate tasks and ease our work. Finally, we modeled and predicted the performance of two different matrix multiplication programs to demonstrate the accuracy of the models. The prediction results for the programs were good. In the majority of predicted cases, the errors were below 6%. With these results, we demonstrated the accuracy of the developed models. análise e predição de desempenho MPI MPI parallel programming performance analysis and prediction programação paralela
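The curve-fitting step mentioned in entry 150 can be illustrated with the simplest possible case: fitting a straight line T(n) = a + b·n to measured run times by ordinary least squares and using it to predict an unmeasured size. This Python sketch is an assumption about the general idea only; the abstract does not state which functional forms the thesis fits, the names `fit_line` and `predict_time` are hypothetical, and the timing values are made-up placeholders rather than results from the thesis.

```python
def fit_line(sizes, times):
    """Least-squares fit of times ~ intercept + slope * sizes."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_time(model, size):
    """Evaluate the fitted line at a new problem size."""
    intercept, slope = model
    return intercept + slope * size


# Hypothetical measurements: message sizes (bytes) and loop times (seconds).
sizes = [100, 200, 400, 800]
times = [0.12, 0.22, 0.42, 0.82]
model = fit_line(sizes, times)
estimate = predict_time(model, 1600)   # extrapolated time for a larger size
```

Comparing such an estimate with a later measurement gives the relative prediction error, which is the kind of figure the abstract reports as staying below 6% in most cases.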
