• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 44
  • 7
  • 5
  • 1
  • Tagged with
  • 62
  • 62
  • 17
  • 16
  • 11
  • 11
  • 10
  • 10
  • 10
  • 9
  • 9
  • 9
  • 8
  • 8
  • 7
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Escalonamento estático de programas-MPI

Silva, Rafael Ennes January 2006 (has links)
O bom desempenho de uma aplicação paralela é obtido conforme o modo como as técnicas de paralelização são empregadas. Para utilizar essas técnicas, é preciso encontrar uma forma adequada de extrair o paralelismo. Esta extração pode ser feita através de um grafo representativo da aplicação. Neste trabalho são aplicados métodos de particionamento de grafos para otimizar as comunicações entre os processos que fazem parte de uma computação paralela. Nesse contexto, a alocação dos processos almeja minimizar a quantidade de comunicações entre processadores. Esta técnica é frequentemente adotada em Processamento de Alto Desempenho - PAD. No entanto, a construção de grafo geralmente está embutida no programa, cujas estruturas de dados privadas são empregadas na contrução do grafo. A proposta é usar ferramentas diretamente em programas MPI, empregando, apenas, os recursos padr ões da norma MPI 1.2. O objetivo é fornecer uma biblioteca (b -MPI) portável para o escalonamento estático de programas MPI. O escalonamento estático realizado pela biblioteca é feito através do mapeamento de processos Esse mapeamento busca agrupar os processos que trocam muitas informações em um mesma máquina, o que nesse caso diminui o volume de dados trafegados pela rede. O mapeamento será realizado estaticamente após uma execução prévia do programa MPI. As aplicações alvo para o uso da b -MPI são aquelas que mantêm o mesmo padrão de comunicação após execuções sucessivas. A validação da biblioteca foi realizada atrav és da Transformada Rápida de Fourier disponível no pacote FFTW, da resolução do Problema de Transferência de Calor através do Método de Schwarz e Multigrid e da Fatora ção LU implementada no benchmark HPL. Os resultados mostraram que a b -MPI pode ser utilizada para distribuir os processos e cientemente minimizando o volume de mensagens trafegadas pela rede. / A good performance of a parallel application is obtained according to the mode as the parallelization techniques are applied. To make use of these techniques, is necessary to nd an appropriate way to extract the parallelism. This extraction can be done through a representative graph of the application. In this work, methods of partitioning graphs are applied to optimize the communication between processes that belong to a parallel computation. In this context, the processes allocation aims to minimize the communication amount between processors. This technique is frequently adopted in High Performance Computing - HPC. However, the graph building is generally inside the program, that has private data structures employed in the graph building. The proposal is to utilize tools directly in MPI programs, employing only standard resources of the MPI 1.2 norm. The goal is to provide a portable library (b -MPI) to static schedule MPI programs. The static scheduling realized by the library is done through the mapping of processes. This mapping seeks to cluster the processes that exchange a lot of information in the same machine that, in this case decreases the data volume passed through the net. The mapping will be done staticly after a previous execution of a MPI program. The target applications to make use of b -MPI are those whose keep the same communication pattern after successives executions. The library validation is done through the available applications in the FFTW package, the solving of the problem of Heat Transference through the Additive Schwarz Method and Multigrid and the LU factorization implemented in the HPL benchmark. The results show that b -MPI can be utilized to distribute the processes ef ciently minimizing the volume of messages exchanged through the network.
22

The GraphGrind framework : fast graph analytics on large shared-memory systems

Sun, Jiawen January 2018 (has links)
As shared memory systems support terabyte-sized main memory, they provide an opportunity to perform efficient graph analytics on a single machine. Graph analytics is characterised by frequent synchronisation, which is addressed in part by shared memory systems. However, performance is limited by load imbalance and poor memory locality, which originate in the irregular structure of small-world graphs. This dissertation demonstrates how graph partitioning can be used to optimise (i) load balance, (ii) Non-Uniform Memory Access (NUMA) locality and (iii) temporal locality of graph partitioning in shared memory systems. The developed techniques are implemented in GraphGrind, a new shared memory graph analytics framework. At first, this dissertation shows that heuristic edge-balanced partitioning results in an imbalance in the number of vertices per partition. Thus, load imbalance exists between partitions, either for loops iterating over vertices, or for loops iterating over edges. To address this issue, this dissertation introduces a classification of algorithms to distinguish whether they algorithmically benefit from edge-balanced or vertex-balanced partitioning. This classification supports the adaptation of partitions to the characteristics of graph algorithms. Evaluation in GraphGrind, shows that this outperforms state-of-the-art graph analytics frameworks for shared memory including Ligra by 1.46x on average, and Polymer by 1.16x on average, using a variety of graph algorithms and datasets. Secondly, this dissertation demonstrates that increasing the number of graph partitions is effective to improve temporal locality due to smaller working sets. However, the increasing number of partitions results in vertex replication in some graph data structures. This dissertation resorts to using a graph layout that is immune to vertex replication and an automatic graph traversal algorithm that extends the previously established graph traversal heuristics to a 3-way graph layout choice is designed. This new algorithm furthermore depends upon the classification of graph algorithms introduced in the first part of the work. These techniques achieve an average speedup of 1.79x over Ligra and 1.42x over Polymer. Finally, this dissertation presents a graph ordering algorithm to challenge the widely accepted heuristic to balance the number of edges per partition and minimise edge or vertex cut. This algorithm balances the number of edges per partition as well as the number of unique destinations of those edges. It balances edges and vertices for graphs with a power-law degree distribution. Moreover, this dissertation shows that the performance of graph ordering depends upon the characteristics of graph analytics frameworks, such as NUMA-awareness. This graph ordering algorithm achieves an average speedup of 1.87x over Ligra and 1.51x over Polymer.
23

Escalonamento estático de programas-MPI

Silva, Rafael Ennes January 2006 (has links)
O bom desempenho de uma aplicação paralela é obtido conforme o modo como as técnicas de paralelização são empregadas. Para utilizar essas técnicas, é preciso encontrar uma forma adequada de extrair o paralelismo. Esta extração pode ser feita através de um grafo representativo da aplicação. Neste trabalho são aplicados métodos de particionamento de grafos para otimizar as comunicações entre os processos que fazem parte de uma computação paralela. Nesse contexto, a alocação dos processos almeja minimizar a quantidade de comunicações entre processadores. Esta técnica é frequentemente adotada em Processamento de Alto Desempenho - PAD. No entanto, a construção de grafo geralmente está embutida no programa, cujas estruturas de dados privadas são empregadas na contrução do grafo. A proposta é usar ferramentas diretamente em programas MPI, empregando, apenas, os recursos padr ões da norma MPI 1.2. O objetivo é fornecer uma biblioteca (b -MPI) portável para o escalonamento estático de programas MPI. O escalonamento estático realizado pela biblioteca é feito através do mapeamento de processos Esse mapeamento busca agrupar os processos que trocam muitas informações em um mesma máquina, o que nesse caso diminui o volume de dados trafegados pela rede. O mapeamento será realizado estaticamente após uma execução prévia do programa MPI. As aplicações alvo para o uso da b -MPI são aquelas que mantêm o mesmo padrão de comunicação após execuções sucessivas. A validação da biblioteca foi realizada atrav és da Transformada Rápida de Fourier disponível no pacote FFTW, da resolução do Problema de Transferência de Calor através do Método de Schwarz e Multigrid e da Fatora ção LU implementada no benchmark HPL. Os resultados mostraram que a b -MPI pode ser utilizada para distribuir os processos e cientemente minimizando o volume de mensagens trafegadas pela rede. / A good performance of a parallel application is obtained according to the mode as the parallelization techniques are applied. To make use of these techniques, is necessary to nd an appropriate way to extract the parallelism. This extraction can be done through a representative graph of the application. In this work, methods of partitioning graphs are applied to optimize the communication between processes that belong to a parallel computation. In this context, the processes allocation aims to minimize the communication amount between processors. This technique is frequently adopted in High Performance Computing - HPC. However, the graph building is generally inside the program, that has private data structures employed in the graph building. The proposal is to utilize tools directly in MPI programs, employing only standard resources of the MPI 1.2 norm. The goal is to provide a portable library (b -MPI) to static schedule MPI programs. The static scheduling realized by the library is done through the mapping of processes. This mapping seeks to cluster the processes that exchange a lot of information in the same machine that, in this case decreases the data volume passed through the net. The mapping will be done staticly after a previous execution of a MPI program. The target applications to make use of b -MPI are those whose keep the same communication pattern after successives executions. The library validation is done through the available applications in the FFTW package, the solving of the problem of Heat Transference through the Additive Schwarz Method and Multigrid and the LU factorization implemented in the HPL benchmark. The results show that b -MPI can be utilized to distribute the processes ef ciently minimizing the volume of messages exchanged through the network.
24

Escalonamento estático de programas-MPI

Silva, Rafael Ennes January 2006 (has links)
O bom desempenho de uma aplicação paralela é obtido conforme o modo como as técnicas de paralelização são empregadas. Para utilizar essas técnicas, é preciso encontrar uma forma adequada de extrair o paralelismo. Esta extração pode ser feita através de um grafo representativo da aplicação. Neste trabalho são aplicados métodos de particionamento de grafos para otimizar as comunicações entre os processos que fazem parte de uma computação paralela. Nesse contexto, a alocação dos processos almeja minimizar a quantidade de comunicações entre processadores. Esta técnica é frequentemente adotada em Processamento de Alto Desempenho - PAD. No entanto, a construção de grafo geralmente está embutida no programa, cujas estruturas de dados privadas são empregadas na contrução do grafo. A proposta é usar ferramentas diretamente em programas MPI, empregando, apenas, os recursos padr ões da norma MPI 1.2. O objetivo é fornecer uma biblioteca (b -MPI) portável para o escalonamento estático de programas MPI. O escalonamento estático realizado pela biblioteca é feito através do mapeamento de processos Esse mapeamento busca agrupar os processos que trocam muitas informações em um mesma máquina, o que nesse caso diminui o volume de dados trafegados pela rede. O mapeamento será realizado estaticamente após uma execução prévia do programa MPI. As aplicações alvo para o uso da b -MPI são aquelas que mantêm o mesmo padrão de comunicação após execuções sucessivas. A validação da biblioteca foi realizada atrav és da Transformada Rápida de Fourier disponível no pacote FFTW, da resolução do Problema de Transferência de Calor através do Método de Schwarz e Multigrid e da Fatora ção LU implementada no benchmark HPL. Os resultados mostraram que a b -MPI pode ser utilizada para distribuir os processos e cientemente minimizando o volume de mensagens trafegadas pela rede. / A good performance of a parallel application is obtained according to the mode as the parallelization techniques are applied. To make use of these techniques, is necessary to nd an appropriate way to extract the parallelism. This extraction can be done through a representative graph of the application. In this work, methods of partitioning graphs are applied to optimize the communication between processes that belong to a parallel computation. In this context, the processes allocation aims to minimize the communication amount between processors. This technique is frequently adopted in High Performance Computing - HPC. However, the graph building is generally inside the program, that has private data structures employed in the graph building. The proposal is to utilize tools directly in MPI programs, employing only standard resources of the MPI 1.2 norm. The goal is to provide a portable library (b -MPI) to static schedule MPI programs. The static scheduling realized by the library is done through the mapping of processes. This mapping seeks to cluster the processes that exchange a lot of information in the same machine that, in this case decreases the data volume passed through the net. The mapping will be done staticly after a previous execution of a MPI program. The target applications to make use of b -MPI are those whose keep the same communication pattern after successives executions. The library validation is done through the available applications in the FFTW package, the solving of the problem of Heat Transference through the Additive Schwarz Method and Multigrid and the LU factorization implemented in the HPL benchmark. The results show that b -MPI can be utilized to distribute the processes ef ciently minimizing the volume of messages exchanged through the network.
25

Graph partitioning - a survey

Elsner, Ulrich 09 September 2005 (has links)
Many problems appearing in scientific computing and other areas can be formulated as a graph partitioning problems. Examples include data distribution for parallel computers, decomposition of sparse matrices and VLSI-design. In this survey we present the graph partitioning problem, describe some applications and introduce many of the algorithms used to solve the problem.
26

Distributed graph decomposition algorithms on Apache Spark

Mandal, Aritra 20 April 2018 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Structural analysis and mining of large and complex graphs for describing the characteristics of a vertex or an edge in the graph have widespread use in graph clustering, classification, and modeling. There are various methods for structural analysis of graphs including the discovery of frequent subgraphs or network motifs, counting triangles or graphlets, spectral analysis of networks using eigenvectors of graph Laplacian, and finding highly connected subgraphs such as cliques and quasi cliques. Unfortunately, the algorithms for solving most of the above tasks are quite costly, which makes them not-scalable to large real-life networks. Two such very popular decompositions, k-core and k-truss of a graph give very useful insight about the graph vertex and edges respectively. These decompositions have been applied to solve protein functions reasoning on protein-protein networks, fraud detection and missing link prediction problems. k-core decomposition with is linear time complexity is scalable to large real-life networks as long as the input graph fits in the main memory. k-truss on the other hands is computationally more intensive due to its definition relying on triangles and their is no linear time algorithm available for it. In this paper, we propose distributed algorithms on Apache Spark for k-truss and k-core decomposition of a graph. We also compare the performance of our algorithm with state-of-the-art Map-Reduce and parallel algorithms using openly available real world network data. Our proposed algorithms have shown substantial performance improvement.
27

GraphDHT: Scaling Graph Neural Networks' Distributed Training on Edge Devices on a Peer-to-Peer Distributed Hash Table Network

Gupta, Chirag 03 January 2024 (has links)
This thesis presents an innovative strategy for distributed Graph Neural Network (GNN) training, leveraging a peer-to-peer network of heterogeneous edge devices interconnected through a Distributed Hash Table (DHT). As GNNs become increasingly vital in analyzing graph-structured data across various domains, they pose unique challenges in computational demands and privacy preservation, particularly when deployed for training on edge devices like smartphones. To address these challenges, our study introduces the Adaptive Load- Balanced Partitioning (ALBP) technique in the GraphDHT system. This approach optimizes the division of graph datasets among edge devices, tailoring partitions to the computational capabilities of each device. By doing so, ALBP ensures efficient resource utilization across the network, significantly improving upon traditional participant selection strategies that often overlook the potential of lower-performance devices. Our methodology's core is weighted graph partitioning and model aggregation in GNNs, based on partition ratios, improving training efficiency and resource use. ALBP promotes inclusive device participation in training, overcoming computational limits and privacy concerns in large-scale graph data processing. Utilizing a DHT-based system enhances privacy in the peer-to-peer setup. The GraphDHT system, tested across various datasets and GNN architectures, shows ALBP's effectiveness in distributed GNN training and its broad applicability in different domains and structures. This contributes to applied machine learning, especially in optimizing distributed learning on edge devices. / Master of Science / Graph Neural Networks (GNNs) are a type of machine learning model that focuses on analyzing data structured like a network, such as social media connections or biological systems. These models can help identify patterns and make predictions in various tasks, but training them on large-scale datasets can require significant computing power and careful handling of sensitive data. This research proposes a new method for training GNNs on small devices, like smartphones, by dividing the data into smaller pieces and using a peer-to-peer (p2p) network for communication between devices. This approach allows the devices to work together and learn from the data while keeping sensitive information private. The main contributions of this research are threefold: (1) examining existing ways to divide network data and how they can be used for training GNNs on small devices, (2) improving the training process by creating a localized, decentralized network of devices that can communicate and learn together, and (3) testing the method on different types of datasets and GNN models, showing that it works well across a variety of situations. To sum up, this research offers a novel way to train GNNs on small devices, allowing for more efficient learning and better protection of sensitive information.
28

Computational Cost Analysis of Large-Scale Agent-Based Epidemic Simulations

Kamal, Tariq 21 September 2016 (has links)
Agent-based epidemic simulation (ABES) is a powerful and realistic approach for studying the impacts of disease dynamics and complex interventions on the spread of an infection in the population. Among many ABES systems, EpiSimdemics comes closest to the popular agent-based epidemic simulation systems developed by Eubank, Longini, Ferguson, and Parker. EpiSimdemics is a general framework that can model many reaction-diffusion processes besides the Susceptible-Exposed-Infectious-Recovered (SEIR) models. This model allows the study of complex systems as they interact, thus enabling researchers to model and observe the socio-technical trends and forces. Pandemic planning at the world level requires simulation of over 6 billion agents, where each agent has a unique set of demographics, daily activities, and behaviors. Moreover, the stochastic nature of epidemic models, the uncertainty in the initial conditions, and the variability of reactions require the computation of several replicates of a simulation for a meaningful study. Given the hard timelines to respond, running many replicates (15-25) of several configurations (10-100) (of these compute-heavy simulations) can only be possible on high-performance clusters (HPC). These agent-based epidemic simulations are irregular and show poor execution performance on high-performance clusters due to the evolutionary nature of their workload, large irregular communication and load imbalance. For increased utilization of HPC clusters, the simulation needs to be scalable. Many challenges arise when improving the performance of agent-based epidemic simulations on high-performance clusters. Firstly, large-scale graph-structured computation is central to the processing of these simulations, where the star-motif quality nodes (natural graphs) create large computational imbalances and communication hotspots. Secondly, the computation is performed by classes of tasks that are separated by global synchronization. The non-overlapping computations cause idle times, which introduce the load balancing and cost estimation challenges. Thirdly, the computation is overlapped with communication, which is difficult to measure using simple methods, thus making the cost estimation very challenging. Finally, the simulations are iterative and the workload (computation and communication) may change through iterations, as a result introducing load imbalances. This dissertation focuses on developing a cost estimation model and load balancing schemes to increase the runtime efficiency of agent-based epidemic simulations on high-performance clusters. While developing the cost model and load balancing schemes, we perform the static and dynamic load analysis of such simulations. We also statically quantified the computational and communication workloads in EpiSimdemics. We designed, developed and evaluated a cost model for estimating the execution cost of large-scale parallel agent-based epidemic simulations (and more generally for all constrained producer-consumer parallel algorithms). This cost model uses computational imbalances and communication latencies, and enables the cost estimation of those applications where the computation is performed by classes of tasks, separated by synchronization. It enables the performance analysis of parallel applications by computing its execution times on a number of partitions. Our evaluations show that the model is helpful in performance prediction, resource allocation and evaluation of load balancing schemes. As part of load balancing algorithms, we adopted the Metis library for partitioning bipartite graphs. We have also developed lower-overhead custom schemes called Colocation and MetColoc. We performed an evaluation of Metis, Colocation, and MetColoc. Our analysis showed that the MetColoc schemes gives a performance similar to Metis, but with half the partitioning overhead (runtime and memory). On the other hand, the Colocation scheme achieves a similar performance to Metis on a larger number of partitions, but at extremely lower partitioning overhead. Moreover, the memory requirements of Colocation scheme does not increase as we create more partitions. We have also performed the dynamic load analysis of agent-based epidemic simulations. For this, we studied the individual and joint effects of three disease parameter (transmissiblity, infection period and incubation period). We quantified the effects using an analytical equation with separate constants for SIS, SIR and SI disease models. The metric that we have developed in this work is useful for cost estimation of constrained producer-consumer algorithms, however, it has some limitations. The applicability of the metric is application, machine and data-specific. In the future, we plan to extend the metric to increase its applicability to a larger set of machine architectures, applications, and datasets. / Ph. D.
29

Detecção de comunidades em redes complexas utilizando estratégia multinível / Community detection in complex networks: a multilevel approach

Almeida, Leonardo Jesus 05 October 2009 (has links)
O grande volume de dados armazenados em meio digital dificulta a anáalise e extração de informações por um ser humano sem que seja utilizada alguma ferramenta computacional inteligente. A área de Aprendizado de Máquina (AM) estuda e desenvolve algoritmos para o processamento e obtenção automática de conhecimento em dados digitais. Tradicionalmente, os algoritmos de AM modelam os dados analisados com base na abordagem proposicional; entretanto, recentemente com a disponibilidade de conjuntos de dados relacionais novas abordagens têm sido estudadas, como a modelagem utilizando redes complexas. Redes complexas é uma área de pesquisa recente e ativa que têm atraíido a atenção de pesquisadores e tem sido aplicada em diversos domínios. Mais especificamente, o estudo de detecção de comunidades em redes complexas é o tema principal deste trabalho. Detectar comunidades consiste em buscar grupos de vértices densamente conectados entre si em uma rede. Detectar a melhor divisão em comunidades de uma rede é um problema NP-completo, o que requer que o desenvolvimento de soluções viáveis baseiem-se em heurísticas como, por exemplo, medidas de qualidade. Newman prop^os a medida de modularidade Q que tem se mostrado eficiiente na análise de comunidades em redes. Este trabalho apresenta o Algoritmo Multinível de Otimização de Modularidade (AMOM) que é baseado a na otimização da medida de modularidade e integrado na estratégia multinível. A estratégia multinível é composta de três fases: (i) sucessivas compactações da rede inicial com base em contrações de arestas e fus~oes de vértices, (ii) particionamento da rede reduzida utilizando Algoritmo de Otimização de Modularidade (AOM) modificado, e (iii) sucessivas descompactações das redes intermediárias até que se retorne a rede inicial. O principal atrativo da estratégia é viabilizar a utilização de algoritmos custosos no particionamento do grafo compactado, uma vez que neste grafo a quantidade de vértices e arestas é uma fração reduzida em relação ao grafo inicial. O trabalho também propõe dois novos métodos para refinamento dos particionamentos durante a fase de uncoasening. A fiim de avaliar a escalabilidade e eficiiência da metodologia proposta foram realizados experimentos empíricos em redes consideradas benchmark. Os resultados demonstram um significativo ganho de desempenho, mantendo bons resultados qualitativos / Human based analysis of large amount of data is a hard task when no intelligent computer aid is provided. In this context, Machine Learning (ML) algorithms are aimed at automatically processing and obtaining knowledge from data. In general, ML algorithms use a propositional representation of data such as an attribute-value table. However, this model is not suitable for relational information modeling, which can be better accomplished using graphs or networks. In this context, complex networks have been call attention of scientific community recently and many applications in different domains have been developed. In special, one of complex networks research trends is the community detection field which is the main focus of this work. Community detection is the problem of finding dense and disjoint connected groups of vertices in a network. The problem is a well know NP-complete task which requires heuristics approaches, like quality measures, to be addressed. Newman introduced a specific quality measure called modularity that proved to be useful for analysis communities in networks. This work presents a new algorithm, called Multilevel Modularity Optimization Algorithm, based on modularity measure optimization integrated in a multilevel graph partitioning strategy. The multilevel graph partitioning scheme consists of three phases: (i) reduction of the size (coarsen) of original graph by collapsing vertices and edges, (ii) partitioning the coarsened graph, and (iii) uncoarsen it to construct a partition for the original graph. The rationale behind this strategy is to apply a computationally expensive method in a coarsened graph, i.e., with a significantly reduced number of vertices and edges. In addition, it is proposed two new methods that uses modularity and clustering coefficient for partition refinement. Empirical evaluation on benchmarks networks using this approach demonstrate a significant speed up gain compared to the original modularity-based algorithm, keeping a good quality clusters partitioning
30

Equilibrage de charges dynamique avec un nombre variable de processeurs basé sur des méthodes de partitionnement de graphe / Dynamic Load-Balancing with Variable Number of Processors based on Graph Partitioning

Vuchener, Clement 07 February 2014 (has links)
L'équilibrage de charge est une étape importante conditionnant les performances des applications parallèles. Dans le cas où la charge varie au cours de la simulation, il est important de redistribuer régulièrement la charge entre les différents processeurs. Dans ce contexte, il peut s'avérer pertinent d'adapter le nombre de processeurs au cours d'une simulation afin d'obtenir une meilleure efficacité, ou de continuer l'exécution quand toute la mémoire des ressources courantes est utilisée. Contrairement au cas où le nombre de processeurs ne varie pas, le rééquilibrage dynamique avec un nombre variable de processeurs est un problème peu étudié que nous abordons ici.Cette thèse propose différentes méthodes basées sur le repartitionnement de graphe pour rééquilibrer la charge tout en changeant le nombre de processeurs. Nous appelons ce problème « repartitionnement M x N ». Ces méthodes se décomposent en deux grandes étapes. Dans un premier temps, nous étudions la phase de migration et nous construisons une « bonne » matrice de migration minimisant plusieurs critères objectifs comme le volume total de migration et le nombre total de messages échangés. Puis, dans un second temps, nous utilisons des heuristiques de partitionnement de graphe pour calculer une nouvelle distribution optimisant la migration en s'appuyant sur les résultats de l'étape précédente. En outre, nous proposons un algorithme de partitionnement k-aire direct permettant d'améliorer le partitionnement biaisé. Finalement, nous validons cette thèse par une étude expérimentale en comparant nos méthodes aux partitionneursactuels. / Load balancing is an important step conditioning the performance of parallel programs. If the workload varies drastically during the simulation, the load must be redistributed regularly among the processors. Dynamic load balancing is a well studied subject but most studies are limited to an initially fixed number of processors. Adjusting the number of processors at runtime allows to preserve the parallel code efficiency or to keep running the simulation when the memory of the current resources is exceeded.In this thesis, we propose some methods based on graph repartitioning in order to rebalance the load while changing the number of processors. We call this problem \M x N repartitioning". These methods are split in two main steps. Firstly, we study the migration phase and we build a \good" migration matrix minimizing several metrics like the migration volume or the number of exchanged messages. Secondly, we use graph partitioning heuristics to compute a new distribution optimizing the migration according to the previous step results. Besides, we propose a direct k-way partitioning algorithm that allows us to improve our biased partitioning. Finally, an experimental study validates our algorithms against state-of-the-art partitioning tools.

Page generated in 0.502 seconds