• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 13
  • 2
  • 2
  • 1
  • Tagged with
  • 19
  • 19
  • 10
  • 9
  • 6
  • 6
  • 6
  • 6
  • 6
  • 6
  • 5
  • 5
  • 4
  • 4
  • 4
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Self-tuned parallel runtimes: a case of study for OpenMP

Durán González, Alejandro 22 October 2008 (has links)
In recent years parallel computing has become ubiquitous. Lead by the spread of commodity multicore processors, parallel programming is not anymore an obscure discipline only mastered by a few.Unfortunately, the amount of able parallel programmers has not increased at the same speed because is not easy to write parallel codes.Parallel programming is inherently different from sequential programming. Programmers must deal with a whole new set of problems: identification of parallelism, work and data distribution, load balancing, synchronization and communication.Parallel programmers have embraced several languages designed to allow the creation of parallel applications. In these languages, the programmer is not only responsible of identifying the parallelism but also of specifying low-level details of how the parallelism needs to exploited (e.g. scheduling, thread distribution ...). This is a burden than hampers the productivity of the programmers.We demonstrate that is possible for the runtime component of a parallel environment to adapt itself to the application and the execution environment and thus reducing the burden put into the programmer. For this purpose we study three different parameters that are involved in the parallel exploitation of the OpenMP parallel language: parallel loop scheduling, thread allocation in multiple levels of parallelism and task granularity control.In all the cases, we propose a self-tuned algorithm that will first perform an on-line profiling of the application and based on the information gathered it will adapt the value of the parameter to the one that maximizes the performance of the application.Our goal is not to develop methods that outperform a hand-tuned application for a specific scenario, as this is probably just as difficult as compiler code outperforming hand-tuned assembly code, but methods that get close to that performance with a minimum effort from the programmer. In other words, what we want to achieve with our self-tuned algorithms is to maximize the ratio performance over effort so the entry level to the parallelism is lower. The evaluation of our algorithms with different applications shows that we achieve that goal.
2

Insightful Performance Analysis of Many-Task Runtimes through Tool-Runtime Integration

Chaimov, Nicholas 06 September 2017 (has links)
Future supercomputers will require application developers to expose much more parallelism than current applications expose. In order to assist application developers in structuring their applications such that this is possible, new programming models and libraries are emerging, the many-task runtimes, to allow for the expression of orders of magnitude more parallelism than currently existing models. This dissertation describes the challenges that these emerging many-task runtimes will place on performance analysis, and proposes deep integration between runtimes and performance tools as a means of producing correct, insightful, and actionable performance results. I show how tool-runtime integration can be used to aid programmer understanding of performance characteristics and to provide online performance feedback to the runtime for Unified Parallel C (UPC), High Performance ParalleX (HPX), Apache Spark, the Open Community Runtime, and the OpenMP runtime.
3

Evaluations of the parallel extensions in .NET 4.0

Islam, Md. Rashedul, Islam, Md. Rofiqul, Mazumder, Tahidul Arafhin January 2011 (has links)
Parallel programming or making parallel application is a great challenging part of computing research. The main goal of parallel programming research is to improve performance of computer applications. A well-structured parallel application can achieve better performance in terms of execution speed over sequential execution on existing and upcoming parallel computer architecture. This thesis named "Evaluations of the parallel extensions in .NET 4.0" describes the experimental evaluation of different parallel application performance with thread-safe data structure and parallel constructions in .NET Framework 4.0. Described different performance issues of this thesis help to make efficient parallel application for better performance. Before describing the experimental evaluation, this thesis describes some methodologies relevant to parallel programming like Parallel computer architecture, Memory architectures, Parallel programming models, decomposition, threading etc. It describes the different APIs in .NET Framework 4.0 and the way of coding for making an efficient parallel application in different situations. It also presents some implementations of different parallel constructs or APIs like Static Multithreading, Using ThreadPool, Task, Parallel.For, Parallel.ForEach, PLINQ etc. The evaluation of parallel application has been done by experimental result evaluation and performance measurements. In most of the cases, the result evaluation shows better performance of parallelism like less execution time and increase CPU uses over traditional sequential execution. In addition parallel loop doesn’t show better performance in case of improper partitioning, oversubscription, improper workloads etc. The discussion about proper partitioning, oversubscription and proper work load balancing will help to make more efficient parallel application. / Program: Magisterutbildning i informatik
4

Stratégies d'analyse de performance pour les applications basées sur tâches sur plates-formes hybrides / Performance Analysis Strategies for Task-based Applications on Hybrid Platforms

Garcia Pinto, Vinicius 30 October 2018 (has links)
Les techniques de programmations pour le calcul de haute performanceont adopté les modèles basés sur parallélisme de tâche qui sontcapables de s’adapter plus facilement à des superordinateurs avec desarchitectures hybrides. La performance des applications basées surtâches dépende fortement des heuristiques d'ordonnancement dynamiqueset de sa capacité à exploiter les ressources de calcul et decommunication.Malheureusement, les stratégies d'analyse de performancetraditionnelles ne sont pas convenables pour comprendre les supportsd'exécution dynamiques et les applications basées sur tâches. Cesstratégies prévoient un comportement régulier avec des phases decalcul et de communication, par contre, des applications basées surtâches ne manifestent pas de phases précises. Par ailleurs, la granularitéplus fine des applications basées sur tâches typiquement provoque descomportements stochastiques qui donnent lieu aux structuresirrégulières qui sont difficiles à analyser.Dans cette thèse, nous proposons des stratégies d'analyse deperformance qui exploitent la combinaison de la structure del'application, d'ordonnancement et des informations de laplate-forme. Nous présentons comment nos stratégies peuvent aider àcomprendre des problèmes de performance dans des applications baséesur tâches qui exécutent dans des plates-formes hybrides. Nosstratégies d'analyse de performance sont construites avec des outilsmodernes pour l'analyse de données, ce que permettre la création despanneaux de visualisation personnalisés. Ces panneaux permettent lacompréhension et l'identification de problèmes de performancesoccasionnés par de mauvaises décisions d'ordonnancement etconfiguration incorrect du support d'exécution et de laplate-forme. Grâce à combinaison de simulation et débogage nouspouvons aussi construire une représentation visuelle de l'état interneet des estimations calculées par l'ordonnancer durant l'ordonnancementd'une nouvelle tâche.Nous validons notre proposition parmi de l'analyse de tracesd'exécutions d'une factorisation de Cholesky implémenté avec lesupport d'exécution StarPU et exécutée dans une plate-forme hybride(CPU/GPU). Nos études de cas montrent comment améliorer la partitiondes tâches entre le multi-(GPU, coeur) pour s'approcher des bornesinférieures théoriques, comment améliorer le pipeline des opérationsMPI entre le multi-(noeud, coeur, GPU) pour réduire le démarrage lentedans les noeuds distribués et comment optimiser le support d'exécutionpour augmenter la bande passante MPI. Avec l'emploi des stratégies desimulation et débogage, nous fournissons un workflow pourl'examiner, en détail, les décisions d'ordonnancement. Cela permet deproposer des changements pour améliorer les mécanismes d'ordonnancementet prefetch du support d'exécution. / Programming paradigms in High-Performance Computing have been shiftingtoward task-based models that are capable of adapting readily toheterogeneous and scalable supercomputers. The performance oftask-based applications heavily depends on the runtime schedulingheuristics and on its ability to exploit computing and communicationresources.Unfortunately, the traditional performance analysis strategies areunfit to fully understand task-based runtime systems and applications:they expect a regular behavior with communication and computationphases, while task-based applications demonstrate no clearphases. Moreover, the finer granularity of task-based applicationstypically induces a stochastic behavior that leads to irregularstructures that are difficult to analyze.In this thesis, we propose performance analysis strategies thatexploit the combination of application structure, scheduler, andhardware information. We show how our strategies can help tounderstand performance issues of task-based applications running onhybrid platforms. Our performance analysis strategies are built on topof modern data analysis tools, enabling the creation of customvisualization panels that allow understanding and pinpointingperformance problems incurred by bad scheduling decisions andincorrect runtime system and platform configuration.By combining simulation and debugging we are also able to build a visualrepresentation of the internal state and the estimations computed bythe scheduler when scheduling a new task.We validate our proposal by analyzing traces from a Choleskydecomposition implemented with the StarPU task-based runtime systemand running on hybrid (CPU/GPU) platforms. Our case studies show howto enhance the task partitioning among the multi-(GPU, core) to getcloser to theoretical lower bounds, how to improve MPI pipelining inmulti-(node, core, GPU) to reduce the slow start in distributed nodesand how to upgrade the runtime system to increase MPI bandwidth. Byemploying simulation and debugging strategies, we also provide aworkflow to investigate, in depth, assumptions concerning the schedulerdecisions. This allows us to suggest changes to improve the runtimesystem scheduling and prefetch mechanisms.
5

Task Parallelism For Ray Tracing On A Gpu Cluster

Unlu, Caglar 01 February 2008 (has links) (PDF)
Ray tracing is a computationally complex global illumination algorithm that is used for producing realistic images. In addition to parallel implementations on commodity PC clusters, recently, Graphics Processing Units (GPU) have also been used to accelerate ray tracing. In this thesis, ray tracing is accelerated on a GPU cluster where the viewing plane is divided into unit tiles. Slave processes work on these tiles in a task parallel manner which are dynamically assigned to them. To decrease the number of ray-triangle intersection tests, Bounding Volume Hierarchies (BVH) are used. It is shown that almost linear speedup can be achieved. On the other hand, it is observed that API and network overheads are obstacles for scalability.
6

Exploiting Parallelism in GPUs

Hechtman, Blake Alan January 2014 (has links)
<p>Heterogeneous processors with accelerators provide an opportunity to improve performance within a given power budget.</p><p>Many of these heterogeneous processors contain Graphics Processing Units (GPUs) that can perform graphics and embarrassingly parallel computation orders of magnitude faster than a CPU while using less energy. Beyond these obvious applications for GPUs, a larger variety of applications can benefit from a GPU's large computation and memory bandwidth. However, many of these applications are irregular and, as a result, require synchronization and scheduling that are commonly believed to perform poorly on GPUs. The basic building block of synchronization and scheduling is memory consistency, which is, therefore, the first place to look for improving performance on irregular applications. In this thesis, we approach the programmability of irregular applications on GPUs by thinking across traditional boundaries of the compute stack. We think about architecture, microarchitecture and runtime systems from the programmers perspective. To this end, we study architectural memory consistency on future GPUs with cache coherence. In addition, we design a GPU memory system</p><p>microarchitecture that can support fine-grain and coarse-grain synchronization without sacrificing throughput. Finally, we develop a task runtime that embraces the GPU microarchitecture to perform well</p><p>on fork/join parallelism desired by many programmers. Overall, this thesis contributes non-intuitive solutions to improve the performance and programmability of irregular applications from the programmer's perspective.</p> / Dissertation
7

Providing adaptability to MPI applications on current parallel architectures / Provendo adaptabilidade em aplicações MPI nas arquiteturas paralelas atuais

Cera, Marcia Cristina January 2012 (has links)
Atualmente, adaptabilidade é uma característica desejada em aplicações paralelas. Por exemplo, o crescente número de usuários competindo por recursos em arquiteturas paralelas gera mudanças constantes no conjunto de processadores disponíveis. Aplicações adaptativas são capazes de executar usando um conjunto volátil de processadores, oferecendo urna melhor utilização dos recursos. Este comportamento adaptativo é conhecido corno maleabilidade. Outro exemplo vem da constante evolução das arquiteturas multi-core, as quais aumentam o número de cores em seus chips a cada nova geração. Adaptabilidade é a chave para permitir que os programas paralelos sejam portáveis de uma máquina a outra. Assim. os programas paralelos são capazes de adaptar a extração do paralelismo de acordo com o grau de paralelismo específico da arquitetura alvo. Este comportamento pode ser visto como um caso particular de evolutividade. Nesse sentido, esta tese está focada em: (i) maleabilidade para adaptar a execução das aplicações paralelas às mudanças na disponibilidade dos processadores; e (ii) evolutividade para adaptar a extração do paralelismo de acordo com propriedades da arquitetura e dos dados de entrada. Portanto, a questão remanescente é "Como prover e suportar aplicações adaptativas?". Esta tese visa responder tal questão com base no MPI (Message-Passing Interface), o qual é a API paralela padrão para HPC em ambientes distribuídos. Nosso trabalho baseia-se nas características do MPI-2 que permitem criar processos em tempo de execução, dando alguma flexibilidade às aplicações MPI. Aplicações MPI maleáveis usam a criação dinâmica de processos para expandir-se nas ações de crescimento (para usar processadores extras). As ações de diminuição (para liberar processadores) finalizam os processos MPI que executam nos processadores requeridos, preservando os dados da aplicação. Note que as aplicações maleáveis requerem suporte do ambiente de execução, uma vez que precisam ser notificadas sobre a disponibilidade dos processadores. Aplicações MPI evolutivas seguem o paradigma do paralelismo de tarefas explícitas para permitir adaptação em tempo de execução. Assim, a criação dinâmica de processos é usada para extrair o paralelismo, ou seja, para criar novas tarefas MPI sob demanda. Para prover tais aplicações nós definimos tarefas MPI abstratas, implementamos a sincronização entre elas através da troca de mensagens, e propusemos uma abordagem para ajustar a granularidade das tarefas MPI, visando eficiência em ambientes distribuídos. Os resultados experimentais validaram nossa hipótese de que aplicações adaptativas podem ser providas usando características do MPI-2. Adicionalmente, esta tese identificou os requisitos rio nível do ambiente de execução para suportá-las em clusters. Portanto, as aplicações MPI maleáveis melhoraram a utilização de recursos de clusters; e as aplicações de tarefas explícitas adaptaram a extração do paralelismo de acordo com a arquitetura alvo. mostrando que este paradigma também é eficiente em ambientes distribuídos. / Currently, adaptability is a desired feature in parallel applications. For instante, the increasingly number of user competing for resources of the parallel architectures causes dynamic changes in the set of available processors. Adaptive applications are able to execute using a set of volatile processors, providing better resource utilization. This adaptive behavior is known as malleability. Another example comes from the constant evolution of the multi-core architectures, which increases the number of cores to each new generation of chips. Adaptability is the key to allow parallel programs portability from one multi-core machine to another. Thus, parallel programs can adapt the unfolding of the parallelism to the specific degree of parallelism of the target architecture. This adaptive behavior can be seen as a particular case of evolutivity. In this sense, this thesis is focused on: (i) malleability to adapt the execution of parallel applications as changes in processors availability; and (ii) evolutivity to adapt the unfolding of the parallelism at runtime as the architecture and input data properties. Thus, the open issue is "How to provide and support adaptive applications?". This thesis aims to answer this question taking into account the MPI (Message-Passing Interface), which is the standard parallel API for HPC in distributed-memory environments. Our work is based on MPI-2 features that allow spawning processes at runtime. adding some fiexibility to the MPI applications. Malleable MPI applications use dynamic process creation to expand themselves in growth action (to use further processors). The shrinkage actions (to release processors) end the execution of the MPI processes on the required processors in such a way that the application's data are preserved. Notice that malleable applications require a runtime environment support to execute, once they must be notified about the processors availability. Evolving MPI applications follow the explicit task parallelism paradigm to allow their runtime adaptation. Thus, dynamic process creation is used to unfold the parallelism, i.e., to create new MPI tasks on demand. To provide these applications we defined the abstract MPI tasks, implemented the synchronization among these tasks through message exchanges, and proposed an approach to adjust MPI tasks granularity aiming at efficiency in distributed-memory environments. Experimental results validated our hypothesis that adaptive applications can be provided using the MPI-2 features. Additionally, this thesis identifies the requirements to support these applications in cluster environments. Thus, malleable MPI applications were able to improve the cluster utilization; and the explicit task ones were able to adapt the unfolding of the parallelism to the target architecture, showing that this programming paradigm can be efficient also in distributed-memory contexts.
8

Providing adaptability to MPI applications on current parallel architectures / Provendo adaptabilidade em aplicações MPI nas arquiteturas paralelas atuais

Cera, Marcia Cristina January 2012 (has links)
Atualmente, adaptabilidade é uma característica desejada em aplicações paralelas. Por exemplo, o crescente número de usuários competindo por recursos em arquiteturas paralelas gera mudanças constantes no conjunto de processadores disponíveis. Aplicações adaptativas são capazes de executar usando um conjunto volátil de processadores, oferecendo urna melhor utilização dos recursos. Este comportamento adaptativo é conhecido corno maleabilidade. Outro exemplo vem da constante evolução das arquiteturas multi-core, as quais aumentam o número de cores em seus chips a cada nova geração. Adaptabilidade é a chave para permitir que os programas paralelos sejam portáveis de uma máquina a outra. Assim. os programas paralelos são capazes de adaptar a extração do paralelismo de acordo com o grau de paralelismo específico da arquitetura alvo. Este comportamento pode ser visto como um caso particular de evolutividade. Nesse sentido, esta tese está focada em: (i) maleabilidade para adaptar a execução das aplicações paralelas às mudanças na disponibilidade dos processadores; e (ii) evolutividade para adaptar a extração do paralelismo de acordo com propriedades da arquitetura e dos dados de entrada. Portanto, a questão remanescente é "Como prover e suportar aplicações adaptativas?". Esta tese visa responder tal questão com base no MPI (Message-Passing Interface), o qual é a API paralela padrão para HPC em ambientes distribuídos. Nosso trabalho baseia-se nas características do MPI-2 que permitem criar processos em tempo de execução, dando alguma flexibilidade às aplicações MPI. Aplicações MPI maleáveis usam a criação dinâmica de processos para expandir-se nas ações de crescimento (para usar processadores extras). As ações de diminuição (para liberar processadores) finalizam os processos MPI que executam nos processadores requeridos, preservando os dados da aplicação. Note que as aplicações maleáveis requerem suporte do ambiente de execução, uma vez que precisam ser notificadas sobre a disponibilidade dos processadores. Aplicações MPI evolutivas seguem o paradigma do paralelismo de tarefas explícitas para permitir adaptação em tempo de execução. Assim, a criação dinâmica de processos é usada para extrair o paralelismo, ou seja, para criar novas tarefas MPI sob demanda. Para prover tais aplicações nós definimos tarefas MPI abstratas, implementamos a sincronização entre elas através da troca de mensagens, e propusemos uma abordagem para ajustar a granularidade das tarefas MPI, visando eficiência em ambientes distribuídos. Os resultados experimentais validaram nossa hipótese de que aplicações adaptativas podem ser providas usando características do MPI-2. Adicionalmente, esta tese identificou os requisitos rio nível do ambiente de execução para suportá-las em clusters. Portanto, as aplicações MPI maleáveis melhoraram a utilização de recursos de clusters; e as aplicações de tarefas explícitas adaptaram a extração do paralelismo de acordo com a arquitetura alvo. mostrando que este paradigma também é eficiente em ambientes distribuídos. / Currently, adaptability is a desired feature in parallel applications. For instante, the increasingly number of user competing for resources of the parallel architectures causes dynamic changes in the set of available processors. Adaptive applications are able to execute using a set of volatile processors, providing better resource utilization. This adaptive behavior is known as malleability. Another example comes from the constant evolution of the multi-core architectures, which increases the number of cores to each new generation of chips. Adaptability is the key to allow parallel programs portability from one multi-core machine to another. Thus, parallel programs can adapt the unfolding of the parallelism to the specific degree of parallelism of the target architecture. This adaptive behavior can be seen as a particular case of evolutivity. In this sense, this thesis is focused on: (i) malleability to adapt the execution of parallel applications as changes in processors availability; and (ii) evolutivity to adapt the unfolding of the parallelism at runtime as the architecture and input data properties. Thus, the open issue is "How to provide and support adaptive applications?". This thesis aims to answer this question taking into account the MPI (Message-Passing Interface), which is the standard parallel API for HPC in distributed-memory environments. Our work is based on MPI-2 features that allow spawning processes at runtime. adding some fiexibility to the MPI applications. Malleable MPI applications use dynamic process creation to expand themselves in growth action (to use further processors). The shrinkage actions (to release processors) end the execution of the MPI processes on the required processors in such a way that the application's data are preserved. Notice that malleable applications require a runtime environment support to execute, once they must be notified about the processors availability. Evolving MPI applications follow the explicit task parallelism paradigm to allow their runtime adaptation. Thus, dynamic process creation is used to unfold the parallelism, i.e., to create new MPI tasks on demand. To provide these applications we defined the abstract MPI tasks, implemented the synchronization among these tasks through message exchanges, and proposed an approach to adjust MPI tasks granularity aiming at efficiency in distributed-memory environments. Experimental results validated our hypothesis that adaptive applications can be provided using the MPI-2 features. Additionally, this thesis identifies the requirements to support these applications in cluster environments. Thus, malleable MPI applications were able to improve the cluster utilization; and the explicit task ones were able to adapt the unfolding of the parallelism to the target architecture, showing that this programming paradigm can be efficient also in distributed-memory contexts.
9

Providing adaptability to MPI applications on current parallel architectures / Provendo adaptabilidade em aplicações MPI nas arquiteturas paralelas atuais

Cera, Marcia Cristina January 2012 (has links)
Atualmente, adaptabilidade é uma característica desejada em aplicações paralelas. Por exemplo, o crescente número de usuários competindo por recursos em arquiteturas paralelas gera mudanças constantes no conjunto de processadores disponíveis. Aplicações adaptativas são capazes de executar usando um conjunto volátil de processadores, oferecendo urna melhor utilização dos recursos. Este comportamento adaptativo é conhecido corno maleabilidade. Outro exemplo vem da constante evolução das arquiteturas multi-core, as quais aumentam o número de cores em seus chips a cada nova geração. Adaptabilidade é a chave para permitir que os programas paralelos sejam portáveis de uma máquina a outra. Assim. os programas paralelos são capazes de adaptar a extração do paralelismo de acordo com o grau de paralelismo específico da arquitetura alvo. Este comportamento pode ser visto como um caso particular de evolutividade. Nesse sentido, esta tese está focada em: (i) maleabilidade para adaptar a execução das aplicações paralelas às mudanças na disponibilidade dos processadores; e (ii) evolutividade para adaptar a extração do paralelismo de acordo com propriedades da arquitetura e dos dados de entrada. Portanto, a questão remanescente é "Como prover e suportar aplicações adaptativas?". Esta tese visa responder tal questão com base no MPI (Message-Passing Interface), o qual é a API paralela padrão para HPC em ambientes distribuídos. Nosso trabalho baseia-se nas características do MPI-2 que permitem criar processos em tempo de execução, dando alguma flexibilidade às aplicações MPI. Aplicações MPI maleáveis usam a criação dinâmica de processos para expandir-se nas ações de crescimento (para usar processadores extras). As ações de diminuição (para liberar processadores) finalizam os processos MPI que executam nos processadores requeridos, preservando os dados da aplicação. Note que as aplicações maleáveis requerem suporte do ambiente de execução, uma vez que precisam ser notificadas sobre a disponibilidade dos processadores. Aplicações MPI evolutivas seguem o paradigma do paralelismo de tarefas explícitas para permitir adaptação em tempo de execução. Assim, a criação dinâmica de processos é usada para extrair o paralelismo, ou seja, para criar novas tarefas MPI sob demanda. Para prover tais aplicações nós definimos tarefas MPI abstratas, implementamos a sincronização entre elas através da troca de mensagens, e propusemos uma abordagem para ajustar a granularidade das tarefas MPI, visando eficiência em ambientes distribuídos. Os resultados experimentais validaram nossa hipótese de que aplicações adaptativas podem ser providas usando características do MPI-2. Adicionalmente, esta tese identificou os requisitos rio nível do ambiente de execução para suportá-las em clusters. Portanto, as aplicações MPI maleáveis melhoraram a utilização de recursos de clusters; e as aplicações de tarefas explícitas adaptaram a extração do paralelismo de acordo com a arquitetura alvo. mostrando que este paradigma também é eficiente em ambientes distribuídos. / Currently, adaptability is a desired feature in parallel applications. For instante, the increasingly number of user competing for resources of the parallel architectures causes dynamic changes in the set of available processors. Adaptive applications are able to execute using a set of volatile processors, providing better resource utilization. This adaptive behavior is known as malleability. Another example comes from the constant evolution of the multi-core architectures, which increases the number of cores to each new generation of chips. Adaptability is the key to allow parallel programs portability from one multi-core machine to another. Thus, parallel programs can adapt the unfolding of the parallelism to the specific degree of parallelism of the target architecture. This adaptive behavior can be seen as a particular case of evolutivity. In this sense, this thesis is focused on: (i) malleability to adapt the execution of parallel applications as changes in processors availability; and (ii) evolutivity to adapt the unfolding of the parallelism at runtime as the architecture and input data properties. Thus, the open issue is "How to provide and support adaptive applications?". This thesis aims to answer this question taking into account the MPI (Message-Passing Interface), which is the standard parallel API for HPC in distributed-memory environments. Our work is based on MPI-2 features that allow spawning processes at runtime. adding some fiexibility to the MPI applications. Malleable MPI applications use dynamic process creation to expand themselves in growth action (to use further processors). The shrinkage actions (to release processors) end the execution of the MPI processes on the required processors in such a way that the application's data are preserved. Notice that malleable applications require a runtime environment support to execute, once they must be notified about the processors availability. Evolving MPI applications follow the explicit task parallelism paradigm to allow their runtime adaptation. Thus, dynamic process creation is used to unfold the parallelism, i.e., to create new MPI tasks on demand. To provide these applications we defined the abstract MPI tasks, implemented the synchronization among these tasks through message exchanges, and proposed an approach to adjust MPI tasks granularity aiming at efficiency in distributed-memory environments. Experimental results validated our hypothesis that adaptive applications can be provided using the MPI-2 features. Additionally, this thesis identifies the requirements to support these applications in cluster environments. Thus, malleable MPI applications were able to improve the cluster utilization; and the explicit task ones were able to adapt the unfolding of the parallelism to the target architecture, showing that this programming paradigm can be efficient also in distributed-memory contexts.
10

Enabling Task Parallelism on Hardware/Software Layers using the Polyhedral Model

Kong, Martin Richard 09 June 2016 (has links)
No description available.

Page generated in 0.1368 seconds