1 |
Execution of Prolog by transformations on distributed memory multi-processors
Xirogiannis, George January 1998
No description available.
|
2 |
Cumulon: Simplified Matrix-Based Data Analytics in the Cloud
Huang, Botong January 2016
Cumulon is a system aimed at simplifying the development and deployment of statistical analysis of big data in public clouds. Cumulon allows users to program in their familiar language of matrices and linear algebra, without worrying about how to map data and computation to specific hardware and cloud software platforms. Given user-specified requirements in terms of time, monetary cost, and risk tolerance, Cumulon automatically makes intelligent decisions on implementation alternatives, execution parameters, and hardware provisioning and configuration settings, such as what type of machines to acquire and how many of them. Cumulon also supports clouds with auction-based markets: it effectively utilizes computing resources whose availability varies according to market conditions, and suggests the best bidding strategies for them. Cumulon explores two alternative approaches to supporting such markets, with different trade-offs between system and optimization complexity. An experimental study shows the efficiency of Cumulon's execution engine, as well as the optimizer's effectiveness in finding the optimal plan in the vast plan space. / Dissertation
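The provisioning decision described above (which machine type, and how many, under a time requirement) can be illustrated with a minimal sketch. This is not Cumulon's optimizer: the machine catalogue `MACHINES`, its prices and speeds, and the toy running-time model in `estimate_hours` are all hypothetical, chosen only to show a search over a small plan space for the cheapest plan that meets a deadline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    machine: str
    count: int
    hours: float
    cost: float


# Hypothetical catalogue: name -> (price per machine-hour, relative speed).
MACHINES = {"small": (0.10, 1.0), "large": (0.40, 3.0)}


def estimate_hours(work: float, speed: float, count: int, overhead: float = 0.05) -> float:
    """Toy model: ideal speedup plus a coordination overhead per machine."""
    return work / (speed * count) + overhead * count


def cheapest_plan(work: float, deadline_hours: float, max_count: int = 32) -> Optional[Plan]:
    """Enumerate (machine type, count) pairs; keep the cheapest one within the deadline."""
    best: Optional[Plan] = None
    for name, (price, speed) in MACHINES.items():
        for count in range(1, max_count + 1):
            hours = estimate_hours(work, speed, count)
            if hours > deadline_hours:
                continue  # this plan misses the time requirement
            cost = price * count * hours
            if best is None or cost < best.cost:
                best = Plan(name, count, hours, cost)
    return best
```

Calling `cheapest_plan(10, 4)` searches both machine types and returns `None` when no plan can meet the deadline. A real optimizer would also have to account for risk tolerance and market-priced resources, which this sketch leaves out.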
|
3 |
Policy-Aware Parallel Execution of Composite Services / 複合サービスのポリシーアウェアな並列実行
Mai, Xuan Trang 23 March 2016
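Kyoto University / 0048 / New-system doctorate by coursework / Doctor of Informatics / Degree no. 甲第19855号 / Informatics doctorate no. 情博第606号 / 新制||情||105 (University Library) / 32891 / Department of Social Informatics, Graduate School of Informatics, Kyoto University / (Chief examiner) Prof. Toru Ishida, Prof. Masatoshi Yoshikawa, Prof. Yasuo Okabe / Conferred under Article 4, Paragraph 1 of the Degree Regulations / DFAM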
|
4 |
Optimization of checkpointing and execution model for an implementation of OpenMP on distributed memory architectures / Optimisation des checkpoints et du modèle d'exécution pour une implémentation de OpenMP sur architectures à mémoire distribuée
Tran, Van Long 14 November 2018
OpenMP and MPI have become the standard tools for developing parallel programs on shared-memory and distributed-memory architectures, respectively. Compared to MPI, OpenMP is easier to use: its directives, clauses, and runtime functions let it execute code in parallel and synchronize results automatically, whereas MPI requires programmers to do all of this manually. Efforts have therefore been made to port OpenMP to distributed-memory architectures. However, apart from CAPE, no solution has met both requirements: 1) full compliance with the OpenMP standard and 2) high performance.
CAPE (Checkpointing-Aided Parallel Execution) is a framework that automatically translates OpenMP programs and provides runtime functions to execute them on distributed-memory architectures using checkpointing techniques. To execute an OpenMP program on a distributed-memory system, CAPE uses a set of templates to translate the OpenMP source code into CAPE source code, which is then compiled by a conventional C/C++ compiler. The resulting code runs on distributed-memory systems with the support of the CAPE framework. The basic idea is as follows: the program first runs on a set of nodes, each node running as a process. Whenever the program reaches a parallel section, the master distributes the jobs to the slave processes using Discontinuous Incremental Checkpoints (DICKPT). After sending the checkpoints, the master waits for the slaves to return their results; it then receives and merges the resulting checkpoints before injecting them into its memory. Each slave node receives its checkpoint, injects it into its memory to compute its share of the job, and sends the result back to the master using DICKPT. At the end of the parallel region, the master sends the resulting checkpoint to every slave to synchronize the memory space of the whole program.
In some experiments, CAPE has shown high performance on distributed-memory systems and is a viable solution that is fully compatible with OpenMP. However, CAPE is still under development: its checkpoint mechanism and execution model need to be optimized to improve performance, capability, and reliability. This thesis presents the approaches proposed to optimize and improve checkpoints, to design and implement a new execution model, and to improve the capability of CAPE. First, we propose an arithmetic on checkpoints, which models the checkpoint data structure and its operations. This modeling helps to optimize checkpoint size and reduce merging time, as well as to improve checkpoint capability. Second, we developed TICKPT (Time-stamp Incremental Checkpointing), an instance of the arithmetic on checkpoints. TICKPT improves on DICKPT by adding a timestamp to checkpoints to identify their order. Analysis and experiments comparing it to DICKPT show that TICKPT not only produces smaller checkpoints but also has less impact on the performance of the program being checkpointed. Third, we designed and implemented a new execution model and new prototypes for CAPE based on TICKPT. The new execution model allows CAPE to use resources efficiently, avoid the risk of bottlenecks, and overcome the requirement of matching Bernstein's conditions. As a result, these approaches improve CAPE's performance, capability, and reliability. Fourth, open data-sharing attributes are implemented on CAPE based on the arithmetic on checkpoints and TICKPT. This also confirms the direction we took and makes CAPE more complete.
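The merge-and-inject cycle with timestamped incremental checkpoints can be sketched as follows. The abstract does not give the actual checkpoint data structure, so the representation here (a dictionary from memory address to a `(timestamp, value)` pair) and the function names `diff`, `merge`, and `inject` are assumptions made for illustration, not CAPE's API.

```python
from typing import Dict, Tuple

# Hypothetical representation: address -> (timestamp, value).
Checkpoint = Dict[int, Tuple[int, int]]


def diff(before: Dict[int, int], after: Dict[int, int], timestamp: int) -> Checkpoint:
    """Incremental checkpoint: record only the addresses whose value changed."""
    return {addr: (timestamp, val) for addr, val in after.items() if before.get(addr) != val}


def merge(*checkpoints: Checkpoint) -> Checkpoint:
    """Combine checkpoints; for each address, the entry with the newest timestamp wins."""
    merged: Checkpoint = {}
    for ckpt in checkpoints:
        for addr, (ts, val) in ckpt.items():
            if addr not in merged or ts > merged[addr][0]:
                merged[addr] = (ts, val)
    return merged


def inject(memory: Dict[int, int], ckpt: Checkpoint) -> Dict[int, int]:
    """Return a copy of memory with the checkpoint's values applied."""
    updated = dict(memory)
    for addr, (_, val) in ckpt.items():
        updated[addr] = val
    return updated
```

Because each entry carries its own timestamp, the result of `merge` does not depend on the order in which the checkpoints arrive, as long as timestamps for the same address are distinct. That is the property the timestamp is meant to provide in this sketch.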
|
5 |
Abstraction fonctionnelle pour la programmation d’architecture multi-niveaux : formalisation et implantation / Functional abstraction for programming multi-level architectures: formalisation and implementation
Allombert, Victor 07 July 2017
From personal computers with an increasing number of cores to supercomputers with millions of computing units, parallel architectures are the current standard. High-performance architectures are usually referred to as hierarchical, as they are composed of clusters of multi-processors, themselves made of multi-cores. Programming such architectures is notoriously difficult: writing parallel programs is, most of the time, hard in both the algorithmic and the implementation phases. To address these concerns, many structured models and languages have been proposed to increase both expressiveness and efficiency. Among them, Multi-BSP is a bridging model dedicated to hierarchical architectures that ensures efficiency, execution safety, scalability, and cost prediction. It extends the well-known BSP model, which handles flat architectures.
In this thesis we introduce the Multi-ML language, which allows programming Multi-BSP algorithms “à la ML” and thus guarantees the properties of the Multi-BSP model, as well as execution safety thanks to an ML type system. To deal with the multi-level execution model of Multi-ML, we define a formal semantics that describes the valid evaluation of an expression. To ensure the execution safety of Multi-ML programs, we also propose a type system that preserves replicated coherence. An abstract machine is defined to formally describe the evaluation of a Multi-ML program on a Multi-BSP architecture. An implementation of the language is available as a compilation toolchain. It is thus possible to generate efficient parallel code from a program written in Multi-ML and execute it on any hierarchical machine.
|
6 |
Metodologia de teste para acelerar o desenvolvimento de sistemas de processamento paralelo. / Test methodology to accelerate the development of parallel processing system.
Santana, André Aguiar 09 April 2007
Due to strong market competition, there is an increasing demand for modern computational systems of higher quality delivered in less time (CALANTONE; BENEDETTO, 00). The development time of new versions of these systems is also critical: better performance and more functionality than in the current version are expected, and customers have high expectations regarding the release time. As a consequence of technological evolution and the falling price of processors and memory, modern high-performance systems, such as parallel processing systems, have gained relevance and are increasingly requested by customers for their computational power to solve complex problems in critical fields such as medicine, the military, energy, simulation, and weather forecasting (MORRISON, 03). In the parallel processing area, there is a real race to release a new product and its versions early, to gain a better market position than the competitors and to become a reference for customers, who then wish to adopt it. Taking these facts into consideration, the problem studied in this thesis is how to improve the development process of parallel processing systems by reducing the development time of new versions of these systems and their time to market. The proposed solution is to reduce the time spent on testing, which accounts for a significant part of the total project time. To achieve this reduction, this work presents a strategy based on executing the tests in parallel with development. Applying this technique to parallel processing systems leads to the main objective of this work: reducing the development time of new versions of these systems through a test methodology.
The methodology is used to test one component of a parallel processing system, called the Control System, while the other components of the system are still being developed. To evaluate the efficiency of the proposed solution, the methodology was applied to the development of the IBM Blue Gene supercomputer. As a result, a reduction of up to 41% of the total project time was observed.
|
7 |
Otimizando o teste estrutural de programas concorrentes: uma abordagem determinística e paralela / Improving the structural testing of concurrent programs: a deterministic and parallel approach
Batista, Raphael Negrisoli 27 March 2015
Testing concurrent programs is expensive, mainly because a large number of synchronization sequences must be tested to validate such programs. One of the most widely used techniques for testing the communication and synchronization of concurrent programs is the automatic generation of different synchronization pairs, in other words the generation of race variants. In this technique, race variants are generated from the trace files of a nondeterministic execution, and deterministic execution algorithms force different synchronizations to be covered. This work addresses the problem comprehensively; its main objective is to reduce the response time of structural testing of concurrent programs when different race variants are executed. There are three main contributions: (1) the generation of trace files and total or partial deterministic execution, (2) the automatic generation of race variants, and (3) the parallelization of the execution of the variants. Unlike other works in the literature, the proposed algorithms consider concurrent programs that interact simultaneously through message passing and shared memory. Six primitives with distinct semantics were considered: blocking and non-blocking point-to-point, one-to-all, all-to-one, and all-to-all collectives, and semaphores. The algorithms were implemented in Java at the application level, are independent of the programming language used, and do not require system privileges to run. The three contributions are described along with their algorithms, and the results of the experiments carried out during the validation and evaluation of each contribution are presented.
The results show that the proposed objectives were met for each contribution. From the tester's viewpoint, the response time of structural testing of concurrent programs was reduced, while the coverage of concurrent programs using both paradigms increased through automatic and transparent procedures. The experiments showed speedups close to linear when comparing the sequential and parallel versions of the algorithms.
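The idea of deriving race variants from a trace and replaying them in parallel can be sketched roughly as below. The thesis implements its algorithms in Java; this illustration uses Python and a hypothetical trace format in which `matched` maps each receive event to the send it synchronized with, and `candidates` lists the sends that could have matched it. Checking that a variant is actually feasible, and the deterministic replay itself, are left out; `replay` is a caller-supplied stand-in.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Pairing = Dict[str, str]  # receive event -> send event it synchronizes with


def race_variants(matched: Pairing, candidates: Dict[str, List[str]]) -> List[Pairing]:
    """Build one variant per alternative (receive, send) pair not seen in the trace.

    Each variant changes exactly one synchronization pair of the observed run.
    """
    variants: List[Pairing] = []
    for recv in sorted(matched):
        for send in candidates.get(recv, []):
            if send != matched[recv]:
                variant = dict(matched)
                variant[recv] = send
                variants.append(variant)
    return variants


def run_all(variants: List[Pairing], replay: Callable[[Pairing], object]) -> List[object]:
    """Replay every variant concurrently; results keep the order of `variants`."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(replay, variants))
```

Since the variants are independent of each other, their replays can run side by side, which is where the near-linear speedups reported in the abstract would come from.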
|
10 |
Performance Analysis of Distributed Spatial Interpolation for Air Quality Data
Asratyan, Albert January 2021
Deteriorating air quality is a growing concern that has been linked to many health-related issues. Monitoring it is a good first step toward understanding the problem. However, it is not always possible to collect air quality data from every location. Various data interpolation techniques are used to populate sparse maps with more context, but many of these algorithms are computationally expensive. This work presents a three-step chain mail algorithm that uses kriging (without any modifications to the kriging algorithm itself) and achieves up to a 100× improvement in execution time with minimal accuracy loss (relative RMSE of 3%) by parallelizing the load for the locally tested data sets. The approach can be described as a multiple-step parallel interpolation algorithm that includes specific manipulation of regional border data to achieve greater accuracy. It interpolates geographically defined data chunks in parallel and shares the results with their neighboring nodes to provide context and compensate for the lack of knowledge of the surrounding areas. Combined with a cloud serverless function architecture, this approach makes it possible to interpolate very large data sets in a matter of minutes while remaining cost-efficient. The effectiveness of the three-step chain mail approach depends on an equal distribution of points among all regions and on the resolution of the parallel configuration, but in general it offers a good balance between execution speed and accuracy.
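The chunk-and-share idea can be illustrated with a minimal one-dimensional sketch. It is not the three-step chain mail algorithm itself: inverse distance weighting (`idw`) stands in for kriging, each region simply borrows the raw samples that lie within a `margin` of its borders instead of exchanging interpolated results, and threads stand in for the serverless functions mentioned in the abstract. All names and parameters are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Sample = Tuple[float, float]  # (position, measured value)


def idw(samples: List[Sample], x: float, power: float = 2.0) -> float:
    """Inverse distance weighting, a simple stand-in for kriging."""
    if not samples:
        raise ValueError("no samples available for interpolation")
    num = den = 0.0
    for pos, val in samples:
        d = abs(pos - x)
        if d == 0.0:
            return val  # exact hit on a measured location
        w = 1.0 / d ** power
        num += w * val
        den += w
    return num / den


def interpolate_chunk(samples: List[Sample], lo: float, hi: float,
                      margin: float, targets: List[float]) -> List[Sample]:
    """Interpolate the targets in [lo, hi) using local samples plus a border margin."""
    local = [s for s in samples if lo - margin <= s[0] < hi + margin]
    return [(x, idw(local, x)) for x in targets if lo <= x < hi]


def interpolate_parallel(samples: List[Sample], bounds: List[Tuple[float, float]],
                         margin: float, targets: List[float]) -> List[Sample]:
    """Run one interpolation task per region and concatenate results in region order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(interpolate_chunk, samples, lo, hi, margin, targets)
                   for lo, hi in bounds]
        return [pt for f in futures for pt in f.result()]
```

With `margin` set to zero, a target near a region border sees only one side of the data, which is the accuracy problem that sharing border information is meant to reduce. For CPU-bound work in Python, processes or remote functions would be needed to obtain a real speedup; threads are used here only to keep the sketch short.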
|