91

Postretention stability after orthodontic closure of anterior maxillary diastemas

Morais, Juliana Fernandes de 19 February 2009
This study evaluated the stability of maxillary anterior diastema closure and its association with initial diastema width, overjet, overbite, and root parallelism. The sample comprised 30 patients with at least one pretreatment anterior diastema of 0.77 mm or greater, all with the maxillary permanent canines at least half erupted. Measurements were made on dental casts (diastema width, overjet, overbite) and panoramic radiographs (root parallelism) obtained at pretreatment, posttreatment, and at least 2.4 years postretention. The mean initial sum of the three interincisor diastemas was 2.64 mm (SD = 1.46; minimum = 0.77; maximum = 8.04). Repeated-measures analysis of variance showed significant relapse of the midline diastema (mean = 0.45 mm, SD = 0.66), although this value was statistically smaller than its initial width, and the diastemas between the central and lateral incisors remained closed in most cases. Midline diastema relapse occurred in 18 patients (60% of the sample), and 19 patients showed reopening of at least one interincisor space. In a multiple regression analysis, the only factors associated with midline diastema relapse were its pretreatment width (p = 0.000) and the change in overjet during the postretention period (p = 0.046). No association was found between root parallelism and relapse of the interincisor diastemas.
92

Towards reproducible, accurately rounded and efficient BLAS

Chohra, Chemseddine 10 March 2017
Numerical reproducibility failures arise in parallel computation because floating-point summation is non-associative: massively parallel systems dynamically modify the order of floating-point operations, so numerical results can change from one run to another. We propose to ensure reproducibility by extending, as far as possible, the IEEE-754 correct-rounding property to computing sequences larger than the arithmetic operations required by the standard. We introduce RARE-BLAS, a reproducible and accurate BLAS library that relies on error-free transformations and recent accurate and efficient summation algorithms. Solutions are designed for the level 1 (asum, dot, and nrm2) and level 2 (gemv and trsv) BLAS routines, with implementations relying on parallel programming APIs (OpenMP, MPI) and SIMD instruction-set extensions. Their efficiency is compared with an optimized library (Intel MKL) and with existing reproducible solutions.
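For readers unfamiliar with error-free transformations, the classical TwoSum algorithm (Knuth) illustrates the building block that such reproducible and accurate summations rest on: it recovers exactly the rounding error of each floating-point addition. The C sketch below is illustrative only, not the RARE-BLAS code; it feeds TwoSum into a compensated sum that survives catastrophic absorption. Compile without -ffast-math so the compiler cannot reassociate the operations.

```c
#include <stdio.h>

/* TwoSum (Knuth): returns s = fl(a + b) and the exact rounding error e,
 * so that a + b == s + e holds exactly in IEEE-754 arithmetic. */
static void two_sum(double a, double b, double *s, double *e) {
    *s = a + b;
    double bv = *s - a;               /* the part of b actually absorbed in s */
    *e = (a - (*s - bv)) + (b - bv);  /* what rounding threw away */
}

/* Compensated sum of n doubles: accumulates the running error term.
 * A building block of accurate summation, not RARE-BLAS itself. */
static double sum2(const double *x, int n) {
    double s = 0.0, c = 0.0;
    for (int i = 0; i < n; i++) {
        double e;
        two_sum(s, x[i], &s, &e);
        c += e;                       /* error accumulation */
    }
    return s + c;
}

int main(void) {
    double x[] = {1e16, 1.0, -1e16, 1.0};
    /* naive left-to-right loses both 1.0 terms; the compensated sum keeps them */
    printf("naive: %g  compensated: %g\n",
           x[0] + x[1] + x[2] + x[3], sum2(x, 4));
    return 0;
}
```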
93

A Model of Computation for Object Circuits

Matheus Costa Leite 19 September 2003
Object-oriented programming is a mature, well-established software modeling technique. Nevertheless, the importance of its role is matched by the consensus regarding its weaknesses and limitations. OO is not a panacea and, should it fail, alternatives must be found - some hybrid, others entirely new. In this work, we argue that the parallel between OO and electric circuits is an interesting hybrid solution, for some of the basic features found in such circuits are the very ones sought as the Holy Grail of software engineering - concurrency, modularity, robustness, scalability, etc. - and are not always achieved with the traditional OO approach alone. Hence, we propose a correlation between electric circuits and object-oriented programs. From the former comes the circuit: a closed path along which information flows and is processed. From the latter comes the object: an abstract entity constituting the information that flows within the circuit. From their union arises a new model of computation - the object circuit - in which the benefits brought by each part are expected to be complementary. We motivate our discussion with a collection of simple, albeit elucidative, examples, followed by a case study in the simulation field. To validate the functioning of these circuits, an implementation of object circuits was built on top of the Java programming language.
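As a purely hypothetical illustration of the circuit metaphor, the sketch below wires two processing nodes into a closed path and lets an object flow around it. The thesis's actual implementation is in Java; every name here is invented for the example.

```c
#include <stdio.h>

/* A minimal "object circuit": nodes wired into a closed path,
 * each transforming an object as it passes through. */
typedef struct { double value; int hops; } Object;

typedef struct Node Node;
struct Node {
    const char *name;
    void (*process)(Object *);  /* transformation applied in this node */
    Node *next;                 /* wire to the next node in the circuit */
};

static void amplify(Object *o) { o->value *= 2.0; }
static void offset(Object *o)  { o->value += 1.0; }

int main(void) {
    Node a = {"amplifier", amplify, NULL};
    Node b = {"offset",    offset,  NULL};
    a.next = &b;
    b.next = &a;                /* close the circuit */

    Object o = {1.0, 0};
    for (Node *n = &a; o.hops < 6; n = n->next, o.hops++) {
        n->process(&o);
        printf("%s -> value %.1f\n", n->name, o.value);
    }
    return 0;
}
```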
94

Enhancing the performance of decoupled software pipeline through backward slicing

Alwan, Esraa January 2014
The rapidly increasing number of cores available in multicore processors does not necessarily lead directly to a commensurate increase in performance: programs written in conventional languages, such as C, need careful restructuring, preferably automatically, before the benefits can be observed in improved run-times. Even then, much depends upon the intrinsic capacity of the original program for concurrent execution. Using software techniques to parallelize a sequential application can raise the level of gain from multicore systems, but parallel programming is not an easy job for the user, who has to deal with many issues such as dependencies, synchronization, load balancing, and race conditions. For this reason, the role of automatically parallelizing compilers and of techniques for extracting several threads from single-threaded programs, without programmer intervention, is becoming more important and may help to deliver better utilization of modern hardware. One parallelizing technique that has been shown to be effective for applications with irregular control flow and complex memory access patterns is Decoupled Software Pipelining (DSWP). This transformation partitions the loop body into a set of stages, ensuring that critical-path dependencies are kept local to a stage. Each stage becomes a thread, and data is passed between threads using inter-core communication. The success of DSWP depends on being able to extract the relatively fine-grained parallelism present in many applications. Another technique offering potential gains in parallelizing general-purpose applications is slicing. Program slicing transforms large programs into several smaller ones that execute independently, each consisting of only the statements relevant to the computation of certain so-called program points. This dissertation explores the possibility of performance benefits arising from a secondary transformation of DSWP stages by slicing, through a new combined method called DSWP/Slice. Our observation is that individual DSWP stages can be parallelized by slicing, improving the performance of the longest-duration DSWP stages; in particular, this approach can be applicable where DOALL is not, and better load balancing can be achieved between the DSWP stages. Moreover, we introduce an automatic implementation of the combined method using the Low Level Virtual Machine (LLVM) compiler framework. The combination is particularly effective when a whole long stage comprises a function body: more than one slice extracted from the function body can speed up its execution and also increase the scalability of DSWP. An evaluation of this technique on six programs with a range of dependence patterns yields considerable performance gains on a Core i7-870 machine with 4 cores/8 threads. The results, obtained from an automatic implementation, show that the proposed method can give a speed-up factor of up to 1.8 compared with the original sequential code.
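To make the DSWP transformation concrete, here is a minimal hand-written C sketch, illustrative rather than the automatic LLVM implementation described above: the loop's cross-iteration recurrence stays in stage 1, while stage 2 consumes values through a bounded queue standing in for inter-core communication.

```c
#include <pthread.h>
#include <stdio.h>

#define N    16
#define QCAP 8

/* Bounded queue: the stand-in for inter-core communication. */
static int queue[QCAP];
static int head = 0, tail = 0, count = 0;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void push(int v) {
    pthread_mutex_lock(&m);
    while (count == QCAP) pthread_cond_wait(&not_full, &m);
    queue[tail] = v; tail = (tail + 1) % QCAP; count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&m);
}

static int pop(void) {
    pthread_mutex_lock(&m);
    while (count == 0) pthread_cond_wait(&not_empty, &m);
    int v = queue[head]; head = (head + 1) % QCAP; count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&m);
    return v;
}

/* Stage 1 keeps the cross-iteration dependence (a recurrence) local. */
static void *stage1(void *arg) {
    (void)arg;
    int x = 1;
    for (int i = 0; i < N; i++) {
        x = x * 3 + 1;   /* critical-path dependence stays in this stage */
        push(x);
    }
    return NULL;
}

/* Stage 2 does independent per-item work: a candidate for further slicing. */
static void *stage2(void *arg) {
    (void)arg;
    long total = 0;
    for (int i = 0; i < N; i++) total += pop() % 7;
    printf("total = %ld\n", total);
    return NULL;
}

int main(void) {   /* build with: cc -pthread dswp.c */
    pthread_t t1, t2;
    pthread_create(&t1, NULL, stage1, NULL);
    pthread_create(&t2, NULL, stage2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```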
95

Insightful Performance Analysis of Many-Task Runtimes through Tool-Runtime Integration

Chaimov, Nicholas 06 September 2017
Future supercomputers will require application developers to expose much more parallelism than current applications do. To assist developers in structuring their applications so that this is possible, new programming models and libraries, the many-task runtimes, are emerging to allow the expression of orders of magnitude more parallelism than currently existing models. This dissertation describes the challenges that these emerging many-task runtimes will place on performance analysis, and proposes deep integration between runtimes and performance tools as a means of producing correct, insightful, and actionable performance results. I show how tool-runtime integration can be used to aid programmer understanding of performance characteristics and to provide online performance feedback to the runtime for Unified Parallel C (UPC), High Performance ParalleX (HPX), Apache Spark, the Open Community Runtime, and the OpenMP runtime.
96

Automatic decomposition of parallel programs for optimization and performance prediction

Popov, Mihail 07 October 2016
In high-performance computing, benchmarks are used to evaluate architectures, compilers, and performance optimizations. Standard benchmarks are mostly drawn from industrial codes and may have very long execution times, so evaluating a new architecture or an optimization is costly. Most benchmarks are composed of independent computational kernels, and users are usually interested in only a small subset of them; being able to extract kernels as standalone executables makes local optimization faster and easier. Benchmarks also contain redundant kernels: some computations, although measured many times, bring no new information about the system under study. By detecting similar operations and removing redundant kernels, we can reduce the benchmarking cost without losing information about the system. This thesis proposes a method to automatically decompose applications into small kernels called codelets. Each codelet is a standalone executable that can be replayed in isolation, in different execution contexts, to evaluate its performance. The thesis quantifies how much this decomposition accelerates the benchmarking and optimization processes, and it also quantifies the benefits of local optimizations over global ones. Much related work aims to enhance the benchmarking process, notably machine-learning approaches and sampling techniques. Decomposing applications into independent pieces is not a new idea; it has been applied successfully to sequential codes, and in this thesis we bring it to maturity for parallel programs. Evaluating scalability or new micro-architectures is 25× faster with codelets than with full application executions. Codelets predict execution time with an accuracy of 94% and find local optimizations that outperform the best global optimization by up to 1.06×.
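As an illustration of what replaying a codelet in isolation involves, the hypothetical C harness below restores a captured input state before each repetition and times the kernel alone. The thesis's tool performs the extraction and state capture automatically; every name here is invented for the example.

```c
#include <stdio.h>
#include <time.h>

#define N 1024
static double a[N], b[N], c[N];

/* The extracted computational kernel (a hypothetical example). */
static void codelet(void) {
    for (int i = 0; i < N; i++) c[i] = a[i] * b[i] + c[i];
}

/* Restore the input state captured at extraction time, so every
 * replay starts from identical memory contents. */
static void restore_state(void) {
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; c[i] = 0.0; }
}

int main(void) {
    const int reps = 100;
    double total_ns = 0.0;
    for (int r = 0; r < reps; r++) {
        restore_state();                       /* same state for every replay */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        codelet();                             /* time only the kernel itself */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        total_ns += (t1.tv_sec - t0.tv_sec) * 1e9
                  + (t1.tv_nsec - t0.tv_nsec);
    }
    printf("mean codelet time: %.0f ns\n", total_ns / reps);
    return 0;
}
```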
97

Solving Hard Combinatorial Optimization Problems using Cooperative Parallel Metaheuristics

Munera Ramirez, Danny 27 September 2016
Combinatorial optimization problems (COPs) are widely used to model and solve real-life problems in many different application domains. These problems represent a real challenge for the research community due to their inherent difficulty, as many of them are NP-hard. COPs are difficult to solve with exact methods because the search space grows exponentially with the size of the problem. Metaheuristics are often the most efficient methods to make the hardest problems tractable, yet some hard, large real-life problems remain out of reach of even the best metaheuristic algorithms. Parallelism is a straightforward way to improve metaheuristic performance: the basic idea is to perform concurrent explorations of the search space in order to speed up the search process. The most advanced techniques implement a communication mechanism to exchange information between metaheuristic instances so as to increase the probability of finding a solution. However, designing an efficient cooperative parallel method is a very complex task, and many issues concerning communication must be solved; moreover, it is known that no single cooperative configuration can efficiently tackle all problems. This is why current cooperative solutions are either efficient but dedicated to a specific problem, or more general but with limited performance in practice. In this thesis we propose a general framework for cooperative parallel metaheuristics (CPMH). The framework includes several parameters to control the cooperation. CPMH organizes the explorers into teams; each team aims at intensifying the search in a particular region of the search space and uses intra-team communication, while inter-team communication ensures search diversification. CPMH lets the user tune the trade-off between intensification and diversification. Moreover, the framework supports different metaheuristics as well as metaheuristic hybridization. We also provide X10CPMH, an implementation of the CPMH framework developed in the X10 parallel language. To assess the soundness of the approach we tackle two hard real-life COPs: hard variants of the Stable Matching Problem (SMP) and the Quadratic Assignment Problem (QAP). For both problems we propose new sequential and parallel metaheuristics, including a new Extremal Optimization-based method and a new hybrid cooperative parallel algorithm for QAP, all implemented with X10CPMH. A complete experimental evaluation shows that the cooperative parallel versions of our methods scale very well, providing high-quality solutions within a limited timeout. On hard, large variants of SMP, the cooperative parallel method reaches super-linear speedups; on QAP, the cooperative parallel hybrid algorithm performs very well on the hardest instances and improves the best known solutions of several instances.
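The intensification side of such cooperation can be sketched in a few lines of C: below, several OpenMP threads run independent local searches and periodically publish or adopt a shared best solution. This is an illustrative toy on a trivial continuous objective, not X10CPMH, which adds teams, inter-team diversification, and tunable communication policies.

```c
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define DIM 2
/* Toy objective standing in for a real COP cost function. */
static double cost(const double *x) {
    return (x[0] - 3.0) * (x[0] - 3.0) + (x[1] + 1.0) * (x[1] + 1.0);
}

int main(void) {   /* build with: cc -fopenmp coop.c */
    double best[DIM] = {0, 0};
    double best_cost = cost(best);

    #pragma omp parallel num_threads(4)
    {
        unsigned seed = 1234u + 7u * (unsigned)omp_get_thread_num();
        double x[DIM] = {0, 0}, cx = cost(x);
        for (int it = 0; it < 10000; it++) {
            double y[DIM];
            for (int d = 0; d < DIM; d++)    /* random neighbor move */
                y[d] = x[d] + ((double)rand_r(&seed) / RAND_MAX - 0.5);
            double cy = cost(y);
            if (cy < cx) { cx = cy; for (int d = 0; d < DIM; d++) x[d] = y[d]; }

            if (it % 1000 == 0) {            /* periodic cooperation step */
                #pragma omp critical
                {
                    if (cx < best_cost) {    /* publish an improvement */
                        best_cost = cx;
                        for (int d = 0; d < DIM; d++) best[d] = x[d];
                    } else {                 /* adopt the shared elite */
                        cx = best_cost;
                        for (int d = 0; d < DIM; d++) x[d] = best[d];
                    }
                }
            }
        }
    }
    printf("best cost %.6f at (%.3f, %.3f)\n", best_cost, best[0], best[1]);
    return 0;
}
```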
98

Performance Information Extraction on NVIDIA GPUs

Santos, Paulo Carlos Ferreira dos 15 March 2013
The recent growth in the use of performance-oriented Graphics Processing Units (GPUs) in scientific applications has created a need to optimize the programs that run on them. A suitable tool for this task is a performance model, which in turn benefits from a tool for extracting performance information from GPUs. This work covers the creation of a microbenchmark generator for PTX instructions that also gathers information about the GPU's hardware characteristics. The microbenchmark results were validated with a simplified model whose error rates ranged from 6.11% to 16.32% across five test kernels. We also identify the sources of imprecision in the microbenchmark results. The tool was used to analyze the performance profile of the instructions and to identify groups with similar behavior. We further tested how GPU pipeline performance depends on the executed instruction sequence and verified the compiler's optimization for this case. We conclude that microbenchmarking with PTX instructions is feasible and proved effective for building performance models and for detailed analysis of instruction behavior.
99

Parallelisation of physical simulations using a multiphase, multicomponent lattice Boltzmann model, with application to LNG spreading on the ground

Duchateau, Julien 09 December 2015
This thesis aims to define and develop software solutions enabling physical simulations on very large simulation domains, such as an industrial site like the Dunkerque LNG terminal. The flow model is based on the lattice Boltzmann method (LBM) and can handle many simulation cases. Several computing architectures are studied in this work, covering the use of a multicore central processing unit (CPU) as well as several graphics processing units (GPUs) for parallelizing the computations. Solutions are devised to obtain an efficient parallelization of the computational model across several GPUs computing in parallel. A progressive-mesh algorithm is also introduced to mesh the simulation domain automatically according to fluid propagation, managing dynamically the amount of memory the simulation needs as it progresses; its integration on a multi-GPU architecture is studied. Finally, an out-of-core method is introduced to handle cases that require more memory than all the GPUs have: since GPUs generally have far less memory than the CPU's RAM, an efficient exchange mechanism between the GPUs and host RAM is essential.
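The out-of-core idea can be illustrated independently of the GPU details: the full domain stays in host RAM and only one tile at a time is staged into a small working buffer for computation. In the C sketch below, plain memcpy stands in for the host-device transfers, the update is a trivial stand-in for an LBM collide-and-stream step, and the halo exchange a real LBM needs between tiles is omitted.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DOMAIN 1024   /* cells in the full (1-D, for brevity) domain */
#define TILE   128    /* cells that fit in "device" memory at once */

int main(void) {
    double *domain = malloc(DOMAIN * sizeof *domain);  /* host RAM */
    double tile[TILE];                  /* stand-in for GPU memory */
    for (int i = 0; i < DOMAIN; i++) domain[i] = (double)i;

    for (int t = 0; t < DOMAIN / TILE; t++) {
        /* "upload": stage one tile into the working buffer */
        memcpy(tile, domain + t * TILE, sizeof tile);

        /* compute on the tile (trivial update standing in for an
         * LBM collide-and-stream step) */
        for (int i = 0; i < TILE; i++) tile[i] = 0.5 * tile[i] + 1.0;

        /* "download": write the updated tile back to host RAM */
        memcpy(domain + t * TILE, tile, sizeof tile);
    }
    printf("domain[0] = %.1f, domain[%d] = %.1f\n",
           domain[0], DOMAIN - 1, domain[DOMAIN - 1]);
    free(domain);
    return 0;
}
```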
100

A Programming Interface for Distributed Applications in Combinatorial Optimization

Allberson Bruno de Oliveira Dantas 12 September 2011
This work was motivated by the need to exploit the potential of distributed parallelism in combinatorial optimization applications. To achieve this goal, we propose a distributed programming interface guided by two main requirements: efficiency and reuse. The first stems from the fact that HPC (High Performance Computing) applications require the maximum possible performance; we therefore specify the interface as an extension of the MPI library, which is assumed to be efficient for distributed applications. The reuse requirement must reconcile two important features: asynchronism and collective operations. Asynchronism must be present in the interface, since most combinatorial optimization applications are asynchronous in nature, and collective operations should be available so that applications can use them during execution. To meet the reuse requirement, we based the interface on the event-driven and pulse-driven models of distributed computing, as they are asynchronous and allow the incorporation of collective operations. We partially implemented the interface defined in this work. To validate its use by combinatorial optimization applications, we selected two applications and implemented them with the interface: the branch-and-bound technique and the Maximum Stable Set Problem (MSSP). We also provide some experimental results.
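The flavor of asynchronous cooperation such an interface must support can be sketched with plain MPI: in the toy C program below, each rank polls non-blockingly for tighter bounds from its peers (MPI_Iprobe) while exploring its own subproblems. This is an assumption-laden illustration, not the thesis's interface; the candidate values merely stand in for a real branch-and-bound search, and the small eager messages sidestep the buffering concerns a real implementation would address with non-blocking sends.

```c
#include <mpi.h>
#include <stdio.h>

#define TAG_BOUND 1

/* Broadcast an improved incumbent to all peers. */
static void publish_bound(int bound, int rank, int size) {
    for (int r = 0; r < size; r++)
        if (r != rank)
            MPI_Send(&bound, 1, MPI_INT, r, TAG_BOUND, MPI_COMM_WORLD);
}

/* Non-blocking poll: drain any pending bound messages. */
static void poll_bounds(int *best) {
    int flag;
    MPI_Status st;
    MPI_Iprobe(MPI_ANY_SOURCE, TAG_BOUND, MPI_COMM_WORLD, &flag, &st);
    while (flag) {
        int b;
        MPI_Recv(&b, 1, MPI_INT, st.MPI_SOURCE, TAG_BOUND,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (b < *best) *best = b;   /* tighter incumbent from a peer */
        MPI_Iprobe(MPI_ANY_SOURCE, TAG_BOUND, MPI_COMM_WORLD, &flag, &st);
    }
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int best = 1 << 30;                         /* incumbent upper bound */
    for (int step = 0; step < 100; step++) {
        poll_bounds(&best);                     /* asynchronous cooperation */
        int candidate = 1000 - 7 * step - rank; /* stand-in for real search */
        if (candidate < best) {
            best = candidate;
            publish_bound(best, rank, size);
        }
    }
    MPI_Barrier(MPI_COMM_WORLD);
    printf("rank %d final bound: %d\n", rank, best);
    MPI_Finalize();
    return 0;
}
```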
