  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
211

Analýza paralelizovatelnosti programů na základě jejich bytecode / Analysis of automatic program parallelization based on bytecode

Brabec, Michal January 2013
There are many algorithms for automatic parallelization, and this work explores their possible application to programs based on their bytecode or similar intermediate code. All of these algorithms require the identification of independent code segments, because if two parts of the code do not interfere with one another, they can run in parallel without any danger of data corruption. Dependence testing is an extremely complicated problem and is not algorithmically solvable in the general case. However, independences can be discovered in special cases, and they can then serve as a basis for automatic parallelization, such as the use of vector instructions. The first step is function inlining, which allows the compiler to analyze the code more precisely, without unnecessary dependences caused by unknown functions. Next, it is necessary to identify all control-flow constructs, such as loops; after that, the compiler can attempt to locate dependences between statements or instructions. Parallelization can be achieved only if the analysis has discovered independent parts in the code. This work is accompanied by an implementation of function inlining and code analysis for the .NET framework.
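
The kind of dependence this analysis looks for can be illustrated with a small example. The sketch below is in C (the thesis itself works on .NET bytecode) and contrasts a loop whose iterations are independent, and therefore open to vectorization or parallel execution, with a loop that carries a dependence between iterations and cannot be reordered as written.

```c
#include <stdio.h>

#define N 1000

int main(void) {
    double a[N], b[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* Independent iterations: a[i] depends only on b[i], so the iterations
       can be run in parallel or mapped to vector instructions. */
    for (int i = 0; i < N; i++)
        a[i] = b[i] + 1.0;

    /* Loop-carried dependence: iteration i reads a[i-1] written by iteration
       i-1, so the iterations cannot be reordered or run in parallel as-is. */
    for (int i = 1; i < N; i++)
        a[i] = a[i - 1] + b[i];

    printf("%f\n", a[N - 1]);
    return 0;
}
```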
212

Parallélisme des nids de boucles pour l’optimisation du temps d’exécution et de la taille du code / Nested loop parallelism to optimize execution time and code size

Elloumi, Yaroub 16 December 2013
Real-time algorithms increasingly include nested loops, which account for a large share of the execution time. Several nested-loop parallelization techniques have therefore been proposed with the aim of decreasing execution time. These techniques can be classified by granularity: iteration-level parallelism and instruction-level parallelism. At the instruction level, existing techniques aim to achieve full parallelism among the instructions of a single iteration. However, inter-iteration data dependences constrain this parallelism and force instructions to be shifted across the nested loops, causing a code-size increase proportional to the degree of parallelism. Consequently, full instruction-level parallelism of nested loops yields implementations with non-optimal execution times and large code sizes, which are limiting factors on embedded real-time systems. This work is concerned with improving parallelization strategies for nested loops. The first contribution is a novel instruction-level parallelism technique, called "delayed multidimensional retiming", which schedules nested loops with a minimal cycle period without aiming for full parallelism. The second contribution applies delayed multidimensional retiming to real-time embedded implementations of nested loops, with the goal of meeting the execution-time constraint while using minimal code size. In this context, a first optimization approach uses the technique to determine the minimal parallelism level that satisfies the execution-time constraint. A second approach then combines instruction-level and iteration-level parallelism by using delayed multidimensional retiming together with loop striping.
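
As a rough illustration of the kind of transformation involved, the C sketch below shows one-dimensional retiming on a single loop (the thesis handles the multidimensional case on loop nests): the producer statement is shifted one iteration ahead so that, in the steady-state body, the two statements no longer depend on each other within the same iteration and could be issued in the same cycle. The prologue and epilogue code this creates is exactly the code-size growth the proposed technique tries to bound.

```c
#include <stdio.h>

#define N 8

int main(void) {
    int a[N + 1], b[N];

    /* Original loop: within one iteration the second statement depends on
       the first, so the two cannot be issued in parallel. */
    for (int i = 0; i < N; i++) {
        a[i] = 3 * i;        /* producer */
        b[i] = a[i] + 1;     /* consumer: intra-iteration dependence */
    }

    /* Retimed loop: the producer is shifted one iteration ahead, so the two
       statements in the body touch different iterations and are independent.
       The prologue/epilogue is the code-size cost of the transformation. */
    a[0] = 0;                        /* prologue */
    for (int i = 0; i < N - 1; i++) {
        a[i + 1] = 3 * (i + 1);      /* producer for the next iteration */
        b[i] = a[i] + 1;             /* consumer for the current iteration */
    }
    b[N - 1] = a[N - 1] + 1;         /* epilogue */

    printf("%d %d\n", b[0], b[N - 1]);
    return 0;
}
```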
213

Political Parallelism in Diaspora-based Transnational Media : The case of Ethiopian Satellite Television and Radio (ESAT)

Bekele, Mesfin Negash January 2019
This study explores political parallelism in the context of diaspora-based transnational media through the experience of the Ethiopian Satellite Television and Radio (ESAT). The station was conceived as a party media outlet and transformed into a diaspora-based, non-profit and mainly diaspora-funded institution; until recently it operated from three studios in Amsterdam, London and Washington, D.C. ESAT has emerged as one of the most influential media outlets in the Ethiopian political landscape over the last ten years. Through qualitative, in-depth case-study interviews, the research examines the underlying ideological, political and organizational affiliations that defined ESAT's position in the media landscape. The study concluded that political parallelism, as an indicator of the dynamics between media and politics, can be used in the diaspora-based transnational media context. However, the study also supported criticism concerning the inapplicability of two preconditions of political parallelism, namely the existence of a competitive system and patterns. The analysis confirms a high level of political parallelism in ESAT on all five indicators selected for the study: ownership, organizational connections, party or ideological loyalty, media personnel's political involvement, and journalists' role orientation. Each of them demonstrated a degree of parallelism in ideological orientation or party connection with the Ginbot 7 Movement for Democracy and Justice. The study concluded that the salient features of political parallelism should be studied further in the context of the transnational media space of diaspora-based media.
214

Meaningful form : parallelism and inverse parallelism in Catullus, Tibullus and Horace.

Van der Riet, Jacobus Werndly January 1998
A thesis submitted to the Faculty of Arts, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. / All the poems of Catullus and Tibullus and the first three books of Horace's Odes are investigated for structures of parallelism and inverse parallelism (chiasmus), and thus the extent to which these devices were used is determined. Such structures are demonstrated for the first time for several poems. Sometimes additions or modifications are made to the structural analyses of other scholars, and sometimes their findings are confirmed. The notion that inverse parallelism was seldom used by Roman authors is dispelled. The freedom with which these devices were used, resulting in a great variety of deviations from strictly symmetrical structures, is demonstrated. Both common and idiosyncratic features in the use of the devices by the three authors are shown. Several poems of each author are discussed to illustrate that the demonstration of a structure of parallelism or inverse parallelism is in itself an interpretative act, which can at the same time serve as a basis for further interpretation. In particular it is shown that structures of inverse parallelism often, if not always, iconically reflect the meaning of the poem (hence the title of the thesis). This ability of structures of inverse parallelism to reflect the meaning of the poem may partly account for the fact that they are used more frequently than structures of parallelism. In the poems discussed, structures of inverse parallelism iconically reflect the ideas of reversal, cyclical movement, non-progression/deadlock, balance and/or contrast, and enclosure, as well as combinations of the above, such as a spiral (both progression and non-progression) or the combination of reversal and non-progression. Continuity between the structural methods of Greek and Roman authors is demonstrated, and a theoretical framework is provided which answers the questions of how such structures can be determined and what purposes, both practical and poetic, they serve. A literary-critical awareness of inverse parallelism in Antiquity is demonstrated; St. Augustine, especially, has a fairly developed theoretical frame of reference on the subject in his De Genesi ad Litteram.
215

Análise de benefícios do paralelismo por comunicação unilateral em aplicações com grades não estruturadas / Improvement analysis of parallelism by one-sided communication on unstructured grids applications

Lopes, Pedro Pais 03 September 2010
Parallel computing is widely used in science to solve problems that demand great computational power. In recent years a new form of communication between parallel instances has appeared: one-sided communication (OSC), in which only one instance performs the information-transfer operation and the transfer occurs in the background, in contrast to two-sided communication (TSC), in which one instance sends the information and the other receives it. In this context we analyze the benefits that OSC brings to the parallelism of a program that keeps an unstructured grid in memory. Two forms of parallel programming support were used: the Message Passing Interface (MPI) library, specifically the part of the standard that describes OSC, and Coarray Fortran (CAF), an extension of the Fortran language. The semantics of MPI OSC is more complex than that of CAF, but CAF's is more restrictive. To analyze the semantics and performance of OSC, MPI OSC and CAF were first applied to a simple program, the Game of Life, which uses a structured grid and shows excellent parallel performance with MPI TSC. The MPI OSC version proved verbose (a larger number of lines of code) and complex, especially when fine-grained control of the synchronization between instances is used. CAF, on the other hand, reduced the number of lines of code (by 20% to 40%), and its synchronization is much simpler. The results showed worse performance for MPI OSC, while for CAF the absolute performance was better than the original implementation up to the number of processor cores that share the same memory. For unstructured grids we used the Ocean Land Atmospheric Model (OLAM), an earth-system simulation model whose grid is based on triangular prisms and which is parallelized with MPI TSC. Implementing the communication with MPI OSC within the existing parallel structure showed that this semantics has aspects that can hinder programming, such as the handling of memory exposure (each instance exposes a memory region of a different size) and the way synchronization between instances is performed. In terms of performance, the speedup curves showed that MPI OSC penalized OLAM regardless of the MPI implementation or the hardware used, with a reduction of at least 20% in speedup for seven or more processors. As in the Game of Life, MPI one-sided communication degraded performance.
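
For readers unfamiliar with the one-sided model discussed above, the following minimal C sketch shows its basic shape in MPI: a rank exposes a buffer as a window, and another rank writes into it with MPI_Put inside a fence epoch, without the target posting any receive. This is a generic illustration of the MPI RMA interface, not code from the dissertation.

```c
/* mpicc osc_demo.c -o osc_demo && mpirun -np 2 ./osc_demo */
#include <mpi.h>
#include <stdio.h>

#define N 4

int main(int argc, char **argv) {
    int rank, size;
    double window_buf[N] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Expose a local buffer so other ranks can write into it directly. */
    MPI_Win win;
    MPI_Win_create(window_buf, N * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);                  /* open the access epoch */
    if (rank == 0 && size > 1) {
        double halo[N] = {1.0, 2.0, 3.0, 4.0};
        /* Rank 0 pushes data into rank 1's window; rank 1 posts no receive. */
        MPI_Put(halo, N, MPI_DOUBLE, 1, 0, N, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);                  /* close the epoch: data visible */

    if (rank == 1)
        printf("rank 1 received %g %g %g %g\n",
               window_buf[0], window_buf[1], window_buf[2], window_buf[3]);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```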
216

A objetividade natural espiritualizada em Ideen II de Husserl / The spiritualized natural objectivity in Husserl's Ideen II

Santos, Sanqueilo de Lima 30 April 2013
The goal of the complex descriptions produced in Ideen II (1913–1922) is to work through the various forms of intentionality involved in the constitution of the objects of nature and of spirit. The concept of regional ontology, which Ideen I (1913) treats from the standpoint of its most general principles, is applied to particular problems in Ideen II in order to show that the reality and the characteristic traits of each kind of objectivity (natural and spiritual) depend on intentionality, which is indispensable both for these objects to be given and for them to be recognized in their own ontological status. Our work concentrates on spiritual objectivity and is therefore also engaged with themes present in the Cartesian Meditations (1931). The approach to spiritual objectivity depends on exposing the distinction between causality and motivation; given the importance of the latter, it also requires a more detailed analysis of the notions of the spiritual world, empathy, intersubjectivity and expression. In the constitution of spiritual objectivities, their reality admits "predicates of signification", unlike natural objectivity. Our hypothesis is that the transcendental level of spirit affirmed by Husserl can only be sustained through an apparently secondary notion of the ego, whose predispositions of consciousness owe an essential debt to the lived body (Leib) and to the parallelism of attitudes.
217

SoMMA : a software managed memory architecture for multi-issue processors

Jost, Tiago Trevisan January 2017
Embedded processors rely on the efficient use of instruction-level parallelism to meet the performance and energy needs of modern applications. Although improving performance is the primary goal for processors in general, it can have a negative impact on energy consumption, a particularly critical constraint for current systems. In this dissertation we present SoMMA, a software-managed memory architecture for embedded multi-issue processors that reduces energy consumption and energy-delay product (EDP) while still providing an increase in memory bandwidth. We combine the use of software-managed memories (SMMs) with the data cache and exploit the lower access energy of SMMs to reduce the processor's energy consumption and EDP. SoMMA also improves overall performance, since memory accesses can be performed in parallel at no cost in extra data-cache ports. Compiler-automated code transformations minimize the programmer's effort in benefiting from the proposed architecture. The experimental results show that SoMMA is more energy- and performance-efficient both at the level of the processing cores and at full-system level. Comparisons were carried out on the VEX processor, a reconfigurable VLIW processor. The approach shows average speedups of 1.118x and 1.121x while consuming up to 11% and 12.8% less energy when comparing two modified processors to their baselines, and a reduction of up to 41.5% in full-system EDP, always keeping the processor area equal to the baselines. Finally, even though SoMMA halves the data cache size, it still reduces the number of data cache misses with respect to the baselines.
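
A software-managed memory of the kind SoMMA builds on is one whose contents are moved explicitly by the program rather than implicitly by cache hardware. The C sketch below models that idea in a purely illustrative way: a tile of data is staged into a small statically allocated buffer standing in for the scratchpad, processed there, and written back. The buffer size and the memcpy-based data movement are assumptions for the example, not details of SoMMA itself.

```c
#include <string.h>
#include <stdio.h>

#define SPM_WORDS 256              /* assumed scratchpad capacity (illustrative) */
static float spm[SPM_WORDS];       /* stands in for the software-managed memory */

/* Stage a tile into the scratchpad, compute on it, write the results back.
   In an SMM-based design the staged accesses would hit a cheap local memory
   instead of the cache; plain memcpy models the explicit data movement here. */
static void scale_tile(float *data, int n, float alpha) {
    for (int base = 0; base < n; base += SPM_WORDS) {
        int len = (n - base < SPM_WORDS) ? n - base : SPM_WORDS;
        memcpy(spm, data + base, len * sizeof(float));   /* explicit fill */
        for (int i = 0; i < len; i++)
            spm[i] *= alpha;                             /* all accesses hit the SPM */
        memcpy(data + base, spm, len * sizeof(float));   /* explicit write-back */
    }
}

int main(void) {
    float v[1000];
    for (int i = 0; i < 1000; i++) v[i] = (float)i;
    scale_tile(v, 1000, 2.0f);
    printf("%f %f\n", v[0], v[999]);
    return 0;
}
```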
218

Gestion dynamique du parallélisme dans les architectures multi-cœurs pour applications mobiles / Dynamic parallelism adaptation in multicore architectures for mobile applications

Texier, Matthieu 08 December 2014
The number of smartphones sold recently surpassed that of desktop computers, largely thanks to the integration of many functions in the same device and to the wide variety of supported applications, such as augmented reality, video conferencing and video games. These applications are supported by heterogeneous computing resources specialized for each type of processing, so as to meet the required performance within the system's power-consumption constraints. For example, multimedia applications are accelerated by hardware modules for video encoding and decoding, and 3D rendering in video games is accelerated by a specialized graphics processor (GPU). However, applications are becoming increasingly complex: an augmented-reality application, for instance, requires image processing, 3D rendering and processing of the information to be displayed. This complexity often comes with a variation of the computing load, which changes the application's performance requirements over time. When such an application is parallelized, a parallelization chosen for one workload becomes inefficient for another, wasting computing resources that could be exploited by other applications or by other stages of the same application. A graphics rendering pipeline was chosen as the use case because it is a dynamic application that is increasingly widespread in mobile devices. The application was implemented and parallelized on a multicore architecture simulator. Profiling confirmed its dynamic behaviour: the computation time per data item and the number of objects to compute vary significantly over time, and the best distribution of parallelism evolves with the rendered scene, so dynamic load balancing is required. This led us to define a system that adapts an application's parallelism at runtime according to a prediction of its needs, obtained by monitoring the data transfers between the application's stages. The new parallelism is then computed for each stage by a central controller that manages the whole application. The system was implemented in a Timed-TLM simulator in order to estimate the performance gains allowed by dynamic adaptation. An architecture able to accelerate both general-purpose and graphics applications was defined and compared with other multicore architectures, and its hardware cost was quantified. For a hardware support whose complexity is below 1.5% of the complete design, we demonstrate performance gains of up to 20% compared with certain static deployments, as well as the ability to dynamically manage a variable number of computing resources.
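
A central controller of the kind described can come down to a very small decision rule. The C sketch below is a hypothetical example of one such rule: it redistributes a fixed pool of cores across pipeline stages in proportion to the amount of data each stage handled in the last monitoring window. The stage count, the load metric and the proportional policy are illustrative assumptions, not the thesis' exact algorithm.

```c
#include <stdio.h>

#define STAGES 3

/* Hypothetical controller step: give each stage at least one core, then split
   the remaining cores in proportion to the monitored load of each stage. */
static void rebalance(const long load[STAGES], int total_cores, int cores[STAGES]) {
    long total_load = 0;
    for (int s = 0; s < STAGES; s++) total_load += load[s];

    int assigned = 0;
    for (int s = 0; s < STAGES; s++) {
        cores[s] = 1 + (int)((total_cores - STAGES) * load[s]
                             / (total_load ? total_load : 1));
        assigned += cores[s];
    }

    /* give any leftover cores (from integer rounding) to the busiest stage */
    int busiest = 0;
    for (int s = 1; s < STAGES; s++) if (load[s] > load[busiest]) busiest = s;
    cores[busiest] += total_cores - assigned;
}

int main(void) {
    long load[STAGES] = {120, 800, 80};   /* e.g. bytes exchanged per stage */
    int cores[STAGES];
    rebalance(load, 8, cores);
    for (int s = 0; s < STAGES; s++)
        printf("stage %d -> %d cores\n", s, cores[s]);
    return 0;
}
```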
219

Exploração de paralelismo OU em uma linguagem em lógica com restrições / OR parallelism exploitation in a constraint logic language

Vargas, Patricia Kayser January 1998
This work is dedicated to the study of OR-parallelism exploitation in constraint logic programming for distributed environments. Logic programming, whose most significant language is Prolog, takes as its premise the use of predicate logic as a computational language. Constraint logic programming (CLP) is an extension of logic programming that seeks greater efficiency and the ability to tackle new classes of problems. A variable in CLP can belong to a specific domain such as the reals or the booleans. The main concept introduced is the constraint: an equation that represents information about a variable and its relation to other variables. Constraints were proposed as a way to reduce the search space during program execution. Although more efficient than classical logic programming, the performance of CLP languages is still unsatisfactory for some real applications, so it is necessary to look for alternatives such as parallel execution. The exploitation of implicit parallelism in logic programs has already shown promising results: several models have been proposed and implemented using the two main sources of parallelism, AND and OR, either in isolation or combined. The main objective of this work is to present pclp(FD), a model for exploiting multi-sequential OR-parallelism in a distributed-memory environment. The pclp(FD) model is characterized by a set of workers, each possessing a complete abstract machine. Task scheduling is carried out by a dynamic, distributed policy. A task in pclp(FD) corresponds to a choice point together with an execution context, the context being formed by portions of the exporter's stacks. So that the importer can access this execution context, incremental copying is used, one of several possible techniques. With copying, each worker keeps its own private copy of the execution stacks, and the exporter's stacks are sent to a private area of the importer. Incremental copying is an optimization that detects the parts already shared between the two workers and copies only the new parts; the incremental-copy algorithm proposed in the model works without any centralized information about the state of the stacks. The design and implementation of a prototype of this model on top of clp(FD), a language implementing CLP over finite domains, allows an analysis of the advantages and disadvantages of the proposed model, and the results of this analysis will serve as a basis for future work aimed at improving both the implementation and the model.
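
The incremental-copy idea can be shown with a deliberately simplified C sketch in which a worker's execution stack is modeled as a flat array of frames: the importer's and exporter's stacks are compared, and only the suffix beyond the longest common prefix is transferred. Real OR-parallel systems manage several stacks and choice-point bookkeeping; the structure and names here are illustrative assumptions.

```c
#include <stdio.h>
#include <string.h>

typedef struct { int frame_id; } Frame;   /* placeholder for a real stack frame */

/* Length of the prefix shared by two stacks. */
static int common_prefix(const Frame *a, int na, const Frame *b, int nb) {
    int i = 0;
    while (i < na && i < nb && a[i].frame_id == b[i].frame_id) i++;
    return i;
}

/* Copy only the new frames from exporter to importer; returns frames copied. */
static int incremental_copy(const Frame *exp_stack, int exp_top,
                            Frame *imp_stack, int *imp_top) {
    int shared  = common_prefix(exp_stack, exp_top, imp_stack, *imp_top);
    int to_copy = exp_top - shared;
    memcpy(imp_stack + shared, exp_stack + shared, to_copy * sizeof(Frame));
    *imp_top = exp_top;
    return to_copy;
}

int main(void) {
    Frame exporter[6] = {{1}, {2}, {3}, {4}, {5}, {6}};
    Frame importer[6] = {{1}, {2}, {3}, {9}, {0}, {0}};
    int imp_top = 4;
    int copied = incremental_copy(exporter, 6, importer, &imp_top);
    printf("copied %d frames, importer top = %d\n", copied, imp_top);
    return 0;
}
```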
220

On the use of low-rank arithmetic to reduce the complexity of parallel sparse linear solvers based on direct factorization techniques / Utilisation de la compression low-rank pour réduire la complexité des solveurs creux parallèles basés sur des techniques de factorisation directes.

Pichon, Grégoire 29 November 2018
Solving sparse linear systems is a problem that arises in many scientific applications, and sparse direct solvers are a time-consuming and key kernel for those applications as well as for more advanced solvers such as hybrid direct-iterative solvers. For those reasons, optimizing their performance on modern architectures is critical. However, memory requirements and time-to-solution limit the use of direct methods for very large matrices, while for other approaches, such as iterative methods, general black-box preconditioners that ensure fast convergence for a wide range of problems are still missing. In the first part of this thesis, we present two approaches that use a Block Low-Rank (BLR) compression technique to reduce the memory footprint and/or the time-to-solution of a supernodal sparse direct solver. This flat, non-hierarchical compression method exploits the low-rank property of the blocks appearing during the factorization of sparse linear systems. The resulting solver can be used either as a direct solver at lower precision or as a very robust preconditioner. The first approach, called Minimal Memory, illustrates the maximum memory gain that can be obtained with BLR compression, while the second approach, called Just-In-Time, mainly focuses on reducing the computational complexity and thus the time-to-solution. In the second part, we present a reordering strategy that increases the block granularity to better exploit locality on multicores and to provide larger tasks to GPUs. This strategy relies on the block-symbolic factorization to refine the ordering produced by tools such as Metis or Scotch, and it does not change the number of operations required to solve the problem. Building on this approach, the third part of the manuscript proposes a new low-rank clustering technique designed to cluster the unknowns within a separator to obtain the BLR partition, and demonstrates its assets with respect to widely used clustering strategies. Both the reordering and the clustering were designed for the flat BLR representation, but they are also a first step toward hierarchical formats. In the last part of this thesis, we investigate a modified nested-dissection strategy that aligns separators with respect to their father to obtain more regular data structures.
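
The saving behind BLR compression is that an admissible dense block is stored and applied through low-rank factors rather than as a full matrix. The C sketch below only illustrates that storage scheme and the cheaper matrix-vector product it enables; how the factors U and V would actually be computed (truncated SVD, rank-revealing QR, etc.) and how blocks are selected are outside the sketch, and all names are illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

/* An n x n dense block is replaced by U * V^T with U, V of size n x k and
   k << n; applying the block to a vector then costs O(n*k) instead of O(n*n). */
typedef struct {
    int n, k;       /* block dimension and numerical rank */
    double *U;      /* n x k, row-major */
    double *V;      /* n x k, row-major: block ~= U * V^T */
} LowRankBlock;

/* y += (U * V^T) * x, using only the low-rank factors. */
static void blr_apply(const LowRankBlock *b, const double *x, double *y) {
    double *t = calloc(b->k, sizeof(double));     /* t = V^T * x  (k entries) */
    for (int i = 0; i < b->n; i++)
        for (int j = 0; j < b->k; j++)
            t[j] += b->V[i * b->k + j] * x[i];
    for (int i = 0; i < b->n; i++)                /* y += U * t */
        for (int j = 0; j < b->k; j++)
            y[i] += b->U[i * b->k + j] * t[j];
    free(t);
}

int main(void) {
    /* rank-1 example: block = u * v^T with u = v = (1,1,1,1) */
    double u[4] = {1, 1, 1, 1}, x[4] = {1, 2, 3, 4}, y[4] = {0};
    LowRankBlock b = {4, 1, u, u};
    blr_apply(&b, x, y);
    printf("%g %g %g %g\n", y[0], y[1], y[2], y[3]);   /* each entry = 10 */
    return 0;
}
```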
