171 |
Parallel algorithms and data structures for interactive applications / Algoritmos Paralelos e Estruturas de Dados para Aplicações Interativas / Algorithmes et Structures de Données Parallèles pour Applications InteractivesToss, Julio January 2017 (has links)
La quête de performance a été une constante à travers l’histoire des systèmes informatiques. Il y a plus d’une décennie maintenant, le modèle de traitement séquentiel montrait ses premiers signes d’épuisement pour satisfaire les exigences de performance. Les barrières du calcul séquentiel ont poussé à un changement de paradigme et ont établi le traitement parallèle comme standard dans les systèmes informatiques modernes. Avec l’adoption généralisée d’ordinateurs parallèles, de nombreux algorithmes et applications ont été développés pour s’adapter à ces nouvelles architectures. Cependant, dans des applications non conventionnelles, avec des exigences d’interactivité et de temps réel, la parallélisation efficace est encore un défi majeur. L’exigence de performance en temps réel apparaît, par exemple, dans les simulations interactives où le système doit prendre en compte l’entrée de l’utilisateur dans une itération de calcul de la boucle de simulation. Le même type de contrainte apparaît dans les applications d’analyse de données en continu. Par exemple, lorsque des donnes issues de capteurs de trafic ou de messages de réseaux sociaux sont produites en flux continu, le système d’analyse doit être capable de traiter ces données à la volée rapidement sur ce flux tout en conservant un budget de mémoire contrôlé La caractéristique dynamique des données soulève plusieurs problèmes de performance tel que la décomposition du problème pour le traitement en parallèle et la maintenance de la localité mémoire pour une utilisation efficace du cache. Les optimisations classiques qui reposent sur des modèles pré-calculés ou sur l’indexation statique des données ne conduisent pas aux performances souhaitées. Dans cette thèse, nous abordons les problèmes dépendants de données sur deux applications différentes : la première dans le domaine de la simulation physique interactive et la seconde sur l’analyse des données en continu. Pour le problème de simulation, nous présentons un algorithme GPU parallèle pour calculer les multiples plus courts chemins et des diagrammes de Voronoi sur un graphe en forme de grille. Pour le problème d’analyse de données en continu, nous présentons une structure de données parallélisable, basée sur des Packed Memory Arrays, pour indexer des données dynamiques géo-référencées tout en conservant une bonne localité de mémoire. / A busca por desempenho tem sido uma constante na história dos sistemas computacionais. Ha mais de uma década, o modelo de processamento sequencial já mostrava seus primeiro sinais de exaustão pare suprir a crescente exigência por performance. Houveram "barreiras"para a computação sequencial que levaram a uma mudança de paradigma e estabeleceram o processamento paralelo como padrão nos sistemas computacionais modernos. Com a adoção generalizada de computadores paralelos, novos algoritmos foram desenvolvidos e aplicações reprojetadas para se adequar às características dessas novas arquiteturas. No entanto, em aplicações menos convencionais, com características de interatividade e tempo real, alcançar paralelizações eficientes ainda representa um grande desafio. O requisito por desempenho de tempo real apresenta-se, por exemplo, em simulações interativas onde o sistema deve ser capaz de reagir às entradas do usuário dentro do tempo de uma iteração da simulação. O mesmo tipo de exigência aparece em aplicações de monitoramento de fluxos contínuos de dados (streams). Por exemplo, quando dados provenientes de sensores de tráfego ou postagens em redes sociais são produzidos em fluxo contínuo, o sistema de análise on-line deve ser capaz de processar essas informações em tempo real e ao mesmo tempo manter um consumo de memória controlada A natureza dinâmica desses dados traz diversos problemas de performance, tais como a decomposição do problema para processamento em paralelo e a manutenção da localidade de dados para uma utilização eficiente da memória cache. As estratégias de otimização tradicionais, que dependem de modelos pré-computados ou de índices estáticos sobre os dados, não atendem às exigências de performance necessárias nesses cenários. Nesta tese, abordamos os problemas dependentes de dados em dois contextos diferentes: um na área de simulações baseada em física e outro em análise de dados em fluxo contínuo. Para o problema de simulação, apresentamos um algoritmo paralelo, em GPU, para computar múltiplos caminhos mínimos e diagramas de Voronoi em um grafo com topologia de grade. Para o problema de análise de fluxos de dados, apresentamos uma estrutura de dados paralelizável, baseada em Packed Memory Arrays, para indexar dados dinâmicos geo-localizados ao passo que mantém uma boa localidade de memória. / The quest for performance has been a constant through the history of computing systems. It has been more than a decade now since the sequential processing model had shown its first signs of exhaustion to keep performance improvements. Walls to the sequential computation pushed a paradigm shift and established the parallel processing as the standard in modern computing systems. With the widespread adoption of parallel computers, many algorithms and applications have been ported to fit these new architectures. However, in unconventional applications, with interactivity and real-time requirements, achieving efficient parallelizations is still a major challenge. Real-time performance requirement shows up, for instance, in user-interactive simulations where the system must be able to react to the user’s input within a computation time-step of the simulation loop. The same kind of constraint appears in streaming data monitoring applications. For instance, when an external source of data, such as traffic sensors or social media posts, provides a continuous flow of information to be consumed by an online analysis system. The consumer system has to keep a controlled memory budget and deliver a fast processed information about the stream Common optimizations relying on pre-computed models or static index of data are not possible in these highly dynamic scenarios. The dynamic nature of the data brings up several performance issues originated from the problem decomposition for parallel processing and from the data locality maintenance for efficient cache utilization. In this thesis we address data-dependent problems on two different applications: one on physically based simulations and another on streaming data analysis. To deal with the simulation problem, we present a parallel GPU algorithm for computing multiple shortest paths and Voronoi diagrams on a grid-like graph. Our contribution to the streaming data analysis problem is a parallelizable data structure, based on packed memory arrays, for indexing dynamic geo-located data while keeping good memory locality.
|
172 |
Les lieux de la mouvance du Libre : une approche comparatiste à partir de terrains aquitains et québécois / Places of the Libre movement : a comparatist approach built upon fieldwork in Quebec and AquitaineGiraud, Pierre-Amiel 18 January 2019 (has links)
La mondialisation néo-libérale s’accompagne d’un renforcement de divers droits de propriété intellectuelle, selon des rythmes et des modalités variées. Ces évolutions, parfois conçues comme un nouveau mouvement d’enclosure, sont contestées dès les années 1980. L’ensemble de ces contestations, que l’on peut appeler « le Libre », se fonde sur l’idée que les biens informationnels doivent rester ou redevenir des biens communs. Le Libre, traversé de nombreuses lignes de tension voire de fracture, porte dans son nom même la trace de son origine historique : le mouvement des logiciels libres. C’est dire si cette mouvance est consubstantielle du développement des technologies de l’information et de la communication. Elle ne saurait cependant être cantonnée au domaine du numérique, pour autant qu’une telle catégorie se révèle opératoire. En effet, si ses objectifs comme son fonctionnement montrent un usage avancé et réflexif de dispositifs numériques, ils montrent aussi un engagement marqué dans des pratiques et des représentations spatiales diversifiées qui vont parfois jusqu’à l’ancrage local fort. La problématique de cette thèse se situe donc au nœud qui tient ensemble le double paradoxe d’une mouvance qui gagne en reconnaissance sociale et en légitimité politique sans qu’il existe de consensus quant à l’identité de ses acteurs, ses objectifs ou ses limites d’une part ; d’ancrage fort d’individus dont les pratiques techniques laisseraient supposer la mise en œuvre d’espaces de représentation bien plus mobiles et mondialisés d’autre part. Cette problématique interroge donc le moment technique de l’individuation des espaces géographiques. En employant une méthode comparatiste sur nos terrains québécois et aquitains, nous tenterons de faire émerger quelques caractéristiques des lieux du monde contemporain. C’est pourquoi le concept géographique de lieu sera discuté et enrichi tout au long de cette thèse. / Neo-liberal globalization brings a reinforcement of several intellectual property rights, under a variety of ways and rhythms. These developments, sometimes understood as a new enclosure movement, are challenged since the eighties. All these challenges, that we chose to call “the Libre”, is based on the idea that informational goods must belong to the commons. The “Libre”, crossed by lots of tension or even dividing lines, is marked in its very name by its historical beginnings : the Free Software movement. In other words, the “Libre” movement is part and parcel of the development of information and communication technologies. However, it can’t be restricted to the digital world, provided that such a category is operational. Indeed, if its objectives as its functioning show an advanced and reflexive use of digital apparatus, they also show a clear commitment in diverse spatial practices and representations that can reveal strong local anchorages. Therefore, problematics of this research lies at the knot that ties together the double paradox of a movement that is gaining social recognition and political legitimacy without any consensus about who are its actors, its objectives or its limits on one hand; of locally anchored individuals whose technical practices that would suggest more mobile and global spaces of representation on another hand. Thus, these problematics questions the technical moment of individuation of geographical spaces. Using a comparatist method on our Quebec and Aquitaine fieldwork, we will attempt to detect some characteristics of places of the contemporary world. This is why the geographical concept of place will be extensively discussed and enriched all along this thesis.
|
173 |
Transformace lokálně ukotvené komunity v čase: vliv Faceboku na udržování vztahů / Transformation of a Local Community in Time: Influence of Facebook on RelationshipsTobrman, Michal January 2012 (has links)
The main aim of this thesis is to describe and analyze the nature of relationships of the originally local community and see what impact online social networks have on these relationships. The local community in this study is represented by one class of selected pupils from the primary school Na Smetance in Prague 2. For the main analysis were used mainly qualitative research methods: survey questionnaires and structured interviews with members of the screened community. The results showed that the transformation can be seen not only in the fragmentation of the local community into smaller groups of individuals with strong ties, but also in the influence of online social network Facebook, where relationships among members of the originally local community are maintenained. The analysis of strength relations between former classmates suggests that social network created by Facebook is essentially generated by weak ties, which are however indispensable for the creation of social capital. This thesis has contributed to the further understanding of the recent local community development. Keywords: local community, online social networks, Facebook, locality of Prague 2, social capital
|
174 |
Entropy: algoritmo de substituição de linhas de cache inspirado na entropia da informação. / Entropy: cache line replacement algorithm inspired in information entropy.Jorge Mamoru Kobayashi 07 June 2010 (has links)
Este trabalho apresenta um estudo sobre o problema de substituição de linhas de cache em microprocessadores. Inspirado no conceito de Entropia da Informação proposto em 1948 por Claude E. Shannon, este trabalho propõe uma nova heurística de substituição de linhas de cache. Seu objetivo é capturar e explorar melhor a localidade de referência dos programas e diminuir a taxa de miss rate durante a execução dos programas. O algoritmo proposto, Entropy, utiliza a heurística de entropia da informação para estimar as chances de uma linha ou bloco de cache ser referenciado após ter sido carregado na cache. Uma nova função de decaimento de entropia foi introduzida no algoritmo, otimizando seu funcionamento. Dentre os resultados obtidos, o Entropy conseguiu reduzir em até 50,41% o miss rate em relação ao algoritmo LRU. O trabalho propõe, ainda, uma implementação em hardware com complexidade e custo computacional comparáveis aos do algoritmo LRU. Para uma memória cache de segundo nível com 2-Mbytes e 8-way associative, a área adicional requerida é da ordem de 0,61% de bits adicionais. O algoritmo proposto foi simulado no SimpleScalar e comparado com o algoritmo LRU utilizando-se os benchmarks SPEC CPU2000. / This work presents a study about cache line replacement problem for microprocessors. Inspired in the Information Entropy concept stated by Claude E. Shannon in 1948, this work proposes a novel heuristic to replace cache lines in microprocessors. The major goal is to capture the referential locality of programs and to reduce the miss rate for cache access during programs execution. The proposed algorithm, Entropy, employs that new entropy heuristic to estimate the chances of a cache line to be referenced after it has been loaded into cache. A novel decay function has been introduced to optimize its operation. Results show that Entropy could reduce miss rate up to 50.41% in comparison to LRU. This work also proposes a hardware implementation which keeps computation and complexity costs comparable to the most employed algorithm, LRU. To a 2-Mbytes and 8-way associative cache memory, the required storage area is 0.61% of the cache size. The Entropy algorithm was simulated using SimpleScalar ISA simulator and compared to LRU using SPEC CPU2000 benchmark programs.
|
175 |
A relação entre gênero e morfologia avaliativa nos nominais do português brasileiro: uma abordagem sintática da formação de palavras / The relation between gender and evaluative morphology in Brazilian Portuguese nominals: a syntactic approach to word formationPaula Roberta Gabbai Armelin 06 April 2015 (has links)
Este trabalho se insere no âmbito dos estudos a respeito da formação de palavras e pretende analisar a estrutura morfossintática de diminutivos e aumentativos do português brasileiro construídos com os formadores -inh/-zinh e -ã/-zã, respectivamente. Mais especificamente, o recorte empírico feito na tese pode ser dividido em duas grandes linhas: uma que aborda a interação entre as marcas de diminutivo e de aumentativo com as marcas de gênero/classe nominal e outra que contempla a (im)possibilidade de que as formações diminutivas e aumentativas sejam não-composicionalmente interpretadas. Para tanto, reanalisamos o estatuto das noções de gênero e classe nominal, propondo que elas ocupam a mesma posição na estrutura sintática. Tal posição é identificada como uma projeção de gênero, que é parte da projeção estendida do nome. Os formadores de diminutivo e aumentativo são analisados com base nas relações que estabelecem com esse núcleo sintático de gênero. A hipótese de base é a de que nuances na relação entre os formadores avaliativos e o núcleo de gênero revelam aspectos significantes da posição estrutural ocupada por cada um deles. Em linhas gerais, propomos que o diminutivo -inh se diferencia dos outros formadores por compartilhar com a raiz o mesmo núcleo de gênero. Essa estrutura é capaz de dar conta, entre outros fatos empíricos, da possibilidade de que a vogal final da forma diminutiva seja idêntica à vogal final da forma não-diminutiva, ainda que tal vogal seja condicionada pela raiz. Por outro lado, a vogal final que completa o aumentativo -ã reflete os padrões gerais da língua, independentemente da raiz presente na derivação. Tomamos esse fato como evidência de que o aumentativo e a raiz possuem núcleos independentes de gênero. No que diz respeito às construções aumentativas e diminutivas encabeçadas pela consoante -z, a presença de núcleos de gênero independentes na estrutura sintática é ainda mais clara, uma vez que tanto a vogal que completa a raiz, como a vogal que completa as formas -zinh e -zã são fonologicamente realizadas. Assim como no caso das formações com -ã, a vogal que completa os formadores de grau encabeçados por -z é completamente independente da raiz que participa da formação, seguindo o padrão mais geral da língua. Esses fatores fazem a análise do aumentativo -ã e a análise das formas encabeçadas por -z bastante similares uma à outra: tais formas possuem, em sua estrutura sintática, um núcleo de gênero que é independente daquele que categoriza a raiz. No entanto, há diferenças no comportamento dessas formas que acabam por separar de um lado o aumentativo -ã e, de outro, as formas -zinh e -zã. Propomos que tais diferenças são derivadas do fato de o primeiro formador se anexar abaixo do núcleo de número, enquanto os dois últimos entram na estrutura depois que ela já possui um núcleo de número. Por fim, dentro de uma visão localista de gramática, em que a atribuição de significado não-composicional deverá ser licenciada dentro de domínios bem definidos, discutimos a composicionalidade das formações partir das posições sintáticas atribuídas a cada um dos formadores em questão. / This work is inserted within the scope of the studies that investigate word formation, and aims to analyze the morphosyntactic structure of diminutives and augmentatives in Brazilian Portuguese built with the formatives -inh/-zinh, and -ã/-za, respectively. More specifically, the empirical path of this thesis can be divided into two main lines: one that addresses the interaction of diminutive and augmentative with the notions of gender/noun classes, and one that addresses the (im)possibility of a non-compositional interpretation being attributed to the structure. In order to do so, it was necessary to review the status of notions like gender and noun class in the grammar, and the formal representation attributed to them. I propose that gender and noun class occupy the very same position in the syntactic structure. This position is identified as a gender projection, which is part of the extended projection of the noun. Diminutive and augmentative markers are, then, analyzed based on the relations they establish with the syntactic gender head. The underlying hypothesis is that differences in the relation established between the evaluative formatives and the gender head reveal important aspects of the structural position that hosts each of them. More specifically, I propose that the diminutive -inh differs from other formatives, because it shares with the root the same gender head. In this sense, -inh is attached to the same gender projection responsible for categorizing the root. This structure is capable of accounting, among other empirical facts, for the possibility that the final vowel of diminutive and nondiminutive forms is identical, even if it is a root-conditioned element. On the other hand, the final vowel that completes -ã augmentatives always reflects the general gender pattern of the language, quite independently of the root. I take this as evidence that, unlike the diminutive, the augmentative and the root have independent gender heads. Thus, the syntactic structure proposed for the augmentative formation has two gender heads: one that attaches to the root, and another one that attaches directly to the augmentative. The distance between the root and the gender head that follows the augmentative is responsible for the default phonological realization of the final vowel of the augmentative. In the same sense, in the diminutive and augmentative formations built with -zinh and -zã, the presence of two independent gender heads is even clearer, since they can be phonologically identified in the output form. It is proposed, then, that, just as in is the case of the -ã forms, the vowel completing -zinh and -zã occupies a gender head that is independent from the one that categorizes the root. Differences in the behavior of these forms point to a split between -ã on one side, and -zinh/-zã on the other. We propose that these differences are derived from the fact that while -ã attaches below a number head, -zinh/-zã, on the other hand, attaches above a number head. Finally, within a localist view of grammar, in which the licensing of noncompositional meaning must be conditioned by local domains of syntactic structure, the possibility and impossibility of non-compositional interpretation being attributed to diminutive and augmentative formations is derived from the syntactic positions assigned to each of the formatives.
|
176 |
Uma abordagem localista para morfologia e estrutura argumental dos verbos complexos (parassintéticos) do português brasileiro / A localist approch to morphology and argument structure of complex verbs (parasynthetic) of Brazilian PortugueseBassani, Indaiá de Santana 06 December 2013 (has links)
O objeto empírico desta tese é um subgrupo de verbos complexos do português brasileiro. Os dados estudados são formações sincronicamente transparentes e composicionais com prefixos a-, eN- e eS- e sufixos -ec-, -iz-, -e- e -ej-, incluindo os chamados verbos parassintéticos, e formações originalmente complexas, porém duvidosas quanto à complexidade atualmente. O corpus contém 380 verbos selecionados a partir de dicionário e organizados por critérios de frequência. O objetivo geral descritivo enfoca questões relativas às propriedades e ao comportamento dos afixos, das raízes e das vogais temáticas. A discussão é organizada em torno dos níveis de estrutura morfológica, morfofonológica, argumental e eventual. O objetivo geral teórico do trabalho consiste em discutir as propostas da Semântica Lexical, da Sintaxe Lexical e da Morfologia Distribuída. Como resultados, o estudo oferece uma primeira classificação em verbos parcialmente transparentes e totalmente transparentes. Aqueles são analisados como fruto de um processo de reanálise histórica comparado ao desaparecimento de preverbos. O estudo mostra que existe um continuum entre formações completamente fossilizadas, reanalisadas como simples, em processo de mudança e completamente transparentes e composicionais. Uma segunda classificação se refere a formações com significado composicional e não-composicional. Os dados não-composicionais são estruturalmente analisados através de uma releitura da restrição de localidade na interpretação das raízes e do uso da noção de polissemia das raízes. Os verbos totalmente transparentes e composicionais são descritivamente classificados em verbos de mudança de estado, de lugar (location), de posse concreta (locatum), de posse abstrata, de reconfiguração e verbos de modificação de v. A característica mais robusta dessa subclasse é a obrigatoriedade de um argumento interno interpretado como objeto afetado (tema ou experienciador, em menor escala) da mudança expressa pelo evento. A investigação aponta que esses prefixos podem ser a realização fonológica de um núcleo misto de natureza lexical e funcional que é responsável por introduzir o argumento interno na estrutura e relacioná-lo à semântica da raiz. Tal núcleo possui minimamente o traço [+r] (relacional) e, em poucos casos, apresenta especificação direcional [+dir]. Com isso, a ideia de que esses prefixos são morfemas direcionais é desmistificada, pois essa informação interna ao verbo complexo é residual e decadente. Em geral, os prefixos se comportam como alomorfes e não há fortes evidências de associação exclusiva de um prefixo a uma determinada estrutura argumental ou classe semântica. Os sufixos são analisados como realizações de núcleos funcionais de tipo v[+voice], v[-voice] e v[+voice, -télico] e também se observa que a ocorrência sufixal em tipos de eventos não se dá de modo tão sistemático como afirma a literatura prévia. A teoria de alomorfia prospota em Embick (2010), baseada em localidade e linearidade, se mostra efetiva para analisar a escolha dos alomorfes dos vi núcleos R (relacionador), v e Th (Vogal temática). O tipo semântico da raiz influencia o tipo de verbo formado, mas pode ser manipulado a fim de sofrer coerção por um processo metonímico ou estrutural. A principal conclusão a partir dos resultados obtidos é que a morfologia verbal do português brasileiro pode revelar tendências em relação à estrutura argumental e a estrutura de eventos, mas não reflete correlações suficientemente regulares ou consistentes. / The empirical object of this dissertation is a subgroup of complex verbs of Brazilian Portuguese. The dataset is composed by synchronically and compositional formations containing the prefixes a-, eN- e eS- and the suffixes -ec-, -iz-, -e- e -ej- and originally complex formations which are dubious in relation to its synchronic complexity. The corpus contains 380 verbs selected from a dictionary and organized by frequency criteria. The general descriptive goal encompasses topics on properties and behavior of affixes, roots and theme vowels and the discussion is guided by the levels of morphological, morphophonological, argument and event structure. The general theoretical goal of this dissertation is to discuss Lexical Semantics, Lexical Syntax and Distributed Morphology proposals. As empirical results, the study offers a primary classification in terms of partially and totally transparent verbs. Partially transparent verbs are treated as resulting from a historical reanalysis process compared to the well known process of disappearance of preverbs. It is assumed that there is a continuum from forms which are: 1) completely fossilized; 2) reanalyzed as simple; 3) forms in process of change; 4) completely compositional and transparent. A secondary classification refers to compositional and noncompositional formations. Non-compositional data are structurally analyzed by means of a new reading on the literature on locality restriction on the interpretation of roots and the use of the notion of root polysemy. Completely compositional and transparent verbs are empirically classified into change of state, change of location, change of abstract and concrete possession, reconfiguration and verbs of modification of v. The strongest characteristic of this subclass is the obligatory presence of an internal argument interpreted as an affected object (theme or experiencer, to a less extent) of the change denoted by the event. The investigation points out that the prefix may be considered as the phonological realization of a head with a mixed lexical functional nature, which is responsible for introducing the internal argument in the structure and relating it to the root semantics. This head has at least the feature [+r] and, in a few cases, it may present directional information [+dir]. Considering this, the assumption that these prefixes are directional morphemes is debunked since this kind of information within a complex verb is residual and decayed. In general, prefixes behave as allomorphs and there are not strong evidences of an exclusive association of a prefix and a certain kind of argument structure or semantic class. The suffixes are analyzed as realizations of three functional heads: v[+voice], v[-voice] and v[+voice, -telic] and it is observed that suffix occurrence in event type is not systematic as previous literature claims. The theory of allomorphy proposed in Embick (2010), which is based on locality and linearity, was efficient in accounting for selection of allomorphs of R, v and Th heads. Finally, semantic type shows influence on verb type but this information can be viii manipulated in order to derive structural or metonymical coercion. The main conclusion to be drawn from the results is the fact that Brazilian Portuguese verbal morphology may reveal certain tendencies in argument and event structure, but it does not reflect sufficiently regular or consistent correlations.
|
177 |
Software Techniques for Distributed Shared MemoryRadovic, Zoran January 2005 (has links)
<p>In large multiprocessors, the access to shared memory is often nonuniform, and may vary as much as ten times for some distributed shared-memory architectures (DSMs). This dissertation identifies another important nonuniform property of DSM systems: <i>nonuniform communication architecture</i>, NUCA. High-end hardware-coherent machines built from large nodes, or from chip multiprocessors, are typical NUCA systems, since they have a lower penalty for reading recently written data from a neighbor's cache than from a remote cache. This dissertation identifies <i>node affinity</i> as an important property for scalable general-purpose locks. Several software-based hierarchical lock implementations exploiting NUCAs are presented and evaluated. NUCA-aware locks are shown to be almost twice as efficient for contended critical sections compared to traditional lock implementations.</p><p>The shared-memory “illusion”' provided by some large DSM systems may be implemented using either hardware, software or a combination thereof. A software-based implementation can enable cheap cluster hardware to be used, but typically suffers from poor and unpredictable performance characteristics.</p><p>This dissertation advocates a new software-hardware trade-off design point based on a new combination of techniques. The two low-level techniques, fine-grain deterministic coherence and synchronous protocol execution, as well as profile-guided protocol flexibility, are evaluated in isolation as well as in a combined setting using all-software implementations. Finally, a minimum of hardware trap support is suggested to further improve the performance of coherence protocols across cluster nodes. It is shown that all these techniques combined could result in a fairly stable performance on par with hardware-based coherence.</p>
|
178 |
Software Techniques for Distributed Shared MemoryRadovic, Zoran January 2005 (has links)
In large multiprocessors, the access to shared memory is often nonuniform, and may vary as much as ten times for some distributed shared-memory architectures (DSMs). This dissertation identifies another important nonuniform property of DSM systems: nonuniform communication architecture, NUCA. High-end hardware-coherent machines built from large nodes, or from chip multiprocessors, are typical NUCA systems, since they have a lower penalty for reading recently written data from a neighbor's cache than from a remote cache. This dissertation identifies node affinity as an important property for scalable general-purpose locks. Several software-based hierarchical lock implementations exploiting NUCAs are presented and evaluated. NUCA-aware locks are shown to be almost twice as efficient for contended critical sections compared to traditional lock implementations. The shared-memory “illusion”' provided by some large DSM systems may be implemented using either hardware, software or a combination thereof. A software-based implementation can enable cheap cluster hardware to be used, but typically suffers from poor and unpredictable performance characteristics. This dissertation advocates a new software-hardware trade-off design point based on a new combination of techniques. The two low-level techniques, fine-grain deterministic coherence and synchronous protocol execution, as well as profile-guided protocol flexibility, are evaluated in isolation as well as in a combined setting using all-software implementations. Finally, a minimum of hardware trap support is suggested to further improve the performance of coherence protocols across cluster nodes. It is shown that all these techniques combined could result in a fairly stable performance on par with hardware-based coherence.
|
179 |
High Performance by Exploiting Information Locality through Reverse ComputingBahi, Mouad 21 December 2011 (has links) (PDF)
The main resources for computation are time, space and energy. Reducing them is the main challenge in the field of processor performance.In this thesis, we are interested in a fourth factor which is information. Information has an important and direct impact on these three resources. We show how it contributes to performance optimization. Landauer has suggested that independently on the hardware where computation is run information erasure generates dissipated energy. This is a fundamental result of thermodynamics in physics. Therefore, under this hypothesis, only reversible computations where no information is ever lost, are likely to be thermodynamically adiabatic and do not dissipate power. Reversibility means that data can always be retrieved from any point of the program. Information may be carried not only by the data but also by the process and input data that generate it. When a computation is reversible, information can also be retrieved from other already computed data and reverse computation. Hence reversible computing improves information locality.This thesis develops these ideas in two directions. In the first part, we address the issue of making a computation DAG (directed acyclic graph) reversible in terms of spatial complexity. We define energetic garbage as the additional number of registers needed for the reversible computation with respect to the original computation. We propose a reversible register allocator and we show empirically that the garbage size is never more than 50% of the DAG size. In the second part, we apply this approach to the trade-off between recomputing (direct or reverse) and storage in the context of supercomputers such as the recent vector and parallel coprocessors, graphical processing units (GPUs), IBM Cell processor, etc., where the gap between processor cycle time and memory access time is increasing. We show that recomputing in general and reverse computing in particular helps reduce register requirements and memory pressure. This approach of reverse rematerialization also contributes to the increase of instruction-level parallelism (Cell) and thread-level parallelism in multicore processors with shared register/memory file (GPU). On the latter architecture, the number of registers required by the kernel limits the number of running threads and affects performance. Reverse rematerialization generates additional instructions but their cost can be hidden by the parallelism gain. Experiments on the highly memory demanding Lattice QCD simulation code on Nvidia GPU show a performance gain up to 11%.
|
180 |
Distributed high-dimensional similarity search with music information retrieval applicationsFaghfouri, Aidin 29 August 2011 (has links)
Today, the advent of networking technologies and computer hardware have enabled more and more inexpensive PCs, various mobile devices, smart phones, PDAs, sensors and cameras to be linked to the Internet with better connectivity. In recent years, we have witnessed the emergence of several instances of distributed applications, providing infrastructures for social interactions over large-scale wide-area networks and facilitating the ways users share and publish data. User generated data today range from simple text files to (semi-) structured documents and multimedia content. With the emergence of Semantic Web, the number of features (associated with a content) that are used in order to index those large amounts of heterogenous pieces of data is growing dramatically. The feature sets associated with each content type can grow continuously as we discover new ways of describing a content in formulated terms.
As the number of dimensions in the feature data grow (as high as 100 to 1000), it becomes harder and harder to search for information in a dataset due to the curse of dimensionality and it is not appropriate to use naive search methods, as their performance degrade to linear search. As an alternative, we can distribute the content and the query processing load to a set of peers in a distributed Peer-to-Peer (P2P) network and incorporate high-dimensional distributed search techniques to attack the problem.
Currently, a large percentage of Internet traffic consists of video and music files shared and exchanged over P2P networks. In most present services, searching for music is performed through keyword search and naive string-matching algorithms using collaborative filtering techniques which mostly use tag based approaches. In music information retrieval (MIR) systems, the main goal is to make recommendations similar to the music that the user listens to. In these systems, techniques based on acoustic feature extraction can be employed to achieve content-based music similarity search (i.e., searching through music based on what can be heard from the music track). Using these techniques we can devise an automated measure of similarity that can replace the need for human experts (or users) who assign descriptive genre tags and meta-data to each recording and solve the famous cold-start problem associated with the collaborative filtering techniques.
In this work we explore the advantages of distributed structures by efficiently distributing the content features and query processing load on the peers in a P2P network. Using a family of Locality Sensitive Hash (LSH) functions based on p-stable distributions we propose an efficient, scalable and load-balanced system, capable of performing K-Nearest-Neighbor (KNN) and Range queries. We also propose a new load-balanced indexing algorithm and evaluate it using our Java based simulator.
Our results show that this P2P design ensures load-balancing and guarantees logarithmic number of hops for query processing. Our system is extensible to be used with all types of multi-dimensional feature data and it can also be employed as the main indexing scheme of a multipurpose recommendation system. / Graduate
|
Page generated in 0.0777 seconds