241.
Synchronization costs in parallel programs and concurrent data structures / Coûts de synchronisation dans les programmes parallèles et les structures de données simultanées. Aksenov, Vitalii, 26 September 2018.
To use the computational power of modern computing machines, we have to deal with concurrent programs. Writing efficient concurrent programs is notoriously difficult, primarily due to the need to harness synchronization costs. In this thesis, we focus on synchronization costs in parallel programs and concurrent data structures.

First, we present a novel granularity control technique for parallel programs designed for the dynamic multithreading environment. Then, in the context of concurrent data structures, we consider the notion of concurrency-optimality and propose the first implementation of a concurrency-optimal binary search tree that, intuitively, accepts a concurrent schedule if and only if the schedule is correct. Also, we propose parallel combining, a technique that enables efficient implementations of concurrent data structures from their parallel batched counterparts. We validate the proposed techniques via experimental evaluations showing superior or comparable performance with respect to state-of-the-art algorithms.

From a more formal perspective, we consider the phenomenon of helping in concurrent data structures. Intuitively, helping is observed when the order of some operation in a linearization is fixed by a step of another process. We show that no wait-free linearizable implementation of a stack using read, write, compare&swap, and fetch&add primitives can be help-free, correcting a mistake in an earlier proof by Censor-Hillel et al.
Finally, we propose a simple way to analytically predict the throughput of data structures based on coarse-grained locking.
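The throughput result concerns structures protected by a single global lock. As a point of reference, here is a minimal Rust sketch of what coarse-grained locking means in practice; the `Mutex`-wrapped stack is an assumed illustration, not a structure from the thesis:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Coarse-grained locking: one lock guards the whole structure, so at most
// one thread operates on it at a time. Throughput is then governed largely
// by the length of the critical section, which is what makes it amenable
// to analytical prediction.
struct CoarseStack<T> {
    inner: Mutex<Vec<T>>,
}

impl<T> CoarseStack<T> {
    fn new() -> Self {
        CoarseStack { inner: Mutex::new(Vec::new()) }
    }
    fn push(&self, x: T) {
        self.inner.lock().unwrap().push(x); // entire structure locked
    }
    fn pop(&self) -> Option<T> {
        self.inner.lock().unwrap().pop()
    }
}

fn main() {
    let stack = Arc::new(CoarseStack::new());
    let handles: Vec<_> = (0..4)
        .map(|t| {
            let s = Arc::clone(&stack);
            thread::spawn(move || {
                for i in 0..1000 {
                    s.push(t * 1000 + i);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let mut n = 0;
    while stack.pop().is_some() {
        n += 1;
    }
    println!("popped {n} items"); // 4000
}
```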
242.
Scalable Dynamic Big Data Geovisualization With Spatial Data Structure. Siqi Gu, 29 April 2020.
Compared to traditional cartography, big data geographic information processing is far from a simple task: it requires specialized methods. When existing geovisualization systems face millions of data points, the zoom function and the dynamic data-adding function usually cannot be supported at the same time. This research classifies the existing methods of geovisualization, analyzes their functions, bottlenecks, and applicability in a big data environment, and proposes a method that combines a spatial data structure with on-demand iterative calculation. It also shows that this method can effectively balance the performance of zooming against that of adding new data, and that it is significantly better than an existing library in the time consumed by adding new data and zooming.
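The abstract does not name the spatial data structure; a quadtree is a common choice for aggregating map points across zoom levels, so the following Rust sketch uses one as an assumed stand-in:

```rust
// A minimal point quadtree, an assumed stand-in for the spatial data
// structure in the thesis. It supports incremental insertion (new data)
// and region counting (what a zoomed viewport needs) without rescanning
// every point. For brevity, duplicate coordinates are not handled.
#[derive(Clone, Copy)]
struct Rect { x0: f64, y0: f64, x1: f64, y1: f64 }

impl Rect {
    fn contains(&self, x: f64, y: f64) -> bool {
        x >= self.x0 && x < self.x1 && y >= self.y0 && y < self.y1
    }
    fn intersects(&self, o: &Rect) -> bool {
        self.x0 < o.x1 && o.x0 < self.x1 && self.y0 < o.y1 && o.y0 < self.y1
    }
}

const CAPACITY: usize = 8; // split a leaf once it holds this many points

struct QuadTree {
    bounds: Rect,
    points: Vec<(f64, f64)>,
    children: Option<Box<[QuadTree; 4]>>,
}

impl QuadTree {
    fn new(bounds: Rect) -> Self {
        QuadTree { bounds, points: Vec::new(), children: None }
    }

    fn insert(&mut self, x: f64, y: f64) {
        if !self.bounds.contains(x, y) { return; }
        if self.children.is_none() && self.points.len() < CAPACITY {
            self.points.push((x, y));
            return;
        }
        if self.children.is_none() { self.split(); }
        for c in self.children.as_mut().unwrap().iter_mut() {
            if c.bounds.contains(x, y) { c.insert(x, y); return; }
        }
    }

    fn split(&mut self) {
        let b = self.bounds;
        let (mx, my) = ((b.x0 + b.x1) / 2.0, (b.y0 + b.y1) / 2.0);
        self.children = Some(Box::new([
            QuadTree::new(Rect { x0: b.x0, y0: b.y0, x1: mx, y1: my }),
            QuadTree::new(Rect { x0: mx, y0: b.y0, x1: b.x1, y1: my }),
            QuadTree::new(Rect { x0: b.x0, y0: my, x1: mx, y1: b.y1 }),
            QuadTree::new(Rect { x0: mx, y0: my, x1: b.x1, y1: b.y1 }),
        ]));
        // Re-insert buffered points into the new children.
        for (x, y) in std::mem::take(&mut self.points) {
            self.insert(x, y);
        }
    }

    // Count points in a viewport; subtrees outside it are pruned entirely.
    fn count_in(&self, view: &Rect) -> usize {
        if !self.bounds.intersects(view) { return 0; }
        let mut n = self.points.iter().filter(|(x, y)| view.contains(*x, *y)).count();
        if let Some(children) = &self.children {
            n += children.iter().map(|c| c.count_in(view)).sum::<usize>();
        }
        n
    }
}

fn main() {
    let mut qt = QuadTree::new(Rect { x0: 0.0, y0: 0.0, x1: 100.0, y1: 100.0 });
    for i in 0..1000 {
        qt.insert((i % 97) as f64, (i % 89) as f64);
    }
    let view = Rect { x0: 10.0, y0: 10.0, x1: 30.0, y1: 30.0 };
    println!("points in viewport: {}", qt.count_in(&view));
}
```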
243.
Towards Dynamic Programming on Generalized Data Structures: and Applications of Dynamic Programming in Bioinformatics. Berkemer, Sarah Juliane, 11 March 2020.
Dynamic programming (DP) is a method for solving optimization problems. The problem is divided into overlapping subproblems, and an optimal solution is computed for each of them; these are then combined into an overall solution. Partial solutions are stored in a table so that each subproblem has to be computed only once. In this way, a search space of exponential size can be explored, and an optimal solution found, in polynomial time. Dynamic programming was developed by Bellman in 1952, and one of its first applications was the detection of typing errors in programming.
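As a minimal illustration of the tabulation idea, here is the classic edit-distance recurrence in Rust. This is a textbook example chosen for its connection to the sequence alignment discussed below, not an algorithm taken from the thesis:

```rust
// Edit distance via dynamic programming: dp[i][j] is the minimal number of
// insertions, deletions, and substitutions turning a[..i] into b[..j].
// Each of the (|a|+1) * (|b|+1) subproblems is solved exactly once and
// stored in the table, although the naive recursion would revisit the
// overlapping subproblems exponentially often.
fn edit_distance(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let (m, n) = (a.len(), b.len());
    let mut dp = vec![vec![0usize; n + 1]; m + 1];
    for i in 0..=m { dp[i][0] = i; } // delete all of a[..i]
    for j in 0..=n { dp[0][j] = j; } // insert all of b[..j]
    for i in 1..=m {
        for j in 1..=n {
            let subst = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            dp[i][j] = (dp[i - 1][j] + 1)       // deletion
                .min(dp[i][j - 1] + 1)          // insertion
                .min(dp[i - 1][j - 1] + subst); // match or substitution
        }
    }
    dp[m][n]
}

fn main() {
    assert_eq!(edit_distance("kitten", "sitting"), 3);
}
```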
DP algorithms are applied frequently and in many different forms in bioinformatics, for example in the comparison of gene sequences, known as sequence alignment, or in the prediction of molecular structures. The amount of data, and with it the amount of analysis, grows steadily, which makes new and more complex data structures ever more important. One goal is therefore to develop DP algorithms that can be applied to data structures more complex than strings. Through the principle of algebraic dynamic programming (ADP), DP algorithms can be decomposed into smaller components that can then be developed and modified independently of one another.
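A much-reduced sketch of the ADP idea, assuming it suffices to show the separation of concerns: the recursion (the "grammar") is written once and parameterized over an evaluation algebra, so swapping the algebra changes what the same table-filling code computes. This is an assumed illustration, not the formalism developed in the thesis:

```rust
// An "algebra" supplies the operations; the recursion (the "grammar")
// stays fixed. Swapping the algebra changes what the same DP computes.
trait Algebra {
    type Val: Clone;
    fn empty(&self) -> Self::Val;                            // empty alignment
    fn step(&self, v: &Self::Val, cost: usize) -> Self::Val; // extend by one edit
    fn choice(&self, xs: Vec<Self::Val>) -> Self::Val;       // choose among alternatives
}

struct MinCost; // algebra 1: cheapest alignment
impl Algebra for MinCost {
    type Val = usize;
    fn empty(&self) -> usize { 0 }
    fn step(&self, v: &usize, cost: usize) -> usize { v + cost }
    fn choice(&self, xs: Vec<usize>) -> usize { xs.into_iter().min().unwrap() }
}

struct CountAll; // algebra 2: how many alignments exist at all
impl Algebra for CountAll {
    type Val = u64;
    fn empty(&self) -> u64 { 1 }
    fn step(&self, v: &u64, _cost: usize) -> u64 { *v }
    fn choice(&self, xs: Vec<u64>) -> u64 { xs.into_iter().sum() }
}

// The "grammar": an edit-distance-shaped recursion, independent of the algebra.
fn align<A: Algebra>(alg: &A, a: &[u8], b: &[u8]) -> A::Val {
    let (m, n) = (a.len(), b.len());
    let mut dp: Vec<Vec<A::Val>> = vec![Vec::new(); m + 1];
    for i in 0..=m {
        for j in 0..=n {
            let val = if i == 0 && j == 0 {
                alg.empty()
            } else {
                let mut alts = Vec::new();
                if i > 0 { alts.push(alg.step(&dp[i - 1][j], 1)); } // deletion
                if j > 0 { alts.push(alg.step(&dp[i][j - 1], 1)); } // insertion
                if i > 0 && j > 0 {
                    let c = if a[i - 1] == b[j - 1] { 0 } else { 1 };
                    alts.push(alg.step(&dp[i - 1][j - 1], c));      // (mis)match
                }
                alg.choice(alts)
            };
            dp[i].push(val);
        }
    }
    dp[m][n].clone()
}

fn main() {
    println!("min cost: {}", align(&MinCost, b"kitten", b"sitting")); // 3
    println!("alignments: {}", align(&CountAll, b"ab", b"cd"));
}
```

With `MinCost` the function computes the edit distance; with `CountAll` it counts every possible alignment over the same search space.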
The thesis is divided into two parts. The first part contains the theoretical work on the development of dynamic programming algorithms; here, principles and definitions of dynamic programming are presented first (Chapter 2) to give a better understanding of the subsequent chapters. The second part of the thesis presents different bioinformatics applications of DP algorithms to biological data. An introductory chapter (Chapter 5) presents the fundamentals of biological data and algorithms, which are then used in the following chapters.
244.
Saving Energy in Network Hosts With an Application Layer Proxy: Design and Evaluation of New Methods That Utilize Improved Bloom Filters. Jimeno, Miguel, 11 December 2009.
One of the most urgent challenges of the 21st century is to investigate new technologies that can enable a transition towards a society with a reduced CO2 footprint. Information technology generates about 2% of global CO2 emissions, which is comparable to the aviation industry. Being connected to the Internet requires active participation in responding to protocol messages, and billions of dollars' worth of electricity is used every year to keep network hosts fully powered on at all times solely to maintain network presence. Most network hosts are idle most of the time, presenting a huge opportunity for energy savings and reduced CO2 emissions.
Proxying has been previously explored as a means for allowing idle hosts to sleep yet still maintain network presence. This dissertation develops general requirements for proxying and is the first exploration of application-level proxying. Proxying for TCP connections, SIP, and Gnutella P2P was investigated. The TCP proxy keeps TCP connections open (when a host is sleeping) and buffers and/or discards packets as appropriate. The SIP proxy handles all communication with the SIP server and wakes up a sleeping SIP phone on an incoming call. The P2P proxy enables a Gnutella leaf node to sleep when not actively uploading or downloading files by handling all query messages and keyword lookups in a list of shared files. All proxies were prototyped and experimentally evaluated.
Proxying for P2P led to the exploration of space- and time-efficient data structures to reduce the computational requirements of keyword search in the proxy. The use of pre-computation and hierarchical structures for reducing the false positive rate of a Bloom filter was explored. A Best-of-N Bloom filter was developed and shown to have a lower false positive rate than both a standard Bloom filter and the Power-of-2 Bloom filter. An analysis of the Best-of-N Bloom filter using order statistics was completed to predict its false positive rate.
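A sketch of the underlying idea in Rust: a standard Bloom filter, plus a best-of-N selection step. The selection heuristic shown, building N candidates with different hash seeds and keeping the one with the fewest set bits, is one plausible reading of the abstract, not necessarily the dissertation's exact construction:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// A standard Bloom filter: k hash functions set k bits per inserted key;
// a query reports "possibly present" only if all k bits are set.
struct BloomFilter {
    bits: Vec<bool>,
    k: u64,
    seed: u64,
}

impl BloomFilter {
    fn new(m: usize, k: u64, seed: u64) -> Self {
        BloomFilter { bits: vec![false; m], k, seed }
    }

    fn index(&self, key: &str, i: u64) -> usize {
        let mut h = DefaultHasher::new();
        (self.seed, i, key).hash(&mut h);
        (h.finish() as usize) % self.bits.len()
    }

    fn insert(&mut self, key: &str) {
        for i in 0..self.k {
            let idx = self.index(key, i);
            self.bits[idx] = true;
        }
    }

    fn may_contain(&self, key: &str) -> bool {
        (0..self.k).all(|i| self.bits[self.index(key, i)])
    }

    fn ones(&self) -> usize {
        self.bits.iter().filter(|&&b| b).count()
    }
}

// Best-of-N selection (assumed interpretation): since the keyword set is
// static, build N candidate filters with different hash seeds and keep the
// one with the fewest set bits, i.e. the lowest estimated false-positive rate.
fn best_of_n(keys: &[&str], m: usize, k: u64, n: u64) -> BloomFilter {
    (0..n)
        .map(|seed| {
            let mut f = BloomFilter::new(m, k, seed);
            for key in keys {
                f.insert(key);
            }
            f
        })
        .min_by_key(|f| f.ones())
        .unwrap()
}

fn main() {
    let keys = ["music.mp3", "thesis.pdf", "photo.jpg"];
    let f = best_of_n(&keys, 64, 3, 8);
    assert!(f.may_contain("thesis.pdf"));
    println!("bits set: {}", f.ones());
}
```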
Potential energy savings are shown to be in the hundreds of millions of dollars per year assuming a modest adoption rate of the methods investigated in this dissertation. Future directions could lead to greater savings.
245.
Optimizing Query Processing Under Skew. Zhang, Wangda, January 2020.
Big data systems such as relational databases, data science platforms, and scientific workflows all process queries over large and complex datasets. Skew is common in these real-world datasets and workloads, and different types of skew can have different impacts on the performance of query processing. Although skew sometimes causes load imbalance in a parallel execution environment, negatively impacting query performance, this thesis demonstrates that in many cases we can actually improve query performance in the presence of skew. To optimize query processing under skew, we develop a set of techniques that exploit the positive effects of skew and avoid the negative ones. To exploit skew, we propose techniques including: (a) intentionally creating skew and clustering data in a distributed database system; (b) optimizing data layout for better caching in main-memory databases; and (c) adaptive execution techniques, in the context of compilers, that are responsive to the underlying data. To ameliorate skew, we study optimized hash-based partitioning that alleviates outliers in a genomic data context, as well as parallel prefix sum algorithms that are used to develop skew-insensitive algorithms. We evaluate the effectiveness of our techniques on synthetic data, standard benchmarks, and empirical datasets, and show that the performance of query processing under skew can be greatly improved. Overall, this thesis makes a concrete contribution to skew-related query processing.
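As one example of the building blocks mentioned, here is a minimal two-pass, block-partitioned parallel prefix sum in Rust; this is the textbook formulation, assumed here rather than taken from the thesis:

```rust
use std::thread;

// Two-pass parallel inclusive scan: each thread scans its own block, block
// totals are combined sequentially, and each block's offset is added back
// in parallel. Work is O(n) per pass.
fn parallel_prefix_sum(data: &mut [u64], num_threads: usize) {
    let n = data.len();
    if n == 0 { return; }
    let chunk = (n + num_threads - 1) / num_threads;

    // Pass 1: local inclusive scans, one block per thread.
    thread::scope(|s| {
        for block in data.chunks_mut(chunk) {
            s.spawn(move || {
                for i in 1..block.len() {
                    block[i] += block[i - 1];
                }
            });
        }
    });

    // Sequential step: exclusive scan over the block totals.
    let totals: Vec<u64> = data.chunks(chunk).map(|b| *b.last().unwrap()).collect();
    let mut offsets = vec![0u64; totals.len()];
    for i in 1..totals.len() {
        offsets[i] = offsets[i - 1] + totals[i - 1];
    }

    // Pass 2: add each block's offset, again in parallel.
    thread::scope(|s| {
        for (block, off) in data.chunks_mut(chunk).zip(offsets) {
            s.spawn(move || {
                for x in block {
                    *x += off;
                }
            });
        }
    });
}

fn main() {
    let mut v: Vec<u64> = (1..=10).collect();
    parallel_prefix_sum(&mut v, 4);
    assert_eq!(v, vec![1, 3, 6, 10, 15, 21, 28, 36, 45, 55]);
}
```

Because blocks are formed by position rather than by value, the per-thread work is balanced regardless of how the data values are skewed, which is what makes prefix sums useful for building skew-insensitive algorithms.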
246.
Performance Study of Concurrent Search Trees and Hash Algorithms on Multiprocessor Systems. Demuynck, Marie-Anne, 05 1900.
This study examines the performance of concurrent algorithms for B-trees and linear hashing. B-trees are widely used as an access method for large, single-key database files stored in lexicographic order on secondary storage devices. Linear hashing is a fast and reliable hash algorithm, suitable for accessing records stored unordered in buckets. This dissertation presents performance results on implementations of concurrent Blink-tree and linear hashing algorithms, using lock-based, partitioned, and distributed methods on the Sequent Symmetry shared-memory multiprocessor system and on a network of distributed processors created with PVM (Parallel Virtual Machine) software. Initial experiments, which started with empty data structures, show good results for the partitioned implementations and lock-based linear hashing, but poor ones for lock-based Blink-trees. A subsequent test, which started with loaded data structures, shows similar results, but with much improved performance for locked Blink-trees. The data also highlighted the high cost of split operations, which reached up to 70% of the total insert time.
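The address computation at the heart of linear hashing is compact enough to sketch; the following Rust version is the textbook sequential form, not the concurrent variants studied in the dissertation:

```rust
// Linear hashing address computation (textbook form, assumed here). The
// table grows one bucket at a time: `split` marks the next bucket to be
// split in the current round, and keys that hash below it already use the
// doubled address space.
struct LinearHashAddr {
    level: u32, // current round: the round began with n0 << level buckets
    split: u64, // next bucket to split in this round
    n0: u64,    // initial number of buckets
}

impl LinearHashAddr {
    fn bucket(&self, hash: u64) -> u64 {
        let b = hash % (self.n0 << self.level);
        if b < self.split {
            // This bucket was already split this round; use the finer function.
            hash % (self.n0 << (self.level + 1))
        } else {
            b
        }
    }

    // After splitting bucket `split`, advance; once every bucket of this
    // round is split, the address space has doubled and a new round begins.
    fn advance(&mut self) {
        self.split += 1;
        if self.split == self.n0 << self.level {
            self.split = 0;
            self.level += 1;
        }
    }
}

fn main() {
    let mut lh = LinearHashAddr { level: 0, split: 0, n0: 4 };
    println!("key 11 -> bucket {}", lh.bucket(11)); // 11 % 4 = 3
    lh.advance();                                   // bucket 0 has been split
    println!("key 8  -> bucket {}", lh.bucket(8));  // 0 < split, so 8 % 8 = 0
}
```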
247.
Evaluating Memory Models for Graph-Like Data Structures in the Rust Programming Language: Performance and Usability / Utvärdering av Minnesmodeller för Graf-Liknande Datastrukturer i Programmeringsspråket Rust: Användbarhet och Prestanda. Viitanen, Rasmus, January 2020.
Representing graphs in Rust is problematic, as ownership forbids the typical representations found in e.g. C++. A common approach is to use reference counting to represent graphs, but this can easily lead to memory leaks if cycles are present in the graph. Since naïve reference counting is not sufficient, we must search for alternative representations. In this thesis, we explore different memory models that allow safe representations of graph-like data structures in Rust, and we evaluate these models in terms of performance and usability. We find that region-based allocation is, in most cases, the best model to use when performance is of importance. In cases where usability matters more, either reference counting with cycle collection or tracing garbage collection is a solid choice. For multi-threading, we propose a new implementation of a lock-free transactional graph in Rust; to our knowledge, this is the first lock-free graph representation in Rust. The model demonstrates poor scalability, but for certain graph topologies and sizes it offers performance that exceeds that of the other graph models.
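The region-based approach can be sketched briefly: an arena (here simply a `Vec`) owns every node, and edges are plain indices, so cycles need no reference counting. This is an assumed minimal form, not the thesis's implementation:

```rust
// Region-based graph representation: the arena owns every node, edges are
// indices into it. No Rc cycles, no unsafe, and the whole graph is freed
// at once when the arena is dropped.
struct Graph<T> {
    nodes: Vec<Node<T>>,
}

struct Node<T> {
    value: T,
    edges: Vec<usize>, // indices of neighbor nodes in the arena
}

impl<T> Graph<T> {
    fn new() -> Self {
        Graph { nodes: Vec::new() }
    }
    fn add_node(&mut self, value: T) -> usize {
        self.nodes.push(Node { value, edges: Vec::new() });
        self.nodes.len() - 1
    }
    fn add_edge(&mut self, from: usize, to: usize) {
        self.nodes[from].edges.push(to);
    }
    fn neighbors(&self, n: usize) -> impl Iterator<Item = &T> {
        self.nodes[n].edges.iter().map(move |&i| &self.nodes[i].value)
    }
}

fn main() {
    let mut g = Graph::new();
    let a = g.add_node("a");
    let b = g.add_node("b");
    let c = g.add_node("c");
    g.add_edge(a, b);
    g.add_edge(b, c);
    g.add_edge(c, a); // a cycle, which Rc-based graphs would leak
    for v in g.neighbors(c) {
        println!("c -> {v}");
    }
}
```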
248.
Simple, safe, and efficient memory management using linear pointers. Liu, Likai, 22 January 2016.
Efficient and safe memory management is a hard problem. Garbage collection promises automatic memory management, but comes at the cost of increased memory footprint, reduced parallelism in multi-threaded programs, unpredictable pause times, and intricate tuning parameters that must balance the program's workload against its designated memory usage for an application to perform reasonably well. Existing research mitigates the above problems to some extent, but programmer error can still cause memory leaks by erroneously keeping memory references after they are no longer needed. We need a methodology for programmers to become resource aware, so that efficient, scalable, predictable, and high-performance programs may be written without the fear of resource leaks.
Linear logic has been recognized as the formalism of choice for resource tracking. It requires explicit introduction and elimination of resources and guarantees that a resource cannot be implicitly shared or abandoned, hence must be linear. Early languages based on linear logic focused on the Curry-Howard correspondence. They began by limiting the expressive power of the language and then reintroduced it by allowing controlled sharing, which is necessary for recursive functions. However, only by deviating from the Curry-Howard correspondence could later developments actually address programming errors in resource usage.
The contribution of this dissertation is a simple, safe, and efficient approach that introduces linear resource ownership semantics into C++ (still a widely used language more than 30 years after its inception) through the linear pointer, a smart pointer inspired by linear logic. By implementing various linear data structures and a parallel, multi-threaded memory allocator based on these data structures, this work shows that the linear pointer is practical and efficient in the real world, and that it is possible to build a memory management stack that is entirely leak-free. The dissertation offers some closing remarks on the difficulties a formal system would encounter when reasoning about a concurrent linear data algorithm, and on what might be done to solve these problems.
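The dissertation realizes linearity in C++ via a smart pointer; Rust's built-in move semantics enforce a closely related affine discipline, sketched below for comparison. The `LinearBox` type is an illustrative analogy, not the dissertation's linear pointer API:

```rust
// Affine ownership in Rust, as an analogy to the linear pointer: the value
// has exactly one owner, transferring it is explicit, and the compiler
// rejects any use after the transfer, so the resource cannot be silently
// shared through aliasing.
struct LinearBox<T>(Box<T>); // no Clone, no Copy: at most one owner

impl<T> LinearBox<T> {
    fn new(value: T) -> Self {
        LinearBox(Box::new(value))
    }
    // Consuming takes the box by value; the caller's binding is dead afterwards.
    fn consume(self) -> T {
        *self.0
    }
}

fn main() {
    let p = LinearBox::new(vec![1, 2, 3]);
    let q = p;            // ownership moves from p to q
    // p.consume();       // compile error: use of moved value `p`
    let data = q.consume();
    // q.consume();       // compile error: `q` was consumed above
    println!("{:?}", data);
}
```

Note that Rust's guarantee is affine rather than strictly linear: a value may still be dropped without ever being used.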
249.
Practical Type Inference for the GADT Type System. Lin, Chuan-kai, 01 January 2010.
Generalized algebraic data types (GADTs) are a type system extension to algebraic data types that allows the type of an algebraic data value to vary with its shape. The GADT type system allows programmers to express detailed program properties as types (for example, that a function should return a list of the same length as its input), and a general-purpose type checker will automatically check those properties at compile time. Type inference for the GADT type system and the properties of the type system are both currently areas of active research. In this dissertation, I attack both problems simultaneously by exploiting the symbiosis between type system research and type inference research. Deficiencies of GADT type inference algorithms motivate research on specific aspects of the type system, and discoveries about the type system bring in new insights that lead to improved GADT type inference algorithms. The technical contributions of this dissertation are therefore twofold: in addition to new GADT type system properties (such as the prevalence of pointwise type information flow in GADT patterns, a generalized notion of existential types, and the effects of enforcing the GADT branch reachability requirement), I will also present a new GADT type inference algorithm that is significantly more powerful than existing algorithms. These contributions should help programmers use the GADT type system more effectively, and they should also enable language implementers to provide better support for the GADT type system.
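The canonical GADT motivating example is a typed expression evaluator whose result type varies with the expression's shape. Rust has no GADTs, but the idea can be approximated with associated types; this sketch illustrates the concept only and is not the dissertation's setting:

```rust
// The GADT idea, approximated in Rust: each expression constructor is its
// own type, and an associated type records what evaluating it yields, so
// ill-typed expressions such as If(Lit(1), ...) are rejected at compile time.
trait Expr {
    type Out;
    fn eval(&self) -> Self::Out;
}

struct Lit(i64);
impl Expr for Lit {
    type Out = i64;
    fn eval(&self) -> i64 { self.0 }
}

struct IsZero<E>(E);
impl<E: Expr<Out = i64>> Expr for IsZero<E> {
    type Out = bool;
    fn eval(&self) -> bool { self.0.eval() == 0 }
}

struct If<C, T, F>(C, T, F);
impl<C, T, F> Expr for If<C, T, F>
where
    C: Expr<Out = bool>,
    T: Expr,
    F: Expr<Out = T::Out>,
{
    type Out = T::Out;
    fn eval(&self) -> T::Out {
        if self.0.eval() { self.1.eval() } else { self.2.eval() }
    }
}

fn main() {
    // if (0 == 0) then 1 else 2, which evaluates to 1 with result type i64
    let e = If(IsZero(Lit(0)), Lit(1), Lit(2));
    println!("{}", e.eval());
    // If(Lit(1), Lit(2), Lit(3)) would not compile: Lit's Out is i64, not bool.
}
```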
250.
Data views for a programming environment. Robson, R., January 1988.
No description available.