Global ETD Search

11	Scalable Streaming Graph Partitioning Seyed Khamoushi, Seyed Mohammadreza January 2017 (has links) Large-scale graph-structured datasets are growing at an increasing rate. Social network graphs are an example of these datasets. Processing large-scale graphstructured datasets are central to many applications ranging from telecommunication to biology and has led to the development of many parallel graph algorithms. Performance of parallel graph algorithms largely depends on how the underlying graph is partitioned. In this work, we focus on studying streaming vertex-cut graph partitioning algorithms where partitioners receive a graph as a stream of vertices and edges and assign partitions to them on their arrival once and for all. Some of these algorithms maintain a state during partitioning. In some cases, the size of the state is so huge that it cannot be kept in a single machine memory. In many real world scenarios, several instances of a streaming graph partitioning algorithm are run simultaneously to improve the system throughput. However, running several instances of a partitioner drops the partitioning quality considerably due to the incomplete information of partitioners. Even frequently sharing states and its combination with buffering mechanisms does not completely solves the problem because of the heavy communication overhead produced by partitioners. In this thesis, we propose an algorithm which tackles the problem of low scalability and performance of existing streaming graph partitioning algorithms by providing an efficient way of sharing states and its combination with windowing mechanism. We compare state-of-the-art streaming graph partitioning algorithms with our proposed solution concerning performance and efficiency. Our solution combines a batch processing method with a shared-state mechanism to achieve both an outstanding performance and a high partitioning quality. Shared state mechanism is used for sharing states of partitioners. We provide a robust implementation of our method in a PowerGraph framework. Furthermore, we empirically evaluate the impact of partitioning quality on how graph algorithms perform in a real cloud environment. The results show that our proposed method outperforms other algorithms in terms of partitioning quality and resource consumption and improves partitioning time considerably. On average our method improves partitioning time by 23%, decreases communication load by 15% and increase memory consumption by only 5% compared to the state-of-the-art streaming graph partitioning. streaming graph vertex-cut partitioning graph partitioning distributed hash table Engineering and Technology Teknik och teknologier
12	An Integer Programming Approach to Layer Planning in Communication Networks / Une approche de programmation entière pour le problème de planification de couches dans les réseaux de communication Ozsoy, Aykut F. A. 12 May 2011 (has links) In this thesis, we introduce the Partitioning-Hub Location-Routing problem (PHLRP), which can be classied as a variant of the hub location problem. PHLRP consists of partitioning a network into sub-networks, locating at least one hub in each subnetwork and routing the traffic within the network such that all inter-subnetwork traffic is routed through the hubs and all intra-subnetwork traffic stays within the sub-networks all the way from the source to the destination. Obviously, besides the hub location component, PHLRP also involves a graph partitioning component and a routing component. PHLRP finds applications in the strategic planning or deployment of the Intermediate System-Intermediate System (ISIS) Internet Protocol networks and the Less-than-truck load freight distribution systems. First, we introduce three IP formulations for solving PHLRP. The hub location component and the graph partitioning components of PHLRP are modeled in the same way in all three formulations. More precisely, the hub location component is represented by the p-median variables and constraints; and the graph partitioning component is represented by the size-constrained graph partitioning variables and constraints. The formulations differ from each other in the way the peculiar routing requirements of PHLRP are modeled. We then carry out analytical and empirical comparisons of the three IP formulations. Our thorough analysis reveals that one of the formulations is provably the tightest of the three formulations. We also show analytically that the LP relaxations of the other two formulations do not dominate each other. On the other hand, our empirical comparison in a standard branch-and-cut framework that is provided by CPLEX shows that not the tightest but the most compact of the three formulations yield the best performance in terms of solution time. From this point on, based on the insight gained from detailed analysis of the formulations, we focus our attention on a common sub-problem of the three formulations: the so-called size-constrained graph partitioning problem. We carry out a detailed polyhedral analysis of this problem. The main benet from this polyhedral analysis is that the facets we identify for the size-constrained graph partitioning problem constitute strong valid inequalities for PHLRP. And finally, we wrap up our efforts for solving PHLRP. Namely, we present the results of our computational experiments, in which we employ some facets of the size-constrained graph partitioning polytope in a branch-and-cut algorithm for solving PHLRP. Our experiments show that our approach brings signicant improvements to the solution time of PHLRP when compared with the default branch-and-cut solver of XPress. / Dans cette thèse, nous introduisons le problème Partitionnement-Location des Hubs et Acheminement (PLHA), une variante du problème de location de hubs. Le problème PLHA partitionne un réseau afin d'obtenir des sous-réseaux, localise au moins un hub dans chaque sous-réseau et achemine le traffic dans le réseau de la maniére suivante : le traffic entre deux sous-réseaux distincts doit être éxpedié au travers des hubs tandis que le traffic entre deux noeuds d'un même sous-réseau ne doit pas sortir de celui-ci. PLHA possède des applications dans le planning stratégique, ou déploiement, d'un certain protocole de communication utilisé dans l'Internet, Intermediate System - Intermediate System, ainsi que dans la distribution des frets. Premièrement, nous préesentons trois formulations linéaires en variables entières pour résoudre PLHA. Le partitionnement du graphe et la localisation des hubs sont modélisées de la même maniére dans les trois formulations. Ces formulations diffèrent les unes des autres dans la maniére dont l'acheminement du traffic est traité. Deuxièmement, nous présentons des comparaisons analytiques et empiriques des trois formulations. Notre comparaison analytique démontre que l'une des formulations est plus forte que les autres. Néanmoins, la comparaison empirique des formulations, via le solveur CPLEX, montre que la formulation la plus compacte (mais pas la plus forte) obtient les meilleures performances en termes de temps de résolution du problème. Ensuite, nous nous concentrons sur un sous-problème, à savoir, le partitionnement des graphes sous contrainte de taille. Nous étudions le polytope des solutions réalisables de ce sous-problème. Les facettes de ce polytope constituent des inégalités valides fortes pour PLHA et peuvent être utilisées dans un algorithme de branch-and-cut pour résoudre PLHA. Finalement, nous présentons les résultats d'un algorithme de branch-and-cut que nous avons développé pour résoudre PLHA. Les résultats démontrent que la performance de notre méthode est meilleure que celle de l'algorithme branch-and-cut d'Xpress. location de hubs routing / programmation entière partitionnement des graphes hub location graph partitioning acheminement polyhedral analysis Integer programming
13	iC2mpi: A Platform for Parallel Execution of Graph-Structured Iterative Computations Botadra, Harnish 02 August 2006 (has links) Parallelization of sequential programs is often daunting because of the substantial development cost involved. Various solutions have been proposed to address this concern, including directive-based approaches and parallelization platforms. These solutions have not always been successful, in part because many try to address all types of applications. We propose a platform for parallelization of a class of applications that have similar computational structure, namely graph-structured iterative applications. iC2mpi is a unique proof-of-concept prototype platform that provides relatively easy parallelization of existing sequential programs and facilitates experimentation with static partitioning and dynamic load balancing schemes. We demonstrate with various generic application graph topologies and an existing application, namely a time-stepped battlefield management simulation, that our platform can produce good performance with very little effort. PaGrid Metis Battlefiled Management Simulation Graph Partitioning Load Balancing Parallelization Computer Sciences
14	A Scalable Partial-Order Data Structure for Distributed-System Observation Ward, Paul January 2001 (has links) Distributed-system observation is foundational to understanding and controlling distributed computations. Existing tools for distributed-system observation are constrained in the size of computation that they can observe by three fundamental problems. They lack scalable information collection, scalable data-structures for storing and querying the information collected, and scalable information-abstraction schemes. This dissertation addresses the second of these problems. Two core problems were identified in providing a scalable data structure. First, in spite of the existence of several distributed-system-observation tools, the requirements of such a structure were not well-defined. Rather, current tools appear to be built on the basis of events as the core data structure. Events were assigned logical timestamps, typically Fidge/Mattern, as needed to capture causality. Algorithms then took advantage of additional properties of these timestamps that are not explicit in the formal semantics. This dissertation defines the data-structure interface precisely, and goes some way toward reworking algorithms in terms of that interface. The second problem is providing an efficient, scalable implementation for the defined data structure. The key issue in solving this is to provide a scalable precedence-test operation. Current tools use the Fidge/Mattern timestamp for this. While this provides a constant-time test, it requires space per event equal to the number of processes. As the number of processes increases, the space consumption becomes sufficient to affect the precedence-test time because of caching effects. It also becomes problematic when the timestamps need to be copied between processes or written to a file. Worse, existing theory suggested that the space-consumption requirement of Fidge/Mattern timestamps was optimal. In this dissertation we present two alternate timestamp algorithms that require substantially less space than does the Fidge/Mattern algorithm. Computer Science Dijkstra's shortest path algorithms disk-based algorithm graph partitioning divide and conquer GIS
15	A Scalable Partial-Order Data Structure for Distributed-System Observation Ward, Paul January 2001 (has links) Distributed-system observation is foundational to understanding and controlling distributed computations. Existing tools for distributed-system observation are constrained in the size of computation that they can observe by three fundamental problems. They lack scalable information collection, scalable data-structures for storing and querying the information collected, and scalable information-abstraction schemes. This dissertation addresses the second of these problems. Two core problems were identified in providing a scalable data structure. First, in spite of the existence of several distributed-system-observation tools, the requirements of such a structure were not well-defined. Rather, current tools appear to be built on the basis of events as the core data structure. Events were assigned logical timestamps, typically Fidge/Mattern, as needed to capture causality. Algorithms then took advantage of additional properties of these timestamps that are not explicit in the formal semantics. This dissertation defines the data-structure interface precisely, and goes some way toward reworking algorithms in terms of that interface. The second problem is providing an efficient, scalable implementation for the defined data structure. The key issue in solving this is to provide a scalable precedence-test operation. Current tools use the Fidge/Mattern timestamp for this. While this provides a constant-time test, it requires space per event equal to the number of processes. As the number of processes increases, the space consumption becomes sufficient to affect the precedence-test time because of caching effects. It also becomes problematic when the timestamps need to be copied between processes or written to a file. Worse, existing theory suggested that the space-consumption requirement of Fidge/Mattern timestamps was optimal. In this dissertation we present two alternate timestamp algorithms that require substantially less space than does the Fidge/Mattern algorithm. Computer Science Dijkstra's shortest path algorithms disk-based algorithm graph partitioning divide and conquer GIS
16	Shortest Path Queries in Very Large Spatial Databases Zhang, Ning January 2001 (has links) Finding the shortest paths in a graph has been studied for a long time, and there are many main memory based algorithms dealing with this problem. Among these, Dijkstra's shortest path algorithm is one of the most commonly used efficient algorithms to the non-negative graphs. Even more efficient algorithms have been developed recently for graphs with particular properties such as the weights of edges fall into a range of integer. All of the mentioned algorithms require the graph totally reside in the main memory. Howevery, for very large graphs, such as the digital maps managed by Geographic Information Systems (GIS), the requirement cannot be satisfied in most cases, so the algorithms mentioned above are not appropriate. My objective in this thesis is to design and evaluate the performance of external memory (disk-based) shortest path algorithms and data structures to solve the shortest path problem in very large digital maps. In particular the following questions are studied:What have other researchers done on the shortest path queries in very large digital maps?What could be improved on the previous works? How efficient are our new shortest paths algorithms on the digital maps, and what factors affect the efficiency? What can be done based on the algorithm? In this thesis, we give a disk-based Dijkstra's-like algorithm to answer shortest path queries based on pre-processing information. Experiments based on our Java implementation are given to show what factors affect the running time of our algorithms. Computer Science Dijkstra's shortest path algorithms disk-based algorithm graph partitioning divide and conquer GIS
17	Computing Word Senses by Semantic Mirroring and Spectral Graph Partitioning Fagerlund, Martin January 2010 (has links) In this thesis we use the method of Semantic Mirrors to create a graph of words that are semantically related to a seed word. Spectral graph partitioning methods are then used to partition the graph into subgraphs, and thus dividing the words into different word senses. These methods are applied to a bilingual lexicon of English and Swedish adjectives. A panel of human evaluators have looked at a few examples, and evaluated consistency within the derived senses and synonymy with the seed word. Cheeger inequality Fiedler value Fiedler vector Graph partitioning Normalised Laplacian Semantic mirror Word senses. MATHEMATICS MATEMATIK
18	A Distributed Graph Mining Framework Based On Mapreduce Alkan, Sertan 01 January 2010 (has links) (PDF) The frequent patterns hidden in a graph can reveal crucial information about the network the graph represents. Existing techniques to mine the frequent subgraphs in a graph database generally rely on the premise that the data can be fit into main memory of the device that the computation takes place. Even though there are some algorithms that are designed using highly optimized methods to some extent, many lack the solution to the problem of scalability. In this thesis work, our aim is to find and enumerate the subgraphs that are at least as frequent as the designated threshold in a given graph. Here, we propose a new distributed algorithm for frequent subgraph mining problem that can scale horizontally as the computing cluster size increases. The method described here, uses a partitioning method and Map/Reduce programming model to distribute the computation of frequent subgraphs. In the core of this algorithm, we make use of an existing graph partitioning method to split the given data in the distributed file system and to merge and join the computed subgraphs without losing information. The frequent subgraph computation in each split is done using another known method which can enumerate the frequent patterns. Although current algorithms can efficiently find frequent patterns, they are not parallel or distributed algorithms in that even when they partition the data, they are designed to work on a single machine. Furthermore, these algorithms are computationally expensive but not fault tolerant and are not designed to work on a distributed file system. Using the Map/Reduce paradigm, we distribute the computation of frequent patterns to every machine in a cluster. Our algorithm, first bi-partitions the data via successive Map/Reduce jobs, then invokes another Map/Reduce job to compute the subgraphs in partitions using CloseGraph, recovers the whole set by invoking a series of Map/Reduce jobs to merge-join the previously found patterns. The implementation uses an open source Map/Reduce environment, Hadoop. In our experiments, our method can scale up to large graphs, as the graph data size gets bigger, this method performs better than the existing algorithms. QA Computer Software 76.75-76.765
19	Graph partitioning - a survey Elsner, Ulrich 09 September 2005 (has links) (PDF) Many problems appearing in scientific computing and other areas can be formulated as a graph partitioning problems. Examples include data distribution for parallel computers, decomposition of sparse matrices and VLSI-design. In this survey we present the graph partitioning problem, describe some applications and introduce many of the algorithms used to solve the problem. mesh partitioning multilevel graph partitioning spectral bisection ddc:510 Graphentheorie Partitionierung
20	Allocation Strategies for Data-Oriented Architectures Kiefer, Tim 12 January 2016 (has links) (PDF) Data orientation is a common design principle in distributed data management systems. In contrast to process-oriented or transaction-oriented system designs, data-oriented architectures are based on data locality and function shipping. The tight coupling of data and processing thereon is implemented in different systems in a variety of application scenarios such as data analysis, database-as-a-service, and data management on multiprocessor systems. Data-oriented systems, i.e., systems that implement a data-oriented architecture, bundle data and operations together in tasks which are processed locally on the nodes of the distributed system. Allocation strategies, i.e., methods that decide the mapping from tasks to nodes, are core components in data-oriented systems. Good allocation strategies can lead to balanced systems while bad allocation strategies cause skew in the load and therefore suboptimal application performance and infrastructure utilization. Optimal allocation strategies are hard to find given the complexity of the systems, the complicated interactions of tasks, and the huge solution space. To ensure the scalability of data-oriented systems and to keep them manageable with hundreds of thousands of tasks, thousands of nodes, and dynamic workloads, fast and reliable allocation strategies are mandatory. In this thesis, we develop novel allocation strategies for data-oriented systems based on graph partitioning algorithms. Therefore, we show that systems from different application scenarios with different abstraction levels can be generalized to generic infrastructure and workload descriptions. We use weighted graph representations to model infrastructures with bounded and unbounded, i.e., overcommited, resources and possibly non-linear performance characteristics. Based on our generalized infrastructure and workload model, we formalize the allocation problem, which seeks valid and balanced allocations that minimize communication. Our allocation strategies partition the workload graph using solution heuristics that work with single and multiple vertex weights. Novel extensions to these solution heuristics can be used to balance penalized and secondary graph partition weights. These extensions enable the allocation strategies to handle infrastructures with non-linear performance behavior. On top of the basic algorithms, we propose methods to incorporate heterogeneous infrastructures and to react to changing workloads and infrastructures by incrementally updating the partitioning. We evaluate all components of our allocation strategy algorithms and show their applicability and scalability with synthetic workload graphs. In end-to-end--performance experiments in two actual data-oriented systems, a database-as-a-service system and a database management system for multiprocessor systems, we prove that our allocation strategies outperform alternative state-of-the-art methods. Datenbank Graphpartitionierung Datenorientierung data management data-orientation database-as-a-service graph partitioning ddc:004 rvk:ST 270

Search results