Global ETD Search

1	Graph Processing in Main-Memory Column Stores Paradies, Marcus 29 May 2017 (has links) (PDF) Evermore, novel and traditional business applications leverage the advantages of a graph data model, such as the offered schema flexibility and an explicit representation of relationships between entities. As a consequence, companies are confronted with the challenge of storing, manipulating, and querying terabytes of graph data for enterprise-critical applications. Although these business applications operate on graph-structured data, they still require direct access to the relational data and typically rely on an RDBMS to keep a single source of truth and access. Existing solutions performing graph operations on business-critical data either use a combination of SQL and application logic or employ a graph data management system. For the first approach, relying solely on SQL results in poor execution performance caused by the functional mismatch between typical graph operations and the relational algebra. To the worse, graph algorithms expose a tremendous variety in structure and functionality caused by their often domain-specific implementations and therefore can be hardly integrated into a database management system other than with custom coding. Since the majority of these enterprise-critical applications exclusively run on relational DBMSs, employing a specialized system for storing and processing graph data is typically not sensible. Besides the maintenance overhead for keeping the systems in sync, combining graph and relational operations is hard to realize as it requires data transfer across system boundaries. A basic ingredient of graph queries and algorithms are traversal operations and are a fundamental component of any database management system that aims at storing, manipulating, and querying graph data. Well-established graph traversal algorithms are standalone implementations relying on optimized data structures. The integration of graph traversals as an operator into a database management system requires a tight integration into the existing database environment and a development of new components, such as a graph topology-aware optimizer and accompanying graph statistics, graph-specific secondary index structures to speedup traversals, and an accompanying graph query language. In this thesis, we introduce and describe GRAPHITE, a hybrid graph-relational data management system. GRAPHITE is a performance-oriented graph data management system as part of an RDBMS allowing to seamlessly combine processing of graph data with relational data in the same system. We propose a columnar storage representation for graph data to leverage the already existing and mature data management and query processing infrastructure of relational database management systems. At the core of GRAPHITE we propose an execution engine solely based on set operations and graph traversals. Our design is driven by the observation that different graph topologies expose different algorithmic requirements to the design of a graph traversal operator. We derive two graph traversal implementations targeting the most common graph topologies and demonstrate how graph-specific statistics can be leveraged to select the optimal physical traversal operator. To accelerate graph traversals, we devise a set of graph-specific, updateable secondary index structures to improve the performance of vertex neighborhood expansion. Finally, we introduce a domain-specific language with an intuitive programming model to extend graph traversals with custom application logic at runtime. We use the LLVM compiler framework to generate efficient code that tightly integrates the user-specified application logic with our highly optimized built-in graph traversal operators. Our experimental evaluation shows that GRAPHITE can outperform native graph management systems by several orders of magnitude while providing all the features of an RDBMS, such as transaction support, backup and recovery, security and user management, effectively providing a promising alternative to specialized graph management systems that lack many of these features and require expensive data replication and maintenance processes. Graphdatenverarbeitung Spalten-orientierte Datenbanksysteme Graph Traversierung Graph data processing column stores graph traversals ddc:004 rvk:ST 270
2	Graph Processing in Main-Memory Column Stores Paradies, Marcus 03 February 2017 (has links) Evermore, novel and traditional business applications leverage the advantages of a graph data model, such as the offered schema flexibility and an explicit representation of relationships between entities. As a consequence, companies are confronted with the challenge of storing, manipulating, and querying terabytes of graph data for enterprise-critical applications. Although these business applications operate on graph-structured data, they still require direct access to the relational data and typically rely on an RDBMS to keep a single source of truth and access. Existing solutions performing graph operations on business-critical data either use a combination of SQL and application logic or employ a graph data management system. For the first approach, relying solely on SQL results in poor execution performance caused by the functional mismatch between typical graph operations and the relational algebra. To the worse, graph algorithms expose a tremendous variety in structure and functionality caused by their often domain-specific implementations and therefore can be hardly integrated into a database management system other than with custom coding. Since the majority of these enterprise-critical applications exclusively run on relational DBMSs, employing a specialized system for storing and processing graph data is typically not sensible. Besides the maintenance overhead for keeping the systems in sync, combining graph and relational operations is hard to realize as it requires data transfer across system boundaries. A basic ingredient of graph queries and algorithms are traversal operations and are a fundamental component of any database management system that aims at storing, manipulating, and querying graph data. Well-established graph traversal algorithms are standalone implementations relying on optimized data structures. The integration of graph traversals as an operator into a database management system requires a tight integration into the existing database environment and a development of new components, such as a graph topology-aware optimizer and accompanying graph statistics, graph-specific secondary index structures to speedup traversals, and an accompanying graph query language. In this thesis, we introduce and describe GRAPHITE, a hybrid graph-relational data management system. GRAPHITE is a performance-oriented graph data management system as part of an RDBMS allowing to seamlessly combine processing of graph data with relational data in the same system. We propose a columnar storage representation for graph data to leverage the already existing and mature data management and query processing infrastructure of relational database management systems. At the core of GRAPHITE we propose an execution engine solely based on set operations and graph traversals. Our design is driven by the observation that different graph topologies expose different algorithmic requirements to the design of a graph traversal operator. We derive two graph traversal implementations targeting the most common graph topologies and demonstrate how graph-specific statistics can be leveraged to select the optimal physical traversal operator. To accelerate graph traversals, we devise a set of graph-specific, updateable secondary index structures to improve the performance of vertex neighborhood expansion. Finally, we introduce a domain-specific language with an intuitive programming model to extend graph traversals with custom application logic at runtime. We use the LLVM compiler framework to generate efficient code that tightly integrates the user-specified application logic with our highly optimized built-in graph traversal operators. Our experimental evaluation shows that GRAPHITE can outperform native graph management systems by several orders of magnitude while providing all the features of an RDBMS, such as transaction support, backup and recovery, security and user management, effectively providing a promising alternative to specialized graph management systems that lack many of these features and require expensive data replication and maintenance processes. info:eu-repo/classification/ddc/004 ddc:004
3	Towards More Scalable and Practical Program Synthesis Yanjun Wang (12240227) 29 April 2022 (has links) <p>Program synthesis aims to generate programs automatically from user-provided specifications and has the potential to aid users in real-world programming tasks from different domains. Although there have been great achievements of synthesis techniques in specific domains such as spreadsheet programming, computer-aided education and software engineering, there still exist huge barriers that keep us from achieving scalable and practical synthesis tools.</p> <p><br></p> <p>This dissertation presents several techniques towards more scalable and practical program synthesis from three perspectives: 1) intention: Writing formal specification for synthesis is a major barrier for average programmers. In particular, in some quantitative synthesis scenarios (such as network design), the first challenge faced by users is expressing their optimization targets. To address this problem, we present comparative synthesis, an interactive synthesis framework that learns near optimal programs through comparative queries, without explicitly specified optimization targets. 2) invention: Synthesis algorithms are key to pushing the performance limit of program synthesis. Aiming to solve syntax-guided synthesis problems efficiently, we introduce a cooperative synthesis technique that combines the merits of enumerative and deductive synthesis. 3) adaptation: Besides functional correctness, quality of generated code is another important aspect. Towards automated provably-correct optimization over tree traversals, we propose a stack-based representation for iterations in tree traversals and an encoding to Monadic Second-Order logic over trees, which enables reasoning about tree traversal transformations which were not possible before.</p> Programming Languages Networking and Communications Automatic Programming Program Synthesis Automated Reasoning Syntax-Guided Synthesis Tree Traversals Traffic Engineering
4	GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores Paradies, Marcus, Rudolf, Michael, Bornhövd, Christof, Lehner, Wolfgang 25 August 2022 (has links) Native graph query and processing capabilities have become indispensable for modern business applications in enterprise-critical operations on data that is stored in relational database management systems. Traversal operations are a basic ingredient of graph algorithms and graph queries. As a consequence, they are fundamental for querying graph data in a relational database management system. In this paper we present gratin, a concise secondary index structure to speedup graph traversals in main-memory column stores. Conventional approaches for graph traversals rely on repeated full column scans, making it an inefficient approach for deep traversals on very large graphs. To tackle this challenge, we devise a novel and adaptive block-based index to handle graphs efficiently. Most importantly, gratin is updateable in constant time and allows supporting evolving graphs with frequent updates to the graph topology. We conducted an extensive evaluation on real-world data sets from different domains for a large variety of traversal queries. Our experiments show improvements of up to an order of magnitude compared to a scan-based traversal algorithm. info:eu-repo/classification/ddc/004 ddc:004
5	Cluster Identification : Topic Models, Matrix Factorization And Concept Association Networks Arun, R 07 1900 (has links) (PDF) The problem of identifying clusters arising in the context of topic models and related approaches is important in the area of machine learning. The problem concerning traversals on Concept Association Networks is of great interest in the area of cognitive modelling. Cluster identification is the problem of finding the right number of clusters in a given set of points(or a dataset) in different settings including topic models and matrix factorization algorithms. Traversals in Concept Association Networks provide useful insights into cognitive modelling and performance. First, We consider the problem of authorship attribution of stylometry and the problem of cluster identification for topic models. For the problem of authorship attribution we show empirically that by using stop-words as stylistic features of an author, vectors obtained from the Latent Dirichlet Allocation (LDA) , outperforms other classifiers. Topics obtained by this method are generally abstract and it may not be possible to identify the cohesiveness of words falling in the same topic by mere manual inspection. Hence it is difficult to determine if the chosen number of topics is optimal. We next address this issue. We propose a new measure for topics arising out of LDA based on the divergence between the singular value distribution and the L1 norm distribution of the document-topic and topic-word matrices, respectively. It is shown that under certain assumptions, this measure can be used to find the right number of topics. Next we consider the Non-negative Matrix Factorization(NMF) approach for clustering documents. We propose entropy based regularization for a variant of the NMF with row-stochastic constraints on the component matrices. It is shown that when topic-splitting occurs, (i.e when an extra topic is required) an existing topic vector splits into two and the divergence term in the cost function decreases whereas the entropy term increases leading to a regularization. Next we consider the problem of clustering in Concept Association Networks(CAN). The CAN are generic graph models of relationships between abstract concepts. We propose a simple clustering algorithm which takes into account the complex network properties of CAN. The performance of the algorithm is compared with that of the graph-cut based spectral clustering algorithm. In addition, we study the properties of traversals by human participants on CAN. We obtain experimental results contrasting these traversals with those obtained from (i) random walk simulations and (ii) shortest path algorithms. Machine Learning Clustering (Concepts) Association Networks Concept Association Networks (CAN) Latent Dirichlet Allocation (LDA) Matrix Factorization Entropy (Information Theory) Cognitive Clustering Cluster Identification Non-negative Matrix Factorization (NMF) Computer Science
6	GPU-enhanced power flow analysis / Calcul de Flux de Puissance amélioré grâce aux Processeurs Graphiques Marin, Manuel 11 December 2015 (has links) Cette thèse propose un large éventail d'approches afin d'améliorer différents aspects de l'analyse des flux de puissance avec comme fils conducteur l'utilisation du processeurs graphiques (GPU). Si les GPU ont rapidement prouvés leurs efficacités sur des applications régulières pour lesquelles le parallélisme de données était facilement exploitable, il en est tout autrement pour les applications dites irrégulières. Ceci est précisément le cas de la plupart des algorithmes d'analyse de flux de puissance. Pour ce travail, nous nous inscrivons dans cette problématique d'optimisation de l'analyse de flux de puissance à l'aide de coprocesseur de type GPU. L'intérêt est double. Il étend le domaine d'application des GPU à une nouvelle classe de problème et/ou d'algorithme en proposant des solutions originales. Il permet aussi à l'analyse des flux de puissance de rester pertinent dans un contexte de changements continus dans les systèmes énergétiques, et ainsi d'en faciliter leur évolution. Nos principales contributions liées à la programmation sur GPU sont: (i) l'analyse des différentes méthodes de parcours d'arbre pour apporter une réponse au problème de la régularité par rapport à l'équilibrage de charge ; (ii) l'analyse de l'impact du format de représentation sur la performance des implémentations d'arithmétique floue. Nos contributions à l'analyse des flux de puissance sont les suivantes: (ii) une nouvelle méthode pour l'évaluation de l'incertitude dans l'analyse des flux de puissance ; (ii) une nouvelle méthode de point fixe pour l'analyse des flux de puissance, problème que l'on qualifie d'intrinsèquement parallèle. / This thesis addresses the utilization of Graphics Processing Units (GPUs) for improving the Power Flow (PF) analysis of modern power systems. Currently, GPUs are challenged by applications exhibiting an irregular computational pattern, as is the case of most known methods for PF analysis. At the same time, the PF analysis needs to be improved in order to cope with new requirements of efficiency and accuracy coming from the Smart Grid concept. The relevance of GPU-enhanced PF analysis is twofold. On one hand, it expands the application domain of GPU to a new class of problems. On the other hand, it consistently increases the computational capacity available for power system operation and design. The present work attempts to achieve that in two complementary ways: (i) by developing novel GPU programming strategies for available PF algorithms, and (ii) by proposing novel PF analysis methods that can exploit the numerous features present in GPU architectures. Specific contributions on GPU computing include: (i) a comparison of two programming paradigms, namely regularity and load-balancing, for implementing the so-called treefix operations; (ii) a study of the impact of the representation format over performance and accuracy, for fuzzy interval algebraic operations; and (iii) the utilization of architecture-specific design, as a novel strategy to improve performance scalability of applications. Contributions on PF analysis include: (i) the design and evaluation of a novel method for the uncertainty assessment, based on the fuzzy interval approach; and (ii) the development of an intrinsically parallel method for PF analysis, which is not affected by the Amdahl's law. Flux de Puissance Processeurs Graphiques Algorithmes parallèles Arithmétique des ordinateurs Architectures multicœurs Méthode de Newton Analyse d’incertitude Parcours d'arbre Itérations asynchrones Réseaux intelligents Intervalles floues Power Flow Graphic Processing Units Parallel algorithms Parallel algorithms Multi-core architectures Newton method Uncertainty analysis Tree traversals Asynchronous iterations Smart grids Fuzzy intervals 004 005

1

Page generated in 0.0748 seconds