Global ETD Search

21	Why-Query Support in Graph Databases Vasilyeva, Elena 28 March 2017 In the last few decades, database management systems have become powerful tools for storing large amounts of data and executing complex queries over them. In addition to extended functionality, novel types of databases have appeared, such as triple stores and distributed databases. Graph databases implementing the property-graph model belong to this development branch and provide a new way of storing and processing data in the form of a graph, with nodes representing entities and edges describing the connections between them. This makes them suitable for keeping data without a rigid schema for use cases like social-network processing or data integration. In addition to flexible storage, graph databases provide new querying possibilities in the form of path queries, detection of connected components, pattern matching, etc. However, the schema flexibility and graph queries come with additional costs. With limited knowledge about the data and little experience in constructing complex queries, users can create queries that deliver unexpected results. Forced to debug queries manually and overwhelmed by the number of query constraints, users can become frustrated with graph databases. What is really needed is to improve the usability of graph databases by providing debugging and explanation functionality for such situations. We have to assist users in discovering the reasons for unexpected results and what can be done to fix them. The unexpectedness of result sets can be expressed in terms of their size or content. In the first case, users have to solve the empty-answer, too-many-answers, or too-few-answers problems. In the second case, users care about the result content and miss some expected answers or wonder about the presence of some unexpected ones. 
Considering the typical problems of receiving no or too many results when querying graph databases, in this thesis we focus on the problems of the first group, whose solutions are usually represented by why-empty, why-so-few, and why-so-many queries. Our objective is to extend graph databases with debugging functionality in the form of why-queries for unexpected query results, using pattern-matching queries, one of the general graph-query types, as an example. We present a comprehensive analysis of existing debugging tools in state-of-the-art research and identify their common properties. From them, we formulate the following features of why-queries, which we discuss in this thesis: holistic support of different cardinality-based problems, explanation of unexpected results and query reformulation, comprehensive analysis of explanations, and non-intrusive user integration. To support different cardinality-based problems, we develop methods for explaining no, too few, and too many results. To cover different kinds of explanations, we present two types: subgraph-based and modification-based explanations. The first type identifies the reasons for unexpectedness in terms of query subgraphs and delivers differential graphs as answers. The second one reformulates queries in such a way that they produce better results. Considering graph queries to be complex structures with multiple constraints, we investigate different ways of generating explanations, starting from the most general one, which considers only the query topology, through coarse-grained rewriting, up to fine-grained modification that allows fine changes of predicates and topology. To provide a comprehensive analysis of explanations, we propose to compare them on three levels: the syntactic description, the content, and the size of a result set. In order to deliver user-aware explanations, we discuss two models for non-intrusive user integration in the generation process. 
With the techniques proposed in this thesis, we are able to provide fundamentals for debugging pattern-matching queries that deliver no, too few, or too many results in graph databases implementing the property-graph model. Graph databases, query processing, pattern matching, empty-answer problem, why-queries ddc:004 rvk:ST 265 rvk:ST 270
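The idea of a modification-based explanation for a why-empty query can be illustrated with a small sketch. This is not the thesis's algorithm: it is a brute-force toy in Python, and the graph data, the query encoding, and the function names `match` and `why_empty` are all assumptions made for illustration. It drops one constraint at a time and reports which single relaxations make an empty pattern-matching query return results.

```python
from itertools import permutations

# Hypothetical toy property graph: node id -> properties, plus labelled edges.
NODES = {
    "a": {"type": "Person", "city": "Dresden"},
    "b": {"type": "Person", "city": "Berlin"},
    "c": {"type": "Company", "city": "Berlin"},
}
EDGES = {("a", "knows", "b"), ("b", "worksAt", "c")}

# Hypothetical query: two distinct persons in Dresden where x knows y.
QUERY_NODES = {
    "x": {"type": "Person", "city": "Dresden"},
    "y": {"type": "Person", "city": "Dresden"},
}
QUERY_EDGES = [("x", "knows", "y")]


def match(node_preds, edge_preds):
    """Brute-force pattern matching: bind query variables to distinct nodes."""
    variables = sorted(node_preds)
    hits = []
    for combo in permutations(NODES, len(variables)):
        binding = dict(zip(variables, combo))
        nodes_ok = all(
            NODES[binding[var]].get(key) == value
            for var, props in node_preds.items()
            for key, value in props.items()
        )
        edges_ok = all(
            (binding[src], label, binding[dst]) in EDGES
            for src, label, dst in edge_preds
        )
        if nodes_ok and edges_ok:
            hits.append(binding)
    return hits


def why_empty(node_preds, edge_preds):
    """List single-constraint relaxations that yield a non-empty result."""
    fixes = []
    # Try dropping each node predicate in turn.
    for var, props in node_preds.items():
        for key in props:
            relaxed = {
                v: {k: val for k, val in p.items() if (v, k) != (var, key)}
                for v, p in node_preds.items()
            }
            count = len(match(relaxed, edge_preds))
            if count:
                fixes.append((f"drop {var}.{key}", count))
    # Try dropping each edge constraint in turn.
    for i, edge in enumerate(edge_preds):
        relaxed_edges = edge_preds[:i] + edge_preds[i + 1:]
        count = len(match(node_preds, relaxed_edges))
        if count:
            fixes.append((f"drop edge {edge}", count))
    return fixes
```

On this toy data the original query is empty, and `why_empty(QUERY_NODES, QUERY_EDGES)` reports that removing the city predicate on `y` is the only single relaxation that produces a match. A real system would need a far more selective search than this exhaustive enumeration.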
22	Effective and efficient similarity search in databases Lange, Dustin January 2013 Given a large set of records in a database and a query record, similarity search aims to find all records sufficiently similar to the query record. To solve this problem, two main aspects need to be considered: First, to perform effective search, the set of relevant records is defined using a similarity measure. Second, an efficient access method is to be found that performs only a few database accesses and comparisons using the similarity measure. This thesis addresses both aspects with an emphasis on the latter. In the first part of this thesis, a frequency-aware similarity measure is introduced. Compared record pairs are partitioned according to frequencies of attribute values. For each partition, a different similarity measure is created: machine learning techniques combine a set of base similarity measures into an overall similarity measure. After that, a similarity index for string attributes is proposed, the State Set Index (SSI), which is based on a trie (prefix tree) that is interpreted as a nondeterministic finite automaton. For processing range queries, the notion of query plans is introduced in this thesis to describe which similarity indexes to access and which thresholds to apply. The query result should be as complete as possible under some cost threshold. Two query planning variants are introduced: (1) Static planning selects a plan at compile time that is used for all queries. (2) Query-specific planning selects a different plan for each query. For answering top-k queries, the Bulk Sorted Access Algorithm (BSA) is introduced, which retrieves large chunks of records from the similarity indexes using fixed thresholds, and which focuses its efforts on records that are ranked high in more than one attribute and are thus promising candidates. The described components form a complete similarity search system. 
Based on prototypical implementations, this thesis shows comparative evaluation results for all proposed approaches on different real-world data sets, one of which is a large person data set from a German credit rating agency. Databases, Similarity Search, Search Algorithms, Similarity Measures, Index Structures, Data processing, Computer science
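To make the notion of a trie-based similarity range query more concrete, here is a minimal Python sketch. It is not the State Set Index from the thesis: it uses the well-known technique of carrying one row of the edit-distance table down each trie branch and pruning branches that can no longer stay within the threshold. The names `TrieNode`, `build_trie`, and `range_query` are hypothetical.

```python
from typing import Dict, List, Tuple


class TrieNode:
    """One trie state: outgoing character edges plus an end-of-word flag."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


def build_trie(words: List[str]) -> TrieNode:
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def range_query(root: TrieNode, query: str, max_dist: int) -> List[Tuple[str, int]]:
    """Return all indexed words within edit distance max_dist of query."""
    results: List[Tuple[str, int]] = []

    def walk(node: TrieNode, prefix: str, prev_row: List[int]) -> None:
        # prev_row[i] is the edit distance between prefix and query[:i].
        if node.is_word and prev_row[-1] <= max_dist:
            results.append((prefix, prev_row[-1]))
        for ch, child in node.children.items():
            row = [prev_row[0] + 1]
            for i in range(1, len(query) + 1):
                cost = 0 if query[i - 1] == ch else 1
                row.append(min(row[i - 1] + 1,          # insertion
                               prev_row[i] + 1,         # deletion
                               prev_row[i - 1] + cost)) # match or substitution
            # Prune: no extension of this prefix can get below min(row).
            if min(row) <= max_dist:
                walk(child, prefix + ch, row)

    walk(root, "", list(range(len(query) + 1)))
    return sorted(results)
```

For example, with an index over `["meier", "meyer", "maier", "mueller", "schmidt"]`, the call `range_query(trie, "meier", 1)` returns the three surname variants within one edit together with their distances. The threshold `max_dist` plays the role of the similarity threshold that a query plan would choose per index.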
23	Profillinie 4: Kundenorientierte Gestaltung von vernetzten Wertschöpfungsketten [Profile Line 4: Customer-Oriented Design of Networked Value Chains] Zanger, Cornelia, Müller, Egon, Meyer, Matthias, Bocklisch, Steffen, Fischer, Marco, Jähn, Hendrik, Käschel, Joachim, Roth, Steffen, Benn, Wolfgang, Jurczek, Peter, Schöne, Roland, Freitag, Matthias 11 November 2005 Against the backdrop of global challenges, value-creation structures in society adapt very flexibly and dynamically to constantly changing requirements. Innovative value creation can less and less be achieved by individual actors in science, technology, business, and society; instead, diverse interdisciplinary competences must be pooled, and the goal-oriented networking of knowledge and resources must be shaped. The incorruptible yardstick for the success of any economic enterprise, and thus also of networked value chains, is the customer with his or her individual needs, which ultimately means the acceptance and sales prospects of products and services in the market. Research profile line 4, municipal networks, networks, network structures, production networks, process design, intelligent databases, networked systems ddc:330 Chemnitz / Technische Universität, network management, profile, value chain
24	A Study of Partitioning and Parallel UDF Execution with the SAP HANA Database Große, Philipp, May, Norman, Lehner, Wolfgang 08 July 2014 Large-scale data analysis relies on custom code both for preparing the data for analysis and for the core analysis algorithms. The map-reduce framework offers a simple model to parallelize custom code, but it does not integrate well with relational databases. Likewise, the literature on optimizing queries in relational databases has largely ignored user-defined functions (UDFs). In this paper, we discuss annotations for user-defined functions that facilitate optimizations that consider both relational operators and UDFs. We believe this approach to be superior to merely linking map-reduce evaluation to a relational database because it enables a broader range of optimizations. In this paper, we focus on optimizations that enable the parallel execution of relational operators and UDFs for a number of typical patterns. A study on real-world data investigates the opportunities for parallelization of complex data flows containing both relational operators and UDFs. UDF, Database, Optimization, SAP HANA, user-defined functions ddc:004 rvk:SS 5514
25	Query Answering in Probabilistic Data and Knowledge Bases Ceylan, Ismail Ilkan 04 June 2018 Probabilistic data and knowledge bases are becoming increasingly important in academia and industry. They are continuously extended with new data, powered by modern information extraction tools that associate probabilities with knowledge base facts. The state of the art to store and process such data is founded on probabilistic database systems, which are widely and successfully employed. Beyond all the success stories, however, such systems still lack the fundamental machinery to convey some of the valuable knowledge hidden in them to the end user, which limits their potential applications in practice. In particular, in their classical form, such systems are typically based on strong, unrealistic limitations, such as the closed-world assumption, the closed-domain assumption, the tuple-independence assumption, and the lack of commonsense knowledge. These limitations not only lead to unwanted consequences, but also put such systems on weak footing in important tasks, query answering being a very central one. In this thesis, we enhance probabilistic data and knowledge bases with more realistic data models, thereby allowing for better means for querying them. Building on the long endeavor of unifying logic and probability, we develop different rigorous semantics for probabilistic data and knowledge bases, analyze their computational properties, identify sources of (in)tractability, and design practical scalable query answering algorithms whenever possible. To achieve this, the current work brings together some recent paradigms from logics, probabilistic inference, and database theory. Logic, Probabilistic Databases, Probabilistic Knowledge Bases, Query Answering, Complexity ddc:004 rvk:ST 270
26	A Common Programming Interface for Managed Heterogeneous Data Analysis Luong, Johannes 28 July 2021 The widespread success of data analysis in a growing number of application domains has led to the development of a variety of purpose-built data processing systems. Today, many organizations operate whole fleets of different data-related systems. Although this differentiation has good reasons, there is also a growing need to create holistic perspectives that cut across the borders of individual systems. Application experts who want to create such perspectives are confronted with a variety of programming interfaces and data formats, and with the task of combining available systems in an efficient manner. These issues are generally unrelated to the application domain and require a specialized set of skills. As a consequence, development is slowed down and made more expensive, which stifles exploration and innovation. In addition, the direct use of specialized system interfaces can couple application code to specific processing systems. In this dissertation, we propose the data processing platform DataCalc, which presents users with a unified application-oriented programming interface and which automatically executes this interface in an efficient manner on a variety of processing systems. DataCalc offers a managed environment for data analyses that enables domain experts to concentrate on their application logic and decouples code from specific processing technology. The basis of this managed processing environment is formed by the high-level, domain-oriented program representation DCIL and a flexible, extensible cost-based optimization component. In addition to traditional up-front optimization, the optimizer also supports dynamic re-optimization of partially executed DCIL programs. This enables the system to benefit from dynamic information that only becomes available during execution of queries. 
DataCalc assigns workloads to available processing systems using a fine-grained task scheduling model to enable efficient exploitation of available resources. In the second part of the dissertation, we present a prototypical implementation of the DataCalc platform, which includes connectors for the relational DBMS PostgreSQL, the document store MongoDB, the graph database Neo4j, and the custom-built PyProc processing system. For the evaluation of this prototype, we have implemented an extended application scenario. Our experiments demonstrate that DataCalc is able to find and execute efficient execution strategies that minimize cross-system data movement. The system achieves much better results than a naive implementation and comes close to the performance of a hand-optimized solution. Based on these findings, we are confident in concluding that the DataCalc platform architecture provides an excellent environment for cross-domain data analysis on a heterogeneous federated processing architecture. info:eu-repo/classification/ddc/004 ddc:004
27	Why-Query Support in Graph Databases Vasilyeva, Elena 08 November 2016 info:eu-repo/classification/ddc/004 ddc:004 Graph databases, query processing
28	Komplexe Datenanalyseprozesse in serviceorientierten Umgebungen [Complex Data Analysis Processes in Service-Oriented Environments] Habich, Dirk 08 December 2008 This dissertation deals with embedding complex data analysis processes in service-oriented environments. The discussion begins with a concrete application area in which such analysis processes play a decisive role in knowledge discovery and without which no progress can be made. In the second part, concrete complex data analysis processes are developed, which form the starting point for discussing their embedding in a service-oriented environment. This embedding is finally addressed in the third part of the dissertation, which presents corresponding extensions to the technologies of the best-known form of realization. The evaluation shows that this new form is considerably better suited to complex data analysis processes than the previous variant. info:eu-repo/classification/ddc/004 ddc:004
29	InVerDa - co-existing Schema Versions Made Foolproof Herrmann, Kai, Voigt, Hannes, Seyschab, Thorsten, Lehner, Wolfgang 01 July 2021 In modern software landscapes, multiple applications usually share one database as their single point of truth. All these applications will evolve over time by their very nature. Often, former versions need to stay available, so database developers find themselves maintaining co-existing schema versions of multiple applications in multiple versions. This is highly error-prone and accounts for significant costs in software projects, as developers realize the translation of data accesses between schema versions with hand-written delta code. In this demo, we showcase INVERDA, a tool for integrated, robust, and easy-to-use database versioning. We rethink the way of specifying the evolution to new schema versions. Using the richer semantics of a descriptive database evolution language, we generate all required artifacts automatically and make database versioning foolproof. info:eu-repo/classification/ddc/004 ddc:004
30	SynopSys: Foundations for Multidimensional Graph Analytics Rudolf, Michael, Voigt, Hannes, Bornhövd, Christof, Lehner, Wolfgang 02 February 2023 The past few years have seen a tremendous increase in often irregularly structured data that can be represented most naturally and efficiently in the form of graphs. Making sense of incessantly growing graphs is not only a key requirement in applications like social media analysis or fraud detection but also a necessity in many traditional enterprise scenarios. Thus, a flexible approach for multidimensional analysis of graph data is needed. Whereas many existing technologies require up-front modelling of analytical scenarios and are difficult to adapt to changes, our approach allows for ad-hoc analytical queries of graph data. Extending our previous work on graph summarization, in this position paper we lay the foundation for large graph analytics to enable business intelligence on graph-structured data. info:eu-repo/classification/ddc/004 ddc:004
