Spelling suggestions: "subject:"graph databases"" "subject:"graph atabases""
21 |
Návrh postupu tvorby aplikace pro Linked Open Data / The proposal of application development process for Linked Open DataBudka, Michal January 2014 (has links)
This thesis deals with the issue of Linked Open Data. The goal of this thesis is to introduce the reader to this issue as a whole and to the possibility of using Linked Open Data for developing useful applications by proposing a new development process focusing on such applications. The theoretical part offers an insight into the issue of Open Data, Linked Open Data and the NoSQL database systems and their usability in this field. It focuses mainly on graph database systems and compares them with relational database systems using predefined criteria. Additionally, the goal of this thesis is to develop an application using the proposed development process, which provides a tool for data presentation and statistical visualisation for open data sets published by the Supreme Audit Office and the Czech Trade Inspection. The application is mainly developed for the purpose of verifying the proposed development process and to demonstrate the connectivity of open data published by two different organizations.The thesis includes the process of selecting a development methodology, which is then used for optimising work on the implementation of the resulting application and the process of selecting a graph database system, that is used to store and modify open data for the purposes of the application.
|
22 |
Why-Query Support in Graph DatabasesVasilyeva, Elena 08 November 2016 (has links)
In the last few decades, database management systems became powerful tools for storing large amount of data and executing complex queries over them. In addition to extended functionality, novel types of databases appear like triple stores, distributed databases, etc. Graph databases implementing the property-graph model belong to this development branch and provide a new way for storing and processing data in the form of a graph with nodes representing some entities and edges describing connections between them. This consideration makes them suitable for keeping data without a rigid schema for use cases like social-network processing or data integration. In addition to a flexible storage, graph databases provide new querying possibilities in the form of path queries, detection of connected components, pattern matching, etc.
However, the schema flexibility and graph queries come with additional costs. With limited knowledge about data and little experience in constructing the complex queries, users can create such ones, which deliver unexpected results. Forced to debug queries manually and overwhelmed by the amount of query constraints, users can get frustrated by using graph databases. What is really needed, is to improve usability of graph databases by providing debugging and explaining functionality for such situations. We have to assist users in the discovery of what were the reasons of unexpected results and what can be done in order to fix them.
The unexpectedness of result sets can be expressed in terms of their size or content. In the first case, users have to solve the empty-answer, too-many-, or too-few-answers problems. In the second case, users care about the result content and miss some expected answers or wonder about presence of some unexpected ones. Considering the typical problems of receiving no or too many results by querying graph databases, in this thesis we focus on investigating the problems of the first group, whose solutions are usually represented by why-empty, why-so-few, and why-so-many queries. Our objective is to extend graph databases with debugging functionality in the form of why-queries for unexpected query results on the example of pattern matching queries, which are one of general graph-query types. We present a comprehensive analysis of existing debugging tools in the state-of-the-art research and identify their common properties.
From them, we formulate the following features of why-queries, which we discuss in this thesis, namely: holistic support of different cardinality-based problems, explanation of unexpected results and query reformulation, comprehensive analysis of explanations, and non-intrusive user integration. To support different cardinality-based problems, we develop methods for explaining no, too few, and too many results. To cover different kinds of explanations, we present two types: subgraph- and modification-based explanations. The first type identifies the reasons of unexpectedness in terms of query subgraphs and delivers differential graphs as answers. The second one reformulates queries in such a way that they produce better results. Considering graph queries to be complex structures with multiple constraints, we investigate different ways of generating explanations starting from the most general one that considers only a query topology through coarse-grained rewriting up to fine-grained modification that allows fine changes of predicates and topology. To provide a comprehensive analysis of explanations, we propose to compare them on three levels including a syntactic description, a content, and a size of a result set. In order to deliver user-aware explanations, we discuss two models for non-intrusive user integration in the generation process.
With the techniques proposed in this thesis, we are able to provide fundamentals for debugging of pattern-matching queries, which deliver no, too few, or too many results, in graph databases implementing the property-graph model.
|
23 |
SynopSys: Foundations for Multidimensional Graph AnalyticsRudolf, Michael, Voigt, Hannes, Bornhövd, Christof, Lehner, Wolfgang 02 February 2023 (has links)
The past few years have seen a tremendous increase in often irregularly structured data that can be represented most naturally and efficiently in the form of graphs. Making sense of incessantly growing graphs is not only a key requirement in applications like social media analysis or fraud detection but also a necessity in many traditional enterprise scenarios. Thus, a flexible approach for multidimensional analysis of graph data is needed. Whereas many existing technologies require up-front modelling of analytical scenarios and are difficult to adapt to changes, our approach allows for ad-hoc analytical queries of graph data. Extending our previous work on graph summarization, in this position paper we lay the foundation for large graph analytics to enable business intelligence on graph-structured data.
|
24 |
Top-k Differential Queries in Graph DatabasesVasilyeva, Elena, Thiele, Maik, Bornhövd, Christof, Lehner, Wolfgang 03 February 2023 (has links)
The sheer volume as well as the schema complexity of today’s graph databases impede the users in formulating queries against these databases and often cause queries to “fail” by delivering empty answers. To support users in such situations, the concept of differential queries can be used to bridge the gap between an unexpected result (e.g. an empty result set) and the query intention of users. These queries deliver missing parts of a query graph and, therefore, work with such scenarios that require users to specify a query graph. Based on the discovered information about a missing query subgraph, users may understand which vertices and edges are the reasons for queries that unexpectedly return empty answers, and thus can reformulate the queries if needed. A study showed that the result sets of differential queries are often too large to be manually introspected by users and thus a reduction of the number of results and their ranking is required. To address these issues, we extend the concept of differential queries and introduce top-k differential queries that calculate the ranking based on users’ preferences and therefore significantly support the users’ understanding of query database management systems. The idea consists of assigning relevance weights to vertices or edges of a query graph by users that steer the graph search and are used in the scoring function for top-k differential results. Along with the novel concept of the top-k differential queries, we further propose a strategy for propagating relevance weights and we model the search along the most relevant paths.
|
25 |
Graphdatenbanken für die textorientierten e-HumanitiesEfer, Thomas 15 February 2017 (has links) (PDF)
Vor dem Hintergrund zahlreicher Digitalisierungsinitiativen befinden sich weite Teile der Geistes- und Sozialwissenschaften derzeit in einer Transition hin zur großflächigen Anwendung digitaler Methoden. Zwischen den Fachdisziplinen und der Informatik zeigen sich große Differenzen in der Methodik und bei der gemeinsamen Kommunikation. Diese durch interdisziplinäre Projektarbeit zu überbrücken, ist das zentrale Anliegen der sogenannten e-Humanities. Da Text der häufigste Untersuchungsgegenstand in diesem Feld ist, wurden bereits viele Verfahren des Text Mining auf Problemstellungen der Fächer angepasst und angewendet. Während sich langsam generelle Arbeitsabläufe und Best Practices etablieren, zeigt sich, dass generische Lösungen für spezifische Teilprobleme oftmals nicht geeignet sind. Um für diese Anwendungsfälle maßgeschneiderte digitale Werkzeuge erstellen zu können, ist eines der Kernprobleme die adäquate digitale Repräsentation von Text sowie seinen vielen Kontexten und Bezügen.
In dieser Arbeit wird eine neue Form der Textrepräsentation vorgestellt, die auf Property-Graph-Datenbanken beruht – einer aktuellen Technologie für die Speicherung und Abfrage hochverknüpfter Daten. Darauf aufbauend wird das Textrecherchesystem „Kadmos“ vorgestellt, mit welchem nutzerdefinierte asynchrone Webservices erstellt werden können. Es bietet flexible Möglichkeiten zur Erweiterung des Datenmodells und der Programmfunktionalität und kann Textsammlungen mit mehreren hundert Millionen Wörtern auf einzelnen Rechnern und weitaus größere in Rechnerclustern speichern. Es wird gezeigt, wie verschiedene Text-Mining-Verfahren über diese Graphrepräsentation realisiert und an sie angepasst werden können. Die feine Granularität der Zugriffsebene erlaubt die Erstellung passender Werkzeuge für spezifische fachwissenschaftliche Anwendungen. Zusätzlich wird demonstriert, wie die graphbasierte Modellierung auch über die rein textorientierte Forschung hinaus gewinnbringend eingesetzt werden kann. / In light of the recent massive digitization efforts, most of the humanities disciplines are currently undergoing a fundamental transition towards the widespread application of digital methods. In between those traditional scholarly fields and computer science exists a methodological and communicational gap, that the so-called \\\"e-Humanities\\\" aim to bridge systematically, via interdisciplinary project work. With text being the most common object of study in this field, many approaches from the area of Text Mining have been adapted to problems of the disciplines. While common workflows and best practices slowly emerge, it is evident that generic solutions are no ultimate fit for many specific application scenarios. To be able to create custom-tailored digital tools, one of the central issues is to digitally represent the text, as well as its many contexts and related objects of interest in an adequate manner.
This thesis introduces a novel form of text representation that is based on Property Graph databases – an emerging technology that is used to store and query highly interconnected data sets. Based on this modeling paradigm, a new text research system called \\\"Kadmos\\\" is introduced. It provides user-definable asynchronous web services and is built to allow for a flexible extension of the data model and system functionality within a prototype-driven development process. With Kadmos it is possible to easily scale up to text collections containing hundreds of millions of words on a single device and even further when using a machine cluster. It is shown how various methods of Text Mining can be implemented with and adapted for the graph representation at a very fine granularity level, allowing the creation of fitting digital tools for different aspects of scholarly work. In extended usage scenarios it is demonstrated how the graph-based modeling of domain data can be beneficial even in research scenarios that go beyond a purely text-based study.
|
26 |
Supporting device-to-device search and sharing of hyper-localized dataMichel, Jonas Reinhardt 08 September 2015 (has links)
Supporting emerging mobile applications in densely populated environments requires connecting mobile users and their devices with the surrounding digital landscape. Specifically, the volume of digitally-available data in such computing spaces presents an imminent need for expressive mechanisms that enable humans and applications to share and search for relevant information within their digitally accessible physical surroundings. Device-to-device communications will play a critical role in facilitating transparent access to proximate digital resources. A wide variety of approaches exist that support device-to-device dissemination and query-driven data access. Very few, however, capitalize on the contextual history of the shared data itself to distribute additional data or to guide queries. This dissertation presents Gander, an application substrate and mobile middleware designed to ease the burden associated with creating applications that require support for sharing and searching of hyper-localized data in situ. Gander employs a novel trajectory-driven model of spatiotemporal provenance that enriches shared data with its contextual history -- annotations that capture data's geospatial and causal history across a lifetime of device-to-device propagation. We demonstrate the value of spatiotemporal data provenance as both a tool for improving ad hoc routing performance and for driving complex application behavior. This dissertation discusses the design and implementation of Gander's middleware model, which abstracts away tedious implementation details by enabling developers to write high-level rules that govern when, where, and how data is distributed and to execute expressive queries across proximate digital resources. We evaluate Gander within several simulated large-scale environments and one real-world deployment on the UT Austin campus. The goal of this research is to provide formal constructs realized within a software framework that ease the software engineering challenges encountered during the design and deployment of several applications in emerging mobile environments. / text
|
27 |
Contribution à la mise en oeuvre d’un outillage unifié pour faciliter la qualification d’environnements normés / Toward a unified tooling to ease the qualification process of standardized environmentsGelibert, Anthony 27 October 2016 (has links)
Les environnements confinés, tels que les blocs chirurgicaux ou les salles blanches, hébergent des processus complexes auxquels sont associés de nombreux risques. Leur conception, leur réalisation et leur exploitation sont complexes, de par les très nombreuses normes les encadrant.La qualification de ces « environnements normés », afin d’en garantir la qualité de conception, requiert une expertise fine du métier et souffre du manque d’outil en permettant l’automatisation. Partant de ce constat, nous proposons une approche unifiée visant à faciliter la qualification des environnements normés. Celle-ci s’appuie sur une représentation du contexte normatif sous la forme d’un graphe unique, ainsi que sur une modélisation de l’environnement et son objet final par étapes successives permettant une vérification incrémentale de même que la production d’informations nécessaires à la traçabilité lors de l’exploitation. Cette démarche, illustrée au travers du domaine des environnements confinés médicaux, est générique et peut s’appliquer à l’ensemble des environnements normés. / Industrial clean rooms or operating rooms are critical places often hosting dangerous or complex processes. Their design, building and use are thus difficult and constrained by a large amount of standards and rules. Qualifying these environments, in order to ensure their quality, consequently requires a high level of expertise and lacks assisting tools.This leads us to propose a unified approach aiming at easing the qualification process of standardized environments. It relies on a graph-based representation of the set of standards and rules that apply to a specific case, as well as on step-by-step modelling of the whole target environment. The verification process is then eased as it becomes incremental. During each stage, relevant information can also be gathered in order to ensure environment traceability during its use.This approach, applied to medical environments for validation purposes, remains generic and can be applied to any kind of standardized environment.
|
28 |
Flexible querying of RDF databases : a contribution based on fuzzy logic / Interrogation flexible de bases de données RDF : une contribution basée sur la logique floueSlama, Olfa 22 November 2017 (has links)
Cette thèse porte sur la définition d'une approche flexible pour interroger des graphes RDF à la fois classiques et flous. Cette approche, basée sur la théorie des ensembles flous, permet d'étendre SPARQL qui est le langage de requête standardisé W3C pour RDF, de manière à pouvoir exprimer i) des préférences utilisateur floues sur les données (par exemple, l'année de publication d'un album est récente) et sur la structure du graphe (par exemple, le chemin entre deux amis doit être court) et ii) des préférences utilisateur plus complexes, prenant la forme de propositions quantifiées floues (par exemple, la plupart des albums qui sont recommandés par un artiste, sont très bien notés et ont été créés par un jeune ami de cet artiste). Nous avons effectué des expérimentations afin d'étudier les performances de cette approche. L'objectif principal de ces expérimentations était de montrer que le coût supplémentaire dû à l'introduction du flou reste limité/acceptable. Nous avons également étudié, dans un cadre plus général, celui de bases de données graphe, la question de l'intégration du même type de propositions quantifiées floues dans une extension floue de Cypher qui est un langage déclaratif pour l'interrogation des bases de données graphe classiques. Les résultats expérimentaux obtenus montrent que le coût supplémentaire induit par la présence de conditions quantifiées floues dans les requêtes reste également très limité dans ce cas. / This thesis concerns the definition of a flexible approach for querying both crisp and fuzzy RDF graphs. This approach, based on the theory of fuzzy sets, makes it possible to extend SPARQL which is the W3C-standardised query language for RDF, so as to be able to express i) fuzzy user preferences on data (e.g., the release year of an album is recent) and on the structure of the data graph (e.g., the path between two friends is required to be short) and ii) more complex user preferences, namely, fuzzy quantified statements (e.g., most of the albums that are recommended by an artist, are highly rated and have been created by a young friend of this artist). We performed some experiments in order to study the performances of this approach. The main objective of these experiments was to show that the extra cost due to the introduction of fuzziness remains limited/acceptable. We also investigated, in a more general framework, namely graph databases, the issue of integrating the same type of fuzzy quantified statements in a fuzzy extension of Cypher which is a declarative language for querying (crisp) graph databases. Some experimental results are reported and show that the extra cost induced by the fuzzy quantified nature of the queries also remains very limited.
|
29 |
Vizualizace RDF dat ve webových prohlížečích / RDF Data Visualization in Web BrowsersŠkrobánek, Kristián January 2021 (has links)
This diploma thesis focuses on graph database data visualization, where data is stored in RDF format. Standard visualisation of RDF data in tables does not offer sufficiently usable user view. One of the goals of this work is to show RDF data in interactive graph, which is ideal form of viewing data considering lucidity and information value. The graph gives good view of not only the data itself but also relationships between the data. Another goal is to test ability of browsers to visualize large amounts of data.
|
30 |
Indexing RDF data using materialized SPARQL queriesEspinola, Roger Humberto Castillo 10 September 2012 (has links)
In dieser Arbeit schlagen wir die Verwendung von materialisierten Anfragen als Indexstruktur für RDF-Daten vor. Wir streben eine Reduktion der Bearbeitungszeit durch die Minimierung der Anzahl der Vergleiche zwischen Anfrage und RDF Datenmenge an. Darüberhinaus betonen wir die Rolle von Kostenmodellen und Indizes für die Auswahl eines efizienten Ausführungsplans in Abhängigkeit vom Workload. Wir geben einen Überblick über das Problem der Auswahl von materialisierten Anfragen in relationalen Datenbanken und diskutieren ihre Anwendung zur Optimierung der Anfrageverarbeitung. Wir stellen RDFMatView als Framework für SPARQL-Anfragen vor. RDFMatView benutzt materializierte Anfragen als Indizes und enthalt Algorithmen, um geeignete Indizes fur eine gegebene Anfrage zu finden und sie in Ausführungspläne zu integrieren. Die Auswahl eines effizienten Ausführungsplan ist das zweite Thema dieser Arbeit. Wir führen drei verschiedene Kostenmodelle für die Verarbeitung von SPARQL Anfragen ein. Ein detaillierter Vergleich der Kostmodelle zeigt, dass ein auf Index-- und Prädikat--Statistiken beruhendes Modell die genauesten Informationen liefert, um einen effizienten Ausführungsplan auszuwählen. Die Evaluation zeigt, dass unsere Methode die Anfragebearbeitungszeit im Vergleich zu unoptimierten SPARQL--Anfragen um mehrere Größenordnungen reduziert. Schließlich schlagen wir eine einfache, aber effektive Strategie für das Problem der Auswahl von materialisierten Anfragen über RDF-Daten vor. Ausgehend von einem bestimmten Workload werden algorithmisch diejenigen Indizes augewählt, die die Bearbeitungszeit des gesamten Workload minimieren sollen. Dann erstellen wir auf der Basis von Anfragemustern eine Menge von Index--Kandidaten und suchen in dieser Menge Zusammenhangskomponenten. Unsere Auswertung zeigt, dass unsere Methode zur Auswahl von Indizes im Vergleich zu anderen, die größten Einsparungen in der Anfragebearbeitungszeit liefert. / In this thesis, we propose to use materialized queries as a special index structure for RDF data. We strive to reduce the query processing time by minimizing the number of comparisons between the query and the RDF dataset. We also emphasize the role of cost models in the selection of execution plans as well as index sets for a given workload. We provide an overview of the materialized view selection problem in relational databases and discuss its application for optimization of query processing. We introduce RDFMatView, a framework for answering SPARQL queries using materialized views as indexes. We provide algorithms to discover those indexes that can be used to process a given query and we develop different strategies to integrate these views in query execution plans. The selection of an efficient execution plan states the topic of our second major contribution. We introduce three different cost models designed for SPARQL query processing with materialized views. A detailed comparison of these models reveals that a model based on index and predicate statistics provides the most accurate cost estimation. We show that selecting an execution plan using this cost model yields a reduction of processing time with several orders of magnitude compared to standard SPARQL query processing. Finally, we propose a simple yet effective strategy for the materialized view selection problem applied to RDF data. Based on a given workload of SPARQL queries we provide algorithms for selecting a set of indexes that minimizes the workload processing time. We create a candidate index by retrieving all connected components from query patterns. Our evaluation shows that using the set of suggested indexes usually achieves larger runtime savings than other index sets regarding the given workload.
|
Page generated in 0.037 seconds