21.
Návrh postupu tvorby aplikace pro Linked Open Data / The proposal of application development process for Linked Open Data. Budka, Michal. January 2014 (has links)
This thesis deals with the issue of Linked Open Data. Its goal is to introduce the reader to this issue as a whole and to the possibility of using Linked Open Data for developing useful applications, by proposing a new development process focused on such applications. The theoretical part offers an insight into Open Data, Linked Open Data, and NoSQL database systems and their usability in this field. It focuses mainly on graph database systems and compares them with relational database systems using predefined criteria. Additionally, the thesis develops an application using the proposed development process: a tool for data presentation and statistical visualisation of open data sets published by the Supreme Audit Office and the Czech Trade Inspection. The application is developed mainly to verify the proposed development process and to demonstrate the connectivity of open data published by two different organizations. The thesis also covers the process of selecting a development methodology, which is then used to optimise work on the implementation of the resulting application, and the process of selecting a graph database system, which is used to store and modify open data for the purposes of the application.
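As a concrete illustration of the kind of application such a development process targets, the following minimal sketch fetches records from a Linked Open Data SPARQL endpoint using the SPARQLWrapper Python library. The endpoint URL, the vocabulary in the query, and the function name are hypothetical placeholders; the thesis' own application is not reproduced here.

```python
# A minimal sketch of consuming a Linked Open Data set over SPARQL.
# Uses the real SPARQLWrapper library; the endpoint URL and the
# vocabulary in the query are hypothetical placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint

def fetch_records(limit=10):
    """Fetch a few titled records and return them as plain dicts."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        PREFIX dct: <http://purl.org/dc/terms/>
        SELECT ?record ?title WHERE {{
            ?record dct:title ?title .
        }} LIMIT {limit}
    """)
    results = sparql.query().convert()
    return [
        {"record": b["record"]["value"], "title": b["title"]["value"]}
        for b in results["results"]["bindings"]
    ]

if __name__ == "__main__":
    for row in fetch_records():
        print(row["record"], "-", row["title"])
```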
22.
Why-Query Support in Graph Databases. Vasilyeva, Elena. 08 November 2016 (has links)
In the last few decades, database management systems have become powerful tools for storing large amounts of data and executing complex queries over them. In addition to extended functionality, novel types of databases have appeared, such as triple stores and distributed databases. Graph databases implementing the property-graph model belong to this development branch and provide a new way of storing and processing data in the form of a graph, with nodes representing entities and edges describing the connections between them. This makes them suitable for keeping data without a rigid schema, for use cases like social-network processing or data integration. In addition to flexible storage, graph databases provide new querying possibilities in the form of path queries, detection of connected components, pattern matching, etc.
However, schema flexibility and graph queries come with additional costs. With limited knowledge about the data and little experience in constructing complex queries, users can create queries that deliver unexpected results. Forced to debug queries manually and overwhelmed by the number of query constraints, users can become frustrated with graph databases. What is really needed is to improve the usability of graph databases by providing debugging and explanation functionality for such situations. We have to assist users in discovering the reasons for unexpected results and what can be done to fix them.
The unexpectedness of result sets can be expressed in terms of their size or their content. In the first case, users face the empty-answer, too-many-answers, or too-few-answers problems. In the second case, users care about the result content: they miss some expected answers or wonder about the presence of unexpected ones. Considering the typical problems of receiving no results or too many results when querying graph databases, this thesis focuses on the problems of the first group, whose solutions are usually represented by why-empty, why-so-few, and why-so-many queries. Our objective is to extend graph databases with debugging functionality in the form of why-queries for unexpected query results, using the example of pattern matching queries, one of the general graph-query types. We present a comprehensive analysis of existing debugging tools in state-of-the-art research and identify their common properties.
From these, we formulate the features of why-queries discussed in this thesis: holistic support of different cardinality-based problems, explanation of unexpected results and query reformulation, comprehensive analysis of explanations, and non-intrusive user integration. To support different cardinality-based problems, we develop methods for explaining no, too few, and too many results. To cover different kinds of explanations, we present two types: subgraph-based and modification-based explanations. The first type identifies the reasons for unexpectedness in terms of query subgraphs and delivers differential graphs as answers. The second reformulates queries in such a way that they produce better results. Considering graph queries to be complex structures with multiple constraints, we investigate different ways of generating explanations, starting from the most general one that considers only the query topology, through coarse-grained rewriting, up to fine-grained modification that allows fine changes of predicates and topology. To provide a comprehensive analysis of explanations, we propose comparing them on three levels: the syntactic description, the content, and the size of a result set. To deliver user-aware explanations, we discuss two models for non-intrusive user integration in the generation process.
With the techniques proposed in this thesis, we provide the fundamentals for debugging pattern-matching queries that deliver no, too few, or too many results in graph databases implementing the property-graph model.
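To make the idea of why-empty debugging concrete, here is a minimal, self-contained sketch (not the thesis' algorithm): given a pattern query that returns no matches over a small property graph, it searches for the largest subpatterns that still match, so the dropped edges point at the constraints responsible for the empty answer. The graph, labels, and pattern are invented for illustration.

```python
# A minimal sketch of subgraph-based why-empty explanations over a tiny
# property graph: when a pattern query matches nothing, find the largest
# subpatterns that still match; the dropped edges indicate which
# constraints caused the empty answer.
from itertools import combinations, permutations

# Data graph: node -> label, plus a set of directed labeled edges.
NODE_LABELS = {1: "Person", 2: "Person", 3: "City"}
EDGES = {(1, "knows", 2), (1, "livesIn", 3)}

# Query pattern: (source var, edge label, target var) with label constraints.
PATTERN = [("a", "knows", "b"), ("b", "livesIn", "c")]
VAR_LABELS = {"a": "Person", "b": "Person", "c": "City"}

def matches(subpattern):
    """Brute-force: does any variable assignment satisfy the subpattern?"""
    variables = sorted({v for s, _, t in subpattern for v in (s, t)})
    for nodes in permutations(NODE_LABELS, len(variables)):
        binding = dict(zip(variables, nodes))
        if all(NODE_LABELS[binding[v]] == VAR_LABELS[v] for v in variables) \
           and all((binding[s], lbl, binding[t]) in EDGES
                   for s, lbl, t in subpattern):
            return True
    return False

def why_empty(pattern):
    """Return the maximal matching subpatterns and the edges they drop."""
    for size in range(len(pattern) - 1, 0, -1):
        found = [set(sub) for sub in combinations(pattern, size) if matches(sub)]
        if found:
            return [(sub, set(pattern) - sub) for sub in found]
    return []

assert not matches(PATTERN)  # the full query delivers an empty answer
for kept, dropped in why_empty(PATTERN):
    print("matches once we drop:", dropped)
```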
23.
SynopSys: Foundations for Multidimensional Graph Analytics. Rudolf, Michael; Voigt, Hannes; Bornhövd, Christof; Lehner, Wolfgang. 02 February 2023 (has links)
The past few years have seen a tremendous increase in often irregularly structured data that can be represented most naturally and efficiently in the form of graphs. Making sense of incessantly growing graphs is not only a key requirement in applications like social media analysis or fraud detection but also a necessity in many traditional enterprise scenarios. Thus, a flexible approach for multidimensional analysis of graph data is needed. Whereas many existing technologies require up-front modelling of analytical scenarios and are difficult to adapt to changes, our approach allows for ad-hoc analytical queries of graph data. Extending our previous work on graph summarization, in this position paper we lay the foundation for large graph analytics to enable business intelligence on graph-structured data.
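The roll-up style of analysis this position paper argues for can be illustrated with a small graph-summarization sketch: nodes are grouped along a chosen attribute dimension and edges are aggregated into counts between the groups. The data and attribute names below are assumed for illustration; this is not the authors' implementation.

```python
# A minimal sketch of attribute-based graph summarization (a roll-up):
# group nodes by one attribute "dimension" and count the edges between
# groups. Data and attribute names are made up for illustration.
from collections import Counter

nodes = {
    "alice": {"country": "DE"}, "bob": {"country": "DE"},
    "carol": {"country": "FR"}, "dave": {"country": "FR"},
}
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "dave"),
         ("carol", "dave")]

def summarize(dimension):
    """Collapse nodes into groups along `dimension`, aggregate edge counts."""
    group = {n: attrs[dimension] for n, attrs in nodes.items()}
    super_nodes = Counter(group.values())              # group sizes
    super_edges = Counter((group[u], group[v]) for u, v in edges)
    return super_nodes, super_edges

sizes, links = summarize("country")
print(sizes)   # Counter({'DE': 2, 'FR': 2})
print(links)   # e.g. ('DE', 'FR'): 2 edges cross the two groups
```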
24.
Top-k Differential Queries in Graph Databases. Vasilyeva, Elena; Thiele, Maik; Bornhövd, Christof; Lehner, Wolfgang. 03 February 2023 (links)
The sheer volume as well as the schema complexity of today’s graph databases impede users in formulating queries against these databases and often cause queries to “fail” by delivering empty answers. To support users in such situations, the concept of differential queries can be used to bridge the gap between an unexpected result (e.g. an empty result set) and the query intention of users. These queries deliver the missing parts of a query graph and therefore apply to scenarios that require users to specify a query graph. Based on the discovered information about a missing query subgraph, users may understand which vertices and edges are the reasons for queries that unexpectedly return empty answers, and can reformulate the queries if needed. A study showed that the result sets of differential queries are often too large to be manually inspected by users, so the number of results must be reduced and the results ranked. To address these issues, we extend the concept of differential queries and introduce top-k differential queries, which calculate a ranking based on users’ preferences and thereby significantly support users in understanding the results delivered by the database management system. The idea is to let users assign relevance weights to the vertices or edges of a query graph; these weights steer the graph search and are used in the scoring function for top-k differential results. Along with the novel concept of top-k differential queries, we further propose a strategy for propagating relevance weights, and we model the search along the most relevant paths.
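A toy rendering of the ranking idea (with assumed weights and results, not the paper's system): each differential result, i.e. the part of the query graph it fails to match, is scored by the user-assigned relevance weights of the query edges it does preserve, and only the k best results are kept.

```python
# A minimal sketch of ranking differential results: each result is the
# set of query edges it could NOT match; user-assigned relevance weights
# score how much of the "important" part of the query a result preserves.
import heapq

# Hypothetical relevance weights on query edges (higher = more important).
weights = {"a-knows-b": 0.9, "b-livesIn-c": 0.4, "c-partOf-d": 0.2}

# Hypothetical differential results: the query edges each one is missing.
results = [
    {"missing": {"c-partOf-d"}},
    {"missing": {"b-livesIn-c", "c-partOf-d"}},
    {"missing": {"a-knows-b"}},
]

def score(result):
    """Total weight of the matched (non-missing) part of the query."""
    return sum(w for e, w in weights.items() if e not in result["missing"])

def top_k(results, k):
    return heapq.nlargest(k, results, key=score)

for r in top_k(results, 2):
    print(f"score={score(r):.1f}  missing={sorted(r['missing'])}")
```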
25.
Graphdatenbanken für die textorientierten e-Humanities. Efer, Thomas. 15 February 2017 (links) (PDF)
Vor dem Hintergrund zahlreicher Digitalisierungsinitiativen befinden sich weite Teile der Geistes- und Sozialwissenschaften derzeit in einer Transition hin zur großflächigen Anwendung digitaler Methoden. Zwischen den Fachdisziplinen und der Informatik zeigen sich große Differenzen in der Methodik und bei der gemeinsamen Kommunikation. Diese durch interdisziplinäre Projektarbeit zu überbrücken, ist das zentrale Anliegen der sogenannten e-Humanities. Da Text der häufigste Untersuchungsgegenstand in diesem Feld ist, wurden bereits viele Verfahren des Text Mining auf Problemstellungen der Fächer angepasst und angewendet. Während sich langsam generelle Arbeitsabläufe und Best Practices etablieren, zeigt sich, dass generische Lösungen für spezifische Teilprobleme oftmals nicht geeignet sind. Um für diese Anwendungsfälle maßgeschneiderte digitale Werkzeuge erstellen zu können, ist eines der Kernprobleme die adäquate digitale Repräsentation von Text sowie seinen vielen Kontexten und Bezügen.
In dieser Arbeit wird eine neue Form der Textrepräsentation vorgestellt, die auf Property-Graph-Datenbanken beruht – einer aktuellen Technologie für die Speicherung und Abfrage hochverknüpfter Daten. Darauf aufbauend wird das Textrecherchesystem „Kadmos“ vorgestellt, mit welchem nutzerdefinierte asynchrone Webservices erstellt werden können. Es bietet flexible Möglichkeiten zur Erweiterung des Datenmodells und der Programmfunktionalität und kann Textsammlungen mit mehreren hundert Millionen Wörtern auf einzelnen Rechnern und weitaus größere in Rechnerclustern speichern. Es wird gezeigt, wie verschiedene Text-Mining-Verfahren über diese Graphrepräsentation realisiert und an sie angepasst werden können. Die feine Granularität der Zugriffsebene erlaubt die Erstellung passender Werkzeuge für spezifische fachwissenschaftliche Anwendungen. Zusätzlich wird demonstriert, wie die graphbasierte Modellierung auch über die rein textorientierte Forschung hinaus gewinnbringend eingesetzt werden kann. / In light of the recent massive digitization efforts, most of the humanities disciplines are currently undergoing a fundamental transition towards the widespread application of digital methods. Between those traditional scholarly fields and computer science exists a methodological and communicational gap that the so-called "e-Humanities" aim to bridge systematically via interdisciplinary project work.
This thesis introduces a novel form of text representation that is based on Property Graph databases – an emerging technology that is used to store and query highly interconnected data sets. Based on this modeling paradigm, a new text research system called "Kadmos" is introduced. It provides user-definable asynchronous web services and is built to allow for a flexible extension of the data model and system functionality within a prototype-driven development process. With Kadmos it is possible to easily scale up to text collections containing hundreds of millions of words on a single device and even further when using a machine cluster. It is shown how various methods of Text Mining can be implemented with and adapted for the graph representation at a very fine granularity level, allowing the creation of fitting digital tools for different aspects of scholarly work. In extended usage scenarios it is demonstrated how the graph-based modeling of domain data can be beneficial even in research scenarios that go beyond a purely text-based study.
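The underlying modelling idea, text as a property graph, can be sketched in a few lines. The real networkx library stands in for a property-graph database here; the node and edge properties are illustrative assumptions, not Kadmos' actual schema.

```python
# A minimal sketch of a property-graph text representation: word types
# become nodes, token adjacency becomes edges carrying positions. The
# real networkx library stands in for a property-graph database here.
import networkx as nx

def text_to_graph(text):
    g = nx.MultiDiGraph()
    tokens = text.lower().split()
    for pos, (cur, nxt) in enumerate(zip(tokens, tokens[1:])):
        g.add_node(cur, kind="word")
        g.add_node(nxt, kind="word")
        g.add_edge(cur, nxt, relation="next", position=pos)
    return g

g = text_to_graph("the cat sat on the mat")
# Co-occurrence queries become simple graph traversals:
print(list(g.successors("the")))        # ['cat', 'mat']
print(g.number_of_edges("the", "cat"))  # 1
```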
26.
Supporting device-to-device search and sharing of hyper-localized data. Michel, Jonas Reinhardt. 08 September 2015 (links)
Supporting emerging mobile applications in densely populated environments requires connecting mobile users and their devices with the surrounding digital landscape. Specifically, the volume of digitally available data in such computing spaces creates a pressing need for expressive mechanisms that enable humans and applications to share and search for relevant information within their digitally accessible physical surroundings. Device-to-device communication will play a critical role in facilitating transparent access to proximate digital resources. A wide variety of approaches exist that support device-to-device dissemination and query-driven data access. Very few, however, capitalize on the contextual history of the shared data itself to distribute additional data or to guide queries. This dissertation presents Gander, an application substrate and mobile middleware designed to ease the burden of creating applications that share and search hyper-localized data in situ. Gander employs a novel trajectory-driven model of spatiotemporal provenance that enriches shared data with its contextual history: annotations that capture the data's geospatial and causal history across a lifetime of device-to-device propagation. We demonstrate the value of spatiotemporal data provenance both as a tool for improving ad hoc routing performance and as a driver of complex application behavior. This dissertation discusses the design and implementation of Gander's middleware model, which abstracts away tedious implementation details by enabling developers to write high-level rules that govern when, where, and how data is distributed, and to execute expressive queries across proximate digital resources. We evaluate Gander in several simulated large-scale environments and in one real-world deployment on the UT Austin campus. The goal of this research is to provide formal constructs, realized within a software framework, that ease the software engineering challenges encountered during the design and deployment of applications in emerging mobile environments.
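A minimal sketch of the trajectory-driven provenance idea follows; the data structures and method names are assumptions for illustration, not Gander's middleware API. Each datum accumulates (device, location, timestamp) hops as it propagates, and a routing or query layer can then inspect the trajectory.

```python
# A minimal sketch of trajectory-driven spatiotemporal provenance: each
# datum accumulates (device, location, time) hops as it propagates from
# device to device. Names and structure are illustrative assumptions.
import time
from dataclasses import dataclass, field

@dataclass
class Datum:
    payload: str
    provenance: list = field(default_factory=list)  # [(device, (lat, lon), t)]

    def hop(self, device, location):
        """Record one device-to-device propagation step."""
        self.provenance.append((device, location, time.time()))

    def distance_travelled(self):
        """Rough spatial spread of the trajectory (in degrees, for brevity)."""
        locs = [loc for _, loc, _ in self.provenance]
        return sum(abs(a[0] - b[0]) + abs(a[1] - b[1])
                   for a, b in zip(locs, locs[1:]))

d = Datum("road closed on 24th St")
d.hop("phone-1", (30.2882, -97.7355))
d.hop("phone-2", (30.2885, -97.7341))
# A query layer could now prefer data whose trajectory stayed nearby:
print(len(d.provenance), round(d.distance_travelled(), 4))
```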
27.
Contribution à la mise en oeuvre d’un outillage unifié pour faciliter la qualification d’environnements normés / Toward a unified tooling to ease the qualification process of standardized environments. Gelibert, Anthony. 27 October 2016 (links)
Les environnements confinés, tels que les blocs chirurgicaux ou les salles blanches, hébergent des processus complexes auxquels sont associés de nombreux risques. Leur conception, leur réalisation et leur exploitation sont complexes, de par les très nombreuses normes les encadrant. La qualification de ces « environnements normés », afin d’en garantir la qualité de conception, requiert une expertise fine du métier et souffre du manque d’outils en permettant l’automatisation. Partant de ce constat, nous proposons une approche unifiée visant à faciliter la qualification des environnements normés. Celle-ci s’appuie sur une représentation du contexte normatif sous la forme d’un graphe unique, ainsi que sur une modélisation de l’environnement et de son objet final par étapes successives, permettant une vérification incrémentale de même que la production d’informations nécessaires à la traçabilité lors de l’exploitation. Cette démarche, illustrée au travers du domaine des environnements confinés médicaux, est générique et peut s’appliquer à l’ensemble des environnements normés. / Industrial clean rooms or operating rooms are critical places, often hosting dangerous or complex processes. Their design, building, and use are thus difficult and constrained by a large number of standards and rules. Qualifying these environments, in order to ensure their quality, consequently requires a high level of expertise and suffers from a lack of assisting tools. This leads us to propose a unified approach aiming at easing the qualification process of standardized environments. It relies on a graph-based representation of the set of standards and rules that apply to a specific case, as well as on step-by-step modelling of the whole target environment. The verification process is then eased, as it becomes incremental. During each stage, relevant information can also be gathered in order to ensure environment traceability during its use. This approach, applied to medical environments for validation purposes, remains generic and can be applied to any kind of standardized environment.
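Purely as an illustration of the approach (the thesis does not publish a schema, so every rule, threshold, and name below is hypothetical), normative requirements can be held in a small rule graph and verified incrementally against an environment model, with a failed parent rule blocking its dependants:

```python
# An entirely illustrative sketch of checking an environment model
# against a graph of normative requirements: each rule node carries a
# predicate, and rules are verified in dependency order. All rules and
# thresholds here are made up.
rules = {
    "air-pressure": lambda env: env["pressure_pa"] > 101325,  # overpressure
    "air-changes": lambda env: env["ach"] >= 20,
    "door-interlock": lambda env: env["airlock"],
}
# Edges of the rule graph: a rule only holds once its parents hold.
depends_on = {"air-changes": ["air-pressure"], "door-interlock": []}

def qualify(env):
    """Verify rules in dependency order; a failed parent blocks children."""
    status = {}  # rule -> True (holds) / False (violated or blocked)
    def check(rule):
        if rule not in status:
            parents_ok = all(check(p) for p in depends_on.get(rule, []))
            status[rule] = parents_ok and rules[rule](env)
        return status[rule]
    for rule in rules:
        check(rule)
    return [r for r, ok in status.items() if not ok]

operating_room = {"pressure_pa": 101500, "ach": 25, "airlock": True}
print(qualify(operating_room) or "qualified")  # -> "qualified"
```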
28.
Flexible querying of RDF databases: a contribution based on fuzzy logic / Interrogation flexible de bases de données RDF : une contribution basée sur la logique floue. Slama, Olfa. 22 November 2017 (links)
Cette thèse porte sur la définition d'une approche flexible pour interroger des graphes RDF à la fois classiques et flous. Cette approche, basée sur la théorie des ensembles flous, permet d'étendre SPARQL, qui est le langage de requête standardisé W3C pour RDF, de manière à pouvoir exprimer i) des préférences utilisateur floues sur les données (par exemple, l'année de publication d'un album est récente) et sur la structure du graphe (par exemple, le chemin entre deux amis doit être court) et ii) des préférences utilisateur plus complexes, prenant la forme de propositions quantifiées floues (par exemple, la plupart des albums qui sont recommandés par un artiste sont très bien notés et ont été créés par un jeune ami de cet artiste). Nous avons effectué des expérimentations afin d'étudier les performances de cette approche. L'objectif principal de ces expérimentations était de montrer que le coût supplémentaire dû à l'introduction du flou reste limité/acceptable. Nous avons également étudié, dans un cadre plus général, celui des bases de données graphe, la question de l'intégration du même type de propositions quantifiées floues dans une extension floue de Cypher, qui est un langage déclaratif pour l'interrogation des bases de données graphe classiques. Les résultats expérimentaux obtenus montrent que le coût supplémentaire induit par la présence de conditions quantifiées floues dans les requêtes reste également très limité dans ce cas. / This thesis concerns the definition of a flexible approach for querying both crisp and fuzzy RDF graphs. This approach, based on the theory of fuzzy sets, makes it possible to extend SPARQL, the W3C-standardised query language for RDF, so as to express i) fuzzy user preferences on data (e.g., the release year of an album is recent) and on the structure of the data graph (e.g., the path between two friends is required to be short), and ii) more complex user preferences, namely fuzzy quantified statements (e.g., most of the albums recommended by an artist are highly rated and were created by a young friend of this artist). We performed experiments in order to study the performance of this approach; the main objective was to show that the extra cost due to the introduction of fuzziness remains limited/acceptable. We also investigated, in the more general framework of graph databases, the integration of the same type of fuzzy quantified statements into a fuzzy extension of Cypher, a declarative language for querying (crisp) graph databases. The reported experimental results show that the extra cost induced by the fuzzy quantified nature of the queries also remains very limited in this case.
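The flavour of fuzzy quantified statements can be shown with standard Zadeh-style fuzzy-set machinery; this is a generic sketch, not the SPARQL or Cypher extension defined in the thesis. The truth degree of "most of the recommended albums are highly rated" is the quantifier's membership value applied to the mean satisfaction degree:

```python
# A minimal Zadeh-style sketch of a fuzzy quantified statement, e.g.
# "most of the recommended albums are highly rated". Generic fuzzy-set
# machinery; membership functions and data are illustrative.
def highly_rated(stars, lo=3.0, hi=4.5):
    """Fuzzy predicate: degree in [0, 1], linear ramp between lo and hi."""
    return min(1.0, max(0.0, (stars - lo) / (hi - lo)))

def most(proportion):
    """Fuzzy quantifier 'most': ramps from 0 at 50% to 1 at 90%."""
    return min(1.0, max(0.0, (proportion - 0.5) / 0.4))

def truth_of_most(items, predicate):
    """Zadeh's interpretation: quantifier applied to the mean degree."""
    degrees = [predicate(x) for x in items]
    return most(sum(degrees) / len(degrees))

recommended_album_ratings = [4.8, 4.5, 4.0, 2.5, 4.6]
print(round(truth_of_most(recommended_album_ratings, highly_rated), 2))  # ~0.58
```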
29.
Vizualizace RDF dat ve webových prohlížečích / RDF Data Visualization in Web Browsers. Škrobánek, Kristián. January 2021 (links)
This diploma thesis focuses on graph database data visualization, where the data is stored in RDF format. Standard visualization of RDF data in tables does not offer a sufficiently usable view for users. One goal of this work is therefore to show RDF data in an interactive graph, which is an ideal form of viewing data in terms of lucidity and information value. The graph gives a good view not only of the data itself but also of the relationships between the data. Another goal is to test the ability of browsers to visualize large amounts of data.
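A sketch of the first step such a visualizer needs: converting RDF triples into the node-link JSON shape that browser force-directed layouts (e.g. d3-force) typically consume. It uses the real rdflib library; the sample data is invented, and the thesis' own pipeline may differ.

```python
# A minimal sketch of preparing RDF data for interactive graph display:
# parse triples with the real rdflib library and emit the node-link JSON
# shape that browser force-directed layouts commonly consume.
import json
from rdflib import Graph

TURTLE = """
@prefix ex: <http://example.org/> .
ex:alice ex:knows ex:bob .
ex:bob   ex:worksAt ex:acme .
"""

def to_node_link(turtle_data):
    g = Graph()
    g.parse(data=turtle_data, format="turtle")
    nodes = sorted({str(s) for s, _, _ in g} | {str(o) for _, _, o in g})
    links = [{"source": str(s), "target": str(o), "label": str(p)}
             for s, p, o in g]
    return {"nodes": [{"id": n} for n in nodes], "links": links}

print(json.dumps(to_node_link(TURTLE), indent=2))
```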
30.
Semantically-enabled stream processing and complex event processing over RDF graph streams / Traitement de flux sémantiquement activé et traitement d'évènements complexes sur des flux de graphe RDF. Gillani, Syed. 04 November 2016 (links)
Résumé en français non fourni par l'auteur. / There is a paradigm shift in the nature and processing of today’s data: data used to be mostly static and stored in large databases to be queried. Today, with the advent of new applications and means of collecting data, most applications on the Web and in enterprises produce data in a continuous manner, in the form of streams, and their users expect to process large volumes of data with fresh, low-latency results. This has led to Data Stream Management Systems (DSMSs) and the Complex Event Processing (CEP) paradigm, both with distinctive aims: DSMSs are mostly employed to process traditional query operators (mostly stateless), while CEP systems focus on temporal pattern matching (stateful operators) to detect changes in the data that can be thought of as events. In the past decade or so, a number of scalable and performance-intensive DSMSs and CEP systems have been proposed. Most of them, however, are based on relational data models, which begs the question of support for heterogeneous data sources, i.e., the variety of the data. Work on RDF stream processing (RSP) systems partly addresses the challenge of variety by promoting the RDF data model. Nonetheless, challenges like volume and velocity are overlooked by existing approaches. These challenges require customised optimisations that treat RDF as a first-class citizen and scale the process of continuous graph pattern matching. To gain insights into these problems, this thesis focuses on developing scalable RDF graph stream processing and semantically-enabled CEP systems (i.e., Semantic Complex Event Processing, SCEP). In addition to our optimised algorithmic and data-structure methodologies, we also contribute to the design of a new query language for SCEP. Our contributions in these two fields are as follows:
• RDF Graph Stream Processing. We first propose an RDF graph stream model, where each data item/event within a stream comprises an RDF graph (a set of RDF triples). Second, we implement customised indexing techniques and data structures to continuously process RDF graph streams in an incremental manner.
• Semantic Complex Event Processing. We extend the idea of RDF graph stream processing to enable SCEP over such streams, i.e., temporal pattern matching. Our first contribution in this context is a new query language that encompasses the RDF graph stream model and employs a set of expressive temporal operators such as sequencing, Kleene-+, negation, optional, conjunction, disjunction, and event selection strategies. Based on this, we implement a scalable system that employs a non-deterministic finite automata model to evaluate these operators in an optimised manner.
We leverage techniques from diverse fields, such as relational query optimisation, incremental query processing, and sensor and social networks, in order to solve real-world problems. We have applied the proposed techniques to a wide range of real-world and synthetic datasets to extract knowledge from RDF-structured data in motion. Our experimental evaluations confirm our theoretical insights and demonstrate the viability of the proposed methods.
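The automaton-based evaluation of a temporal operator can be illustrated compactly; the following is a generic CEP sketch, not the thesis' engine. A SEQ(A, B) pattern is evaluated by advancing partial runs state by state as events stream in, with skip-till-any-match semantics:

```python
# A minimal sketch of NFA-style evaluation of a CEP sequence pattern
# SEQ(A, B): partial runs advance state by state as events arrive, and a
# run that consumes the full pattern emits a complex-event match.
def seq_matcher(pattern):
    """pattern: list of predicates; returns a feed() yielding matches."""
    runs = []  # each run: (next state index, events consumed so far)
    def feed(event):
        nonlocal runs
        advanced = []
        for state, consumed in runs + [(0, ())]:  # also try starting a run
            if pattern[state](event):
                nxt = (state + 1, consumed + (event,))
                if nxt[0] == len(pattern):
                    yield nxt[1]          # full match: emit complex event
                else:
                    advanced.append(nxt)
        runs = runs + advanced            # keep old runs (skip-till-any)
    return feed

# Hypothetical RDF-ish events: (subject, predicate, object) triples.
is_alert = lambda e: e[1] == "status" and e[2] == "alert"
is_fix   = lambda e: e[1] == "status" and e[2] == "fixed"
feed = seq_matcher([is_alert, is_fix])

stream = [("s1", "status", "alert"), ("s1", "temp", "high"),
          ("s1", "status", "fixed")]
for event in stream:
    for match in feed(event):
        print("SEQ matched:", match)
```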