11

Graph Processing in Main-Memory Column Stores

Paradies, Marcus 29 May 2017 (has links) (PDF)
Ever more novel and traditional business applications leverage the advantages of a graph data model, such as the offered schema flexibility and an explicit representation of relationships between entities. As a consequence, companies are confronted with the challenge of storing, manipulating, and querying terabytes of graph data for enterprise-critical applications. Although these business applications operate on graph-structured data, they still require direct access to the relational data and typically rely on an RDBMS to keep a single source of truth and access. Existing solutions performing graph operations on business-critical data either use a combination of SQL and application logic or employ a graph data management system. For the first approach, relying solely on SQL results in poor execution performance caused by the functional mismatch between typical graph operations and the relational algebra. To make matters worse, graph algorithms exhibit a tremendous variety in structure and functionality caused by their often domain-specific implementations and can therefore hardly be integrated into a database management system other than through custom coding. Since the majority of these enterprise-critical applications run exclusively on relational DBMSs, employing a specialized system for storing and processing graph data is typically not sensible. Besides the maintenance overhead for keeping the systems in sync, combining graph and relational operations is hard to realize as it requires data transfer across system boundaries. Traversal operations are a basic ingredient of graph queries and algorithms and a fundamental component of any database management system that aims to store, manipulate, and query graph data. Well-established graph traversal algorithms are standalone implementations relying on optimized data structures. Integrating graph traversals as an operator into a database management system requires a tight integration into the existing database environment and the development of new components, such as a graph topology-aware optimizer and accompanying graph statistics, graph-specific secondary index structures to speed up traversals, and an accompanying graph query language. In this thesis, we introduce and describe GRAPHITE, a hybrid graph-relational data management system. GRAPHITE is a performance-oriented graph data management system integrated into an RDBMS, allowing graph data to be processed seamlessly together with relational data in the same system. We propose a columnar storage representation for graph data to leverage the already existing and mature data management and query processing infrastructure of relational database management systems. At the core of GRAPHITE we propose an execution engine solely based on set operations and graph traversals. Our design is driven by the observation that different graph topologies place different algorithmic requirements on the design of a graph traversal operator. We derive two graph traversal implementations targeting the most common graph topologies and demonstrate how graph-specific statistics can be leveraged to select the optimal physical traversal operator. To accelerate graph traversals, we devise a set of graph-specific, updateable secondary index structures to improve the performance of vertex neighborhood expansion. Finally, we introduce a domain-specific language with an intuitive programming model to extend graph traversals with custom application logic at runtime.
We use the LLVM compiler framework to generate efficient code that tightly integrates the user-specified application logic with our highly optimized built-in graph traversal operators. Our experimental evaluation shows that GRAPHITE can outperform native graph management systems by several orders of magnitude while providing all the features of an RDBMS, such as transaction support, backup and recovery, security and user management, effectively providing a promising alternative to specialized graph management systems that lack many of these features and require expensive data replication and maintenance processes.
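The set-based traversal described above can be illustrated with a minimal sketch in Python: edges are stored as two parallel columns, and breadth-first expansion is expressed as repeated frontier expansion plus set difference. The column layout and function names here are illustrative assumptions, not GRAPHITE's actual operator implementation.

```python
# Hedged sketch: a level-synchronous traversal over a columnar edge list,
# in the spirit of the set-based traversal operator described above.
from collections import defaultdict

def build_adjacency(src_col, dst_col):
    """Group the two edge columns into an adjacency index (vertex -> neighbors)."""
    adjacency = defaultdict(list)
    for s, d in zip(src_col, dst_col):
        adjacency[s].append(d)
    return adjacency

def traverse(src_col, dst_col, start_vertices, max_hops=None):
    """Breadth-first expansion expressed as repeated set operations:
    expand the frontier, then subtract already-visited vertices."""
    adjacency = build_adjacency(src_col, dst_col)
    visited = set(start_vertices)
    frontier = set(start_vertices)
    hops = 0
    while frontier and (max_hops is None or hops < max_hops):
        # Neighborhood expansion of the whole frontier in one step.
        expanded = {d for v in frontier for d in adjacency.get(v, ())}
        frontier = expanded - visited          # set difference prunes revisits
        visited |= frontier
        hops += 1
    return visited

# Toy usage: edges stored as two parallel columns.
src = [0, 0, 1, 2, 3]
dst = [1, 2, 3, 3, 4]
print(traverse(src, dst, {0}, max_hops=2))    # -> {0, 1, 2, 3}
```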
12

Why-Query Support in Graph Databases

Vasilyeva, Elena 28 March 2017 (has links) (PDF)
In the last few decades, database management systems have become powerful tools for storing large amounts of data and executing complex queries over them. In addition to extended functionality, novel types of databases have appeared, such as triple stores, distributed databases, etc. Graph databases implementing the property-graph model belong to this development branch and provide a new way of storing and processing data in the form of a graph, with nodes representing entities and edges describing the connections between them. This makes them suitable for keeping data without a rigid schema for use cases like social-network processing or data integration. In addition to flexible storage, graph databases provide new querying possibilities in the form of path queries, detection of connected components, pattern matching, etc. However, the schema flexibility and graph queries come with additional costs. With limited knowledge about the data and little experience in constructing complex queries, users can create queries that deliver unexpected results. Forced to debug queries manually and overwhelmed by the number of query constraints, users can become frustrated with graph databases. What is really needed is to improve the usability of graph databases by providing debugging and explanation functionality for such situations. We have to assist users in discovering the reasons for unexpected results and what can be done to fix them. The unexpectedness of result sets can be expressed in terms of their size or content. In the first case, users have to solve the empty-answer, too-many-, or too-few-answers problems. In the second case, users care about the result content and miss some expected answers or wonder about the presence of unexpected ones. Considering the typical problems of receiving no or too many results when querying graph databases, in this thesis we focus on investigating the problems of the first group, whose solutions are usually represented by why-empty, why-so-few, and why-so-many queries. Our objective is to extend graph databases with debugging functionality in the form of why-queries for unexpected query results, using pattern matching queries, one of the general graph-query types, as an example. We present a comprehensive analysis of existing debugging tools in state-of-the-art research and identify their common properties. From these, we formulate the following features of why-queries discussed in this thesis: holistic support of different cardinality-based problems, explanation of unexpected results and query reformulation, comprehensive analysis of explanations, and non-intrusive user integration. To support different cardinality-based problems, we develop methods for explaining no, too few, and too many results. To cover different kinds of explanations, we present two types: subgraph-based and modification-based explanations. The first type identifies the reasons for unexpectedness in terms of query subgraphs and delivers differential graphs as answers. The second one reformulates queries in such a way that they produce better results. Considering graph queries to be complex structures with multiple constraints, we investigate different ways of generating explanations, starting from the most general one that considers only the query topology, through coarse-grained rewriting, up to fine-grained modification that allows fine changes of predicates and topology. To provide a comprehensive analysis of explanations, we propose to compare them on three levels: the syntactic description, the content, and the size of a result set. In order to deliver user-aware explanations, we discuss two models for non-intrusive user integration in the generation process. With the techniques proposed in this thesis, we provide the fundamentals for debugging pattern-matching queries that deliver no, too few, or too many results in graph databases implementing the property-graph model.
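As a toy illustration of the subgraph-based explanations mentioned above, the following sketch drops edges from a small pattern until some subpattern matches and reports the removed edges as the explanation for an empty answer. The data model and the brute-force matching are simplifying assumptions for illustration, not the algorithms developed in the thesis.

```python
# Hedged sketch of a subgraph-based why-empty explanation on a tiny directed graph.
from itertools import combinations, permutations

def matches(graph_edges, pattern_edges):
    """Naive pattern matching: is there an injective mapping of pattern
    vertices to graph vertices that preserves all pattern edges?"""
    g_nodes = {n for e in graph_edges for n in e}
    p_nodes = sorted({n for e in pattern_edges for n in e})
    for assign in permutations(g_nodes, len(p_nodes)):
        mapping = dict(zip(p_nodes, assign))
        if all((mapping[a], mapping[b]) in graph_edges for a, b in pattern_edges):
            return True
    return False

def why_empty(graph_edges, pattern_edges):
    """Return a maximal matching subpattern plus the edges whose removal
    made it match (the differential explanation)."""
    pattern_edges = list(pattern_edges)
    for size in range(len(pattern_edges) - 1, 0, -1):
        for sub in combinations(pattern_edges, size):
            if matches(graph_edges, sub):
                removed = [e for e in pattern_edges if e not in sub]
                return list(sub), removed
    return [], pattern_edges

graph = {("a", "b"), ("b", "c")}
pattern = [("x", "y"), ("y", "z"), ("z", "x")]   # a cycle: no match in the path graph
kept, removed = why_empty(graph, pattern)
print("matching subpattern:", kept, "| culprit edges:", removed)
```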
13

Effiziente Schemamigration in der modellgetriebenen Datenbankanwendungsentwicklung

Claußnitzer, Ralf 30 May 2008 (has links) (PDF)
The term MDA (Model Driven Architecture) refers to a method of specifying applications within the framework of UML and producing executable program code through automatic generation. In this context, the Chair of Databases runs the GignoMDA project, which deals with the model-driven development of database applications. As an essential part of any such application, however, data models, just like the application architecture itself, are subject to adaptations to changed objectives and environmental conditions. Transferring existing data into newly generated target systems therefore becomes a necessary part of a fully model-driven approach. This thesis presents a concept for schema and data migration in the evolution of application database models. Following the MDA approach, data migrations are expressed as models in UML and subsequently used for the automatic generation of platform-specific migration models. From these migration models, database-technology-based programs (ETL, stored procedures) for the efficient execution of migrations can be generated.
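To make the model-to-code generation idea concrete, here is a minimal, hypothetical sketch that derives ALTER TABLE statements from the difference between an old and a new table model. The diff rules and the generated SQL are illustrative assumptions, not GignoMDA's actual transformation chain.

```python
# Hedged sketch of the model-driven idea: compare a source and a target table
# model and generate a platform-specific migration script from the difference.

def generate_migration(table, old_columns, new_columns):
    """Emit ALTER TABLE statements for added and dropped columns."""
    statements = []
    for name, sql_type in new_columns.items():
        if name not in old_columns:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type};")
    for name in old_columns:
        if name not in new_columns:
            statements.append(f"ALTER TABLE {table} DROP COLUMN {name};")
    return statements

# Toy models: the 'name' column is replaced and a timestamp is added.
old_model = {"id": "INTEGER", "name": "VARCHAR(100)"}
new_model = {"id": "INTEGER", "full_name": "VARCHAR(200)", "created_at": "TIMESTAMP"}

for stmt in generate_migration("customer", old_model, new_model):
    print(stmt)
```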
14

Probabilistic information retrieval in a distributed heterogeneous environment

Baumgarten, Christoph 01 October 1999 (has links) (PDF)
This thesis describes a probabilistic model for optimum information retrieval in a distributed heterogeneous environment. The model assumes the collection of documents offered by the environment to be hierarchically partitioned into subcollections. Documents as well as subcollections have to be indexed. For this, indexing methods using different indexing vocabularies can be employed. A query provided by a user is answered in terms of a ranked list of documents. The model determines a procedure for ranking the documents that stems from the Probability Ranking Principle: for each subcollection, its elements are ranked; the resulting ranked lists are combined into a final ranked list of documents whose ordering is determined by the documents' probabilities of being relevant with respect to the user's query. Various probabilistic ranking methods may be involved in the distributed ranking process. The underlying data volume is arbitrarily scalable. A criterion for effectively limiting the ranking process to a subset of subcollections extends the model. The model's applicability is experimentally confirmed. When the degrees of freedom provided by the model are exploited, the experiments showed evidence that the model even outperforms comparable models for the non-distributed case with respect to retrieval effectiveness. An architecture for a distributed information retrieval system that realizes the probabilistic model is presented. The system provides access to an arbitrary number of dynamic multimedia databases.
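The final merging step implied by the Probability Ranking Principle can be sketched as follows: each subcollection contributes a locally ranked list with estimated probabilities of relevance, and the global result list is obtained by sorting the union of those lists by probability. The scores and collection names below are made up for illustration.

```python
# Hedged sketch of the final ranking step described in the abstract.

def merge_rankings(subcollection_results):
    """subcollection_results: dict mapping a subcollection id to a list of
    (document_id, probability_of_relevance) pairs."""
    merged = [
        (doc_id, prob, source)
        for source, ranking in subcollection_results.items()
        for doc_id, prob in ranking
    ]
    # Probability Ranking Principle: order by decreasing probability of relevance.
    return sorted(merged, key=lambda item: item[1], reverse=True)

results = {
    "collection_A": [("docA1", 0.92), ("docA2", 0.40)],
    "collection_B": [("docB1", 0.75), ("docB2", 0.31)],
}
for doc_id, prob, source in merge_rankings(results):
    print(f"{doc_id}\t{prob:.2f}\t(from {source})")
```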
15

Predictive Data Analytics for Energy Demand Flexibility

Neupane, Bijay 12 June 2018 (has links) (PDF)
Depleting fossil fuels and environmental concerns have created a revolutionary movement towards the installation and utilization of Renewable Energy Sources (RES) such as wind and solar energy. RES entail challenges, both regarding the physical integration into a grid system and regarding the management of the expected demand. Flexibility in energy demand can facilitate the alignment of supply and demand to achieve dynamic Demand Response (DR). The flexibility is often not explicitly available or provided by a user and has to be analyzed and extracted automatically from historical consumption data. Predictive analytics of consumption data can reveal interesting patterns and periodicities that facilitate the effective extraction and representation of flexibility. Device-level analysis captures the atomic flexibilities in energy demand and provides the largest possible solution space to generate demand/supply schedules. The presence of stochasticity and noise in the device-level consumption data and the unavailability of contextual information make the analytics task challenging. Hence, it is essential to design predictive analytical techniques that work at an atomic data granularity and to perform various analyses on the effectiveness of the proposed techniques. The Ph.D. study is sponsored by the TotalFlex Project (http://www.totalflex.dk/) and is part of the IT4BI-DC program with Aalborg University and TU Dresden as Home and Host University, respectively. The main objective of the TotalFlex project is to develop a cost-effective, market-based system that utilizes total flexibility in energy demand and provides financial and environmental benefits to all involved parties. The flexibilities from various devices are modeled using a unified format called a flex-offer, which facilitates, e.g., aggregation and trading in the energy market. In this regard, this Ph.D. study focuses on the predictive analytics of the historical device operation behavior of consumers for an efficient and effective extraction of flexibilities in their energy demands. First, the thesis performs a comprehensive survey of state-of-the-art work in the literature. It presents a critical review and analysis of various previously proposed approaches, algorithms, and methods in the field of user behavior analysis, forecasting, and flexibility analysis. Then, the thesis details the flexibility and flex-offer concepts and formally discusses the terminology used throughout the thesis. Second, the thesis contributes a comprehensive analysis of energy consumption behavior at the device level. The key motive of the analysis is to extract device operation patterns of users, the correlations between device operations, and the influence of external factors on device-level demands. A novel cost/benefit trade-off analysis of device flexibility is performed to categorize devices into various segments according to their flexibility potential. Moreover, device-specific data preprocessing steps are proposed to clean device-level raw data into a format suitable for flexibility analysis. Third, the thesis presents various prediction models that are specifically tuned for device-level energy demand prediction. Further, it contributes to the feature engineering aspect of generating additional features from a demand consumption time series that effectively capture device operation preferences and patterns. The demand predictions utilize the carefully crafted features and other contextual information to improve the performance of the prediction models. Further, various demand prediction models are evaluated to determine the model, forecast horizon, and data granularity best suited for device-level flexibility analysis. Furthermore, the effect of forecast accuracy on flexibility-based DR is evaluated to identify the error level a market can absorb while maintaining profitability. Fourth, the thesis proposes a generalized process for the automated generation and evaluation of flex-offers from three types of household devices, namely Wet-devices, Electric Vehicles (EV), and Heat Pumps. The proposed process automatically predicts and estimates the times and values of device-specific events representing flexibility in their operations. The predicted events are combined to generate flex-offers for the devices' future operations. Moreover, the actual flexibility potential of household devices is quantified for various contextual conditions and degree days. Fifth, the thesis presents user-comfort-oriented prescriptive techniques to prescribe flex-offer schedules. The proposed scheduler considers the trade-off between social and financial aspects during the scheduling of flex-offers, i.e., maximizing the financial benefits in a market while minimizing the loss of user comfort. Moreover, it also provides a distance-aware error measure that quantifies the actual performance of forecast models designed for flex-offer generation and scheduling. Sixth, the thesis contributes a comprehensive analysis of the financial viability of device-level flexibility for the dynamic balancing of demand and supply. The thesis quantifies the financial benefits of flexibility and investigates the device-type-specific market that maximizes the potential of flexibility, both regarding DR and financial incentives. To this end, a financial analysis of each proposed technique, namely the forecast model, the flex-offer generation model, and flex-offer scheduling, is performed. The key motive is to evaluate the usability of the proposed models in the device-level flexibility-based DR scheme and their potential in generating a positive financial incentive for markets and customers. Seventh, the thesis presents a benchmark platform for device-level demand prediction. The platform provides the research community with a centralized repository of device-level datasets, forecast models, and functionalities that facilitate comparisons, evaluations, and validation of device-level forecast models. The results of the thesis can help the energy market materialize the vision of utilizing consumption and production flexibility to obtain a dynamic energy balance. The developed demand forecast and flex-offer generation models also contribute to the energy data analytics and data mining fields. The quantification of flexibility further contributes by demonstrating the feasibility and financial benefits of flexibility-based DR. The developed experimental platform provides researchers and practitioners with the resources required for device-level demand analytics and prediction.
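A flex-offer, as described above, essentially couples a fixed consumption profile with a time window in which it may start. The following sketch shows one plausible representation and a price-based scheduler that picks the cheapest feasible start hour; the field names and scheduling rule are illustrative assumptions, not the TotalFlex flex-offer format.

```python
# Hedged sketch of the flex-offer idea: a device operation with a fixed energy
# profile that may start anywhere inside a time window, scheduled at the
# cheapest feasible start hour.
from dataclasses import dataclass

@dataclass
class FlexOffer:
    earliest_start: int          # earliest feasible start hour
    latest_start: int            # latest feasible start hour
    profile_kwh: list            # energy per hour once started

def schedule(offer, hourly_prices):
    """Pick the start hour inside the flexibility window with minimal cost."""
    best_start, best_cost = None, float("inf")
    for start in range(offer.earliest_start, offer.latest_start + 1):
        slots = hourly_prices[start:start + len(offer.profile_kwh)]
        if len(slots) < len(offer.profile_kwh):
            continue                               # would run past the horizon
        cost = sum(p * e for p, e in zip(slots, offer.profile_kwh))
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost

# A dishwasher-like load: 1.2 kWh then 0.8 kWh, startable between hours 18 and 23.
offer = FlexOffer(earliest_start=18, latest_start=23, profile_kwh=[1.2, 0.8])
prices = [0.30] * 18 + [0.35, 0.32, 0.22, 0.20, 0.21, 0.28]  # EUR/kWh, 24 hours
print(schedule(offer, prices))   # cheapest feasible start hour and its cost
```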
16

Query Answering in Probabilistic Data and Knowledge Bases

Ceylan, Ismail Ilkan 04 June 2018 (has links) (PDF)
Probabilistic data and knowledge bases are becoming increasingly important in academia and industry. They are continuously extended with new data, powered by modern information extraction tools that associate probabilities with knowledge base facts. The state of the art for storing and processing such data is founded on probabilistic database systems, which are widely and successfully employed. Beyond all the success stories, however, such systems still lack the fundamental machinery to convey some of the valuable knowledge hidden in them to the end user, which limits their potential applications in practice. In particular, in their classical form, such systems are typically based on strong, unrealistic limitations, such as the closed-world assumption, the closed-domain assumption, the tuple-independence assumption, and the lack of commonsense knowledge. These limitations not only lead to unwanted consequences but also put such systems on a weak footing in important tasks, query answering being a very central one. In this thesis, we enhance probabilistic data and knowledge bases with more realistic data models, thereby allowing for better means of querying them. Building on the long endeavor of unifying logic and probability, we develop different rigorous semantics for probabilistic data and knowledge bases, analyze their computational properties, identify sources of (in)tractability, and design practical, scalable query answering algorithms whenever possible. To achieve this, the current work brings together some recent paradigms from logics, probabilistic inference, and database theory.
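The tuple-independence assumption mentioned above can be made concrete with a small worked example: every tuple is present independently with its own probability, and the probability of a Boolean query is the total weight of the possible worlds that satisfy it. The relations and probabilities below are invented for illustration.

```python
# Hedged sketch: probability of a Boolean join query over a tuple-independent
# probabilistic database, computed by brute-force possible-world enumeration.
from itertools import product

# R(a, b) and S(b, c) with per-tuple probabilities (toy data).
R = [(("a1", "b1"), 0.8), (("a2", "b2"), 0.5)]
S = [(("b1", "c1"), 0.6), (("b2", "c1"), 0.9)]

def query_holds(r_world, s_world):
    """Boolean conjunctive query: does some R(a, b) join with some S(b, c)?"""
    return any(rb == sb for (_, rb) in r_world for (sb, _) in s_world)

def query_probability():
    tuples = R + S
    total = 0.0
    # Enumerate all possible worlds (exponential, fine for a toy example).
    for included in product([True, False], repeat=len(tuples)):
        world_prob = 1.0
        r_world, s_world = [], []
        for (tup, prob), present in zip(tuples, included):
            world_prob *= prob if present else (1.0 - prob)
            if present:
                (r_world if (tup, prob) in R else s_world).append(tup)
        if query_holds(r_world, s_world):
            total += world_prob
    return total

print(round(query_probability(), 4))   # 1 - (1 - 0.8*0.6) * (1 - 0.5*0.9) = 0.714
```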
17

A Family of Role-Based Languages

Kühn, Thomas 29 August 2017 (has links) (PDF)
Role-based modeling was proposed in 1977 by Charles W. Bachman as a means to model complex and dynamic domains, because roles are able to capture both the context-dependent and the collaborative behavior of objects. Consequently, roles were introduced in various fields of research, ranging from data modeling via conceptual modeling through to programming languages. More importantly, because current software systems are characterized by increased complexity and context-dependence, there is a strong demand for new concepts beyond object-oriented design. Although mainstream modeling languages, e.g., the Entity-Relationship Model and the Unified Modeling Language, are good at capturing a system's structure, they lack ways to model the system's behavior as it dynamically emerges through collaborating objects. In turn, roles are a natural concept for capturing the behavior of participants in a collaboration. Moreover, roles permit the specification of interactions independently from the interacting objects. Similarly, more recent approaches use roles to capture context-dependent properties of objects. The notion of roles can thus help to tame the increased complexity and context-dependence. Despite all that, years of research have had almost no influence on current software development practice. To make things worse, there is still no common understanding of roles in the research community, and no approach fully incorporates both the context-dependent and the relational nature of roles. In this thesis, I devise a formal model for a family of role-based modeling languages to capture the various notions of roles. Together with a software product line of Role Modeling Editors, this in turn enables the generation of a role-based language family for Role-based Software Infrastructures (RoSI).
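For readers unfamiliar with roles, a minimal sketch in plain Python can convey the core idea: an object acquires context-dependent state and behavior only while it plays a role inside a compartment-like context. This merely illustrates the concept and is not the formal language family developed in the thesis.

```python
# Hedged sketch of the role concept: a Person plays the Student role
# inside a University context; the role-specific attribute lives on the
# role object, not on the player.

class Person:
    def __init__(self, name):
        self.name = name

class Student:                       # a role type, never useful on its own
    def __init__(self, player, matriculation_number):
        self.player = player                      # the object playing the role
        self.matriculation_number = matriculation_number

class University:                    # the context (compartment) in which roles live
    def __init__(self, name):
        self.name = name
        self.students = []

    def enroll(self, person, matriculation_number):
        role = Student(person, matriculation_number)
        self.students.append(role)
        return role

alice = Person("Alice")
tud = University("TU Dresden")
role = tud.enroll(alice, 4711)
# The same person can play different roles in different contexts.
print(role.player.name, "is enrolled at", tud.name, "with number", role.matriculation_number)
```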
18

Recovering the Semantics of Tabular Web Data

Braunschweig, Katrin 26 October 2015 (has links) (PDF)
The Web provides a platform for people to share their data, leading to an abundance of accessible information. In recent years, significant research effort has been directed especially at tables on the Web, which form a rich resource for factual and relational data. Applications such as fact search and knowledge base construction benefit from this data, as it is often less ambiguous than unstructured text. However, many traditional information extraction and retrieval techniques are not well suited for Web tables, as they generally do not consider the role of the table structure in reflecting the semantics of the content. Tables provide a compact representation of similarly structured data. Yet, on the Web, tables are very heterogeneous, often with ambiguous semantics and inconsistencies in the quality of the data. Consequently, recognizing the structure and inferring the semantics of these tables is a challenging task that requires a designated table recovery and understanding process. In the literature, many important contributions have been made to implement such a table understanding process that specifically targets Web tables, addressing tasks such as table detection or header recovery. However, the precision and coverage of the data extracted from Web tables is often still quite limited. Due to the complexity of Web table understanding, many techniques developed so far make simplifying assumptions about the table layout or content to limit the number of contributing factors that must be considered. Thanks to these assumptions, many sub-tasks become manageable. However, the resulting algorithms and techniques often have a limited scope, leading to imprecise or inaccurate results when applied to tables that do not conform to these assumptions. In this thesis, our objective is to extend the Web table understanding process with techniques that enable some of these assumptions to be relaxed, thus improving the scope and accuracy. We have conducted a comprehensive analysis of tables available on the Web to examine the characteristic features of these tables, but also to identify unique challenges that arise from these characteristics in the table understanding process. To extend the scope of the table understanding process, we introduce extensions to the sub-tasks of table classification and conceptualization. First, we review various table layouts and evaluate alternative approaches to incorporate layout classification into the process. Instead of assuming a single, uniform layout across all tables, recognizing different table layouts enables a wide range of tables to be analyzed in a more accurate and systematic fashion. In addition to the layout, we also consider the conceptual level. To relax the single-concept assumption, which expects all attributes in a table to describe the same semantic concept, we propose a semantic normalization approach. By decomposing multi-concept tables into several single-concept tables, we further extend the range of Web tables that can be processed correctly, enabling existing techniques to be applied without significant changes. Furthermore, we address the quality of data extracted from Web tables by studying the role of context information. Supplementary information from the context is often required to correctly understand the table content; however, the verbosity of the surrounding text can also mislead table relevance decisions.
We first propose a selection algorithm to evaluate the relevance of context information with respect to the table content in order to reduce the noise. Then, we introduce a set of extraction techniques to recover attribute-specific information from the relevant context in order to provide a richer description of the table content. With the extensions proposed in this thesis, we increase the scope and accuracy of Web table understanding, leading to a better utilization of the information contained in tables on the Web.
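The semantic normalization step described above can be sketched with a toy example: given a column-to-concept mapping (assumed here), a multi-concept table is decomposed into one single-concept projection per concept. Inferring that mapping automatically is the actual contribution of the thesis; the code below only illustrates the decomposition itself.

```python
# Hedged sketch of semantic normalization: split a multi-concept Web table
# into single-concept tables based on a given column-to-concept mapping.

def decompose(rows, header, column_concepts):
    """Split a table into one projection per concept, keyed by concept name."""
    tables = {}
    for concept in set(column_concepts.values()):
        cols = [i for i, name in enumerate(header) if column_concepts[name] == concept]
        tables[concept] = {
            "header": [header[i] for i in cols],
            "rows": [[row[i] for i in cols] for row in rows],
        }
    return tables

# A multi-concept table mixing film and director attributes (toy data).
header = ["film", "release_year", "director", "director_birth_year"]
rows = [
    ["Metropolis", "1927", "Fritz Lang", "1890"],
    ["M", "1931", "Fritz Lang", "1890"],
]
mapping = {"film": "Film", "release_year": "Film",
           "director": "Director", "director_birth_year": "Director"}

for concept, table in decompose(rows, header, mapping).items():
    print(concept, table["header"], table["rows"])
```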
19

Linked Enterprise Data als semantischer, integrierter Informationsraum für die industrielle Datenhaltung / Linked Enterprise Data as semantic and integrated information space for industrial data

Graube, Markus 01 June 2018 (has links) (PDF)
Increasing collaboration in production networks and increased flexibility in planning and production processes are responses to the increased demands on industry regarding agility and the introduction of value-added services. A solution is the digitalisation of all processes and a deeper connectivity to the information resources of partners. However, today's information systems are not able to meet the requirements of such an integrated, distributed information space. A promising candidate is Linked Data, which comes from the Semantic Web area. Based on this approach, Linked Enterprise Data was developed, which expands the existing tools and processes so that an information space emerges that is usable and flexible for industry. The core idea is to raise information from legacy tools to a semantic level, link it directly on the data level, even across organizational boundaries, and make it securely available for queries. Industrial requirements are fulfilled by the provision of the revision tool R43ples, the integration with OPC UA via OPCUA2LD, the connection to industrial systems (for example to COMOS), a means of model transformation with SPARQL, as well as fine-grained information protection for a SPARQL endpoint.
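Since such an information space is exposed through SPARQL endpoints, it can be queried over the standard SPARQL protocol, for example from Python. The endpoint URL and the vocabulary in the query below are placeholders; only the protocol usage itself is standard.

```python
# Hedged sketch: querying a (hypothetical) SPARQL endpoint of a Linked
# Enterprise Data space via the standard SPARQL protocol.
import requests

ENDPOINT = "https://example.org/sparql"          # placeholder endpoint

query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?device ?label WHERE {
    ?device a <http://example.org/plant#Sensor> ;   # placeholder vocabulary
            rdfs:label ?label .
} LIMIT 10
"""

response = requests.post(
    ENDPOINT,
    data={"query": query},
    headers={"Accept": "application/sparql-results+json"},
    timeout=10,
)
response.raise_for_status()
# Standard SPARQL JSON results format: results -> bindings -> variable -> value.
for binding in response.json()["results"]["bindings"]:
    print(binding["device"]["value"], binding["label"]["value"])
```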
20

Automatic Extraction and Assessment of Entities from the Web

Urbansky, David 23 October 2012 (has links) (PDF)
The search for information about entities, such as people or movies, plays an increasingly important role on the Web. This information is still scattered across many Web pages, making it more time-consuming for a user to find all relevant information about an entity. This thesis describes techniques to extract entities and information about these entities from the Web, such as facts, opinions, questions and answers, interactive multimedia objects, and events. The findings of this thesis are that it is possible to create a large knowledge base automatically using a manually crafted ontology. The precision of the extracted information was found to be between 75% and 90% (for facts and entities, respectively) after applying assessment algorithms. The algorithms from this thesis can be used to create such a knowledge base, which can be used in various research fields, such as question answering, named entity recognition, and information retrieval.
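The assessment idea, accepting an extracted fact only when enough independent sources agree and measuring precision against a gold standard, can be sketched as follows. The agreement threshold and the toy data are illustrative assumptions, not the specific assessment algorithms of the thesis.

```python
# Hedged sketch: accept extracted facts backed by multiple sources and
# compute the precision of the accepted set against a small gold standard.

def assess(candidates, min_sources=2):
    """candidates: dict mapping a fact (entity, attribute, value) to the set
    of source URLs it was extracted from."""
    return {fact for fact, sources in candidates.items() if len(sources) >= min_sources}

def precision(accepted, gold):
    return len(accepted & gold) / len(accepted) if accepted else 0.0

candidates = {
    ("Jim Carrey", "birth_year", "1962"): {"siteA.com", "siteB.com", "siteC.com"},
    ("Jim Carrey", "birth_year", "1963"): {"siteD.com"},
    ("The Matrix", "release_year", "1999"): {"siteA.com", "siteE.com"},
}
gold = {("Jim Carrey", "birth_year", "1962"), ("The Matrix", "release_year", "1999")}

accepted = assess(candidates)
print(accepted, "precision:", precision(accepted, gold))
```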
