Global ETD Search

11	Unsupervised Bayesian Data Cleaning Techniques for Structured Data January 2014 (has links) abstract: Recent efforts in data cleaning have focused mostly on problems like data deduplication, record matching, and data standardization; few of these focus on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum cost repair of tuples that violate static constraints like CFDs (which have to be provided by domain experts, or learned from a clean sample of the database). In this thesis, I provide a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned from the noisy database directly. I thus avoid the necessity for a domain expert or master data. I also show how to efficiently perform consistent query answering using this model over a dirty database, in case write permissions to the database are unavailable. A Map-Reduce architecture to perform this computation in a distributed manner is also shown. I evaluate these methods over both synthetic and real data. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2014 Computer science Consistent Query Answering Databases Data Cleaning Information Retrieval Probabilistic Databases
12	Using Ontology-Based Data Access to Enable Context Recognition in the Presence of Incomplete Information Thost, Veronika 24 August 2017 (has links) (PDF) Ontology-based data access (OBDA) augments classical query answering in databases by including domain knowledge provided by an ontology. An ontology captures the terminology of an application domain and describes domain knowledge in a machine-processable way. Formal ontology languages additionally provide semantics to these specifications. Systems for OBDA thus may apply logical reasoning to answer queries; they use the ontological knowledge to infer new information, which is only implicitly given in the data. Moreover, they usually employ the open-world assumption, which means that knowledge not stated explicitly in the data or inferred is neither assumed to be true nor false. Classical OBDA regards the knowledge however only w.r.t. a single moment, which means that information about time is not used for reasoning and hence lost; in particular, the queries generally cannot express temporal aspects. We investigate temporal query languages that allow to access temporal data through classical ontologies. In particular, we study the computational complexity of temporal query answering regarding ontologies written in lightweight description logics, which are known to allow for efficient reasoning in the atemporal setting and are successfully applied in practice. Furthermore, we present a so-called rewritability result for ontology-based temporal query answering, which suggests ways for implementation. Our results may thus guide the choice of a query language for temporal OBDA in data-intensive applications that require fast processing, such as context recognition. Beschreibungslogik Anfragebeantwortung Temporale Logik Komplexität Description Logics Query Answering Temporal Logic Complexity ddc:004 rvk:ST 270
13	Gestion des incohérences pour l'accès aux données en présence d'ontologies / Inconsistency Handling in Ontology-Mediated Query Answering Bourgaux, Camille 29 September 2016 (has links) Interroger des bases de connaissances avec des requêtes conjonctives a été une préoccupation majeure de la recherche récente en logique de description. Une question importante qui se pose dans ce contexte est la gestion de données incohérentes avec l'ontologie. En effet, une théorie logique incohérente impliquant toute formule sous la sémantique classique, l'utilisation de sémantiques tolérantes aux incohérences est nécessaire pour obtenir des réponses pertinentes. Le but de cette thèse est de développer des méthodes pour gérer des bases de connaissances incohérentes en utilisant trois sémantiques naturelles (AR, IAR et brave) proposées dans la littérature et qui reposent sur la notion de réparation, définie comme un sous-ensemble maximal des données cohérent avec l'ontologie. Nous utilisons ces trois sémantiques conjointement pour identifier les réponses associées à différents niveaux de confiance. En plus de développer des algorithmes efficaces pour interroger des bases de connaissances DL-Lite incohérentes, nous abordons trois problèmes : (i) l'explication des résultats des requêtes, pour aider l'utilisateur à comprendre pourquoi une réponse est (ou n'est pas) obtenue sous une des trois sémantiques, (ii) la réparation des données guidée par les requêtes, pour améliorer la qualité des données en capitalisant sur les retours des utilisateurs sur les résultats de la requête, et (iii) la définition de variantes des sémantiques à l'aide de réparations préférées pour prendre en compte la fiabilité des données. Pour chacune de ces trois questions, nous développons un cadre formel, analysons la complexité des problèmes de raisonnement associés, et proposons et mettons en œuvre des algorithmes, qui sont étudiés empiriquement sur un jeu de bases de connaissance DL-Lite incohérentes que nous avons construit. Nos résultats indiquent que même si les problèmes à traiter sont théoriquement durs, ils peuvent souvent être résolus efficacement dans la pratique en utilisant des approximations et des fonctionnalités des SAT solveurs modernes. / The problem of querying description logic knowledge bases using database-style queries (in particular, conjunctive queries) has been a major focus of recent description logic research. An important issue that arises in this context is how to handle the case in which the data is inconsistent with the ontology. Indeed, since in classical logic an inconsistent logical theory implies every formula, inconsistency-tolerant semantics are needed to obtain meaningful answers. This thesis aims to develop methods for dealing with inconsistent description logic knowledge bases using three natural semantics (AR, IAR, and brave) previously proposed in the literature and that rely on the notion of a repair, which is an inclusion-maximal subset of the data consistent with the ontology. In our framework, these three semantics are used conjointly to identify answers with different levels of confidence. In addition to developing efficient algorithms for query answering over inconsistent DL-Lite knowledge bases, we address three problems that should support the adoption of this framework: (i) query result explanation, to help the user to understand why a given answer was (not) obtained under one of the three semantics, (ii) query-driven repairing, to exploit user feedback about errors or omissions in the query results to improve the data quality, and (iii) preferred repair semantics, to take into account the reliability of the data. For each of these three topics, we developed a formal framework, analyzed the complexity of the relevant reasoning problems, and proposed and implemented algorithms, which we empirically studied over an inconsistent DL-Lite benchmark we built. Our results indicate that even if the problems related to dealing with inconsistent DL-Lite knowledge bases are theoretically hard, they can often be solved efficiently in practice by using tractable approximations and features of modern SAT solvers. Logiques de description Réponse aux requêtes Gestion de l'incohérence Description logics Query answering Inconsistency handling
14	Tirer parti de la structure des données incertaines / Leveraging the structure of uncertain data Amarilli, Antoine 14 March 2016 (has links) La gestion des données incertaines peut devenir infaisable, dans le cas des bases de données probabilistes, ou même indécidable, dans le cas du raisonnement en monde ouvert sous des contraintes logiques. Cette thèse étudie comment pallier ces problèmes en limitant la structure des données incertaines et des règles. La première contribution présentée s'intéresse aux conditions qui permettent d'assurer la faisabilité de l'évaluation de requêtes et du calcul de lignage sur les instances relationnelles probabilistes. Nous montrons que ces tâches sont faisables, pour diverses représentations de la provenance et des probabilités, quand la largeur d'arbre des instances est bornée. Réciproquement, sous des hypothèses faibles, nous pouvons montrer leur infaisabilité pour toute autre condition imposée sur les instances. La seconde contribution concerne l'évaluation de requêtes sur des données incomplètes et sous des contraintes logiques, sous l'hypothèse de finitude généralement supposée en théorie des bases de données. Nous montrons la décidabilité de cette tâche pour les dépendances d'inclusion unaires et les dépendances fonctionnelles. Ceci constitue le premier résultat positif, sous l'hypothèse de la finitude, pour la réponse aux requêtes en monde ouvert avec un langage d'arité arbitraire qui propose à la fois des contraintes d'intégrité référentielle et des contraintes de cardinalité. / The management of data uncertainty can lead to intractability, in the case of probabilistic databases, or even undecidability, in the case of open-world reasoning under logical rules. My thesis studies how to mitigate these problems by restricting the structure of uncertain data and rules. My first contribution investigates conditions on probabilistic relational instances that ensure the tractability of query evaluation and lineage computation. I show that these tasks are tractable when we bound the treewidth of instances, for various probabilistic frameworks and provenance representations. Conversely, I show intractability under mild assumptions for any other condition on instances. The second contribution concerns query evaluation on incomplete data under logical rules, and under the finiteness assumption usually made in database theory. I show that this task is decidable for unary inclusion dependencies and functional dependencies. This establishes the first positive result for finite open-world query answering on an arbitrary-arity language featuring both referential constraints and number restrictions. Incertitude Probabilités Bases de données Réponse aux requêtes Uncertainty Probabilities Databases Query answering
15	OWL query answering using machine learning Huster, Todd 21 December 2015 (has links) No description available. Artificial Intelligence Computer Science approximate reasoning OWL query answering Semantic Web SPARQL ontology description logic
16	Ontology-Mediated Queries for Probabilistic Databases: Extended Version Borgwardt, Stefan, Ceylan, Ismail Ilkan, Lukasiewicz, Thomas 28 December 2023 (has links) Probabilistic databases (PDBs) are usually incomplete, e.g., contain only the facts that have been extracted from the Web with high confidence. However, missing facts are often treated as being false, which leads to unintuitive results when querying PDBs. Recently, open-world probabilistic databases (OpenPDBs) were proposed to address this issue by allowing probabilities of unknown facts to take any value from a fixed probability interval. In this paper, we extend OpenPDBs by Datalog± ontologies, under which both upper and lower probabilities of queries become even more informative, enabling us to distinguish queries that were indistinguishable before. We show that the dichotomy between P and PP in (Open)PDBs can be lifted to the case of first-order rewritable positive programs (without negative constraints); and that the problem can become NP^PP-complete, once negative constraints are allowed. We also propose an approximating semantics that circumvents the increase in complexity caused by negative constraints. info:eu-repo/classification/ddc/004 ddc:004
17	Preferential Query Answering in the Semantic Web with Possibilistic Networks Borgwardt, Stefan, Fazzinga, Bettina, Lukasiewicz, Thomas, Shrivastava, Akanksha, Tifrea-Marciuska, Oana 28 December 2023 (has links) In this paper, we explore how ontological knowledge expressed via existential rules can be combined with possibilistic networks (i) to represent qualitative preferences along with domain knowledge, and (ii) to realize preference-based answering of conjunctive queries (CQs). We call these combinations ontological possibilistic networks (OP-nets). We define skyline and k-rank answers to CQs under preferences and provide complexity (including data tractability) results for deciding consistency and CQ skyline membership for OP-nets. We show that our formalism has a lower complexity than a similar existing formalism. ST 136
18	Most Probable Explanations for Probabilistic Database Queries: Extended Version Ceylan, Ismail Ilkan, Borgwardt, Stefan, Lukasiewicz, Thomas 28 December 2023 (has links) Forming the foundations of large-scale knowledge bases, probabilistic databases have been widely studied in the literature. In particular, probabilistic query evaluation has been investigated intensively as a central inference mechanism. However, despite its power, query evaluation alone cannot extract all the relevant information encompassed in large-scale knowledge bases. To exploit this potential, we study two inference tasks; namely finding the most probable database and the most probable hypothesis for a given query. As natural counterparts of most probable explanations (MPE) and maximum a posteriori hypotheses (MAP) in probabilistic graphical models, they can be used in a variety of applications that involve prediction or diagnosis tasks. We investigate these problems relative to a variety of query languages, ranging from conjunctive queries to ontology-mediated queries, and provide a detailed complexity analysis. info:eu-repo/classification/ddc/004 ddc:004
19	Query Answering in Probabilistic Data and Knowledge Bases Ceylan, Ismail Ilkan 04 June 2018 (has links) (PDF) Probabilistic data and knowledge bases are becoming increasingly important in academia and industry. They are continuously extended with new data, powered by modern information extraction tools that associate probabilities with knowledge base facts. The state of the art to store and process such data is founded on probabilistic database systems, which are widely and successfully employed. Beyond all the success stories, however, such systems still lack the fundamental machinery to convey some of the valuable knowledge hidden in them to the end user, which limits their potential applications in practice. In particular, in their classical form, such systems are typically based on strong, unrealistic limitations, such as the closed-world assumption, the closed-domain assumption, the tuple-independence assumption, and the lack of commonsense knowledge. These limitations do not only lead to unwanted consequences, but also put such systems on weak footing in important tasks, querying answering being a very central one. In this thesis, we enhance probabilistic data and knowledge bases with more realistic data models, thereby allowing for better means for querying them. Building on the long endeavor of unifying logic and probability, we develop different rigorous semantics for probabilistic data and knowledge bases, analyze their computational properties and identify sources of (in)tractability and design practical scalable query answering algorithms whenever possible. To achieve this, the current work brings together some recent paradigms from logics, probabilistic inference, and database theory. Logik Probabilistische Datenbanken Probabilistische Wissensbasen Anfragebeantwortung Komplexität Logic Probabilistic Databases Probabilistic Knowledge Bases Query Answering Complexity ddc:004 rvk:ST 270
20	Using Ontology-Based Data Access to Enable Context Recognition in the Presence of Incomplete Information Thost, Veronika 19 June 2017 (has links) Ontology-based data access (OBDA) augments classical query answering in databases by including domain knowledge provided by an ontology. An ontology captures the terminology of an application domain and describes domain knowledge in a machine-processable way. Formal ontology languages additionally provide semantics to these specifications. Systems for OBDA thus may apply logical reasoning to answer queries; they use the ontological knowledge to infer new information, which is only implicitly given in the data. Moreover, they usually employ the open-world assumption, which means that knowledge not stated explicitly in the data or inferred is neither assumed to be true nor false. Classical OBDA regards the knowledge however only w.r.t. a single moment, which means that information about time is not used for reasoning and hence lost; in particular, the queries generally cannot express temporal aspects. We investigate temporal query languages that allow to access temporal data through classical ontologies. In particular, we study the computational complexity of temporal query answering regarding ontologies written in lightweight description logics, which are known to allow for efficient reasoning in the atemporal setting and are successfully applied in practice. Furthermore, we present a so-called rewritability result for ontology-based temporal query answering, which suggests ways for implementation. Our results may thus guide the choice of a query language for temporal OBDA in data-intensive applications that require fast processing, such as context recognition. info:eu-repo/classification/ddc/004 ddc:004

Search results