Embracing Incompleteness in Schema Mappings

Various forms of information integration have become ubiquitous in current Business Intelligence (BI) technologies. In many cases, the semantic relationship between heterogeneous data sources is specified using high-level declarative rules, called schema mappings. For decades, Skolem functions have been regarded as an important tool in schema mappings, as they permit a precise representation of incomplete information. The powerful mapping language of second-order tuple-generating dependencies (SO tgds) permits arbitrary Skolem functions and has been proven to be the right class for modeling many integration problems, such as composition and correlation of mappings. This language is strictly more powerful than the languages used in many integration systems, including source-to-target tgds and nested tgds (commonly known as GLAV and nested GLAV mappings), which are both first-order (FO) languages. An important class of GLAV mappings is Local-As-View (LAV) tgds, which have found important applications in data integration. These FO mapping languages are known to have more desirable programmatic and computational properties. In this thesis, we present a number of techniques for translating some SO tgds into equivalent, more manageable FO schema mappings. Our results rely on understanding and controlling the presence of incompleteness in mappings. We show that the composition of LAV mappings is not only FO, but can always be expressed as a LAV mapping. As a byproduct, we show that the problem of recovery checking for LAV mappings becomes tractable, in contrast to the case of GLAV mappings, for which it is known to be undecidable. We introduce two approaches for transforming SO tgds into equivalent nested GLAV mappings. These approaches take source constraints into account and provide sufficient conditions under which the rich Skolem functions in SO tgds are well-behaved and have an FO semantics.
We experimentally show that these conditions are able to handle a very large number of real schema mappings. Finally, we propose a first step toward embracing incompleteness in the context of BI applications. Specifically, we present elements of a formal framework for vivifying data with respect to a business model. We view the task of discovering data-to-business interpretations as one of removing incompleteness from these mappings.

Identifier: oai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/35943
Date: 09 August 2013
Creators: Rodriguez-Gianolli, Patricia
Contributors: Miller, Renee J., Mylopoulos, John
Source Sets: University of Toronto
Language: en_ca
Detected Language: English
Type: Thesis
