Traditional data integration offers high-quality services for managing and querying interrelated but heterogeneous data sources, but at a high cost: setting up a data integration system requires significant manual effort to specify precise relationships between the data sources. The recently proposed vision of dataspaces aims to reduce this upfront effort. One way to approach this aim is to infer schematic correspondences between the data sources, thus enabling the development of automated means for bootstrapping dataspaces. In this thesis, we discuss a two-step research programme to automatically infer schematic correspondences between data sources. In the first step, we investigate the effectiveness of existing schema matching approaches for inferring schematic correspondences and contribute a benchmark, called MatchBench, for this purpose. In the second step, we contribute an evolutionary search method to identify the set of entity-level relationships (ELRs) between data sources that qualify as entity-level schematic correspondences. Specifically, we model the requirements using a vector space model. For each resulting ELR, we further identify a set of attribute-level relationships (ALRs) that qualify as attribute-level schematic correspondences. We demonstrate the effectiveness of the contributed inference technique using both MatchBench scenarios and real-world scenarios.
Identifier | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:549009 |
Date | January 2011 |
Creators | Guo, Chenjuan |
Contributors | Fernandes, Alvaro |
Publisher | University of Manchester |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | https://www.research.manchester.ac.uk/portal/en/theses/inferring-information-about-correspondences-between-data-sources-for-dataspaces(db744fc9-a87d-425c-be80-60a1313869b2).html |