Metadata-Driven Data Integration

Data has an undeniable impact on society. Storing and processing large amounts of available data is currently one of the key success factors for an organization. Nonetheless, we have recently witnessed a shift toward huge and heterogeneous amounts of data. Indeed, 90% of the data in the world has been generated in the last two years. Thus, in order to carry out these data exploitation tasks, organizations must first perform data integration, combining data from multiple sources to yield a unified view over them. Yet, the integration of massive and heterogeneous amounts of data requires revisiting the traditional integration assumptions to cope with the new requirements posed by such data-intensive settings.

This PhD thesis aims to provide a novel framework for data integration in the context of data-intensive ecosystems, which entails dealing with vast amounts of heterogeneous data, from multiple sources and in their original format. To this end, we advocate an integration process consisting of sequential activities governed by a semantic layer, implemented via a shared repository of metadata. From a stewardship perspective, these activities are the deployment of a data integration architecture, followed by the population of this shared metadata. From a data consumption perspective, the activities are virtual and materialized data integration, the former an exploratory task and the latter a consolidation one. Following the proposed framework, we focus on providing contributions to each of the four activities.

We begin by proposing a software reference architecture for semantic-aware data-intensive systems. This architecture serves as a blueprint to deploy a stack of systems, its core being the metadata repository. Next, we propose a graph-based metadata model as a formalism for metadata management. We focus on supporting schema and data source evolution, a predominant factor in the heterogeneous sources at hand. For virtual integration, we propose query rewriting algorithms that rely on the previously proposed metadata model. We additionally consider semantic heterogeneities in the data sources, which the proposed algorithms are capable of resolving automatically. Finally, the thesis focuses on the materialized integration activity and, to this end, proposes a method to select intermediate results to materialize in data-intensive flows. Overall, the results of this thesis serve as a contribution to the field of data integration in contemporary data-intensive ecosystems.

Doctorat en Sciences de l'ingénieur et technologie (Doctorate in Engineering Sciences and Technology) / info:eu-repo/semantics/nonPublished

General computer science

data integration

metadata

Identifier	oai:union.ndltd.org:ulb.ac.be/oai:dipot.ulb.ac.be:2013/285547
Date	16 May 2019
Creators	Nadal Francesch, Sergi
Contributors	Vansummeren, Stijn, Abello, Alberto, Romero, Oscar, Zimanyi, Esteban, Sakr, Mahmoud, Fletcher, George, Mena, Eduardo, Wrembel, Robert
Publisher	Universite Libre de Bruxelles, Universitat Politècnica de Catalunya, Facultat d'Informàtica de Barcelona, Enginyeria de Serveis i Sistemes d'Informació - ERASMUS MUNDUS DOCTORAL DEGREE IN INFORMATION TECHNOLOGIES FOR BUSINESS INTELLIGENCE, Université libre de Bruxelles, Ecole polytechnique de Bruxelles – Informatique, Bruxelles
Source Sets	Université libre de Bruxelles
Language	English
Detected Language	English
Type	info:eu-repo/semantics/doctoralThesis, info:ulb-repo/semantics/doctoralThesis, info:ulb-repo/semantics/openurl/vlink-dissertation
Format	1 v. (186 p.), 3 full-text file(s): application/pdf | application/pdf | application/pdf
Rights	3 full-text file(s): info:eu-repo/semantics/openAccess | info:eu-repo/semantics/restrictedAccess | info:eu-repo/semantics/closedAccess
