
A metadata service for an infrastructure of large scale distributed scientific datasets

In this era of constantly growing, information-technology-driven activity, data
migration and replication pose a serious bottleneck in distributed database
infrastructure environments. In large heterogeneous environments serving domains
such as geospatial science and high energy physics, where large arrays of scientific
data are involved, diverse challenges arise with respect to dataset identification,
location services, and efficient retrieval of information. These challenges include
locating data sources, identifying effective transfer routes, and replication, to
mention a few. As distributed systems aimed at constant delivery of data to the point
of query origination continue to expand in size and functionality, efficient
replication and data retrieval systems have become increasingly important and
relevant. One such system is an infrastructure for large-scale distributed scientific
data management. Several data management systems have been developed to help manage
these fast-growing datasets and their metadata. However, little work has been done on
allowing cross-communication and data sharing between these different dataset
management systems in a distributed, heterogeneous environment.
This dissertation addresses this problem, focusing particularly on metadata and
its associated provenance service. We present the Virtual Unified Metadata
architecture, which establishes communication between remote sites within a
distributed heterogeneous environment using a client-server model. The system
provides a framework that allows heterogeneous metadata services to communicate and
share metadata and datasets through the implementation of a communication interface.
It allows for metadata discovery and dataset identification by enabling remote
queries between heterogeneous metadata repositories. The significant contributions
of this system
include:
- the design and implementation of a client/server-based remote metadata query
  system for scientific datasets within distributed heterogeneous dataset
  repositories;
- the implementation of a caching mechanism for optimizing system performance;
- an analysis of the quality of service with respect to correct dataset
  identification, estimation of the migration and replication time frame, and
  cache performance.

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:wits/oai:wiredspace.wits.ac.za:10539/14780
Date: 12 June 2014
Creators: Adeleke, Oluwalani Aeoluwa
Source Sets: South African National ETD Portal
Language: English
Detected Language: English
Type: Thesis
Format: application/pdf
