In this constantly growing, information-technology-driven era, data migration and
replication pose a serious bottleneck in distributed database infrastructure
environments. For large heterogeneous environments in domains such as geospatial
science and high energy physics, where large arrays of scientific data are involved,
diverse challenges are encountered with respect to dataset identification, location
services, and efficient retrieval of information. These challenges include locating
data sources, identifying effective transfer routes, and replication, to mention a
few. As distributed systems aimed at constant delivery of data to the point of query
origination continue to expand in size and functionality, efficient replication and
data retrieval systems have become increasingly important and relevant. One such
system is an infrastructure for large-scale distributed scientific data management.
Several data management systems have been developed to help manage these
fast-growing datasets and their metadata. However, little work has been done on
allowing cross-communication and data sharing between these different dataset
management systems in a distributed, heterogeneous environment.
This dissertation addresses this problem, focusing particularly on metadata and
the provenance service associated with it. We present the Virtual Unified Metadata
architecture to establish communication between remote sites within a distributed
heterogeneous environment using a client-server model. The system provides a
framework that allows heterogeneous metadata services to communicate and share
metadata and datasets through the implementation of a communication interface. It
allows for metadata discovery and dataset identification by enabling remote queries between
heterogeneous metadata repositories. The significant contributions of this system
include:
- the design and implementation of a client/server-based remote metadata query
  system for scientific datasets within distributed heterogeneous dataset
  repositories;
- the implementation of a caching mechanism for optimizing system performance;
- an analysis of the quality of service with respect to correct dataset
  identification, estimation of migration and replication time frames, and cache
  performance.
Identifier | oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:wits/oai:wiredspace.wits.ac.za:10539/14780 |
Date | 12 June 2014 |
Creators | Adeleke, Oluwalani Aeoluwa |
Source Sets | South African National ETD Portal |
Language | English |
Detected Language | English |
Type | Thesis |
Format | application/pdf |