• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • Tagged with
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

An Instance based Approach to Find the Types of Correspondence between the Attributes of Heterogeneous Datasets

Riaz, Muhammad Atif, Munir, Sameer January 2012 (has links)
Context: Determining attribute correspondence is the most important, time consuming and knowledge intensive part during databases integration. It is also used in other data manipulation applications such as data warehousing, data design, semantic web and e-commerce. Objectives: In this thesis the aim is to investigate how to find the types of correspondence between the attributes of heterogeneous datasets when schema design information of the data sets is unknown. Methods: A literature review was conducted to extract the knowledge related to the approaches that are used to find the correspondence between the attributes of heterogeneous datasets. Extracted knowledge from the literature review is used in developing an instance based approach for finding types of correspondence between the attributes of heterogeneous datasets when schema design information is unknown. To validate the proposed approach an experiment was conducted in the real environment using the data provided by the Telecom Industry (Ericsson) Karlskrona. Evaluation of the results was carried using the well known and mostly used measures from information retrieval field precision, recall and F-measure. Results: To find the types of correspondence between the attributes of heterogeneous datasets, good results depend on the ability of the algorithm to avoid the unmatched pairs of rows during the Row Similarity Phase. An evaluation of proposed approach is performed via experiments. We found 96.7% (average of three experiments) F-measure. Conclusions: The analysis showed that the proposed approach was feasible to be used and it provided users a mean to find the corresponding attributes and the types of correspondence between corresponding attributes, based on the information extracted from the similar pairs of rows from the heterogeneous data sets where their similarity based on the same common primary keys values.

Page generated in 0.2252 seconds