1 |
An Instance based Approach to Find the Types of Correspondence between the Attributes of Heterogeneous DatasetsRiaz, Muhammad Atif, Munir, Sameer January 2012 (has links)
Context: Determining attribute correspondence is the most important, time consuming and knowledge intensive part during databases integration. It is also used in other data manipulation applications such as data warehousing, data design, semantic web and e-commerce. Objectives: In this thesis the aim is to investigate how to find the types of correspondence between the attributes of heterogeneous datasets when schema design information of the data sets is unknown. Methods: A literature review was conducted to extract the knowledge related to the approaches that are used to find the correspondence between the attributes of heterogeneous datasets. Extracted knowledge from the literature review is used in developing an instance based approach for finding types of correspondence between the attributes of heterogeneous datasets when schema design information is unknown. To validate the proposed approach an experiment was conducted in the real environment using the data provided by the Telecom Industry (Ericsson) Karlskrona. Evaluation of the results was carried using the well known and mostly used measures from information retrieval field precision, recall and F-measure. Results: To find the types of correspondence between the attributes of heterogeneous datasets, good results depend on the ability of the algorithm to avoid the unmatched pairs of rows during the Row Similarity Phase. An evaluation of proposed approach is performed via experiments. We found 96.7% (average of three experiments) F-measure. Conclusions: The analysis showed that the proposed approach was feasible to be used and it provided users a mean to find the corresponding attributes and the types of correspondence between corresponding attributes, based on the information extracted from the similar pairs of rows from the heterogeneous data sets where their similarity based on the same common primary keys values.
|
2 |
Instance-Based Matching of Large Life Science OntologiesKirsten, Toralf, Thor, Andreas, Rahm, Erhard 06 February 2019 (has links)
Ontologies are heavily used in life sciences so that there is increasing value to match different ontologies in order to determine related conceptual categories. We propose a simple yet powerful methodology for instance-based ontology matching which utilizes the associations between molecular-biological objects and ontologies. The approach can build on many existing ontology associations for instance objects like sequences and proteins and thus makes heavy use of available domain knowledge. Furthermore, the approach is flexible and extensible since each instance source with associations to the ontologies of interest can contribute to the ontology mapping. We study several approaches to determine the instance-based similarity of ontology categories. We perform an extensive experimental evaluation to use protein associations for different species to match between subontologies of the Gene Ontology and OMIM. We also provide a comparison with metadata-based ontology matching.
|
Page generated in 0.0836 seconds