Master of Science / Department of Computing and Information Sciences / Doina Caragea / In recent years, the number, size and diversity of available data sources have increased significantly in many application domains. Data sources in a particular domain are created and maintained autonomously, and are therefore distributed and semantically heterogeneous. In this thesis, we focus on the problem of querying such semantically heterogeneous data sources from a user's perspective. We approach this problem using ontologies and mappings between ontologies. We have designed and implemented a system that answers queries in a way that is transparent to the user. The main components of this system are an ontology mapping algorithm that maps user ontologies to data source ontologies, and a query processing engine that maps user queries to queries that can be answered by the data sources in the system. We have shown that machine learning algorithms can also be incorporated into the system, making it possible to learn classifiers (in particular, generative models such as Naïve Bayes) from distributed, semantically heterogeneous data sources. Because many data sources today are relational in nature, this work deals specifically with relational data sources, as opposed to flat files, XML or object-oriented data sources. However, our system can be easily extended to other types of data sources.
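The idea of learning a Naïve Bayes classifier from distributed, semantically heterogeneous sources can be illustrated with a minimal sketch. It is not the thesis's implementation: the source names, vocabularies, mappings and rows below are all hypothetical, and a single attribute is used for brevity. Each source translates its local terms into the user ontology and returns only counts, which are then summed and used to classify.

```python
import math
from collections import Counter

# Hypothetical mappings: each source's local term -> user-ontology term.
MAPPINGS = {
    "source_a": {"ugrad": "Undergraduate", "grad": "Graduate",
                 "Y": "Funded", "N": "Unfunded"},
    "source_b": {"BS": "Undergraduate", "MS": "Graduate", "PhD": "Graduate",
                 "funded": "Funded", "self": "Unfunded"},
}

# Hypothetical rows (student status, funding label) in each source's vocabulary.
SOURCES = {
    "source_a": [("ugrad", "N"), ("grad", "Y"), ("grad", "Y"), ("ugrad", "N")],
    "source_b": [("PhD", "funded"), ("MS", "self"), ("BS", "self"), ("PhD", "funded")],
}


def local_counts(name):
    """Count (value, label) pairs and labels at one source, in user terms."""
    mapping = MAPPINGS[name]
    joint, labels = Counter(), Counter()
    for value, label in SOURCES[name]:
        v, y = mapping[value], mapping[label]
        joint[(v, y)] += 1
        labels[y] += 1
    return joint, labels


def gather_counts():
    """Sum the counts returned by every source; raw rows never leave a source."""
    joint, labels = Counter(), Counter()
    for name in SOURCES:
        j, l = local_counts(name)
        joint.update(j)   # Counter.update adds counts
        labels.update(l)
    return joint, labels


def predict(value, joint, labels):
    """Pick the label maximizing log P(y) + log P(value | y), Laplace-smoothed."""
    n_values = len({v for v, _ in joint})
    total = sum(labels.values())

    def score(y):
        prior = math.log(labels[y] / total)
        likelihood = math.log((joint[(value, y)] + 1) / (labels[y] + n_values))
        return prior + likelihood

    return max(labels, key=score)


joint, labels = gather_counts()
print(predict("Graduate", joint, labels))
print(predict("Undergraduate", joint, labels))
```

The sketch relies on the fact that Naïve Bayes needs only counts as sufficient statistics, so the sources can be queried for aggregates in the user's vocabulary instead of being merged into one table.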
Identifier | oai:union.ndltd.org:KSU/oai:krex.k-state.edu:2097/1089 |
Date | January 1900 |
Creators | Breed, Aditi |
Publisher | Kansas State University |
Source Sets | K-State Research Exchange |
Language | en_US |
Detected Language | English |
Type | Thesis |