• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • No language data
  • Tagged with
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A Framework for Extraction Plans and Heuristics in an Ontology-Based Data-Extraction System

Wessman, Alan E. 26 January 2005 (has links) (PDF)
Extraction of information from semi-structured or unstructured documents, such as Web pages, is a useful yet complex task. Research has demonstrated that ontologies may be used to achieve a high degree of accuracy in data extraction while maintaining resiliency in the face of document changes. Ontologies do not, however, diminish the complexity of a data-extraction system. As research in the field progresses, the need for a modular data-extraction system that de-couples the various functional processes involved continues to grow. In this thesis we propose a framework for such a system. The nature of the framework allows new algorithms and ideas to be incorporated into a data extraction system without requiring wholesale rewrites of a large part of the system’s source code. It also allows researchers to focus their attention on parts of the system relevant to their research without having to worry about introducing incompatibilities with the remaining components. We demonstrate the value of the framework by providing a implementation of it, and we show that our implementation is capable of achieving accuracy in its extraction results comparable to that achieved by the legacy BYU-Ontos data-extraction system. We also suggest alternate ways in which the framework may be extended and implemented, and we supply documentation on the framework for future use by data-extraction researchers.

Page generated in 0.052 seconds