Effect of ontology hierarchy on a concept vector machine's ability to classify web documents

As the quantity of text documents created on the web grows, experts can no longer keep pace with classifying them manually. Because people need to find and organize this information, interest has grown in developing automatic means of categorizing these documents. To support this effort, ontologies have been developed that capture domain-specific knowledge in the form of a hierarchy of concepts.
Support Vector Machines are machine learning methods that are widely used for automated document categorization. Recent studies suggest that the classification accuracy of a Support Vector Machine may be improved by using concepts defined by a domain ontology instead of using the words that appear in the document. However, such studies have not taken into account the hierarchy inherent in the relationship between concepts. The goal of this dissertation was to investigate whether the hierarchical relationships among concepts in ontologies can be exploited to improve the classification accuracy of web documents by a Support Vector Machine.
Concept vectors that capture the hierarchy of domain ontologies were created and used to train a Support Vector Machine. Tests conducted using the benchmark Reuters-21578 data set indicate that the Support Vector Machines achieve higher classification accuracy when they make use of the hierarchical relationships among concepts in ontologies.
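
The abstract does not specify how hierarchy is encoded in the concept vectors. As a minimal sketch of one common approach, and not the dissertation's actual method, the example below credits each concept mention to its ancestors with a geometrically decaying weight. The toy ontology, the `PARENT` mapping, the `concept_vector` function, and the `decay` parameter are all hypothetical names chosen for illustration.

```python
from collections import Counter

# Hypothetical toy ontology: child concept -> parent concept (None at the root).
PARENT = {
    "wheat": "grain",
    "corn": "grain",
    "grain": "commodity",
    "crude_oil": "commodity",
    "commodity": None,
}


def ancestors(concept):
    """Yield the ancestors of a concept, nearest first."""
    node = PARENT.get(concept)
    while node is not None:
        yield node
        node = PARENT.get(node)


def concept_vector(concepts, decay=0.5):
    """Map a document's concept mentions to weights, crediting ancestors.

    Each mention adds 1.0 to its own concept and decay**k to the ancestor
    k levels up. This weighting scheme is an assumption for illustration.
    """
    weights = Counter()
    for concept in concepts:
        weights[concept] += 1.0
        for depth, ancestor in enumerate(ancestors(concept), start=1):
            weights[ancestor] += decay ** depth
    return dict(weights)


# Concepts assumed to have been extracted from one document.
doc = ["wheat", "corn", "wheat"]
vec = concept_vector(doc)
```

Under this scheme, two documents that mention different grains still share weight on the `grain` and `commodity` dimensions, which is the kind of relationship a flat concept vector would miss. The resulting dictionaries could then be converted to numeric feature vectors and passed to any Support Vector Machine implementation.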

Identifier: oai:union.ndltd.org:nova.edu/oai:nsuworks.nova.edu:gscis_etd-1164
Date: 01 January 2009
Creators: Graham, Jeffrey A.
Publisher: NSUWorks
Source Sets: Nova Southeastern University
Detected Language: English
Type: text
Format: application/pdf
Source: CEC Theses and Dissertations