Global ETD Search

Text Document Categorization by Machine Learning

Because of the explosion of digital and online text information, the automatic organization of documents has become a very important research area. There are two main machine learning approaches to organizing digital documents. One is the supervised approach, in which pre-defined category labels are assigned to documents based on the likelihood suggested by a training set of labeled documents; the other is the unsupervised approach, which requires no human intervention or labeled documents at any point in the process. In this thesis, we concentrate on the supervised learning task of document classification. One of the most important tasks of information retrieval is to induce classifiers capable of categorizing text documents. The same document can belong to two or more categories, a situation referred to as multi-label classification. Multi-label classification domains are encountered in diverse fields. Most existing machine learning techniques for multi-label classification domains are extremely expensive, since the documents are characterized by an extremely large number of features. In this thesis, we try to reduce these computational costs by applying different types of algorithms to documents that are characterized by a large number of features. Another important goal of this thesis is to achieve the highest possible accuracy while maintaining high computational performance in text document categorization.
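
As an illustration of what multi-label classification means in practice, the following is a minimal sketch that trains one independent yes/no classifier per category (often called binary relevance) using a small multinomial Naive Bayes model. This is not the method of the thesis, which the abstract does not specify; the function names, the toy documents, and the category labels are all invented for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def train_binary_nb(docs, flags):
    """Train a two-class multinomial Naive Bayes model (label present / absent)."""
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for text, flag in zip(docs, flags):
        counts[flag].update(tokenize(text))
        n_docs[flag] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab


def predict_binary_nb(model, text):
    """Return True if the 'label present' class has the higher log score."""
    counts, n_docs, vocab = model
    total_docs = n_docs[True] + n_docs[False]
    scores = {}
    for flag in (True, False):
        # Add-one smoothed log prior.
        score = math.log((n_docs[flag] + 1) / (total_docs + 2))
        total_words = sum(counts[flag].values())
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                # Add-one smoothed log likelihood of the token.
                score += math.log(
                    (counts[flag][tok] + 1) / (total_words + len(vocab))
                )
        scores[flag] = score
    return scores[True] > scores[False]


def train_multilabel(docs, label_sets):
    """Binary relevance: one independent binary model per category."""
    labels = sorted(set().union(*label_sets))
    return {
        lab: train_binary_nb(docs, [lab in s for s in label_sets])
        for lab in labels
    }


def predict_multilabel(models, text):
    """Return every category whose binary model answers 'present'."""
    return {lab for lab, model in models.items() if predict_binary_nb(model, text)}


# Toy training set (hypothetical); the third document carries two labels.
docs = [
    "stock market shares fall",
    "team wins football match",
    "football club shares rise on stock market",
    "new vaccine trial results",
]
label_sets = [{"finance"}, {"sports"}, {"finance", "sports"}, {"health"}]

models = train_multilabel(docs, label_sets)
print(predict_multilabel(models, "football club stock shares"))
print(predict_multilabel(models, "vaccine trial"))
```

Note that every category's model scans the full vocabulary, so cost grows with both the number of categories and the number of features, which is the kind of expense the abstract refers to.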

http://scholarlyrepository.miami.edu/oa_theses/209

Identifier	oai:union.ndltd.org:UMIAMI/oai:scholarlyrepository.miami.edu:oa_theses-1208
Date	01 January 2008
Creators	Sendur, Zeynel
Publisher	Scholarly Repository
Source Sets	University of Miami
Detected Language	English
Type	text
Format	application/pdf
Source	Open Access Theses
