
Text Document Categorization by Machine Learning

Because of the explosion of digital and online text information, the automatic organization of documents has become a very important research area. There are two main machine learning approaches to organizing digital documents. The first is the supervised approach, in which predefined category labels are assigned to documents based on the likelihood suggested by a training set of labeled documents. The second is the unsupervised approach, which requires no human intervention or labeled documents at any point in the process. In this thesis, we concentrate on the supervised learning task of document classification. One of the most important tasks of information retrieval is to induce classifiers capable of categorizing text documents. The same document can belong to two or more categories, a situation referred to as multi-label classification. Multi-label classification problems arise in diverse fields. Most existing machine learning techniques for multi-label classification are extremely expensive, because the documents are characterized by an extremely large number of features. In this thesis, we try to reduce these computational costs by applying different types of algorithms to documents characterized by a large number of features. We also aim to achieve the highest possible accuracy while maintaining high computational performance in text document categorization.
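The abstract does not name the specific algorithms studied, so the following is only an illustrative sketch of the general setting it describes, not the thesis's method. It assumes a common multi-label strategy, binary relevance (one independent yes/no classifier per label), built on multinomial Naive Bayes, and it shrinks the feature space by keeping only the `k` terms with the highest document frequency. The toy corpus, the labels, `select_features`, `BinaryRelevanceNB`, and the choice `k=5` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not from the thesis): each document carries a
# *set* of labels, so one document may belong to several categories.
TRAIN = [
    ("bank shares fell on stock market", {"finance"}),
    ("bank raised interest rates", {"finance"}),
    ("team won football match", {"sports"}),
    ("football team wins cup", {"sports"}),
    ("vaccine trial helps patients", {"health"}),
    ("hospital treats patients with vaccine", {"health"}),
    ("football team sold to bank", {"sports", "finance"}),
]


def tokenize(text):
    return text.lower().split()


def select_features(docs, k):
    """Keep the k terms with the highest document frequency (ties broken
    alphabetically). An assumed, simple way to shrink the feature space."""
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return {term for term, _ in ranked[:k]}


class BinaryRelevanceNB:
    """Binary relevance: one independent yes/no multinomial Naive Bayes model per label."""

    def __init__(self, vocab, alpha=1.0):
        self.vocab = vocab
        self.alpha = alpha  # Laplace smoothing for unseen (term, class) pairs

    def fit(self, docs, label_sets):
        self.n_total = len(docs)
        self.models = {}
        for label in sorted(set().union(*label_sets)):
            counts = {True: Counter(), False: Counter()}
            n_docs = Counter()
            for doc, labels in zip(docs, label_sets):
                cls = label in labels  # True if the document carries this label
                n_docs[cls] += 1
                counts[cls].update(t for t in tokenize(doc) if t in self.vocab)
            self.models[label] = (counts, n_docs)
        return self

    def _log_score(self, tokens, counts, n_docs, cls):
        # Smoothed class prior plus smoothed per-term log-likelihoods.
        score = math.log((n_docs[cls] + 1) / (self.n_total + 2))
        denom = sum(counts[cls].values()) + self.alpha * len(self.vocab)
        for t in tokens:
            score += math.log((counts[cls][t] + self.alpha) / denom)
        return score

    def predict(self, doc):
        # Terms outside the selected vocabulary are simply ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        return {
            label
            for label, (counts, n_docs) in self.models.items()
            if self._log_score(tokens, counts, n_docs, True)
            > self._log_score(tokens, counts, n_docs, False)
        }


docs, label_sets = zip(*TRAIN)
vocab = select_features(docs, k=5)
model = BinaryRelevanceNB(vocab).fit(docs, label_sets)
print(sorted(vocab))                                      # ['bank', 'football', 'patients', 'team', 'vaccine']
print(sorted(model.predict("bank buys football team")))   # ['finance', 'sports']
```

In this toy run, the 24 distinct training terms are pruned to 5, and a single document can still receive two labels. Under binary relevance, training cost grows roughly with the number of labels times the vocabulary size, which is why reducing the feature set is one natural lever on the computational cost the abstract mentions.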

Identifier: oai:union.ndltd.org:UMIAMI/oai:scholarlyrepository.miami.edu:oa_theses-1208
Date: 01 January 2008
Creators: Sendur, Zeynel
Publisher: Scholarly Repository
Source Sets: University of Miami
Detected Language: English
Type: text
Format: application/pdf
Source: Open Access Theses
