An Ensemble Approach for Text Categorization with Positive and Unlabeled Examples

Text categorization is the process of assigning new documents to predefined categories on the basis of one or more classification models induced from a set of pre-categorized training documents. In a typical dichotomous classification scenario, the training set includes both positive and negative examples; that is, each of the two categories is associated with training documents. However, in many real-world text categorization applications, positive and unlabeled documents are readily available, whereas acquiring samples of negative documents is extremely expensive or even impossible. In this study, we propose and develop an ensemble approach, referred to as E2, to address the limitations of existing algorithms for learning from positive and unlabeled training documents. Using spam email filtering as the evaluation application, our empirical results suggest that the proposed E2 technique exhibits more stable and reliable performance than PNB and PEBL.

http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0729105-110206

Single-Class Classification

Identifier	oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0729105-110206
Date	29 July 2005
Creators	Chen, Hsueh-Ching
Contributors	Te-Min Chang, none, Chih-Ping Wei
Publisher	NSYSU
Source Sets	NSYSU Electronic Thesis and Dissertation Archive
Language	English
Detected Language	English
Type	text
Format	application/pdf
Source	http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0729105-110206
Rights	withheld, Copyright information available at source archive
