Global ETD Search

Poly-Lingual Text Categorization

With the rapid emergence and proliferation of the Internet and the trend of globalization, a tremendous number of textual documents written in different languages are electronically accessible online. Efficiently and effectively managing these textual documents written in different languages is essential to organizations and individuals. Although poly-lingual text categorization (PLTC) can be approached as a set of independent monolingual classifiers, this naïve approach employs only the training documents of the same language to construct a monolingual classifier and fails to utilize the opportunity offered by poly-lingual training documents. Motivated by the significance of and need for a poly-lingual text categorization technique, we propose a PLTC technique that takes into account all training documents of all languages when constructing a monolingual classifier for a specific language. Using the independent monolingual text categorization (MnTC) technique as our performance benchmark, our empirical evaluation results show that the proposed PLTC technique achieves higher classification accuracy than the benchmark technique on both the English and Chinese corpora. In addition, our empirical results suggest that the proposed PLTC technique is robust across the range of training sizes investigated.
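The contrast between the MnTC baseline and a poly-lingual classifier can be illustrated with a small sketch. The abstract does not say how the proposed technique combines training documents across languages, so everything below is an assumption made for illustration only: the toy documents, the hypothetical `zh_to_en` bilingual dictionary, the `translate` helper, and the choice of a multinomial Naive Bayes learner are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Train a multinomial Naive Bayes model from (tokens, label) pairs."""
    class_docs = Counter()              # number of training documents per class
    term_counts = defaultdict(Counter)  # term frequencies per class
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, term_counts, vocab

def classify(model, tokens):
    """Return the most probable class; terms outside the vocabulary are skipped."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total_terms = sum(term_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for t in tokens:
            if t in vocab:
                # Laplace smoothing over the training vocabulary
                score += math.log((term_counts[label][t] + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def translate(tokens, dictionary):
    """Map tokens through a bilingual dictionary, dropping unknown tokens (hypothetical helper)."""
    return [dictionary[t] for t in tokens if t in dictionary]

# Hypothetical toy training data, not from the thesis.
english_train = [(["stock", "market"], "finance"), (["football", "match"], "sports")]
chinese_train = [(["股票", "銀行"], "finance"), (["足球", "球員"], "sports")]
zh_to_en = {"股票": "stock", "銀行": "bank", "足球": "football", "球員": "player"}

# MnTC-style baseline: the English classifier sees English training documents only.
mono_model = train_nb(english_train)

# Poly-lingual variant (assumed mechanism): Chinese training documents are mapped
# into English terms and added to the English classifier's training set.
poly_model = train_nb(
    english_train + [(translate(toks, zh_to_en), label) for toks, label in chinese_train]
)

test_doc = ["player", "goal"]
print(classify(poly_model, test_doc))  # sports
```

In this toy setup the English-only model has never seen "player", so it has no term evidence for the test document, while the poly-lingual model learns that term from the translated Chinese sports document.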

http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0809106-221247

Text categorization

Document management

Text mining

Poly-lingual text categorization

Identifier	oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0809106-221247
Date	09 August 2006
Creators	Shih, Hui-Hua
Contributors	Christopher C. Yang, Chin-Pin Wei, Wen-Hsiang Lu
Publisher	NSYSU
Source Sets	NSYSU Electronic Thesis and Dissertation Archive
Language	English
Detected Language	English
Type	text
Format	application/pdf
Source	http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0809106-221247
Rights	withheld, Copyright information available at source archive
