Cross-Lingual Text Categorization: A Training-corpus Translation-based Approach

Text categorization involves automatically learning a categorization model from a training set of preclassified documents, on the basis of their contents, and then assigning unclassified documents to appropriate categories. Most existing text categorization techniques deal with monolingual documents (i.e., all documents are written in one language) during both model learning and category assignment (or prediction). However, with the globalization of business environments and advances in Internet technology, an organization or individual often generates or acquires, and subsequently archives, documents in different languages, thus creating the need for cross-lingual text categorization (CLTC). Existing studies on CLTC focus on the prediction-corpus translation-based approach, which lacks a systematic mechanism for reducing translation noise and is therefore limited in its cross-lingual categorization effectiveness. Motivated by the need to provide more effective CLTC support, we design a training-corpus translation-based CLTC approach. Using the prediction-corpus translation-based approach as the performance benchmark, our empirical evaluation results show that our proposed CLTC approach achieves significantly better classification effectiveness than the benchmark approach does in both Chinese

http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0721105-122705

Text mining

Document management

Cross-lingual text categorization

Text categorization

Identifier	oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0721105-122705
Date	21 July 2005
Creators	Hsu, Kai-hsiang
Contributors	Christopher C. Yang, Chih-Ping Wei, Te-Min Chang
Publisher	NSYSU
Source Sets	NSYSU Electronic Thesis and Dissertation Archive
Language	English
Detected Language	English
Type	text
Format	application/pdf
Source	http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0721105-122705
Rights	campus_withheld, Copyright information available at source archive
