Feature Translation-based Multilingual Document Clustering Technique

Document clustering automatically organizes a document collection into distinct groups of similar documents on the basis of their contents. Most existing document clustering techniques deal with monolingual documents (i.e., documents written in one language). However, with the trend of globalization and advances in Internet technology, an organization or individual often generates or acquires, and subsequently archives, documents in different languages, creating the need for multilingual document clustering (MLDC). Motivated by this need, this study designs a translation-based MLDC technique. Our empirical evaluation shows that the proposed technique achieves satisfactory clustering effectiveness, as measured by both cluster recall and cluster precision.

Identifier: oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0808106-220124
Date: 08 August 2006
Creators: Liao, Shan-Yu
Contributors: Chih-Ping Wei, Wen-Hsiang Lu, Chuen-Chi Yang
Publisher: NSYSU
Source Sets: NSYSU Electronic Thesis and Dissertation Archive
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0808106-220124
Rights: campus_withheld, Copyright information available at source archive