Document clustering automatically organizes a document collection into distinct groups of similar documents on the basis of their contents. Most existing document clustering techniques deal with monolingual documents (i.e., documents written in a single language). However, with globalization and advances in Internet technology, organizations and individuals often generate or acquire, and subsequently archive, documents in different languages, creating a need for multilingual document clustering (MLDC). Motivated by this need, this study designs a translation-based MLDC technique. Our empirical evaluation shows that the proposed technique achieves satisfactory clustering effectiveness, as measured by both cluster recall and cluster precision.
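The abstract names cluster recall and cluster precision but does not define them. As a minimal sketch, assuming the common pairwise formulation (an "association" is a pair of documents placed in the same cluster), the two measures could be computed as below. The function names and the toy document identifiers are hypothetical and are not taken from the thesis.

```python
from itertools import combinations


def associations(clusters):
    """Return the set of unordered document pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        # Sort so each pair has one canonical ordering.
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def cluster_recall_precision(true_clusters, generated_clusters):
    """Pairwise cluster recall and precision (assumed definitions).

    recall    = correct associations / associations in the true clusters
    precision = correct associations / associations in the generated clusters
    """
    true_pairs = associations(true_clusters)
    generated_pairs = associations(generated_clusters)
    correct = true_pairs & generated_pairs
    recall = len(correct) / len(true_pairs) if true_pairs else 0.0
    precision = len(correct) / len(generated_pairs) if generated_pairs else 0.0
    return recall, precision


# Toy multilingual collection: "en*" and "zh*" are made-up document ids.
true_clusters = [{"en1", "zh1", "en2"}, {"zh2", "en3"}]
generated_clusters = [{"en1", "zh1"}, {"en2", "zh2", "en3"}]
recall, precision = cluster_recall_precision(true_clusters, generated_clusters)
```

Under these assumed definitions, a clustering that merges everything into one group scores perfect recall but poor precision, which is why the two measures are reported together.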
Identifier | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0808106-220124 |
Date | 08 August 2006 |
Creators | Liao, Shan-Yu |
Contributors | Chih-Ping Wei, Wen-Hsiang Lu, Chuen-Chi Yang |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0808106-220124 |
Rights | campus_withheld, Copyright information available at source archive |