
Clustering Multilingual Documents: A Latent Semantic Indexing Based Approach

Document clustering automatically organizes a document collection into distinct groups of similar documents on the basis of their contents. Most existing document clustering techniques deal with monolingual documents (i.e., documents written in one language). However, with the trend of globalization and advances in Internet technology, an organization or individual often generates or acquires, and subsequently archives, documents in different languages, creating the need for multilingual document clustering (MLDC). Motivated by the significance of this need, this study designs a Latent Semantic Indexing (LSI) based MLDC technique. Our empirical evaluation results show that the proposed LSI-based MLDC technique achieves satisfactory clustering effectiveness, measured by both cluster recall and cluster precision.

Identifier: oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0209106-024252
Date: 09 February 2006
Creators: Lin, Chia-min
Contributors: Te-min Chang, Christopher C. Yang, Chih-ping Wei
Publisher: NSYSU
Source Sets: NSYSU Electronic Thesis and Dissertation Archive
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0209106-024252
Rights: off_campus_withheld, Copyright information available at source archive
