
A tightness continuum measure of Chinese semantic units, and its application to information retrieval

Unlike alphabetic languages such as English, Chinese has no delimiters between words, so word segmentation is an important step for most Chinese natural language processing (NLP) tasks.
We propose a tightness continuum for Chinese semantic units. The continuum is constructed from statistical information. Based on this continuum, sequences can be segmented dynamically, and the resulting segmentation can be exploited in a number of information retrieval (IR) tasks.
To show that the tightness continuum is useful for NLP tasks, we propose two methods that exploit it within IR systems. The first method refines the output of a general-purpose Chinese word segmenter. The second method embeds the tightness value into IR scoring functions. Experimental results show that our tightness measure is reasonable and improves the performance of IR systems.

Identifier: oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:AEU.10048/1123
Date: 06 1900
Creators: Xu, Ying
Contributors: Goebel, Randy (Computing Science), Ringlstetter, Christoph (Center of Language and Information Processing, University of Munich), Kondrak, Greg (Computing Science), Zhao, Dangzhi (School of Library and Information Science)
Source Sets: Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
Language: English
Detected Language: English
Type: Thesis
Format: 1004486 bytes, application/pdf
Relation: Ying Xu, Christoph Ringlstetter and Randy Goebel. A Continuum-based Approach for Tightness Analysis of Chinese Semantic Units. PACLIC 23. 2009
