A tightness continuum measure of Chinese semantic units, and its application to information retrieval

Unlike alphabetic languages such as English, Chinese text has no delimiters between words, so word segmentation is an important first step for most Chinese natural language processing (NLP) tasks.
We propose a tightness continuum for Chinese semantic units, constructed from statistical information. Using this continuum, character sequences can be segmented dynamically, and that information can then be exploited in a number of information retrieval (IR) tasks.
To show that the tightness continuum is useful for NLP tasks, we propose two methods for exploiting it within IR systems. The first refines the output of a general-purpose Chinese word segmenter. The second embeds the tightness value into IR scoring functions. Experimental results show that our tightness measure is reasonable and improves the performance of IR systems.

http://hdl.handle.net/10048/1123

Chinese

Compound

information retrieval

Identifier	oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:AEU.10048/1123
Date	06 1900
Creators	Xu, Ying
Contributors	Goebel, Randy (Computing Science), Ringlstetter, Christoph (Center of Language and Information Processing, University of Munich), Kondrak, Greg (Computing Science), Zhao, Dangzhi (School of Library and Information Science)
Source Sets	Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
Language	English
Detected Language	English
Type	Thesis
Format	1004486 bytes, application/pdf
Relation	Ying Xu, Christoph Ringlstetter and Randy Goebel. A Continuum-based Approach for Tightness Analysis of Chinese Semantic Units. PACLIC 23. 2009
