Using web texts for word sense disambiguation

Ambiguity is universal in natural languages. A word that has multiple meanings depending on its context is called an ambiguous word. Word sense disambiguation (WSD) is the process of determining the correct meaning of a word (formally, its word sense) in a given context. WSD is one of the most fundamental problems in natural language processing. If properly addressed, it could lead to revolutionary advances in many other technologies, such as text search engines, automatic text summarization and classification, automatic lexicon construction, machine translation, and automatic learning agents. One difficulty that has always confronted WSD researchers is the lack of high-quality sense-specific information. For example, if the word "power" immediately precedes the word "plant", it strongly constrains the meaning of "plant" to be "an industrial facility". If "power" is replaced by the phrase "root of a", then the sense of "plant" is dictated to be "an organism" of the kingdom Plantae. Manually building a comprehensive sense-specific information base for each sense of each word is clearly impractical. Researchers have also tried to extract such information from large dictionaries and from manually sense-tagged corpora. Most of the dictionaries used for WSD were not built for this purpose and have many inherited peculiarities. Manual tagging is slow and costly, while automatic tagging has not delivered reliable performance. Furthermore, for a randomly chosen word to be disambiguated, the sense-specific context corpora that can be collected from dictionaries are often not large enough. Therefore, neither manually building sense-specific information bases nor extracting such information from dictionaries is an effective way to obtain sense-specific information.
Web text, owing to its vast quantity and wide diversity, is an ideal source for extracting large quantities of sense-specific information. In this thesis, the impact of Web texts on various aspects of WSD is investigated. New measures and models are proposed to tame the enormous amount of Web text for the purpose of WSD. They are formally evaluated by testing their disambiguation performance on about 70 ambiguous nouns. The results are very encouraging and help reveal the great potential of using Web texts for WSD. The results have been published in three papers at the Australian national and international level (Wang & Hoffmann, 2004, 2005, 2006) [42][43][44].
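
As an illustration of the "plant" example in the abstract, the following minimal Python sketch shows how an immediately preceding word or phrase can select a sense. It is not the thesis's method: the cue table `SENSE_CUES`, the function `disambiguate`, and the exact-match rule are all assumptions made for illustration, and only the two cue phrases and sense glosses are taken from the abstract.

```python
# Hypothetical cue table (an assumption for illustration): for each ambiguous
# word, map a phrase that immediately precedes it to the sense it implies.
# The two entries mirror the "plant" example in the abstract.
SENSE_CUES = {
    "plant": {
        "power": "an industrial facility",
        "root of a": "an organism",
    }
}


def disambiguate(tokens, index, cues=SENSE_CUES):
    """Return the sense whose cue phrase immediately precedes tokens[index].

    Returns None when the word has no cue table or no cue matches.
    """
    word = tokens[index].lower()
    # Left context as a single lowercase string, e.g. "the root of a".
    left = " ".join(t.lower() for t in tokens[:index])
    for cue, sense in cues.get(word, {}).items():
        # Match whole words only: the cue must be the entire left context
        # or be preceded by a space.
        if left == cue or left.endswith(" " + cue):
            return sense
    return None


if __name__ == "__main__":
    for text in ("the power plant", "the root of a plant", "a green plant"):
        tokens = text.split()
        print(text, "->", disambiguate(tokens, len(tokens) - 1))
```

A hand-written table like this covers only the cues someone thought to list, which is the scaling problem the abstract describes and the reason it turns to Web text as a source of sense-specific context.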

Identifier: oai:union.ndltd.org:ADTP/242982
Date: January 2007
Creators: Wang, Yuanyong, Computer Science & Engineering, Faculty of Engineering, UNSW
Source Sets: Australasian Digital Theses Program
Language: English
Detected Language: English
Rights: http://unsworks.unsw.edu.au/copyright
