Just as a single English word can have several different meanings, a single meaning can be expressed by several different English words. The meaning of a word depends on the sense intended. Selecting the most appropriate meaning for an ambiguous word within its context is therefore a critical problem for applications that use natural language processing technologies. At present, however, most word sense disambiguation methods either disambiguate only restricted parts of speech, such as nouns, or do not achieve satisfactory accuracy. The resulting ambiguity often frustrates users.
In this study, a new word sense disambiguation method using the WordNet lexical database, the SemCor text files, and the Web is presented. In addition to nouns, the proposed method also attempts to disambiguate verbs, adjectives, and adverbs in sentences. The text files and sentences investigated in the experiments were randomly selected from SemCor. For each word pair, the semantic similarity between the senses of the two semantically ambiguous words is measured in order to select the applicable candidate senses of the target word in that pair. A synonym weighting method then accounts for the possible diversity of senses within the synonym sets that WordNet provides, and the synonym sets corresponding to the candidate senses are determined. The candidate senses, expanded with the senses in the corresponding synonym sets and enhanced by the context window technique, form new queries. These queries are submitted to a search engine to find matching documents on the Web, and the candidate senses are ranked by the number of matching documents found. The first sense in the ranked list is taken as the most appropriate sense of the target word.
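The final ranking step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Query` structure, the `rank_senses` function, the sense labels, and the in-memory `TOY_DOCS` collection that stands in for a Web search engine are all hypothetical, and the similarity filtering and synonym weighting steps are assumed to have already produced the candidate senses and their expansion terms.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical query: expansion terms for one sense plus context words."""
    any_of: Tuple[str, ...]  # sense terms and synonyms; at least one must match
    all_of: Tuple[str, ...]  # context-window words; all must match


def rank_senses(
    candidates: Dict[str, List[str]],
    context: List[str],
    hit_count: Callable[[Query], int],
) -> List[Tuple[str, int]]:
    """Rank candidate senses by the number of documents matching their query."""
    scored = [
        (sense, hit_count(Query(tuple(terms), tuple(context))))
        for sense, terms in candidates.items()
    ]
    # Highest document count first; ties keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for the Web: a real system would call a search engine here.
TOY_DOCS = [
    "the river bank was covered with reeds near the water",
    "she walked along the shore of the river at dawn",
    "the bank raised its interest rate on deposits",
    "fishing from the river bank in shallow water",
]


def toy_hit_count(query: Query) -> int:
    """Count toy documents containing any sense term and every context word."""
    count = 0
    for doc in TOY_DOCS:
        words = set(doc.split())
        has_sense_term = any(term in words for term in query.any_of)
        has_context = all(word in words for word in query.all_of)
        if has_sense_term and has_context:
            count += 1
    return count


# Hypothetical candidate senses of "bank" with illustrative expansion terms.
candidates = {
    "bank/riverside": ["riverside", "shore"],
    "bank/institution": ["deposits", "lender"],
}
ranking = rank_senses(candidates, ["river"], toy_hit_count)
best_sense = ranking[0][0]
```

Passing the document counter in as a function keeps the ranking logic separate from the retrieval back end, so the toy counter could be swapped for a real search service without changing `rank_senses`.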
The proposed method and the methods of Stetina et al. and Mihalcea et al. are evaluated on the SemCor text files. The experimental results show that, for the top-ranked sense, the proposed method achieves an average accuracy of 81.3% across nouns, verbs, adjectives, and adverbs, slightly better than Stetina et al.'s method (80%) and Mihalcea et al.'s method (80.1%). Furthermore, the proposed method is the only one whose accuracy on verbs reaches 70% for the top-ranked sense. When the top three senses are considered, the proposed method outperforms the other two, with an average accuracy over the four parts of speech exceeding 96%. The proposed method is expected to improve the performance of word sense disambiguation in applications such as machine translation, document classification, and information retrieval.
Identifier | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0124106-010103 |
Date | 24 January 2006 |
Creators | Guo, Jian-Yi |
Contributors | Cha-Hwa Lin, Chungnan Lee, Chun-I Fan |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0124106-010103 |
Rights | not_available, Copyright information available at source archive |