  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

DURHAM : a word sense disambiguation system

Hawkins, Paul Martin January 1999
Ever since the 1950s, when Machine Translation first began to be developed, word sense disambiguation (WSD) has been considered a problem for developers. More recently, all NLP tasks that are sensitive to lexical semantics potentially benefit from WSD, although to what extent is largely unknown. The thesis presents a novel approach to large-scale WSD. In particular, it introduces a novel knowledge source named contextual information, which uses a sub-symbolic training mechanism to learn information from the context of a sentence that can aid disambiguation. The system also takes advantage of frequency information, and the two knowledge sources are combined. The system is trained and tested on SEMCOR. A novel disambiguation algorithm is also developed. The algorithm must tackle the large number of possible sense combinations in a sentence, and it aims to strike an appropriate balance between accuracy and efficiency by directing the search at the word level. The performance achieved on SEMCOR is reported, and the various components of the system are analysed. The results achieved on this test data are pleasing but difficult to compare with most other work in the field. For this reason the system took part in the SENSEVAL evaluation, which provided an excellent opportunity to compare WSD systems extensively. SENSEVAL is a small-scale WSD evaluation using the HECTOR lexicon. Despite this, few adaptations to the system were required. The performance of the system on the SENSEVAL task is reported and has also been presented in [Hawkins, 2000].
2

Word Sense Disambiguation Using WordNet and Conceptual Expansion

Guo, Jian-Yi 24 January 2006
Just as a single English word can have several different meanings, a single meaning can be expressed by several different English words. The meaning of a word depends on the sense intended. Selecting the most appropriate meaning for an ambiguous word within a context is therefore a critical problem for applications that use natural language processing. At present, however, most word sense disambiguation methods either handle only restricted parts of speech, such as nouns, or do not disambiguate word senses with satisfactory accuracy, and the remaining ambiguity often bothers users. In this study, a new word sense disambiguation method using the WordNet lexical database, SemCor text files, and the Web is presented. In addition to nouns, the proposed method also attempts to disambiguate verbs, adjectives, and adverbs in sentences. The text files and sentences investigated in the experiments were randomly selected from SemCor. The semantic similarity between the senses of the individual ambiguous words in a word pair is measured to select the applicable candidate senses of the target word in that pair. A synonym weighting method, based on the synonym sets that WordNet provides, accounts for the possible diversity of senses within synonym sets, and the synonym sets corresponding to the candidate senses are thereby determined. The candidate senses, expanded with the senses in the corresponding synonym sets and enhanced by the context window technique, form new queries. After the new queries are submitted to a search engine to find matching documents on the Web, the candidate senses are ranked by the number of matching documents found. The first sense in the ranked list is taken as the most appropriate sense of the target word. The proposed method, as well as Stetina et al.'s and Mihalcea et al.'s methods, is evaluated on the SemCor text files.
The experimental results show that, for the top sense selected, this method achieves an average disambiguation accuracy of 81.3% across nouns, verbs, adjectives, and adverbs, slightly better than Stetina et al.'s method (80%) and Mihalcea et al.'s method (80.1%). Furthermore, the proposed method is the only one whose accuracy for verbs reaches 70% for the top sense selected. Moreover, for the top three senses selected, this method is superior to the other two, with an average accuracy across the four parts of speech exceeding 96%. It is expected that the proposed method can improve the performance of word sense disambiguation applications in machine translation, document classification, or information retrieval.
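
The final ranking step described in this abstract (expand each candidate sense with synonym terms, add context words, query a search engine, and order the senses by the number of matching documents) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the function names, the sense labels, the expansion terms, and the hit counts are all hypothetical, and a fixed lookup table stands in for the web search engine.

```python
from typing import Callable, Dict, List, Tuple


def build_query(sense_terms: List[str], context_words: List[str]) -> str:
    """Join a sense's expansion terms and the context window into one query string."""
    return " ".join(sense_terms + context_words)


def rank_senses(
    candidates: Dict[str, List[str]],
    context_words: List[str],
    hit_count: Callable[[str], int],
) -> List[Tuple[str, int]]:
    """Return (sense_id, hits) pairs sorted by descending number of matching documents."""
    scored = [
        (sense_id, hit_count(build_query(terms, context_words)))
        for sense_id, terms in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-in for a search engine: invented counts, not real results.
FAKE_COUNTS = {
    "bank depository financial institution deposit money": 120,
    "bank riverbank slope deposit money": 7,
}


def fake_hit_count(query: str) -> int:
    """Look up the invented document count for a query (0 if unknown)."""
    return FAKE_COUNTS.get(query, 0)


# Hypothetical candidate senses, each already expanded with synonym terms.
candidates = {
    "bank#financial": ["bank", "depository", "financial", "institution"],
    "bank#river": ["bank", "riverbank", "slope"],
}

ranking = rank_senses(candidates, ["deposit", "money"], fake_hit_count)
best_sense = ranking[0][0]  # the first sense in the ranked list is selected
```

Passing the hit-count function as a parameter keeps the ranking logic separate from whichever search backend is used, so the sketch can be exercised without network access.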
