
Extracting Named Entities and Synonyms from Wikipedia for use in News Search

<p>News articles commonly focus on named entities: a story is usually built around a person, a company, or a similar entity. One challenge from an information retrieval point of view is that a single entity often has more than one name. As a result, users of news search engines must use exactly the same name for an entity as the articles they are interested in. The use of synonyms to refer to the same entity therefore forms the basis of this thesis. We explore the idea of using Wikipedia as a data source for building a large dictionary of named entities and their synonyms. Such an entity dictionary is valuable because it makes it possible to link synonyms to the same entity. The evaluation shows that Wikipedia is well suited as a source of named entities and synonyms, as its semi-structured format aids in recognizing the entities and their related synonyms. The use of the dictionary in a modified search solution, on the other hand, shows mixed results. One problem with evaluating a solution like this is that the precision of the individual synonyms is usually very high for popular entities, and when we combine different synonyms in the same query we end up giving more weight to the results that use multiple synonyms.</p>
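<p>To make the idea of a synonym dictionary concrete, the following is a minimal sketch of how such a dictionary might be used to expand a search query. It is not the thesis's implementation: the function names, the OR-query syntax, and the sample entry (a company with two alternative names) are all assumptions chosen for illustration.</p>

```python
from collections import defaultdict


def build_synonym_index(entity_synonyms):
    """Map each lowercased surface form to the canonical entities it may name."""
    index = defaultdict(set)
    for entity, synonyms in entity_synonyms.items():
        index[entity.lower()].add(entity)
        for synonym in synonyms:
            index[synonym.lower()].add(entity)
    return index


def expand_query(query, entity_synonyms, index):
    """Return the query followed by every other known name of the same entity."""
    phrases = [query]
    seen = {query.lower()}
    for entity in sorted(index.get(query.lower(), set())):
        for form in [entity] + sorted(entity_synonyms[entity]):
            if form.lower() not in seen:
                seen.add(form.lower())
                phrases.append(form)
    return phrases


def to_or_query(phrases):
    """Join phrases into a quoted OR query (hypothetical search syntax)."""
    return " OR ".join(f'"{phrase}"' for phrase in phrases)


# Hypothetical dictionary entry, used only for illustration.
ENTITY_SYNONYMS = {
    "International Business Machines": ["IBM", "Big Blue"],
}
INDEX = build_synonym_index(ENTITY_SYNONYMS)

if __name__ == "__main__":
    print(to_or_query(expand_query("IBM", ENTITY_SYNONYMS, INDEX)))
```

<p>In a search engine whose ranking adds up matches across OR-ed terms, a query expanded this way would likely favor articles that use several names for the same entity, which is the evaluation difficulty the abstract describes.</p>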

Identifier oai:union.ndltd.org:UPSALLA/oai:DiVA.org:ntnu-8906
Date January 2008
Creators Bøhn, Christian
Publisher Norwegian University of Science and Technology, Department of Computer and Information Science, Institutt for datateknikk og informasjonsvitenskap
Source Sets DiVA Archive at Upsalla University
Language English
Detected Language English
Type Student thesis, text
