Return to search

Graph-Based Keyphrase Extraction Using Wikipedia

Keyphrases describe a document in a coherent and simple way, giving the prospective reader a way to quickly determine whether the document satisfies their information needs. The pervasion of huge amount of information on Web, with only a small amount of documents have keyphrases extracted, there is a definite need to discover automatic keyphrase extraction systems. Typically, a document written by human develops around one or more general concepts or sub-concepts. These concepts or sub-concepts should be structured and semantically related with each other, so that they can form the meaningful representation of a document. Considering the fact, the phrases or concepts in a document are related to each other, a new approach for keyphrase extraction is introduced that exploits the semantic relations in the document. For measuring the semantic relations between concepts or sub-concepts in the document, I present a comprehensive study aimed at using collaboratively constructed semantic resources like Wikipedia and its link structure. In particular, I introduce a graph-based keyphrase extraction system that exploits the semantic relations in the document and features such as term frequency. I evaluated the proposed system using novel measures and the results obtained compare favorably with previously published results on established benchmarks.

Identiferoai:union.ndltd.org:unt.edu/info:ark/67531/metadc67939
Date12 1900
CreatorsDandala, Bharath
ContributorsTarau, Paul, Mihalcea, Rada, 1974-, Ruiz, Miguel E.
PublisherUniversity of North Texas
Source SetsUniversity of North Texas
LanguageEnglish
Detected LanguageEnglish
TypeThesis or Dissertation
FormatText
RightsPublic, Copyright, Dandala, Bharath, Copyright is held by the author, unless otherwise noted. All rights reserved.

Page generated in 0.0016 seconds