Extracting Keyphrases from Individual News Articles

Keyphrase extraction from individual documents is a research area that aims to extract a small set of keyphrases describing the content of a single document. The advantage of this form of extraction is that it retains most of the semantic context of the document.

In this thesis we focus on the news article domain and use the structure of news articles to improve the quality of the extracted keyphrases. An existing single-document keyphrase extraction algorithm is used as the basis, and it is enhanced with a weighting system based on the structure of news articles. In addition, some other common keyword extraction methods are applied. The effects of these changes are tested extensively in the evaluation phase.

In the evaluation of the implemented prototype, we find that introducing the weighting system yields results equal to those of the basic algorithm, and that few improvements can be made. However, we find that an automatically generated stopword list based on the corpus improves the results by 1-2%.
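The abstract names neither the base algorithm nor the actual weights, so the following is only a minimal Python sketch of the two ideas it describes, under stated assumptions. It scores words by frequency, multiplies each occurrence by a weight for the article section it appears in (title, lead, body), and filters out words from a stopword list derived automatically from the corpus by document frequency. `SECTION_WEIGHTS`, `corpus_stopwords`, `weighted_keywords` and the 50% document-frequency threshold are hypothetical illustrations, not the thesis's implementation.

```python
import re
from collections import Counter

# Hypothetical section weights (assumption, not from the thesis): words in the
# title and lead count more than words in the body.
SECTION_WEIGHTS = {"title": 3.0, "lead": 2.0, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def corpus_stopwords(documents, max_doc_fraction=0.5):
    """Return words occurring in more than max_doc_fraction of the documents.

    One simple way to build a stopword list automatically from a corpus;
    the threshold is an assumption chosen for illustration.
    """
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each word once per document
    limit = max_doc_fraction * len(documents)
    return {word for word, df in doc_freq.items() if df > limit}


def weighted_keywords(article, stopwords, top_n=3):
    """Rank non-stopwords by section-weighted frequency and return the top_n.

    article maps section names ("title", "lead", "body") to their text;
    unknown sections get weight 1.0.
    """
    scores = Counter()
    for section, text in article.items():
        weight = SECTION_WEIGHTS.get(section, 1.0)
        for word in tokenize(text):
            if word not in stopwords:
                scores[word] += weight
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


if __name__ == "__main__":
    corpus = [
        "the storm hit the coast",
        "the election results were announced",
        "the team won the final",
    ]
    stop = corpus_stopwords(corpus)
    article = {
        "title": "Storm hits the coast",
        "lead": "A storm caused flooding on the coast",
        "body": "The flooding closed roads and the storm moved north",
    }
    print(stop)                               # → {'the'}
    print(weighted_keywords(article, stop))   # → ['storm', 'coast', 'flooding']
```

With such a tiny corpus only "the" passes the document-frequency threshold, so words like "a" and "on" still receive scores. A real corpus-derived list would be built from many articles, which is where such a list can help compared with a fixed, generic one.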

ntnudaim:6047

MTDT datateknikk (Computer Science)

Program- og informasjonssystemer (Software and Information Systems)

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:ntnu-14225
Date	January 2011
Creators	Lund, Kristian
Publisher	Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
