Return to search

Enhancing Relevant Region Classifying

In this thesis we present a new way of extracting relevant data from texts. We use the method presented in the paper by Patwardhan and Rilo (2007), with improvements of our own. Our approach modifes the input to the support vector machine, to construct a self-trained relevant sentence classi er. This classffer is used to identify relevant sentences on the MUC-4 terrorism corpus.We modify the input by removing stopwords, converting words to its stem and only using words that occur at least three times in the corpus. We also changed how each word is weighted, using TF x IDF as weighting function. By using the relevant sentence classiffer together with domain relevant extraction patterns, we achieved higher performance on the MUC-4 terrorism corpus than the original model.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-32661
Date January 2011
CreatorsKarlsson, Thomas
PublisherKTH, Skolan för informations- och kommunikationsteknik (ICT)
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess
RelationTrita-ICT-EX ; 52

Page generated in 0.017 seconds