With an increasing amount of textual information available there is also an increased need to make this information more accessible. Our paper describes a modified TextRank model and investigates the different methods available to use automatic text summarization as a means for summary creation of swedish news articles. To evaluate our model we focused on intrinsic evaluation methods, in part through content evaluation in the form of of measuring referential clarity and non-redundancy, and in part by text quality evaluation measures, in the form of keyword retention and ROUGE evaluation. The results acquired indicate that stemming and improved stop word capabilities can have a positive effect on the ROUGE scores. The addition of redundancy checks also seems to have a positive effect on avoiding repetition of information. Keyword retention decreased somewhat, however. Lastly all methods had some trouble with dangling anaphora, showing a need for further work within anaphora resolution.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-159972 |
Date | January 2019 |
Creators | Lehto, Niko, Sjödin, Mikael |
Publisher | Linköpings universitet, Institutionen för datavetenskap, Linköpings universitet, Institutionen för datavetenskap |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Page generated in 0.0023 seconds