Using Information Extraction and Text Classification in an Effort to Support Systematic Literature Reviews

Systematic literature reviews are an important tool in Evidence-based Software Engineering, but they require a large amount of effort and time from the researchers. Data extraction is an important step in these reviews, but current practice requires the researchers to extract large amounts of data manually. This thesis investigates the possibility of developing a prototype for automatic extraction, so as to reduce the time spent on extracting this data by hand. After reviewing related research and experimenting with different features and machine learning models, two models were implemented in the prototype: Conditional Random Fields for information extraction and Maximum Entropy for text classification. The models achieved average F1 scores of 67.02% and 73.82%, respectively. These results can be characterized as good, and they show that it is possible to automate the data extraction process by annotating a small part of the dataset and training machine learning models to perform the extraction.
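For readers unfamiliar with the metric, the F1 score reported above is the harmonic mean of precision and recall. The following is a minimal sketch of how a per-label F1 and its unweighted (macro) average could be computed from gold and predicted labels. The label names (`METHOD`, `METRIC`, `O`) and the function names are hypothetical, and the abstract does not state how the thesis averaged its scores, so macro averaging is an assumption made here for illustration only.

```python
from collections import Counter


def per_label_f1(gold, predicted):
    """Return {label: F1} for two equal-length sequences of labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was expected but missed

    scores = {}
    for label in set(gold) | set(predicted):
        predicted_count = tp[label] + fp[label]
        gold_count = tp[label] + fn[label]
        precision = tp[label] / predicted_count if predicted_count else 0.0
        recall = tp[label] / gold_count if gold_count else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(gold, predicted):
    """Unweighted mean of the per-label F1 scores (assumed averaging scheme)."""
    scores = per_label_f1(gold, predicted)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical labels for four tokens; not data from the thesis.
gold = ["METHOD", "O", "O", "METRIC"]
pred = ["METHOD", "O", "METRIC", "METRIC"]
print(per_label_f1(gold, pred))
print(macro_f1(gold, pred))
```

This sketch scores every label, including the "outside" label `O`. Evaluations of information extraction often exclude that label or score whole entity spans instead of single tokens, so the numbers it produces are not directly comparable to those in the abstract.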

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:ntnu-18415
Date: January 2012
Creators: Lazreg, Sofien
Publisher: Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap
Source Sets: DiVA Archive at Uppsala University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess