Global ETD Search

Using Information Extraction and Text Classification in an Effort to Support Systematic Literature Reviews

Systematic literature reviews are an important tool in Evidence-based Software Engineering, but they require a large amount of effort and time from researchers. Data extraction is an important step in these reviews, but current practice requires researchers to extract large amounts of data manually. This thesis investigates the possibility of developing a prototype for automatic extraction, so as to reduce the time spent on manual extraction. By reviewing related research and experimenting with different features and machine learning models, two models were implemented in the prototype: Conditional Random Fields for information extraction and Maximum Entropy for text classification. The models achieved average F1 scores of 67.02% and 73.82%, respectively. These results can be characterized as good, and they show that it is possible to automate the data extraction process by annotating a small part of the dataset and training machine learning models to perform the extraction.
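
For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how an F1 score and an unweighted average over labels can be computed. The abstract does not say how the averages were taken, so the macro-averaging shown here is an assumption, and the function names, label names, and counts are hypothetical illustrations rather than anything taken from the thesis.

```python
from typing import Dict, Tuple


def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall (0.0 if there are no true positives)."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts_per_label: Dict[str, Tuple[int, int, int]]) -> float:
    """Return the unweighted mean of per-label F1 scores (assumed averaging scheme)."""
    scores = [f1_score(*counts) for counts in counts_per_label.values()]
    return sum(scores) / len(scores)


# Hypothetical (true positive, false positive, false negative) counts per label.
example_counts = {"label_a": (8, 2, 2), "label_b": (3, 1, 3)}

if __name__ == "__main__":
    print(f"macro F1: {macro_f1(example_counts):.4f}")
```

An average of this kind weights every label equally, so a rarely occurring label influences the result as much as a frequent one.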

ntnudaim:6040

MIT informatikk

Informasjonsforvaltning

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:ntnu-18415
Date	January 2012
Creators	Lazreg, Sofien
Publisher	Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
