Global ETD Search

Return to search

Email Mining Classifier : The empirical study on combining the topic modelling with Random Forest classification

Filtering out and replying automatically to emails are of interest to many but is hard due to the complexity of the language and to dependencies of background information that is not present in the email itself. This paper investigates whether Latent Dirichlet Allocation (LDA) combined with Random Forest classifier can be used for the more general email classification task and how it compares to other existing email classifiers. The comparison is based on the literature study and on the empirical experimentation using two real-life datasets. Firstly, a literature study is performed to gain insight of the accuracy of other available email classifiers. Secondly, proposed model’s accuracy is explored with experimentation. The literature study shows that the accuracy of more general email classifiers differs greatly on different user sets. The proposed model accuracy is within the reported accuracy range, however in the lower part. It indicates that the proposed model performs poorly compared to other classifiers. On average, the classifier performance improves 15 percentage points with additional information. This indicates that Latent Dirichlet Allocation (LDA) combined with Random Forest classifier is promising, however future studies are needed to explore the model and ways to further increase the accuracy.

http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-14710

Email mining

Latent Dirichlet Allocation

Random Forest classification

Computer Sciences

Datavetenskap (datalogi)

Identifer	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:his-14710
Date	January 2017
Creators	Halmann, Marju
Publisher	Högskolan i Skövde, Institutionen för informationsteknologi
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess

Page generated in 0.0019 seconds

Email Mining Classifier : The empirical study on combining the topic modelling with Random Forest classification

Description

Links & Downloads

Tags

Additional Fields