Spam e-mail is a growing problem for today's businesses, and counteracting it costs time and resources. Earlier research has produced techniques and tools aimed at handling the growing number of incoming spam e-mails. However, research on different algorithms and their ability to classify e-mail messages needs an update, since both the tools and the spam e-mails themselves have become more advanced. In this study, three machine learning algorithms are evaluated on their ability to correctly classify e-mails as legitimate or spam: naive Bayes, support vector machine and decision tree. The algorithms are tested in an experiment on the Enron spam dataset and their performance is then compared. The experiment showed that support vector machine correctly classified the largest percentage of data points. Even so, the other algorithms can be useful from a business perspective, depending on the task and context.
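The comparison criterion named in the abstract, the share of correctly classified data points, can be illustrated with a minimal Python sketch. The function names `accuracy` and `rank_by_accuracy`, the placeholder model names, and the toy labels are all assumptions made for illustration; they are not the thesis's code, data, or results.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Return the fraction of messages whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rank_by_accuracy(
    y_true: Sequence[str], predictions: Dict[str, Sequence[str]]
) -> List[Tuple[str, float]]:
    """Score each classifier's predictions and sort from best to worst accuracy."""
    scores = [(name, accuracy(y_true, y_pred)) for name, y_pred in predictions.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy labels and predictions, invented purely for illustration.
y_true = ["spam", "legitimate", "spam", "legitimate", "legitimate"]
predictions = {
    "model_a": ["spam", "legitimate", "legitimate", "legitimate", "legitimate"],
    "model_b": ["spam", "legitimate", "spam", "legitimate", "legitimate"],
    "model_c": ["legitimate", "legitimate", "legitimate", "legitimate", "spam"],
}

ranking = rank_by_accuracy(y_true, predictions)
for name, score in ranking:
    print(f"{name}: {score:.0%} correctly classified")
```

Accuracy alone treats both kinds of error as equally costly. In a business setting, a legitimate e-mail flagged as spam may matter more than a missed spam message, which is one reason a classifier with lower overall accuracy could still be preferable for a particular task.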
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-389384 |
Date | January 2019 |
Creators | Bergens, Simon, Frykengård, Pontus |
Publisher | Uppsala universitet, Institutionen för informatik och media, Högskolan på Gotland, Avdelningen för Programvaruteknik |
Source Sets | DiVA Archive at Uppsala University |
Language | Swedish |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |