Email classification using machine learning algorithms

The goal of this project was to construct a machine learning algorithm that improves over time. This was done by first constructing a dataset of realistic messages that simulates receiving emails from two different sources. The dataset was built by combining data from two different online forums, and two application programming interfaces were used to collect the data and send it to the program. Four methods were evaluated on the dataset, with the best one to be used in the final product: k-nearest neighbors, adaptive boosting, random forest, and an artificial neural network. Each method was tested and tuned to achieve the best accuracy. The results showed that the artificial neural network outperformed the other methods by a large margin and was the most suitable for the final product. The final product was an algorithm that improves over time by applying a feedback loop to new data collected from the online forums. If the algorithm was sufficiently confident that a new data point belonged to the predicted class, it incorporated that data point into the dataset. Over time the dataset grew larger and the algorithm adapted to new data and trends. The final result was a growing dataset that started at 1,000 data points and ended at 8,464 data points, with a total of 74 misclassifications.
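The feedback loop described above can be illustrated with a minimal sketch. It is not the thesis implementation: a small naive Bayes text classifier stands in for the artificial neural network, and the class name TinyNaiveBayes, the function feedback_step, the 0.9 confidence threshold, and the sample messages are all assumptions made for illustration.

```python
import math
from collections import Counter

# Assumed value; the abstract does not state the threshold that was used.
CONFIDENCE_THRESHOLD = 0.9


class TinyNaiveBayes:
    """Stand-in classifier (multinomial naive Bayes with add-one smoothing).

    The thesis used an artificial neural network; any model that reports
    class probabilities could take this place.
    """

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict_proba(self, text):
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for c in self.classes:
            n_words = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / total_docs)
            for word in text.lower().split():
                # Add-one smoothing keeps unseen words from zeroing a class.
                score += math.log(
                    (self.word_counts[c][word] + 1) / (n_words + len(self.vocab))
                )
            log_scores[c] = score
        # Normalize log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {c: v / norm for c, v in exp_scores.items()}


def feedback_step(texts, labels, incoming, threshold=CONFIDENCE_THRESHOLD):
    """Train on the current dataset, then absorb confident predictions."""
    model = TinyNaiveBayes().fit(texts, labels)
    grown_texts, grown_labels = list(texts), list(labels)
    for message in incoming:
        proba = model.predict_proba(message)
        predicted = max(proba, key=proba.get)
        if proba[predicted] >= threshold:
            # Confident enough: keep the message with its predicted label.
            grown_texts.append(message)
            grown_labels.append(predicted)
    return grown_texts, grown_labels


# Hypothetical seed data standing in for messages from two forums.
seed_texts = [
    "gpu driver crash kernel",
    "kernel panic driver bug",
    "pasta recipe tomato sauce",
    "bake bread recipe oven",
]
seed_labels = ["tech", "tech", "cooking", "cooking"]
new_messages = ["driver crash kernel bug", "tomato pasta recipe oven", "hello there"]

grown_texts, grown_labels = feedback_step(seed_texts, seed_labels, new_messages)
print(len(seed_texts), "->", len(grown_texts))
```

Calling feedback_step repeatedly on each new batch, always passing in the grown dataset, gives the growth over time that the abstract describes. Messages the model is unsure about (here, "hello there") are left out, which limits how many mislabeled points enter the dataset, though it does not rule them out.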

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-476318
Date: January 2022
Creators: Jonsson, Isak
Publisher: Uppsala universitet, Institutionen för materialvetenskap
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
Relation: MATVET-F ; 22016
