The majority of documented phishing attacks have been carried by email, yet few studies have measured the impact of email headers on the predictive accuracy of machine learning techniques in detecting email phishing attacks. Research has shown that the inclusion of a limited subset of email headers as features in training machine learning algorithms to detect phishing attack did increase the predictive accuracy of these learning algorithms. The same research also recommended further investigation of the impact of including an expanded set of email headers on the predictive accuracy of machine learning algorithms.
In addition, research has shown that the cost of misclassifying legitimate emails as phishing attacks--false positives--was far higher than that of misclassifying phishing emails as legitimate--false negatives, while the opposite was true in the case of fraud detection. Consequently, they recommended that cost sensitive measures be taken in order to further improve the weighted predictive accuracy of machine learning algorithms.
Motivated by the potentially high impact of the inclusion of email headers on the predictive accuracy of machines learning algorithms and the significance of enabling cost-sensitive measures as part of the learning process, the goal of this research was to quantify the impact of including an extended set of email headers and to investigate the impact of imposing penalty as part of the learning process on the number of false positives. It was believed that if email headers were included and cost-sensitive measures were taken as part of the learning process, than the overall weighted, predictive accuracy of the machine learning algorithm would be improved. The results showed that adding email headers as features did improve the overall predictive accuracy of machine learning algorithms and that cost-sensitive measure taken as part of the learning process did result in lower false positives.
Identifer | oai:union.ndltd.org:nova.edu/oai:nsuworks.nova.edu:gscis_etd-1324 |
Date | 01 January 2013 |
Creators | Tout, Hicham Refaat |
Publisher | NSUWorks |
Source Sets | Nova Southeastern University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | CEC Theses and Dissertations |
Page generated in 0.002 seconds