Threats of malware, attacks and intrusion have been around since the very conception of computing. Yet it was not until the sudden growth of the internet that awareness of security and digital assets really started to pick up steam. The internet presents a new liability, as the ever-increasing number of machines on the web provides a new goldmine for those seeking to exploit vulnerabilities. As access increases, new ways are created for attackers to exploit network systems and their users. Among the various types of attack, DDoS remains the most devastating and severe due to its potential impact, and that potential keeps growing, making intrusion detection a must for network security and defense. As a result, machine learning and artificial intelligence research has flourished over the last few years, opening new doors for intrusion detection technologies. However, data availability still greatly limits the success of such technologies, as research faces a shortage of good-quality IDS datasets.

This study addresses this persisting issue by assessing the state of the art of open datasets and their ability to support detection of intrusion and harmful network traffic. In particular, it compares the intrusion detection performance of open DDoS attack datasets. DDoS attacks are among the most concerning because of the magnitude of damage they are capable of. Literature on open DDoS datasets is fairly scarce in comparison to other forms of attack; hence, this study seeks to shed more light on the nature of existing DDoS data in relation to intrusion detection. The proposed solution sees four DDoS datasets analysed using a set of six machine learning algorithms, namely k-NN, SVM, naïve Bayes, decision tree, random forest and logistic regression.
This study aims to assess these datasets and analyse their performance with regard to the classification of network traffic.

The results of this study contribute to a better understanding of the intrusion detection capacity of open DDoS datasets. The datasets are analysed on the basis of five performance metrics: accuracy, precision, recall, F-measure and computation time. The results show that voluminous datasets, such as the CSE-CIC-IDS2018 dataset, can achieve very high performance. In modelling terms, the results indicate that random forest performs very well over a wide range of datasets, while naïve Bayes and SVM are less consistent.
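
The five performance metrics named above can be illustrated with a small sketch. It is not the thesis's evaluation code: the function names, the toy packets-per-second samples and the threshold rule are hypothetical, chosen only to show how accuracy, precision, recall and F-measure follow from the confusion counts for a binary attack/benign labelling. The timing here covers prediction only, which is an assumption; the abstract does not say what the thesis's computation time includes.

```python
import time
from typing import Callable, Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F-measure for labels 1 (attack) and 0 (benign)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    # Confusion counts for the positive (attack) class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def timed_evaluation(
    predict: Callable[[float], int],
    samples: Sequence[float],
    y_true: Sequence[int],
) -> Dict[str, float]:
    """Score a classifier and record how long prediction took, in seconds."""
    start = time.perf_counter()
    y_pred = [predict(x) for x in samples]
    elapsed = time.perf_counter() - start
    scores = binary_metrics(y_true, y_pred)
    scores["computation_time_s"] = elapsed
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: packets per second per flow, 1 = DDoS, 0 = benign.
    rates = [900, 15, 1200, 30, 40, 700, 20, 1100]
    labels = [1, 0, 1, 0, 1, 1, 0, 0]
    # Hypothetical stand-in classifier: flag any flow above 500 packets/s.
    print(timed_evaluation(lambda r: int(r > 500), rates, labels))
```

In practice the threshold rule would be replaced by the `predict` method of a trained model, and the same scores computed per dataset and per algorithm.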
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:ltu-78980 |
Date | January 2020 |
Creators | Kiourkoulis, Stefanos |
Publisher | Luleå tekniska universitet, Institutionen för system- och rymdteknik |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |