Window-based stream data mining for classification of Internet traffic

Accurate classification of Internet applications is a fundamental requirement for network provisioning, network security, network management and maintaining quality of service. New applications are being introduced on the Internet at an increasing rate. The traffic volume and patterns of some of these applications, such as Peer-to-Peer (P2P) file sharing, put pressure on service providers' networks in terms of congestion and delay, to the point that maintaining the Quality of Service (QoS) planned in the access network requires the provisioning of additional bandwidth sooner than planned. Peer-to-Peer applications enable users to communicate directly over the Internet. This bypasses the central server control implemented by service providers, poses threats in terms of network congestion, and creates an environment for malicious attacks on networks. One key challenge in this area is to adapt to the dynamic nature of Internet traffic. With the growth in Internet traffic, in terms of the number and type of applications, traditional classification techniques such as port matching, protocol decoding or packet payload analysis are no longer effective. For instance, P2P applications may communicate over randomly selected non-standard ports, which makes their traffic difficult to distinguish from other types of traffic by inspecting port numbers alone.
The present research introduces two new techniques to classify stream (online) data using k-means clustering and Fast Decision Tree (FDT). In the first technique, we first generate micro-clusters using k-means clustering with different values of k. The micro-clusters are then merged into two clusters based on weighted averages of the P2P and NonP2P populations, so that one merged cluster represents P2P traffic and the other represents NonP2P traffic. The simulation results confirm that the two final clusters each represent P2P and NonP2P traffic with good accuracy.
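The merge step of the first technique can be illustrated with a minimal sketch. The abstract does not give the exact merge rule, so the code below assumes one plausible reading: each micro-cluster is assigned to the P2P or NonP2P side by its majority label, and each side is then reduced to a single centroid by a size-weighted average. The `MicroCluster` type, its fields, the tie-breaking rule and the toy values are all hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class MicroCluster:
    centroid: Sequence[float]  # mean feature vector of the micro-cluster
    n_p2p: int                 # labelled P2P records in the micro-cluster
    n_nonp2p: int              # labelled NonP2P records in the micro-cluster

    @property
    def size(self) -> int:
        return self.n_p2p + self.n_nonp2p


def weighted_centroid(clusters: List[MicroCluster]) -> Tuple[float, ...]:
    """Size-weighted average of the centroids in `clusters`."""
    total = sum(c.size for c in clusters)
    dims = len(clusters[0].centroid)
    return tuple(
        sum(c.centroid[d] * c.size for c in clusters) / total
        for d in range(dims)
    )


def merge_micro_clusters(
    micro: List[MicroCluster],
) -> Dict[str, Optional[Tuple[float, ...]]]:
    """Group micro-clusters by majority label, then merge each group.

    Assumption: ties go to the P2P side; empty micro-clusters are ignored.
    """
    p2p = [c for c in micro if c.size and c.n_p2p >= c.n_nonp2p]
    non = [c for c in micro if c.size and c.n_p2p < c.n_nonp2p]
    return {
        "P2P": weighted_centroid(p2p) if p2p else None,
        "NonP2P": weighted_centroid(non) if non else None,
    }


# Toy micro-clusters over two made-up features.
micro = [
    MicroCluster((100.0, 1.0), n_p2p=30, n_nonp2p=10),
    MicroCluster((200.0, 3.0), n_p2p=50, n_nonp2p=10),
    MicroCluster((1000.0, 9.0), n_p2p=5, n_nonp2p=95),
]
merged = merge_micro_clusters(micro)
print(merged)
```

In this toy input the first two micro-clusters are majority P2P and are merged into one centroid, while the third stands alone as the NonP2P cluster. In practice the micro-clusters would come from several k-means runs with different values of k, which this sketch leaves out.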
The second technique employs a two-stage architecture for classification of P2P traffic. In the first stage, the traffic is filtered using standard port numbers and layer 4 port matching to label well-known P2P and NonP2P traffic, leaving the rest of the traffic as "Unknown". The labeled traffic generated in the first stage is used to train a Fast Decision Tree (FDT) classifier with high accuracy. The Unknown traffic is then passed to the FDT model, which classifies it into P2P and NonP2P with high accuracy. The two-stage architecture therefore not only classifies well-known P2P applications, but also classifies applications that use random or private (non-standard) port numbers and cannot be classified otherwise. We performed various experiments in which we captured Internet traffic at a main gateway router, pre-processed the data and selected the three most significant attributes, namely Packet Length, Source IP address and Destination IP address. We then applied the proposed technique to three different windows of records and calculated the Accuracy, Specificity and Sensitivity of the model. Our simulation results confirm that the predicted output represents P2P and NonP2P traffic with accuracy higher than 90%.
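The two-stage flow and the three reported measures can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: the port lists are illustrative choices, the record layout is hypothetical, and a one-feature threshold rule on packet length (`fit_stump`) stands in for the Fast Decision Tree, which the abstract does not specify. The thesis also uses source and destination IP addresses as attributes, which this sketch omits.

```python
from typing import Dict, List, Tuple

# Illustrative port lists (assumptions, not taken from the thesis).
P2P_PORTS = {4662, 6346} | set(range(6881, 6890))
NONP2P_PORTS = {25, 53, 80, 443}

Record = Dict[str, int]  # hypothetical keys: "src_port", "dst_port", "length"


def stage_one(rec: Record) -> str:
    """Stage 1: label a record by layer-4 port matching, else 'Unknown'."""
    ports = {rec["src_port"], rec["dst_port"]}
    if ports & P2P_PORTS:
        return "P2P"
    if ports & NONP2P_PORTS:
        return "NonP2P"
    return "Unknown"


def fit_stump(train: List[Tuple[int, str]]) -> int:
    """Stand-in for the FDT: choose the packet-length threshold t that
    misclassifies the fewest labelled records (predict P2P if length > t)."""
    best_t, best_err = 0, len(train) + 1
    for t in sorted({length for length, _ in train}):
        err = sum((length > t) != (label == "P2P") for length, label in train)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def metrics(truth: List[str], pred: List[str]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity with P2P as the positive class."""
    pairs = list(zip(truth, pred))
    tp = sum(t == "P2P" and p == "P2P" for t, p in pairs)
    tn = sum(t == "NonP2P" and p == "NonP2P" for t, p in pairs)
    fp = sum(t == "NonP2P" and p == "P2P" for t, p in pairs)
    fn = sum(t == "P2P" and p == "NonP2P" for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# One toy window of records; the last two use non-standard ports.
window = [
    {"src_port": 6881, "dst_port": 51000, "length": 1400},
    {"src_port": 4662, "dst_port": 52000, "length": 1300},
    {"src_port": 50000, "dst_port": 80, "length": 300},
    {"src_port": 50001, "dst_port": 53, "length": 90},
    {"src_port": 49152, "dst_port": 49999, "length": 1350},
    {"src_port": 49153, "dst_port": 50123, "length": 200},
]
labels = [stage_one(r) for r in window]

# Stage 2: train on the port-labelled records, then classify the Unknown ones.
train = [(r["length"], lab) for r, lab in zip(window, labels) if lab != "Unknown"]
threshold = fit_stump(train)
predicted = [
    lab if lab != "Unknown" else ("P2P" if r["length"] > threshold else "NonP2P")
    for r, lab in zip(window, labels)
]
print(labels, threshold, predicted)
```

Sensitivity here is the fraction of true P2P records predicted as P2P, and specificity is the fraction of true NonP2P records predicted as NonP2P. Computing them requires ground-truth labels for the evaluated records, which the toy window does not carry, so `metrics` would be called on a separately labelled set.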

Identifier: oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/27601
Date: January 2008
Creators: Mumtaz, Ali
Publisher: University of Ottawa (Canada)
Source Sets: Université d’Ottawa
Language: English
Detected Language: English
Type: Thesis
Format: 74 p.
