Network traffic classification and characterisation play an increasingly vital role in understanding and solving security-related issues in internet-based applications. Research in this area has focused on characterising network traffic at the various layers of the TCP/IP protocol stack, and has expanded further to concentrate on specific application-layer protocols. Virtual Private Networks (VPNs) have become one of the most popular methods of remote access over the public internet and other Internet Protocol (IP)-based networks. VPNs are governed by IP Security (IPsec), a suite of protocols used for tunnelling already encrypted IP traffic to guarantee secure remote access to servers. In this paper, we propose and develop a framework to classify network traffic as VPN or non-VPN using time-related features. Our focus is on distinguishing encrypted traffic that is tunnelled through a VPN from traffic that is encrypted normally (non-VPN transmission), using machine-learning techniques on data sets of time-related features. Six classification models are compared: logistic regression, support vector machine, Naïve Bayes, k-nearest neighbour, and two ensemble methods, the Random Forest (RF) classifier and the Gradient Boosting Tree (GBT) classifier. The optimised RF and GBT models are recommended over the other models on the basis of their high accuracy and low overfitting. The features that contributed to achieving 90% accuracy in each category were also identified.
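To make the notion of time-related features concrete, the following is a minimal sketch of how one flow's packet arrival times might be summarised before being passed to a classifier. The abstract does not enumerate the features used, so the specific features here (flow duration and inter-arrival time statistics), the function name `time_features`, and the sample timestamps are all assumptions chosen for illustration, not the paper's actual feature set.

```python
from statistics import mean, pstdev


def time_features(timestamps):
    """Summarise one flow's packet arrival times (in seconds).

    Hypothetical feature set for illustration only; the paper's
    actual time-related features are not listed in the abstract.
    """
    ts = sorted(timestamps)
    if len(ts) < 2:
        raise ValueError("need at least two packets to compute inter-arrival times")

    # Inter-arrival times: gaps between consecutive packets in the flow.
    iats = [later - earlier for earlier, later in zip(ts, ts[1:])]

    return {
        "duration": ts[-1] - ts[0],  # time from first to last packet
        "iat_min": min(iats),
        "iat_max": max(iats),
        "iat_mean": mean(iats),
        "iat_std": pstdev(iats),     # population standard deviation
    }


if __name__ == "__main__":
    # Made-up arrival times for a single flow.
    flow = [0.0, 0.5, 1.0, 1.5, 3.5]
    for name, value in time_features(flow).items():
        print(f"{name}: {value:.4f}")
```

Each flow then becomes one fixed-length row of numbers, which is the form that classifiers such as those compared in the paper expect as input.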
Details
Title
Comparison of machine-learning algorithms for classification of VPN network traffic flow using time-related features