Classifying UNSW-NB15 Network Traffic in the Big Data Framework Using Random Forest in Spark
Journal article   Peer reviewed

Sikha Bagui, Subhash Bagui, Jason Simonds, Russell Plenkers and Timothy Bennett
International journal of big data intelligence and applications, Vol.2(1), pp.39-61
01/07/2022

Abstract

This work focuses on detecting and classifying attacks in network traffic using Random Forest, as both a binary and a multi-class machine learning classifier, in a distributed big data environment built on Apache Spark. The classifier is tested on the UNSW-NB15 dataset. Major problems in datasets of this type include high dimensionality and imbalanced data. To address high dimensionality, both information gain and principal component analysis (PCA) were applied before training and testing Random Forest in Apache Spark. The binary and multi-class Random Forest classifiers were compared in a distributed environment, with and without PCA, across various numbers of Spark cores and Random Forest trees, in terms of performance time and statistical measures. The highest accuracy, 99.94%, was obtained by the binary classifier using 8 cores and 30 trees. This study obtained higher accuracy and lower False Alarm Rates than previously achieved, with low testing times.
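
As a rough illustration of two quantities the abstract mentions, the sketch below computes information gain for a categorical feature and then accuracy and a False Alarm Rate from confusion-matrix counts. It is a minimal plain-Python sketch, not the authors' Spark implementation. The function names, the toy feature values, and the counts are hypothetical, and the False Alarm Rate is assumed here to be FP / (FP + TN); the abstract does not state which definition the paper uses.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def accuracy(tp, tn, fp, fn):
    """Fraction of all records classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_alarm_rate(fp, tn):
    """Assumed definition: share of normal records flagged as attacks."""
    return fp / (fp + tn)


# Hypothetical toy data: four records with a binary label and two features.
labels = ["attack", "attack", "normal", "normal"]
proto = ["udp", "udp", "tcp", "tcp"]        # separates the classes perfectly
service = ["dns", "http", "dns", "http"]    # carries no label information

gain_proto = information_gain(proto, labels)
gain_service = information_gain(service, labels)

# Hypothetical confusion-matrix counts for a binary classifier.
acc = accuracy(tp=90, tn=95, fp=5, fn=10)
far = false_alarm_rate(fp=5, tn=95)
```

Ranking features by `information_gain` and keeping the highest-scoring ones is one common way to reduce dimensionality before training. In the toy data, `proto` would rank above `service` because it fully determines the label.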
