Detecting Cyber Threats in UWF-ZeekDataFall22 Using K-Means Clustering in the Big Data Environment
Journal article   Open access   Peer reviewed

Sikha Bagui, Germano Correa Silva De Carvalho, Asmi Mishra, Dustin Mink, Subhash Bagui and Stephanie Eager
Future Internet, Vol. 17(6), p. 267
06/18/2025
Web of Science ID: WOS:001515805100001

Abstract

In an era marked by the rapid growth of the Internet of Things (IoT), network security has become increasingly critical. Traditional Intrusion Detection Systems, particularly signature-based ones, struggle to identify evolving cyber threats such as Advanced Persistent Threats (APTs) and zero-day attacks. Such attacks go undetected by supervised machine-learning methods. In this paper, we apply K-means clustering, an unsupervised technique, to a newly created modern network attack dataset, UWF-ZeekDataFall22. Because this dataset contains labeled Zeek logs, the labels were removed before K-means clustering was applied. The labels were, however, used in the evaluation phase to determine the attack clusters after clustering. To identify APT as well as zero-day attack clusters, three different labeling heuristics were evaluated. To address the challenges posed by Big Data, a Big Data framework, namely Apache Spark with PySpark, was used as the development environment. This work is also distinctive in its use of connection-based features. Using these features, an in-depth study was performed to determine the effect of the number of clusters, the seed, and the features for each labeling heuristic. If the objective is to detect every single attack, the results indicate that 325 clusters with a seed of 200, using an optimal set of features, would correctly place 99% of attacks.
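The workflow described in the abstract (cluster unlabeled records, then use the held-back labels only to decide which clusters count as attack clusters) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper uses Apache Spark and PySpark, whereas this sketch uses plain Python so it runs without a cluster. The toy feature values, the function names, and the threshold rule in `label_clusters` are all hypothetical. The abstract does not specify the three labeling heuristics, so the rule shown here should not be read as one of them.

```python
import random
from collections import Counter

def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's K-means on a list of equal-length float tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # the seed controls initialization
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

def label_clusters(assign, labels, threshold=0.5):
    """Hypothetical heuristic: flag a cluster if its attack share >= threshold."""
    totals = Counter(assign)
    attacks = Counter(a for a, y in zip(assign, labels) if y == 1)
    return {c: attacks[c] / totals[c] >= threshold for c in totals}

def attack_recall(assign, labels, attack_clusters):
    """Fraction of true attacks that land in a cluster flagged as attack."""
    n_attacks = sum(labels)
    hits = sum(1 for a, y in zip(assign, labels) if y == 1 and attack_clusters[a])
    return hits / n_attacks if n_attacks else 0.0

# Toy two-feature records with made-up values standing in for connection features.
benign = [(0.1 + 0.01 * i, 1.0 + 0.02 * i) for i in range(20)]
attack = [(5.0 + 0.01 * i, 9.0 + 0.02 * i) for i in range(10)]
points = benign + attack
labels = [0] * 20 + [1] * 10  # withheld during clustering, used only afterwards

assign = kmeans(points, k=2, seed=200)
attack_clusters = label_clusters(assign, labels)
recall = attack_recall(assign, labels, attack_clusters)
```

Varying `k`, `seed`, and the feature columns in a sketch like this mirrors, at toy scale, the kind of parameter study the abstract describes.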
Published (Version of record): article PDF, 1.45 MB, CC BY 4.0, Open Access
