Model Retraining upon Concept Drift Detection in Network Traffic Big Data
Journal article   Open access   Peer reviewed

Sikha S. Bagui, Mohammad Pale Khan, Chedlyne Valmyr, Subhash C. Bagui and Dustin Mink
Future Internet, Vol. 17(8), p. 328
07/24/2025
Web of Science ID: WOS:001558323000001

Abstract

This paper presents a comprehensive model for detecting and addressing concept drift in network security data using the Isolation Forest algorithm. The approach leverages Isolation Forest’s inherent ability to efficiently isolate anomalies in high-dimensional data, making it suitable for adapting to shifting data distributions in dynamic environments. Anomalies in network attack data may not occur in large numbers, so it is important to be able to detect them even with small batch sizes. The novelty of this work lies in successfully detecting anomalies even with small batch sizes and in identifying the point at which incremental retraining should start. Triggering retraining early also keeps the model in sync with the latest data, reducing the chance of a successful attack. Our methodology implements an end-to-end workflow that continuously monitors incoming data, detects distribution changes using Isolation Forest, and then manages model retraining using Random Forest to maintain optimal performance. We evaluate our approach using UWF-ZeekDataFall22, a newly created dataset built from Zeek’s Connection Logs collected through the Security Onion 2 network security monitor and labeled using the MITRE ATT&CK framework. Both incremental and full retraining are analyzed using Random Forest. Model performance increased steadily with incremental retraining, and full model retraining also had a positive impact on performance.
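The workflow summarized in the abstract (monitor incoming batches with Isolation Forest, then retrain a Random Forest once drift is flagged) can be illustrated with a minimal sketch in scikit-learn. This is not the authors' implementation. The drift criterion (batch anomaly rate exceeding a baseline by a fixed `tolerance`), the use of `warm_start` to approximate incremental retraining, the helper names, and the synthetic Gaussian data are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier


def fit_drift_detector(X_ref, contamination=0.05, seed=0):
    """Fit Isolation Forest on reference data; return it with its baseline anomaly rate."""
    detector = IsolationForest(contamination=contamination, random_state=seed)
    detector.fit(X_ref)
    baseline = float(np.mean(detector.predict(X_ref) == -1))
    return detector, baseline


def drift_detected(detector, batch, baseline, tolerance=0.20):
    """Assumed criterion: flag drift when the batch anomaly rate exceeds baseline + tolerance."""
    rate = float(np.mean(detector.predict(batch) == -1))
    return rate > baseline + tolerance, rate


def retrain_incrementally(clf, X_new, y_new, extra_trees=25):
    """Approximate incremental retraining: keep old trees, fit extra trees on the new batch."""
    clf.set_params(warm_start=True, n_estimators=clf.n_estimators + extra_trees)
    clf.fit(X_new, y_new)
    return clf


def make_batch(rng, n, shift):
    """Hypothetical synthetic traffic features; labels depend on the first two features."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 5))
    y = (X[:, 0] + X[:, 1] > 2 * shift).astype(int)
    return X, y


rng = np.random.default_rng(0)
X_ref, y_ref = make_batch(rng, 2000, shift=0.0)
detector, baseline = fit_drift_detector(X_ref)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_ref, y_ref)

# Small batches (64 rows) mirror the abstract's emphasis on small batch sizes.
for name, shift in [("stable", 0.0), ("drifted", 3.0)]:
    X_b, y_b = make_batch(rng, 64, shift)
    drifted, rate = drift_detected(detector, X_b, baseline)
    if drifted:
        clf = retrain_incrementally(clf, X_b, y_b)
    print(f"{name}: anomaly rate={rate:.2f}, drift={drifted}, trees={len(clf.estimators_)}")
```

With `warm_start`, each new batch must contain the same set of class labels as the original training data, so a production pipeline would need to guard against single-class batches. Full retraining, by contrast, would refit a fresh classifier on the accumulated data.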
Published (Version of record), article PDF, CC BY 4.0, Open Access
