Class-Specific GAN Augmentation for Imbalanced Intrusion Detection: A Comparative Study Using the UWF-ZeekData22 Dataset
Journal article   Open access   Peer reviewed

Future Internet, Vol. 18(4), p. 200
04/10/2026
Web of Science ID: WOS:001751356300001

Abstract

Extreme class imbalance is a persistent obstacle for machine learning-driven intrusion detection, as rare but high-impact cyberattacks occur far less frequently than benign traffic in training data. In many real-world cybersecurity datasets, this imbalance becomes extreme, with certain attack types containing a handful of samples, effectively placing the problem in a few-shot learning regime. This paper presents a controlled benchmarking study of Generative Adversarial Network (GAN) objectives for synthesizing minority-class cyberattack data. Using the UWF-ZeekData22 network traffic dataset, each MITRE ATT&CK tactic is framed as a separate binary detection task, and tactic-specific GANs are trained solely on minority samples to generate synthetic attack records. Four widely used GAN variants—Vanilla GAN, Conditional GAN (cGAN), Wasserstein GAN (WGAN), and Wasserstein GAN with Gradient Penalty (WGAN-GP)—are compared under unified training steps and fixed augmentation conditions. The utility of generated data is assessed by evaluating downstream detection performance using five traditional classifiers: Logistic Regression, Support Vector Machine, k-Nearest Neighbors, Decision Tree, and Random Forest. The results indicate that GAN augmentation generally strengthens minority-class detection across tactics and models, reducing false negatives and improving recall consistency, while not systematically harming majority-class performance. However, the effectiveness of each GAN objective varies significantly with data sparsity. Specifically, simpler adversarial objectives often outperform more complex architectures by preserving discriminative feature structure, while heavily regularized models may overly smooth minority-class distributions and reduce separability. Wasserstein-based objectives provide improved training stability, but additional regularization does not consistently translate to better detection performance. 
Overall, the results demonstrate that in extreme-imbalance settings, GAN effectiveness is governed more by data sparsity and structure preservation than by architectural complexity. These findings establish class-specific generative augmentation as a practical strategy for intrusion detection and provide empirical guidance for selecting appropriate GAN objectives for tabular cybersecurity data under highly imbalanced conditions.
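
The per-tactic framing described in the abstract can be illustrated with a small, standard-library-only sketch. It is not the authors' implementation: the record layout, the label names `tactic_A` and `benign`, the synthetic sample count, and the `jitter_sampler` stand-in (which takes the place of a trained GAN generator and is not itself a GAN) are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Record = Tuple[List[float], str]  # (feature vector, tactic label); hypothetical layout
Sampler = Callable[[int], List[List[float]]]


def binary_task(records: Sequence[Record], tactic: str) -> Tuple[List[List[float]], List[int]]:
    """Frame one tactic as a binary task: 1 = this tactic, 0 = everything else."""
    X = [features for features, _ in records]
    y = [1 if label == tactic else 0 for _, label in records]
    return X, y


def jitter_sampler(minority: List[List[float]], rng: random.Random, scale: float = 0.05) -> Sampler:
    """Stand-in for a trained generator (NOT a GAN): resample a real minority
    record and add small Gaussian noise to each feature."""
    def sample(n: int) -> List[List[float]]:
        return [[v + rng.gauss(0.0, scale) for v in rng.choice(minority)] for _ in range(n)]
    return sample


def augment(X, y, make_sampler, n_synthetic: int, rng: random.Random):
    """Fit a class-specific sampler on minority rows only, then append
    n_synthetic generated rows labelled as the minority class."""
    minority = [x for x, label in zip(X, y) if label == 1]
    if not minority:
        raise ValueError("no minority samples to learn from")
    synthetic = make_sampler(minority, rng)(n_synthetic)
    return X + synthetic, y + [1] * len(synthetic)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Minority-class recall: TP / (TP + FN); 0.0 if there are no positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data: 200 majority records and 5 minority records (made-up values).
rng = random.Random(0)
records = [([rng.gauss(0, 1), rng.gauss(0, 1)], "benign") for _ in range(200)]
records += [([rng.gauss(3, 0.3), rng.gauss(3, 0.3)], "tactic_A") for _ in range(5)]

X, y = binary_task(records, "tactic_A")
X_aug, y_aug = augment(X, y, jitter_sampler, n_synthetic=95, rng=rng)
```

In a fuller pipeline, one would presumably apply `augment` to the training split only, fit a classifier on the augmented data, and compute `recall` on an untouched held-out split, swapping `jitter_sampler` for a generator trained with whichever GAN objective is under comparison.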
Published (Version of record), Open access, CC BY 4.0
