A Hybrid Time Series Forecasting Model Combining ARIMA and Decision Trees to Detect Attacks in MITRE ATT&CK Labeled Zeek Log Data

Raymond Freeman; Sikha Bagui; Subhash Bagui; Mink Dustin; Sarah Cameron; Carvalho Germano Correa Silva De

doi:10.3390/electronics15040871

Journal article

Open access

Peer reviewed

Electronics (Basel), Vol.15(4), p.871

02/15/2026

Web of Science ID: WOS:001700998200001

Abstract

Intrusion detection systems face challenges in processing high-volume network traffic while maintaining accuracy across diverse low-volume attack types. This study presents a hybrid approach combining ARIMA time series forecasting with Decision Tree classification to detect attacks in Zeek network flow data labeled with MITRE ATT&CK tactics, leveraging PySpark for scalability. ARIMA identifies temporal anomalies, which Decision Trees then classify by attack type. The ARIMA model was evaluated across 13 MITRE ATT&CK tactics, though only 7 maintained sufficient class balance for valid assessment. Results are reported at three evaluation levels: Baseline (Decision Tree only), ARIMA-DT (Decision Tree tested on ARIMA-filtered anomalies), and End-to-End (pipeline performance measured against the original test population). The hybrid model demonstrated two distinct benefits: performance improvement for detectable attacks and detection enablement for previously undetectable attacks. For high-volume attacks with existing baseline detection, ARIMA preprocessing substantially improved performance. For example, Reconnaissance achieved an ARIMA-DT F1-score of 99.71% (from a baseline of 80.88%), with End-to-End metrics confirming this improvement at a 97.59% F1-score. Credential Access reached 100% precision and recall on the ARIMA-filtered subset (from a baseline recall of 7.48%); however, End-to-End evaluation revealed that ARIMA filtering removed the vast majority of Credential Access attacks, resulting in a 1.28% End-to-End F1-score, worse than the baseline F1-score of 7.41%. This demonstrates that the hybrid pipeline is counterproductive for attack types whose flow characteristics closely resemble legitimate traffic.
More significantly, ARIMA preprocessing enabled detection where traditional Decision Trees completely failed (0% recall) for four stealthy attack types: Defense Evasion (ARIMA-DT recall of 93.22%, End-to-End 67.83%), Discovery (ARIMA-DT recall of 100%, End-to-End 63.43%), Persistence (ARIMA-DT recall of 86.92%, End-to-End 73.38%), and Privilege Escalation (ARIMA-DT recall of 89.93%, End-to-End 64.68%). These results demonstrate that ARIMA-based statistical anomaly detection is particularly effective for attacks involving subtle, low-volume activities that blend with legitimate operations, while also improving classification accuracy for high-volume reconnaissance activities.
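
The gap between the ARIMA-DT and End-to-End figures reported above can be illustrated with a small worked example. The sketch below is not the authors' code. It assumes that attacks removed by the anomaly filter count as missed detections (false negatives) at the End-to-End level, which is one reading of "measured against the original test population". The `Flow` record, its field names, and the toy counts are hypothetical and are not taken from the paper's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One labeled network flow (hypothetical record, for illustration only)."""
    is_attack: bool         # ground-truth label for a single tactic
    passed_filter: bool     # True if the anomaly filter kept this flow
    predicted_attack: bool  # classifier output; only used when passed_filter


def prf(tp, fp, fn):
    """Return (precision, recall, F1), using 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def filtered_subset_metrics(flows):
    """Score the classifier only on flows that the filter kept."""
    kept = [f for f in flows if f.passed_filter]
    tp = sum(f.is_attack and f.predicted_attack for f in kept)
    fp = sum((not f.is_attack) and f.predicted_attack for f in kept)
    fn = sum(f.is_attack and not f.predicted_attack for f in kept)
    return prf(tp, fp, fn)


def end_to_end_metrics(flows):
    """Score the whole pipeline on every flow in the original population.

    Assumption: a flow is reported as an attack only if it passed the
    filter AND the classifier flagged it, so filtered-out attacks are
    counted as false negatives.
    """
    tp = fp = fn = 0
    for f in flows:
        flagged = f.passed_filter and f.predicted_attack
        if f.is_attack and flagged:
            tp += 1
        elif f.is_attack:
            fn += 1
        elif flagged:
            fp += 1
    return prf(tp, fp, fn)


# Toy population (made-up counts): 100 attacks, of which the filter keeps 2,
# and 900 benign flows, of which the filter keeps 50.
flows = (
    [Flow(True, True, True)] * 2        # kept and correctly classified
    + [Flow(True, False, False)] * 98   # attacks removed by the filter
    + [Flow(False, True, False)] * 50   # benign, kept, classified benign
    + [Flow(False, False, False)] * 850 # benign, removed by the filter
)

subset = filtered_subset_metrics(flows)
e2e = end_to_end_metrics(flows)
print("filtered subset (P, R, F1):", subset)
print("end-to-end      (P, R, F1):", e2e)
```

In this toy case the classifier is perfect on the filtered subset, yet End-to-End recall is only 2 of 100 attacks, because the filter discarded the other 98 before classification. This is the same qualitative pattern the abstract reports for Credential Access, where a perfect subset score coexists with a very low End-to-End F1-score.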

Published (Version of record); license: CC BY 4.0; open access

Details

Publisher: MDPI AG
Number of pages: 42
Grant note: This work was partially supported by the Askew Institute at The University of West Florida.
Identifiers: WOS:001700998200001; 99381656086806600
Academic Unit: Cybersecurity and Information Technology; Mathematics and Statistics; Computer Science; Hal Marcus College of Science and Engineering
Language: English