Intrusion detection systems must process high-volume network traffic while maintaining accuracy across diverse low-volume attack types. This study presents a hybrid approach that combines ARIMA time series forecasting with Decision Tree classification to detect attacks in Zeek network flow data labeled with MITRE ATT&CK tactics, using PySpark for scalability. ARIMA identifies temporal anomalies, which Decision Trees then classify by attack type. The ARIMA model was evaluated across 13 MITRE ATT&CK tactics, though only 7 maintained sufficient class balance for valid assessment. Results are reported at three evaluation levels: Baseline (Decision Tree only), ARIMA-DT (Decision Tree tested on ARIMA-filtered anomalies), and End-to-End (pipeline performance measured against the original test population). The hybrid model demonstrated two distinct benefits: performance improvement for detectable attacks and detection enablement for previously undetectable attacks. For high-volume attacks with existing baseline detection, ARIMA preprocessing substantially improved performance. For example, Reconnaissance achieved an ARIMA-DT F1-score of 99.71% (from a baseline of 80.88%), with End-to-End metrics confirming this improvement at 97.59% F1-score. Credential Access reached 100% precision and recall on the ARIMA-filtered subset (from a baseline recall of 7.48%); however, End-to-End evaluation revealed that ARIMA filtering removed the vast majority of Credential Access attacks, resulting in a 1.28% End-to-End F1-score, worse than the baseline F1-score of 7.41%. This demonstrates that the hybrid pipeline is counterproductive for attack types whose flow characteristics closely resemble legitimate traffic.
More significantly, ARIMA preprocessing enabled detection where traditional Decision Trees completely failed (0% recall) for four stealthy attack types: Defense Evasion (ARIMA-DT recall of 93.22%, End-to-End 67.83%), Discovery (ARIMA-DT recall of 100%, End-to-End 63.43%), Persistence (ARIMA-DT recall of 86.92%, End-to-End 73.38%), and Privilege Escalation (ARIMA-DT recall of 89.93%, End-to-End 64.68%). These results demonstrate that ARIMA-based statistical anomaly detection is particularly effective for attacks involving subtle, low-volume activities that blend with legitimate operations, while also improving classification accuracy for high-volume reconnaissance activities.
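The gap between the ARIMA-DT and End-to-End figures can be illustrated with a minimal sketch. This is not the authors' code: it assumes that an attack flow discarded by the anomaly filter counts as a false negative at the End-to-End level, and the function names and toy data below are hypothetical.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def evaluate(y_true, passed_filter, y_pred):
    """Score one attack class on the filtered subset and end to end.

    y_true: 1 if the flow is an attack, else 0
    passed_filter: True if the anomaly filter kept the flow
    y_pred: classifier output for the flow (ignored when the flow was dropped)
    """
    tp = fp = fn_subset = fn_dropped = 0
    for truth, kept, pred in zip(y_true, passed_filter, y_pred):
        if kept:
            if pred == 1 and truth == 1:
                tp += 1
            elif pred == 1 and truth == 0:
                fp += 1
            elif pred == 0 and truth == 1:
                fn_subset += 1
        elif truth == 1:
            # Assumption: an attack removed by the filter is a missed detection.
            fn_dropped += 1
    subset = prf(tp, fp, fn_subset)
    end_to_end = prf(tp, fp, fn_subset + fn_dropped)
    return subset, end_to_end


# Hypothetical toy data: 10 attack flows and 10 benign flows.
# The filter keeps only 2 of the attacks; the classifier labels both correctly.
y_true = [1] * 10 + [0] * 10
passed_filter = [True] * 2 + [False] * 18
y_pred = [1] * 2 + [0] * 18

subset, end_to_end = evaluate(y_true, passed_filter, y_pred)
print("filtered subset (precision, recall, F1):", subset)
print("end to end      (precision, recall, F1):", end_to_end)
```

In this toy case the filtered-subset scores are perfect while End-to-End recall is only 0.2, the same pattern the abstract reports for Credential Access.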
Published (Version of record), CC BY 4.0, Open
Details
Title
A Hybrid Time Series Forecasting Model Combining ARIMA and Decision Trees to Detect Attacks in MITRE ATT&CK Labeled Zeek Log Data
Publication Details
Electronics (Basel), Vol.15(4), p.871
Resource Type
Journal article
Publisher
MDPI AG
Number of pages
42
Grant note
Askew Institute at The University of West Florida
This work was partially supported by the Askew Institute at The University of West Florida.