List of works
Conference proceeding
Published 12/16/2025
Algorithms, 18, 12, 795
This work focuses on finding frequent patterns in continuous flow network traffic Big Data using incremental frequent pattern mining. A newly created Zeek Conn Log dataset labeled with the MITRE ATT&CK framework, UWF-ZeekData24, generated using the Cyber Range at The University of West Florida, was used for this study. While FP-Growth is effective for static datasets, its standard implementation does not support incremental mining, which poses challenges for applications involving continuously growing data streams, such as network traffic logs. To overcome this limitation, a staged incremental FP-Growth approach is adopted for this work. The novelty of this work is in showing how incremental FP-Growth can be used efficiently on continuous flow network traffic, or streaming network traffic data, where no rebuild is necessary when new transactions are scanned and integrated. Incremental frequent pattern mining also generates feature subsets that are useful for understanding the nature of the individual attack tactics. Hence, a detailed understanding of the features or feature subsets of the seven different MITRE ATT&CK tactics is also presented. For example, the results indicate that core behavioral rules, such as those involving TCP protocols and service associations, emerge early and remain stable throughout later increments. The incremental FP-Growth framework provides a structured lens through which network behaviors can be observed and compared over time, supporting not only classification but also investigative use cases such as anomaly tracking and technique attribution. Finally, the frequent itemsets produced by this work will be useful for intrusion detection machine learning and artificial intelligence algorithms.
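The idea of integrating new transactions without a rebuild can be illustrated with a much simpler stand-in than the paper's method. The sketch below is not FP-Growth and builds no FP-tree: it keeps running support counts for small itemsets and folds each new batch into them, so earlier batches are never rescanned. The class name, the `max_size` cap, and the feature strings such as `proto=tcp` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


class IncrementalItemsetCounter:
    """Running support counts for itemsets up to max_size items.

    Brute-force counting, not FP-Growth: it only illustrates that new
    batches can be integrated without rescanning earlier ones.
    """

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.counts = Counter()
        self.n_transactions = 0

    def update(self, batch):
        """Fold one batch of transactions into the running counts."""
        for transaction in batch:
            items = sorted(set(transaction))
            self.n_transactions += 1
            for size in range(1, self.max_size + 1):
                for itemset in combinations(items, size):
                    self.counts[itemset] += 1

    def frequent(self, min_support):
        """Return {itemset: support} for itemsets at or above min_support."""
        if self.n_transactions == 0:
            return {}
        return {
            itemset: count / self.n_transactions
            for itemset, count in self.counts.items()
            if count / self.n_transactions >= min_support
        }


# Hypothetical connection-log transactions, one list of features per flow.
counter = IncrementalItemsetCounter(max_size=2)
counter.update([
    ["proto=tcp", "service=http"],
    ["proto=tcp", "service=http"],
    ["proto=udp", "service=dns"],
])
first = counter.frequent(min_support=0.6)

# A later increment changes which itemsets stay above the threshold.
counter.update([["proto=tcp", "service=ssl"]])
second = counter.frequent(min_support=0.6)
```

Comparing `first` and `second` shows which itemsets remain frequent after the new increment. The number of counted itemsets grows quickly with `max_size`, which is one reason tree-based methods such as FP-Growth are preferred at scale.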
Journal article
Classifying Cyber Ranges: A Case-Based Analysis Using the UWF Cyber Range
Published 10/10/2025
Encyclopedia (Basel, Switzerland), 5, 4, 162
To address the gaps in cyber range survey research, this entry develops and applies a structured classification taxonomy to support the comparison, evaluation, and design of cyber ranges. The entry addresses the following question: What are the objectives and key features of current cyber ranges, and how can they be classified into a comprehensive taxonomy? The entry synthesizes existing frameworks and analyzes and classifies a variety of documented cyber ranges to find similarities and gaps in the current classification methods. The findings indicate recurring design elements across ranges and persistent gaps in standardization, and they demonstrate how the University of West Florida (UWF) Cyber Range exemplifies the application of the taxonomy in practice. The goal is to facilitate informed decision-making by cybersecurity professionals when choosing platforms and to support academic research in cybersecurity education. By drawing on studies of other cyber ranges for comparison with the UWF Cyber Range, this taxonomy aims to contribute to the documentation of cyber ranges by providing a clear understanding of the current cyber range landscape.
Journal article
Published 09/10/2025
Electronics (Basel), 14, 18, 3597
Data preprocessing is usually necessary before running most machine learning classifiers. This work compares three different preprocessing techniques: minimal preprocessing, Principal Components Analysis (PCA), and Linear Discriminant Analysis (LDA). The efficiency of these three preprocessing techniques is measured using the Support Vector Machine (SVM) classifier. Efficiency is measured in terms of statistical metrics such as accuracy, precision, recall, the F-1 measure, and AUROC. The preprocessing times and the classifier run times are also compared using the three differently preprocessed datasets. Finally, a comparison of performance timings on CPUs vs. GPUs, with and without the MapReduce environment, is performed. Two newly created Zeek Connection Log datasets, UWF-ZeekData22 and UWF-ZeekDataFall22, collected using the Security Onion 2 network security monitor and labeled using the MITRE ATT&CK framework, are used for this work. Results from this work show that binomial LDA, on average, performs the best in terms of statistical measures as well as timings using GPUs or MapReduce GPUs.
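For readers unfamiliar with the statistical metrics named above, the following minimal sketch computes accuracy, precision, recall, and the F-1 measure for a binary classifier from the standard confusion-matrix definitions. The function name and the toy label vectors are hypothetical and are not results from the paper; AUROC is omitted because it needs classifier scores rather than hard predictions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels (1 = attack, 0 = benign), purely illustrative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

With these toy vectors there are three true positives, three true negatives, one false positive, and one false negative, so all four metrics equal 0.75.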
Journal article
Published 08/06/2025
Bioengineering (Basel), 12, 8, 846
Neuromuscular hip dysplasia (NHD) is a common deformity in children with cerebral palsy (CP). Although some predictive factors of NHD are known, the prediction of NHD is in its infancy. We present a Clinical Decision Support System (CDSS) designed to calculate the probability of developing NHD in children with CP. The system utilizes an ensemble of three machine learning (ML) algorithms: Neural Network (NN), Support Vector Machine (SVM), and Logistic Regression (LR). The development and evaluation of the CDSS followed the DECIDE-AI guidelines for AI-driven clinical decision support tools. The ensemble was trained on a data series from 182 subjects. Inclusion criteria were age between 12 and 18 years and diagnosis of CP from two specialized units. Clinical and functional data were collected prospectively between 2005 and 2023, and then analyzed in a cross-sectional study. Accuracy and area under the receiver operating characteristic (AUROC) were calculated for each method. Best logistic regression scores highlighted history of previous orthopedic surgery (p = 0.001), poor motor function (p = 0.004), truncal tone disorder (p = 0.008), scoliosis (p = 0.031), number of affected limbs (p = 0.05), and epilepsy (p = 0.05) as predictors of NHD. Both accuracy and AUROC were highest for NN, 83.7% and 0.92, respectively. The novelty of this study lies in the development of an efficient Clinical Decision Support System (CDSS) prototype, specifically designed to predict future outcomes of neuromuscular hip dysplasia (NHD) in patients with cerebral palsy (CP) using clinical data. The proposed system, PredictMed-CDSS, demonstrated strong predictive performance for estimating the probability of NHD development in children with CP, with the highest accuracy achieved using neural networks (NN). PredictMed-CDSS has the potential to assist clinicians in anticipating the need for early interventions and preventive strategies in the management of NHD among CP patients.
Journal article
Model Retraining upon Concept Drift Detection in Network Traffic Big Data
Published 07/24/2025
Future internet, 17, 8, 328
This paper presents a comprehensive model for detecting and addressing concept drift in network security data using the Isolation Forest algorithm. The approach leverages Isolation Forest’s inherent ability to efficiently isolate anomalies in high-dimensional data, making it suitable for adapting to shifting data distributions in dynamic environments. Anomalies in network attack data may not occur in large numbers, so it is important to be able to detect anomalies even with small batch sizes. The novelty of this work lies in successfully detecting anomalies even with small batch sizes and identifying the point at which incremental retraining needs to be started. Triggering retraining early also keeps the model in sync with the latest data, reducing the chance of successful attacks. Our methodology implements an end-to-end workflow that continuously monitors incoming data and detects distribution changes using Isolation Forest, then manages model retraining using Random Forest to maintain optimal performance. We evaluate our approach using UWF-ZeekDataFall22, a newly created dataset of Zeek Connection Logs collected through the Security Onion 2 network security monitor and labeled using the MITRE ATT&CK framework. Incremental as well as full retraining are analyzed using Random Forest. There was a steady increase in the model’s performance with incremental retraining and a positive impact on the model’s performance with full model retraining.
Journal article
Detecting Cyber Threats in UWF-ZeekDataFall22 Using K-Means Clustering in the Big Data Environment
Published 06/18/2025
Future internet, 17, 6, 267
In an era marked by the rapid growth of the Internet of Things (IoT), network security has become increasingly critical. Traditional Intrusion Detection Systems, particularly signature-based methods, struggle to identify evolving cyber threats such as Advanced Persistent Threats (APTs) and zero-day attacks. Such threats or attacks go undetected with supervised machine-learning methods. In this paper, we apply K-means clustering, an unsupervised clustering technique, to a newly created modern network attack dataset, UWF-ZeekDataFall22. Since this dataset contains labeled Zeek logs, the dataset was de-labeled before it was used for K-means clustering. The labels, however, were used in the evaluation phase to determine the attack clusters post-clustering. In order to identify APTs as well as zero-day attack clusters, three different labeling heuristics were evaluated to determine the attack clusters. To address the challenges posed by Big Data, a Big Data framework, namely Apache Spark with PySpark, was used as our development environment. The uniqueness of this work also lies in its use of connection-based features. Using connection-based features, an in-depth study is done to determine the effect of the number of clusters, seeds, and features for each of the different labeling heuristics. If the objective is to detect every single attack, the results indicate that 325 clusters with a seed of 200, using an optimal set of features, would be able to correctly place 99% of attacks.
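The abstract does not name its three labeling heuristics, so the sketch below uses two hypothetical ones to show the general evaluation step: after clustering on de-labeled data, the held-back labels decide which clusters count as attack clusters. Under the assumed "any attack" rule a cluster is flagged if it contains at least one attack row; under the assumed "majority" rule it is flagged only if at least half of its rows are attacks. The function names, label strings, and toy assignments are all illustrative and are not taken from the paper.

```python
from collections import Counter


def attack_clusters(assignments, labels, min_attack_fraction=0.0,
                    benign_label="benign"):
    """Return cluster ids that hold at least one attack row and whose
    attack fraction is at or above min_attack_fraction."""
    totals = Counter(assignments)
    attacks = Counter(
        cluster
        for cluster, label in zip(assignments, labels)
        if label != benign_label
    )
    return {
        cluster
        for cluster, n_attacks in attacks.items()
        if n_attacks / totals[cluster] >= min_attack_fraction
    }


def attack_recall(assignments, labels, flagged, benign_label="benign"):
    """Fraction of attack rows that fall inside flagged clusters."""
    attack_rows = [
        cluster
        for cluster, label in zip(assignments, labels)
        if label != benign_label
    ]
    if not attack_rows:
        return 0.0
    return sum(cluster in flagged for cluster in attack_rows) / len(attack_rows)


# Toy cluster assignments (as a clustering step might produce) and the
# labels that were held back until evaluation.
assignments = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["benign", "benign", "attack", "attack", "attack",
          "benign", "benign", "benign"]

any_attack = attack_clusters(assignments, labels)            # assumed heuristic 1
majority = attack_clusters(assignments, labels, 0.5)         # assumed heuristic 2
recall_any = attack_recall(assignments, labels, any_attack)
recall_majority = attack_recall(assignments, labels, majority)
```

In this toy case the "any attack" rule flags clusters 0 and 1 and captures every attack row, while the "majority" rule flags only cluster 1 and misses the single attack hidden in a mostly benign cluster. That trade-off is the kind of effect a comparison of labeling heuristics would expose.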
Journal article
Published 04/25/2025
Data (Basel), 10, 5, 59
This paper describes the creation of a new dataset, UWF-ZeekData24, aligned with the Enterprise MITRE ATT&CK Framework, that addresses critical shortcomings in existing network security datasets. Controlling the construction of attacks and meticulously labeling the data provides a more accurate and dynamic environment for testing of IDS/IPS systems and their machine learning algorithms. The outcomes of this research will assist in the development of cybersecurity solutions as well as increase the robustness and adaptability towards modern day cybersecurity threats. This new carefully engineered dataset will enhance cyber defense mechanisms that are responsible for safeguarding critical infrastructures and digital assets. Finally, this paper discusses the differences between crowd-sourced data and data collected in a more controlled environment.
Journal article
Critical Reflection Sessions: Teacher's Perspectives During Professional Development
First online publication 10/14/2024
International Journal of Changes in Education, online ahead of print
This study aimed to explore the critical reflection experiences of teachers who took part in effective and ineffective professional learning events. It examined the influence of D.A. Kolb's reflective observation stage within the experiential learning theory (ELT) framework on teachers' professional development. Using a qualitative interpretive phenomenological method, the research investigated teachers' viewpoints, beliefs, and experiences related to various professional development activities to evaluate their effectiveness. Eleven teachers attended the same educational conference. The investigation involved semi-structured individual interviews and a focus group discussion. The interview questions centered around the concepts of enactive mastery and vicarious experiences. Open-ended discussions allowed participants to explore their experiences of professional learning. To highlight emergent patterns, this study employed a phenomenological technique to analyze data and identified critical observations through an inductive coding method using the NVivo software. The study's findings suggested that the reflective process necessitated time, collaboration, and structure during professional learning sessions. Moreover, having a group of peers was advantageous for critical reflection. They created a learning environment where support was the norm, and dedicated time encouraged self-reflection, which promoted effective growth in teacher development. The researchers found that inquiry-based professional development promoted introspection and facilitated profound learning. The research emphasized that teachers responded positively when professional development facilitators allowed participants time to link their personal experiences to new teaching methods, prompting reflection and collaboration with peers.
Journal article
Published 10/03/2024
Electronics (Basel), 13, 19, 3916
This study investigates the technical challenges of applying Support Vector Machines (SVM) for multi-class classification in network intrusion detection using the UWF-ZeekDataFall22 dataset, which is labeled based on the MITRE ATT&CK framework. A key challenge lies in handling imbalanced classes and complex attack patterns, which are inherent in intrusion detection data. This work highlights the difficulties in implementing SVMs for multi-class classification, particularly with One-vs.-One (OvO) and One-vs.-All (OvA) methods, including scalability issues due to the large volume of network traffic logs and the tendency of SVMs to be sensitive to noisy data and class imbalances. SMOTE was used to address class imbalances, while preprocessing techniques were applied to improve feature selection and reduce noise in the data. The unique structure of network traffic data, with overlapping patterns between attack vectors, posed significant challenges in achieving accurate classification. Our model reached an accuracy of over 90% with OvO and over 80% with OvA, demonstrating that despite these challenges, multi-class SVMs can be effectively applied to complex intrusion detection tasks when combined with appropriate balancing and preprocessing techniques.
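The scalability difference between the two multi-class strategies follows from how many binary problems each one creates: for k classes, One-vs.-One trains k(k-1)/2 pairwise classifiers, while One-vs.-All trains k classifiers, each separating one class from the rest. The sketch below only enumerates those binary tasks and trains no SVM. The function names and the class names in `tactics` are hypothetical placeholders, not the label set used in the paper.

```python
from itertools import combinations


def one_vs_one_tasks(classes):
    """One binary task per unordered pair of classes: k * (k - 1) / 2 tasks."""
    return list(combinations(classes, 2))


def one_vs_all_tasks(classes):
    """One binary task per class, against all remaining classes: k tasks."""
    return [
        (positive, [c for c in classes if c != positive])
        for positive in classes
    ]


# Hypothetical class names, for illustration only.
tactics = ["benign", "discovery", "reconnaissance", "credential_access"]

ovo = one_vs_one_tasks(tactics)
ova = one_vs_all_tasks(tactics)

print(f"OvO binary classifiers: {len(ovo)}")
print(f"OvA binary classifiers: {len(ova)}")
```

Each One-vs.-One task sees only the rows of two classes, whereas each One-vs.-All task sees the full dataset with a heavily skewed positive-to-negative ratio, which is one reason class balancing matters more for the latter.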
Journal article
MongoDB: Meeting the Dynamic Needs of Modern Applications
Published 09/27/2024
Encyclopedia (Basel, Switzerland), 4, 4, 1433 - 1453
This entry reviews MongoDB’s fundamentals, architectural features, advantages, and limitations, providing a comprehensive understanding of its capabilities. MongoDB’s impact on the database landscape is profound, challenging traditional relational databases and influencing the adoption of NoSQL solutions globally. With its continued growth, innovation, and commitment to addressing evolving market needs, MongoDB remains a pivotal player in modern data management, empowering organizations to build scalable, efficient, and high-performance applications.