Dr. Sikha S Bagui

Distinguished University Professor, Computer Science

Big Data analytics

The Big Data Framework (Hadoop)

Machine Learning and Deep Learning

Data Mining

Structured Query Language (SQL)

Database design and architecture

Data Pre-Processing

Load Balancing

Resampling

Attribute Selection

Database design

Entity-Relationship Modeling

Association Rule Mining

Decision Trees

Random Forest

SVM

Hive

Conference proceeding Open access Peer reviewed

Selecting Feature Subsets in Continuous Flow Network Attack Traffic Big Data Using Incremental Frequent Pattern Mining

by Sikha Bagui, Andrew Benyacko, Dustin Mink, Subhash Bagui and Arijit Bagchi

Published 12/16/2025

Algorithms, 18, 12, 795

This work focuses on finding frequent patterns in continuous flow network traffic Big Data using incremental frequent pattern mining. A newly created Zeek Conn Log MITRE ATT&CK framework labeled dataset, UWF-ZeekData24, generated using the Cyber Range at The University of West Florida, was used for this study. While FP-Growth is effective for static datasets, its standard implementation does not support incremental mining, which poses challenges for applications involving continuously growing data streams, such as network traffic logs. To overcome this limitation, a staged incremental FP-Growth approach is adopted for this work. The novelty of this work is in showing how incremental FP-Growth can be used efficiently on continuous flow network traffic, or streaming network traffic data, where no rebuild is necessary when new transactions are scanned and integrated. Incremental frequent pattern mining also generates feature subsets that are useful for understanding the nature of the individual attack tactics. Hence, a detailed understanding of the features or feature subsets of the seven different MITRE ATT&CK tactics is also presented. For example, the results indicate that core behavioral rules, such as those involving TCP protocols and service associations, emerge early and remain stable throughout later increments. The incremental FP-Growth framework provides a structured lens through which network behaviors can be observed and compared over time, supporting not only classification but also investigative use cases such as anomaly tracking and technique attribution. Finally, the frequent itemsets produced by this work will be useful for machine learning and artificial intelligence algorithms for intrusion detection.
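The idea of integrating new transactions without a rebuild can be illustrated with a minimal sketch. It is not the staged incremental FP-Growth approach of the paper: it swaps in plain running support counts for small itemsets instead of an FP-tree, and the class name, the `max_size` parameter, and the sample connection-log items are all hypothetical.

```python
from collections import Counter
from itertools import combinations


class IncrementalItemsetCounter:
    """Keeps running support counts so a new batch never forces a rescan.

    Illustrative stand-in only: it counts itemsets directly rather than
    building an FP-tree as FP-Growth does.
    """

    def __init__(self, max_size=2):
        self.max_size = max_size   # largest itemset size to track
        self.counts = Counter()    # itemset (sorted tuple) -> support count
        self.n_transactions = 0

    def update(self, batch):
        """Fold one increment of transactions into the running counts."""
        for transaction in batch:
            items = sorted(set(transaction))
            self.n_transactions += 1
            for size in range(1, self.max_size + 1):
                for itemset in combinations(items, size):
                    self.counts[itemset] += 1

    def frequent(self, min_support):
        """Return itemsets whose relative support is at least min_support."""
        threshold = min_support * self.n_transactions
        return {
            itemset: count
            for itemset, count in self.counts.items()
            if count >= threshold
        }


# Hypothetical items; not taken from UWF-ZeekData24.
counter = IncrementalItemsetCounter(max_size=2)
counter.update([
    ["proto=tcp", "service=http"],
    ["proto=tcp", "service=http"],
    ["proto=udp", "service=dns"],
])
first = counter.frequent(min_support=0.5)

# A later increment only touches the counts for the new transaction.
counter.update([["proto=tcp", "service=ssh"]])
second = counter.frequent(min_support=0.5)
```

In this toy run the pair `("proto=tcp", "service=http")` is frequent after the first increment and stays frequent after the second, which mirrors the kind of early, stable rule the abstract describes.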

Conference proceeding Open access

A Case Study of Taking AP Computer Science Principles: A Student's Perspective

by Sarah Cameron, Tony Pham and Sikha Bagui

Published 03/15/2024

Proceedings of the 55th ACM Technical Symposium on Computer Science Education, 2, 1588 - 1589

SIGCSE 2024: The 55th ACM Technical Symposium on Computer Science Education, 03/20/2024–03/24/2024, Portland, Oregon, USA

With the increased demand for Computer Science degrees in the workforce, Computer Science is becoming more prominent in high schools. AP Computer Science Principles (AP CSP) is a course that serves as a bridge into Computer Science. Code.org provides a year-long curriculum for this AP course to be led by teachers in the classroom. Beyond an analysis of the pass rates of students, and with the recency of the AP CSP course, a reflection on the AP CSP curriculum from the student's perspective is in order. This study breaks down the strengths and weaknesses of AP CSP from a student's perspective. Results show that the strengths of the Code.org curriculum outnumber its weaknesses. However, the course can struggle to motivate and engage students if it is not executed properly by the teacher.

Conference proceeding Open access Peer reviewed

Marine Vessel Tracking using a Monocular Camera

by Tobias Jacob, Raffaele Galliera, Muddasar Ali and Sikha Bagui

Published 08/23/2021

DeLTA2021-Proceedings of the 2nd International Conference on Deep Learning Theory and Applications, 17 - 28

International Conference on Deep Learning Theory and Applications

In this paper, a new technique for camera calibration using only GPS data is presented. A new way of tracking objects that move on a plane in a video is achieved by using the location and size of the bounding box to estimate the distance, achieving an average prediction error of 5.55 m per 100 m of distance from the camera. This solution can run in real time at the edge, achieving efficient inference in a low-powered IoT environment, while also being able to track multiple different vessels.
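As a rough illustration of how bounding-box size can carry distance information, the sketch below uses the standard pinhole-camera relation. This is an assumption made for illustration, not the calibration or tracking method of the paper; the function name and the sample vessel height and focal length are hypothetical.

```python
def estimate_distance_m(bbox_height_px, object_height_m, focal_length_px):
    """Pinhole-camera estimate: distance = focal length * real height / pixel height.

    Assumes the object's real height is known and the camera looks roughly
    level at it; neither assumption comes from the paper.
    """
    if bbox_height_px <= 0:
        raise ValueError("bounding-box height must be positive")
    return focal_length_px * object_height_m / bbox_height_px


# Hypothetical numbers: a 10 m tall vessel seen by a camera with a
# 1000 px focal length.
near = estimate_distance_m(bbox_height_px=50, object_height_m=10.0, focal_length_px=1000.0)
far = estimate_distance_m(bbox_height_px=25, object_height_m=10.0, focal_length_px=1000.0)
```

Halving the box height doubles the estimate, so small pixel errors matter more for distant objects.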

Conference proceeding Peer reviewed

Trusted Digital Identities for Mobile Devices

by P. Renee Carnley, Pam Rowland, Dave Bishop, Sikha Bagui and Matt Miller

Published 08/2020

2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), 483 - 490

IEEE International Symposium on Dependable, Autonomic and Secure Computing (DASC), 08/17/2020–08/22/2020, Calgary, AB, Canada

The pace of usage of personal devices has increased within industry, driving demand for trusted digital identities. As the world becomes increasingly mobile with the number of smartphone users growing yearly and the mobile web thriving, it is critical to implement strong security on mobile devices. As passwords are not a secure method of authentication, mobile devices and other forms of IoT require a means of two-factor authentication that meets strong security standards. The research presented in this paper provides a framework that updates the existing Public Key Infrastructure (PKI) in use by industry and governments today to accommodate the use of digital identities on smartphones, tablets, or even a smartwatch.

Conference proceeding

Applying a Verified Trusted Computing Base to Cyber Protect a Vulnerable Traffic Control Cyber-Physical System

by Stephen Hopkins, Carolyn Henry, Sikha Bagui, Amitabh Mishra, Ezhil Kalaimannan and Caroline Sangeetha John

Published 2020

IEEE SoutheastCon 2020

IEEE SoutheastCon, 2020, Raleigh, NC

Traffic control systems were developed with operational performance, reliability, and safety in mind. Traffic control systems were designed well before the heavy integration of advanced communications including radio frequency (RF), the Internet and cellular transmissions. These technologies were integrated to provide more control and enable the traffic systems to become adaptive to real-time traffic flow and environmental conditions. These advances increase the opportunity for attackers to affect traffic system operations, sometimes creating congestion that essentially halts traffic. The Secure SCADA Framework presents eight objectives which would increase the cyber resilience of an existing vulnerable cyber-physical system, such as a traffic control system [1]. This approach retains the current operational performance, reliability, and safety. The concept of using a Trusted Computing Base (TCB) in a cyber-physical system is one goal of the eight presented for the Secure SCADA Framework. The SCADA TCB (STCB) project designs, develops, and verifies a core set of hardware, software, and firmware which operate in conjunction to establish a high level of security protecting a traffic control system. This research defines the requirements of a traffic control system, establishes a security policy, develops a trusted computing base, identifies and designs attacks on the system, and meets the development life-cycle requirements to proceed with implementation, verification, and testing.

Conference proceeding Peer reviewed

Classifying Phishing Email Using Machine Learning and Deep Learning

by Sikha Bagui, Debarghya Nandi, Subhash Bagui and Robert Jamie White

Published 06/2019

2019 International Conference on Cyber Security and Protection of Digital Services (Cyber Security)

International Conference on Cyber Security and Protection of Digital Services (Cyber Security), 06/03/2019–06/04/2019, Oxford, UK

In this work, we applied deep semantic analysis, together with machine learning and deep learning techniques, to capture inherent characteristics of email text and classify emails as phishing or non-phishing.

Conference proceeding

Test bed development for security engineered SCADA laboratory

by Stephen Hopkins, Sikha Bagui, Ezhil Kalaimannan, Amitabh Mishra, Bhavyansh Mishra and Daniel Kelly

Published 2019

IEEE SoutheastCon 2019: At the Von Braun Center in Huntsville, Alabama

IEEE SoutheastCon, 2019, Huntsville, Alabama

Supervisory Control and Data Acquisition (SCADA) systems and Industrial Control Systems (ICS) are critical for infrastructure operations, production processes, automation systems, and other automated control systems. SCADA systems are vulnerable to cyber physical attacks from many vectors. The attack surface varies widely. The problem is complicated by the inherent focus on performance, reliability, and safety rather than cybersecurity. The Secure SCADA Framework establishes a security engineered approach evolving SCADA and ICS systems towards a more cybersecure posture. The encapsulating and integrating concepts of the Secure SCADA Framework require development and analysis of implementations achieving the eight framework goals. This work provides a solution for research, design, development, and evaluation of components of a Secure SCADA System. A Security Engineered SCADA Laboratory requires a test bed with which engineering and science designs can be developed. This paper presents a test bed which can model and collect data on simulated attacks, analyze data to evaluate performance, and provide the foundations for a Security Engineered SCADA Laboratory.

Conference proceeding Peer reviewed

A Parallel Implementation of Information Gain Using Hive in conjunction with MapReduce for Continuous Features

by S. Bagui, Sharon John, John S. Baggs and Subhash C Bagui

Published 2018

Trends and Applications in Knowledge Discovery and Data Mining: PAKDD 2018 Workshops, BDASC, BDM, ML4Cyber, PAISI, DaMEMO, Melbourne, VIC, Australia, June 3, 2018, Revised Selected Papers, 283 - 294

Pacific Asia Workshop on Intelligence and Security Informatics (PAISI)

Finding efficient ways to perform the Information Gain algorithm is becoming even more important as we enter the Big Data era, where data and dimensionality are increasing at alarming rates. When machine learning algorithms are overburdened with high-dimensional data containing redundant features, information gain becomes crucial for feature selection. Information gain is also often used as a precursor step in creating decision trees, text classifiers, support vector machines, etc. Due to the very large volume of today’s data, there is a need to efficiently parallelize classic algorithms like Information Gain. In this paper, we present a parallel implementation of Information Gain in the MapReduce environment, using MapReduce in conjunction with Hive, for continuous features. In our approach, Hive was used to calculate the counts and parent entropy, and a map-only job was used to complete the Information Gain calculations. Our approach demonstrated gains in run times because the MapReduce jobs were carefully designed to leverage the Hadoop cluster efficiently.
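For readers unfamiliar with the quantity being parallelized, the following is a minimal single-process sketch of information gain for one continuous feature split at a threshold. It does not use Hive or MapReduce and is not the paper's implementation; the function names, the midpoint rule for candidate thresholds, and the sample values and labels are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Parent entropy minus the weighted entropy of the two sides of the split."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    total = len(labels)
    children = sum(
        len(part) / total * entropy(part) for part in (left, right) if part
    )
    return entropy(labels) - children


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; return the best one.

    Raises ValueError when the feature has fewer than two distinct values.
    """
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Hypothetical feature values and labels.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["benign", "benign", "attack", "attack"]
gain = information_gain(values, labels, threshold=2.5)
split = best_threshold(values, labels)
```

The parent entropy and the per-side label counts are the pieces the abstract assigns to Hive; the final subtraction corresponds to the step it assigns to the map-only job.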

Conference proceeding Peer reviewed

Adaptable Enterprise Architectures for Software Evolution of SmartLife Ecosystems

by Alfred Zimmermann, Bilal Gonen, Rainer Schmidt, Eman El-Sheikh, Sikha Bagui and Norman Wilde

Published 09/01/2014

2014 IEEE 18th International Enterprise Distributed Object Computing Conference Workshops and Demonstrations, 316

International Enterprise Distributed Object Computing Conference Workshops and Demonstrations, 09/01/2014–09/02/2014, Ulm, Germany

SmartLife ecosystems are emerging as intelligent user-centered systems that will shape future trends in technology and communication. Biological metaphors of living adaptable ecosystems provide the logical foundation for self-optimizing and self-healing run-time environments for intelligent adaptable business services and related information systems with service-oriented enterprise architectures. The present research-in-progress work investigates mechanisms for adaptable enterprise architectures for the development of service-oriented ecosystems with integrated technologies like Semantic Technologies, Web Services, Cloud Computing and Big Data Management. With a large and diverse set of ecosystem services with different owners, our scenario of service-based SmartLife ecosystems can pose challenges in their development and, more importantly, for maintenance and software evolution. Our research explores the use of knowledge modeling using ontologies and flexible metamodels for adaptable enterprise architectures to support program comprehension for software engineers during maintenance and evolution tasks of service-based applications. Our previous reference enterprise architecture model ESARC (Enterprise Services Architecture Reference Cube) and the Open Group SOA Ontology were extended to support agile semantic analysis, program comprehension and software evolution for a SmartLife applications scenario. The Semantic Browser is a semantic search tool that was developed to provide knowledge-enhanced investigation capabilities for service-oriented applications and their architectures.

Conference proceeding

Towards Semantic-Supported SmartLife System Architectures for Big Data Services in the Cloud

by Eman M El-Sheikh, Sikha S Bagui, Donald G. Firesmith, Ilia Petrov, Norman Wilde and Alfred Zimmermann

Published 2013

SERVICE COMPUTATION 2013, The Fifth International Conferences on Advanced Service Computing, 59 - 64

ComputationWorld 2013, 05/27/2013–06/01/2013, Valencia, Spain

SmartLife applications are emerging as intelligent user-centered systems that will shape future trends in technology and communication. The development of such applications integrates web services, cloud computing, and big data management, among other frameworks and methods. Our paper reports on new perspectives of services and cloud computing architectures for the challenging domain of SmartLife applications. In this research, we explore SmartLife applications in the context of semantic-supported systems architectures and big data in cloud settings. Using a SmartLife application scenario, we investigate graph data management, fast big data, and semantic support through ontological modeling. The ontological model and architecture reference model can be used to support semantic analysis and program comprehension of SmartLife applications.
