Dr. Sikha S Bagui

Distinguished University Professor, Computer Science

Big Data Analytics

The Big Data Framework (Hadoop)

Machine Learning and Deep Learning

Data Mining

Structured Query Language-SQL

Database Design and Architecture

Data Pre-Processing

Load Balancing

Resampling

Attribute Selection

Database design

Entity-Relationship Modeling

Association Rule Mining

Decision Trees

Random Forest

SVM

Hive

Book chapter

User-centric Focus for Detecting Phishing Emails

by Regina Eckhardt and Sikha Bagui

Published 2023

AI, Machine Learning and Deep Learning, 313 - 333

Phishing is a cleverly crafted social engineering attack in which an attacker imitates a trustworthy source to obtain confidential and private information from a user for malicious purposes. Phishing attacks are primarily carried out via email or other electronic communication channels, affecting both businesses and private individuals. This work focuses only on phishing attacks performed via email. A successful defense against phishing attacks depends on the ability to detect them. Measures to detect phishing can be classified as technical or user-centric. To date, there has been widespread emphasis on technical measures, with little focus on user-centric approaches. Moreover, technical and user-centric measures, taken individually, have shown inherent drawbacks and limited effectiveness. The goal of this work is to develop a solution that captures the interaction between a technical phishing detector and user involvement, against the backdrop of behavioral models. The work focuses on explainable AI (XAI). With XAI, presented through LIME and anchor explanations, the aim is to encourage more thoughtful cognitive handling of emails, moving a user's behavior from System 1 to System 2 thinking. The novelty of this work lies in the design of an artifact for detecting phishing emails that combines technical and user-centric measures.

Book chapter Peer reviewed

Machine Learning in Spark for Attack Traffic Classification in IoT Devices Using Protocol Usage Statistics

by Xiaojian Wang, Sikha Bagui and Subhash Bagui

Published 05/13/2021

Proceedings of International Conference on Innovations in Information and Communication Technologies. ICI2CT 2020. Algorithms for Intelligent Systems, 1 - 11

In this paper, we use three machine learning classifiers in Spark (decision tree, random forest, and logistic regression) to classify attack traffic from different types of IoT devices in the Kitsune dataset. Kitsune allows us to use network traffic information from data streams to dynamically generate features in real time. In this work, only protocol usage statistics generated from pcap files of the original data streams are used to detect malicious traffic in real time using the Big Data framework. Performance is measured in terms of accuracy, attack detection rate (ADR), false alarm rate (FAR), and runtime.
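The three quality metrics named in this abstract can be illustrated with a small Python sketch. The abstract does not give formulas, so the definitions below are assumptions based on common usage: ADR is taken as the fraction of attack records that are flagged, and FAR as the fraction of benign records that are wrongly flagged. The function name `evaluate` and the sample labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, ADR, and FAR for binary labels (1 = attack, 0 = benign).

    Assumed definitions (not stated in the abstract):
      ADR = TP / (TP + FN)   share of attacks that were detected
      FAR = FP / (FP + TN)   share of benign records flagged as attacks
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Avoid division by zero when a class is absent from the sample.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "adr": ratio(tp, tp + fn),
        "far": ratio(fp, fp + tn),
    }


# Hypothetical labels for eight traffic records.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = evaluate(truth, predicted)
print(metrics)
```

A classifier can score well on accuracy while missing many attacks when benign traffic dominates, which is one reason ADR and FAR are usually reported alongside it.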

Book chapter

A Key-Based Database Sharding Implementation for Big Data Analytics

by Shadi Aljawarneh and Sikha S Bagui

Published 09/23/2015

Advanced Research on Cloud Computing Design and Applications, 321 - 345

In this chapter, we use MySQL Database Cluster to explore and demonstrate the capabilities of key-based database sharding and provide the implementation details needed to build a key-based sharded database system. After the implementation section, we present some examples of datasets that were sharded using our implementation. The sharded data is then used for data mining, specifically association rule mining. We present the results (association rules) for the sharded data as well as the non-sharded data.
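The general idea of key-based sharding can be sketched in a few lines of Python. This is a minimal illustration and not the chapter's MySQL Cluster implementation: it assumes a hash-modulo mapping from key to shard, and the names `shard_for` and `shard_rows` and the sample rows are hypothetical.

```python
import hashlib
from typing import Dict, List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo shards.
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_rows(rows: List[Dict], key_field: str, num_shards: int) -> List[List[Dict]]:
    """Distribute rows across shards according to the value of key_field."""
    shards: List[List[Dict]] = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(str(row[key_field]), num_shards)].append(row)
    return shards


# Hypothetical transaction rows keyed by customer_id.
rows = [{"customer_id": i, "item": f"item-{i % 3}"} for i in range(12)]
shards = shard_rows(rows, "customer_id", 4)
for index, shard in enumerate(shards):
    print(index, [r["customer_id"] for r in shard])
```

Because the mapping depends only on the key, every row with the same key lands on the same shard, so per-key lookups touch a single shard.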

Book chapter

An Architecture for Query Optimization Using Association Rule Mining

by Sikha Bagui, Mohammad Islam and Subhash Bagui

Published 01/01/2013

Intelligence Methods and Systems Advancements for Knowledge-Based Business, 281 - 304

This research uses association rule mining to identify attribute-value relationships already existing in a database, with the goal of optimizing query processing. Once relationships have been determined, they can be used as a basis for creating temporary structures, such as views, to optimize query operations. This paper presents an architecture that shows how table partitions in the form of views, created based on association rules, can be used to optimize queries. The results of this study were statistically significant.

Book chapter

Automating the Generation of Joins in Large Databases and Web Services

by Sikha S Bagui and Ghazi Alkhatib

Published 04/30/2011

Web Engineered Applications for Evolving Organizations: Emerging Knowledge, 123 - 140

In this data-centric world, as web services and service-oriented architectures gain momentum and become a standard for data usage, there will be a need for tools to automate data retrieval. In this paper, we propose a tool that automates the generation of joins in a transparent and integrated fashion in large heterogeneous databases as well as web services. The tool reads metadata and automatically displays a join path and a SQL join query. It will be useful for performing joins to retrieve information from large databases as well as web services.

Book chapter

Ternary and Higher-Order ER Diagrams

by Sikha Bagui and Richard Earp

Published 2011

Database Design Using Entity-Relationship Diagrams 3rd edition, 257 - 280

Book chapter

An Approach to Mining Crime Patterns

by Sikha Bagui

Published 2009

Selected Readings on Database Technologies and Applications, 2, 1, 268

This paper presents a knowledge discovery effort to retrieve meaningful information about crime from a U.S. state database. The raw data were preprocessed, and data cubes were created using Structured Query Language (SQL). The data cubes were then used to derive quantitative generalizations and for further analysis of the data. An entropy-based attribute relevance study was undertaken to determine the relevant attributes. The machine learning software WEKA was used for mining association rules, developing a decision tree, and clustering. A self-organizing map (SOM) was used to view multidimensional clusters on a regular two-dimensional grid.
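An entropy-based attribute relevance study is commonly carried out by ranking attributes by information gain with respect to a target attribute. The Python sketch below assumes that measure, which may differ from the one used in the paper. The attribute names and rows are hypothetical and are not drawn from the crime database.

```python
from collections import Counter
from math import log2
from typing import Dict, List


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows: List[Dict[str, str]], attribute: str, target: str) -> float:
    """Reduction in target entropy obtained by splitting rows on attribute."""
    base = entropy([r[target] for r in rows])
    groups: Dict[str, List[str]] = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weighted entropy of the target within each attribute value.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical records: "region" separates the classes, "shift" does not.
records = [
    {"region": "north", "shift": "day", "crime": "theft"},
    {"region": "north", "shift": "night", "crime": "theft"},
    {"region": "south", "shift": "day", "crime": "assault"},
    {"region": "south", "shift": "night", "crime": "assault"},
]
gain_region = information_gain(records, "region", "crime")
gain_shift = information_gain(records, "shift", "crime")
print(gain_region, gain_shift)
```

Attributes with a gain near zero carry little information about the target and are candidates for removal before mining.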

Book chapter

Oracle's Joins

by Richard Earp and Sikha S Bagui

Published 2001

Oracle Internals: Tips, Tricks, and Techniques for DBAs, 643 - 650

There is often a need to select data from columns from more than one table. A join combines columns and data from two or more tables (and in some cases, of one table with itself). The tables are listed in a FROM clause of a SELECT statement, and a join condition between the two tables is specified in a WHERE clause.
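The join form described here, with tables listed in FROM and the join condition in WHERE, can be tried out with a short Python script. The sketch uses Python's built-in `sqlite3` module as a stand-in for Oracle, so only the portable comma-join syntax is shown. The `emp` and `dept` tables and their contents are hypothetical.

```python
import sqlite3

# In-memory database used only for illustration (SQLite, not Oracle).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE emp (
        emp_id INTEGER PRIMARY KEY,
        emp_name TEXT,
        dept_id INTEGER,
        mgr_id INTEGER
    );
    INSERT INTO dept VALUES (10, 'Research'), (20, 'Sales');
    INSERT INTO emp VALUES
        (1, 'Ada', 10, NULL),
        (2, 'Grace', 10, 1),
        (3, 'Edgar', 20, 1);
    """
)

# Two-table join: tables in FROM, join condition in WHERE.
join_rows = conn.execute(
    """
    SELECT e.emp_name, d.dept_name
    FROM emp e, dept d
    WHERE e.dept_id = d.dept_id
    ORDER BY e.emp_id
    """
).fetchall()

# Self-join: the same table appears twice under different aliases.
self_join_rows = conn.execute(
    """
    SELECT w.emp_name, m.emp_name
    FROM emp w, emp m
    WHERE w.mgr_id = m.emp_id
    ORDER BY w.emp_id
    """
).fetchall()

print(join_rows)
print(self_join_rows)
conn.close()
```

Omitting the WHERE condition in the first query would return every pairing of an employee with a department, which is why the join condition matters.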

University of West Florida