List of works
Book chapter
User-centric Focus for Detecting Phishing Emails
Published 2023
AI, Machine Learning and Deep Learning, 313 - 333
Phishing is a carefully crafted social engineering attack in which an attacker imitates a trustworthy source to obtain confidential and private information from a user for malicious purposes. Phishing attacks are primarily carried out via email or other electronic communication channels and affect both businesses and private individuals. This work focuses only on phishing attacks performed via email. A successful defense against phishing depends on the ability to detect it. Detection measures can be classified as technical or user-centric. To date, technical measures have received widespread emphasis, with little focus on user-centric approaches. Moreover, technical and user-centric measures, taken individually, have shown inherent drawbacks and limited effectiveness. The goal of this work is to develop a solution that captures the interaction between a technical phishing detector and user involvement, set against the backdrop of behavioral models. The work focuses on explainable AI (XAI). With XAI, presented through LIME and anchor explanations, the aim is to encourage more thoughtful cognitive handling of emails, moving a user's behavior from System 1 to System 2 thinking. The novelty of this work lies in the design of an artifact for detecting phishing emails that combines technical and user-centric measures.
Book chapter
Published 05/13/2021
Proceedings of International Conference on Innovations in Information and Communication Technologies. ICI2CT 2020. Algorithms for Intelligent Systems, 1 - 11
In this paper, we use three different machine learning classifiers in Spark (decision tree, random forest, and logistic regression) to classify attack traffic from different types of IoT devices in the Kitsune dataset. Kitsune allows us to use network traffic information from data streams to dynamically generate features in real time. In this work, only protocol usage statistics generated from pcap files of the original data streams are used to detect malicious traffic in real time using the Big Data framework. Performance is measured in terms of accuracy, attack detection rate (ADR), false alarm rate (FAR), and runtime.
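The three quality metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions here are assumptions based on common usage: ADR is taken as the share of attack records that are flagged, and FAR as the share of benign records that are wrongly flagged. The function name and the 1 = attack, 0 = benign label encoding are hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Return (accuracy, ADR, FAR) for binary labels: 1 = attack, 0 = benign.

    Assumed definitions (not taken from the chapter):
      ADR = TP / (TP + FN), FAR = FP / (FP + TN).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    adr = tp / (tp + fn) if (tp + fn) else 0.0
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, adr, far
```

A classifier can score well on accuracy while still missing many attacks when benign traffic dominates, which is one reason to report ADR and FAR separately.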
Book chapter
A Key-Based Database Sharding Implementation for Big Data Analytics
Published 09/23/2015
Advanced Research on Cloud Computing Design and Applications, 321 - 345
In this chapter, we use MySQL Database Cluster to demonstrate and discover the capabilities of key-based database sharding and provide the implementation details to build a key-based sharded database system. After the implementation section, we present some examples of datasets that were sharded using our implementation. The sharded data is then used for data mining, specifically association rule mining. We present the results (association rules) for the sharded data as well as the non-sharded data.
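The general idea of key-based sharding can be sketched as follows. This is not the chapter's MySQL implementation: it is a stand-alone illustration that assumes rows are routed by hashing a chosen key column, and the function names, the use of MD5, and the row layout are all hypothetical.

```python
import hashlib


def shard_for_key(key, num_shards):
    """Map a key to a shard index in [0, num_shards) using a stable hash."""
    # hashlib is used instead of the built-in hash() so the mapping is
    # identical across processes and runs.
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def shard_rows(rows, key_field, num_shards):
    """Distribute rows (dicts) into num_shards lists based on rows[key_field]."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for_key(row[key_field], num_shards)].append(row)
    return shards
```

Because the shard index depends only on the key, a lookup by key needs to contact a single shard, while a query that does not filter on the key has to visit all of them.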
Book chapter
An Architecture for Query Optimization Using Association Rule Mining
Published 01/01/2013
Intelligence Methods and Systems Advancements for Knowledge-Based Business, 281 - 304
This research uses association rule mining to identify attribute-value relationships that already exist in a database, with the goal of optimizing query processing. Once these relationships have been determined, they can be used as a basis for creating temporary structures, such as views, to optimize query operations. This paper presents an architecture that shows how table partitions in the form of views, created based on association rules, can be used to optimize queries. The results of this study were statistically significant.
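A rough sketch of the idea: measure how strongly one attribute value implies another, then define a view over the rows that match the rule's antecedent. The abstract does not describe the actual architecture at this level, so the function names, the single-attribute rule form, and the generated SQL are all assumptions made for illustration.

```python
def rule_stats(rows, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    antecedent and consequent are (column, value) pairs; rows are dicts.
    """
    a_col, a_val = antecedent
    c_col, c_val = consequent
    n = len(rows)
    n_a = sum(1 for r in rows if r[a_col] == a_val)
    n_both = sum(1 for r in rows if r[a_col] == a_val and r[c_col] == c_val)
    support = n_both / n if n else 0.0
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence


def view_for_rule(table, antecedent, view_name):
    """Build a CREATE VIEW statement selecting rows that match the antecedent.

    Illustrative only: identifiers and values are assumed to be trusted.
    """
    col, val = antecedent
    return f"CREATE VIEW {view_name} AS SELECT * FROM {table} WHERE {col} = '{val}'"
```

A rule with high support and confidence points to a frequently co-occurring slice of the table, which is the kind of partition a view could expose to the query processor.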
Book chapter
Automating the Generation of Joins in Large Databases and Web Services
Published 04/30/2011
Web Engineered Applications for Evolving Organizations: Emerging Knowledge, 123 - 140
In this data-centric world, as web services and service-oriented architectures gain momentum and become a standard for data usage, there will be a need for tools to automate data retrieval. In this paper, we propose a tool that automates the generation of joins in a transparent and integrated fashion in heterogeneous large databases as well as web services. The tool reads metadata information and automatically displays a join path and a SQL join query. It is intended to help with information retrieval by performing joins in large databases as well as web services.
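One way such a tool could derive a join path from metadata is a breadth-first search over foreign-key relationships. The sketch below is an assumption about the general technique, not the tool described in the chapter; the metadata format (a list of foreign-key tuples), the function names, and the example tables are hypothetical.

```python
from collections import deque


def find_join_path(foreign_keys, start, goal):
    """Shortest join path between two tables, or None if none exists.

    foreign_keys: list of (table_a, col_a, table_b, col_b) tuples.
    Returns a list of (left_table, left_col, right_table, right_col) steps.
    """
    # Build an undirected graph: a foreign key can be followed either way.
    graph = {}
    for ta, ca, tb, cb in foreign_keys:
        graph.setdefault(ta, []).append((tb, ca, cb))
        graph.setdefault(tb, []).append((ta, cb, ca))

    parent = {start: None}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        if table == goal:
            break
        for nxt, col_here, col_there in graph.get(table, []):
            if nxt not in parent:
                parent[nxt] = (table, col_here, col_there)
                queue.append(nxt)

    if goal not in parent:
        return None

    # Walk back from the goal to the start, then reverse.
    path = []
    node = goal
    while parent[node] is not None:
        prev, col_prev, col_node = parent[node]
        path.append((prev, col_prev, node, col_node))
        node = prev
    path.reverse()
    return path


def build_join_sql(start, path):
    """Render a join path as a SQL query string."""
    sql = f"SELECT * FROM {start}"
    for left, lcol, right, rcol in path:
        sql += f" JOIN {right} ON {left}.{lcol} = {right}.{rcol}"
    return sql
```

Breadth-first search returns a path with the fewest joins; a real tool would also need a policy for choosing among several equally short paths.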
Book chapter
Ternary and Higher-Order ER Diagrams
Published 2011
Database Design Using Entity-Relationship Diagrams 3rd edition, 257 - 280
Book chapter
An Approach to Mining Crime Patterns
Published 2009
Selected Readings on Database Technologies and Applications, 2, 1, 268
This paper presents a knowledge discovery effort to retrieve meaningful information about crime from a U.S. state database. The raw data were preprocessed, and data cubes were created using Structured Query Language (SQL). The data cubes were then used to derive quantitative generalizations and for further analysis of the data. An entropy-based attribute relevance study was undertaken to determine the relevant attributes. WEKA, a machine learning software package, was used for mining association rules, developing a decision tree, and clustering. A self-organizing map (SOM) was used to view multidimensional clusters on a regular two-dimensional grid.
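An entropy-based relevance study typically ranks attributes by how much they reduce uncertainty about a target attribute. The abstract does not name the exact measure, so the sketch below assumes information gain, one common choice; the function names and the toy records are hypothetical and are not drawn from the crime database.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting rows on `attribute`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weighted average entropy of the target within each attribute value.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder
```

Attributes with a gain near zero carry little information about the target and are candidates for removal before mining.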
Book chapter
Published 2001
Oracle Internals: Tips, Tricks, and Techniques for DBAs, 643 - 650
There is often a need to select data from columns in more than one table. A join combines columns and data from two or more tables (or, in some cases, from one table with itself). The tables are listed in the FROM clause of a SELECT statement, and the join condition between the tables is specified in the WHERE clause.
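The FROM/WHERE join form described above can be tried with a small, self-contained example. The chapter concerns Oracle; this sketch uses Python's built-in sqlite3 module with an in-memory database purely so it runs anywhere, and the table names, columns, and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a real server (an assumption
# made for portability; the join syntax shown is standard SQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_id INTEGER);
INSERT INTO dept VALUES (10, 'Accounting'), (20, 'Research');
INSERT INTO emp VALUES (1, 'King', 10), (2, 'Scott', 20), (3, 'Adams', 20);
""")

# Both tables are listed in FROM; the join condition goes in WHERE.
rows = conn.execute("""
SELECT e.emp_name, d.dept_name
FROM emp e, dept d
WHERE e.dept_id = d.dept_id
ORDER BY e.emp_id
""").fetchall()
conn.close()
```

Without the WHERE condition, the same FROM list would return every combination of employee and department rows, so the join condition is what pairs each employee with the matching department.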