Spark Configurations to Optimize Decision Tree Classification on UNSW-NB15

Sikha Bagui; Mary Walauskis; Robert DeRush; Huyen Praviset; Shaunda Boucugnani

doi:10.3390/bdcc6020038

Journal article

Open access

Peer reviewed

Big data and cognitive computing, Vol.6(2), p.38

06/01/2022

DOI: https://doi.org/10.3390/bdcc6020038

Web of Science ID: WOS:000818378300001


Abstract

This paper examines how changing Spark's configuration parameters affects machine learning algorithms on a large dataset, UNSW-NB15. It studies the environmental conditions that optimize the classification process, since building smart intrusion detection systems requires a deep understanding of the environmental parameters. Specifically, the focus is on executor memory, the number of executors, and the number of cores per executor, and on their effect on execution time and on statistical measures. The objective was to optimize resource usage and minimize processing time for Decision Tree classification using Spark, and thereby to show whether additional resources increase performance, lower processing time, and optimize computing resources. UNSW-NB15, being a large dataset, provides enough data and complexity to reveal the effects of changes to computing resource configurations in Spark. Principal Component Analysis was used to preprocess the dataset. Results indicated that a lack of executors and cores results in wasted resources and long processing times, while excessive resource allocation did not improve processing time. Environmental tuning has a noticeable impact.
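As an illustration of the kind of configuration sweep the abstract describes, the following minimal sketch enumerates combinations of the three executor settings and renders each as `spark-submit` arguments. The property names `spark.executor.memory`, `spark.executor.instances`, and `spark.executor.cores` are standard Spark properties; the helper names and the specific values are hypothetical and are not taken from the paper.

```python
from itertools import product


def executor_grid(memories, instances, cores):
    """Return one dict of Spark properties per combination of settings."""
    return [
        {
            "spark.executor.memory": mem,
            "spark.executor.instances": str(n),
            "spark.executor.cores": str(c),
        }
        for mem, n, c in product(memories, instances, cores)
    ]


def to_submit_args(conf):
    """Render a property dict as a list of spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


def total_task_slots(conf):
    """Executors times cores per executor: tasks that can run concurrently."""
    return int(conf["spark.executor.instances"]) * int(conf["spark.executor.cores"])


# Illustrative values only (assumption): 2 memory sizes x 2 executor counts x 2 core counts.
grid = executor_grid(["2g", "4g"], [2, 4], [1, 2])

for conf in grid:
    print(total_task_slots(conf), " ".join(to_submit_args(conf)))
```

Each printed line could be appended to a `spark-submit` command for one timed run, so that processing time can be compared across configurations with different numbers of concurrent task slots.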

Files and links (1)

Spark Configurations to Optimize Decision Tree Classification on UNSW-NB15

Published (Version of record), CC BY 4.0, Open

Details

Title: Spark Configurations to Optimize Decision Tree Classification on UNSW-NB15
Publication Details: Big data and cognitive computing, Vol.6(2), p.38
Resource Type: Journal article
Publisher: MDPI
Number of pages: 12
Identifiers: WOS:000818378300001; 99380178992806600
Academic Unit: Computer Science; Hal Marcus College of Science and Engineering
Language: English
