Spark Configurations to Optimize Decision Tree Classification on UNSW-NB15
Journal article   Open access   Peer reviewed

Sikha Bagui, Mary Walauskis, Robert DeRush, Huyen Praviset and Shaunda Boucugnani
Big data and cognitive computing, Vol.6(2), p.38
06/01/2022
Web of Science ID: WOS:000818378300001

Abstract

This paper examines the impact of changing Spark's configuration parameters on machine learning algorithms using a large dataset, the UNSW-NB15 dataset. The environmental conditions that optimize the classification process are studied. To build smart intrusion detection systems, a deep understanding of the environmental parameters is necessary. Specifically, the focus is on the following environmental parameters: the executor memory, the number of executors, and the number of cores per executor, together with their effect on execution time and on statistical measures. Hence, the objective was to optimize resource usage and minimize processing time for Decision Tree classification using Spark. This shows whether additional resources will increase performance, lower processing time, and optimize computing resources. The UNSW-NB15 dataset, being a large dataset, provides enough data and complexity to observe the effects of changes in computing resource configurations in Spark. Principal Component Analysis was used for preprocessing the dataset. Results indicated that a lack of executors and cores results in wasted resources and long processing time. Excessive resource allocation did not improve processing time. Environmental tuning has a noticeable impact.
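
The three tunable parameters named in the abstract correspond to the standard Spark properties spark.executor.memory, spark.executor.instances, and spark.executor.cores. As a minimal sketch of how a sweep over such settings might be enumerated, the Python snippet below builds one spark-submit command per combination. The specific values and the script name train_decision_tree.py are hypothetical placeholders; the abstract does not state which settings the authors tested or how they launched their jobs.

```python
from itertools import product

# Hypothetical values for illustration only; the abstract does not list
# the settings that were actually evaluated.
EXECUTOR_MEMORY = ["2g", "4g"]
NUM_EXECUTORS = [2, 4]
CORES_PER_EXECUTOR = [1, 2]


def spark_conf_grid(memories, executors, cores):
    """Return one dict of Spark properties per configuration combination."""
    return [
        {
            "spark.executor.memory": mem,
            "spark.executor.instances": str(n),
            "spark.executor.cores": str(c),
        }
        for mem, n, c in product(memories, executors, cores)
    ]


def to_spark_submit(conf, app="train_decision_tree.py"):
    """Build a spark-submit argument list; `app` is a hypothetical script name."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args + [app]


grid = spark_conf_grid(EXECUTOR_MEMORY, NUM_EXECUTORS, CORES_PER_EXECUTOR)
for conf in grid:
    print(" ".join(to_spark_submit(conf)))
```

Timing each generated command and recording the classifier's statistical measures would be one way to compare configurations in the spirit of the study.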
Published (Version of record), CC BY 4.0, Open access
