Optimizing Random Forests: Spark Implementations of Random Genetic Forests
Journal article, Open access

Sikha S Bagui
BOHR International Journal of Engineering, Vol.1(1), pp.44-52
2022

Abstract

The Random Forest (RF) algorithm, originally proposed by Breiman [7], is a widely used machine learning algorithm valued for its fast learning speed and high classification accuracy. Despite this widespread use, the mechanisms at work in Breiman’s RF are not yet fully understood, and research continues on several aspects of optimizing the RF algorithm, especially in the big data environment. This work builds new ensembles that use genetic algorithms to optimize the random portions of the RF algorithm, yielding Random Genetic Forests (RGF), Negatively Correlated RGF (NC-RGF), and Preemptive RGF (PFS-RGF). These ensembles are compared with Breiman’s classic RF algorithm in Hadoop’s big data framework, using Spark, on a large, high-dimensional network intrusion dataset, UNSW-NB15.
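
The abstract does not say how the genetic algorithm is applied to the random portions of RF. As a rough illustration only, the sketch below assumes one plausible reading: the random per-tree feature subset is replaced by a subset evolved with a genetic algorithm. It is not the authors' implementation and does not use Spark. The toy data, the nearest-centroid scorer (a stand-in for training and scoring a tree), and all function names and parameters (`make_data`, `fitness`, `evolve`, `pop_size`, `mutation_rate`) are hypothetical.

```python
import random


def make_data(rng, n=200, n_features=8):
    """Toy binary data; by assumption only features 0 and 1 carry signal."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = [rng.gauss(0, 1) for _ in range(n_features)]
        x[0] += 2.0 * label
        x[1] -= 2.0 * label
        rows.append((x, label))
    return rows


def fitness(mask, rows):
    """Accuracy of a nearest-centroid classifier on the masked features.

    Stand-in for fitting one tree on a feature subset. It scores on the
    training rows themselves, which is acceptable only for a sketch.
    """
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for c in (0, 1):
        members = [x for x, y in rows if y == c]
        centroids[c] = [sum(x[i] for x in members) / len(members) for i in idx]
    correct = 0
    for x, y in rows:
        dist = {c: sum((x[i] - m) ** 2 for i, m in zip(idx, centroids[c]))
                for c in (0, 1)}
        correct += min(dist, key=dist.get) == y
    return correct / len(rows)


def evolve(rows, n_features, rng, pop_size=12, generations=10, mutation_rate=0.1):
    """Evolve feature bitmasks; return the final population, best first."""
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, rows), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection, keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = parents + children
    return sorted(pop, key=lambda m: fitness(m, rows), reverse=True)


if __name__ == "__main__":
    data = make_data(random.Random(0))
    ranked_masks = evolve(data, 8, random.Random(1))
    # The top-k masks could serve as the feature subsets for k trees.
    for mask in ranked_masks[:3]:
        print(mask, round(fitness(mask, data), 3))
```

In a forest built this way, each of the top-ranked masks would define the features offered to one tree. The negatively correlated and preemptive variants named in the abstract would need further fitness terms or stopping rules that this sketch does not attempt.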

Published (Version of record), CC BY-NC-ND 4.0, Open access
