This study presents an efficient way to handle both discrete and continuous values in Big Data within a parallel Naive Bayes implementation on Hadoop's MapReduce environment. Two approaches were taken: (i) discretizing continuous values using a binning method; and (ii) using a multinomial distribution to estimate probabilities for discrete values and a Gaussian distribution to estimate probabilities for continuous values. The resulting models were compared on run time and classification accuracy across varying data sizes, data block sizes, and map memory sizes.
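The two approaches named in the abstract can be illustrated with a minimal, single-process sketch. It is not the study's Hadoop implementation. All function names (`equal_width_bin`, `map_record`, `reduce_stats`, `log_posterior`) are hypothetical. The use of equal-width bins, Laplace smoothing (`alpha`), and a variance floor (`var_floor`) are assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import defaultdict


def equal_width_bin(value, lo, hi, n_bins):
    """Approach (i): map a continuous value to a bin index in [0, n_bins - 1].
    Equal-width binning is an assumption; the abstract only says 'binning'."""
    if hi <= lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def map_record(label, discrete, continuous):
    """Map step: emit (key, (count, sum, sum_of_squares)) pairs for one record."""
    yield ("class", label), (1, 0.0, 0.0)
    for j, v in enumerate(discrete):
        yield ("disc", label, j, v), (1, 0.0, 0.0)
    for j, x in enumerate(continuous):
        yield ("cont", label, j), (1, x, x * x)


def reduce_stats(pairs):
    """Reduce step: add up the partial statistics emitted for each key."""
    totals = defaultdict(lambda: [0, 0.0, 0.0])
    for key, (n, s, ss) in pairs:
        acc = totals[key]
        acc[0] += n
        acc[1] += s
        acc[2] += ss
    return {key: tuple(acc) for key, acc in totals.items()}


def log_posterior(totals, label, discrete, continuous, n_values,
                  alpha=1.0, var_floor=1e-9):
    """Approach (ii): unnormalized log posterior of `label` for one record.
    Discrete features use smoothed frequency estimates; continuous features
    use a Gaussian with the per-class mean and variance.
    `n_values[j]` is the number of distinct values of discrete feature j.
    Assumes `label` was seen in training."""
    empty = (0, 0.0, 0.0)
    n_c = totals[("class", label)][0]
    n_total = sum(v[0] for k, v in totals.items() if k[0] == "class")
    score = math.log(n_c / n_total)
    for j, v in enumerate(discrete):
        count = totals.get(("disc", label, j, v), empty)[0]
        score += math.log((count + alpha) / (n_c + alpha * n_values[j]))
    for j, x in enumerate(continuous):
        n, s, ss = totals[("cont", label, j)]
        mean = s / n
        var = max(ss / n - mean * mean, var_floor)
        score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
    return score
```

Emitting a count, a sum, and a sum of squares keeps every statistic additive, so partial results from different mappers can be combined in any order before the means and variances are derived. For approach (i), each continuous value would first pass through `equal_width_bin` and then be treated as an additional discrete feature.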
Details
Title
MapReduce Implementation of a Multinomial and Mixed Naive Bayes Classifier
Publication Details
International Journal of Intelligent Information Technologies, Vol. 16(2)