Logo image
Improving the Performance of kNN in the MapReduce Framework Using Locality Sensitive Hashing
Journal article   Peer reviewed

Improving the Performance of kNN in the MapReduce Framework Using Locality Sensitive Hashing

Sikha Bagui, Arup Kumar Mondal and Subhash Bagui
International journal of distributed systems and technologies, Vol.10(4)
10/01/2019
Web of Science ID: WOS:000511357800001

Metrics

Abstract

In this work the authors present a parallel k nearest neighbor (kNN) algorithm using locality sensitive hashing to preprocess the data before it is classified using kNN in Hadoop's MapReduce framework. This is compared with the sequential (conventional) implementation. Using locality sensitive hashing's similarity measure with kNN, the iterative procedure to classify a data object is performed within a hash bucket rather than the whole data set, greatly reducing the computation time needed for classification. Several experiments were run that showed that the parallel implementation performed better than the sequential implementation on very large datasets. The study also experimented with a few map and reduce side optimization features for the parallel implementation and presented some optimum map and reduce side parameters. Among the map side parameters, the block size and input split size were varied, and among the reduce side parameters, the number of planes were varied, and their effects were studied.

Details

Logo image