Abstract:
Nowadays, large amounts of digital data are
generated everywhere, every second of the day.
One of the challenges is the sheer volume of
high-dimensional data being generated. Most traditional
machine learning algorithms perform poorly, in both
training time and classification accuracy, when mining
hidden insights from such data. Back-propagation Neural
Network, one of the most popular Artificial Neural
Networks, is widely used in many classification
applications. To reduce the dimensionality of the data,
feature selection must be considered. MapReduce is a
software framework for writing applications that
run on Hadoop, and it supports rapid computation
and processing of Big Data. In this paper, the
dimensionality of the data is first reduced using the
Chi-square method. Then, a Back-propagation Neural
Network based on the MapReduce paradigm is used for
classification.
The MapReduce-based Neural Network classifier is
constructed with one and with two hidden layers. Six
different datasets are used as case studies, and the
performance measures are training time,
accuracy, and the number of selected features. The
results of training the MapReduce-based Neural Network
algorithm on the complete feature set and on the
selected feature subset are compared with the WEKA tool
and a conventional Back-propagation Neural Network.
Based on the experimental results, the MapReduce-based
Neural Network algorithm gives superior
efficiency in training time and accuracy with a reduced
number of selected features.
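
The two-stage pipeline summarized above (Chi-square feature selection followed by back-propagation training in a map/reduce style) can be illustrated with a minimal, single-machine sketch. It is not the implementation evaluated in this paper: it assumes discrete feature values, a single hidden layer with a sigmoid output for binary labels, and a data-parallel scheme in which each map task sums gradients over one partition and the reduce step adds the partial sums. The abstract does not specify the partitioning scheme, so that part is an assumption, and all function names (chi_square, select_features, map_gradients, reduce_gradients, train) are hypothetical.

```python
import math
import random
from collections import Counter
from functools import reduce


def chi_square(feature, labels):
    """Chi-square statistic between one discrete feature and the class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    f_counts, c_counts = Counter(feature), Counter(labels)
    score = 0.0
    for f, nf in f_counts.items():
        for c, nc in c_counts.items():
            expected = nf * nc / n  # count expected if feature and class were independent
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def select_features(rows, labels, k):
    """Indices of the k features with the highest chi-square score."""
    scores = [chi_square([r[j] for r in rows], labels) for j in range(len(rows[0]))]
    return sorted(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, x):
    """One hidden layer; the trailing 1.0 is the bias input."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w1]
    out = sigmoid(sum(w * v for w, v in zip(w2, hidden + [1.0])))
    return hidden, out


def map_gradients(w1, w2, partition):
    """Map step (assumed scheme): back-propagate over one partition, sum gradients."""
    g1 = [[0.0] * len(row) for row in w1]
    g2 = [0.0] * len(w2)
    for x, y in partition:
        hidden, out = forward(w1, w2, x)
        d_out = out - y  # output delta for cross-entropy loss with a sigmoid output
        for j, h in enumerate(hidden + [1.0]):
            g2[j] += d_out * h
        for j, h in enumerate(hidden):
            d_hid = d_out * w2[j] * h * (1.0 - h)
            for i, v in enumerate(x + [1.0]):
                g1[j][i] += d_hid * v
    return g1, g2


def reduce_gradients(a, b):
    """Reduce step: element-wise sum of two partial gradients."""
    g1 = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    g2 = [p + q for p, q in zip(a[1], b[1])]
    return g1, g2


def train(data, n_hidden=3, n_parts=2, epochs=200, lr=0.5, seed=0):
    """Batch gradient descent; each epoch is one map/reduce round over the partitions."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    parts = [data[i::n_parts] for i in range(n_parts)]
    for _ in range(epochs):
        g1, g2 = reduce(reduce_gradients, (map_gradients(w1, w2, p) for p in parts))
        w1 = [[w - lr * g / len(data) for w, g in zip(rw, rg)] for rw, rg in zip(w1, g1)]
        w2 = [w - lr * g / len(data) for w, g in zip(w2, g2)]
    return w1, w2


if __name__ == "__main__":
    rows = [[0, 0], [0, 1], [1, 0], [1, 1]]  # toy data: the label equals feature 0
    labels = [0, 0, 1, 1]
    keep = select_features(rows, labels, k=1)
    data = [([float(r[j]) for j in keep], y) for r, y in zip(rows, labels)]
    w1, w2 = train(data)
    print("selected feature indices:", keep)
```

Because the summed gradient over all partitions equals the gradient over the full dataset, this scheme leaves the batch update unchanged while letting each partition be processed independently. On Hadoop the map and reduce functions would run as separate tasks rather than in one process, and a two-hidden-layer variant would add one more weight matrix and one more delta term.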