Feature Selection and MapReduce Based Neural Network Classification for Big Data

Shine, Chit Thu

URI: http://onlineresource.ucsy.edu.mm/handle/123456789/2252

Date: 2018-12

Abstract:

Nowadays, large amounts of digital data are generated everywhere, every second of the day. One of the challenges is the volume of generated data with high dimensionality. Most traditional machine learning algorithms perform poorly in both training time and classification results when trying to find hidden insights in such high-dimensional data. The Backpropagation Neural Network, one of the most popular Artificial Neural Networks, is widely used in many classification applications. To reduce the data dimension, feature selection needs to be considered. MapReduce is a software framework for writing applications that run on Hadoop, supporting rapid computation and processing of Big Data. First, data preprocessing is performed by substituting missing values. Then, the dimension of the data is reduced using the Chi-square feature selection method. After that, a Backpropagation Neural Network with the MapReduce paradigm is used for classification. This MapReduce-based Neural Network classifier is constructed with one hidden layer and with two hidden layers. The outputs of the proposed system are the performance measures: training time, accuracy, and the number of selected features. The experiments were conducted both with and without feature selection. The results are then compared with those obtained from the WEKA tool and a conventional Backpropagation Neural Network. Six different datasets (Thyroid Disease Diagnosis, Diabetics Diagnosis, Insurance Classification, Intrusion Detection, Customer Churn Prediction and Human Activity Recognition) are used as case studies. Based on the experimental results, the MapReduce-based Neural Network algorithm trains faster than the WEKA tool on large datasets. It is also found that feature selection can retain suitable accuracy in representing the original features by selecting a minimal feature subset from a problem domain.
The proposed system is implemented in the Java programming language on the Linux platform.
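
The Chi-square feature selection step mentioned in the abstract can be illustrated with a minimal Java sketch. This is not the thesis implementation: the class name ChiSquareSelector, its method names, and the assumption that every feature is already discretized into integer codes (0 to numValues - 1) are hypothetical choices made only for this example. The sketch scores each feature by the chi-square statistic of its contingency table against the class label and keeps the k highest-scoring features.

```java
import java.util.Comparator;
import java.util.stream.IntStream;

/** Illustrative chi-square feature ranking (hypothetical helper, not the thesis code). */
final class ChiSquareSelector {

    /**
     * Chi-square statistic between one categorical feature and the class label.
     * Assumes feature[i] is in [0, numValues) and label[i] is in [0, numClasses).
     */
    static double chiSquare(int[] feature, int[] label, int numValues, int numClasses) {
        int n = feature.length;
        double[][] observed = new double[numValues][numClasses];
        double[] rowTotal = new double[numValues];
        double[] colTotal = new double[numClasses];

        // Build the contingency table of feature value versus class.
        for (int i = 0; i < n; i++) {
            observed[feature[i]][label[i]]++;
            rowTotal[feature[i]]++;
            colTotal[label[i]]++;
        }

        // Sum (observed - expected)^2 / expected over all non-empty cells.
        double chi2 = 0.0;
        for (int v = 0; v < numValues; v++) {
            for (int c = 0; c < numClasses; c++) {
                double expected = rowTotal[v] * colTotal[c] / n;
                if (expected > 0) {
                    double diff = observed[v][c] - expected;
                    chi2 += diff * diff / expected;
                }
            }
        }
        return chi2;
    }

    /**
     * Returns the indices of the k highest-scoring features, best first.
     * columns[j] holds the values of feature j for all samples.
     */
    static int[] selectTopK(int[][] columns, int[] label, int numValues, int numClasses, int k) {
        double[] scores = new double[columns.length];
        for (int j = 0; j < columns.length; j++) {
            scores[j] = chiSquare(columns[j], label, numValues, numClasses);
        }
        return IntStream.range(0, columns.length)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer j) -> -scores[j]))
                .limit(k)
                .mapToInt(Integer::intValue)
                .toArray();
    }
}
```

A feature that is independent of the class scores close to zero and is dropped first, which is how a small subset can preserve most of the information that matters for classification. Continuous attributes would need to be binned before a sketch like this applies.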
