UCSY's Research Repository

Feature Selection and MapReduce Based Neural Network Classification for Big Data

dc.contributor.author Shine, Chit Thu
dc.date.accessioned 2019-09-23T05:06:41Z
dc.date.available 2019-09-23T05:06:41Z
dc.date.issued 2018-12
dc.identifier.uri http://onlineresource.ucsy.edu.mm/handle/123456789/2252
dc.description.abstract Nowadays, large amounts of digital data are generated everywhere, every second of the day. One of the challenges is the volume of generated data with high dimensionality. Most traditional machine learning algorithms perform poorly in both training time and classification results when finding hidden insights in such high-dimensional data. The Backpropagation Neural Network, one of the most popular Artificial Neural Networks, is widely used in many classification applications. To reduce the data dimension, feature selection needs to be considered. MapReduce is a software framework for writing applications that run on Hadoop, and it supports rapid computation and processing of Big Data. First, data preprocessing is performed by substituting missing values. Then, the dimension of the data is reduced using the Chi-square feature selection method. After that, a Backpropagation Neural Network with the MapReduce paradigm is used for classification. This MapReduce-based Neural Network classifier is constructed using one and two hidden layers. The outputs of the proposed system are the performance measures: training time, accuracy and number of selected features. The experiments were conducted both with and without feature selection. The results are then compared with those obtained from the WEKA tool and a conventional Backpropagation Neural Network. Six different datasets (Thyroid Disease Diagnosis, Diabetics Diagnosis, Insurance Classification, Intrusion Detection, Customer Churn Prediction and Human Activity Recognition) are used as case studies. Based on the experimental results, the MapReduce-based Neural Network algorithm trains faster than the WEKA tool on large datasets. It is also found that feature selection can retain suitable accuracy in representing the original features by selecting a minimal feature subset from a problem domain. The proposed system is implemented in the Java programming language on the Linux platform. en_US
dc.language.iso en_US en_US
dc.publisher University of Computer Studies, Yangon en_US
dc.title Feature Selection and MapReduce Based Neural Network Classification for Big Data en_US
dc.type Thesis en_US

