Performance Analysis of a Scalable Naïve Bayes Classifier on MapReduce and Beyond MapReduce

Oo, Myat Cho Mon; Thein, Thandar

Sixteenth International Conference on Computer Applications (ICCA 2018)

URI: http://onlineresource.ucsy.edu.mm/handle/123456789/247

Date: 2018-02-22

Abstract:

Many real-world domains generate big data: large volumes of high-velocity, complex, and variable data from diverse sources. Big data becomes a challenge when it is difficult to process and to extract knowledge from with traditional analysis tools, so scalable machine learning algorithms are needed to process it. The Hadoop MapReduce framework has recently been adopted for parallel computing, but MapReduce may not fit most real-world data applications. For large-scale machine learning on distributed systems, Spark has become a much more viable option beyond MapReduce. Although both are Apache-hosted data analytics frameworks, their performance varies significantly depending on the use case being implemented. This paper analyzes the performance of a scalable Naïve Bayes classifier (SNB) implemented on MapReduce and on Beyond MapReduce (Spark), using several real-world datasets. The comparison shows that SNB on Beyond MapReduce achieves lower processing time than SNB on MapReduce, making it the more efficient option for big data classification.
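The abstract does not describe how SNB is built internally. The sketch below shows why Naïve Bayes training suits MapReduce at all: training reduces to counting. A map phase emits `(key, 1)` pairs per record, and a reduce phase sums them per key. This is a minimal sketch and not the authors' implementation. It assumes categorical features, Laplace smoothing with a hypothetical parameter `alpha`, and an in-process simulation of the shuffle. The function names `map_record`, `reduce_counts`, `train` and `predict` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import chain


def map_record(label, features):
    """Map phase (hypothetical): emit (key, 1) count pairs for one training record."""
    yield ("class", label), 1
    for i, value in enumerate(features):
        yield ("feat", label, i, value), 1


def reduce_counts(pairs):
    """Shuffle + reduce phase, simulated in-process: sum the counts emitted per key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def train(records):
    """Training is a single map/reduce pass over (label, features) records."""
    return reduce_counts(chain.from_iterable(map_record(y, x) for y, x in records))


def predict(counts, features, alpha=1.0):
    """Pick the class with the highest log-posterior, using Laplace smoothing (assumed)."""
    classes = {key[1]: n for key, n in counts.items() if key[0] == "class"}
    total = sum(classes.values())
    # Distinct values observed for each feature index, used by the smoothing denominator.
    values = defaultdict(set)
    for key in counts:
        if key[0] == "feat":
            values[key[2]].add(key[3])
    best, best_score = None, -math.inf
    for label, n_c in classes.items():
        score = math.log(n_c / total)  # log prior P(class)
        for i, value in enumerate(features):
            n = counts.get(("feat", label, i, value), 0)
            score += math.log((n + alpha) / (n_c + alpha * len(values[i])))
        if score > best_score:
            best, best_score = label, score
    return best


records = [
    ("yes", ("sunny", "hot")),
    ("no", ("rainy", "cool")),
    ("yes", ("sunny", "cool")),
    ("no", ("rainy", "hot")),
]
counts = train(records)
print(predict(counts, ("sunny", "hot")))  # → yes
```

Each mapper needs only its own records, and the reducer combines only per-key sums, so the counting step can be spread across nodes. On Spark the same idea could be written as a `flatMap` followed by `reduceByKey`. This sketch keeps everything in plain Python for clarity.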

Show full item record