Abstract:
Big data refers to volumes of data so large
that they are difficult to process, handle, and
manage. Big data is an emerging trend
in which many researchers and data scientists
carry out data analytics. One of the most
interesting aspects of big data analytics is
predicting the future from data. Predictive
analytics is the use of data and machine learning
(ML) techniques to identify likely future outcomes based
on historical data. With predictive analytics, a
company can meaningfully leverage its business
data to diagnose and solve business problems. However,
choosing a suitable predictive analytics framework
requires considering many factors, and the choice is vital
for every data analysis project. The best and
simplest practical way to do so is to compare the response
time of each framework. In this paper, we
investigate and compare two big data predictive
analytics frameworks, Apache Mahout and Spark
MLlib, from a performance point of view. This
comparative study will make it easier for
researchers and data scientists to select big
data analytics frameworks suited to their
analytics areas.