Abstract:
Almost every enterprise today is experiencing an explosion of data, with huge volumes of digital data generated daily. This data must be stored for various reasons, which raises an important question: how can we store, manage, process, and analyze such large volumes of data, most of which are semi-structured or unstructured, in a scalable, fault-tolerant, and efficient manner? Big data poses three main challenges. Most of it is semi-structured or unstructured, complex computations must be carried out over it, and the time required to process it must be kept as low as possible. In this paper, we propose a big data platform based on the Hadoop MapReduce framework and the Gluster file system (GlusterFS), deployed over a large-scale shared storage system, to address these challenges. Our platform supports large-scale data analysis efficiently and effectively.