Abstract:
Data continue a massive expansion in scale,
diversity, and complexity. Data underpin activities in all
sectors of society. Achieving the full transformative
potential from the use of data in this increasingly digital
world requires not only new data analysis algorithms
but also a new generation of systems and distributed
computing environments to handle the dramatic growth
in the volume of data, the lack of structure for much of it,
and the increasing computational needs of massive-scale
analytics. In this paper, we propose a big data
platform built on open-source components: Hadoop
MapReduce, the Gluster File System (GlusterFS), Apache Pig,
Apache Hive, and Jaql. We compare our platform with
two other big data platforms, the IBM big data platform
and Splunk. Our big data platform can support large-scale
data analysis efficiently and effectively.