Abstract:
Big data analytics is the process of examining large volumes of data of various
types to uncover hidden patterns, unknown correlations, and other useful information.
Hadoop-based platforms have emerged to deal with big data. In Hadoop, the NameNode stores metadata in a single system’s memory, which is a performance bottleneck for scale-out. The Gluster file system has no metadata-related performance bottleneck. To achieve high performance, scalability, and fault tolerance for big data analytics, a big data platform is proposed. The proposed platform consists of big data storage and big data processing. Both the Hadoop big data platform and the proposed platform are implemented on clusters of commodity Linux virtual machines, and their performance is evaluated. According to the evaluation, the proposed big data platform provides better scalability and fault tolerance, and faster query response times, than the Hadoop platform.