Abstract:
The widespread popularity of cloud
computing as a preferred platform for the
deployment of web applications has resulted in
an enormous number of applications moving to
the cloud and in the huge success of cloud service
providers. Data center storage management
plays a vital role in cloud computing
environments. In particular, PC cluster-based
data storage is needed to manage data on low-cost
storage servers on which storage space usage can
be reduced. This system presents an efficient
data storage approach that distributes work across
many nodes in a cluster using the Hadoop Distributed
File System (HDFS) with variable chunk size to
facilitate massive data processing. This system introduces
an implementation enhancement to MapReduce
that improves system throughput and
scalability, so that the system keeps working within
the existing physical storage capacity as the
number of users and files increases.