Abstract:
Replication is an essential cornerstone of
data storage, not only in traditional storage systems
but also in cloud environments. Data popularity is a
key factor in data replication as popular files are
accessed most frequently, and their access patterns
become unstable and unpredictable. Moreover, replica
placement is one of the key issues affecting system
performance, including load balancing and file
access rate. Therefore, we focus on these
factors in this paper. Although the current Hadoop
Distributed File System (HDFS) replica placement
policy achieves both fault tolerance and
read/write efficiency, it does not distribute load
evenly. Moreover, the current HDFS replica
placement policy considers neither bandwidth nor
DataNodes’ storage utilization. To address these
challenges, this paper proposes a dynamic
replication management scheme comprising replica
allocation and replica placement algorithms.
The proposed data placement algorithm considers
bandwidth and storage utilization in order to
achieve a higher file access rate and better load
balancing.
As verification, we calculated the popularity growth
rate and derived the replica degree from it. The
proposed scheme is expected to be effective
for large-scale cloud storage.
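
To make the two components concrete, the following is a minimal sketch and not the scheme's actual algorithms. The abstract gives no formulas, so everything here is an assumption for illustration: the growth rate is taken as the relative change in access count between two observation windows, the replica degree scales with that rate between hypothetical bounds, and the placement score is a hypothetical equal-weight mix of normalized bandwidth and free storage. All function names, weights, and the upper bound are invented; only the lower bound of 3 reflects the HDFS default replication factor.

```python
import math
from dataclasses import dataclass


def popularity_growth_rate(prev_accesses: int, curr_accesses: int) -> float:
    """Relative change in access count between two windows (assumed definition)."""
    if prev_accesses <= 0:
        # No history: treat any access as 100% growth (assumption).
        return 1.0 if curr_accesses > 0 else 0.0
    return (curr_accesses - prev_accesses) / prev_accesses


def replica_degree(current: int, growth: float,
                   min_replicas: int = 3, max_replicas: int = 10) -> int:
    """Scale the replica count by the growth rate, clamped to assumed bounds."""
    target = math.ceil(current * (1.0 + growth))
    return max(min_replicas, min(max_replicas, target))


@dataclass
class DataNode:
    name: str
    bandwidth_mbps: float   # available bandwidth to the node
    storage_used: float     # fraction of capacity in use, 0.0 to 1.0


def placement_score(node: DataNode, max_bandwidth: float,
                    w_bandwidth: float = 0.5, w_storage: float = 0.5) -> float:
    """Higher is better: favors high bandwidth and low storage utilization."""
    bandwidth_term = node.bandwidth_mbps / max_bandwidth
    free_storage_term = 1.0 - node.storage_used
    return w_bandwidth * bandwidth_term + w_storage * free_storage_term


def place_replicas(nodes: list[DataNode], degree: int) -> list[str]:
    """Pick the `degree` best-scoring DataNodes for new replicas."""
    if not nodes:
        return []
    max_bw = max(n.bandwidth_mbps for n in nodes) or 1.0
    ranked = sorted(nodes, key=lambda n: placement_score(n, max_bw), reverse=True)
    return [n.name for n in ranked[:degree]]
```

Under these assumptions, a file whose accesses rise from 100 to 150 has a growth rate of 0.5, so its replica degree would rise from 3 to 5. A node with moderate bandwidth and ample free storage can outrank a fast but nearly full node, which is the load-balancing effect the abstract describes.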