Abstract:
Replication is widely used in data center storage
systems to prevent data loss. Data popularity is a key
factor in data replication because popular files are
accessed most frequently, and their access patterns can
become unstable and unpredictable. Moreover, replica
placement is one of the key issues affecting system
performance, including load balancing and data locality.
Poor data locality is a frequent and fundamental problem
for data-parallel applications: when a processing node
does not hold a required data block in its local storage,
the block must be copied to that node, which degrades
performance. To
address these challenges, this paper proposes a dynamic
replication management scheme based on data popularity
and data locality; it includes replica allocation and
replica placement algorithms. The proposed replica
placement algorithm considers data locality, disk
bandwidth, CPU processing speed, and storage utilization
to improve both data locality and load balancing. The
proposed scheme is expected to be effective for
large-scale cloud storage.