Abstract:
Replication plays an important role in storage systems: it improves data availability, throughput,
and response time for users, and it helps control storage cost. Because data access patterns differ, data
popularity is an important factor in replication, since the set of popular files is unstable and unpredictable.
Replica placement is also important for system performance. In data-parallel applications, data
locality is a key issue, and poor data locality degrades system performance.
Therefore, this paper proposes a data locality-based replication scheme for the Hadoop Distributed File System (HDFS).
In replica allocation, data popularity is considered so that fewer replicas are maintained for unpopular data.
In addition, disk bandwidth, CPU utilization, and disk utilization are considered in the proposed replica placement
algorithm in order to achieve better data locality and more effective storage utilization. The proposed scheme is
expected to be effective for HDFS.
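
As an illustration only, the idea summarized above might be sketched as follows. This is not the algorithm of this paper: the class `Node`, the functions `replica_count`, `score`, and `place_replicas`, the popularity threshold, the equal weights, and the minimum of one replica are all hypothetical choices made for the example. Only the default of three replicas reflects the standard HDFS replication factor.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of a DataNode; all metrics are normalized to 0..1."""
    name: str
    disk_bandwidth: float  # higher is better
    cpu_util: float        # lower is better
    disk_util: float       # lower is better


def replica_count(accesses, popular_threshold, default=3, minimum=1):
    """Keep the default replica count for popular data, fewer for unpopular data.

    The threshold and the minimum are assumptions, not values from the paper.
    """
    return default if accesses >= popular_threshold else minimum


def score(node, w_bw=1 / 3, w_cpu=1 / 3, w_disk=1 / 3):
    """Weighted score: prefer high disk bandwidth, low CPU and disk utilization.

    Equal weights are an assumption made for this sketch.
    """
    return (w_bw * node.disk_bandwidth
            + w_cpu * (1.0 - node.cpu_util)
            + w_disk * (1.0 - node.disk_util))


def place_replicas(nodes, accesses, popular_threshold):
    """Return the names of the best-scoring nodes, one per replica."""
    k = replica_count(accesses, popular_threshold)
    ranked = sorted(nodes, key=score, reverse=True)
    return [n.name for n in ranked[:k]]


if __name__ == "__main__":
    cluster = [
        Node("a", 0.9, 0.2, 0.3),
        Node("b", 0.5, 0.8, 0.9),
        Node("c", 0.7, 0.5, 0.5),
        Node("d", 0.2, 0.1, 0.1),
    ]
    print(place_replicas(cluster, accesses=100, popular_threshold=10))
    print(place_replicas(cluster, accesses=2, popular_threshold=10))
```

In this sketch a frequently accessed file keeps three replicas on the least loaded nodes, while a rarely accessed file keeps a single replica. A real placement policy would also have to respect rack awareness and the locality of the tasks that read the data.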