Abstract:
Apache Hadoop is an open-source software
framework for distributed storage and distributed
processing of very large data sets on computer
clusters built from commodity hardware. The Hadoop
Distributed File System (HDFS) is the underlying file
system of a Hadoop cluster. The default HDFS data
placement strategy works well in homogeneous
clusters, but it performs poorly in heterogeneous
clusters because the nodes differ in capability:
some computing nodes may become overloaded,
reducing Hadoop performance. HDFS therefore has to
rely on a load-balancing utility to balance the
data distribution, so that data is placed evenly
across the Hadoop cluster. However, because each
node in a heterogeneous Hadoop cluster has a
different computing capacity, this may incur the
overhead of transferring unprocessed data from slow
nodes to fast nodes. To solve these problems, a
data/replica placement policy based on the storage
utilization and computing capacity of each data node
in a heterogeneous Hadoop cluster is proposed. The
proposed policy aims to reduce the overload of some
computing nodes as well as the overhead of data
transmission between different computing nodes.
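
The sketch below illustrates the general idea of capacity-aware replica placement described above: rank candidate data nodes by storage utilization normalized by relative computing capacity, so faster nodes absorb proportionally more data. This is a minimal, hypothetical illustration; the class and field names (NodeInfo, computeRatio, chooseTarget) are assumptions for exposition and are not the paper's actual algorithm or part of the HDFS API.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a data node's state; field names are illustrative.
class NodeInfo {
    String host;
    double usedBytes;      // bytes currently stored on the node
    double capacityBytes;  // total storage capacity of the node
    double computeRatio;   // relative computing capacity (1.0 = reference node)

    NodeInfo(String host, double usedBytes, double capacityBytes, double computeRatio) {
        this.host = host;
        this.usedBytes = usedBytes;
        this.capacityBytes = capacityBytes;
        this.computeRatio = computeRatio;
    }

    // Storage utilization divided by computing capacity: a fast node is
    // allowed to hold proportionally more data before it looks "full".
    double weightedUtilization() {
        return (usedBytes / capacityBytes) / computeRatio;
    }
}

class ReplicaPlacer {
    // Pick the node with the lowest capacity-weighted utilization
    // as the target for the next block replica.
    static NodeInfo chooseTarget(List<NodeInfo> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble(NodeInfo::weightedUtilization))
                .orElseThrow(() -> new IllegalArgumentException("no candidate nodes"));
    }

    public static void main(String[] args) {
        List<NodeInfo> nodes = List.of(
                new NodeInfo("dn1", 400e9, 1000e9, 2.0),  // fast node, 40% full
                new NodeInfo("dn2", 300e9, 1000e9, 1.0),  // slow node, 30% full
                new NodeInfo("dn3", 50e9, 1000e9, 0.5));  // very slow node, 5% full
        // dn3 wins: 0.05 / 0.5 = 0.10, versus 0.20 for dn1 and 0.30 for dn2.
        System.out.println("place replica on: " + chooseTarget(nodes).host);
    }
}
```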