Merging Small Files Based on Agglomerative Hierarchical Clustering on HDFS for Cloud Storage

Wai, Khin Su Su; Myint, Julia; Yee, Tin Tin

Conference: Sixteenth International Conference On Computer Applications (ICCA 2018)

URI: http://onlineresource.ucsy.edu.mm/handle/123456789/234

Date: 2018-02-22

Abstract:

The Hadoop Distributed File System (HDFS) was originally designed for large files. HDFS stores each small file as a separate block, even when several small files together are smaller than a single block. A massive number of small files therefore produces a very large number of blocks. Because the NameNode keeps the metadata for every file and block in memory, it often becomes the bottleneck when large numbers of small files are stored and accessed. This difficulty of storing and accessing large numbers of small files is known as the small file problem. To address it, this paper proposes an approach that merges small files on HDFS: small files are merged into a larger file based on agglomerative hierarchical clustering, which reduces NameNode memory consumption. The approach is intended to support the storage of small files in the cloud.
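To show why merging relieves NameNode memory, the rough estimate below counts namespace objects (one per file plus one per block) before and after merging. It is a sketch under stated assumptions, not the authors' method. The figure of about 150 bytes of NameNode heap per object is a commonly cited rule of thumb, and the 128 MB block size is an assumed `dfs.blocksize`. The merging here uses simple next-fit packing rather than clustering, and the helper names are hypothetical.

```python
import math

# Assumption (not from the paper): ~150 bytes of NameNode heap per namespace object.
BYTES_PER_OBJECT = 150
# Assumption: a 128 MB HDFS block size.
BLOCK_SIZE = 128 * 1024 * 1024


def namenode_objects(file_sizes, block_size=BLOCK_SIZE):
    """Count one file object plus one object per block for each file (sizes in bytes)."""
    return sum(1 + math.ceil(size / block_size) for size in file_sizes)


def pack_next_fit(file_sizes, capacity=BLOCK_SIZE):
    """Hypothetical helper: pack files, in order, into merged files of at most `capacity` bytes."""
    merged, current = [], 0
    for size in file_sizes:
        if current and current + size > capacity:
            merged.append(current)  # close the current merged file
            current = 0
        current += size
    if current:
        merged.append(current)
    return merged


small_files = [100 * 1024] * 1_000_000  # one million hypothetical 100 KB files
before = namenode_objects(small_files)
after = namenode_objects(pack_next_fit(small_files))
print(before, before * BYTES_PER_OBJECT)  # 2000000 300000000
print(after, after * BYTES_PER_OBJECT)    # 1528 229200
```

The estimate ignores the index that any merging scheme needs to locate each small file inside its merged file. That index adds its own storage cost, but it does not need one NameNode object per small file.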

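The abstract does not specify which file features, linkage criterion, or stopping rule the authors use. The sketch below shows one possible agglomerative (bottom-up) grouping under these assumptions. Each file has a hypothetical numeric feature vector. Clusters are compared with single linkage and Euclidean distance. Merging stops once no pair of clusters is within a hypothetical `max_distance`. A size cap keeps each group within one block. Each resulting cluster would become one merged file.

```python
import math
from itertools import combinations


def single_linkage(a, b, features):
    """Smallest Euclidean distance between any member of cluster a and any member of b."""
    return min(math.dist(features[i], features[j]) for i in a for j in b)


def agglomerative_merge(features, sizes, capacity, max_distance):
    """Start with one cluster per file and repeatedly join the closest pair whose
    combined size fits in `capacity`. Stop when no such pair lies within
    `max_distance`. Returns lists of file indices, one list per merged file."""
    clusters = [[i] for i in range(len(features))]
    totals = list(sizes)  # total bytes in each cluster
    while True:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            if totals[x] + totals[y] > capacity:
                continue  # the merged file would exceed the size cap
            d = single_linkage(clusters[x], clusters[y], features)
            if d <= max_distance and (best is None or d < best[0]):
                best = (d, x, y)
        if best is None:
            return clusters
        _, x, y = best  # x < y, so deleting index y leaves x valid
        clusters[x].extend(clusters[y])
        totals[x] += totals[y]
        del clusters[y]
        del totals[y]


groups = agglomerative_merge(
    features=[(0.0,), (0.1,), (5.0,), (5.2,), (9.0,)],
    sizes=[30, 30, 30, 30, 30],
    capacity=100,
    max_distance=1.0,
)
print(groups)  # [[0, 1], [2, 3], [4]]
```

This naive version recomputes every pairwise distance on each pass, so it only suits small inputs. A real deployment would use a more efficient clustering routine and the features the authors actually chose.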