Abstract:
Data rebalancing is one of the most interesting
research areas in distributed file systems. In the
Gluster file system, data are rebalanced across the
storage servers whenever a storage server is added
to or removed from the Gluster storage pool. The
main issue in the Gluster file system is inefficient
data rebalancing: it causes a large number of file
migrations, long file migration times, and
inefficient storage utilization.
This paper therefore proposes a data rebalancing
mechanism for the Gluster file system that achieves
efficient storage utilization, reduces the number of
file migrations, and shortens file migration time.
The paper makes two main contributions: (1) using
the consistent hashing algorithm with virtual nodes
from Amazon Dynamo to reduce the number of file
migrations and the file migration time, and (2)
migrating virtual nodes between storage servers to
provide efficient storage utilization. Both the
proposed and the current data rebalancing mechanisms
are simulated in Java. Compared with the current
mechanism of the Gluster file system, the proposed
mechanism achieves a storage fullness of 82% and
requires only 20% of the file migrations, 20% of the
file migration time, and 73% of the storage servers.
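The Java sketch below illustrates why consistent hashing with virtual nodes reduces file migrations. Files are placed on a hash ring on which each storage server owns several virtual nodes. The sketch then counts how many files move when a fifth server joins the pool. It is a minimal sketch under stated assumptions and not the paper's simulator. The class names, the MD5-based ring hash, the 100 virtual nodes per server, and the file and server naming are all hypothetical. The modulo-hashing baseline is a naive contrast only and is not a model of Gluster's current rebalancing mechanism. The second contribution, migrating virtual nodes for storage utilization, is not modeled here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a Dynamo-style consistent-hash ring where each
// physical storage server owns several virtual nodes (ring positions).
class ConsistentHashRing {
    private final int virtualNodesPerServer;                    // assumed tuning parameter
    private final TreeMap<Long, String> ring = new TreeMap<>(); // ring position -> server

    ConsistentHashRing(int virtualNodesPerServer) {
        this.virtualNodesPerServer = virtualNodesPerServer;
    }

    // Map a string to a 64-bit ring position using the first 8 bytes of MD5.
    static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFFL);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void addServer(String server) {
        for (int v = 0; v < virtualNodesPerServer; v++) {
            ring.put(hash(server + "#vn" + v), server);
        }
    }

    void removeServer(String server) {
        for (int v = 0; v < virtualNodesPerServer; v++) {
            ring.remove(hash(server + "#vn" + v), server); // only remove this server's entries
        }
    }

    // A file belongs to the server owning the first virtual node clockwise from its hash.
    String serverFor(String fileName) {
        if (ring.isEmpty()) throw new IllegalStateException("no servers in pool");
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(fileName));
        return (e != null ? e : ring.firstEntry()).getValue(); // wrap around the ring
    }
}

class RebalanceDemo {
    // Count files that change server when one server is added to `servers` existing ones.
    static int migrationsOnAdd(int files, int servers, int vnodes) {
        ConsistentHashRing ring = new ConsistentHashRing(vnodes);
        for (int s = 0; s < servers; s++) ring.addServer("server" + s);
        Map<String, String> before = new HashMap<>();
        for (int f = 0; f < files; f++) before.put("file" + f, ring.serverFor("file" + f));
        String newServer = "server" + servers;
        ring.addServer(newServer);
        int moved = 0;
        for (Map.Entry<String, String> e : before.entrySet()) {
            String now = ring.serverFor(e.getKey());
            if (!now.equals(e.getValue())) {
                // Consistent hashing invariant: files only move to the new server.
                if (!now.equals(newServer)) throw new IllegalStateException("moved between old servers");
                moved++;
            }
        }
        return moved;
    }

    // Naive baseline for contrast only: place each file on server (hash mod N).
    static int migrationsModulo(int files, int servers) {
        int moved = 0;
        for (int f = 0; f < files; f++) {
            long h = ConsistentHashRing.hash("file" + f);
            if (Math.floorMod(h, (long) servers) != Math.floorMod(h, (long) servers + 1)) moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        int files = 10000;
        System.out.println("consistent hashing: " + migrationsOnAdd(files, 4, 100) + " of " + files + " files moved");
        System.out.println("modulo hashing:     " + migrationsModulo(files, 4) + " of " + files + " files moved");
    }
}
```

With consistent hashing, a file moves only when its hash falls into a range claimed by one of the new server's virtual nodes. That is roughly 1/(N+1) of the files for N existing servers, and every moved file goes to the new server. With modulo placement, most files change servers because the remainder changes for most hashes. Exact counts depend on the hash function and the number of virtual nodes per server.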