Abstract:
As storage utilization grows larger and larger,
people have been trying to find efficient ways to
save storage space. Single-instance storage, or
data deduplication, has come into vogue in storage
management because it can eliminate duplicated data
or segments within files. In this
paper, we propose a data deduplication system that
operates at the sub-file level. The system performs
deduplication through the integrated use of a file
chunking algorithm, a secure hash function, and B+
tree indexing. In this
system, we first separate the file into
variable-length segments, or chunks, using the Two
Thresholds Two Divisors (TTTD) chunking algorithm.
ChunkIDs are then obtained by applying the hash
function to the chunks. The resulting ChunkIDs are
used as indexing keys in a B+ tree-like index
structure. This system can reduce the indexing time
complexity from O(n) to O(log n). The proposed
system will be compared with other systems, such as
WinZip and WinRAR, in terms of performance metrics.
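
The pipeline described above (chunking, hashing, index lookup) can be
illustrated with a minimal Python sketch. It is not the implementation
evaluated in this paper, and it rests on several assumptions: the
threshold and divisor values are hypothetical, the window hash is a
simple polynomial hash recomputed at each position, SHA-256 stands in
for the unspecified secure hash function, and a sorted list searched by
binary search stands in for the B+ tree. The names tttd_chunks,
ChunkIndex, and deduplicate are illustrative only.

```python
import bisect
import hashlib

# Hypothetical parameters; the abstract does not specify any values.
T_MIN, T_MAX = 64, 512        # the two thresholds: min and max chunk size (bytes)
D_MAIN, D_BACKUP = 128, 64    # the two divisors: main and backup
WINDOW = 16                   # width of the sliding window (bytes)
BASE, MOD = 257, (1 << 31) - 1


def window_hash(window: bytes) -> int:
    """Simple polynomial hash of the window (an assumed stand-in hash)."""
    h = 0
    for b in window:
        h = (h * BASE + b) % MOD
    return h


def tttd_chunks(data: bytes) -> list:
    """Split data into variable-length chunks using a TTTD-style rule."""
    chunks = []
    start = 0        # start offset of the current chunk
    backup = None    # most recent backup breakpoint, if any
    i = 0
    while i < len(data):
        length = i - start + 1
        if length >= T_MIN:
            h = window_hash(data[max(0, i - WINDOW + 1): i + 1])
            if h % D_BACKUP == D_BACKUP - 1:
                backup = i + 1            # remember a fallback cut point
            cut = None
            if h % D_MAIN == D_MAIN - 1:
                cut = i + 1               # main divisor matched: cut here
            elif length >= T_MAX:
                # Maximum size reached: prefer the backup breakpoint.
                cut = backup if backup is not None else i + 1
            if cut is not None:
                chunks.append(data[start:cut])
                start, backup, i = cut, None, cut
                continue
        i += 1
    if start < len(data):
        chunks.append(data[start:])       # trailing chunk may be short
    return chunks


class ChunkIndex:
    """Sorted list of ChunkIDs; a stand-in for the B+ tree, not a B+ tree."""

    def __init__(self):
        self._ids = []       # ChunkIDs kept in sorted order
        self._chunks = []    # chunk payloads, parallel to self._ids

    def add(self, chunk: bytes) -> bool:
        """Store the chunk if unseen; return True if it was a duplicate."""
        chunk_id = hashlib.sha256(chunk).hexdigest()
        pos = bisect.bisect_left(self._ids, chunk_id)   # O(log n) search
        if pos < len(self._ids) and self._ids[pos] == chunk_id:
            return True
        self._ids.insert(pos, chunk_id)
        self._chunks.insert(pos, chunk)
        return False


def deduplicate(data: bytes, index: ChunkIndex) -> tuple:
    """Return (total chunks, newly stored chunks) for one file."""
    chunks = tttd_chunks(data)
    new = sum(0 if index.add(c) else 1 for c in chunks)
    return len(chunks), new
```

In this sketch, calling deduplicate twice on the same data with the
same index stores no new chunks on the second call. Only the lookup in
the sorted list is O(log n); inserting into a Python list is O(n),
which is one reason a real system would use a B+ tree as proposed.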