Abstract:
As storage utilization grows ever larger, people run
out of storage space in almost every setting and have
therefore sought efficient ways to save it. Single-instance
storage, or data deduplication, can eliminate multiple
copies of the same file as well as duplicated segments
or chunks of data within those files. Hence, data
deduplication has become an
interesting field in storage environments, especially in
persistent data storage for data centers. A current
challenge in data deduplication is to avoid full-chunk
indexing when determining whether incoming data is new,
because it is a time-consuming process that must match
the contents of one file against another. This paper
proposes an Efficient Indexing Mechanism for Data
Deduplication (EIMDD) and recovery system that combines
a secure hash algorithm with B+ tree indexing, and
reports experimental results for various file types,
excluding media files. In the proposed system,
the file is first split into variable-length chunks
using the Two Thresholds Two Divisors (TTTD) chunking
algorithm. ChunkIDs are then obtained by applying a
secure hash function to the chunks. The resulting
ChunkIDs are used as the indexing keys of a B+ tree
index structure. The search time for duplicate chunks
thus drops from O(n) to O(log n), which avoids the
cost of full-chunk
indexing. Once the chunks are stored on disk, the system
can reconstruct the original file on request, even after
it has been deleted, using the stored chunks and
metadata. This gives the proposed system its recovery
capability.
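
The pipeline summarized above (chunk, hash, index, reconstruct) can be illustrated with a minimal sketch. It is not the EIMDD implementation: the threshold and divisor values, the gear-style rolling hash, the choice of SHA-1 as the secure hash, and the names tttd_chunks and ChunkStore are all assumptions made for illustration. A sorted list searched with binary search stands in for the B+ tree, so only lookups are O(log n) here; a real B+ tree would also make insertions logarithmic.

```python
import bisect
import hashlib
import random

# Hypothetical gear table for a simple rolling hash (illustrative only).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def tttd_chunks(data, t_min=64, t_max=512, d_main=128, d_backup=64):
    """Simplified TTTD-style chunking: two size thresholds, two divisors."""
    chunks, start, backup, h = [], 0, -1, 0
    i = 0
    while i < len(data):
        h = ((h << 1) + GEAR[data[i]]) & 0xFFFFFFFF
        length = i - start + 1
        cut = -1
        if length >= t_min:
            if h % d_backup == d_backup - 1:
                backup = i                      # remember a fallback breakpoint
            if h % d_main == d_main - 1:
                cut = i                         # main divisor matched
            elif length >= t_max:
                cut = backup if backup >= 0 else i  # forced cut at the maximum
        if cut >= 0:
            chunks.append(data[start:cut + 1])
            start, backup, h = cut + 1, -1, 0
            i = cut                             # resume right after the cut
        i += 1
    if start < len(data):
        chunks.append(data[start:])             # trailing partial chunk
    return chunks


class ChunkStore:
    """Sorted-list stand-in for a B+ tree keyed by ChunkID."""

    def __init__(self):
        self.keys = []    # sorted ChunkIDs (plays the role of the index)
        self.blobs = {}   # ChunkID -> chunk bytes (plays the role of the disk)

    def contains(self, cid):
        # Binary search: O(log n) duplicate lookup.
        pos = bisect.bisect_left(self.keys, cid)
        return pos < len(self.keys) and self.keys[pos] == cid

    def put_file(self, data):
        """Store only unseen chunks; return the file's metadata (ChunkID list)."""
        recipe = []
        for chunk in tttd_chunks(data):
            cid = hashlib.sha1(chunk).hexdigest()  # SHA-1 is an assumption
            if not self.contains(cid):
                bisect.insort(self.keys, cid)
                self.blobs[cid] = chunk
            recipe.append(cid)
        return recipe

    def restore(self, recipe):
        """Rebuild a file from its ChunkID list, even if the original is gone."""
        return b"".join(self.blobs[cid] for cid in recipe)


store = ChunkStore()
original = bytes(random.Random(1).getrandbits(8) for _ in range(4000))
recipe = store.put_file(original)
unique_chunks = len(store.keys)
store.put_file(original)  # storing a second copy adds no new chunks
```

In this sketch the list of ChunkIDs returned by put_file plays the role of the metadata, and restore(recipe) rebuilds the file from the stored chunks alone.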