EIMDD: Sub-file Level Data Deduplication and Recovery

Thwel, Tin Thein

Twelfth International Conference On Computer Applications (ICCA 2014)

URI: http://onlineresource.ucsy.edu.mm/handle/123456789/118

Date: 2014-02-17

Abstract:

As storage utilization grows, users run out of storage space in almost every setting, and they look for efficient ways to save it. Single-instance storage, or data deduplication, eliminates multiple copies of the same file as well as duplicated segments or chunks of data within files. Data deduplication has therefore become an active field in storage environments, especially in persistent data storage for data centers. A current issue in data deduplication is avoiding full-chunk indexing when determining whether incoming data is new, a time-consuming process because the content of one file must be matched against another. This paper proposes an Efficient Indexing Mechanism for Data Deduplication (EIMDD) and recovery system that combines the secure hash algorithm with B+ tree indexing, and presents experimental results on files of various types, excluding media data files. The proposed system first separates a file into variable-length chunks using the Two Thresholds Two Divisors (TTTD) chunking algorithm. ChunkIDs are then obtained by applying a secure hash function to the chunks. The resulting ChunkIDs serve as indexing keys in a B+ tree index structure, so the search time for duplicate chunks falls from O(n) to O(log n), which avoids full-chunk indexing. Once the chunks are stored on disk, the system can reconstruct the original file, even one that has been deleted, from the stored chunks and metadata whenever the user wants. This is the recovery capability of the proposed system.
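The pipeline described in the abstract (chunk, hash, index, recover) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The thresholds, divisors, window width, and the names `tttd_chunks` and `DedupStore` are assumptions chosen for illustration; SHA-1 is assumed as the secure hash; a polynomial rolling hash stands in for the fingerprint function; and a sorted list searched with `bisect` stands in for the B+ tree, giving the same O(log n) lookup but not the same insertion cost or on-disk layout.

```python
import bisect
import hashlib

WINDOW = 16                      # sliding-window width in bytes (assumed)
BASE, MOD = 257, (1 << 31) - 1   # rolling-hash parameters (assumed)


def tttd_chunks(data, t_min=64, t_max=512, d_main=128, d_backup=64):
    """Split bytes into variable-length chunks in the style of TTTD."""
    chunks, start, backup, h = [], 0, -1, 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        length = i - start + 1
        if length < t_min:            # lower threshold: never cut this early
            continue
        if h % d_backup == d_backup - 1:
            backup = i                # remember a fallback breakpoint
        if h % d_main == d_main - 1:
            cut = i                   # main divisor matched: cut here
        elif length >= t_max:         # upper threshold: forced cut
            cut = backup if backup >= start else i
        else:
            continue
        chunks.append(data[start:cut + 1])
        start, backup = cut + 1, -1
    if start < len(data):
        chunks.append(data[start:])   # trailing chunk may be shorter than t_min
    return chunks


class DedupStore:
    """Chunk store keyed by ChunkID; a sorted list stands in for the B+ tree."""

    def __init__(self):
        self.keys = []      # sorted ChunkIDs (index keys)
        self.chunks = []    # chunks[i] is the payload stored under keys[i]
        self.recipes = {}   # file name -> ordered ChunkIDs (file metadata)

    def _find(self, cid):
        pos = bisect.bisect_left(self.keys, cid)   # O(log n) binary search
        return pos, pos < len(self.keys) and self.keys[pos] == cid

    def put(self, name, data):
        recipe = []
        for chunk in tttd_chunks(data):
            cid = hashlib.sha1(chunk).hexdigest()  # ChunkID
            pos, found = self._find(cid)
            if not found:                          # store only unseen chunks
                self.keys.insert(pos, cid)
                self.chunks.insert(pos, chunk)
            recipe.append(cid)
        self.recipes[name] = recipe

    def recover(self, name):
        """Rebuild a file from its metadata and the stored chunks."""
        return b"".join(self.chunks[self._find(cid)[0]]
                        for cid in self.recipes[name])
```

In this sketch, calling `put` twice with identical content adds no new chunks on the second call, and `recover` rebuilds a file from its recipe alone, so the original input is no longer needed once its chunks are stored.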
