EIMDD: Sub-file Level Data Deduplication and Recovery

Thwel, Tin Thein

dc.contributor.author	Thwel, Tin Thein
dc.date.accessioned	2019-07-03T03:04:23Z
dc.date.available	2019-07-03T03:04:23Z
dc.date.issued	2014-02-17
dc.identifier.uri	http://onlineresource.ucsy.edu.mm/handle/123456789/118
dc.description.abstract	As storage utilization grows, people run out of storage space in almost every situation, so efficient ways to save storage space are needed. Single-instance storage, or data deduplication, eliminates multiple copies of the same file as well as duplicated segments (chunks) of data within those files. Data deduplication has therefore become an active area in storage environments, especially for persistent data storage in data centers. A current challenge for data deduplication is avoiding full-chunk indexing when determining whether incoming data is new, which is time-consuming because it requires matching the entire content of one file against another. This paper proposes an Efficient Indexing Mechanism for Data Deduplication (EIMDD) and recovery system that combines a secure hash algorithm with B+ tree indexing, and presents experimental results on various file types, excluding media files. The proposed system first splits each file into variable-length chunks using the Two Thresholds Two Divisors (TTTD) chunking algorithm. ChunkIDs are then obtained by applying a secure hash function to the chunks, and the resulting ChunkIDs serve as keys in a B+ tree index. The search time for duplicate chunks therefore drops from O(n) to O(log n), avoiding full-chunk indexing. Once the chunks are stored on disk, the system can reconstruct an original file from the stored chunks and metadata whenever the user wants, even if the file has been deleted. This provides the recovery capability of the proposed system.	en_US
dc.language.iso	en	en_US
dc.publisher	Twelfth International Conference On Computer Applications (ICCA 2014)	en_US
dc.title	EIMDD: Sub-file Level Data Deduplication and Recovery	en_US
dc.type	Article	en_US
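The chunking and hashing steps in the abstract can be sketched in Python. This is a minimal illustration of TTTD as it is generally described (a minimum threshold t_min, a maximum threshold t_max, a main divisor, and a smaller backup divisor that supplies a fallback breakpoint when t_max is reached), not the authors' implementation. The polynomial rolling hash, the 48-byte window, the parameter values, and the use of SHA-1 for ChunkIDs are all assumptions; the abstract only says "a secure hash function".

```python
import hashlib

WINDOW = 48            # rolling-hash window in bytes (assumed value)
BASE = 257             # polynomial base for the rolling hash (assumed)
MOD = (1 << 61) - 1    # large prime modulus (assumed)


def tttd_chunks(data, t_min=460, t_max=2800, d_main=540, d_backup=270):
    """Split bytes into variable-length chunks with Two Thresholds Two Divisors.

    Parameter values are illustrative assumptions, not taken from the paper.
    """
    chunks = []
    start = 0                            # offset where the current chunk begins
    backup = -1                          # last backup breakpoint seen (-1 = none)
    h = 0
    drop = pow(BASE, WINDOW - 1, MOD)    # weight of the byte leaving the window
    for i, byte in enumerate(data):
        # Update the rolling hash over the last WINDOW bytes.
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD
        h = (h * BASE + byte) % MOD
        length = i + 1 - start
        if length < t_min:               # first threshold: never cut below t_min
            continue
        if h % d_backup == d_backup - 1:
            backup = i                   # remember a fallback breakpoint
        if h % d_main == d_main - 1:     # main divisor hit: cut here
            chunks.append(data[start:i + 1])
            start, backup = i + 1, -1
            continue
        if length >= t_max:              # second threshold: forced cut
            end = backup if backup >= 0 else i
            chunks.append(data[start:end + 1])
            start, backup = end + 1, -1
    if start < len(data):
        chunks.append(data[start:])      # trailing chunk may be shorter than t_min
    return chunks


def chunk_ids(chunks):
    """ChunkID = hex digest of each chunk (SHA-1 is an assumed choice)."""
    return [hashlib.sha1(c).hexdigest() for c in chunks]
```

Because boundaries depend on the bytes inside the window rather than on fixed offsets, an insertion near the start of a file usually shifts only the first chunk, so most ChunkIDs of the edited file still match the stored ones.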
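The abstract does not describe the B+ tree's order or node layout, so the following is a generic in-memory B+ tree keyed by ChunkID, plus a hypothetical is_duplicate helper that shows the lookup-then-insert step used during deduplication. The order of 4 is chosen only to force frequent splits, and SHA-1 is an assumed hash. Each lookup follows a single root-to-leaf path, which is where the O(log n) search time comes from: with a fanout of 100, for instance, a tree over 10^6 ChunkIDs is roughly ⌈log_100 10^6⌉ = 3 levels deep, whereas a linear scan may need up to 10^6 comparisons.

```python
import hashlib
from bisect import bisect_left, bisect_right


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []   # child nodes (internal) or stored values (leaf)
        self.next = None     # right-sibling link between leaves


class BPlusTree:
    def __init__(self, order=4):
        self.order = order               # max keys per node before a split
        self.root = Node(leaf=True)

    def _find_leaf(self, key):
        node = self.root
        while not node.leaf:             # keys equal to a separator go right
            node = node.children[bisect_right(node.keys, key)]
        return node

    def get(self, key):
        leaf = self._find_leaf(key)
        i = bisect_left(leaf.keys, key)
        if i < len(leaf.keys) and leaf.keys[i] == key:
            return leaf.children[i]
        return None

    def insert(self, key, value):
        split = self._insert(self.root, key, value)
        if split:                        # root overflowed: grow the tree by one level
            sep, right = split
            root = Node(leaf=False)
            root.keys, root.children = [sep], [self.root, right]
            self.root = root

    def _insert(self, node, key, value):
        if node.leaf:
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                node.children[i] = value  # overwrite an existing key
                return None
            node.keys.insert(i, key)
            node.children.insert(i, value)
        else:
            i = bisect_right(node.keys, key)
            split = self._insert(node.children[i], key, value)
            if split is None:
                return None
            sep, right = split
            node.keys.insert(i, sep)
            node.children.insert(i + 1, right)
        return self._split(node) if len(node.keys) > self.order else None

    def _split(self, node):
        mid = len(node.keys) // 2
        right = Node(node.leaf)
        if node.leaf:                    # leaf split: copy the separator up
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.children, node.children = node.children[mid:], node.children[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right
        sep = node.keys[mid]             # internal split: push the separator up
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = node.children[mid + 1:], node.children[:mid + 1]
        return sep, right

    def items(self):
        node = self.root
        while not node.leaf:
            node = node.children[0]
        while node:                      # walk the leaf chain in key order
            yield from zip(node.keys, node.children)
            node = node.next


def is_duplicate(index, chunk, location):
    """Hypothetical helper: True if chunk is indexed; otherwise index it."""
    cid = hashlib.sha1(chunk).hexdigest()
    if index.get(cid) is not None:
        return True
    index.insert(cid, location)
    return False
```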
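The recovery capability relies on per-file metadata that records a file's ChunkIDs in order. The sketch below keeps that metadata as a hypothetical "recipe" per file name. To stay self-contained it uses fixed 4 KiB splits in place of TTTD and a Python dict in place of the B+ tree index, and SHA-1 is again an assumed hash. Deleting the user's copy of a file leaves the stored chunks and the recipe untouched, so recover can rebuild the file byte for byte.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # hypothetical fixed chunk size standing in for TTTD output


def fixed_chunks(data, size=CHUNK_SIZE):
    """Fixed-size splitter used only to keep this sketch self-contained."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Hypothetical chunk store plus per-file metadata ("recipes")."""

    def __init__(self):
        self.chunks = {}   # ChunkID -> chunk bytes; a dict stands in for the B+ tree
        self.recipes = {}  # file name -> ordered list of ChunkIDs (the metadata)

    def put_file(self, name, data):
        """Store only unseen chunks; return how many new chunks were written."""
        ids, new = [], 0
        for chunk in fixed_chunks(data):
            cid = hashlib.sha1(chunk).hexdigest()
            if cid not in self.chunks:
                self.chunks[cid] = chunk
                new += 1
            ids.append(cid)
        self.recipes[name] = ids
        return new

    def recover(self, name, out_dir):
        """Rebuild a file from its recipe, even if the original was deleted."""
        path = os.path.join(out_dir, name)
        with open(path, "wb") as f:
            for cid in self.recipes[name]:
                f.write(self.chunks[cid])
        return path
```

For 10,000 bytes of `A` followed by 5,000 bytes of `B`, the first two 4 KiB chunks are identical, so put_file stores 3 chunks instead of 4; storing a second copy of the same content adds none.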