Data Deduplication using B+ Tree Indexing

Thwel, Tin Thein; Thein, Ni Lar

Collection: Fourth Local Conference on Parallel and Soft Computing

URI: http://onlineresource.ucsy.edu.mm/handle/123456789/1893

Date: 2009-12-30

Abstract:

As storage utilization grows ever larger, people have been trying to find efficient ways to save storage space. Single-instance storage, or data deduplication, has come into vogue in storage management because it can eliminate duplicated data or segments in files. In this paper, we propose a data deduplication system that works at the sub-file level. The system performs deduplication through the integrated use of a file chunking algorithm, a secure hash function, and B+ tree indexing. The system first separates a file into variable-length segments, or chunks, using the Two Thresholds Two Divisors (TTTD) chunking algorithm. ChunkIDs are then obtained by applying the hash function to the chunks. The resulting ChunkIDs are used as indexing keys in a B+ tree-like index structure. This system can reduce the indexing time complexity from O(n) to O(log n). The proposed system will be compared with other tools, such as WinZIP and WinRAR, in terms of performance metrics.
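
The pipeline described in the abstract (chunk, hash, index) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `tttd_chunks` and `DedupStore`, all threshold and divisor values, and the window size are hypothetical. A toy polynomial fingerprint stands in for the rolling hash, SHA-256 is assumed as the secure hash function (the abstract does not name one), and a sorted list searched with `bisect` stands in for the B+ tree index.

```python
import bisect
import hashlib


def tttd_chunks(data, t_min=64, t_max=512, d_main=128, d_backup=64, window=16):
    """Split bytes into variable-length chunks with a TTTD-style rule.

    Two thresholds (t_min, t_max) bound the chunk size; two divisors
    (d_main, d_backup) select main and fallback breakpoints.
    All parameter values here are illustrative assumptions.
    """
    chunks = []
    start = 0      # index where the current chunk begins
    backup = -1    # last backup breakpoint seen in the current chunk
    i = 0
    while i < len(data):
        length = i - start + 1
        if length >= t_min:
            # Toy fingerprint of the last `window` bytes (stand-in for Rabin).
            fp = 0
            for b in data[max(start, i - window + 1): i + 1]:
                fp = (fp * 31 + b) & 0xFFFFFFFF
            if fp % d_backup == d_backup - 1:
                backup = i
            cut = None
            if fp % d_main == d_main - 1:
                cut = i                               # main breakpoint
            elif length >= t_max:
                cut = backup if backup != -1 else i   # fallback or hard cut
            if cut is not None:
                chunks.append(data[start:cut + 1])
                start = cut + 1
                backup = -1
                i = cut                               # resume right after the cut
        i += 1
    if start < len(data):
        chunks.append(data[start:])                   # trailing partial chunk
    return chunks


class DedupStore:
    """Stores each distinct chunk once, keyed by its ChunkID."""

    def __init__(self):
        self.keys = []     # sorted ChunkIDs (stand-in for B+ tree keys)
        self.values = []   # chunk bytes, parallel to self.keys

    def add_file(self, data):
        """Index a file and return its recipe (ordered list of ChunkIDs)."""
        recipe = []
        for chunk in tttd_chunks(data):
            cid = hashlib.sha256(chunk).hexdigest()
            pos = bisect.bisect_left(self.keys, cid)  # binary search, O(log n)
            if pos == len(self.keys) or self.keys[pos] != cid:
                self.keys.insert(pos, cid)            # new chunk: store it once
                self.values.insert(pos, chunk)
            recipe.append(cid)
        return recipe

    def restore(self, recipe):
        """Rebuild a file from its recipe by looking up each ChunkID."""
        out = []
        for cid in recipe:
            out.append(self.values[bisect.bisect_left(self.keys, cid)])
        return b"".join(out)
```

Adding the same file twice leaves the number of stored chunks unchanged, and `restore` rebuilds the original bytes from the returned recipe. The sorted list gives logarithmic lookup, but each insertion shifts list elements and therefore costs O(n); a B+ tree keeps both lookup and insertion logarithmic, which is why this sketch is only a stand-in for the index structure the abstract describes.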
