UCSY's Research Repository

Deduplication for Chunk-Based File Backup


dc.contributor.author Maw, Yu Yu
dc.contributor.author Soe, Khin Mar
dc.date.accessioned 2019-07-19T05:06:55Z
dc.date.available 2019-07-19T05:06:55Z
dc.date.issued 2017-12-27
dc.identifier.uri http://onlineresource.ucsy.edu.mm/handle/123456789/1054
dc.description.abstract Data deduplication has become a popular technology for reducing the storage space needed for backup and archival data. As the volume of data grows worldwide, many organizations need to reduce the amount of data they store safely. Deduplication addresses this growth by eliminating data that is duplicated within a file or across files. It takes many forms and can be applied in virtual tape libraries, archive storage, disk storage systems, and applications such as email systems, content managers, and backup systems. It saves storage space even when many files are stored, so it has become an essential and critical component of backup systems. In this thesis, deduplication for chunk-based file backup is implemented using a Content-Defined Chunking Algorithm and the Secure Hash Algorithm. Deduplication proceeds in four steps: (1) chunking, (2) fingerprinting, (3) index lookup, and (4) writing. The Content-Defined Chunking Algorithm splits the input file stream into chunks. The Secure Hash Algorithm is used to generate a hash key (fingerprint) for each chunk and to compare new chunks with old chunks in order to identify the unique chunks in the file. The system focuses on Microsoft Word files. Deduplication normally takes a long time to complete and uses many CPU cycles, possibly introducing performance issues on production machines. Instead of deduplicating over the entire file at once, this thesis deduplicates a file by partitioning it into three parts in order to reduce processing time. en_US
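The four steps named in the abstract (chunking, fingerprinting, index lookup, writing) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not say which content-defined chunking variant or which SHA variant was used, so the Gear-style rolling hash, the SHA-1 fingerprint, the size parameters, and the names cdc_chunks, deduplicate, and restore are all assumptions made for illustration. The three-part partitioning of the file is not shown.

```python
import hashlib

# Hypothetical Gear-style table: one pseudo-random 32-bit value per byte value,
# derived deterministically so the sketch is reproducible.
GEAR = [int.from_bytes(hashlib.sha1(bytes([b])).digest()[:4], "big")
        for b in range(256)]


def cdc_chunks(data, mask=0x3FF, min_size=256, max_size=4096):
    """Step 1, chunking: cut where the rolling hash of recent bytes matches
    a bit pattern, so boundaries depend on content rather than on offsets.
    The mask and size limits are illustrative values, not from the thesis."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]  # trailing bytes form the last chunk


def deduplicate(data, index, store):
    """Steps 2 to 4 for one file. `index` maps fingerprint -> position in
    `store`; `store` stands in for the backup medium. Returns the file's
    recipe, the ordered list of fingerprints needed to rebuild it."""
    recipe = []
    for chunk in cdc_chunks(data):
        fp = hashlib.sha1(chunk).hexdigest()  # step 2: fingerprinting
        if fp not in index:                   # step 3: index lookup
            index[fp] = len(store)
            store.append(chunk)               # step 4: write unique chunks only
        recipe.append(fp)
    return recipe


def restore(recipe, index, store):
    """Rebuild a file from its recipe."""
    return b"".join(store[index[fp]] for fp in recipe)
```

In this sketch, backing up the same bytes a second time adds nothing to the store, because every fingerprint is already present in the index; only the recipe is recorded again.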
dc.language.iso en en_US
dc.publisher Eighth Local Conference on Parallel and Soft Computing en_US
dc.title Deduplication for Chunk-Based File Backup en_US
dc.type Article en_US

