DATA DEDUPLICATION FOR MYANMAR LANGUAGE STORAGE BY USING SECURE HASH ALGORITHM

Aye, Thae Nu

URI: https://onlineresource.ucsy.edu.mm/handle/123456789/2787

Date: 2023-01

Abstract:

Storage systems hold a vast amount of duplicated or redundant data. Existing data deduplication approaches reduce storage space at the file level and the sub-file level, working on the data at the byte level. There is also a need for content-level deduplication, especially for Myanmar language content. This study aims to deduplicate sentences written in Burmese. The system accepts Myanmar sentences as input and uses a Text Splitter to segment the input file into chunks at whitespace. The separated chunks are passed to a ChunkID generator, which produces a ChunkID for each chunk by applying the Secure Hash Algorithm (SHA-1). The system then searches for duplicate phrases and reduces them. The system is implemented in Python using the Visual Studio Code IDE. According to the test results, the system can deduplicate Myanmar-language data stored in .txt and .docx files, and it works especially well on .txt files for both the deduplication and reconstruction processes.
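The pipeline described in the abstract (whitespace splitting, SHA-1 ChunkIDs, duplicate detection, reconstruction) can be illustrated with a minimal Python sketch. This is not the author's implementation: the function names `split_chunks`, `chunk_id`, `deduplicate`, and `reconstruct`, the in-memory dictionary used as the chunk store, and the UTF-8 encoding are all assumptions made for illustration. Only `hashlib.sha1` from the Python standard library is taken as given.

```python
import hashlib


def split_chunks(text):
    # Assumed stand-in for the "Text Splitter": split at any whitespace.
    return text.split()


def chunk_id(chunk):
    # Assumed stand-in for the "ChunkID generator": SHA-1 over the
    # UTF-8 bytes of the chunk, returned as a 40-character hex string.
    return hashlib.sha1(chunk.encode("utf-8")).hexdigest()


def deduplicate(text):
    # store:  ChunkID -> chunk text, each unique chunk kept once
    # recipe: ordered list of ChunkIDs needed to rebuild the input
    store = {}
    recipe = []
    for chunk in split_chunks(text):
        cid = chunk_id(chunk)
        store.setdefault(cid, chunk)  # a repeated chunk is not stored again
        recipe.append(cid)
    return store, recipe


def reconstruct(store, recipe):
    # Rebuild the text by looking up each ChunkID in order.
    # Assumption: chunks are rejoined with single spaces, so the original
    # line breaks and repeated spaces are not preserved by this sketch.
    return " ".join(store[cid] for cid in recipe)


if __name__ == "__main__":
    sample = "မြန်မာ စာ မြန်မာ စာ မြန်မာ"
    store, recipe = deduplicate(sample)
    print(len(recipe), "chunks,", len(store), "unique")
    print(reconstruct(store, recipe) == sample)
```

In this sketch the saving comes from storing each distinct chunk once and keeping only a list of ChunkIDs for the rest. Whether that list is smaller than the original text depends on chunk length and repetition, which the abstract does not quantify.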
