DATA DEDUPLICATION FOR MYANMAR LANGUAGE STORAGE BY USING SECURE HASH ALGORITHM

Aye, Thae Nu

dc.contributor.author	Aye, Thae Nu
dc.date.accessioned	2023-01-22T12:58:36Z
dc.date.available	2023-01-22T12:58:36Z
dc.date.issued	2023-01
dc.identifier.uri	https://onlineresource.ucsy.edu.mm/handle/123456789/2787
dc.description.abstract	Storage systems hold a vast amount of duplicated or redundant data. Existing data deduplication approaches reduce storage space at the file level and the sub-file level, operating on bytes. There is also a need for content-level deduplication, especially for Myanmar-language content. This study aims to deduplicate data consisting of sentences written in Burmese. The system accepts Myanmar sentences as input and uses a Text Splitter to segment the input file into chunks at whitespace. The separated chunks are passed to a ChunkID generator, which generates each chunk's ChunkID by applying the Secure Hash Algorithm (SHA-1). The system then searches for duplicate phrases and reduces them. The system is implemented in Python using the Visual Studio Code IDE. According to the test results, the system can deduplicate Myanmar-language data stored in .txt and .docx files, and it works especially well on .txt files for both the deduplication and reconstruction processes.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Computer Studies, Yangon	en_US
dc.subject	SECURE HASH ALGORITHM	en_US
dc.title	DATA DEDUPLICATION FOR MYANMAR LANGUAGE STORAGE BY USING SECURE HASH ALGORITHM	en_US
dc.type	Thesis	en_US
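The pipeline summarized in the abstract (whitespace splitting, SHA-1 ChunkIDs, duplicate detection, and reconstruction) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `chunk_id`, `deduplicate`, and `reconstruct`, the in-memory dictionary used as the chunk store, and the list of ChunkIDs used as a rebuild "recipe" are all hypothetical. It handles plain text only (no .docx parsing).

```python
import hashlib


def chunk_id(chunk: str) -> str:
    """Return the SHA-1 hex digest of a chunk's UTF-8 bytes (its ChunkID)."""
    return hashlib.sha1(chunk.encode("utf-8")).hexdigest()


def deduplicate(text: str):
    """Split text at whitespace and store each distinct chunk only once.

    Returns (store, recipe):
      store  -- hypothetical chunk store mapping ChunkID -> chunk text
      recipe -- ordered list of ChunkIDs needed to rebuild the text
    """
    store = {}
    recipe = []
    for chunk in text.split():  # split() with no argument splits on any whitespace run
        cid = chunk_id(chunk)
        if cid not in store:  # first occurrence: keep the chunk itself
            store[cid] = chunk
        recipe.append(cid)  # duplicates are recorded only by their ChunkID
    return store, recipe


def reconstruct(store: dict, recipe: list) -> str:
    """Rebuild the text by looking up each ChunkID and joining with single spaces."""
    return " ".join(store[cid] for cid in recipe)


if __name__ == "__main__":
    text = "မြန်မာ စာ မြန်မာ စာ ကို"
    store, recipe = deduplicate(text)
    print(len(recipe), "chunks,", len(store), "unique")
    print(reconstruct(store, recipe) == text)
```

Because the chunks are split at any whitespace and rejoined with single spaces, this sketch restores the original text exactly only when chunks are separated by single spaces; preserving line breaks or repeated spaces would require storing the separators as well.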