UCSY's Research Repository

DATA DEDUPLICATION FOR MYANMAR LANGUAGE STORAGE BY USING SECURE HASH ALGORITHM

dc.contributor.author Aye, Thae Nu
dc.date.accessioned 2023-01-22T12:58:36Z
dc.date.available 2023-01-22T12:58:36Z
dc.date.issued 2023-01
dc.identifier.uri https://onlineresource.ucsy.edu.mm/handle/123456789/2787
dc.description.abstract There is a vast amount of duplicated or redundant data in storage systems. Existing data deduplication approaches reduce storage space at the file level and sub-file level, operating on bytes. There is also a need for content-level deduplication, especially for Myanmar-language content. This study aims to deduplicate sentences written in Burmese. The system accepts Myanmar sentences as input and uses a Text Splitter to segment the input file into chunks at whitespace. The separated chunks are passed to a ChunkID generator, which produces a ChunkID for each chunk by applying the Secure Hash Algorithm (SHA-1). The system then searches for duplicate phrases and reduces them. The system is implemented in Python using the Visual Studio Code IDE. According to the test results, the system can deduplicate Myanmar-language data in .txt and .docx files; it works especially well on .txt files for both deduplication and reconstruction. en_US
dc.language.iso en en_US
dc.publisher University of Computer Studies, Yangon en_US
dc.subject SECURE HASH ALGORITHM en_US
dc.title DATA DEDUPLICATION FOR MYANMAR LANGUAGE STORAGE BY USING SECURE HASH ALGORITHM en_US
dc.type Thesis en_US
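
The pipeline described in the abstract (whitespace splitting, SHA-1 ChunkIDs, duplicate removal, reconstruction) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names `chunk_id`, `deduplicate`, and `reconstruct`, the in-memory dictionary used as the chunk store, and the sample text are all assumptions made for illustration, and only plain-text input is handled.

```python
import hashlib


def chunk_id(chunk: str) -> str:
    """Return the SHA-1 hex digest of a chunk (hypothetical ChunkID format)."""
    return hashlib.sha1(chunk.encode("utf-8")).hexdigest()


def deduplicate(text: str):
    """Split text at whitespace and store each distinct chunk only once.

    Returns (store, recipe): store maps ChunkID -> chunk text, and recipe
    is the ordered list of ChunkIDs needed to rebuild the input.
    """
    store = {}
    recipe = []
    for chunk in text.split():
        cid = chunk_id(chunk)
        store.setdefault(cid, chunk)  # keep only the first copy of a chunk
        recipe.append(cid)
    return store, recipe


def reconstruct(store, recipe) -> str:
    """Rebuild the text by looking up each ChunkID, joined by single spaces."""
    return " ".join(store[cid] for cid in recipe)


# Assumed sample input: three whitespace-separated chunks, one repeated.
sample = "မင်္ဂလာပါ ကျောင်း မင်္ဂလာပါ"
store, recipe = deduplicate(sample)
restored = reconstruct(store, recipe)
```

Because `str.split()` collapses runs of whitespace, this sketch reconstructs the input exactly only when chunks are separated by single spaces; preserving the original spacing or line breaks would require storing the separators as well.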