UCSY's Research Repository

Duplicate Record Detection in Data Cleaning Using DCS++ Algorithm

dc.contributor.author Phyo, Yin Yin
dc.contributor.author Win, Thidar
dc.date.accessioned 2021-06-11T05:39:39Z
dc.date.available 2021-06-11T05:39:39Z
dc.date.issued 2021-06
dc.identifier.uri https://onlineresource.ucsy.edu.mm/handle/123456789/2586
dc.description.abstract Duplicate record detection is the process of finding multiple records that represent the same physical entity in a dataset. It is also known as record linkage or entity matching. Databases often hold very large datasets, and these datasets contain duplicate records that do not share a common key or that contain errors such as incomplete information, transcription errors, and missing or differing standard formats (nonstandardized abbreviations) in the detailed schemas of records from multiple databases. Duplicate detection therefore needs to complete in a short time, and it requires an algorithm for determining whether two records are duplicates. This system calculates a commonly used similarity metric to find similar field items and applies the Duplicate Count Strategy-Multi Record Increase (DCS++) algorithm to detect approximate duplicate records in a publication XML dataset. en_US
dc.language.iso en en_US
dc.publisher University of Computer Studies, Yangon en_US
dc.title Duplicate Record Detection in Data Cleaning Using DCS++ Algorithm en_US
dc.type Thesis en_US
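
The abstract names two ingredients: a field similarity metric and an adaptive-window duplicate search (DCS++). The sketch below is a simplified illustration of that idea, not the thesis's implementation. It assumes records are sorted by a key, that each record is compared with the next w - 1 records, and that the window is extended by w - 1 records past every duplicate found. The function names, the window size, the threshold, and the use of Python's difflib ratio as the similarity metric are all assumptions made for illustration; the record does not state which metric or parameters the thesis uses.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Illustrative string similarity in [0, 1] (assumed metric, not the thesis's)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dcs_plus_plus_sketch(records, key, w=3, threshold=0.85):
    """Simplified adaptive-window duplicate search in the spirit of DCS++.

    records   : list of records (any type)
    key       : function returning the string used for sorting and comparison
    w         : initial window size (assumed value)
    threshold : similarity at or above which two records count as duplicates
    Returns a list of clusters, each a list of records judged to be duplicates.
    """
    ordered = sorted(records, key=key)
    last = len(ordered) - 1
    skipped = set()   # records already assigned to a cluster are not window starts
    clusters = []
    for i in range(len(ordered)):
        if i in skipped:
            continue
        cluster = [i]
        window_end = min(i + w - 1, last)
        j = i + 1
        while j <= window_end:
            if j not in skipped and similarity(key(ordered[i]), key(ordered[j])) >= threshold:
                cluster.append(j)
                skipped.add(j)
                # Extend the window by w - 1 records past the duplicate just found.
                window_end = min(max(window_end, j + w - 1), last)
            j += 1
        if len(cluster) > 1:
            clusters.append([ordered[k] for k in cluster])
    return clusters


# Hypothetical publication records, standing in for entries parsed from an XML file.
publications = [
    {"title": "Duplicate Record Detecton"},
    {"title": "Adaptive Windows"},
    {"title": "Duplicate Record Detection"},
    {"title": "Data Cleaning Survey"},
]
clusters = dcs_plus_plus_sketch(publications, key=lambda r: r["title"])
for group in clusters:
    print([r["title"] for r in group])
```

Sorting first places likely duplicates next to each other, so only nearby records are compared instead of every pair. Growing the window only where duplicates are actually found is what keeps the number of comparisons low on large datasets.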

