UCSY's Research Repository

Duplicate Record Detection in Data Cleaning Using DCS++ Algorithm

dc.contributor.author Phyo, Yin Yin
dc.contributor.author Win, Thidar
dc.date.accessioned 2021-06-11T05:29:59Z
dc.date.available 2021-06-11T05:29:59Z
dc.date.issued 2021-01
dc.identifier.uri https://onlineresource.ucsy.edu.mm/handle/123456789/2584
dc.description.abstract Duplicate record detection is the process of finding multiple records that represent the same physical entity in a dataset. It is also known as record linkage or entity matching [1]. Databases hold very large datasets, and these datasets contain duplicate records that do not share a common key or that contain errors such as incomplete information, transcription errors, and missing or differing standard formats (for example, non-standardized abbreviations) in the detailed schemas of records drawn from multiple databases. Duplicate detection therefore needs to complete in a very short time, and it requires an algorithm for deciding whether two records are duplicates. This paper calculates a commonly used similarity metric to find similar field values and uses the Duplicate Count Strategy Multi-Record Increase (DCS++) algorithm to detect approximate duplicate records in a publication XML dataset. en_US
dc.language.iso en en_US
dc.publisher University of Computer Studies, Yangon en_US
dc.title Duplicate Record Detection in Data Cleaning Using DCS++ Algorithm en_US
dc.type Thesis en_US
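
The approach summarized in the abstract (a field similarity metric combined with a window that grows when duplicates are found) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not name the similarity metric, so difflib's SequenceMatcher ratio is used as a stand-in, and the function names, the 0.85 threshold, and the window size w = 3 are all assumptions chosen for illustration. The sketch also simplifies the windowing idea. It extends the window by w - 1 records after each detected duplicate and skips detected duplicates as later window starts, but it omits any transitive-closure step over the resulting pairs.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in field similarity in [0, 1] (assumed metric, not the thesis's)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def adaptive_window_duplicates(records, key=lambda r: r, threshold=0.85, w=3):
    """Simplified adaptive-window pass over key-sorted records.

    Each record is compared with the next w - 1 records. When a duplicate
    is found at position j, the window is extended to cover the w - 1
    records after j, and j is skipped as a future window start.
    Returns the sorted records and a set of (i, j) index pairs into them.
    """
    ordered = sorted(records, key=key)
    n = len(ordered)
    pairs = set()
    skip = set()  # positions already matched to an earlier record
    for i in range(n):
        if i in skip:
            continue
        end = min(i + w, n)  # exclusive end of the current window
        j = i + 1
        while j < end:
            if similarity(key(ordered[i]), key(ordered[j])) >= threshold:
                pairs.add((i, j))
                skip.add(j)
                # Grow the window to include the w - 1 records after j.
                end = min(max(end, j + w), n)
            j += 1
    return ordered, pairs


if __name__ == "__main__":
    # Hypothetical publication titles, used only to exercise the sketch.
    titles = [
        "Record linkage survey",
        "Data cleaning with DCS.",
        "Zebra patterns",
        "Data cleaning with DCS",
    ]
    ordered, pairs = adaptive_window_duplicates(titles)
    for i, j in sorted(pairs):
        print(ordered[i], "<->", ordered[j])
```

Sorting first keeps likely duplicates adjacent, so each record is compared only with a few neighbors instead of with every other record, which is what makes a windowed approach attractive for large datasets.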

