Duplicate Record Detection in Data Cleaning Using DCS++ Algorithm

dc.contributor.author	Phyo, Yin Yin
dc.contributor.author	Win, Thidar
dc.date.accessioned	2021-06-11T05:29:59Z
dc.date.available	2021-06-11T05:29:59Z
dc.date.issued	2021-01
dc.identifier.uri	https://onlineresource.ucsy.edu.mm/handle/123456789/2584
dc.description.abstract	Duplicate record detection is the process of finding multiple records in a dataset that represent the same physical entity. It is also known as record linkage or entity matching [1]. Databases contain very large datasets, and these datasets contain duplicate records that do not share a common key or that contain errors such as incomplete information, transcription errors, and missing or differing standard formats (non-standardized abbreviations) in the detailed schemas of records from multiple databases. Duplicate detection therefore needs to complete its process in a very short time, and it requires an algorithm for determining whether records are duplicates or not. This paper calculates a similarity metric that is commonly used to find similar field items and uses the Duplicate Count Strategy Multi-Record Increase (DCS++) algorithm to detect approximately duplicate records in a publication XML dataset.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Computer Studies, Yangon	en_US
dc.title	Duplicate Record Detection in Data Cleaning Using DCS++ Algorithm	en_US
dc.type	Thesis	en_US