Duplicate Record Detection in Data Cleaning Using DCS++ Algorithm

Phyo, Yin Yin; Win, Thidar

dc.contributor.author	Phyo, Yin Yin
dc.contributor.author	Win, Thidar
dc.date.accessioned	2021-06-11T05:39:39Z
dc.date.available	2021-06-11T05:39:39Z
dc.date.issued	2021-06
dc.identifier.uri	https://onlineresource.ucsy.edu.mm/handle/123456789/2586
dc.description.abstract	Duplicate record detection is the process of finding multiple records that represent the same physical entity in a dataset. It is also known as record linkage or entity matching. Databases often hold very large datasets. These datasets contain duplicate records that do not share a common key or that contain errors such as incomplete information, transcription errors, and missing or differing standard formats (non-standardized abbreviations) in the detailed schemas of records from multiple databases. Because the datasets are so large, duplicate detection needs to complete in a very short time. It also requires an algorithm for determining whether two records are duplicates. In this system, the researcher computes a commonly used similarity metric to find similar field items and uses the Duplicate Count Strategy-Multi Record Increase (DCS++) algorithm to detect approximate duplicate records in a publication XML dataset.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Computer Studies, Yangon	en_US
dc.title	Duplicate Record Detection in Data Cleaning Using DCS++ Algorithm	en_US
dc.type	Thesis	en_US
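
The abstract combines a field similarity metric with an adaptive comparison window. The following is a minimal Python sketch of that idea, not the thesis implementation. It assumes records are sorted on a key, that each record is compared with its neighbours inside a window of size w, and that every detected duplicate extends the window by the next w - 1 records after it (the "multi record increase" step). The function names, the use of difflib as the similarity metric, the 0.85 threshold, and the sample titles are all illustrative assumptions. The sketch omits the duplicate-ratio threshold and the transitive closure step of the full algorithm.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity metric in [0, 1]; the thesis may use a different one."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def adaptive_window_duplicates(records, w=2, threshold=0.85):
    """Return a set of (record, duplicate) pairs found with an adaptive window.

    records   : list of strings (hypothetical: one sorting key per record)
    w         : initial window size, counting the record being examined
    threshold : assumed similarity cut-off for calling two records duplicates
    """
    ordered = sorted(records)
    n = len(ordered)
    pairs = set()
    skip = set()  # positions already matched to an earlier record

    for i in range(n):
        if i in skip:
            continue  # no window is opened for a record already flagged
        end = min(i + w - 1, n - 1)  # last position inside the current window
        j = i + 1
        while j <= end:
            if similarity(ordered[i], ordered[j]) >= threshold:
                pairs.add((ordered[i], ordered[j]))
                skip.add(j)
                # Multi record increase: also cover the w - 1 records after j.
                end = min(max(end, j + w - 1), n - 1)
            j += 1
    return pairs


if __name__ == "__main__":
    titles = [
        "data cleaning",
        "record linkage",
        "data cleanin",
        "zebra",
        "data cleaning.",
    ]
    for first, second in sorted(adaptive_window_duplicates(titles)):
        print(first, "<->", second)
```

With w = 2 a fixed window would compare each record only with its immediate successor. Here the window grows after each match, so the first record of a cluster is compared with every later record in that cluster, and the flagged duplicates do not open windows of their own.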