| dc.contributor.author | Phyo, Yin Yin | |
| dc.contributor.author | Win, Thidar | |
| dc.date.accessioned | 2021-06-11T05:29:59Z | |
| dc.date.available | 2021-06-11T05:29:59Z | |
| dc.date.issued | 2021-01 | |
| dc.identifier.uri | https://onlineresource.ucsy.edu.mm/handle/123456789/2584 | |
| dc.description.abstract | Duplicate record detection is the process of finding multiple records that represent the same physical entity in a dataset. It is also known as record linkage or entity matching [1]. Databases contain very large datasets, and these datasets contain duplicate records that do not share a common key or that contain errors such as incomplete information, transcription errors, and missing or differing standard formats (non-standardized abbreviations) in the detailed schemas of records from multiple databases. Duplicate detection therefore needs to complete in a very short time, and it requires an algorithm for determining whether records are duplicates or not. This paper calculates a similarity metric that is commonly used to find similar field items and uses the Duplicate Count Strategy Multi-Record Increase (DCS++) algorithm to detect approximate duplicate records in a publication XML dataset. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | University of Computer Studies, Yangon | en_US |
| dc.title | Duplicate Record Detection in Data Cleaning Using DCS++ Algorithm | en_US |
| dc.type | Thesis | en_US |