dc.contributor.author |
Phyo, Yin Yin |
|
dc.contributor.author |
Win, Thidar |
|
dc.date.accessioned |
2021-06-11T05:29:59Z |
|
dc.date.available |
2021-06-11T05:29:59Z |
|
dc.date.issued |
2021-01 |
|
dc.identifier.uri |
https://onlineresource.ucsy.edu.mm/handle/123456789/2584 |
|
dc.description.abstract |
Duplicate record detection is the process of finding
multiple records that represent the same physical
entity in a dataset. It is also known as record
linkage or entity matching [1]. Databases contain
very large datasets, and these datasets contain
duplicate records that do not share a common key or
that contain errors such as incomplete information,
transcription errors, and missing or differing standard
formats (non-standardized abbreviations) in the
detailed schemas of records from multiple databases.
Duplicate detection therefore needs to complete in a
very short time, and it requires an algorithm for
determining whether records are duplicates.
In this paper, we calculate a similarity metric that is
commonly used to find similar field items and use the
Duplicate Count Strategy Multi-Record Increase
(DCS++) algorithm to detect approximate duplicate
records in a publication XML dataset. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
University of Computer Studies, Yangon |
en_US |
dc.title |
Duplicate Record Detection in Data Cleaning Using DCS++ Algorithm |
en_US |
dc.type |
Thesis |
en_US |