UCSY's Research Repository

Record Matching System for Publication Dataset using Multi-pass Sorted Neighborhood Method


dc.contributor.author Yi, Soe Lai
dc.date.accessioned 2019-07-12T04:26:34Z
dc.date.available 2019-07-12T04:26:34Z
dc.date.issued 2010-12-16
dc.identifier.uri http://onlineresource.ucsy.edu.mm/handle/123456789/832
dc.description.abstract Record matching is the task of identifying records that refer to the same real-world entity. Detecting data records that are approximate duplicates is an important task. Datasets may contain duplicate records concerning the same real-world entity because of data entry errors, unstandardized abbreviations, or differences in the detailed schemas of records from multiple databases. This paper describes a record matching algorithm for publication datasets that is based on the multi-pass sorted neighborhood method. The algorithm detects duplicates in a publication XML database and, by sorting on multiple keys in separate passes, produces a higher percentage of correct duplicates and a lower percentage of false positives. The multi-pass approach is based on a combination of keys. Since no single key is sufficient to catch all matching records, combining the results of the individual passes produces more accurate results at lower cost. According to the experimental results, the multi-pass approach achieves the lowest false positive error (FPE) and the lowest false negative error (FNE). en_US
dc.language.iso en en_US
dc.publisher Fifth Local Conference on Parallel and Soft Computing en_US
dc.subject record matching en_US
dc.subject approximate duplicate en_US
dc.subject multi-pass sorted neighborhood method en_US
dc.title Record Matching System for Publication Dataset using Multi-pass Sorted Neighborhood Method en_US
dc.type Article en_US
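
The multi-pass sorted neighborhood idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the record fields (title, author, year), the sort keys, the window size, the 0.85 similarity threshold, and the use of difflib as the matcher are all assumptions chosen for illustration. Each pass sorts the records by one key and compares only records that fall within a sliding window; the matches from all passes are then merged by transitive closure.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.85):
    """Hypothetical matcher: title similarity ratio at or above a threshold."""
    ratio = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
    return ratio >= threshold


def sorted_neighborhood_pass(records, key, window=3):
    """One pass: sort by `key`, compare each record with the next window-1 records."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            if similar(records[i], records[j]):
                pairs.add((min(i, j), max(i, j)))
    return pairs


def multi_pass(records, keys, window=3):
    """Run one pass per key and merge all matched pairs by transitive closure."""
    parent = list(range(len(records)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for key in keys:
        for i, j in sorted_neighborhood_pass(records, key, window):
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    # Only clusters with more than one record are duplicate groups.
    return [c for c in clusters.values() if len(c) > 1]


# Illustrative records only; these are not taken from the paper's dataset.
records = [
    {"title": "Record Matching for Publication Data", "author": "Yi", "year": "2010"},
    {"title": "A Survey of Parallel Computing", "author": "Aung", "year": "2009"},
    {"title": "record matching for publication data", "author": "Soe", "year": "2010"},
    {"title": "Soft Computing Methods", "author": "Win", "year": "2010"},
]


def by_author(r):
    return r["author"].lower() + r["year"]


def by_title(r):
    return r["title"].lower()


duplicates = multi_pass(records, [by_author, by_title], window=2)
print(duplicates)
```

In this toy data the two copies of the same paper carry different author strings, so the author-keyed pass never places them in the same window, while the title-keyed pass does. This is the situation the abstract refers to when it says no single key is sufficient to catch all matching records.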

