UCSY's Research Repository

Duplicate Records Elimination in Bibliographical Dataset using Priority Queue Algorithm with Smith-Waterman Algorithm

dc.contributor.author Thaung, Su Mon
dc.contributor.author Htike, Thin Thin
dc.date.accessioned 2019-07-12T03:57:18Z
dc.date.available 2019-07-12T03:57:18Z
dc.date.issued 2010-12-16
dc.identifier.uri http://onlineresource.ucsy.edu.mm/handle/123456789/811
dc.description.abstract Often, in the real world, entities have two or more representations in databases. Duplicate records do not share a common key and/or contain errors that make duplicate matching difficult. A major problem arising from integrating different databases is the existence of such duplicates. Data cleaning is the process of identifying two or more records within a database that represent the same real-world object (duplicates), so that a unique representation for each object can be adopted. This system addresses the data cleaning problem of detecting records that are approximate, rather than exact, duplicates. It uses the Priority Queue algorithm together with the Smith-Waterman algorithm to compute minimum edit-distance similarity values, recognizes pairs of approximate duplicates, and then eliminates the detected duplicate records. Performance is evaluated in terms of the false positive percentage (FP%) and false negative percentage (FN%), with the lowest values taken as the best result. en_US
dc.language.iso en en_US
dc.publisher Fifth Local Conference on Parallel and Soft Computing en_US
dc.title Duplicate Records Elimination in Bibliographical Dataset using Priority Queue Algorithm with Smith-Waterman Algorithm en_US
dc.type Article en_US
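
The record itself contains no code; the following is a minimal, illustrative Python sketch of the approach the abstract describes: a Smith-Waterman local-alignment score normalized into a [0, 1] similarity, applied inside a bounded, recency-ordered queue of cluster representatives in the spirit of the priority-queue duplicate-detection method. All names, scoring parameters (match, mismatch, gap), the similarity threshold, and the queue size are assumptions chosen for illustration, not values taken from the paper.

    def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
        # Smith-Waterman local-alignment score with a linear gap penalty.
        # Only two rows of the dynamic-programming matrix are kept.
        cols = len(b) + 1
        prev = [0] * cols
        best = 0
        for i in range(1, len(a) + 1):
            curr = [0] * cols
            for j in range(1, cols):
                score = match if a[i - 1] == b[j - 1] else mismatch
                curr[j] = max(0, prev[j - 1] + score,
                              prev[j] + gap, curr[j - 1] + gap)
                best = max(best, curr[j])
            prev = curr
        return best

    def similarity(a, b, match=2):
        # Normalize the alignment score to [0, 1]; the maximum possible
        # score is a perfect alignment of the shorter string.
        if not a or not b:
            return 0.0
        return smith_waterman(a, b) / (match * min(len(a), len(b)))

    def eliminate_duplicates(records, threshold=0.8, queue_size=4):
        # Single pass over sorted records. A small list kept in
        # most-recently-matched-first order stands in for the priority
        # queue of cluster representatives; threshold and queue_size
        # are illustrative, not the paper's settings.
        clusters = []   # recent cluster representatives, highest priority first
        unique = []
        for rec in sorted(records):
            for i, rep in enumerate(clusters):
                if similarity(rec, rep) >= threshold:
                    clusters.insert(0, clusters.pop(i))  # promote matched cluster
                    break
            else:
                unique.append(rec)              # no match: rec starts a new cluster
                clusters.insert(0, rec)
                if len(clusters) > queue_size:
                    clusters.pop()              # evict lowest-priority cluster
        return unique

    records = ["Smith-Waterman Algorithm", "Smith Waterman algorithm",
               "Priority Queue Algorithm", "Priority Queue Algortihm"]
    print(eliminate_duplicates([r.lower() for r in records]))
    # -> ['priority queue algorithm', 'smith waterman algorithm']

Sorting first brings likely duplicates near each other, so each record only needs to be compared against the small queue of recent cluster representatives rather than against every other record, which is what makes the single-pass elimination cheap.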

