dc.description.abstract |
Often, in the real world, entities have two or more representations in databases. Duplicate records do not share a common key and/or contain errors that make duplicate matching a difficult task. A major problem arising from the integration of different databases is the existence of duplicates. Data cleaning is the process of identifying two or more records within a database that represent the same real-world object (duplicates), so that a unique representation of each object can be adopted. This system addresses the data cleaning problem of detecting records that are approximate, rather than exact, duplicates. It uses a priority queue algorithm together with the Smith-Waterman algorithm to compute minimum edit-distance similarity values, recognizing pairs of approximate duplicates and then eliminating the detected duplicate records. We also evaluate performance, taking the lowest FP% (false-positive percentage) and FN% (false-negative percentage) as the best result. |
en_US |