UCSY's Research Repository

Adaptive Duplicate Detection in XML Document Based on Hash Function

dc.contributor.author Lwin, Thandar
dc.contributor.author Nyunt, Thi Thi Soe
dc.date.accessioned 2019-08-06T12:44:43Z
dc.date.available 2019-08-06T12:44:43Z
dc.date.issued 2009-12-30
dc.identifier.uri http://onlineresource.ucsy.edu.mm/handle/123456789/1914
dc.description.abstract The task of detecting duplicate records that represent the same real-world object in multiple data sources is commonly known as duplicate detection, and it is relevant in data cleaning and data integration applications. Numerous approaches exist for duplicate detection in both relational and XML data. As XML becomes increasingly popular for data representation, algorithms to detect duplicates in XML documents are required. Previous domain-independent solutions to this problem relied on standard textual similarity functions (e.g., edit distance, cosine metric) between objects. However, such approaches result in large numbers of false positives if we want to identify domain-specific abbreviations and conventions. In this paper, we present a generalized framework for duplicate detection, specialized to XML. The aim of this research is to develop an efficient algorithm for detecting duplicates in complex XML documents and to reduce the number of false positives by using a hash function algorithm. en_US
dc.language.iso en en_US
dc.publisher Fourth Local Conference on Parallel and Soft Computing en_US
dc.title Adaptive Duplicate Detection in XML Document Based on Hash Function en_US
dc.type Article en_US
