Adaptive Duplicate Detection in XML Document Based on Hash Function

dc.contributor.author	Lwin, Thandar
dc.contributor.author	Nyunt, Thi Thi Soe
dc.date.accessioned	2019-08-06T12:44:43Z
dc.date.available	2019-08-06T12:44:43Z
dc.date.issued	2009-12-30
dc.identifier.uri	http://onlineresource.ucsy.edu.mm/handle/123456789/1914
dc.description.abstract	The task of detecting records that represent the same real-world object in multiple data sources is commonly known as duplicate detection, and it is relevant in data cleaning and data integration applications. Numerous approaches exist for duplicate detection in both relational and XML data. As XML becomes increasingly popular for data representation, algorithms to detect duplicates in XML documents are required. Previous domain-independent solutions to this problem relied on standard textual similarity functions (e.g., edit distance, cosine metric) between objects. However, such approaches produce large numbers of false positives when the goal is to identify domain-specific abbreviations and conventions. In this paper, we present a generalized framework for duplicate detection, specialized to XML. The aim of this research is to develop an efficient algorithm for detecting duplicates in complex XML documents and to reduce the number of false positives by using a hash function.	en_US
dc.language.iso	en	en_US
dc.publisher	Fourth Local Conference on Parallel and Soft Computing	en_US
dc.title	Adaptive Duplicate Detection in XML Document Based on Hash Function	en_US
dc.type	Article	en_US