UCSY's Research Repository

Clustering XML Document Based On Path Similarities Using Structure Only


dc.contributor.author Mon, Ei Ei
dc.contributor.author Tun, Khin Nwe Ni
dc.date.accessioned 2019-08-06T12:51:56Z
dc.date.available 2019-08-06T12:51:56Z
dc.date.issued 2009-12-30
dc.identifier.uri http://onlineresource.ucsy.edu.mm/handle/123456789/1918
dc.description.abstract We propose a methodology for clustering XML documents on the basis of their structural similarities. The approach combines common XPath extraction with K-means clustering to improve efficiency on collections of XML documents with many different structures. Common XPaths are used to find similarities among the paths of large numbers of XML documents, and the K-means algorithm is used to form accurate clusters. The method proceeds in three steps. The first step performs frequent structure mining with the FP-growth method to find similarities among the structures of large numbers of XML documents. The second step builds a feature vector matrix from the extracted paths; based on the set of common path vectors collected, we compute the structural similarity between the XML documents. The last step applies the K-means clustering algorithm to create clusters following the idea of path-based clustering, which groups documents according to their common XPaths, i.e. their frequent structures. The quality of the clustering can be measured by the dissimilarity of document structures. Experimental evaluation on both synthetic and real data shows the effectiveness of our approach. en_US
dc.language.iso en en_US
dc.publisher Fourth Local Conference on Parallel and Soft Computing en_US
dc.subject common XPath en_US
dc.subject K-means clustering en_US
dc.subject XML Document Clustering en_US
dc.subject Data Mining en_US
dc.subject Frequent Structure Mining en_US
dc.title Clustering XML Document Based On Path Similarities Using Structure Only en_US
dc.type Article en_US
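
The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a "path" is a root-to-leaf chain of element tags, it replaces FP-growth with plain support counting, and it uses a basic Lloyd's K-means seeded with the first k distinct rows. All function names, the sample documents, and the min_support value are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def extract_paths(xml_text):
    """Return the set of root-to-leaf tag paths of one document (structure only)."""
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        children = list(node)
        if not children:
            paths.add(path)  # leaf element: record the full path
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def common_paths(path_sets, min_support):
    """Keep paths found in at least min_support documents.

    Assumption: simple support counting stands in for FP-growth here.
    """
    counts = Counter(p for s in path_sets for p in s)
    return sorted(p for p, c in counts.items() if c >= min_support)


def feature_matrix(path_sets, features):
    """One binary row per document, one column per common path."""
    return [[1.0 if f in s else 0.0 for f in features] for s in path_sets]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=20):
    """Basic Lloyd's K-means; seeds are the first k distinct rows (assumption)."""
    centroids = []
    for r in rows:
        if r not in centroids:
            centroids.append(list(r))
        if len(centroids) == k:
            break
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(len(centroids)), key=lambda j: sq_dist(r, centroids[j]))
                  for r in rows]
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical sample collection: two "book" and two "order" documents.
docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<order><item><sku/></item><total/></order>",
    "<order><item><sku/><qty/></item><total/></order>",
]
path_sets = [extract_paths(d) for d in docs]
features = common_paths(path_sets, min_support=2)
labels = kmeans(feature_matrix(path_sets, features), k=2)
print(labels)
```

In this sketch, paths that occur in only one document (such as /book/year) fall below the support threshold and are dropped before the matrix is built, so documents are compared only on the structures they share with others.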

