Abstract:
Extensible Markup Language (XML) is
increasingly important in data exchange and
information management. The automatic processing
and management of XML-based data have become
popular research topics because of the growing
use of XML, especially on the web.
Clustering is helpful for categorizing web
documents. Clustering, in the sense of the physical
arrangement of objects, can also be an important
factor in improving the performance of the storage
model.
This paper presents the Progressively Clustering
XML by Structural Similarity (PCXSS) method for
clustering XML documents by structural similarity.
The PCXSS method is intended to handle
heterogeneous XML schemas and to cluster XML
documents by considering only their structural
similarity. The efficiency of the PCXSS method has
been analysed on real datasets: the ACM SIGMOD
Record, DBLP, XML Repository and
Wisconsin’s XML data bank.