Clustering in Hyper-Linked Document Database using Efficient Graph Algorithm

Naing, Win Lai Lai; Htike, Thin Thin


dc.contributor.author	Naing, Win Lai Lai
dc.contributor.author	Htike, Thin Thin
dc.date.accessioned	2019-07-12T04:08:41Z
dc.date.available	2019-07-12T04:08:41Z
dc.date.issued	2010-12-16
dc.identifier.uri	http://onlineresource.ucsy.edu.mm/handle/123456789/820
dc.description.abstract	Clustering is an essential data mining task with numerous applications. It is the process of grouping data into classes or clusters so that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. This system uses an efficient graph clustering algorithm to group online scientific literature. The pages and hyperlinks of the World Wide Web can be viewed as nodes and edges in a directed graph. Our approach uses the citation patterns of the CiteSeer database to form previously established clusters (soft clusters). The soft clusters, in turn, are compared to one another in terms of the papers they have in common. Similar soft clusters are merged using Ward’s agglomerative hierarchical clustering method. The result is a set of document collections whose members are all related to one another by their citation patterns. With this approach, clusters can be calculated rapidly for datasets with tens of thousands of documents.	en_US
dc.language.iso	en	en_US
dc.publisher	Fifth Local Conference on Parallel and Soft Computing	en_US
dc.title	Clustering in Hyper-Linked Document Database using Efficient Graph Algorithm	en_US
dc.type	Article	en_US
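
The pipeline described in the abstract (soft clusters from citation patterns, comparison by shared papers, merging with Ward's method) can be illustrated with a minimal sketch. The abstract does not say how soft clusters are built or how overlap is measured, so everything here is an assumption rather than the authors' implementation: each citing paper together with the papers it cites is taken as one soft cluster, soft clusters are encoded as binary membership vectors so that papers in common reduce the distance, and the function names `soft_clusters` and `ward_merge` and the toy citation data are hypothetical.

```python
from itertools import combinations


def soft_clusters(citations):
    """Assumption: one soft cluster per citing paper = the paper plus what it cites."""
    return [frozenset({paper, *cited}) for paper, cited in sorted(citations.items())]


def ward_merge(clusters, num_groups):
    """Merge soft clusters with Ward's criterion until num_groups remain.

    Each soft cluster is a binary vector over all papers. At every step the
    pair whose merge least increases the within-group sum of squares,
    n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2, is merged.
    """
    papers = sorted(set().union(*clusters))
    # Each group: (number of soft clusters, centroid vector, soft-cluster indices)
    groups = [
        (1, [1.0 if p in c else 0.0 for p in papers], [i])
        for i, c in enumerate(clusters)
    ]
    while len(groups) > num_groups:
        best = None
        for a, b in combinations(range(len(groups)), 2):
            na, ca, _ = groups[a]
            nb, cb, _ = groups[b]
            dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
            cost = na * nb / (na + nb) * dist2  # Ward's merge cost
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        na, ca, ma = groups[a]
        nb, cb, mb = groups[b]
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        merged = (na + nb, centroid, ma + mb)
        groups = [g for i, g in enumerate(groups) if i not in (a, b)] + [merged]
    # Report each final group as the union of the papers in its soft clusters
    return [
        sorted(set().union(*(clusters[i] for i in members)))
        for _, _, members in groups
    ]


# Hypothetical toy citation data: paper -> papers it cites
citations = {
    "A": ["X", "Y"],
    "B": ["X", "Y", "Z"],
    "C": ["P", "Q"],
    "D": ["P", "Q", "R"],
}
for group in ward_merge(soft_clusters(citations), num_groups=2):
    print(group)
```

This naive version rescans every pair at each step, so it only suits small inputs; reaching the tens of thousands of documents mentioned in the abstract would need a more efficient merge strategy than this sketch provides.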