Abstract:
Clustering is currently one of the most crucial techniques for dealing with the massive amounts of heterogeneous information on the web, which are beyond human capacity to digest. Recent studies have shown that the most commonly used partitioning-based clustering algorithm, the K-means algorithm, is more suitable for large datasets. However, the K-means algorithm can converge to a locally optimal solution. This paper presents our work, which aims to avoid this shortcoming by using the Harmony K-means algorithm (HKA). HKA performs document clustering based on the harmony search optimization method, which finds near-globally optimal clusters.