Abstract:
The World Wide Web, the largest shared
information source, is growing exponentially, and the
amount of business news on the web is overwhelming
and needs to be handled properly. As such, grouping
web documents into clusters for speedy information
retrieval becomes imperative. Clustering
organizes a large quantity of unordered text documents
into a small number of meaningful and coherent
clusters, thereby providing a basis for intuitive and
informative navigation and browsing mechanisms. The
quality of the clustering result depends greatly on the
representation of documents and the clustering
algorithm. In traditional document representation
methods, the frequency counts of document terms are
used as the feature vector representing each document.
However, such methods cannot identify semantically
related terms. Documents
written in human language contain contexts, and the
words used to describe these contexts are generally
semantically related. Motivated by this fact, a domain
ontology is developed to enrich the semantic
representation of terms. Then, the Particle Swarm
Optimization (PSO) clustering algorithm is used to
cluster the web documents efficiently. The paper
presents the comparative results of using the PSO
algorithm alone and the PSO algorithm with an ontology
for clustering web documents. According to the analytical
results, the representation of terms using the ontology
is significantly efficient, and the implementation of the
PSO algorithm achieves better performance in
intra-cluster and inter-cluster similarity.
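
As a minimal illustration of why ontology-based term representation can help, the sketch below builds frequency-count feature vectors for two short documents, with and without mapping terms to shared concepts, and compares their cosine similarity. The dictionary TOY_ONTOLOGY, the sample sentences, and the function names are hypothetical and chosen only for illustration; they are not the domain ontology or the data used in the paper, and the PSO clustering step is not shown.

```python
import math
from collections import Counter

# Hypothetical toy ontology (an assumption, not the paper's ontology):
# maps a surface term to a shared domain concept.
TOY_ONTOLOGY = {
    "shares": "equity",
    "stocks": "equity",
    "profit": "earnings",
    "revenue": "earnings",
}

def tf_vector(text, ontology=None):
    """Return a term-frequency vector; optionally map terms to concepts first."""
    terms = text.lower().split()
    if ontology is not None:
        terms = [ontology.get(t, t) for t in terms]
    return Counter(terms)

def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

doc1 = "shares rise on profit"
doc2 = "stocks rise on revenue"

# Plain frequency counts only match the literally shared terms.
plain = cosine(tf_vector(doc1), tf_vector(doc2))
# With the toy ontology, related terms collapse to the same concept.
enriched = cosine(tf_vector(doc1, TOY_ONTOLOGY), tf_vector(doc2, TOY_ONTOLOGY))
```

In this toy case the enriched vectors are more similar than the plain ones, because terms that differ on the surface are counted under the same concept. A clustering algorithm such as PSO would then operate on these enriched vectors.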