Abstract:
With the popularity of the Internet, web document clustering has
become an essential technology, which makes fast and high-quality
document clustering techniques a core research topic. One of the
main issues in clustering is feature selection: the selected features
should carry sufficient and reliable information about the original
web documents. Feature selection matters because irrelevant or
redundant features may mislead the clustering result. To address
this issue, this paper proposes a concept weight for feature
selection that can improve both the efficiency and the accuracy of
clustering. The proposed system performs document preprocessing,
weight estimation, and clustering, using both the term weight and
the semantic weight. The paper thus introduces a clustering method
based on the proposed concept weight.