Abstract:
Document clustering has become an essential
technology with the popularity of the Internet,
which makes fast, high-quality document
clustering techniques a core research topic.
Text clustering, or simply clustering, is about
discovering semantically related groups in an
unstructured collection of documents. Clustering
has been very popular for a long time because it
provides unique ways of digesting and
generalizing large amounts of information. One
of the key issues in clustering is extracting the
appropriate features (terms) of a problem domain. The
existing clustering technology mainly focuses on
term weight calculation. To achieve more
accurate document clustering, more informative
features, including concept weight, are important.
Feature selection is important for the clustering
process because irrelevant or
redundant features may mislead the clustering
results. To counteract this issue, the proposed
system uses the concept weight for clustering in
accordance with the principles of ontology. To a
certain extent, this approach resolves the semantic
problem in specific areas.
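
The difference between term weight and concept weight can be illustrated
with a minimal sketch. It is not the proposed system's implementation: the
toy ontology, the relative-frequency weighting, and the function names
(term_weights, concept_weights, cosine) are all assumptions made for
illustration only.

```python
import math
from collections import Counter

# Hypothetical toy ontology (an assumption, not from the paper):
# each term maps to the concept it expresses.
TOY_ONTOLOGY = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "doctor": "medical_professional",
    "physician": "medical_professional",
}


def term_weights(tokens):
    """Relative frequency of each term in one tokenized document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def concept_weights(tokens, ontology=TOY_ONTOLOGY):
    """Sum term weights over the concept each term maps to.

    Terms missing from the ontology are kept as their own concept.
    """
    weights = Counter()
    for term, weight in term_weights(tokens).items():
        weights[ontology.get(term, term)] += weight
    return dict(weights)


def cosine(a, b):
    """Cosine similarity between two sparse weight dictionaries."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two documents that share no terms but express the same concepts.
doc_a = ["car", "doctor"]
doc_b = ["automobile", "physician"]

term_sim = cosine(term_weights(doc_a), term_weights(doc_b))
concept_sim = cosine(concept_weights(doc_a), concept_weights(doc_b))
print(f"term-level similarity:    {term_sim:.2f}")
print(f"concept-level similarity: {concept_sim:.2f}")
```

In this sketch the two documents have no terms in common, so their
term-level similarity is zero, while their concept-level similarity is
close to one because the synonyms collapse onto the same concepts. A
clustering algorithm fed the concept vectors would therefore group them
together.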