Abstract:
The World Wide Web overwhelms us with
immense amounts of widely distributed,
interconnected, rich, and dynamic information. As a
consequence, Web Usage Mining has become a
popular research area. It involves the
application of data mining techniques to discover
usage patterns from Web access log data.
Clustering is an important function in Web
Usage Mining: it groups user access patterns that
share the same access behavior. In this paper, we
propose a new approach to clustering based on
asymmetric binary variables (a form of the Jaccard
coefficient). We then compare the performance of the
proposed approach with k-means
clustering. The resulting clusters from these two
methods are evaluated with two internal validation
measures: the Dunn Index and the DB Index
(Davies-Bouldin Index). Finally, we point out the strengths and
weaknesses of each method. The analysis
results clearly show the clustering outcomes of the
two methods.
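
As an illustration of the dissimilarity measure named above, the following minimal sketch computes the Jaccard-based dissimilarity for asymmetric binary variables, d = (r + s) / (q + r + s), where q counts positions in which both vectors are 1, and r and s count positions in which only one of them is 1. Joint absences (both 0) are ignored. The function name, the page layout, and the three sessions are hypothetical and are not taken from the paper's data; the handling of two empty sessions is likewise an assumption.

```python
def asymmetric_binary_dissimilarity(a, b):
    """Jaccard-based dissimilarity between two equal-length 0/1 vectors.

    q: positions where both are 1
    r: positions where a is 1 and b is 0
    s: positions where a is 0 and b is 1
    Positions where both are 0 carry no information and are ignored.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    q = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    r = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    s = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    if q + r + s == 0:
        # Assumption: two sessions with no page visits are treated as identical.
        return 0.0
    return (r + s) / (q + r + s)


# Hypothetical sessions: 1 means the page was visited.
# Columns correspond to pages A, B, C, D, E.
session_1 = [1, 1, 0, 0, 0]
session_2 = [1, 1, 1, 0, 0]
session_3 = [0, 0, 0, 1, 1]

d12 = asymmetric_binary_dissimilarity(session_1, session_2)
d13 = asymmetric_binary_dissimilarity(session_1, session_3)
```

Here sessions 1 and 2 share two pages and differ on one, so their dissimilarity is 1/3, while sessions 1 and 3 share no visited page and are maximally dissimilar at 1. A pairwise matrix of such values could then be passed to a distance-based clustering procedure.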