Abstract:
Many organizations now use the World Wide Web as a multipurpose
platform. It is therefore important to understand how a web
site is being used by its users. Web usage mining, also known as web log
mining, aims to discover interesting and frequent user access patterns
from web browsing data stored in web server logs, proxy server logs, or
browser logs. It involves the automatic discovery of patterns from the
web log data of one or more web servers. Usage
mining tools discover and predict user behavior in order to help
designers improve the web site, attract visitors, or give regular users a
personalized and adaptive service. In this thesis, the aim is to find
frequent user access patterns from web log entries. Clustering and
association rule mining are combined for pattern discovery. Thirty web
log files from the United Nations High Commissioner for Refugees are
used. Density-based spatial clustering of applications with noise
(DBSCAN) is used to group users based on their access patterns, and the
Apriori algorithm is applied to generate
frequent user access patterns. Because DBSCAN groups users by their
access patterns, users who do not share similar access patterns are
treated as noise and removed. Clustering thus reduces the data size, so
Apriori generates concise and relevant rules. The results of this system
depend heavily on the parameters provided by the user. The system is
implemented in the Python programming language, with SQLite used as the
storage layer.
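
The two-stage idea described above (cluster users, drop the noise, then
mine frequent patterns from what remains) can be illustrated with a
minimal sketch. It is not the thesis implementation: the toy sessions,
the choice of Jaccard distance between page sets, and the parameter
values (eps, min_pts, min_support) are all assumptions made for
illustration, and both algorithms are written in plain Python rather
than taken from any library.

```python
from itertools import combinations

# Hypothetical toy data (not from the thesis): user id -> set of pages visited.
SESSIONS = {
    "u1": {"/home", "/news", "/donate"},
    "u2": {"/home", "/news", "/donate"},
    "u3": {"/home", "/news", "/about"},
    "u4": {"/jobs", "/contact"},
}

def jaccard_distance(a, b):
    """Assumed distance measure: 1 - |a & b| / |a | b|."""
    return 1.0 - len(a & b) / len(a | b)

def dbscan(points, eps, min_pts):
    """Label each key of `points` with a cluster id, or -1 for noise."""
    labels = {}
    cluster = -1

    def neighbors(p):
        # Includes p itself, since its distance to itself is 0.
        return [q for q in points
                if jaccard_distance(points[p], points[q]) <= eps]

    for p in points:
        if p in labels:
            continue
        nbrs = neighbors(p)
        if len(nbrs) < min_pts:
            labels[p] = -1          # provisionally noise
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in nbrs if q != p]
        while queue:
            q = queue.pop()
            if labels.get(q) == -1:
                labels[q] = cluster  # noise point becomes a border point
            if q in labels:
                continue
            labels[q] = cluster
            q_nbrs = neighbors(q)
            if len(q_nbrs) >= min_pts:   # q is a core point: keep expanding
                queue.extend(q_nbrs)
    return labels

def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets seen in at least
    `min_support` transactions (absolute count)."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    current = count({frozenset([i]) for t in transactions for i in t})
    frequent = {}
    k = 2
    while current:
        frequent.update(current)
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = count(candidates)
        k += 1
    return frequent

# Stage 1: cluster users and drop those labelled as noise.
labels = dbscan(SESSIONS, eps=0.5, min_pts=2)
kept = [pages for user, pages in SESSIONS.items() if labels[user] != -1]

# Stage 2: mine frequent page sets from the remaining users only.
frequent = apriori(kept, min_support=2)
```

With these assumed parameters, the user "u4" shares no pages with anyone
else and is dropped before Apriori runs, so pages visited only by that
user never enter the candidate sets. Changing eps, min_pts, or
min_support changes which users are kept and which patterns survive,
which is the parameter sensitivity noted above.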