dc.description.abstract |
Obtaining important pages rapidly is very
useful when a crawler cannot visit the entire Web
in a reasonable amount of time. One approach is
to use a focused crawler, which tries to
download only pages on a pre-defined topic,
avoiding irrelevant Web documents and reducing
network traffic. It can also minimize the overall
number of downloaded Web pages for processing
and maximize the percentage of relevant pages.
In this paper, we present the order in which a
focused crawler should visit the URLs it has seen,
so that it obtains the more “important” pages first.
During crawling, a Naive Bayes classifier with
four feature representations is used to
improve the correctness of classification for a
specific topic. To sort URLs, we use the Priority
equation, which gives every page a score. |
en_US |