Abstract:
Web page classification differs significantly from traditional text classification because web pages carry additional information in their HTML structure and hyperlinks. The approach described here builds on Naïve Bayes, a text classification method that is widely used in applications and experiments because of its simplicity and effectiveness. In text and web page classification, the Bayesian probabilities are usually estimated from term (word) frequencies and term counts within a page and its linked pages. This paper presents a Naïve Bayes method that classifies web pages by keyword and assigns each page to the corresponding section or department of a trading company. The focus is on representing web pages by their text content.
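
As an illustration of the general idea, the following is a minimal sketch of a multinomial Naïve Bayes classifier over page keywords, with add-one (Laplace) smoothing. It is not the implementation used in this paper: the function names `train` and `classify`, the department labels, and the sample keywords are all hypothetical, and the smoothing scheme is an assumption chosen for simplicity.

```python
import math
from collections import Counter, defaultdict


def train(labeled_pages):
    """Collect counts from (keywords, label) pairs.

    labeled_pages: list of (list of keyword strings, department label).
    Returns per-class page counts, per-class term counts, and the vocabulary.
    """
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for terms, label in labeled_pages:
        class_counts[label] += 1
        term_counts[label].update(terms)
        vocab.update(terms)
    return class_counts, term_counts, vocab


def classify(terms, model):
    """Return the label with the highest log posterior score."""
    class_counts, term_counts, vocab = model
    total_pages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_pages in class_counts.items():
        # Log prior: fraction of training pages carrying this label.
        score = math.log(n_pages / total_pages)
        total_terms = sum(term_counts[label].values())
        for term in terms:
            if term not in vocab:
                continue  # ignore keywords never seen in training
            # Add-one smoothed term likelihood (assumed, not from the paper).
            count = term_counts[label][term]
            score += math.log((count + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training pages for two departments of a trading company.
pages = [
    (["invoice", "payment", "tax"], "finance"),
    (["payment", "budget", "invoice"], "finance"),
    (["shipping", "freight", "customs"], "logistics"),
    (["freight", "warehouse", "shipping"], "logistics"),
]
model = train(pages)
print(classify(["invoice", "budget"], model))
print(classify(["freight", "customs"], model))
```

Scores are summed in log space to avoid numerical underflow when a page contains many keywords. Keywords drawn from linked pages could be appended to the same keyword list before classification, although how they should be weighted is left open here.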