Abstract:
The web is a huge repository of information, and web documents need to be categorized to facilitate the search and retrieval of pages. Existing algorithms rely solely on the text content of web pages for classification. In text and web page classification, the probabilities used by Bayesian classifiers are usually estimated from term frequencies, i.e., the counts of terms within a page. This paper presents a Naïve Bayes web page classification system that classifies news genres. The features of web news genres are represented as vectors using TF*IDF weighting. Classification proceeds in two steps: first, features are extracted from the web page; second, Bayes' theorem is applied to the training set to assign unknown web pages to categories such as arts, health and so on. The system uses these techniques to reduce the set of result pages shown to the user when searching and to show users what information is available.
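The TF*IDF feature representation mentioned above can be illustrated with a short sketch. The abstract does not specify the exact weighting, so this assumes one common variant: tf is the raw count of a term in a page and idf is log(N / df), where N is the number of pages and df is the number of pages containing the term. The names `tokenize` and `tfidf_vectors` are hypothetical, and the sample sentences in the usage note are made up for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, with weight = tf * idf.

    Assumed variant (not taken from the paper):
      tf  = raw count of the term in the document
      idf = log(N / df), N = number of documents,
            df = number of documents that contain the term
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return vectors
```

For example, with the three toy pages "The match ended with a late goal", "A new exhibition of modern art opens" and "The goal of the health campaign", the term "art" occurs in one page and gets weight log(3), while "goal" occurs in two pages and gets the smaller weight log(3/2). A term present in every page gets idf = log(1) = 0, so very common words contribute nothing to the vector.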
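The second classification step, applying Bayes' theorem with a labelled training set, can be sketched as a multinomial Naïve Bayes classifier. This is an assumption: the abstract does not state which Naïve Bayes variant or smoothing the system uses, so the sketch picks the common multinomial form with Laplace (add-alpha) smoothing. Each page is a {term: weight} dict, so either raw term counts or TF*IDF weights can be supplied. The class name `NaiveBayesClassifier` and its methods are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes over weighted term features (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant; assumed, must be > 0

    def fit(self, feature_dicts, labels):
        """Estimate log P(c) and log P(t | c) from labelled training pages."""
        self.vocab = {t for features in feature_dicts for t in features}
        class_docs = Counter(labels)
        n = len(labels)
        # Prior P(c): fraction of training pages in category c.
        self.log_prior = {c: math.log(k / n) for c, k in class_docs.items()}
        # Total term weight per category.
        term_weight = defaultdict(Counter)
        for features, c in zip(feature_dicts, labels):
            term_weight[c].update(features)
        v = len(self.vocab)
        self.log_likelihood = {}
        for c in class_docs:
            denom = sum(term_weight[c].values()) + self.alpha * v
            # Smoothed P(t | c) so unseen (term, category) pairs never give log(0).
            self.log_likelihood[c] = {
                t: math.log((term_weight[c][t] + self.alpha) / denom)
                for t in self.vocab
            }
        return self

    def predict(self, features):
        """Return the category maximizing log P(c) + sum_t w_t * log P(t | c).

        P(page) is identical for every category, so it is dropped from
        Bayes' theorem when comparing categories.
        """
        scores = {}
        for c, prior in self.log_prior.items():
            score = prior
            for t, w in features.items():
                if t in self.vocab:  # terms never seen in training are ignored
                    score += w * self.log_likelihood[c][t]
            scores[c] = score
        return max(scores, key=scores.get)
```

As a toy usage example (the training pages are invented, not the paper's data), fitting on `[Counter(tokenize(t)) for t in texts]` for the texts "gallery painting exhibition art" and "museum art painting sculpture" labelled `"arts"`, plus "doctor hospital health vaccine" and "health diet exercise doctor" labelled `"health"`, makes the classifier assign "New painting exhibition at the museum" to `"arts"` and "Doctor says exercise improves health" to `"health"`. Working in log space avoids numeric underflow when many small probabilities are multiplied.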