Abstract:
With the large amount of information on the
Internet, web pages have become a major
source for information retrieval and data mining.
Apart from the informative blocks
(main blocks), a web page also contains
noisy information that can degrade the
performance of information retrieval
applications. A method to identify and extract the
informative content from web pages is needed. In
this paper, we propose an Informative Block
Extraction System using a Document Object Model
(DOM) tree and an Entropy Evaluation Model
(EEM), which quantifies the expected value of the
information contained in a web page. The
proposed system is evaluated on Myanmar news
web pages, and its accuracy is measured using
precision and recall.
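
As a rough illustration of the two quantities named above, the following minimal sketch computes an entropy score for a block of text and the precision and recall of an extraction result. The abstract does not give the EEM formula, so the sketch assumes plain Shannon entropy over a block's token frequencies. The function names `shannon_entropy` and `precision_recall` and the block identifiers are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy, in bits, of the token frequency distribution.

    Assumption: the paper's EEM may weight or normalize terms differently;
    this is only the textbook expected-information formula.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over tokens of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def precision_recall(extracted, relevant):
    """Precision and recall of extracted blocks against labeled informative blocks."""
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical token list standing in for the text of one DOM block.
    block_tokens = "news report news city report weather".split()
    print(shannon_entropy(block_tokens))

    # Hypothetical block identifiers: system output vs. manually labeled blocks.
    print(precision_recall({"b1", "b2", "b3"}, {"b2", "b3", "b4", "b5"}))
```

In this sketch, a block whose tokens are all identical scores zero entropy, while a block with a more varied vocabulary scores higher. How such scores are thresholded to separate informative blocks from noisy ones is not specified in the abstract.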