Extraction of Informative Blocks from Myanmar Web Pages Based on Entropy Measure

Nyein, Swe Swe

Tenth International Conference On Computer Applications (ICCA 2012)

URI: http://onlineresource.ucsy.edu.mm/handle/123456789/2390

Date: 2012-02-28

Abstract:

With the large amount of information on the Internet, Web pages have become a potential source for information retrieval and data mining. Apart from the informative blocks (main blocks), a web page also contains noisy information that can degrade the performance of information retrieval applications. A method to identify and extract the informative content from web pages is therefore needed. In this paper, we propose an Informative Block Extraction System that uses the Document Object Model (DOM) tree and an Entropy Evaluation Model (EEM), which quantifies the expected value of the information contained in a web page. The proposed system is evaluated on Myanmar news web pages, and its accuracy is measured using precision and recall.
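The abstract does not give the exact form of the EEM, so the following is only a minimal sketch of the general idea, assuming that each block is scored by the Shannon entropy of its token frequency distribution and that extraction quality is measured with set-based precision and recall. The function names, block names, and sample tokens are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the token frequency distribution.

    Assumption: a block is represented as a flat list of tokens.
    The paper's actual EEM may be defined differently.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def precision_recall(extracted, relevant):
    """Precision and recall of extracted block ids against a labeled set."""
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical blocks, as if their text had been read from DOM subtrees.
blocks = {
    "nav": "home news sports home news sports".split(),
    "article": "the government announced a new education policy today".split(),
}

# Score every block; the repetitive menu block scores lower than the article.
entropies = {name: shannon_entropy(toks) for name, toks in blocks.items()}

# Compare a hypothetical extraction result with hypothetical ground truth.
precision, recall = precision_recall(["b1", "b2", "b3"], ["b2", "b3", "b4"])
```

In this toy example the repetitive navigation block has a lower entropy than the article block. How entropy values are mapped to an informative or noisy label (threshold, direction, per-page or per-site statistics) is a design choice that the abstract leaves open.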
