UCSY's Research Repository

Detecting the Noise from Web Pages Using Entropy Measure

dc.contributor.author Nyein, Swe Swe
dc.date.accessioned 2019-07-03T06:55:06Z
dc.date.available 2019-07-03T06:55:06Z
dc.date.issued 2011-05-05
dc.identifier.uri http://onlineresource.ucsy.edu.mm/handle/123456789/253
dc.description.abstract The rapid expansion of the Internet has made the Web a popular place for disseminating and collecting information. Noisy items in web pages are one of the major obstacles to extracting the main content, so it is important to detect noise and distinguish valuable information from noisy data within a single Web page. In this paper, we propose a noise detection technique based on the Document Object Model (DOM) tree. In the DOM tree, the weight of each node, calculated by the tf-idf scheme, is used in an entropy measure to obtain a value for that node, which is then compared with a threshold value. Nodes whose values are less than the threshold are regarded as noise. Experimental results on a range of datasets, evaluated with precision and recall, show that our framework can improve noise detection accuracy. en_US
dc.language.iso en en_US
dc.publisher Ninth International Conference On Computer Applications (ICCA 2011) en_US
dc.title Detecting the Noise from Web Pages Using Entropy Measure en_US
dc.type Article en_US
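
The abstract outlines a pipeline: weight the terms of each DOM node with tf-idf, feed those weights into an entropy measure, and flag nodes whose value falls below a threshold as noise. The record does not give the paper's exact formula, so the following is only a minimal sketch under stated assumptions: each node is represented by its text, each node's text is treated as one "document" for tf-idf, a smoothed idf is used, and the entropy is the Shannon entropy of the node's normalized tf-idf weights. The function names, the smoothing, and the threshold value are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_weights(blocks):
    """Return {node_id: {term: weight}}, treating each node's text as one document.

    Assumption: smoothed idf = log((1 + N) / (1 + df)) + 1, so every weight is positive.
    """
    tokens = {nid: text.lower().split() for nid, text in blocks.items()}
    n_nodes = len(tokens)
    # Document frequency: in how many nodes does each term occur?
    df = Counter(term for terms in tokens.values() for term in set(terms))
    weights = {}
    for nid, terms in tokens.items():
        tf = Counter(terms)
        weights[nid] = {
            term: (count / len(terms)) * (math.log((1 + n_nodes) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return weights


def weighted_entropy(term_weights):
    """Shannon entropy (in bits) of the tf-idf weights normalized to sum to 1."""
    total = sum(term_weights.values())
    if total == 0:
        return 0.0  # a node with no text carries no information
    return -sum((w / total) * math.log2(w / total) for w in term_weights.values() if w > 0)


def detect_noise(blocks, threshold):
    """Return the ids of nodes whose entropy value is below the threshold."""
    weights = tfidf_weights(blocks)
    return {nid for nid, w in weights.items() if weighted_entropy(w) < threshold}


if __name__ == "__main__":
    # Hypothetical page: a repetitive navigation node and a content node.
    page = {
        "nav": "home home home home",
        "main": "entropy measures detect noisy blocks in web pages",
    }
    print(sorted(detect_noise(page, threshold=1.0)))
```

In this sketch a repetitive node such as a navigation bar concentrates all its weight on one term and scores an entropy near zero, while a node with varied vocabulary scores higher. A real implementation would extract the node texts from a parsed DOM tree and tune the threshold on labeled pages.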