UCSY's Research Repository

Extracting Information Content from Web Pages Using Block Clustering Method

dc.contributor.author Hlaing, Nwe Nwe
dc.contributor.author Nyunt, Thi Thi Soe
dc.date.accessioned 2019-11-15T04:39:32Z
dc.date.available 2019-11-15T04:39:32Z
dc.date.issued 2012-02-28
dc.identifier.uri http://onlineresource.ucsy.edu.mm/handle/123456789/2447
dc.description.abstract The World Wide Web is the main repository of “all kinds of information” and has so far been very successful in disseminating information to humans. As web sites grow more complicated, constructing web information extraction systems becomes more difficult and time-consuming. Therefore, the main content of a web page must be mined in order to extract information from it. In this paper, we study the problem of automatically extracting web information (unsupervised IE) without any learning examples or other similar human input. First, web pages are segmented into several raw chunks. Then noisy blocks are removed based on product features. Data region identification is based on the observation that data records in a web document are similar in appearance, and a block clustering method is proposed on the basis of this observation. This approach requires no human intervention, and experimental results show its accuracy to be promising. en_US
dc.language.iso en_US en_US
dc.publisher Tenth International Conference On Computer Applications (ICCA 2012) en_US
dc.subject Information Extraction (IE) en_US
dc.subject Wrapper en_US
dc.subject Document Object Model (DOM) en_US
dc.title Extracting Information Content from Web Pages Using Block Clustering Method en_US
dc.type Article en_US
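The abstract describes clustering page blocks by appearance similarity to find the data region, but it does not specify the block features, the similarity measure, or the threshold. The following is a minimal Python sketch under assumed choices, not the authors' method. Each block is represented by a hypothetical `Block` record (its DOM tag path plus counts of children, links, images, and text length). Two blocks are compared only if they share a tag path. The largest cluster containing at least two similar blocks is taken as the data region. The names `Block`, `appearance_similarity`, `cluster_blocks`, `find_data_region` and the 0.7 threshold are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A segmented chunk of a web page (hypothetical representation)."""
    tag_path: str    # DOM path, e.g. "html/body/div/ul/li"
    n_children: int  # number of child elements
    n_links: int     # number of <a> elements
    n_images: int    # number of <img> elements
    text_len: int    # length of visible text


def _ratio(a: int, b: int) -> float:
    """Similarity of two non-negative counts in [0, 1]; 1.0 when equal."""
    if a == b:
        return 1.0
    return min(a, b) / max(a, b)


def appearance_similarity(x: Block, y: Block) -> float:
    """Assumed measure: blocks must share a tag path, then their structural
    counts are compared pairwise and the ratios are averaged."""
    if x.tag_path != y.tag_path:
        return 0.0
    pairs = [(x.n_children, y.n_children), (x.n_links, y.n_links),
             (x.n_images, y.n_images), (x.text_len, y.text_len)]
    return sum(_ratio(a, b) for a, b in pairs) / len(pairs)


def cluster_blocks(blocks, threshold=0.7):
    """Greedy single-pass clustering: each block joins the first cluster whose
    first member (the representative) is similar enough, else starts a new one."""
    clusters = []
    for b in blocks:
        for c in clusters:
            if appearance_similarity(c[0], b) >= threshold:
                c.append(b)
                break
        else:
            clusters.append([b])
    return clusters


def find_data_region(blocks, threshold=0.7):
    """Return the largest cluster with at least two members (repeated,
    similar-looking records), or [] if there is none."""
    candidates = [c for c in cluster_blocks(blocks, threshold) if len(c) >= 2]
    return max(candidates, key=len, default=[])


# Usage with made-up blocks: three product records, a nav bar, and a footer.
page = [
    Block("html/body/div/ul/li", 3, 1, 1, 120),
    Block("html/body/div/ul/li", 3, 1, 1, 110),
    Block("html/body/div/ul/li", 3, 1, 1, 135),
    Block("html/body/div/nav", 8, 8, 0, 60),
    Block("html/body/footer", 2, 4, 0, 200),
]
region = find_data_region(page)
print(len(region))  # 3
```

A greedy single-pass scheme is used here only because it is short and order-dependent in an easy-to-follow way; a real system might instead use hierarchical clustering or richer visual features such as rendered block size and position.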
