UCSY's Research Repository

Discovering Informative Content Blocks for Efficient Web Data Extraction


dc.contributor.author Hlaing, Nwe Nwe
dc.contributor.author Nyunt, Thi Thi Soe
dc.date.accessioned 2019-07-25T04:33:38Z
dc.date.available 2019-07-25T04:33:38Z
dc.date.issued 2010-12-16
dc.identifier.uri http://onlineresource.ucsy.edu.mm/handle/123456789/1265
dc.description.abstract As web sites become more complicated, building web information extraction systems becomes more troublesome and time-consuming. A common difficulty is locating the segments of a page that contain the target information, which we call the informative blocks. Discriminating informative blocks from noisy blocks and then extracting the informative blocks from a web page is therefore an important task. In this paper, we propose a method that uses both visual features and semantic information to extract informative blocks. First, the VIPS (Vision-based Page Segmentation) algorithm is used to partition a web page into semantic blocks with a hierarchical structure. Then spatial features (such as position and size) and content features (the number of images and links) are extracted to construct a feature vector for each block. Next, based on these features, blocks with similar content structures and spatial structures are clustered by means of similarity computation. After the blocks with similar structures are clustered, the cluster with the largest size and the nearest distance to the centre of the page is selected as the informative block. en_US
dc.language.iso en en_US
dc.publisher Fifth Local Conference on Parallel and Soft Computing en_US
dc.subject Vision-based Page Segmentation en_US
dc.subject Information Extraction en_US
dc.subject Block Clustering en_US
dc.title Discovering Informative Content Blocks for Efficient Web Data Extraction en_US
dc.type Article en_US
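The pipeline summarized in the abstract (feature vector per block, similarity-based clustering, selection of the largest cluster nearest the page centre) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: VIPS itself is not implemented, so blocks are assumed to arrive already segmented as a hypothetical `Block` record; the similarity measure is assumed to be Euclidean distance with single-pass threshold clustering (the abstract does not name one); "largest size" is interpreted as the number of blocks, with distance to the page centre as the tie-breaker. The names `Block`, `feature_vectors`, `cluster_blocks`, `informative_cluster` and the threshold `0.4` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Block:
    """A visual block as a segmenter such as VIPS might return it (hypothetical fields)."""
    x: float       # left edge in pixels
    y: float       # top edge in pixels
    width: float
    height: float
    n_images: int
    n_links: int


def feature_vectors(blocks, page_w, page_h):
    """One vector per block: spatial features normalized by page size,
    content features normalized by the largest count on the page (assumption)."""
    max_img = max((b.n_images for b in blocks), default=0) or 1
    max_links = max((b.n_links for b in blocks), default=0) or 1
    return [
        (
            (b.x + b.width / 2) / page_w,   # horizontal centre
            (b.y + b.height / 2) / page_h,  # vertical centre
            b.width / page_w,               # relative width
            b.height / page_h,              # relative height
            b.n_images / max_img,           # relative image count
            b.n_links / max_links,          # relative link count
        )
        for b in blocks
    ]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def cluster_blocks(vectors, threshold=0.4):
    """Single-pass threshold clustering (a stand-in similarity scheme): each block
    joins the nearest cluster centroid if it lies within `threshold` in Euclidean
    distance, otherwise it starts a new cluster. Returns lists of block indices."""
    clusters = []
    for i, v in enumerate(vectors):
        best, best_d = None, math.inf
        for c in clusters:
            d = math.dist(v, centroid([vectors[j] for j in c]))
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= threshold:
            best.append(i)
        else:
            clusters.append([i])
    return clusters


def informative_cluster(blocks, page_w, page_h, threshold=0.4):
    """Pick the cluster with the most blocks; ties go to the cluster whose
    centre is closest to the page centre (0.5, 0.5) in normalized coordinates."""
    vectors = feature_vectors(blocks, page_w, page_h)
    clusters = cluster_blocks(vectors, threshold)

    def rank(c):
        cx, cy = centroid([vectors[j] for j in c])[:2]
        return (-len(c), math.dist((cx, cy), (0.5, 0.5)))

    return min(clusters, key=rank)


# Hypothetical 1000 x 2000 page: header, left navigation, three article blocks, footer.
page = [
    Block(0, 0, 1000, 150, 1, 10),     # header
    Block(0, 200, 200, 1600, 0, 30),   # left navigation
    Block(250, 600, 500, 300, 1, 2),   # article block
    Block(250, 950, 500, 300, 1, 2),   # article block
    Block(250, 1300, 500, 300, 1, 2),  # article block
    Block(0, 1850, 1000, 150, 0, 8),   # footer
]
print(informative_cluster(page, 1000, 2000))  # → [2, 3, 4]
```

On this toy page the three centred article blocks share the same size and content profile, so they fall into one cluster while the header, navigation bar and footer each stay alone; the threshold value is the main knob and would need tuning on real pages.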

