UCSY's Research Repository

Informative Content Extraction for Web Page using Text Density and Vision-based Page Segmentation (VIPS) Algorithm Integration

dc.contributor.author Mon, Ei Phyu Phyu
dc.contributor.author Yuzana
dc.date.accessioned 2019-07-19T15:03:43Z
dc.date.available 2019-07-19T15:03:43Z
dc.date.issued 2017-12-27
dc.identifier.uri http://onlineresource.ucsy.edu.mm/handle/123456789/1097
dc.description.abstract Web pages contain not only the actual content but also other elements such as branding banners, navigational elements, advertisements, and copyright notices. Such irrelevant content is treated as noisy content, which is typically unrelated to the main subject of the web page. A method is therefore needed to extract the informative content and discard the noisy content from web pages. This system uses an integration of textual and visual importance features to extract the informative content from web pages. Initially, a web page is converted into a Document Object Model (DOM) tree. For each node in the DOM tree, textual and visual importance are calculated. Textual importance and visual importance are combined to form a hybrid density. DensitySum is then calculated and used in the content extraction algorithm to extract the informative content from web pages. The algorithm is tested on web pages from various domains and in various styles. The performance of web content extraction is measured by calculating precision and recall. en_US
dc.language.iso en en_US
dc.publisher Eighth Local Conference on Parallel and Soft Computing en_US
dc.title Informative Content Extraction for Web Page using Text Density and Vision-based Page Segmentation (VIPS) Algorithm Integration en_US
dc.type Article en_US
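
The pipeline outlined in the abstract (DOM tree, per-node density, DensitySum, content selection) can be illustrated with a minimal sketch. The record does not give the paper's formulas, so everything below is an assumption made for illustration: text density is taken as characters per tag in a subtree, the visual importance is a caller-supplied placeholder weight (hypothetical `visual_weight`, defaulting to 1.0 rather than a real VIPS score), the hybrid density is their product, and DensitySum is the sum of the children's hybrid densities. All function names are hypothetical.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # elements with no end tag


@dataclass
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    text_len: int = 0   # characters of text in the subtree (after annotate)
    tag_count: int = 0  # number of descendant tags (after annotate)


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree; not a full HTML5 tree constructor."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Pop back to the nearest open element with this tag, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].text_len += len(data.strip())


def annotate(node):
    """Post-order pass: accumulate subtree text length and tag count."""
    for child in node.children:
        annotate(child)
        node.text_len += child.text_len
        node.tag_count += 1 + child.tag_count


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    annotate(builder.root)
    return builder.root


def text_density(node):
    # Assumed definition: characters per tag in the subtree.
    return node.text_len / max(node.tag_count, 1)


def hybrid_density(node, visual_weight=lambda n: 1.0):
    # Assumed combination rule: textual density scaled by a visual weight.
    return text_density(node) * visual_weight(node)


def density_sum(node, visual_weight=lambda n: 1.0):
    # Assumed definition: sum of the children's hybrid densities.
    return sum(hybrid_density(c, visual_weight) for c in node.children)


def pick_content_node(root, visual_weight=lambda n: 1.0):
    """Return the node with the largest DensitySum (one possible selection rule)."""
    best, todo = root, [root]
    while todo:
        node = todo.pop()
        if density_sum(node, visual_weight) > density_sum(best, visual_weight):
            best = node
        todo.extend(node.children)
    return best
```

Calling `pick_content_node(build_tree(html))` on a page whose main block holds long paragraphs and whose navigation holds short links would be expected to favor the main block, since its children contribute far more text per tag. A real visual weight would have to come from rendered layout information, which this sketch does not attempt.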

