Abstract:
Web pages contain both informative and non-informative content. Systems that help users meet their information needs, such as web page indexing systems, web page classification and clustering systems, and web information extraction systems, must distinguish precisely between informative and non-informative content. Making this distinction correctly requires accurate segmentation of a web page into semantic blocks. DOM-based segmentation approaches have been used to segment web pages into semantic blocks, but their results are not satisfactory. Vision-based segmentation approaches have therefore been applied to web page segmentation; however, these approaches also have drawbacks. A better vision-based web page segmentation method is therefore needed. This paper proposes a new algorithm, the Effective Visual Block Extractor (EVBE), which overcomes the problems of DOM-based approaches and reduces the drawbacks of previous work on web page segmentation.