UCSY's Research Repository

Web Page Categorization Based on Content and Data Extraction for Academic Community

dc.contributor.author Phyu, Sabai
dc.contributor.author Linn, Khaing Wah Wah
dc.date.accessioned 2019-07-02T07:05:46Z
dc.date.available 2019-07-02T07:05:46Z
dc.date.issued 2014-02-17
dc.identifier.uri http://onlineresource.ucsy.edu.mm/handle/123456789/82
dc.description.abstract The web holds a vast amount of data, which makes it difficult to find information in a user's field of interest (here, the IT academic field). Web pages therefore need to be categorized so that users can reach their field of interest easily; web page categorization also helps improve the quality of web search. In this paper, we propose a framework for web data extraction that uses categorized web pages to improve the accuracy of the extracted data. First, a set of test web pages is defined as input. We apply the Vision-based Page Segmentation (VIPS) algorithm to segment these pages, obtaining a content structure that is used to clean the pages and to identify the informative (main content) blocks. The main content is then categorized with a Support Vector Machine (SVM), which gives accurate and efficient results. The categorized web pages are stored in a database (the IT library) so that accurate data can be returned in response to user queries. en_US
dc.language.iso en en_US
dc.publisher Twelfth International Conference On Computer Applications (ICCA 2014) en_US
dc.subject VIPS en_US
dc.subject SVM en_US
dc.subject Web Page Segmentation en_US
dc.subject Categorization en_US
dc.subject Data extraction en_US
dc.title Web Page Categorization Based on Content and Data Extraction for Academic Community en_US
dc.type Article en_US
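
The categorization and storage steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the VIPS stage has already produced the main-content text of each page, uses a small two-class linear SVM trained by hinge-loss subgradient descent in place of whatever SVM setup the paper used, and stands in for the IT library with an in-memory dictionary. The training texts, the category names ("AI", "Networking"), the page identifiers, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def featurize(text):
    """Bag-of-words term counts for one main-content block."""
    return Counter(text.lower().split())

def train_linear_svm(samples, epochs=20, lr=0.1, lam=0.01):
    """Train a two-class linear SVM by hinge-loss subgradient descent.

    samples: list of (text, label) pairs with label +1 or -1.
    Returns (weights, bias).
    """
    w, b = defaultdict(float), 0.0
    for _ in range(epochs):
        for text, y in samples:
            x = featurize(text)
            score = sum(w[t] * c for t, c in x.items()) + b
            # L2 regularization: shrink every weight a little each step.
            for t in w:
                w[t] *= 1.0 - lr * lam
            # Hinge loss is active only when the margin is below 1.
            if y * score < 1.0:
                for t, c in x.items():
                    w[t] += lr * y * c
                b += lr * y
    return dict(w), b

def categorize(text, model, positive="AI", negative="Networking"):
    """Assign one of two (hypothetical) categories to a content block."""
    w, b = model
    score = sum(w.get(t, 0.0) * c for t, c in featurize(text).items()) + b
    return positive if score >= 0 else negative

def build_library(pages, model):
    """Group page identifiers by predicted category (stand-in for the IT library)."""
    library = defaultdict(list)
    for page_id, content in pages:
        library[categorize(content, model)].append(page_id)
    return dict(library)

# Hypothetical training data: main-content text with +1 (AI) or -1 (Networking).
TRAINING = [
    ("neural network training deep learning model", +1),
    ("machine learning classifier training data", +1),
    ("tcp ip router network packet switching", -1),
    ("wireless router packet protocol bandwidth", -1),
]

model = train_linear_svm(TRAINING)
library = build_library(
    [("page-a", "deep learning model"), ("page-b", "packet router protocol")],
    model,
)
```

A user query for a category then reduces to a lookup such as `library.get("AI", [])`. A real system would need more than two categories (for example one-vs-rest classifiers) and a persistent database.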