UCSY's Research Repository

An Efficient Approach for Web Data Extraction


dc.contributor.author Htwe, Thanda
dc.contributor.author Kham, Nang Saing Moon
dc.date.accessioned 2019-08-06T11:50:03Z
dc.date.available 2019-08-06T11:50:03Z
dc.date.issued 2009-12-30
dc.identifier.uri http://onlineresource.ucsy.edu.mm/handle/123456789/1902
dc.description.abstract Unlike conventional data or text, most Web pages contain clutter. They typically include noise data such as navigation panels, copyright and privacy notices, and advertisements. This noise can seriously harm Web mining: miners extract the whole document rather than the informative content, and retrieval returns non-relevant results. Eliminating these noise patterns is therefore of great importance. In this paper, we propose an effective technique to detect and remove various noise patterns from Web documents to enhance Web mining. Our system first builds a DOM tree structure for an incoming Web page and then splits it into subtrees to detect noise data. We also apply a back-propagation neural network algorithm to classify noise patterns, data patterns, and mixture patterns in the current Web page. The classification result of the neural network is used to eliminate the noise patterns. The proposed technique is evaluated on several commercial Web sites and news Web sites to show the performance and improvement of our approach. en_US
dc.language.iso en en_US
dc.publisher Fourth Local Conference on Parallel and Soft Computing en_US
dc.title An Efficient Approach for Web Data Extraction en_US
dc.type Article en_US
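
The abstract describes building a DOM tree for a page and splitting it into subtrees before classification. The following is a minimal sketch of that first stage only, using Python's standard html.parser. The names Node, DomBuilder and subtree_features are hypothetical, and the link-text ratio used as a feature is an assumption chosen for illustration, not a feature taken from the paper. The back-propagation classifier itself is not shown.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of a simplified DOM tree (hypothetical helper type)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class DomBuilder(HTMLParser):
    """Builds a Node tree from HTML, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag, if any.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        chunk = data.strip()
        if chunk:
            self.current.text += chunk + " "


def find(node, tag):
    """Depth-first search for the first element with the given tag."""
    if node.tag == tag:
        return node
    for child in node.children:
        hit = find(child, tag)
        if hit is not None:
            return hit
    return None


def text_len(node, inside_link=False):
    """Return (total_chars, chars_inside_links) for a subtree."""
    inside_link = inside_link or node.tag == "a"
    total = len(node.text.strip())
    linked = total if inside_link else 0
    for child in node.children:
        child_total, child_linked = text_len(child, inside_link)
        total += child_total
        linked += child_linked
    return total, linked


def subtree_features(html):
    """Split <body> into its top-level subtrees and describe each one.

    Returns a list of (tag, text_length, link_text_ratio) tuples. A high
    ratio hints at navigation-like content (an illustrative assumption).
    """
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    body = find(builder.root, "body") or builder.root
    features = []
    for sub in body.children:
        total, linked = text_len(sub)
        ratio = linked / total if total else 0.0
        features.append((sub.tag, total, round(ratio, 2)))
    return features
```

Under these assumptions, each tuple returned by subtree_features could serve as one input vector for a classifier that labels a subtree as noise, data, or mixture. A real system would likely need richer features and a recursive splitting rule, neither of which the abstract specifies.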

