dc.contributor.author |
Hlaing, Nwe Nwe
|
|
dc.contributor.author |
Nyunt, Thi Thi Soe
|
|
dc.date.accessioned |
2019-11-15T04:39:32Z |
|
dc.date.available |
2019-11-15T04:39:32Z |
|
dc.date.issued |
2012-02-28 |
|
dc.identifier.uri |
http://onlineresource.ucsy.edu.mm/handle/123456789/2447 |
|
dc.description.abstract |
The World Wide Web is the main repository for
all kinds of information and has so far been very
successful in disseminating it to humans.
As web sites become more complicated, the
construction of web information extraction
systems becomes more difficult and time-consuming.
Therefore, we need to mine the main
content of a web page in order to extract
information from such pages. In this paper,
we study the problem of automatically
extracting web information (unsupervised
IE) without any learning examples or other
similar human input. First, web pages are
segmented into several raw chunks. Then
noisy blocks are removed based on product features.
Data region identification is based on the
observation that the data records in a web
document are similar in appearance. Therefore, a block
clustering method is proposed based on this
observation. This approach requires no human
intervention, and experimental results
show promising accuracy. |
en_US |
dc.language.iso |
en_US |
en_US |
dc.publisher |
Tenth International Conference On Computer Applications (ICCA 2012) |
en_US |
dc.subject |
Information Extraction (IE) |
en_US |
dc.subject |
Wrapper |
en_US |
dc.subject |
Document Object Model (DOM) |
en_US |
dc.title |
Extracting Information Content from Web Pages Using Block Clustering Method |
en_US |
dc.type |
Article |
en_US |