Web Content Classification using Content Features and Ant Colony Optimization Algorithm

Aye, Nilar; San, Pan Ei

dc.contributor.author	Aye, Nilar
dc.contributor.author	San, Pan Ei
dc.date.accessioned	2019-07-03T08:26:26Z
dc.date.available	2019-07-03T08:26:26Z
dc.date.issued	2016-02-25
dc.identifier.uri	http://onlineresource.ucsy.edu.mm/handle/123456789/343
dc.description.abstract	The web content classification system classifies blocks of HTML web pages as noise or content. The system proposes a Content Extraction algorithm that uses content features to remove boilerplate and extract the main content from a web page. Observation of HTML tags shows that a single line may not contain a complete piece of information and that long texts are distributed across nearby lines, so the system uses the Text-Block Concept to determine the distance between any two neighboring lines that contain text. Feature Extraction then computes Text Density (TD), Anchor Link Density (ALD) and a new feature, Title Keywords Density (TKD), to distinguish noise from content. After extracting these features, the system uses the C4.8 decision tree method to classify each block as content or non-content. After extracting the main content, the system applies a new classification algorithm, Ant Colony Optimization (ACO), which is able to handle discrete problems and the discreteness of text document features. Texts are classified by the crawling of class-population ants, which carry class information, to find an optimal matching path as the algorithm iterates. Finally, the system is of growing interest because the classifier improves its performance with experience.	en_US
dc.language.iso	en	en_US
dc.publisher	Fourteenth International Conference On Computer Applications (ICCA 2016)	en_US
dc.title	Web Content Classification using Content Features and Ant Colony Optimization Algorithm	en_US
dc.type	Article	en_US
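
The abstract names three block-level features (TD, ALD, TKD) but does not give their formulas. The following is a minimal sketch of how such features might be computed, not the authors' implementation. The definitions used here are assumptions: TD as words per line of a block, ALD as the share of a block's words that sit inside anchor tags, and TKD as the share of a block's words that also occur in the page title. The names `Block`, `tokens` and `block_features` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

WORD = re.compile(r"[A-Za-z0-9]+")


@dataclass
class Block:
    """Hypothetical text block: adjacent lines grouped together."""
    lines: List[str]   # visible text of each line in the block
    anchor_text: str   # text that appears inside <a> tags in the block


def tokens(text: str) -> List[str]:
    """Lower-cased alphanumeric tokens of a string."""
    return [w.lower() for w in WORD.findall(text)]


def block_features(block: Block, title: str) -> Dict[str, float]:
    """Compute assumed versions of TD, ALD and TKD for one block."""
    words = [w for line in block.lines for w in tokens(line)]
    n = len(words)
    if n == 0:
        # An empty block carries no evidence either way.
        return {"TD": 0.0, "ALD": 0.0, "TKD": 0.0}
    title_words = set(tokens(title))
    td = n / len(block.lines)                       # words per line (assumed TD)
    ald = len(tokens(block.anchor_text)) / n        # anchor words / all words
    tkd = sum(w in title_words for w in words) / n  # title words / all words
    return {"TD": td, "ALD": ald, "TKD": tkd}


# Example blocks: one article-like, one navigation-like.
title = "Web Content Classification using Content Features"
article = Block(
    lines=["Ant colony optimization classifies web text",
           "using content features"],
    anchor_text="",
)
nav = Block(
    lines=["Home", "Conferences", "View Item"],
    anchor_text="Home Conferences View Item",
)
article_features = block_features(article, title)
nav_features = block_features(nav, title)
```

Under these assumed definitions, a navigation block tends to have a high ALD and a low TKD, while a main-content block tends toward the opposite, which is the kind of separation a decision tree could then learn from labeled blocks.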