dc.description.abstract |
Classification is a form of data analysis that can
be used to extract models describing important data
classes or to predict future data trends. Data
classification is a two-step process. This system
studies the Naïve Bayesian classifier and uses it to
predict the class labels of data sets. In this system,
the classifier is built on the training data sets and
then tested on unseen data sets. The accuracy of the
classifier is then measured using the F1-measure
(F1-score). The
Naïve Bayesian (NB) classifier has been one of the
most popular techniques, serving as the basis of many
classification applications both theoretically and
practically. Before the classifier is built, standard
text documents are read, stop words and punctuation
are removed, words are stemmed using the Porter
stemming algorithm, and features are then extracted
using keyword-based bigram probabilities as a
preprocessing step. The experiment is performed
on standard research documents from IEEE and ACM.
The system determines the category of a document,
such as medicine, computer, engineering, or
agriculture, by using the Naïve Bayesian classifier. |
en_US |