dc.description.abstract |
A central problem in machine learning is
identifying a representative set of features from
which to construct a classification model for a
particular task. A good feature set, one whose
features are highly correlated with the class, not
only improves the efficiency of classification
algorithms but also improves classification
accuracy. Modified-Multiple Correspondence
Analysis (M-MCA or MCA with Geometrical
Representation) explores the correlation between
different features and classes to score the
features for feature selection. The dependence
between a feature and a class is measured by the
p-value, a value derived from the χ² distance and
a standard measure of the reliability of a
relation. The smaller the p-value, the more likely
it is that the correlation between a feature and a
class is genuine.
In this paper, the conventional confidence
interval of Multiple Correspondence Analysis
(MCA) is modified to obtain smaller p-values and
more reliable correlations. To evaluate the
performance of the proposed Modified-MCA,
experiments are carried out on benchmark datasets
identified and provided by WEKA and the UCI
repository. In the
experiments, Naïve Bayes, Decision Table and
JRip are used as the classifiers. The proposed
Modified-MCA demonstrates promising results
and performs better than the well-known feature
selection method MCA. The results show that the
proposed method outperforms MCA in terms of
classification accuracy and significantly reduces
the size of the feature subspace. |
en_US |