Abstract:
Features are among the most important components of a machine learning model and strongly influence its performance. Removing noisy and irrelevant features increases the efficiency and accuracy of the model. The recent growth in feature dimensionality poses a significant challenge to many existing feature selection methods in terms of learning accuracy. In this work, an approach that combines correlation and distance measures is presented for selecting informative features in high-dimensional classification problems. First, the Pearson Correlation Coefficient is used to measure the relevance between each attribute and the class. Then, the Euclidean Distance is applied to measure the redundancy between two feature vectors. To evaluate the proposed approach, various classifiers are applied with ten-fold cross-validation on five well-known datasets from the UCI Machine Learning Repository. The results show that, for all five datasets, the attribute subsets generated by the proposed algorithm are more likely to be truly representative of classifier performance than the original attribute sets.
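The relevance/redundancy idea described above can be illustrated with a minimal sketch. This is not the authors' algorithm: the greedy ordering, the z-score standardization before computing distances, and the thresholds `min_relevance` and `min_distance` are all hypothetical choices made only for illustration. Features are ranked by the absolute Pearson correlation with the class label, and a feature is kept only if its standardized vector lies at least `min_distance` (Euclidean) away from every feature already selected.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant feature or label: treat as uncorrelated
    return cov / (sx * sy)


def zscore(col):
    """Standardize a feature so distances do not depend on its scale (assumption)."""
    n = len(col)
    m = sum(col) / n
    s = math.sqrt(sum((a - m) ** 2 for a in col) / n)
    return [0.0] * n if s == 0 else [(a - m) / s for a in col]


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def select_features(columns, labels, min_relevance=0.1, min_distance=1.0):
    """Hypothetical greedy filter: relevance via |Pearson r|, redundancy via distance.

    Returns the indices of the selected columns, most relevant first.
    """
    relevance = [abs(pearson(col, labels)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: relevance[j], reverse=True)
    standardized = [zscore(col) for col in columns]
    selected = []
    for j in order:
        if relevance[j] < min_relevance:
            break  # remaining features are even less relevant
        # Keep j only if it is not too close to (i.e. redundant with) any kept feature.
        if all(euclidean(standardized[j], standardized[k]) >= min_distance
               for k in selected):
            selected.append(j)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 0, 1]
    f0 = [1.0, 1.2, 3.1, 2.9, 0.8, 3.3]  # strongly related to the class
    f1 = [2 * a for a in f0]             # scaled copy of f0, hence redundant
    f2 = [1, 5, 3, 5, 3, 1]              # zero correlation with the class
    # Keeps exactly one of the duplicated columns 0 and 1; column 2 is dropped.
    print(select_features([f0, f1, f2], labels))
```

Standardizing before measuring distance is a design choice in this sketch: without it, a feature that is merely a rescaled copy of another (like `f1` above) would sit far away in raw Euclidean terms and would not be flagged as redundant.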