Abstract:
The k-nearest neighbor algorithm is one of the
most popular classification methods in machine
learning. However, because k-nearest neighbor is a
lazy learning method, a system that relies on a
large amount of historical data suffers degraded
processing performance. Many researchers
usually consider only classification accuracy,
but estimation speed also plays an essential
role in real-time prediction systems. To address this issue,
this research proposes correlation coefficient-based
k-means clustering for k-nearest neighbor,
which aims to improve the performance of k-nearest
neighbor classification by reducing processing
time. For the experiments, we used
three real data sets (Breast Cancer, Breast Tissue,
and Iris) from the UCI Machine Learning Repository.
In addition, real traffic data collected at the
Ojana junction on Route 58, Okinawa, Japan, were
also used to demonstrate the efficiency of this method.
Using these data sets, we show that the new
approach achieves better processing performance
and prediction accuracy by comparing the classical
k-nearest neighbor with the proposed k-nearest
neighbor.