Abstract:
Clustering is the task of discovering groups of
similar objects or items, and it has many
applications, such as image segmentation,
document retrieval, and data mining.
The increasing volume of information produced by
advances in technology makes clustering very
large data sets a challenging task.
The differential evolution (DE) algorithm is an
innovative evolutionary algorithm (EA) for global
optimization in which the mutation operator is based
on the distribution of solutions in the population.
Clustering can be viewed as an optimization problem
in which the task is to find the optimal cluster solution.
For very large data sets, however, classical DE is
so time-consuming that it becomes infeasible. This
paper proposes a parallel differential evolution
algorithm, based on the Spark framework, for
clustering very large data sets. The proposed approach
is designed to be efficient for large-scale data clustering.
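
To make the idea of clustering as an optimization problem concrete, the following is a minimal sequential sketch of classical DE applied to clustering. It is not the parallel Spark algorithm proposed in this paper. Several details are assumptions made only for illustration: each individual encodes k centroids, the fitness is the sum of squared distances from each point to its nearest centroid, the variant is DE/rand/1/bin, and the names `sse`, `de_cluster`, `f`, and `cr` are hypothetical.

```python
import random


def sse(centroids, points):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


def de_cluster(points, k, pop_size=20, generations=50, f=0.5, cr=0.9, seed=0):
    """Classical DE (assumed variant: DE/rand/1/bin) over flattened centroid vectors.

    Requires pop_size >= 4 so that three distinct donors exist for each target.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    n_genes = k * dim

    def decode(vec):
        # Split a flat vector into k centroids of `dim` coordinates each.
        return [vec[i * dim:(i + 1) * dim] for i in range(k)]

    # Initialise each individual from k randomly chosen data points.
    pop = []
    for _ in range(pop_size):
        chosen = rng.sample(points, k)
        pop.append([x for c in chosen for x in c])
    fit = [sse(decode(ind), points) for ind in pop]
    history = [min(fit)]  # best fitness after initialisation and each generation

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than the target.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n_genes)  # guarantees one mutated gene
            trial = []
            for j in range(n_genes):
                # Binomial crossover between the mutant and the target.
                if rng.random() < cr or j == j_rand:
                    trial.append(pop[a][j] + f * (pop[b][j] - pop[c][j]))
                else:
                    trial.append(pop[i][j])
            # Greedy selection: keep the trial only if it is no worse.
            trial_fit = sse(decode(trial), points)
            if trial_fit <= fit[i]:
                pop[i], fit[i] = trial, trial_fit
        history.append(min(fit))

    best = min(range(pop_size), key=fit.__getitem__)
    return decode(pop[best]), fit[best], history
```

In this sketch every fitness evaluation scans all data points, so the cost per generation grows with both the population size and the data size. That per-point distance computation is the kind of work a data-parallel framework could distribute.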