Abstract:
Clustering is the process of grouping a set of
physical or abstract objects into classes of similar
objects. A cluster is a collection of data objects
that are similar to one another and dissimilar to
the objects in other clusters. Measuring the dissimilarity
between data objects is one of the primary tasks for
distance-based techniques in data mining and
machine learning, e.g., distance-based clustering
and distance-based classification. The quality of
clustering can be assessed based on dissimilarity
measures of objects, which can be computed for
various types of data. In this paper, we propose a
general framework for measuring the dissimilarity
between data objects in data analysis. The key
idea is to consider the dissimilarity between two
values of an attribute as a combination of
dissimilarities between the conditional probability
distributions of other attributes given these two
values. In this system, similarity is estimated by
computing the dissimilarity measure between two
objects. This identifies the most similar and the
least similar values before clustering analysis.
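
The key idea stated above can be illustrated with a minimal sketch. It
is not the framework's actual implementation: the abstract does not say
which distance is used between conditional distributions or how the
per-attribute distances are combined, so this sketch assumes total
variation distance and a plain average. The function names and the toy
data are hypothetical.

```python
from collections import Counter


def conditional_distribution(rows, target, given, value):
    """Empirical P(target | given == value) as a dict of probabilities."""
    counts = Counter(r[target] for r in rows if r[given] == value)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    Assumption: the abstract does not name a distance; this is one choice.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def value_dissimilarity(rows, attribute, x, y):
    """Dissimilarity between values x and y of `attribute`.

    For every other attribute B, compare P(B | attribute = x) with
    P(B | attribute = y), then average the distances (the averaging
    step is an assumed way of combining them).
    """
    others = [a for a in rows[0] if a != attribute]
    if not others:
        return 0.0
    dists = [
        total_variation(
            conditional_distribution(rows, b, attribute, x),
            conditional_distribution(rows, b, attribute, y),
        )
        for b in others
    ]
    return sum(dists) / len(dists)


# Hypothetical toy data set with three categorical attributes.
rows = [
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "pink", "shape": "round", "size": "small"},
    {"color": "pink", "shape": "round", "size": "large"},
    {"color": "blue", "shape": "square", "size": "large"},
    {"color": "blue", "shape": "square", "size": "large"},
]

red_pink = value_dissimilarity(rows, "color", "red", "pink")
red_blue = value_dissimilarity(rows, "color", "red", "blue")
```

In this toy data, "red" and "pink" co-occur with similar shapes and
sizes, so their dissimilarity is smaller than that of "red" and "blue".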