Abstract:
Document clustering is a text-processing task that groups documents with similar concepts. Clustering is defined as the process of partitioning a set of objects (patterns) into a set of disjoint groups (clusters). Its goal is to reduce the amount of data by grouping similar data items together and thereby to obtain useful information. Clustering methods can be divided into two basic types: hierarchical and partitional clustering. This system uses two partitional clustering methods: the Self-Organizing Map (SOM) and K-Means. The Self-Organizing Map is an artificial neural network model that is well suited to mapping high-dimensional data onto a two-dimensional representation space, and SOM clustering is one of the well-known unsupervised clustering techniques. The goal of K-Means is to find k points of a dataset that best represent the dataset in a certain mathematical sense (detailed later). These k points are also known as cluster centers, prototypes, centroids, or code words. K-Means and its variants form the best-known class of partitional clustering algorithms. In this paper, documents are clustered with the SOM algorithm according to how they are related to each other, and with K-Means, which starts by randomly selecting k points as cluster means and then assigns each document to its nearest cluster mean.
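The K-Means procedure summarized above (pick k random points as cluster means, then assign each document to its nearest mean) can be illustrated with a minimal sketch. The function names `squared_distance`, `nearest`, and `k_means`, the toy term-count vectors, the fixed random seed, and the use of squared Euclidean distance are all assumptions made for illustration; the abstract does not state how documents are represented or which distance measure the system uses.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, means):
    """Index of the cluster mean closest to the given point."""
    return min(range(len(means)), key=lambda j: squared_distance(point, means[j]))


def k_means(vectors, k, max_iters=100, seed=0):
    """Hypothetical helper: cluster vectors into k groups with plain K-Means."""
    rng = random.Random(seed)
    # Start by randomly selecting k of the input points as the cluster means.
    means = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each document goes to its nearest cluster mean.
        new_labels = [nearest(v, means) for v in vectors]
        if new_labels == labels:
            break  # assignments no longer change, so the procedure has converged
        labels = new_labels
        # Update step: recompute each mean from the documents assigned to it.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old mean if a cluster ends up empty
                means[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, means


# Toy term-count vectors (invented data): two documents dominated by the
# first two terms, two dominated by the last two.
docs = [
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
]
labels, means = k_means(docs, k=2)
print(labels)
```

On this toy input the first two documents end up in one cluster and the last two in the other; which cluster receives index 0 depends on the random initial selection. A real system would first convert documents to numeric vectors (for example, weighted term frequencies) before applying this step.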