Abstract:

The amount of text generated is increasing day by day. This large volume of unstructured text cannot be simply processed and understood by computers. Therefore, efficient and effective techniques are required to find useful patterns. Text mining is the task of extracting meaningful information from text. Clustering is one of the data mining tasks and is important for text mining. Text clustering is mainly described as the grouping of similar documents within a large collection of unstructured documents. In this paper, we compare Euclidean distance and Manhattan distance based on execution time, number of iterations, number of clusters, and sum of squared errors, using clustering algorithms such as simple K means and hierarchical clustering.


Keywords: Text Mining, Clustering, Simple K means, Hierarchical clustering, Euclidean distance, Manhattan distance.