Date of Submission

Spring 2015

Academic Programs and Concentrations

Mathematics

Project Advisor 1

Mary Krembs

Abstract/Artist's Statement

The K-means clustering algorithm works on a data set of n points in d-dimensional space R^d. It determines a set of K centroids in R^d, and clustering is accomplished by assigning each point in the data set to its closest centroid. However, the K-means algorithm has a few drawbacks. First, it requires the value of K, the number of centroids, to be specified in advance. Second, the algorithm begins by randomly selecting the centroid locations. Third, the accuracy of the output clusters depends on the type of clustering present in the data. In this paper we propose a distance-based definition for the clusterability of a data set using the edges in a Delaunay triangulation. We also propose a pre-processing algorithm for K-means; its output contains a range for K and initial centroid information. Our results show that a pre-processed K-means requires fewer iterations to reach completion. Using cluster evaluation techniques such as the F-measure, Purity, and Entropy, we also show that a pre-processed K-means consistently produces more accurate clusters. Finally, we propose a new clustering algorithm that uses Delaunay triangulation to obtain clusters, and we show that it produces very accurate clusters on a large number of data sets, even on data sets where K-means fails.
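As a rough illustration of the assign-and-update loop described above, the sketch below is a plain K-means written in standard-library Python. It is not the project's pre-processing algorithm. The function names (`assign`, `update`, `kmeans`), the `initial` parameter, the stopping rule (stop when the labels no longer change), and the toy data are all assumptions made for illustration.

```python
import math
import random


def assign(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        dim = len(old)
        new_centroids.append(
            tuple(sum(p[i] for p in members) / len(members) for i in range(dim))
        )
    return new_centroids


def kmeans(points, k, initial=None, seed=0, max_iter=100):
    """Plain K-means; returns (centroids, labels, number of update steps)."""
    rng = random.Random(seed)
    # Without `initial`, start from k randomly chosen data points.
    centroids = list(initial) if initial is not None else rng.sample(points, k)
    labels = assign(points, centroids)
    for iteration in range(1, max_iter + 1):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            return centroids, labels, iteration
        labels = new_labels
    return centroids, labels, max_iter


# Hypothetical toy data: two well-separated groups of three points in R^2.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# One starting centroid in each group versus both in the same group.
_, good_labels, good_iters = kmeans(points, 2, initial=[(0.0, 0.0), (10.0, 10.0)])
_, poor_labels, poor_iters = kmeans(points, 2, initial=[(0.0, 0.0), (0.0, 1.0)])
```

On this toy data both starts reach the same two clusters, but the start with one centroid per group stops after 1 update step while the start with both centroids in one group needs 2. This only shows why initial centroid information can matter; it says nothing about the results reported in the project.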

Open Access Agreement

On-Campus only

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.

Recommended Citation

Khan, Mohd Ahnaf Habib, "Pre-processing for K-means Clustering Algorithm" (2015). Senior Projects Spring 2015. 260.
https://digitalcommons.bard.edu/senproj_s2015/260

Senior Projects Spring 2015

Pre-processing for K-means Clustering Algorithm
