Date of Submission

Spring 2024

Academic Program

Computer Science; Mathematics

Project Advisor 1

Rose Sloan

Project Advisor 2

Ethan Bloch

Abstract/Artist's Statement

Clustering algorithms provide a useful method for classifying data. The majority of well-known clustering algorithms are designed to find globular clusters; however, this is not always desirable. In this senior project I present a new clustering algorithm, GBCN (Grid Box Clustering with Noise), which applies a box grid to points in Euclidean space to identify areas of high point density. Points that lie in adjacent grid boxes are classified into the same cluster. Conversely, if every path from one point to another must traverse an empty grid box, then the two points are classified into separate clusters. GBCN requires two hyperparameters: one determines the size of the grid and the other adjusts noise sensitivity. I provide algorithms and evaluation metrics to help the user determine appropriate hyperparameter values. I performed experiments on synthetic and real-world data sets using GBCN and other clustering algorithms to evaluate GBCN's effectiveness and efficiency. The results of these experiments demonstrate that GBCN can effectively identify both globular and density-based clusters when given suitable hyperparameter values, and that these values can be discovered using evaluation metrics.
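The grid idea described in the abstract can be illustrated with a minimal Python sketch. This is not the project's implementation; it only follows the abstract's high-level description. The function name `grid_box_cluster` and the parameters `box_size` and `min_points` are hypothetical, and three choices are assumptions made for illustration: boxes are axis-aligned with a uniform side length, diagonal neighbours count as adjacent, and a box holding fewer than `min_points` points is treated as empty, with its points labelled as noise.

```python
from collections import defaultdict, deque
from itertools import product
from math import floor


def grid_box_cluster(points, box_size, min_points=1):
    """Label points by connected components of occupied grid boxes.

    Illustrative sketch only. Assumptions (not taken from the project):
    - box_size is the side length of every grid box,
    - a box with fewer than min_points points counts as empty (noise),
    - boxes touching at a face, edge, or corner are adjacent.
    Returns one label per point; -1 marks noise.
    """
    # Map each point index to the integer coordinates of its grid box.
    boxes = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(floor(coord / box_size) for coord in point)
        boxes[key].append(idx)

    # Keep only boxes that hold enough points to count as occupied.
    dense = {key for key, members in boxes.items() if len(members) >= min_points}

    labels = [-1] * len(points)
    if not dense:
        return labels

    # All neighbour offsets in {-1, 0, 1}^d except the zero offset.
    dim = len(next(iter(dense)))
    offsets = [off for off in product((-1, 0, 1), repeat=dim) if any(off)]

    seen = set()
    cluster_id = 0
    for start in sorted(dense):  # sorted for deterministic cluster numbering
        if start in seen:
            continue
        # Breadth-first flood fill over adjacent occupied boxes.
        seen.add(start)
        queue = deque([start])
        while queue:
            box = queue.popleft()
            for idx in boxes[box]:
                labels[idx] = cluster_id
            for off in offsets:
                neighbour = tuple(b + o for b, o in zip(box, off))
                if neighbour in dense and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        cluster_id += 1
    return labels
```

Under these assumptions, two points share a label exactly when a chain of adjacent occupied boxes joins them, which mirrors the abstract's rule that points separated by empty boxes fall into separate clusters. Raising `min_points` makes the sketch treat more sparse boxes as empty, so more points are labelled as noise; how GBCN itself defines its noise hyperparameter is not stated in the abstract.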

Open Access Agreement

Open Access

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

This work is protected by a Creative Commons license. Any use not permitted under that license is prohibited.
