Date of Submission
Spring 2024
Academic Program
Computer Science; Mathematics
Project Advisor 1
Rose Sloan
Project Advisor 2
Ethan Bloch
Abstract/Artist's Statement
Clustering algorithms provide a useful method for classifying data. Most well-known clustering algorithms are designed to find globular clusters, but this is not always desirable. In this senior project I present a new clustering algorithm, GBCN (Grid Box Clustering with Noise), which applies a box grid to points in Euclidean space to identify areas of high point density. Points that lie in adjacent grid boxes are classified into the same cluster. Conversely, if every path from one point to another must traverse an empty grid box, the two points are classified into separate clusters. GBCN requires two hyperparameters: one determines the size of the grid and the other adjusts noise sensitivity. I provide algorithms and evaluation metrics to help the user determine appropriate hyperparameter values. I performed experiments on synthetic and real-world data sets using GBCN and other clustering algorithms to evaluate GBCN's effectiveness and efficiency. The results of these experiments demonstrate that GBCN can effectively identify both globular and density-based clusters when given suitable hyperparameter values, and that these values can be discovered using evaluation metrics.
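The grid-and-adjacency idea in the abstract can be illustrated with a minimal Python sketch. This is not the project's implementation. The function name `grid_box_clusters` and the parameters `box_size` and `min_points` are hypothetical stand-ins for the two hyperparameters. The sketch also assumes that boxes sharing a face, edge, or corner count as adjacent, and that a box holding fewer than `min_points` points is treated as empty, with its points labeled as noise. The abstract specifies neither detail.

```python
from collections import defaultdict, deque
from itertools import product
from math import floor


def grid_box_clusters(points, box_size, min_points=1):
    """Label points by connected groups of occupied grid boxes; -1 marks noise."""
    # Map each grid box (a tuple of integer indices) to the points it contains.
    boxes = defaultdict(list)
    for i, p in enumerate(points):
        boxes[tuple(floor(x / box_size) for x in p)].append(i)

    # Assumption: a box with fewer than min_points points counts as empty (noise).
    dense = {b for b, idx in boxes.items() if len(idx) >= min_points}

    labels = [-1] * len(points)
    if not dense:
        return labels
    dim = len(next(iter(dense)))
    # Assumption: boxes are adjacent when they share a face, edge, or corner.
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    seen = set()
    cluster = 0
    for start in sorted(dense):
        if start in seen:
            continue
        # Breadth-first search over adjacent dense boxes forms one cluster.
        seen.add(start)
        queue = deque([start])
        while queue:
            box = queue.popleft()
            for i in boxes[box]:
                labels[i] = cluster
            for o in offsets:
                nb = tuple(b + d for b, d in zip(box, o))
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        cluster += 1
    return labels
```

For example, with `box_size=1.0` the points `(0.1, 0.1)`, `(0.9, 0.2)`, and `(1.2, 0.8)` fall in the adjacent boxes `(0, 0)` and `(1, 0)` and receive the same label, while a point at `(9.5, 0.5)` is separated from them by empty boxes and receives its own label.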
Open Access Agreement
Open Access
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Recommended Citation
Lazar, Josef, "An Unsupervised Machine Learning Algorithm for Clustering Low Dimensional Data Points in Euclidean Grid Space" (2024). Senior Projects Spring 2024. 164.
https://digitalcommons.bard.edu/senproj_s2024/164
This work is protected by a Creative Commons license. Any use not permitted under that license is prohibited.
Included in
Analysis Commons, Applied Mathematics Commons, Applied Statistics Commons, Artificial Intelligence and Robotics Commons, Categorical Data Analysis Commons, Data Science Commons, Discrete Mathematics and Combinatorics Commons, Numerical Analysis and Scientific Computing Commons, Other Statistics and Probability Commons, Set Theory Commons