Date of Submission
Sven Anderson, Rebecca Thomas
In this project, documents drawn from predefined classes are clustered. The clustering uses non-negative matrix factorization, computed by an approximation method called rank-one residue iterations. To employ this method, the optimal number of clusters and the cluster sparsity have to be determined. Normalized mutual information measures how well the clustering represents the original class structure, and this measure is used to find the optimal number of clusters and the optimal sparsity.
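As a rough illustration of the selection criterion described in the abstract, the sketch below computes normalized mutual information between known class labels and cluster assignments, then picks the candidate clustering with the highest score. It is not taken from the project itself. The function names are hypothetical, and normalizing by the geometric mean of the two entropies is an assumption, since the abstract does not say which normalization was used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(classes, clusters):
    """Mutual information between class labels and cluster labels."""
    n = len(classes)
    class_counts = Counter(classes)
    cluster_counts = Counter(clusters)
    joint_counts = Counter(zip(classes, clusters))
    mi = 0.0
    for (c, k), n_ck in joint_counts.items():
        # p(c, k) * log( p(c, k) / (p(c) * p(k)) )
        mi += (n_ck / n) * math.log(n * n_ck / (class_counts[c] * cluster_counts[k]))
    return mi


def normalized_mutual_information(classes, clusters):
    """NMI = I(C; K) / sqrt(H(C) * H(K)).

    Assumption: geometric-mean normalization; other variants divide by
    the arithmetic mean or the maximum of the two entropies.
    """
    if len(classes) != len(clusters):
        raise ValueError("label lists must have the same length")
    h_classes = entropy(classes)
    h_clusters = entropy(clusters)
    if h_classes == 0.0 or h_clusters == 0.0:
        # Convention chosen here: a single class or single cluster scores 0.
        return 0.0
    return mutual_information(classes, clusters) / math.sqrt(h_classes * h_clusters)


def best_setting(classes, candidates):
    """Return the key of the candidate clustering with the highest NMI.

    `candidates` maps a setting (for example a cluster count or a
    sparsity level) to the cluster labels produced under that setting.
    """
    return max(
        candidates,
        key=lambda s: normalized_mutual_information(classes, candidates[s]),
    )
```

Under these assumptions, a clustering that reproduces the classes up to relabeling scores 1, and one that is statistically independent of the classes scores 0. The same `best_setting` helper could be applied to a grid of sparsity levels as well as cluster counts.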
Access restricted to On-Campus only
Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.
Tsikhanovich, Maksim, "An information theoretic approach to determining sparsity in clustering classified documents" (2011). Senior Projects Spring 2011. 85.