Size Regularized Cut for Data Clustering

Part of Advances in Neural Information Processing Systems 18 (NIPS 2005)

Bibtex Metadata Paper

Authors

Yixin Chen, Ya Zhang, Xiang Ji

Abstract

We present a novel spectral clustering method that enables users to incorporate prior knowledge of the size of clusters into the clustering process. The cost function, which is named size regularized cut (SRcut), is defined as the sum of the inter-cluster similarity and a regularization term measuring the relative size of two clusters. Finding a partition of the data set to minimize SRcut is proved to be NP-complete. An approximation algorithm is proposed to solve a relaxed version of the optimization problem as an eigenvalue problem. Evaluations over different data sets demonstrate that the method is not sensitive to outliers and performs better than normalized cut.