Part of Advances in Neural Information Processing Systems 23 (NIPS 2010)
Christos Boutsidis, Anastasios Zouzias, Petros Drineas
This paper discusses the topic of dimensionality reduction for k-means clustering. We prove that any set of n points in d dimensions (rows in a matrix A∈\RRn×d) can be projected into t=Ω(k/\eps2) dimensions, for any \eps∈(0,1/3), in O(nd⌈\eps−2k/log(d)⌉) time, such that with constant probability the optimal k-partition of the point set is preserved within a factor of 2+\eps. The projection is done by post-multiplying A with a d×t random matrix R having entries +1/√t or −1/√t with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.