Part of Advances in Neural Information Processing Systems 13 (NIPS 2000)
Michael Tipping
'Kernel' principal component analysis (PCA) is an elegant nonlinear generalisation of the popular linear data analysis method, where a kernel function implicitly defines a nonlinear transformation into a feature space wherein standard PCA is performed. Unfortunately, the technique is not 'sparse', since the components thus obtained are expressed in terms of kernels associated with every training vector. This paper shows that by approximating the covariance matrix in feature space by a reduced number of example vectors, using a maximum-likelihood approach, we may obtain a highly sparse form of kernel PCA without loss of effectiveness.
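To illustrate the non-sparsity described above, the following is a minimal sketch of standard (non-sparse) kernel PCA, not the sparse method proposed in the paper. The Gaussian kernel, the value of `gamma`, the toy data and the function names are all assumptions made for illustration, and power iteration stands in for a full eigendecomposition to keep the example dependency-free.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; the kernel choice and gamma are assumptions."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def centered_kernel_matrix(X, gamma=0.5):
    """Kernel matrix with the feature-space mean removed."""
    n = len(X)
    K = [[rbf_kernel(xi, xj, gamma) for xj in X] for xi in X]
    row_mean = [sum(row) / n for row in K]  # K is symmetric: row mean == column mean
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def leading_component(X, gamma=0.5, iters=500):
    """Power iteration for the top eigenvector of the centered kernel matrix.

    Returns one weight per training vector, plus the associated eigenvalue.
    Assumes at least two distinct training vectors.
    """
    Kc = centered_kernel_matrix(X, gamma)
    n = len(X)
    v = [float(i + 1) for i in range(n)]  # any non-constant start vector
    for _ in range(iters):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    Kv = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigval = sum(vi * kvi for vi, kvi in zip(v, Kv))  # Rayleigh quotient
    return v, eigval


# Hypothetical toy data: six 2-D training vectors.
X = [[0.0, 0.1], [1.0, 0.9], [2.1, 2.0], [3.0, 3.2], [4.2, 3.9], [5.0, 5.1]]
alpha, eigval = leading_component(X)
nonzero = sum(1 for a in alpha if abs(a) > 1e-12)
print(f"{nonzero} of {len(X)} training vectors carry a nonzero weight")
```

The component is represented by one weight per training vector, and in general none of these weights vanish, so evaluating it on new data requires a kernel evaluation against the whole training set. This is the cost that a sparse formulation aims to reduce.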