Sparsity of Data Representation of Optimal Kernel Machine and Leave-one-out Estimator

Part of Advances in Neural Information Processing Systems 13 (NIPS 2000)


Authors

Adam Kowalczyk

Abstract

Vapnik's result that the expectation of the generalisation error of the optimal hyperplane is bounded by the expectation of the ratio of the number of support vectors to the number of training examples is extended to a broad class of kernel machines. The class includes Support Vector Machines for soft margin classification and regression, and Regularization Networks with a variety of kernels and cost functions. We show that key inequalities in Vapnik's result become equalities once "the classification error" is replaced by "the margin error", with the latter defined as an instance with positive cost. In particular, we show that the expectations of the true margin error and the empirical margin error are equal, and that sparse solutions for kernel machines are possible only if the cost function is "partially" insensitive.
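
For reference, the leave-one-out bound of Vapnik that the abstract builds on is usually stated as below; this is a sketch of the standard formulation, and the exact sample-size indexing is an assumption rather than something taken from this paper:

\[
  \mathbb{E}\bigl[\,\mathrm{err}(f_{\ell})\,\bigr]
  \;\le\;
  \frac{\mathbb{E}\bigl[\,\#\mathrm{SV}_{\ell+1}\,\bigr]}{\ell+1},
\]

where $\mathrm{err}(f_{\ell})$ denotes the generalisation error of the optimal hyperplane trained on $\ell$ examples, $\#\mathrm{SV}_{\ell+1}$ denotes the number of support vectors obtained from a training set of size $\ell+1$, and both expectations are taken over random draws of the training set. The paper's contribution, as the abstract states, is to extend this inequality to a broad class of kernel machines and to identify conditions under which the key inequalities become equalities.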