Part of Advances in Neural Information Processing Systems 4 (NIPS 1991)
I. Guyon, V. Vapnik, B. Boser, L. Bottou, S. A. Solla
The method of Structural Risk Minimization refers to tuning the capacity of the classifier to the available amount of training data. This capacity is influenced by several factors, including: (1) properties of the input space, (2) nature and structure of the classifier, and (3) learning algorithm. Actions based on these three factors are combined here to control the capacity of linear classifiers and improve generalization on the problem of handwritten digit recognition.
1 RISK MINIMIZATION AND CAPACITY
1.1 EMPIRICAL RISK MINIMIZATION
A common way of training a given classifier is to adjust the parameters w in the classification function F(x, w) to minimize the training error Etrain, i.e. the frequency of errors on a set of p training examples. Etrain estimates the expected risk based on the empirical data provided by the p available examples. The method is thus called Empirical Risk Minimization. But the classification function F(x, w*) which minimizes the empirical risk does not necessarily minimize the generalization error, i.e. the expected value of the risk over the full distribution of possible inputs and their corresponding outputs. Such generalization error Egene cannot in general be computed, but it can be estimated on a separate test set (Etest). Other ways of