Part of Advances in Neural Information Processing Systems 20 (NIPS 2007)
Jean-Yves Audibert
We consider the learning task of predicting as well as the best function in a finite reference set G, up to the smallest possible additive term. If R(g) denotes the generalization error of a prediction function g, then under reasonable assumptions on the loss function (typically satisfied by the least squares loss when the output is bounded), it is known that the progressive mixture rule g_n satisfies E R(g_n) < min_{g in G} R(g) + Cst (log|G|)/n, where n denotes the size of the training set and E denotes the expectation with respect to the training set distribution. This work shows that, surprisingly, for appropriate reference sets G, the deviation convergence rate of the progressive mixture rule is no better than Cst / sqrt{n}, and not the expected Cst / n. It also provides an algorithm that does not suffer from this drawback.
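As a rough illustration (not part of the paper), here is a minimal sketch of a progressive mixture rule over a finite class G: at each step it forms a Gibbs (exponentially weighted) mixture of the functions in G based on their cumulative loss on the examples seen so far, and the final prediction is the Cesàro average of these mixtures. The temperature parameter eta, the squared loss, and the helper names are assumptions made for the example.

```python
import numpy as np

def progressive_mixture_predict(G, X, Y, x_new, eta=1.0):
    """Sketch of a progressive mixture prediction at a point x_new.

    G      : list of prediction functions g: x -> real (finite reference set)
    X, Y   : training inputs/outputs of size n
    x_new  : point at which to predict
    eta    : temperature of the exponential weights (hypothetical choice)
    """
    n = len(Y)
    cum_loss = np.zeros(len(G))   # cumulative squared loss of each g on the first i examples
    step_preds = []
    for i in range(n + 1):
        w = np.exp(-eta * cum_loss)
        w /= w.sum()              # Gibbs weights based on the first i examples
        step_preds.append(sum(wg * g(x_new) for wg, g in zip(w, G)))
        if i < n:
            cum_loss += np.array([(g(X[i]) - Y[i]) ** 2 for g in G])
    # Cesàro average of the successive mixtures: the "progressive" part of the rule
    return float(np.mean(step_preds))

# Toy usage with a hypothetical two-function reference set
G = [lambda x: 0.0, lambda x: x]
X, Y = np.linspace(0, 1, 20), 0.9 * np.linspace(0, 1, 20)
print(progressive_mixture_predict(G, X, Y, x_new=0.5))
```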