Part of Advances in Neural Information Processing Systems 9 (NIPS 1996)
Peter Sollich, David Barber
We analyse online learning from finite training sets at non-infinitesimal learning rates η. By an extension of statistical mechanics methods, we obtain exact results for the time-dependent generalization error of a linear network with a large number of weights N. We find, for example, that for small training sets of size p ~ N, larger learning rates can be used without compromising asymptotic generalization performance or convergence speed. Encouragingly, for optimal settings of η (and, less importantly, weight decay λ) at given final learning time, the generalization performance of online learning is essentially as good as that of offline learning.
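The abstract compares online learning (one example per update) with offline, batch-gradient learning for a linear network trained with learning rate η and weight decay λ. The sketch below is not the authors' code; it is a minimal illustration of that setting, with an assumed noiseless linear teacher, illustrative parameter values (eta, lam, steps), and a simple generalization-error measure, intended only to make the two update rules concrete.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's method): online vs. offline
# gradient descent for a linear network y = w . x, with learning rate eta
# and weight decay lam. Teacher, data, and hyperparameters are illustrative.

rng = np.random.default_rng(0)
N, p = 100, 100                      # number of weights and training-set size (p ~ N)
w_teacher = rng.standard_normal(N) / np.sqrt(N)
X = rng.standard_normal((p, N))
y = X @ w_teacher                    # noiseless teacher outputs

eta, lam, steps = 0.05, 1e-3, 5000   # assumed values for illustration

def gen_error(w):
    # Mean squared weight deviation from the teacher, used here as a
    # stand-in for the generalization error of the linear student.
    return 0.5 * np.mean((w - w_teacher) ** 2)

# Online learning: one randomly drawn training example per weight update.
w_on = np.zeros(N)
for _ in range(steps):
    mu = rng.integers(p)
    err = y[mu] - X[mu] @ w_on
    w_on += eta * (err * X[mu] - lam * w_on)

# Offline (batch) learning: gradient of the full training error per update.
w_off = np.zeros(N)
for _ in range(steps):
    grad = X.T @ (y - X @ w_off) / p - lam * w_off
    w_off += eta * grad

print(f"online  generalization error: {gen_error(w_on):.4f}")
print(f"offline generalization error: {gen_error(w_off):.4f}")
```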