Optimal Stopping and Effective Machine Complexity in Learning

Changfeng Wang, Santosh S. Venkatesh, J. Stephen Judd

Advances in Neural Information Processing Systems 6 (NIPS 1993)

We study the problem of when to stop learning a class of feedforward networks - networks with linear output neurons and fixed input weights - when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown that there are in general three distinct phases in the generalization performance in the learning process, and in particular, the network has better generalization performance when learning is stopped at a certain time before the global minimum of the empirical error is reached. A notion of effective size of a machine is defined