Part of Advances in Neural Information Processing Systems 9 (NIPS 1996)
Genevieve Orr
Stochastic (on-line) learning can be faster than batch learning. However, at late times, the learning rate must be annealed to remove the noise present in the stochastic weight updates. In this annealing phase, the convergence rate (in mean square) is at best proportional to 1/T, where T is the number of input presentations. An alternative is to increase the batch size to remove the noise. In this paper we explore convergence for LMS using 1) small but fixed batch sizes and 2) an adaptive batch size. We show that the best adaptive batch schedule is exponential and has a rate of convergence which is the same as for annealing, i.e., at best proportional to 1/T.
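To make the two noise-removal strategies concrete, here is a minimal sketch on a toy one-dimensional LMS problem. The setup (the target weight w_true, the noise level, the sample helper, and the eta0 and growth parameters) is illustrative and not from the paper; it only shows how a 1/t annealed learning rate and a fixed learning rate with an exponentially growing batch size each suppress the noise in the weight updates.

```python
import numpy as np

# Hypothetical 1-D LMS problem: estimate w_true from noisy targets
# y = w_true * x + noise. All names and constants here are illustrative.
rng = np.random.default_rng(0)
w_true = 2.0

def sample(n):
    """Draw n input/target pairs for the toy problem."""
    x = rng.normal(size=n)
    y = w_true * x + 0.1 * rng.normal(size=n)
    return x, y

def lms_annealed(T, eta0=0.5):
    """Stochastic LMS: one example per update, 1/t annealed learning rate."""
    w = 0.0
    for t in range(1, T + 1):
        x, y = sample(1)
        w += (eta0 / t) * (y[0] - w * x[0]) * x[0]
    return w

def lms_exponential_batches(T, eta=0.5, growth=2):
    """LMS with a fixed learning rate and exponentially growing batches.
    Averaging the gradient over a batch of size m reduces the update
    noise by a factor of m, playing the role that annealing eta plays
    in the stochastic version."""
    w, presented, m = 0.0, 0, 1
    while presented + m <= T:          # T counts total input presentations
        x, y = sample(m)
        grad = np.mean((y - w * x) * x)  # batch-averaged LMS gradient
        w += eta * grad
        presented += m
        m *= growth                      # exponential batch schedule
    return w

print(lms_annealed(10_000))
print(lms_exponential_batches(10_000))
```

Note that with a growth factor of 2 the exponential schedule performs only on the order of log T weight updates over T input presentations; the noise reduction comes from batch averaging rather than from shrinking the step size, which is consistent with the two schedules sharing the same at-best 1/T convergence rate.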