Part of Advances in Neural Information Processing Systems 34 (NeurIPS 2021)
Idan Amir, Yair Carmon, Tomer Koren, Roi Livni
We study the generalization performance of \emph{full-batch} optimization algorithms for stochastic convex optimization: these are first-order methods that access only the exact gradient of the empirical risk (rather than gradients with respect to individual data points), and they include a wide range of algorithms such as gradient descent, mirror descent, and their regularized and/or accelerated variants. We provide a new separation result showing that, while algorithms such as stochastic gradient descent can generalize and optimize the population risk to within $\epsilon$ after $O(1/\epsilon^2)$ iterations, full-batch methods either need at least $\Omega(1/\epsilon^4)$ iterations or exhibit a dimension-dependent sample complexity.
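To make the oracle distinction concrete, the following is a minimal sketch (not from the paper) contrasting a full-batch method, which may only query the exact gradient of the empirical risk, with stochastic gradient descent, which queries gradients of individual data points. The least-squares objective, step size, and all variable names here are illustrative assumptions.

```python
import numpy as np

# Hypothetical synthetic least-squares instance (assumption, for illustration only).
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def empirical_risk_grad(w):
    # Exact gradient of the empirical risk F_S(w) = (1/2n) * ||Xw - y||^2:
    # the only first-order oracle a full-batch method is allowed to access.
    return X.T @ (X @ w - y) / n

def single_example_grad(w, i):
    # Gradient with respect to a single data point (x_i, y_i), as used by SGD.
    return X[i] * (X[i] @ w - y[i])

eta, T = 0.1, 100

# Full-batch gradient descent: one exact empirical-risk gradient per iteration.
w_gd = np.zeros(d)
for _ in range(T):
    w_gd -= eta * empirical_risk_grad(w_gd)

# Stochastic gradient descent: one per-example gradient per iteration.
w_sgd = np.zeros(d)
for _ in range(T):
    i = rng.integers(n)
    w_sgd -= eta * single_example_grad(w_sgd, i)

print("empirical risk (full-batch GD):", 0.5 * np.mean((X @ w_gd - y) ** 2))
print("empirical risk (SGD):          ", 0.5 * np.mean((X @ w_sgd - y) ** 2))
```

Both loops take the same number of first-order steps; the paper's separation concerns how many such steps (and samples) each oracle model needs to drive the \emph{population} risk below $\epsilon$, which this toy snippet does not measure.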