NeurIPS 2020

An Improved Analysis of Stochastic Gradient Descent with Momentum

Meta Review

The paper studies the convergence of SGD with momentum, a topic of strong research interest. It shows that SGD with momentum converges as fast as SGD for smooth strongly-convex and non-convex objectives, and faster in a multi-stage scenario with learning-rate decay. While all reviewers liked the core contribution, Reviewer 3 brought to our attention a serious issue in the proof of Lemma 1, which forms the foundation for the main results. After the author feedback, additional clarification by the authors, and longer discussions, we share the authors' impression that the issue can be fixed by replacing E[m^k] with v^k throughout the paper and adjusting minor constants. We trust the authors to perform these changes and, should any issues remain, to withdraw the paper. Additionally, we hope the detailed feedback and improvement suggestions from the 4 reviews will be implemented for the camera-ready version.