NeurIPS 2020

Towards Better Generalization of Adaptive Gradient Methods

Meta Review

The reviewers unanimously appreciated the central idea and the detailed experimentation, and I am therefore recommending acceptance. However, there are multiple issues with the paper that the authors are encouraged to address, including a comparison with the SGD+DP baseline, even for the theoretical considerations. Furthermore, the experiments presented in the paper are not run to completion (100 epochs) and therefore do not achieve SOTA numbers; this should be fixed.