NeurIPS 2020

Practical Quasi-Newton Methods for Training Deep Neural Networks

Meta Review

This paper develops a quasi-Newton optimization algorithm for training fully connected neural networks. The authors construct an L-BFGS-like approximation of the Hessian via a block Kronecker-product factorization. On the experimental side, they demonstrate the acceleration achieved by the proposed methods on autoencoder feed-forward neural network models with either nine or thirteen layers, applied to three datasets. The authors also provide some theoretical guarantees of convergence to stationarity under additional assumptions. All reviewers found the paper interesting. Some reviewers raised concerns about the additional tuning parameters, the lack of results for test error, and some limitations in the experiments (no results on classification or on more complex datasets, since the numerical experiments use MNIST). Most of these concerns were alleviated by the authors’ response, in particular by the generalization error results provided in the authors’ feedback. I concur with the reviewers and think the development of second-order methods that speed up the training of neural networks is interesting and well suited to NeurIPS. While there are some limitations in the numerical experiments, I think this paper takes an important step in the right direction, and I therefore recommend acceptance. I do encourage the authors to follow the reviewers’ suggestions to further improve their final manuscript, including the addition of the generalization error curves.