NeurIPS 2020

Replica-Exchange Nosé-Hoover Dynamics for Bayesian Learning on Large Datasets

Meta Review

The paper proposes a novel MCMC-type algorithm for Bayesian inference on large datasets. The method combines replica exchange, Nosé-Hoover dynamics, and a non-standard acceptance criterion to handle mini-batches. All the reviewers participated actively in the discussion after the rebuttal was made available. Although each ingredient of the proposed method already exists, their combination is original and potentially useful to the ML literature, as pointed out by most reviewers. Theorem 2 is also neat, providing a principled way of proposing swaps between replicas using mini-batches.

However, the paper suffers from several serious weaknesses which should be addressed in the final version. First, there is a readability issue: the authors have used savetrees / space-saving tricks to an extreme degree. This is bothersome; it was pointed out by reviewers during the discussion and reported to the PC. The readability and presentation have to be significantly improved.

Second, as pointed out by some referees, there are important, highly relevant references that should be included. The final version should also propose guidelines for the selection of lambda and be more explicit about the limitations of the correction-distribution approach. Reviewer 3 points out a significant limitation: ``if the minibatch noise is too big, one cannot find a suitable correction distribution. This issue is directly related to the data-efficiency of the method. Since the method could require additional batches to reduce the variance of the energy estimate, at this point, I would expect a proper comparison with other methods.''

Finally, the final version has to include better baselines, and in this respect the authors should follow the recommendations of Reviewer 3. In the current version, the ResNet architecture on CIFAR-10 is off. Ensembling methods should be added for comparison, as well as techniques such as cyclical stochastic gradient MCMC.
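For context on the swap mechanism the review praises in Theorem 2: below is a minimal sketch of the *classical, full-batch* replica-exchange (parallel tempering) swap criterion. This is not the paper's mini-batch-corrected rule; the energies `U_i`, `U_j` are assumed exact here, and the function name is illustrative only.

```python
import numpy as np

def swap_accept_prob(beta_i, beta_j, U_i, U_j):
    """Metropolis acceptance probability for swapping the states of two
    replicas at inverse temperatures beta_i and beta_j, given their
    (exact, full-batch) potential energies U_i and U_j.
    Standard parallel-tempering rule: min(1, exp[(beta_i - beta_j)(U_i - U_j)]).
    """
    log_alpha = (beta_i - beta_j) * (U_i - U_j)
    return min(1.0, float(np.exp(log_alpha)))

# Usage sketch: accept or reject a proposed swap between replicas i and j.
rng = np.random.default_rng(0)
if rng.random() < swap_accept_prob(beta_i=1.0, beta_j=0.5, U_i=3.2, U_j=1.1):
    pass  # swap the states (parameters) of replicas i and j
```

Theorem 2 in the paper, as the reviewers note, modifies this acceptance rule so that swaps can be proposed from mini-batch energy estimates rather than exact energies; when the mini-batch noise is too large, Reviewer 3's concern about finding a suitable correction distribution applies.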