The paper investigates the performance of deep ensembles as a function of the number of ensemble members and identifies conditions under which the calibrated negative log-likelihood follows a power law. Given the increasing popularity of deep ensembles, this is a timely work, and all reviewers recommend acceptance. The reviewers raised questions about missing references, experiments on additional datasets, and how to apply the results in practice. The author rebuttal and the proposed revisions to the camera-ready version mostly address these concerns. Overall, this is a good paper and I recommend acceptance.