| Paper ID | 161 |
|---|---|
| Title | Provable Gradient Variance Guarantees for Black-Box Variational Inference |

Strengths: The paper is clearly written and easy to follow. The authors establish several unimprovable variance bounds under specific assumptions. This property is appealing, though the smoothness constants for most families of functions are difficult to compute. The subsampling trick may be useful, and may generalize to infinite sums of functions.

Weaknesses:

1. The problem discussed in this paper is relatively straightforward, even though it targets black-box variational inference. The analysis still relies on the existence of an affine mapping for location-scale distributed latent variables. The more general reparameterization trick has been discussed in [12] and in "Implicit reparameterization gradients (NIPS 2018)". I admit there is no error in this manuscript, but I think that a thorough analysis of a more challenging task would meet the acceptance bar of the NIPS venue.

2. In the experimental section, only two toy models (Bayesian linear and logistic regression) are considered. I understand that the experiments serve to verify the variance bounds proposed in this paper. However, in order to recognize the value of this work, I would like to see at least one additional experiment with amortized variational inference, such as a simple VAE whose bound may be easy to calculate.

Typos (in the Appendix):

- Equation below Line 298: \epsilon -> u
- Equation below Line 301: t -> T

I have only positive remarks on this paper.

Originality: This paper extends theoretical results from [Xu et al] and [Fan et al] on bounds on the variance of reparameterization estimators.

Quality: The paper is of high quality:

- The theoretical results are of high quality: the assumptions are precisely stated, and the proofs are sketched in the main text with details in the supplementary material. The limitations of the results are clearly stated as well.
- The experiments are rigorous and intuitive. The results on the different types of subsampling, used to optimize the variance of the estimator, are interesting.

Clarity: This paper is very well written: clear in its theoretical part, and detailed and intuitive in its experimental part.

Significance: The results are significant and of interest to the NeurIPS community.

Originality: The variance bound for the gradient estimator is novel, and understanding the technical conditions would be helpful in analyzing the stochastic gradient descent method.

Quality: The technical results are sound and clearly highlight the novel contribution. However, it was not entirely clear how the constant M or M_n would be chosen.

Clarity: The theorems and lemmas are clearly written. The main contribution of the paper is clearly described through the technical results.

Significance: How do the bounds on the variance of the gradient estimator make an overall impact in variational inference? Surely there is also the other term, the entropy, whose gradient plays a part in optimizing the ELBO(w) function, and hence the variance of the gradient of the h(w) term should also be controlled. Isn't that so, or am I making a mistake?