NIPS 2018
Sun Dec 2nd through Sat the 8th, 2018 at Palais des Congrès de Montréal
Paper ID: 2196 Importance Weighting and Variational Inference

### Reviewer 1

== After author feedback ==

The generative process does indeed appear in the VSMC paper: eq. (4) in that paper discusses the joint distribution of all random variables generated. The exact expression for the distribution q_M in your notation can be found in the unnumbered equation between equations (13) and (14) in the supplementary material of the VSMC paper (let T = 1 and b_1 = 1 and the result follows). The two results are equivalent in the sense that KL(q_IS || p_IS) = KL(q_M || p_M) in their respective notations, and are thus, in my opinion, not fundamentally different. The result was also included in the original AESMC arXiv version, published a year ago. That said, the two results do illustrate different perspectives that bridge the gap between VSMC and AESMC in a very interesting way, especially Thm. 2, as I mentioned in my original review.

Regarding test integrals: Thm. 1 relates to the joint q_M(z_{1:M}), whereas you propose in eq. (9) that it is actually the marginal q_M(z_1) that is of interest (which is the same as the VSMC generative process of their Alg. 1). This is exactly why I believe Thm. 2 is a highly relevant and novel result that should be the focus.

Theorem 3 is contained in the result from the FIVO paper; note the difference between the definitions of Var(R) (in your notation) and Var(\hat p_N) (in their notation). Because \hat p_N is an average of iid random variables (in the IS special case), the rate follows from Var(\hat p_N) = Var(\sum_i w^i / N) = Var(w)/N, which equals Var(R)/M in your notation.

I believe the paper does contain interesting results that are novel and useful to the community, but accurately reflecting the points above would require major revision of Sections 3 and 4. Since NIPS does not allow for major revisions I have to recommend rejection, but I strongly encourage the authors to revise and resubmit!

== Original review ==

The paper studies importance sampling and its use in constructing variational inference objectives.
Furthermore, the authors propose the use of elliptical distributions for more efficient proposals. The authors provide some interesting insights in the experimental sections and with the elliptical distributions. However, the generative process is not new: it is the same as in Variational SMC, and Rao-Blackwellization over the chosen particle (eq. 9 in this paper) is standard. The auxiliary-variable approach (Thm. 1 in this paper) is studied in the Auto-Encoding SMC paper; see e.g. equations 10-12 in that paper. Thm. 3 in this paper is studied in the Filtering Variational Objectives paper; see their Proposition 1.

The paper is generally well written and easy to follow. I would suggest that the authors revise to focus on Thm. 2, which essentially connects Variational SMC and Auto-Encoding SMC in a very nice way, and on the elliptical distributions, as the main contributions.

References:
- Naesseth et al., Variational Sequential Monte Carlo, AISTATS 2018
- Le et al., Auto-Encoding Sequential Monte Carlo, ICLR 2018
- Maddison et al., Filtering Variational Objectives, NIPS 2017
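As a side note, the 1/N rate invoked in the feedback above (Var(\hat p_N) = Var(w)/N when \hat p_N averages N iid importance weights) is easy to check numerically. The sketch below uses a hypothetical Gaussian target and proposal chosen for illustration, not anything from the paper under review:

```python
import numpy as np

rng = np.random.default_rng(0)

def phat_variance(N, n_reps=20000):
    """Empirical variance of the IS estimate \\hat p_N over n_reps repetitions.

    Toy setup (assumed, not from the paper): target p = N(1, 1),
    proposal q = N(0, 2^2), so the weight w(z) = p(z)/q(z) is bounded
    and E_q[w] = 1.
    """
    z = rng.normal(0.0, 2.0, size=(n_reps, N))                 # iid draws from q
    log_p = -0.5 * (z - 1.0) ** 2 - 0.5 * np.log(2 * np.pi)    # log N(z; 1, 1)
    log_q = -0.5 * (z / 2.0) ** 2 - np.log(2.0) - 0.5 * np.log(2 * np.pi)
    w = np.exp(log_p - log_q)                                  # importance weights
    phat = w.mean(axis=1)                                      # \hat p_N = (1/N) sum_i w^i
    return phat.var()

v1 = phat_variance(1)    # Var(w), the N = 1 case
v10 = phat_variance(10)  # should be roughly Var(w)/10
```

Since the weights are iid, v1 / v10 should come out close to 10, matching the Var(\hat p_N) = Var(w)/N rate; the same scaling is what gives Var(R)/M in the paper's notation.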