NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 6826
Title: Continuous Hierarchical Representations with Poincaré Variational Auto-Encoders

Reviewer 1


To the best of my knowledge, the idea of using either the Riemannian normal or the wrapped normal in a VAE is new and worthy of attention. The paper is excellently written and very clear.

1) Line 48: "We show that a VAE endowed with a hyperbolic latent space will be more capable at representing and discovering hierarchies than traditional VAEs that use a Euclidean latent space". While I understand what you mean, this line is a bit misleading. It is true that your VAE samples from a hyperbolic space and that a standard VAE samples from a Euclidean space, but there is no guarantee that the resulting latent spaces are respectively hyperbolic or Euclidean. Would it be possible to change this sentence?

2) In Table 1 you measure the negative test marginal likelihood with L_IWAE, but you never define what L_IWAE is. Could you define it properly? (The standard definition is recalled below for reference.)

3) Could you move Figure 5? It is referred to a whole page after being introduced, which is confusing, and Figure 6 is cited before Figure 5 in the text.

4) Line 183: "Derived earlier in Eq 8 and 11". The derivation is in the appendix; could you change this line to make that clear? It is a bit confusing right now.

To summarise, this is an excellent paper that introduces a theoretically sound modification of the standard VAE and achieves state-of-the-art results on a series of baselines.

--------------------------------------------------------------

Post rebuttal: The authors agreed to fix my main complaint, so I maintain my overall score and argue for acceptance.
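For reference, the importance-weighted bound of Burda et al. (2016), which I assume is what $\mathcal{L}_{\mathrm{IWAE}}$ in Table 1 denotes, reads

$\mathcal{L}_{\mathrm{IWAE}}^{K}(x) = \mathbb{E}_{z^{(1)},\dots,z^{(K)} \sim q_\phi(z \mid x)}\left[\log \frac{1}{K}\sum_{k=1}^{K} \frac{p_\theta(x, z^{(k)})}{q_\phi(z^{(k)} \mid x)}\right],$

so the negative test marginal likelihood would be estimated as $-\mathcal{L}_{\mathrm{IWAE}}^{K}$ for a suitably large $K$.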

Reviewer 2


--- Review update: The authors clarified the details of the ablation study in Figure 5, so I am now convinced that the proposed updates to the decoder architecture constitute a significant improvement. Therefore, I am increasing my score to 7. ---

The authors consider variational autoencoders with a latent variable $z$ in hyperbolic space. Intuitively, hyperbolic space is suitable for learning hierarchical representations: the exponential growth of surface area with radius allows an exponential number of tree leaves to be placed at equal distances (see the formula recalled below).

In concurrent work, Nagano et al. (2018) considered analogous models. The submission has two principal differences. First, it studies a different type of distribution on the hyperbolic plane as a building block of the VAE. Second, it uses a particular "hyperbolic" layer in the decoder. The former leads to better results on one of the tasks in the experimental section, but the paper does not study the effect of the layer.

In general, the paper is well written and technically sound, and the claims are supported by thoroughly described experimental results. However, the use of hyperbolic spaces is motivated by a two-dimensional illustration. How does the analogy between trees and hyperbolic spaces work beyond the two-dimensional case?
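To make the growth intuition above concrete (a standard hyperbolic-plane fact, not taken from the submission): the circumference of a geodesic circle of radius $r$ in the hyperbolic plane is

$C(r) = 2\pi\sinh(r) \approx \pi e^{r}$ for large $r$,

so on the order of $e^{r}$ points fit at radius $r$ with pairwise distances bounded below, matching the $b^{d}$ leaves of a $b$-ary tree at depth $d$ when $r$ grows linearly with $d$.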

Reviewer 3


Originality: Doing variational inference based on a standard ELBO with reparameterised gradients on the Poincaré ball is new. It uses ideas similar to very recent/concurrent work (Ganea et al., 2018; Ovinnikov, 2018; Nagano et al., 2019), but it is made clear how this work differs from related work.

Quality: The submission seems technically sound, with detailed experimental results. The authors empirically compare their approach mostly with its Euclidean counterpart. This is fair, of course, but it would be interesting to see how it compares empirically with the Poincaré Wasserstein Autoencoder (Ovinnikov, 2019) and the hyperboloid model of Nagano et al. (2019): do they yield similar latent representations, and how do the respective sample qualities compare?

Clarity: The paper is polished and well written. The background on Riemannian geometry is to the point, so that the paper is in most parts accessible to readers without training in non-Euclidean geometry. Nevertheless, I feel that readers could benefit from more high-level guidance in Appendix B, e.g. what do we learn from Sections B.8 and B.9?

Significance: I feel that this is significant work and that others can build on these ideas either methodologically or experimentally. The experiments presented show advantages over Euclidean counterparts in different domains. I also feel that the proposed approach is easier for practitioners to use than some related work.

Some comments/thoughts: I was wondering whether, instead of using a Gaussian, similar extensions could be made for other spherical distributions such as a Student-t, as they also admit a stochastic representation $z = r\alpha$ (Fang et al., Symmetric Multivariate and Related Distributions, 1990, Chapter 2). Also, Mallasto et al. (Probabilistic Riemannian submanifold learning with wrapped Gaussian process latent variable models, 2019) considered the pushforward of a Gaussian process by the exponential map. Could something like this be used here for a VAE on the Poincaré ball, say with a GP prior (Casale et al., 2018)? (A minimal sketch of such an exponential-map pushforward follows this review.) Further, the Riemannian normal distribution seems to be unstable in higher dimensions, judging from Table 3. Are there different limiting distributions of the wrapped/Riemannian model as the latent space dimension goes to infinity, and does this provide some guidance for applications?

POST AUTHOR RESPONSE: Having read the rebuttal and the other reviews, I keep my initial score of 7. The authors agreed to improve Appendix B, which I felt lacked some high-level guidance. Their response also indicated that the proposed approach can be extended in different ways. My main complaint was that the paper lacks an empirical comparison with very recent related work (Ovinnikov, 2019; Nagano et al., 2019). However, even without such a comparison, I think it is still a complete and interesting paper.
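As a concrete illustration of the exponential-map pushforward mentioned above, here is a minimal NumPy sketch (my own, not from the submission) of sampling a wrapped normal on the unit Poincaré ball, assuming curvature -1 and a mean at the origin; the paper's construction additionally handles arbitrary means via parallel transport, which is omitted here.

import numpy as np

def exp_map_origin(v):
    # Exponential map at the origin of the unit Poincare ball
    # (curvature -1): exp_0(v) = tanh(||v||) * v / ||v||.
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    safe = np.where(norm > 0, norm, 1.0)  # avoid division by zero
    return np.tanh(norm) * v / safe

def sample_wrapped_normal_origin(dim, sigma, n_samples, seed=0):
    # Pushforward of N(0, sigma^2 I) in the tangent space at the
    # origin through exp_0; arbitrary means (via parallel transport)
    # are omitted in this sketch.
    rng = np.random.default_rng(seed)
    v = rng.normal(scale=sigma, size=(n_samples, dim))
    return exp_map_origin(v)

z = sample_wrapped_normal_origin(dim=2, sigma=0.5, n_samples=4)
assert np.all(np.linalg.norm(z, axis=-1) < 1.0)  # samples lie inside the ball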