NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:4757
Title:Modelling heterogeneous distributions with an Uncountable Mixture of Asymmetric Laplacians

Reviewer 1

## Update ##

I have read the rebuttal and would like to thank the authors for the new experimental results they have included. The additional results are very helpful for evaluating the method, although I would have liked to see a plot similar to Figure 3 in Tagasovska and Lopez-Paz [1]. I find the calibration of UMAL predictions on room-price forecasting for BCN quite convincing. These results, along with the calibration on UCI, have resolved my concerns about the calibration of the UMAL method. The test log-likelihoods on UCI are less interesting, but it is good that UMAL performs as expected. For instance, the fact that UMAL outperforms the similar Independent ALD method is nice to see. Overall, I think that the additional experiments make a good argument for accepting the submission. That said, I believe that my initial review may have been overly positive and so, given the new results, I have decided to maintain my score of 7.

## Initial Review ##

The authors examine uncertainty modeling in regression problems with potentially complex target distributions. They propose learning asymmetric Laplace distributions over targets in order to better capture the uncertainty stemming from the data-generating process. The scale and location parameters of the asymmetric Laplace distributions are parameterized by a neural network, which also takes the asymmetry parameter as an input. The authors treat the asymmetry parameter as a latent variable with a uniform distribution, which, by integration, yields an "uncountable" mixture of asymmetric Laplace distributions. The parameterizing network is learned by maximizing the marginal likelihood, which is approximated using Monte Carlo integration. Related work on simultaneous quantile regression with neural networks [1, 2] is shown to maximize a lower bound on the proposed marginal-likelihood objective with respect to the location parameter.
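The objective summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard asymmetric Laplace parameterization (location, scale, asymmetry), and the names `umal_loglik`, `net(x, tau)`, and `toy_net` are hypothetical stand-ins for the parameterizing network.

```python
import math
import random


def pinball(u, tau):
    """Quantile (pinball) loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_logpdf(y, mu, b, tau):
    """Log-density of an asymmetric Laplace: location mu, scale b > 0, asymmetry tau in (0, 1)."""
    return math.log(tau * (1.0 - tau) / b) - pinball((y - mu) / b, tau)


def umal_loglik(y, x, net, taus):
    """Monte Carlo estimate of log p(y | x): log-mean-exp of ALD log-densities over the given taus."""
    logps = []
    for tau in taus:
        mu, b = net(x, tau)  # hypothetical network: (x, tau) -> (location, scale)
        logps.append(ald_logpdf(y, mu, b, tau))
    m = max(logps)  # subtract the maximum for numerical stability
    return m + math.log(sum(math.exp(lp - m) for lp in logps)) - math.log(len(logps))


def toy_net(x, tau):
    """Stand-in for the parameterizing network (an assumption, not the authors' model)."""
    return x + (tau - 0.5), 1.0


rng = random.Random(0)
taus = [rng.uniform(0.01, 0.99) for _ in range(100)]  # asymmetry samples, kept away from 0 and 1
print(umal_loglik(0.3, 0.0, toy_net, taus))
```

Training would maximize this estimate over the network parameters; the sketch only evaluates it for a fixed toy network.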
The proposed uncountable mixture of asymmetric Laplacians (UMAL) model is compared to finite mixture models and methods for quantile regression on a synthetic dataset, two rental-price datasets, and a financial time-series problem. The proposed method outperforms these approaches with respect to test log-likelihood.

Originality: The paper is an incremental advance on recent work for quantile regression with neural networks [1, 2]. The major differences from simultaneous quantile regression [1] are (1) the scale parameter of the asymmetric Laplacian is also parameterized; and (2) the asymmetry parameter is (approximately) marginalized out. In comparison, simultaneous quantile regression trains the parameterizing network in expectation over the scale parameter using SGD. These changes appear to have a meaningful impact on experimental performance.

Clarity: The writing is clear and well-organized, and I enjoyed reading the submission. However, there are several typos whose correction will improve the flow of the paper. See the minor comments below.

Quality and Significance: This paper synthesizes recent work on quantile regression and provides a clear, probabilistic interpretation for these approaches. This discussion is useful for the community. The progression from simultaneous quantile regression to UMAL is straightforward, but is a good contribution in my opinion. Additional experimental results would better demonstrate the effectiveness of the proposed method. Further comparisons could be done on regression datasets from the UCI repository [3]. An empirical study of uncertainty quality would also make the experimental section much more convincing. For instance, Tagasovska and Lopez-Paz [1] evaluate the quality of uncertainty estimates using calibration of prediction intervals (see Figure 3 of [1]).
A similar baseline would verify that the uncertainty learned by UMAL models is useful and that the improved test log-likelihoods given in Table 1 are not only the result of a more expressive model.

Minor Comments:
- What was the learning procedure for the mixture models in Table 1? The variance of the test log-likelihood for the synthetic dataset is surprisingly large. See, for example, the result for the three-component Laplace mixture.
- How were asymmetry parameters selected at test time? The number of samples used during training is given in Line 212, but I could not find this information for the test procedure. Furthermore, the input to Algorithm 3 seems to imply that this variable is not marginalized using MC integration at test time, as it is during training.
- Line 78 should read "... distribution, *a* fact that ..."
- Line 169: "has" should be used instead of "have".
- Line 175: "integrate" rather than "integrates".
- Line 221: This sentence is hard to understand and should be rephrased. What is the "50% random uniform generated data"? I understood this to mean that 50% of the synthetic data was selected to form the training set, but that meaning is not obvious from the sentence.

References:
[1] Tagasovska, N., & Lopez-Paz, D. (2018). Frequentist uncertainty estimates for deep learning. arXiv preprint arXiv:1811.00908.
[2] Dabney, W., Ostrovski, G., Silver, D., & Munos, R. (2018). Implicit quantile networks for distributional reinforcement learning. arXiv preprint arXiv:1806.06923.
[3] Dua, D., & Graff, C. (2019). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.

Reviewer 2

This paper proposes a mixture of asymmetric Laplacians (ALD) to model the distribution of an output variable for regression in deep learning.
> The paper lacks clarity, which is not helped by the scattered typographical errors. The overuse of "uncountable mixtures" to refer to the simple process of marginalizing a 1D random variable, which is fed as input to a quantile regression network, is confusing.
> This work lacks novelty, as the only contribution is quite simply a Monte Carlo average of a fixed number of ALDs.
> The evaluation of the uncertainty measures crucially omits the concept of calibration and the respective metrics.
> This work is missing standard regression benchmarks such as the UCI datasets [1] or the benchmarks used in [2] (as well as prior work). I believe that the expressivity of this framework should allow for adequate performance on homogeneous distributions just as well as on heterogeneous ones.

[1] Asuncion, Arthur, and David Newman. "UCI machine learning repository." (2007).
[2] Lakshminarayanan, Balaji, Alexander Pritzel, and Charles Blundell. "Simple and scalable predictive uncertainty estimation using deep ensembles." Advances in Neural Information Processing Systems. 2017.

============

Having read the rebuttal, I appreciate the extensive experiments that empirically establish the calibration of the uncertainty measurement. Therefore, I have updated my score but kept my comments as they were for completeness.

Reviewer 3

Originality: The proposed UMAL is novel. It combines and extends the ideas of mixture density networks and quantile regression, treating each quantile level as a single component. It also draws connections to other quantile models.

Quality: The technique used in this paper seems to be valid. The synthetic dataset experiment demonstrates the claimed advantages of UMAL.

Clarity: When I first read the paper, I got stuck at the quantile regression section, so it would be great to include a brief review of the properties of quantile regression in the appendix. I am also still a bit confused about the differences between the Independent QR (IQR) model and the Independent ALD model. Could you make this clearer, for example by stating the training objective function or training algorithm for IQR?

Significance: The proposed model empirically demonstrates better modelling performance for heterogeneous distributions. This might be a useful tool for modelling aleatoric uncertainties.
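For reference, the standard link between quantile regression and the asymmetric Laplace distribution can be written as below. This uses the usual ALD parameterization with location $\mu$, scale $b > 0$, and asymmetry $\tau \in (0, 1)$; whether the paper's IQR and Independent ALD baselines differ in exactly this way is an assumption that the authors would need to confirm.

```latex
\rho_\tau(u) = u\,\bigl(\tau - \mathbb{1}[u < 0]\bigr), \qquad
\mathrm{ALD}(y \mid \mu, b, \tau) = \frac{\tau(1-\tau)}{b}\,
  \exp\!\left(-\rho_\tau\!\left(\frac{y-\mu}{b}\right)\right)

-\log \mathrm{ALD}(y \mid \mu, b, \tau)
  = \frac{\rho_\tau(y-\mu)}{b} + \log b - \log\bigl(\tau(1-\tau)\bigr)
```

With $b$ held fixed, minimizing this negative log-likelihood in $\mu$ is equivalent to minimizing the pinball loss $\rho_\tau(y-\mu)$ used in quantile regression; treating $b$ as a learned quantity adds the $\log b$ term and rescales the loss.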