| Paper ID | 5101 |
|---|---|
| Title | Covariate-Powered Empirical Bayes Estimation |

I like the overall setup and the analysis from a statistical point of view. The theoretical results are sound, though I think they were obtained mostly with standard methods. What I am missing most is motivation and empirical validation, which in my opinion are crucial for a NeurIPS submission. First of all, I am not aware of serious practical interest in accurately computing mean ratings for recommender systems. This might be important in some biological applications, but it does not seem to be on the agenda for recommender systems. Furthermore, the experimental comparison has two weaknesses: 1) it is very limited, with only one synthetic example and one real-world dataset; 2) standard deviations are not reported for all of the experiments. It is therefore unclear whether the experimental benefits of the proposed method are statistically significant.

This paper presents a substantial amount of theoretical work on empirical Bayes estimation with covariates. It also proposes the EBCF estimator, which splits the sample into two parts, one used for mean estimation and the other for variance estimation. Many theoretical proofs are provided and robustness is shown. Experiments on both synthetic and real data demonstrate better prediction accuracy for EBCF than for other related methods. I am not familiar with empirical Bayes theory, so I cannot evaluate the originality, the technical quality, or the significance. My review therefore focuses on the experimental results. Concretely, can the authors quantify the degree to which EBCF dominates the compared methods by reporting standard errors of the MSE? In addition, what does EBCF stand for?
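To make the sample-splitting idea described above concrete, here is a minimal sketch of a covariate-based shrinkage estimator. It is not the authors' implementation: the function names `fit_linear` and `split_shrinkage` are hypothetical, a least-squares fit stands in for whatever regression model the paper uses, the noise variance `sigma2` is assumed known and positive, and the prior variance is estimated by a simple method-of-moments formula that I am assuming for illustration.

```python
import numpy as np


def fit_linear(X, z):
    """Least-squares fit with intercept (a stand-in for any regression model)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def split_shrinkage(X, z, sigma2, seed=0):
    """Shrink noisy observations z toward covariate-based predictions.

    Assumed model (illustrative only): z_i = mu_i + noise with known
    variance sigma2 > 0. One half of the sample fits the mean function,
    the other half estimates the prior variance; the roles are then swapped.
    """
    rng = np.random.default_rng(seed)
    n = len(z)
    perm = rng.permutation(n)
    folds = (perm[: n // 2], perm[n // 2:])
    mu_hat = np.empty(n)
    for fit_idx, est_idx in (folds, folds[::-1]):
        # Mean estimation on one half of the sample.
        m_hat = fit_linear(X[fit_idx], z[fit_idx])
        pred = m_hat(X[est_idx])
        # Variance estimation on the other half (method of moments,
        # truncated at zero so the weight stays in [0, 1)).
        a_hat = max(np.mean((z[est_idx] - pred) ** 2) - sigma2, 0.0)
        weight = a_hat / (a_hat + sigma2)
        # Convex combination of the raw observation and the prediction.
        mu_hat[est_idx] = weight * z[est_idx] + (1.0 - weight) * pred
    return mu_hat
```

Under these assumptions, the standard errors requested above could be obtained by repeating a simulation over several seeds and reporting the spread of the resulting MSE values.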

I feel the treatment in this paper is somewhat at odds with its setup. It is assumed that the density of X is uniformly lower bounded. This reduces the problem to standard linear/nonparametric regression, for which all of these rates are more or less known. The fact that A is unknown is, I feel, not a big deal, because A is "global" in the sense that all movies should suffer if A is large and vice versa. What is really interesting, and what I think the authors correctly set up, is the case where some movies have many more ratings than others. In this case we have an extreme form of heteroscedasticity (for example, $V(x_i) \sim \sqrt{N} \cdot V(x_j)$) and things become interesting. One might expect an estimation error of the following form (e.g., for Lipschitz smooth functions): $\mathrm{err}(x_i) \approx 1/\sqrt{n_i} + n^{-1/3}$, where $n_i$ is the number of times $x_i$ is reviewed and $n$ is the total number of reviews in the movie database. Such results would be more interesting. The bottom line is that the pointwise error asymptotics need to be understood, because there should be a difference between estimating the ratings of well-seen and less-seen movies.
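As a rough numerical illustration of the conjectured rate above, the following sketch evaluates the heuristic error for a less-seen and a well-seen movie. The function name `heuristic_pointwise_error` and the example counts are hypothetical, and constants are ignored, so only the relative sizes are meaningful.

```python
def heuristic_pointwise_error(n_i, n_total):
    """Conjectured error scale: 1/sqrt(n_i) + n_total^(-1/3), constants dropped."""
    return n_i ** -0.5 + n_total ** (-1.0 / 3.0)


# Hypothetical database with one million reviews in total.
n_total = 1_000_000
less_seen = heuristic_pointwise_error(10, n_total)
well_seen = heuristic_pointwise_error(10_000, n_total)
print(f"less-seen movie: {less_seen:.3f}")
print(f"well-seen movie: {well_seen:.3f}")
```

With these made-up counts the first term dominates for the less-seen movie (about 0.33 in total), while for the well-seen movie both terms are of the same order (about 0.02), which is the kind of difference a pointwise analysis would need to capture.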

The authors provide a simple but powerful approach to empirical Bayesian inference under rather broad assumptions. The method is relevant in settings where both a) standard statistical estimators (such as the average) can be evaluated and b) covariates can be used to train machine learning models to estimate the same value. The paper addresses the setting where the standard estimator is not reliable enough (e.g., because the sample size is too small) and the covariates give only weak information about the target variable. This problem setting is highly relevant in many real-world applications. Considering the practical relevance of, and theoretical interest in, empirical Bayes methods, it seems quite surprising that this approach has not been investigated earlier (except for special cases such as linear models). As such, the paper fills a very important gap by giving the proposed method (which apparently has already been used in practice, as noted in the related-work section) a clear theoretical basis. The focus of the paper lies on the theoretical analysis; the main results are minimax bounds that explicitly incorporate the ML model, as well as (potentially unknown) model variances. Additional theoretical contributions are robustness under misspecification of the model (leading to very general results), as well as an analysis of the practical implementation of the proposed model. The paper concludes with an empirical evaluation on both synthetic and real-world data. The paper is very well written; even without a strong background in the topic it is easy to follow, and the results are discussed, explained, and put into context. As a minor drawback, a discussion of potential disadvantages of the proposed method would have been enlightening.