NeurIPS 2020

Learning Rich Rankings


Review 1

Summary and Contributions: This work proposes a novel model of rankings, called "contextual repeated selection" (CRS), that provides natural multimodality in the ranking space. Theoretical guarantees for performance under the CRS model are provided, and experiments are conducted on a variety of ranking domains.

Strengths: * Connecting ranking with choice to create a choice-based model of ranking data is the main contribution of this paper. The proposed contextual repeated selection subsumes well-known ranking models by applying repeated selection to choice models (a sketch of this construction appears below). This model does not assume transitivity or center the distribution around a single total ordering, and it can be efficiently learned, as shown in the experimental results. * Rigorous theoretical guarantees are provided for maximum likelihood estimation for MNL and PL, and convergence guarantees for the CRS model are demonstrated. * Empirical results show that the proposed model offers better performance across a variety of ranking domains.
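(To make the repeated-selection construction concrete, a minimal sketch in Python, assuming PL-style exponentiated utilities. With fixed utilities this reduces exactly to Plackett-Luce; CRS would instead recompute each item's utility as a function of the other items still in the choice set. The function name and setup are illustrative, not the authors' code.)

    import numpy as np

    def sample_ranking(utilities, rng=None):
        # At each step, pick one remaining item with probability
        # proportional to exp(utility), then remove it and repeat.
        # With fixed utilities this is exactly Plackett-Luce; CRS
        # instead makes each utility depend on the items still present.
        rng = np.random.default_rng(rng)
        remaining = list(range(len(utilities)))
        ranking = []
        while remaining:
            w = np.exp([utilities[i] for i in remaining])
            idx = rng.choice(len(remaining), p=w / w.sum())
            ranking.append(remaining.pop(idx))
        return ranking

    # Item 0 has the highest utility, so it is usually ranked first.
    print(sample_ranking([2.0, 0.5, 0.0], rng=0))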

Weaknesses: Generally speaking, the paper is interesting, with solid theoretical results. I have a few questions: * In the analysis of the CRS model, the full CRS is considered; however, the factorized CRS is used for the empirical results. Do the derived theoretical guarantees carry over to the factorized CRS? Is it possible to simulate both versions of CRS to visualize the performance gap, if any? * In the empirical results section, it is evident that the performance gap between PL and CRS is much smaller than the one between CRS and Mallows across most of the datasets. Are there any insights into this behavior?

Correctness: For the main paper, the reviewer believes that the theorems sound correct. However, the reviewer did not thoroughly proofread the supplementary materials.

Clarity: The paper is generally well written. However, there are many typos in the main paper that should be corrected. Examples are "that is crucial our paper's framework", "our new model perform", "Repeated these choices until", etc.

Relation to Prior Work: Yes, the literature review is extensive, and the contribution of this paper is discussed and compared against previous contributions.

Reproducibility: Yes

Additional Feedback: ################## Update after Rebuttal ################## I would like to thank the authors for addressing all the major comments raised by the reviewers in their feedback. Hence, I will keep my score the same.


Review 2

Summary and Contributions: The paper proposes a repeated-selection interpretation of the recently introduced context-dependent utility model (CDM), a discrete choice model, for defining potentially non-unimodal distributions over permutations. The theoretical guarantees of the maximum likelihood estimator for such a model are analyzed.

Strengths: - Extends the discrete choice model to a novel model for permutations (models of intransitivity are not as well understood as transitive models) - Theoretically sound, and the techniques end up improving guarantees known for existing models such as Plackett-Luce and MNL

Weaknesses: - As the authors point out, the analysis seems to be suboptimal compared to what is observed in practice. This is expected, because there is no clean way to recover the previous models based on further assumptions about the parameters to be learned. - More specifically, it would have been nice to see a dependency on the rank of the factor matrices C, T showing up in the tail bounds. The rank determines the degrees of freedom of the parameters (see the parameter-count sketch below), and hence one would expect the learning rates to depend on it. Also, how does the rank play a role in identifiability? The authors mention one needs O(n log n^2) samples for the model to be identifiable with high probability. Is this independent of the rank?
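(To make the degrees-of-freedom point concrete, a back-of-the-envelope parameter count, with C and T denoting the factor matrices mentioned above; the counts are illustrative and ignore any normalization or gauge constraints in the actual model.)

    def cdm_param_counts(n, r):
        # Full pairwise context effects: one per ordered item pair.
        full = n * (n - 1)
        # Rank-r factorization C @ T.T with C, T each n-by-r.
        factored = 2 * n * r
        return full, factored

    # With n = 100 items, rank r = 5 cuts ~9,900 free parameters to ~1,000.
    print(cdm_param_counts(100, 5))  # (9900, 1000)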

Correctness: - The experimental results seem to show that increasing the rank typically increases the average out-of-sample negative log-likelihood. It would have been nice to see a study of this effect on synthetic data where the underlying rank of the parameter factors is known.

Clarity: - A reasonably well written paper

Relation to Prior Work: The work depends on and extends the recently introduced CDM for discrete choice. The authors do a reasonable job of reviewing the previous contributions.

Reproducibility: Yes

Additional Feedback:


Review 3

Summary and Contributions: The authors extend the recently proposed CDM rich choice model to rankings via contextual repeated selection (CRS). The main results of the paper are bounds on the expected risk as well as risk tail bounds, i.e., guarantees on the speed of convergence of the ML estimate. As special cases, they also provide bounds on the expected risk of the MLE for MNL and PL models. Some limited experiments with real data show that CRS captures additional structure that, in particular, plain Plackett-Luce misses.

Strengths: - This is a well-written paper. The concepts are clearly discussed and explained (but see Weaknesses regarding utilization of the page budget). - A ranking model that permits richer interaction between items than in classical PL and its relatives is interesting, even though the generalization of CDM to CRS is relatively straightforward. - The main result of the paper, Theorem 3, which establishes the theoretical bound on the risk of the ML estimate, is nice, as are Theorems 1 and 2, which as a byproduct analyze the MNL and PL models. Their bounds strengthen those from the literature for the PL model in particular.

Weaknesses: - While the theoretical bounds are nice, no algorithmic results for efficiently estimating the CRS model are given. I suppose that this all relies directly on the machinery developed for CDM in [48]. - I also find the utilization of the page limit very suboptimal. Main discussions, such as those about the discrepancy in the optimization, should not appear in the Appendix. - One important limitation is that the theorems provide guarantees only for datasets of *full* rankings. In many practical scenarios, the data would consist of partial rankings over (many small) subsets of items (see the truncated-likelihood sketch after this list). Similar guarantees for this case would be very helpful and would strengthen the contribution. - Given the parameter scaling of the model (quadratic in the number of items), it might not be easily applicable in practice. In fact, I find the simulation results confusing. It is not clear why CRS has such nonlinear performance. I do not find the explanation on lines 522-527 satisfactory.
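(On the partial-rankings point: repeated selection does suggest a natural truncated likelihood, obtained by stopping the selection process after the observed prefix. A hedged PL-style sketch follows; CRS would swap in context-dependent utilities, and whether the paper's guarantees extend to this likelihood is precisely the open question.)

    import numpy as np

    def topk_log_likelihood(utilities, topk, candidates):
        # Log-probability that `topk` (in order) are the first selections
        # out of `candidates` under repeated selection with fixed
        # utilities, i.e. the standard PL top-k marginal.
        u = np.asarray(utilities, dtype=float)
        remaining = list(candidates)
        ll = 0.0
        for item in topk:
            ll += u[item] - np.log(np.exp(u[remaining]).sum())
            remaining.remove(item)
        return ll

    # Chance that items 2 then 0 come first among {0, 1, 2, 3}.
    print(topk_log_likelihood([1.0, 0.0, 2.0, -1.0], [2, 0], [0, 1, 2, 3]))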

Correctness: The proofs are quite long. From my quick scanning of the derivations, they look correct.

Clarity: Yes, the paper is well written, but I believe the main body should be shortened and more space should be dedicated to empirical results.

Relation to Prior Work: This is very thorough and well written.

Reproducibility: Yes

Additional Feedback: - The text cites some equations by number, but the equations are not actually labeled - Line 102: "...crucial *to* our paper" - Line 160: should be \sigma_0


Review 4

Summary and Contributions: This paper proposes the CRS (contextual repeated selection) model, which is a generalization of the Plackett-Luce model. The authors bound the risk (a performance measure for the MLE) for the MNL model, the Plackett-Luce model, and the proposed CRS model. Experiments show a better fit of the CRS model to real-world data.

Strengths: 1. A novel context-dependent ranking model that is shown to better fit real-world data; 2. Theoretical bounds on the performance of the MLE for MNL, PL, and finally CRS, with rigorous proofs.

Weaknesses: 1. It is hard to find the formal definition of the proposed CRS model. It seems to be the equation after line 175, but the authors do not say so explicitly. 2. Theorem 1 does not seem to be very challenging, as it has \lambda_2(L) in the bounds. It would be more interesting if \lambda_2(L) could be bounded. 3. In the tail bound of Theorem 2 (the equation between lines 259 and 260), the bound becomes trivial for very large n, since the last term can be greater than one. 4. The authors' understanding of identifiability is not correct. Identifiability is a property of a model assuming infinite data can be observed; one cannot say a model is not identifiable due to insufficient data (see the sketch after this list).
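(A concrete instance of point 4: in MNL/PL, adding a constant to every utility leaves all choice probabilities unchanged, so utilities are identifiable only up to a shift no matter how much data is observed. The snippet below is an illustrative check, not the paper's notation.)

    import numpy as np

    def choice_probs(utilities):
        # Softmax choice probabilities of an MNL/PL model.
        w = np.exp(np.asarray(utilities, dtype=float))
        return w / w.sum()

    u = np.array([1.0, 0.0, -0.5])
    # A different parameter vector that induces the *same* distribution:
    assert np.allclose(choice_probs(u), choice_probs(u + 3.7))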

Correctness: I did not see obvious errors. Theorems 1 and 2 are not very surprising to me, but okay as intermediate steps to build Theorem 3.

Clarity: Yes.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: Regarding the "identifiability" discussed in this paper: I suggest stating the conditions on the data as conditions for the MLE to converge. In order to characterize identifiability, the authors would have to define the different models and prove that some models are identifiable while others are not. I have read all the reviews and the authors' response. My score is updated.