__ Summary and Contributions__: The authors propose a generation process for candidate labels when learning with partially labeled data. Additionally, they formulate two partially labeled data algorithms that they show to be provably consistent.

__ Strengths__: The authors propose novel methods for partial label learning and provide theoretical justifications for their work.
The experiments compare against several baselines, and the authors run their methods on several datasets.

__ Weaknesses__: The paper is not clear and a bit difficult to follow. There are some typos and the methodology for the algorithms could be better explained.

__ Correctness__: The empirical validation seems correct as the authors run multiple trials and report mean and standard deviations of their results.

__ Clarity__: No, the paper is not very clear: there are some typos in the text, and the methodology in Section 4 is not well explained.

__ Relation to Prior Work__: While the authors present prior works in their paper, they do not do a good job of explaining how these works relate to or differ from each other. They do motivate their work with limitations of existing methods.

__ Reproducibility__: Yes

__ Additional Feedback__: I think the paper should be better organized to assist with clarity. One suggestion is to have a related works section that compares existing methods and their limitations.
In the introduction, weakly supervised learning is used as an umbrella term to classify different learning methods. This is somewhat confusing, as weakly supervised learning is a separate research area and not an umbrella term for the different research areas listed. See works on data programming, adversarial label learning, and Snorkel and its variants.
=========================================
I acknowledge that I read the rebuttal and thank the authors for providing explanations to the questions and concerns I had.

__ Summary and Contributions__: This paper addresses the partial-label learning (PLL) problem, where each training example has a set of candidate labels. Since most previous works do not consider the data distribution, a theoretical understanding of consistency is lacking. Inspired by this, the paper proposes a novel statistical model to describe the generation process of partially labeled data (i.e., a set of candidate labels). Based on the generation model, the authors further develop two consistent PLL methods, one risk-consistent and one classifier-consistent. The authors theoretically derive an estimation error bound for each method and show that the error bound of the risk-consistent method is tighter than that of the classifier-consistent method. In the experiments, the authors show the effectiveness of the methods on benchmark and real-world partially labeled datasets.
== post review
Thanks for the authors' response. I understand the difference between PLL and NLL; the difference between them might be the ambiguity. Although the paper still has some unclear points, as noted in R1's review, these might be addressed in the final version. After the rebuttal process, I decided to increase my rating.

__ Strengths__: If their statistical model is novel (please check the weaknesses first)
+ By depicting the explicit data distribution, this paper effectively defined the PLL problem as an empirical risk minimization.
+ The two suggested novel methods (i.e., risk-consistent and classifier-consistent) outperform the existing works on benchmark and real-world datasets.

__ Weaknesses__: - Major: The proposed generation process assumes that the correct label is always included in the set of candidate labels, and that the candidate label set is uniformly sampled (lines 42 and 124). It is unclear to me how the suggested data distribution differs from a general noisy-label setting [Chen, Pengfei, et al. "Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels." ICML. 2019.] [Ren, Mengye, et al. "Learning to Reweight Examples for Robust Deep Learning." ICML. 2018.].

__ Correctness__: Good

__ Clarity__: Not significant

__ Relation to Prior Work__: Somewhat weak

__ Reproducibility__: Yes

__ Additional Feedback__: Please see the weaknesses above. If the authors address my concerns, I will reconsider my rating.

__ Summary and Contributions__: This paper studies the partial label learning problem by proposing a data generation process. The data generation process guarantees that the true label is contained in the provided candidate label set. The paper then proposes two methods for partial label learning. These methods are statistically consistent from two different directions, flexible in the choice of base model, and come with guaranteed performance. Finally, experimental results on the proposed methods, as well as on the effectiveness of the proposed generation model, are shown.

__ Strengths__: The paper's primary goal is to form a solid generation model for partial label learning. The paper successfully achieves this goal: it proposed a generation model agreeing well with the basic assumption on partial label learning, i.e., the true label needs to lie in the candidate label set. The generation model is also well-motivated by assigning equal weight to sample every subset containing the true label. As far as I know, there is no study on the generation model of partial label learning before, despite the fact that the generation model is important to generate the data and get the distribution of the data.
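For concreteness, the sampling scheme described above (equal weight on every subset containing the true label) can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names, the integer label encoding, and the `exclude_full` option (dropping the uninformative set of all labels) are my own assumptions.

```python
import random


def sample_candidate_set(true_label, num_classes, rng, exclude_full=True):
    """Draw a candidate set uniformly from the subsets of {0, ..., k-1}
    that contain true_label.

    exclude_full is a hypothetical option: when True, the set of all labels
    is rejected, since it carries no information about the true label.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    others = [c for c in range(num_classes) if c != true_label]
    while True:
        # Including every other label independently with probability 1/2
        # makes each subset that contains true_label equally likely.
        candidates = {true_label} | {c for c in others if rng.random() < 0.5}
        if exclude_full and len(candidates) == num_classes:
            continue  # rejection keeps the remaining subsets uniform
        return frozenset(candidates)


def count_valid_sets(num_classes, exclude_full=True):
    """Number of candidate sets that contain a fixed true label."""
    return 2 ** (num_classes - 1) - (1 if exclude_full else 0)


if __name__ == "__main__":
    rng = random.Random(0)
    print(sorted(sample_candidate_set(true_label=2, num_classes=5, rng=rng)))
```

Under these assumptions, each valid candidate set has probability `1 / count_valid_sets(k)`, independent of the instance, which is the sense in which the candidate set is "uniformly sampled".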
In the experiments, the paper also uses entropy to test whether the given data agrees well with the generation model. This shows another value of the generation model: it helps us know whether the proposed methods will work on given data. This result is really impressive, useful, and not encountered much in other machine learning literature.
Based on the generation model, the paper further proposes two methods, both with guarantees but with slightly different statistical properties. Theoretically, given infinite data, both methods should achieve the same optimal classifier as learning with ordinary labels. The paper analyzes the methods through generalization error bounds. The empirical results also agree with the theoretical results.
Generally, the paper is well-organized and well-written. It studies an important problem in partial label learning and proposes a novel generation model and theoretically guaranteed methods. I think the paper could make itself more self-contained by giving more discussion of the difference between the risk-consistent method and the classifier-consistent method.

__ Weaknesses__: After reading the paper, I am a bit confused about why we need a classifier-consistent method, since both theoretically and empirically it is worse than the risk-consistent method. Is it because some of the literature focuses on proposing classifier-consistent methods, or are there limitations on the loss functions that can be used? I would suggest that the paper include more discussion on this point.

__ Correctness__: yes

__ Clarity__: yes

__ Relation to Prior Work__: yes

__ Reproducibility__: Yes

__ Additional Feedback__: I have read the author response and the other reviews. I think the paper contains interesting and novel ideas, and its findings are important to partial label learning. I would like to keep my rating.