Review for NeurIPS paper: A General Method for Robust Learning from Batches

NeurIPS 2020

A General Method for Robust Learning from Batches

Review 1

Summary and Contributions: This paper extends recent algorithmic work for the problem of learning from untrusted batches. The focus is on the setting where the underlying distribution has additional structure, in which case more sample efficient algorithms are possible. This paper develops sample and computationally efficient algorithms for such settings.

Strengths: The paper presents novel theoretical results that are highly relevant for the machine learning community. At a high-level, the paper combines in a natural but non-trivial way VC theory with the filtering framework (developed in prior work).

Weaknesses: One potential limitation is the assumption that each good batch contains samples from the distribution p -- as opposed to some distribution that is close to p in L1 distance. The original framework of Qiao-Valiant covers this setting and some of the previous efficient algorithms work in this setting as well. It is not immediately clear from the writeup if this is an inherent limitation of the methodology proposed in this paper.

Correctness: The proofs appear to be correct. However, the paper is fairly long and *all* the proofs appear in an appendix.

Clarity: This writing is quite clear.

Relation to Prior Work: While the prior work, in terms of theorem statements, is sufficiently explained, very limited space is devoted in explaining the the techniques and in particular providing a comparison to prior techniques. Some specific comments/questions are provided below.

Reproducibility: Yes

Additional Feedback: From the writeup it is not clear how the techniques are related to prior work. For example: 1) The filtering framework of [J019], that the current work builds on, is itself an adaptation of the filtering technique developed in the robust statistics literature, [DKK+16], etc. It would be appropriate to explain this and also elaborate on the differences between [J019] and the current application of filtering (which is done to some extent on p. 8). 2) The idea of using VC theory for computationally efficient density estimation has been used in a sequence of prior works (including ADLS17). The current paper builds on these ideas. It would be useful for the non-expert reader if this was clarified. 3) The paper contains essentially zero proofs in the first 8 pages. I understand there are space issues. But it would have certainly been possible to include 2-3 pages of content and move some of the secondary contributions to the appendix. With this presentation, a non-expert reviewer either needs to believe correctness or read the appendix. ======== Post rebuttal: -- The authors responded that their algorithm can be extended to work for the case that the samples from each good batch come from a distribution that is close to p. And that this follows immediately from their approach. It would be useful if this is included in the final version. -- The authors also promised that they will revise their presentation to address the relation to prior work at a technical level. Given the above, I am happy to increase my score to 7.

Review 2

Summary and Contributions: This paper generalizes the filter algorithm proposed in [JO19] for learning from batches from discrete distribution learning to continuous domain. The main technical lemma shows that the proposed filter algorithm can “clean” the batches to guarantee uniform convergence with near optimal sample complexity. This lemma is leveraged to learn piecewise polynomial distribution, binary classification. Though the result is generally statistical in its nature and does not yield a computation efficient algorithm, the paper leverages an efficient learning algorithm for unions of intervals to give efficient algorithms for learning piecewise polynomial distributions and union of intervals classifiers.

Strengths: This paper generalizes the filter algorithm proposed in [JO19] for learning from batches from discrete distribution learning to continuous domain, and provides the first efficient robust batch learning algorithm for several fundamental learning problems.

Weaknesses: The proof for the sample efficient filter for general VC class follows from a very similar technical approach as in [JO19].

Correctness: Yes

Clarity: yes

Relation to Prior Work: Yes

Reproducibility: Yes

Additional Feedback: I have read the authors' feedback and my score does not change.

Review 3

Summary and Contributions: I thank the authors for their feedback. --- The authors present robust learning algorithms using batches for 1) learning structured distributions 2) classification problems 3) 1d classification problems All above algorithms are based on a common filtering framework.

Strengths: The authors give tractable algorithms that match sample complexity lower bounds up to log factors.

Weaknesses: It's unclear how the authors' bounds will break down if we do not make the assumption that the distribution is structured. Also, it is unclear how large is the polynomial time complexity. If time complexity's polynomial degree is too high, then the algorithms are still impractical. And the authors should also emphasize more how their work is technically superior to [JO19].

Correctness: Yes

Clarity: There are many results in this paper and it'd be nicer if they're more organized.

Relation to Prior Work: Yes

Reproducibility: Yes

Additional Feedback:

Review 4

Summary and Contributions: Sample complexity bound for the task of distribution-learning in the setting of batch sampling with adversarial batches, i.e. a $\beta$ fraction of the batches might get maliciously corrupted. The bounds are almost tight as they match the lower information-theoretic bounds for this case, up to logarithmic factors. The main novelty is that the result can be applied to distributions over infinite or even continues domains. As a corollary the authors provide an efficient learning algorithm, in terms of sample and run-time, for learning piecewise polynomial distributions. The notions of interest are - $\mathcal{F}$-distance, which generalizes TV-distance. - Yatracos class and the Yatracos-minimizer. - VC-dimension and the VC-inequality and convergence theorems.

Strengths: The paper seems to be of high-quality. The results are impressive, non-trivial and interesting, in the sense that they might lead to further research in their direction. The authors did a through work and provided several applications to their main result which are of individual interest. Update: I read the authors' feedback and my score stays the same.

Weaknesses: Didn't find any specific weakness.

Correctness: As far as I saw - it is correct.

Clarity: Yes.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: