NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 6133
Title: A Necessary and Sufficient Stability Notion for Adaptive Generalization

Reviewer 1


I like the paper and support accepting it. The major contribution is that the authors define a new measure of stability that is both necessary and sufficient for generalization. Understanding specialized necessary and sufficient properties for restricted classes of queries and distributions is an interesting direction for future work. Lifting the condition that the mechanisms be sample accurate is not covered in the current formalization and is an important direction for further research.

Writing style: The paper introduces a lot of notation and jargon, and I found it very hard to read. Simplifying the paper in the final version would definitely increase its accessibility and is strongly recommended.

Post feedback: Thanks for the feedback. I am glad the authors will invest time in improving the writing style of the paper. I still support accepting it.

Reviewer 2


This paper contributes to the line of work aiming to identify stability notions that can be used to guarantee generalization in adaptive data analysis. The notion the paper proposes is based on a measure of stability loss defined in terms of the statistical distance between the prior distribution over possible inputs and the posterior distribution over inputs induced by an output of the mechanism. The way the paper bounds this notion is reminiscent of (epsilon, delta) loss bounds. The notion seems to have the properties one would like in order to guarantee generalization, and it also relates well to the notions previously studied in this line of work.

One thing missing from the paper is a discussion of how to enforce a bound on this measure, and thereby obtain good generalization properties; a discussion of how this quantity can be estimated is also missing. This makes it quite difficult to assess the practical usefulness of the notion.

Another, more minor issue I have with this paper is the way the separation results are formulated. Perhaps I am being too formal, but Theorem 4.6 and Theorem 4.7 seem very specific to particular values of LMI. Could they be formulated more generally, for arbitrary values of the parameters? It would be great if the authors could clarify this aspect in the rebuttal.

Besides these two issues, I find this an interesting and well written paper.

Pros:
- A novel notion that guarantees adaptive generalization.
- The paper shows that previous notions guaranteeing generalization all imply a bound on Local Statistical Stability.

Cons:
- The scope of the separation results is unclear.
- The paper does not provide any method to guarantee a bound on Local Statistical Stability, nor any concrete example that satisfies it.

Comments after rebuttal
----------------------------
Thanks for your answers and for clarifying some of the doubts I had about this paper. Please include a more thorough discussion of possible directions for finding methods to enforce LSS in future versions of the paper. Please also add a discussion of the relations between LSS and other stability notions that guarantee generalization.
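To spell out my reading of the notion described above (this is only a schematic paraphrase of the prior-to-posterior requirement as I understood it, not the paper's exact definition): with high probability over the mechanism's output, the posterior it induces should stay close to the prior in statistical distance, e.g.

\[
\Pr_{y \sim \mathcal{M}(S)}\big[\, d_{\mathrm{TV}}(\mathcal{P}, \mathcal{P}_y) \le \varepsilon \,\big] \;\ge\; 1 - \delta ,
\]

where \(\mathcal{P}\) denotes the analyst's prior over input data sets \(S\) and \(\mathcal{P}_y\) denotes the posterior over \(S\) conditioned on observing the output \(y\) of the mechanism \(\mathcal{M}\).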

Reviewer 3


What I like about the paper is that it provides a unifying theory for various stability notions that have been discussed in the literature. One common thread that existing algorithms were known to share is post-hoc generalization, but it was later proved that this notion does not compose adaptively, meaning that even if individual algorithms prevent overfitting in the sense of post-hoc generalization, combining them may permit overfitting. The current notion, LSS, composes adaptively, similarly to differential privacy (DP). At the same time, LSS is proved to be weaker than DP yet still sufficient to guarantee generalization; this proof is given via the monitor argument of Bassily et al.

Without further modeling of the general problem of adaptive data analysis, previous results in the area tell us that sqrt(k)/epsilon^2 samples are sufficient and necessary for generalization, where k is the number of queries and epsilon is the target accuracy, and we know that differential privacy can guarantee this complexity (I restate the bound below), so I am not sure about the direct practical implications of this work. For now I see the advantages as being primarily of a theoretical nature; however, the possible directions given in Section 5 sound promising.

In Definition 2.1, is there a specific reason why total variation distance is used? Could similar results have been proved with a different distance between distributions?

I'm not sure how to reason about the alpha_i sequence in Theorem 2.7 (it has to interplay somehow with the sample-accuracy requirement, and without an explicit mechanism it is hard to analyze this trade-off). That aside, I don't completely agree with the comment made just below, that the theorem is non-trivial for alpha_i <= epsilon_i. In particular, if alpha_i = epsilon_i, isn't the result strictly worse than that of Theorem 2.7?

The main weakness of this paper is that it is not clear what the space of LSS mechanisms is. It possibly allows mechanisms that give us all the benefits of DP but circumvent other issues (or, under some modeling assumptions, they might even outperform DP). Currently, however, the definition is too general to see explicit mechanisms. (Since the mechanisms don't know the prior data distribution, it is hard to see how to guarantee LSS.)
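To be explicit about the baseline I referred to above (stated informally, ignoring logarithmic factors and the exact dependence on the query class):

\[
n \;=\; \tilde{\Theta}\!\left(\frac{\sqrt{k}}{\varepsilon^{2}}\right),
\]

where \(n\) is the number of samples, \(k\) the number of adaptively chosen queries, and \(\varepsilon\) the target accuracy; differentially private mechanisms are already known to match this rate.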
I wonder whether this setup can be used to say something when the adversary's prior is off. Currently it is assumed that the prior is the actual data distribution, but does the framework allow saying something about overfitting if we know how much the adversary's prior deviates from the true one?

Minor comment: Is there a reason why sequences in X^n (possibly non-iid samples) are considered? I didn't see any motivation for non-iid data sets, and considering a distribution over sequences seemed to make the notation and discussion unnecessarily complicated at times.

Overall I think the paper gives some interesting theoretical insights. It is hard to see at this point what its practical implications might be.

Post-feedback: I thank the authors for clarifying some of the confusion I had, and I'm happy to see that the authors are also thinking about novel mechanisms, as suggested in the reviews. I think this paper makes a good theoretical contribution and I still recommend acceptance.