NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 4934 Preference-Based Batch and Sequential Teaching: Towards a Unified View of Models

### Reviewer 1

In this paper, a unified view of different teaching models is proposed based on preference functions. It shows that classical teaching models and their corresponding teaching complexity measures can be treated as teaching under preference functions with different levels of flexibility. Even though it is known in the literature that preference functions play a crucial role in teaching, proposing such a unified view is still meaningful for inspiring future research. Furthermore, the most interesting result in the paper is the study of the family of “local version space” (LVS) preference functions, which rely on both the current version space and the learner’s current local hypothesis. A constructive proof shows that there exists an LVS preference function under which the size of the optimal teaching set is linearly upper bounded by the VC dimension of the hypothesis space. This is an interesting result, since studying the relationship between the VC dimension and the teaching dimension is one of the most important problems in machine teaching. As cracking the well-known open problem about the relationship between the recursive teaching dimension, which corresponds to teaching with a global preference function, and the VC dimension remains difficult, this result reveals the importance of designing more flexible teaching models. To my understanding, this paper has potentially high impact in inspiring further research in this direction.

Further suggestions:
1) In the proof of Theorem 3, the notion of the VC dimension w.r.t. a hypothesis class and a specific dataset is used without a formal definition. This notion differs from the classical VC dimension, which depends only on the hypothesis class. Given its importance in the proof, a formal definition should be included.
2) There is significant room to improve the paper's presentation and organization.
To my understanding, the current version of the paper is not friendly enough to readers without good knowledge of the previous machine teaching literature. For example, the teaching protocol is not discussed in detail: does the teacher know the learner's preference function? Does the teacher know the preference function beforehand, or does it gain better knowledge during the sequential teaching process? A more detailed explanation of these points, and of the intuition behind the main results, would further improve the potential impact of the paper.

------ after rebuttal: I have read all the reviews and the author response. I agree with the other reviewers that the work is interesting and novel, so my score remains the same.

### Reviewer 2

The paper is on teaching models, a topic in computational learning theory. One of the recurring issues is how to formulate a model which avoids collusion, i.e., "cheating", where the teacher communicates the target to the learner using some encoding of its description into the teaching examples. The standard approach to teaching is through a set of examples uniquely defining the target; in this case the learner's task is to find a consistent hypothesis. More recent approaches, such as the recursive teaching dimension and the model of [CSMA+18], assume more sophisticated learners. In this paper an even more sophisticated class of learners is defined. Here learning proceeds as a sequential process, and in each step the learner finds a most preferred hypothesis, based on an evaluation of the current version space and the last hypothesis. This model is closely related to [CSMA+18], except that here the version space is an additional parameter of the preference function.

The results of the paper are interesting in themselves, and they also provide an interesting picture of the relative power of the various teaching models. The second half of Theorem 3 is an example of the former: it shows that in the most general model there is already a preference function under which the teaching complexity is proportional to the VC dimension (whether this holds for RTD is a major open problem). This result is based on an interesting combinatorial lemma (Lemma 4). The latter is summarized in Figure 1 (and the relationships to previous models are given in Table 1).

Comments: The paper emphasizes the critical role of an appropriate definition of collusion-freeness for sequential models. The corresponding part of the proof of Theorem 3, however, is relegated to the Appendix; some discussion of it should be given in the main text. Also, the concluding section mentions defining (other?) notions of collusion-freeness in sequential models. Here it should be clarified what the issues with the current definition are.
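The sequential, preference-based protocol described in this review can be sketched in a few lines. The following is a minimal illustration under simplifying assumptions (finite hypothesis class, hypotheses represented as label tuples, ties in the preference broken arbitrarily by `min`); the names `sigma`, `sequential_teach`, and the concrete preference used below are illustrative choices, not taken from the paper.

```python
# Hedged sketch of a sequential, preference-based learner.
# A hypothesis is a tuple of labels over instances 0..n-1; sigma scores a
# candidate hypothesis given the current version space and the learner's
# previous hypothesis (lower score = more preferred), matching the idea
# that the preference depends on both the version space and the last
# hypothesis.

def consistent(h, examples):
    """A hypothesis is consistent if it agrees with every labeled example."""
    return all(h[x] == y for x, y in examples)

def sequential_teach(hypotheses, sigma, teaching_sequence, h_init):
    """Simulate the protocol: after each example the learner prunes its
    version space and jumps to a most-preferred consistent hypothesis."""
    version_space = list(hypotheses)
    h_cur = h_init
    for example in teaching_sequence:
        version_space = [h for h in version_space if consistent(h, [example])]
        # Preference takes both the current version space and h_cur:
        h_cur = min(version_space, key=lambda h: sigma(h, version_space, h_cur))
    return h_cur
```

As a usage example, a "local" preference that favors small changes from the previous hypothesis (Hamming distance, ignoring the version space argument) makes the learner move conservatively through the class as the teacher reveals examples.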