Review for NeurIPS paper: The Advantage of Conditional Meta-Learning for Biased Regularization and Fine Tuning

NeurIPS 2020

Meta Review

Thank you for bringing up the shortcomings of the initial reviewers. I'm disappointed that they did not seem to evaluate the core contributions of the paper, which are theoretical in nature. [And while I agree with some of the reviewers' sentiments, this paper does not set out to solve the problems they raise.] After the rebuttal, I sought out and found two emergency reviewers who are better suited to evaluate this paper. Both new reviewers (whose reviews should be visible) scored the paper above the bar. I generally agree with their assessment and with their feedback on the paper. I encourage the authors to incorporate this valuable feedback into the camera-ready version of the paper, including:
- better motivation for the use of side information
- more discussion of how this work relates to prior conditional meta-learning works
- the suggested adjustments to the notation
- moving the experiments on real datasets and other experimental details to the main text (which should be possible with the extra page)
- ideally, a more realistic experiment where the side information is not data

Beyond these reviewers' comments, I have two additional pieces of feedback:

1. It would be valuable to discuss the connection between this work and the theoretical findings of [A], since the motivation for conditioning is somewhat at odds with the result in [A] that unconditional meta-learning is maximally expressive given a large enough network.
2. Much of the terminology in this paper diverges from the terminology used in other meta-learning papers, especially more empirically focused ones. For example, there is "biased regularization and fine-tuning" versus "gradient-based meta-learning"/"optimization-based meta-learning", and "conditional meta-learning" versus the names of prior methods such as LEO and multi-modal MAML. It would be helpful to build more of a bridge between the two sets of terminology in the text of the paper, to help readers connect papers and methods in the field of meta-learning.

[A] Finn & Levine. Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm. ICLR 2018.