__ Summary and Contributions__: The paper proposes a meta-learning approach to find explainable plasticity rules that install a certain functionality in a neuron or network, where the plasticity rule itself is given as a mathematical expression. The main contributions are the method itself and empirical evidence that it works as intended on small problems with both rate-based and spike-based neuron models.

__ Strengths__: One strength of the proposed approach is that, in principle, it could make it possible to find, among the set of plasticity rules considered biologically realistic, those that install a certain functionality at the network level. An additional strength is that these plasticity rules are described by symbolic formulas, and because they are optimised using L1 regularisation, the resulting rules have fewer terms, which facilitates explainability.

__ Weaknesses__: The proposed approach has some weaknesses, many of which are already identified and acknowledged by the authors. For example, the approach apparently does not scale well. This seems to be due to the use of finite-difference gradients for optimisation and to a very unforgiving loss landscape. To yield meaningful insights into brain function and plasticity at the network level, the network needs to be somewhat larger than a handful of neurons, which I think is a real problem for the present approach.
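
To illustrate why finite-difference gradients can become a bottleneck, here is a minimal sketch of a central-difference gradient estimate. It is not the authors' implementation: `toy_loss`, the three-element coefficient vector, and the step sizes are hypothetical stand-ins, and in the meta-learning setting each loss evaluation would presumably be a full simulation of the inner learning loop rather than a cheap quadratic.

```python
# Hypothetical stand-in for "simulate the network with this plasticity rule
# and score the outcome"; a real evaluation would be far more expensive.
TARGET = [1.0, 0.0, -1.0]
evaluations = 0


def toy_loss(theta):
    """Smooth toy loss; also counts how often it is evaluated."""
    global evaluations
    evaluations += 1
    return sum((t - g) ** 2 for t, g in zip(theta, TARGET))


def finite_difference_gradient(loss, theta, eps=1e-4):
    """Central differences: two loss evaluations per coefficient."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((loss(up) - loss(down)) / (2 * eps))
    return grad


# Plain gradient descent on the rule coefficients (assumed step size 0.1).
theta = [0.0, 0.0, 0.0]
for _ in range(200):
    grad = finite_difference_gradient(toy_loss, theta)
    theta = [t - 0.1 * g for t, g in zip(theta, grad)]

print(theta, evaluations)
```

The number of loss evaluations per gradient step grows linearly with the number of coefficients, so the total cost is that count multiplied by the cost of one inner-loop simulation.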

__ Correctness__: The claims are well supported by the experiments, which appear to be carried out correctly. The authors provide insightful figures that further support the correctness of the methodology.

__ Clarity__: The paper is clearly written and easy to understand.

__ Relation to Prior Work__: Prior work is adequately addressed, but the authors could add a paragraph connecting it to relevant work on meta-learning in machine learning. For instance, in [1] potential weight changes obtained by backpropagation are further transformed by a meta-learned rule, and [2] learns how much plasticity certain synapses should exhibit. However, I am not aware of a similar approach that considers symbolic weight update rules, which further supports the novelty of this work.
[1] Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., ... & De Freitas, N. (2016). Learning to learn by gradient descent by gradient descent. In Advances in neural information processing systems (pp. 3981-3989).
[2] Miconi, T., Clune, J., & Stanley, K. O. (2018). Differentiable plasticity: training plastic neural networks with backpropagation. arXiv preprint arXiv:1804.02464.

__ Reproducibility__: Yes

__ Additional Feedback__: I believe that the paper is well-written and does present its results adequately. My main concern is that the approach is stuck at the level of just a few neurons, and it is not clear if it would ever apply to more complicated scenarios, where there is opportunity to really discover new plasticity rules that achieve learning in such networks of neurons.
l.100: I think "...received synapses" should be "...receive."
l.160: "we only allowed *one* minute"
Post rebuttal update: The authors have responded to my concern with regard to the scalability of the approach with a different optimization procedure, showing that it applies to larger problems. Hence, I increased my score.

__ Summary and Contributions__: The authors used meta-learning to train synaptic plasticity rules. They were able to recover two known plasticity rules in a simple but classical use case: SGD identified Oja’s rule and an anti-Hebbian rule in neural networks performing PCA. They also applied this method to an excitatory-inhibitory spiking circuit, and found that the inhibitory plasticity rules that emerged lie in an analytically predicted region of the parameter space.
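
For readers unfamiliar with the rule being recovered, here is a minimal sketch of Oja’s rule in its standard textbook form, Δw = η·y·(x − y·w) with y = w·x, applied to a single linear neuron. It is not the paper's experimental setup: the 2-D synthetic data, the learning rate, and the helper names (`make_data`, `train_oja`) are assumptions chosen for illustration.

```python
import math
import random


def oja_update(w, x, eta):
    """One step of Oja's rule: dw = eta * y * (x - y * w), with y = w . x."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]


def make_data(n=2000, seed=1):
    """Zero-mean 2-D samples with most variance along the (1, 1) direction."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = rng.gauss(0.0, 2.0)  # large spread along (1, 1) / sqrt(2)
        b = rng.gauss(0.0, 0.3)  # small spread along (1, -1) / sqrt(2)
        data.append(((a + b) / math.sqrt(2), (a - b) / math.sqrt(2)))
    return data


def train_oja(samples, eta=0.01, epochs=20, seed=0):
    """Apply Oja's rule sample by sample, starting from small random weights."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 0.1) for _ in range(len(samples[0]))]
    for _ in range(epochs):
        for x in samples:
            w = oja_update(w, x, eta)
    return w


w = train_oja(make_data())
norm = math.sqrt(sum(wi * wi for wi in w))
# Absolute cosine between w and the high-variance direction (1, 1) / sqrt(2).
alignment = abs(w[0] + w[1]) / (math.sqrt(2) * norm)
print(w, norm, alignment)
```

The weight vector should approach unit length and align with the leading principal direction of the data, which is the behaviour the meta-learned rule is expected to reproduce in the PCA task.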

__ Strengths__: Identifying the set of synaptic plasticity rules that support development, learning, and memory in brains remains one of the most daunting challenges in neuroscience. Success in this effort could presumably benefit machine learning as well. So the overall topic is important, and the authors are applying recent machine learning tools to this classical problem. This work is relevant to the NeurIPS community.

__ Weaknesses__: The main weakness of this work, in my opinion, is its lack of empirical demonstration that this approach could actually scale and generalize. The authors first focused on a very simple problem, recovering Oja’s rule, and already demonstrated that this is difficult in a modestly sized network (# input neurons > 11). This of course doesn’t suggest that the entire meta-learning approach to synaptic plasticity rules couldn’t scale; rather, it concerns the specific implementation and parametrization used by the authors. In the second example, the authors show that the meta-learning approach recovers plasticity rules (Fig 4D) that could be obtained with a one-line derivation (eq 13 to eq 15). So here it’s not clear what the additional value of the meta-learning approach is.
----------
Update after rebuttal:
I have updated my score from 5 to 6 after seeing the new results on better scalability.

__ Correctness__: The empirical methods are sound.
Typo at line 153: it should be $\beta = -\alpha \nu_{\mathrm{pre}} / \nu_{\mathrm{post}}$.

__ Clarity__: The paper is overall clearly written and understandable.
It is, however, not very clear to me how in section 2.3, the authors were able to train the plasticity rule of a spiking network (a non-differentiable system) using gradient-based methods.

__ Relation to Prior Work__: This work is well-motivated and well-referenced. A potentially relevant reference missing is Miconi, Clune, Stanley 2018.

__ Reproducibility__: Yes

__ Additional Feedback__: None.

__ Summary and Contributions__: The authors propose to learn the meta-parameters of unsupervised learning rules by performing supervised learning on top of training a neural network on various datasets.

__ Strengths__: This work offers a proof of principle for the learning of meta-parameters. This approach complements existing analytical work relating learning rules to computational objectives.

__ Weaknesses__: The significance of this work for neuroscience is not clear.

__ Correctness__: Yes

__ Clarity__: Yes

__ Relation to Prior Work__: The gap between the learning rules and the computational function has been bridged previously by analytical derivations of online optimization algorithms, see e.g. C. Pehlevan and D. B. Chklovskii, "Neuroscience-Inspired Online Unsupervised Learning Algorithms: Artificial Neural Networks," in IEEE Signal Processing Magazine, vol. 36, no. 6, pp. 88-96, Nov. 2019, doi: 10.1109/MSP.2019.2933846.

__ Reproducibility__: Yes

__ Additional Feedback__: Given that analytical derivations of Oja-like learning rules (considered in the first two problems) exist, the implication of their work for neuroscience is not clear. Such analytical derivations from principled objective functions that reflect computational tasks are not subject to the limitations of numerical optimization reported here such as having only a few input or output channels (in contrast to thousands in a biological neural network). Is their optimization algorithm modeling the evolution of learning rules? If yes, they should discuss this. If not, how can it be used to gain insight into brain computation? What specific predictions can be tested experimentally? When, in general, do they expect numerical optimization to be needed?
I find the third problem addressed in the paper somewhat artificial because it assumes that the excitatory synapses are stable while the inhibitory ones are plastic. It is my impression that in biology the opposite relationship exists. If so, their model may miss the dynamics of synaptic weights taking place in biological neurons.