NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 3063
Title: Push-pull Feedback Implements Hierarchical Information Retrieval Efficiently

Reviewer 1


Update: I apologize for my confusion about the dynamics. I feel more positively now about this work, and have increased my score.

1) For this work to have a lot of value to neuroscientists, I'd want to understand the realism of the dynamics much better. There are two issues to address: a) How realistic is it for the feedforward pass plus within-layer recurrence to run to convergence *before* the top-down feedback is sent? What happens if these are concurrent processes, so that units receive bottom-up and top-down inputs at the same time? b) The discussion of inhibitory feedback being delayed relative to excitatory feedback is correct, but it suggests a tight constraint: the biological data tell us how much time elapses between these two feedback types. Given the time scale of recurrent dynamics in cortex, the authors could then ask (in their model) whether this delay is "enough" for their push-pull mechanism to work. If yes, that would strengthen the result a fair bit.

2) There are other studies of hierarchical RNNs, and it is worth discussing the details of those works (there are many) to highlight the novelty of this paper. For example, the second sentence of the introduction is correct (most DNNs are purely feedforward) but a bit misleading, because there are several recent papers on hierarchical RNNs.

3) If the unit activities are sign(input) (e.g., Eq. 1), couldn't the pattern overlap (Eq. 6) be -1 in the extreme case where two patterns are opposite? In that case, I disagree with the inequality 1 > m > 0 given after Eq. 6. This could be easy to fix (or, in case I am wrong, please just tell me why).

4) It's neat that the dynamics with feedback resemble the monkey data on successful trials, and the dynamics without feedback resemble those on unsuccessful trials (no contour recognition).

5) I don't understand how the system is implemented for the natural images. It seems that, to define the feedforward connectivity, you need to know the "parent" and "grandparent" patterns corresponding to the "child" patterns. That is straightforward (by construction) in the first part of the paper, where you define distributions for these patterns. But for the natural images, I don't understand how you know the parent and grandparent patterns. E.g., what is the "parent" pattern for the dog images? I understand that it has something to do with dogs, but I don't see how you determined the specific numerical values.
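Point 3 can be checked numerically. The snippet below is a minimal sketch that assumes the standard Hopfield-style overlap between two ±1 patterns, m = (1/N) Σᵢ ξᵢ sᵢ (the paper's Eq. 6 may differ in detail); it shows the overlap reaching -1 for exactly opposite patterns, so the bound 1 > m > 0 cannot hold in general.

```python
import numpy as np

# Sketch of the reviewer's point 3: with sign(.) unit activities, the
# normalized overlap between two patterns can reach -1 when the patterns
# are exact opposites. The overlap definition below is the standard
# Hopfield form, assumed here; the paper's Eq. 6 may differ in detail.

rng = np.random.default_rng(0)
N = 1000
pattern = np.sign(rng.standard_normal(N))  # random +/-1 pattern
anti_pattern = -pattern                    # its exact opposite

def overlap(a, b):
    """Normalized overlap m = (1/N) * sum_i a_i * b_i between +/-1 patterns."""
    return np.mean(a * b)

print(overlap(pattern, pattern))       # 1.0  (identical patterns)
print(overlap(pattern, anti_pattern))  # -1.0 (opposite patterns)
```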

Reviewer 2


The paper aims to investigate the role of feedback connections in hierarchical information retrieval, specifically the potential role of a push-pull feedback mechanism in aiding the retrieval process. The authors first formulate a toy model illustrating how push and pull feedback reduce inter-class and intra-class noise, respectively. They then show simulation results on real images that further corroborate this intuition. This is an interesting idea and a reasonable attempt at mathematizing a widely observed phenomenon in biology. I think the work would benefit from a more thorough set of experiments and explanations.

Strengths:
– The paper provides a novel perspective on how the push-pull feedback mechanism could aid hierarchical information retrieval. This could be a first step towards building intuitions for understanding the role of feedback in predictive coding.
– In addition to its contribution to understanding push-pull feedback from a neuroscience point of view, the work also shows a possible advantage of incorporating dynamic feedback connections into deep neural networks to improve information retrieval performance.
– The paper addresses the conflicting consequences of correlation in neural encoding: higher correlation is essential for embedding categorical relationships between objects, but higher correlation between neural encodings could degrade information retrieval.

Substantial weaknesses:
– The authors distinctly claim that push feedback helps reduce inter-class noise and pull feedback helps reduce intra-class noise. But Fig. S2 suggests that pull-only feedback plays a much greater role than push-only feedback in reducing inter-class noise. It also seems that push feedback plays a very minor role in improving performance when both push and pull are used, compared to pull alone. Thus the argument about both push AND pull feedback seems to fail to explain the actual empirical findings. In other words, pull-only is almost as good as push-pull, and within error bars, so the claim that push is helpful is a bit dubious.
– When push-only feedback is applied and then removed, I expect the coarse-scale class to be improved. The authors should demonstrate this, even if the fine-scale classes have additional errors.
– It is unclear how the model would handle multiple dimensions of parent categories (white animals vs. black animals, and not just cats vs. dogs).
– Confusing notation.
– The justification for the push feedback is clear, but I don't understand the mechanism of the pull improvement; this could use substantially more explanation. Given that the pull feedback is linear, I don't see how it could help distinguish the actual inputs from the average of a parent class. That seems to require imposing an energy maximum at the average of the children of a given parent, which in turn requires the energy to decrease on either side of the maximum, inconsistent with a linear feedback term.

Minor weaknesses:
– As mentioned in lines 146-149, the mathematical analysis of the role of push and pull feedback was restricted to the specific case where a parent pattern is already perfectly retrieved. It would be really insightful if its generalization to other cases were also shown analytically, in addition to the empirical analysis provided in the paper. This may be substantially harder, so it is not necessary.
– It would be really informative if the authors could shed some light on how they arrived at the forms of push and pull feedback proposed in Eqs. 9 and 10. Going one step further, they could investigate a richer family of forms that push and pull feedback could take.
– Figures have been scaled to a non-unity aspect ratio, so fonts and labels are messy.
– The push-pull explanation derives from inter-class versus intra-class variability. This structure is created by the authors' generative model, but other stimulus ensembles have more structure than just inheritance. Would this create more mechanisms than just push and pull?

—— After author feedback
This is somewhat confusing, because now, in this simulation with fewer patterns, pull feedback doesn't matter much, and yet the difference in pattern number is only 2.5x. Typical natural scene statistics include huge numbers of possible images, categories, etc., which according to the authors should make push feedback irrelevant in natural conditions. I thank the authors for their explanation of the pull mechanism. I have increased my score slightly, from 5 to 6.
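The objection about the linear pull term can be made concrete with a short derivation. This is only a sketch under the assumption of a standard quadratic Hopfield-style energy; the paper's actual energy function and feedback form (Eqs. 9-10) may differ:

```latex
E(\mathbf{x}) = -\tfrac{1}{2}\,\mathbf{x}^{\top} W \mathbf{x} - \lambda\,\mathbf{b}^{\top}\mathbf{x},
\qquad
\nabla E(\mathbf{x}) = -W\mathbf{x} - \lambda\,\mathbf{b},
\qquad
\nabla^{2} E(\mathbf{x}) = -W .
```

A linear feedback term $-\lambda\,\mathbf{b}^{\top}\mathbf{x}$ (e.g., with $\mathbf{b}$ the parent-class average) shifts the gradient by a constant but leaves the Hessian unchanged, so it can translate the attractors but cannot by itself create a new local energy maximum at the parent average, which is what repelling the state away from that average would seem to require.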

Reviewer 3


Although interesting, the paper only shows results with simple datasets or examples, so the practical implication is not apparent. The math expressions are hard to follow without going into the supplementary information.