
Submitted by
Assigned_Reviewer_4
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
This paper proposes a localized, low-rank receptive
field model and develops approximate Bayesian inference methods with
Gaussian and Poisson observation models. The paper is a nice step
forward in RF estimation. It's well written, and technically sound. I
enjoyed reading the paper and have only a few minor comments.
Although the paper provides clear proof-of-concept examples that
show the benefits of low-rank, localized RF estimation, one area that
would be especially interesting to expand on is model performance with
highly non-Gaussian, natural scene stimuli. In this case low-rank STAs
estimated from the whitened stimuli still have large biases that a
low-rank model-based approach should resolve.
Line 137: "scaler" should be "scalar". Wording is a bit awkward at line 159: "avoids the
algorithm from searching".
In response to rebuttal: all my
concerns have been addressed. Q2: Please summarize your
review in 1-2 sentences
This paper proposes a localized, low-rank receptive
field model and develops approximate Bayesian inference methods with
Gaussian and Poisson observation models. The paper is a nice step
forward in RF estimation. Submitted by
Assigned_Reviewer_5
Q1: Comments to author(s).
In this paper, the authors develop a sophisticated set
of Bayesian techniques to infer low-rank neural receptive fields (RFs).
This is a problem of interest to many modellers and experimenters within
sensory neuroscience, as identifying the transformations from stimulus
spaces to output spike trains is both extremely useful for experimental
protocols, and very difficult or data-intensive due to the very high
dimensionality of the stimulus space. One option has been to exploit
potential symmetries in stimulus space, and build low-rank approximations
to RFs based on sums of space-time (e.g. Pillow et al, 2008) or
frequency-time (e.g. Ahrens et al, 2008) separable kernels. While these
methods are useful, they have, to date, received less attention on the
model-development front. This has limited their widespread use.
Here, the authors extend a set of Bayesian inference techniques to
low-rank RF estimation. This consists of adding hierarchical priors to the
components of the reduced-rank RFs (using the ALD framework of Park &
Pillow, 2012), providing full-Bayesian and fast approximate inference
algorithms for the hierarchical model under a Gaussian likelihood, and a
fast approximate inference method for the popular linear-nonlinear-Poisson
(LNP) likelihood using a Laplace approximation. The authors then
demonstrate the utility (and speed) of these methods by fitting low-rank
RFs to simulated and real neural data from retina and V1. Overall, this
work is successful and pleasing, generally clear (constrained by the
required brevity of the format), original, and likely to yield promising
future results.
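For readers less familiar with the parametrization being reviewed, the core idea can be illustrated with a small numpy sketch (hypothetical dimensions and synthetic data; this is not the authors' code): a rank-r space-time RF is a sum of r outer products of temporal and spatial components, and even a plain truncated SVD of a noisy full-rank estimate shows the denoising benefit of the rank constraint.

```python
import numpy as np

rng = np.random.default_rng(0)
D_t, D_x, rank = 20, 50, 2  # hypothetical temporal/spatial dimensions

# True rank-2 RF: sum of outer products of temporal and spatial components
K_t = rng.standard_normal((D_t, rank))
K_x = rng.standard_normal((D_x, rank))
K_true = K_t @ K_x.T                     # (D_t, D_x) space-time filter

# Noisy full-rank estimate (stand-in for an STA from limited data)
K_hat = K_true + 0.5 * rng.standard_normal((D_t, D_x))

# Best rank-r approximation (in squared error) via truncated SVD
U, s, Vt = np.linalg.svd(K_hat, full_matrices=False)
K_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]

mse_full = np.mean((K_hat - K_true) ** 2)
mse_low = np.mean((K_low - K_true) ** 2)
# The rank constraint discards mostly-noise dimensions, so mse_low < mse_full
```

The Bayesian treatment in the paper goes well beyond this SVD truncation, of course, by placing ALD priors on the components; the sketch only motivates why the rank constraint itself helps.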
My only constructive comments on the paper would
be to clear up a few confusions and add some minor detail:
Lines
133-141: My understanding of this is that there are separate
hyperparameters {rho, ..., Phi_f} for C_t and C_x, but this is not clear
within the text. The current version is written to describe ALD for
full-rank RFs, and should be amended for the low-rank cases. In the
sentence "In the ALD prior..." (line 135), the localisation of C_x should
be in space and spatial frequency, while that for C_t should be in time
and temporal frequency. Likewise, the description of mu_s and mu_f as
being centred in space-time and frequency should change (and line 140
too). Also, mu_s and mu_f as length-D vectors, and Phi_s and Phi_f as DxD
matrices, should be of length D_x / D_t and of size D_x by D_x / D_t by D_t
respectively, with footnote 1 changed to reflect this.
Figs 1-2: space/time
labelling of axes, as in Fig. 3, would be helpful.
The results
in Figs 3-4 assume a Gaussian likelihood, yes? I am guessing this due to
the use of the Gibbs sampler. This should be stated.
Are the
gains in model performance shown in Figs 3-4 the result of using the
low-rank approximation, or the result of using the ALD prior? It would be
helpful to quantify how much is being gained or lost by making the
low-rank assumption (especially since a major contribution of this paper
is to extend ALD to the low-rank case). To this end, a comparison against
a full-rank ALD estimate would be of value: just demonstrating an
improvement over STA is a straw man. From Fig 5B, my guess would be that
low-rank ALD would perform the same as full-rank ALD for the Gaussian
likelihood used in Figs 3-4. Notwithstanding this, the order-of-magnitude
improvement in compute time for low-rank over full-rank remains useful.
 As always, the experimenters who would find this work most
useful would also likely find it near impossible to digest. The net
utility of this work depends on code being made
available. Q2: Please summarize your review in 1-2
sentences
This paper provides a useful and welcome extension of
Bayesian receptive field inference to the estimation of low-rank receptive
fields. Submitted by
Assigned_Reviewer_6
Q1: Comments to author(s).
This paper describes an elegant approach to inferring
low-rank spatiotemporal receptive fields of neurons using a Bayesian
formulation of reduced-rank regression. The problem is clearly specified,
the implementation is nicely developed (in the form of both a sampling
scheme, and a fast approximate inference, applied to both Gaussian and
Poisson noise models), and appears quite robust, in both simulation and on
real data.
That said, I am not convinced by the magnitude of
advance over existing approaches.
Most important, perhaps, are the
simulation results in Figs. 1 and 2. The low-rank approaches outperform ML
by a long shot, but the improvement over the full-rank ALD is small. The
low-rank estimates incorporate the ALD prior; how do the low-rank
estimates look without that prior? Is much of the benefit for all non-ML
approaches coming from the ALD prior rather than the low-rank method per
se? And for the non-simulation results, how do the low-rank results
compare to the full-rank ALD estimates? Are they similar, as in the
simulation? In that case, the benefit of the reduced-rank approach isn't
entirely clear. Indeed, even in the introduction, the notion that some
form of regularization is necessary is well articulated, but the
motivation for reduced-rank versus imposing the prior (but estimating a
full-rank RF) is left vague. An additional concern regards the possibility
of RFs that are not well described as space-time separable. How does the
approach described here perform in such a setting? And how do the results
compare to a full-rank, but regularized solution?
The one clear
benefit from solving the reduced-rank problem is speed (Fig. 5C), though
this only holds for the fast, approximate algorithm. This point could be
emphasized and elaborated, because beyond the speed improvement, although
the method is nicely developed and mathematically interesting, the benefit
to the RF estimation problem, compared to other regularization-based
approaches, is not clear. Q2: Please summarize your
review in 1-2 sentences
The authors develop a Bayesian approach for reduced-rank
regression to estimate low-rank neuronal spatiotemporal receptive
fields. The approach is nicely developed and appears robust in simulation
and in practice, but the benefit over existing approaches based on
regularization (e.g. full-rank ALD) is not dramatic, aside from a large
reduction in computation time.
Q1: Author
rebuttal: Please respond to any concerns raised in the reviews. There are
no constraints on how you want to argue your case, except for the fact
that your text should be limited to a maximum of 6000 characters. Note
however that reviewers and area chairs are very busy and may not read long
vague rebuttals. It is in your own interest to be concise and to the
point.
We thank the reviewers for their careful reading of
our manuscript and helpful comments for improving it. Below, we'll address
a few specific comments raised in the reviews.
============
Assigned_Reviewer_4 ============
> Application to
natural scenes data.
This is a great idea, and thanks for the
suggestion. We agree that the STA should have larger bias in such cases,
and so the improvement over a low-rank STA should be even more
substantial. We can easily substitute a simulated example with
naturalistic stimuli in the final version.
> Typos and awkward
wording: Thanks, we will fix these.
============
Assigned_Reviewer_5 ============
> Separate
hyperparameters {rho, ..., Phi_f} for C_t and C_x
We apologize for
the confusion. Yes, these are indeed separate sets of hyperparameters
governing C_t and C_x (the prior covariance for temporal and spatial
components, respectively). We will make this clear in the final version.
> Figs 1-2: space/time labelling of axes
Thanks, we
will add labels.
> The results in Figs 3-4 assume a Gaussian
likelihood?
Yes, sorry for the omission. We will make this clear
in the final version.
> Are the gains in model performance
shown in Figs 3-4 the result of using the low-rank approximation, or the
result of using the ALD prior? It would be helpful to quantify how much is
being gained or lost by making the low-rank assumption.
Thanks for
this question; we agree this is an interesting and important question. Our
intuition says that both the rank constraint and the localized prior
should contribute substantially to the improvement, but we will
investigate this issue more closely by fitting full-rank ALD and
showing its relative performance; we will add this to the final
version. (From the simulations in Figs. 1 and 2, it's unclear whether it's
having the Poisson likelihood or having data with Poisson variability
that's responsible for the relative improvement of the low-rank estimate;
we will investigate this as well.)
> Publishing code:
Yes, we agree wholeheartedly, and definitely intend to release
code.
============ Assigned_Reviewer_6 ============
> Magnitude of the improvement over full-rank ALD.
We
admit we're a bit surprised that full-rank ALD did so well, as we had
expected a more dramatic improvement over the range of sample sizes shown
in figures 1 and 2. However, we would like to draw the reviewer's
attention to the first data point in Fig. 2B (performance at 250 training
samples). Here, the low-rank estimate achieves roughly a 70% reduction in
MSE compared to full-rank ALD (MSE of 0.8 vs. 0.25). So, we think that the
improvement in realistic settings may be more dramatic than indicated by
this pair of figures. We intend to look more closely at the low-SNR regime
(larger filters, fewer samples), where we think the relative
improvement should be greater.
> Comparison to low-rank ML
estimates
We omitted the low-rank ML estimates due to the space
limitations, but will add them to the revision. (The low-rank ML estimate
is generally better than the full-rank ML estimate, but substantially worse
than low-rank estimates with the ALD prior.)
> Motivation for
reduced-rank versus imposing the prior (but estimating a full-rank RF) is
left vague
Thanks for this comment. We will attempt to articulate
this more clearly in the revision. Part of the motivation is just that (as
the reviewer noted) rank constraints provide a very powerful form of
regularization, since they dramatically reduce the number of coefficients.
(There are also computational and memory advantages, as the reviewer
mentioned.) But there is also some biological motivation for this kind of
parametrization, which stems from the fact that if individual neurons have
a small number of relevant timescales, then a neuron that integrates input
from a large number of such neurons will also have a small number of
timescales, resulting in a low-rank RF even if it pools over a large
region of visual space. Neuroscientists have also shown particular
interest in the issue of space-time separability (e.g., Adelson &
Bergen 1985, showing how to construct motion detectors from a small number
of space-time separable units).
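The coefficient-counting point can be made concrete with a toy calculation (the dimensions below are hypothetical, chosen only for illustration): a full-rank filter over D_t time bins and D_x pixels has D_t * D_x free parameters, while a rank-r parametrization has only r * (D_t + D_x).

```python
D_t, D_x, r = 25, 900, 2           # hypothetical: 25 time bins, a 30x30 pixel grid, rank 2
full_rank_params = D_t * D_x       # unconstrained RF: 22500 coefficients
low_rank_params = r * (D_t + D_x)  # rank-2 parametrization: 1850 coefficients
print(full_rank_params // low_rank_params)  # roughly a 12-fold reduction
```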
> Possibility of RFs that are
not well described as space-time separable.
Apologies for the
confusion: none of the examples shown involved RFs that were actually
space-time separable (i.e., rank-1). For example, the V1 RFs shown in Fig
3 have a clear space-time orientation, but are still well described as
low-rank (optimal rank of 2 for cell #1, optimal rank of 4 for cell #2).
We feel that most "standard" RFs in the early visual, auditory, or
somatosensory pathways could be well parametrized as low-rank, even though
they are not (for the most part) separable.
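The distinction between "separable" and "low-rank" can be checked informally: a filter with clear space-time orientation need not be separable, yet its singular-value spectrum can still be concentrated in a few components. A hypothetical sketch (not the authors' rank-selection procedure):

```python
import numpy as np

# Hypothetical non-separable filter built from two separable terms,
# giving a space-time-oriented profile of rank exactly 2
t = np.linspace(0, 1, 20)[:, None]   # (20, 1) time axis
x = np.linspace(-1, 1, 50)[None, :]  # (1, 50) space axis
K = np.sin(8 * x) * np.exp(-5 * t) + np.cos(8 * x) * np.exp(-5 * t) * np.sin(6 * t)

# Fraction of filter energy captured by the leading singular values
s = np.linalg.svd(K, compute_uv=False)
energy = np.cumsum(s**2) / np.sum(s**2)
# energy[1] is ~1.0: two components suffice even though K is not rank-1 separable
```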
> Method is nicely
developed and mathematically interesting
Thanks again for this
comment. We agree, and would like to mention that we are particularly
proud of the theoretical contribution of formulating a prior for low-rank
RFs (and accompanying inference method) that places a marginally Gaussian
prior over the RF coefficients despite the rank constraint, which puts it
on equal footing with other regularization methods that derive from
Gaussian priors (e.g., ridge regression, graph Laplacian, ASD, ALD). We
undertook an extensive review of the statistics literature on "reduced-rank
regression" and could not find any previous formulation of this idea;
prior work emphasized either non-informative or Gaussian priors on the
separable components (which, obviously, lead to a non-Gaussian marginal
prior and posterior).
 