NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 2642
Title: Scalable Spike Source Localization in Extracellular Recordings using Amortized Variational Inference

Reviewer 1

The paper is fairly clear and proposes a novel, biologically inspired model for spike localization. Largely, it is well done, and it provides new paths for exploring the link between individual neurons and their electrophysiological properties. It could later be used to identify properties of neuron subtypes and their biological roles, for instance by matching multiple sensing techniques. However, there are a few issues.

1. It is unclear to me why the data augmentation is truly necessary; under the model, I would expect the method to work without this step. An ablation analysis of what the augmentation actually accomplishes, together with a clear, precise description of why it helps, would be beneficial.

2. The spike sorting analysis is frustrating for a number of reasons. First, the authors sweep over the number of clusters to report their results. Not knowing the number of clusters a priori is one of the biggest issues in spike sorting, so this setup is unrealistic. Second, there is a complete lack of comparisons to state-of-the-art methods. Many, many methods with publicly available code exist for such datasets, as do more useful evaluation metrics (e.g., [1]). The claim that combining location estimates and waveform estimation was introduced in 2017 is somewhat tenuous; this is implicit in nearly every dense MEA sorting method. Because of this, it is unclear to this reader whether the approach would actually contribute to a state-of-the-art sorting package.

3. The scalability here is in time, but the method does not appear to directly address scalability in the number of channels. Specifically, a Neuropixels probe is a good example of a current dense MEA, but several research groups are building and evaluating devices with >10,000 electrodes. Since the current VAE takes all channels as inputs and the number of detections typically scales linearly with the number of channels, I estimate that the cost of this method would grow quadratically with the number of channels. While this is certainly not an issue for most devices today, it would be useful to comment on.

[1] Barnett, Alex H., Jeremy F. Magland, and Leslie F. Greengard. "Validation of neural spike sorting algorithms without ground-truth information." Journal of Neuroscience Methods 264 (2016): 65-77.

The author feedback reasonably addressed my criticisms, and I have revised my score accordingly.
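Reviewer 1's quadratic-scaling estimate rests on two proportionalities: detections grow with the channel count, and each detection is encoded from an input spanning every channel. The toy cost model that follows makes that arithmetic explicit. It is a sketch under stated assumptions, not a model of the paper's implementation: `relative_cost` is a hypothetical helper, both proportionalities are assumed, and the `encoder_inputs` option (an encoder restricted to a fixed local neighborhood of channels) is a hypothetical contrast case, not something the paper describes.

```python
def relative_cost(num_channels, detections_per_channel=1.0, encoder_inputs=None):
    """Toy cost model for amortized inference over all detected spikes.

    Assumptions (for illustration only):
      * the number of detected spikes grows linearly with the channel count;
      * the work to encode one spike is proportional to the encoder's input size.
    encoder_inputs=None means the encoder sees every channel (the case Reviewer 1
    describes); an integer means a hypothetical fixed local neighborhood of that
    many channels.
    """
    num_detections = detections_per_channel * num_channels
    inputs_per_spike = num_channels if encoder_inputs is None else encoder_inputs
    return num_detections * inputs_per_spike


if __name__ == "__main__":
    all_base = relative_cost(1_000)
    local_base = relative_cost(1_000, encoder_inputs=25)
    for channels in (1_000, 10_000, 100_000):
        all_ratio = relative_cost(channels) / all_base
        local_ratio = relative_cost(channels, encoder_inputs=25) / local_base
        print(f"{channels:>7} channels: all-channel encoder x{all_ratio:g}, "
              f"local encoder x{local_ratio:g}")
```

Under these assumptions, going from 1,000 to 10,000 channels multiplies the total cost by 100 when every channel feeds the encoder, but only by 10 when each spike is encoded from a fixed-size neighborhood.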

Reviewer 2

The paper introduces a generative model for predicting the location of spikes from microelectrode array recordings. Inference is performed by reformulating the learning algorithm as a Variational Autoencoder. While the model is well defined, it bothers me that the amplitudes are modeled as Gaussian random variables. Spikes are essentially defined as non-Gaussian events in the recorded potential. Even though spike detection is not addressed in this work, I would expect the spike amplitudes to be modeled as non-Gaussian in order to learn the appropriate structure. The paper seems to handle localization in a much more sophisticated way than anything else currently available. However, the numerical experiments do not include sufficient comparisons with other localization approaches, and the paper would improve considerably through a better comparison with modern localization methods. All in all, a good paper, but it needs more comparison with the state of the art.

After reviewing the rebuttal: the reply regarding the non-Gaussianity of spikes was not satisfactory. If the authors are going to use a variational inference approach, they should work with models that have appropriate prior distributions. Modern computational frameworks allow more modeling flexibility than is exploited in this work. I think the original score is appropriate for this work.

Reviewer 3

The authors develop an unsupervised, probabilistic, and scalable approach to spike localization from MEA recordings, in contrast to previous approaches, which either required supervision, did not scale to large datasets, or relied on a simple heuristic (e.g., center of mass, COM). Though the proposed model is relatively straightforward and the authors use standard approximate inference techniques to learn the desired posterior, the application to this domain and the empirical validation of the approach seem to be novel contributions. The work is technically sound, with empirical results that demonstrate improved performance over the COM heuristic for spike localization. However, there is no experimental validation of the proposed data augmentation scheme on its own. The scheme seems to be used in both the MCMC and VAE approaches, which makes it unclear what fraction of the performance improvement over COM is due to data augmentation versus model-based posterior inference. A comparison to alternative spike localization methods besides COM would also strengthen the work, though I am not sure whether the methods cited by the authors would even be feasible at the scale of the datasets analyzed, so I do not consider this a major shortcoming of the work.

Though the writing is clear overall, some minor details could be clarified. The description of the model in Section 3.1 gives a procedure for choosing the location prior means (lines 122-123), but the proposed inference methods are stated as using a location prior mean of zero, which seems to be a discrepancy. Section 3.2 describes a bounding box of width W and a number of channels L used in the data augmentation scheme, but the values used in the experiments are not explicitly stated (based on the figure and table captions, it seems that W = 3 and 5, and therefore L = 4-9 and 9-25, respectively, were used, but this could be made clearer).
However, these are minor issues that could easily be fixed.

Update based on author feedback: Having read the authors' response, I feel that my concern about the effect of the data augmentation scheme was properly addressed, particularly if the authors commit to including an empirical analysis of the effect of data augmentation on overall performance in the appendix, as they mention. However, the other reviewers' comments on the lack of comparison to state-of-the-art methods make me feel that this is more of a shortcoming of the submission than I had initially thought, and I am not familiar enough with the alternative methods to know whether leaving out any comparison to them is justified. Overall, I would still lean toward accepting the submission but do not feel confident enough to strongly recommend acceptance, and therefore maintain my original score.
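Reviewer 3's inferred ranges (W = 3 and 5 giving L = 4-9 and 9-25) are consistent with one simple reading: a W x W box of channels centered on the peak channel and clipped at the array edges, so that a corner channel keeps only ((W+1)/2)^2 channels. The sketch that follows illustrates that reading together with a basic COM location estimate over the box. Everything here is an assumption for illustration, not the paper's procedure: the square grid, uniform pitch, odd W, the clipping rule, and the use of absolute peak amplitudes as weights; `box_channels` and `com_location` are hypothetical names.

```python
import numpy as np


def box_channels(peak_row, peak_col, n_rows, n_cols, width):
    """Channels in a width x width box centered on the peak channel.

    Assumes a square grid of channels and an odd width; the box is clipped
    at the array edges, so channels near a corner yield fewer channels.
    """
    half = width // 2
    rows = range(max(0, peak_row - half), min(n_rows, peak_row + half + 1))
    cols = range(max(0, peak_col - half), min(n_cols, peak_col + half + 1))
    return [(r, c) for r in rows for c in cols]


def com_location(amplitudes, width=3, pitch=1.0):
    """Center-of-mass (COM) location estimate for a single spike.

    amplitudes: (n_rows, n_cols) array of per-channel peak amplitudes.
    Absolute amplitudes are used as weights (an assumption; extracellular
    spikes are usually negative-going). Returns ((x, y), number_of_channels).
    """
    amps = np.abs(np.asarray(amplitudes, dtype=float))
    n_rows, n_cols = amps.shape
    peak_row, peak_col = (int(i) for i in np.unravel_index(np.argmax(amps), amps.shape))
    channels = box_channels(peak_row, peak_col, n_rows, n_cols, width)
    weights = np.array([amps[r, c] for r, c in channels])
    # x runs along columns, y along rows, with a uniform inter-channel pitch.
    positions = np.array([[c * pitch, r * pitch] for r, c in channels])
    estimate = weights @ positions / weights.sum()
    return (float(estimate[0]), float(estimate[1])), len(channels)


if __name__ == "__main__":
    for w in (3, 5):
        corner = len(box_channels(0, 0, 8, 8, w))
        interior = len(box_channels(4, 4, 8, 8, w))
        print(f"W = {w}: L ranges from {corner} (corner) to {interior} (interior)")

    grid = np.zeros((8, 8))
    grid[3, 3] = -10.0  # hypothetical spike: peak channel
    grid[3, 2] = grid[3, 4] = grid[2, 3] = grid[4, 3] = -5.0
    print(com_location(grid, width=3))  # ((3.0, 3.0), 9)
```

Because COM is a weighted average with nonnegative weights, its estimate always lies within the convex hull of the selected channels, which is one reason it serves as a simple baseline rather than a full localization model.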