{"title": "Fast amortized inference of neural activity from calcium imaging data with variational autoencoders", "book": "Advances in Neural Information Processing Systems", "page_first": 4024, "page_last": 4034, "abstract": "Calcium imaging permits optical measurement of neural activity. Since intracellular calcium concentration is an indirect measurement of neural activity, computational tools are necessary to infer the true underlying spiking activity from fluorescence measurements. Bayesian model inversion can be used to solve this problem, but typically requires either computationally expensive MCMC sampling, or faster but approximate maximum-a-posteriori optimization.  Here, we introduce a flexible algorithmic framework for fast, efficient and accurate extraction of neural spikes from imaging data. Using the framework of variational autoencoders, we propose to amortize inference by training a deep neural network to perform model inversion efficiently. The recognition network is trained to produce samples from the posterior distribution over spike trains.  Once trained, performing inference amounts to a fast single forward pass through the network, without the need for iterative optimization or sampling. We show that amortization can be applied flexibly to a wide range of nonlinear generative models and significantly improves upon the state of the art in computation time, while achieving competitive accuracy.  Our framework is also able to represent posterior distributions over spike-trains. We demonstrate the generality of our method by proposing the first probabilistic approach for separating backpropagating action potentials from putative synaptic inputs in calcium imaging of dendritic spines.", "full_text": "Fast amortized inference of neural activity from\n\ncalcium imaging data with variational autoencoders\n\nArtur Speiser12, Jinyao Yan3, Evan Archer4\u2217, Lars Buesing4\u2020,\n\nSrinivas C. Turaga3\u2021 and Jakob H. 
Macke1\u2021\u00a7\n\n1research center caesar, an associate of the Max Planck Society, Bonn, Germany\n\n2IMPRS Brain and Behavior Bonn/Florida\n\n3HHMI Janelia Research Campus\n\n4Columbia University\n\nartur.speiser@caesar.de, turagas@janelia.hhmi.org, jakob.macke@caesar.de\n\nAbstract\n\nCalcium imaging permits optical measurement of neural activity. Since intracellular\ncalcium concentration is an indirect measurement of neural activity, computational\ntools are necessary to infer the true underlying spiking activity from \ufb02uorescence\nmeasurements. Bayesian model inversion can be used to solve this problem, but\ntypically requires either computationally expensive MCMC sampling, or faster but\napproximate maximum-a-posteriori optimization. Here, we introduce a \ufb02exible\nalgorithmic framework for fast, ef\ufb01cient and accurate extraction of neural spikes\nfrom imaging data. Using the framework of variational autoencoders, we propose\nto amortize inference by training a deep neural network to perform model inversion\nef\ufb01ciently. The recognition network is trained to produce samples from the posterior\ndistribution over spike trains. Once trained, performing inference amounts to a fast\nsingle forward pass through the network, without the need for iterative optimization\nor sampling. We show that amortization can be applied \ufb02exibly to a wide range\nof nonlinear generative models and signi\ufb01cantly improves upon the state of the\nart in computation time, while achieving competitive accuracy. Our framework is\nalso able to represent posterior distributions over spike-trains. 
We demonstrate the\ngenerality of our method by proposing the first probabilistic approach for separating\nbackpropagating action potentials from putative synaptic inputs in calcium imaging\nof dendritic spines.\n\n1 Introduction\n\nSpiking activity in neurons leads to changes in intra-cellular calcium concentration which can be\nmeasured by fluorescence microscopy of synthetic calcium indicators such as Oregon Green BAPTA-1\n[1] or genetically encoded calcium indicators such as GCaMP6 [2]. Such calcium imaging has become\nimportant since it enables the parallel measurement of large neural populations in a spatially resolved\nand minimally invasive manner [3, 4]. Calcium imaging can also be used to study neural activity at\nsubcellular resolution, e.g. for measuring the tuning of dendritic spines [5, 6]. However, due to the\nindirect nature of calcium imaging, spike inference algorithms must be used to infer the underlying\nneural spiking activity leading to measured fluorescence dynamics.\n\n\u2217current affiliation: Cogitai.Inc\n\u2020current affiliation: DeepMind\n\u2021equal contribution\n\u00a7current primary affiliation: Centre for Cognitive Science, Technical University Darmstadt\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nMost commonly-used approaches to spike inference [7, 8, 9, 10, 11, 12, 13, 14] are based on carefully\ndesigned generative models that describe the process by which spiking activity leads to fluorescence\nmeasurements. Spikes are treated as latent variables, and spike-prediction is performed by inferring\nboth the parameters of the model and the spike latent variables from fluorescence time series, or\n\u201ctraces\u201d [7, 8, 9, 10]. 
The advantage of this approach is that it does not require extensive ground\ntruth data for training, since simultaneous electrophysiological and \ufb02uorescence recordings of neural\nactivity are dif\ufb01cult to acquire, and that prior knowledge can be incorporated in the speci\ufb01cation of the\ngenerative model. The accuracy of the predictions depends on the faithfulness of the generative model\nof the transformation of spike trains into \ufb02uorescence measurements [14, 12]. The disadvantage\nof this approach is that spike-inference requires either Markov-Chain Monte Carlo (MCMC) or\nSequential Monte-Carlo techniques to sample from the posterior distribution over spike-trains or\nalternatively, iterative optimization to obtain an approximate maximum a-posteriori (MAP) prediction.\nCurrently used approaches rely on bespoke, model-speci\ufb01c inference algorithms, which can limit\nthe \ufb02exibility in designing suitable generative models. Most commonly used methods are based on\nsimple phenomenological (and often linear) models [7, 8, 9, 10, 13].\nRecently, a small number of cell-attached electrophysiological recordings of neural activity have\nbecome available, with simultaneous \ufb02uorescence calcium measurements in the same neurons.\nThis has made it possible to train powerful and fast classi\ufb01ers to perform spike-inference in a\ndiscriminative manner, precluding the need for accurate generative models of calcium dynamics\n[15]. The disadvantage of this approach is that it can require large labeled data-sets for every new\ncombination of calcium indicator, cell-type and microscopy method, which can be expensive or\nimpossible to acquire. Further, these discriminative methods do not easily allow the incorporation\nof prior knowledge about the generative process. Finally, current classi\ufb01cation approaches yield\nonly pointwise predictions of spike probability (i.e. 
\ufb01ring rates), independent across time, and ignore\ntemporal correlations in the posterior distribution of spikes.\n\nFigure 1: Amortized inference for predicting spikes from imaging data. A) Our goal is to infer a\nspike train s from an observed time-series of \ufb02uorescence-measurements f. We assume that we have\na generative model of \ufb02uorescence given spikes with (unknown) parameters \u03b8, and we simultaneously\nlearn \u03b8 as well as a \u2018recognition model\u2019 which approximates the posterior over spikes s given f\nand which can be used for decoding a spike train from imaging data. B) We parameterize the\nrecognition-model by a multi-layer network architecture: Fluorescence-data is \ufb01rst \ufb01ltered by a deep\n1D convolutional network (CNN), providing input to a stochastic forward running recurrent neural\nnetwork (RNN) which predicts spike-probabilities and takes previously sampled spikes as additional\ninput. An additional deterministic RNN runs backward in time and provides further context.\n\nHere, we develop a new spike inference framework called DeepSpike (DS) based on the variational\nautoencoder technique which uses stochastic variational inference (SVI) to teach a classi\ufb01er to predict\nspikes in an unsupervised manner using a generative model. This new strategy allows us to combine\nthe advantages of generative [7] and discriminative approaches [15] into a single fast classi\ufb01er-based\nmethod for spike inference. In the variational autoencoder framework, the classi\ufb01er is called a\nrecognition model and represents an approximate posterior distribution over spike trains from which\nsamples can be drawn in an ef\ufb01cient manner. 
Once trained to perform spike inference on one dataset,\nthe recognition model can be applied to perform inference on statistically similar datasets without any\nretraining: The computational cost of variational spike inference is amortized, dramatically speeding\nup inference at test-time by exploiting fast, classifier-based recognition models.\n\nWe introduce two recognition models: The first is a temporal convolutional network which produces\na posterior distribution which is factorized in time, similar to standard classifier-based methods [15].\nThe second is a recurrent neural network-based recognition model, similar to [16, 17] which can\nrepresent any correlated posterior distribution in the non-parametric limit. Once trained, both models\nperform spike inference with state-of-the-art accuracy, and enable simultaneous spike inference for\npopulations as large as $10^4$ in real time on a single GPU.\nWe show the generality of this black-box amortized inference method by demonstrating its accuracy\nfor inference with a classic linear generative model [7, 8], as well as two nonlinear generative models\n[12]. Finally, we show an extension of the spike inference method to simultaneous inference and\ndemixing of synaptic inputs from backpropagating somatic action potentials from simultaneous\nsomatic and dendritic calcium imaging.\n\n2 Amortized inference using variational autoencoders\n\n2.1 Approach and training procedure\n\nWe observe fluorescence traces $f^i_t$, $t = 1 \\ldots T^i$, representing noisy measurements of the dynamics\nof somatic calcium concentration in neurons $i = 1 \\ldots N$. We assume a parametrised, probabilistic,\ndifferentiable generative model $p_{\\theta^i}(f|s)$ with (unknown) parameters $\\theta^i$. 
The generative model\npredicts a fluorescence trace given an underlying binary spike train $s^i$, where $s^i_t = 1$ indicates that\nthe neuron i produced an action potential in the interval indexed by t. Our goal is to infer a latent\nspike-train s given only fluorescence observations f. We will solve this problem by training a deep\nneural network as a \u201crecognition model\u201d [18, 19, 20] parametrized by weights \u03c6. Use of a recognition\nmodel enables fast computation of an approximate posterior distribution over spike trains from a\nfluorescence trace, $q_\\phi(s|f)$. We will share one recognition model across multiple cells, so that\n$q_\\phi(s^i|f^i) \\approx p_{\\theta^i}(s^i|f^i)$ for each i. We describe an unsupervised training procedure which jointly\noptimizes parameters of the generative model \u03b8 and the recognition network \u03c6 in order to maximize a\nlower bound on the log likelihood of the observed data, log p(f) [19, 18, 20].\nWe learn the parameters \u03c6 and \u03b8 simultaneously by jointly maximizing $L_K(\\theta, \\phi)$, a multi-sample\nimportance-weighting lower bound on the log likelihood log p(f) given by [21]\n\n$$L_K(\\theta, \\phi) = \\mathbb{E}_{s^1,\\ldots,s^K \\sim q_\\phi(s|f)} \\left[ \\log \\frac{1}{K} \\sum_{k=1}^{K} \\frac{p_\\theta(s^k, f)}{q_\\phi(s^k|f)} \\right] \\le \\log p(f), \\tag{1}$$\n\nwhere $s^k$ are spike trains sampled from the recognition model $q_\\phi(s|f)$. This stochastic objective\ninvolves drawing K samples from the recognition model, and evaluating their likelihood by passing\nthem through the generative model. When K = 1, the bound reduces to the evidence lower bound\n(ELBO). Increasing K yields a tighter lower bound (than the ELBO) on the marginal log likelihood,\nat the cost of additional training time. 
We found that increasing the number of samples leads to better\nfits of the generative model; in our experiments, we used K = 64.\nTo train \u03b8 and \u03c6 by stochastic gradient ascent, we must estimate the gradient $\\nabla_{\\phi,\\theta} L(\\theta, \\phi)$. As our\nrecognition model produces an approximate posterior over binary spike trains, the gradients have to be\nestimated based on samples. Obtaining functional estimates of the gradients $\\nabla_\\phi L(\\theta, \\phi)$ with respect\nto parameters of the recognition model is challenging and relies on constructing effective control\nvariates to reduce variance [22]. We use the variational inference for Monte Carlo objectives (VIMCO)\napproach of [23] to produce low-variance unbiased estimates of the gradients $\\nabla_{\\phi,\\theta} L_K(\\theta, \\phi)$. The\ngenerative training procedure could be augmented with a supervised cost term [24, 25], resulting in a\nsemi-supervised training method.\n\nGradient optimization: We use ADAM [26], an adaptive gradient update scheme, to perform\nonline stochastic gradient ascent. The training data is cut into short chunks of several hundred\ntime-steps and arranged in batches containing samples from a single cell. As we train only one\nrecognition model but multiple generative models in parallel, we load the respective generative model\nand ADAM parameters at each iteration. Finally, we use norm-clipping to scale the gradients acting\non the recognition model: the norm of all gradients is calculated, and if it exceeds a fixed threshold the\ngradients are rescaled. While norm-clipping was introduced to prevent exploding gradients in RNNs\n[27], we found it to be critical to achieve high performance both for RNN and CNN architectures in\nour learning problem. 
Very small threshold values (0.02) empirically yielded the best results.\n\n2.2 Generative models $p_\\theta(f|s)$\n\nTo demonstrate that our computational strategy can be applied to a wide range of differentiable\nmodels in a black-box manner, we consider four generative models: a simple, but commonly used\nlinear model of calcium dynamics [7, 8, 9, 10], two more sophisticated nonlinear models which\nadditionally incorporate saturation and facilitation resulting from the dynamics of calcium binding to\nthe calcium sensor, and finally a multi-dimensional model for dendritic imaging data.\n\nLinear auto-regressive generative model (SCF): We use the name SCF for the classic linear\nconvolutional generative model used in [7, 8, 9, 10], since this generative process is described by the\nSpikes $s_t$, which linearly impact Calcium concentration $c_t$, which in turn determines the observed\nFluorescence intensity $f_t$,\n\n$$c_t = \\sum_{t'=1}^{p} \\gamma_{t'} c_{t-t'} + \\delta s_t, \\qquad f_t = \\alpha c_t + \\beta + e_t, \\tag{2}$$\n\nwith linear auto-regressive dynamics of order p for the calcium concentration with parameters\n\u03b3, spike-amplitude \u03b4, gain \u03b1, constant fluorescence baseline \u03b2, and additive measurement noise\n$e_t \\sim \\mathcal{N}(0, \\sigma^2)$.\n\nNonlinear auto-regressive and sensor dynamics generative models (SCDF & MLphys): As\nexamples of nonlinear generative models [28], we consider two simple models of the discrete-time\ndynamics of the calcium sensor or dye. In the first (SCDF), the concentration of fluorescent dye\nmolecules $d_t$ is a function of the somatic Calcium concentration $c_t$, and has Dynamics\n\n$$d_t - d_{t-1} = \\kappa_{on} c_t^{\\eta} ([D] - d_{t-1}) - \\kappa_{off} d_{t-1}, \\qquad f_t = \\alpha d_t + \\beta + e_t, \\tag{3}$$\n\nwhere $\\kappa_{on}$ and $\\kappa_{off}$ are the rates at which the calcium sensor binds and unbinds calcium ions, and \u03b7 is\na Hill coefficient. We constrained these parameters to be non-negative. 
[D] is the total concentration\nof the dye molecule in the soma, which sets the maximum possible value of $d_t$. The richer dynamics\nof the SCDF model allow for facilitation of fluorescence at low firing rates, and saturation at high\nrates. The parameters of the SCDF model are $\\theta = \\{\\alpha, \\beta, \\gamma, \\kappa_{on}, \\kappa_{off}, \\eta, [D], \\sigma^2\\}$.\nThe second nonlinear model (MLphys) is a discrete-time version of the MLspike generative model\n[12], simplified by not including a model of the time-varying baseline. The dynamics for $f_t$ and $c_t$\nare as above, with \u03b4 = 1. We replace the dynamics for $d_t$ by\n\n$$d_t - d_{t-1} = \\frac{1}{\\tau_{on}} \\left(1 + \\omega\\left((c_0 + c_t)^{\\eta} - c_0^{\\eta}\\right)\\right) \\left( \\frac{(c_0 + c_t)^{\\eta} - c_0^{\\eta}}{1 + \\omega\\left((c_0 + c_t)^{\\eta} - c_0^{\\eta}\\right)} - d_{t-1} \\right). \\tag{4}$$\n\nMulti-dimensional soma + dendrite generative model (DS-F-DEN): The dendritic generative\nmodel is a multi-dimensional SCDF model that incorporates back-propagating action potentials\n(bAPs). The calcium concentration at the cell body (superscript c) is generated as for SCDF, whereas\nfor the spine (superscript s), there are two components: synaptic inputs and bAPs from the soma,\n\n$$c^c_t = \\sum_{t'=1}^{p} \\gamma^c_{t'} c^c_{t-t'} + \\delta^c s^c_t, \\qquad c^s_t = \\sum_{t'=1}^{p} \\gamma^s_{t'} c^s_{t-t'} + \\delta^s s^s_t + \\delta^{bs} s^c_t, \\tag{5}$$\n\nwhere $\\delta^{bs}$ are the amplitude coefficients of bAPs for different spine locations, and $c \\in \\{1, \\ldots, N_c\\}$,\n$s \\in \\{1, \\ldots, N_s\\}$. The spines and soma share the same dye dynamics as in (3). The parameters of the\ndendritic integration model are $\\theta = \\{\\alpha^{s,c}, \\beta^{s,c}, \\gamma^{s,c}, \\kappa_{on}, \\kappa_{off}, \\eta, [D], \\sigma^2_{s,c}\\}$. We note that this simple\ngenerative model does not attempt to capture the full complexity of nonlinear processing in dendrites\n(e.g. it does not incorporate nonlinear phenomena such as dendritic plateau potentials). Its goal is\nto separate local influences (synaptic inputs) from global events (bAPs, or potentially regenerative\ndendritic events).\n\n2.3 Recognition models: parametrization of the approximate posterior $q_\\phi(s|f)$\n\nThe goal of the recognition model is to provide a fast and efficient approximation $q_\\phi(s|f)$ to the\ntrue posterior p(s|f) over discrete latent spike trains s. We will use both a factorized, localized\napproximation (parameterized as a convolutional neural network), and a more flexible, non-factorized\nand non-localized approximation (parameterized using additional recurrent neural networks).\n\nConvolutional neural network: Factorized posterior approximation (DS-F) In [15], it was\nreported that good spike-prediction performance can be achieved by making the spike probability\n$q_\\phi(s_t|f_{t-\\tau \\ldots t+\\tau})$ depend on a local window of the fluorescence trace of length $2\\tau + 1$ centered at t\nwhen training such a model fully supervised. We implement a scaled up version of this idea, using a\ndeep neural network which is convolutional in time as the recognition model. We use architectures\nwith up to five hidden layers and \u2248 20 filters per layer with leaky ReLU units [29]. The output\nlayer uses a sigmoid nonlinearity to compute the Bernoulli spike probabilities $q_\\phi(s_t|f)$.\n\nRecurrent neural network: Capturing temporal correlations in the posterior (DS-NF) The\nfully-factorized posterior approximation (DS-F) above ignores temporal correlations in the posterior\nover spike trains. Such correlations can be useful in modeling uncertainty in the precise timing of a\nspike, which induces negative correlations between nearby time bins. To model temporal correlations,\nwe developed an RNN-based non-factorizing distribution which can approach the true posterior in the\nnon-parametric limit (see Fig. 1B). Similar to [16], we use the temporal ordering over spikes and\nfactorize the joint distribution over spikes as $q_\\phi(s|f) = \\prod_t q_\\phi(s_t|f, s_0, \\ldots, s_{t-1})$, by conditioning\nspikes at t on all previously sampled spikes. Our RNN uses a CNN as described above to extract\nfeatures from the input trace. Additional input is provided by a backward RNN which also receives\ninput from the CNN features. The outputs of the forward RNN and CNN are transformed into\nBernoulli spike probabilities $q_\\phi(s_t|f)$ through a dense sigmoid layer. This probability and the sample\ndrawn from it are relayed to the forward RNN in the next time step. Forward and backward RNN\nhave a single layer with 64 gated recurrent units each [30].\n\n2.4 Details of synthetic and real data and evaluation methodology\n\nWe evaluated our method on simulated and experimental data. From our SCF and SCDF generative\nmodels for spike-inference, we simulated traces of length $T = 10^4$ assuming a recording frequency\nof 60 Hz. Initial parameters were obtained by fitting the models to real data (see below), and\nheterogeneity across neurons was achieved by randomly perturbing parameters. We used 50 neurons\neach for training and validation and 100 neurons in the test set. For each cell, we generated three\ntraces with firing rates of 0.6, 0.9 and 1.1 Hz, assuming i.i.d. spikes.\nFinally, we compared methods on two-photon imaging data from 9 + 11 cells from [2], which is\navailable at www.crcns.org. Layer 2/3 pyramidal neurons in mouse visual cortex were imaged at 60 Hz\nusing the genetically encoded calcium-indicators GCaMP6s and GCaMP6f, while action-potentials\nwere measured electrophysiologically using cell-attached recordings. Data was pre-processed by\nremoving a slow moving baseline using the 5th percentile in a window of 6000 time steps. 
Furthermore\nwe used this baseline estimate to calculate \u2206F/F. Cross-validated results were obtained using 4\nfolds, where we trained and validated on 3/4 of the cells in each dataset and tested on the remaining\ncells to highlight the potential for amortized inference. Early stopping was performed based on the\ncorrelation achieved on the train/validation set, which was evaluated every 100 update steps.\nWe report results using the cross-correlation between true and predicted spike-rates, at the sampling\ndiscretization of 16.6 ms for simulated data and 40 ms for real data. As the predictions of our DS-NF\nmodel are not deterministic, we sample 30 times from the model and average over the resulting\nprobability distributions to obtain an estimate of the marginal probability before we calculate cross-\ncorrelations.\nWe used multiple generative models to show that our inference algorithm is not tied to a particular\nmodel: SCDF for the experiments depicted in Fig. 2, SCF for a comparison with established methods\nbased on this linear model (Table 1, column 1), and MLphys on real data as it is used by the current\nstate-of-the-art inference algorithm (Table 1, columns 2 & 3, Fig. 3).\n\nFigure 2: Model-inversion with variational autoencoders, simulated data A) Illustration of\nfactorized (CNN, DS-F) and non-factorized posterior approximation (RNN, DS-NF) on simulated\ndata (SCDF generative model). DS-NF yields more accurate reconstructions, but both methods lead\nto similar marginal predictions (i.e. predicted firing rates, bottom). B) Number of spikes sampled for\nevery true spike for the factorized (red) and non-factorized (red) posterior. The correlated posterior\nconsistently samples the correct number of spikes while still accounting for the uncertainty in the\nspike timing. C) Performance of amortized vs non-amortized inference on simulated data. 
D) Scatter\nplots of achieved log-likelihood of the true spike train under the posterior model (top) and achieved\ncorrelation coefficients between the marginalized spiking probabilities and true spike trains (bottom).\n\n3 Results\n\n3.1 Stochastic variational spike inference of factorized and correlated posteriors\n\nWe first illustrate our approach on synthetic data, and compare our two different architectures for\nrecognition models. We simulated data from the SCDF nonlinear generative model and trained\nDeepSpike unsupervised using the same SCDF model. While only the more expressive recognition\nmodel (DS-NF) is able to achieve close-to-perfect reconstructions of the fluorescence traces (Fig. 2\nA, top row), both approaches yield similar marginal firing rate predictions (second row). However,\nas the factorized model does not model correlations in the posterior, it yields higher variance in the\nnumber of spikes reconstructed for each true spike (Fig. 2 B). This is because the factorized model\ncannot capture that a fluorescence increase might be \u2018explained away\u2019 by a spike that has just been\nsampled, i.e. it cannot capture the difference between uncertainty in spike-timing and uncertainty in\n(local) spike-counts. Therefore, while both approaches predict firing rates similarly well on simulated\ndata (as quantified using correlation, Fig. 2 D), the DS-NF model assigns higher posterior probability\nto the true spike trains.\n\n3.2 Amortizing inference leads to fast and accurate test-time inference\n\nIn principle, our unsupervised learning procedure could be re-trained on every data-set of interest.\nHowever, it also allows for amortizing inference by sharing one recognition model across multiple\ncells, and applying the recognition model directly on new data without additional training for fast\ntest-time performance. 
Amortized inference allows for the recognition model to be used for inference\nin the same way as a network that was trained fully supervised. Since there is no variational\noptimization at test time, inference with this network is just as fast as inference with a supervised\nnetwork. Similarly to supervised learning, there will be limitations on the ability of this network to\ngeneralize to different imaging conditions or indicators that were not included in the training set.\nTo test if our recognition model generalizes well enough for amortized inference to work across\nmultiple cells, as well as on cells it did not see during training, we trained one DS-NF model on 50\ncells (simulated data, SCDF) and evaluated its performance on a non-overlapping set of 30 cells. For\ncomparison, we also trained 30 DS-NF models separately, one on each of those cells; this amounts to\nstandard variational inference using a neural network to parametrize the posterior approximation,\nbut without amortizing inference. We found that amortizing inference only causes a small drop\nin performance (Fig. 2 C). 
However, this drop in performance is offset by the large gain in\ncomputational efficiency, as training a neural network takes several orders of magnitude more time\nthan applying it at test time.\nInference using the DS-F model only requires a single forward pass through a convolutional network\nto predict firing rates, and DS-NF requires running a stochastic RNN for each sampled spike train.\nWhile the exact running-time of each of these applications will depend on both implementation\nand hardware, we give rough indications of computational speed, estimated on an Intel(R)\nXeon(R) CPU E5-2697 v3. On the CPU, our DS-F approach takes 0.05 s to process a single trace of\n10K time steps, when using a network appropriate for 60 Hz data. This is on the same order as the\n0.07 s (Intel Core i5 2.7 GHz CPU) reported by [31] for their OASIS algorithm, which is currently\nthe fastest available implementation for constrained deconvolution (CDEC) of SCF, but restricted to\nthis linear generative model. The DS-NF algorithm requires 4.6 s, which still compares favourably\nto MLspike, which takes 9.2 s (evaluated on the same CPU). As our algorithm is implemented in\nTheano [32] it can be easily accelerated and allows for massive parallelization on a single GPU. On a\nGTX Titan X, DS-F and DS-NF take 0.001 s and 1.5 s, respectively. When processing 500 traces in\nparallel, DS-NF becomes only 2.5 times slower. Extrapolating from these results, this implies that\neven when using the DS-NF algorithm, we would be able to perform spike-inference on 1 hour of\nrecordings at 60 Hz for 500 cells in less than 90 s.\n\nTable 1: Performance comparison. 
Values are correlations between predicted marginal probabilities\nand ground truth spikes.\n\nAlgorithm | SCF-Sim. | GCaMP6s | GCaMP6f | Dendritic dataset: Soma | Dendritic dataset: Spine\nDS-F | 0.88 \u00b1 0.01 | 0.74 \u00b1 0.02 | 0.74 \u00b1 0.02 | - | -\nDS-NF | 0.89 \u00b1 0.01 | 0.72 \u00b1 0.02 | 0.73 \u00b1 0.02 | - | -\nCDEC [10] | 0.86 \u00b1 0.01 | 0.39 \u00b1 0.03 * | 0.58 \u00b1 0.02 * | - | -\nMCMC [9] | 0.87 \u00b1 0.01 | 0.47 \u00b1 0.03 * | 0.53 \u00b1 0.03 * | - | -\nMLSpike [12] | - | 0.60 \u00b1 0.02 * | 0.67 \u00b1 0.01 * | - | -\nDS-F-DEN | - | - | - | 0.84 \u00b1 0.01 | 0.78 \u00b1 0.01\nFoopsi-RR [2] | - | - | - | 0.66 \u00b1 0.02 | 0.60 \u00b1 0.01\n\n3.3 DS achieves competitive results on simulated and publicly available imaging data\n\nThe advantages of our framework (black-box inference for different generative models, fast test-\ntime performance through amortization, correlated posteriors through RNNs) are only useful if the\napproach can also achieve competitive performance. To demonstrate that this is the case, we compare\nour approach to alternative generative-model based spike prediction methods on data sampled from\nthe SCF model. As this is the generative model underlying commonly used methods [10, 9], it is\ndifficult to beat their performance on this data. We find that both DS-F and DS-NF achieve competitive\nperformance, as measured by correlation between predicted firing rates and true (simulated) spike\ntrains (Table 1, left column. Values are means and standard error of the mean calculated over cells).\nTo evaluate our performance on real data we compare to the current state-of-the-art method for spike\ninference based on generative models [12]. For these experiments we trained separate models on each\nof the GCaMP variants using the MLspike generative model. We achieve accuracy competitive with\nthe results in [12] (see Table 1, values marked with an asterisk are taken from [12], Fig. 6d) and\nclearly outperform methods that are based on the linear SCF model. 
We note that, while our method\nperforms inference in an unsupervised fashion and is trained using an unsupervised objective, we\ninitialized our generative model with the mean values given in [12] (Fig. S6a), which were obtained\nusing ground truth data. An example of inference and reconstruction using the DS-NF model is\nshown in Fig. 3. The reconstruction based on the true spikes (purple line) was obtained using the\ngenerative model parameters which had been acquired from unsupervised learning. This explains why\nthe reconstruction using the inferred spikes is more accurate and suggests that there is a mismatch\nbetween the MLphys model and the true data-generating process. Developing more\naccurate generative models would therefore likely further increase the performance of the algorithm.\n\nFigure 3: Inference and reconstruction using the DS-NF algorithm on GECI data. The recon-\nstruction based on the inferred spike trains (blue) shows that the algorithm converges to a good joint\nmodel while the reconstruction based on the true spikes (purple) shows a mismatch of the generative\nmodel for high activity which results in an overestimate of the overall firing rate.\n\nFigure 4: Inference of somatic spikes and synaptic input spikes from simulated dendritic\nimaging data. We simulated imaging data from our generative model, and compared our approach\n(DS-F-DEN) to an analysis inspired by [2] (Foopsi-RR), and found that our method can extract\nsynaptic inputs more accurately. Traces at the soma and spines are used to infer somatic spikes and\nsynaptic inputs at spines. Top: somatic trace and predictions. DS-F-DEN produces better predictions\nat the soma since it uses all traces to infer global events. 
Bottom: spine trace and predictions.\nDS-F-DEN performs better in terms of extracting synaptic inputs.\n\n3.4 Extracting putative synaptic inputs from calcium imaging in dendritic spines\n\nWe generalized the DeepSpike variational-inference approach to perform simultaneous inference of\nbackpropagating APs and synaptic inputs, imaged jointly across the entire neuronal dendritic arbor.\nWe illustrate this idea on synthetic data based on the DS-F-DEN generative model (5). We simulated\n15 cells, each with 10 dendritic spines, with a range of firing rates and noise levels. We then used a\nmulti-input multi-output convolutional neural network (CNN, DS-F) in the non-amortized setting to\ninfer a fully-factorized Bernoulli posterior distribution over global action potentials and local synaptic\nevents.\n\nWe compared our results to an analysis technique inspired by [2], which we call Foopsi-RR. We first\napply constrained deconvolution [33] to somatic and dendritic calcium traces, and then use robust\nlinear regression to identify and subtract deconvolved components of the spine signal that correlate\nwith global backpropagating action potentials. Compared to the method suggested by [2], our model\nis significantly more accurate. The average correlation of our model is 0.84 for soma and 0.78 for\nspines, whereas for Foopsi-RR the average correlation is 0.66 for soma and 0.60 for spines (Table 1).\n\n[Figure 3 plot labels: GCaMP6s; Corr: 0.73; Spikes: 41.74 / 35.0; Corr. posterior; True spikes; Trace; Prediction | Inferred spiketrain; Prediction | True spiketrain; axes: Marginal probability vs. Time in seconds. Figure 4 plot labels: True soma spikes; Soma trace; Inferred: DS-F-DEN; Inferred: FOOPSI-RR; True synaptic inputs; Spine trace; Cell cartoon; axes: Marginal probability vs. Time in seconds.]\n\n4 Discussion\n\nSpike inference is an important step in the analysis of fluorescence imaging. 
Here, we propose a\nstrategy based on variational autoencoders that combines the advantages of generative [7] and\ndiscriminative approaches [15]. The generative model makes it possible to incorporate knowledge about\nunderlying mechanisms and thus learn from unlabeled data. A simultaneously-learned recognition\nnetwork allows fast test-time performance, without the need for expensive optimization or MCMC\nsampling. This opens up the possibility of scaling up spike inference to very large neural populations\n[34], and to real-time and closed-loop applications. Furthermore, our approach is able to estimate full\nposteriors rather than just marginal firing rates.\n\nIt is likely that improvements in performance and interpretability will result from the design of\nbetter, biophysically accurate and possibly dye-, cell-type- and modality-specific models of the\nfluorescence measurement process, the dynamics of neurons [28] and indicators, as well as from\ntaking spatial information into account. Our goal here is not to design such models or to improve\naccuracy per se, but rather to develop an inference strategy which can be applied to a large class\nof such potential generative models without model-specific modifications: a trained recognition\nmodel that can invert any such model and provide fast test-time performance while preserving\nperformance in spike detection.\n\nOur recognition model is designed to serve as the common approximate posterior for multiple,\npossibly heterogeneous populations of cells, requiring an expressive model. These assumptions are\nsupported by prior work [15] and our results on simulated and publicly available data, but might be\nsuboptimal or not appropriate in other contexts, or for other performance measures. 
In particular, we\nemphasize that our comparisons are based on a specific dataset and performance measure which\nis commonly used for comparing spike-inference algorithms, but which cannot in itself provide\nconclusive evidence for performance in other settings and measures. Our approach includes rich\nposterior approximations [35] based on RNNs to make predictions using longer context windows and\nto model posterior correlations. Possible extensions include causal recurrent recognition models for\nreal-time spike inference, which would require combining them with fast algorithms for detecting\nregions of interest from imaging movies [10, 36]. Another promising avenue is extending our\nvariational inference approach so it can also learn from available labeled data to obtain a\nsemi-supervised algorithm [37].\n\nAs a statistical problem, spike inference has many similarities with other analysis problems in\nbiological imaging: an underlying, sparse signal needs to be reconstructed from spatio-temporal\nimaging observations, and one has substantial prior knowledge about the image-formation process\nwhich can be encapsulated in generative models. As a concrete example of generalization, we\nproposed an extension to multi-dimensional inference of inputs from dendritic imaging data, and\nillustrated it on simulated data. We expect the approach pursued here to also be applicable in other\ninference tasks, such as the localization of particles from fluorescence microscopy [38].\n\n5 Acknowledgements\n\nWe thank T. W. Chen, K. Svoboda and the GENIE project at Janelia Research Campus for sharing\ntheir published GCaMP6 data, available at http://crcns.org. We also thank T. Deneux for sharing his\nresults for comparison and comments on the manuscript and D. Greenberg, L. Paninski and A. Mnih\nfor discussions. This work was supported by SFB 1089 of the German Research Foundation (DFG)\nto J. H. Macke. A. 
Speiser was funded by an IMPRS for Brain & Behavior scholarship by the Max\nPlanck Society.\n\nReferences\n\n[1] R. Y. Tsien, \u201cNew calcium indicators and buffers with high selectivity against magnesium and protons:\ndesign, synthesis, and properties of prototype structures,\u201d Biochemistry, vol. 19, no. 11, pp. 2396\u20132404,\n1980.\n\n[2] T.-W. Chen, T. J. Wardill, Y. Sun, S. R. Pulver, S. L. Renninger, A. Baohan, E. R. Schreiter, R. A. Kerr,\nM. B. Orger, V. Jayaraman, L. L. Looger, K. Svoboda, and D. S. Kim, \u201cUltrasensitive fluorescent proteins\nfor imaging neuronal activity,\u201d Nature, vol. 499, no. 7458, pp. 295\u2013300, 2013.\n\n[3] J. N. D. Kerr and W. Denk, \u201cImaging in vivo: watching the brain in action,\u201d Nat Rev Neurosci, vol. 9,\npp. 195\u2013205, Mar 2008.\n\n[4] C. Grienberger and A. Konnerth, \u201cImaging calcium in neurons,\u201d Neuron, vol. 73, no. 5, pp. 862\u2013885,\n2012.\n\n[5] S. L. Smith, I. T. Smith, T. Branco, and M. H\u00e4usser, \u201cDendritic spikes enhance stimulus selectivity in\ncortical neurons in vivo,\u201d Nature, vol. 503, no. 7474, pp. 115\u2013120, 2013.\n\n[6] T.-W. Chen, T. J. Wardill, Y. Sun, S. R. Pulver, S. L. Renninger, A. Baohan, E. R. Schreiter, R. A. Kerr,\nM. B. Orger, V. Jayaraman, et al., \u201cUltrasensitive fluorescent proteins for imaging neuronal activity,\u201d\nNature, vol. 499, no. 7458, pp. 295\u2013300, 2013.\n\n[7] J. T. Vogelstein, B. O. Watson, A. M. Packer, R. Yuste, B. Jedynak, and L. Paninski, \u201cSpike inference from\ncalcium imaging using sequential Monte Carlo methods,\u201d Biophysical Journal, vol. 97, no. 2, pp. 636\u2013655,\n2009.\n\n[8] J. T. Vogelstein, A. M. Packer, T. A. Machado, T. Sippy, B. Babadi, R. Yuste, and L. Paninski, \u201cFast\nnonnegative deconvolution for spike train inference from population calcium imaging,\u201d Journal of\nNeurophysiology, vol. 104, no. 6, pp. 3691\u20133704, 2010.\n\n[9] E. Pnevmatikakis, J. 
Merel, A. Pakman, L. Paninski, et al., \u201cBayesian spike inference from calcium\nimaging data,\u201d in Signals, Systems and Computers, 2013 Asilomar Conference on, pp. 349\u2013353, IEEE,\n2013.\n\n[10] E. A. Pnevmatikakis, D. Soudry, Y. Gao, T. A. Machado, J. Merel, D. Pfau, T. Reardon, Y. Mu, C. Lacefield,\nW. Yang, et al., \u201cSimultaneous denoising, deconvolution, and demixing of calcium imaging data,\u201d Neuron,\n2016.\n\n[11] E. Ganmor, M. Krumin, L. F. Rossi, M. Carandini, and E. P. Simoncelli, \u201cDirect estimation of firing rates\nfrom calcium imaging data,\u201d arXiv preprint arXiv:1601.00364, 2016.\n\n[12] T. Deneux, A. Kaszas, G. Szalay, G. Katona, T. Lakner, A. Grinvald, B. R\u00f3zsa, and I. Vanzetta, \u201cAccurate\nspike estimation from noisy calcium signals for ultrafast three-dimensional imaging of large neuronal\npopulations in vivo,\u201d Nature Communications, vol. 7, 2016.\n\n[13] M. Pachitariu, C. Stringer, M. Dipoppa, S. Schr\u00f6der, L. F. Rossi, H. Dalgleish, M. Carandini, and K. D.\nHarris, \u201cSuite2p: beyond 10,000 neurons with standard two-photon microscopy,\u201d bioRxiv, 2017.\n\n[14] D. Greenberg, D. Wallace, J. Vogelstein, and J. Kerr, \u201cSpike detection with biophysical models for GCaMP6\nand other multivalent calcium indicator proteins,\u201d 2015 Neuroscience Meeting Planner. Washington, DC:\nSociety for Neuroscience, 2015.\n\n[15] L. Theis, P. Berens, E. Froudarakis, J. Reimer, M. Rom\u00e1n Ros\u00f3n, T. Baden, T. Euler, A. S. Tolias, and\nM. Bethge, \u201cBenchmarking spike rate inference in population calcium imaging,\u201d Neuron, vol. 90, no. 3,\npp. 471\u2013482, 2016.\n\n[16] A. v. d. Oord, N. Kalchbrenner, and K. Kavukcuoglu, \u201cPixel recurrent neural networks,\u201d arXiv preprint\narXiv:1601.06759, 2016.\n\n[17] H. Larochelle and I. Murray, \u201cThe neural autoregressive distribution estimator,\u201d in AISTATS, vol. 1, p. 2,\n2011.\n\n[18] D. J. Rezende, S. Mohamed, and D. 
Wierstra, \u201cStochastic backpropagation and approximate inference in\ndeep generative models,\u201d arXiv preprint arXiv:1401.4082, 2014.\n\n[19] D. P. Kingma and M. Welling, \u201cAuto-encoding variational Bayes,\u201d arXiv preprint arXiv:1312.6114, 2013.\n\n[20] M. Titsias and M. L\u00e1zaro-Gredilla, \u201cDoubly stochastic variational Bayes for non-conjugate inference,\u201d in\nProceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 1971\u20131979, 2014.\n\n[21] Y. Burda, R. Grosse, and R. Salakhutdinov, \u201cImportance weighted autoencoders,\u201d arXiv preprint\narXiv:1509.00519, 2015.\n\n[22] A. Mnih and K. Gregor, \u201cNeural variational inference and learning in belief networks,\u201d arXiv preprint\narXiv:1402.0030, 2014.\n\n[23] A. Mnih and D. J. Rezende, \u201cVariational inference for Monte Carlo objectives,\u201d in Proceedings of the 33rd\nInternational Conference on Machine Learning, 2016.\n\n[24] D. P. Kingma, S. Mohamed, D. J. Rezende, and M. Welling, \u201cSemi-supervised learning with deep\ngenerative models,\u201d in Advances in Neural Information Processing Systems, pp. 3581\u20133589, 2014.\n\n[25] L. Maal\u00f8e, C. K. S\u00f8nderby, S. K. S\u00f8nderby, and O. Winther, \u201cImproving semi-supervised learning with\nauxiliary deep generative models,\u201d in NIPS Workshop on Advances in Approximate Bayesian Inference,\n2015.\n\n[26] D. Kingma and J. Ba, \u201cAdam: A method for stochastic optimization,\u201d arXiv preprint arXiv:1412.6980,\n2014.\n\n[27] R. Pascanu, T. Mikolov, and Y. Bengio, \u201cOn the difficulty of training recurrent neural networks,\u201d ICML\n(3), vol. 28, pp. 1310\u20131318, 2013.\n\n[28] V. Rahmati, K. Kirmse, D. Markovi\u0107, K. Holthoff, and S. J. Kiebel, \u201cInferring neuronal dynamics from\ncalcium imaging data using biophysical models and Bayesian inference,\u201d PLoS Comput Biol, vol. 12, no. 2,\np. e1004736, 2016.\n\n[29] A. L. Maas, A. Y. Hannun, and A. Y. 
Ng, \u201cRectifier nonlinearities improve neural network acoustic models,\u201d\nin Proc. ICML, vol. 30, 2013.\n\n[30] K. Cho, B. Van Merri\u00ebnboer, D. Bahdanau, and Y. Bengio, \u201cOn the properties of neural machine translation:\nEncoder-decoder approaches,\u201d arXiv preprint arXiv:1409.1259, 2014.\n\n[31] J. Friedrich, P. Zhou, and L. Paninski, \u201cFast Active Set Methods for Online Deconvolution of Calcium\nImaging Data,\u201d arXiv.org, Sept. 2016.\n\n[32] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley,\nand Y. Bengio, \u201cTheano: A CPU and GPU math compiler in Python,\u201d in Proc. 9th Python in Science Conf,\npp. 1\u20137, 2010.\n\n[33] E. A. Pnevmatikakis, Y. Gao, D. Soudry, D. Pfau, C. Lacefield, K. Poskanzer, R. Bruno, R. Yuste, and\nL. Paninski, \u201cA structured matrix factorization framework for large scale calcium imaging data analysis,\u201d\narXiv preprint arXiv:1409.2903, 2014.\n\n[34] M. B. Ahrens, J. M. Li, M. B. Orger, D. N. Robson, A. F. Schier, F. Engert, and R. Portugues, \u201cBrain-wide\nneuronal dynamics during motor adaptation in zebrafish,\u201d Nature, vol. 485, pp. 471\u2013477, May 2012.\n\n[35] C. K. S\u00f8nderby, T. Raiko, L. Maal\u00f8e, S. K. S\u00f8nderby, and O. Winther, \u201cHow to train deep variational\nautoencoders and probabilistic ladder networks,\u201d arXiv preprint arXiv:1602.02282, 2016.\n\n[36] N. Apthorpe, A. Riordan, R. Aguilar, J. Homann, Y. Gu, D. Tank, and H. S. Seung, \u201cAutomatic neuron\ndetection in calcium imaging data using convolutional networks,\u201d in Advances In Neural Information\nProcessing Systems, pp. 3270\u20133278, 2016.\n\n[37] L. Maal\u00f8e, C. K. S\u00f8nderby, S. K. S\u00f8nderby, and O. Winther, \u201cImproving semi-supervised learning with\nauxiliary deep generative models,\u201d in NIPS Workshop on Advances in Approximate Bayesian Inference,\n2015.\n\n[38] E. Betzig, G. H. Patterson, R. Sougrat, O. W. 
Lindwasser, S. Olenych, J. S. Bonifacino, M. W. Davidson,\nJ. Lippincott-Schwartz, and H. F. Hess, \u201cImaging intracellular fluorescent proteins at nanometer resolution,\u201d\nScience, vol. 313, no. 5793, pp. 1642\u20131645, 2006.\n", "award": [], "sourceid": 2141, "authors": [{"given_name": "Artur", "family_name": "Speiser", "institution": "research center caesar, an associate of the Max Planck Society"}, {"given_name": "Jinyao", "family_name": "Yan", "institution": "Janelia Research Campus"}, {"given_name": "Evan", "family_name": "Archer", "institution": null}, {"given_name": "Lars", "family_name": "Buesing", "institution": "DeepMind"}, {"given_name": "Srinivas", "family_name": "Turaga", "institution": "Janelia Research Campus, Howard Hughes Medical Institute"}, {"given_name": "Jakob", "family_name": "Macke", "institution": "research center caesar, an associate of the Max Planck Society"}]}