{"title": "Large-scale biophysical parameter estimation in single neurons via constrained linear regression", "book": "Advances in Neural Information Processing Systems", "page_first": 25, "page_last": 32, "abstract": null, "full_text": "Large-scale biophysical parameter estimation in single neurons via constrained linear regression\n\nMisha B. Ahrens, Quentin J.M. Huys, Liam Paninski\nGatsby Computational Neuroscience Unit, University College London\n{ahrens, qhuys, liam}@gatsby.ucl.ac.uk\n\nAbstract\nOur understanding of the input-output function of single cells has been substantially advanced by biophysically accurate multi-compartmental models. The large number of parameters needing hand tuning in these models has, however, somewhat hampered their applicability and interpretability. Here we propose a simple and well-founded method for automatic estimation of many of these key parameters: 1) the spatial distribution of channel densities on the cell's membrane; 2) the spatiotemporal pattern of synaptic input; 3) the channels' reversal potentials; 4) the intercompartmental conductances; and 5) the noise level in each compartment. We assume experimental access to: a) the spatiotemporal voltage signal in the dendrite (or some contiguous subpart thereof, e.g. via voltage sensitive imaging techniques), b) an approximate kinetic description of the channels and synapses present in each compartment, and c) the morphology of the part of the neuron under investigation. The key observation is that, given data a)-c), all of the parameters 1)-4) may be simultaneously inferred by a version of constrained linear regression; this regression, in turn, is efficiently solved using standard algorithms, without any \"local minima\" problems despite the large number of parameters and complex dynamics. The noise level 5) may also be estimated by standard techniques. 
We demonstrate the method's accuracy on several model datasets, and describe techniques for quantifying the uncertainty in our estimates.\n\n1\n\nIntroduction\n\nThe usual tradeoff in parameter estimation for single neuron models is between realism and tractability. Typically, the more biophysical accuracy one tries to inject into the model, the harder the computational problem of fitting the model's parameters becomes, as the number of (nonlinearly interacting) parameters increases (sometimes even into the thousands, in the case of complex multicompartmental models).\nThese authors contributed equally. Support contributed by the Gatsby Charitable Foundation (LP, MA), a Royal Society International Fellowship (LP), the BIBA consortium and the UCL School of Medicine (QH). We are indebted to P. Dayan, M. Hausser, M. London, A. Roth, and S. Roweis for helpful and interesting discussions, and to R. Wood for channel definitions.\n\nPrevious authors have noted the difficulties of this large-scale, simultaneous parameter estimation problem, which are due both to the highly nonlinear nature of the \"cost functions\" minimized (e.g., the percentage of correctly-predicted spike times [1]) and the abundance of local minima on the very large-dimensional allowed parameter space [2, 3]. Here we present a method that is both computationally tractable and biophysically detailed. Our goal is to simultaneously infer the following dendritic parameters: 1) the spatial distribution of channel densities on the cell's membrane; 2) the spatiotemporal pattern of synaptic input; 3) the channels' reversal potentials; 4) the intercompartmental conductances; and 5) the noise level in each compartment. 
Achieving this somewhat ambitious goal comes at a price: our method assumes that the experimenter a) knows the geometry of the cell, b) has a good understanding of the kinetics of the channels present in each compartment, and c) most importantly, is able to observe the spatiotemporal voltage signal on the dendritic tree, or at least a fraction thereof (e.g. by voltage-sensitive imaging methods; in electrotonically compact cells, single electrode recordings can be used). The key to the proposed method is to recognise that, when we condition on data a)-c), the dynamics governing this observed spatiotemporal voltage signal become linear in the parameters we are seeking to estimate (even though the system itself may behave highly nonlinearly), so that the parameter estimation can be recast into a simple constrained linear regression problem (see also [4, 5]). This implies, somewhat counterintuitively, that optimizing the likelihood of the parameters in this setting is a convex problem, with no non-global local extrema. Moreover, linearly constrained quadratic optimization is an extremely well-studied problem, with many efficient algorithms available. We give examples of the resulting methods successfully applied to several types of model data below. In addition, we discuss methods for incorporating prior knowledge and analyzing uncertainty in our estimates, again basing our techniques on the well-founded probabilistic regression framework.\n\n2\n\nMethods\n\nBiophysically accurate models of single cells are typically formulated compartmentally, as a set of first-order coupled differential equations that form a spatially discrete approximation to the cable equations. Modeling the cell under investigation in this discretized manner, a typical equation describing the voltage in compartment x is\n\nC_x dV_x(t) = ( Σ_i a_{i,x} J_{i,x}(t) + I_x(t) ) dt + σ_x dN_{x,t}.  (1)\n\nHere σ_x dN_{x,t} is evolution (current) noise and I_x(t) is externally injected current. 
Dropping the subscript x where possible, the terms a_i J_i(t) represent currents due to: 1. voltage mismatch in neighbouring compartments, f_{x,y}(V_y(t) - V_x(t)); 2. synaptic input, g_s(t)(E_s - V(t)); 3. membrane channels, active (voltage-dependent) or passive, ḡ_j g_j(t)(E_j - V(t)). Here the a_i are parameters to be inferred: 1. the intercompartmental conductances f_{x,y}; 2. the spatiotemporal input from synapse s, u_s(t), from which g_s(t) is obtained by\n\ndg_s(t)/dt = -g_s(t)/τ_s + u_s(t),  (2)\n\na linear convolution operation (the synaptic kinetic parameter τ_s is assumed known) which may be written in matrix notation g_s = Ku;\n\n3. the ion channel concentrations ḡ_j. The open probabilities of channel j, g_j(t), are obtained from the channel kinetics, which are assumed to evolve deterministically, with a known dependence on V, as in the Hodgkin-Huxley model,\n\ng_Na = m³h,  τ_m dm(t)/dt = m_∞(V) - m,  (3)\n\nand similarly for h. Again, we emphasize that the kinetic parameters τ_m and m_∞(V) are assumed known; only the inhomogeneous concentrations are unknown. (For passive channels g_j is taken constant and independent of voltage.) The parameters 1-3 are relative to membrane capacitance C_x.¹ When modeling the dynamics of a single neuron according to (1), the voltage V(t) and channel kinetics g_j(t) are typically evolved in parallel, according to the injected current I(t) and synaptic inputs u_s(t). Suppose, on the other hand, that we have observed the voltage V_x(t) in each compartment. Since we have assumed we also know the channel kinetics (equation 3), the synaptic kinetics (equation 2) and the reversal potentials E_j of the channels present in each compartment, we may decouple the equations and determine the open probabilities g_{j,x}(t) for t ∈ [0, T]. 
This, in turn, implies that the currents J_{i,x}(t) and voltage differentials V̇_x(t) are all known, and we may interpret equation 1 as a regression equation, linear in the unknown parameters a_i, instead of an evolution equation. This is the key observation of this work. Thus we can use linear regression methods to simultaneously infer optimal values of the parameters {ḡ_{j,x}, u_{s,x}(t), f_{x,y}}². More precisely, rewrite equation (1) in matrix form, V̇ = Ma + ε, where each column of the matrix M is composed of one of the known currents {J_i(t), t ∈ [0, T]} (with T the length of the experiment) and the column vectors V̇, a, and ε are defined in the obvious way. Then\n\nâ_opt = argmin_a ||V̇ - Ma||²₂.  (4)\n\nIn addition, since on physical grounds the channel concentrations, synaptic input, and conductances must be non-negative, we require our solution to satisfy a_i ≥ 0. The resulting linearly-constrained quadratic optimization problem has no local minima (due to the convexity of the objective function and of the domain a_i ≥ 0), and allows quadratic programming (QP) tools (e.g., quadprog.m in Matlab) to be employed for highly efficient optimization. Quadratic programming tactics: As emphasized above, the dimension d of the parameter space to be optimized over in this application is quite large (d ≈ N_comp(T N_syn + N_chan), with N_comp, N_syn, N_chan denoting the number of compartments, synapse types, and membrane channel types respectively). While our problem is convex, and therefore tractable in the sense of having no nonglobal local optima, the time-complexity of QP, implemented naively, is O(d³), which is too slow for our purposes. 
Fortunately, the correlational structure of the parameters allows us to perform this optimization more efficiently, by several natural decompositions: in particular, given the spatiotemporal voltage signal V_x(t), parameters which are distant in space (e.g., the densities of channels in widely-separated compartments) and time (i.e., the synaptic input u_{s,x}(t) for t = t_i and t_j with |t_i - t_j| large) may be optimized independently. This amounts to a kind of \"coordinate descent\" algorithm, in which we decompose our parameter set into a set of (not necessarily disjoint) subsets, and iteratively optimize the parameters in each subset while holding all the other parameters fixed. (The quadratic nature of the original problem guarantees that each of these subset problems will be quadratic, with no local minima.) Empirically, we found that this decomposition / sequential optimization approach reduced the computation time from O(d³) to near O(d).\n\n¹Note that C_x is the proportionality constant between the externally injected electrode current and dV/dt. It is linear in the data and can be included with the other parameters a_i in the joint estimation. ²In the case that the reversal potentials E_j are unknown as well, we may estimate these terms by separating the term ḡ_j g_j(t)(V(t) - E_j) into ḡ_j g_j(t)V(t) and (ḡ_j E_j)g_j(t), thereby increasing the number of parameters in the regression by one per channel; E_j is then set to (ḡ_j E_j)/ḡ_j.\n\n2.1\n\nThe probabilistic framework\n\nIf we assume the noise N_{x,t} is Gaussian and white, then the mean-square regression solution for a described above coincides exactly with the (constrained) maximum likelihood estimate, â_ML = argmin_a ||V̇ - Ma||²/2σ². (The noise scale σ may also be estimated via maximum likelihood.) This suggests several straightforward likelihood-based techniques for representing the uncertainty in our estimates. 
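As a concrete sketch (not the authors' implementation): equation (4) with the constraint a_i ≥ 0 is exactly a non-negative least-squares problem, which off-the-shelf solvers handle directly. Below, scipy's `nnls` plays the role of quadprog.m, and synthetic random regressors stand in for the known currents J_i(t):

```python
import numpy as np
from scipy.optimize import nnls

def estimate_parameters(dVdt, currents):
    # Constrained linear regression of eq. (4): stack the known currents
    # J_i(t) as the columns of M and solve min_a ||dV/dt - M a||^2
    # subject to a >= 0 (non-negative densities / conductances).
    M = np.column_stack(currents)      # T x d regressor matrix
    a_hat, _residual = nnls(M, dVdt)   # non-negative least squares
    return a_hat

# Toy check: recover known non-negative weights from noiseless data.
rng = np.random.default_rng(0)
T = 200
J = [rng.standard_normal(T) for _ in range(3)]
a_true = np.array([2.0, 0.0, 1.5])    # one 'absent channel' at zero
dVdt = np.column_stack(J) @ a_true
a_hat = estimate_parameters(dVdt, J)
```

On noiseless data the estimate recovers a_true, with the absent channel's weight pinned at zero by the constraint; larger problems would use the block decomposition described above rather than one global solve.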
Posterior confidence intervals: The assumption of Gaussian noise implies that the posterior distribution of the parameters a is of the form p(a|V̇) = (1/Z) p(a) G_{μ,Σ}(a), with Z a normalizing constant, the prior p(a) supported on a_i ≥ 0, and the mean and covariance of the likelihood Gaussian G_{μ,Σ}(a) given by μ = (MᵀM)⁻¹MᵀV̇ and Σ⁻¹ = MᵀM/σ². We will assume a flat prior distribution p(a) (that is, no prior knowledge) on the non-synaptic parameters {ḡ_{j,x}, f_{x,y}} (although clearly non-flat priors can be easily incorporated here [6]); for the synaptic parameters u_{s,x}(t) it will be convenient to use a product-of-exponentials prior, p(u) = Π_i λ_i exp(-λ_i u_i). In each case, computing confidence intervals for a_i reduces to computing moments of multidimensional Gaussian distributions, truncated to a_i ≥ 0. We use importance sampling methods [7] to compute these moments for the channel parameters. Sampling from high-dimensional truncated Gaussians via sample-reject is inefficient (since samples from the non-truncated Gaussian, call this distribution p*(a|V̇), may violate the constraint a_i ≥ 0 with high probability). Therefore we sample instead from a proposal density q(a) with support on a_i ≥ 0 (specifically, a product of univariate truncated Gaussians with mean a_i and appropriate variance) and evaluate the second moments around â_ML by\n\nE[(a_i - â_ML,i)² | V̇] ≈ (1/Z) Σ_{n=1}^N [p*(aⁿ|V̇)/q(aⁿ)] (a_iⁿ - â_ML,i)²,  where Z = Σ_{n=1}^N p*(aⁿ|V̇)/q(aⁿ).  (5)\n\nHessian Principal Components Analysis: The procedure described above allows us to quantify the uncertainty of individual estimated parameters a_i. We are also interested in the uncertainty of our estimates in a joint sense (e.g., in the posterior covariance instead of just the individual variances). 
The negative Hessian of the loglikelihood function, A ≡ MᵀM, contains a great deal of this information, which may be extracted via a kind of principal components analysis: the eigenvectors of A corresponding to the greatest eigenvalues tell us in which directions the model is most strongly constrained by the data, while low eigenvalues correspond to directions in which the likelihood changes relatively slowly, e.g. channels whose corresponding currents are highly correlated (and therefore approximately interchangeable). These ideas will be illustrated in section 3.4.\n\n3\n\nResults\n\nTo test the validity, efficiency and accuracy of the proposed method we apply it to model data of varying complexity.\n\n3.1\n\nInferring channel conductances in a multicompartmental model\n\nWe take a simple 14-compartment model neuron, described by\n\nC_x dV_x(t) = ( Σ_{c=1}^{N_chan} ḡ_c g_c(V_x, t)(E_c - V_x(t)) + Σ_y f_{x,y}(V_y(t) - V_x(t)) + I_x(t) ) dt + σ_x dN_{x,t};\n\nrecall f_{x,y} are the intercompartmental conductances, g_c(V, t) is channel c's conductance state given the voltage history up to time t, and ḡ_c is the channel concentration. We minimize a vectorized expression as above (equation 4). On biophysical grounds we require f_{x,y} = f_{y,x}; we enforce this (linear) constraint by only including one parameter for each connected pair of compartments (x, y). In this case the true channel kinetics were of standard Hodgkin-Huxley form (Na+, K+ and leak), with inhomogeneous densities (figure 1). To test the selectivity of the estimation procedure, we fitted N_chan = 8 candidate channels from [8, 9, 10] (five of which were absent in the true model cell). 
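The construction of the regressor columns can be sketched as follows; the sigmoidal activation curve and its parameters below are illustrative assumptions, not the channel kinetics of the cited libraries. Given an observed voltage trace, the known gating equation (3) is integrated forward to produce the current that enters one column of M:

```python
import numpy as np

def gating_current(V, dt, E_rev, tau_m, m_inf):
    # Integrate the (known) first-order gating kinetics of eq. (3),
    # tau_m * dm/dt = m_inf(V) - m, along an *observed* voltage trace
    # with forward Euler; the resulting current J(t) = m(t)(E_rev - V(t))
    # forms one column of the regression matrix M.
    m = np.empty_like(V)
    m[0] = m_inf(V[0])                # start the gate at steady state
    for t in range(1, len(V)):
        m[t] = m[t - 1] + dt * (m_inf(V[t - 1]) - m[t - 1]) / tau_m
    return m * (E_rev - V)

# Hypothetical sigmoidal activation curve (NOT the kinetics of [8, 9, 10]):
m_inf = lambda V: 1.0 / (1.0 + np.exp(-(V + 40.0) / 5.0))
V = np.linspace(-70.0, -10.0, 1000)   # a slowly depolarizing voltage ramp
J = gating_current(V, dt=0.01, E_rev=50.0, tau_m=5.0, m_inf=m_inf)
```

Repeating this for each candidate channel in the library (and for each compartment) fills in the columns of M; the channel concentrations are then the regression weights on these columns.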
Figure 1 shows the performance of the inference; despite the fact that we used only 20 ms of model data, the last 7 ms of which were used for the actual fitting (the first 13 ms were used to evolve the random initial conditions to an approximately correct value), the fit is near perfect in the σ = 0 case, with vanishingly small errorbars. The concentrations of the five channels that were not present when generating the data were set to approximately zero, as desired (data not shown). The lower panels demonstrate the robustness of the methods on highly noisy (large σ) data, in which case the estimated errorbars become significant, but the performance degrades only slightly.\n\nFigure 1: Top panels: σ = 0. 14-compartment model neuron, Na+ channel concentration indicated by grey scale; estimated Na+ channel concentrations in the noiseless case; observed voltage traces (one per compartment); estimated concentrations. Bottom panels: σ large. Na+ channel concentration legend, values relative to C_m (e.g. in mS/cm² if C_m = 1 µF/cm²); estimated Na+ concentrations in the noisy case; noisy voltage traces; estimated channel concentrations. K+ channel concentrations and intercompartmental conductances f_{x,y} not shown (similar performance).\n\n3.2\n\nInferring synaptic input in a passive model\n\nNext we simulated a single-compartment, leaky neuron (i.e., no voltage-sensitive membrane channels) with synaptic input from three synapses, two excitatory (glutamatergic; τ = 3 ms, E = 0 mV) and one inhibitory (GABA_A; τ = 5 ms, E = -75 mV). When we attempted to estimate the synaptic input u_s(t) via the ML estimator described above (figure 2, left), we observed an overfitting phenomenon: the current noise due to N_t is being \"explained\" by competing balanced excitatory and inhibitory synaptic inputs. 
This overfitting is unsurprising, given that we are modeling a T-dimensional observation, V̇, with 2T regressor variables, u₋(t) and u₊(t), 0 < t < T (indeed, overfitting is much less apparent in the case that only one synapse is modeled, where no balance of excitation and inhibition is possible; data not shown). Once again, we may make use of well-known techniques from the regression literature to solve this problem: in this case, we need to regularize our estimated synaptic parameters. Instead of maximizing the likelihood, û_ML, we maximize the posterior likelihood:\n\nû_MAP = argmin_u (1/2σ²)||V̇ - MKu||²₂ + λuᵀn, with u_t ≥ 0 ∀t,  (6)\n\nwhere n is a vector of ones and λ is the Lagrange multiplier for the regularizer, or equivalently parametrizes the exponential prior distribution over u(t). As mentioned above, this maximum a posteriori (MAP) estimate corresponds to a product exponential prior on the synaptic input u_t; the multiplier λ may be chosen as the expected synaptic input per unit time. It is well known that this type of prior has a sparsening effect, shrinking small values of û_ML(t) to zero. This is visible in figure 2 (right); we see that the small, noise-matching synaptic activity is effectively suppressed, permitting much more accurate detection of the true input spike timing.\n\nFigure 2: Inferring synaptic inputs to a passive membrane. Top traces: excitatory inputs; bottom: inhibitory inputs; middle: the resulting voltage trace. Left panels: synaptic inputs inferred by ML; right: MAP estimates under the exponential (shrinkage) prior. 
Note the overfitting by the ML estimate (left) and the higher accuracy under the MAP estimate (right); in particular note that the two excitatory synapses of differing magnitudes may easily be distinguished.\n\n3.3\n\nInferring synaptic input and channel distribution in an active model\n\nThe optimization is, as mentioned earlier, jointly convex in both channel densities and synaptic input. We illustrate the simultaneous inference of channel densities and synaptic inputs in a single compartment, writing the model as:\n\nC dV(t) = ( Σ_{c=1}^{N_chan} ḡ_c g_c(V, t)(E_c - V(t)) + Σ_{s∈S} g_s(t)(E_s - V(t)) ) dt + σ dN(t),  (7)\n\nwith the same channels and synapse types as above. The combination of leak conductance and inhibitory synaptic input leads to very small eigenvalues in A and slow convergence when applying the above decomposition; thus, to speed convergence here we coarsened the time resolution of the synaptic input from 0.1 ms to 0.2 ms. Figure 3 demonstrates the accuracy of the results.\n\nFigure 3: Joint inference of synaptic input and channel densities. The true parameters are in blue, the inferred parameters in red. The top left panel shows the excitatory synaptic input, the middle left panel the voltage trace (the only data) and the bottom left traces the inhibitory synaptic input. 
The right panel shows the true and inferred channel densities; channels are the same as in section 3.1.\n\n3.4\n\nEigenvector analysis for a single-compartment model\n\nFinally, as discussed above, the eigenvectors (\"principal components\") of the loglikelihood Hessian A carry significant information about the dependence and redundancy of the parameters under study here. An example is given in figure 4; for simplicity, we restrict our attention again to the single-compartment case. In the leftmost panels, we see that the direction a_most most highly constrained by the data (the eigenvector corresponding to the largest eigenvalue of A) turns out to have the intuitive form of the balance between Na+ and K+ channels. When we perturb this balance slightly (that is, when we shift the model parameters slightly along this direction in parameter space, a_ML → a_ML + εa_most), the cell's behavior changes dramatically. Conversely, the least-sensitive direction, a_least, corresponds roughly to the balance between the concentrations of two Na+ channels with similar kinetics, and moving in this direction in parameter space (a_ML → a_ML + εa_least) has a negligible effect on the model's dynamical behavior.\n\nFigure 4: Eigenvectors of A corresponding to largest (a_most, left) and smallest (a_least, right) eigenvalues, and voltage traces of the model neuron after equal-sized perturbations by both (solid line: perturbed model; dotted line: original model). 
The first four parameters are the concentrations of four Na+ channels (the first two of which are in fact the same Hodgkin-Huxley channel, but with slightly different kinetic parameters); the next four of K+ channels; the next of the leak channel; the last of 1/C.\n\n4\n\nDiscussion and future work\n\nWe have developed a probabilistic regression framework for estimation of biophysical single neuron properties and synaptic input. This framework leads directly to efficient, globally-convergent algorithms for determining these parameters, and also to well-founded methods for analyzing the uncertainty of the estimates. We believe this is a key first step towards applying these techniques in detailed, quantitative studies of dendritic input and processing in vitro and in vivo. However, some important caveats and directions for necessary future work should be emphasized. Observation noise: While we have explicitly allowed current noise in our main evolution equation (1) (and experimented with a variety of other current- and conductance-noise terms; data not shown), we have assumed that the resulting voltage V(t) is observed noiselessly, with sufficiently high sampling rates. This is a reasonable assumption when voltage is recorded directly, via patch-clamp methods. However, while voltage-sensitive imaging techniques have seen dramatic improvements over the last few years (and will continue to do so in the near future), currently these methods still suffer from relatively low signal-to-noise ratios and spatiotemporal sampling rates. While the procedure proved to be robust to low-level noise of various forms (data not shown), it will be important to relax the noiseless-observation assumption, most likely by adapting standard techniques from the hidden Markov model signal processing literature [11]. 
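As an aside, the Hessian eigen-analysis of section 3.4 (and the channel 'equivalence classes' it reveals) reduces to a few lines of linear algebra. In this toy sketch the regressors are hypothetical stand-ins for channel currents, not the model cell of figure 4; two nearly identical currents produce a near-null eigenvector of A = MᵀM given by their difference:

```python
import numpy as np

# Toy Hessian 'PCA': the eigenvectors of A = M^T M rank directions in
# parameter space by how strongly the data constrain them. Two nearly
# identical regressor currents (channels with very similar kinetics)
# yield a near-zero eigenvalue whose eigenvector is their difference:
# the 'interchangeable channels' direction.
rng = np.random.default_rng(1)
T = 500
j1 = rng.standard_normal(T)
j2 = j1 + 1e-3 * rng.standard_normal(T)   # almost the same current as j1
j3 = rng.standard_normal(T)               # an unrelated current
M = np.column_stack([j1, j2, j3])

A = M.T @ M                               # negative Hessian of the loglikelihood
eigvals, eigvecs = np.linalg.eigh(A)      # eigenvalues in ascending order
a_least = eigvecs[:, 0]                   # weakly constrained direction
a_most = eigvecs[:, -1]                   # strongly constrained direction
```

Here a_least comes out close to (j1 - j2)/√2: perturbing the two interchangeable weights in opposite directions barely changes the fit, mirroring the negligible effect of the a_least perturbation in figure 4.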
Hidden branches: Current imaging and dye technologies allow for the monitoring of only a fraction of a dendritic tree; therefore our focus will be on estimating the properties of these sub-structures. Furthermore, these dyes diffuse very slowly and may miss small branches of dendrites, thereby effectively creating unobserved current sources. Misspecified channel kinetics and channels with chemical dependence: Channels dependent on unobserved variables (e.g., Ca++-dependent K+ channels) have not been included in the model. The techniques described here may thus be applied unmodified to experimental data for which such channels have been blocked pharmacologically. However, we should note that our methods extend directly to the case where simultaneous access to voltage and calcium signals is possible; more generally, one could develop a semi-realistic model of calcium concentration, and optimize over the parameters of this model as well. We have discussed in some detail (e.g. figure 1) the effect of misspecifications of voltage-dependent channel kinetics and how the most relevant channels may be selected by supplying sufficiently rich \"channel libraries\". Such libraries can also contain several \"copies\" of the same channel, with one or more systematically varying parameters, thus allowing for a limited search in the nonlinear space of channel kinetics. Finally, in our discussion of \"equivalence classes\" of channels (figure 4), we illustrate how eigenvector analysis of our objective function allows for insights into the joint behaviour of channels.\n\nReferences\n[1] Jolivet, Lewis, and Gerstner, 2004. J. Neurophysiol., 92, 959-976.\n[2] Vanier and Bower, 1999. J. Comput. Neurosci., 7(2), 149-171.\n[3] Goldman, Golowasch, Marder and Abbott, 2001. J. Neurosci., 21(14), 5229-5238.\n[4] Wood, Gurney and Wilson, 2004. Neurocomputing, 58-60, 1109-1116.\n[5] Morse, Davison and Hines, 2001. Soc. Neurosci. Abs., 606.5.\n[6] Baldi, Vanier and Bower, 1998. J. Comp. 
Neurosci., 5(3), 285-314.\n[7] Press et al., 1992. Numerical Recipes in C, CUP.\n[8] Hodgkin and Huxley, 1952. J. Physiol., 117.\n[9] Poirazi, Brannon and Mel, 2003. Neuron, 37(6), 977-987.\n[10] Mainen, Joerges, Huguenard, and Sejnowski, 1995. Neuron, 15(6), 1427-1439.\n[11] Rabiner, 1989. Proc. IEEE, 77(2), 257-286.\n", "award": [], "sourceid": 2888, "authors": [{"given_name": "Misha", "family_name": "Ahrens", "institution": null}, {"given_name": "Liam", "family_name": "Paninski", "institution": null}, {"given_name": "Quentin", "family_name": "Huys", "institution": null}]}