{"title": "Efficient and direct estimation of a neural subunit model for sensory coding", "book": "Advances in Neural Information Processing Systems", "page_first": 3104, "page_last": 3112, "abstract": "Many visual and auditory neurons have response properties that are well explained by pooling the rectified responses of a set of self-similar linear filters. These filters cannot be found using spike-triggered averaging (STA), which estimates only a single filter. Other methods, like spike-triggered covariance (STC), define a multi-dimensional response subspace, but require substantial amounts of data and do not produce unique estimates of the linear filters. Rather, they provide a linear basis for the subspace in which the filters reside. Here, we define a 'subunit' model as an LN-LN cascade, in which the first linear stage is restricted to a set of shifted (\"convolutional\") copies of a common filter, and the first nonlinear stage consists of rectifying nonlinearities that are identical for all filter outputs; we refer to these initial LN elements as the 'subunits' of the receptive field. The second linear stage then computes a weighted sum of the responses of the rectified subunits. We present a method for directly fitting this model to spike data. The method performs well for both simulated and real data (from primate V1), and the resulting model outperforms STA and STC in terms of both cross-validated accuracy and efficiency.", "full_text": "To appear in: Neural Information Processing Systems (NIPS),\n\nLake Tahoe, Nevada. December 3-6, 2012.\n\nEf\ufb01cient and direct estimation of a neural subunit\n\nmodel for sensory coding\n\nBrett Vintch\n\nAndrew D. Zaharia\n\nJ. Anthony Movshon\n\nEero P. 
Simoncelli \u2020\n\nCenter for Neural Science, and\n\n\u2020Howard Hughes Medical Institute\n\nNew York University\nNew York, NY 10003\n\nvintch@cns.nyu.edu\n\nAbstract\n\nMany visual and auditory neurons have response properties that are well explained\nby pooling the recti\ufb01ed responses of a set of spatially shifted linear \ufb01lters. These\n\ufb01lters cannot be estimated using spike-triggered averaging (STA). Subspace meth-\nods such as spike-triggered covariance (STC) can recover multiple \ufb01lters, but re-\nquire substantial amounts of data, and recover an orthogonal basis for the subspace\nin which the \ufb01lters reside rather than the \ufb01lters themselves. Here, we assume a\nlinear-nonlinear\u2013linear-nonlinear (LN-LN) cascade model in which the \ufb01rst lin-\near stage is a set of shifted (\u2018convolutional\u2019) copies of a common \ufb01lter, and the\n\ufb01rst nonlinear stage consists of rectifying scalar nonlinearities that are identical\nfor all \ufb01lter outputs. We refer to these initial LN elements as the \u2018subunits\u2019 of\nthe receptive \ufb01eld. The second linear stage then computes a weighted sum of the\nresponses of the recti\ufb01ed subunits. We present a method for directly \ufb01tting this\nmodel to spike data, and apply it to both simulated and real neuronal data from\nprimate V1. The subunit model signi\ufb01cantly outperforms STA and STC in terms\nof cross-validated accuracy and ef\ufb01ciency.\n\n1\n\nIntroduction\n\nAdvances in sensory neuroscience rely on the development of testable functional models for the\nencoding of sensory stimuli in neural responses. Such models require procedures for \ufb01tting their\nparameters to data, and should be interpretable in terms both of sensory function and of the biological\nelements from which they are made. 
The most common models in the visual and auditory literature are based on linear-nonlinear (LN) cascades, in which a linear stage serves to project the high-dimensional stimulus down to a one-dimensional signal, where it is then nonlinearly transformed to drive spiking. LN models are readily fit to data, and their linear operators specify the stimulus selectivity and invariance of the cell. The weights of the linear stage may be loosely interpreted as representing the efficacy of synapses, and the nonlinearity as a transformation from membrane potential to firing rate.\nFor many visual and auditory neurons, responses are not well described by projection onto a single linear filter, but instead reflect a combination of several filters. In the cat retina, the responses of Y cells have been described by linear pooling of shifted rectified linear filters, dubbed "subunits" [1, 2]. Similar behaviors are seen in guinea pig [3] and monkey retina [4]. In the auditory nerve, responses are described as computing the envelope of the temporally filtered sound waveform, which can be computed via summation of squared quadrature filter responses [5]. In primary visual cortex (V1), simple cells are well described using LN models [6, 7], but complex cell responses are more like a superposition of multiple spatially shifted simple cells [8], each with the same orientation and spatial frequency preference [9]. 
Although the description of complex cells is often reduced to a sum of two squared filters in quadrature [10], more recent experiments indicate that these cells (and indeed most 'simple' cells) require multiple shifted filters to fully capture their responses [11, 12, 13]. Intermediate nonlinearities are also required to describe the response properties of some neurons in V2 to stimuli such as angles [14] and depth edges [15].\nEach of these examples is consistent with a canonical but constrained LN-LN model, in which the first linear stage consists of convolution with one (or a few) filters, and the first nonlinear stage is point-wise and rectifying. The second linear stage then pools the responses of these "subunits" using a weighted sum, and the final nonlinearity converts this to a firing rate. Hierarchical stacks of this type of "generalized complex cell" model have also been proposed for machine vision [16, 17]. What is lacking is a method for validating this model by fitting it directly to spike data.\nA widely used procedure for fitting a simple LN model to neural data is reverse correlation [18, 19]. The spike-triggered average of a set of Gaussian white noise stimuli provides an unbiased estimate of the linear kernel. In a subunit model, the initial linear stage projects the stimulus into a multi-dimensional subspace, which can be estimated using spike-triggered covariance (STC) [20, 21]. This has been used successfully for fly motion neurons [22], vertebrate retina [23], and primary visual cortex [24, 11]. But this method relies on a Gaussian stimulus ensemble, requires a substantial amount of data, and recovers only a set of orthogonal axes for the response subspace, not the underlying biological filters. 
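The spike-triggered average mentioned above is a one-line computation; a minimal sketch (function and variable names are ours, assuming the stimulus frames are stacked as rows of a matrix):

```python
import numpy as np

def spike_triggered_average(stimuli, spikes):
    """Spike-weighted mean of the stimulus ensemble.

    stimuli : (T, D) matrix of white-noise stimulus vectors
    spikes  : (T,) spike counts aligned to the stimulus frames
    For Gaussian white noise, this estimates (up to scale) the linear
    kernel of an LN model.
    """
    return spikes @ stimuli / spikes.sum()
```

For an LN neuron driven by Gaussian noise, the recovered vector aligns with the true kernel as the amount of data grows.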
More general methods based on information maximization alleviate some of the stimulus restrictions [25] but strongly limit the dimensionality of the recoverable subspace and still produce only a basis for the subspace.\nHere, we develop a specific subunit model and a maximum likelihood procedure to estimate its parameters from spiking data. We fit the model to both simulated and real V1 neuronal data, demonstrating that it is substantially more accurate for a given amount of data than the current state-of-the-art V1 model, which is based on STC [11], and that it produces biologically interpretable filters.\n\n2 Subunit model\n\nWe assume that neural responses arise from a weighted sum of the responses of a set of nonlinear subunits. Each subunit applies a linear filter to its input (which can be either the raw stimulus, or the responses arising from a previous stage in a hierarchical cascade), and transforms the filtered response using a memoryless rectifying nonlinearity. A critical simplification is that the subunit filters are related by a fixed transformation; here, we assume they are spatially translated copies of a common filter, and thus the population of subunits can be viewed as computing a convolution. For example, the subunits of a V1 complex cell could be simple cells in V1 that share the same orientation and spatial frequency preference, but differ in spatial location, as originally proposed by Hubel & Wiesel [8, 9]. We also assume that all subunits use the same rectifying nonlinearity. The response to input defined over two discrete spatial dimensions and time, x(i, j, t), is written as:\n\n\hat{r}(t) = \sum_{m,n} w_{m,n} \, f_\Theta\!\Big( \sum_{i,j,\tau} k(i, j, \tau) \cdot x(i - m, j - n, t - \tau) \Big) + \ldots + b,   (1)\n\nwhere k is the subunit filter, f_\Theta is a point-wise function parameterized by the vector \Theta, w_{m,n} are the spatial weights, and b is an additive baseline. 
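A single channel of Eq. (1) can be sketched numerically; this is a minimal illustration (names are ours), assuming a (time, y, x) stimulus array and using SciPy's N-dimensional correlation for the convolutional subunit stage:

```python
import numpy as np
from scipy.signal import correlate

def subunit_response(stimulus, k, w, f, b=0.0):
    """Sketch of one channel of the subunit model of Eq. (1).

    stimulus : (T, H, W) spatiotemporal input x(i, j, t)
    k        : (tau, h, w) convolutional subunit kernel
    w        : spatial pooling weights over subunit positions
    f        : point-wise subunit nonlinearity (e.g. halfwave rectification)
    b        : additive baseline
    """
    # Apply the common kernel at every subunit position ('valid' keeps only
    # subunits whose kernel fits entirely inside the stimulus).
    drive = correlate(stimulus, k, mode='valid')          # (T', H', W')
    # Identical rectifying nonlinearity for all subunits, then weighted sum.
    return np.tensordot(f(drive), w, axes=([1, 2], [0, 1])) + b
```

The ellipsis and baseline of Eq. (1) correspond to summing several such channels and adding `b`.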
The ellipsis indicates that we allow for multiple subunit channels, each with its own filter, nonlinearity, and pooling weights. We interpret \hat{r}(t) as a 'generator potential' (e.g., time-varying membrane voltage), which is converted to a firing rate by another rectifying nonlinearity.\nThe subunit model of Eq. (1) may be seen as a specific instance of a subspace model, in which the input is initially projected onto a linear subspace. Bialek and colleagues introduced spike-triggered covariance as a means of recovering such subspaces [20, 22]. Specifically, eigenvector analysis of the covariance matrix of the spike-triggered input ensemble exposes orthogonal axes for which the spike-triggered ensemble has a variance that differs significantly from that of the raw input ensemble. These axes may be separated into those along which variance is greater (excitatory) or less (suppressive) than that of the input. Figure 1 demonstrates what happens when STC is applied to a simulated complex cell with 15 spatially shifted subunits. The response of this model cell is\n\n\hat{r}(t) = \sum_i w_i \lfloor \vec{k}_i \cdot \vec{x}(t) \rfloor^2,\n\nwhere the \vec{k}_i are shifted filters, w_i weights the filters by position, and \vec{x} is the stimulus vector. The recovered STC axes span the same subspace as the shifted model filters, but there are fewer of them, and the enforced orthogonality of eigenvectors means that they are generally not a direct match to any of the model filters. This has also been observed in filters extracted from physiological data [11, 12]. Although one may follow the STC analysis by indirectly identifying a localized filter whose shifted copies span the recovered subspace [11, 13], the reliance on STC still imposes the stimulus limitations and data requirements mentioned above.\n\nFigure 1: Spike-triggered covariance analysis of a simulated V1 complex cell. (a) The model output is formed by summing the rectified responses of multiple linear filter kernels which are shifted and scaled copies of a canonical form. (b) The shifted filters lie along a manifold in stimulus space (four shown), and are not mutually orthogonal in general. STC recovers an orthogonal basis for a low-dimensional subspace that contains this manifold by finding the directions in stimulus space along which spikes are elicited or suppressed. (c) STC analysis of this model cell returns a variable number of filters dependent upon the amount of acquired data. A modest amount of data typically reveals two strong STC eigenvalues (top), whose eigenvectors form a quadrature (90-degree phase-shifted) pair and span the best-fitting plane for the set of shifted model filters. These will generally have tuning properties (orientation, spatial frequency) similar to the true model filters. However, the manifold does not generally lie in a two-dimensional subspace [26], and a larger data set reveals additional eigenvectors (bottom) that serve to capture the deviations from the \vec{e}_{1,2} plane. Due to the constraint of mutual orthogonality, these filters are usually not localized, and they have tuning properties that differ from those of the true model filters.\n\n3 Direct subunit model estimation\n\nA generic subspace method like STC does not exploit the specific structure of the subunit model. We therefore developed an estimation procedure explicitly tailored for this type of computation. 
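For concreteness, the generic STC analysis referred to above amounts to an eigendecomposition of the spike-weighted stimulus covariance; a minimal sketch (names are ours):

```python
import numpy as np

def spike_triggered_covariance(stimuli, spikes):
    """Recover a basis for the response subspace via STC.

    stimuli : (T, D) white-noise stimulus vectors; spikes : (T,) counts.
    Returns eigenvalues (ascending) and eigenvectors of the spike-triggered
    covariance; axes whose variance differs from the raw ensemble are
    candidate excitatory (large) or suppressive (small) directions.
    """
    sta = spikes @ stimuli / spikes.sum()
    centered = stimuli - sta                    # remove the STA component
    stc = (centered * spikes[:, None]).T @ centered / spikes.sum()
    return np.linalg.eigh(stc)
```

For an energy-model cell built from two filters, the two largest-variance eigenvectors span the filters' plane, but, as noted above, they need not match the filters themselves.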
We first introduce a piecewise-linear parameterization of the subunit nonlinearity:\n\nf(s) = \sum_l \alpha_l T_l(s),   (2)\n\nwhere the \alpha_l scale a small set of overlapping 'tent' functions, T_l(\cdot), that represent localized portions of f(\cdot) (we find that a dozen or so basis functions are typically sufficient to provide the needed flexibility). Incorporating this into the model response of Eq. (1) allows us to fold the second linear pooling stage and the subunit nonlinearity into a single sum:\n\n\hat{r}(t) = \sum_{m,n,l} w_{m,n} \alpha_l \, T_l\!\Big( \sum_{i,j,\tau} k(i, j, \tau) \cdot x(i - m, j - n, t - \tau) \Big) + \ldots + b.   (3)\n\nThe model is now partitioned into two linear stages, separated by the fixed nonlinear functions T_l(\cdot). In the first, the stimulus is convolved with k, and in the second, the nonlinear responses are summed with a set of weights that are separable in the indices l and m, n. The partition motivates the use of an iterative coordinate descent scheme: the linear weights of each portion are optimized in alternation, while the other portion is held constant. For each step, we minimize the mean square error between the observed firing rate of a cell and the firing rate predicted by the model. 
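The tent parameterization of Eq. (2) is simple to implement; a sketch (names are ours, assuming evenly spaced tent centers):

```python
import numpy as np

def tent_basis(s, centers):
    """Evaluate overlapping 'tent' (piecewise-linear) basis functions T_l(s).

    s       : array of generator-signal values
    centers : sorted 1-D array of evenly spaced tent centers
    Returns an array of shape s.shape + (len(centers),).
    """
    spacing = centers[1] - centers[0]
    # Each tent equals 1 at its center and falls linearly to 0 at its
    # neighboring centers; outside that interval it is clamped to 0.
    return np.maximum(0.0, 1.0 - np.abs(s[..., None] - centers) / spacing)

# f(s) = sum_l alpha_l T_l(s) reproduces any piecewise-linear function on
# the grid of centers; here alpha encodes halfwave rectification.
centers = np.linspace(-3.0, 3.0, 13)   # 'a dozen or so' basis functions
alpha = np.maximum(centers, 0.0)
f_vals = tent_basis(np.array([-1.0, 0.5, 2.0]), centers) @ alpha
```

At the knot points the reconstruction is exact, which is what makes the gradient with respect to k easy to evaluate in the next section.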
For models that include two subunit channels we optimize over both channels simultaneously (see section 3.3 for comments regarding two-channel initialization).\n\n3.1 Estimating the convolutional subunit kernel\n\nThe first coordinate descent leg optimizes the convolutional subunit kernel, k, using gradient descent while fixing the subunit nonlinearity and the final linear pooling. Because the tent basis functions are fixed and piecewise linear, the gradient is easily determined. This property also ensures that the descent is locally convex: assuming that updating k does not cause any of the linear subunit responses to jump between the localized tent functions representing f, the optimization is linear and the objective function is quadratic. In practice, the full gradient descent path causes the linear subunit responses to move slowly across bins of the piecewise nonlinearity. However, we include a regularization term to impose smoothness on the nonlinearity (see below), and this yields a well-behaved minimization problem for k.\n\n3.2 Estimating the subunit nonlinearities and linear subunit pooling\n\nThe second leg of coordinate descent optimizes the subunit nonlinearity (more specifically, the weights on the tent functions, \alpha_l) and the subunit pooling, w_{m,n}. As described above, the objective is bilinear in \alpha_l and w_{m,n} when k is fixed. Estimating both \alpha_l and w_{m,n} can be accomplished with alternating least squares, which assures convergence to a (local) minimum [27]. We also include two regularization terms in the objective function. The first ensures smoothness in the nonlinearity f, by penalizing the square of the second derivative of the function in the least-squares fit. This smooth nonlinearity helps to guarantee that the optimization of k is well behaved, even where finite data sets leave the function poorly constrained. 
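The alternating-least-squares step of section 3.2 can be sketched as follows (a minimal sketch with the regularizers and the convolutional stage omitted; names are ours):

```python
import numpy as np

def alternating_least_squares(T, r, n_basis, n_pool, n_iter=100):
    """Bilinear fit of the tent weights alpha and pooling weights w.

    T : (n_samples, n_pool, n_basis) tent responses T_l(.) at each subunit
    r : (n_samples,) observed rates
    Holding w fixed the model is linear in alpha (and vice versa), so each
    half-step reduces to an ordinary least-squares solve.
    """
    alpha = np.zeros(n_basis)
    alpha[n_basis // 2:] = 1.0                 # crude rectifying init
    w = np.ones(n_pool) / n_pool
    for _ in range(n_iter):
        # r ~ (T @ alpha) @ w : solve for w with alpha fixed ...
        Xw = T @ alpha                          # (n_samples, n_pool)
        w, *_ = np.linalg.lstsq(Xw, r, rcond=None)
        # ... then solve for alpha with w fixed.
        Xa = np.einsum('npl,p->nl', T, w)       # (n_samples, n_basis)
        alpha, *_ = np.linalg.lstsq(Xa, r, rcond=None)
    return alpha, w
```

Each half-step cannot increase the squared error, which is why the alternation converges to a (local) minimum of the bilinear objective.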
We also include a cross-validated ridge prior for\nthe pooling weights to bias wn,m toward zero. The \ufb01lter kernel k can also be regularized to ensure\nsmoothness, but for the examples shown here we did not \ufb01nd the need to include such a term.\n\n3.3 Model initialization\n\nOur objective function is non-convex and contains local minima, so the selection of initial parameter\nvalues may affect the solution. We found that initializing our two-channel subunit model to have\na positive pooling function for one channel and a negative pooling function for the second channel\nallowed the optimization of the second channel to proceed much more quickly. This is probably\ndue in part to a suppressive channel that is much weaker than the excitatory channel in general.\nWe initialized the nonlinearity to halfwave-recti\ufb01cation for the excitatory channel and fullwave-\nrecti\ufb01cation for the suppressive channel.\nTo initialize the convolutional \ufb01lter we use a novel technique that we term \u2018convolutional STC\u2019. The\nsubunit model describes a receptive \ufb01eld as the linear combination of nonlinear kernel responses that\nspatially tile the stimulus. Thus, the contribution of each localized patch of stimulus (of a size equal\nto the subunit kernel) is the same, up to a scale factor set by the weighting used in the subsequent\npooling stage. As such, we compute an STC analysis on the union of all localized patches of stimuli.\nFor each subunit location, {m, n}, we extract the local stimulus values in a window, gm,n(i, j), the\nsize of the convolutional kernel and append them vertically in a \u2019local\u2019 stimulus matrix. As an initial\nguess for the pooling weights, we weight each of these blocks by a Gaussian spatial pro\ufb01le, chosen\nto roughly match the size of the receptive \ufb01eld. 
We also generate a vector containing the vertical concatenation of copies of the measured spike train, \vec{r} (one copy for each subunit location):\n\n\begin{pmatrix} w_{1,1} \, g_{1,1}(i,j) \\ w_{1,2} \, g_{1,2}(i,j) \\ \vdots \end{pmatrix} \rightarrow X_{loc} \, ; \qquad \begin{pmatrix} \vec{r} \\ \vec{r} \\ \vdots \end{pmatrix} \rightarrow \vec{r}_{loc}.   (4)\n\nAfter performing STC analysis on the localized stimulus matrix, we use the first (largest variance) eigenvector to initialize the subunit kernel of the excitatory channel, and the last (lowest variance) eigenvector to initialize the kernel of the suppressive channel. In practice, we find that this initialization greatly reduces the number of iterations, and thus the run time, of the optimization procedure.\n\nFigure 2: Model fitting performance for simulated V1 neurons. Shown are correlation coefficients for the subunit model (black circles) and the Rust-STC model (blue squares) [11], computed on both the training data (open) and a holdout test set (closed). Spike counts for each presented stimulus frame are drawn from a Poisson distribution. Shaded regions indicate \pm 1 s.d. for 5 simulation runs. (a) 'Simple' cell, with spike rate determined by the halfwave-rectified and squared response of a single oriented linear filter. (b) 'Complex' cell, with rate determined by a sum of squared Gabor filters arranged in spatial quadrature. 
Insets show estimated \ufb01lters for the subunit (top) and Rust-\nSTC (bottom) models with ten seconds (400 frames; left) and 20 minutes (48,000 frames; right) of\ndata.\n\n4 Experiments\n\nWe \ufb01t the subunit model to physiological data sets in 3 different primate cortical areas: V1, V2, and\nMT. The model is able to explain a signi\ufb01cant amount of variance for each of these areas, but for\nillustrative purposes we show here only data for V1. Initially, we use simulated V1 cells to compare\nthe performance of the subunit model to that of the Rust-STC model [11], which is based upon STC\nanalysis.\n\n4.1 Simulated V1 data\n\nWe simulated the responses of canonical V1 simple cells and complex cells in response to white\nnoise stimuli. Stimuli consisted of a 16x16 spatial array of pixels whose luminance values were\nset to independent ternary white noise sequences, updated every 25 ms (or 40 Hz). The simulated\ncells use spatiotemporally oriented Gabor \ufb01lters: The simple cell has one even-phase \ufb01lter and a\nhalf-squaring output nonlinearity while the complex cell has two \ufb01lters (one even and one odd)\nwhose squared responses are combined to give a \ufb01ring rate. Spike counts are drawn from a Poisson\ndistribution, and overall rates are scaled so as to yield an average of 40 ips (i.e. one spike per time\nbin).\nFor consistency with the analysis of the physiological data, we \ufb01t the simulated data using a subunit\nmodel with two subunit channels (even though the simulated cells only possess an excitatory chan-\nnel). When \ufb01tting the Rust-STC model, we followed the procedure described in [11]. Brie\ufb02y, after\nthe STA and STC \ufb01lters are estimated, they are weighted according to their predictive power and\ncombined in excitatory and suppressive pools, E and S (we use cross-validation to determine the\nnumber of \ufb01lters to use for each pool). 
These two pooled responses are then combined using a joint output nonlinearity: \hat{r}_{Rust}(t) = \alpha + \beta (E^\rho - \delta S^\rho) / (E^\rho + \epsilon S^\rho + 1). The parameters \{\alpha, \beta, \delta, \epsilon, \rho\} are optimized to minimize the mean squared error between observed spike counts and the model rate.\nModel performances, measured as the correlation between the model rate and spike count, are shown in Figure 2. In low data regimes, both models perform nearly perfectly on the training data, but poorly on separate test data not used for fitting, a clear indication of over-fitting. But as the data set increases in size, the subunit model rapidly improves, reaching near-perfect performance for modest spike counts. The Rust-STC model also improves, but much more slowly; it requires more than an order of magnitude more data to achieve the same performance as the subunit model. This inefficiency is more pronounced for the complex cell, because the simple cell is fully explained by the STA filter, which can be estimated much more reliably than the STC filters for small amounts of data.\n\nFigure 3: Two-channel subunit model fit to physiological data from a macaque V1 cell. (a) Fitted parameters for the excitatory (top row) and suppressive (bottom row) channels, including the space-time subunit filters (8 grayscale images, corresponding to different time frames), the nonlinearity, and the spatial weighting function w_{m,n} that is used to combine the subunit responses. (b) A raster showing spiking responses to 20 repeated presentations of an identical stimulus, with the average spike count (black) and model prediction (blue) plotted above. (c) Simulated model (subunit model: blue, Rust-STC model: purple) and measured (black) responses to drifting sinusoidal gratings.\n\n
We conclude that directly fitting the subunit model makes much more efficient use of the data than using STC to estimate a subspace model.\n\n4.2 Physiological data from macaque V1\n\nWe presented spatio-temporal pixel noise to 38 cells recorded from V1 in anesthetized macaques (see [11] for details of experimental design). The stimulus was a 16x16 grid with luminance values set by independent ternary white noise sequences refreshed at 40 Hz. For 21 neurons we also presented 20 repeats of a sequence of 1000 stimulus frames as a validation set. The model filters were assumed to respond over a 200 ms (8 frame) causal time window in which the stimulus most strongly affected the firing of the neurons, and thus model responses were derived from a stimulus vector with 2048 dimensions (16x16x8).\nFigure 3 shows the fit of a 2-channel subunit model to data from a typical V1 cell. Figure 3a illustrates the subunit kernels and their associated nonlinearities and spatial pooling maps, for both the excitatory channel (top row) and the suppressive channel (bottom row). The two channels show clear but opposing direction selectivity, starting at a latency of 50 ms. The fact that this cell is complex is reflected in two aspects of the model parameters. First, the model shows a symmetric, full-wave rectifying nonlinearity for the excitatory channel. Second, the final linear pooling for this channel is diffuse over space, eliciting a response that is invariant to the exact spatial position and phase of the stimulus.\nFor this particular example the model fits well. For the cross-validated set of repeated stimuli (which have the same structure as the fitting data), the model correlates with each trial's firing rate with an average r-value of 0.54. A raster of spiking responses to twenty repetitions of a 5 s stimulus is depicted in Fig. 
3b, along with the average firing rate and the model prediction, which are well matched. The model can also capture the direction selectivity of this cell's response to moving sinusoidal gratings, whose spatial and temporal frequency are chosen to best drive the cell (Fig. 3c). The subunit model acceptably fits most of the cells we recorded in V1. Moreover, fit quality is not correlated with modulation index (r = 0.08; n.s.), suggesting that the model captures the behavior of simple and complex cells equally well.\nThe fitted subunit model also significantly outperforms the Rust-STC model in terms of predicting responses to novel data. Figure 4a shows the performance of the Rust-STC and subunit models for 21 V1 neurons, for both training data and test data on single trials.\n\nFigure 4: Model performance comparisons on physiological data. (a) Subunit model performance vs. Rust-STC model for V1 data. Training accuracy is computed for a single variable-length sequence extracted from the fitting data. Test accuracy is computed on the average response to 20 repeats of a 25 s stimulus. (b) Subunit model performance vs. an 'Oracle' model for V1 data (see text). Each point represents the average accuracy in predicting responses to each of 20 repeated stimuli. The oracle model uses the average spike count over the other 19 repeats as a prediction. Inset: ratio of subunit-to-oracle performance. Error bars indicate 1 s.d. (c) Subunit model performance on test data, as a function of the total number of recorded spikes.\n\nFor the training data, the Rust-STC model performs significantly better than the subunit model (Figure 4a; \langle r_{Rust} \rangle = 0.81, \langle r_{subunit} \rangle = 0.33; p \ll 0.005). 
However, this is primarily due to over-fitting: visual inspection of the STC kernels for most cells reveals very little structure. For test data (not included in the data used to fit the models), the subunit model exhibits significantly better performance than the Rust-STC model (\langle r_{Rust} \rangle = 0.16, \langle r_{subunit} \rangle = 0.27; p \ll 0.005). The difference again reflects over-fitting in the STC analysis: for a stimulus composed of a 16x16 pixel grid with 8 frames, the spike-triggered covariance matrix contains over 2 million parameters, whereas for the same stimulus a subunit model with two channels and an 8x8x8 subunit kernel has only about 1200 parameters.\nThe subunit model performs well when compared to the Rust-STC model, but we were also interested in obtaining a more absolute measure of performance. Specifically, no purely stimulus-driven model can be expected to explain the response variability seen across repeated presentations of the same stimulus. We can estimate an upper bound on stimulus-driven model performance by implementing an empirical 'oracle' model that uses the average response over all but one of a set of repeated stimulus trials to predict the response on the remaining trial. Over the 21 neurons with repeated stimulus data, we found that the subunit model achieved, on average, 76% of the performance of the oracle model (Figure 4b). Moreover, the cells that were least well fit by the subunit model were also the cells that responded only weakly to the stimulus (Figure 4c). We conclude that, for most cells, the fitted subunit model explains a significant fraction of the response that can be explained by any stimulus-driven model.\n\n5 Discussion\n\nSubunits have been proposed as a qualitative description of many types of receptive fields in sensory systems [2, 28, 8, 11, 12], and have enjoyed a recent renewal of interest from the modeling community [13, 29]. 
Here we have described a new parameterized canonical subunit model that can be applied to an arbitrary set of inputs (either a sensory stimulus, or a population of afferents from a previous stage of processing), and we have developed a method for directly estimating the parameters of this model from measured spiking data. Compared with STA or STC, the model fits are more accurate for a given amount of data, less sensitive to the choice of stimulus ensemble, and more interpretable in terms of biological mechanism.\nFor V1, we have applied this model directly to the visual stimuli, adopting the simplifying assumption that subcortical pathways faithfully relay the image data to V1. Higher visual areas build their responses on the afferent inputs arriving from lower visual areas, and we have applied this subunit model to such neurons by first simulating the responses of a population of the afferent V1 neurons, and then optimizing a subunit model that best maps these afferent responses to the spiking responses observed in the data. Specifically, for neurons in area V2, we model the afferent V1 population as a collection of simple cells that tile visual space. The V1 filters are chosen to uniformly cover the space of orientations, scales, and positions [30]. We also include four different phases. For neurons in area MT (V5), we use an afferent V1 population that also includes direction-selective subunits, because the projections from V1 to MT are known to be sensitive to the direction of visual motion [31]. Specifically, the V1 filters are a rotation-invariant set of 3-dimensional, space-space-time steerable filters [32]. 
We fit these models to neural responses to textured stimuli that varied in contrast and local orientation content (for MT, the local elements also drift over time). Our preliminary results show that the subunit model outperforms standard models for these higher order areas as well.\nWe are currently working to refine and generalize the subunit model in a number of ways. The mean squared error objective function, while computationally appealing, does not accurately reflect the noise properties of real neurons, whose variance changes with their mean rate. A likelihood objective function, based on a Poisson or similar spiking model, can improve the accuracy of the fitted model, but it does so at a cost to the simplicity of model estimation (e.g., alternating least squares can no longer be used to solve the bilinear problem). Real neurons also possess other forms of nonlinearity, such as the local gain control that has been observed in neurons throughout the visual and auditory systems [33]. We are exploring means by which this functionality can be included directly in the model framework (e.g. [11]), while retaining the tractability of the parameter estimation.\n\nAcknowledgments\nThis work was supported by the Howard Hughes Medical Institute, and by NIH grant EY04440.\n\nReferences\n[1] H. B. Barlow and W. R. Levick. The mechanism of directionally selective units in rabbit's retina. The Journal of Physiology, 178(3):477, June 1965.\n[2] S. Hochstein and R. M. Shapley. Linear and nonlinear spatial subunits in Y cat retinal ganglion cells, 1976.\n[3] J. B. Demb, K. Zaghloul, L. Haarsma, and P. Sterling. Bipolar cells contribute to nonlinear spatial summation in the brisk-transient (Y) ganglion cell in mammalian retina. The Journal of Neuroscience, 21(19):7447-7454, 2001.\n[4] J. D. Crook, B. B. Peterson, O. S. Packer, F. R. Robinson, J. B. Troy, and D. M. Dacey. 
Y-cell receptive field and collicular projection of parasol ganglion cells in macaque monkey retina. The Journal of Neuroscience, 28(44):11277\u201311291, 2008.\n\n[5] P. X. Joris, C. E. Schreiner, and A. Rees. Neural processing of amplitude-modulated sounds. Physiological Reviews, 84:541\u2013577, 2004.\n\n[6] J. P. Jones and L. A. Palmer. The two-dimensional spatial structure of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58(6):1187\u20131211, 1987.\n\n[7] G. C. DeAngelis, I. Ohzawa, and R. D. Freeman. Spatiotemporal organization of simple-cell receptive fields in the cat\u2019s striate cortex. I. General characteristics and postnatal development. Journal of Neurophysiology, 69(4):1091\u20131117, 1993.\n\n[8] D. H. Hubel and T. N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat\u2019s visual cortex. The Journal of Physiology, 160(1):106\u2013154, 1962.\n\n[9] J. A. Movshon, I. D. Thompson, and D. J. Tolhurst. Receptive field organization of complex cells in the cat\u2019s striate cortex. The Journal of Physiology, 283(1):79\u201399, 1978.\n\n[10] E. H. Adelson and J. R. Bergen. Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A, 2(2):284\u2013299, 1985.\n\n[11] N. C. Rust, O. Schwartz, J. A. Movshon, and E. P. Simoncelli. Spatiotemporal elements of macaque V1 receptive fields. Neuron, 46(6):945\u2013956, June 2005.\n\n[12] X. Chen, F. Han, M. M. Poo, and Y. Dan. Excitatory and suppressive receptive field subunits in awake monkey primary visual cortex (V1). Proceedings of the National Academy of Sciences, 104(48):19120\u201319125, November 2007.\n\n[13] T. Lochmann, T. Blanche, and D. A. Butts. Construction of direction selectivity in V1: from simple to complex cells. Computational and Systems Neuroscience (CoSyNe), 2011.\n\n[14] M. Ito and H. Komatsu. 
Representation of angles embedded within contour stimuli in area V2 of macaque monkeys. The Journal of Neuroscience, 24(13):3313\u20133324, 2004.\n\n[15] C. E. Bredfeldt, J. C. A. Read, and B. G. Cumming. A quantitative explanation of responses to disparity-defined edges in macaque V2. Journal of Neurophysiology, 101(2):701\u2013713, 2009.\n\n[16] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193\u2013202, 1980.\n\n[17] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2:1019\u20131025, 1999.\n\n[18] E. de Boer. Reverse correlation I. A heuristic introduction to the technique of triggered correlation with application to the analysis of compound systems. Proc. Kon. Nederl. Akad. Wet., 1968.\n\n[19] E. J. Chichilnisky. A simple white noise analysis of neuronal light responses. Network: Computation in Neural Systems, 12(2):199\u2013213, 2001.\n\n[20] R. de Ruyter van Steveninck and W. Bialek. Real-time performance of a movement-sensitive neuron in the blowfly visual system: coding and information transfer in short spike sequences. Proceedings of the Royal Society B: Biological Sciences, 234(1277):379\u2013414, September 1988.\n\n[21] O. Schwartz, J. W. Pillow, N. C. Rust, and E. P. Simoncelli. Spike-triggered neural characterization. Journal of Vision, 6(4):13\u201313, February 2006.\n\n[22] N. Brenner, W. Bialek, and R. de Ruyter van Steveninck. Adaptive rescaling maximizes information transmission. Neuron, 26(3):695\u2013702, June 2000.\n\n[23] O. Schwartz, E. J. Chichilnisky, and E. P. Simoncelli. Characterizing neural gain control using spike-triggered covariance. Advances in Neural Information Processing Systems, 1:269\u2013276, 2002.\n\n[24] J. Touryan, B. 
Lau, and Y. Dan. Isolation of relevant visual features from random stimuli for cortical complex cells. The Journal of Neuroscience, 22(24):10811\u201310818, 2002.\n\n[25] T. Sharpee, N. C. Rust, and W. Bialek. Analyzing neural responses to natural signals: maximally informative dimensions. Neural Computation, 16(2):223\u2013250, 2004.\n\n[26] C. Ekanadham, D. Tranchina, and E. P. Simoncelli. Recovery of sparse translation-invariant signals with continuous basis pursuit. IEEE Transactions on Signal Processing, 59(10):4735\u20134744, October 2011.\n\n[27] M. Ahrens, L. Paninski, and M. Sahani. Inferring input nonlinearities in neural encoding models. Network: Computation in Neural Systems, 19(1):35\u201367, 2008.\n\n[28] J. D. Victor and R. M. Shapley. The nonlinear pathway of Y ganglion cells in the cat retina. The Journal of General Physiology, 74(6):671\u2013689, December 1979.\n\n[29] M. Eickenberg, R. J. Rowekamp, M. Kouh, and T. O. Sharpee. Characterizing responses of translation-invariant neurons to natural stimuli: maximally informative invariant dimensions. Neural Computation, 24(9):2384\u20132421, September 2012.\n\n[30] E. P. Simoncelli and W. T. Freeman. The steerable pyramid: A flexible architecture for multi-scale derivative computation. Proceedings of the IEEE International Conference on Image Processing, 3:444\u2013447, 1995.\n\n[31] J. A. Movshon and W. T. Newsome. Visual response properties of striate cortical neurons projecting to area MT in macaque monkeys. The Journal of Neuroscience, 16(23):7733\u20137741, 1996.\n\n[32] E. P. Simoncelli and D. J. Heeger. A model of neuronal responses in visual area MT. Vision Research, 38(5):743\u2013761, March 1998.\n\n[33] M. Carandini and D. J. Heeger. Normalization as a canonical neural computation. 
Nature Reviews Neuroscience, 13(1):51\u201362, November 2011.\n", "award": [], "sourceid": 1432, "authors": [{"given_name": "Brett", "family_name": "Vintch", "institution": null}, {"given_name": "Andrew", "family_name": "Zaharia", "institution": null}, {"given_name": "J", "family_name": "Movshon", "institution": null}, {"given_name": "Eero", "family_name": "Simoncelli", "institution": ""}]}