{"title": "How Linear are Auditory Cortical Responses?", "book": "Advances in Neural Information Processing Systems", "page_first": 125, "page_last": 132, "abstract": null, "full_text": "How Linear are Auditory Cortical Responses?\n\nManeesh Sahani\nGatsby Unit, UCL\n\n17 Queen Sq., London, WC1N 3AR, UK.\n\nJennifer F. Linden\nKeck Center, UCSF\n\nSan Francisco, CA 94143\u20130732.\n\nmaneesh@gatsby.ucl.ac.uk\n\nlinden@phy.ucsf.edu\n\nAbstract\n\nBy comparison to some other sensory cortices, the functional proper-\nties of cells in the primary auditory cortex are not yet well understood.\nRecent attempts to obtain a generalized description of auditory cortical\nresponses have often relied upon characterization of the spectrotempo-\nral receptive \ufb01eld (STRF), which amounts to a model of the stimulus-\nresponse function (SRF) that is linear in the spectrogram of the stimulus.\nHow well can such a model account for neural responses at the very \ufb01rst\nstages of auditory cortical processing? To answer this question, we de-\nvelop a novel methodology for evaluating the fraction of stimulus-related\nresponse power in a population that can be captured by a given type of\nSRF model. We use this technique to show that, in the thalamo-recipient\nlayers of primary auditory cortex, STRF models account for no more\nthan 40% of the stimulus-related power in neural responses.\n\n1 Introduction\n\nA number of recent studies have suggested that spectrotemporal receptive \ufb01eld (STRF)\nmodels [1, 2], which are linear in the stimulus spectrogram, can describe the spiking re-\nsponses of auditory cortical neurons quite well [3, 4]. At the same time, other authors have\npointed out signi\ufb01cant non-linearities in auditory cortical responses [5, 6], or have empha-\nsized both linear and non-linear response components [7, 8]. Some of the differences in\nthese results may well arise from differences in the stimulus ensembles used to evoke neu-\nronal responses. 
However, even for a single type of stimulus, it is extremely difficult to put a number to the proportion of the response that is linear or non-linear, and so to judge the relative contributions of the two components to the stimulus-evoked activity.

The difficulty arises because repeated presentations of identical stimulus sequences evoke highly variable responses from neurons at intermediate stages of perceptual systems, even in anaesthetized animals. While this variability may reflect meaningful changes in the internal state of the animal or may be completely random, from the point of view of modelling the relationship between stimulus and neural response it must be treated as noise. As previous authors have noted [9, 10], this noise complicates the evaluation of the performance of a particular class of stimulus-response function (SRF) model (for example, the class of STRF models) in two ways. First, it makes it difficult to assess the quality of the predictions given by any single model. Perfect prediction of a noisy response is impossible, even in principle, and since the true underlying relationship between stimulus and neural response is unknown, it is unclear what degree of partial prediction could possibly be expected.
Second, the noise introduces error into the estimation of the model parameters; consequently, even where direct unbiased evaluations of the predictions made by the estimated models are possible, these evaluations understate the performance of the model in the class that most closely matches the true SRF.

The difficulties can be illustrated in the context of the classical statistical measure of the fraction of variance explained by a model, the coefficient of determination or $R^2$ statistic. This is the ratio of the reduction in variance achieved by the regression model (the total variance of the outputs minus the variance of the residuals) to the total variance of the outputs. The total variance of the outputs includes contributions from the noise, and so an $R^2$ of 1 is an unrealistic target, and the actual maximum achievable value is unclear. Moreover, the reduction of variance on the training data, which appears in the numerator of the $R^2$, includes some "explanation" of noise due to overfitting. The extent to which this happens is difficult to estimate; if the reduction in variance is evaluated on test data, estimation errors in the model will lead to an underestimate of the performance of the best model in the class. Hypothesis tests based on $R^2$ compensate for these shortcomings in answering questions of model sufficiency. However, these tests do not provide a way to assess the extent of partial validity of a model class; indeed, it is well known that even the failure of a hypothesis test to reject a specific model class is not sufficient evidence to regard the model as fully adequate.
One proposed method for obtaining a more quantitative measure of model performance is to compare the correlation (or, equivalently, squared distance) between the model prediction and a new response measurement to that between two successive responses to the same stimulus [9, 11]; as acknowledged in those proposals, however, this yardstick underestimates the response reliability even after considerable averaging, and so the comparison will tend to overestimate the validity of the SRF model.

Measures like $R^2$ that are based on the fractional variance (or, for time series, the power) explained by a model do have some advantages; for example, contributions from independent sources are additive. Here, we develop analytic techniques that overcome the systematic noise-related biases in the usual variance measures¹, and thus obtain, for a population of neurons, a quantitative estimate of the fraction of stimulus-related response captured by a given class of models. This statistical framework may be applicable to analysis of response functions for many types of neural data, ranging from intracellular recordings to imaging measurements. We apply it to extracellular recordings from rodent auditory cortex, quantifying the degree to which STRF models can account for neuronal responses to dynamic random chord stimuli. We find that on average less than half of the reliable stimulus-related power in these responses can be captured by spectrogram-linear STRF models.

2 Signal power

The analysis assumes that the data consist of spike trains or other neural measurements continuously recorded during presentation of a long, complex, rapidly varying stimulus. This stimulus is treated as a discrete-time process. In the auditory experiment considered here, the discretization was set by the duration of regularly clocked sound pulses of fixed length; in a visual experiment, the discretization might be the frame rate of a movie.
The neural response can then be measured with the same level of precision, counting action potentials (or integrating measurements) to estimate a response rate for each time bin, to obtain a response vector $r = (r(1), \ldots, r(T))$. We propose to measure model performance in terms of the fraction of response power predicted successfully, where "power" is used in the sense of average squared deviation from the mean: $P(r) = \langle (r(t) - \langle r \rangle)^2 \rangle$ (angle brackets denoting averages over time). As argued above, only some part of the total response power is predictable, even in principle; fortunately, this signal power can be estimated by combining repeated responses to the same stimulus sequence. We present a method-of-moments [12] derivation of the relevant estimator below.

¹An alternative would be to measure information or conditional entropy rates. However, the question of how much relevant information is preserved by a model is different from the question of how accurate a model's prediction is. For example, an information-theoretic measure would not distinguish between a linear model and the same linear model cascaded with an invertible non-linearity.

Suppose we have $N$ responses $r^{(n)} = s + \eta^{(n)}$ to the same stimulus, where $s$ is the common, stimulus-dependent component (signal) in the response and $\eta^{(n)}$ is the (zero-mean) noise component of the response in the $n$th trial. The expected power in each response is given by $P(r^{(n)}) \doteq P(s) + P(\eta)$, where the symbol $\doteq$ means "equal in expectation". This simple relationship depends only on the noise component having been defined to have zero mean, and holds even if the variance or other property of the noise depends on the signal strength. We now construct two trial-averaged quantities, similar to the sum-of-squares terms used in the analysis of variance (ANOVA) [12]: the power of the average response, $P(\bar{r})$, and the average power per response, $\overline{P(r^{(n)})}$. Using the bar to indicate trial averages, and assuming the noise in each trial is independent (although the noise in different time bins within a trial need not be), we have $P(\bar{r}) \doteq P(s) + \frac{1}{N} P(\eta)$ and $\overline{P(r^{(n)})} \doteq P(s) + P(\eta)$. Thus solving for $P(s)$ suggests the following estimator for the signal power:

$$\hat{P}(s) = \frac{N\, P(\bar{r}) - \overline{P(r^{(n)})}}{N - 1} \qquad (1)$$

(A similar estimator for the noise power is obtained by subtracting this expression from the average power per response.) This estimator is unbiased, provided only that the noise distribution has defined first and second moments and is independent between trials, as can be verified by explicitly calculating its expected value. Unlike the sum-of-squares terms encountered in an ANOVA, it is not a $\chi^2$ variate even when the noise is normally distributed (indeed, it is not necessarily positive). However, since each of the power terms in (1) is the mean of at least $T$ numbers, the central limit theorem suggests that $\hat{P}(s)$ will be approximately normally distributed for recordings that are considerably longer than the time-scale of noise correlation (in the experiment considered here, $T = 3000$). Its variance (2) involves only the $T \times T$ covariance matrix of the noise, $\Sigma$: the trace of $\Sigma\Sigma$, the averages of the columns of $\Sigma$, and the mean of all the elements of $\Sigma$. Thus $\mathrm{Var}[\hat{P}(s)]$ depends only on the first and second moments of the response distribution; substitution of data-derived estimates of these moments into (2) yields a standard error bar for the estimator. In this way we have obtained an estimate $\hat{P}(s)$ (with corresponding uncertainty) of the maximum possible signal power that any model could accurately predict, without having assumed any particular distribution or time-independence of the noise.

3 Extrapolating Model Performance

To compare the performance of an estimated SRF model to this maximal value, we must determine the amount of response power successfully predicted by the model. This is not necessarily the power of the predicted response, since the prediction may be inaccurate. Instead, the residual power in the difference between a measured response $r$ and the predicted response $\hat{r}$, $P(r - \hat{r})$, is taken as an estimate of the error power. (The measured response used for this evaluation, and the stimulus which elicited it, may or may not also have been used to identify the parameters of the SRF model being evaluated; see explanation of training and test predictive powers below.)
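As a concrete illustration (our own sketch, not part of the original paper; the function names are hypothetical), the signal-power estimator of equation (1), the companion noise-power estimate, and the predictive power just defined might be computed as follows:

```python
import numpy as np

def signal_and_noise_power(responses):
    """Method-of-moments estimates of signal and noise power (equation (1)).

    responses: (N, T) array holding N repeated trials of the binned
    response to the same stimulus. "Power" is the average squared
    deviation from the time mean, P(r).
    """
    responses = np.asarray(responses, dtype=float)
    n = responses.shape[0]
    power = lambda r: np.mean((r - np.mean(r)) ** 2)
    p_of_mean = power(responses.mean(axis=0))           # P(mean response)
    mean_of_p = np.mean([power(r) for r in responses])  # mean per-trial power
    p_signal = (n * p_of_mean - mean_of_p) / (n - 1)    # equation (1)
    p_noise = mean_of_p - p_signal                      # remainder is noise power
    return p_signal, p_noise

def predictive_power(measured, predicted):
    """Power of the measured response minus the residual (error) power;
    this is the quantity compared against the estimated signal power."""
    power = lambda r: np.mean((r - np.mean(r)) ** 2)
    return power(measured) - power(measured - predicted)
```

On simulated trials consisting of a fixed signal plus independent noise, these estimates recover the true signal and noise powers to within the estimator's sampling error.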
The difference between the power in the observed response, $P(r)$, and the error power gives the predictive power of the model; it is this value that can be compared to the estimated signal power $\hat{P}(s)$.

To be able to describe more than one neuron, an SRF model class must contain parameters that can be adapted to each case. Ideally, the power of the model class to describe a population of neurons would be judged using parameters that produced models closest to the true SRFs (the ideal models), but we do not have a priori knowledge of those parameters. Instead, the parameters must be tuned in each case using the measured neural responses. One way to choose SRF model parameters is to minimize the mean squared error (MSE) between the neural response in the training data and the model prediction for the same stimulus; for example, the Wiener kernel minimizes the MSE for a model based on a finite impulse response filter of fixed length. This MSE is identical to the error power that would be obtained when the training data themselves are used as the reference measured response $r$. Thus, by minimizing the MSE, we maximize the predictive power evaluated against the training data.
The resulting maximum value, hereafter the training predictive power, will overestimate the predictive ability of the ideal model, since the minimum-MSE parameters will be overfit to the training data. (Overfitting is inevitable, because model estimates based on finite data will always capture some stimulus-independent response variability.) More precisely, the expected value of the training predictive power is an upper bound on the true predictive power of the model class; we therefore refer to the training predictive power itself as an upper estimate of the SRF model performance. We can also obtain a lower estimate, defined similarly, by empirically measuring the generalization performance of the model by cross-validation. This provides an unbiased estimate of the average generalization performance of the fitted models; however, since these models are inevitably overfit to their training data, the expected value of this cross-validation predictive power bounds the true predictive power of the ideal model from below, and thereby provides the desired lower estimate.

For any one recording, the predictive power of the ideal SRF model of a particular class can only be bracketed between these upper and lower estimates (that is, between the training and cross-validation predictive powers). As the noise in the recording grows, the model parameters will overfit more and more to the noise, and hence both estimates will grow looser. Indeed, in high-noise conditions, the model may primarily describe the stimulus-independent (noise) part of the training data, and so the training predictive power might exceed the estimated signal power $\hat{P}(s)$, while the cross-validation predictive power may fall below zero (that is, the model's predictions may become more inaccurate than simply predicting a constant response).
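To make the bracketing concrete, here is a minimal sketch of computing both estimates for one recording; this is our own construction, with ordinary least squares standing in for the Wiener-kernel and ARD fits used in the paper, and the function name is hypothetical:

```python
import numpy as np

def upper_and_lower_estimates(X, y, n_folds=10):
    """Training predictive power (upper estimate) and cross-validation
    predictive power (lower estimate) for a least-squares linear model.

    X: (T, D) design matrix; y: (T,) binned response."""
    power = lambda r: np.mean((r - np.mean(r)) ** 2)
    pred_power = lambda yy, yhat: power(yy) - power(yy - yhat)
    Xb = np.column_stack([X, np.ones(len(X))])       # append offset column
    w = np.linalg.lstsq(Xb, y, rcond=None)[0]        # minimum-MSE fit
    upper = pred_power(y, Xb @ w)                    # training predictive power
    cv_pred = np.empty_like(y, dtype=float)
    folds = np.array_split(np.arange(len(y)), n_folds)
    for test in folds:                               # disjoint test splits
        train = np.setdiff1d(np.arange(len(y)), test)
        w_cv = np.linalg.lstsq(Xb[train], y[train], rcond=None)[0]
        cv_pred[test] = Xb[test] @ w_cv
    lower = pred_power(y, cv_pred)                   # cross-validation predictive power
    return upper, lower
```

For noisy data the upper estimate exceeds the lower one, and the gap widens as the noise grows, which is the behaviour exploited by the population extrapolation described below.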
As such, the estimates may not usefully constrain the predictive power on a particular recording. However, assuming that the predictive power of a single model class is similar for a population of similar neurons, the noise dependence can be exploited to tighten the estimates when applied to the population as a whole, by extrapolating within the population to the zero noise point. This extrapolation allows us to answer the sort of question posed at the outset: how well, in an absolute sense, can a particular SRF model class account for the responses of a population of neurons?

4 Experimental Methods

Extracellular neural responses were collected from the primary auditory cortex of rodents during presentation of dynamic random chord stimuli. Animals (6 CBA/CaJ mice and 4 Long-Evans rats) were anaesthetized with either ketamine/medetomidine or sodium pentobarbital, and a skull fragment over auditory cortex was removed; all surgical and experimental procedures conformed to protocols approved by the UCSF Committee on Animal Research. An ear plug was placed in the left ear, and the sound field created by the free-field speakers was calibrated near the opening of the right pinna. Neural responses (205 recordings collected from 68 recording sites) were recorded in the thalamo-recipient layers of the left auditory cortex while the stimulus (see below) was presented to the right ear. Recordings often reflected the activity of a number of neurons; single neurons were identified by Bayesian spike-sorting techniques [13, 14] whenever possible.

[Figure 1: Signal power in neural responses. Left: signal power (spikes²/bin) plotted against noise power (spikes²/bin) for each recording; right: histogram of signal power over the number of recordings.]
All analyses pool data from mice and rats, barbiturate and ketamine/medetomidine anesthesia, high and low frequency stimulation, and single-unit and multi-unit recordings; each group individually matched the aggregate behaviour described here.

The dynamic random chord stimulus used in the auditory experiments was similar to that used in a previous study [15], except that the intensity of component tone pulses was variable. Tone pulses were 20 ms in length, ramped up and down with 5 ms cosine gates. The times, frequencies and sound intensities of the pulses were chosen randomly and independently from 20 ms bins in time, 1/12 octave bins covering either 2–32 or 25–100 kHz in frequency, and 5 dB SPL bins covering 25–70 dB SPL in level. At any time point, the stimulus averaged two tone pulses per octave, with an expected loudness of approximately 73 dB SPL for the 2–32 kHz stimulus and 70 dB SPL for the 25–100 kHz stimulus. The total duration of each stimulus was 60 s. At each recording site, the 2–32 kHz stimulus was repeated 20 times, and the 25–100 kHz stimulus was repeated 10 times.

Neural responses were binned at 20 ms, and STRFs fit by linear regression of the average spike rate in each bin onto vectors formed from the amplitudes of tone pulses falling within the preceding 300 ms of the stimulus (15 pulse-widths, starting with pulses coincident with the target spike-rate bin). The regression parameters thus included a single filter weight for each frequency-time bin in this window, and an additional offset (or bias) weight. A Bayesian technique known as automatic relevance determination (ARD) [16] was used to improve the STRF estimates. In this case, an additional parameter reflecting the average noise in the response was also estimated.
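The STRF regression just described can be outlined as follows. This is our own minimal reconstruction (hypothetical function names), with a small ridge penalty standing in for the ARD prior actually used in the paper:

```python
import numpy as np

def lagged_design_matrix(spec, n_lags=15):
    """Build the STRF regression design matrix.

    spec: (T, F) array of tone-pulse amplitudes (T time bins, F frequency
    bins). Row t holds the F * n_lags amplitudes from bins t, t-1, ...,
    t-n_lags+1 (lag 0 is the pulse coincident with the target rate bin)."""
    T, F = spec.shape
    X = np.zeros((T, F * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * F:(lag + 1) * F] = spec[:T - lag]
    return X

def fit_strf(spec, rate, n_lags=15, ridge=1.0):
    """Least-squares STRF with an offset weight; the small ridge penalty
    is a simple stand-in for the ARD prior of the paper."""
    X = np.column_stack([lagged_design_matrix(spec, n_lags),
                         np.ones(len(spec))])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ rate)
    # filter weights reshaped to (lags, frequencies), plus the offset
    return w[:-1].reshape(n_lags, spec.shape[1]), w[-1]
```

With 1/12 octave bins over a two-octave range and 15 lags of 20 ms, such a fit has a few hundred filter weights per recording, which is why the regularized (ARD) estimates matter in practice.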
Models incorporating static output non-linearities were fit by kernel regression between the output of the linear model (fit by ARD) and the training data. The kernel employed was Gaussian with a half-width of 0.05 spike/bin; performance at this width was at least as good as that obtained by selecting widths individually for each recording by leave-one-out cross-validation. Cross-validation for lower estimates on model predictive power used 10 disjoint splits into 9/10 training data and 1/10 test data. Extrapolation of the predictive powers in the population, shown in Figs. 2 and 3, was performed using polynomial fits. The degree of the polynomial, determined by leave-one-out cross-validation, was quadratic for the lower estimates in Fig. 3 and linear in all other cases.

5 Results

We used the techniques described above to ask how accurate a description of auditory cortex responses could be provided by the STRF. Recordings were binned to match the discretization rate of the stimulus and the signal power estimated using equation (1). Fig. 1 shows the distribution of signal powers obtained, as a scatter plot against the estimated noise power and as a histogram. The error bars indicate standard error intervals based on the estimated variances obtained from equation (2). A total of 92 recordings in the data set (42 from mouse, 50 from rat), shown by filled circles and histogram bars in Fig. 1, had signal power greater than one standard error above zero. The subsequent analysis was confined to these stimulus-responsive recordings.

For each such recording we estimated an STRF model by minimum-MSE linear regression, which is equivalent to obtaining the Wiener kernel for the time-series. The training predictive power of this model provided the upper estimate for the predictive power of the model class.
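The population extrapolation described in the Methods (polynomial fits with degree chosen by leave-one-out cross-validation) might be sketched as follows; the function is our own illustration, not the paper's code:

```python
import numpy as np

def extrapolate_to_zero_noise(noise_power, pred_power, max_degree=3):
    """Fit polynomials of increasing degree to predictive power as a
    function of normalized noise power across a population of recordings,
    select the degree by leave-one-out cross-validation, and return the
    fitted value at zero noise together with the chosen degree."""
    x = np.asarray(noise_power, dtype=float)
    y = np.asarray(pred_power, dtype=float)
    best_deg, best_err = 1, np.inf
    for deg in range(1, max_degree + 1):
        errs = []
        for i in range(len(x)):                  # leave-one-out loop
            keep = np.arange(len(x)) != i
            coeffs = np.polyfit(x[keep], y[keep], deg)
            errs.append((np.polyval(coeffs, x[i]) - y[i]) ** 2)
        err = float(np.mean(errs))
        if err < best_err:
            best_deg, best_err = deg, err
    coeffs = np.polyfit(x, y, best_deg)
    return np.polyval(coeffs, 0.0), best_deg
```

Applied separately to the upper and lower estimates across recordings, the two zero-noise intercepts bracket the population-level predictive power of the model class.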
The minimum-MSE solution generalizes poorly, and so generates overly pessimistic lower estimates in cross-validation. However, the linear regression literature provides alternative parameter estimation techniques with improved generalization ability. In particular, we used a Bayesian hyperparameter optimization technique known as Automatic Relevance Determination (ARD) [16] to find an optimized prior on the regression parameters, and then chose parameters which optimized the posterior distribution under this prior and the training data (this and other similar techniques are discussed in Sahani and Linden, "Evidence Optimization Techniques for Estimating Stimulus-Response Functions", this volume). The cross-validation predictive power of these estimates served as the lower estimates of the model class performance.

Fig. 2 shows the upper and lower estimates for the predictive power of the class of linear STRF models in our population of rodent auditory cortex recordings, as a function of the estimated noise level in each recording. The divergence of the estimates at higher noise levels, described above, is evident. At low noise levels the estimates do not converge perfectly, the extrapolated values being approximately 0.40 of the normalized signal power for the upper estimate and 0.18 for the lower. This gap is indicative of an SRF model class that is insufficiently powerful to capture the true stimulus-response relationship; even if noise were absent, the trained model from the class would only be able to approximate the true SRF in the region of the finite amount of data used for training, and so would perform better on those training data than on test data drawn from outside that region.

Fig. 3 shows the same estimates for simulations derived from linear fits to the cortical data.
Simulated data were produced by generating Poisson spike trains with mean rates as predicted by the ARD-estimated models for real cortical recordings, and rectifying so that negative predictions were treated as zero. Simulated spike trains were then binned and analyzed in the same manner as real spike trains. Since the simulated data are spectrogram-linear by construction apart from the rectification, we expect the estimates to converge to a value very close to 1 with little separation. This result is evident in Fig. 3. Thus, the analysis correctly reports that virtually all of the response power in these simulations is linearly predictable from the stimulus spectrogram, attesting to the reliability of the extrapolated estimates for the real data in Fig. 2.

[Figure 2: Evaluation of STRF predictive power in auditory cortex. Normalized linearly predictable power plotted against normalized noise power, with upper and lower estimates and their extrapolations to zero noise.]

[Figure 3: Evaluation of linearity in simulated data. Normalized linearly predictable power plotted against normalized noise power for the simulated recordings.]

Some portion of the scatter of the points about the population average lines in Fig. 2 reflects genuine variability in the population, and so the extrapolated scatter at zero noise is also of interest. Intervals containing at least 50% of the population distribution (assuming normal scatter) were obtained for both the upper and lower estimates on the cortical data.
These will be overestimates of the spread in the underlying population distribution because of additional scatter from estimation noise. The variability of STRF predictive power in the population appears unimodal, and the hypothesis that the distributions of the deviations from the regression lines are zero-mean normal in both cases cannot be rejected (Kolmogorov-Smirnov test). Thus the treatment of these recordings as coming from a single homogeneous population is reasonable. In Fig. 3, there is a small amount of downward bias and population scatter due to the varying amounts of rectification in the simulations; however, most of the observed scatter is due to estimation error resulting from the incorporation of Poisson noise.

The linear model is not constrained to predict non-negative firing rates. To test whether including a static output non-linearity could improve predictions, we also fit models in which the prediction from the ARD-derived STRF estimates was transformed time-point by time-point by a non-parametric non-linearity (see Experimental Methods) to obtain a new firing rate prediction. The resulting cross-validation predictive powers were compared to those of the spectrogram-linear model (data not shown). The addition of a static output non-linearity contributed very little to the predictive power of the STRF model class. Although the difference in model performance was significant (Wilcoxon signed rank test), the mean normalized predictive power increase with the addition of a static output non-linearity was very small (0.031).

6 Conclusions

We have demonstrated a novel way to evaluate the fraction of response power in a population of neurons that can be captured by a particular class of SRF models. The confounding effects of noise on evaluation of model performance and estimation of model parameters are overcome by two key analytic steps.
First, multiple measurements of neural responses to the same stimulus are used to obtain an unbiased estimate of the fraction of the response variance that is predictable in principle, against which the predictive power of a model may be judged. Second, Bayesian regression techniques are employed to lessen the effects of noise on linear model estimation, and the remaining noise-related bias is eliminated by exploiting the noise-dependence of parameter-estimation-induced errors in the predictive power to extrapolate model performance for a population of similar recordings to the zero-noise point. This technique might find broad applicability to regression problems in neuroscience and elsewhere, provided certain essential features of the data considered here are shared: repeated measurements must be made at the same input values in order to estimate the signal power; both inputs and repetitions must be numerous enough for the signal power estimate, which appears in the denominator of the normalized powers, to be well-conditioned; and finally we must have a group of different regression problems, with different normalized noise powers, that might be expected to instantiate the same underlying model class. Data with these features are commonly encountered in sensory neuroscience, where the sensory stimulus can be reliably repeated. The outputs modelled may be spike trains (as in the present study) or intracellular recordings; local-field, evoked-potential, or optical recordings; or even fMRI measurements.

Applying this technique to analysis of the primary auditory cortex we find that spectrogram-linear response components can account for only 18% to 40% (on average) of the power in extracellular responses to dynamic random chord stimuli.
Further, elaborated models that append a static output non-linearity to the linear filter are barely more effective at predicting responses to novel stimuli than is the linear model class alone. Previous studies of auditory cortex have reached widely varying conclusions regarding the degree of linearity of neural responses. Such discrepancies may indicate that response properties are critically dependent on the statistics of the stimulus ensemble [6, 5, 10], or that cortical response linearity differs between species. Alternatively, as previous measures of linearity have been biased by noise, the divergent estimates might also have arisen from variation in the level of noise power across studies. Our approach represents the first evaluation of auditory cortex response predictability that is free of this potential noise confound. The high degree of response non-linearity we observe may well be a characteristic of all auditory cortical responses, given the many known non-linearities in the peripheral and central auditory systems [17]. Alternatively, it might be unique to auditory cortex responses to noisy sounds like dynamic random chord stimuli, or else may be general to all stimulus ensembles and all sensory cortices. Current and future work will need to be directed toward measurement of auditory cortical response linearity using different stimulus ensembles and in different species, and toward development of non-linear classes of models that predict auditory cortex responses more accurately than spectrogram-linear models.

References

[1] Aertsen, A. M. H. J., Johannesma, P. I. M., & Hermes, D. J. (1980) Biol Cybern 38, 235–248.
[2] Eggermont, J. J., Johannesma, P. M., & Aertsen, A. M. (1983) Q Rev Biophys 16, 341–414.
[3] Kowalski, N., Depireux, D. A., & Shamma, S. A. (1996) J Neurophysiol 76, 3524–3534.
[4] Shamma, S. A. & Versnel, H. (1995) Aud Neurosci 1, 255–270.
[5] Nelken, I., Rotman, Y., & Bar-Yosef, O.
(1999) Nature 397, 154–157.
[6] Rotman, Y., Bar-Yosef, O., & Nelken, I. (2001) Hear Res 152, 110–127.
[7] Nelken, I., Prut, Y., Vaadia, E., & Abeles, M. (1994) Hear Res 72, 206–222.
[8] Calhoun, B. M. & Schreiner, C. E. (1998) Eur J Neurosci 10, 926–940.
[9] Eggermont, J. J., Aertsen, A. M., & Johannesma, P. I. (1983) Hear Res 10, 167–190.
[10] Theunissen, F. E., Sen, K., & Doupe, A. J. (2000) J Neurosci 20, 2315–2331.
[11] Nelken, I., Prut, Y., Vaadia, E., & Abeles, M. (1994) Hear Res 72, 223–236.
[12] Lindgren, B. W. (1993) Statistical Theory. (Chapman & Hall), 4th edition. ISBN: 0412041812.
[13] Lewicki, M. S. (1994) Neural Comp 6, 1005–1030.
[14] Sahani, M. (1999) Ph.D. thesis (California Institute of Technology, Pasadena, California).
[15] deCharms, R. C., Blake, D. T., & Merzenich, M. M. (1998) Science 280, 1439–1443.
[16] MacKay, D. J. C. (1994) ASHRAE Transactions 100, 1053–1062.
[17] Popper, A. & Fay, R., eds. (1992) The Mammalian Auditory Pathway: Neurophysiology. (Springer, New York).
", "award": [], "sourceid": 2335, "authors": [{"given_name": "Maneesh", "family_name": "Sahani", "institution": null}, {"given_name": "Jennifer", "family_name": "Linden", "institution": null}]}