{"title": "A Bayesian method for reducing bias in neural representational similarity analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 4951, "page_last": 4959, "abstract": "In neuroscience, the similarity matrix of neural activity patterns in response to different sensory stimuli or under different cognitive states reflects the structure of neural representational space. Existing methods derive point estimations of neural activity patterns from noisy neural imaging data, and the similarity is calculated from these point estimations. We show that this approach translates structured noise from estimated patterns into spurious bias structure in the resulting similarity matrix, which is especially severe when signal-to-noise ratio is low and experimental conditions cannot be fully randomized in a cognitive task. We propose an alternative Bayesian framework for computing representational similarity in which we treat the covariance structure of neural activity patterns as a hyper-parameter in a generative model of the neural data, and directly estimate this covariance structure from imaging data while marginalizing over the unknown activity patterns. Converting the estimated covariance structure into a correlation matrix offers a much less biased estimate of neural representational similarity. Our method can also simultaneously estimate a signal-to-noise map that informs where the learned representational structure is supported more strongly, and the learned covariance matrix can be used as a structured prior to constrain Bayesian estimation of neural activity patterns. 
Our code is freely available in Brain Imaging Analysis Kit (Brainiak) (https://github.com/IntelPNI/brainiak), a python toolkit for brain imaging analysis.", "full_text": "A Bayesian method for reducing bias in neural\n\nrepresentational similarity analysis\n\nMing Bo Cai\n\nPrinceton Neuroscience Institute\n\nPrinceton University\nPrinceton, NJ 08544\n\nmcai@princeton.edu\n\nJonathan W. Pillow\n\nPrinceton Neuroscience Institute\n\nPrinceton University\nPrinceton, NJ 08544\n\npillow@princeton.edu\n\nNicolas W. Schuck\n\nPrinceton Neuroscience Institute\n\nPrinceton University\nPrinceton, NJ 08544\n\nnschuck@princeton.edu\n\nYael Niv\n\nPrinceton Neuroscience Institute\n\nPrinceton University\nPrinceton, NJ 08544\n\nyael@princeton.edu\n\nAbstract\n\nIn neuroscience, the similarity matrix of neural activity patterns in response to\ndifferent sensory stimuli or under different cognitive states re\ufb02ects the structure\nof neural representational space. Existing methods derive point estimations of\nneural activity patterns from noisy neural imaging data, and the similarity is\ncalculated from these point estimations. We show that this approach translates\nstructured noise from estimated patterns into spurious bias structure in the resulting\nsimilarity matrix, which is especially severe when signal-to-noise ratio is low and\nexperimental conditions cannot be fully randomized in a cognitive task. We propose\nan alternative Bayesian framework for computing representational similarity in\nwhich we treat the covariance structure of neural activity patterns as a hyper-\nparameter in a generative model of the neural data, and directly estimate this\ncovariance structure from imaging data while marginalizing over the unknown\nactivity patterns. Converting the estimated covariance structure into a correlation\nmatrix offers a much less biased estimate of neural representational similarity. 
Our method can also simultaneously estimate a signal-to-noise map that informs where the learned representational structure is supported more strongly, and the learned covariance matrix can be used as a structured prior to constrain Bayesian estimation of neural activity patterns. Our code is freely available in Brain Imaging Analysis Kit (Brainiak) (https://github.com/IntelPNI/brainiak).\n\n1 Neural pattern similarity as a way to understand neural representations\n\nUnderstanding how patterns of neural activity relate to internal representations of the environment is one of the central themes of both systems neuroscience and human neural imaging [20, 5, 7, 15]. One can record neural responses (e.g. by functional magnetic resonance imaging; fMRI) while participants observe sensory stimuli, and in parallel, build different computational models to mimic the brain's encoding of these stimuli. The neural activity pattern corresponding to each feature of an encoding model can then be estimated from the imaging data. Such activity patterns can be used to decode the perceived content with respect to the encoding features from new imaging data. The degree to which stimuli can be decoded from one brain area based on different encoding models informs us of the type of information represented in that area. For example, an encoding model based on motion energy in visual stimuli captured activity fluctuations from visual cortical areas V1 to V3, and was used to successfully decode natural movies watched during an fMRI scan [14].\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n\f
In contrast, encoding models based on semantic categories can more successfully decode information from higher-level visual cortex [7].\nWhile the decoding performance of different encoding models informs us of the type of information represented in a brain region, it does not directly reveal the structure of the representational space in that area. Such structure is indexed by how distinctively different contents are represented in that region [21, 4]. Therefore, one way to directly quantify the structure of the representational space in the neural population activity is to estimate the neural activity pattern elicited by each sensory stimulus, and calculate the similarity between the patterns corresponding to each pair of stimuli. This analysis of pair-wise similarity between neural activity patterns to different stimuli was named Representational Similarity Analysis (RSA) [11]. In fact, one of the earliest demonstrations of decoding from fMRI data was based on pattern similarity [7]. RSA revealed that the representational structures of natural objects in the inferotemporal (IT) cortex are highly similar between human and monkey [12], and that a continuum in the abstract representation of biological classes exists in human ventral object visual cortex [2]. Because the similarity structure can be estimated from imaging data even without building an encoding model, RSA allows not only for model testing (by comparing the similarity matrix of neural data with the similarity matrix of the feature vectors when stimuli are represented with an encoding model) but also for exploratory study (e.g., by projecting the similarity structure to a low-dimensional space to visualize its structure, [11]).
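The standard RSA pipeline just described is short enough to sketch in a few lines. The sketch below is our illustration, not the paper's code: random numbers stand in for estimated activity patterns, and classical (eigendecomposition-based) MDS stands in for whichever projection method a given study uses.

```python
import numpy as np

rng = np.random.default_rng(0)
n_c, n_v = 16, 300  # conditions, voxels

# Stand-in activity patterns: one row per condition, one column per voxel.
beta = rng.standard_normal((n_c, n_v))

# Pairwise pattern similarity (Pearson correlation across voxels) ...
C = np.corrcoef(beta)
# ... converted to a distance matrix for visualization.
D = 1.0 - C

# Classical MDS: double-center the squared distances and embed in 2-D.
J = np.eye(n_c) - np.ones((n_c, n_c)) / n_c   # centering matrix
B = -0.5 * J @ (D ** 2) @ J
w, V = np.linalg.eigh(B)                      # eigenvalues in ascending order
coords = V[:, -2:] * np.sqrt(np.maximum(w[-2:], 0.0))  # top-2 embedding
```

Each row of `coords` is a 2-D point for one condition; plotting these points is the kind of low-dimensional visualization referred to above.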
Therefore, although RSA was originally a tool for studying visual representations [2, 16, 10], it has recently attracted neuroscientists to explore the neural representational structure in many higher-level cognitive areas [23, 18].\n\n2 Structured noise in pattern estimation translates into bias in RSA\n\nAlthough RSA is gaining popularity, a few recent studies revealed that in certain circumstances the similarity structure estimated by standard RSA might include a significant bias. For example, the estimated similarity between fMRI patterns of two stimuli is much higher when the stimuli are displayed closer in time [8]. This dependence of pattern similarity on inter-stimulus interval was hypothesized to reflect "temporal drift of pattern" [1], but we believe it may also be due to temporal autocorrelation in fMRI noise. Furthermore, we applied RSA to a dataset from a structured cognitive task (Fig 1A) [19] and found that the highly structured representational similarity matrix obtained from the neural data (Fig 1B,C) is very similar to the matrix obtained when RSA is applied to pure white noise (Fig 1D). Since no task-related similarity structure should exist in white noise, yet the result in Fig 1D is replicable from noise, this shows that the standard RSA approach can introduce similarity structure not present in the data.\nWe now provide an analytical derivation to explain the source of both types of bias (patterns closer in time are more similar, and spurious similarity emerges from analyzing pure noise). It is notable that almost all applications of RSA explicitly or implicitly assume that fMRI responses are related to task-related events through a general linear model (GLM):\n\nY = X · β + ε.    (1)\n\nHere, Y ∈ R^{nT×nS} is the fMRI time series from an experiment with nT time points from nS brain voxels.
The experiment involves nC different conditions (e.g., different sensory stimuli, task states, or mental states), each of which comprises events whose onset time and duration are either controlled by the experimenter, or can be measured experimentally (e.g., reaction times). In fMRI, the measured blood oxygen-level dependent (BOLD) response is protracted, such that the response to condition c is modelled as the time course of events in the experimental condition sc(t) convolved with a typical hemodynamic response function (HRF) h(t). Importantly, each voxel can respond to different conditions with different amplitudes β ∈ R^{nC×nS}, and the responses to all conditions are assumed to contribute linearly to the measured signal. Thus, denoting the matrix of HRF-convolved event time courses for each task condition with X ∈ R^{nT×nC}, often called the design matrix, the measured Y is assumed to be a linear sum of X weighted by response amplitude β plus zero-mean noise.\nEach row of β is the spatial response pattern (i.e., the response across voxels) to an experimental condition. The goal of RSA is therefore to estimate the similarity between the rows of β. Because β is unknown, pattern similarity is usually calculated based on ordinary least squares estimation of β: β̂ = (X^T X)^{-1}X^T Y, and then using Pearson correlation of β̂ to measure similarity. Because calculating sample correlation implies the belief that there exists an underlying covariance structure of β, we examine the source of bias by focusing on the covariance of β̂ compared to that of the true β.\n\n\fFigure 1: Standard RSA introduces bias structure to the similarity matrix. (A) A cognitive task that includes 16 different experimental conditions. Transitions between conditions follow a Markov process. Arrows indicate possible transitions, each with p = 0.5. The task conditions can be grouped into 3 categories (color coded) according to the semantics, or mental operations, required in each condition (the exact meaning of these conditions is not relevant to this paper). (B) Standard RSA of activity patterns corresponding to each condition estimated from a region of interest (ROI) reveals a highly structured similarity matrix. (C) Converting the similarity matrix C to a distance matrix 1 − C and projecting it to a low-dimensional space using multi-dimensional scaling [13] reveals a highly regular structure. Seeing such a result, one may infer that representational structure in the ROI is strongly related to the semantic meanings of the task conditions. (D) However, a very similar similarity matrix can also be obtained if one applies standard RSA to pure white noise, with a similar low-dimensional projection (not shown). This indicates that standard RSA can introduce spurious structure in the resulting similarity matrix that does not exist in the data.\n\nWe assume β of all voxels in the ROI are indeed random vectors generated from a multivariate Gaussian distribution N(0, U) (the size of U being nC × nC). If one knew the true U, similarity measures such as correlation could be derived from it. Substituting the expression for Y from equation 1, we have β̂ = β + (X^T X)^{-1}X^T ε. We assume that the signal β is independent from the noise ε, and therefore also independent from its linear transformation (X^T X)^{-1}X^T ε.
Thus the covariance of β̂ is the sum of the true covariance of β and the covariance of (X^T X)^{-1}X^T ε:\n\nβ̂ ∼ N(0, U + (X^T X)^{-1}X^T Σ_ε X(X^T X)^{-1})    (2)\n\nwhere Σ_ε ∈ R^{nT×nT} is the temporal covariance of the noise ε (for illustration purposes, in this section we assume that all voxels have the same noise covariance).\nThe term\n\n(X^T X)^{-1}X^T Σ_ε X(X^T X)^{-1}\n\nis the source of the bias. Since the covariance of β̂ has this bias term added to U, the quantity we are interested in, their sample correlation is also biased. So are many other similarity measures based on β̂, such as Euclidean distance.\n\n\fThe bias term (X^T X)^{-1}X^T Σ_ε X(X^T X)^{-1} depends on both the design matrix and the properties of the noise. It is well known that autocorrelation exists in fMRI noise [24, 22]. Even if we assume that the noise is temporally independent (i.e., Σ_ε is a diagonal matrix, which may be a valid assumption if one "pre-whitens" the data before further analysis [22]), the bias structure still exists but reduces to (X^T X)^{-1}σ², where σ² is the variance of the noise. Diedrichsen et al. [6] realized that the noise in β̂ could contribute to a bias in the correlation matrix but assumed the bias is only in the diagonal of the matrix. However, the bias is a diagonal matrix only if the columns of X (hypothetical fMRI response time courses to different conditions) are orthogonal to each other and if the noise has no autocorrelation. This is rarely the case for most cognitive tasks. In the example in Figure 1A, the transitions between experimental conditions follow a Markov process such that some conditions are always temporally closer than others.
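This spurious-similarity effect is easy to reproduce numerically. The sketch below is our illustration, not the paper's code: it uses an arbitrary design matrix with temporally overlapping regressors (conditions 1-2 and 3-4 occur close together in time, mimicking a non-randomized design), runs the standard estimate-then-correlate procedure on pure white noise, and compares the result to the analytic bias term, which for white noise reduces to (X^T X)^{-1}σ².

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_v = 300, 2000  # time points, voxels

# Four regressors with temporally overlapping bumps, repeating every 100
# time points; bumps at 0/10 and at 50/60 overlap, so those column pairs
# of X are correlated.
t = np.arange(n_t)
X = np.stack([np.exp(-0.5 * (((t - mu) % 100) / 8.0) ** 2)
              for mu in (0, 10, 50, 60)], axis=1)

# Pure white noise: the true beta is zero, so any structure is spurious.
Y = rng.standard_normal((n_t, n_v))
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # OLS estimate per voxel

# Standard RSA on the estimates: correlation between condition patterns.
corr_spurious = np.corrcoef(beta_hat)

# Analytic prediction: cov(beta_hat) = (X^T X)^{-1} sigma^2 for white noise,
# converted to a correlation matrix.
bias = np.linalg.inv(X.T @ X)
d = np.sqrt(np.diag(bias))
corr_predicted = bias / np.outer(d, d)
```

Although the data contain no signal, the estimated patterns show reproducible off-diagonal "similarity" that closely follows the correlation structure of (X^T X)^{-1}.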
Due to the long-lasting HRF, conditions in temporal proximity will have higher correlation between their corresponding columns of X. Such correlation structure in X is the major determinant of the bias structure in this case. On the other hand, if each single stimulus is modelled as a condition in X and regularization is used during regression, the correlation between β̂ of temporally adjacent stimuli is higher primarily because of the autocorrelation property of the noise. This can be the major determinant of the bias structure in cases such as [8].\nIt is worth noting that the magnitude of the bias is larger relative to the true covariance structure U when the signal-to-noise ratio (SNR) is lower, or when X has less power (i.e., there are few repetitions of each condition, thus few measurements of the related neural activity), as illustrated later in Figure 2B. The bias in RSA was not noticed until recently [1, 8], probably because RSA was initially applied to visual tasks in which stimuli are presented many times in a well randomized order. Such designs made the bias structure close to a diagonal matrix, and researchers typically focus only on off-diagonal elements of a similarity matrix. In contrast, the neural signals in higher-level cognitive tasks are typically weaker than those in visual tasks [9]. Moreover, in many decision-making and memory studies the orders of different task conditions cannot be fully counter-balanced. Therefore, we expect the bias in RSA to be much stronger and highly structured in these cases, misleading researchers and hiding the true (but weaker) representational structure in the data.\nOne alternative to estimating β̂ using regression as above is to perform RSA on the raw condition-averaged fMRI data (for instance, taking the average signal ∼ 6 sec after the onset of an event as a proxy for β̂). This is equivalent to using a design matrix that assumes a 6-sec delayed single-pulse HRF.
Although here columns of X are orthogonal by definition, the estimate β̂ is still biased, and so is its covariance (X^T X)^{-1}X^T X_true U X_true^T X(X^T X)^{-1} + (X^T X)^{-1}X^T Σ_ε X(X^T X)^{-1} (where X_true is the design matrix reflecting the true HRF in fMRI). See the supplementary material for an illustration of this bias.\n\n3 Maximum likelihood estimation of similarity structure directly from data\n\nAs shown in equation 2, the bias in RSA stems from treating the noisy estimate of β as the true β and performing a secondary analysis (correlation) on this noisy estimate. The similarly-structured noise (in terms of the covariance of its generating distribution) in each voxel's β̂ translates into bias in the secondary analysis. Since the bias comes from inferring U indirectly from a point estimate of β, a good way to avoid such bias is to not base the analysis on this point estimate. With a generative model relating U to the measured fMRI data Y, we can avoid the point estimation of the unknown β by marginalizing it in the likelihood of observing the data. In this section, we propose a method which performs maximum-likelihood estimation of the shared covariance structure U of activity patterns directly from the data.\nOur generative model of fMRI data follows most of the assumptions above, but also allows the noise property and the SNR to vary across voxels.
We use an AR(1) process to model the autocorrelation of noise in each voxel: for the ith voxel, we denote the noise at time t (> 0) as ε_{t,i}, and assume\n\nε_{t,i} = ρ_i · ε_{t−1,i} + η_{t,i},  η_{t,i} ∼ N(0, σ_i²)    (3)\n\nwhere σ_i² is the variance of the "new" noise and ρ_i is the autoregressive coefficient for the ith voxel. We assume that the covariance of the Gaussian distribution from which the activity amplitudes β_i of the ith voxel are generated has a scaling factor that depends on its SNR s_i:\n\nβ_i ∼ N(0, (s_i σ_i)² U).    (4)\n\nThis is to reflect the fact that not all voxels in an ROI respond to tasks (voxels covering partially or entirely white matter might have little or no response). Because the magnitude of the BOLD response to a task is determined by the product of the magnitude of X and β, but s is a hyper-parameter only of β, we henceforth refer to s as pseudo-SNR.\nWe further use the Cholesky decomposition to parametrize the shared covariance structure across voxels: U = LL^T, where L is a lower triangular matrix. Thus, β_i can be written as β_i = s_i σ_i L α_i, where α_i ∼ N(0, I) (this change of parameters allows for estimating U of less than full rank by setting L to be a lower-triangular matrix with a few of its rightmost columns truncated). We then have Y_i − s_i σ_i X L α_i ∼ N(0, Σ_{ε_i}(σ_i, ρ_i)).
Therefore, for the ith voxel, the likelihood of observing data Y_i given the parameters is:\n\np(Y_i | L, σ_i, ρ_i, s_i) = ∫ p(Y_i | L, σ_i, ρ_i, s_i, α_i) p(α_i) dα_i\n= ∫ (2π)^{-nT/2} |Σ_{ε_i}^{-1}|^{1/2} exp[−(1/2)(Y_i − s_i σ_i X L α_i)^T Σ_{ε_i}^{-1} (Y_i − s_i σ_i X L α_i)] · (2π)^{-nC/2} exp[−(1/2) α_i^T α_i] dα_i\n= (2π)^{-nT/2} |Σ_{ε_i}^{-1}|^{1/2} |Λ_i|^{1/2} exp[(1/2)((s_i σ_i)² Y_i^T Σ_{ε_i}^{-1} X L Λ_i L^T X^T Σ_{ε_i}^{-1} Y_i − Y_i^T Σ_{ε_i}^{-1} Y_i)]    (5)\n\nwhere Λ_i = ((s_i σ_i)² L^T X^T Σ_{ε_i}^{-1} X L + I)^{-1}, and Σ_{ε_i}^{-1} is the inverse of the noise covariance matrix of the ith voxel, which is a function of σ_i and ρ_i (see supplementary material).\nFor simplicity, we assume that the noise for different voxels is independent, which is the common assumption of standard RSA (although see [21]). The likelihood of the whole dataset, including all voxels in an ROI, is then\n\np(Y | L, σ, ρ, s) = ∏_i p(Y_i | L, σ_i, ρ_i, s_i).    (6)\n\nWe can use gradient-based methods to optimize the model, that is, to search for the values of the parameters that maximize the log likelihood of the data. Note that s is determined only up to a scale, because L can be scaled down by a factor and all s_i can be scaled up by the same factor without influencing the likelihood. Therefore, we set the geometric mean of s to be 1 to circumvent this indeterminacy, and fit s and L iteratively.
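The marginal likelihood in equation (5) can be checked numerically: integrating out α_i implies Y_i ∼ N(0, (s_i σ_i)² X U X^T + Σ_{ε_i}), so the closed form must agree with a direct Gaussian log-density. The sketch below is our illustration, not the Brainiak implementation; the design matrix, parameter values, and the use of a stationary AR(1) covariance are all assumptions made for the demonstration, and it handles a single voxel only.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n_t, n_c = 80, 3  # time points, conditions

X = rng.standard_normal((n_t, n_c))          # stand-in design matrix
U = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.1],
              [0.1, 0.1, 1.0]])              # shared covariance structure
L = np.linalg.cholesky(U)                    # U = L L^T
sigma, rho, s = 1.5, 0.3, 0.8                # one voxel's sigma_i, rho_i, s_i

# Stationary AR(1) noise covariance: Sigma[t, u] = sigma^2 rho^|t-u| / (1 - rho^2)
lags = np.abs(np.subtract.outer(np.arange(n_t), np.arange(n_t)))
Sigma = sigma ** 2 * rho ** lags / (1 - rho ** 2)
Sigma_inv = np.linalg.inv(Sigma)

# Sample one voxel's time series from the generative model:
# Y = s * sigma * X L alpha + eps,  alpha ~ N(0, I),  eps ~ N(0, Sigma)
alpha = rng.standard_normal(n_c)
eps = np.linalg.cholesky(Sigma) @ rng.standard_normal(n_t)
y = s * sigma * X @ L @ alpha + eps

# Closed form of equation (5), with
# Lambda = ((s sigma)^2 L^T X^T Sigma^-1 X L + I)^-1
Lam = np.linalg.inv((s * sigma) ** 2 * L.T @ X.T @ Sigma_inv @ X @ L + np.eye(n_c))
quad = (s * sigma) ** 2 * y @ Sigma_inv @ X @ L @ Lam @ L.T @ X.T @ Sigma_inv @ y
_, logdet_Sinv = np.linalg.slogdet(Sigma_inv)
_, logdet_Lam = np.linalg.slogdet(Lam)
loglik = (-0.5 * n_t * np.log(2 * np.pi) + 0.5 * logdet_Sinv + 0.5 * logdet_Lam
          + 0.5 * (quad - y @ Sigma_inv @ y))

# Direct marginal after integrating out alpha: Y ~ N(0, (s sigma)^2 X U X^T + Sigma)
direct = multivariate_normal(np.zeros(n_t),
                             (s * sigma) ** 2 * X @ U @ X.T + Sigma).logpdf(y)
```

The two log likelihoods agree to numerical precision, which confirms the algebra behind equation (5); in practice, this per-voxel log likelihood is what a gradient-based optimizer would maximize over L, σ, ρ, and s.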
The spatial pattern of s thus only reflects the relative SNR of different voxels.\nOnce we obtain L̂, the estimate of L, we can convert the covariance matrix Û = L̂L̂^T into a correlation matrix, which is our estimate of neural representational similarity. Because U is a hyper-parameter of the activity pattern in our generative model and we estimate it directly from data, this is an empirical Bayesian approach. We therefore refer to our method as "Bayesian RSA" from now on.\n\n4 Performance of the method\n\n4.1 Reduced bias in recovering the latent covariance structure from simulated data\n\nTo test if the proposed method indeed reduces bias, we simulated fMRI data with a predefined covariance structure and compared the structure recovered by our method with that recovered by standard RSA. Fig 2A shows the hypothetical covariance structure from which we drew β_i for each voxel. The bias structure in Fig 1D is the average structure induced by the design matrices of all participants. To simplify the comparison, we use the design matrices of the experiment experienced by one participant. As a result, the bias structure induced by the design matrix deviates slightly from that in Fig 1D.\nAs mentioned, the contribution of the bias to the covariance of β̂ depends on both the level of noise and the power in the design matrix X. The more each experimental condition is measured during an experiment (roughly speaking, the longer the experiment), the less noisy the estimation of β̂, and the less biased the standard RSA is. To evaluate the improvement of our method over standard RSA in different scenarios, we therefore varied two factors: the average SNR of voxels and the duration of the experiment. 500 voxels were simulated. For each voxel, σ_i was sampled uniformly from [1.0, 3.0], ρ_i was sampled uniformly from [−0.2, 0.6] (our empirical investigation of example fMRI data shows that small negative autoregressive coefficients can occur in white matter), and s_i was sampled uniformly from f · [0.5, 2.0]. The average SNR was manipulated by choosing f from one of three levels {1, 2, 4} in different simulations. The duration of the experiment was manipulated by using the design matrices of run 1, runs 1-2, and runs 1-4 from one participant.\n\n\fFigure 2: Bayesian RSA reduces bias in the recovered shared covariance structure of activity patterns. (A) The covariance structure from which we sampled neural activity amplitudes β for each voxel. fMRI data were synthesized by weighting the design matrix of the task from Fig 1A with the simulated β and adding AR(1) noise. (B) The recovered covariance structure for different simulated pseudo-SNR. Standard individual: covariance calculated directly from β̂ as is done in standard RSA, for one simulated participant. Standard average: average of covariance matrices of β̂ from 20 simulated participants. Bayesian individual: covariance estimated directly from data by our method for one simulated participant. Bayesian average: average of the covariance matrices estimated by Bayesian RSA from 20 simulated participants. (C) The ratio of the variation in the recovered covariance structure which cannot be explained by the true covariance structure in Fig 2A. Left: the ratio for covariance matrices from individual simulations (panels 1 and 3 of Fig 2B). Right: the ratio for average covariance matrices (panels 2 and 4 of Fig 2B). Number of runs: the design matrices of 1, 2, or 4 runs of a participant in the experiment of Fig 1A were used in each simulation, to test the effect of experiment duration. Error bar: standard deviation.\n\nFig 2B displays the covariance matrix recovered by standard RSA (first two columns) and Bayesian RSA (last two columns), with an experiment duration of approximately 10 minutes (one run, measurement resolution: TR = 2.4 sec). The rows correspond to different levels of average SNR (calculated post hoc by averaging the ratio std(Xβ_i)/σ_i across voxels). Covariance matrices recovered from one simulated participant and the average of covariance matrices recovered from 20 simulated participants ("average") are displayed. Comparing the shapes of the matrix and the magnitudes of values (color bars) across rows, one can see that the bias structure in standard RSA is most severe when SNR is low. Averaging the estimated covariance matrices across simulated participants can reduce noise, but not bias. Comparing between columns, one can see that strong residual structure exists in standard RSA even after averaging, but almost disappears for Bayesian RSA. This is especially apparent for low SNR – the block structure of the true covariance matrix from Figure 2A is almost undetectable for standard RSA even after averaging (column 2, row 1 of Fig 2B), but emerges after averaging for Bayesian RSA (column 4, row 1 of Fig 2B). Fig 2C compares the proportion of variation in the recovered covariance structure that cannot be explained by the true structure in Fig 2A, for different levels of SNR and different experiment durations, for individual simulated participants and for average results.
This comparison confirms that the covariance recovered by Bayesian RSA deviates much less from the true covariance matrix than that recovered by standard RSA, and that the deviation observed in an individual participant can be reduced considerably by averaging over multiple participants (compare the left and right panels of Fig 2C for Bayesian RSA).\n\n\f4.2 Application to real data: simultaneous estimation of neural representational similarity and spatial location supporting the representation\n\nIn addition to reducing bias in the estimation of representational similarity, our method also has an advantage over standard RSA: it estimates the pseudo-SNR map s. This map reveals the locations within the ROI that support the identified representational structure. When a researcher looks into an anatomically defined ROI, it is often the case that only some of the voxels respond to the task conditions. In standard RSA, β̂ in voxels with little or no response to tasks is dominated by structured noise following the bias covariance structure (X^T X)^{-1}X^T Σ_ε X(X^T X)^{-1}, but all voxels are taken into account equally in the analysis. In contrast, s_i in our model is a hyper-parameter learned directly from data – if a voxel does not respond to any condition of the task, s_i would be small and the contribution of the voxel to the total log likelihood is small. The fitting of the shared covariance structure is thus less influenced by this voxel.\nFrom our simulated data, we found that the parameters of the noise (σ and ρ) can be recovered reliably with small variance. However, the estimation of s had large variance around the true values used in the simulation.
One approach to reducing the variance of estimation is to harness prior knowledge about the data. Voxels supporting similar representations of sensory input or tasks tend to spatially cluster together. Therefore, we used a Gaussian Process to impose a smooth prior on log(s) [17]. Specifically, for any two voxels i and j, we assumed cov(log(s_i), log(s_j)) = b² exp(−(x_i − x_j)^T(x_i − x_j)/(2 l_space²) − (I_i − I_j)²/(2 l_inten²)), where x_i and x_j are the spatial coordinates of the two voxels and I_i and I_j are the average intensities of fMRI signals of the two voxels. Intuitively, this means that if two voxels are close together and have similar signal intensity (that is, they are of the same tissue type), then they should have similar SNR. Such a kernel of a Gaussian Process imposes spatial smoothness but also allows the pseudo-SNR to change quickly at tissue boundaries. The variance of the Gaussian process b², and the length scales l_space and l_inten, were fitted together with the other parameters by maximizing the joint log likelihood of all parameters (here again, we restrict the geometric mean of s to be 1).\n\nFigure 3: Bayesian RSA estimates both the representational similarity structure from fMRI data and the spatial map supporting the learned representation. (A) Similarity between 6 animal categories, as judged behaviorally (reproduced from [2]). (B) Average representational similarity estimated from IT cortex from all participants of [2], using our approach. The estimated structure resembles the subjectively-reported structure. (C) Pseudo-SNR map in IT cortex corresponding to one participant. Red: high pseudo-SNR, green: low pseudo-SNR. Only small clusters of voxels show high pseudo-SNR.\n\nWe applied our method to the dataset of Connolly et al. (2012) [2].
In their experiment, participants viewed images of animals from 6 different categories during an fMRI scan and rated the similarity between animals outside the scanner. fMRI time series were pre-processed in the same way as in their work [2]. Inferior temporal (IT) cortex is generally considered the late stage of the ventral pathway of the visual system, in which object identity is represented. Fig 3 shows the similarity judged by the participants and the average similarity matrix estimated from IT cortex, which shows a similar structure but higher correlations between animal classes. Interestingly, the pseudo-SNR map shows that only part of the anatomically-defined ROI supports the representational structure.\n\n\f5 Discussion\n\nIn this paper, we demonstrated that representational similarity analysis, a popular method in many recent fMRI studies, suffers from a bias. We showed analytically that such bias arises from both the structure of the experiment design and the covariance structure of measurement and neural noise. The bias is induced because standard RSA analyzes noisy estimates of neural activation levels, and the structured noise in the estimates turns into bias. Such bias is especially severe when SNR is low and when the order of task conditions cannot be fully counterbalanced. To overcome this bias, we proposed a Bayesian framework for the fMRI data, incorporating the representational structure as the shared covariance structure of activity levels across voxels. Our Bayesian RSA method estimates this covariance structure directly from data, avoiding the structured noise in point estimation of activity levels.
Our method can be applied to neural recordings from other modalities as well.\nUsing simulated data, we showed that, as compared to standard RSA, the covariance structure estimated by our method deviates much less from the true covariance structure, especially for low SNR and short experiments. Furthermore, our method has the advantage of taking into account the variation in SNR across voxels. In future work, we will use the pseudo-SNR map and the covariance structure learned from the data jointly as an empirical prior to constrain the estimation of activation levels β. We believe that such structured priors learned directly from data can potentially provide more accurate estimation of neural activation patterns, the bread and butter of fMRI analyses.\nA number of approaches have recently been proposed to deal with the bias structure in RSA, such as using the correlation or Mahalanobis distance between neural activity patterns estimated from separate fMRI scans instead of from the same fMRI scan, or modeling the bias structure as a diagonal matrix or by a Taylor expansion of an unknown function of inter-event intervals [1, 21, 6]. Such approaches have different limitations. The correlation between patterns estimated from different scans [1] is severely underestimated if SNR is low (for example, unless there is zero noise, the correlation between the neural patterns corresponding to the same conditions estimated from different fMRI scans is always smaller than 1, while the true patterns should presumably be the same across scans in order for such an analysis to be justified). Similar problems exist for using Mahalanobis distance between patterns estimated from different scans [21]: with noise in the data, it is not guaranteed that the distance between patterns of the same condition estimated from separate scans is smaller than the distance between patterns of different conditions.
Such a result cannot be interpreted as a measure of "similarity" because, theoretically, neural patterns should be more similar if they belong to the same condition than if they belong to different conditions. Our approach does not suffer from such limitations, because we directly estimate a covariance structure, which can always be converted to a correlation matrix. Modeling the bias as a diagonal matrix [6] is not sufficient, as the bias can be far from diagonal, as shown in Fig 1D. A Taylor expansion of the bias covariance structure as a function of inter-event intervals can potentially account for off-diagonal elements of the bias structure, but it risks removing structure in the true covariance matrix if that structure happens to co-vary with inter-event intervals, and it becomes complicated to set up if conditions repeat multiple times [1].

One limitation of our model is the assumption that noise is spatially independent. Henriksson et al. [8] suggested that global fluctuations of fMRI time series over large areas (which are reflected as spatial correlation) might contribute substantially to their RSA pattern. This might also be the reason that the overall correlation in Fig 1B is higher than the bias obtained from standard RSA on independent Gaussian noise (Fig 1D). Our future work will explicitly incorporate such global fluctuations of noise.

Acknowledgement

This publication was made possible through the support of grants from the John Templeton Foundation and the Intel Corporation. The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the John Templeton Foundation. JWP was supported by grants from the McKnight Foundation, Simons Collaboration on the Global Brain (SCGB AWD1004351) and the NSF CAREER Award (IIS-1150186). We thank Andrew C. Connolly et al. for sharing the data used in Section 4.2.
Data used in the supplementary material were obtained from the MGH-USC Human Connectome Project (HCP) database.

References

[1] A. Alink, A. Walther, A. Krugliak, J. J. van den Bosch, and N. Kriegeskorte. Mind the drift: improving sensitivity to fMRI pattern information by accounting for temporal pattern drift. bioRxiv, page 032391, 2015.

[2] A. C. Connolly, J. S. Guntupalli, J. Gors, M. Hanke, Y. O. Halchenko, Y.-C. Wu, H. Abdi, and J. V. Haxby. The representation of biological classes in the human brain. The Journal of Neuroscience, 32(8):2608–2618, 2012.

[3] R. W. Cox. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Computers and Biomedical Research, 29(3):162–173, 1996.

[4] T. Davis and R. A. Poldrack. Measuring neural representations with fMRI: practices and pitfalls. Annals of the New York Academy of Sciences, 1296(1):108–134, 2013.

[5] R. C. deCharms and A. Zador. Neural representation and the cortical code. Annual Review of Neuroscience, 23(1):613–647, 2000.

[6] J. Diedrichsen, G. R. Ridgway, K. J. Friston, and T. Wiestler. Comparing the similarity and spatial structure of neural representations: a pattern-component model. NeuroImage, 55(4):1665–1678, 2011.

[7] J. V. Haxby, M. I. Gobbini, M. L. Furey, A. Ishai, J. L. Schouten, and P. Pietrini. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539):2425–2430, 2001.

[8] L. Henriksson, S.-M. Khaligh-Razavi, K. Kay, and N. Kriegeskorte. Visual representations are dominated by intrinsic fluctuations correlated between areas. NeuroImage, 114:275–286, 2015.

[9] P. Jezzard, P. Matthews, and S. Smith. Functional Magnetic Resonance Imaging: An Introduction to Methods, 2003.

[10] D. J. Kravitz, C. S. Peng, and C. I. Baker.
Real-world scene representations in high-level visual cortex: it's the spaces more than the places. The Journal of Neuroscience, 31(20):7322–7333, 2011.

[11] N. Kriegeskorte, M. Mur, and P. A. Bandettini. Representational similarity analysis: connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2:4, 2008.

[12] N. Kriegeskorte, M. Mur, D. A. Ruff, R. Kiani, J. Bodurka, H. Esteky, K. Tanaka, and P. A. Bandettini. Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron, 60(6):1126–1141, 2008.

[13] J. B. Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1):1–27, 1964.

[14] S. Nishimoto, A. T. Vu, T. Naselaris, Y. Benjamini, B. Yu, and J. L. Gallant. Reconstructing visual experiences from brain activity evoked by natural movies. Current Biology, 21(19):1641–1646, 2011.

[15] K. A. Norman, S. M. Polyn, G. J. Detre, and J. V. Haxby. Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9):424–430, 2006.

[16] M. V. Peelen and A. Caramazza. Conceptual object representations in human anterior temporal cortex. The Journal of Neuroscience, 32(45):15728–15736, 2012.

[17] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[18] M. Ritchey, E. A. Wing, K. S. LaBar, and R. Cabeza. Neural similarity between encoding and retrieval is related to memory via hippocampal interactions. Cerebral Cortex, page bhs258, 2012.

[19] N. W. Schuck, M. B. Cai, R. C. Wilson, and Y. Niv. Human orbitofrontal cortex represents a cognitive map of state space. Neuron, 91:1–11, 2016.

[20] E. P. Simoncelli and B. A. Olshausen. Natural image statistics and neural representation. Annual Review of Neuroscience, 24(1):1193–1216, 2001.

[21] A. Walther, H. Nili, N. Ejaz, A. Alink, N. Kriegeskorte, and J. Diedrichsen.
Reliability of dissimilarity measures for multi-voxel pattern analysis. NeuroImage, 2015.

[22] M. W. Woolrich, B. D. Ripley, M. Brady, and S. M. Smith. Temporal autocorrelation in univariate linear modeling of fMRI data. NeuroImage, 14(6):1370–1386, 2001.

[23] G. Xue, Q. Dong, C. Chen, Z. Lu, J. A. Mumford, and R. A. Poldrack. Greater neural pattern similarity across repetitions is associated with better memory. Science, 330(6000):97–101, 2010.

[24] E. Zarahn, G. K. Aguirre, and M. D'Esposito. Empirical analyses of BOLD fMRI statistics. NeuroImage, 5(3):179–197, 1997.