{"title": "A Reduced-Dimension fMRI Shared Response Model", "book": "Advances in Neural Information Processing Systems", "page_first": 460, "page_last": 468, "abstract": "Multi-subject fMRI data is critical for evaluating the generality and validity of findings across subjects, and its effective utilization helps improve analysis sensitivity. We develop a shared response model for aggregating multi-subject fMRI data that accounts for different functional topographies among anatomically aligned datasets. Our model demonstrates improved sensitivity in identifying a shared response for a variety of datasets and anatomical brain regions of interest. Furthermore, by removing the identified shared response, it allows improved detection of group differences. The ability to identify what is shared and what is not shared opens the model to a wide range of multi-subject fMRI studies.", "full_text": "A Reduced-Dimension fMRI Shared Response Model

Po-Hsuan Chen1, Janice Chen2, Yaara Yeshurun2, Uri Hasson2, James V. Haxby3, Peter J. Ramadge1

1Department of Electrical Engineering, Princeton University
2Princeton Neuroscience Institute and Department of Psychology, Princeton University
3Department of Psychological and Brain Sciences and Center for Cognitive Neuroscience, Dartmouth College

Abstract

Multi-subject fMRI data is critical for evaluating the generality and validity of findings across subjects, and its effective utilization helps improve analysis sensitivity. We develop a shared response model for aggregating multi-subject fMRI data that accounts for different functional topographies among anatomically aligned datasets. Our model demonstrates improved sensitivity in identifying a shared response for a variety of datasets and anatomical brain regions of interest. Furthermore, by removing the identified shared response, it allows improved detection of group differences. 
The ability to identify what is shared and what is not\nshared opens the model to a wide range of multi-subject fMRI studies.\n\n1\n\nIntroduction\n\nMany modern fMRI studies of the human brain use data from multiple subjects. The use of multiple\nsubjects is critical for assessing the generality and validity of the \ufb01ndings across subjects. It is also\nincreasingly important since from one subject one can gather at most a few thousand noisy instances\nof functional response patterns. To increase the power of multivariate statistical analysis, one there-\nfore needs to aggregate response data across multiple subjects. However, the successful aggregation\nof fMRI brain imaging data across subjects requires resolving the major problem that both anatomi-\ncal structure and functional topography vary across subjects [1, 2, 3, 4]. Moreover, it is well known\nthat standard methods of anatomical alignment [1, 4, 5] do not adequately align functional topogra-\nphy [4, 6, 7, 8, 9]. Hence anatomical alignment is often followed by spatial smoothing of the data\nto blur functional topographies. Recently, functional spatial registration methods have appeared that\nuse cortical warping to maximize inter-subject correlation of time series [7] or inter-subject corre-\nlation of functional connectivity [8, 9]. A more radical approach learns a latent multivariate feature\nthat models the shared component of each subject\u2019s response [10, 11, 12].\nMultivariate statistical analysis often begins by identifying a set of features that capture the informa-\ntive aspects of the data. For example, in fMRI analysis one might select a subset of voxels within an\nanatomical region of interest (ROI), or select a subset of principal components of the ROI, then use\nthese features for subsequent analysis. In a similar way, one can think of the fMRI data aggregation\nproblem as a two step process. 
First use training data to learn a mapping of each subject's measured data to a shared feature space in a way that captures the across-subject shared response. Then use these learned mappings to project held-out data for each subject into the shared feature space and perform a statistical analysis.

To make this more precise, let Xi ∈ Rv×d, i = 1:m, denote matrices of training data (v voxels in the ROI, over d TRs) for m subjects. We propose using this data to learn subject specific bases Wi ∈ Rv×k, where k is to be selected, and a shared matrix S ∈ Rk×d of feature responses such that Xi = WiS + Ei, where Ei is an error term corresponding to unmodeled aspects of the subject's response. One can think of the bases Wi as representing the individual functional topographies and S as a latent feature that captures the component of the response shared across subjects. We don't claim that S is a sufficient statistic, but that is a useful analogy.

Figure 1: Comparison of training objective value and testing accuracy for problems (1) and (2) over various k on the raider dataset with 500 voxels of ventral temporal cortex (VT) in the image stimulus classification experiment (details in Sec. 4). In all cases, error bars show ±1 standard error.

The contribution of the paper is twofold. First, we propose a probabilistic generative framework for modeling and estimating the subject specific bases Wi and the shared response latent variable S. A critical aspect of the model is that it directly estimates k ≪ v shared features. This is in contrast to methods where the number of features equals the number of voxels [10, 11]. Moreover, the Bayesian nature of the approach provides a natural means of incorporating prior domain knowledge. 
Second, we give a demonstration of the robustness and effectiveness of our data aggregation model using a variety of fMRI datasets captured on different MRI machines, employing distinct analysis pathways, and based on various brain ROIs.

2 Preliminaries

fMRI time-series data Xi ∈ Rv×d, i = 1:m, is collected for m subjects as they are presented with identical, time synchronized stimuli. Here d is the number of time samples in TRs (Time of Repetition), and v is the number of voxels. Our objective is to model each subject's response as Xi = WiS + Ei, where Wi ∈ Rv×k is a basis of topographies for subject i, k is a parameter selected by the experimenter, S ∈ Rk×d is a corresponding time series of shared response coordinates, and Ei is an error term, i = 1:m. To ensure uniqueness of coordinates it is necessary that Wi has linearly independent columns. We make the stronger assumption that each Wi has orthonormal columns, Wi^T Wi = Ik.

Two approaches for estimating the bases Wi and the shared response S are illustrated below:

min{Wi,S} Σi ‖Xi − Wi S‖F^2   s.t. Wi^T Wi = Ik,        (1)

min{Wi,S} Σi ‖Wi^T Xi − S‖F^2   s.t. Wi^T Wi = Ik,        (2)

where ‖·‖F denotes the Frobenius norm. For k ≤ v, (1) can be solved iteratively by first selecting initial conditions for Wi, i = 1:m, and optimizing (1) with respect to S by setting S = 1/m Σi Wi^T Xi. With S fixed, (1) becomes m separate subproblems of the form min ‖Xi − Wi S‖F^2 with solution Wi = Ũi Ṽi^T, where Ũi Σ̃i Ṽi^T is an SVD of Xi S^T [13]. These two steps can be iterated until a stopping criterion is satisfied. Similarly, for k ≤ v, (2) can also be solved iteratively. However, for k < v, there is no known fast update of Wi given S. Hence this must be done using local gradient descent on the Stiefel manifold [14]. 
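As a concrete illustration, the alternating scheme for problem (1) can be sketched in a few lines of NumPy. This is our own minimal sketch, not the authors' implementation; the function name fit_srm, the random orthonormal initialization, and the fixed iteration count are illustrative choices:

```python
import numpy as np

def fit_srm(Xs, k, n_iter=10, seed=0):
    """Alternating minimization sketch for problem (1):
    min sum_i ||X_i - W_i S||_F^2  s.t.  W_i^T W_i = I_k.
    Xs: list of (v, d) arrays, one per subject (a common v is assumed)."""
    rng = np.random.default_rng(seed)
    v, d = Xs[0].shape
    # Initialize each W_i with random orthonormal columns (QR of a random matrix).
    Ws = [np.linalg.qr(rng.standard_normal((v, k)))[0] for _ in Xs]
    for _ in range(n_iter):
        # S-update: with the W_i fixed, the minimizer is the average projection.
        S = sum(W.T @ X for W, X in zip(Ws, Xs)) / len(Xs)
        # W-update: each subproblem min ||X_i - W_i S||_F is an orthogonal
        # Procrustes problem, solved via the SVD of X_i S^T.
        Ws = []
        for X in Xs:
            U, _, Vt = np.linalg.svd(X @ S.T, full_matrices=False)
            Ws.append(U @ Vt)
    S = sum(W.T @ X for W, X in zip(Ws, Xs)) / len(Xs)
    return Ws, S
```

In practice one would iterate until the training objective stops decreasing rather than for a fixed n_iter.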
Both approaches yield the same solution when k = v, but are not equivalent in the more interesting situation k ≪ v (Sup. Mat.). What is most important, however, is that problem (2) with k < v often learns an uninformative shared response S. This is illustrated in Fig. 1, which plots the value of the training objective and the test accuracy for a stimulus classification experiment versus iteration count (image classification using the raider fMRI dataset, see Sec. 4). For problem (1), test accuracy increases with decreasing training error, whereas for problem (2), test accuracy decreases with decreasing training error (this can be explained analytically, see Sup. Mat.). We therefore base our approach on a generalization of problem (1). We call the resulting S and {Wi}, i = 1:m, a shared response model (SRM).

Before extending this simple model, we note a few important properties. First, a solution of (1) is not unique. If S, {Wi}, i = 1:m, is a solution, then so is QS, {Wi Q^T}, i = 1:m, for any k × k orthogonal matrix Q. This is not a problem as long as we only learn one template and one set of subject bases. Any new subjects or new data will be referenced to the original SRM. However, if we independently learn two SRMs, the group shared responses S1, S2 may not be registered (use the same Q). We register S1 to S2 by finding a k × k orthogonal matrix Q to minimize ‖S2 − QS1‖F^2; then use QS1 in place of S1 and Wj Q^T in place of Wj for subjects in the first SRM.

Next, when projected onto the span of its basis, each subject's training data Xi has coordinates Si = Wi^T Xi, and the learning phase ensures S = 1/m Σi Si. The projection to k shared features and the averaging across subjects in feature space both contribute to across-subject denoising during the learning phase. By mapping S back into voxel space we obtain the voxel space manifestation WiS of the denoised, shared component of each subject's training data. The training data of subject j can also be mapped through the shared response model to the functional topography and anatomy of subject i by the mapping X̂i,j = Wi Wj^T Xj.

New subjects are easily added to an existing SRM S, {Wi}, i = 1:m. We refer to S as the training template. To introduce a new subject j = m + 1 with training data Xj, form its orthonormal basis by minimizing the mean squared modeling error min{Wj : Wj^T Wj = Ik} ‖Xj − Wj S‖F^2. We solve this for the least norm solution. Note that S and the existing W1:m do not change; we simply add a new subject by using its training data for the same stimulus and the template S to determine its basis of functional topographies. We can also add new data to an SRM. Let X′i, i = 1:m, denote new data collected under a distinct stimulus from the same subjects. This is added to the study by forming S′i = Wi^T X′i, then averaging these projections to form the shared response for the new data: S′ = 1/m Σi S′i. This assumes the learned subject specific topographies Wi generalize to the new data. 
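Both extension steps can be sketched as follows (again our own NumPy sketch with hypothetical names add_subject and project_new_data; the new-subject solve is an orthogonal Procrustes problem, and the rank-deficient case covered by the least-norm qualification above is ignored):

```python
import numpy as np

def add_subject(X_new, S):
    """Estimate an orthonormal basis W_j for a new subject from its training
    data X_new (v x d) and the existing template S (k x d), by solving
    min ||X_new - W_j S||_F  s.t.  W_j^T W_j = I_k  via the SVD of X_new S^T."""
    U, _, Vt = np.linalg.svd(X_new @ S.T, full_matrices=False)
    return U @ Vt

def project_new_data(Ws, X_primes):
    """Shared response for new data from the same subjects:
    S' = (1/m) * sum_i W_i^T X'_i."""
    return sum(W.T @ Xp for W, Xp in zip(Ws, X_primes)) / len(Ws)
```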
This usually requires a sufficiently rich stimulus in the learning phase.

3 Probabilistic Shared Response Model

We now extend our simple shared response model to a probabilistic setting. Let xit ∈ Rv denote the observed pattern of voxel responses of the i-th subject at time t. For the moment, assume these observations are centered over time. Let st ∈ Rk be a hyperparameter modeling the shared response at time t = 1:d, and model the observation at time t for dataset i as the outcome of a random vector:

xit ∼ N(Wi st, ρ^2 I),        (3)

where xit takes values in Rv, Wi ∈ Rv×k, i = 1:m, and ρ^2 is a subject independent hyperparameter. The negative log-likelihood of this model is L = Σt Σi [(v/2) log 2π + (v/2) log ρ^2 + (ρ^−2/2)(xit − Wi st)^T (xit − Wi st)]. Noting that xit is the t-th column of Xi, we see that minimizing L with respect to Wi and S = [s1, . . . , sd] requires the solution of:

min Σt Σi (xit − Wi st)^T (xit − Wi st) = min Σi ‖Xi − Wi S‖F^2,  with Wi^T Wi = Ik.

Thus maximum likelihood estimation for this model matches (1).

In our fMRI datasets, and most multi-subject fMRI datasets available today, d ≫ m. Since st is time specific but shared across the m subjects, we see that there is palpable value in regularizing its estimation. In contrast, subject specific variables such as Wi are shared across time, a dimension in which data is relatively plentiful. Hence, a natural extension of (3) is to make st a shared latent random vector st ∼ N(0, Σs) taking values in Rk. The observation for dataset i at time t then has the conditional density p(xit|st) = N(Wi st + µi, ρi^2 I), where the subject specific mean µi allows for a non-zero mean and we assume subject dependent isotropic noise covariance ρi^2 I. This is an
This is an\nextended multi-subject form of factor analysis, but in factor analysis one normally assumes \u03a3s = I.\nm], \u03a8 =\nTo form a joint model, let xT\ndiag(\u03c12\n\nmI), \u0001 \u223c N (0, \u03a8), and \u03a3x = W \u03a3sW T + \u03a8. Then\n\nT ], W T = [W T\n\nm], \u00b5T = [\u00b5T\n\n1I, . . . , \u03c12\n\n1 . . . W T\n\nT . . . xmt\n\nt = [x1t\n\n1 . . . \u00b5T\n\nwith xt \u223c N (\u00b5, \u03a3x) taking values in Rmv. For this joint model, we formulate SRM as:\n\nxt = W st + \u00b5 + \u0001,\n\n(4)\n\nst \u223c N (0, \u03a3s),\n\nxit|st \u223c N (Wist + \u00b5i, \u03c12\n\ni I),\n\nW T\n\ni Wi = Ik,\n\n(5)\n\nFigure 2: Graphical model\nfor SRM. Shaded nodes: ob-\nservations, unshaded nodes:\nlatent variables, and black\nsquares: hyperparameters.\n\nwhere st takes values in Rk, xit takes values in Rv, and the hyperparameters Wi are matrices in\nRv\u00d7k, i = 1:m. The latent variable st, with covariance \u03a3s, models a shared elicited response across\nthe subjects at time t. By applying the same orthogonal transform to each of the Wi, we can assume,\nwithout loss of generality, that \u03a3s is diagonal. The SRM graphical model is displayed in Fig. 2.\n\n3\n\nstxitmd\u2303sWi,\u00b5i,\u21e2i\f3.1 Parameter Estimation for SRM\n\nTo estimate the parameters of the SRM model we apply a constrained EM algorithm to \ufb01nd max-\nimum likelihood solutions. Let \u03b8 denote the vector of all parameters. 
In the E-step, given the initial value or the estimated value θold from the previous M-step, we calculate the sufficient statistics by taking the expectation with respect to p(st|xt, θold):

Es|x[st] = (W Σs)^T (W Σs W^T + Ψ)^(−1) (xt − µ),        (6)

Es|x[st st^T] = Vars|x[st] + Es|x[st] Es|x[st]^T = Σs − Σs^T W^T (W Σs W^T + Ψ)^(−1) W Σs + Es|x[st] Es|x[st]^T.        (7)

In the M-step, we update the parameter estimate to θnew by maximizing Q with respect to Wi, µi, ρi^2, i = 1:m, and Σs. This is given by θnew = arg maxθ Q(θ, θold), where

Q(θ, θold) = 1/d Σ{t=1:d} ∫ p(st|xt, θold) log p(xt, st|θ) dst.

Due to the model structure, Q can be maximized with respect to each parameter separately. To enforce the orthogonality of Wi, we introduce a symmetric matrix Λi of Lagrange multipliers and add the constraint term tr(Λi(Wi^T Wi − I)) to the objective function. Setting the derivatives of the modified objective to zero, we obtain the following update equations:

µi^new = 1/d Σt xit,        (8)

Wi^new = Ai (Ai^T Ai)^(−1/2),  Ai = 1/2 Σt (xit − µi^new) Es|x[st]^T,        (9)

(ρi^2)^new = 1/(dv) Σt (‖xit − µi^new‖^2 − 2(xit − µi^new)^T Wi^new Es|x[st] + tr(Es|x[st st^T])),        (10)

Σs^new = 1/d Σt Es|x[st st^T].        (11)

The orthonormal constraint Wi^T Wi = Ik in SRM is similar to that of PCA. In general, there is no reason to believe that key brain response patterns are orthogonal. So, the orthonormal bases found via SRM are a computational tool to aid statistical analysis within an ROI. 
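One constrained EM iteration can be sketched as follows (our own NumPy sketch under the simplifying assumption of equal voxel counts across subjects; function and variable names are ours, and the update Wi = Ai(Ai^T Ai)^(−1/2) is computed via the SVD of Ai, which yields the same matrix when Ai has full column rank):

```python
import numpy as np

def em_step(Xs, Ws, mus, rho2s, Sigma_s):
    """One constrained EM iteration for the probabilistic SRM (updates (6)-(11)).
    Xs: list of (v, d) data; Ws: list of (v, k) orthonormal bases;
    mus: list of (v,) means; rho2s: list of noise variances; Sigma_s: (k, k)."""
    v, d = Xs[0].shape
    k = Sigma_s.shape[0]
    W = np.vstack(Ws)                    # (m*v, k) stacked loading matrix
    mu = np.concatenate(mus)             # (m*v,)
    X = np.vstack(Xs)                    # (m*v, d) stacked observations
    Psi_diag = np.repeat(rho2s, v)       # diagonal of Psi
    # E-step: posterior moments of s_t given x_t, eqs. (6)-(7).
    Sigma_x = W @ Sigma_s @ W.T + np.diag(Psi_diag)
    G = Sigma_s @ W.T @ np.linalg.inv(Sigma_x)      # (k, m*v)
    Es = G @ (X - mu[:, None])                      # E[s_t | x_t], all t: (k, d)
    Var_s = Sigma_s - G @ W @ Sigma_s               # posterior covariance
    # M-step: closed-form updates (8)-(11).
    new_Ws, new_mus, new_rho2s = [], [], []
    for X_i in Xs:
        mu_i = X_i.mean(axis=1)
        Xc = X_i - mu_i[:, None]
        A_i = 0.5 * Xc @ Es.T
        U, _, Vt = np.linalg.svd(A_i, full_matrices=False)
        W_i = U @ Vt                                # A_i (A_i^T A_i)^{-1/2}
        rho2_i = (np.sum(Xc ** 2) - 2 * np.sum(Xc * (W_i @ Es))
                  + d * np.trace(Var_s) + np.sum(Es ** 2)) / (d * v)
        new_Ws.append(W_i); new_mus.append(mu_i); new_rho2s.append(rho2_i)
    new_Sigma_s = (d * Var_s + Es @ Es.T) / d
    return new_Ws, new_mus, new_rho2s, new_Sigma_s
```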
From a computational viewpoint, orthogonality has the advantage of robustness and preserving temporal geometry.

3.2 Connections with related methods

For one subject, SRM is similar to a variant of pPCA [15] that imposes an orthogonality constraint on the loading matrix. pPCA itself yields an orthogonal loading matrix; however, due to the increase in model complexity needed to handle multiple datasets, SRM imposes orthogonality of the loading matrices as an explicit constraint. Topographic Factor Analysis (TFA) [16] is a factor model using a topographic basis composed of spherical Gaussians with different centers and widths. This choice of basis is constraining, but since each factor is a “blob” in the brain it has the advantage of providing a simple spatial interpretation. Hyperalignment (HA) [10] learns a shared representational space by rotating subjects' time series responses to maximize inter-subject time series correlation. The formulation in [10] is based on problem (2) with k = v and Wi a v × v orthogonal matrix (Sup. Mat.). So this method does not directly reduce the dimension of the feature space, nor does it directly extend to this case (see Fig. 1). Although dimensionality reduction can be done post hoc using PCA, [10] shows that this doesn't lead to performance improvement. In contrast, we show in §4 that selecting k ≪ v can improve the performance of SRM beyond that attained by HA. The GICA and IVA algorithms [17] do not assume a time-synchronized stimulus and hence concatenate data along the time dimension (implying spatial consistency) and learn spatially independent components. We use the assumption of a time-synchronized stimulus for anchoring the shared response to overcome a spatial mismatch in functional topographies. Finally, SRM can be regarded as a refinement of the concept of hyperalignment [10] cast into a probabilistic framework. The HA approach has connections with regularized CCA [18]. 
Additional details of these connections and connections with Canonical Correlation Analysis (CCA) [19], ridge regression, Independent Component Analysis (ICA) [20], and regularized Hyperalignment [18] are discussed in the supplementary material.

4 Experiments

We assess the performance and robustness of SRM using fMRI datasets (Table 1) collected using different MRI machines, subjects, and preprocessing pipelines.

Dataset                              Subjs  TRs (s/TR)  Voxels   Region of interest (ROI)
sherlock (audio-visual movie) [21]   16     1976 (2)    813      posterior medial cortex (PMC) [22]
raider (audio-visual movie) [10]     10     2203 (3)    500/H    ventral temporal cortex (VT) [23]
forrest (audio movie) [24]           18     3599 (2)    1300/H   planum temporale (PT) [25]
audiobook (narrated story) [26]      40     449 (2)     2500/H   default mode network (DMN) [27]

Table 1: fMRI datasets are shown in the left four columns, and the ROIs are shown in the rightmost column. The ROIs vary in function from visual, language, and memory to mental states. H stands for hemisphere.

The sherlock dataset was collected while subjects watched an episode of the BBC TV series “Sherlock” (66 mins). The raider dataset was collected while subjects viewed the movie “Raiders of the Lost Ark” (110 mins) and a series of still images (7 categories, 8 runs). The forrest dataset was collected while subjects listened to an auditory version of the film “Forrest Gump” (120 mins). The audiobook dataset was collected while subjects listened to a narrated story (15 mins) with two possible interpretations. Half of the subjects had a prior context favoring one interpretation, the other half had a prior context favoring the other interpretation. Post-scanning questionnaires showed no difference in comprehension but a significant difference in interpretations between groups.

Experiment 1: SRM and spatial smoothing. 
We first use spatial smoothing to determine if we can detect a shared response in PMC for the sherlock dataset. The subjects are randomly partitioned into two equal sized groups, the data for each group is averaged, we calculate the Pearson correlation over voxels between these averaged responses for each time, and then average these correlations over time. This is a measure of similarity of the sequence of brain maps in the two average responses. We repeat this for five random subject divisions and average the results. If there is a shared response, we expect a positive average correlation between the groups, but if functional topographies differ significantly across subjects, this correlation may be small. If the result is not distinct from zero, a shared response is not detected. The computation yields the benchmark value 0.26 ± 0.006, shown as the purple bar in the right plot in Fig. 3. This is support for a shared response in PMC, but we posit that the subjects' functional topographies in PMC are misaligned. To test this, we use a Gaussian filter, with width at half height of 3, 4, 5 and 6 mm, to spatially smooth each subject's fMRI data, then recalculate the average Pearson correlation as described above. The results, shown as blue bars in Fig. 3, indicate higher correlations with greater spatial smoothing. This indicates greater average correlation of the responses at lower spatial frequencies, suggesting a fine scale mismatch of functional topographies across subjects.

We now test the robustness of SRM using the unsmoothed data. The subjects are randomly partitioned into two equal sized groups. The data in each group is divided in time into two halves, and the same half in each group is used to learn a shared response model for the group. The independently obtained group templates S1, S2 are then registered using a k × k orthogonal matrix Q (method outlined in §2). 
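The registration step is itself an orthogonal Procrustes problem and can be sketched as follows (a NumPy sketch; register_templates is our own name, not the authors' code):

```python
import numpy as np

def register_templates(S1, S2):
    """Find the k x k orthogonal Q minimizing ||S2 - Q S1||_F, so that Q S1
    is registered to S2. Subject bases in the first SRM are then replaced
    by W_j Q^T. Solved via the SVD of S2 S1^T."""
    U, _, Vt = np.linalg.svd(S2 @ S1.T, full_matrices=False)
    return U @ Vt
```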
For each group, the second half of the data is projected to feature space using the subject-specific bases and averaged. Then the Pearson correlation over features is calculated between the group averaged shared responses, and averaged over time. This is repeated using the other halves of the subjects' data for training and the results are averaged. The average results over 5 random subject divisions are reported as the green bars in Fig. 3. With k = 813 there is no reduction of dimension and SRM achieves a correlation equivalent to 6mm spatial smoothing. This strong average correlation between groups suggests some form of shared response. As expected, if the dimension of the feature space k is reduced, the correlation increases. A smaller value of k forces

Figure 3: Experiment 1. Left: Learn using half of the data, then compute between group correlation on other half. Right: Pearson correlation after spatial smoothing, and SRM with various k. Error bars: ±1 stand. error.

Figure 4: Left: Experiment 2. Learn subject specific bases. Test on held out subject and data. Right: Experiment 2. Time segment matching by correlating with 9 TR segments in the shared response.

Figure 5: Experiment 2. Top: Comparison of 18s time segment classification on three datasets using distinct ROIs. 
Bottom: (Left) SRM time segment classification accuracy vs. k. (Right) Learn bases from movie response, classify stimulus category using still image response. For raider and forrest, we conduct the experiment on the ROI in each hemisphere separately and then average the results. For sherlock, we conduct the experiment over the whole PMC. The TAL results for the raider dataset are from [10]. Error bars: ±1 stand. error.

SRM to focus on shared features yielding the best data representation and gives greater noise rejection. Learning 50 features achieves a 33% higher average correlation in feature space than is achieved by 6mm spatial smoothing in voxel space. A commensurate improvement occurs when SRM is applied to the spatially smoothed data.

Experiment 2: Time segment matching and image classification. We test if the shared response estimated by SRM generalizes to new subjects and new data using versions of two experiments from [10] (unlike in [10], here the held out subject is not included in the learning phase). The first experiment tests if an 18s time segment from a held-out subject's new data can be located in the corresponding new data of the training subjects. A shared response and subject specific bases are learned using half of the data, and the held out subject's basis is estimated using the shared response as a template. Then a random 18s test segment from the unused half of the held out subject's data is projected onto the subject's basis, and we locate the 18s segment in the averaged shared response of the other subjects' new data that is maximally correlated with the test segment (see Fig. 4). The held out subject's test segment is correctly located (matched) if its correlation with the average shared response at the same time point is the highest; segments overlapping with the test segment are excluded. We
We\nrecord the average accuracy and standard error by two-fold cross-validation over the data halves\nand leave-one-out over subjects. The results using three different fMRI datasets with distinct ROIs\nare shown in the top plot of Fig. 5. The accuracy is compared using: anatomical alignment (MNI\n[4], Talairach (TAL) [1]); standard PCA, and ICA feature selection (FastICA implementation [20]);\nthe Hyperalignment (HA) method [10]; and SRM. PCA and ICA are directly applied on joint data\nm]. SRM\nmatrix X T = [X T\ndemonstrates the best matching of the estimated shared temporal features of the methods tested. This\nsuggests that the learned shared response is more informative of the shared brain state trajectory at\nan 18s time scale. Moreover, the experiment veri\ufb01es generalization of the estimated shared features\nto subjects not included in the training phase and new (but similar) data collected during the other\nhalf of the movie stimulus. Since we expect accuracy to improve as the time segment is lengthened,\n\nm] for learning W and S, where X \u2248 W S and W T = [W T\n\n1 . . . X T\n\n1 . . . 
W T\n\n6\n\nLearning Subject Speci\ufb01c Bases\t\n\rmovie \t\n\rdata\t\n\rlearn bases\t\n\rsubject m\t\n\rsubject 1\t\n\rsubject m-1\t\n\rimage data/\t\n\rmovie data\t\n\rproject to \t\n\rbases\t\n\rsubject m\t\n\rsubject 1\t\n\rsubject m-1\t\n\rprojected \t\n\rdata\t\n\rclassi\ufb01er\t\n\rtest\t\n\rTesting on Held-out Subject\t\n\r\u2026\t\n\r\u2026\t\n\r\u2026\t\n\rshared response\t\n\r(template)\t\n\routput\t\n\rtrain\t\n\rsubject m\t\n\rsubject m-1\t\n\rsubject 1\t\n\rW1Wm1WmW1Wm1Wm\u2026\t\n\r10\t\n\r9\t\n\r\u2026\t\n\r2\t\n\r1\t\n\rt\t\n\r-1\t\n\rt\t\n\rt\t\n\r+1\t\n\r\u2026\t\n\rt\t\n\r+8\t\n\rt\t\n\r+9\t\n\r\u2026\t\n\rsegment 1\t\n\rsegment 2\t\n\rSegment t-9\t\n\roverlapping segment\t\n\rSegment t+9\t\n\rtesting segment\t\n\roverlapping segment\t\n\r\u2026\t\n\r\u2026\t\n\rmovie\t\n\rTR\t\n\rsegt\t\n\rtesting\t\n\rsubject\u2019s\t\n\rprojected\t\n\rresponse\t\n\rshared\t\n\rresponse\t\n\rseg\t\n\rt-9\t\n\rseg\t\n\r2\t\n\rseg1\t\n\rseg\t\n\rt\t\n\rSeg\t\n\rt+9\t\n\r\u2026\t\n\r\u2026\t\n\roverlapping \t\n\rseg. excluded\t\n\roverlapping \t\n\rseg. excluded\t\n\rseg\t\n\rt-9\t\n\rseg\t\n\r2\t\n\rseg1\t\n\rSeg\t\n\rt+9\t\n\r\u2026\t\n\r\u2026\t\n\r\u2026\t\n\r\u2026\t\n\r\fFig. 6.1\n\nFig. 6.2\n\nFigure 6: Experiment 3. Fig. 6.1: Experimental procedure. Fig 6.2: Data components (left) and group\nclassi\ufb01cation performance with SRM (right) in different steps of the procedure. Fig. 6.3: Group classi\ufb01cation\non audiobook dataset in DMN before and after removing an estimated shared response for various values of k1\nand k2 with SRM, PCA and ICA. Error bars: \u00b11 stand. error.\n\nFig. 6.3\n\nwhat is important is the relative accuracy of the compared methods. The method in (1) can be viewed\nas non-probabilistic SRM. In this experiment, it performs worse than SRM but better than the other\ncompared methods. The effect of the number of features used in SRM is shown in Fig. 5, lower left.\nThis can be used to select k. 
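The time segment matching test of Experiment 2 can be sketched as follows (our own NumPy sketch; segment_match is a hypothetical name, the 9-TR segment length follows Fig. 4, and the flattened-correlation score is an illustrative simplification of the evaluation described above):

```python
import numpy as np

def segment_match(proj_test, S_avg, t_true, seg_len=9):
    """Locate a projected test segment (k x seg_len) within the averaged
    shared response S_avg (k x d): correlate the flattened segment with every
    candidate window, excluding windows overlapping the true segment (except
    the true one itself), and report whether the argmax is t_true."""
    k, d = S_avg.shape
    x = proj_test.ravel()
    best_t, best_r = None, -np.inf
    for t in range(d - seg_len + 1):
        # Skip candidates overlapping the test segment, but keep t_true.
        if t != t_true and abs(t - t_true) < seg_len:
            continue
        y = S_avg[:, t:t + seg_len].ravel()
        r = np.corrcoef(x, y)[0, 1]
        if r > best_r:
            best_t, best_r = t, r
    return best_t == t_true
```

Averaging this indicator over held-out subjects and random segments gives the matching accuracy plotted in Fig. 5.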
A similar test on the number of features used in PCA and ICA indicates lower performance than SRM (results not shown).

We now use the image viewing data and the movie data from the raider dataset to test the generalizability of a learned shared response to a held-out subject and new data under a very distinct stimulus. The raider movie data is used to learn a shared response model, while excluding a held-out subject. The held-out subject's basis is estimated by matching its movie response data to the estimated shared response. The effectiveness of the learned bases is then tested using the image viewing dataset [10]. After projecting the image data using the subject bases to feature space, an SVM classifier is trained and the average classifier accuracy and standard error is recorded by leave-one-out across-subject testing. The results, lower right plot in Fig. 5, support the effectiveness of SRM in generalizing to a new subject and a distinct new stimulus. Under SRM, the image stimuli can be slightly more accurately identified using other subjects' data for training than using a subject's own data, indicating that the learned shared response is informative of image category.

Experiment 3: Differentiating between groups. Now consider the audiobook dataset and the DMN ROI. If subjects are given group labels according to the two prior contexts, a linear SVM classifier trained on labeled voxel space data and tested on the voxel space data of held out subjects can distinguish the two groups at an above chance level. This is shown as the leftmost bar in the bottom figure of Fig. 6.3. This is consistent with previous similar studies [28].

We test if SRM can distinguish the two subject groups with a higher rate of success. To do so we use the procedure outlined in the rows of Fig. 6.1. We first use the original data X^g1_1:m, X^g2_1:m of all subjects (Fig. 6.1 (a)) to learn a k1-dimensional shared response S^all and subject bases W^all_gj,1:m. This shared response is then mapped to voxel space using each subject's learned topography (Fig. 6.1 (b)) and subtracted from the subject's data to form the residual response X^gj_i − W^all_gj,i S^all for subject i in group j (Fig. 6.1 (c)). Leaving out one subject from each group, we use two within-group applications of SRM to find k2-dimensional within-group shared responses S^g1, S^g2, and subject bases W^g1_1:m, W^g2_1:m for the residual response. These are mapped into voxel space W^gj_i S^gj for each subject (Fig. 6.1 (d)). The first application of SRM yields an estimate of the response shared by all subjects. This is used to form the residual response. The subsequent within-group applications of SRM to the residual give estimates of the within-group shared residual response. Both applications of SRM seek to remove components of the original response that are uninformative of group membership. Finally, a linear SVM classifier is trained using the voxel space group-labeled data, and tested on the voxel space data of held out subjects. The results are shown as the red bars in Fig. 6.3. When using k1 = 10 and k2 = 100, we observe significant improvement in distinguishing the groups.

[Fig. 6.2, right panel (group classification accuracy with SRM): (a) original data 0.72±0.06; (b) shared response, k1=3: 0.54±0.06; (c) residual, k1=3: 0.70±0.04; (d) within-group shared response with k1=0, k2=100: 0.72±0.04, and with k1=10, k2=100: 0.82±0.04.]

One can visualize why this works using the cartoon in Fig. 
6.2 showing the data for one subject modeled as the sum of three components: the response shared by all subjects, the response shared by subjects in the same group after the response shared by all subjects is removed, and a final residual term called the individual response (Fig. 6.2(a)). We first identify the response shared by all subjects (Fig. 6.2(b)); subtracting this from the subject response gives the residual (Fig. 6.2(c)). The second, within-group application of SRM removes the individual response (Fig. 6.2(d)). By tuning k1 in the first application of SRM and k2 in the second, we estimate and remove the uninformative components while keeping the informative component.
Classification using the estimated shared response (k1 ≤ 10) results in accuracy around chance (Fig. 6.2(b)), indicating that it is uninformative for distinguishing the groups. The classification accuracy using the residual response is statistically equivalent to using the original data (Fig. 6.2(c)), indicating that removing only the response shared by all subjects is insufficient for improvement. The classification accuracy that results from not removing the shared response (k1 = 0) and only applying within-group SRM (Fig. 6.2(d)) is also statistically equivalent to using the original data. This indicates that removing only the individual response is also insufficient for improvement. By combining both applications of SRM we remove both the response shared by all subjects and the individual responses, keeping only the responses shared within groups. For k1 = 10, k2 = 100, this leads to significant improvement in performance (Fig. 6.2(d) and Fig. 6.3).
We performed the same experiment using PCA and ICA (Fig. 6.3).
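The two-stage procedure described above (estimate the response shared by all subjects, map it back to each subject's voxel space and subtract it, then repeat within groups) can be sketched in a few lines. The sketch below is a simplified deterministic variant, alternating an orthogonal Procrustes update of each subject's basis with an averaging update of the shared response; the function names (`fit_bases`, `srm`, `remove_shared`) are illustrative, not the authors' implementation.

```python
import numpy as np

def fit_bases(data, S):
    """For each subject, fit an orthonormal basis W_i (voxels x k) that best
    maps the shared response S (k x TRs) to that subject's data X_i (voxels x TRs):
    the orthogonal Procrustes problem argmin_W ||X - W S||_F s.t. W^T W = I."""
    bases = []
    for X in data:
        U, _, Vt = np.linalg.svd(X @ S.T, full_matrices=False)
        bases.append(U @ Vt)
    return bases

def srm(data, k, n_iter=20, seed=0):
    """Toy deterministic SRM: alternate between updating the per-subject
    orthonormal bases and the shared response (the mean of the projected data,
    which is the least-squares S update when each W_i has orthonormal columns)."""
    rng = np.random.RandomState(seed)
    _, t = data[0].shape
    S = rng.randn(k, t)  # random initialization of the shared response
    for _ in range(n_iter):
        W = fit_bases(data, S)
        S = np.mean([Wi.T @ Xi for Wi, Xi in zip(W, data)], axis=0)
    return W, S

def remove_shared(data, k):
    """Estimate the k-dimensional response shared by all subjects, map it back
    to each subject's voxel space, and subtract it, leaving the residuals."""
    W, S = srm(data, k)
    return [Xi - Wi @ S for Xi, Wi in zip(data, W)]
```

In the full procedure, a step like `remove_shared` with k = k1 would be applied to all subjects, the same routine would then be run separately within each group with k = k2 on the residuals, and the SVM would be trained on the resulting voxel space data; the paper's comparison swaps PCA or ICA into the first, shared-response-removal step in place of SRM.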
In this case, after removing the estimated shared response (k1 ≥ 1), group identification quickly drops to chance, since the shared response estimated by these methods is itself informative of the group difference (around 70% accuracy for distinguishing the groups (Sup. Mat.)). A detailed comparison of all three methods on the different steps of the procedure is given in the supplementary material.
5 Discussion and Conclusion
The vast majority of fMRI studies require aggregation of data across individuals. By identifying shared responses between the brains of different individuals, our model enhances fMRI analyses that use aggregated data to evaluate cognitive states. A key attribute of SRM is its built-in dimensionality reduction, which leads to a reduced-dimension shared feature space. We have shown that by tuning this dimensionality, the data-driven aggregation achieved by SRM demonstrates higher sensitivity in distinguishing multivariate functional responses across cognitive states. This was shown across a variety of datasets and anatomical brain regions of interest. It also opens the door for the identification of shared and individual responses. The identification of shared responses after SRM is of great interest, as it allows us to assess the degree to which functional topography is shared across subjects. Furthermore, SRM allows the detection of group-specific responses. This was demonstrated by removing an estimated shared response to increase sensitivity in detecting group differences. We posit that this technique can be adapted to examine an array of situations where group differences are the key experimental variable.
The method can facilitate studies of how neural representations are influenced by cognitive manipulations or by factors such as genetics, clinical disorders, and development.
Successful decoding of a particular cognitive state (such as a stimulus category) in a given brain area provides evidence that information relevant to that cognitive state is present in the neural activity of that area. Conducting such analyses in locations spanning the brain, e.g., using a searchlight approach, can facilitate the discovery of information pathways. In addition, comparison of decoding accuracies between searchlights can suggest what kind of information is present and where it is concentrated in the brain. SRM provides a more sensitive method for conducting such investigations. This may also have direct application in designing better noninvasive brain-computer interfaces [29].
References
[1] J. Talairach and P. Tournoux. Co-planar Stereotaxic Atlas of the Human Brain. 3-Dimensional Proportional System: An Approach to Cerebral Imaging. Thieme, 1988.
[2] J. D. G. Watson, R. Myers, et al. Area V5 of the human brain: evidence from a combined study using positron emission tomography and magnetic resonance imaging. Cerebral Cortex, 3:79–94, 1993.
[3] R. B. H. Tootell, J. B. Reppas, et al. Visual motion aftereffect in human cortical area MT revealed by functional magnetic resonance imaging. Nature, 375(6527):139–141, 1995.
[4] J. Mazziotta, A. Toga, et al. A probabilistic atlas and reference system for the human brain. Philosophical Transactions of the Royal Society B: Biological Sciences, 356(1412):1293–1322, 2001.
[5] B. Fischl, M. I. Sereno, R. B. H. Tootell, and A. M. Dale. High-resolution intersubject averaging and a coordinate system for the cortical surface. Human Brain Mapping, 8(4):272–284, 1999.
[6] M. Brett, I. S. Johnsrude, and A. M. Owen.
The problem of functional localization in the human brain. Nature Reviews Neuroscience, 3(3):243–249, 2002.
[7] M. R. Sabuncu, B. D. Singer, B. Conroy, R. E. Bryan, P. J. Ramadge, and J. V. Haxby. Function-based intersubject alignment of human cortical anatomy. Cerebral Cortex, 20(1):130–140, 2010.
[8] B. R. Conroy, B. D. Singer, J. V. Haxby, and P. J. Ramadge. fMRI-based inter-subject cortical alignment using functional connectivity. In Advances in Neural Information Processing Systems, 2009.
[9] B. R. Conroy, B. D. Singer, J. S. Guntupalli, P. J. Ramadge, and J. V. Haxby. Inter-subject alignment of human cortical anatomy using functional connectivity. NeuroImage, 2013.
[10] J. V. Haxby, J. S. Guntupalli, et al. A common, high-dimensional model of the representational space in human ventral temporal cortex. Neuron, 72(2):404–416, 2011.
[11] A. Lorbert and P. J. Ramadge. Kernel hyperalignment. In Advances in Neural Information Processing Systems, 2012.
[12] A. G. Huth, T. L. Griffiths, F. E. Theunissen, and J. L. Gallant. PrAGMATiC: a probabilistic and generative model of areas tiling the cortex. arXiv:1504.03622, 2015.
[13] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 2012.
[14] A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
[15] J.-H. Ahn and J.-H. Oh. A constrained EM algorithm for principal component analysis. Neural Computation, 15(1):57–65, 2003.
[16] J. R. Manning, R. Ranganath, K. A. Norman, and D. M. Blei. Topographic factor analysis: a Bayesian model for inferring brain networks from neural data. PLoS One, 9(5):e94914, 2014.
[17] A. M. Michael, M. Anderson, et al. Preserving subject variability in group fMRI analysis: performance evaluation of GICA vs. IVA. Frontiers in Systems Neuroscience, 8, 2014.
[18] H. Xu, A. Lorbert, P. J.
Ramadge, J. S. Guntupalli, and J. V. Haxby. Regularized hyperalignment of multi-set fMRI data. In Proc. IEEE Statistical Signal Processing Workshop, pages 229–232, 2012.
[19] H. Hotelling. Relations between two sets of variates. Biometrika, 28(3-4):321–377, 1936.
[20] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. John Wiley & Sons, 2004.
[21] J. Chen, Y. C. Leong, K. A. Norman, and U. Hasson. Reinstatement of neural patterns during narrative free recall. Abstracts of the Cognitive Neuroscience Society, 2014.
[22] D. S. Margulies, J. L. Vincent, et al. Precuneus shares intrinsic functional architecture in humans and monkeys. Proceedings of the National Academy of Sciences, 106(47):20069–20074, 2009.
[23] J. V. Haxby, M. I. Gobbini, M. L. Furey, A. Ishai, J. L. Schouten, and P. Pietrini. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539), 2001.
[24] M. Hanke, F. J. Baumgartner, et al. A high-resolution 7-Tesla fMRI dataset from complex natural stimulation with an audio movie. Scientific Data, 1, 2014.
[25] T. D. Griffiths and J. D. Warren. The planum temporale as a computational hub. Trends in Neurosciences, 25(7):348–353, 2002.
[26] Y. Yeshurun, S. Swanson, J. Chen, E. Simony, C. Honey, P. C. Lazaridi, and U. Hasson. How does the brain represent different ways of understanding the same story? Society for Neuroscience Abstracts, 2014.
[27] M. E. Raichle. The brain's default mode network. Annual Review of Neuroscience, 38(1), 2015.
[28] D. L. Ames, C. J. Honey, M. Chow, A. Todorov, and U. Hasson. Contextual alignment of cognitive and neural dynamics. Journal of Cognitive Neuroscience, 2014.
[29] M. T. deBettencourt, J. D. Cohen, R. F. Lee, K. A. Norman, and N. B. Turk-Browne. Closed-loop training of attention with real-time brain imaging.
Nature Neuroscience, 18(3):470–475, 2015.