{"title": "A state-space model of cross-region dynamic connectivity in MEG/EEG", "book": "Advances in Neural Information Processing Systems", "page_first": 1234, "page_last": 1242, "abstract": "Cross-region dynamic connectivity, which describes spatio-temporal dependence of neural activity among multiple brain regions of interest (ROIs), can provide important information for understanding cognition. For estimating such connectivity, magnetoencephalography (MEG) and electroencephalography (EEG) are well-suited tools because of their millisecond temporal resolution. However, localizing source activity in the brain requires solving an under-determined linear problem. In typical two-step approaches, researchers first solve the linear problem with general priors assuming independence across ROIs, and secondly quantify cross-region connectivity. In this work, we propose a one-step state-space model to improve estimation of dynamic connectivity. The model treats the mean activity in individual ROIs as the state variable, and describes non-stationary dynamic dependence across ROIs using time-varying auto-regression. Compared with a two-step method, which first obtains the commonly used minimum-norm estimates of source activity, and then fits the auto-regressive model, our state-space model yielded smaller estimation errors on simulated data where the model assumptions held. When applied on empirical MEG data from one participant in a scene-processing experiment, our state-space model also demonstrated intriguing preliminary results, indicating leading and lagged linear dependence between the early visual cortex and a higher-level scene-sensitive region, which could reflect feed-forward and feedback information flow within the visual cortex during scene processing.", "full_text": "A state-space model of cross-region dynamic

connectivity in MEG/EEG

Ying Yang∗ Elissa M. Aminoff† Michael J. Tarr∗ Robert E.
Kass∗

∗Carnegie Mellon University, †Fordham University

ying.yang.cnbc.cmu@gmail.com, {eaminoff@fordham, michaeltarr@cmu, kass@stat.cmu}.edu

Abstract

Cross-region dynamic connectivity, which describes the spatio-temporal dependence of neural activity among multiple brain regions of interest (ROIs), can provide important information for understanding cognition. For estimating such connectivity, magnetoencephalography (MEG) and electroencephalography (EEG) are well-suited tools because of their millisecond temporal resolution. However, localizing source activity in the brain requires solving an under-determined linear problem. In typical two-step approaches, researchers first solve the linear problem with generic priors assuming independence across ROIs, and secondly quantify cross-region connectivity. In this work, we propose a one-step state-space model to improve estimation of dynamic connectivity. The model treats the mean activity in individual ROIs as the state variable and describes non-stationary dynamic dependence across ROIs using time-varying auto-regression.
Compared with a two-step method, which first obtains the commonly used minimum-norm estimates of source activity, and then fits the auto-regressive model, our state-space model yielded smaller estimation errors on simulated data where the model assumptions held. When applied on empirical MEG data from one participant in a scene-processing experiment, our state-space model also demonstrated intriguing preliminary results, indicating leading and lagged linear dependence between the early visual cortex and a higher-level scene-sensitive region, which could reflect feedforward and feedback information flow within the visual cortex during scene processing.

1 Introduction

Cortical regions in the brain are anatomically connected, and the joint neural activity in connected regions is believed to underlie various perceptual and cognitive functions. Besides anatomical connectivity, researchers are particularly interested in the spatio-temporal statistical dependence across brain regions, which may vary quickly across the different time stages of perceptual and cognitive processes. Descriptions of such spatio-temporal dependence, which we call dynamic connectivity, not only help to model the joint neural activity, but also provide insights into how information flows in the brain. To estimate dynamic connectivity in human brains, we need non-invasive techniques that record neural activity with high temporal resolution. Magnetoencephalography (MEG) and electroencephalography (EEG) are well-suited tools for such purposes, in that they measure changes of magnetic fields or scalp voltages, which are almost instantaneously induced by the electric activity of neurons.
However, spatially localizing the source activity in MEG/EEG is challenging.
Assuming the brain source space is covered by m discrete points, each representing an electric current dipole generated by the activity of the local population of neurons, the readings of n MEG/EEG sensors can be approximated by a linear transformation of the m-dimensional source activity. The linear transformation, known as the forward model, is computed using Maxwell's equations given the relative positions of the sensors with respect to the scalp (1).

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

Typically m ≈ 10^3 ∼ 10^4 whereas n ≈ 10^2 ≪ m, so the source localization problem (estimating the source activity from the sensor data) is under-determined. Previous work has exploited various constraints or priors for regularization, including the L2 norm penalty (2; 3), sparsity-inducing penalties (4), and priors that encourage local spatial smoothness or temporal smoothness (5; 6; 7; 8).
When estimating dynamic connectivity from MEG/EEG recordings, especially among several pre-defined regions of interest (ROIs), researchers often use a two-step procedure: Step 1, estimate source activity using one of the common source localization methods, for example, the minimum norm estimate (MNE), which penalizes the squared L2 norm (2); Step 2, extract the mean activity of the source points within each ROI, and then quantify the statistical dependence among the ROIs, using methods ranging from pairwise correlations of time series to Granger causality and its extensions (9). However, most of the popular methods in Step 1 do not assume dependence across ROIs.
For example, MNE assumes independent and identical priors for all source points. Even in methods that assume auto-regressive structures of source activity (6; 8), only dependence on the one-step-back history of a source point itself and its adjacent neighbors is considered, while long-range dependence across ROIs is ignored. Biases due to these assumptions in Step 1 cannot be adjusted in Step 2 and thus may introduce additional errors into the connectivity analysis.
Alternatively, one can combine source localization and connectivity analysis jointly in one step. Two pioneering methods have explored this direction. Dynamic causal modeling (DCM (10)) assumes the source activity includes only a single current dipole in each ROI, and the ROI dipoles are modeled with a nonlinear, neurophysiology-informed dynamical system, where time-invariant coefficients describe how the current activity in each ROI depends on the history of all ROIs. Another method (11) does not use pre-defined ROIs, but builds a time-invariant multivariate auto-regressive (AR) model of all m source points, where the AR coefficients are constrained by structural white-matter connectivity and sparsity-inducing priors. Both methods use static parameters to quantify connectivity, but complex perceptual or cognitive processes may involve fast changes of neural activity, and correspondingly require time-varying models of dynamic connectivity.
Here, we propose a new one-step state-space model, designed to estimate dynamic spatio-temporal dependence across p given ROIs directly from MEG/EEG sensor data. We define the mean activity of the source points within each individual ROI as our p-dimensional state variable, and use a time-varying multivariate auto-regressive model to describe how much the activity in each ROI is predicted by the one-step-back activity in the p ROIs.
More specifically, we utilize the common multi-trial structure of MEG/EEG experiments, which gives independent observations at each time point and facilitates estimating the time-varying auto-regressive coefficients. Given the state variable at each time point, the activities of source points within each ROI are modeled as independent Gaussian variables, with the ROI activity as the mean and a shared ROI-specific variance; the activities of source points outside of all ROIs are also modeled as independent Gaussian variables, with a zero mean and a shared variance. Finally, along with the forward model that projects source activity to the sensor space, we build a direct relationship between the state variables (ROI activities) and the sensor observations, yielding a tractable Kalman filter model. Compared with the previous one-step methods (10; 11), the main novelty of our model is the time-varying description of connectivity. We note that the previous methods and our model all utilize specific assumptions to regularize the under-determined source localization problem. These assumptions may not be satisfied universally. However, we expect our model to serve as a good option in the one-step model toolbox for researchers, when its assumptions are reasonably met. In this paper, we mainly compare our model with a two-step procedure using the commonly applied MNE method, on simulated data and in a real-world MEG experiment.

2 Model

Model formulation  In MEG/EEG experiments, researchers typically acquire multiple trials of the same condition and treat them as independent and identically distributed (i.i.d.) samples. Each trial includes a fixed time window of (T + 1) time points, aligned to the stimulus onset. Assuming there are n sensors and q trials, we use y_t^{(r)} to denote the n-dimensional sensor readings at time t (t = 0, 1, 2, ..., T) in the r-th trial (r = 1, 2, ..., q).
To be more succinct, when alluding to the sensor readings in a generic trial without ambiguity, we drop the superscript (r) and use y_t instead; the same omission applies to the source activity and the latent ROI activity described below. We also assume the mean of the sensor data across trials is an n × (T + 1) zero matrix; this assumption can easily be met by subtracting the n × (T + 1) sample mean across trials from the data.
MEG and EEG are mainly sensitive to electric currents in the pyramidal cells, which are perpendicular to the folded cortical surfaces (12). Here we define the source space as a discrete mesh of m source points distributed on the cortical surfaces, where each source point represents an electric current dipole along the local normal direction. If we use an m-dimensional vector J_t to denote the source activity at time t in a trial, then the corresponding sensor data y_t have the following form

sensor model (forward model): y_t = G J_t + e_t,  e_t i.i.d.∼ N(0, Q_e),   (1)

where the n × m matrix G describes the linear projection of the source activity into the sensor space, and the sensor noise, e_t, is modeled as temporally independent draws from an n-dimensional Gaussian distribution N(0, Q_e). The noise covariance Q_e can be pre-measured using recordings in an empty room or in a baseline time window before the experimental tasks.
Standard source localization methods aim to solve for J_t given y_t, G and Q_e. In contrast, our model aims to estimate dynamic connectivity among p pre-defined regions of interest (ROIs) in the source space (see Figure 1 for an illustration). We assume that at each time point in each trial, the current dipoles of the source points within each ROI share a common mean. Given p ROIs, we have a p-dimensional state variable u_t at time t in a trial, where each element represents the mean activity in one ROI.
The state variable u_t follows a time-varying auto-regressive model of order 1,

ROI model: u_0 ∼ N(0, Q_0);  u_t = A_t u_{t−1} + ε_t,  ε_t ∼ N(0, Q),  for t = 1, ..., T,   (2)

where Q_0 is a p × p covariance matrix at t = 0, and the A_t's are the time-varying auto-regressive coefficients, which describe lagged dependence across ROIs. The p-dimensional Gaussian noise term ε_t is independent of the past, with a zero mean and a covariance matrix Q.

Figure 1: Illustration of the one-step state-space model

Now we describe how the source activity is distributed given the state variable (i.e., the ROI means). Below, we denote the l-th element of a vector a by a[l], and the entry in the i-th row and j-th column of a matrix L by L[i, j]. Let A_i be the set of indices of the source points in the i-th ROI (i = 1, 2, ..., p); then for any l ∈ A_i, the activity of the l-th source point at time t in a trial (the scalar J_t[l]) is modeled as the ROI mean plus noise,

J_t[l] = u_t[i] + w_t[l],  w_t[l] i.i.d.∼ N(0, σ_i²),  ∀ l ∈ A_i,   (3)

where w_t denotes the m-dimensional noise on the m source points given the ROI means u_t, at time t in the trial. Note that the mean u_t[i] is shared by all source points within the i-th ROI, and the noise term w_t[l] given the mean is independent and identically distributed as N(0, σ_i²) for all source points within the ROI, at any time in any trial.
Additionally, we denote the indices of the source points outside of any ROI by A_0 = {l : l ∉ ∪_{i=1}^p A_i}, and similarly, for each such source point, we also assume its activity at time t in each trial has a Gaussian distribution, but with a zero mean and a variance σ_0²,

J_t[l] = 0 + w_t[l],  w_t[l] i.i.d.∼ N(0, σ_0²),  ∀ l ∈ A_0.   (4)

We can concisely re-write (3) and (4) as

source model: J_t = L u_t + w_t,  w_t i.i.d.∼ N(0, Q_J),   (5)

where L is a 0/1 m × p matrix indicating whether a source point is in an ROI (i.e., L[l, i] = 1 if l ∈ A_i and L[l, i] = 0 otherwise). The covariance Q_J is an m × m diagonal matrix, where each diagonal element is one of {σ_0², σ_1², ..., σ_p²}, depending on which region the corresponding source point is in; that is, Q_J[l, l] = σ_1² if l ∈ A_1, Q_J[l, l] = σ_2² if l ∈ A_2 and so on, and Q_J[l, l] = σ_0² if l ∈ A_0 (outside of any ROI).
Combining the conditional distributions of (y_t | J_t) given by (1) and (J_t | u_t) given by (5), we can eliminate J_t (by integrating over all values of J_t) and obtain the following conditional distribution for (y_t | u_t):

y_t = C u_t + η_t,  η_t i.i.d.∼ N(0, R),  where C = G L,  R = Q_e + G Q_J G′,   (6)

where G′ is the transpose of G. Putting (2) and (6) together, we have a time-varying Kalman filter model, where the observed sensor data from the q trials {y_t^{(r)}}_{t=0,...,T; r=1,...,q} and the parameters Q_e, G and L are given, and the unknown set of parameters θ = {{A_t}_{t=1}^T, Q_0, Q, {σ_i²}_{i=0}^p} is to be estimated. Among these parameters, we are mainly interested in {A_t}_{t=1}^T, which describes the spatio-temporal dependence.
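To make the model concrete, the generative process in (2), (5) and (6) can be sketched in numpy as follows. All dimensions and parameter values here are toy choices for illustration, far smaller than a real MEG setup, and are not the values used in the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only).
n, m, p, T = 8, 30, 2, 10

G = rng.standard_normal((n, m))        # forward model (given)
Q_e = 0.1 * np.eye(n)                  # sensor-noise covariance (given)

# ROI membership: L[l, i] = 1 iff source point l lies in ROI i (eq. 5).
rois = [np.arange(0, 10), np.arange(10, 20)]   # A_1, A_2; the rest is A_0
L = np.zeros((m, p))
for i, idx in enumerate(rois):
    L[idx, i] = 1.0

# Source-noise variances: sigma_i^2 inside ROI i, sigma_0^2 elsewhere.
sigma2_roi, sigma2_0 = np.array([0.5, 0.8]), 0.3
diag_QJ = np.full(m, sigma2_0)
for i, idx in enumerate(rois):
    diag_QJ[idx] = sigma2_roi[i]
Q_J = np.diag(diag_QJ)

# Collapsed observation model (eq. 6): y_t = C u_t + eta_t.
C = G @ L
R = Q_e + G @ Q_J @ G.T

# ROI model (eq. 2): time-varying AR(1) on the ROI means.
A = rng.uniform(-0.4, 0.4, size=(T, p, p))     # A_1, ..., A_T
Q = 0.2 * np.eye(p)
u = np.zeros((T + 1, p))
y = np.zeros((T + 1, n))
u[0] = rng.multivariate_normal(np.zeros(p), np.eye(p))
y[0] = rng.multivariate_normal(C @ u[0], R)
for t in range(1, T + 1):
    u[t] = A[t - 1] @ u[t - 1] + rng.multivariate_normal(np.zeros(p), Q)
    y[t] = rng.multivariate_normal(C @ u[t], R)
```

Note that the m-dimensional source activity J_t never has to be sampled to generate the sensor data: integrating it out is exactly what reduces the model to a p-dimensional Kalman filter.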
Let f(·) denote probability density functions in general. We can add optional priors on θ (denoted by f(θ)) to regularize the parameters. For example, we can use

f(θ) = f({A_t}_{t=1}^T) ∝ exp(−(λ_0 Σ_{t=1}^T ‖A_t‖_F² + λ_1 Σ_{t=2}^T ‖A_t − A_{t−1}‖_F²)),

which penalizes the squared Frobenius norm (‖·‖_F) of the A_t's and encourages temporal smoothness.

Fitting the parameters using the expectation-maximization (EM) algorithm  To estimate θ, we maximize the objective function log f({y_t^{(r)}}_{t=0,...,T; r=1,...,q}; θ) + log f(θ) using the standard expectation-maximization (EM) algorithm (13). Here log f({y_t^{(r)}}; θ) is the marginal log-likelihood of the sensor data, and log f(θ) is the logarithm of the prior. We alternate between an E-step and an M-step.
In the E-step, given an estimate of the parameters (denoted by θ̃), we use the forward and backward passes of the Kalman smoothing algorithm (13) to obtain, for each t in each trial r, the posterior mean of u_t, u_{t|T}^{(r)} := E(u_t^{(r)} | {y_τ^{(r)}}_{τ=0}^T), the posterior covariance of u_t, P_{t|T}^{(r)} := cov(u_t^{(r)} | {y_τ^{(r)}}_{τ=0}^T), and the posterior cross-covariance of u_t and u_{t−1}, P_{(t,t−1)|T}^{(r)} := cov(u_t^{(r)}, u_{t−1}^{(r)} | {y_τ^{(r)}}_{τ=0}^T). Here E(·) and cov(·) denote the expectation and the covariance. More details are in the appendix and in (13).
In the M-step, we maximize the expectation of log f({y_t^{(r)}}, {u_t^{(r)}}; θ) + log f(θ) with respect to the posterior distribution f̃ := f({u_t^{(r)}} | {y_t^{(r)}}; θ̃). Let tr(·) and det(·) denote the trace and the determinant of a matrix. Given the results of the E-step based on θ̃, the M-step is equivalent to minimizing three objectives separately:

min_θ ( −E_f̃[ log f({y_t^{(r)}}, {u_t^{(r)}}; θ) ] − log f(θ) ) ≡ min_{Q_0} L1 + min_{Q, {A_t}_{t=1}^T} L2 + min_{{σ_i}_{i=0}^p} L3,   (7)

L1(Q_0) = q log det(Q_0) + tr(Q_0^{−1} B_0),  where B_0 = Σ_{r=1}^q (P_{0|T}^{(r)} + u_{0|T}^{(r)} (u_{0|T}^{(r)})′),   (8)

L2(Q, {A_t}_{t=1}^T) = qT log det(Q) + tr(Q^{−1} Σ_{t=1}^T (B_{1t} − A_t B_{2t}′ − B_{2t} A_t′ + A_t B_{3t} A_t′)) − log f({A_t}_{t=1}^T),
  where B_{1t} = Σ_{r=1}^q (P_{t|T}^{(r)} + u_{t|T}^{(r)} (u_{t|T}^{(r)})′),  B_{2t} = Σ_{r=1}^q (P_{(t,t−1)|T}^{(r)} + u_{t|T}^{(r)} (u_{(t−1)|T}^{(r)})′),  B_{3t} = Σ_{r=1}^q (P_{(t−1)|T}^{(r)} + u_{(t−1)|T}^{(r)} (u_{(t−1)|T}^{(r)})′),   (9)

L3({σ_i}_{i=0}^p) = q(T + 1) log det(R) + tr(R^{−1} B_4),  where R = Q_e + G Q_J G′
  and B_4 = Σ_{t=0}^T Σ_{r=1}^q [ (y_t^{(r)} − C u_{t|T}^{(r)})(y_t^{(r)} − C u_{t|T}^{(r)})′ + C P_{t|T}^{(r)} C′ ].   (10)

The optimization of the three separate objectives is relatively easy.

• For L1, the analytical solution is Q_0 ← (1/q) B_0.
• For L2, the optimization of {A_t}_{t=1}^T and Q can be done in alternation. Given {A_t}_{t=1}^T, Q has the analytical solution Q ← (1/(qT)) Σ_{t=1}^T (B_{1t} − A_t B_{2t}′ − B_{2t} A_t′ + A_t B_{3t} A_t′). Given Q, we use gradient descent with back-tracking line search (14) to solve for {A_t}_{t=1}^T, where the gradients are ∂L2/∂A_t = 2Q^{−1}(−B_{2t} + A_t B_{3t}) + 2D_t, with D_t = λ_1(2A_t − A_{t+1} − A_{t−1}) + λ_0 A_t for t = 2, ..., T − 1, D_t = λ_1(A_1 − A_2) + λ_0 A_1 for t = 1, and D_t = λ_1(A_T − A_{T−1}) + λ_0 A_T for t = T.
• For L3, we can also use gradient descent to solve for σ_i, with the gradient ∂L3/∂σ_i = tr((∂L3/∂R)′ ∂R/∂σ_i), where ∂L3/∂R = R^{−1} − R^{−1} B_4 R^{−1} and ∂R/∂σ_i = 2σ_i G[:, l ∈ A_i] G[:, l ∈ A_i]′. Here G[:, l ∈ A_i] denotes the columns of G corresponding to the source points in the i-th region.

Because the EM algorithm is only guaranteed to find a local optimum, we use multiple initializations, and select the solution that yields the best objective function log f({y_t^{(r)}}; θ) + log f(θ) (see the appendix on computing log f({y_t^{(r)}}; θ)). The implementation of the model and the EM algorithm in Python is available at github.com/YingYang/MEEG_connectivity.

Visualizing the connectivity  We visualize the lagged linear dependence between any pair of ROIs. According to the auto-regressive model in (2), given {A_t}_{t=1}^T, we can characterize the linear dependence of the ROI means at time t + h on those at time t by

u_{t+h} = Ã_{t,t+h} u_t + noise independent of u_t,

where Ã_{t,t+h} = ∏_{τ=t+h}^{t+1} A_τ = A_{t+h} A_{t+h−1} ··· A_{t+1}; that is, in the product the time index τ decreases from t + h to t + 1.
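The ordered product Ã_{t,t+h} can be sketched as follows (the function name and 1-based indexing convention are illustrative):

```python
import numpy as np

def propagated_coeffs(A, t, h):
    """Compute Atilde_{t,t+h} = A_{t+h} A_{t+h-1} ... A_{t+1}.

    A is a length-T sequence of p x p matrices, with A[tau - 1]
    holding A_tau (1-based time index, as in the text).
    """
    p = A[0].shape[0]
    Atilde = np.eye(p)
    for tau in range(t + h, t, -1):   # tau decreases from t+h to t+1
        Atilde = Atilde @ A[tau - 1]  # right-multiply: leftmost factor is A_{t+h}
    return Atilde
```

The entries of Δ for a pair of ROIs are then read off the off-diagonal entries of these products, one product per (t, h) combination.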
For two ROIs indexed by i_1 and i_2, Ã_{t,t+h}[i_1, i_2] indicates the linear dependence of the activity in ROI i_1 at time t + h on the activity in ROI i_2 at time t, where the linear dependence on the activity at time t in the other ROIs and in ROI i_1 itself is accounted for; similarly, Ã_{t,t+h}[i_2, i_1] indicates the linear dependence of the activity in ROI i_2 at time t + h on the activity in ROI i_1 at time t. Therefore, we can create a T × T matrix Δ for any pair of ROIs (i_1 and i_2) to describe their linear dependence at any time lag: Δ[t, t + h] = Ã_{t,t+h}[i_2, i_1] (i_1 leading i_2) and Δ[t + h, t] = Ã_{t,t+h}[i_1, i_2] (i_2 leading i_1), for t = 1, ..., T and h = 1, ..., T − t − 1.

3 Results

To examine whether our state-space model can improve dynamic connectivity estimation empirically, compared with the two-step procedure, we applied both approaches to simulated and real MEG data. We implemented the following two-step method as a baseline for comparison. In Step 1, we applied the minimum-norm estimate (MNE (2)), one of the most commonly used source localization methods, to estimate J_t for each time point in each trial. This is a Bayesian estimate assuming an L2 prior on the source activity. Given G, Q_e, a prior J_t ∼ N(0, (1/λ)I) with λ > 0, and the corresponding y_t, the estimate is J_t ← G′(G G′ + λ Q_e)^{−1} y_t. We averaged the MNE estimates of the source points within each ROI, at each time point and in each trial respectively, and treated the averages as estimates of the ROI means {u_t^{(r)}}_{t=0,...,T; r=1,...,q}. In Step 2, according to the auto-regressive model in (2), we estimated Q_0, {A_t}_{t=1}^T and Q by maximizing the sum of the log-likelihood and the logarithm of the prior, log f({u_t^{(r)}}_{t=0,...,T; r=1,...,q}) + log f({A_t}_{t=1}^T); the maximization is very similar to the optimization of L2 in the M-step.
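Step 1 of this two-step baseline can be sketched as follows. This is a minimal numpy sketch, not the MNE-python API; the function name, array layout, and λ value are illustrative.

```python
import numpy as np

def mne_roi_means(Y, G, Q_e, rois, lam=1.0):
    """Two-step baseline, Step 1: MNE source estimates averaged within ROIs.

    Y    : (q, T+1, n) sensor data
    G    : (n, m) forward model
    Q_e  : (n, n) sensor-noise covariance
    rois : list of p index arrays A_1, ..., A_p into the m source points
    lam  : L2 penalty weight (illustrative default)
    Returns a (q, T+1, p) array of estimated ROI means.
    """
    n = G.shape[0]
    # Minimum-norm estimate: J_t <- G' (G G' + lam * Q_e)^{-1} y_t,
    # i.e., one fixed (m x n) linear operator applied to every y_t.
    K = G.T @ np.linalg.solve(G @ G.T + lam * Q_e, np.eye(n))
    J = Y @ K.T                     # (q, T+1, m) source estimates
    # Average the source estimates within each ROI (input to Step 2).
    return np.stack([J[..., idx].mean(axis=-1) for idx in rois], axis=-1)
```

Step 2 then fits the AR model of (2) to the returned ROI-mean time series.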
Details are deferred to the appendix.

3.1 Simulation

We simulated MEG sensor data according to our model assumptions. The source space was defined as m ≈ 5000 source points covering the cortical surfaces of a real brain, with 6.2 mm spacing on average, and n = 306 sensors were used. The sensor noise covariance matrix Q_e was estimated from real data. Two bilaterally merged ROIs were used: the pericalcarine area (ROI 1) and the parahippocampal gyri (ROI 2) (see Figure 2a). We selected these two regions because they were of interest when we applied the models to the real MEG data (see Section 3.2). We generated the auto-regressive coefficients for T = 20 time points, where for each A_t, the diagonal entries were set to 0.5, and the off-diagonal entries were generated as a Morlet function multiplied by a random scalar drawn uniformly from the interval (−1, 1) (see Figure 2b for an example). The covariances Q_0 and Q were random positive definite matrices, whose diagonal entries were a constant a. The variances of the source-space noise {σ_i²}_{i=0}^p were randomly drawn from a Gamma distribution with shape parameter 2 and scale parameter 1. We used two different values, a = 2 and a = 5, for which the relative strengths of the ROI means compared with the source variances {σ_i²}_{i=0}^p were different. Each simulation had q = 200 trials, and 5 independent simulations were generated for each a value. The unit of the source activity was nanoampere-meter (nAm).
When running the two-step MNE method for each simulation, a wide range of penalization values (λ) was used. When fitting the state-space model, multiple initializations were used, including one of the two-step MNE estimates. In the prior of {A_t}_{t=1}^T, we set λ_0 = 0 and λ_1 = 0.1.
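The generation of the simulated coefficients {A_t} can be sketched as follows. The exact width and frequency of the Morlet window are not specified in the text, so the parameterization below is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, p = 20, 2

# A real-valued Morlet-like window over the T time points
# (assumed parameterization; the paper does not give width/frequency).
tt = np.linspace(-2.0, 2.0, T)
morlet = np.exp(-tt**2 / 2.0) * np.cos(5.0 * tt)

# A_t: diagonal entries fixed at 0.5; each off-diagonal entry follows the
# Morlet window scaled by one scalar drawn uniformly from (-1, 1).
A = np.zeros((T, p, p))
for i in range(p):
    for j in range(p):
        if i == j:
            A[:, i, j] = 0.5
        else:
            A[:, i, j] = rng.uniform(-1.0, 1.0) * morlet
```

Each off-diagonal time course thus rises and falls smoothly, mimicking a transient cross-ROI dependence like the one in Figure 2b.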
For the fitted parameters {A_t}_{t=1}^T and Q, we defined the relative error as the Frobenius norm of the difference between the estimate and the true parameter, divided by the Frobenius norm of the true parameter (e.g., for the true Q and the estimate Q̂, the relative error was ‖Q̂ − Q‖_F / ‖Q‖_F). Among the two-step MNE estimates with different λ's, the smallest relative error was selected for comparison. Figure 2c and 2d show the relative errors and the paired differences in errors between the two methods; in these simulations, the state-space model yielded smaller estimation errors than the two-step MNE method.

Figure 2: Simulation results. (a), Illustration of the two ROIs. (b), The auto-regressive coefficients {A_t}_{t=1}^T over T = 20 time points in one example simulation (a = 5). Here A[:, i_1, i_2] indicates the time-varying coefficient A_t[i_1, i_2], for i_1, i_2 = 1, 2. (Legends: truth (blue), true values; ss (green), estimates by the state-space model; mne (red), estimates by the two-step MNE method.) (c) and (d), Comparison of the state-space model (ss) with the two-step MNE method (mne) in the relative errors of {A_t}_{t=1}^T (c) and Q (d). The error bars show standard errors across individual simulations.

3.2 Real MEG data on scene processing

We also applied our state-space model and the two-step MNE method to real MEG data, to explore the dynamic connectivity in the visual cortex during scene processing. It is hypothesized that the ventral visual pathway, which underlies recognition of what we see, is organized in a hierarchical manner: along the pathway, regions at each level of the hierarchy receive inputs from previous levels, and perform transformations to extract features that are increasingly related to semantics (e.g., categories of objects/scenes) (15).
Besides such feedforward processing, a large number of top-down anatomical connections along the hypothesized hierarchy also suggest feedback effects (16). Evidence for both directions has been reported previously (17; 18). However, the details of the dynamic information flow during scene processing, such as when and how significant the feedback effect is, are not well understood. Here, as an exploratory step, we estimate the dynamic connectivity between two regions in the ventral pathway: the early visual cortex (EVC) at the lowest level (in the pericalcarine areas), which is hypothesized to process low-level features such as local edges, and the parahippocampal place area (PPA), a scene-sensitive region at a higher level of the hierarchy that has been implicated in processing semantic information (19).
The 306-channel MEG data were recorded while a human participant was viewing 362 photos of various scenes. Each image was presented for 200 ms and repeated 5 times across the session, and the data across the repetitions were averaged, resulting in q = 362 observations. The data were down-sampled from a sampling rate of 1 kHz to 100 Hz, and cropped within −100 ∼ 700 ms, where 0 ms marked the stimulus onset. Together, we had T + 1 = 80 time points (see the appendix for more preprocessing details). Given the data, we estimated the dynamic connectivity between the neural responses to the 362 images in the two ROIs (EVC and PPA), using our state-space model and the two-step MNE method. We created a source space including m ≈ 5000 source points for the participant.
In the prior of {A_t}_{t=1}^T, we set λ_0 = 0 and λ_1 = 1.0; in the two-step MNE method, we used the default value of the tuning parameter (λ) for single-trial data in the MNE-python software (20). After fitting Q_0, {A_t}_{t=1}^T and Q, we computed the Δ matrix, as defined in Section 2, to visualize the lagged linear dependence between the two ROIs (EVC and PPA). We also bootstrapped the 362 observations 27 times to obtain standard deviations of the entries in Δ, and then computed a z-score for each entry, defined as the ratio between the estimated value and the bootstrapped standard deviation. Note that the sign of the source activity only indicates the direction of the electric current, so negative entries in Δ are as meaningful as positive ones. We ran two-tailed z-tests on the z-scores (assuming a standard normal null distribution); we then plotted the absolute values of the z-scores that passed a threshold of p-value < 0.05/T², using the Bonferroni correction for the T² comparisons over all entries (Figure 3). Larger absolute values indicate more significant non-zero entries of Δ, and thus more significant lagged linear dependence. As illustrated in Figure 3a, the lower right triangle of Δ indicates the linear dependence of PPA activity on previous EVC activity (EVC leading PPA, lower- to higher-level), whereas the upper left triangle indicates the linear dependence of EVC activity on previous PPA activity (PPA leading EVC, higher- to lower-level).

Figure 3: Results from real MEG data on scene processing. (a), Illustration of the ROIs and the triangular parts of Δ.
(b) and (c), Thresholded z-scores of Δ by the state-space model (b) and by the two-step MNE method (c).

Figure 3b and 3c show the thresholded absolute values of the z-scores by the state-space model and by the two-step MNE method. In Figure 3b, by the state-space model, we observed clusters indicating significant non-zero lagged dependence in the lower right triangle, spanning roughly from 60 to 280 ms in EVC and from 120 to 300 ms in PPA, which suggests that earlier responses in EVC can predict later responses in PPA in these windows. This pattern could result from feedforward information flow, which starts when EVC first receives the visual input near 60 ms. In the upper left triangle, we also observed clusters spanning from 100 to 220 ms in PPA and from 140 to 300 ms in EVC, suggesting that earlier responses in PPA can predict later responses in EVC, which could reflect feedback along the top-down direction of the hierarchy. Figure 3c, by the two-step MNE method, also shows clusters in similar time windows, yet the earliest cluster in the lower right triangle appeared before 0 ms in EVC, which could be a false positive, as visual input is unlikely to reach EVC that early.
We also observed a small cluster in the top right corner near the diagonal with both methods. This cluster could indicate late dependence between the two regions, but it occurred later than the typical evoked responses before 500 ms. These preliminary results were based on only one participant, and further analysis with more participants is needed.
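The bootstrap z-scoring and Bonferroni thresholding used above can be sketched as follows (the function name and array layout are illustrative; the bootstrap resampling itself is omitted):

```python
import numpy as np
from scipy import stats

def threshold_z(delta_hat, delta_boot, T, alpha=0.05):
    """Bonferroni-thresholded bootstrap z-scores for the entries of Delta.

    delta_hat  : (T, T) estimated Delta matrix
    delta_boot : (n_boot, T, T) Delta estimates from bootstrap resamples
    Returns |z|, with entries failing the two-tailed test at
    p < alpha / T^2 (Bonferroni over all T^2 entries) set to 0.
    """
    # z-score: estimate divided by its bootstrapped standard deviation.
    z = delta_hat / delta_boot.std(axis=0, ddof=1)
    # Two-tailed p-values under a standard normal null.
    pvals = 2.0 * stats.norm.sf(np.abs(z))
    z_abs = np.abs(z)
    z_abs[pvals >= alpha / T**2] = 0.0
    return z_abs
```

Because only the surviving absolute z-scores are plotted, both strongly positive and strongly negative entries of Δ appear in the figure, consistent with the sign ambiguity of the source currents.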
In addition, the apparent lagged dependence between the two regions does not necessarily reflect direct or causal interactions; instead, it could be mediated by other intermediate or higher-level regions, as well as by stimulus-driven effects. For example, the disappearance of the stimuli at 200 ms could cause an image-specific offset response starting at 260 ms in EVC, which could make it seem that image-specific responses in PPA near 120 ms predicted the responses in EVC after 260 ms. Therefore, further analysis including more regions is needed, and the stimulus-driven effect needs to be considered as well. Nevertheless, the interesting patterns in Figure 3b suggest that our one-step state-space model can be a promising tool for exploring the timing of feedforward and feedback processing in a data-driven manner, and such analysis can help generate specific hypotheses about information flow for further experimental testing.

4 Discussion

We propose a state-space model that directly estimates the dynamic connectivity across regions of interest from MEG/EEG data, with the source localization step embedded. In this model, the mean activities in individual ROIs (i.e., the state variable) are modeled with time-varying auto-regression, which can flexibly describe the spatio-temporal dependence of non-stationary neural activity. Compared with a two-step method, which first obtains the commonly used minimum-norm estimate of source activity and then fits the auto-regressive model, our state-space model yielded smaller estimation errors on simulated data, where the assumptions in our model held.
When applied to empirical MEG data from one participant in a scene-processing experiment, our state-space model also demonstrated intriguing preliminary results, indicating leading and lagged linear dependence between the early visual cortex and a higher-level scene-sensitive region, which could reflect feedforward and feedback information flow within the visual cortex. In sum, these results shed some light on how to better study dynamic connectivity using MEG/EEG and how to exploit the estimated connectivity to study information flow in cognition.

One limitation of the present work is that we did not compare with other one-step models (10; 11). In future work, we plan to conduct comprehensive empirical evaluations of the available one-step methods. Another issue is that our model assumptions can be violated in practice. First, given the ROI means, the noise on source points could be spatially and temporally correlated, rather than independently distributed. Secondly, if we fail to include an important ROI, the connectivity estimates may be inaccurate; due to the under-determined nature of source localization, the estimates may not even be equivalent to those obtained when this ROI is marginalized out. Thirdly, the assumption that source points within an ROI share a common mean is typically correct for small ROIs but could be less accurate for larger ROIs, where the diverse activities of many source points might not be well represented by a one-dimensional mean activity. That being said, as long as the activity of different source points within an ROI does not fully cancel, positive dependence effects of the kind identified by our model would still be meaningful, in the sense that they reflect some cross-region dependence. To deal with the last two issues, one may divide the entire source space into sufficiently small, non-overlapping ROIs when applying our state-space model.
In such cases, the number of parameters can be large, and some sparsity-inducing regularization (such as the one in (11)) can be applied. In ongoing and future work, we plan to explore this idea and also address the effect of potential assumption violations.

Acknowledgments

This work was supported in part by the National Science Foundation Grant 1439237, the National Institute of Mental Health Grant R01 MH64537, as well as the Henry L. Hillman Presidential Fellowship at Carnegie Mellon University.

References

[1] J. C. Mosher, R. M. Leahy, and P. S. Lewis. EEG and MEG: forward solutions for inverse methods. Biomedical Engineering, IEEE Transactions on, 46(3):245–259, 1999.

[2] M. Hamalainen and R. Ilmoniemi. Interpreting magnetic fields of the brain: minimum norm estimates. Med. Biol. Eng. Comput., 32:35–42, 1994.

[3] A. M. Dale, A. K. Liu, B. R. Fischl, R. L. Buckner, J. W. Belliveau, J. D. Lewine, and E. Halgren. Dynamic statistical parametric mapping: combining fMRI and MEG for high-resolution imaging of cortical activity. Neuron, 26(1):55–67, 2000.

[4] A. Gramfort, M. Kowalski, and M. Hamalainen. Mixed-norm estimates for the M/EEG inverse problem using accelerated gradient methods. Physics in Medicine and Biology, 57:1937–1961, 2012.

[5] R. D. Pascual-Marqui, C. M. Michel, and D. Lehmann. Low resolution electromagnetic tomography: a new method for localizing electrical activity in the brain. International Journal of Psychophysiology, 18(1):49–65, 1994.

[6] A. Galka, O. Yamashita, T. Ozaki, R. Biscay, and P. Valdes-Sosa. A solution to the dynamical inverse problem of EEG generation using spatiotemporal Kalman filtering. NeuroImage, 23:435–453, 2004.

[7] J. Mattout, C. Phillips, W. D. Penny, M. D. Rugg, and K. J. Friston. MEG source localization under multiple constraints: an extended Bayesian framework. NeuroImage, 30(3):753–767, 2006.

[8] C. Lamus, M. S. Hamalainen, S.
Temereanca, E. N. Brown, and P. L. Purdon. A spatiotemporal dynamic distributed solution to the MEG inverse problem. NeuroImage, 63:894–909, 2012.

[9] V. Sakkalis. Review of advanced techniques for the estimation of brain connectivity measured with EEG/MEG. Computers in Biology and Medicine, 41(12):1110–1117, 2011.

[10] O. David, S. J. Kiebel, L. M. Harrison, J. Mattout, J. M. Kilner, and K. J. Friston. Dynamic causal modeling of evoked responses in EEG and MEG. NeuroImage, 30(4):1255–1272, 2006.

[11] M. Fukushima, O. Yamashita, T. R. Knösche, and M.-a. Sato. MEG source reconstruction based on identification of directed source interactions on whole-brain anatomical networks. NeuroImage, 105:408–427, 2015.

[12] M. Hamalainen, R. Hari, R. J. Ilmoniemi, J. Knuutila, and O. V. Lounasmaa. Magnetoencephalography–theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of Modern Physics, 65:413–497, 1993.

[13] R. H. Shumway and D. S. Stoffer. An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 3(4):253–264, 1982.

[14] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, New York, NY, USA, 2004.

[15] J. J. DiCarlo and D. D. Cox. Untangling invariant object recognition. Trends in Cognitive Sciences, 11(8):333–341, 2007.

[16] D. J. Felleman and D. C. Van Essen. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1):1–47, 1991.

[17] R. M. Cichy, A. Khosla, D. Pantazis, A. Torralba, and A. Oliva. Deep neural networks predict hierarchical spatio-temporal cortical dynamics of human visual object recognition. arXiv preprint arXiv:1601.02970, 2016.

[18] M. Bar, K. S. Kassam, A. S. Ghuman, J. Boshyan, A. M. Schmid, A. M. Dale, M. S. Hämäläinen, K. Marinkovic, D. L. Schacter, B. R. Rosen, et al.
Top-down facilitation of visual recognition. Proceedings of the National Academy of Sciences of the United States of America, 103(2):449–454, 2006.

[19] R. Epstein, A. Harris, D. Stanley, and N. Kanwisher. The parahippocampal place area: Recognition, navigation, or encoding? Neuron, 23(1):115–125, 1999.

[20] A. Gramfort, M. Luessi, E. Larson, D. A. Engemann, D. Strohmeier, C. Brodbeck, R. Goj, M. Jas, T. Brooks, L. Parkkonen, et al. MEG and EEG data analysis with MNE-Python. Frontiers in Neuroscience, 7:267, 2013.