{"title": "Invariant Common Spatial Patterns: Alleviating Nonstationarities in Brain-Computer Interfacing", "book": "Advances in Neural Information Processing Systems", "page_first": 113, "page_last": 120, "abstract": null, "full_text": "Invariant Common Spatial Patterns: Alleviating Nonstationarities in Brain-Computer Interfacing

Benjamin Blankertz1,2, Motoaki Kawanabe2, Ryota Tomioka3, Friederike U. Hohlefeld4, Vadim Nikulin5, Klaus-Robert Müller1,2

1TU Berlin, Dept. of Computer Science, Machine Learning Laboratory, Berlin, Germany
2Fraunhofer FIRST (IDA), Berlin, Germany
3Dept. Mathematical Informatics, IST, The University of Tokyo, Japan
4Berlin School of Mind and Brain, Berlin, Germany
5Dept. of Neurology, Campus Benjamin Franklin, Charité University Medicine Berlin, Germany

{blanker,krm}@cs.tu-berlin.de

Abstract

Brain-Computer Interfaces can suffer from a large variance of the subject conditions within and across sessions. For example, vigilance fluctuations in the individual, variable task involvement, workload etc. alter the characteristics of EEG signals and thus challenge a stable BCI operation. In the present work we aim to define features based on a variant of the common spatial patterns (CSP) algorithm that are constructed invariant with respect to such nonstationarities. We enforce invariance properties by adding terms to the denominator of a Rayleigh coefficient representation of CSP, such as disturbance covariance matrices from fluctuations in visual processing. In this manner physiological prior knowledge can be used to shape the classification engine for BCI. As a proof of concept we present a BCI classifier that is robust to changes in the level of parietal α-activity.
In other words, the EEG decoding still works when there are lapses in vigilance.

1 Introduction

Brain-Computer Interfaces (BCIs) translate the intent of a subject measured from brain signals directly into control commands, e.g. for a computer application or a neuroprosthesis ([1, 2, 3, 4, 5, 6]). The classical approach to brain-computer interfacing is operant conditioning ([2, 7]), where a fixed translation algorithm is used to generate a feedback signal from the electroencephalogram (EEG). Users are not equipped with a mental strategy they should use; rather, they are instructed to watch a feedback signal and to use it to find ways to voluntarily control it. Successful BCI operation is reinforced by a reward stimulus. In such BCI systems the user adaptation is crucial and typically requires extensive training. Recently machine learning techniques were applied to the BCI field and made it possible to decode the subject's brain signals, placing the learning task on the machine side, i.e. a general translation algorithm is trained to infer the specific characteristics of the user's brain signals [8, 9, 10, 11, 12, 13, 14]. This is done by a statistical analysis of a calibration measurement in which the subject performs well-defined mental acts like imagined movements. Here, in principle, no adaptation of the user is required, but it is to be expected that users will adapt their behaviour during feedback operation. The idea of the machine learning approach is that a flexible adaptation of the system relieves a good amount of the learning load from the subject. Most BCI systems are somewhere between those extremes.

Although the proof-of-concept of machine learning based BCI systems1 was given some years ago, several major challenges are still to be faced. One of them is to make the system invariant to non task-related fluctuations of the measured signals during feedback.
These fluctuations may be caused by changes in the subject's brain processes, e.g. a change of task involvement, fatigue etc., or by artifacts such as swallowing, blinking or yawning. The calibration measurement that is used for training in machine learning techniques is recorded during 10-30 min, i.e. a relatively short period of time and typically in a monotone atmosphere, so this data does not contain all possible kinds of variations to be expected during on-line operation.

The present contribution focuses on invariant feature extraction for BCI. In particular we aim to enhance the invariance properties of the common spatial patterns (CSP, [15]) algorithm. CSP is the solution of a generalized eigenvalue problem and as such has a strong link to the maximization of a Rayleigh coefficient, similar to Fisher's discriminant analysis. Prior work by Mika et al. [16] in the context of kernel Fisher's discriminant analysis contains the key idea that we will follow: noise and distracting signal aspects, with respect to which we want to make our feature extractor invariant, are added to the denominator of a Rayleigh coefficient. In other words, our prior knowledge about the noise type helps to re-design the optimization of CSP feature extraction. We demonstrate how our invariant CSP (iCSP) technique can be used to make a BCI system invariant to changes in the power of the parietal α-rhythm (see Section 2), reflecting, e.g., changes in vigilance. Vigilance changes are among the most pressing challenges when robustifying a BCI system for long-term real-world applications.

In principle we could also use an adaptive BCI; however, adaptation typically has a limited time scale which might not allow it to follow fluctuations quickly enough. Furthermore, online adaptive BCI systems have so far only been operated with 4-9 channels.
We would like to stress that adaptation and invariant classification are not mutually exclusive alternatives but rather complementary approaches striving for the same goal: a BCI system that is invariant to undesired distortions and nonstationarities.

2 Neurophysiology and Experimental Paradigms

Neurophysiological background. Macroscopic brain activity during resting wakefulness contains distinct 'idle' rhythms located over various brain areas, e.g. the parietal α-rhythm (7-13 Hz) can be measured over the visual cortex [17], and the μ-rhythm can be measured over the pericentral sensorimotor cortices in the scalp EEG, usually with a frequency of about 8-14 Hz ([18]). The strength of the parietal α-rhythm reflects visual processing load as well as attention, fatigue and vigilance.

The moment-to-moment amplitude fluctuations of these local rhythms reflect variable functional states of the underlying neuronal cortical networks and can be used for brain-computer interfacing. Specifically, the pericentral μ- and β-rhythms are diminished, or even almost completely blocked, by movements of the somatotopically corresponding body part, independent of their active, passive or reflexive origin. Blocking effects are visible bilaterally but with a clear predominance contralateral to the moved limb. This attenuation of brain rhythms is termed event-related desynchronization (ERD) and the dual effect of enhanced brain rhythms is called event-related synchronization (ERS) (see [19]).

Since a focal ERD can be observed over the motor and/or sensory cortex even when a subject is only imagining a movement or sensation in the specific limb, this feature can be used for BCI control: the discrimination of the imagination of movements of left hand vs. right hand vs. foot can be based on the somatotopic arrangement of the attenuation of the μ and/or β rhythms.
However, the challenge is that, due to volume conduction, the EEG signal recorded at the scalp is a mixture of many cortical activities that have different spatial localizations; for example, at the electrodes over the motor cortex, the signal contains not only the μ-rhythm that we are interested in but also the projection of the parietal α-rhythm, which has little to do with the motor imagination. To this end, spatial filtering is an indispensable technique, i.e. taking a linear combination of the signals recorded at the EEG channels to extract only the component that we are interested in. In particular the CSP algorithm, which optimizes spatial filters with respect to discriminability, is a good candidate for feature extraction.

Experimental Setup. In this paper we evaluate the proposed algorithm on off-line data in which the nonstationarity is induced by having two different background conditions for the same primary task.

1Note: In our exposition we focus on EEG-based BCI systems that do not rely on evoked potentials (for an extensive overview of BCI systems, including invasive systems and systems based on evoked potentials, see [1]).

Figure 1: Topographies of r2-values (multiplied by the sign of the difference) quantifying the difference in log band-power in the alpha band (8-12 Hz) between different recording sessions. Left: Difference between imag_move and imag_lett. Due to lower visual processing demands, alpha power in occipital areas is stronger in imag_lett. Right: Difference between imag_move and sham_feedback. The latter has decreased alpha power in centro-parietal areas. Note the different sign in the colormaps.
The ultimate challenge will be on-line feedback with strong fluctuations of task demands etc., a project envisioned for the near future.

We investigate EEG recordings from 4 subjects (all of whom we have an 'invariance measurement' from, see below). Brain activity was recorded from the scalp with multi-channel amplifiers using 55 EEG channels.

In the 'calibration measurement', every 4.5-6 seconds one of 3 different visual stimuli indicated for 3 seconds which mental task the subject should accomplish during that period. The investigated mental tasks were imagined movements of the left hand, the right hand, and the right foot. There were two types of visual stimulation: (1: imag_lett) targets were indicated by letters (L, R, F) appearing at a central fixation cross and (2: imag_move) a randomly moving small rhomboid with either its left, right or bottom corner filled to indicate left or right hand or foot movement, respectively. Since the movement of the object was independent of the indicated targets, target-uncorrelated eye movements are induced. Due to the different demands in visual processing, the background brain activity can be expected to differ substantially between those two types of recordings. The topography of the r2-values (bi-serial correlation coefficient of feature values with labels) of the log band-power difference between imag_move and imag_lett is shown in the left plot of Fig. 1. It shows a pronounced difference in parietal areas.

A sham_feedback paradigm was designed in order to characterize the invariance properties needed for stable real-world BCI applications. In this measurement the subjects received a fake feedback sequence which was preprogrammed. The aim of this recording was to collect data during a large variety of mental states and actions that are not correlated with the BCI control states (motor imagery of hands and feet).
Subjects were told that they could control the feedback in some way that they should find out, e.g. with eye movements or muscle activity. They were instructed not to perform movements of hands, arms, legs and feet. The type of feedback was a standard 1D cursor control. In each trial the cursor starts in the middle and should be moved to either the left or right side as indicated by a target cue. When the cursor touched the left or right border, a response (correct or false) was shown. Furthermore the number of hits and misses was shown. The preprogrammed 'feedback' signal was constructed such that it was random in the beginning, followed by alternating periods of increasingly many hits and periods with chance-level performance. This was done to motivate the subjects to try a variety of different actions and to induce different states of mood (satisfaction during 'successful' periods and anger or displeasure during 'failure' periods). The right plot of Fig. 1 visualizes the difference in log band-power between imag_move and sham_feedback. A decreased alpha power in centro-parietal areas during sham_feedback can be observed. Note that this recording includes many more variations of background mental activity than the difference between imag_move and imag_lett.

3 Methods

Common Spatial Patterns (CSP) Analysis. The CSP technique ([15]) determines spatial filters that maximize the variance of signals of one condition and at the same time minimize the variance of signals of another condition. Since the variance of band-pass filtered signals is equal to band-power, CSP filters are well suited to discriminate mental states that are characterized by ERD/ERS effects ([20]).
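The variance/band-power identity that CSP relies on can be checked directly. The following is a small self-contained sketch of our own (not from the paper); the 8-12 Hz band, the 256 Hz sampling rate, and the ideal FFT band-pass standing in for a real filter are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
fs = 256.0                               # assumed sampling rate [Hz]
x = rng.standard_normal(4096)            # zero-mean broadband toy signal
F = np.fft.rfft(x)
f = np.fft.rfftfreq(x.size, d=1.0 / fs)
band = (f >= 8.0) & (f <= 12.0)          # alpha band, 8-12 Hz
y = np.fft.irfft(F * band, n=x.size)     # ideally band-pass filtered signal
# By Parseval, the variance of y equals the spectral power inside the band
# (factor 2 for the symmetric negative frequencies, 1/N^2 from the DFT norm):
band_power = 2.0 * np.sum(np.abs(F[band]) ** 2) / x.size ** 2
```

Here `np.var(y)` and `band_power` agree to machine precision, which is why variance features on band-pass filtered signals measure ERD/ERS.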
As such it has been widely used in BCI systems ([8, 14]), where CSP filters are calculated individually for each subject on the data of a calibration measurement.

Technically, the Common Spatial Pattern (CSP) [21] algorithm gives spatial filters based on a discriminative criterion. Let X1 and X2 be the (time × channel) data matrices of the band-pass filtered EEG signals (concatenated trials) under the two conditions (e.g., right-hand or left-hand imagination, respectively2) and Σ1 and Σ2 be the corresponding estimates of the covariance matrices, Σi = Xi^T Xi. We define the two matrices Sd and Sc as follows:

    Sd = Σ1 − Σ2 : discriminative activity matrix,
    Sc = Σ1 + Σ2 : common activity matrix.

The CSP spatial filter v ∈ R^C (C is the number of channels) can be obtained by extremizing the Rayleigh coefficient:

    {max, min}_{v ∈ R^C}  v^T Sd v / v^T Sc v.    (1)

This can be done by solving a generalized eigenvalue problem,

    Sd v = λ Sc v.    (2)

The eigenvalue λ is bounded between −1 and 1; a large positive eigenvalue corresponds to a projection of the signal given by v that has large power in the first condition but small power in the second condition; the converse is true for a large negative eigenvalue. The largest and the smallest eigenvalues correspond to the maximum and the minimum of the Rayleigh coefficient problem (Eq. (1)). Note that v^T Sd v = v^T Σ1 v − v^T Σ2 v is the average power difference between the two conditions that we want to maximize. On the other hand, the projection of the activity that is common to the two classes, v^T Sc v, should be minimized because it does not contribute to the discriminability. Using the same idea as in [16] we can rewrite the Rayleigh problem (Eq.
(1)) as follows:

    min_{v ∈ R^C}  v^T Sc v,    s.t.  v^T Σ1 v − v^T Σ2 v = λ,

which can be interpreted as finding the minimum-norm v under the condition that the average power difference between the two conditions is λ. The norm is defined by the common activity matrix Sc. In the next section, we extend the notion of Sc to incorporate any disturbances that are common to the two classes and that we can measure a priori.

In this paper we call filters the generalized eigenvectors v_j (j = 1, ..., C) of the generalized eigenvalue problem (Eq. (2)) or of the similar problems discussed in the next section. Moreover, we denote by V the matrix obtained by putting the C generalized eigenvectors into columns, namely V = (v_1, ..., v_C) ∈ R^{C×C}, and call patterns the row vectors of the inverse A = V^{−1}. Note that a filter v_j ∈ R^C has its corresponding pattern a_j ∈ R^C; a filter v_j extracts only the activity spanned by a_j and cancels out all other activities spanned by a_i (i ≠ j); therefore a pattern a_j tells what the filter v_j is extracting (see Fig. 2).

For classification, the features of single trials are calculated as the log-variance of the CSP-projected signals. Here only a few (2 to 6) patterns are used. The selection of patterns is typically based on the eigenvalues, but when a large amount of calibration data is not available it is advisable to use a more refined technique to select the patterns, or to choose them manually by visual inspection. The variance features are approximately chi-square distributed; taking the logarithm makes them similar to Gaussian distributions, so a linear classifier (e.g., linear discriminant analysis) is fine. For the evaluation in this paper we used the CSPs corresponding to the two largest and the two smallest eigenvalues and used linear discriminant analysis for classification.
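As an illustration of the filter/pattern algebra and the log-variance features described above, here is a minimal NumPy/SciPy sketch on toy two-channel data. This is our own code, not the authors' implementation; the generalized eigenvalue problem Sd v = λ Sc v is solved with `scipy.linalg.eigh`, and all names are ours.

```python
import numpy as np
from scipy.linalg import eigh

def csp(X1, X2):
    """CSP for two (time x channel) band-pass filtered conditions.

    Solves Sd v = lambda * Sc v with Sd = S1 - S2 and Sc = S1 + S2.
    Eigenvalues come back ascending and lie in [-1, 1]; filters are the
    columns of V, patterns the rows of A = V^{-1}."""
    S1, S2 = X1.T @ X1, X2.T @ X2
    lam, V = eigh(S1 - S2, S1 + S2)
    A = np.linalg.inv(V)
    return lam, V, A

def log_var_features(X, V, n=1):
    """Log-variance features from the n filters with the largest and the
    n filters with the smallest eigenvalues (outermost columns of V)."""
    W = np.hstack([V[:, :n], V[:, -n:]])
    return np.log(np.var(X @ W, axis=0))

# toy data: condition 1 has high power in channel 0, condition 2 in channel 1
rng = np.random.default_rng(0)
X1 = rng.standard_normal((500, 2)) * np.array([3.0, 1.0])
X2 = rng.standard_normal((500, 2)) * np.array([1.0, 3.0])
lam, V, A = csp(X1, X2)
feat1 = log_var_features(X1, V)
feat2 = log_var_features(X2, V)
```

On this toy data the largest eigenvalue is strongly positive (a filter with high power in condition 1, low in condition 2) and the smallest strongly negative, so the two log-variance features already separate the conditions; a linear classifier on `feat1`/`feat2`-style vectors is then straightforward.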
The CSP algorithm, several extensions, as well as practical issues are reviewed in detail in [15].

Invariant CSP. The CSP spatial filters extracted as above are optimized for the calibration measurement. However, in online operation of the BCI system different non task-related modulations of brain signals may occur which are not suppressed by the CSP filters. The reason may be that these modulations have not been recorded in the calibration measurement or that they have been so infrequent that they are not consistently reflected in the statistics (e.g., when they are not equally distributed over the two conditions).

2We use the term covariance for zero-delay second-order statistics between channels and not for the statistical variability. Since we assume the signal to be band-pass filtered, the second-order statistics reflect band power.

The proposed iCSP method minimizes the influence of modulations that can be characterized in advance by a covariance matrix. In this manner we can code neurophysiological prior knowledge or further information, such as the tangent covariance matrix ([22]), into such a covariance matrix Ξ. In the following motivation we assume that Ξ is the covariance matrix of a signal matrix Y. Using the notions from above, the objective is then to calculate spatial filters v_j^(1) such that var(X1 v_j^(1)) is maximized while var(X2 v_j^(1)) and var(Y v_j^(1)) are minimized. Dually, spatial filters v_j^(2) are determined that maximize var(X2 v_j^(2)) and minimize var(X1 v_j^(2)) and var(Y v_j^(2)). Practically this can be accomplished by solving the following two generalized eigenvalue problems:

    V^(1)T Σ1 V^(1) = D^(1)  and  V^(1)T ((1−ξ)(Σ1 + Σ2) + ξ Ξ) V^(1) = I,    (3)
    V^(2)T Σ2 V^(2) = D^(2)  and  V^(2)T ((1−ξ)(Σ1 + Σ2) + ξ Ξ) V^(2) = I,    (4)

where ξ ∈ [0,1] is a hyperparameter that trades off the discrimination of the training classes (X1, X2) against invariance (as characterized by Ξ). Section 4 discusses the selection of the parameter ξ. Filters v_j^(1) with high eigenvalues d_j^(1) provide not only high var(X1 v_j^(1)) but also small v_j^(1)T ((1−ξ)Σ2 + ξ Ξ) v_j^(1) = 1 − (1−ξ) d_j^(1), i.e. small var(X2 v_j^(1)) and small var(Y v_j^(1)). The dual holds for the selection of filters from the v_j^(2). Note that for ξ = 0.5 there is a strong connection to the one-vs-rest strategy for 3-class CSP ([23]). Features for classification are calculated as log-variance using the two filters from each of v_j^(1) and v_j^(2) corresponding to the largest eigenvalues. Note that the idea of iCSP is in the spirit of the invariance constraints in (kernel) Fisher's Discriminant proposed in [16].

A Theoretical Investigation of iCSP by Influence Analysis. As mentioned, iCSP aims at robust spatial filtering against disturbances whose covariance Ξ can be anticipated from prior knowledge. Influence analysis is a statistical tool with which we can assess the robustness of inference procedures [24]. Basically, it evaluates the effect on an inference procedure of adding a small perturbation of O(ε), where ε ≪ 1. For example, influence functions for component analyses such as PCA and CCA have been discussed before [25, 26]. We applied this machinery to iCSP in order to check whether iCSP really reduces the influence caused by the disturbance, at least in a local sense.
For this purpose, we have the following lemma (its proof is included in the Appendix).

Lemma 1 (Influence of generalized eigenvalue problems) Let λ_k and w_k be the k-th eigenvalue and eigenvector of the generalized eigenvalue problem

    A w = λ B w,    (5)

respectively. Suppose that the matrices A and B are perturbed with small matrices ε Δ and ε Π, where ε ≪ 1. Then the eigenvalues λ̃_k and eigenvectors w̃_k of the perturbed problem

    (A + ε Δ) w̃ = λ̃ (B + ε Π) w̃    (6)

can be expanded as λ_k + ε χ_k + o(ε) and w_k + ε ψ_k + o(ε), where

    χ_k = w_k^T (Δ − λ_k Π) w_k,    ψ_k = −M_k (Δ − λ_k Π) w_k − (1/2)(w_k^T Π w_k) w_k,    (7)

with M_k := B^{−1/2} (B^{−1/2} A B^{−1/2} − λ_k I)^+ B^{−1/2}, where the suffix '+' denotes the Moore-Penrose matrix inverse.

The generalized eigenvalue problems in Eqs. (3) and (4) can be rephrased as

    Σ1 v = d {(1−ξ)(Σ1 + Σ2) + ξ Ξ} v,    Σ2 u = c {(1−ξ)(Σ1 + Σ2) + ξ Ξ} u.

For simplicity, we consider here the simplest perturbation of the covariances, Σ1 → Σ1 + ε Ξ and Σ2 → Σ2 + ε Ξ. In this case, the perturbation matrices in the lemma can be expressed as Δ1 = Ξ, Δ2 = Ξ, and Π = 2(1−ξ) Ξ. Therefore, we get the expansions of the eigenvalues and eigenvectors as d_k + ε χ_1k, c_k + ε χ_2k, v_k + ε ψ_1k and u_k + ε ψ_2k, where

    χ_1k = {1 − 2(1−ξ) d_k} v_k^T Ξ v_k,    χ_2k = {1 − 2(1−ξ) c_k} u_k^T Ξ u_k,    (8)
    ψ_1k = −{1 − 2(1−ξ) d_k} M_1k Ξ v_k − (1−ξ)(v_k^T Ξ v_k) v_k,    (9)
    ψ_2k = −{1 − 2(1−ξ) c_k} M_2k Ξ u_k − (1−ξ)(u_k^T Ξ u_k) u_k,    (10)

with M_1k := Σ^{−1/2}(Σ^{−1/2} Σ1 Σ^{−1/2} − d_k I)^+ Σ^{−1/2}, M_2k := Σ^{−1/2}(Σ^{−1/2} Σ2 Σ^{−1/2} − c_k I)^+ Σ^{−1/2}, and Σ := (1−ξ)(Σ1 + Σ2) + ξ Ξ. The implication of the result is the following: if ξ = 1 − 1/(2 d_k) (resp. ξ = 1 − 1/(2 c_k)) is satisfied, the O(ε) term χ_1k (resp. χ_2k) of the k-th eigenvalue vanishes, and the k-th eigenvector also coincides with the one of the original problem up to order ε, because the first term of ψ_1k (resp. ψ_2k) becomes zero (note that d_k and c_k also depend on ξ).

[Figure 2 panel labels: classifier outputs with error rates, original CSP: 10.7% / 11.4% / 12.9% / 37.9%; invariant CSP: 9.3% / 10.0% / 9.3% / 11.4%, for alpha factors a = 0, 0.5, 1, 2.]

Figure 2: Comparison of CSP and iCSP on test data with artificially increased occipital alpha. The upper plots show the classifier output on the test data with different degrees of alpha added (factors a = 0, 0.5, 1, 2). The lower panel shows the filter/pattern coefficients topographically mapped on the scalp for original CSP (left) and iCSP (right). Here the invariance property was defined with respect to the increase of alpha activity in the visual cortex (occipital location) using an eyes open/eyes closed recording. See Section 3 for the definition of filter and pattern.
4 Evaluation

Test Case with Constructed Test Data. To validate the proposed iCSP, we first applied it to specifically constructed test data. iCSP was trained (ξ = 0.5) on motor imagery data with the invariance characterized by data from a measurement during 'eyes open' (approx. 40 s) and 'eyes closed' (approx. 20 s). The motor imagery test data was used in its original form and in variants that were modified in a controlled manner: from another data set recorded during 'eyes closed' we extracted activity related to increased occipital alpha activity (backprojection of 5 ICA components) and added it with 3 different factors (a = 0.5, 1, 2) to the test data.

The upper plots of Fig. 2 display the classifier output on the constructed test data. While the performance of the original CSP deteriorates more and more as increased alpha is mixed in, the proposed iCSP method maintains a stable performance independent of the amount of increased alpha activity. The spatial filters extracted by CSP analysis vs. the proposed iCSP often look quite similar; however, tiny but apparently important differences exist. In the lower panel of Fig. 2 the filter (v_j) / pattern (a_j) pairs from original CSP (left) and iCSP (right) are shown. The filters from the two approaches resemble each other strongly. However, the corresponding patterns reveal an important difference: while the pattern of the original CSP has positive weights at the right occipital side, which might be susceptible to α-modulations, the corresponding iCSP pattern has not. A more detailed inspection shows that both filters have a focus over the right (sensori-)motor cortex, but only the invariant filter has a spot of opposite sign right posterior to it. This spot will filter out contributions
This spot will \ufb01lter out contributions\ncoming from occipital/parietal sites.\n\nModel selection for iCSP. For each subject, a cross-validation was performed for different values\nof x on the training data (session imag_move) and the x resulting in minimum error was chosen. For\nthe same values of x\nthe iCSP \ufb01lters + LDA classi\ufb01er trained on imag_move were applied to calcu-\n\n6\n\n\f]\n\n%\n\n[\n \nr\no\nr\nr\ne\n\n]\n\n%\n\n[\n \nr\no\nr\nr\ne\n\n35\n\n30\n\n25\n\n20\n\n15\n\n10\n\n5\n\n0\n\n35\n\n30\n\n25\n\n20\n\n15\n\n10\n\n5\n\n0\n\n Subject  cv\n\ntest\ntrain\n\n0\n\n0.2\n\n0.4\nxi\n\n0.6\n\n0.8\n\n Subject  zk\n\ntest\ntrain\n\n0\n\n0.2\n\n0.4\nxi\n\n0.6\n\n0.8\n\n]\n\n%\n\n[\n \nr\no\nr\nr\ne\n\n]\n\n%\n\n[\n \nr\no\nr\nr\ne\n\n35\n\n30\n\n25\n\n20\n\n15\n\n10\n\n5\n\n0\n\n35\n\n30\n\n25\n\n20\n\n15\n\n10\n\n5\n\n0\n\n Subject  zv\n\ntest\ntrain\n\n0\n\n0.2\n\n0.4\nxi\n\n0.6\n\n0.8\n\n Subject  zq\n\ntest\ntrain\n\n0\n\n0.2\n\n0.4\nxi\n\n0.6\n\n0.8\n\n]\n\n%\n\n[\n \nr\no\nr\nr\ne\n\n25\n\n20\n\n15\n\n10\n\n5\n\n0\n\nCSP\n\niCSP\n\ncv\nzv\nzk\nzq\n\nFigure 3: Modelselection and evaluation. Left subplots: Selection of hyperparameter x of the iCSP method.\nFor each subject, a cross-validation was performed for different values of x on the training data (session\nimag_move), see thin black line, and the x\nresulting in minimum error was chosen (circle). For the same\nvalues of x\nthe iCSP \ufb01lters + LDA classi\ufb01er trained on imag_move were applied to calculate the test error\non data from imag_lett (thick colorful line). Right plot: Test error in all four recordings for classical CSP\nand the proposed iCSP (with model parameter x chosen by cross-validation on the training set as described in\nSection 4).\n\nlate the test error on data from imag_lett. Fig. 3 (left plots) shows the result of this procedure. The\nshape of the cross-validation error on the training set and the test error is very similar. 
Accordingly, the selection of values for the parameter ξ is successful. For subject zq, ξ = 0 was chosen, i.e. classical CSP. The case of subject zk shows that the selection of ξ may be a delicate issue: for large values of ξ, cross-validation error and test error differ dramatically. A choice of ξ > 0.5 would result in bad performance of iCSP, while this effect could not have been predicted, at least not so severely, from the cross-validation on the training set.

Evaluation of Performance with Real BCI Data. For the evaluation we used the imag_move session (see Section 2) as training set and the imag_lett session as test set. Fig. 3 (right plot) compares the classification error obtained by classical CSP and by the proposed method iCSP with the model parameter ξ chosen by cross-validation on the training set as described above. Again an excellent improvement is visible.

5 Concluding discussion

EEG data from Brain-Computer Interface experiments are highly challenging to evaluate due to noise, nonstationarity and diverse artifacts. Thus, BCI provides an excellent testbed for testing the quality and applicability of robust machine learning methods (cf. the BCI Competitions [27, 28]). Obviously BCI users are subject to variations in attention and motivation. These types of nonstationarities can considerably deteriorate the BCI classifier performance. In the present paper we proposed a novel method to alleviate this problem.

A limitation of our method is that variations need to be characterized in advance (by estimating an appropriate covariance matrix). At the same time this is also a strength of our method, as neurophysiological prior knowledge about possible sources of non-stationarity is available and can thus be taken into account in a controlled manner. Also the selection of the hyperparameter ξ needs more investigation, cf. the case of subject zk in Fig. 3. One strategy to pursue is to update the covariance matrix Ξ online with incoming test data.
(Note that no label information is needed.) Online learning (learning algorithms for adaptation within a BCI session) could also be used to further stabilize the system against unforeseen changes. It remains for future research to explore this interesting direction.

Appendix: Proof of Lemma 1.
By substituting the expansions of λ̃_k and w̃_k into Eq. (6) and taking the O(ε) term, we get

    A ψ_k + Δ w_k = λ_k B ψ_k + λ_k Π w_k + χ_k B w_k.    (11)

The expression for χ_k in Eq. (7) can be obtained by multiplying w_k^T to Eq. (11) and applying Eq. (5). Then, from Eq. (11),

    (A − λ_k B) ψ_k = −(Δ − λ_k Π) w_k + χ_k B w_k = −(A − λ_k B) M_k (Δ − λ_k Π) w_k

holds, where we used the constraints w_j^T B w_k = δ_jk and

    (A − λ_k B) M_k = Σ_{j≠k} B w_j w_j^T = I − B w_k w_k^T.    (12)

Eq. (12) can be proven by B^{−1/2} A B^{−1/2} − λ_k I = Σ_{j≠k} (λ_j − λ_k) B^{1/2} w_j w_j^T B^{1/2} and (B^{−1/2} A B^{−1/2} − λ_k I)^+ = Σ_{j≠k} (λ_j − λ_k)^{−1} B^{1/2} w_j w_j^T B^{1/2}. Since span{w_k} is the kernel of the operator A − λ_k B, ψ_k can be expressed as ψ_k = −M_k (Δ − λ_k Π) w_k + c w_k. By a multiplication with w_k^T B, the constant turns out to be c = −w_k^T Π w_k / 2, derived from the normalization w̃_k^T (B + ε Π) w̃_k = 1, where we used the facts w_k^T B M_k = 0^T and w_k^T B ψ_k = −w_k^T Π w_k / 2.

References

[1] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, "Brain-computer interfaces for communication and control", Clin. Neurophysiol., 113: 767–791, 2002.

[2] N. Birbaumer, N. Ghanayim, T. Hinterberger, I. Iversen, B. Kotchoubey, A. Kübler, J. Perelmouter, E. Taub, and H. Flor, "A spelling device for the paralysed", Nature, 398: 297–298, 1999.

[3] G. Pfurtscheller, C. Neuper, C. Guger, W. Harkam, R. Ramoser, A. Schlögl, B. Obermaier, and M.
Pregenzer, "Current Trends in Graz Brain-Computer Interface (BCI)", IEEE Trans. Rehab. Eng., 8(2): 216–219, 2000.

[4] J. Millán, Handbook of Brain Theory and Neural Networks, MIT Press, Cambridge, 2002.

[5] E. A. Curran and M. J. Stokes, "Learning to control brain activity: A review of the production and control of EEG components for driving brain-computer interface (BCI) systems", Brain Cogn., 51: 326–336, 2003.

[6] G. Dornhege, J. del R. Millán, T. Hinterberger, D. McFarland, and K.-R. Müller, eds., Toward Brain-Computer Interfacing, MIT Press, Cambridge, MA, 2007.

[7] T. Elbert, B. Rockstroh, W. Lutzenberger, and N. Birbaumer, "Biofeedback of Slow Cortical Potentials. I", Electroencephalogr. Clin. Neurophysiol., 48: 293–301, 1980.

[8] C. Guger, H. Ramoser, and G. Pfurtscheller, "Real-time EEG analysis with subject-specific spatial patterns for a Brain Computer Interface (BCI)", IEEE Trans. Neural Sys. Rehab. Eng., 8(4): 447–456, 2000.

[9] B. Blankertz, G. Curio, and K.-R. Müller, "Classifying Single Trial EEG: Towards Brain Computer Interfacing", in: T. G. Dietterich, S. Becker, and Z. Ghahramani, eds., Advances in Neural Inf. Proc. Systems (NIPS 01), vol. 14, 157–164, 2002.

[10] L. Parra, C. Alvino, A. C. Tang, B. A. Pearlmutter, N. Yeung, A. Osman, and P. Sajda, "Linear spatial integration for single trial detection in encephalography", NeuroImage, 7(1): 223–230, 2002.

[11] E. Curran, P. Sykacek, S. Roberts, W. Penny, M. Stokes, I. Jonsrude, and A. Owen, "Cognitive tasks for driving a Brain Computer Interfacing System: a pilot study", IEEE Trans. Rehab. Eng., 12(1): 48–54, 2004.

[12] J. Millán, F. Renkens, J. Mouriño, and W. Gerstner, "Non-invasive brain-actuated control of a mobile robot by human EEG", IEEE Trans. Biomed. Eng., 51(6): 1026–1033, 2004.

[13] N. J.
Hill, T. N. Lal, M. Schr\u00f6der, T. Hinterberger, B. Wilhelm, F. Nijboer, U. Mochty, G. Widman, C. E. Elger, B. Sch\u00f6lkopf, A. K\u00fcbler,\nand N. Birbaumer, \u201cClassifying EEG and ECoG Signals without Subject Training for Fast BCI Implementation: Comparison of Non-\nParalysed and Completely Paralysed Subjects\u201d, IEEE Trans. Neural Sys. Rehab. Eng., 14(6): 183\u2013186, 2006.\n\n[14] B. Blankertz, G. Dornhege, M. Krauledat, K.-R. M\u00fcller, and G. Curio, \u201cThe non-invasive Berlin Brain-Computer\n\nterface:\nhttp://dx.doi.org/10.1016/j.neuroimage.2007.01.051.\n\nFast Acquisition of Effective Performance in Untrained Subjects\u201d, NeuroImage, 37(2):\n\nIn-\n539\u2013550, 2007, URL\n\n[15] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K.-R. M\u00fcller, \u201cOptimizing Spatial Filters for Robust EEG Single-Trial Analysis\u201d,\n\nIEEE Signal Proc. Magazine, 25(1): 41\u201356, 2008, URL http://dx.doi.org/10.1109/MSP.2008.4408441.\n\n[16] S. Mika, G. R\u00e4tsch, J. Weston, B. Sch\u00f6lkopf, A. Smola, and K.-R. M\u00fcller, \u201cInvariant Feature Extraction and Classi\ufb01cation in Kernel\nSpaces\u201d, in: S. Solla, T. Leen, and K.-R. M\u00fcller, eds., Advances in Neural Information Processing Systems, vol. 12, 526\u2013532, MIT Press,\n2000.\n\n[17] H. Berger, \u201c\u00dcber das Elektroenkephalogramm des Menschen\u201d, Archiv f\u00fcr Psychiatrie und Nervenkrankheiten, 99(6): 555\u2013574, 1933.\n[18] H. Jasper and H. Andrews, \u201cNormal differentiation of occipital and precentral regions in man\u201d, Arch. Neurol. Psychiat. (Chicago), 39:\n\n96\u2013115, 1938.\n\n[19] G. Pfurtscheller and F. H. L. da Silva, \u201cEvent-related EEG/MEG synchronization and desynchronization: basic principles\u201d, Clin. Neuro-\n\nphysiol., 110(11): 1842\u20131857, 1999.\n\n[20] Z. J. Koles, \u201cThe quantitative extraction and topographic mapping of the abnormal components in the clinical EEG\u201d, Electroencephalogr.\n\nClin. 
Neurophysiol., 79(6): 440\u2013447, 1991.\n\n[21] K. Fukunaga, Introduction to statistical pattern recognition, Academic Press, Boston, 2nd edn., 1990.\n[22] B. Sch\u00f6lkopf, Support vector learning, Oldenbourg Verlag, Munich, 1997.\n[23] G. Dornhege, B. Blankertz, G. Curio, and K.-R. M\u00fcller, \u201cBoosting bit rates in non-invasive EEG single-trial classi\ufb01cations by feature\n\ncombination and multi-class paradigms\u201d, IEEE Trans. Biomed. Eng., 51(6): 993\u20131002, 2004.\n\n[24] F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel, Robust Statistics: The Approach Based on In\ufb02uence Functions, Wiley,\n\nNew York, 1986.\n\n[25] F. Critchley, \u201cIn\ufb02uence in principal components analysis\u201d, Biometrika, 72(3): 627\u2013636, 1985.\n[26] M. Romanazzi, \u201cIn\ufb02uence in Canonical Correlation Analysis\u201d, Psychometrika, 57(2): 237\u2013259, 1992.\n[27] B. Blankertz, K.-R. M\u00fcller, G. Curio, T. M. Vaughan, G. Schalk, J. R. Wolpaw, A. Schl\u00f6gl, C. Neuper, G. Pfurtscheller, T. Hinterberger,\nM. Schr\u00f6der, and N. Birbaumer, \u201cThe BCI Competition 2003: Progress and Perspectives in Detection and Discrimination of EEG Single\nTrials\u201d, IEEE Trans. Biomed. Eng., 51(6): 1044\u20131051, 2004.\n\n[28] B. Blankertz, K.-R. M\u00fcller, D. Krusienski, G. Schalk, J. R. Wolpaw, A. Schl\u00f6gl, G. Pfurtscheller, J. del R. Mill\u00e1n, M. Schr\u00f6der, and\nN. Birbaumer, \u201cThe BCI Competition III: Validating Alternative Approachs to Actual BCI Problems\u201d, IEEE Trans. Neural Sys. 
Rehab.\nEng., 14(2): 153\u2013159, 2006.\n\n8\n\n\f", "award": [], "sourceid": 983, "authors": [{"given_name": "Benjamin", "family_name": "Blankertz", "institution": null}, {"given_name": "Motoaki", "family_name": "Kawanabe", "institution": null}, {"given_name": "Ryota", "family_name": "Tomioka", "institution": null}, {"given_name": "Friederike", "family_name": "Hohlefeld", "institution": null}, {"given_name": "Klaus-Robert", "family_name": "M\u00fcller", "institution": null}, {"given_name": "Vadim", "family_name": "Nikulin", "institution": null}]}