{"title": "Optimizing spatio-temporal filters for improving Brain-Computer Interfacing", "book": "Advances in Neural Information Processing Systems", "page_first": 315, "page_last": 322, "abstract": "", "full_text": "Optimizing spatio-temporal \ufb01lters for improving\n\nBrain-Computer Interfacing\n\nGuido Dornhege1, Benjamin Blankertz1, Matthias Krauledat1,3,\n\nFlorian Losch2, Gabriel Curio2 and Klaus-Robert M\u00fcller1,3\n1Fraunhofer FIRST.IDA, Kekul\u00e9str. 7, 12 489 Berlin, Germany\n2Campus Benjamin Franklin, Charit\u00e9 University Medicine Berlin,\n\nHindenburgdamm 30, 12 203 Berlin, Germany.\n\n3University of Potsdam, August-Bebel-Str. 89, 14 482 Germany\n\n{dornhege,blanker,kraulem,klaus}@\ufb01rst.fhg.de,\n{\ufb02orian-philip.losch,gabriel.curio}@charite.de\n\nAbstract\n\nBrain-Computer Interface (BCI) systems create a novel communication\nchannel from the brain to an output device by bypassing conventional\nmotor output pathways of nerves and muscles. Therefore they could\nprovide a new communication and control option for paralyzed patients.\nModern BCI technology is essentially based on techniques for the clas-\nsi\ufb01cation of single-trial brain signals. Here we present a novel technique\nthat allows the simultaneous optimization of a spatial and a spectral \ufb01lter\nenhancing discriminability of multi-channel EEG single-trials. The eval-\nuation of 60 experiments involving 22 different subjects demonstrates\nthe superiority of the proposed algorithm. Apart from the enhanced clas-\nsi\ufb01cation, the spatial and/or the spectral \ufb01lter that are determined by the\nalgorithm can also be used for further analysis of the data, e.g., for source\nlocalization of the respective brain rhythms.\n\n1 Introduction\n\nBrain-Computer Interface (BCI) research aims at the development of a system that allows\ndirect control of, e.g., a computer application or a neuroprosthesis, solely by human in-\ntentions as re\ufb02ected in suitable brain signals, cf. 
[1, 2, 3, 4, 5, 6, 7, 8, 9]. We will be\nfocussing on noninvasive, electroencephalogram (EEG) based BCI systems. Such devices\ncan be used as tools of communication for the disabled or for healthy subjects that might\nbe interested in exploring a new path of man-machine interfacing, say when playing BCI\noperated computer games.\nThe classical approach to establish EEG-based control is to set up a system that is con-\ntrolled by a speci\ufb01c EEG feature which is known to be susceptible to conditioning and to\nlet the subjects learn the voluntary control of that feature. In contrast, the Berlin Brain-\nComputer Interface (BBCI) uses well established motor competences in control paradigms\nand a machine learning approach to extract subject-speci\ufb01c discriminability patterns from\nhigh-dimensional features. This approach has the advantage that the long subject training\nneeded in the operant conditioning approach is replaced by a short calibration measurement\n\n\f(20 minutes) and machine training (1 minute). The machine adapts to the speci\ufb01c charac-\nteristics of the brain signals of each subject, accounting for the high inter-subject variability.\nWith respect to the topographic patterns of brain rhythm modulations the Common Spatial\nPatterns (CSP) (see [10]) algorithm has proven to be very useful to extract subject-speci\ufb01c,\ndiscriminative spatial \ufb01lters. On the other hand the frequency band on which the CSP al-\ngorithm operates is either selected manually or unspeci\ufb01cally set to a broad band \ufb01lter, cf.\n[10, 5]. Obviously, a simultaneous optimization of a frequency \ufb01lter with the spatial \ufb01lter\nis highly desirable. Recently, in [11] the CSSP algorithm was presented, in which very\nsimple frequency \ufb01lters (with one delay tap) for each channel are optimized together with\nthe spatial \ufb01lters. 
Although the results showed an improvement of the CSSP algorithm over\nCSP, the \ufb02exibility of the frequency \ufb01lters is very limited. Here we present a method that\nallows to simultaneously optimize an arbitrary FIR \ufb01lter within the CSP analysis. The pro-\nposed algorithm outperforms CSP and CSSP on average, and in cases where a separation of\nthe discriminative rhythm from dominating non-discriminative rhythms is of importance, a\nconsiderable increase of classi\ufb01cation accuracy can be achieved.\n\n2 Experimental Setup\n\nIn this paper we investigate data from 60 EEG experiments with 22 different subjects. All\nexperiments included so called training sessions which are used to train subject-speci\ufb01c\nclassi\ufb01ers. Many experiments also included feedback sessions in which the subject could\nsteer a cursor or play a computer game like brain-pong by BCI control. Data from feedback\nsessions are not used in this a-posteriori study since they depend on an intricate interaction\nof the subject with the original classi\ufb01cation algorithm.\nIn the experimental sessions used for the present study, labeled trials of brain signals were\nrecorded in the following way: The subjects were sitting in a comfortable chair with arms\nlying relaxed on the armrests. All 4.5\u20136 seconds one of 3 different visual stimuli indicated\nfor 3\u20133.5 seconds which mental task the subject should accomplish during that period. The\ninvestigated mental tasks were imagined movements of the left hand (l), the right hand\n(r), and one foot (f ). Brain activity was recorded from the scalp with multi-channel EEG\nampli\ufb01ers using 32, 64 resp. 128 channels. Besides EEG channels, we recorded the elec-\ntromyogram (EMG) from both forearms and the leg as well as horizontal and vertical elec-\ntrooculogram (EOG) from the eyes. 
The EMG and EOG channels were used exclusively\nto make sure that the subjects performed no real limb or eye movements correlated with\nthe mental tasks that could directly (artifacts) or indirectly (afferent signals from muscles\nand joint receptors) be re\ufb02ected in the EEG channels and thus be detected by the classi\ufb01er,\nwhich operates on the EEG signals only. Between 120 and 200 trials for each class were\nrecorded. In this study we investigate only binary classi\ufb01cations, but the results can be\nexpected to safely transfer to the multi-class case.\n\n3 Neurophysiological Background\n\nAccording to the well established model called homunculus, \ufb01rst described by [12], for\neach part of the human body there exists a corresponding region in the motor and so-\nmatosensory area of the neocortex. The \u2019mapping\u2019 from the body to the respective brain\nareas preserves in big parts topography, i.e., neighboring parts of the body are almost rep-\nresented in neighboring parts of the cortex. While the region of the feet is located at the\ncenter of the vertex, the left hand is represented lateralized on the right hemisphere and the\nright hand on the left hemisphere. Brain activity during rest and wakefulness is describable\nby different rhythms located over different brain areas. These rhythms re\ufb02ect functional\nstates of different neuronal cortical networks and can be used for brain-computer inter-\nfacing. These rhythms are blocked by movements, independent of their active, passive or\nre\ufb02exive origin. Blocking effects are visible bilaterally but pronounced contralaterally in\nthe cortical area that corresponds to the moved limb. 
This attenuation of brain rhythms is\ntermed event-related desynchronization (ERD), see [13]. Over sensorimotor cortex a so\ncalled idle- or \u03bc-rhythm can be measured in the scalp EEG. The most common frequency\nband of the \u03bc-rhythm is about 10 Hz (precentral \u03b1- or \u03bc-rhythm, [14]). Jasper and Penfield\n([12]) described a strictly local so called beta-rhythm at about 20 Hz over human motor cortex\nin electrocorticographic recordings. In scalp EEG recordings one can find the \u03bc-rhythm\nover motor areas mixed with and superimposed by 20 Hz activity. In this context the \u03bc-rhythm is\nsometimes interpreted as a subharmonic of faster cortical activity. The brain rhythms\ndescribed above are of cortical origin, but the role of a thalamo-cortical pacemaker has been\ndiscussed since the first description of EEG by Berger ([15]) and is still a point of\ndiscussion. Lopes da Silva ([16]) showed that cortico-cortical coherence is much larger than\nthalamo-cortical coherence. However, since the focal ERD in the motor and/or sensory cortex can be\nobserved even when a subject is only imagining a movement or sensation in the specific\nlimb, this feature can well be used for BCI control. The discrimination of the imagination\nof movements of left hand vs. right hand vs. foot is based on the topography of the\nattenuation of the \u03bc and/or \u03b2 rhythm.\n\nFigure 1: The plot shows the spectra (in dB) for one subject during left hand (light line) and foot (dark line) motor imagery between 5 and 25 Hz at scalp positions Pz, Cz and C4. In both central channels two peaks, one at 8 Hz and one at 12 Hz, are visible. Below each channel the r\u00b2-value, which measures discriminability, is added. It indicates that the second peak contains more discriminative information.\n\nThere are two problems when using ERD features for BCI control:\n(1) The strength of the sensorimotor idle rhythms as measured by scalp EEG is known to\nvary strongly between subjects. This introduces a high intersubject variability in the accuracy\nwith which an ERD-based BCI system works. There is another feature, independent\nof the ERD, reflecting imagined or intended movements: the movement related potentials\n(MRP), denoting a negative DC shift of the EEG signals in the respective cortical regions.\nSee [17, 18] for an investigation of how this feature can be exploited for BCI use and\ncombined with the ERD feature. This combination strategy was able to greatly enhance\nclassification performance in offline studies. In this paper we focus only on improving the\nERD-based classification, but all the improvements presented here can also be used in the\ncombined algorithm.\n(2) The precentral \u03bc-rhythm is often superimposed by the much stronger posterior \u03b1-rhythm,\nwhich is the idle rhythm of the visual system. It is best articulated with eyes\nclosed, but also present in awake and attentive subjects, see Fig. 1 at channel Pz. Due to\nvolume conduction the posterior \u03b1-rhythm interferes with the precentral \u03bc-rhythm in the\nEEG channels over motor cortex. Hence a \u03bc-power based classifier is susceptible to modulations\nof the posterior \u03b1-rhythm that occur due to fatigue, change in attentional focus\nwhile performing tasks, or changing demands of visual processing. When the two rhythms\nhave different spectral peaks as in Fig. 1, channels Cz and C4, a suitable frequency filter\ncan help to weaken the interference. 
The optimization of such a filter integrated in the CSP\nalgorithm is addressed in this paper.\n\n4 Spatial Filter - the CSP Algorithm\n\nThe common spatial pattern (CSP) algorithm ([19]) is very useful in calculating spatial\nfilters for detecting ERD effects ([20]) and for ERD-based BCIs, see [10], and has been\nextended to multi-class problems in [21]. Given two distributions in a high-dimensional\nspace, the (supervised) CSP algorithm finds directions (i.e., spatial filters) that maximize\nvariance for one class and at the same time minimize variance for the other class. After\nhaving band-pass filtered the EEG signals to the rhythms of interest, high variance reflects\na strong rhythm and low variance a weak (or attenuated) rhythm. Let us take the example\nof discriminating left hand vs. right hand imagery. According to Sec. 3, the spatial filter\nthat focuses on the area of the left hand is characterized by a strong motor rhythm during\nimagination of right hand movements (left hand is in idle state), and by an attenuated motor\nrhythm during left hand imagination.\nThis criterion is exactly what the CSP algorithm optimizes: maximizing variance for the\nclass of right hand trials and at the same time minimizing variance for left hand trials.\nFurthermore the CSP algorithm calculates the dual filter that will focus on the area of the\nright hand (and it will even calculate several filters for both optimizations by considering\northogonal subspaces).\nThe CSP algorithm is trained on labeled data, i.e., we have a set of trials $s_i$, $i = 1, 2, \\ldots$,\nwhere each trial consists of several channels (as rows) and time points (as columns). A\nspatial filter $w \\in \\mathbb{R}^{\\#\\text{channels}}$ projects these trials to the signal $\\hat{s}_i(w) = w^\\top s_i$ with only one\nchannel. The idea of CSP is to find a spatial filter $w$ such that the projected signal has\nhigh variance for one class and low variance for the other. 
In other words we maximize the\nvariance for one class while the sum of the variances of both classes remains constant,\nwhich is expressed by the following optimization problem:\n\n$$\\max_{w} \\sum_{i:\\,\\text{Trial in Class 1}} \\operatorname{var}(\\hat{s}_i(w)), \\quad \\text{s.t.} \\quad \\sum_{i} \\operatorname{var}(\\hat{s}_i(w)) = 1, \\tag{1}$$\n\nwhere var(\u00b7) is the variance of the vector. An analogous formulation can be formed for\nthe second class.\nUsing the definition of the variance we simplify the problem to\n\n$$\\max_{w} \\; w^\\top \\Sigma_1 w, \\quad \\text{s.t.} \\quad w^\\top (\\Sigma_1 + \\Sigma_2) w = 1, \\tag{2}$$\n\nwhere $\\Sigma_y$ is the covariance matrix of the trial-concatenated matrix of dimension [channels\n\u00d7 concatenated time-points] belonging to the respective class $y \\in \\{1, 2\\}$.\nFormulating the dual problem we find that the problem can be solved by calculating a\nmatrix $Q$ and a diagonal matrix $D$ with elements in [0,1] such that\n\n$$Q \\Sigma_1 Q^\\top = D \\quad \\text{and} \\quad Q \\Sigma_2 Q^\\top = I - D \\tag{3}$$\n\nand by choosing the rows of $Q$ that belong to the highest and lowest eigenvalues.\nEquation (3) can be accomplished in the following way. First we whiten the matrix $\\Sigma_1 + \\Sigma_2$,\ni.e., determine a matrix $P$ such that $P(\\Sigma_1 + \\Sigma_2)P^\\top = I$, which is possible due to positive\ndefiniteness of $\\Sigma_1 + \\Sigma_2$. Then define $\\hat{\\Sigma}_y = P \\Sigma_y P^\\top$ and calculate an orthogonal matrix $R$ and\na diagonal matrix $D$ by spectral theory such that $\\hat{\\Sigma}_1 = R D R^\\top$. Therefore $\\hat{\\Sigma}_2 = R(I - D)R^\\top$\nsince $\\hat{\\Sigma}_1 + \\hat{\\Sigma}_2 = I$, and $Q := R^\\top P$ satisfies (3). The projection that is given by the j-th row\nof matrix $Q$ has a relative variance of $d_j$ (j-th element of $D$) for trials of class 1 and relative\nvariance $1 - d_j$ for trials of class 2. If $d_j$ is near 1 the filter given by the j-th row of $Q$\nmaximizes variance for class 1, and since $1 - d_j$ is near 0, minimizes variance for class\n2. 
Typically one would retain some projections corresponding to the highest eigenvalues\n$d_j$, i.e., CSPs for class 1, and some corresponding to the lowest eigenvalues, i.e., CSPs for\nclass 2.\n\n5 Spectral Filter\n\nAs discussed in Sec. 3 the content of discriminative information in different frequency\nbands is highly subject-dependent. For example the subject whose spectra are visualized in\nFig. 1 shows a highly discriminative peak at 12 Hz whereas the peak at 8 Hz does not show\ngood discrimination. Since the lower frequency peak is stronger, a better performance in\nclassification can be expected if we reduce the influence of the lower frequency peak for\nthis subject. However, for other subjects the situation looks different, i.e., the classification\nmight fail if we exclude this information. Thus it is desirable to optimize a spectral\nfilter for better discriminability. Here are two approaches to this task.\nCSSP. In [11] the following was suggested: Given $s_i$, the signal $s_i^\\tau$ is defined to be the signal\n$s_i$ delayed by $\\tau$ timepoints. In CSSP the usual CSP approach is applied to the concatenation\nof $s_i$ and $s_i^\\tau$ in the channel dimension, i.e., the delayed signals are treated as new channels.\nThis concatenation step gives the ability to neglect or emphasize specific frequency bands;\nthe result strongly depends on the choice of $\\tau$, which can be made by some\nvalidation approach on the training set. More complex frequency filters can be found by\nconcatenating more delayed EEG signals with several delays. In [11] it was concluded that\nin typical BCI situations where only small training sets are available, the choice of only one\ndelay tap is most effective. The increased flexibility of a frequency filter with more delay\ntaps does not outweigh the increased complexity of the optimization problem.\nCSSSP. 
The idea of our new CSSSP algorithm is to learn a complete global spatial-temporal\nfilter in the spirit of CSP and CSSP.\nA digital frequency filter consists of two sequences $a$ and $b$ with length $n_a$ and $n_b$ such that\nthe signal $x$ is filtered to $y$ by\n\n$$a(1)y(t) = b(1)x(t) + b(2)x(t-1) + \\ldots + b(n_b)x(t-n_b+1) - a(2)y(t-1) - \\ldots - a(n_a)y(t-n_a+1).$$\n\nHere we restrict ourselves to FIR (finite impulse response) filters by defining $n_a = 1$ and\n$a = 1$. Furthermore we define $b(1) = 1$ and fix the length of $b$ to some $T$ with $T > 1$. By\nthis restriction we give up some flexibility of the frequency filter but it allows us to find a\nsuitable solution in the following way: We are looking for a real-valued sequence $b_{1,\\ldots,T}$\nwith $b(1) = 1$ such that the trials\n\n$$s_{i,b} = s_i + \\sum_{\\tau=2,\\ldots,T} b_\\tau s_i^\\tau \\tag{4}$$\n\ncan be classified better in some way. Using equation (1) we have to solve the problem\n\n$$\\max_{w,\\,b,\\,b(1)=1} \\sum_{i:\\,\\text{Trial in Class 1}} \\operatorname{var}(\\hat{s}_{i,b}(w)), \\quad \\text{s.t.} \\quad \\sum_{i} \\operatorname{var}(\\hat{s}_{i,b}(w)) = 1, \\tag{5}$$\n\nwhich can be simplified to\n\n$$\\max_{b,\\,b(1)=1} \\max_{w} \\; w^\\top \\left( \\sum_{\\tau=0,\\ldots,T-1} \\left( \\sum_{j=1,\\ldots,T-\\tau} b(j)b(j+\\tau) \\right) \\Sigma_1^\\tau \\right) w, \\quad \\text{s.t.} \\quad w^\\top \\left( \\sum_{\\tau=0,\\ldots,T-1} \\left( \\sum_{j=1,\\ldots,T-\\tau} b(j)b(j+\\tau) \\right) (\\Sigma_1^\\tau + \\Sigma_2^\\tau) \\right) w = 1, \\tag{6}$$\n\nwhere $\\Sigma_y^\\tau = E(\\langle s_i (s_i^\\tau)^\\top + s_i^\\tau s_i^\\top \\mid i : \\text{Trial in Class } y \\rangle)$, namely the correlation between the\nsignal and the signal delayed by $\\tau$ timepoints.\nSince we can calculate for each $b$ the optimal $w$ by the usual CSP techniques (see equations\n(2) and (3)), a $(T-1)$-dimensional problem (since $b(1) = 1$) remains, which we can solve with\nusual line-search optimization techniques if $T$ is not too large.\nConsequently we get for each class a frequency band filter and a pattern (or, similar to CSP,\nmore than one pattern by choosing the next eigenvectors).\n\n[Figure 2 axes: left, magnitude (dB) vs. frequency (Hz); right, spectra in dB at Cz and C4 vs. frequency (Hz)]\nFigure 2: The plot on the left shows one learned frequency filter for the subject whose spectra were shown in Fig. 1. In the plot on the right the resulting spectra are visualized after applying the frequency filter on the left. By this technique the classification error could be reduced from 12.9 % to 4.3 %.\n\nHowever, with increasing $T$ the complexity of the frequency filter has to be controlled in\norder to avoid overfitting. This control is achieved by introducing a regularization term in\nthe following way:\n\n$$\\max_{b,\\,b(1)=1} \\max_{w} \\; w^\\top \\left( \\sum_{\\tau=0,\\ldots,T-1} \\left( \\sum_{j=1,\\ldots,T-\\tau} b(j)b(j+\\tau) \\right) \\Sigma_1^\\tau \\right) w - C/T \\, \\|b\\|_1, \\quad \\text{s.t.} \\quad w^\\top \\left( \\sum_{\\tau=0,\\ldots,T-1} \\left( \\sum_{j=1,\\ldots,T-\\tau} b(j)b(j+\\tau) \\right) (\\Sigma_1^\\tau + \\Sigma_2^\\tau) \\right) w = 1. \\tag{7}$$\n\nHere $C$ is a non-negative regularization constant, which has to be chosen, e.g., by cross-validation. 
Since a sparse solution for b is desired, we use the 1-norm in this formulation.\nWith higher C we get sparser solutions for b until at one point the usual CSP approach\nremains, i.e., b(1) = 1, b(m) = 0 for m > 1. We call this approach the Common Sparse Spectral\nSpatial Pattern (CSSSP) algorithm.\n\n6 Feature Extraction, Classification and Validation\n\n6.1 Feature Extraction\n\nAfter choosing all channels except the EOG and EMG and a few of the outermost channels\nof the cap we apply a causal band-pass filter from 7\u201330 Hz to the data, which encompasses\nboth the \u03bc- and the \u03b2-rhythm. For classification we extract the interval 500\u20133500 ms after\nthe presented visual stimulus. To these trials we apply the original CSP ([10]) algorithm\n(see Sec. 4), the extended CSSP ([11]), and the proposed CSSSP algorithm (see Sec. 5).\nFor CSSP we choose the best \u03c4 by leave-one-out cross validation on the training set. For\nCSSSP we present the results for different regularization constants C with fixed T = 16.\nHere we use 3 patterns per class which leads to a 6-dimensional output signal. As a measure\nof the amplitude in the specified frequency band we calculate the logarithm of the variances\nof the spatio-temporally filtered output signals as feature vectors.\n\n6.2 Classification and Validation\n\nThe presented preprocessing reduces the dimensionality of the feature vectors to six. Since\nwe have 120 up to 200 samples per class for each data set, there is no need for regularization\nwhen using linear classifiers. 
When testing non-linear classification methods on these\nfeatures, we could not observe any statistically significant gain for the given experimental\nsetup when compared to Linear Discriminant Analysis (LDA) (see also [22, 6, 23]).\nTherefore we choose LDA for classification.\nFor validation purposes the (chronologically) first half of the data are used as training and\nthe second half as test data.\n\n7 Results\n\nFig. 2 shows one chosen frequency filter for the subject whose spectra are shown in Fig. 1\nand the remaining spectrum after using this filter. As expected the filter detects that there\nis a high discrimination in frequencies at 12 Hz, but only a low discrimination in the frequency\nband at 8 Hz. Since the lower frequency peak is very predominant for this subject\nwithout having a high discrimination power, a filter is learned which drastically decreases\nthe amplitude in this band, whereas full power at 12 Hz is retained.\n\nFigure 3: Each plot shows the validation error of one algorithm against another, in row 1 that is CSP (y-axis) vs. CSSSP (x-axis), in row 2 that is CSSP (y-axis) vs. CSSSP (x-axis). In columns the regularization parameter C of CSSSP is varied between 0.1, 0.5, 1 and 5. In each plot a cross above the diagonal marks a dataset where CSSSP outperforms the other algorithm.\n\nApplied to all datasets and all pairwise class combinations of the datasets we get the results\nshown in Fig. 3. 
Only the results of those datasets are displayed whose classification accuracy\nexceeds 70 % for at least one classifier. First of all, it is obvious that a small value\nof the regularization constant is problematic, since the algorithm tends to overfit. For high\nvalues CSSSP tends towards the CSP performance since using frequency filters is penalized\ntoo strongly. In between there is a range where CSSSP is better than CSP. Furthermore there\nare some datasets where the gain by CSSSP is huge.\nCompared to CSSP the situation is similar, namely that CSSSP outperforms CSSP in\nmany cases and on average, but there are also a few cases where CSSP is better.\nAn open issue is the choice of the parameter C. If we choose it constant at 1 for all datasets\nthe figure shows that CSSSP will typically outperform CSP. Compared to CSSP both cases\nappear, namely that CSSP is better than CSSSP and vice versa.\nA more refined way is to choose C individually for each dataset. One way to accomplish\nthis choice is to perform cross-validations for a set of possible values of C and to select the\nC with minimum cross-validation error. We have done this, for example, for the dataset\nwhose spectra are shown in Fig. 1. Here the value C = 0.3 is chosen on the training set.\nThe classification error of CSSSP with this C is 4.3 %, whereas CSP has 12.9 % and CSSP\n8.6 % classification error.\n\n8 Concluding discussion\n\nIn past BCI research the CSP algorithm has proven to be very successful in determining\nspatial filters which extract discriminative brain rhythms. However the performance can\nsuffer when a non-discriminative brain rhythm with an overlapping frequency range interferes.\nThe presented CSSSP algorithm successfully solves such problematic situations by\noptimizing a spectral filter simultaneously with the spatial filters. 
The trade-off between\n\ufb02exibility of the estimated frequency \ufb01lter and the danger of over\ufb01tting is accounted for by\na sparsity constraint which is weighted by a regularization constant. The successfulness of\nthe proposed algorithm when compared to the original CSP and to the CSSP algorithm was\ndemonstrated on a corpus of 60 EEG data sets recorded from 22 different subjects.\n\n\fAcknowledgments We thank S. Lemm for helpful discussions. The studies were supported\nby BMBF-grants FKZ 01IBB02A and FKZ 01IBB02B, by the Deutsche Forschungsgemeinschaft\n(DFG), FOR 375/B1 and by the PASCAL Network of Excellence (EU # 506778).\nReferences\n[1] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, \u201cBrain-computer inter-\n\nfaces for communication and control\u201d, Clin. Neurophysiol., 113: 767\u2013791, 2002.\n\n[2] E. A. Curran and M. J. Stokes, \u201cLearning to control brain activity: A review of the production and control\nof EEG components for driving brain-computer interface (BCI) systems\u201d, Brain Cogn., 51: 326\u2013336, 2003.\n[3] A. K\u00fcbler, B. Kotchoubey, J. Kaiser, J. Wolpaw, and N. Birbaumer, \u201cBrain-Computer Communication:\n\nUnlocking the Locked In\u201d, Psychol. Bull., 127(3): 358\u2013375, 2001.\n\n[4] N. Birbaumer, N. Ghanayim, T. Hinterberger, I. Iversen, B. Kotchoubey, A. K\u00fcbler, J. Perelmouter, E. Taub,\n\nand H. Flor, \u201cA spelling device for the paralysed\u201d, Nature, 398: 297\u2013298, 1999.\n\n[5] G. Pfurtscheller, C. Neuper, C. Guger, W. Harkam, R. Ramoser, A. Schl\u00f6gl, B. Obermaier, and M. Pregenzer,\n\u201cCurrent Trends in Graz Brain-computer Interface (BCI)\u201d, IEEE Trans. Rehab. Eng., 8(2): 216\u2013219, 2000.\n[6] B. Blankertz, G. Curio, and K.-R. M\u00fcller, \u201cClassifying Single Trial EEG: Towards Brain Computer Interfac-\ning\u201d, in: T. G. Diettrich, S. Becker, and Z. Ghahramani, eds., Advances in Neural Inf. Proc. Systems (NIPS\n01), vol. 
14, 157\u2013164, 2002.\n\n[7] L. Trejo, K. Wheeler, C. Jorgensen, R. Rosipal, S. Clanton, B. Matthews, A. Hibbs, R. Matthews, and\nM. Krupka, \u201cMultimodal Neuroelectric Interface Development\u201d, IEEE Trans. Neural Sys. Rehab. Eng.,\n(11): 199\u2013204, 2003.\n\n[8] L. Parra, C. Alvino, A. C. Tang, B. A. Pearlmutter, N. Yeung, A. Osman, and P. Sajda, \u201cLinear spatial\n\nintegration for single trial detection in encephalography\u201d, NeuroImage, 7(1): 223\u2013230, 2002.\n\n[9] W. D. Penny, S. J. Roberts, E. A. Curran, and M. J. Stokes, \u201cEEG-Based Communication: A Pattern Recog-\n\nnition Approach\u201d, IEEE Trans. Rehab. Eng., 8(2): 214\u2013215, 2000.\n\n[10] H. Ramoser, J. M\u00fcller-Gerking, and G. Pfurtscheller, \u201cOptimal spatial \ufb01ltering of single trial EEG during\n\nimagined hand movement\u201d, IEEE Trans. Rehab. Eng., 8(4): 441\u2013446, 2000.\n\n[11] S. Lemm, B. Blankertz, G. Curio, and K.-R. M\u00fcller, \u201cSpatio-Spectral Filters for Improved Classi\ufb01cation of\n\nSingle Trial EEG\u201d, IEEE Trans. Biomed. Eng., 52(9): 1541\u20131548, 2005.\n\n[12] H. Jasper and W. Pen\ufb01eld, \u201cElectrocorticograms in man: Effects of voluntary movement upon the electrical\n\nactivity of the precentral gyrus\u201d, Arch. Psychiat. Nervenkr., 183: 163\u2013174, 1949.\n\n[13] G. Pfurtscheller and F. H. L. da Silva, \u201cEvent-related EEG/MEG synchronization and desynchronization:\n\nbasic principles\u201d, Clin. Neurophysiol., 110(11): 1842\u20131857, 1999.\n\n[14] H. Jasper and H. Andrews, \u201cNormal differentiation of occipital and precentral regions in man\u201d, Arch. Neurol.\n\nPsychiat. (Chicago), 39: 96\u2013115, 1938.\n\n[15] H. Berger, \u201c\u00dcber das Elektroenkephalogramm des Menschen\u201d, Arch. Psychiat. Nervenkr., 99(6): 555\u2013574,\n\n1933.\n\n[16] F. H. da Silva, T. H. van Lierop, C. F. Schrijer, and W. S. 
van Leeuwen, \u201cOrganization of thalamic and\ncortical alpha rhythm: Spectra and coherences\u201d, Electroencephalogr. Clin. Neurophysiol., 35: 627\u2013640,\n1973.\n\n[17] G. Dornhege, B. Blankertz, G. Curio, and K.-R. M\u00fcller, \u201cCombining Features for BCI\u201d, in: S. Becker,\nS. Thrun, and K. Obermayer, eds., Advances in Neural Inf. Proc. Systems (NIPS 02), vol. 15, 1115\u20131122,\n2003.\n\n[18] G. Dornhege, B. Blankertz, G. Curio, and K.-R. M\u00fcller, \u201cIncrease Information Transfer Rates in BCI by CSP\nExtension to Multi-class\u201d, in: S. Thrun, L. Saul, and B. Sch\u00f6lkopf, eds., Advances in Neural Information\nProcessing Systems, vol. 16, 733\u2013740, MIT Press, Cambridge, MA, 2004.\n\n[19] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, San Diego, 2nd edn., 1990.\n[20] Z. J. Koles and A. C. K. Soong, \u201cEEG source localization: implementing the spatio-temporal decomposition\n\napproach\u201d, Electroencephalogr. Clin. Neurophysiol., 107: 343\u2013352, 1998.\n\n[21] G. Dornhege, B. Blankertz, G. Curio, and K.-R. M\u00fcller, \u201cBoosting bit rates in non-invasive EEG single-\ntrial classi\ufb01cations by feature combination and multi-class paradigms\u201d, IEEE Trans. Biomed. Eng., 51(6):\n993\u20131002, 2004.\n\n[22] K.-R. M\u00fcller, C. W. Anderson, and G. E. Birch, \u201cLinear and Non-Linear Methods for Brain-Computer\n\nInterfaces\u201d, IEEE Trans. Neural Sys. Rehab. Eng., 11(2): 165\u2013169, 2003.\n\n[23] B. Blankertz, G. Dornhege, C. Sch\u00e4fer, R. Krepki, J. Kohlmorgen, K.-R. M\u00fcller, V. Kunzmann, F. Losch,\nand G. Curio, \u201cBoosting Bit Rates and Error Detection for the Classi\ufb01cation of Fast-Paced Motor Commands\nBased on Single-Trial EEG Analysis\u201d, IEEE Trans. Neural Sys. Rehab. 
Eng., 11(2): 127\u2013131, 2003.\n\n\f", "award": [], "sourceid": 2836, "authors": [{"given_name": "Guido", "family_name": "Dornhege", "institution": null}, {"given_name": "Benjamin", "family_name": "Blankertz", "institution": null}, {"given_name": "Matthias", "family_name": "Krauledat", "institution": null}, {"given_name": "Florian", "family_name": "Losch", "institution": null}, {"given_name": "Gabriel", "family_name": "Curio", "institution": null}, {"given_name": "Klaus-Robert", "family_name": "M\u00fcller", "institution": null}]}