{"title": "A Temporal Kernel-Based Model for Tracking Hand Movements from Neural Activities", "book": "Advances in Neural Information Processing Systems", "page_first": 1273, "page_last": 1280, "abstract": null, "full_text": "     A Temporal Kernel-Based Model for Tracking\n        Hand-Movements from Neural Activities\n\n\n\n     Lavi Shpigelman(1,2)  Koby Crammer(1)  Rony Paz(2,3)  Eilon Vaadia(2,3)  Yoram Singer(1)\n                      (1) School of Computer Science and Engineering\n                     (2) Interdisciplinary Center for Neural Computation\n                     (3) Dept. of Physiology, Hadassah Medical School\n                      The Hebrew University, Jerusalem, 91904, Israel\n                 Email for correspondence: shpigi@cs.huji.ac.il\n\n\n                                        Abstract\n\n         We devise and experiment with a dynamical kernel-based system for\n         tracking hand movements from neural activity. The state of the system\n         corresponds to the hand location, velocity, and acceleration, while the\n         system's inputs are the instantaneous spike rates. The system's state dy-\n         namics is defined as a combination of a linear mapping from the previous\n         estimated state and a kernel-based mapping tailored for modeling neural\n         activities. In contrast to generative models, the activity-to-state mapping\n         is learned using discriminative methods by minimizing a noise-robust\n         loss function. We use this approach to predict hand trajectories on the\n         basis of neural activity in motor cortex of behaving monkeys and find\n         that the proposed approach is more accurate than both a static approach\n         based on support vector regression and the Kalman filter.\n\n\n1     Introduction\n\nThis paper focuses on the problem of tracking hand movements, which constitute smooth\nspatial trajectories, from spike trains of a neural population. 
We do so by devising a dynam-\nical system which employs a tailored kernel for spike trains along with a linear mapping\ncorresponding to the state dynamics. Consider a situation where a subject performs free\nhand movements during a task that requires spatial and temporal precision. In the lab,\nit may be a constrained reaching task while in real life it may be an everyday task such as\neating. We wish to track the hand position given only spike trains from a recorded neural\npopulation. The rationale of such an undertaking is twofold. First, this task can be viewed\nas a step towards the development of a Brain Machine Interface (BMI), which is rapidly\nbecoming a possible future solution for motor disabled patients. Recent studies of\nBMIs [13, 3, 10] (being on-line and feedback enabled) show that a relatively small number\nof cortical units can be used to move a cursor or a robot effectively, even without genera-\ntion of hand movements, and that training of the subjects improves the overall success of\nthe BMIs. Second, open loop (off-line) movement decoding (see e.g. [7, 1, 15, 11, 8]),\nwhile inappropriate for BMIs, is computationally less expensive, easier to implement and\nallows repeated analysis, thus providing a handle on understanding neural computations\nin the brain.\n\nEarly studies [6] showed that the direction of arm movement is reflected by the population\nvector of preferred directions weighted by current firing rates, suggesting that intended\nmovement is encoded in the firing rate which, in turn, is modulated by the angle between a\nunit's preferred direction (PD) and the intended direction. This linear regression approach\nis still prevalent and is applied, with some variation of the learning methods, in closed and\nopen loop settings. 
There is relatively little work on the development of dedicated nonlinear\nmethods.\n\nBoth movement and neural activity are dynamic and can therefore be naturally modeled by\ndynamical systems. Filtering methods often employ generative probabilistic models such\nas the well-known Kalman filter [16] or more neurally specialized models [1] in which a\ncortical unit's spike count is generated by a probability function of its underlying firing\nrate which is tuned to movement parameters. The movement, being a smooth trajectory,\nis modeled as a linear transition with (typically additive Gaussian) noise. These methods\nhave the advantage of being aware of the smooth nature of movement and provide models\nof what neurons are tuned to. However, the requirement of describing a neural population's\nfiring probability as a function of movement state is hard to satisfy without making costly\nassumptions. The most prominent is the assumption of statistical independence of cells\ngiven the movement.\n\nKernel based methods have been shown to achieve state of the art results in many applica-\ntion domains. Discriminative kernel methods, such as Support Vector Regression (SVR),\nforgo the task of modeling neuronal tuning functions. Furthermore, the construction of\nkernel induced feature spaces lends itself to efficient implementation of distance measures\nover spike trains that are better suited to comparing two neural population trajectories than\nthe Euclidean distance in the original space of spike counts per bin [11, 5]. However,\nSVR is a \"static\" method that does not take into account the smooth dynamics of the pre-\ndicted movement trajectory, which imposes a statistical dependency between consecutive\nexamples.\n\nThis paper introduces a kernel based regression method that incorporates linear dynamics\nof the predicted trajectories. In Sec. 2 we formally describe the problem setting. 
We intro-\nduce the movement tracking model and the associated learning framework in Sec. 3. The\nresulting learning problem yields a new kernel for linear dynamical systems. We provide\nan efficient calculation of this kernel and describe our dual space optimization method for\nsolving the learning problem. The experimental method is presented in Sec. 4. Results\nunderscoring the merits of our algorithm are provided in Sec. 5 and conclusions are given\nin Sec. 6.\n\n\n2    Problem Setting\n\nOur training set contains m trials. Each trial (typically indexed by i or j) consists of a pair\nof movement and neural recordings, designated by $(Y^i, O^i)$. $Y^i = \\{y^i_t\\}_{t=1}^{t^i_{end}}$ is a time\nseries of movement state values and $y^i_t \\in \\mathbb{R}^d$ is the movement state vector at time t in\ntrial i. We are interested in reconstructing position; however, for better modeling, $y^i_t$ may\nbe a vector of position, velocity and acceleration (as is the case in Sec. 4). This trajectory is\nobserved during model learning and is the inference target. $O^i = \\{o^i_t\\}_{t=1}^{t^i_{end}}$ is a time series\nof neural spike counts and $o^i_t \\in \\mathbb{R}^q$ is a vector of spike counts from q cortical units at time\nt. We wish to learn a function $z^i_t = f(O^i_{1:t})$ that is a good estimate (in a sense formalized\nin the sequel) of the movement $y^i_t$. 
Thus, f is a causal filtering method.\n\nWe confine ourselves to a causal setting since we plan to apply the proposed method in a\nclosed loop scenario where real-time output is required. The partition into separate trajecto-\nries is a natural one in a setting where a session is divided into many trials, each consisting\nof one attempt at accomplishing the basic task (such as reaching movements to displayed\ntargets). In tasks that involve no hitting of objects, hand movements are typically smooth.\nEnd point movement in small time steps is loosely approximated as having constant ac-\nceleration. On the other hand, neural spike counts (which are typically measured in bins\nof 50-100 ms) vary greatly from one time step to the next. In summary, our goal is to\ndevise a dynamic mapping from sequences of neural activities ending at a given time to the\ninstantaneous hand movement characterization (location, velocity, and acceleration).\n\n\n3    Movement Tracking Algorithm\n\nOur regression method is defined as follows: given a series $O \\in \\mathbb{R}^{q \\times t_{end}}$ of observations\nand, possibly, an initial state $y_0$, the predicted trajectory $Z \\in \\mathbb{R}^{d \\times t_{end}}$ is\n\n    $z_t = A z_{t-1} + W\\phi(o_t), \\quad t_{end} \\ge t > 0$,    (1)\n\nwhere $z_0 = y_0$, $A \\in \\mathbb{R}^{d \\times d}$ is a matrix describing linear movement dynamics and\n$W \\in \\mathbb{R}^{d \\times q}$ is a weight matrix. $\\phi(o_t)$ is a feature vector of the observed spike trains\nat time t and is later replaced by a kernel operator (in the dual formulation to follow).\nThus, the state transition is a linear transformation of the previous state with the addition\nof a non-linear effect of the observation.\nNote that unfolding the recursion in Eq. (1) yields $z_t = A^t y_0 + \\sum_{k=1}^{t} A^{t-k} W\\phi(o_k)$.\nAssuming that A describes stable dynamics (the magnitudes of the eigenvalues of A are less\nthan 1), the current prediction depends, in an exponentially decaying manner, on the\nprevious observations. We further assume that A is fixed and wish to learn W (we describe\nour choice of A in Sec. 4). In addition, $o_t$ may also encompass a series of previous spike\ncounts in a window ending at time t (as is the case in Sec. 4). Also, note that this model (in\nits non-kernelized version) has an algebraic form which is similar to the Kalman filter (to\nwhich we compare our results later).\n\nPrimal Learning Problem:    The optimization problem presented here is identical to the\nstandard SVR learning problem (see, for example, [12]) with the exception that $z^i_t$ is defined\nas in Eq. (1) while in standard SVR, $z_t = W\\phi(o_t)$ (i.e. without the linear dynamics).\nGiven a training set of fully observed trials $\\{(Y^i, O^i)\\}_{i=1}^{m}$ we define the learning problem\nto be\n\n    $\\min_W \\; \\frac{1}{2}\\|W\\|^2 + c \\sum_{i=1}^{m} \\sum_{t=1}^{t^i_{end}} \\sum_{s=1}^{d} \\left| (z^i_t)_s - (y^i_t)_s \\right|_\\epsilon$,    (2)\n\nwhere $\\|W\\|^2 = \\sum_{a,b} (W)_{ab}^2$ is the squared Frobenius norm. The second term is a sum of training\nerrors (over all trials, times and movement dimensions). $|\\cdot|_\\epsilon$ is the $\\epsilon$-insensitive loss and is\ndefined as $|v|_\\epsilon = \\max\\{0, |v| - \\epsilon\\}$. 
The first term is a regularization term that promotes\nsmall weights and c is a fixed constant providing a tradeoff between the regularization\nterm and the training error. Note that to compensate for different units and scales of the\nmovement dimensions one could either define a different $\\epsilon_s$ and $c_s$ for each dimension of\nthe movement or, conversely, scale the s-th movement dimension. The tracking method,\ncombined with the optimization specified here, defines the complete algorithm. We name\nthis method the Discriminative Dynamic Tracker, or DDT for short.\n\nA Dual Solution:    The derivation of the dual of the learning problem defined in Eq. (2)\nis rather mundane (e.g. [12]) and is thus omitted. Briefly, we replace the $\\epsilon$-loss with pairs\nof slack variables. We then write a Lagrangian of the primal problem and replace $z^i_t$ with\nits (less-standard) definition. We then differentiate the Lagrangian with respect to the slack\nvariables and W and obtain a dual optimization problem. We present the dual problem\nin a top-down manner, starting with the general form and finishing with a kernel definition.\nThe form of the dual is\n\n    $\\max_{\\alpha, \\alpha^*} \\; -\\frac{1}{2} (\\alpha^* - \\alpha)^T G (\\alpha^* - \\alpha) + (\\alpha^* - \\alpha)^T y - (\\alpha^* + \\alpha)^T \\boldsymbol{\\epsilon} \\quad \\text{s.t.} \\quad \\alpha, \\alpha^* \\in [0, c]^\\ell$.    (3)\n\nNote that the above expression conforms to the dual form of SVR. Let $\\ell$ equal the size of the\nmovement space (d), multiplied by the total number of time steps in all the training trajecto-\nries. 
$\\alpha, \\alpha^* \\in \\mathbb{R}^\\ell$ are vectors of Lagrange multipliers, $y \\in \\mathbb{R}^\\ell$ is a column concatenation of\nall the training set movement trajectories, $y = [(y^1_1)^T \\cdots (y^m_{t^m_{end}})^T]^T$, $\\boldsymbol{\\epsilon} = [\\epsilon, \\ldots, \\epsilon]^T \\in \\mathbb{R}^\\ell$\nand $G \\in \\mathbb{R}^{\\ell \\times \\ell}$ is a Gram matrix ($v^T$ denotes transposition). One obvious difference be-\ntween our setting and the standard SVR lies within the size of the vectors and Gram matrix.\nIn addition, a major difference is the definition of G. We define G here in a hierarchical\nmanner. Let $i, j \\in \\{1, \\ldots, m\\}$ be trajectory (trial) indexes. G is built from blocks indexed\nby $G^{ij}$, which are in turn made from basic blocks, indexed by $K^{ij}_{tq}$, as follows\n\n    $G = \\begin{pmatrix} G^{11} & \\cdots & G^{1m} \\\\ \\vdots & \\ddots & \\vdots \\\\ G^{m1} & \\cdots & G^{mm} \\end{pmatrix}, \\quad G^{ij} = \\begin{pmatrix} K^{ij}_{11} & \\cdots & K^{ij}_{1 t^j_{end}} \\\\ \\vdots & \\ddots & \\vdots \\\\ K^{ij}_{t^i_{end} 1} & \\cdots & K^{ij}_{t^i_{end} t^j_{end}} \\end{pmatrix}$,\n\nwhere block $G^{ij}$ refers to a pair of trials (i and j). Finally, each basic block $K^{ij}_{tq}$ refers to a\npair of time steps t and q in trajectories i and j respectively. $t^i_{end}$, $t^j_{end}$ are the time lengths\nof trials i and j. Basic blocks are defined as\n\n    $K^{ij}_{tq} = \\sum_{r=1}^{t} \\sum_{s=1}^{q} A^{t-r} \\, k^{ij}_{rs} \\, (A^{q-s})^T$,    (4)\n\nwhere $k^{ij}_{rs} = k(o^i_r, o^j_s)$ is a (freely chosen) basic kernel between the two neural observa-\ntions $o^i_r$ and $o^j_s$ at times r and s in trials i and j respectively. 
For an explanation of kernel\noperators we refer the reader to [14] and mention that the kernel operator can be viewed\nas computing $\\phi(o^i_r) \\cdot \\phi(o^j_s)$ where $\\phi$ is a fixed mapping to some inner product space.\nThe choice of kernel (being the choice of feature space) reflects a modeling decision that\nspecifies how similarities between neural patterns are measured. The resulting dual form\nof the tracker is $z_t = \\sum_k (\\alpha^*_k - \\alpha_k) G_{tk}$ where $G_t$ is the Gram matrix row of the new example.\n\nIt is therefore clear from Eq. (4) that the linear dynamic characteristics of DDT result in\na Gram matrix whose entries depend on previous observations. This dependency decays\nexponentially as the time difference between events in the trajectories grows. Note\nthat the solution of the dual optimization problem in Eq. (3) can be calculated by any stan-\ndard quadratic programming optimization tool. Also, note that direct calculation of G is\ninefficient. We describe an efficient method in the sequel.\n\nEfficient Calculation of the Gram Matrix    Simple, straightforward calculation of the\nGram matrix is time consuming. To illustrate this, suppose each trial is of length $t^i_{end} = n$;\nthen calculation of each basic block would take $\\Theta(n^2)$ summation steps. We now describe\na procedure based on dynamic programming for calculating the Gram matrix in a\nconstant number of operations for each basic block.\n\nOmitting the indexing over trials to ease notation, we are interested in calculating the basic\nblock $K_{tq}$. 
First, define $B_{tq} = \\sum_{k=1}^{t} k_{kq} A^{t-k}$. The basic block $K_{tq}$ can be recursively\ncalculated in three different ways:\n\n    $K_{tq} = K_{t(q-1)} A^T + B_{tq}$    (5)\n\n    $K_{tq} = A K_{(t-1)q} + (B_{qt})^T$    (6)\n\n    $K_{tq} = A K_{(t-1)(q-1)} A^T + (B_{qt})^T + B_{tq} - k_{tq} I$.    (7)\n\nThus, by adding Eq. (5) to Eq. (6) and subtracting Eq. (7) we get\n\n    $K_{tq} = A K_{(t-1)q} + K_{t(q-1)} A^T - A K_{(t-1)(q-1)} A^T + k_{tq} I$.\n\n$B_{tq}$ (and the entailed summation) is eliminated in exchange for a 2D dynamic program with\ninitial conditions: $K_{11} = k_{11} I$, $K_{1q} = K_{1(q-1)} A^T + k_{1q} I$, $K_{t1} = A K_{(t-1)1} + k_{t1} I$.\n\nTable 1: Mean $R^2$, MAE & MSE (across datasets, folds, hands and directions) for each algorithm.\n\n                       R2                    MAE                   MSE\nAlgorithm        pos.  vel.  accl.     pos.  vel.  accl.     pos.  vel.  accl.\nKalman filter    0.64  0.58  0.30      0.40  0.15  0.37      0.78  0.27  1.16\nDDT-linear       0.59  0.49  0.17      0.63  0.41  0.58      0.97  0.50  1.23\nSVR-Spikernel    0.61  0.64  0.37      0.44  0.14  0.34      0.76  0.20  0.98\nDDT-Spikernel    0.73  0.67  0.40      0.37  0.14  0.34      0.50  0.16  0.91\n\n[Figure 1: three scatter plots with axes from 0 to 1. Vertical axis: DDT-Spikernel $R^2$ scores. Horizontal axes: Kalman filter, DDT-linear and SVR-Spikernel $R^2$ scores. Legend: left hand, X dir.; left hand, Y dir.; right hand, X dir.; right hand, Y dir.]\n\nFigure 1: Correlation coefficient ($R^2$, of predicted and observed hand positions) comparisons of\nthe DDT-Spikernel versus the Kalman filter (left), DDT-linear (center) and SVR-Spikernel (right).\nEach data point gives the $R^2$ values obtained by the DDT-Spikernel and by another method in one fold\nof one of the datasets for one of the two axes of movement (circle / square) and one of the hands\n(filled / non-filled). Results above the diagonals are cases where the DDT-Spikernel outperforms the other method.\n\nSuggested Optimization Method.    One possible way to solve the optimization problem\n(essentially, a modification of the method described in [4] for classification) is to sequen-\ntially solve a reduced problem with respect to a single constraint at a time. 
Define:\n\n    $\\delta_i = \\left| \\sum_j (\\alpha^*_j - \\alpha_j) G_{ij} - y_i \\right|_\\epsilon - \\min_{\\alpha_i, \\alpha^*_i \\in [0, c]} \\left| \\sum_j (\\alpha^*_j - \\alpha_j) G_{ij} - y_i \\right|_\\epsilon$.\n\nThen $\\delta_i$ is the amount of $\\epsilon$-insensitive error that can be corrected for example i by keeping\nall $\\alpha^{(*)}_{j \\ne i}$ constant and changing $\\alpha^{(*)}_i$. Optimality is reached by iteratively choosing the\nexample with the largest $\\delta_i$ and changing its $\\alpha^{(*)}_i$ within the [0, c] limits to minimize the\nerror for this example.\n\n\n4    Experimental Setting\n\nThe data used in this work was recorded from the primary motor cortex of a Rhesus\n(Macaca mulatta) monkey (~4.5 kg). The monkey sat in a dark chamber, and up to 8\nelectrodes were introduced into MI area of each hemisphere. The electrode signals were\namplified, filtered and sorted. 
The data used in this report was recorded on 8 different days\nand includes hand positions, sampled at 500 Hz, spike times of single units (isolated by sig-\nnal fit to a series of windows) and of multi units (detection by threshold crossing) sampled\nat 1 ms precision. The monkey used two planar-movement manipulanda to control 2 cur-\nsors on the screen to perform a center-out reaching task. Each trial began when the monkey\ncentered both cursors on a central circle. Either cursor could turn green, indicating the hand\nto be used in the trial. Then, one of eight targets appeared ('go signal'), the center circle\ndisappeared and the monkey had to move and reach the target to receive liquid reward. The\nnumber of multi-unit channels ranged from 5 to 15, the number of single units was 20-27\nand the average total was 34 units per dataset. The average spike rate per channel was 8.2\nspikes/sec. More information on the recordings can be found in [9].\n\n[Figure 2: three tournament graphs. Vertices from top to bottom, position graph: DDT (Spikernel), Kalman Filter, SVR (Spikernel), DDT (Linear); velocity and acceleration graphs: DDT (Spikernel), SVR (Spikernel), Kalman Filter, DDT (Linear). Edge labels as extracted, position: 88.1%, 100%, 100%, 63.75%, 78.12%, 62.5%; velocity: 75%, 87.5%, 99.4%, 80.0%, 96.3%, 86.8%; acceleration: 78.7%, 91.88%, 98.7%, 86.3%, 95.6%, 84.4%.]\n\nFigure 2: Comparison of $R^2$ performance between algorithms. Each algorithm is represented by a\nvertex. The weight of an edge between two algorithms is the fraction of tests in which the algorithm\non top achieves a higher $R^2$ score than the other. A bold edge indicates a fraction higher than 95%.\nGraphs from left to right are for position, velocity, and acceleration respectively.\n\nThe results that we present here refer to prediction of instantaneous hand movements during\nthe period from 'Go Signal' to 'Target Reach' times of both hands in successful trials.\nNote that some of the trials required movement of the left hand while keeping the right\nhand steady and vice versa. Therefore, although we considered only movement periods\nof the trials, we had to predict both movement and non-movement for each hand. The\ncumulative time length of all the datasets was about 67 minutes. Since the correlation\nbetween the movements of the two hands tends to zero, we predicted movement for each\nhand separately, choosing the movement space to be $[x, y, v_x, v_y, a_x, a_y]^T$ for each of the\nhands (preliminary results using only $[x, y, v_x, v_y]^T$ were less accurate).\n\nWe preprocessed the spike trains into spike counts in running windows of 100 ms (choice\nof window size is based on previous experience [11]). Hand position, velocity and acceler-\nation were calculated using the 500 Hz recordings. Both spike counts and hand movement\nwere then sampled at steps of 100 ms (preliminary results with step size 50 ms were negli-\ngibly different for all algorithms). A labeled example $(y^i_t, o^i_t)$ for time t in trial i consisted\nof the previous 10 bins of population spike counts and the state, as a 6D vector for the left\nor right hand. Two such consecutive examples would then have 9 time bins of spike count\noverlap. 
For example, the number of cortical units q in the first dataset was 43 (27 single\nand 16 multiple) and the total length of all the trials that were used in that dataset is 529\nseconds. Hence in that session there are 5290 consecutive examples where each is a $43 \\times 10$\nmatrix of spike counts along with two 6D vectors of end point movement.\n\nIn order to run our algorithm we had to choose base kernels, their parameters, A and c (and\na scaling parameter for A, to be introduced below). We used the Spikernel [11], a kernel designed to be used with\nspike rate patterns, and the simple dot product (i.e. linear regression). Kernel parameters and\nc were chosen (and subsequently held fixed) by 5-fold cross validation over half of the first\ndataset only. We compared DDT with the Spikernel and with the linear kernel to standard\nSVR using the Spikernel and the Kalman filter. We also obtained tracking results using\nboth DDT and SVR with the standard exponential kernel. These results were slightly less\naccurate on average than with the Spikernel and are therefore omitted here. The Kalman\nfilter was learned assuming the standard state space model ($y_t = A y_{t-1} + \\eta$, $o_t = H y_t + \\xi$,\nwhere $\\eta$, $\\xi$ are white Gaussian noise with appropriate correlation matrices) such as\nin [16]. y belonged to the same 6D state space as described earlier. To ease the comparison,\nthe same matrix A that was learned for the Kalman filter was used in our algorithm\n(though we show that it is not optimal for DDT), multiplied by a scaling parameter. This\nparameter was selected to produce the best position results on the training set. The selected\nvalue is 0.8.\n\nThe figures that we show in Sec. 5 are of test results in 5-fold cross validation on the rest\nof the data. Each of the 8 remaining datasets was divided into 5 folds. 
4/5 were used for\ntraining (with the parameters obtained previously) and the remaining 1/5 as the test set. This\nprocess was repeated 5 times for each hand. Altogether we had 8 sets × 5 folds × 2 hands = 80\nfolds.\n\n[Figure 3: panels for position, velocity and acceleration showing $R^2$, MAE, MSE and the number of support vectors (axis ticks from 6K to 14K).]\nFigure 3: Effect of the scaling parameter on $R^2$, MAE, MSE and number of support vectors.\n\n[Figure 4: position, velocity and acceleration traces in the X and Y directions. Legend: Actual, DDT-Spikernel, SVR-Spikernel.]\nFigure 4: Sample of tracking with the DDT-Spikernel and the SVR-Spikernel.\n\n\n5    Results\nWe begin by showing average results across all datasets, folds, hands and X/Y directions for\nthe four algorithms that are compared. Table 1 shows mean Correlation Coefficients ($R^2$,\nbetween recorded and predicted movement values), Mean $\\epsilon$-insensitive Absolute Errors\n(MAE) and Mean Square Errors (MSE). $R^2$ is a standard performance measure, MAE is\nthe error minimized by DDT (subject to the regularization term) and MSE is minimized by\nthe Kalman filter. Under all the above measures the DDT-Spikernel matches or outperforms the rest,\nwith the SVR-Spikernel and the Kalman filter alternating in second place.\n\nTo understand whether the performance differences are significant we look at the distribu-\ntion of position (X and Y) $R^2$ values at each of the separate tests (160 altogether). Figure 1\nshows scatter plots of $R^2$ results for position predictions. Each plot compares the DDT-\nSpikernel (on the Y axis) with one of the other three algorithms (on the X axis). It is\nclear that, in spite of large differences in accuracy across datasets, the algorithm pairs achieve\nsimilar success, with the DDT-Spikernel achieving a better $R^2$ score in almost all cases.\n\nTo summarize the significance of $R^2$ differences we computed the number of tests in which\none algorithm achieved a higher $R^2$ value than another algorithm (for all pairs, in each of\nthe position, velocity and acceleration categories). The results of this tournament between\nthe algorithms are presented in Figure 2 as winning percentages. The graphs produce a\nranking of the algorithms and the percentages are the significances of the ranking between\npairs. The DDT-Spikernel is significantly better than the rest in tracking position.\n\nThe matrix A in use is not optimal for our algorithm. The scaling parameter modulates its effect. 
When\nthe scaling parameter is 0 we get the standard SVR algorithm (without state dynamics). To illustrate the effect\nof this parameter we present in Figure 3 the mean (over 5 folds, X/Y direction and hand) R2 results on\nthe first dataset as a function of its value. It is clear that the value chosen to minimize position error\nis not optimal for minimizing velocity and acceleration errors. Another important effect of\nthe scaling parameter is on the number of support patterns in the learned model, which drops considerably\n(by about one third) when the effect of the dynamics is increased. This means that more\ntraining points fall strictly within the ε-tube in training, suggesting that the kernel which\ntacitly results from the dynamical model is better suited for the problem. Lastly, we show a\nsample of test tracking results for the DDT-Spikernel and SVR-Spikernel in Figure 4. Note\nthat the acceleration values are not smooth and are, therefore, least aided by the dynamics of\nthe model. However, adding acceleration to the model improves the prediction of position.\n\n\n6    Conclusion\nWe described and reported experiments with a dynamical system that combines a linear\nstate mapping with a nonlinear observation-to-state mapping. The estimation of the system's\nparameters is transformed to a dual representation and yields a novel kernel for temporal\nmodelling. When a linear kernel is used, the DDT system has a similar form to the\nKalman filter as t → ∞. However, the system's parameters are set so as to minimize the\nregularized ε-insensitive ℓ1 loss between state trajectories. DDT also bears similarity to\nSVR, which employs the same loss yet without the state dynamics. Our experiments indicate\nthat by combining a kernel-induced feature space, linear state dynamics, and a\nrobust loss we are able to improve trajectory prediction accuracy and outperform common\napproaches. 
Our next step toward an accurate brain-machine interface for predicting\nhand movements is the development of a learning procedure for the state dynamic mapping\nA and further development of neurally motivated and compact representations.\n\nAcknowledgments        This study was partly supported by a center of excellence grant (8006/00)\nadministered by the ISF, BMBF-DIP, by the U.S. Israel BSF and by the IST Programme of the European\nCommunity, under the PASCAL Network of Excellence, IST-2002-506778. L.S. is supported\nby a Horowitz fellowship.\n\nReferences\n\n [1] A. E. Brockwell, A. L. Rojas, and R. E. Kass. Recursive Bayesian decoding of motor cortical\n     signals by particle filtering. Journal of Neurophysiology, 91:1899-1907, 2004.\n [2] E. N. Brown, L. M. Frank, D. Tang, M. C. Quirk, and M. A. Wilson. A statistical paradigm for\n     neural spike train decoding applied to position prediction from ensemble firing patterns of rat\n     hippocampal place cells. Journal of Neuroscience, 18(7411-7425), 1998.\n [3] J. M. Carmena, M. A. Lebedev, R. E. Crist, J. E. O'Doherty, D. M. Santucci, D. F. Dimitrov,\n     P. G. Patil, C. S. Henriques, and M. A. L. Nicolelis. Learning to control a brain-machine\n     interface for reaching and grasping by primates. PLOS Biology, 1(2):001-016, 2003.\n [4] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based\n     vector machines. Journal of Machine Learning Research, 2:265-292, 2001.\n [5] J. Eichhorn, A. Tolias, A. Zien, M. Kuss, C. E. Rasmussen, J. Weston, N. Logothetis, and\n     B. Schölkopf. Prediction on spike data using kernel algorithms. In NIPS 16. MIT Press, 2004.\n [6] A. P. Georgopoulos, J. Kalaska, and J. Massey. Spatial coding of movements: A hypothesis\n     concerning the coding of movement direction by motor cortical populations. Experimental\n     Brain Research (Supp), 7:327-336, 1983.\n [7] R. E. Isaacs, D. J. Weber, and A. B. Schwartz. 
Work toward real-time control of a cortical\n     neural prothesis. IEEE Trans Rehabil Eng, 8(196-198), 2000.\n [8] C. Mehring, J. Rickert, E. Vaadia, S. C. de Oliveira, A. Aertsen, and S. Rotter. Inference of\n     hand movements from local field potentials in monkey motor cortex. Nature Neur., 6(12), 2003.\n [9] R. Paz, T. Boraud, C. Natan, H. Bergman, and E. Vaadia. Preparatory activity in motor cortex\n     reflects learning of local visuomotor skills. Nature Neur., 6(8):882-890, August 2003.\n[10] M. D. Serruya, N. G. Hatsopoulos, L. Paninski, M. R. Fellows, and J. P. Donoghue. Instant\n     neural control of a movement signal. Nature, 416:141-142, March 2002.\n[11] L. Shpigelman, Y. Singer, R. Paz, and E. Vaadia. Spikernels: Embedding spiking neurons in\n     inner product spaces. In NIPS 15, Cambridge, MA, 2003. MIT Press.\n[12] A. Smola and B. Schölkopf. A tutorial on support vector regression. In NeuroCOLT2 Technical\n     Report, 1998.\n[13] S. I. H. Tillery, D. M. Taylor, and A. B. Schwartz. Training in cortical control of neuroprosthetic\n     devices improves signal extraction from small neuronal ensembles. Reviews in the\n     Neurosciences, 14:107-119, 2003.\n[14] V. Vapnik. The Nature of Statistical Learning Theory. Springer, N.Y., 1995.\n[15] J. Wessberg, C. R. Stambaugh, J. D. Kralik, P. D. Beck, M. Laubach, J. K. Chapin, J. Kim,\n     J. Biggs, M. A. Srinivasan, and M. A. Nicolelis. Real-time prediction of hand trajectory by\n     ensembles of cortical neurons in primates. Nature, 408(16), November 2000.\n[16] W. Wu, M. J. Black, Y. Gao, E. Bienenstock, M. Serruya, and J. P. Donoghue. Inferring hand\n     motion from multi-cell recordings in motor cortex using a Kalman filter. 
In SAB02, pages 66-73,\n     Edinburgh, Scotland (UK), 2002.\n\n\f\n", "award": [], "sourceid": 2644, "authors": [{"given_name": "Lavi", "family_name": "Shpigelman", "institution": null}, {"given_name": "Koby", "family_name": "Crammer", "institution": null}, {"given_name": "Rony", "family_name": "Paz", "institution": null}, {"given_name": "Eilon", "family_name": "Vaadia", "institution": null}, {"given_name": "Yoram", "family_name": "Singer", "institution": null}]}