{"title": "Spikernels: Embedding Spiking Neurons in Inner-Product Spaces", "book": "Advances in Neural Information Processing Systems", "page_first": 141, "page_last": 148, "abstract": null, "full_text": "Spikernels: Embedding Spiking Neurons in Inner-Product Spaces\n\nLavi Shpigelman, Yoram Singer, Rony Paz, Eilon Vaadia\n\nSchool of Computer Science and Engineering\nInterdisciplinary Center for Neural Computation\nDept. of Physiology, Hadassah Medical School\nThe Hebrew University, Jerusalem 91904, Israel\n\n{shpigi,singer}@cs.huji.ac.il\n{ronyp,eilon}@hbf.huji.ac.il\n\nAbstract\n\nInner-product operators, often referred to as kernels in statistical learning, define a mapping from some input space into a feature space. The focus of this paper is the construction of biologically-motivated kernels for cortical activities. The kernels we derive, termed Spikernels, map spike count sequences into an abstract vector space in which we can perform various prediction tasks. We discuss in detail the derivation of Spikernels and describe an efficient algorithm for computing their value on any two sequences of neural population spike counts. We demonstrate the merits of our modeling approach using the Spikernel and various standard kernels for the task of predicting hand movement velocities from cortical recordings. In all of our experiments, all the kernels we tested outperform the standard scalar product used in regression, with the Spikernel consistently achieving the best performance.\n\n1 Introduction\n\nNeuronal activity in primary motor cortex (MI) during multi-joint arm reaching movements in 2-D and 3-D [1, 2] and drawing movements [3] has been used extensively as a test bed for gaining understanding of neural computations in the brain. Most approaches assume that information is coded by firing rates, measured on various time scales. 
The tuning curve approach models the average firing rate of a cortical unit as a function of some external variable, like the frequency of an auditory stimulus or the direction of a planned movement. Many studies of motor cortical areas [4, 2, 5, 3, 6] showed that while single units are broadly tuned to movement direction, a relatively small population of cells (tens to hundreds) carries enough information to allow for accurate prediction. Such broad tuning can be found in many parts of the nervous system, suggesting that computation by distributed populations of cells is a general cortical feature. The population-vector method [4, 2] describes each cell\u2019s firing rate as the dot product between that cell\u2019s preferred direction and the direction of hand movement. The vector sum of preferred directions, weighted by the measured firing rates, is used both as a way of understanding what the cortical units encode and as a means for estimating the velocity vector.\n\nSeveral recent studies [7, 8, 9] propose that neurons can represent or process multiple parameters simultaneously, suggesting that it is the dynamic organization of the activity in neuronal populations that may represent temporal properties of behavior, such as the computation of transformation from \u2018desired action\u2019 in external coordinates to muscle activation patterns. Some studies [10, 11, 12] support the notion that neurons can associate and dissociate rapidly to functional groups in the process of performing a computational task. The concepts of simultaneous encoding of multiple parameters and dynamic representation in neuronal populations could together explain some of the conundrums in motor system physiology. These concepts also invite usage of increasingly complex models for relating neural activity to behavior. 
Advances in computing power and recent developments of physiological recording methods allow recording of ever-growing numbers of cortical units that can be used for real-time analysis and modeling. These developments and new understandings have recently been used to reconstruct movements on the basis of neuronal activity in real-time in an effort to facilitate the development of hybrid brain-machine interfaces that allow interaction between living brain tissue and artificial electronic or mechanical devices to produce brain controlled movements [13, 6, 14, 15, 11, 16, 17]. Current attempts at predicting movement from cortical activity rely on modeling techniques such as cosine-tuning estimation (population vector) [18], linear regression [15, 19] and artificial neural nets [15] (though that study reports getting better results by linear regression). A major deficiency of standard approaches is poor ability to extract the relevant information from monitored brain activity in an efficient manner that will allow reducing the number of recorded channels and recording time.\n\nThe paper is organized as follows. In Sec. 2 we describe the problem setting that this paper is concerned with. In Sec. 3 we introduce and explain the main mathematical tool that we use, namely, the kernel operator. In Sec. 4 we discuss the design and implementation of a biologically-motivated kernel for neural activities. We report experimental results in Sec. 5 and give conclusions in Sec. 6.\n\n2 Problem setting\n\nConsider the case where we monitor instantaneous spike rates from cortical units during physical motor behavior of a subject. Our goal is to learn a predictive model of some behavior parameter with the cortical activity as the input. Formally speaking, let $\\mathbf{s} \\in \\mathbb{R}^{q \\times m}$ be a sequence of instantaneous firing rates from $q$ cortical units consisting of $m$ samples altogether. 
We use $\\mathbf{s}, \\mathbf{t}$ to denote sequences of firing rates and denote by $\\mathrm{len}(\\mathbf{s})$ the length of a sequence $\\mathbf{s}$. Let $\\mathbf{s}_i$ be the $i$th sample (i.e. instantaneous firing rates) of a sequence $\\mathbf{s}$. We also use $\\mathbf{s}x$ to denote the concatenation of $\\mathbf{s}$ with one more sample $x$. We refer to the instantaneous firing rate of a unit $k$ by $x_k$. We also need to employ a notation for sub-sequences. The $t$-long prefix of $\\mathbf{s}$ is denoted $\\mathbf{s}_{1:t}$. Finally, throughout the work we need to examine substrings of sequences. We denote by $\\mathbf{i}$ a vector of indices into the sequence $\\mathbf{s}$ where $\\mathbf{i} = (i_1, \\ldots, i_n)$ and $1 \\le i_1 < \\cdots < i_n \\le \\mathrm{len}(\\mathbf{s})$.\n\nWe also need to introduce some notation for target variables we would like to predict. Let $y \\in \\mathbb{R}^{m}$ denote some parameter of the movement that we would like to predict (e.g. the movement velocity in the $x$ direction, $V_x$). Our goal is to learn an approximation $\\hat{y}$ of the form $f: \\mathbb{R}^{q \\times m} \\rightarrow \\mathbb{R}^{m}$ from neural firing rates to movement parameter. In general, information about movement can be found in neural activity both before and after the time of movement itself. Our plan, though, is to design a model that can be used for controlling a neural prosthesis. We will therefore confine ourselves to causal predictors that use $\\mathbf{s}_{1:t}$ to predict $y_t$. We therefore would like to make $\\hat{y}_t = f(\\mathbf{s}_{1:t})$ as close as possible (in a sense that is explained in the sequel) to $y_t$.\n\n3 Kernel methods for regression\n\nA major mathematical notion employed in this paper is kernel operators. 
Kernel operators allow algorithms whose interface to the data is limited to scalar products to employ complicated premappings of the data into feature spaces by use of kernels. Formally, a kernel is an inner-product operator $K: X \\times X \\rightarrow \\mathbb{R}$ where $X$ is some arbitrary vector space. An explicit way to describe $K$ is via a mapping $\\phi: X \\rightarrow H$ from $X$ to an inner-products space $H$ such that $K(x, x') = \\phi(x) \\cdot \\phi(x')$. Given a kernel operator we can use it to perform various statistical learning tasks. One such task is support vector regression (SVR) [20] which attempts to find a regression function for target values that is linear if observed in the (typically very large) feature space mapped by the kernel. We give here a brief description of SVR for the sake of clarity.\n\nSupport Vector Regression minimizes Vapnik\u2019s [21] $\\varepsilon$-insensitive loss function $|y - f(x)|_{\\varepsilon} = \\max\\{0, |y - f(x)| - \\varepsilon\\}$ which defines a hyperplane with width $\\varepsilon$ around the estimate. Examples that fall within its boundaries are considered well estimated and do not contribute to the error. Examples outside the tube contribute linearly to the loss. Say $\\phi(x)$ is the feature vector implemented by kernel $K(\\cdot, \\cdot)$. To estimate a linear (linear in feature space) regression $f(x) = w \\cdot \\phi(x) + b$ with precision $\\varepsilon$, one minimizes\n\n$$\\frac{1}{2}\\|w\\|^2 + C \\sum_i |y_i - f(x_i)|_{\\varepsilon} .$$\n\nThis can be written as a constrained minimization problem\n\nminimize $\\frac{1}{2}\\|w\\|^2 + C \\sum_i (\\xi_i + \\xi_i^*)$\nsubject to $(w \\cdot \\phi(x_i) + b) - y_i \\le \\varepsilon + \\xi_i$, $y_i - (w \\cdot \\phi(x_i) + b) \\le \\varepsilon + \\xi_i^*$, and $\\xi_i, \\xi_i^* \\ge 0$.\n\nBy switching to the dual problem of this optimization problem, it is possible to incorporate the kernel function, achieving a mapping that may not be feasible by calculating (possibly infinite) feature vectors $\\phi(\\mathbf{s})$. For $C > 0$, $\\varepsilon \\ge 0$ chosen a-priori, the dual problem is\n\nmaximize $-\\varepsilon \\sum_i (\\alpha_i^* + \\alpha_i) + \\sum_i y_i (\\alpha_i^* - \\alpha_i) - \\frac{1}{2} \\sum_{i,j} (\\alpha_i^* - \\alpha_i)(\\alpha_j^* - \\alpha_j) K(x_i, x_j)$\nsubject to $0 \\le \\alpha_i, \\alpha_i^* \\le C$ for all $i$ and $\\sum_i (\\alpha_i - \\alpha_i^*) = 0$.\n\nThe solution of the regression estimate takes the form\n\n$$f(x) = \\sum_i (\\alpha_i^* - \\alpha_i) K(x_i, x) + b .$$\n\nIn summary, SVM regression solves a quadratic optimization problem to find a hyperplane in the kernel induced feature space that best estimates the data for an $\\varepsilon$-insensitive linear loss function.\n\n4 Spikernels\n\nThe quality of SVM learning is highly dependent on how the data is embedded in the feature space via the kernel operator. For this reason, several studies have been devoted lately to developing new kernels [22, 23, 24]. In fact, for classification problems, a good kernel would render the work of the classification algorithm trivial. With this in mind, we develop a kernel for neural spiking activity.\n\n4.1 Motivation\n\nOur goal in developing a kernel for spike trains is to map similar patterns to nearby areas of the feature space. Current methods for predicting response variables from neural activities use standard linear regression techniques (see for instance [15]) or even replace the time pattern with mean firing rates. A notable example is the population vector method [18]. Other approaches use off-the-shelf learning algorithms, intended for general purpose. 
In the description of our kernel we attempt to capture some well accepted notions on similarities between spike trains. We make the following assumptions regarding similarities between spike patterns:\n\n\u2022 The most commonly made assumption is that similar firing patterns may have small differences in a bin-by-bin comparison. This type of variation is due to the inherent noise of any physical system but also to responses to external factors that were not recorded and are not directly related to the task performed. On the left-hand side of Fig. 1 we show an example of two patterns that are bin-wise similar though clearly not identical.\n\u2022 A cortical population may display highly specific patterns to represent specific information. It is conceivable that some features of external stimuli are represented by population dynamics that would be best described as \u2018temporal\u2019 coding.\n\u2022 Two patterns may be quite different in a simple bin-wise comparison but if they are aligned by some non-linear time distortion or shifting, the similarity becomes apparent. An illustration of such patterns is given in the middle plots of Fig. 1. In comparing patterns we would like to induce a higher score when the time-shifts are small.\n\u2022 Patterns that are associated with identical values of an external stimulus at time $t$ may be similar at that time but very different at $t \\pm \\Delta$ when values of the external stimulus for these patterns are no longer similar (as illustrated on the right-hand side of Fig. 1).\n\n[Figure 1: three panels, each plotting Rate against Time for Pattern A and Pattern B; the right panel marks the time of interest.]\nFigure 1: Illustrative examples of pattern similarities. Left: bin-by-bin comparison yields small differences. Middle: patterns with large bin-by-bin differences that can be eliminated with some time warping. Right: patterns whose suffix (time of interest) is similar and prefix is different.\n\n4.2 Kernel definition\n\nWe describe the kernel by specifying the features that make up the feature space. Our construction of the feature space builds on the work of Lodhi et al. [24]. First, we need to introduce a few more notations. Let $\\mathbf{s}$ be a sequence of length $l = \\mathrm{len}(\\mathbf{s})$. The set of all possible $n$-long index vectors defining a sub-sequence of $\\mathbf{s}$ is $I_n = \\{\\mathbf{i} : 1 \\le i_1 < \\cdots < i_n \\le \\mathrm{len}(\\mathbf{s})\\}$. Also, let $d(x, y)$ denote a bin-wise distance over a pair of samples (firing rates). We also overload notation and denote by $d(\\mathbf{s}_{\\mathbf{i}}, \\mathbf{u}) = \\sum_{k=1}^{n} d(\\mathbf{s}_{i_k}, \\mathbf{u}_k)$ a distance between sequences. The sequence distance is the sum over the samples constituting the two sequences. Let $\\mu, \\lambda \\in (0, 1)$. The $\\mathbf{u}$ component of our (infinite) feature vector $\\phi(\\mathbf{s})$ is defined as,\n\n$$\\phi^n_{\\mathbf{u}}(\\mathbf{s}) = C^{n/2} \\sum_{\\mathbf{i} \\in I_n} \\mu^{d(\\mathbf{s}_{\\mathbf{i}}, \\mathbf{u})} \\lambda^{\\mathrm{len}(\\mathbf{s}) - i_1} \\qquad (1)$$\n\nwhere $C$ is a normalization constant that simplifies the calculation and $i_1$ is the first index of $\\mathbf{i}$. In words, $\\phi^n_{\\mathbf{u}}(\\mathbf{s})$ is a sum over all $n$-long sub-sequences of $\\mathbf{s}$. Each sub-sequence is compared to $\\mathbf{u}$ (the feature coordinate) and is weighted according to its similarity to $\\mathbf{u}$. In particular, part of the weight of each sub-sequence of $\\mathbf{s}$ reflects how concentrated the sub-sequence is toward the end of $\\mathbf{s}$. Put another way, the entry indexed by $\\mathbf{u}$ measures how close $\\mathbf{u}$ is to the time series $\\mathbf{s}$ near its end.\n\nThis definition seems to fit our assumptions on neural coding for the following reasons:\n\n\u2022 It allows for complex patterns: small values of $\\mu$ and $\\lambda$ (or concentrated $d$ measures) mean that each coordinate $\\phi^n_{\\mathbf{u}}(\\mathbf{s})$ tends toward being either 1 or 0 depending on whether $\\mathbf{u}$ is almost identical to a suffix of $\\mathbf{s}$ or not.\n\u2022 Patterns that are piece-wise similar to $\\mathbf{u}$ contribute to the $\\mathbf{u}$ feature coordinate with a weight that decays as the sample-by-sample comparison between the sequences grows large.\n\u2022 We allow gaps in the indexes defining sub-sequences, thus allowing for time warping.\n\u2022 Patterns that begin further from the required prediction time are penalized by an exponentially decaying weight.\n\n4.3 Efficient kernel calculation\n\nThe definition of $\\phi$ given 
by Eq. (1) requires the manipulation of an in\ufb01nite feature space. Straight-\n\nforward calculation of the feature values and performing the induced inner-product is clearly\nimpossible. Based on ideas from [24] we developed an indirect method for evaluating the kernel\nthrough a recursion which can be performed ef\ufb01ciently using dynamic programing. We now\ndescribe the recursion.\n\n1 . We now describe two recursive\nDenote by\u001f\nequations for N with respect to the length of the time series and the sub-sequence length. Due\nto the lack of space we skip some of the algebraic manipulations that are needed to derive the\nrecursions. The \ufb01rst equation is\n\nthe last entry in the sequence\u0001 \u001f\n\n+.-\n\n\u000732\n\n\u0017\u0019\u0001\u0016\u001fV\u001a\n\n\u0017\u0019\u0001\t\u001a\u0018\u0017\n\nis, again, with respect to both the length of the sub-sequence (\n\nEq. (2) simply separates the sum over sub-sequences of \u0001\nspeci\ufb01ed by the index vectors and the latter where )\nfor N\nsequence \u0010 ,\n\u0003\u0005\u0004\nfor\b\nThe last equation simply states that the feature is a sum over all possible values of)\n\n),+\ninto two subsets: one where \u001f\n1 speci\ufb01es \u001f\n@\u0007\u0006\n\n. Note that\nis empty. Eqs. (2) and (3) are now used for computing the recursion equation for\n\nis not\n. The second recursive equation\n) and the length of the\n\nN*\u0018V\u0017\u0014\u0010\u001b\u001aP*\n\n7J\u0017\u0019\u0010\t%\u0011'\n\n4V%0\u001a\n\nN*\u0018\n\n(3)\n\n@\u0014!\n\n),+\n\n154\n\n\u000732\n\n\u000732\n\n:\n\n,\n\n7\u0016\u0017\u0019\u0001\u001b\u001a\n\n(2)\n\n\u0017\u0019\u0001 \u001fJ\u000e\u0011\u0010\u001b\u001a#*\n\nN*\u0018V\u0017\u0019\u0001\u0016\u001fV\u001a\n\u0017\u0019\u0001 \u001fV\u001a and plug Eq. (3) into N8\u0018\n\n\u0017\u0014\u0010\u001b\u001a\u000b\nN\u0019\u0018\nWe plug Eq. (2) into N*\u0018\n\u0017\u0014\u0010\u001b\u001a . 
Using algebraic manipulations we\nreplace integrals over scalar products of N by the proper kernels and get the following recursive\n1(4)\n\nfunction,\n\n\u0010\u000f\n\n\u000732\n\n\u0003\u0011\u0004\n\n\u0016*),+\n\n\u0016*),+\n\n@\u0007\u0006\n\n\t\u000b\n\r\f\u0007\u000e\n\nAssuming that the computation time of the integral in Eq. (4) is a constant, computing the entire\nif\n\nwe cache the term on the right hand side of Eq.(4) as follows. De\ufb01ne,\n\n\u000732\n\n\u000732\n\n154\n\n@<!\n\n\u0017\u0019\u0001\u0016\u001fJ\u000e\u0011\u0010\u001b\u001a\n\nThe initial conditions are:\n\n\u0017\u0019\u0001\u000f\u000e\u0011\u0010\u001b\u001a-\u0017\u001a\u001e\nif \b\u0015\u0014\u0017\u0016\n\u0001\u0007\u0002\u0005\u0004\nrecursion requires\u0018\u001a\u0019.\u0012\u0014\u0013\u0016\u0015#\u0017\u0019\u0001\t\u001a\u0018\u0012\u0014\u0013\u0016\u0015\n\u0003\u0011\u001d\n\u0017\u0019\u0001 \u001fJ\u000e\u0011\u0010\u00168\u000b\u001aP*\n\u0017\u0019\u0001\u000f\u000e\u0016\u0017\u0014\u0010\u00168\u000b\u001a\n4V%\nSeparating the above sum into its two parts (one for\b\n\n\u0010\u000f\nthe de\ufb01nition of G\n\u0016*),+\n\u000e\u0011\u0010\u0003\u0002\u0005\u0004\n\u000eF\u0012\u0014\u0013\u001b\u0015#\u0017\u0014\u0010\u001b\u001a\u0014\u0012\n\n%\u001b\u0017\u0019\u0001\u000f\u000e\u0011\u0010\u001b\u001a\n\u000732\n+.-\n\u0012\u0014\u0013\u0016\u0015#\u0017\u0019\u0001\u001b\u001a\n\n\u001e\u0003G\nif \b\u0015\u0014\u0017\u0016\n\u0002\u0005\u0004\n\n\u0017\u0019\u0001 \u001fJ\u000e\u0011\u0010\u00168&\u001a\n\n@<!\n\n\u000732\n\n\u000732\n\n\u000732\n\n\u000732\n\u000e\u0011\u0010T\u0002\n+.-\n\u000eF\u0012\u0014\u0013\u001b\u0015#\u0017\u0014\u0010\u001b\u001a\u0014\u0012\n\u0012\u0014\u0013\u0016\u0015#\u0017\u0019\u0001\u001b\u001a\n\u0015\u001c\u001b\ntime. 
We can achieve a speed up by a factor of \u0012\u0014\u0013\u0016\u0015\u0018\u0017\u0014\u0010\u001b\u001a\n\u0017\u0014\u0010\u001b\u001a\n\u0003\u001e\u001d\n@\u0007\u0006\n1 (5)\n*D\u0012\u0014\u0013\u001b\u0015#\u0017\u0014\u0010\u00168\u000b\u001a and one for the rest), and using\n\n\u001f\u000f\n\n\u0003\u0011\u001d\n\n\u00168)\n\n%F'\n\n),+\nR ,\n\u0017\u0019\u0001 \u001fJ\u000e\u0011\u0010\u001b\u001a\n\n(6)\n\n%J\u0017\u0019\u0001\u000f\u000e\u0011\u0010\t%\u0011'\n%0\u001a\nG\u0013\u0012<\u0017\u0019\u0001\u000f\u000e\u0011\u0010\u001b\u001aP*\n\u0017\u0019\u0001\u000f\u000e\u0011\u0010\u001b\u001a\n\n\u0017\u0019\u0001<\u000e\u0011\u0010\u001b\u001a\u0018*,2\n\u0017\u0019\u0001<\u000e\u0011\u0010\u001b\u001a\u0018*\n\nR from Eq.(5) we get the following recursive equation for G\n\n\u0010\n\u0010\n\u0002\n\u0004\n\u0006\n\u0002\n\u0004\n)\n\b\n1\n;\nN\n\u0018\n*\n\u0017\nN\n\u0018\n\u0017\n\u0016\n\u0018\n\u001a\n?\n;\n1\n\u001e\n7\n\u001c\nN\n\u0018\n7\n\u0001\n\u001a\n\u0002\n\u0010\n\u001e\n7\n\u001c\n1\n+\n\u0003\n1\n \n%\n\u0016\n\u0018\n\u001a\n?\n1\n\u0017\n1\n+\n\u0003\n%\n7\n\u0001\n\u001a\n\u0002\n@\n1\n\n\u0015\n\u0004\n1\n?\n@\nG\nG\n1\n\u001a\n\u0010\nG\n1\n*\n\u0017\nG\n1\n1\n+\n\u0003\n1\n \n%\n\u0017\n1\n+\n\u0003\n-\nG\n1\n4\n@\n4\n\t\n\u0018\n\u001a\n?\n;\n1\n\u0018\n\u001a\n?\n1\n\n\u0010\nC\n)\n\b\n1\n1\n\u0004\n)\n\b\n1\n+\n\u0003\n1\n2\n\n\u001e\nG\n\u001c\n*\n\u000f\n-\nG\nR\n1\n\u001e\n1\n+\n1\n \n%\n\u0017\n1\n+\n1\n4\n-\nG\n1\n@\n4\n%\n\u001a\n\t\n+\n\u0018\n\u001a\n?\n;\n1\n\u0006\n\u0018\n\u001a\n?\n+\n1\n\u0004\n1\n\n\u0010\nG\nR\n1\n*\n\u0017\n-\n1\n4\n\t\n\u0018\n\u001a\n?\n;\n1\n\u0006\n)\n+\n\u0018\n\u001a\n?\n\u001d\n1\n\n\u0010\n1\n\u0017\n\u0017\nG\nR\n1\nC\n\u0001\n)\n\b\n1\n1\n)\n\b\n1\n+\n\u0003\n1\nG\nR\n\u0012\n\n\u001e\nG\nR\n\u001c\n\u000f\n\fis,\n\nFinally, the recursive equation for G\n\u0017\u0019\u0001 \u001fJ\u000e\u0011\u0010\u001b\u001a\n\u0017\u0019\u0001 \u001fJ\u000e\u0011\u0010\u001b\u001a\nyielding 
an\u0018D\u0017.\u0012\u0014\u0013\u0016\u0015#\u0017\u0019\u0001\t\u001a\u000b\u0012\u0014\u0013\u0016\u0015\u0018\u0017\u0014\u0010\u001b\u001a\u000b\u0015\n\u001a dynamic programing solution for G\nThe kernels de\ufb01ned by Eq.(1) consider only patterns of \ufb01xed length (\u0015 ). It makes sense to look\n\nat sub-sequences of various lengths. Since a linear combination of kernels is also a kernel, we\ncan de\ufb01ne our kernel to be\n\n4.4 Spikernel variants.\n\n\u0017\u0019\u0001<\u000e\u0011\u0010\u001b\u001a .\n\n\u0017\u0019\u0001\u000f\u000e\u0011\u0010\u001b\u001a-\u0017\n\n-\u0006\u0005\n\u0016\u0015\u0014\n\ngoes to in\ufb01nity as\n\n\u001a#*\n\n\u001c\u0007!\n\n\u0017\u0016;\n\n\u000e\r\f\n\n\u00168)\n\n\u001a\u0003\u0002\n\n;E\u0002\n\n;#\"\n\n\fV\"\u000f\u001a\n\nin the kernel recursion Eq.(6) becomes:\n\nDifferent choices of +\u0017\u0016;\nSay we assign \n\u000e\r\f\n\nThe kernel summation can be interpreted as a concatenation of the feature vectors that these ker-\nnels represent. Weighted summation is concatenation of the feature vectors after \ufb01rst multiplying\nthem by the square root of the weights.\n\nGQ\u0017\u0001\u000f\u000e\u0011$\n\u0017\u0001\u000f\u000e\u0011$\n\u001a result in kernels that differ in the way two rate values are compared.\n- , the integral\n- norm: \u001c\nto be the squared\u0004\n\u000e\u0010\u000f\n\r\f\n),+\n\u0007\n+\b\u0007\n\u001c , which has an \u0015\nNote that the constant\u0018\u001a\u0019\n\u001b\u001d\u001c\u001f\u001e\n \"!$#\u0001%'&\ncan easily cancel it with the constant \u001e\nWe show results for the\u0004\n\n\u001b\u0013\u0012\n\u001c\u0017\u0016\nfold gain affect on G\n\u001c\u0017\u0016\n\n\u0017\u0019\u0001 \u001fJ\u000e\u0011\u0010\u00168\u000b\u001aP*\n- norm.\n5 Experimental results\n\ngoes to 1. This gain results in a kernel whose computation is numerically unstable. However, we\n\n. 
Substituting this result back into Eq.(4) we get\n\nData collection: The data used in this work was recorded from the primary motor cortex of a rhesus (Macaca mulatta) monkey (~4.5 kg). The animal\u2019s care and surgical procedures accorded with The NIH Guide for the Care and Use of Laboratory Animals (rev. 1996) and with the Hebrew University guidelines supervised by the institutional committee for animal care and use. The monkey sat in a dark chamber, and 8 electrodes were introduced into each hemisphere. The electrode signals were amplified, filtered and sorted (MCP-PLUS, MSD, Alpha-Omega, Nazareth, Israel). The data used in this report includes 31 single units and 16 multi-unit channels (MUA) that were recorded in one session by 16 microelectrodes. The monkey used two planar-movement manipulanda to control 2 cursors (X and + shapes) on the screen to perform a center-out task. Each trial began when the monkey centered both cursors on a central circle for 1.0-1.5s. Either cursor could turn green, indicating the hand to be used in the trial (X for the right arm and + for the left). Then (after an additional hold period of 1.0-1.5s) one of eight targets appeared at a distance of 4 cm from the origin and the monkey had to move and reach the target in less than 2s to receive a liquid reward. At the end of each session, we examined the activity of neurons evoked by passive manipulation of the limbs and applied intracortical microstimulation (ICMS) to evoke movements. The data presented here was recorded in penetration sites where ICMS evoked shoulder and elbow movements. Penetration locations were verified by MRI (Biospec Bruker 4.7 Tesla).\n\n%\u001b\u0017\u0019\u0001\u000f\u000e\u0011\u0010\u001b\u001a\n\n\u0017\u0019\u0001\u0016\u001fJ\u000e\u0011\u0010\u001b\u001a\n\nData preprocessing and modeling: The movements and spike data were preprocessed to create a labeled corpus. 
We used only the data from trials on which the monkey succeeded in the movement task and examined only the right hand movements. We partitioned the movement and spike trains into 100ms-long bins to get the spike counts and average hand movement velocities in each segment. We then normalized the spike counts to achieve a zero mean and a unit variance for each cortical unit. A labeled example $(\\mathbf{s}_t, V_t)$ for time $t$ consisted of the X or Y velocity as the target label and the preceding 1 second (i.e. 10 segments) of spike counts from all ($q$) cortical units as the input sequence $\\mathbf{s}_t$, hence the matrix of spike counts is of size $q \\times 10$. Each kernel employs a few parameters (e.g. $\\mu$, $\\lambda$ and $n$ for the Spikernel) and the SVM regression setup requires setting of two more parameters, $\\varepsilon$ and $C$. Therefore, the learning task is performed in two stages. First, we used cross-validation to choose the best parameters using a validation set. Then, we learned to predict the response variable using SVR. Overall we had 20 minutes of clean cortical recordings, of which we used the first 10 minutes as our validation set for tuning the parameters. The second half was used for training and testing. The kernels that we tested are the exponential kernel ($K(\\mathbf{s}, \\mathbf{t}) = \\exp(-\\gamma(\\mathbf{s} - \\mathbf{t})^2)$), the homogeneous polynomial kernel ($K(\\mathbf{s}, \\mathbf{t}) = (\\mathbf{s} \\cdot \\mathbf{t})^d$, $d = 2, 3$), the standard scalar product kernel ($K(\\mathbf{s}, \\mathbf{t}) = \\mathbf{s} \\cdot \\mathbf{t}$) which boils down to a linear regression, and the Spikernel.\n\nAccuracy results were obtained by performing 5-fold cross-validation for each kernel. The 5 folds were produced by randomly splitting the data into 5 groups: 4 out of the 5 groups were used for training and the rest of the data was used for evaluation. This process was repeated 5 times by using once each fifth of the data as a test set. We computed the correlation coefficient per fold for each kernel. The per-fold results are shown in Fig. 2A as a scatter plot. Each point compares the Spikernel score versus one of the other kernels. The Spikernel out-performed the rest in every single test set. We found that predicting the $V_y$ signal was more difficult than predicting the $V_x$ signal. This may be the result of sampling a population of cortical units that are tuned more to the left-right directions. The mean results are summarized in Fig. 2B. The linear regression method (scalar-product kernel) came in last. It seems that both re-mapping the data by standard kernels and by the Spikernel allow for better prediction models. The ordering of the kernels by their mean score is consistent when looking at per-test results, except for the exponential kernel which is out-performed by linear regression in some of the tests.\n\n[Figure 2A: scatter plot of per-fold correlation coefficients; x-axis: standard embeddings, y-axis: Spikernel; both axes range from 0 to 0.8.]\n\n[Figure 2B: mean values]\nKernel | Mean r ($V_x$) | Mean r ($V_y$) | Parameters\nSpikernel | 0.70 | 0.49 | $\\mu = 0.99$, $\\lambda = 0.7$, $N = 5$\n$(\\mathbf{s} \\cdot \\mathbf{t})^2$ | 0.62 | 0.36 | $C = 0.01$\n$(\\mathbf{s} \\cdot \\mathbf{t})^3$ | 0.56 | 0.29 | $C = 10$\n$\\exp(-\\gamma(\\mathbf{s} - \\mathbf{t})^2)$ | 0.47 | 0.25 | $\\gamma = 10^{-6}$, $C = 1$\nLin. - $(\\mathbf{s} \\cdot \\mathbf{t})$ | 0.44 | 0.21 | $C = 0.01$\n\nFigure 2: The Spikernel is compared to (color & shape coded) standard kernels. A - Scatter plot of correlation coefficient results in all cross-validation folds. B - Mean correlation coefficient values for each kernel type. The Spikernel out-performs in all folds.\n\n6 Summary\n\nIn this paper we described an approach based on recent advances in kernel-based learning for predicting response variables from neural activities. 
On the data we collected, all the kernels we\ntested outperform the standard scalar product that is used in linear regression. Furthermore,\nthe Spikernel, a biologically motivated kernel operator for spike counts, outperforms all the other\nkernels. Our current research is focused on two directions. First, we are investigating the adaptation\nof the Spikernel to other neural activities such as Local Field Potentials (LFP). Our second\nand more challenging goal is to devise statistical learning algorithms that use the Spikernel as\npart of a dynamical system that may incorporate bio-feedback. We believe that such extensions\nare important and necessary steps toward operational neural prostheses.\n\nAcknowledgments: Supported in part by the German-Israeli Foundation for Scientific Research\nand Development (GIF) and by the German-Israeli Project Cooperation (DIP) established\nby BMBF.\n\nReferences\n[1] Georgopoulos AP, Schwartz AB, and Kettner RE. Neuronal population coding of movement direction. Science, 233:1416\u20131419, 1986.\n[2] Apostolos P. Georgopoulos, Ronald E. Kettner, and Andrew B. Schwartz. Primate motor cortex and free arm movements to visual targets in three-dimensional space. The Journal of Neuroscience, 8, August 1988.\n[3] Schwartz AB. Direct cortical representation of drawing. Science, 265:540\u2013542, 1994.\n[4] A. P. Georgopoulos, J. F. Kalaska, and J. T. Massey. Spatial coding of movements: A hypothesis concerning the coding of movement direction by motor cortical populations. Experimental Brain Research (Supp), 7:327\u2013336, 1983.\n[5] Daniel W. Moran and Andrew B. Schwartz. Motor cortical representation of speed and direction during reaching. Journal of Neurophysiology, 82:2676\u20132692, 1999.\n[6] Mark Laubach, Johan Wessberg, and Miguel A. L. Nicolelis. 
Cortical ensemble activity increasingly predicts behavior outcomes during learning of a motor task. Nature, 405(1), June 2000.\n[7] Fu QG, Flament D, Coltz JD, and Ebner TJ. Relationship of cerebellar Purkinje cell simple spike discharge to movement kinematics in the monkey. Journal of Neurophysiology, 78, 1997.\n[8] Donchin O, Gribova A, Steinberg O, Bergman H, and Vaadia E. Primary motor cortex is involved in bimanual coordination. Nature, 1998.\n[9] Anthony G. Reina, Daniel W. Moran, and Andrew B. Schwartz. On the relationship between joint angular velocity and motor cortical discharge during reaching. Journal of Neurophysiology, 85:2576\u20132589, 2001.\n[10] E. Vaadia, I. Haalman, M. Abeles, H. Bergman, Y. Prut, H. Slovin, and A. Aertsen. Dynamics of neuronal interactions in monkey cortex in relation to behavioral events. Nature, 373:515\u2013518, February 1995.\n[11] Laubach M, Shuler M, and Nicolelis MA. Independent component analyses for quantifying neuronal ensemble interactions. J Neurosci Methods, 1999.\n[12] A. Riehle, S. Gr\u00fcn, M. Diesmann, and A. M. H. J. Aertsen. Spike synchronization and rate modulation differentially involved in motor cortical function. Science, 278:1950\u20131952, 1997.\n[13] Chapin JK, Moxon KA, Markowitz RS, and Nicolelis MA. Real-time control of a robot arm using simultaneously recorded neurons in the motor cortex. Nature Neuroscience, 2:664\u2013670, 1999.\n[14] Miguel A. L. Nicolelis. Actions from thoughts. Nature, 409(18), January 2001.\n[15] Johan Wessberg, Christopher R. Stambaugh, Jerald D. Kralik, Pamela D. Beck, Mark Laubach, John K. Chapin, Jung Kim, James Biggs, Mandayam A. Srinivasan, and Miguel A. L. Nicolelis. Real-time prediction of hand trajectory by ensembles of cortical neurons in primates. Nature, 408(16), November 2000.\n[16] Nicolelis MA, Ghazanfar AA, Faggin BM, Votaw S, and Oliveira LM. 
Reconstructing the engram: simultaneous, multisite, many single neuron recordings. Neuron, 18:529\u2013537, 1997.\n[17] Isaacs RE, Weber DJ, and Schwartz A. Work toward real-time control of a cortical neural prosthesis. IEEE Trans Rehabil Eng, 8:196\u2013198, 2000.\n[18] Dawn M. Taylor, Stephen I. Helms Tillery, and Andrew B. Schwartz. Direct cortical control of 3D neuroprosthetic devices. Science, 2002.\n[19] Mijail D. Serruya, Nicholas G. Hatsopoulos, Liam Paninski, Matthew R. Fellows, and John P. Donoghue. Instant neural control of a movement signal. Nature, 416:141\u2013142, March 2002.\n[20] A. Smola and B. Sch\u00f6lkopf. A tutorial on support vector regression, 1998.\n[21] Vladimir Vapnik. The Nature of Statistical Learning Theory. Springer, N.Y., 1995.\n[22] Tommi S. Jaakkola and David Haussler. Exploiting generative models in discriminative classifiers. In NIPS, 1998.\n[23] Marc G. Genton. Classes of kernels for machine learning: A statistical perspective. Journal of Machine Learning Research, 2:299\u2013312, January 2001.\n[24] Huma Lodhi, John Shawe-Taylor, Nello Cristianini, and Christopher J. C. H. Watkins. Text classification using string kernels. In NIPS, pages 563\u2013569, 2000.\n", "award": [], "sourceid": 2205, "authors": [{"given_name": "Lavi", "family_name": "Shpigelman", "institution": null}, {"given_name": "Yoram", "family_name": "Singer", "institution": null}, {"given_name": "Rony", "family_name": "Paz", "institution": null}, {"given_name": "Eilon", "family_name": "Vaadia", "institution": null}]}