{"title": "A novel family of non-parametric cumulative based divergences for point processes", "book": "Advances in Neural Information Processing Systems", "page_first": 2119, "page_last": 2127, "abstract": "Hypothesis testing on point processes has several applications such as model fitting, plasticity detection, and non-stationarity detection. Standard tools for hypothesis testing include tests on mean firing rate and time varying rate function. However, these statistics do not fully describe a point process, and thus the tests can be misleading. In this paper, we introduce a family of non-parametric divergence measures for hypothesis testing. We extend the traditional Kolmogorov--Smirnov and Cram\u00e9r--von-Mises tests to point processes via stratification. The proposed divergence measures compare the underlying probability structure and are thus zero if and only if the point processes are the same. This leads to a more robust test of hypothesis. We prove consistency and show that these measures can be efficiently estimated from data. We demonstrate an application of using the proposed divergence as a cost function to find optimally matched spike trains.", "full_text": "A novel family of non-parametric cumulative based divergences for point processes

Sohan Seth
University of Florida

Il \u201cMemming\u201d Park
University of Texas at Austin

Austin J. Brockmeier
University of Florida

Mulugeta Semework
SUNY Downstate Medical Center

John Choi, Joseph T. Francis
SUNY Downstate Medical Center & NYU-Poly

Jos\u00e9 C. Pr\u00edncipe
University of Florida

Abstract

Hypothesis testing on point processes has several applications such as model fitting, plasticity detection, and non-stationarity detection. 
Standard tools for hypothesis testing include tests on mean firing rate and time varying rate function. However, these statistics do not fully describe a point process, and therefore, the conclusions drawn by these tests can be misleading. In this paper, we introduce a family of non-parametric divergence measures for hypothesis testing. A divergence measure compares the full probability structure and, therefore, leads to a more robust test of hypothesis. We extend the traditional Kolmogorov\u2013Smirnov and Cram\u00e9r\u2013von-Mises tests to the space of spike trains via stratification, and show that these statistics can be consistently estimated from data without any free parameter. We demonstrate an application of the proposed divergences as a cost function to find optimally matched point processes.

1 Introduction

Neurons communicate mostly through noisy sequences of action potentials, also known as spike trains. A point process captures the stochastic properties of such sequences of events [1]. Many neuroscience problems such as model fitting (goodness-of-fit), plasticity detection, change point detection, non-stationarity detection, and neural code analysis can be formulated as statistical inference on point processes [2, 3]. To avoid the complication of dealing with spike train observations, neuroscientists often use summarizing statistics such as the mean firing rate to compare two point processes. However, this approach implicitly assumes a model for the underlying point process, and therefore, the choice of the summarizing statistic fundamentally restricts the validity of the inference procedure.

One alternative to the mean firing rate is to use the distance between the inhomogeneous rate functions, i.e. $\int |\lambda_1(t) - \lambda_2(t)|\, dt$, as a test statistic, which is sensitive to the temporal fluctuation of the means of the point processes. 
In general, the rate function does not fully specify a point process, and therefore, ambiguity occurs when two distinct point processes have the same rate function. Although physiologically meaningful change is often accompanied by a change in rate, there is evidence that higher order statistics can change without a corresponding change of rate [4, 5]. Therefore, statistical tools that capture higher order statistics, such as divergences, can improve the state-of-the-art hypothesis testing framework for spike train observations, and may encourage new scientific discoveries.

In this paper, we present a novel family of divergence measures between two point processes. Unlike firing rate function based measures, a divergence measure is zero if and only if the two point processes are identical. Applying a divergence measure for hypothesis testing is, therefore, more appropriate in a statistical sense. We show that the proposed measures can be estimated from data without any assumption on the underlying probability structure. However, a distribution-free (non-parametric) approach often suffers from having free parameters, e.g. the choice of kernel in non-parametric density estimation, and these free parameters often need to be chosen using computationally expensive methods such as cross validation [6]. We show that the proposed measures can be consistently estimated in a parameter-free manner, making them particularly useful in practice.

One of the difficulties of dealing with continuous-time point processes is the lack of a well structured space on which the corresponding probability laws can be described. In this paper we follow a rather unconventional approach, describing the point process on a direct sum of Euclidean spaces of varying dimensionality, and show that the proposed divergence measures can be expressed in terms of cumulative distribution functions (CDFs) in these disjoint spaces. 
To be specific, we represent the point process by the probability of having a finite number of spikes and the probability of the spike times given that number of spikes, and since these time values are reals, we can represent them in a Euclidean space using a CDF. We follow this particular approach since, first, CDFs can be consistently estimated using empirical CDFs without any free parameter, and second, standard tests on CDFs such as the Kolmogorov\u2013Smirnov (K-S) test [7] and the Cram\u00e9r\u2013von-Mises (C-M) test [8] are well studied in the literature. Our work extends the conventional K-S test and C-M test on the real line to the space of spike trains.

The rest of the paper is organized as follows: in section 2 we introduce the measure space on which point processes are defined as probability measures; in section 3 and section 4 we introduce the extended K-S and C-M divergences and derive their respective estimators, where we also prove the consistency of the proposed estimators. In section 5, we compare various point process statistics in a hypothesis testing framework. In section 6 we show an application of the proposed measures in selecting the optimal stimulus parameter. In section 7, we conclude the paper with some relevant discussion and future work guidelines.

2 Basic point process

We define a point process to be a probability measure over all possible spike trains. Let $\Omega$ be the set of all finite spike trains, that is, each $\omega \in \Omega$ can be represented by a finite set of action potential timings $\omega = \{t_1 \le t_2 \le \ldots \le t_n\} \in \mathbb{R}^n$, where $n$ is the number of spikes. Let $\Omega_0, \Omega_1, \ldots$ denote the partitions of $\Omega$ such that $\Omega_n$ contains all possible spike trains with exactly $n$ events (spikes), hence $\Omega_n = \mathbb{R}^n$. Note that $\Omega = \bigcup_{n=0}^{\infty} \Omega_n$ is a disjoint union, and that $\Omega_0$ has only one element representing the empty spike train (no action potential). See Figure 1 for an illustration.

Define a $\sigma$-algebra on $\Omega$ as the $\sigma$-algebra generated by the union of the Borel sets defined on the Euclidean spaces; $\mathcal{F} = \sigma\left(\bigcup_{n=0}^{\infty} \mathcal{B}(\Omega_n)\right)$. Note that any measurable set $A \in \mathcal{F}$ can be partitioned into $\{A_n = A \cap \Omega_n\}_{n=0}^{\infty}$, such that each $A_n$ is measurable in the corresponding measurable space $(\Omega_n, \mathcal{B}(\Omega_n))$. Here $A$ denotes a collection of spike trains involving varying numbers of action potentials and corresponding action potential timings, whereas $A_n$ denotes the subset of these spike trains involving exactly $n$ action potentials each.

A (finite) point process is defined as a probability measure $P$ on the measurable space $(\Omega, \mathcal{F})$ [1]. Let $P$ and $Q$ be two probability measures on $(\Omega, \mathcal{F})$; we are interested in finding the divergence $d(P, Q)$ between $P$ and $Q$, where a divergence measure is characterized by $d(P, Q) \ge 0$ and $d(P, Q) = 0 \iff P = Q$.

3 Extended K-S divergence

A Kolmogorov\u2013Smirnov (K-S) type divergence between $P$ and $Q$ can be derived from the $L_1$ distance between the probability measures, following the equivalent representation,

$$d_1(P, Q) = \int_\Omega d\,|P - Q| \ge \sup_{A \in \mathcal{F}} |P(A) - Q(A)|. \quad (1)$$

Figure 1: (Left) Illustration of how the point process space is stratified. (Right) Example of spike trains from inhomogeneous Poisson firing, stratified by their respective spike count.

Since (1) is difficult and perhaps impossible to estimate directly without a model, our strategy is to use the stratified spaces $(\Omega_0, \Omega_1, \ldots)$ defined in the previous section, and take the supremum only over the corresponding conditioned probability measures. Let $\mathcal{F}_i = \mathcal{F} \cap \Omega_i := \{F \cap \Omega_i \,|\, F \in \mathcal{F}\}$. 
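To make the stratification concrete, here is a minimal Python sketch. It is illustrative only: spike trains are assumed to be represented as lists of spike times, and the names `stratify` and `count_probabilities` are ours, not taken from the paper's released implementation.

```python
from collections import defaultdict

def stratify(spike_trains):
    """Group spike trains by spike count: strata[n] holds the trains in Omega_n."""
    strata = defaultdict(list)
    for train in spike_trains:
        # Sort so each train is a point (t1 <= t2 <= ... <= tn) in R^n.
        strata[len(train)].append(sorted(train))
    return strata

def count_probabilities(spike_trains):
    """Empirical estimate of P(Omega_n), the probability of observing n spikes."""
    strata = stratify(spike_trains)
    total = len(spike_trains)
    return {n: len(trains) / total for n, trains in strata.items()}
```

For example, the sample `[[0.1, 0.5], [0.2], [], [0.3, 0.7]]` is split into the strata for $n = 0, 1, 2$, with empirical count probabilities 0.25, 0.25, and 0.5 respectively.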
Since $\bigcup_i \mathcal{F}_i \subset \mathcal{F}$,

$$d_1(P, Q) \ge \sum_{n \in \mathbb{N}} \sup_{A \in \mathcal{F}_n} |P(A) - Q(A)| = \sum_{n \in \mathbb{N}} \sup_{A \in \mathcal{F}_n} |P(\Omega_n)P(A|\Omega_n) - Q(\Omega_n)Q(A|\Omega_n)|.$$

Since each $\Omega_n$ is a Euclidean space, we can induce the traditional K-S test statistic by further reducing the search space to $\tilde{\mathcal{F}}_n = \{\times_i (-\infty, t_i] \,|\, t = (t_1, \ldots, t_n) \in \mathbb{R}^n\}$. This results in the following inequality,

$$\sup_{A \in \mathcal{F}_n} |P(A) - Q(A)| \ge \sup_{A \in \tilde{\mathcal{F}}_n} |P(A) - Q(A)| = \sup_{t \in \mathbb{R}^n} \left| F_P^{(n)}(t) - F_Q^{(n)}(t) \right|, \quad (2)$$

where $F_P^{(n)}(t) = P[T_1 \le t_1 \wedge \ldots \wedge T_n \le t_n]$ is the cumulative distribution function (CDF) corresponding to the probability measure $P$ in $\Omega_n$. Hence, we define the K-S divergence as

$$d_{KS}(P, Q) = \sum_{n \in \mathbb{N}} \sup_{t \in \mathbb{R}^n} \left| P(\Omega_n) F_P^{(n)}(t) - Q(\Omega_n) F_Q^{(n)}(t) \right|. \quad (3)$$

Given a finite number of samples $X = \{x_i\}_{i=1}^{N_P}$ and $Y = \{y_j\}_{j=1}^{N_Q}$ from $P$ and $Q$ respectively, we have the following estimator for equation (3),

$$\hat{d}_{KS}(P, Q) = \sum_{n \in \mathbb{N}} \sup_{t \in \mathbb{R}^n} \left| \hat{P}(\Omega_n) \hat{F}_P^{(n)}(t) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(t) \right| = \sum_{n \in \mathbb{N}} \sup_{t \in X_n \cup Y_n} \left| \hat{P}(\Omega_n) \hat{F}_P^{(n)}(t) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(t) \right|, \quad (4)$$

where $X_n = X \cap \Omega_n$, and $\hat{P}$ and $\hat{F}_P$ are the empirical probability and empirical CDF, respectively. Notice that we only search for the supremum over the locations of the realizations $X_n \cup Y_n$ and not the whole $\mathbb{R}^n$, since the empirical CDF difference $|\hat{P}(\Omega_n)\hat{F}_P^{(n)}(t) - \hat{Q}(\Omega_n)\hat{F}_Q^{(n)}(t)|$ only changes values at those locations.

Theorem 1 ($d_{KS}$ is a divergence).

$$d_1(P, Q) \ge d_{KS}(P, Q) \ge 0 \quad (5)$$
$$d_{KS}(P, Q) = 0 \iff P = Q \quad (6)$$

Proof. The first property and the $\Leftarrow$ direction of the second property are trivial. From the definition of $d_{KS}$ and the properties of CDFs, $d_{KS}(P, Q) = 0$ implies that $P(\Omega_n) = Q(\Omega_n)$ and $F_P^{(n)} = F_Q^{(n)}$ for all $n \in \mathbb{N}$. Given probability measures for each $(\Omega_n, \mathcal{F}_n)$, denoted $P_n$ and $Q_n$, there exist corresponding unique extended measures $P$ and $Q$ for $(\Omega, \mathcal{F})$ such that their restrictions to $(\Omega_n, \mathcal{F}_n)$ coincide with $P_n$ and $Q_n$; hence $P = Q$.

Theorem 2 (Consistency of the K-S divergence estimator). As the sample size approaches infinity,

$$\left| d_{KS} - \hat{d}_{KS} \right| \xrightarrow{a.u.} 0. \quad (7)$$

Proof. Note that $\left|\sum_n \sup (\cdot) - \sum_n \sup (\cdot)\right| \le \sum_n \left|\sup (\cdot) - \sup (\cdot)\right|$. Due to the triangle inequality of the supremum norm,

$$\left| \sup_{t \in \mathbb{R}^n} \left| P(\Omega_n)F_P^{(n)}(t) - Q(\Omega_n)F_Q^{(n)}(t) \right| - \sup_{t \in \mathbb{R}^n} \left| \hat{P}(\Omega_n)\hat{F}_P^{(n)}(t) - \hat{Q}(\Omega_n)\hat{F}_Q^{(n)}(t) \right| \right| \le \sup_{t \in \mathbb{R}^n} \left| P(\Omega_n)F_P^{(n)}(t) - Q(\Omega_n)F_Q^{(n)}(t) - \hat{P}(\Omega_n)\hat{F}_P^{(n)}(t) + \hat{Q}(\Omega_n)\hat{F}_Q^{(n)}(t) \right|.$$

Again, using the triangle inequality (adding and subtracting $P(\Omega_n)\hat{F}_P^{(n)}(t)$ and $Q(\Omega_n)\hat{F}_Q^{(n)}(t)$), we can show the following:

$$\left| P(\Omega_n)F_P^{(n)}(t) - Q(\Omega_n)F_Q^{(n)}(t) - \hat{P}(\Omega_n)\hat{F}_P^{(n)}(t) + \hat{Q}(\Omega_n)\hat{F}_Q^{(n)}(t) \right| \le P(\Omega_n)\left|F_P^{(n)}(t) - \hat{F}_P^{(n)}(t)\right| + \hat{F}_P^{(n)}(t)\left|P(\Omega_n) - \hat{P}(\Omega_n)\right| + Q(\Omega_n)\left|F_Q^{(n)}(t) - \hat{F}_Q^{(n)}(t)\right| + \hat{F}_Q^{(n)}(t)\left|Q(\Omega_n) - \hat{Q}(\Omega_n)\right|.$$

Then the theorem follows from the Glivenko\u2013Cantelli theorem and $\hat{P}, \hat{Q} \xrightarrow{a.s.} P, Q$.

Notice that the inequality in (2) can be made stricter by taking the supremum not just over the products of the segments $(-\infty, t_i]$, but over all $2^n - 1$ possible products of the segments $(-\infty, t_i]$ and $[t_i, \infty)$ in $n$ dimensions [7]. However, the latter approach is computationally more expensive, and therefore, in this paper we only explore the former approach.

4 Extended C-M divergence

We can extend equation (3) to derive a Cram\u00e9r\u2013von-Mises (C-M) type divergence for point processes. Let $\mu = (P + Q)/2$; then $P$ and $Q$ are absolutely continuous with respect to $\mu$. Note that $F_P^{(n)}, F_Q^{(n)} \in L_2(\Omega_n, \mu|_n)$, where $|_n$ denotes the restriction on $\Omega_n$, i.e. the CDFs are $L_2$ integrable, since they are bounded. 
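The K-S estimator of equation (4) reduces to empirical count probabilities and empirical joint CDFs evaluated only at the realization points. Below is a minimal Python sketch, under the assumption that spike trains are lists of spike times; the function names are illustrative, and the paper's released MATLAB implementation remains the authoritative version.

```python
def ks_divergence(X, Y):
    """Sketch of the stratified K-S estimator (equation (4)):
    sum over spike counts n of the sup over realization points of
    |P(Omega_n) F_P(t) - Q(Omega_n) F_Q(t)|, with empirical probabilities/CDFs."""
    Xs = [sorted(x) for x in X]
    Ys = [sorted(y) for y in Y]
    counts = {len(x) for x in Xs} | {len(y) for y in Ys}
    total = 0.0
    for n in counts:
        Xn = [x for x in Xs if len(x) == n]
        Yn = [y for y in Ys if len(y) == n]
        pn = len(Xn) / len(Xs)  # empirical P(Omega_n)
        qn = len(Yn) / len(Ys)  # empirical Q(Omega_n)

        def ecdf(sample, t):
            # Empirical joint CDF in R^n: fraction of trains dominated by t.
            if not sample:
                return 0.0
            hits = sum(all(s[i] <= t[i] for i in range(n)) for s in sample)
            return hits / len(sample)

        # The supremum is attained at the realization points X_n ∪ Y_n.
        points = Xn + Yn
        total += max(abs(pn * ecdf(Xn, t) - qn * ecdf(Yn, t)) for t in points)
    return total
```

For instance, two identical samples give a divergence of exactly 0, while samples whose spike times (or spike counts) never overlap give the maximal per-stratum contribution.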
Analogous to the relation between the K-S test and the C-M test, we would like to use an integrated squared deviation statistic in place of the maximal deviation statistic. Integrating over the probability measure $\mu$ instead of taking the supremum, and using the $L_2$ instead of the $L_\infty$ distance, we define

$$d_{CM}(P, Q) = \sum_{n \in \mathbb{N}} \int_{\mathbb{R}^n} \left( P(\Omega_n)F_P^{(n)}(t) - Q(\Omega_n)F_Q^{(n)}(t) \right)^2 d\mu|_n(t). \quad (8)$$

This can be seen as a direct extension of the C-M criterion. The corresponding estimator can be derived using the strong law of large numbers,

$$\hat{d}_{CM}(P, Q) = \sum_{n \in \mathbb{N}} \left[ \frac{1}{2} \sum_i \left( \hat{P}(\Omega_n)\hat{F}_P^{(n)}(x_i^{(n)}) - \hat{Q}(\Omega_n)\hat{F}_Q^{(n)}(x_i^{(n)}) \right)^2 + \frac{1}{2} \sum_i \left( \hat{P}(\Omega_n)\hat{F}_P^{(n)}(y_i^{(n)}) - \hat{Q}(\Omega_n)\hat{F}_Q^{(n)}(y_i^{(n)}) \right)^2 \right]. \quad (9)$$

Theorem 3 ($d_{CM}$ is a divergence). For $P$ and $Q$ with square integrable CDFs,

$$d_{CM}(P, Q) \ge 0 \quad (10)$$
$$d_{CM}(P, Q) = 0 \iff P = Q. \quad (11)$$

Proof. Similar to theorem 1.

Theorem 4 (Consistency of the C-M divergence estimator). As the sample size approaches infinity,

$$\left| d_{CM} - \hat{d}_{CM} \right| \xrightarrow{a.u.} 0. \quad (12)$$

Proof. Similar to (7), we find an upper bound and show that the bound uniformly converges to zero. To simplify the notation, we define $g_n(x) = P(\Omega_n)F_P^{(n)}(x) - Q(\Omega_n)F_Q^{(n)}(x)$ and $\hat{g}_n(x) = \hat{P}(\Omega_n)\hat{F}_P^{(n)}(x) - \hat{Q}(\Omega_n)\hat{F}_Q^{(n)}(x)$. Note that $\hat{g}_n \xrightarrow{a.u.} g_n$ by the Glivenko\u2013Cantelli theorem and $\hat{P} \xrightarrow{a.s.} P$ by the strong law of large numbers. Then

$$\left| d_{CM} - \hat{d}_{CM} \right| = \frac{1}{2} \left| \sum_{n \in \mathbb{N}} \int g_n^2\, dP|_n + \sum_{n \in \mathbb{N}} \int g_n^2\, dQ|_n - \sum_{n \in \mathbb{N}} \sum_i \hat{g}_n(x_i)^2 - \sum_{n \in \mathbb{N}} \sum_i \hat{g}_n(y_i)^2 \right|$$
$$= \frac{1}{2} \left| \sum_{n \in \mathbb{N}} \left[ \int g_n^2\, dP|_n + \int g_n^2\, dQ|_n - \int \hat{g}_n^2\, d\hat{P}|_n - \int \hat{g}_n^2\, d\hat{Q}|_n \right] \right|$$
$$\le \frac{1}{2} \sum_{n \in \mathbb{N}} \left[ \left| \int g_n^2\, dP|_n - \int \hat{g}_n^2\, d\hat{P}|_n \right| + \left| \int g_n^2\, dQ|_n - \int \hat{g}_n^2\, d\hat{Q}|_n \right| \right],$$

where $\hat{P} = \sum_i \delta(x_i)$ and $\hat{Q} = \sum_i \delta(y_i)$ are the corresponding empirical measures. Without loss of generality, we only find the bound on $\left| \int g_n^2\, dP|_n - \int \hat{g}_n^2\, d\hat{P}|_n \right|$; the rest is bounded similarly for $Q$:

$$\left| \int g_n^2\, dP|_n - \int \hat{g}_n^2\, d\hat{P}|_n \right| = \left| \int g_n^2\, dP|_n - \int \hat{g}_n^2\, dP|_n + \int \hat{g}_n^2\, dP|_n - \int \hat{g}_n^2\, d\hat{P}|_n \right| \le \left| \int \left( g_n^2 - \hat{g}_n^2 \right) dP|_n \right| + \left| \int \hat{g}_n^2\, d\left( P|_n - \hat{P}|_n \right) \right|.$$

Applying the Glivenko\u2013Cantelli theorem and the strong law of large numbers, these two terms converge to zero, since $\hat{g}_n^2$ is bounded. Hence, the C-M estimator is consistent.

5 Results

We present a set of two-sample problems and apply various statistics to perform hypothesis testing. As a baseline measure, we consider the widely used Wilcoxon rank-sum test (or equivalently, the Mann-Whitney U test) on the count distribution (e.g. [9]), which is a non-parametric median test for the total number of action potentials, and the integrated squared deviation statistic $\lambda_{L2} = \int (\lambda_1(t) - \lambda_2(t))^2\, dt$, where $\lambda(t)$ is estimated by smoothing the spike timings with a Gaussian kernel, evaluated on a uniform grid with spacing at least an order of magnitude smaller than the standard deviation of the kernel. We report the performance of the test with varying kernel sizes.

All tests are quantified by the power of the test given a significance threshold (type-I error) of 0.05. The null hypothesis distribution is empirically computed by either generating independent samples or by permuting the data to create at least 1000 values.

5.1 Stationary renewal processes

The renewal process is a widely used point process model that captures deviations from the Poisson process [10]. We consider two stationary renewal processes with gamma interval distributions. 
Since the mean rates of the two processes are the same, the rate function statistic and the Wilcoxon test do not yield consistent results, while the proposed measures attain high power with a small number of samples. The C-M test is more powerful than the K-S test in this case; this can be interpreted by the fact that the difference in the cumulatives is not concentrated but spread out over time because of the stationarity.

Figure 2: Gamma distributed renewal process with shape parameter $\theta = 3$ (H0) and $\theta = 0.5$ (H1). The mean number of action potentials is fixed to 10. (Left) Spike trains from the null and alternate hypotheses. (Right) Comparison of the power of each method (K-S, C-M, $\lambda_{L2}$ with 1 ms, 10 ms, and 100 ms kernels, and the count statistic N) as a function of the number of samples. The error bars are standard deviations over 20 Monte Carlo runs.

5.2 Precisely timed spike trains

When the same stimulation is presented to a neuronal system, the observed spike trains sometimes show a highly repeatable spatio-temporal pattern at the millisecond time scale. Recently such precisely timed spike trains (PTSTs) have been abundantly reported in both in vivo and in vitro preparations [11, 12, 13]. Despite being highly reproducible, different forms of trial-to-trial variability have also been observed [14]. It is crucial to understand this variability, since for a system to utilize PTSTs as a temporal code, it should presumably be robust to their variability structure, and possibly learn to reduce it [15].

A precisely timed spike train in an interval is modeled by $L$ probability density and probability pairs $\{(f_i(t), p_i)\}_{i=1}^{L}$. Each $f_i(t)$ corresponds to the temporal jitter, and $p_i$ corresponds to the probability of generating the spike. Each realization of the PTST model produces at most $L$ spikes. The equi-intensity Poisson process has the rate function $\lambda(t) = \sum_i p_i f_i(t)$. We test whether the methods can differentiate between the PTST (H0) and the equi-intensity Poisson process (H1) for $L = 1, 2, 3, 4$ (see Figure 3 for the $L = 4$ case). Note that $L$ determines the maximum dimension for the PTST. The $f_i(t)$ were equal-variance Gaussian distributions on a grid sampled from a uniform random variable, and $p_i = 0.9$.

As shown in Figure 3, only the proposed methods perform well. Since the rate function profile is identical for both models, the rate function statistic $\lambda_{L2}$ fails to differentiate. The Wilcoxon test does work for intermediate dimensions; however, its performance is highly variable and unpredictable. In contrast to the previous example, the K-S test is consistently better than the C-M statistic in this problem.

6 Optimal stimulation parameter selection

Given a set of point processes, we can find the one which is closest to a target point process in terms of the proposed divergence. Here we use this method on a real dataset obtained from the somatosensory system of an anesthetized rat (see supplement for procedure). 
Specifically, we address finding optimal electrical stimulation settings to produce cortical spiking patterns similar to those observed with tactile stimuli.

Figure 3: [Top] Precisely timed spike train model (H0) versus equi-intensity Poisson process (H1): spike trains from the null and alternate hypotheses for L = 4. [Bottom] Comparison of the power of each method for L = 1, 2, 3, 4 on the precisely timed spike train model (H0) versus the equi-intensity Poisson process (H1). (Left) Power comparison for the methods except N; the rate statistic $\lambda_{L2}$ is not labeled, since it is not able to detect the difference. (Right) Wilcoxon test on the number of action potentials. The error bars are standard deviations over 10 Monte Carlo runs.

The target process has 240 realizations elicited by tactile stimulation of the ventral side of the first digit with a mechanical tactor. We seek the closest out of 19 processes elicited by electrical stimulation in the thalamus. Each process has 140 realizations that correspond to a particular setting of electrical stimulation. The settings correspond to combinations of duration and amplitude for biphasic current injection on two adjacent channels in the thalamus. 
The channel of interest and the stimulating channels were chosen to have significant responses to tactile stimulation.

The results from applying the C-M, K-S, and $\lambda_{L2}$ measures between the tactile responses and the sets from each electrical stimulation setting are shown in Figure 4. The overall trend among the measures is consistent, but the location of the minima does not coincide for $\lambda_{L2}$.

Figure 4: (Left) Dissimilarity/divergences from the tactile response across parameter sets (parameter index sorted by duration then amplitude). The values of each measure are shifted and scaled to be in the range of 0 to 1. $\lambda_{L2}$ uses 2.5 ms bins with no smoothing. (Right) Responses from the tactile stimulation (left), the stimulation setting selected by $\lambda_{L2}$, #15 (100uA, 125\u00b5s) (center), and the setting selected by K-S and C-M, #17 (100uA, 175\u00b5s) (right). The top row shows the spike trains stratified by number of spikes and then sorted by first spike time. The bottom row shows the average response binned at 2.5 ms; the variance is shown as a thin green line.

7 Conclusion

In this paper, we have proposed two novel measures of divergence between point processes. The proposed measures have been derived from the basic probability law of a point process, and we have shown that these measures can be efficiently and consistently estimated from data. Using divergences for statistical inference transcends first and second order statistics, and enables distribution-free spike train analysis.

The time complexity of both methods is $O\left(\sum_n n\left[N_P(n)N_Q(n) + N_P^2(n) + N_Q^2(n)\right]\right)$, where $N_P(n)$ is the number of spike trains from $P$ that have $n$ spikes. In practice this is often faster than the binned rate function estimation, which has time complexity $O(BN)$, where $B$ is the number of bins and $N = \sum_n n(N_P(n) + N_Q(n))$ is the total number of spikes in all the samples. Although we have observed that the statistic based on the $L_2$ distance between the rate functions often outperforms the proposed method, this approach involves a search for the smoothing kernel size and bin size, which can make the process slow and prohibitive. In addition, it brings the danger of multiple testing, since some smoothing kernel sizes may pick up spurious patterns that are only fluctuations due to finite sample size.

A similar approach based on stratification has also been addressed in [16], where the authors discuss the problem of estimating the Hellinger distance between two point processes. Although conceptually similar, the advantage of the proposed approach is that it is parameter free, whereas the other approach requires selecting appropriate kernels and the corresponding kernel sizes for each Euclidean partition. However, a stratification-based approach suffers in estimation when the count distributions of the point processes under consideration are flat, since in this situation the spike train realizations tend to exist in separate Euclidean partitions, and given a finite set of realizations, it becomes difficult to populate each partition sufficiently. Therefore, other methods should be investigated that allow two spike trains to interact irrespective of their spike counts. 
Other possible approaches include the kernel-based divergence measures proposed in [17], since such measures can be applied to any abstract space. However, this requires designing an appropriate strictly positive definite kernel on the space of spike trains.

In this paper, we have presented the divergences in the context of spike trains generated by neurons. However, the proposed methods can be used for general point processes, and can be applied to other areas. Although we have proved the consistency of the proposed measures, further statistical analysis such as small sample power analysis, rates of convergence, and asymptotic properties would be interesting to address. A MATLAB implementation is freely available on the web (http://code.google.com/p/iocane) with a BSD license.

Acknowledgment

This work is partially funded by NSF Grant ECCS-0856441 and DARPA Contract N66001-10-C-2008.

References

[1] D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes. Springer, 1988.

[2] D. H. Johnson, C. M. Gruner, K. Baggerly, and C. Seshagiri. Information-theoretic analysis of neural coding. Journal of Computational Neuroscience, 10(1):47\u201369, 2001.

[3] J. D. Victor. Spike train metrics. Current Opinion in Neurobiology, 15:585\u2013592, 2005.

[4] A. Kuhn, A. Aertsen, and S. Rotter. Higher-order statistics of input ensembles and the response of simple model neurons. Neural Computation, 15(1):67\u2013101, 2003.

[5] F. Rieke, D. Warland, R. de Ruyter van Steveninck, and W. Bialek. Spikes: exploring the neural code. MIT Press, Cambridge, MA, USA, 1999.

[6] B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York, 1986.

[7] G. Fasano and A. Franceschini. A multidimensional version of the Kolmogorov\u2013Smirnov test. Royal Astronomical Society, Monthly Notices, 225:155\u2013170, 1987.

[8] T. W. Anderson. 
On the distribution of the two-sample Cram\u00e9r\u2013von-Mises criterion. Annals of Mathematical Statistics, 33(3):1148\u20131159, 1962.

[9] A. Kepecs, N. Uchida, H. A. Zariwala, and Z. F. Mainen. Neural correlates, computation and behavioural impact of decision confidence. Nature, 455(7210):227\u2013231, 2008.

[10] M. P. P. Nawrot, C. Boucsein, V. R. Molina, A. Riehle, A. Aertsen, and S. Rotter. Measurement of variability dynamics in cortical spike trains. Journal of Neuroscience Methods, 169(2):374\u2013390, 2008.

[11] P. Reinagel and R. Clay Reid. Precise firing events are conserved across neurons. Journal of Neuroscience, 22(16):6837\u20136841, 2002.

[12] M. R. DeWeese, M. Wehr, and A. M. Zador. Binary spiking in auditory cortex. Journal of Neuroscience, 23(21):7940\u20137949, 2003.

[13] R. S. Johansson and I. Birznieks. First spikes in ensembles of human tactile afferents code complex spatial fingertip events. Nature Neuroscience, 7(2):170\u2013177, 2004.

[14] P. Tiesinga, J. M. Fellous, and T. J. Sejnowski. Regulation of spike timing in visual cortical circuits. Nature Reviews Neuroscience, 9:97\u2013107, 2008.

[15] S. M. Bohte and M. C. Mozer. Reducing the variability of neural responses: A computational theory of spike-timing-dependent plasticity. Neural Computation, 19(2):371\u2013403, 2007.

[16] I. Park and J. C. Pr\u00edncipe. Quantification of inter-trial non-stationarity in spike trains from periodically stimulated neural cultures. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2010. Special session on Multivariate Analysis of Brain Signals: Methods and Applications.

[17] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Sch\u00f6lkopf, and A. J. Smola. A kernel method for the two-sample problem. 
CoRR, abs/0805.2368, 2008.\n\n9\n\n\f", "award": [], "sourceid": 751, "authors": [{"given_name": "Sohan", "family_name": "Seth", "institution": null}, {"given_name": "Park", "family_name": "Il", "institution": null}, {"given_name": "Austin", "family_name": "Brockmeier", "institution": null}, {"given_name": "Mulugeta", "family_name": "Semework", "institution": null}, {"given_name": "John", "family_name": "Choi", "institution": null}, {"given_name": "Joseph", "family_name": "Francis", "institution": null}, {"given_name": "Jose", "family_name": "Principe", "institution": null}]}