{"title": "Mixed vine copulas as joint models of spike counts and local field potentials", "book": "Advances in Neural Information Processing Systems", "page_first": 1325, "page_last": 1333, "abstract": "Concurrent measurements of neural activity at multiple scales, sometimes performed with multimodal techniques, become increasingly important for studying brain function. However, statistical methods for their concurrent analysis are currently lacking. Here we introduce such techniques in a framework based on vine copulas with mixed margins to construct multivariate stochastic models. These models can describe detailed mixed interactions between discrete variables such as neural spike counts, and continuous variables such as local field potentials. We propose efficient methods for likelihood calculation, inference, sampling and mutual information estimation within this framework. We test our methods on simulated data and demonstrate applicability on mixed data generated by a biologically realistic neural network. Our methods hold the promise to considerably improve statistical analysis of neural data recorded simultaneously at different scales.", "full_text": "Mixed vine copulas as joint models\n\nof spike counts and local \ufb01eld potentials\n\nArno Onken\n\nIstituto Italiano di Tecnologia\n38068 Rovereto (TN), Italy\n\narno.onken@iit.it\n\nStefano Panzeri\n\nIstituto Italiano di Tecnologia\n38068 Rovereto (TN), Italy\nstefano.panzeri@iit.it\n\nAbstract\n\nConcurrent measurements of neural activity at multiple scales, sometimes per-\nformed with multimodal techniques, become increasingly important for studying\nbrain function. However, statistical methods for their concurrent analysis are cur-\nrently lacking. Here we introduce such techniques in a framework based on vine\ncopulas with mixed margins to construct multivariate stochastic models. 
These\nmodels can describe detailed mixed interactions between discrete variables such\nas neural spike counts, and continuous variables such as local \ufb01eld potentials.\nWe propose ef\ufb01cient methods for likelihood calculation, inference, sampling and\nmutual information estimation within this framework. We test our methods on sim-\nulated data and demonstrate applicability on mixed data generated by a biologically\nrealistic neural network. Our methods hold the promise to considerably improve\nstatistical analysis of neural data recorded simultaneously at different scales.\n\n1\n\nIntroduction\n\nThe functions of the brain likely rely on the concerted interaction of its microscopic, mesoscopic\nand macroscopic systems. Concurrent recordings of signals at different scales, such as simultaneous\nmeasurements of \ufb01eld potential and single-cell spiking activity or other multimodal measures such as\nconcurrent electrophysiological and fMRI measures, are leading to rapid advances in understanding\nbrain dynamics [16]. Analysis of these concurrent data is complicated by the great difference in\nnature (e.g. discrete vs. continuous) and signal-to-noise ratio of each type of neural signal. To\ntake full advantage of these data, \ufb02exible statistical models that take into account many variables\nwith different statistics and their dependencies are needed. Recently, construction of multivariate\nstatistical models based on the concept of copulas has attracted a lot of attention [9]. Intuitively,\na copula represents a particular relationship between a set of random variables that, together with\nseparate margin models of the individual elements can be used to construct a joint statistical model.\nThis approach has become an indispensable tool in economics, \ufb01nance and risk management in\nboth theoretical and practical applications [9, 13, 11]. Yet, despite their promise, application to\nneuroscience has been limited [10, 14, 19]. 
The copula approach is general and, in principle,\napplicable to model mixed discrete and continuous statistics. Speci\ufb01c cases of mixed discrete and\ncontinuous copula-based models with parametric distributions were recently applied in clinical\napplications [24, 7]. Racine [17] proposed nonparametric mixed copula distributions based on\nkernel density estimators. In most studies, however, the elements of the copula-based multivariate\ndistributions are all continuous [9, 13, 11]. A reason for this is that in the general case, likelihood\ncalculation has exponential complexity in the number of discrete elements, limiting the usefulness\nof the models.\nIn particular, these methods are impractical for likelihood-based estimation of\ninformation-theoretic quantities which requires many likelihood evaluations. Smith and Khaled [23]\nrecently proposed a copula-based framework with quadratic complexity, but limited to fully discrete\ndistributions. For valuable applications in neuroscience settings, however, we need a framework\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n\fthat can overcome these limitations and cope with elements (i.e. number of neurons, activity sites)\nthat have different statistical properties - some continuous and others discrete - while still allowing\nef\ufb01cient likelihood calculation.\nHere, we develop a framework to accomplish these goals by means of vine copulas with mixed\ndiscrete and continuous margins. We describe methods to make numeric model selection, parameter\n\ufb01tting and sampling scale ef\ufb01ciently with the number of elements and apply these methods to estimate\ninformation-theoretic quantities. To demonstrate our framework, we draw samples from mixed\nmodels and simulate mixed activity in a biologically realistic neural network. 
We then apply our\nmethods to these data and show that our methods outperform corresponding mixed independent and\nfully continuous models.\n\n2 Mixed vine copulas\n\nOur goal is to construct multivariate distributions with arbitrary mixed margins and a wide range of\npossible dependence structures. To accomplish this goal, we apply an approach that individuates the\nmargin part and the dependence part. The dependence is represented by a copula. Brie\ufb02y, a copula\nis de\ufb01ned as a multivariate distribution function with support on the unit hypercube and uniform\nmargins [13]. We will denote multivariate random variables by X with elements Xi. We denote\nthe cumulative distribution function (CDF) of X by FX with margin CDFs Fi. For consistency of\nnotation, we will denote probability density functions as well as probability mass functions by fX\nwith margins fi.\n\n2.1 Mixed copula-based models\n\nThe great strength of copulas is their utility for constructing and decomposing multivariate distri-\nbutions. Sklar\u2019s Theorem [21, 13] lays out the theoretical foundations for this. According to this\ntheorem, every CDF FX can be decomposed into margins F1, . . . , Fd and a copula C such that\n\nFX (x1, . . . , xd) = C(F1(x1), . . . , Fd(xd))\n\n(1)\nand, conversely, margins F1, . . . , Fd, a copula C and Eq. 1 can be used to construct a CDF FX. In\nthis decomposition, C is unique on the range of X. Sklar\u2019s Theorem holds for mixed discrete and\ncontinuous distributions and thus provides a method to construct multivariate mixed distributions\nbased on CDFs of copulas and margins. The important point here is that the approach yields a\ncumulative distribution function FX of a multivariate random variable X, not its likelihood fX\nwhich we need for inference and other tasks (c.f. Section 2.5). Thus, we need to calculate the\nlikelihood fX based on the cumulative distribution function FX.\nW.l.o.g., let X1, . . . , Xn be discrete and Xn+1, . . . 
, Xd be continuous. By calculating the mixed derivative of Eq. 1, we obtain the probability density function of the mixed distribution of X:

f_X(x_1, …, x_d) = Σ_{m_1=0,1} ⋯ Σ_{m_n=0,1} (−1)^{m_1+⋯+m_n} [∂^{d−n}C / (∂u_{n+1} ⋯ ∂u_d)](F_1(x_1 − m_1), …, F_n(x_n − m_n), F_{n+1}(x_{n+1}), …, F_d(x_d)) ∏_{i=n+1}^{d} f_i(x_i).   (2)

Note that the number of terms in the sum grows exponentially with the number of discrete variables. In general, this exponential number of terms prevents a direct evaluation of the equation. Nevertheless, we will see in the next section that we need to calculate the probability density function for likelihood-based estimation of differential entropy and mutual information. Therefore, we need an efficient way to calculate the probability density function that is tractable for many discrete variables. We will introduce methods to accomplish this in Section 2.5.

2.2 Information estimation with copulas and mixed margins

For continuous as well as mixed multivariate distributions, the differential entropy h(X) is defined as h(X) = −∫ f_X(x) log_2 f_X(x) dx, where f_X is a multivariate density which can also have mixed margins like the one in Eq. 2 [6, 20]. With this, the mutual information I(X; Y) between two multivariate random variables X and Y with potentially mixed margins is given by I(X; Y) = h(X) + h(Y) − h(X, Y), where h(X, Y) is the differential entropy of the joint distribution (X, Y) with joint density f_{X,Y} [6, 20]. For high-dimensional distributions, evaluation of the integral over the support of f_X is infeasible. However, we can estimate the differential entropy, and thereby the mutual information, by means of classical Monte Carlo (MC) estimation [18].
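To make Eq. 2 concrete before turning to the estimators that consume it, the following sketch (our illustration, not code from the paper) evaluates the mixed density for the smallest interesting case, d = 2 with n = 1: a Poisson margin coupled to a standard normal margin by a bivariate Gaussian copula. The copula parameter RHO and the margin choices are assumptions made purely for this example; the inclusion-exclusion sum has 2^n terms, here just two.

```python
import numpy as np
from scipy.stats import norm, poisson

RHO = 0.5  # assumed Gaussian-copula parameter, for illustration only

def gauss_copula_du2(u1, u2):
    # dC/du2 of the bivariate Gaussian copula: P(U1 <= u1 | U2 = u2)
    z1, z2 = norm.ppf(u1), norm.ppf(u2)
    return norm.cdf((z1 - RHO * z2) / np.sqrt(1.0 - RHO**2))

def mixed_density(x1, x2, mu=5.0):
    # Eq. 2 with d = 2, n = 1: X1 ~ Poisson(mu) (discrete),
    # X2 ~ N(0, 1) (continuous).  The sum over m1 has 2^n = 2 terms.
    total = 0.0
    for m1 in (0, 1):
        u1 = poisson.cdf(x1 - m1, mu)
        total += (-1) ** m1 * gauss_copula_du2(u1, norm.cdf(x2))
    return total * norm.pdf(x2)
```

Summing mixed_density(x1, x2) over all x1 telescopes to the continuous margin f_2(x2), which gives a quick sanity check of the construction.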
We express the entropy as an expectation over f_X and approximate it by the empirical average over a large number of samples x_1, …, x_k drawn from X:

h(X) = E_{f_X}[−log_2 f_X(X)] ≈ ĥ_k := −(1/k) Σ_{j=1}^{k} log_2(f_X(x_j))   (3)

By the strong law of large numbers, ĥ_k converges almost surely to h(X). Moreover, we can assess the convergence of ĥ_k by estimating its sample variance: Var[ĥ_k] ≈ (1/(k(k+1))) Σ_{j=1}^{k} (−log_2(f_X(x_j)) − ĥ_k)^2. With this estimate, the term (ĥ_k − h(X)) / √(Var[ĥ_k]) is approximately standard normal distributed, allowing us to obtain confidence intervals for our differential entropy estimate [18]. This shows that there are two requisites for the MC procedure to estimate entropy and mutual information for a mixed distribution: 1) an efficient sampling procedure to produce samples x_j from X, and 2) a tractable method for calculating the density f_X(x_j). We will introduce the former in Section 2.4 and the latter in Section 2.5. In the next section we describe a copula decomposition that makes these efficient methods possible.

2.3 Pair copula constructions

The number of available high-dimensional copula families is quite limited, whereas bivariate copula families are abundant. The pair copula construction provides a flexible way to construct higher-dimensional copulas from bivariate copulas [1]. The idea of pair copula models is to factorize the multivariate distribution into conditional distributions and to describe these conditional distributions by means of bivariate copulas, each modeling the dependence of two variables at a time.
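The MC estimator of Eq. 3 and its error bar take only a few lines. Below is a sketch (ours, not the paper's code) applied to a standard normal variable, whose differential entropy has the closed form 0.5 log_2(2πe) and therefore lets us check the estimate; the error estimate uses the standard unbiased sample-variance normalization.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def mc_entropy(sample_fn, log2_pdf_fn, k=100_000):
    # Eq. 3: hat_h_k = -(1/k) * sum_j log2 f_X(x_j), together with an
    # estimate of its standard error for a normal-approximation CI.
    x = sample_fn(k)
    vals = -log2_pdf_fn(x)          # -log2 f_X(x_j)
    h_hat = vals.mean()
    se = np.sqrt(vals.var(ddof=1) / k)
    return h_hat, se

h_hat, se = mc_entropy(lambda k: rng.normal(size=k),
                       lambda x: norm.logpdf(x) / np.log(2.0))
true_h = 0.5 * np.log2(2.0 * np.pi * np.e)  # differential entropy of N(0, 1)
```

With k = 100,000 samples, h_hat ± 2 se should bracket the closed-form value.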
Special pair copula constructions, called regular vine copula structures, assume conditional independence between specific elements of the distribution, allowing us to circumvent the curse of dimensionality in likelihood evaluation and sampling. More specifically, a vine can be represented as a hierarchical set of trees where each node corresponds to a conditional distribution function and each edge corresponds to a pair copula. The nodes of the lowest tree are the unconditional distribution margins with empty conditioning sets. Each tree in the hierarchy incorporates additional variables into the conditioning sets by means of its pair copulas. The results of these couplings then form the nodes of the next tree in the hierarchy, thus extending the conditioning sets from tree to tree. Here we focus on the canonical vine or C-vine, in which each tree in the hierarchy has a unique node that is connected to all other nodes [1].

In this section, F(x_i | x_{j_1}, …, x_{j_k}) denotes the conditional cumulative distribution function of X_i given X_{j_1}, …, X_{j_k}. In the C-vine, the multivariate model likelihood is factorized as follows [1]:

f_X(x_1, …, x_d) = ∏_{k=1}^{d} f(x_k) ∏_{j=1}^{d−1} ∏_{i=1}^{d−j} c_{j,i+j|1,…,j−1}(F(x_j | x_1, …, x_{j−1}), F(x_{i+j} | x_1, …, x_{j−1}))   (4)

The C-vine is a good choice if there are outstanding variables with important dependencies to many other variables [2]. Such situations are commonly encountered in electrophysiology recordings where the same electrode might record a local field potential (LFP, acting as the outstanding variable) and statistically dependent spikes from nearby neurons.

2.4 Sampling from mixed canonical vines

For a vine with mixed margins, we sample from the corresponding continuous vine and apply the inversion method with the inverse of the margin cumulative distribution function to obtain mixed discrete and continuous samples. In the following, ∂C/∂u_1 denotes the partial derivative of the copula C with respect to its first argument and ∂C/∂u_2 denotes the partial derivative with respect to the second argument. For mixed C-vine sampling, we take the algorithm for sampling from a continuous C-vine copula with uniform margins [1] and extend it by means of the inversion method to attach arbitrary continuous and discrete margins. The algorithm requires (d − 2)(d − 1)/2 + d cumulative distribution function evaluations:

1. Sample w_1, …, w_d i.i.d. uniform on [0, 1].
2. v_{1,1} = w_1.
3. x_1 = F_1^{−1}(v_{1,1}).
4. For i = 2, …, d:
   (a) v_{i,1} = w_i.
   (b) For k = i−1, i−2, …, 1: v_{i,1} ← F_{i|1,…,k}^{−1}(v_{i,1}, v_{k,k}), where F_{i|1,…,k} = ∂C_{k,i|1,…,k−1}/∂u_1.
   (c) x_i = F_i^{−1}(v_{i,1}).
   (d) If i < d, then for j = 1, …, i−1: v_{i,j+1} ← F_{i|1,…,j}(v_{i,j}, v_{j,j}), where F_{i|1,…,j} = ∂C_{j,i|1,…,j−1}/∂u_1.
5. The result is x_1, …, x_d.

The algorithm has quadratic complexity and is thus applicable to estimate information-theoretic quantities following the scheme outlined in Section 2.2.

2.5 Tractable algorithm for calculating mixed canonical vine densities

Panagiotelis et al. [15] introduced an algorithm for calculating the likelihood of specific discrete pair-copula decompositions. Notably, this algorithm has quadratic complexity in the number of elements in the multivariate distribution.
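As an illustration of the sampling algorithm of Section 2.4 (a sketch under assumptions, not the authors' implementation), the following restricts every pair copula to the Gaussian family, for which the conditional CDF F_{i|...} and its inverse are available in closed form; the margins and the common parameter 0.5 are choices made to mirror the 3D example of Fig. 1.

```python
import numpy as np
from scipy.stats import norm, poisson, gamma

def h(w, v, rho):
    # Conditional copula CDF P(U_i <= w | U_cond = v) for a Gaussian
    # pair copula with correlation rho (the dC/du1 of the algorithm,
    # with v in the conditioning slot).
    return norm.cdf((norm.ppf(w) - rho * norm.ppf(v)) / np.sqrt(1 - rho**2))

def h_inv(w, v, rho):
    # Inverse of h in its first argument.
    return norm.cdf(np.sqrt(1 - rho**2) * norm.ppf(w) + rho * norm.ppf(v))

def sample_cvine(rho, margins, rng):
    # Steps 1-5 of the mixed C-vine sampling algorithm (0-based indices);
    # rho[k, i] is the parameter of the pair copula coupling margins k and i.
    d = len(margins)
    w = rng.uniform(size=d)
    v = np.empty((d, d))
    x = np.empty(d)
    v[0, 0] = w[0]
    x[0] = margins[0].ppf(v[0, 0])          # inversion method for margin 1
    for i in range(1, d):
        v[i, 0] = w[i]
        for k in range(i - 1, -1, -1):      # step 4(b)
            v[i, 0] = h_inv(v[i, 0], v[k, k], rho[k, i])
        x[i] = margins[i].ppf(v[i, 0])      # step 4(c): discrete or continuous
        if i < d - 1:
            for j in range(i):              # step 4(d)
                v[i, j + 1] = h(v[i, j], v[j, j], rho[j, i])
    return x

rng = np.random.default_rng(42)
rho = np.full((3, 3), 0.5)  # assumed: every pair copula Gaussian with rho 0.5
margins = [norm(), poisson(5), gamma(2, scale=4)]  # as in the Fig. 1 example
samples = np.array([sample_cvine(rho, margins, rng) for _ in range(3000)])
```

Because the copula only shapes the dependence, the empirical margins of `samples` should match the specified margins regardless of rho.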
Here, we generalize this algorithm to the mixed margins case and apply it to the C-vine. We apply a dynamic programming approach and build the likelihood in a bottom-up fashion from vine level T_0 to level T_{d−1}. The algorithm has quadratic complexity and computes the density of a C-vine with mixed discrete and continuous margins. We abbreviate F^+_{i|A} := P(X_i ≤ x_i | X_A = x_A) and F^−_{i|A} := P(X_i ≤ x_i − 1 | X_A = x_A). We write F^c_{i|A} := F^+_{i|A} and f_{i|A} := f(X_i = x_i | X_A = x_A) if X_i is continuous, and f_{i|A} := P(X_i = x_i | X_A = x_A) if X_i is discrete. Moreover, ∀a, b ∈ {+, −, c}: C^{ab}_{i,j|A} := C_{i,j|A}(F^a_{i|A}, F^b_{j|A}). As before, ∂C/∂u_1 is the partial derivative of the copula C with respect to its first argument and ∂C/∂u_2 is the partial derivative with respect to C's second argument. Consequently, for w ∈ {u_1, u_2} we write ∂C^{ab}_{i,j|A}/∂w := (∂C_{i,j|A}/∂w)(F^a_{i|A}, F^b_{j|A}).

1. Level T_0: For i = 1, …, d: evaluate f_i = F^+_i − F^−_i if X_i is discrete, and f_i = f_i(x_i) if X_i is continuous.

2. Levels T_1, T_2, …, T_{d−1}: For t = 1, …, d−1 and i = t+1, …, d: let I_t = {1, …, t}, and let L_t = {1, …, t−1} if t > 1 and L_t = ∅ if t = 1.

   (a) Evaluate

       ∀a, b ∈ {+, −}: C^{ab}_{t,i|L_t}, if X_t and X_i are discrete;
       ∀a ∈ {+, −}: C^{ac}_{t,i|L_t} and ∂C^{ac}_{t,i|L_t}/∂u_1, if X_t is discrete and X_i is continuous;
       ∀b ∈ {+, −}: C^{cb}_{t,i|L_t} and ∂C^{cb}_{t,i|L_t}/∂u_2, if X_t is continuous and X_i is discrete;
       ∂C^{cc}_{t,i|L_t}/∂u_2 and ∂²C^{cc}_{t,i|L_t}/(∂u_1 ∂u_2), if X_t and X_i are continuous.   (5)

   (b) Evaluate

       • if X_i is discrete:

         f_{i|I_t} = F^+_{i|I_t} − F^−_{i|I_t},   (6)

         where, if X_t is discrete:

         F^+_{i|I_t} = (C^{++}_{t,i|L_t} − C^{−+}_{t,i|L_t}) / f_{t|L_t},   F^−_{i|I_t} = (C^{+−}_{t,i|L_t} − C^{−−}_{t,i|L_t}) / f_{t|L_t},   (7)

         and, if X_t is continuous:

         F^+_{i|I_t} = ∂C^{c+}_{t,i|L_t}/∂u_2,   F^−_{i|I_t} = ∂C^{c−}_{t,i|L_t}/∂u_2.   (8)

       • if X_t is discrete and X_i is continuous:

         F^c_{i|I_t} = (C^{+c}_{t,i|L_t} − C^{−c}_{t,i|L_t}) / f_{t|L_t},   f_{i|I_t} = ∂F^c_{i|I_t}/∂x_i = (∂C^{+c}_{t,i|L_t}/∂u_1 − ∂C^{−c}_{t,i|L_t}/∂u_1) f_{i|L_t} / f_{t|L_t},   (9)

       • if X_t is continuous and X_i is continuous:

         F^c_{i|I_t} = ∂C^{cc}_{t,i|L_t}/∂u_2,   f_{i|I_t} = ∂F^c_{i|I_t}/∂x_i = (∂²C^{cc}_{t,i|L_t}/(∂u_1 ∂u_2)) f_{i|L_t}.   (10)

3. The result is f_{1,…,d} = f_1 ∏_{i=2}^{d} f_{i|1,…,i−1}.

Like the sampling algorithm in Section 2.4, the likelihood algorithm has quadratic complexity and is thus applicable to estimate information-theoretic quantities following the scheme outlined in Section 2.2.

2.6 Inference

We can apply maximum likelihood methods to estimate model parameters, because we can directly calculate the full likelihood of the model - even for high dimensions - following the procedure outlined in Section 2.5. Let L(θ, λ_1, …, λ_d) = Σ_{j=1}^{k} log f_X(x_j; θ, λ_1, …, λ_d) denote the log likelihood of the joint probability density function, where θ denotes the parameters of the chosen copula families and λ_1, …, λ_d the parameters of the chosen families of margins. We can now apply the so-called inference for margins (IFM) method to estimate the parameters [11]. The idea of this method is to break the joint optimization of all parameters up into smaller optimization problems. For i = 1, …, d, let L_i(λ_i) = Σ_{j=1}^{k} log f_i(x_{i,j}; λ_i) denote the sum of log likelihoods of the marginal distribution f_i(x_{i,j}; λ_i). The method proceeds in two steps. In the first step, the margin likelihoods are maximized separately: ∀i = 1, …, d: λ̂_i = argmax_{λ_i} L_i(λ_i). In the second step, the full likelihood is maximized given the estimated margin parameters: θ̂ = argmax_θ L(θ, λ̂_1, …, λ̂_d). Each of the individual optimization problems can be solved by means of a general multivariate optimization algorithm such as the trust-region-reflective algorithm [4]. Joe and Xu [11] showed that the IFM estimator is asymptotically efficient. The method is particularly attractive if the ratio of margin parameters to copula parameters is big.
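The two IFM steps can be sketched on synthetic data (our toy setup, not the paper's experiment): bivariate observations with normal margins coupled by a Gaussian copula with true parameter 0.6. Step 1 fits each margin by maximum likelihood; step 2 maximizes the copula log likelihood with the fitted margins held fixed.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)

# Simulate bivariate data with Gaussian dependence (true rho = 0.6)
n, true_rho = 5000, 0.6
z = rng.multivariate_normal([0, 0], [[1, true_rho], [true_rho, 1]], size=n)
x1, x2 = 2 * z[:, 0] + 1, 0.5 * z[:, 1] - 3  # non-standard normal margins

# IFM step 1: fit each margin separately by maximum likelihood
mu1, s1 = norm.fit(x1)
mu2, s2 = norm.fit(x2)
u1, u2 = norm.cdf(x1, mu1, s1), norm.cdf(x2, mu2, s2)

# IFM step 2: maximize the Gaussian-copula likelihood with margins fixed
z1, z2 = norm.ppf(u1), norm.ppf(u2)

def neg_copula_loglik(rho):
    # Negative log density of the bivariate Gaussian copula, summed over data
    quad = (rho**2 * (z1**2 + z2**2) - 2 * rho * z1 * z2) / (2 * (1 - rho**2))
    return np.sum(0.5 * np.log(1 - rho**2) + quad)

rho_hat = minimize_scalar(neg_copula_loglik, bounds=(-0.99, 0.99),
                          method="bounded").x
```

With 5000 samples, rho_hat should land close to the true value 0.6, illustrating the asymptotic efficiency of the two-step estimate in this simple setting.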
If the number of copula parameters is too big to be estimated in a single joint optimization, then the complexity of the copula model can be reduced by truncating the vine tree of the C-vine (truncated vine [1]). This corresponds to an independence assumption for the higher vine levels, and the validity of this simplification should be confirmed [22]. The families of margin and copula distributions can be selected using the Akaike information criterion (AIC) [3]: each combination of family selections is scored by means of its AIC value and then the best combination is chosen.

3 Validation on artificial data

We validated our framework by sampling from mixed vine-based models of different dimensionality and by evaluating the performance of various alternative models. Fig. 1 illustrates a 3-dimensional example vine-based model with two continuous margins and one discrete margin. In the top row, we show the probability density functions of the 2-dimensional margins obtained by integrating over one margin each. One can appreciate the mixed distribution from the step-wise changes in probability density in margin 2 and the smooth changes in margins 1 and 3. The bottom row shows scatter plots of 3-dimensional samples projected onto each pair of margins. The distributions of samples nicely reflect the corresponding densities.

We drew samples from this and other mixed vine distributions and fitted various models to these samples. For model selection, we used normal and gamma distributions as options for continuous margins; Poisson, binomial and negative binomial distributions as options for discrete margins; and Gaussian, Student, Clayton and rotated (90°, 180°, 270°) Clayton copula families as options for pair copula constructions. To quantify the gain of using a vine-based mixed model instead of a mixed independent model, we drew samples from the vine-based mixed model and calculated the cross-validated likelihood ratio (LR) statistic for nested models as D = 2(log(L_vine) − log(L_ind)), where L_vine denotes the likelihood of separate test-set samples under the vine-based model and L_ind denotes the likelihood of the samples under the corresponding independent model.

Figure 1: Characteristics of a 3D mixed vine example. Margin 1 is standard normal distributed, margin 2 is Poisson distributed with mean 5 and margin 3 is gamma distributed with shape 2 and scale 4. The pairwise copulas are Gaussian with parameter 0.5, Student with correlation 0.5 and 2 degrees of freedom, and Clayton with parameter 5 for margin pairs (1,2), (1,3) and (2,3), respectively. Top row: probability density functions of the 2D margins. The lighter the color, the higher the density. Bottom row: 2D margin scatter plots of 300 samples.

Figure 2: Model fit and entropy of simulated vine samples. Ground truth models are mixed vines of different dimensionality (range 2 to 6, shown as dark brown to light brown lines) with margins and copulas up to the respective dimension. Margins 1 to 3 and the associated pairwise copulas are the same as in Fig. 1. Margin 4 is binomial distributed with N = 6 and p = 0.4, margin 5 is negative binomial distributed with N = 6 and p = 0.4, and margin 6 is standard normal distributed.
The pairwise copulas are Clayton survival, independent, and Clayton rotated 90° for margin pairs (1,4), (2,4) and (3,4), respectively; Clayton rotated 270°, independent, Gaussian with parameter 0.5, and independent for margin pairs (1,5), (2,5), (3,5) and (4,5), respectively; and independent, independent, Gaussian with parameter 0.5, independent, and Student with parameters 0.5 and 2 for margin pairs (1,6), (2,6), (3,6), (4,6) and (5,6), respectively, with parameter 5 for all Clayton-based copulas. (A-C) Cross-validated LR statistic between the ground truth model and the mixed vine-based model (A), independent model (B) or mixed Gaussian model (C). (D,E) Normalized entropy difference between the ground truth model and the independent model (D) or fully continuous vine-based model (E). Lines denote averages over 30 repetitions as a function of the number of samples. Shaded areas denote standard error.

Fig. 2A shows the LR statistic between the ground truth and the best-fitting mixed vine-based model as a function of the number of samples for different dimensionality. The statistics were low in all cases but increased with increasing dimensionality. The gain, as quantified by the LR statistic, of using the full mixed vine-based model instead of the independent model, on the other hand, was moderate for the bivariate model (D < 0.5) while being substantial for the 6-dimensional model (D ≈ 7). Wilks' LR test on non-cross-validated data was highly significant whenever we used at least 32 samples (p < 0.01). We also evaluated the fit of the multivariate Gaussian copula with mixed margins, which is nested in our mixed vine-based models and obtained by restricting all pairwise copula families to be Gaussian.
The LR statistics indicated substantially better \ufb01t than for the independent model but\nthe statistics were below those of the mixed vine-based model for most tested dimensions (Fig. 2C).\nUnfortunately, a vine-based mixed model and the corresponding best-\ufb01tting fully continuous vine-\nbased model are not directly comparable in this way due to the different weighting of discrete and\ncontinuous elements (i.e. mass vs. density). Nevertheless, in an actual application it is easy to\ndetermine which margins are discrete and which margins are continuous. Appropriate discrete or\ncontinuous margins can therefore be selected easily. To extend our comparison to fully continuous\nvine-based models, we estimated entropies of the mixed vine-based model, the corresponding inde-\npendent model and of the best-\ufb01tting fully continuous model. We calculated the entropy differences\nbetween these models and normalized with the entropy of the mixed vine-based model. Fig. 2D\nshows the normalized entropy difference between the mixed vine-based model and the independent\nmodel. The relative results are similar to those of the likelihood ratio statistic (Fig. 2B) suggesting\nthat in this case the entropy comparison is indicative of the performance gain. In Fig. 2E, we plot\nthe normalized entropy difference between the mixed vine-based model and the best-\ufb01tting fully\ncontinuous model. Overall, the normalized differences of these models were smaller than for the\nindependent model. Similarly to the independent model, though, we found increasing differences for\nincreasing dimensionality of the models. 
All in all, our results suggest that our framework can yield\nsubstantial advantages in terms of goodness of \ufb01t and in terms of estimated entropy in particular for\nhigh-dimensional problems.\n\n4 Application to simulated network activity\n\nTo evaluate our framework in a typical neuroscience setting, we applied our mixed vine-based model\nto a biologically realistic neural network model. We simulated network activity with the Virtual\nElectrode Recording Tool for EXtracellular potentials (VERTEX) [25] with network parameters\nas in VERTEX tutorial 2. Brie\ufb02y, the model contained a total of 5000 neurons with 85% of those\ncell models representing layer 2/3 pyramidal neurons and 15% representing basket interneurons.\nThe spiking dynamics followed an adaptive exponential model. To simulate two different stimulus\nconditions, we used random input currents with different means. We presented each stimulus condition\nan equal number of times (corresponding to 1/2 probability of occurrence of either stimulus). The\nnetwork generated network oscillations in both conditions. To simulate a typical recording situation,\nwe recorded LFPs with two randomly placed electrodes and collected spike counts from the four\nneurons closest to those electrodes. For each input condition, we ran the network 128 times and\ncollected one 6-dimensional mixed vector with the LFPs (continuous) and spike counts (discrete)\ncollected in a 100 ms interval from each network run. We then \ufb01tted the full mixed vine-based model,\nthe mixed independent model and the fully continuous vine-based model to these data. Importantly,\nwe \ufb01tted separate models for each stimulus condition and varied the number of samples per stimulus\ncondition between 8 and 128. This allowed us to estimate mutual information following the procedure\noutlined in Section 2.2.\nSimilarly to Figs. 2B,C, Fig. 
3A depicts the LR statistic between the best-fitting mixed vine-based model and the corresponding independent model or mixed Gaussian model. We found relatively small statistics for all sample sizes (D < 1). Nevertheless, Wilks' LR test indicated highly significant improvement whenever we used at least 64 samples (p < 0.01). To evaluate the importance of the mixed vine-based model when performing an information-theoretic analysis of the network activity, we estimated the mutual information between the modeled network activity (LFP and spike counts) and the two stimulus conditions. Fig. 3B shows mutual information estimates that we obtained based on the mixed independent, mixed Gaussian, continuous vine-based and mixed vine-based models. The mixed Gaussian model yielded information estimates that were close to those of the mixed vine-based model. Estimates based on the independent model and the fully continuous model, on the other hand, were both substantially different (overestimating and underestimating information, respectively) from the estimates that we obtained from the mixed vine-based model. The latter model is the most faithful one with the most accurate information estimates. The overestimation of the independent model suggests that spike counts and LFPs carry partly redundant information. The big differences in information estimates further indicate that it can be important to take mixed margins and dependencies into account for estimating mutual information, even if the LR statistic is low.

Figure 3: Analysis of simulated neural network activity obtained from the VERTEX tool [25]. Data samples are formed by the average LFP within 200-300 ms after simulation onset from two randomly chosen electrodes and spike counts from the four neurons in closest proximity to those electrodes. One simulation run provided one sample only. 
The network was simulated with two\ndifferent input conditions: Input currents following an Ornstein-Uhlenbeck process had a mean value\nof 330 pA for the excitatory population and 190 pA for the inhibitory population in condition 1,\nand 300 pA for the excitatory population and 40 pA for the inhibitory population in condition 2.\nIn both conditions, standard deviation was 90 pA for the excitatory population and 50 pA for the\ninhibitory population. (A) LR statistic between the best-\ufb01tting mixed vine-based model and the\nbest-\ufb01tting mixed independent model (blue) or mixed Gaussian model (red) as a function of the\nnumber of samples (i.e. number of simulations in each condition) averaged over stimulus conditions.\n(B) Mutual information between the neural activity and the two input conditions estimated from the\nmixed independent model (blue), mixed Gaussian model (red), continuous vine-based model (green)\nor mixed vine-based model (black) as a function of the number of samples. Lines denote averages\nover 30 repetitions. Shaded areas denote standard error.\n\n5 Discussion\n\nWe developed a complete framework based on vine copulas for modeling multivariate data that are\npartly discrete and partly continuous. Our framework includes methods for sampling, likelihood\ncalculation and inference. We combined these procedures to estimate entropy and mutual information\nby means of MC integration. In particular, our methods provide the possibility to construct joint\nstatistical models of LFPs and spike counts. In a biologically realistic network simulation we demon-\nstrated that our mixed vine-based model provides a \ufb01t that is better than that of the corresponding\nindependent model and showed that mutual information estimates of fully continuous and mixed\nindependent models can strongly differ even if the likelihood ratio statistic suggests otherwise. 
For LFP and spike count data, a mixed model with detailed dependence structures can make full use of all available statistical information. This also makes it possible to construct optimal Bayesian decoders for inferring the presented stimulus from both LFPs and spike counts. Moreover, our model allows one to investigate the statistical dependencies between LFPs and spike counts. In contrast to other methods for analyzing mixed LFP and spiking data [12, 5], our framework follows a purely data-driven approach. Even high-dimensional distributions can be fitted, because all inference operations have quadratic complexity. However, entropy and MI estimation can be problematic, because MC integration can become infeasible for very high-dimensional problems. One possible remedy is to use our models for maximum likelihood decoding and then estimate information based on decoding performance [8]. We note that our models are based on pair-constructions and thus cannot model arbitrary higher-order dependencies. We stress, however, that higher-order correlations do occur in the vine tree and depend on both the vine-tree selection and the copula families. Thus, selecting the right vine tree and copula families can, to a limited extent, account for higher-order correlations. In general, however, limited sample numbers make it difficult to reliably estimate higher-order correlations in real neuroscience applications. The parametric nature of our model framework also makes it possible to introduce dependencies on external variables. Directions for future research include applications to experimentally recorded data and detailed evaluation of observed dependency structures.

Acknowledgments. This work was supported by the European Commission's Horizon 2020 Programme (H2020-MSCA-IF-2014) under grant agreement number 659227 ("STOMMAC").

References

[1] K. Aas, C. Czado, A. Frigessi, and H. Bakken.
Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics, 44(2):182–198, 2009.

[2] E. F. Acar, C. Genest, and J. Nešlehová. Beyond simplified pair-copula constructions. Journal of Multivariate Analysis, 110:74–90, 2012.

[3] H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716–723, 1974.

[4] R. H. Byrd, M. E. Hribar, and J. Nocedal. An interior point algorithm for large-scale nonlinear programming. SIAM Journal on Optimization, 9(4):877–900, 1999.

[5] D. E. Carlson, J. S. Borg, K. Dzirasa, and L. Carin. On the relations of LFPs & neural spike trains. In Advances in Neural Information Processing Systems, pages 2060–2068, 2014.

[6] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, New York, second edition, 2006.

[7] A. R. de Leon and B. Wu. Copula-based regression models for a bivariate mixed discrete and continuous outcome. Statistics in Medicine, 30(2):175–185, 2011.

[8] R. A. A. Ince, S. Panzeri, and C. Kayser. Neural codes formed by small and temporally precise populations in auditory cortex. The Journal of Neuroscience, 33(46):18277–18287, 2013.

[9] P. Jaworski, F. Durante, and W. K. Härdle. Copulae in mathematical and quantitative finance. Lecture Notes in Statistics, Proceedings, 213, 2013.

[10] R. L. Jenison and R. A. Reale. The shape of neural dependence. Neural Computation, 16:665–672, 2004.

[11] H. Joe and J. J. Xu. The estimation method of inference functions for margins for multivariate models. Technical Report 166, Department of Statistics, University of British Columbia, 1996.

[12] R. C. Kelly, M. A. Smith, R. E. Kass, and T. S. Lee. Local field potentials indicate network state and account for neuronal response variability. Journal of Computational Neuroscience, 29(3):567–579, 2010.

[13] R. B. Nelsen.
An Introduction to Copulas. Springer, New York, second edition, 2006.

[14] A. Onken, S. Grünewälder, M. H. J. Munk, and K. Obermayer. Analyzing short-term noise dependencies of spike-counts in macaque prefrontal cortex using copulas and the flashlight transformation. PLoS Computational Biology, 5(11):e1000577, 2009.

[15] A. Panagiotelis, C. Czado, and H. Joe. Pair copula constructions for multivariate discrete data. Journal of the American Statistical Association, 107(499):1063–1072, 2012.

[16] S. Panzeri, J. H. Macke, J. Gross, and C. Kayser. Neural population coding: combining insights from microscopic and mass signals. Trends in Cognitive Sciences, 19(3):162–172, 2015.

[17] J. S. Racine. Mixed data kernel copulas. Empirical Economics, 48(1):37–59, 2015.

[18] C. P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer, New York, second edition, 2004.

[19] L. Sacerdote, M. Tamborrino, and C. Zucca. Detecting dependencies between spike trains of pairs of neurons through copulas. Brain Research, 1434:243–256, 2012.

[20] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 1948.

[21] A. Sklar. Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris, 8:229–231, 1959.

[22] M. Smith, A. Min, C. Almeida, and C. Czado. Modeling longitudinal data using a pair-copula decomposition of serial dependence. Journal of the American Statistical Association, 105(492), 2010.

[23] M. S. Smith and M. A. Khaled. Estimation of copula models with discrete margins via Bayesian data augmentation. Journal of the American Statistical Association, 107(497):290–303, 2012.

[24] P. X. K. Song, M. Li, and Y. Yuan. Joint regression analysis of correlated data using Gaussian copulas. Biometrics, 65(1):60–68, 2009.

[25] R. J. Tomsett, M.
Ainsworth, A. Thiele, M. Sanayei, X. Chen, M. A. Gieselmann, M. A. Whittington, M. O. Cunningham, and M. Kaiser. Virtual Electrode Recording Tool for EXtracellular potentials (VERTEX): comparing multi-electrode recordings from simulated and biological mammalian cortical tissue. Brain Structure and Function, 220(4):2333–2353, 2015.