{"title": "Semi-Supervised Domain Adaptation with Non-Parametric Copulas", "book": "Advances in Neural Information Processing Systems", "page_first": 665, "page_last": 673, "abstract": "A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques.", "full_text": "Semi-Supervised Domain Adaptation with\n\nNon-Parametric Copulas\n\nDavid Lopez-Paz\n\nJos\u00b4e Miguel Hern\u00b4andez-Lobato\n\nBernhard Sch\u00a8olkopf\n\nMPI for Intelligent Systems\ndlopez@tue.mpg.de\n\nUniversity of Cambridge\njmh233@cam.ac.uk\n\nMPI for Intelligent Systems\n\nbs@tue.mpg.de\n\nAbstract\n\nA new framework based on the theory of copulas is proposed to address semi-\nsupervised domain adaptation problems. The presented method factorizes any\nmultivariate density into a product of marginal distributions and bivariate cop-\nula functions. Therefore, changes in each of these factors can be detected and\ncorrected to adapt a density model accross different learning domains.\nImpor-\ntantly, we introduce a novel vine copula model, which allows for this factorization\nin a non-parametric manner. 
Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques.\n\n1 Introduction\n\nWhen humans address a new learning problem, they often use knowledge acquired while learning different but related tasks in the past. For example, when learning a second language, people rely on grammar rules and word derivations from their mother tongue. This is called language transfer [19]. However, in machine learning, most of the traditional methods are not able to exploit similarities between different learning tasks. These techniques only achieve good performance when the data distribution is stable between training and test phases. When this is not the case, it is necessary to a) collect and label additional data and b) re-run the learning algorithm. However, these operations are not affordable in most practical scenarios.\n\nDomain adaptation, transfer learning or multitask learning frameworks [17, 2, 5, 13] confront these issues by first building a notion of task relatedness and, second, providing mechanisms to transfer knowledge between similar tasks. Generally, we are interested in improving predictive performance on a target task by using knowledge obtained when solving another related source task. Domain adaptation methods are concerned with what knowledge we can share between different tasks, how we can transfer this knowledge, and when we should or should not do it to avoid additional damage [4].\n\nIn this work, we study semi-supervised domain adaptation for regression tasks. In these problems, the object of interest (the mechanism that maps a set of inputs to a set of outputs) can be stated as a conditional density function. The data available for solving each learning task is assumed to be sampled from modified versions of a common multivariate distribution. 
Therefore, we are interested in sharing the \u201ccommon pieces\u201d of this generative model between tasks, and in using the data from each individual task to detect, learn and adapt the varying parts of the model. To do so, we must find a decomposition of multivariate distributions into simpler building blocks that may be studied separately across different domains. The theory of copulas provides such representations [18].\n\nCopulas are statistical tools that factorize multivariate distributions into the product of their marginals and a function that captures any possible form of dependence among them. This function is referred to as the copula, and it links the marginals together into the joint multivariate model. First introduced by Sklar [22], copulas have been successfully used in a wide range of applications, including finance, time series and natural phenomena modeling [12]. Recently, a new family of copulas named vines has gained interest in the statistics literature [1]. These are methods that factorize multivariate densities into a product of marginal distributions and bivariate copula functions. Each of these factors corresponds to one of the building blocks that we assume either constant or varying across different learning domains.\n\nThe contributions of this paper are two-fold. First, we propose a non-parametric vine copula model which can be used as a high-dimensional density estimator. Second, by making use of this method, we present a new framework to address semi-supervised domain adaptation problems, whose performance is validated in a series of experiments with real-world data against competing state-of-the-art techniques.\n\nThe rest of the paper is organized as follows: Section 2 provides a brief introduction to copulas, and describes a non-parametric estimator for the bivariate case. 
Section 3 introduces a novel non-parametric vine copula model, which is formed by the described bivariate non-parametric copulas. Section 4 describes a new framework to address semi-supervised domain adaptation problems using the proposed vine method. Finally, Section 5 describes a series of experiments that validate the proposed approach on regression problems with real-world data.\n\n2 Copulas\n\nWhen the components of x = (x1, . . . , xd) are jointly independent, their density function p(x) can be written as\n\np(x) = \u220f_{i=1}^{d} p(xi) .  (1)\n\nThis equality does not hold when x1, . . . , xd are not independent. Nevertheless, the differences can be corrected if we multiply the right-hand side of (1) by a specific function that fully describes any possible dependence between x1, . . . , xd. This function is called the copula of p(x) [18] and satisfies\n\np(x) = \u220f_{i=1}^{d} p(xi) \u00b7 c(P(x1), . . . , P(xd)) ,  (2)\n\nwhere the last factor is the copula. The copula c is the joint density of P(x1), . . . , P(xd), where P(xi) is the marginal cdf of the random variable xi. This density has uniform marginals, since P(z) \u223c U[0, 1] for any random variable z. That is, when we apply the transformation P(x1), . . . , P(xd) to x1, . . . , xd, we are eliminating all information about the marginal distributions. Therefore, the copula captures any distributional pattern that does not depend on their specific form, or, in other words, all the information regarding the dependencies between x1, . . . , xd. When P(x1), . . . , P(xd) are continuous, the copula c is unique [22]. However, infinitely many multivariate models share the same underlying copula function, as illustrated in Figure 1. 
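This invariance of the copula under changes of the marginals can be checked numerically. The sketch below (Python with numpy assumed; the rank-based empirical cdf is a simplification of the kernel-smoothed cdf estimates used later in the paper) maps two samples with the same dependence structure but different marginals to the unit square and recovers identical pseudo-observations:

```python
import numpy as np

def pseudo_observations(x):
    """Map each column of x into (0, 1) with its empirical cdf (rank
    transform). This mimics the transformation P(x1), ..., P(xd) of
    equation (2): it discards marginal information but preserves the
    dependence structure, i.e. the copula."""
    n = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / (n + 1)  # n + 1 keeps values strictly inside (0, 1)

rng = np.random.default_rng(0)
# Two samples sharing a Gaussian copula (correlation 0.8) but with
# different marginals: Gaussian vs. log-normal.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=1000)
sample_gauss, sample_lognorm = z, np.exp(z)

u_gauss = pseudo_observations(sample_gauss)
u_lognorm = pseudo_observations(sample_lognorm)
# exp() is strictly increasing, so the ranks (and hence the
# pseudo-observations) coincide: both samples expose the same copula.
assert np.allclose(u_gauss, u_lognorm)
```

Because only ranks enter the transform, any strictly increasing change of the marginals leaves the pseudo-observations untouched, which is exactly the sense in which the copula is marginal-free.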
The main advantage of copulas is that they allow us to model separately the marginal distributions and the dependencies linking them together to produce the multivariate model subject of study.\n\nGiven a sample from (2), we can estimate p(x) as follows. First, we construct estimates of the marginal pdfs, \u02c6p(x1), . . . , \u02c6p(xd), which also provide estimates of the corresponding marginal cdfs, \u02c6P(x1), . . . , \u02c6P(xd). These cdf estimates are used to map the data to the d-dimensional unit hypercube. The transformed data are then used to obtain an estimate \u02c6c for the copula of p(x). Finally, (2) is approximated as\n\n\u02c6p(x) = \u220f_{i=1}^{d} \u02c6p(xi) \u00b7 \u02c6c(\u02c6P(x1), . . . , \u02c6P(xd)).  (3)\n\nThe estimation of marginal pdfs and cdfs can be implemented in a non-parametric manner by using unidimensional kernel density estimates. By contrast, it is common practice to assume a parametric model for the estimation of the copula function. Some examples of parametric copulas are the Gaussian, Gumbel, Frank, Clayton or Student copulas [18]. Nevertheless, real-world data often exhibit complex dependencies which cannot be correctly described by these parametric copula models. This lack of flexibility of parametric copulas is illustrated in Figure 2. As an alternative, we propose to approximate the copula function in a non-parametric manner.\n\nFigure 1: Left, sample from a Gaussian copula with correlation \u03c1 = 0.8. Middle and right, two samples drawn from multivariate models with this same copula but different marginal distributions, depicted as rug plots.\n\nFigure 2: Left, sample from the copula linking variables 4 and 11 in the WIRELESS dataset. Middle, density estimate generated by a Gaussian copula model when fitted to the data. This technique is unable to capture the complex patterns present in the data. Right, copula density estimate generated by the non-parametric method described in Section 2.1. 
Kernel density estimates can also be used to generate non-parametric approximations of copulas, as described in [8]. The following section reviews this method for the two-dimensional case.\n\n2.1 Non-parametric Bivariate Copulas\n\nWe now elaborate on how to non-parametrically estimate the copula of a given bivariate density p(x, y). Recall that this density can be factorized as the product of its marginals and its copula,\n\np(x, y) = p(x) p(y) c(P(x), P(y)).  (4)\n\nAdditionally, given a sample {(xi, yi)}_{i=1}^{n} from p(x, y), we can obtain a pseudo-sample from its copula c by mapping each observation to the unit square using estimates of the marginal cdfs, namely\n\n{(ui, vi)}_{i=1}^{n} := {(\u02c6P(xi), \u02c6P(yi))}_{i=1}^{n}.  (5)\n\nThese are approximate observations from the uniformly distributed random variables u = P(x) and v = P(y), whose joint density is the copula function c(u, v). We could try to approximate this density function by placing Gaussian kernels on each observation ui and vi. However, the resulting density estimate would have support on R^2, while the support of c is the unit square. A solution is to perform the density estimation in a transformed space. For this, we select some continuous distribution with support on R, strictly positive density \u03c6, cumulative distribution \u03a6 and quantile function \u03a6\u22121. Let z and w be two new random variables given by z = \u03a6\u22121(u) and w = \u03a6\u22121(v). Then, the joint density of z and w is\n\np(z, w) = \u03c6(z) \u03c6(w) c(\u03a6(z), \u03a6(w)) .  (6)\n\nThe copula of this new density is identical to the copula of (4), since the performed transformations are marginal-wise. The support of (6) is now R^2; therefore, we can now approximate it with 
Gaussian kernels. Let zi = \u03a6\u22121(ui) and wi = \u03a6\u22121(vi). Then,\n\n\u02c6p(z, w) = (1/n) \u2211_{i=1}^{n} N(z, w | zi, wi, \u03a3),  (7)\n\nwhere N(\u00b7, \u00b7 | \u03bd1, \u03bd2, \u03a3) is a two-dimensional Gaussian density with mean (\u03bd1, \u03bd2) and covariance matrix \u03a3. For convenience, we select \u03c6, \u03a6 and \u03a6\u22121 to be the standard Gaussian pdf, cdf and quantile function, respectively. Finally, the copula density c(u, v) is approximated by combining (6) with (7):\n\n\u02c6c(u, v) = \u02c6p(\u03a6\u22121(u), \u03a6\u22121(v)) / [\u03c6(\u03a6\u22121(u)) \u03c6(\u03a6\u22121(v))] = (1/n) \u2211_{i=1}^{n} N(\u03a6\u22121(u), \u03a6\u22121(v) | \u03a6\u22121(ui), \u03a6\u22121(vi), \u03a3) / [\u03c6(\u03a6\u22121(u)) \u03c6(\u03a6\u22121(v))] .  (8)\n\n3 Regular Vines\n\nThe method described above can be generalized to the estimation of copulas of more than two random variables. However, although kernel density estimates can be successful in spaces of one or two dimensions, as the number of variables increases, these methods start to be significantly affected by the curse of dimensionality and tend to overfit to the training data. Additionally, for addressing domain adaptation problems, we are interested in factorizing these high-dimensional copulas into simpler building blocks transferable across learning domains. These two drawbacks can be addressed by recent methods in copula modelling called vines [1]. 
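Since the bivariate estimator of Section 2.1 is the building block reused throughout the rest of the paper, a minimal sketch of equations (7)-(8) may help. Python with numpy/scipy is assumed; the isotropic bandwidth and the helper name fit_copula_kde are illustrative choices, not the paper's:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def fit_copula_kde(u_obs, v_obs, bw=0.3):
    """Bivariate copula density estimate in the spirit of equations (7)-(8):
    probit-transform the pseudo-observations, place one Gaussian kernel per
    transformed point, and divide out the standard Gaussian marginals. The
    isotropic bandwidth matrix bw**2 * I is an illustrative choice, not the
    paper's bandwidth selection rule."""
    pts = np.column_stack([norm.ppf(u_obs), norm.ppf(v_obs)])  # (z_i, w_i)
    cov = (bw ** 2) * np.eye(2)

    def c_hat(u, v):
        zw = np.array([norm.ppf(u), norm.ppf(v)])
        # kernel estimate of p(z, w), as in equation (7)
        p_zw = multivariate_normal.pdf(pts - zw, mean=[0.0, 0.0], cov=cov).mean()
        # divide by the probit marginals to recover the copula, equation (8)
        return p_zw / (norm.pdf(zw[0]) * norm.pdf(zw[1]))

    return c_hat

rng = np.random.default_rng(1)
u, v = rng.uniform(size=5000), rng.uniform(size=5000)  # independence copula
c_hat = fit_copula_kde(u, v)
# For independent u and v the true copula density is identically 1,
# so c_hat(0.5, 0.5) should be close to 1 (up to smoothing bias).
```

Nothing in the estimator restricts the copula to a parametric family, which is what allows it to capture patterns like those in Figure 2 that a Gaussian copula misses.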
Vines decompose any high-dimensional copula density as a product of bivariate copula densities that can be approximated using the non-parametric model described above. These bivariate copulas (as well as the marginals) correspond to the simple building blocks that we plan to transfer from one learning domain to another. Different types of vines have been proposed in the literature. Some examples are canonical vines, D-vines or regular vines [16, 1]. In this work we focus on regular vines (R-vines), since they are the most general models.\n\nAn R-vine V for a probability density p(x1, . . . , xd) with variable set V = {1, . . . , d} is formed by a set of undirected trees T1, . . . , Td\u22121, each with corresponding node set Vi and edge set Ei, where Vi = Ei\u22121 for i \u2208 [2, d \u2212 1]. Any edge e \u2208 Ei has three associated sets C(e), D(e), N(e) \u2282 V, called the conditioned, conditioning and constraint sets of e, respectively. Initially, T1 is inferred from a complete graph with a node associated with each element of V; for any e \u2208 T1 joining nodes Vj and Vk, C(e) = N(e) = {Vj, Vk} and D(e) = \u2205. The trees T2, . . . , Td\u22121 are constructed so that each e \u2208 Ei is formed by joining two edges e1, e2 \u2208 Ei\u22121 which share a common node, for i \u2265 2. The new edge e has conditioned, conditioning and constraint sets given by C(e) = N(e1) \u2206 N(e2), D(e) = N(e1) \u2229 N(e2), N(e) = N(e1) \u222a N(e2), where \u2206 is the symmetric difference operator. Figure 3 illustrates this procedure for an R-vine with 4 variables.\n\nFor any edge e(j, k) \u2208 Ti, i = 1, . . . , d \u2212 1, with conditioned set C(e) = {j, k} and conditioning set D(e), let cjk|D(e) be the value of the copula density for the conditional distribution of xj and xk when conditioning on {xi : i \u2208 D(e)}, that is,\n\ncjk|D(e) := c(Pj|D(e), Pk|D(e) | xi : i \u2208 D(e)),  (9)\n\nwhere Pj|D(e) := P(xj | xi : i \u2208 D(e)) is the conditional cdf of xj when conditioning on {xi : i \u2208 D(e)}. Kurowicka and Cooke [16] indicate that any probability density function p(x1, . . . , xd) can then be factorized as\n\np(x) = \u220f_{i=1}^{d} p(xi) \u220f_{i=1}^{d\u22121} \u220f_{e(j,k)\u2208Ei} cjk|D(e) ,  (10)\n\nwhere E1, . . . , Ed\u22121 are the edge sets of the R-vine V for p(x1, . . . , xd). In particular, each of the edges in the trees from V specifies a different conditional copula density in (10). For d variables, the density in (10) contains d(d \u2212 1)/2 bivariate copula factors. Changes in each of these factors can be detected and independently transferred across different learning domains to improve the estimation of the target density function.\n\nThe definition of cjk|D(e) in (9) requires the calculation of conditional marginal cdfs. 
For this, we use the following recursive identity introduced by Joe [14], that is,\n\nPj|D(e) = \u2202Cjk|D(e)\\k / \u2202Pk|D(e)\\k ,  (11)\n\nwhich holds for any k \u2208 D(e), where D(e) \\ k = {i : i \u2208 D(e) \u2227 i \u2260 k} and Cjk|D(e)\\k is the cdf of cjk|D(e)\\k.\n\nFigure 3: Example of the hierarchical construction of an R-vine copula for a system of four variables. The edges selected to form each tree are highlighted in bold. Conditioned and conditioning sets for each node and edge are shown as C(e)|D(e). Each edge in bold corresponds to a different bivariate copula function, yielding the factorization p1234 = p1 \u00b7 p2 \u00b7 p3 \u00b7 p4 (marginals) \u00b7 c12 \u00b7 c13 \u00b7 c34 (tree 1) \u00b7 c23|1 \u00b7 c14|3 (tree 2) \u00b7 c24|13 (tree 3).\n\nOne major advantage of vines is that they can model high-dimensional data by estimating density functions of only one or two random variables. For this reason, these techniques are significantly less affected by the curse of dimensionality than regular density estimators based on kernels, as we show in Section 5. So far, vines have generally been constructed using parametric models for the estimation of bivariate copulas. In the following, we describe a novel method for the construction of non-parametric regular vines.\n\n3.1 Non-parametric Regular Vines\n\nIn this section, we introduce a vine distribution in which all participating bivariate copulas can be estimated in a non-parametric manner. 
To do so, we model each of the copulas in (10) using the non-parametric method described in Section 2.1. Let {(u_i, v_i)}_{i=1}^{n} be a sample from the copula density c(u, v). The basic operation needed for the implementation of the proposed method is the evaluation of the conditional cdf P(u|v) using the recursive equation (11). Define w = Φ⁻¹(v), z_i = Φ⁻¹(u_i) and w_i = Φ⁻¹(v_i). Combining (8) and (11), we obtain

P̂(u|v) = ∫₀ᵘ ĉ(x, v) dx
        = (1 / (n φ(w))) ∑_{i=1}^{n} ∫₀ᵘ N(Φ⁻¹(x), w | z_i, w_i, Σ) / φ(Φ⁻¹(x)) dx
        = (1 / (n φ(w))) ∑_{i=1}^{n} N(w | w_i, σ_w²) · Φ((Φ⁻¹(u) − μ_{z_i|w_i}) / σ_{z_i|w_i}),    (12)

where N(· | μ, σ²) denotes a Gaussian density with mean μ and variance σ², Σ = [σ_z², γσ_zσ_w; γσ_zσ_w, σ_w²] is the kernel bandwidth matrix, μ_{z_i|w_i} = z_i + γ(σ_z/σ_w)(w − w_i) and σ²_{z_i|w_i} = σ_z²(1 − γ²).
Equation (12) can be used to approximate any conditional cdf P_{j|D(e)}. For this, we use the fact that P(x_j | x_i : i ∈ D(e)) = P(u_j | u_i : i ∈ D(e)), where u_i = P(x_i), for i = 1, . . . , d, and recursively apply rule (11), using equation (12) to compute P̂(u_j | u_i : i ∈ D(e)).
To complete the inference recipe for the non-parametric regular vine, we must specify how to construct the hierarchy of trees T_1, . . . , T_{d−1}. In other words, we must define a procedure to select the edges (bivariate copulas) that will form each tree. We have a total of d(d − 1)/2 bivariate copulas which should be distributed among the different trees. Ideally, we would like to include in the first trees of the hierarchy the copulas with the strongest dependence level.
This will allow us to prune the model by assuming independence in the last k < d trees, since the density function of the independence copula is constant and equal to 1. To construct the trees T_1, . . . , T_{d−1}, we assign a weight to each edge e(j, k) (copula) according to the level of dependence between the random variables x_j and x_k. A common practice is to fix this weight to the empirical estimate of Kendall's τ for the two random variables under consideration [1]¹. Given these weights for each edge, we propose to solve the edge selection problem by obtaining d − 1 maximum spanning trees. Prim's algorithm [20] can be used to solve this problem efficiently.

4 Domain Adaptation with Regular Vines

In this section, we describe how regular vines can be used to address domain adaptation problems in the non-linear regression setting with continuous data. The proposed approach could be easily extended to other problems such as density estimation or classification. In regression problems, we are interested in inferring the mapping mechanism or conditional distribution with density p(y|x) that maps a feature vector x = (x_1, . . . , x_d) ∈ R^d into a target scalar value y ∈ R. Rephrased in the copula framework, this conditional density can be expressed as

p(y|x) ∝ p(y) ∏_{i=1}^{d} ∏_{e(j,k)∈E_i} c_{jk|D(e)},    (13)

where E_1, . . . , E_d are the edge sets of an R-vine for p(x, y). Note that the normalization of the right-hand side of (13) is relatively easy since y is scalar.
In the classic domain adaptation setup, we usually have large amounts of data for solving a source task characterized by the density function p_s(x, y). However, only a partial or reduced sample is available for solving a target task with density p_t(x, y). Given the data available for both tasks, our objective is to build a good estimate of the conditional density p_t(y|x).
To address this domain adaptation problem, we assume that p_t is a modified version of p_s. In particular, we assume that p_t is obtained from p_s in two steps: first, p_s is expressed using an R-vine representation as in (10); second, some of the factors included in that representation (marginal distributions or pairwise copulas) are modified to derive p_t. All we need to address the adaptation across domains is to reconstruct the R-vine representation of p_s using data from the source task, and then identify which of the factors have been modified to produce p_t. These factors are corrected using data from the target task. In the following, we describe how to identify and correct these modified factors.
Marginal distributions can change between source and target tasks (a phenomenon also known as covariate shift). In this case, P_s(x_i) ≠ P_t(x_i), for i = 1, . . . , d, or P_s(y) ≠ P_t(y), and we need to re-generate the estimates of the affected marginals using data from the target task. Additionally, some of the bivariate copulas c_{jk|D(e)} may differ from source to target tasks. In this case, we also re-estimate the affected copulas using data from the target task. Simultaneous changes in both copulas and marginals can occur; since each modified component is updated separately, this poses no additional difficulty. Finally, if some of the factors remain constant across domains, we can use the available data from the target task to improve the estimates obtained using only the data from the source task. Note that we are addressing a more general problem than covariate shift: besides identifying and correcting changes in marginal distributions, we also consider changes in any possible form of dependence (conditional distributions) between the random variables.
For the implementation of the strategy described above, we need to determine whether or not two samples come from the same distribution.
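A minimal sketch of such a two-sample check is a kernel statistic with a permutation test, as below. The biased squared-MMD estimate, the RBF kernel, the fixed bandwidth `gamma`, and the permutation count are generic illustrative defaults, not the paper's exact configuration.

```python
import math
import random

def rbf(a, b, gamma=1.0):
    return math.exp(-gamma * (a - b) ** 2)

def mmd2(x, y, gamma=1.0):
    """Biased estimate of squared MMD between two 1-d samples."""
    kxx = sum(rbf(a, b, gamma) for a in x for b in x) / len(x) ** 2
    kyy = sum(rbf(a, b, gamma) for a in y for b in y) / len(y) ** 2
    kxy = sum(rbf(a, b, gamma) for a in x for b in y) / (len(x) * len(y))
    return kxx + kyy - 2.0 * kxy

def mmd_pvalue(x, y, gamma=1.0, n_perm=200, seed=0):
    """Permutation p-value for H0: both samples share one distribution."""
    rng = random.Random(seed)
    observed = mmd2(x, y, gamma)
    pooled, n, hits = list(x) + list(y), len(x), 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # resample the split under H0
        hits += mmd2(pooled[:n], pooled[n:], gamma) >= observed
    return (hits + 1.0) / (n_perm + 1.0)
```

A small p-value flags a factor as changed between domains, triggering its re-estimation with target-task data.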
For this, we propose to use the non-parametric two-sample test Maximum Mean Discrepancy (MMD) [10]. MMD returns low p-values when two samples are unlikely to have been drawn from the same distribution. Specifically, given samples from two distributions P and Q, MMD will determine P ≠ Q if the distance between the embeddings of the empirical distributions of these two samples in an RKHS is significantly large.

¹We have tried more general dependence measures, such as HSIC (the Hilbert-Schmidt Independence Criterion), without observing gains that justify the increase in computational cost.

Table 1: Average TLL obtained by NPRV, GRV and KDE on six different UCI datasets.

Dataset            Auto          Cloud         Housing       Magic         Page-Blocks   Wireless
No. of variables   8             10            14            11            10            11
KDE                1.32 ± 0.06   3.25 ± 0.10   1.96 ± 0.17   1.13 ± 0.11   1.90 ± 0.13   0.98 ± 0.06
GRV                1.84 ± 0.08   5.00 ± 0.12   1.68 ± 0.11   2.09 ± 0.08   4.69 ± 0.20   0.36 ± 0.08
NPRV               2.07 ± 0.07   4.54 ± 0.13   3.18 ± 0.17   2.72 ± 0.17   5.64 ± 0.14   2.17 ± 0.13

Semi-supervised and unsupervised domain adaptation: The proposed approach can easily be extended to take advantage of additional unlabeled data to improve the estimation of our model. Specifically, extra unlabeled target-task data can be used to refine the factors in the R-vine decomposition of p_t which do not depend on y. This remains valid even in the limiting case of having no access to labeled data from the target task at training time (unsupervised domain adaptation).

5 Experiments

To validate the proposed method, we run two series of experiments using real-world data. The first series illustrates the accuracy of the density estimates generated by the proposed non-parametric vine method.
The second series validates the effectiveness of the proposed framework for domain adaptation problems in the non-linear regression setting. In all experiments, kernel bandwidth matrices are selected using Silverman's rule of thumb [21]. For comparative purposes, we include the results of different state-of-the-art domain adaptation methods, whose parameters are selected by a 10-fold cross-validation process on the training data.

Approximations: A complete R-vine requires the use of conditional copula functions, which are challenging to learn. A common approximation is to ignore any dependence between the functional form of each copula and its set of conditioning variables. Note that the arguments of the copula functions remain conditional cdfs. Moreover, to avoid excessive computational costs, we consider only the first tree (d − 1 copulas) of the R-vine, which is the one capturing the largest amount of dependence between the distribution variables. Increasing the number of considered trees did not lead to significant performance improvements.

5.1 Accuracy of Non-parametric Regular Vines for Density Estimation

The density estimates generated by the new non-parametric R-vine method (NPRV) are evaluated on data from six normalized UCI datasets [9]. We compare against a standard density estimator based on Gaussian kernels (KDE) and a parametric vine method based on bivariate Gaussian copulas (GRV). From each dataset, we extract 50 random samples of size 1000. Training is performed using 30% of each random sample. Average test log-likelihoods and corresponding standard deviations on the remaining 70% of each random sample are summarized in Table 1 for each technique. In these experiments, NPRV obtains the highest average test log-likelihood in all cases except one, where it is outperformed by GRV.
KDE shows the worst performance, due to its direct exposure to the curse of dimensionality.

5.2 Comparison with other Domain Adaptation Methods

Table 2: Average NMSE and standard deviation for all algorithms and UCI datasets.

Dataset            Wine          Sarcos        Rocks-Mines   Hill-Valleys  Axis-Slice    Isolet
No. of variables   12            21            60            100           386           617
GP-Source          0.86 ± 0.02   1.80 ± 0.04   0.90 ± 0.01   1.00 ± 0.00   1.52 ± 0.02   1.59 ± 0.02
GP-All             0.83 ± 0.03   1.69 ± 0.04   1.10 ± 0.08   0.87 ± 0.06   1.27 ± 0.07   1.58 ± 0.02
Daume              0.97 ± 0.03   0.88 ± 0.02   0.72 ± 0.09   0.99 ± 0.03   0.95 ± 0.02   0.99 ± 0.00
SSL-Daume          0.82 ± 0.05   0.74 ± 0.08   0.59 ± 0.07   0.82 ± 0.07   0.65 ± 0.04   0.64 ± 0.02
ATGP               0.86 ± 0.08   0.79 ± 0.07   0.56 ± 0.10   0.15 ± 0.07   1.00 ± 0.01   1.00 ± 0.00
KMM                1.03 ± 0.01   1.00 ± 0.00   1.00 ± 0.00   1.00 ± 0.00   1.00 ± 0.00   1.00 ± 0.00
KuLSIF             0.91 ± 0.08   1.67 ± 0.06   0.65 ± 0.10   0.80 ± 0.11   0.98 ± 0.07   0.58 ± 0.02
NPRV               0.73 ± 0.07   0.61 ± 0.10   0.72 ± 0.13   0.15 ± 0.07   0.38 ± 0.07   0.46 ± 0.09
UNPRV              0.76 ± 0.06   0.62 ± 0.13   0.72 ± 0.15   0.19 ± 0.09   0.37 ± 0.07   0.42 ± 0.04
Av. Ch. Mar.       10            1             38            100           226           89
Av. Ch. Cop.       5             8             49            34            155           474

NPRV is analyzed in a series of experiments for domain adaptation in the non-linear regression setting with real-world data. Detailed descriptions of the six selected UCI datasets and their domains are available in the supplementary material. The proposed technique is compared with several benchmark methods. The first two, GP-SOURCE and GP-ALL, are considered baselines. They are two Gaussian process (GP) methods, the first one trained only with data from the source task, and the second one trained with the normalized union of data from both source and target problems. The other five methods are considered state-of-the-art domain adaptation techniques. DAUME [7] performs a feature augmentation such that the kernel function evaluated at two points from the same domain is twice as large as when these two points come from different domains. SSL-DAUME [6] is an SSL extension of DAUME which takes unlabeled data from the target domain into account. ATGP [4] models the source and target task data using a single GP, but learns additional kernel parameters to correlate input vectors between domains. This method outperforms others, like the one proposed by Bonilla et al. [3]. KMM [11] minimizes the distance between the marginal distributions of the source and target domains by matching their means when mapped into a universal RKHS. Finally, KULSIF [15] operates in a similar way as KMM.
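For reference, the NMSE figures reported in Table 2 are test mean squared errors normalized so that trivially predicting the mean of the test targets scores 1.00; this normalization by the target variance is our reading of the convention (consistent with the 1.00 ± 0.00 entries), not a definition given in the text.

```python
def nmse(y_true, y_pred):
    """Mean squared error divided by the variance of the true targets:
    a constant prediction equal to mean(y_true) scores exactly 1.0."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mean = sum(y_true) / n
    var = sum((t - mean) ** 2 for t in y_true) / n
    return mse / var
```

Under this convention, values below 1.00 indicate a method that beats the trivial mean predictor on the target task.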
Besides NPRV, we also include in the experiments its fully unsupervised variant, UNPRV, which ignores any labeled data from the target task.
For training, we randomly sample 1000 data points for both the source and target tasks, where all the data in the source task and 5% of the data in the target task are labeled. The test set contains 1000 points from the target task. Table 2 summarizes the average test normalized mean square error (NMSE) and corresponding standard deviation for each method on each dataset across 30 random repetitions of the experiment. The proposed methods obtain the best results in 5 out of 6 cases. Notably, UNPRV (unsupervised NPRV), which ignores labeled data from the target task, also outperforms the other benchmark methods in most cases. Finally, the two bottom rows of Table 2 show the average number of marginals and bivariate copulas which are updated in each dataset during the execution of NPRV, respectively.
Computational Costs: Running NPRV requires filling in a weight matrix of size O(d²) with the empirical estimates of Kendall's τ for every pair of random variables. The computation of each of these estimates can be done efficiently with cost O(n log n), where n is the number of available data points. Therefore, the final training cost of NPRV is O(d² n log n). In practice, we obtain competitive training times. Training NPRV for the Isolet dataset took about 3 minutes on a regular laptop computer. Predictions made by a single-level NPRV have cost O(nd). Parametric copulas may be used to reduce the computational demands.

6 Conclusions

We have proposed a novel non-parametric domain adaptation strategy based on copulas. The new approach works by decomposing any multivariate density into a product of marginal densities and bivariate copula functions.
Changes in these factors across different domains can be detected using two-sample tests, and transferred across domains in order to adapt the target task density model. A novel non-parametric vine method has been introduced for the practical implementation of this strategy. This technique leads to better density estimates than standard parametric vines or KDE, and is also able to outperform a large number of alternative domain adaptation methods in a collection of regression problems with real-world data.

References
[1] K. Aas, C. Czado, A. Frigessi, and H. Bakken. Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics, 44(2):182–198, 2006.
[2] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. Wortman. A theory of learning from different domains. Machine Learning, 79(1):151–175, 2010.
[3] E. Bonilla, K. Chai, and C. Williams. Multi-task Gaussian process prediction. NIPS, 2008.
[4] B. Cao, S. Jialin, Y. Zhang, D. Yeung, and Q. Yang. Adaptive transfer learning. AAAI, 2010.
[5] C. Cortes and M. Mohri. Domain adaptation in regression. In Proceedings of the 22nd International Conference on Algorithmic Learning Theory, ALT'11, pages 308–323, Berlin, Heidelberg, 2011. Springer-Verlag.
[6] H. Daumé III, A. Kumar, and A. Saha. Frustratingly easy semi-supervised domain adaptation. Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing, pages 53–59, 2010.
[7] H. Daumé III. Frustratingly easy domain adaptation. Association for Computational Linguistics, pages 256–263, 2007.
[8] J. Fermanian and O. Scaillet. The estimation of copulas: Theory and practice. Copulas: From Theory to Application in Finance, pages 35–60, 2007.
[9] A. Frank and A. Asuncion. UCI machine learning repository, 2010.
[10] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola.
A kernel method for the two-sample-problem. NIPS, pages 513–520, 2007.
[11] J. Huang, A. Smola, A. Gretton, K. Borgwardt, and B. Schölkopf. Correcting sample selection bias by unlabeled data. NIPS, pages 601–608, 2007.
[12] P. Jaworski, F. Durante, W. K. Härdle, and T. Rychlik. Copula Theory and Its Applications. Lecture Notes in Statistics. Springer, 2010.
[13] S. Jialin-Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.
[14] H. Joe. Families of m-variate distributions with given margins and m(m − 1)/2 bivariate dependence parameters. Distributions with Fixed Marginals and Related Topics, 1996.
[15] T. Kanamori, T. Suzuki, and M. Sugiyama. Statistical analysis of kernel-based least-squares density-ratio estimation. Machine Learning, 86(3):335–367, 2012.
[16] D. Kurowicka and R. Cooke. Uncertainty Analysis with High Dimensional Dependence Modelling. Wiley Series in Probability and Statistics, 1st edition, 2006.
[17] Y. Mansour, M. Mohri, and A. Rostamizadeh. Domain adaptation: Learning bounds and algorithms. In COLT, 2009.
[18] R. Nelsen. An Introduction to Copulas. Springer Series in Statistics, 2nd edition, 2006.
[19] S. Nitschke, E. Kidd, and L. Serratrice. First language transfer and long-term structural priming in comprehension. Language and Cognitive Processes, 5(1):94–114, 2010.
[20] R. C. Prim. Shortest connection networks and some generalizations. Bell System Technical Journal, 36:1389–1401, 1957.
[21] B. W. Silverman. Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability. Chapman and Hall, 1986.
[22] A. Sklar. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist.
Univ. Paris, 8(1):229–231, 1959.