{"title": "A Bayes-Sard Cubature Method", "book": "Advances in Neural Information Processing Systems", "page_first": 5882, "page_last": 5893, "abstract": "This paper focusses on the formulation of numerical integration as an inferential task. To date, research effort has largely focussed on the development of Bayesian cubature, whose distributional output provides uncertainty quantification for the integral. However, the point estimators associated to Bayesian cubature can be inaccurate and acutely sensitive to the prior when the domain is high-dimensional. To address these drawbacks we introduce Bayes-Sard cubature, a probabilistic framework that combines the flexibility of Bayesian cubature with the robustness of classical cubatures which are well-established. This is achieved by considering a Gaussian process model for the integrand whose mean is a parametric regression model, with an improper prior on each regression coefficient. The features in the regression model consist of test functions which are guaranteed to be exactly integrated, with remaining degrees of freedom afforded to the non-parametric part. The asymptotic convergence of the Bayes-Sard cubature method is established and the theoretical results are numerically verified. In particular, we report two orders of magnitude reduction in error compared to Bayesian cubature in the context of a high-dimensional financial integral.", "full_text": "A Bayes\u2013Sard Cubature Method\n\nToni Karvonen\n\nAalto University, Finland\n\ntoni.karvonen@aalto.fi\n\nChris. J. Oates\n\nNewcastle University, UK\nAlan Turing Institute, UK\nchris.oates@ncl.ac.uk\n\nSimo S\u00e4rkk\u00e4\n\nAalto University, Finland\n\nsimo.sarkka@aalto.fi\n\nAbstract\n\nThis paper focusses on the formulation of numerical integration as an inferential\ntask. To date, research effort has largely focussed on the development of Bayesian\ncubature, whose distributional output provides uncertainty quanti\ufb01cation for the\nintegral. 
However, the point estimators associated to Bayesian cubature can be inaccurate and acutely sensitive to the prior when the domain is high-dimensional. To address these drawbacks we introduce Bayes–Sard cubature, a probabilistic framework that combines the flexibility of Bayesian cubature with the robustness of classical cubatures which are well-established. This is achieved by considering a Gaussian process model for the integrand whose mean is a parametric regression model, with an improper prior on each regression coefficient. The features in the regression model consist of test functions which are guaranteed to be exactly integrated, with remaining degrees of freedom afforded to the non-parametric part. The asymptotic convergence of the Bayes–Sard cubature method is established and the theoretical results are numerically verified. In particular, we report two orders of magnitude reduction in error compared to Bayesian cubature in the context of a high-dimensional financial integral.

1 Introduction

This paper considers the numerical approximation of an integral I(f†) := ∫_D f† dν of a continuous integrand f† : D → R against a Borel distribution ν defined on a domain D ⊆ R^d. The approximation of such integrals is a fundamental task in applied mathematics, statistics and machine learning, and is usually achieved using an n-point cubature rule

    I_n(f†) := Σ_{i=1}^n w_i f†(x_i) ≈ I(f†)

with some weights w = (w_1, . . . , w_n) ∈ R^n and points (or nodes) X = {x_1, . . . , x_n} ⊂ R^d. The scope and ambition of modern scientific and industrial computer codes is such that the integrand f† can often represent the output of a complex computational model. In such cases the evaluation of the integrand is associated with a substantial resource cost and, as a consequence, the total number of evaluations will be limited.
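As a concrete instance of an n-point cubature rule, the following sketch applies the generic formula I_n(f) = Σ_i w_i f(x_i) with Gauss–Hermite weights for ν = N(0, 1) on D = R. The kernel of this example (the quadrature rule and the function names) is an illustrative choice of ours, not a construction from the paper.

```python
import numpy as np

def cubature(weights, nodes, f):
    """Evaluate the n-point cubature rule I_n(f) = sum_i w_i f(x_i)."""
    return np.dot(weights, f(nodes))

# Instance: Gauss-Hermite rule for nu = N(0, 1) on D = R.
# numpy's hermegauss targets the weight exp(-x^2/2), so the weights are
# divided by sqrt(2*pi) to normalise against the standard Gaussian measure.
nodes, w = np.polynomial.hermite_e.hermegauss(5)
w = w / np.sqrt(2.0 * np.pi)

# A 5-point rule integrates polynomials up to degree 9 exactly,
# so E[x^2] = 1 is recovered to machine precision.
estimate = cubature(w, nodes, lambda x: x**2)
```

The same `cubature` helper applies unchanged to the Bayesian rules developed below; only the origin of the weights differs.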
The research challenge, in these circumstances, manifests not merely in the design of a cubature method but also in the assessment of the associated error.

The (generalised) Bayesian cubature (BC) method [27, 35, 29] provides a statistical approach to error assessment. In brief, let Ω be a probability space and consider a hypothetical Bayesian agent who represents their epistemic uncertainties in the form of a stochastic process f : D × Ω → R. This stochastic process must arise from a Bayesian regression model and be consistent with obtained evaluations of the true integrand, typically provided on a discrete point set {x_i}_{i=1}^n ⊂ D; that is, f(x_i, ω) = f†(x_i) for almost all ω ∈ Ω. The stochastic process acts as a stochastic model for the integrand f†, implying a random variable ω ↦ ∫_D f(·, ω) dν that represents the agent's epistemic uncertainty for the value of the integral I(f†) of interest.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

The output of a (generalised) BC method is the law of the random variable ω ↦ ∫_D f(·, ω) dν. The mean of this output provides a point estimate for the integral, whilst the standard deviation indicates the extent of the agent's uncertainty regarding the integral. The properties of this probabilistic output have been explored in detail for the case of a centred Gaussian stochastic process (the standard BC method): In certain situations the mean has been shown to coincide with a kernel-based integration method [32] that is rate-optimal [1, 5], robust to misspecification of the agent's belief [19, 20] and efficiently computable [32, 22, 23, 18]. The non-Gaussian case and related extensions have been explored empirically in [24, 37, 13, 31, 6].
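For a centred Gaussian process prior, the standard BC point estimate and variance reduce to linear algebra once the kernel mean is available. A minimal sketch using the Brownian motion kernel k(x, y) = min(x, y) on [0, 1] with ν uniform, for which the kernel mean x − x²/2 and its double integral 1/3 have closed forms; the kernel, node placement and test integrand are illustrative choices of ours, not those used in the paper.

```python
import numpy as np

# Nodes and integrand evaluations (illustrative choices).
X = np.linspace(0.1, 0.9, 9)
fX = X  # integrand f(x) = x, so the true integral is I(f) = 1/2

# Brownian motion kernel k(x, y) = min(x, y) on [0, 1], nu = Uniform[0, 1]:
# kernel mean k_nu(x) = int_0^1 min(x, y) dy = x - x^2/2, and
# k_nu_nu = int_0^1 k_nu(x) dx = 1/3.
K = np.minimum.outer(X, X)
k_nu = X - X**2 / 2
k_nu_nu = 1.0 / 3.0

# Standard BC: weights w = K^{-1} k_nu, point estimate w^T f_X, and
# variance k_nu_nu - k_nu^T K^{-1} k_nu (the squared worst-case error).
w = np.linalg.solve(K, k_nu)
estimate = w @ fX
variance = k_nu_nu - k_nu @ w
```

The estimate lands close to 1/2 and the variance is small but strictly positive, illustrating the distributional output described above.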
The method has also been discussed in connection with probabilistic numerics; see [10, 14, 7] for general background.

However, it remains the case that non-probabilistic numerical integration methods, such as Gaussian cubatures [11] and quasi-Monte Carlo methods [16], are more widely used, due in part to perceptions of their ease-of-use and reliability. This is despite the well-known fact that the trapezoidal rule and higher-order spline methods [8] can be naturally cast as Bayesian cubatures if the stochastic process f is selected suitably [10]. It is also known that Gaussian cubature can be viewed as a special (in fact, degenerate) case of a kernel method [46, 21]. However, no overall framework to derive probabilistic analogues of popular cubatures, with corresponding ease-of-use and reliability, has yet been developed.

This paper argues that the perceived performance gap between probabilistic and non-probabilistic methods should be reconsidered. To this end, we consider a non-parametric Bayesian regression model augmented with a parametric component. The features in the parametric component, that is the pre-specified finite set of basis functions, will be denoted π. Then, an improper prior limit on the regression coefficients (see [48, 33] and [42, Sec. 2.7]) is studied. This gives rise to Bayes–Sard cubature¹ (BSC), which differs at a fundamental level from standard BC, in that the functions in π are now exactly integrated. The extension is similar, though not identical, to that proposed in 1974 by Larkin [28] and in 1991 by O'Hagan [35], and non-probabilistic versions have appeared independently in [3, 9] in the context of interpolation with conditionally positive definite kernels and optimal approximation in reproducing kernel Hilbert spaces. For other recent work, see also [40, Sec.
2.4]. Our contributions therefore include (i) establishing a coherent and comprehensive Gaussian process framework for BSC; (ii) rigorous study of convergence and of the conditions that need to be imposed on π; (iii) empirical experiments that demonstrate improved accuracy in high dimensions and robustness to misspecified kernel parameters compared to BC; and (iv) the important observation that, when the dimension of the function space π matches the number of cubature nodes, the BSC method can be used to endow any cubature rule with a meaningful probabilistic output.

2 Methods

This section contains our novel methodological development, which begins with specifying a Bayesian regression model for the integrand.

2.1 A Bayesian Regression Model

This section serves to set up a generic Bayesian regression framework, which is essentially identical to that described in [33, 48] and [44, Sec. 4.1.2]. See also [42, Sec. 2.7] and [30]. This will act as the stochastic model f : D × Ω → R for our subsequent development.

2.1.1 Gaussian Process Prior

Recall that a Gaussian process is a function-valued random variable ω ↦ f(·, ω) such that f(·, ω) ∈ C⁰(D) and ω ↦ Lf(·, ω) is a (univariate) Gaussian for all continuous linear functionals L on C⁰(D). Here ω denotes a generic element of an underlying probability space Ω. See [4] for further background.

¹ Our terminology is motivated by resemblance to the (non-probabilistic) method of Sard [45] for selecting weights for given n nodes by fixing a polynomial space of degree m < n on which the integration rule must be exact and disposing of the remaining n − m − 1 degrees of freedom by minimising an appropriate error functional. See also [47] and [26].
Following the notational convention in [42], we suppress the argument ω and denote by f(x) ~ GP(s(x), k(x, x′)) a Gaussian process with mean function s ∈ C⁰(D) and positive definite covariance kernel k ∈ C⁰(D × D). The characterising property of this Gaussian process is that f(x_1), . . . , f(x_n) are jointly Gaussian with the mean vector [s(x_1), . . . , s(x_n)] and covariance matrix [K_X]_{ij} = k(x_i, x_j) for all sets X = {x_1, . . . , x_n} ⊂ D.

Our starting point in this paper will be to endow a hypothetical Bayesian agent with the following prior model for the integrand:

Definition 2.1 (Prior). Let π be a finite-dimensional linear subspace of real-valued functions on D and {p_1, . . . , p_Q} a basis of π, so that Q = dim(π). Then, for some positive definite covariance matrix Σ ∈ R^{Q×Q}, we consider the following hierarchical prior model:

    f(x) | γ ~ GP(s(x), k(x, x′)),    s(x) = Σ_{j=1}^Q γ_j p_j(x),    γ ~ N(0, Σ).

The mean function s ∈ π is parametrised by γ_1, . . . , γ_Q ∈ R. Such a prior could arise, for example, when a parametric linear regression model is assumed and a non-parametric discrepancy term added to allow for misspecification of the parametric part [25]. Note that a non-zero mean η ∈ R^Q could be specified for γ; this is done in the derivations contained in supplementary material. For the proposed method to be implementable, the functions p_1, . . . , p_Q must be known and their integrals available.

2.1.2 Gaussian Process Posterior

In a regression context, the data consist of input–output pairs D_X = {(x_i, f†(x_i))}_{i=1}^n, based on a finite point set X that, in this paper, is considered fixed. The elements of X are assumed to be distinct. Our interest is in the Bayesian agent's belief, after the data D_X are observed.
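Before conditioning, the hierarchical prior of Def. 2.1 can be sketched numerically: marginalising γ ~ N(0, Σ) out of f | γ gives a zero-mean Gaussian process with covariance k(x, x′) + p(x) Σ p(x′)^⊤. A minimal numpy sketch, with an illustrative Gaussian kernel and polynomial basis (both assumptions of ours, not prescribed by the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def k(x, y, ell=0.8):
    """Gaussian covariance kernel (illustrative choice)."""
    return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2 * ell**2))

# Basis {1, x, x^2} of pi, so Q = 3, and a prior covariance Sigma.
x = np.linspace(-2.0, 2.0, 50)
P = np.vander(x, 3, increasing=True)      # [P]_{i,j} = p_j(x_i)
Sigma = np.eye(3)

# Marginal prior covariance of f at the grid points:
# Cov[f(x), f(x')] = k(x, x') + p(x) Sigma p(x')^T.
C = k(x, x) + P @ Sigma @ P.T

# Draw one prior sample path (small jitter for numerical stability).
sample = rng.multivariate_normal(np.zeros(x.size), C + 1e-9 * np.eye(x.size))
```

Sample paths drawn this way combine a random cubic-free polynomial trend from the parametric part with non-parametric fluctuations from k.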
The posterior is defined as the law of the stochastic process which is obtained by conditioning the prior stochastic process on D_X. That the posterior, denoted f | D_X, is again a Gaussian stochastic process is a well-known result (for technical details, see e.g. [38]).

Let f_X (resp. f†_X) denote the column vector with entries f(x_i) (resp. f†(x_i)). Let p(x) be the row vector with entries p_j(x) and let P_X denote the n × Q Vandermonde matrix with [P_X]_{i,j} = p_j(x_i). Let k_X(x) denote the row vector with entries k(x, x_j) and let K_X denote the kernel matrix with [K_X]_{i,j} = k(x_i, x_j). For the prior in Def. 2.1 we have the following result:

Theorem 2.2 (Posterior). In the posterior, f(x) | D_X ~ GP(s_{X,Σ}(f†)(x), k_{X,Σ}(x, x′)) where

    s_{X,Σ}(f†)(x) = k_X(x)α + p(x)β
                   = [k_X(x) + p(x)ΣP_X^⊤][K_X + P_X Σ P_X^⊤]^{-1} f†_X,    (1)

    k_{X,Σ}(x, x′) = k(x, x′) + p(x)Σp(x′)^⊤
                   − [k_X(x) + p(x)ΣP_X^⊤][K_X + P_X Σ P_X^⊤]^{-1} [k_X(x′) + p(x′)ΣP_X^⊤]^⊤    (2)

and the coefficients α and β are defined via the invertible linear system

    [ K_X     P_X    ] [ α ]   [ f†_X ]
    [ P_X^⊤  −Σ^{-1} ] [ β ] = [  0   ].    (3)

The proofs for all results are contained in the supplementary material, unless otherwise stated. Note that the posterior is consistent with the data D_X, in the sense that the posterior mean s_{X,Σ}(f†)(x) coincides with the value f†(x) at the locations x ∈ X and, moreover, the posterior variance vanishes at each x ∈ X. These facts imply that sample paths from f | D_X almost surely satisfy f_X = f†_X.

Remark 2.3 (Standard Bayesian cubature; BC). Based on Eqns.
1 and 2, it is apparent that if we set π = ∅, then the posterior reduces to a Gaussian process with mean and covariance

    s_{X,0}(f†)(x) = k_X(x) K_X^{-1} f†_X,    k_{X,0}(x, x′) = k(x, x′) − k_X(x) K_X^{-1} k_X(x′)^⊤.

This is precisely the stochastic process used in the standard BC method [29, 5].

The need for the Bayesian agent to elicit a covariance matrix Σ appears to prevent automatic use of this prior model. For this reason, we consider the flat prior limit as Σ^{-1} → 0, which corresponds to a particular encoding of an absence of prior information about the value of the parameter γ in Def. 2.1.

Figure 1: Posterior mean (blue) and 95% credible intervals (gray) given four data points (red) for the prior model of Def. 2.1, with the linear space π taken as the set of polynomials with degree ≤ 3. The Gaussian kernel with length-scale ℓ = 0.8 was used. The unique polynomial interpolant of degree 3 to the data (dashed) is plotted for comparison. Note convergence of the posterior mean to the polynomial interpolant as Σ^{-1} → 0.

2.1.3 Flat Prior Limit

In this section we ask whether the Gaussian process posterior is well-defined in the flat prior limit Σ^{-1} → 0. For this, we need the concept of unisolvency [49, Sec. 2.2]:

Definition 2.4 (Unisolvency). Let π be a finite-dimensional linear subspace of real-valued functions on D. A set X = {x_1, . . . , x_n} ⊂ D with n ≥ dim(π) is called π-unisolvent if the zero function is the only element in π that vanishes on X. (Examples are provided in Sec. B of the supplement.)

Theorem 2.5 (Flat prior limit). Assume that X is a π-unisolvent set. For the prior in Def.
2.1 we have that s_{X,Σ}(f†) → s_X(f†) and k_{X,Σ} → k_X pointwise as Σ^{-1} → 0, where

    s_X(f†)(x) = k_X(x)α + p(x)β,    (4)

    k_X(x, x′) = k(x, x′) − k_X(x) K_X^{-1} k_X(x′)^⊤
                 + [k_X(x) K_X^{-1} P_X − p(x)] [P_X^⊤ K_X^{-1} P_X]^{-1} [k_X(x′) K_X^{-1} P_X − p(x′)]^⊤,    (5)

and the coefficients α and β are defined via the invertible linear system

    [ K_X    P_X ] [ α ]   [ f†_X ]
    [ P_X^⊤   0  ] [ β ] = [  0   ].    (6)

The following observation, illustrated in Fig. 1, will be important:

Proposition 2.6. Assume that X is a π-unisolvent set. Then s_X(p) = p whenever p ∈ π.

Proof. If p ∈ π, there exist coefficients β′_1, . . . , β′_Q such that p = Σ_{j=1}^Q β′_j p_j. That is, a particular solution of Eqn. 6 is α = 0 and β = β′. The linear system being invertible, this must be the only solution. We deduce that s_X(p) = p.

In particular, if dim(π) = n, the posterior mean reduces to the unique interpolant in π to the data D_X while the posterior covariance is non-zero. This observation will enable us to endow any classical cubature rule with a non-degenerate probabilistic output in Sec. 2.4. Next we turn our attention to estimation of the unknown value of the integral.

2.2 The Bayes–Sard Framework

Recall that the output of a generalised BC method is the push-forward ω ↦ ∫_D f(·, ω) dν of the stochastic process f | D_X through the integration operator I. This random variable will be denoted I(f) | D_X. In this section we present the BSC method, which is based on the prior model with Σ^{-1} → 0 studied in Sec. 2.1.3.
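As a concrete check of the flat-prior-limit posterior mean of Thm. 2.5 and the reproduction property of Prop. 2.6, the block system of Eqn. 6 can be solved directly. A minimal sketch, with an illustrative Gaussian kernel and π taken as the polynomials of degree ≤ 2 (both assumptions for illustration, not the paper's experimental settings):

```python
import numpy as np

def bayes_sard_interpolant(X, fX, k, basis):
    """Return a function evaluating the flat-prior-limit posterior mean
    s_X(f)(x) = k_X(x) alpha + p(x) beta, with (alpha, beta) solving the
    block system [[K_X, P_X], [P_X^T, 0]] [alpha; beta] = [f_X; 0]."""
    K = k(X[:, None], X[None, :])
    P = np.column_stack([p(X) for p in basis])
    n, Q = P.shape
    A = np.block([[K, P], [P.T, np.zeros((Q, Q))]])
    sol = np.linalg.solve(A, np.concatenate([fX, np.zeros(Q)]))
    alpha, beta = sol[:n], sol[n:]
    return lambda x: (k(x[:, None], X[None, :]) @ alpha
                      + np.column_stack([p(x) for p in basis]) @ beta)

# Illustrative choices: Gaussian kernel, pi = polynomials of degree <= 2.
gauss = lambda x, y: np.exp(-(x - y) ** 2 / (2 * 0.8**2))
basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]

X = np.linspace(-1.0, 1.0, 5)                    # distinct, so pi-unisolvent
p_true = lambda x: 1.0 - 2.0 * x + 3.0 * x**2    # an element of pi
s = bayes_sard_interpolant(X, p_true(X), gauss, basis)

# Proposition 2.6: elements of pi are reproduced exactly, even off the nodes.
x_test = np.linspace(-1.5, 1.5, 7)
```

Running the check confirms that for data generated by an element of π the particular solution α = 0, β = β′ is recovered, so the posterior mean reproduces p everywhere.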
It will be demonstrated that BSC differs, at a fundamental level, from the standard BC method in that the elements of π are exactly integrated.

Let k_ν(x) = I(k(·, x)) denote the kernel mean function and k_{ν,ν} = I(k_ν) its integral. Define the row vectors p_ν and k_{ν,X} to have respective entries [p_ν]_j = I(p_j) and [k_{ν,X}]_j = k_ν(x_j).

Theorem 2.7 (Bayes–Sard cubature; BSC). Consider the Gaussian process

    f(x) | D_X ~ GP(s_{X,Σ}(f†)(x), k_{X,Σ}(x, x′))

defined in Thm. 2.2 and suppose that X is a π-unisolvent point set. Then, as Σ^{-1} → 0, the mean and variance of I(f) | D_X converge to

    μ_X(f†) = w_k^⊤ f†_X   and   σ²_X = k_{ν,ν} − k_{ν,X} K_X^{-1} k_{ν,X}^⊤ + (k_{ν,X} K_X^{-1} P_X − p_ν) w_π,

respectively, where the weight vectors w_k ∈ R^n and w_π ∈ R^Q are obtained from the solution of the invertible linear system

    [ K_X    P_X ] [ w_k ]   [ k_{ν,X}^⊤ ]
    [ P_X^⊤   0  ] [ w_π ] = [  p_ν^⊤    ].    (7)

The posterior mean indeed takes the form of a cubature rule, with weights w_{k,i} and points x_i ∈ X. This provides a point estimator for the integral I(f†), while the posterior variance enables uncertainty to be assessed. The Bayes–Sard nomenclature derives from the fact that the associated cubature rule μ_X is exact on the space π (recall Prop. 2.6; the proof is also similar):

Proposition 2.8. Assume that X is a π-unisolvent set.
Then μ_X(p) = I(p) whenever p ∈ π.

Thus we have a probabilistic framework that combines the flexibility of BC with the robustness of classical numerical integration techniques, for instance based on a polynomial exactness criterion being satisfied.

Remark 2.9. Rates of convergence identical to those appearing in [5, 20] for the BC can be derived for the BSC method by using results in [49]. Details are contained in Sec. C of the supplement.

2.3 Normalised Bayesian Cubature

The difference between BSC and BC is perhaps best illustrated in the case π = {1}, also considered in [24, 41, 18], where constant functions are exactly integrated in BSC but not in BC. Indeed, P_X = 1, the n-vector of ones, and

    w_k = ( I − (K_X^{-1} 1 1^⊤)/(1^⊤ K_X^{-1} 1) ) K_X^{-1} k_{ν,X}^⊤ + (K_X^{-1} 1)/(1^⊤ K_X^{-1} 1).

These weights have the desirable property of summing up to one; we might therefore call this a normalised Bayesian cubature method. Furthermore, if the kernel is parametrised by a length-scale parameter and this parameter is too small, then w_{k,i} ≈ 1/n, which is a reasonable default. This should be contrasted with BC, for which the weights w_{k,i} ≈ 0 become degenerate instead.

2.4 Reproduction of Classical Cubature Rules

In this section we indicate how any cubature rule can be endowed with a probabilistic interpretation under the Bayes–Sard framework. Recall that every continuous positive definite kernel k induces a unique reproducing kernel Hilbert space (RKHS) H(k) ⊂ C⁰(D) with norm denoted ‖·‖_k [2]. It is well-known that the weights w_BC := K_X^{-1} k_{ν,X}^⊤ ∈ R^n of the standard BC method (recall Rmk. 2.3) are worst-case optimal in H(k):

    w_BC = arg min_{w ∈ R^n} e_k(X, w),    e_k(X, w) := sup_{‖h‖_k ≤ 1} | ∫_D h dν − Σ_{i=1}^n w_i h(x_i) |,

where e_k(X, w) is the worst-case error (WCE) of the cubature rule specified by the points X and weights w. Furthermore, the posterior standard deviation coincides with e_k(X, w_BC). See [26, 43, 32] for details on optimal cubature rules in RKHS. It is now shown that, when dim(π) = n, the BSC weights in Thm. 2.7 do not depend on the kernel and the standard deviation coincides with the WCE:

Theorem 2.10. Suppose that dim(π) = n and let X be a π-unisolvent set. Then

    w_k^⊤ = p_ν P_X^{-1},    μ_X(f†) = w_k^⊤ f†_X,    μ_X(p) = I(p) for every p ∈ π

and

    σ²_X = e_k(X, w_k)² = k_{ν,ν} − 2 k_{X,ν} w_k + w_k^⊤ K_X w_k.

That is, the BSC weights w_k are the unique weights for a cubature rule with the points X such that every function in π is integrated exactly, and the posterior standard deviation σ_X coincides with the WCE in the RKHS H(k).

Corollary 2.11. Consider an n-point cubature rule with points X and non-zero weights w ∈ R^n and assume that ν admits a positive density function (w.r.t. the Lebesgue measure). Then there is a function space π of dimension n, such that the BSC method recovers w_k = w and σ²_X = e_k(X, w)², as defined in Thm. 2.7.

Thus any cubature rule can be recovered as a posterior mean for some prior (briefly alluded to in [35, Sec. 2.3] in a more limited setting and lacking RKHS machinery). Our result goes beyond earlier work in [46, 21], in the sense that the associated posterior is non-degenerate (i.e. has non-zero variance) in the Bayes–Sard framework. Further discussion is provided in Sec. D of the supplement. From a practical perspective, this enables us to simultaneously achieve the same reliable integration performance as popular non-probabilistic rules (see Sec. C.2 in the supplement) and to perform formal uncertainty quantification for the integral.

Remark 2.12. The function space alluded to in Cor. 2.11 can be constructed explicitly. The general construction is somewhat artificial, but can be made more appealing if the weights arise from, for example, a natural polynomial exactness criterion. See Sec. A.2 of the supplement for details.

2.5 On Weakly-Informative Priors

As mentioned earlier, methods similar to ours were proposed by O'Hagan [35]. See also [28, 36, 24] and, in particular, [34, Sec. 3.6]. Following [35], let k(x, x′) = λ k₀(x, x′) for some base kernel k₀ and consider the improper prior p(γ, λ) ∝ 1/λ. It can then be shown that the marginal posterior for I(f) is Student-t, with n − Q degrees of freedom and mean equal to μ_X(f†) in our work, but whose variance is instead

    (1/(n − Q − 2)) (f†_X)^⊤ ( K_X^{-1} − K_X^{-1} P_X [P_X^⊤ K_X^{-1} P_X]^{-1} P_X^⊤ K_X^{-1} ) f†_X × σ²_X.

That is, when n − 2 < Q ≤ n, this posterior is not well-defined. As a consequence of this prior specification one cannot, as opposed to Cor. 2.11 that requires Q = n, associate every cubature rule with a non-degenerate posterior. Thus one of the principal advantages of the weakly-informative prior considered in this paper, obtained as a limit of Gaussians, is that the worst-case error can be reinterpreted as a posterior standard deviation.
However, ensuring that the posterior variance provides meaningful quantification of uncertainty can be challenging. This is discussed in Secs. 3.1 and 3.4.

3 Experimental Results

This section contains three numerical experiments, which investigate the empirical performance of the BSC method and the associated uncertainty quantification that is provided. The examples demonstrate that BSC is typically at least as accurate as BC whilst being less sensitive to misspecification of the kernel length-scale parameter.

3.1 On Choosing the Kernel Parameters

The stationary kernels often used in Gaussian process regression are parametrised by positive length-scale² ℓ and amplitude λ:

    k(x, x′) = k(x − x′) = λ k₀((x − x′)/ℓ)

for, in a slight abuse of notation, some base kernel k₀. Adapting these parameters in a data-dependent way is an essential prerequisite for meaningful quantification of uncertainty for the integral. After taking the limit Σ^{-1} → 0, which yields the BSC, we proceed to set these parameters independently, following the approach suggested in [5, Sec. 4.1], but as if the prior model were

    f(x) | ℓ, λ ~ GP(0, λ k₀((x − x′)/ℓ)).

This procedure, though admittedly somewhat unsound, appears to work well in the examples of Secs. 3.2 and 3.4. That is, we (i) assign λ the improper prior p(λ) ∝ 1/λ and marginalise over it so that the

² In general, a distinct length-scale parameter for each dimension could be used.

Figure 2: Approximation of the integral in Eqn. 8 using BC and BSC with π = Π_m(R) for m = 1, 3, 5, both based on the Gaussian kernel. The n nodes were placed uniformly on [−√n, √n]. Left: Uncertainty quantification (UQ) provided by BSC with m = 3 and BC when kernel parameters were selected as outlined in Sec. 3.1.
Middle & right: Effect of ℓ on approximation accuracy. The upper row presents the relative integration error |I(f†) − I_n(f†)|/I(f†) for each cubature rule I_n as a function of ℓ. The "optimal" length-scales ℓ_EB, as computed by EB, are also shown. The lower row contains the corresponding posterior means when an inappropriate value, ℓ = 0.3, is used. Since dim(Π₅(R)) = 6, that BSC for m = 5 and n = 6 is independent of ℓ is a consequence of Thm. 2.10.

BSC posterior becomes Student-t with the mean μ_X(f†), variance (n − 2)^{-1} (f†_X)^⊤ K_X^{-1} f†_X × σ²_X and n degrees of freedom [35, Sec. 2.2] and (ii) set ℓ using empirical Bayes (EB) based on the Gaussian log-marginal likelihood [42, Sec. 5.4.1]

    ℓ_EB = arg max_{ℓ > 0} [ −(1/2) (f†_X)^⊤ K_X^{-1} f†_X − (1/2) log det(K_X) ].

There are of course other possibilities that could be explored, such as cross-validation or, when Q < n, using the likelihood of the regression model set up in Sec. 2.1 (see [42, Eqn. 2.45]).

3.2 A One-Dimensional Toy Example

Our first example is one-dimensional. The test function and its integral that we considered were

    f†(x) = exp( sin(2x) − x²/5 + x²/2 )   and   I(f†) = (1/√(2π)) ∫_R f†(x) e^{−x²/2} dx ≈ 2.0693.    (8)

The effect of the length-scale ℓ of the Gaussian kernel k(x, x′) = exp(−(x − x′)²/(2ℓ²)) on the performance of standard BC and BSC of Sec. 2.2, with

    π = Π_m(R) := span{x^α : α ∈ N_0^d, α_1 + · · · + α_d ≤ m},   where x^α = x_1^{α_1} × · · · × x_d^{α_d},

for different m, was investigated and the quality of the uncertainty quantification was assessed. Results are depicted in Fig. 2. It can be observed that the BSC is more robust compared to BC when the length-scale is misspecified (in particular, when it is too small). This is because the polynomial part mitigates the tendency of the posterior mean to revert quickly back to zero. For reasonable values of the length-scale, the accuracy of the different methods is comparable. The BSC and BC provide qualitatively similar quantification of uncertainty and both exhibit a degree of over-confidence, as

Figure 3: Approximation of the d-dimensional integral (9) using BC and BSC with π = Π₁(R^d), both based on the product Matérn kernel with ρ = 5/2 (see Eqn. C15 in the supplement) and length-scale ℓ. Figures contain relative integration errors for each cubature rule for a given dimension and different length-scales as a function of the number of nodes n, drawn randomly from the uniform distribution. Non-monotonicity of the BSC errors is caused by higher accuracy of this method; small fluctuations in error are magnified on the logarithmic scale.
Note that the point set is almost surely unisolvent. For comparison, the standard Monte Carlo approximation (MC) is also plotted.

observed already in [5, Sec. 5.1] for the BC and attributed to the manner in which the length-scale is selected. However, BSC is less over-confident. The reason for this is that the BSC variance in Thm. 2.7 is a sum of the BC variance and a positive term.

3.3 Zero Coupon Bonds

This section experiments with a high-dimensional zero coupon bond example that has been used previously in numerical experiments for kernel cubature in [22, Sec. 5.5]. See [17, Sec. 6.1] and Sec. E of the supplement for a more detailed account of this experiment. The integral of interest is

    P(0, T) := E[ exp( −Δt Σ_{i=0}^{d_T − 1} r_{t_i} ) ] = exp(−Δt r_{t_0}) E[ exp( −Δt Σ_{i=1}^{d_T − 1} r_{t_i} ) ],    (9)

where r_{t_i}, i > 0, are Gaussian random variables and r_{t_0} is a constant. This d = d_T − 1 dimensional integral represents the price at time t = 0 of a zero coupon bond with maturity time T and arises from d_T-step uniform Euler–Maruyama discretisation of the Vasicek model. Existence of a closed-form solution for P(0, T) makes numerical approximation of Eqn. 9 an attractive high-dimensional benchmark problem.

We transformed the integral (9) onto the hypercube [0, 1]^d and compared the accuracy of BC to BSC with π = Π₁(R^d). Different dimensions d and length-scales ℓ were considered and the product Matérn kernel with smoothness parameter ρ = 5/2 (see Eqn. C15 in the supplement) was used. As in Sec. 3.2, it is apparent from Fig. 3 that the BSC is less sensitive to length-scale misspecification compared to the standard BC method. In this misspecified case a two-order-of-magnitude reduction in integration error was observed.
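The product Matérn kernel used in this experiment can be sketched as follows, assuming the standard Matérn-5/2 form with one-dimensional factors multiplied over the coordinates; the exact parametrisation of Eqn. C15 in the supplement may differ.

```python
import numpy as np

def matern52_1d(r, ell):
    """Standard Matern kernel with smoothness 5/2 in one dimension:
    (1 + sqrt(5)|r|/ell + 5 r^2 / (3 ell^2)) * exp(-sqrt(5)|r|/ell)."""
    a = np.sqrt(5.0) * np.abs(r) / ell
    return (1.0 + a + a**2 / 3.0) * np.exp(-a)

def product_matern52(X, Y, ell):
    """Product (tensor) Matern-5/2 kernel between point sets X of shape
    (n, d) and Y of shape (m, d): the product of 1-d factors over dims."""
    diffs = X[:, None, :] - Y[None, :, :]          # shape (n, m, d)
    return np.prod(matern52_1d(diffs, ell), axis=-1)

# Illustrative use on random nodes in [0, 1]^d, d = 3, ell = sqrt(d)/2.
rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 3))
K = product_matern52(X, X, ell=np.sqrt(3.0) / 2.0)
```

The resulting kernel matrix is symmetric positive semi-definite with unit diagonal, and the single length-scale `ell` plays the role of ℓ in the experiment.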
This is attributed to the improved extrapolation performance conferred through the polynomial component.

3.4 Uncertainty Quantification for Gauss–Patterson Quadrature

In this section we assess the uncertainty quantification provided by Cor. 2.11 for Gauss–Patterson quadrature rules [39], a sequence of nested classical quadrature rules. These rules are nested extensions of the familiar Gaussian quadratures: to an n-point quadrature rule n + 1 points are added so as to maximise the polynomial degree of the resulting (2n + 1)-point quadrature rule and the process is then repeated iteratively. For the uniform measure, these rules have been computed³ for the sequence n = (1, 3, 7, 15, 31, 63, 127, 255, 511); see [12] for a Gaussian version.

³ https://people.sc.fsu.edu/~jburkardt/m_src/patterson_rule/patterson_rule.html

[Figure 3 panels: relative error as a function of n for d = 31, 63 and 127, comparing BC and BSC with ℓ = √d/4, √d/2 and √d, together with MC.]

Figure 4: Uncertainty quantification by the BSC for Gauss–Patterson quadrature applied to the integration problem in Eqn. (10). The number of nodes ranges from 3 to 511. The lower row presents the absolute integration error |I(f†C) − In(f†C)| of the n-point Gauss–Patterson rule for the three integrals. The plotted credible bound exceeding the integration error indicates that the true integral value is contained within the 95% highest posterior density credible interval.

Our final experiment considered computation of the integrals

I(f†C) = (1/8) ∫_0^8 f†C(x) dx   for   f†C(x) = exp( sin(Cx)^2 − 0.5x ) + C/10,   C ∈ {10, 15, 20},   (10)

which are expected to be more difficult to compute the larger the constant C is (see [5, Sec.
5.1] for a similar example for the standard BC). We again used the Matérn kernel with the smoothness parameter ρ = 5/2 and set its length-scale and magnitude parameters for each n as described in Sec. 3.1. The results appear in Fig. 4, where we clearly observe that a larger integral variance is assigned for more difficult integrals and that the true integral value is always contained within the 95% credible interval. In particular, for small n that do not produce useful integral estimates (n = 3 yields relative errors between 0.46 and 0.69 and n = 7 between 0.31 and 0.44) the posterior variance is large, correctly reflecting significant uncertainty in these estimates. This suggests that the BSC provides sensible uncertainty quantification for Gauss–Patterson rules, at least in this experiment.

4 Conclusion

This paper proposed a Bayes–Sard cubature method, which provides an explicit connection between classical cubatures and the Bayesian inferential framework. In particular, we obtained polynomially exact generalisations of standard BC in Thm. 2.7 and demonstrated in Cor. 2.11 how any cubature rule, including widely-used cubature methods, can be recovered as the output of a probabilistic model. The main practical drawback of standard BC is its acute sensitivity to the choice of kernel. As demonstrated in Sec. 3, the Bayes–Sard point estimator is more robust to the choice of kernel, which suggests that fast Gaussian process methods (e.g., [15, 51]) could be used for efficient automatic selection of kernel parameters with little adverse effect on the accuracy of the point estimator. On the other hand, further work is required to assess the quality of the uncertainty quantification provided by the BSC method.
This will require careful analysis that accounts for how kernel parameters are estimated, and is expected to be technically more challenging (see, e.g., [50]).

[Figure 4 panels: the integrands f†C for C = 10, 15 and 20; the EB length-scale; the integral standard deviation; and the absolute error with 95% credible bound as functions of n for each C.]

Acknowledgments

The authors are grateful for discussion with Aretha Teckentrup, Catherine Powell, Fred Hickernell and Filip Tronarp. TK was supported by the Aalto ELEC Doctoral School. CJO was supported by the Lloyd's Register Foundation programme on data-centric engineering. SS was supported by the Academy of Finland projects 266940, 304087 and 313708. This material was based upon work partially supported by the National Science Foundation under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References

[1] F. Bach. On the equivalence between kernel quadrature rules and random feature expansions. Journal of Machine Learning Research, 18(21):1–38, 2017.

[2] A. Berlinet and C. Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer Science & Business Media, 2011.

[3] A. Yu. Bezhaev. Cubature formulae on scattered meshes. Soviet Journal of Numerical Analysis and Mathematical Modelling, 6(2):95–106, 1991.

[4] V. I. Bogachev. Gaussian Measures. Number 62 in Mathematical Surveys and Monographs. American Mathematical Society, 1998.

[5] F.-X. Briol, C. J. Oates, M. Girolami, M. A. Osborne, and D. Sejdinovic. Probabilistic integration: A role in statistical computation? (with discussion).
Statistical Science, 2018. To appear.

[6] H. Chai and R. Garnett. An improved Bayesian framework for quadrature of constrained integrands. arXiv:1802.04782, 2018.

[7] J. Cockayne, C. J. Oates, T. Sullivan, and M. Girolami. Bayesian probabilistic numerical methods. arXiv:1702.03673v2, 2017.

[8] P. J. Davis and P. Rabinowitz. Methods of Numerical Integration. Courier Corporation, 2007.

[9] R. DeVore, S. Foucart, G. Petrova, and P. Wojtaszczyk. Computing a quantity of interest from observational data. Constructive Approximation, 2018.

[10] P. Diaconis. Bayesian numerical analysis. In Statistical Decision Theory and Related Topics IV, volume 1, pages 163–175. Springer-Verlag New York, 1988.

[11] W. Gautschi. Orthogonal Polynomials: Computation and Approximation. Numerical Mathematics and Scientific Computation. Oxford University Press, 2004.

[12] A. Genz and B. D. Keister. Fully symmetric interpolatory rules for multiple integrals over infinite regions with Gaussian weight. Journal of Computational and Applied Mathematics, 71(2):299–309, 1996.

[13] T. Gunter, M. A. Osborne, R. Garnett, P. Hennig, and S. J. Roberts. Sampling for inference in probabilistic models with fast Bayesian quadrature. In Advances in Neural Information Processing Systems, pages 2789–2797, 2014.

[14] P. Hennig, M. A. Osborne, and M. Girolami. Probabilistic numerics and uncertainty in computations. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 471(2179), 2015.

[15] J. Hensman, N. Durrande, and A. Solin. Variational Fourier features for Gaussian processes. Journal of Machine Learning Research, 18(151):1–52, 2018.

[16] F. Hickernell. A generalized discrepancy and quadrature error bound. Mathematics of Computation, 67(221):299–322, 1998.

[17] M. Holtz.
Sparse Grid Quadrature in High Dimensions with Applications in Finance and Insurance. Number 77 in Lecture Notes in Computational Science and Engineering. Springer, 2011.

[18] R. Jagadeeswaran and F. Hickernell. Fast automatic Bayesian cubature using lattice sampling. arXiv:1809.09803v1, 2018.

[19] M. Kanagawa, B. K. Sriperumbudur, and K. Fukumizu. Convergence guarantees for kernel-based quadrature rules in misspecified settings. In Advances in Neural Information Processing Systems, pages 3288–3296, 2016.

[20] M. Kanagawa, B. K. Sriperumbudur, and K. Fukumizu. Convergence analysis of deterministic kernel-based quadrature rules in misspecified settings. arXiv:1709.00147v1, 2017.

[21] T. Karvonen and S. Särkkä. Classical quadrature rules via Gaussian processes. In 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017.

[22] T. Karvonen and S. Särkkä. Fully symmetric kernel quadrature. SIAM Journal on Scientific Computing, 40(2):A697–A720, 2018.

[23] T. Karvonen, S. Särkkä, and C. J. Oates. Symmetry exploits for Bayesian cubature methods. arXiv:1809.10227v1, 2018.

[24] M. Kennedy. Bayesian quadrature with non-normal approximating functions. Statistics and Computing, 8(4):365–375, 1998.

[25] M. C. Kennedy and A. O'Hagan. Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(3):425–464, 2001.

[26] F. M. Larkin. Optimal approximation in Hilbert spaces with reproducing kernel functions. Mathematics of Computation, 24(112):911–921, 1970.

[27] F. M. Larkin. Gaussian measure in Hilbert space and applications in numerical analysis. The Rocky Mountain Journal of Mathematics, 2(3):379–421, 1972.

[28] F. M. Larkin. Probabilistic error estimates in spline interpolation and quadrature.
In Information Processing 74 (Proceedings of IFIP Congress, Stockholm, 1974), volume 74, pages 605–609. North-Holland, 1974.

[29] T. Minka. Deriving quadrature rules from Gaussian processes. Technical report, Statistics Department, Carnegie Mellon University, 2000.

[30] A. M. Mosamam and J. T. Kent. Semi-reproducing kernel Hilbert spaces, splines and increment kriging. Journal of Nonparametric Statistics, 22(6):711–722, 2010.

[31] C. J. Oates, S. Niederer, A. Lee, F.-X. Briol, and M. Girolami. Probabilistic models for integration error in the assessment of functional cardiac models. In Advances in Neural Information Processing Systems, pages 109–117, 2017.

[32] J. Oettershagen. Construction of Optimal Cubature Algorithms with Applications to Econometrics and Uncertainty Quantification. PhD thesis, Institut für Numerische Simulation, Universität Bonn, 2017.

[33] A. O'Hagan. Curve fitting and optimal design for prediction. Journal of the Royal Statistical Society. Series B (Methodological), 40(1):1–42, 1978.

[34] A. O'Hagan. Bayesian quadrature. Technical report, Department of Statistics, University of Warwick, 1988. Warwick Statistics Research Report 159.

[35] A. O'Hagan. Bayes–Hermite quadrature. Journal of Statistical Planning and Inference, 29(3):245–260, 1991.

[36] A. O'Hagan. Some Bayesian numerical analysis. Bayesian Statistics, 4:345–363, 1992.

[37] M. Osborne, R. Garnett, Z. Ghahramani, D. K. Duvenaud, S. J. Roberts, and C. E. Rasmussen. Active learning of model evidence using Bayesian quadrature. In Advances in Neural Information Processing Systems, pages 46–54, 2012.

[38] H. Owhadi and C. Scovel. Conditioning Gaussian measure on Hilbert space. arXiv:1506.04208v2, 2015.

[39] T. N. L. Patterson. The optimum addition of points to quadrature formulae. Mathematics of Computation, 22:847–856, 1968.

[40] F.
Portier and J. Segers. Monte Carlo integration with a growing number of control variates. arXiv:1801.01797v3, 2018.

[41] L. Pronzato and A. Zhigljavsky. Bayesian quadrature and energy minimization for space-filling design. arXiv:1808.10722v1, 2018.

[42] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[43] N. Richter-Dyn. Minimal interpolation and approximation in Hilbert spaces. SIAM Journal on Numerical Analysis, 8(3):583–597, 1971.

[44] T. J. Santner, B. J. Williams, and W. I. Notz. The Design and Analysis of Computer Experiments. Springer Series in Statistics. Springer, 2003.

[45] A. Sard. Best approximate integration formulas; best approximation formulas. American Journal of Mathematics, 71(1):80–91, 1949.

[46] S. Särkkä, J. Hartikainen, L. Svensson, and F. Sandblom. On the relation between Gaussian process quadratures and sigma-point methods. Journal of Advances in Information Fusion, 11(1):31–46, 2016.

[47] I. J. Schoenberg. Spline interpolation and best quadrature formulae. Bulletin of the American Mathematical Society, 70(1):143–148, 1964.

[48] G. Wahba. Improper priors, spline smoothing and the problem of guarding against model errors in regression. Journal of the Royal Statistical Society. Series B (Methodological), 40(3):364–372, 1978.

[49] H. Wendland. Scattered Data Approximation. Number 17 in Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, 2005.

[50] W. Xu and M. L. Stein. Maximum likelihood estimation for a smooth Gaussian random field model. SIAM/ASA Journal on Uncertainty Quantification, 5(1):138–175, 2017.

[51] Q. Zhou, W. Liu, J. Li, and Y. M. Marzouk. An approximate empirical Bayesian method for large-scale linear-Gaussian inverse problems.
Inverse Problems, 34(9), 2018.