{"title": "Unifying Framework for Fast Learning Rate of Non-Sparse Multiple Kernel Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 1575, "page_last": 1583, "abstract": "In this paper, we give a new generalization error bound of Multiple Kernel Learning (MKL) for a general class of regularizations. Our main target in this paper is dense type regularizations including \u2113p-MKL that imposes \u2113p-mixed-norm regularization instead of \u21131-mixed-norm regularization. According to the recent numerical experiments, the sparse regularization does not necessarily show a good performance compared with dense type regularizations. Motivated by this fact, this paper gives a general theoretical tool to derive fast learning rates that is applicable to arbitrary monotone norm-type regularizations in a unifying manner. As a by-product of our general result, we show a fast learning rate of \u2113p-MKL that is tightest among existing bounds. We also show that our general learning rate achieves the minimax lower bound. Finally, we show that, when the complexities of candidate reproducing kernel Hilbert spaces are inhomogeneous, dense type regularization shows better learning rate compared with sparse \u21131 regularization.", "full_text": "Unifying Framework for Fast Learning Rate of\n\nNon-Sparse Multiple Kernel Learning\n\nTaiji Suzuki\n\nDepartment of Mathematical Informatics\n\nThe University of Tokyo\nTokyo 113-8656, Japan\n\ns-taiji@stat.t.u-tokyo.ac.jp\n\nAbstract\n\nIn this paper, we give a new generalization error bound of Multiple Kernel Learn-\ning (MKL) for a general class of regularizations. Our main target in this paper is\ndense type regularizations including \u2113p-MKL that imposes \u2113p-mixed-norm regu-\nlarization instead of \u21131-mixed-norm regularization. According to the recent nu-\nmerical experiments, the sparse regularization does not necessarily show a good\nperformance compared with dense type regularizations. 
Motivated by this fact, this paper gives a general theoretical tool for deriving fast learning rates that is applicable to arbitrary mixed-norm-type regularizations in a unifying manner. As a by-product of our general result, we show a fast learning rate of $\ell_p$-MKL that is the tightest among existing bounds. We also show that our general learning rate achieves the minimax lower bound. Finally, we show that, when the complexities of the candidate reproducing kernel Hilbert spaces are inhomogeneous, dense type regularization achieves a better learning rate than sparse $\ell_1$ regularization.

1 Introduction

Multiple Kernel Learning (MKL), proposed by [20], is one of the most promising methods for adaptively selecting the kernel function in supervised kernel learning. Kernel methods are widely used, and several studies have supported their usefulness [25]. However, the performance of kernel methods critically relies on the choice of the kernel function, and many methods have been proposed to deal with the issue of kernel selection. [23] studied hyperkernels as a kernel of kernel functions. [2] considered a DC programming approach to learn a mixture of kernels with continuous parameters. Some studies tackled the problem of learning non-linear combinations of kernels, as in [4, 9, 34]. Among them, learning a linear combination of finitely many candidate kernels with non-negative coefficients is the most basic, fundamental and commonly used approach. The seminal work on MKL by [20] considered learning a convex combination of candidate kernels, and opened up the subsequent sequence of MKL studies. [5] showed that MKL can be reformulated as a kernel version of the group lasso [36]. This formulation gives the insight that MKL can be described as an $\ell_1$-mixed-norm regularized method. As a generalization of MKL, $\ell_p$-MKL, which imposes $\ell_p$-mixed-norm regularization, has been proposed [22, 14]; it includes the original MKL as the special case $p = 1$. 
Another direction of generalizing MKL is elasticnet-MKL [26, 31], which imposes a mixture of $\ell_1$-mixed-norm and $\ell_2$-mixed-norm regularizations. Recent numerical studies have shown that $\ell_p$-MKL with $p > 1$ and elasticnet-MKL perform better than $\ell_1$-MKL in several situations [14, 8, 31]. An interesting observation here is that both $\ell_p$-MKL and elasticnet-MKL produce denser estimators than the original $\ell_1$-MKL while showing favorable performances. One motivation of this paper is to give a theoretical justification of these generalized dense type MKL methods in a unifying manner.

In the pioneering paper of [20], a convergence rate of MKL is given as $\sqrt{M/n}$, where $M$ is the number of given kernels and $n$ is the number of samples. [27] gave an improved learning bound utilizing the pseudo-dimension of the given kernel class. [35] gave a convergence bound utilizing Rademacher chaos, together with some upper bounds of the Rademacher chaos utilizing the pseudo-dimension of the kernel class. [8] presented a convergence bound for a learning method with L2 regularization on the kernel weight. [10] gave the convergence rate of $\ell_p$-MKL as $M^{1-\frac{1}{p}}\sqrt{\frac{p \vee \log(M)}{n}}$ for $1 \le p \le 2$. [15] gave a similar convergence bound with improved constants. [16] generalized this bound to a variant of the elasticnet type regularization and widened the effective range of $p$ to all $p \ge 1$, while the existing bounds imposed $1 \le p \le 2$. One concern about these bounds is that all the bounds introduced above are "global" bounds, in the sense that they are applicable to all candidate estimators. Consequently, all the convergence rates presented above are of order $1/\sqrt{n}$ with respect to the number $n$ of samples. However, by utilizing localization techniques, including the so-called local Rademacher complexity [6, 17] and the peeling device [32], we can derive a faster learning rate. 
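The speed-up promised by localization can be made concrete with a quick numerical comparison; the spectral decay exponent $s = 1/2$ below is purely illustrative and not a value taken from the paper.

```python
# Global bounds decay as n^(-1/2); localized bounds decay as n^(-1/(1+s))
# with 0 < s < 1, which is strictly faster in n. Example with s = 1/2:
s = 0.5
for n in (10**2, 10**4, 10**6):
    global_rate = n ** -0.5               # "global" uniform-bound order
    localized_rate = n ** (-1 / (1 + s))  # "fast" localized order
    print(f"n={n:>9,}  global={global_rate:.2e}  localized={localized_rate:.2e}")
```

For $n = 10^6$ the localized order $n^{-2/3} = 10^{-4}$ is already an order of magnitude smaller than the global order $n^{-1/2} = 10^{-3}$.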
Instead of uniformly bounding all candidate estimators, a localized inequality focuses on a particular estimator, such as the empirical risk minimizer, and thus can give a sharp convergence rate.

Localized bounds for MKL have been given mainly in sparse learning settings [18, 21, 19], and there are only a few studies for non-sparse settings in which sparsity of the ground truth is not assumed. Recently [13] gave a localized convergence bound for $\ell_p$-MKL. However, their analysis assumed a strong condition under which the RKHSs have no correlation to each other.

In this paper, we show a unified framework for deriving fast convergence rates of MKL with various regularization types. The framework is applicable to arbitrary mixed-norm regularizations, including $\ell_p$-MKL and elasticnet-MKL. Our learning rate utilizes the localization technique and is thus tighter than global type learning rates. Moreover, our analysis does not require the no-correlation assumption of [13]. We apply our general framework to some examples and show that our bound achieves the minimax-optimal rate. As a by-product, we obtain a tighter convergence rate of $\ell_p$-MKL than existing results. Finally, we show that dense type regularizations can outperform sparse $\ell_1$ regularization when the complexities of the RKHSs are not uniformly the same.

2 Preliminary

In this section, we give the problem formulation, the notations and the assumptions required for the convergence analysis.

2.1 Problem Formulation

Suppose that we are given $n$ i.i.d. samples $\{(x_i, y_i)\}_{i=1}^n$ distributed from a probability distribution $P$ on $\mathcal{X} \times \mathbb{R}$, where $\mathcal{X}$ is an input space. We denote by $\Pi$ the marginal distribution of $P$ on $\mathcal{X}$. We are given $M$ reproducing kernel Hilbert spaces (RKHSs) $\{\mathcal{H}_m\}_{m=1}^M$, each of which is associated with a kernel $k_m$. 
We consider a mixed-norm type regularization with respect to an arbitrarily given norm $\|\cdot\|_\psi$; that is, the regularization is given by the norm $\|(\|f_m\|_{\mathcal{H}_m})_{m=1}^M\|_\psi$ of the vector $(\|f_m\|_{\mathcal{H}_m})_{m=1}^M$ for $f_m \in \mathcal{H}_m$ ($m = 1, \ldots, M$). For notational simplicity, we write $\|f\|_\psi = \|(\|f_m\|_{\mathcal{H}_m})_{m=1}^M\|_\psi$ for $f = \sum_{m=1}^M f_m$ ($f_m \in \mathcal{H}_m$).

The general formulation of MKL that we consider in this paper fits a function $f = \sum_{m=1}^M f_m$ ($f_m \in \mathcal{H}_m$) to the data by solving the following optimization problem:

$$\hat{f} = \sum_{m=1}^M \hat{f}_m = \mathop{\arg\min}_{f_m \in \mathcal{H}_m\ (m=1,\ldots,M)} \frac{1}{n}\sum_{i=1}^n \Big(y_i - \sum_{m=1}^M f_m(x_i)\Big)^2 + \lambda_1^{(n)} \|f\|_\psi^2. \qquad (1)$$

We call this "$\psi$-norm MKL". This formulation covers many practically used MKL methods (e.g., $\ell_p$-MKL, elasticnet-MKL, and variable sparsity kernel learning; see below for their definitions), and is solvable by a finite dimensional optimization procedure due to the representer theorem [12]. We assume that the mixed-norm $\|(\|f_m\|_{\mathcal{H}_m})_{m=1}^M\|$ satisfies the triangle inequality with respect to $(f_m)_{m=1}^M$, that is, $\|(\|f_m + f'_m\|_{\mathcal{H}_m})_{m=1}^M\| \le \|(\|f_m\|_{\mathcal{H}_m})_{m=1}^M\| + \|(\|f'_m\|_{\mathcal{H}_m})_{m=1}^M\|$. To satisfy this condition, it is sufficient that the norm is monotone, i.e., $\|a\| \le \|a + b\|$ for all $a, b \ge 0$. In this paper, we focus on the regression problem (the squared loss). 
However, the discussion presented here can be generalized to Lipschitz continuous and strongly convex losses [6].

Example 1: $\ell_p$-MKL. The first motivating example of $\psi$-norm MKL is $\ell_p$-MKL [14], which employs the $\ell_p$-norm for $1 \le p \le \infty$ as the regularizer: $\|f\|_\psi = \|(\|f_m\|_{\mathcal{H}_m})_{m=1}^M\|_{\ell_p} = (\sum_{m=1}^M \|f_m\|_{\mathcal{H}_m}^p)^{\frac{1}{p}}$. If $p$ is strictly greater than 1 ($p > 1$), the solution of $\ell_p$-MKL becomes dense. In particular, $p = 2$ corresponds to averaging the candidate kernels with uniform weight [22]. It is reported that $\ell_p$-MKL with $p$ greater than 1, say $p = \frac{4}{3}$, often shows better performance than the original sparse $\ell_1$-MKL [10].

Example 2: Elasticnet-MKL. The second example is elasticnet-MKL [26, 31], which employs a mixture of $\ell_1$ and $\ell_2$ norms as the regularizer: $\|f\|_\psi = \tau\|f\|_{\ell_1} + (1-\tau)\|f\|_{\ell_2} = \tau\sum_{m=1}^M \|f_m\|_{\mathcal{H}_m} + (1-\tau)(\sum_{m=1}^M \|f_m\|_{\mathcal{H}_m}^2)^{\frac{1}{2}}$ with $\tau \in [0, 1]$. Elasticnet-MKL shares the same spirit with $\ell_p$-MKL in the sense that it bridges sparse $\ell_1$-regularization and dense $\ell_2$-regularization. An efficient optimization method for elasticnet-MKL is proposed by [30].

Example 3: Variable Sparsity Kernel Learning. Variable Sparsity Kernel Learning (VSKL), proposed by [1], divides the RKHSs into $M'$ groups $\{\mathcal{H}_{j,k}\}_{k=1}^{M_j}$ ($j = 1, \ldots, M'$) and imposes a mixed norm regularization $\|f\|_\psi = \|f\|_{(p,q)} = \{\sum_{j=1}^{M'}(\sum_{k=1}^{M_j} \|f_{j,k}\|_{\mathcal{H}_{j,k}}^p)^{\frac{q}{p}}\}^{\frac{1}{q}}$, where $1 \le p, q$ and $f_{j,k} \in \mathcal{H}_{j,k}$. An advantageous point of VSKL is that, by adjusting the parameters $p$ and $q$, various levels of sparsity can be introduced; that is, the parameters can control the level of sparsity within groups and between groups. This point is beneficial especially for multi-modal tasks like object categorization.

2.2 Notations and Assumptions

Here, we prepare the notations and assumptions that are used in the analysis. Let $\mathcal{H}^{\oplus M} = \mathcal{H}_1 \oplus \cdots \oplus \mathcal{H}_M$. Throughout the paper, we assume the following technical conditions (see also [3]).

Assumption 1. (Basic Assumptions)
(A1) There exists $f^* = (f^*_1, \ldots, f^*_M) \in \mathcal{H}^{\oplus M}$ such that $E[Y|X] = f^*(X) = \sum_{m=1}^M f^*_m(X)$, and the noise $\epsilon := Y - f^*(X)$ is bounded as $|\epsilon| \le L$.
(A2) For each $m = 1, \ldots, M$, $\mathcal{H}_m$ is separable (with respect to the RKHS norm) and $\sup_{X \in \mathcal{X}} |k_m(X, X)| < 1$.

The first assumption in (A1) ensures that the model $\mathcal{H}^{\oplus M}$ is correctly specified, and the technical assumption $|\epsilon| \le L$ allows $\epsilon f$ to be Lipschitz continuous with respect to $f$. The boundedness of the noise can be relaxed to unbounded situations as in [24], but we do not pursue that direction for simplicity. Let the integral operator $T_{k_m}: L_2(\Pi) \to L_2(\Pi)$ corresponding to a kernel function $k_m$ be

$$T_{k_m} f = \int k_m(\cdot, x) f(x)\, d\Pi(x).$$

It is known that this operator is compact, positive, and self-adjoint (see Theorem 4.27 of [28]). Thus it has at most countably many non-negative eigenvalues. We denote by $\mu_{\ell,m}$ the $\ell$-th largest eigenvalue (with possible multiplicity) of the integral operator $T_{k_m}$. Then we assume the following on the decreasing rate of $\mu_{\ell,m}$.

Assumption 2. 
(Spectral Assumption) There exist $0 < s_m < 1$ and $0 < c$ such that

$$\mu_{\ell,m} \le c\,\ell^{-\frac{1}{s_m}} \quad (\forall \ell \ge 1,\ 1 \le \forall m \le M), \qquad \text{(A3)}$$

where $\{\mu_{\ell,m}\}_{\ell=1}^\infty$ is the spectrum of the operator $T_{k_m}$ corresponding to the kernel $k_m$.

It was shown that the spectral assumption (A3) is equivalent to the classical covering number assumption [29]. Recall that the $\epsilon$-covering number $N(\epsilon, \mathcal{B}_{\mathcal{H}_m}, L_2(\Pi))$ with respect to $L_2(\Pi)$ is the minimal number of balls with radius $\epsilon$ needed to cover the unit ball $\mathcal{B}_{\mathcal{H}_m}$ in $\mathcal{H}_m$ [33]. If the spectral assumption (A3) holds, there exists a constant $C$ that depends only on $s$ and $c$ such that

$$\log N(\varepsilon, \mathcal{B}_{\mathcal{H}_m}, L_2(\Pi)) \le C \varepsilon^{-2s_m}, \qquad (2)$$

and the converse is also true (see [29, Theorem 15] and [28] for details). Therefore, if $s_m$ is large, the RKHSs are regarded as "complex", and if $s_m$ is small, the RKHSs are "simple".

Table 1: Summary of the constants we use in this article.
$n$: The number of samples.
$M$: The number of candidate kernels.
$s_m$: The spectral decay coefficient; see (A3).
$\kappa_M$: The smallest eigenvalue of the design matrix (see Eq. (3)).

An important class of RKHSs for which $s_m$ is known is the Sobolev spaces: (A3) holds with $s_m = \frac{d}{2\alpha}$ for the Sobolev space of $\alpha$-times continuously differentiable functions on the Euclidean ball of $\mathbb{R}^d$ [11]. Moreover, for $\alpha$-times continuously differentiable kernels on a closed Euclidean ball in $\mathbb{R}^d$, it holds with $s_m = \frac{d}{2\alpha}$ [28, Theorem 6.26]. According to Theorem 7.34 of [28], for Gaussian kernels with compactly supported distribution, it holds for arbitrarily small $0 < s_m$. The covering number of Gaussian kernels with unbounded support distribution is also described in Theorem 7.34 of [28].

Let $\kappa_M$ be defined as follows:

$$\kappa_M := \sup\Big\{\kappa \ge 0 \ \Big|\ \kappa \sum_{m=1}^M \|f_m\|_{L_2(\Pi)}^2 \le \Big\|\sum_{m=1}^M f_m\Big\|_{L_2(\Pi)}^2,\ \forall f_m \in \mathcal{H}_m\ (m = 1, \ldots, M)\Big\}. \qquad (3)$$

$\kappa_M$ represents the correlation of the RKHSs. We assume that the RKHSs are not completely correlated to each other.

Assumption 3. (Incoherence Assumption) $\kappa_M$ is strictly bounded from below; there exists a constant $C_0 > 0$ such that

$$0 < C_0^{-1} < \kappa_M. \qquad \text{(A4)}$$

This condition is motivated by the incoherence condition [18, 21] considered in sparse MKL settings. It ensures the uniqueness of the decomposition $f^* = \sum_{m=1}^M f^*_m$ of the ground truth. [3] also assumed this condition to show the consistency of $\ell_1$-MKL.

Finally we give a technical assumption with respect to the $\infty$-norm.

Assumption 4. (Embedded Assumption) Under the Spectral Assumption, there exists a constant $C_1 > 0$ such that

$$\|f_m\|_\infty \le C_1 \|f_m\|_{\mathcal{H}_m}^{1-s_m} \|f_m\|_{L_2(\Pi)}^{s_m}. \qquad \text{(A5)}$$

This condition is met when the input distribution $\Pi$ has a density with respect to the uniform distribution on $\mathcal{X}$ that is bounded away from 0 and the RKHSs are continuously embedded in a Sobolev space $W^{\alpha,2}(\mathcal{X})$, where $s_m = \frac{d}{2\alpha}$, $d$ is the dimension of the input space $\mathcal{X}$ and $\alpha$ is the "smoothness" of the Sobolev space. Many practically used kernels satisfy this condition (A5). For example, the RKHSs of Gaussian kernels can be embedded in all Sobolev spaces; therefore the condition (A5) seems rather common and practical. More generally, there is a clear characterization of the condition (A5) in terms of real interpolation of spaces. 
One can find detailed and formal discussions of interpolations in [29], and Proposition 2.10 of [7] gives the necessary and sufficient condition for the assumption (A5).

The constants we use later are summarized in Table 1.

3 Convergence Rate Analysis of $\psi$-norm MKL

Here we derive the learning rate of $\psi$-norm MKL in a most general setting. We suppose that the number of kernels $M$ can increase along with the number of samples $n$. The motivation of our analysis is summarized as follows:

- Give a unifying framework to derive a sharp convergence rate of $\psi$-norm MKL.
- (homogeneous complexity) Show the convergence rate of some examples using our general framework, and prove its minimax-optimality under the condition that the complexities $s_m$ of all RKHSs are the same.
- (inhomogeneous complexity) Discuss how dense type regularization outperforms sparse type regularization when the complexities $s_m$ of the RKHSs are not uniformly the same.

Now we define $\eta(t) := \eta_n(t) = \max(1, \sqrt{t}, t/\sqrt{n})$ for $t > 0$, and, for given positive reals $\{r_m\}_{m=1}^M$ and given $n$, we define $\alpha_1, \alpha_2, \beta_1, \beta_2$ as follows:

$$\alpha_1 := \alpha_1(\{r_m\}) = 3\Big(\sum_{m=1}^M \frac{r_m^{-2s_m}}{n}\Big)^{\frac{1}{2}}, \qquad \alpha_2 := \alpha_2(\{r_m\}) = 3\Big\|\Big(\frac{s_m r_m^{1-s_m}}{\sqrt{n}}\Big)_{m=1}^M\Big\|_{\psi^*},$$
$$\beta_1 := \beta_1(\{r_m\}) = 3\Big(\sum_{m=1}^M \frac{r_m^{-\frac{2s_m(3-s_m)}{1+s_m}}}{n^{\frac{2}{1+s_m}}}\Big)^{\frac{1}{2}}, \qquad \beta_2 := \beta_2(\{r_m\}) = 3\Big\|\Big(\frac{s_m r_m^{\frac{(1-s_m)^2}{1+s_m}}}{n^{\frac{1}{1+s_m}}}\Big)_{m=1}^M\Big\|_{\psi^*} \qquad (4)$$

(note that $\alpha_1, \alpha_2, \beta_1, \beta_2$ implicitly depend on the reals $\{r_m\}_{m=1}^M$). Then the following theorem gives the general form of the learning rate of $\psi$-norm MKL.

Theorem 1. Suppose Assumptions 1-4 are satisfied. Let $\{r_m\}_{m=1}^M$ be arbitrary positive reals that can depend on $n$, and assume $\lambda_1^{(n)} = \eta(t')^2 \max\big\{\alpha_1^2, \beta_1^2, \frac{M\log(M)}{n}\big\}$. Then, for all $n$ and $t'$ that satisfy $\frac{\log(M)}{\sqrt{n}} \le 1$ and $4\phi_1\,\eta(t')\big(\frac{\alpha_2}{\alpha_1} + \frac{\beta_2}{\beta_1}\big) \le \frac{1}{12}$, and for all $t \ge 1$, we have

$$\|\hat{f} - f^*\|_{L_2(\Pi)}^2 \le 24\,\eta(t)^2 \phi_1^2 \Bigg[\frac{\alpha_1^2 + \beta_1^2 + \frac{M\log(M)}{n}}{\kappa_M} + 4\Big(\Big(\frac{\alpha_2}{\alpha_1}\Big)^2 + \Big(\frac{\beta_2}{\beta_1}\Big)^2\Big)\|f^*\|_\psi^2\Bigg] \qquad (5)$$

with probability $1 - \exp(-t) - \exp(-t')$, where $\phi_1$ is a constant independent of $n$ and $M$.

The proof will be given in Appendix D in the supplementary material. One can also find an outline of the proof in Appendix A in the supplementary material.

The statement of Theorem 1 itself is complicated; thus we will show later concrete learning rates for some examples such as $\ell_p$-MKL. The convergence rate (5) depends on the positive reals $\{r_m\}_{m=1}^M$, but the choice of $\{r_m\}_{m=1}^M$ is arbitrary. Thus, by minimizing the right hand side of Eq. (5), we obtain a tight convergence bound as follows:

$$\|\hat{f} - f^*\|_{L_2(\Pi)}^2 = O_p\Bigg(\min_{\substack{\{r_m\}_{m=1}^M:\\ r_m > 0}} \Bigg[\alpha_1^2 + \beta_1^2 + \Big(\Big(\frac{\alpha_2}{\alpha_1}\Big)^2 + \Big(\frac{\beta_2}{\beta_1}\Big)^2\Big)\|f^*\|_\psi^2 + \frac{M\log(M)}{n}\Bigg]\Bigg). \qquad (6)$$

There is a trade-off between the first two terms (a) $:= \alpha_1^2 + \beta_1^2$ and the third term (b) $:= \big((\frac{\alpha_2}{\alpha_1})^2 + (\frac{\beta_2}{\beta_1})^2\big)\|f^*\|_\psi^2$: if we take $\{r_m\}_m$ large, then the term (a) becomes small and the term (b) becomes large; on the other hand, if we take $\{r_m\}_m$ small, this results in a large (a) and a small (b). Therefore we need to balance the two terms (a) and (b) to obtain the minimum in Eq. (6).

We discuss the obtained learning rate in two situations, (i) the homogeneous complexity situation and (ii) the inhomogeneous complexity situation:

(i) (homogeneous) All $s_m$ are the same: there exists $0 < s < 1$ such that $s_m = s$ ($\forall m$) (Sec. 3.1).
(ii) (inhomogeneous) The $s_m$ are not all the same: there exist $m, m'$ such that $s_m \ne s_{m'}$ (Sec. 3.2).

3.1 Analysis on Homogeneous Settings

Here we assume that all $s_m$ are the same, say $s_m = s$ for all $m$ (the homogeneous setting). If we further restrict the situation so that all $r_m$ are the same ($r_m = r$ ($\forall m$) for some $r$), then the minimization in Eq. (6) can be easily carried out as in the following lemma. Let $\mathbf{1}$ be the $M$-dimensional vector each element of which is 1, $\mathbf{1} := (1, \ldots, 1)^\top \in \mathbb{R}^M$, and let $\|\cdot\|_{\psi^*}$ be the dual norm of the $\psi$-norm.†

Lemma 2. When $s_m = s$ ($\forall m$) with some $0 < s < 1$ and $n \ge (\|\mathbf{1}\|_{\psi^*}\|f^*\|_\psi / M)^{-\frac{4s}{1-s}}$, the bound (6) indicates that

$$\|\hat{f} - f^*\|_{L_2(\Pi)}^2 = O_p\Big(M^{1-\frac{2s}{1+s}}\, n^{-\frac{1}{1+s}} \big(\|\mathbf{1}\|_{\psi^*}\|f^*\|_\psi\big)^{\frac{2s}{1+s}} + \frac{M\log(M)}{n}\Big). \qquad (7)$$

† The dual of the norm $\|\cdot\|$ is defined as $\|b\|_* := \sup_a \{b^\top a \mid \|a\| \le 1\}$.

The proof is given in Appendix G.1 in the supplementary material. Lemma 2 is derived by assuming $r_m = r$ ($\forall m$), which might make the bound loose. 
However, when the norm $\|\cdot\|_\psi$ is isotropic (whose definition will appear later), that restriction ($r_m = r$ ($\forall m$)) does not make the bound loose; that is, the upper bound obtained in Lemma 2 is tight and achieves the minimax optimal rate (the minimax optimal rate is the one that cannot be improved by any estimator). In the following, we investigate the general result of Lemma 2 through some important examples.

Convergence Rate of $\ell_p$-MKL. Here we derive the convergence rate of $\ell_p$-MKL ($1 \le p \le \infty$), where $\|f\|_\psi = (\sum_{m=1}^M \|f_m\|_{\mathcal{H}_m}^p)^{\frac{1}{p}}$ (for $p = \infty$, it is defined as $\max_m \|f_m\|_{\mathcal{H}_m}$). It is well known that the dual norm of the $\ell_p$-norm is the $\ell_q$-norm, where $q$ is the real satisfying $\frac{1}{p} + \frac{1}{q} = 1$. For notational simplicity, let $R_p := (\sum_{m=1}^M \|f^*_m\|_{\mathcal{H}_m}^p)^{\frac{1}{p}}$. Then, substituting $\|f^*\|_\psi = R_p$ and $\|\mathbf{1}\|_{\psi^*} = \|\mathbf{1}\|_{\ell_q} = M^{\frac{1}{q}} = M^{1-\frac{1}{p}}$ into the bound (7), the learning rate of $\ell_p$-MKL is given as

$$\|\hat{f} - f^*\|_{L_2(\Pi)}^2 = O_p\Big(n^{-\frac{1}{1+s}} M^{1-\frac{2s}{p(1+s)}} R_p^{\frac{2s}{1+s}} + \frac{M\log(M)}{n}\Big). \qquad (8)$$

If we further assume that $n$ is sufficiently large so that $n \ge M^{\frac{2}{p}} R_p^{-2} (\log M)^{\frac{1+s}{s}}$, the leading term is the first one, and thus we have

$$\|\hat{f} - f^*\|_{L_2(\Pi)}^2 = O_p\Big(n^{-\frac{1}{1+s}} M^{1-\frac{2s}{p(1+s)}} R_p^{\frac{2s}{1+s}}\Big). \qquad (9)$$

Note that as the complexity $s$ of the RKHSs becomes small, the convergence rate becomes fast. It is known that $n^{-\frac{1}{1+s}}$ is the minimax optimal learning rate for single kernel learning. The derived rate of $\ell_p$-MKL is obtained by multiplying the optimal rate of single kernel learning by a coefficient depending on $M$ and $R_p$. To investigate the dependency of the learning rate on $R_p$, let us consider two extreme settings as in [15], i.e., a sparse setting $(\|f^*_m\|_{\mathcal{H}_m})_{m=1}^M = (1, 0, \ldots, 0)$ and a dense setting $(\|f^*_m\|_{\mathcal{H}_m})_{m=1}^M = (1, \ldots, 1)$.

- $(\|f^*_m\|_{\mathcal{H}_m})_{m=1}^M = (1, 0, \ldots, 0)$: $R_p = 1$ for all $p$. Therefore the convergence rate $n^{-\frac{1}{1+s}} M^{1-\frac{2s}{p(1+s)}}$ is fast for small $p$, and the minimum is achieved at $p = 1$. This means that $\ell_1$ regularization is preferred for a sparse truth.
- $(\|f^*_m\|_{\mathcal{H}_m})_{m=1}^M = (1, \ldots, 1)$: $R_p = M^{\frac{1}{p}}$, and thus the convergence rate is $M n^{-\frac{1}{1+s}}$ for all $p$. Interestingly, for a dense ground truth, the convergence rate does not depend on the parameter $p$ (later we will show that this is not the case in inhomogeneous settings (Sec. 3.2)). That is, the convergence rate is $M$ times the optimal learning rate of single kernel learning ($n^{-\frac{1}{1+s}}$) for all $p$. This means that, in the dense setting, the complexity of solving the MKL problem is equivalent to that of solving $M$ single kernel learning problems.

Comparison with Existing Bounds. Here we compare the bound for $\ell_p$-MKL derived above with the existing bounds. Let $\mathcal{H}_{\ell_p}(R)$ be the $\ell_p$-mixed-norm ball with radius $R$: $\mathcal{H}_{\ell_p}(R) := \{f = \sum_{m=1}^M f_m \mid (\sum_{m=1}^M \|f_m\|_{\mathcal{H}_m}^p)^{\frac{1}{p}} \le R\}$. [10, 16, 15] gave "global" type bounds for $\ell_p$-MKL of the form

$$R(f) \le \hat{R}(f) + C\, M^{1-\frac{1}{p}}\sqrt{\frac{p \vee \log(M)}{n}}\, R \quad \text{for all } f \in \mathcal{H}_{\ell_p}(R), \qquad (10)$$

where $R(f)$ and $\hat{R}(f)$ are the population risk and the empirical risk. The first observation is that the bounds of [10] and [15] are restricted to the situation $1 \le p \le 2$. On the other hand, our analysis and that of [16] cover all $p \ge 1$. 
Second, since our bound is specialized to the regularized risk minimizer $\hat{f}$ defined in Eq. (1), while the existing bound (10) is applicable to all $f \in \mathcal{H}_{\ell_p}(R)$, our bound is sharper than theirs for sufficiently large $n$. To see this, suppose $n \ge M^{\frac{2}{p}} R^{-2}$; then we have $n^{-\frac{1}{1+s}} M^{1-\frac{2s}{p(1+s)}} R^{\frac{2s}{1+s}} \le n^{-\frac{1}{2}} M^{1-\frac{1}{p}} R$. Moreover, we should note that $s$ can be large as long as the Spectral Assumption (A3) is satisfied. Thus the bound (10) is formally recovered by our analysis by letting $s$ approach 1.

Recently, [13] gave a tighter convergence rate utilizing the localization technique, namely $\|\hat{f} - f^*\|_{L_2(\Pi)}^2 = O_p\big(\min_{p' \ge p}\big\{\frac{p'}{p'-1}\, n^{-\frac{1}{1+s}} M^{1-\frac{2s}{p'(1+s)}} R_{p'}^{\frac{2s}{1+s}}\big\}\big)$, under a strong condition $\kappa_M = 1$ that imposes that all RKHSs are completely uncorrelated to each other. Comparing our bound with their result, neither $\min_{p' \ge p}$ nor $\frac{p'}{p'-1}$ appears in our bound (if the term $\frac{p'}{p'-1}$ were absent, the minimum of $\min_{p' \ge p}$ would be attained at $p' = p$; thus our bound is tighter); moreover, our analysis does not need the strong assumption $\kappa_M = 1$.

Convergence Rate of Elasticnet-MKL. Elasticnet-MKL employs a mixture of $\ell_1$ and $\ell_2$ norms as the regularizer: $\|f\|_\psi = \tau\|f\|_{\ell_1} + (1-\tau)\|f\|_{\ell_2}$, where $\tau \in [0, 1]$. Its dual norm is given by $\|b\|_{\psi^*} = \min_{a \in \mathbb{R}^M} \max\big\{\frac{\|a\|_{\ell_\infty}}{\tau}, \frac{\|a-b\|_{\ell_2}}{1-\tau}\big\}$. Therefore, by a simple calculation, we have $\|\mathbf{1}\|_{\psi^*} = \frac{\sqrt{M}}{1-\tau+\tau\sqrt{M}}$. Hence Eq. (7) gives the convergence rate of elasticnet-MKL as

$$\|\hat{f} - f^*\|_{L_2(\Pi)}^2 = O_p\bigg(\frac{M^{1-\frac{s}{1+s}}\, n^{-\frac{1}{1+s}} \big(\tau\|f^*\|_{\ell_1} + (1-\tau)\|f^*\|_{\ell_2}\big)^{\frac{2s}{1+s}}}{(1-\tau+\tau\sqrt{M})^{\frac{2s}{1+s}}} + \frac{M\log(M)}{n}\bigg).$$

Note that, when $\tau = 0$ or $\tau = 1$, this rate is identical to that of $\ell_2$-MKL or $\ell_1$-MKL obtained in Eq. (8), respectively.

3.1.1 Minimax Lower Bound

In this section, we show that the derived learning rate (7) achieves the minimax learning rate on the $\psi$-norm ball

$$\mathcal{H}_\psi(R) := \Big\{f = \sum_{m=1}^M f_m \ \Big|\ \|f\|_\psi \le R\Big\}$$

when the norm is isotropic. We say the $\psi$-norm $\|\cdot\|_\psi$ is isotropic when there exists a universal constant $\bar{c}$ such that

$$\bar{c}M = \bar{c}\|\mathbf{1}\|_{\ell_1} \ge \|\mathbf{1}\|_{\psi^*}\|\mathbf{1}\|_\psi, \qquad \|b\|_\psi \le \|b'\|_\psi \ (\text{if } 0 \le b_m \le b'_m\ (\forall m)) \qquad (11)$$

(note that the inverse inequality $M \le \|\mathbf{1}\|_{\psi^*}\|\mathbf{1}\|_\psi$ of the first condition always holds by the definition of the dual norm). Practically used regularizations usually satisfy this isotropic property. In fact, $\ell_p$-MKL, elasticnet-MKL and VSKL satisfy the isotropic property with $\bar{c} = 1$.

We derive the minimax learning rate in a simpler situation. First we assume that each RKHS is the same as the others. That is, the input vector is decomposed into $M$ components as $x = (x^{(1)}, \ldots, x^{(M)})$, where $\{x^{(m)}\}_{m=1}^M$ are $M$ i.i.d. copies of a random variable $\tilde{X}$, and $\mathcal{H}_m = \{f_m \mid f_m(x) = f_m(x^{(1)}, \ldots, x^{(M)}) = \tilde{f}_m(x^{(m)}),\ \tilde{f}_m \in \tilde{\mathcal{H}}\}$, where $\tilde{\mathcal{H}}$ is an RKHS shared by all $\mathcal{H}_m$. Thus $f \in \mathcal{H}^{\oplus M}$ is decomposed as $f(x) = f(x^{(1)}, \ldots, x^{(M)}) = \sum_{m=1}^M \tilde{f}_m(x^{(m)})$, where each $\tilde{f}_m$ is a member of the common RKHS $\tilde{\mathcal{H}}$. We denote by $\tilde{k}$ the kernel associated with the RKHS $\tilde{\mathcal{H}}$.

In addition to the condition on the upper bound of the spectrum (Spectral Assumption (A3)), we assume that the spectra of all the RKHSs $\mathcal{H}_m$ have the same polynomial-rate lower bound.

Assumption 5. (Strong Spectral Assumption) There exist $0 < s < 1$ and $0 < c, c'$ such that

$$c'\ell^{-\frac{1}{s}} \le \tilde{\mu}_\ell \le c\,\ell^{-\frac{1}{s}} \quad (1 \le \forall \ell), \qquad \text{(A6)}$$

where $\{\tilde{\mu}_\ell\}_{\ell=1}^\infty$ is the spectrum of the integral operator $T_{\tilde{k}}$ corresponding to the kernel $\tilde{k}$. In particular, the spectrum of $T_{k_m}$ also satisfies $\mu_{\ell,m} \sim \ell^{-\frac{1}{s}}$ ($\forall \ell, m$).

Without loss of generality, we may assume that $E[f(\tilde{X})] = 0$ ($\forall f \in \tilde{\mathcal{H}}$). Since each $f_m$ receives an i.i.d. copy of $\tilde{X}$, the $\mathcal{H}_m$ are orthogonal to each other:

$$E[f_m(X) f_{m'}(X)] = E[\tilde{f}_m(X^{(m)})\tilde{f}_{m'}(X^{(m')})] = 0 \quad (\forall f_m \in \mathcal{H}_m,\ \forall f_{m'} \in \mathcal{H}_{m'},\ \forall m \ne m').$$

We also assume that the noise $\{\epsilon_i\}_{i=1}^n$ is an i.i.d. normal sequence with standard deviation $\sigma > 0$. Under the assumptions described above, we have the following minimax $L_2(\Pi)$-error.

Theorem 3. Suppose $R > 0$ is given and $n > \bar{c}^2 M^2 / (R^2 \|\mathbf{1}\|_{\psi^*}^2)$ is satisfied. Then the minimax learning rate on $\mathcal{H}_\psi(R)$ for an isotropic norm $\|\cdot\|_\psi$ is lower bounded as

$$\min_{\hat{f}} \max_{f^* \in \mathcal{H}_\psi(R)} E\Big[\|\hat{f} - f^*\|_{L_2(\Pi)}^2\Big] \ge C M^{1-\frac{2s}{1+s}}\, n^{-\frac{1}{1+s}} \big(\|\mathbf{1}\|_{\psi^*} R\big)^{\frac{2s}{1+s}}, \qquad (12)$$

where the infimum is taken over all measurable functions $\hat{f}$ of the $n$ samples $\{(x_i, y_i)\}_{i=1}^n$.

The proof will be given in Appendix F in the supplementary material. 
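The optimality can be checked concretely for $\ell_p$-MKL: substituting $\|\mathbf{1}\|_{\psi^*} = M^{1-1/p}$ and $R = R_p$ into the lower bound (12) reproduces exactly the exponents of the upper bound (9). The following sketch verifies this symbolically; the helper names are ours and the exponents are transcribed from Eqs. (9) and (12).

```python
from fractions import Fraction

def upper_rate_exponents(p, s):
    """Leading term of the ell_p-MKL upper bound (9):
    n^{-1/(1+s)} * M^{1 - 2s/(p(1+s))} * R_p^{2s/(1+s)}.
    Returns the exponents of (n, M, R_p) as exact fractions."""
    p, s = Fraction(p), Fraction(s)
    return (-1 / (1 + s), 1 - 2 * s / (p * (1 + s)), 2 * s / (1 + s))

def lower_rate_exponents(p, s):
    """Minimax lower bound (12) specialized to the ell_p ball, where
    ||1||_{psi*} = M^{1-1/p}:  M^{1-2s/(1+s)} n^{-1/(1+s)} (M^{1-1/p} R)^{2s/(1+s)}."""
    p, s = Fraction(p), Fraction(s)
    n_exp = -1 / (1 + s)
    M_exp = 1 - 2 * s / (1 + s) + (1 - 1 / p) * 2 * s / (1 + s)
    R_exp = 2 * s / (1 + s)
    return (n_exp, M_exp, R_exp)

# The two sets of exponents coincide for every p >= 1 and 0 < s < 1 tried:
for p in (1, Fraction(4, 3), 2, 4):
    for s in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
        assert upper_rate_exponents(p, s) == lower_rate_exponents(p, s)
print("upper bound (9) matches the minimax lower bound (12) in (n, M, R_p)")
```

So, up to the $M\log(M)/n$ term, the general upper bound is unimprovable on the $\ell_p$-norm ball.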
One can see that the convergence rate derived in Eq. (7) achieves the minimax rate on the ψ-norm ball (Theorem 3) up to M log(M)/n, which is negligible when the number of samples is large. This means that the ψ-norm regularization is well suited to make the estimator included in the ψ-norm ball.

3.2 Analysis on Inhomogeneous Settings

In the previous section (analysis on homogeneous settings), we have not seen any theoretical justification supporting the fact that dense MKL methods like ℓ_{4/3}-MKL can outperform the sparse ℓ1-MKL [10]. In this section, we show that dense type regularizations can outperform the sparse regularization in inhomogeneous settings (there exist m, m′ such that s_m ≠ s_{m′}). For simplicity, we focus on ℓp-MKL, and discuss the relation between the learning rate and the norm parameter p.

Let us consider an extreme situation where s_1 = s for some 0 < s < 1 and s_m = 0 (m > 1)‡. In this situation, we have

α1 = 3( (r_1^{−2s} + M − 1)/n )^{1/2},   α2 = 3^s r_1^{1−s} n^{−1/2},
β1 = 3( (r_1^{−2s(3−s)/(1+s)} + M − 1)/n^{2/(1+s)} )^{1/2},   β2 = 3^s r_1^{(1−s)²/(1+s)} n^{−1/(1+s)},

for all p. Note that these α1, α2, β1 and β2 have no dependency on p. Therefore the learning bound (6) is smallest when p = ∞, because ∥f*∥_{ℓ∞} ≤ ∥f*∥_{ℓp} for all 1 ≤ p < ∞. In particular, when (∥f*_m∥_{H_m})_{m=1}^M = 1, we have ∥f*∥_{ℓ1} = M∥f*∥_{ℓ∞}, and thus obviously the learning rate of ℓ∞-MKL given by Eq. (6) is faster than that of ℓ1-MKL. In fact, through a bit cumbersome calculation, one can check that ℓ∞-MKL can be M^{2s/(1+s)} times faster than ℓ1-MKL in a worst case.
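The worst-case factor above can be made explicit in one line, under the assumption (ours, matching the exponent pattern of the homogeneous rates) that the p-dependence of the bound enters only through the factor ∥f*∥_{ℓp}^{2s/(1+s)}:

```latex
\frac{\|f^*\|_{\ell_1}^{\frac{2s}{1+s}}}{\|f^*\|_{\ell_\infty}^{\frac{2s}{1+s}}}
  = \left(\frac{M\,\|f^*\|_{\ell_\infty}}{\|f^*\|_{\ell_\infty}}\right)^{\frac{2s}{1+s}}
  = M^{\frac{2s}{1+s}},
  \qquad \text{when } \|f_m^*\|_{\mathcal{H}_m} = 1 \ (\forall m),
```

since equal component norms give ∥f*∥_{ℓ1} = M∥f*∥_{ℓ∞}.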
This indicates that, when the complexities of the RKHSs are inhomogeneous, the generalization abilities of dense type regularizations (e.g., ℓ∞-MKL) can be better than that of the sparse type regularization (ℓ1-MKL). In real settings, it is likely that one uses various types of kernels and the complexities of the RKHSs become inhomogeneous. As mentioned above, it has often been reported that ℓ1-MKL is outperformed by dense type MKL such as ℓ_{4/3}-MKL in numerical experiments [10]. Our theoretical analysis explains these experimental results well.

4 Conclusion

We have shown a unified framework to derive the learning rate of MKL with arbitrary mixed-norm-type regularization. To analyze the general result, we considered two situations: homogeneous settings and inhomogeneous settings. We have seen that the convergence rate of ℓp-MKL obtained in homogeneous settings is tighter and requires a less restrictive condition than existing results. We have also shown the convergence rate of elasticnet-MKL, and proved that the derived learning rate is minimax optimal. Furthermore, we observed that our bound explains well the favorable experimental results for dense type MKL by considering the inhomogeneous settings. This is the first result that strongly justifies the effectiveness of dense type regularizations in MKL.

Acknowledgement This work was partially supported by MEXT Kakenhi 22700289 and the Aihara Project, the FIRST program from JSPS, initiated by CSTP.

References

[1] J. Aflalo, A. Ben-Tal, C. Bhattacharyya, J. S. Nath, and S. Raman. Variable sparsity kernel learning. Journal of Machine Learning Research, 12:565–592, 2011.

[2] A. Argyriou, R. Hauser, C. A. Micchelli, and M. Pontil. A DC-programming algorithm for kernel selection. In the 23rd ICML, pages 41–48, 2006.

[3] F. R. Bach. Consistency of the group lasso and multiple kernel learning.
Journal of Machine Learning Research, 9:1179–1225, 2008.

[4] F. R. Bach. Exploring large feature spaces with hierarchical multiple kernel learning. In Advances in Neural Information Processing Systems 21, pages 105–112, 2009.

[5] F. R. Bach, G. Lanckriet, and M. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In the 21st ICML, pages 41–48, 2004.

[6] P. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. The Annals of Statistics, 33:1487–1537, 2005.

‡ In our assumption, s_m should be greater than 0. However, we formally put s_m = 0 (m > 1) for simplicity of discussion. For a rigorous discussion, one might consider arbitrarily small s_m ≪ s.

[7] C. Bennett and R. Sharpley. Interpolation of Operators. Academic Press, Boston, 1988.

[8] C. Cortes, M. Mohri, and A. Rostamizadeh. L2 regularization for learning kernels. In UAI 2009, 2009.

[9] C. Cortes, M. Mohri, and A. Rostamizadeh. Learning non-linear combinations of kernels. In Advances in Neural Information Processing Systems 22, pages 396–404, 2009.

[10] C. Cortes, M. Mohri, and A. Rostamizadeh. Generalization bounds for learning kernels. In the 27th ICML, pages 247–254, 2010.

[11] D. E. Edmunds and H. Triebel. Function Spaces, Entropy Numbers, Differential Operators. Cambridge University Press, Cambridge, 1996.

[12] G. S. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33:82–95, 1971.

[13] M. Kloft and G. Blanchard. The local Rademacher complexity of ℓp-norm multiple kernel learning, 2011. arXiv:1103.0790.

[14] M. Kloft, U. Brefeld, S. Sonnenburg, P. Laskov, K.-R. Müller, and A. Zien. Efficient and accurate ℓp-norm multiple kernel learning. In Advances in Neural Information Processing Systems 22, pages 997–1005, 2009.

[15] M. Kloft, U. Brefeld, S.
Sonnenburg, and A. Zien. ℓp-norm multiple kernel learning. Journal of Machine Learning Research, 12:953–997, 2011.

[16] M. Kloft, U. Rückert, and P. L. Bartlett. A unifying view of multiple kernel learning. In ECML/PKDD, 2010.

[17] V. Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization. The Annals of Statistics, 34:2593–2656, 2006.

[18] V. Koltchinskii and M. Yuan. Sparse recovery in large ensembles of kernel machines. In COLT, pages 229–238, 2008.

[19] V. Koltchinskii and M. Yuan. Sparsity in multiple kernel learning. The Annals of Statistics, 38(6):3660–3695, 2010.

[20] G. Lanckriet, N. Cristianini, L. E. Ghaoui, P. Bartlett, and M. Jordan. Learning the kernel matrix with semi-definite programming. Journal of Machine Learning Research, 5:27–72, 2004.

[21] L. Meier, S. van de Geer, and P. Bühlmann. High-dimensional additive modeling. The Annals of Statistics, 37(6B):3779–3821, 2009.

[22] C. A. Micchelli and M. Pontil. Learning the kernel function via regularization. Journal of Machine Learning Research, 6:1099–1125, 2005.

[23] C. S. Ong, A. J. Smola, and R. C. Williamson. Learning the kernel with hyperkernels. Journal of Machine Learning Research, 6:1043–1071, 2005.

[24] G. Raskutti, M. Wainwright, and B. Yu. Minimax-optimal rates for sparse additive models over kernel classes via convex programming. Technical report, 2010. arXiv:1008.3654.

[25] B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.

[26] J. Shawe-Taylor. Kernel learning for novelty detection. In NIPS 2008 Workshop on Kernel Learning: Automatic Selection of Optimal Kernels, Whistler, 2008.

[27] N. Srebro and S. Ben-David. Learning bounds for support vector machines with learned kernels. In COLT, pages 169–183, 2006.

[28] I. Steinwart. Support Vector Machines. Springer, 2008.

[29] I.
Steinwart, D. Hush, and C. Scovel. Optimal rates for regularized least squares regression. In COLT, 2009.

[30] T. Suzuki and R. Tomioka. SpicyMKL: A fast algorithm for multiple kernel learning with thousands of kernels. Machine Learning, 85(1):77–108, 2011.

[31] R. Tomioka and T. Suzuki. Sparsity-accuracy trade-off in MKL. In NIPS 2009 Workshop: Understanding Multiple Kernel Learning Methods, Whistler, 2009.

[32] S. van de Geer. Empirical Processes in M-Estimation. Cambridge University Press, 2000.

[33] A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York, 1996.

[34] M. Varma and B. R. Babu. More generality in efficient multiple kernel learning. In the 26th ICML, pages 1065–1072, 2009.

[35] Y. Ying and C. Campbell. Generalization bounds for learning the kernel. In COLT, 2009.

[36] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of The Royal Statistical Society Series B, 68(1):49–67, 2006.