{"title": "Localized Structured Prediction", "book": "Advances in Neural Information Processing Systems", "page_first": 7301, "page_last": 7311, "abstract": "Key to structured prediction is exploiting the problem's structure to simplify the learning process. A major challenge arises when data exhibit a local structure (i.e., are made ``by parts'') that can be leveraged to better approximate the relation between (parts of) the input and (parts of) the output. Recent literature on signal processing, and in particular computer vision, shows that capturing these aspects is indeed essential to achieve state-of-the-art performance. However, in this context algorithms are typically derived on a case-by-case basis. In this work we propose the first theoretical framework to deal with part-based data from a general perspective and study a novel method within the setting of statistical learning theory. Our analysis is novel in that it explicitly quantifies the benefits of leveraging the part-based structure of a problem on the learning rates of the proposed estimator.", "full_text": "Localized Structured Prediction\n\nCarlo Ciliberto 1\n\nc.ciliberto@imperial.ac.uk\n\nFrancis Bach 2\n\nfrancis.bach@inria.fr\n\nAlessandro Rudi 2\n\nalessandro.rudi@inria.fr\n\n1 Department of Electrical and Electronic Engineering, Imperial College, London, UK.\n\n2 INRIA - D\u00e9partement d\u2019informatique, \u00c9cole Normale Sup\u00e9rieure - PSL Research University, Paris, France.\n\nAbstract\n\nKey to structured prediction is exploiting the problem\u2019s structure to simplify the\nlearning process. A major challenge arises when data exhibit a local structure\n(i.e., are made \u201cby parts\u201d) that can be leveraged to better approximate the relation\nbetween (parts of) the input and (parts of) the output. Recent literature on signal\nprocessing, and in particular computer vision, shows that capturing these aspects is\nindeed essential to achieve state-of-the-art performance. However, in this context\nalgorithms are typically derived on a case-by-case basis. In this work we propose\nthe \ufb01rst theoretical framework to deal with part-based data from a general per-\nspective and study a novel method within the setting of statistical learning theory.\nOur analysis is novel in that it explicitly quanti\ufb01es the bene\ufb01ts of leveraging the\npart-based structure of a problem on the learning rates of the proposed estimator.\n\n1\n\nIntroduction\n\nStructured prediction deals with supervised learning problems where the output space is not endowed\nwith a canonical linear metric but has a rich semantic or geometric structure [5, 29]. Typical\nexamples are settings in which the outputs correspond to strings (e.g., captioning [19]), images (e.g.,\nsegmentation [1]), rankings [16, 20], points on a manifold [33], probability distributions [24] or\nprotein foldings [18]. While the lack of linearity poses several modeling and computational challenges,\nthis additional complexity comes with a potentially signi\ufb01cant advantage: when suitably incorporated\nwithin the learning model, knowledge about the structure allows to capture key properties of the data.\nThis could potentially lower the sample complexity of the problem, attaining better generalization\nperformance with less training examples. A natural scenario in this sense is the case where both\ninput and output data are organized into \u201cparts\u201d that can interact with one another according to a\nspeci\ufb01c structure. 
Examples can be found in computer vision (e.g., segmentation [1], localization [6, 22], pixel-wise classification [41]), speech recognition [4, 40], natural language processing [43], trajectory planning [31] or hierarchical classification [44].

Recent literature on the topic has empirically shown that the local structure in the data can indeed lead to significantly better predictions than global approaches [17, 45]. In practice, however, these ideas are typically investigated on a case-by-case basis, leading to ad-hoc algorithms that cannot be easily adapted to new settings. On the theoretical side, few works have considered less specific part-based factorizations [12], and a comprehensive theory analyzing the effect of local interactions between parts within the context of learning theory is still missing.

In this paper, we propose: 1) a novel theoretical framework that can be applied to a wide family of structured prediction settings and is able to capture potential local structure in the data, and 2) a structured prediction algorithm, based on this framework, for which we prove universal consistency and generalization rates. The proposed approach builds on recent results from the structured prediction literature that leverage the concept of implicit embeddings [8, 9, 28, 15, 25], also related to [30, 39]. A key contribution of our analysis is to quantify the impact of the part-based structure of the problem on the learning rates of the proposed estimator. In particular, we prove that under natural assumptions on the local behavior of the data, our algorithm benefits adaptively from this underlying structure. We support our theoretical findings with experiments on the task of detecting the local orientation of ridges in images depicting human fingerprints.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Figure 1: (Left) Between-locality in a sequence-to-sequence setting: each window (part) y_p of the output sequence y is fully determined by the part x_p of the input sequence x, for every p ∈ P. (Right) Empirical within-locality C_{p,q} of 100 images sampled from ImageNet between a 20 × 20 patch q and the central patch p.

2 Learning with Between- & Within-locality

To formalize the concept of locality within a learning problem, in this work we assume that the data are structured in terms of "parts". Practical examples of this setting often arise in image/audio or language processing, where the signal has a natural factorization into patches or sub-sequences. Following these guiding examples, we assume every input x ∈ X and output y ∈ Y to be interpretable as a collection of (possibly overlapping) parts, and denote by x_p (respectively y_p) its p-th part, with p ∈ P a set of part identifiers (e.g., the position and size of a patch in an image). We assume input and output to share the same part structure with respect to P. To formalize the intuition that the learning problem should interact well with this structure of parts, we introduce two key assumptions: between-locality and within-locality. They characterize, respectively, the interplay between corresponding input-output parts and the correlation of parts within the same input.

Assumption 1 (Between-locality).
The part y_p is conditionally independent of x given x_p; moreover, the conditional probability of y_p given x_p is the same as that of y_q given x_q, for any p, q ∈ P.

Between-locality (BL) assumes that the p-th part of the output y ∈ Y depends only on the p-th part of the input x ∈ X; see Fig. 1 (Left) for an intuition in the case of sequence-to-sequence prediction. This is often verified in pixel-wise classification settings, where the class y_p of a pixel p is determined only by the sub-image in the corresponding patch x_p. BL essentially corresponds to assuming a joint graphical model on the parts of x and y, where each y_p is only connected to x_p but not to other parts.

BL motivates us to focus on the local level by directly learning the relation between input-output parts. This is often an effective strategy in computer vision [22, 45, 17] but, intuitively, one that provides significant advantages only when the input parts are not highly correlated with each other: in the extreme case where all parts are identical, there is no advantage in solving the learning problem locally. In this sense it can be useful to measure the amount of "covariance"

    C_{p,q} = E_x S(x_p, x_q) − E_{x,x′} S(x_p, x′_q)    (1)

between two parts p and q of an input x, for S(x_p, x_q) a suitable measure of similarity between parts (if S(x_p, x_q) = x_p x_q, with x_p and x_q scalar random variables, then C_{p,q} is the (p, q)-th entry of the covariance matrix of the vector (x_1, . . . , x_{|P|})). Here E_x S(x_p, x_q) and E_{x,x′} S(x_p, x′_q) measure the similarity between the p-th and the q-th part of, respectively, the same input and two independent ones (in particular, C_{p,q} = 0 when the p-th and q-th parts of x are independent). In many applications, it is reasonable to assume that C_{p,q} decays according to the distance between p and q.

Assumption 2 (Within-locality). There exist a distance d : P × P → R and γ ≥ 0 such that

    |C_{p,q}| ≤ r² e^{−γ d(p,q)},   with   r² = sup_{x,x′} |S(x, x′)|.    (2)

Within-locality (WL) is always satisfied for γ = 0, while when x_p is independent of x_q it holds with γ = ∞ and d(p, q) = 1 − δ_{p,q}, with δ the Dirac delta. Exponential decays of correlation are typically observed when the distribution of the parts of x factorizes in a graphical model that connects parts which are close in terms of the distance d: although all parts depend on each other, the long-range dependence typically goes to zero exponentially fast in the distance (see, e.g., [26] for mixing properties of Markov chains). Fig. 1 (Right) reports the empirical WL measured on 100 images randomly sampled from ImageNet [13]: each pixel (i, j) reports the value of C_{p,q} of the central patch p with respect to a 20 × 20 patch q centered in (i, j), with S(x_p, x_q) = x_p⊤ x_q. We note that C_{p,q} decreases extremely fast as a function of the distance between p and q, suggesting that Assumption 2 holds for a large value of γ.
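To make (1) and (2) concrete, the following minimal sketch (ours, not from the paper's code; images are assumed to be NumPy arrays and p, q top-left patch corners) estimates the empirical C_{p,q} used in Fig. 1 (Right), with S(x_p, x_q) = x_p⊤ x_q:

```python
import numpy as np

def empirical_within_locality(images, p, q, patch=20, rng=None):
    """Estimate C_{p,q} = E_x S(x_p, x_q) - E_{x,x'} S(x_p, x'_q) as in (1),
    with S(x_p, x_q) = <x_p, x_q> on flattened patch pixels."""
    rng = rng or np.random.default_rng()
    part = lambda x, loc: x[loc[0]:loc[0] + patch, loc[1]:loc[1] + patch].ravel()
    # average similarity between parts p and q of the same image
    same = np.mean([part(x, p) @ part(x, q) for x in images])
    # average similarity between part p and part q of two independent images
    perm = rng.permutation(len(images))
    indep = np.mean([part(images[i], p) @ part(images[j], q)
                     for i, j in enumerate(perm)])
    return same - indep  # approximately 0 when the two parts are independent
```

Scanning q over a grid of patch locations and plotting the resulting values produces a map in the style of Fig. 1 (Right).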
Contributions. In this work we present a novel structured prediction algorithm that adaptively leverages locality in the learning problem, when present (Sec. 4). We study the generalization properties of the proposed estimator (Sec. 5), showing that it is equivalent to the state of the art in the worst-case scenario. More importantly, if the locality Assumptions 1 and 2 are satisfied, we prove that our learning rates improve proportionally to the number |P| of parts in the problem. Here we give an informal version of this main result, reported in more detail in Thm. 4 (Sec. 5). Below we denote by f̂ the proposed estimator, by E(f) the expected risk of a function f : X → Y, and f* = argmin_f E(f).

Theorem 1 (Informal - Learning Rates & Locality). Under mild assumptions on the loss and the data distribution, if the learning problem is local (Assumptions 1 and 2), there exists c₀ > 0 such that

    E[E(f̂) − E(f*)] ≤ c₀ (s / (n|P|))^{1/4},   with   s = (r²/|P|) Σ_{p,q=1}^{|P|} e^{−γ d(p,q)},    (3)

where the expectation is taken with respect to the sample of n input-output points used to train f̂.

In the worst-case scenario γ = 0 (no exponential decay of the covariance between parts), the bound in (3) scales as 1/n^{1/4} (since s = r²|P|), recovering [8], where no structure is assumed on the parts. However, as soon as γ > 0, s can be upper bounded by a constant independent of |P|, and thus the rate scales as 1/(|P|n)^{1/4}, accelerating proportionally to the number of parts. In this sense, Thm. 1 shows the significant benefit of making use of locality. The following example focuses on the special case of sequence-to-sequence prediction.

Example 1 (Locality on Sequences). As depicted in Fig. 1, for discrete sequences we can consider parts (e.g., windows) indexed by P = {1, . . . , |P|}, with d(p, q) = |p − q| for p, q ∈ P (see Appendix K.1 for more details). In this case, Assumption 2 leads to

    s ≤ 2r² (1 − e^{−γ})^{−1},    (4)

which for γ > 0 is bounded by a constant not depending on the number of parts. Hence, Thm. 1 guarantees a learning rate of order 1/(n|P|)^{1/4}, which is significantly faster than the rate 1/n^{1/4} of methods that do not leverage locality such as [8]. See Sec. 6 for empirical support for this observation.
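For the reader's convenience, the bound (4) follows from a simple geometric-series argument (the paper defers the details to Appendix K.1); in LaTeX notation:

```latex
s = \frac{r^2}{|P|}\sum_{p,q=1}^{|P|} e^{-\gamma|p-q|}
  \;\le\; \frac{r^2}{|P|}\sum_{p=1}^{|P|}\,\sum_{t\in\mathbb{Z}} e^{-\gamma|t|}
  \;=\; r^2\,\frac{1+e^{-\gamma}}{1-e^{-\gamma}}
  \;\le\; 2r^2\bigl(1-e^{-\gamma}\bigr)^{-1}.
```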
3 Problem Formulation

We denote by X, Y and Z respectively the input space, label space and output space of a learning problem. Let ρ be a probability measure on X × Y and Δ : Z × Y × X → R a loss measuring the prediction error between a label y ∈ Y and an output z ∈ Z, possibly parametrized by an input x ∈ X. To stress this interpretation we adopt the notation Δ(z, y|x). Given a finite number of points (x_i, y_i)_{i=1}^n independently sampled from ρ, our goal is to approximate the minimizer f* of the expected risk

    min_{f : X → Z} E(f),   with   E(f) = ∫ Δ(f(x), y|x) dρ(x, y).    (5)

Loss Made by Parts. We formalize the intuition introduced in Sec. 2 that data are decomposable into parts: we denote the sets of parts of X, Y and Z by respectively [X], [Y] and [Z]. These are abstract sets that depend on the problem at hand (see examples below). We assume P to be a set of part "indices" equipped with a selection operator X × P → [X], denoted (x, p) ↦ [x]_p (analogously for Y and Z). When clear from context, we will use the shorthand x_p = [x]_p. For simplicity, in the following we will assume P to be finite; however, our analysis generalizes also to the infinite case (see the supplementary material). Let π(·|x) be a probability distribution over the set of parts P, conditioned with respect to an input x ∈ X. We study loss functions Δ that can be represented as

    Δ(z, y|x) = Σ_{p∈P} π(p|x) L_p(z_p, y_p | x_p).    (6)

The collection (L_p)_{p∈P} is a family of loss functions L_p : [Z] × [Y] × [X] → R, each comparing the p-th part of a label y and output z. For instance, in an image processing scenario, L_p could measure the similarity between the two images at different locations and scales, indexed by p. In this sense, the distribution π(p|x) allows to weigh each L_p differently depending on the application (e.g., mistakes at large scales could be more relevant than at lower scales). Various examples of parts and concrete cases are illustrated in the supplementary material; here we report an extract.

Example 2 (Sequence to Sequence Prediction). Let X = A^k, Y = Z = B^k for two sets A, B and k ∈ N a fixed length. We consider in this example parts that are windows of length l ≤ k. Then P = {1, . . . , k − l + 1}, where p ∈ P indexes the window x_p = (x(p), . . . , x(p+l−1)) for x ∈ X, where we have denoted by x(s) the s-th entry of the sequence x ∈ X, with an analogous definition for y_p, z_p. We choose the loss L_p to be the 0-1 distance between two strings of the same length, L_p(z_p, y_p|x) = 1(z_p ≠ y_p). Finally, we can choose π(p|x) = 1/|P|, leading to the loss function Δ(z, y|x) = (1/|P|) Σ_{p∈P} 1(z_p ≠ y_p), which is common in the context of Conditional Random Fields (CRFs) [21].
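As a concrete instance of the decomposition (6), a minimal sketch (ours) of the windowed 0-1 loss from Example 2 with uniform π(p|x) = 1/|P|:

```python
def window_loss(z, y, l):
    """Delta(z, y) = (1/|P|) sum_{p in P} 1(z_p != y_p), with parts given by the
    length-l windows z_p = (z[p], ..., z[p+l-1]) and P = {1, ..., k - l + 1}."""
    assert len(z) == len(y)
    P = range(len(y) - l + 1)                 # part identifiers
    mistakes = sum(z[p:p + l] != y[p:p + l] for p in P)
    return mistakes / len(P)

print(window_loss("ABBA", "ABCA", l=2))  # 2/3: the windows at p = 1, 2 differ
```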
Example 2 highlights a tight connection between the framework considered in this work and the literature on CRFs. However, we care to stress that the two approaches differ in the way they interpret the concepts of the loss (used to evaluate fitting errors at training time) and the score functions (used to estimate predictions at inference time). Specifically, while such functions are two separate entities in CRF settings, they essentially coincide in our framework (i.e., the score is a linear combination of loss functions). Nevertheless, as shown in Example 2, the resulting score functions for both CRFs and our approach have essentially the same structure. Hence they ultimately lead to the same inference problem [40]. We conclude this section by providing additional examples of loss functions decomposable into parts.

Remark 1 (Examples of Loss Functions by Parts). Several loss functions used in machine learning have a natural formulation in terms of (6). Notable examples are the Hamming distance [10, 42, 11], used in settings such as hierarchical classification [44], computer vision [29, 45, 41] or trajectory planning [31], to name a few. Also, loss functions used in natural language processing, such as precision/recall and the F1 score, can be written in this form. Finally, we point out that multi-task learning settings [27] can be seen as problems by parts, with the loss corresponding to the sum of standard regression/classification loss functions (least-squares, logistic, etc.) over the tasks/parts.

4 Algorithm

In this section we introduce our estimator for structured prediction problems with parts. Our approach starts with an auxiliary step for dataset generation that explicitly extracts the parts from the data.

Auxiliary Dataset Generation. The locality assumptions introduced in Sec. 2 motivate us to learn the local relations between individual parts p ∈ P of each input-output pair. In this sense, given a training dataset D = (x_i, y_i)_{i=1}^n, a first step would be to extract a new, part-based dataset {(x_p, p, y_p) | (x, y) ∈ D, p ∈ P}. However, in most applications the cardinality |P| of the set of parts can be very large (possibly infinite, as we discuss in the Appendix), making this process impractical. Instead, we generate an auxiliary dataset by randomly sub-sampling m ∈ N elements from the part-based dataset. Concretely, for j ∈ {1, . . . , m}, we first sample i_j according to the uniform distribution U_n on {1, . . . , n}, set χ_j = x_{i_j}, sample p_j ∼ π(· | χ_j) and finally set η_j = [y_{i_j}]_{p_j}. This leads to the auxiliary dataset D′ = (χ_j, p_j, η_j)_{j=1}^m, as summarized in the GENERATE routine of Alg. 1.

Estimator. Given the auxiliary dataset, we propose the estimator f̂ : X → Z such that, for all x ∈ X,

    f̂(x) = argmin_{z∈Z} Σ_{p∈P} Σ_{j=1}^{m} α_j(x, p) [π(p|x) L_p(z_p, η_j | x_p)].    (7)

The functions α_j : X × P → R are learned from the auxiliary dataset and are the fundamental components allowing our estimator to capture the part-based structure of the learning problem.
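Once the coefficients α_j are available, the objective of (7) can be evaluated directly for any candidate output z; the sketch below (ours, with hypothetical part-extraction and loss callables) computes this localized score, of which f̂(x) is the minimizer over Z:

```python
def localized_score(z, x, P, parts, pi, alpha, aux, part_loss):
    """Objective of (7): sum_{p in P} sum_j alpha_j(x, p) * pi(p|x) * L_p(z_p, eta_j | x_p).
    aux holds the auxiliary triplets (chi_j, p_j, eta_j) from the GENERATE step."""
    score = 0.0
    for p in P:
        a = alpha(x, p)                    # vector (alpha_1(x, p), ..., alpha_m(x, p))
        zp, xp = parts(z, p), parts(x, p)
        score += pi(p, x) * sum(aj * part_loss(p, zp, eta, xp)
                                for aj, (_, _, eta) in zip(a, aux))
    return score
```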
[Figure 2 graphic: a test input x with output z and a training pair (x′, y′); an observed similarity k(x_p, x′_{p′}) between input parts implies a similarity ℓ(z_p, y′_{p′}) between the corresponding output parts.]

Figure 2: Illustration of the prediction process for the Localized Structured Prediction Estimator (7) for a hypothetical computer vision application.

Algorithm 1
Input: training set (x_i, y_i)_{i=1}^n, distributions π(·|x), a reproducing kernel k on X × P, hyperparameter λ > 0, auxiliary dataset size m ∈ N.
GENERATE the auxiliary set (η_j, χ_j, p_j)_{j=1}^m:
  Sample i_j ∼ U_n(·). Set χ_j = x_{i_j}.
  Sample p_j ∼ π(·|χ_j). Set η_j = [y_{i_j}]_{p_j}.
LEARN the coefficients for the map α:
  Set K with K_{jj′} = k((χ_j, p_j), (χ_{j′}, p_{j′})).
  A = (K + λmI)^{−1}.
Return the map α : (x, p) ↦ A v(x, p) ∈ R^m, with v(x, p)_j = k((χ_j, p_j), (x, p)).

Indeed, for any test point x ∈ X and part p ∈ P, the value α_j(x, p) can be interpreted as a measure of how similar x_p is to the p_j-th part of the auxiliary training point χ_j. For instance, assume α_j(x, p) to be an approximation of the delta function that is 1 when x_p = [χ_j]_{p_j} and 0 otherwise. Then,

    α_j(x, p) L_p(z_p, η_j|x_p) ≈ δ(x_p, [χ_j]_{p_j}) L_p(z_p, η_j|x_p),    (8)

which essentially implies that

    x_p ≈ [χ_j]_{p_j}  ⟹  z_p ≈ η_j.    (9)

In other words, if the p-th part of the test input x and the p_j-th part of the auxiliary training input χ_j (i.e., the p_j-th part of the training input x_{i_j}) are deemed similar, then the estimator will encourage the p-th part of the test output z to be similar to the auxiliary part η_j. This process is illustrated in Fig. 2 for an ideal computer vision application: for a given test image x, the α scores detect a similarity between the p-th patch of x and the p_j-th patch of the training input x_{i_j}. Hence, the estimator will enforce the p-th patch of the output z to be similar to the p_j-th patch of the training label y_{i_j}.
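A minimal NumPy transcription of Alg. 1 (our sketch; `parts`, `pi_sample` and `kernel` are callables supplied by the user, and the explicit inverse stands in for a proper linear solver):

```python
import numpy as np

def generate(X, Y, parts, pi_sample, m, rng):
    """GENERATE: sub-sample m triplets (chi_j, p_j, eta_j) from the part-based data."""
    aux = []
    for _ in range(m):
        i = rng.integers(len(X))                 # i_j uniform on {1, ..., n}
        p = pi_sample(X[i], rng)                 # p_j ~ pi(.|chi_j)
        aux.append((X[i], p, parts(Y[i], p)))    # eta_j = [y_{i_j}]_{p_j}
    return aux

def learn(aux, kernel, lam):
    """LEARN: A = (K + lam*m*I)^{-1}; return alpha(x, p) = A v(x, p), as in (10)."""
    m = len(aux)
    K = np.array([[kernel(cj, pj, ch, ph) for ch, ph, _ in aux]
                  for cj, pj, _ in aux])
    A = np.linalg.inv(K + lam * m * np.eye(m))
    return lambda x, p: A @ np.array([kernel(cj, pj, x, p) for cj, pj, _ in aux])
```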
Learning α. In line with previous work on structured prediction [8], we learn each α_j by solving a linear system for a problem akin to kernel ridge regression (see Sec. 5 for the theoretical motivation). In particular, let k : (X × P) × (X × P) → R be a positive definite kernel; we define

    (α_1(x, p), . . . , α_m(x, p))⊤ = (K + λmI)^{−1} v(x, p),    (10)

where K ∈ R^{m×m} is the empirical kernel matrix with entries K_{jh} = k((χ_j, p_j), (χ_h, p_h)) and v(x, p) ∈ R^m is the vector with entries v(x, p)_j = k((χ_j, p_j), (x, p)). Training the proposed algorithm consists in precomputing A = (K + λmI)^{−1} to evaluate the coefficients α, as detailed by the LEARN routine in Alg. 1. While computing A amounts to solving a linear system, which requires O(m³) operations, we note that it is possible to achieve the same statistical accuracy with reduced complexity O(m√m) by means of low-rank approximations (see [14, 32]).

Remark 2 (Evaluating f̂). According to (7), evaluating f̂ on a test point x ∈ X consists in solving an optimization problem over the output space Z. This is a standard strategy in structured prediction, where an optimization protocol is derived on a case-by-case basis depending on both Δ and Z (see, e.g., [29]). Hence, from a computational viewpoint, the inference step in this work is not more demanding than in previous methods (while also enjoying strong theoretical guarantees on the prediction performance, as discussed in Sec. 5). Moreover, the specific form of our estimator suggests a general stochastic meta-algorithm to address the inference problem in special settings. In particular, we can reformulate (7) as

    f̂(x) = argmin_{z∈Z} E_{j,p} h_{j,p}(z|x),    (11)

with p sampled according to π, j ∈ {1, . . . , m} sampled according to the weights α_j, and h_{j,p} suitably defined in terms of L_p. When the h_{j,p} are (sub)differentiable, (11) can be effectively addressed by stochastic gradient methods (SGM). In Alg. 3 in Appendix J we give an example of this strategy.
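To illustrate the meta-algorithm behind (11), here is a sketch of stochastic gradient inference for the special case of the squared part loss L_p(z_p, η | x_p) = ‖z_p − η‖² with Z Euclidean and non-overlapping parts. Everything below is our own reading, not the paper's Alg. 3: in particular, since the weights α_j can be negative, we sample j proportionally to |α_j| and correct with the sign, which keeps the gradient estimate unbiased.

```python
import numpy as np

def sgm_inference(x, part_slices, pi_probs, alpha, aux, steps=2000, lr=0.5, rng=None):
    """Approximate f(x) = argmin_z E_{j,p} h_{j,p}(z|x) of (11) for
    L_p(z_p, eta | x_p) = ||z_p - eta||^2, parts being disjoint slices of z."""
    rng = rng or np.random.default_rng()
    z = np.zeros(sum(s.stop - s.start for s in part_slices))
    for t in range(1, steps + 1):
        p = rng.choice(len(part_slices), p=pi_probs)   # p ~ pi(.|x)
        a = alpha(x, p)                                # coefficients alpha_j(x, p)
        w = np.abs(a); total = w.sum() + 1e-12
        j = rng.choice(len(a), p=w / total)            # j proportional to |alpha_j| ...
        scale = np.sign(a[j]) * total                  # ... reweighted by the sign
        eta = aux[j][2]                                # auxiliary part eta_j (same size as the slice)
        s = part_slices[p]
        z[s] -= (lr / t) * scale * 2.0 * (z[s] - eta)  # gradient of ||z_p - eta||^2
    return z
```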
5 Generalization Properties of Structured Prediction with Parts

In this section we study the statistical properties of the proposed algorithm, with particular attention to the impact of locality on learning rates; see Thm. 4 (for a complete analysis of universal consistency and learning rates without locality assumptions, see Appendices F and H). Our analysis leverages the assumption that the loss function Δ is a Structure Encoding Loss Function (SELF) by Parts.

Definition 1 (SELF by Parts). A function Δ : Z × Y × X → R is a Structure Encoding Loss Function (SELF) by Parts if it admits a factorization in the form of (6) with functions L_p : [Z] × [Y] × [X] → R, and there exist a separable Hilbert space H and two bounded maps ψ : [Z] × [X] × P → H and φ : [Y] → H such that for any ζ ∈ [Z], η ∈ [Y], ξ ∈ [X], p ∈ P

    L_p(ζ, η|ξ) = ⟨ψ(ζ, ξ, p), φ(η)⟩_H.    (12)

The definition of "SELF by Parts" specializes the definition of SELF in [9], and in the following we will always assume Δ to satisfy it. Indeed, Def. 1 is satisfied when the spaces of parts involved are discrete sets, and it is rather mild in the general case (see [8] for an exhaustive list of examples). Note that when Δ is SELF, the solution of (5) is completely characterized in terms of the conditional expectation (related to the conditional mean embedding [7, 23, 36, 34]) of φ(y_p) given x, denoted g* : X × P → H, as follows.

Lemma 2. Let Δ be SELF and Z compact. Then, the minimizer of (5) is ρ_X-a.e. characterized by

    f*(x) = argmin_{z∈Z} Σ_{p∈P} π(p|x) ⟨ψ(z_p, x_p, p), g*(x, p)⟩_H,   g*(x, p) = ∫_Y φ(y_p) dρ(y|x).    (13)

Lemma 2 (proved in Appendix C) shows that f* is completely characterized in terms of the conditional expectation g*, which indeed plays a key role in controlling the learning rates of f̂. In particular, we investigate the learning rates in light of the two assumptions of between- and within-locality introduced in Sec. 2. To this end, we first study the direct effects of these two assumptions on the learning framework introduced in this work.

The effect of Between-locality. We start by observing that the between-locality between parts of the input and parts of the output allows for a refined characterization of the conditional mean g*.

Lemma 3. Let g* be defined as in (13). Under Assumption 1, there exists ḡ* : [X] → H such that

    g*(x, p) = ḡ*(x_p)   for all x ∈ X, p ∈ P.    (14)

Lemma 3 above shows that we can learn g* by focusing on a "simpler" problem, identified by the function ḡ* acting only on the parts [X] of X rather than on the whole input directly (for a proof see Lemma 21 in Appendix G). This motivates the adoption of the restriction kernel [6], namely a function k : (X × P) × (X × P) → R such that

    k((x, p), (x′, q)) = k̄(x_p, x′_q),    (15)

which, for any pair of inputs x, x′ ∈ X and parts p, q ∈ P, measures the similarity between the p-th part of x and the q-th part of x′ via a kernel k̄ : [X] × [X] → R on the parts of X. The restriction kernel is a well-established tool in structured prediction settings [6] and has been observed to be remarkably effective in computer vision applications [22, 45, 17].
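In code, the restriction kernel amounts to applying a base kernel to the extracted parts; a minimal sketch (ours) with a Gaussian k̄ on flattened patches, directly usable as the `kernel` argument of the LEARN sketch above:

```python
import numpy as np

def restriction_kernel(x, p, x2, q, extract, sigma=1.0):
    """k((x, p), (x2, q)) = kbar(x_p, x2_q) as in (15), with kbar a Gaussian kernel.
    extract(x, p) returns the p-th part of x as a flat vector."""
    xp, x2q = extract(x, p), extract(x2, q)
    return np.exp(-np.sum((xp - x2q) ** 2) / (2.0 * sigma ** 2))
```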
The effect of Within-locality. We recall that within-locality characterizes the statistical correlation between two different parts of the input (see Assumption 2). To this end, we consider the simplified scenario where the parts are sampled from the uniform distribution on P, i.e., π(p|x) = 1/|P| for any x ∈ X and p ∈ P. While more general situations can be considered, this setting is useful to illustrate the effect of interest in this work. We now define some important quantities that characterize the learning rates under locality,

    C_{p,q} = E_{x,x′}[ k̄(x_p, x_q)² − k̄(x_p, x′_q)² ],   r = sup_{x∈X, p∈P} k̄(x_p, x_p).    (16)

It is clear that the terms C_{p,q} and r above correspond respectively to the correlations introduced in (1) and the scale parameter introduced in (2), with similarity function S = k̄². Let f̂ be the structured prediction estimator in (7) learned using the restriction kernel in (15) based on k̄, and denote by Ḡ the space of functions Ḡ = H ⊗ F̄, with F̄ the reproducing kernel Hilbert space [3] associated to k̄. In particular, in the following we will consider the standard assumption in the context of non-parametric estimation [7] on the regularity of the target function, which in our context reads as ḡ* ∈ Ḡ. Finally, we introduce c²_Δ = sup_{z∈Z, x∈X} (1/|P|) Σ_{p∈P} ‖ψ(z_p, x_p, p)‖²_H to measure the "complexity" of the loss Δ w.r.t. the representation induced by the SELF decomposition (Def. 1), analogously to Thm. 2 of [8].

Theorem 4 (Learning Rates & Locality). Under Assumptions 1 and 2 with S = k̄², let ḡ* satisfy Lemma 3, with ḡ = ‖ḡ*‖_Ḡ < ∞. Let s be as in (3). When λ = (r²/m + s/(|P|n))^{1/2}, then

    E[E(f̂) − E(f*)] ≤ 12 c_Δ ḡ ( r²/m + r²/(|P|n) + s/(|P|n) )^{1/4}.    (17)

The proof of the result above can be found in Appendix G.1. We can see that between- and within-locality allow us to refine (and potentially improve) the bound of n^{−1/4} from structured prediction without locality [8] (see also Thm. 5 in Appendix F). In particular, we observe that the adoption of the restriction kernel in Thm. 4 allows the structured prediction estimator to leverage the within-locality, gaining a benefit proportional to the magnitude of the parameter γ. Indeed, r² ≤ s ≤ r²|P| by definition. More precisely, if γ = 0 (e.g., all parts are identical copies) then s = r²|P| and we recover the rate of O(n^{−1/4}) of [8], while if γ is large (the parts are almost uncorrelated) then s = r² and we can take m ∝ n|P|, achieving a rate of the order of O((n|P|)^{−1/4}). We clearly see that, depending on the amount of within-locality in the learning problem, the proposed estimator is able to gain significantly in terms of finite sample bounds.

6 Empirical Evaluation

We evaluate the proposed estimator on simulated as well as real data. We highlight how locality leads to improved generalization performance, in particular when only few training examples are available.

Learning the Direction of Ridges for Fingerprints. Similarly to [37], we considered the problem of detecting the pointwise direction of ridges in a fingerprint image on the FVC04 dataset¹, comprising 80 grayscale 640 × 480 input images depicting fingerprints and corresponding output images encoding in each pixel the local direction of the ridges of the input fingerprint as an angle θ ∈ [−π, π]. A natural loss function is the average pixel-wise error sin(θ − θ′)² between a ground-truth angle θ and the predicted θ′, according to the geodesic distance on the sphere. To apply the proposed algorithm, we consider the following representation of the loss in terms of parts: let P be the collection of patches of dimension 20 × 20, equispaced every 5 × 5 pixels², so that each pixel belongs to exactly 16 patches. For all z, y ∈ R^{640×480}, the average pixel-wise error is

    Δ(z, y) = (1/(16|P|)) Σ_{p∈P} L(z_p, y_p),   with   L(ζ, η) = (1/(20 × 20)) Σ_{i,j=1}^{20} sin([ζ]_{ij} − [η]_{ij})²,    (18)

where ζ = z_p, η = y_p ∈ [−π, π]^{20×20} are the extracted patches and [·]_{ij} their value at pixel (i, j).

¹ http://bias.csr.unibo.it/fvc2004, DB1_B. The output is obtained by applying 7 × 7 Sobel filtering.
² For simplicity we assume "circular images", namely [x]_{i,j} = [x]_{(i mod 640),(j mod 480)}.
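A direct transcription of (18) (our sketch; for simplicity we iterate over valid, non-circular patches and omit the overall 1/16 normalization, which does not affect the comparison between methods):

```python
import numpy as np

def ridge_orientation_loss(z, y, patch=20, stride=5):
    """Average over 20 x 20 patches of L(zeta, eta) = mean_ij sin([zeta]_ij - [eta]_ij)^2,
    for angle images z, y in [-pi, pi]^{640 x 480}, following (18)."""
    H, W = y.shape
    vals = [np.mean(np.sin(z[i:i + patch, j:j + patch]
                           - y[i:i + patch, j:j + patch]) ** 2)
            for i in range(0, H - patch + 1, stride)
            for j in range(0, W - patch + 1, stride)]
    return float(np.mean(vals))
```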
We compared our approach using Δ (Local-Δ) or least-squares (Local-LS) with competitors that do not take into account the local structure of the problem, namely standard vector-valued kernel ridge regression (KRLS) [7] and the structured prediction algorithm in [8] with the Δ loss (Δ-Global). We used a Gaussian kernel on the input (for the local estimators, the restriction kernel in (15) with k̄ Gaussian). We randomly sampled 50/30 images for training/testing, performing 5-fold cross-validation on λ in [10^{−6}, 10] (log spaced) and the kernel bandwidth in [10^{−3}, 1]. For Local-Δ and Local-LS we built an auxiliary set with m = 30000 random patches (see Sec. 4), sampled from the 50 training images.

Figure 3: Learning the direction of ridges in fingerprint images. (Left) Examples of ground truths and predictions, with pixel colors corresponding to the local direction of ridges. (Right) Test error according to Δ in (18) for Local-Δ, Local-LS, Global-Δ and KRLS.

Results. Fig. 3 (Left) reports the average prediction error across 10 random train-test splits. We make two observations: first, methods that leverage the locality in the data are consistently superior to their "global" counterparts, supporting our theoretical results in Sec. 5 that the proposed estimator can lead to significantly better performance, in particular when few training points are available. Second, the experiment suggests that choosing the right loss is critical, since exploiting locality without the right loss (i.e., Local-LS in the figure) generally leads to worse performance. The three sample predictions in Fig. 3 (Right) provide more qualitative insights on the models tested. In particular, while both locality-aware methods are able to recover the correct structure of the fingerprints, only combining this information with the loss Δ leads to accurate recovery of the ridge orientation.

Within-locality. In Fig. 4 we visualize the (empirical) within-locality of the central patch p for the fingerprint dataset. The figure depicts C_{p,q} (defined in (16)) for q ∈ P, with the (i, j)-th pixel in the image corresponding to C_{p,q} with q the 20 × 20 patch centered in (i, j). The fast decay of these values as the distance from the central patch p increases suggests that within-locality holds for a large value of γ, possibly justifying the good performance exhibited by Local-Δ in light of Thm. 4.

Figure 4: Empirical estimation of within-locality for the central patch of the fingerprints dataset.

Simulation: Within-Locality. We complement our analysis with synthetic experiments where we control the "amount" of within-locality γ. We considered a setting where input points are vectors x ∈ R^{k|P|} comprising |P| parts of dimension k = 1000. Inputs are sampled according to a normal distribution with zero mean and covariance Σ(γ) = M(γ) ⊗ I, where M(γ) ∈ R^{|P|×|P|} has entries M(γ)_{pq} = e^{−γ d(p,q)} and d(p, q) = |p − q|/|P|.
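The synthetic inputs can be generated by correlating i.i.d. Gaussian parts through a Cholesky factor of M(γ); the sketch below (ours) also builds the between-locality target w = [w̄, . . . , w̄]. Since the extracted text leaves the exact output dimension ambiguous, we implement the per-part reading [y_i]_p = w̄⊤[x_i]_p + ε_p, which is what the part-wise baselines (Local-LS, IndependentParts-LS) operate on:

```python
import numpy as np

def make_dataset(n, num_parts, k, gamma, noise=0.5, rng=None):
    """x ~ N(0, Sigma(gamma)) with Sigma(gamma) = M(gamma) (kron) I and
    M(gamma)_pq = exp(-gamma |p - q| / |P|); per-part targets sharing one wbar."""
    rng = rng or np.random.default_rng()
    idx = np.arange(num_parts)
    M = np.exp(-gamma * np.abs(np.subtract.outer(idx, idx)) / num_parts)
    L = np.linalg.cholesky(M + 1e-10 * np.eye(num_parts))
    # correlate the parts; coordinates within a part stay independent
    X = np.einsum('pq,nqk->npk', L, rng.standard_normal((n, num_parts, k)))
    wbar = rng.standard_normal(k)
    wbar *= rng.uniform() ** (1.0 / k) / np.linalg.norm(wbar)    # uniform in the unit ball
    Y = X @ wbar + noise * rng.standard_normal((n, num_parts))   # [y_i]_p = wbar . [x_i]_p + eps
    return X.reshape(n, -1), Y
```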
By design, as γ grows, Σ(γ) varies from being rank-one (all parts are identical copies) to diagonal (all parts are independently sampled).

To isolate the effect of within-locality on learning, we tested our estimator on a linear multitask (actually vector-valued) regression problem with least-squares loss Δ. We generated datasets (x_i, y_i)_{i=1}^n of size n = 100 for training and n = 1000 for testing, with x_i sampled as described above and y_i = w⊤x_i + ε, with noise ε ∈ R^{k|P|} sampled from an isotropic Gaussian with standard deviation 0.5. To guarantee that between-locality holds, we generated the target vector w = [w̄, . . . , w̄] ∈ R^{k|P|} by concatenating copies of a w̄ ∈ R^k sampled uniformly on the radius-one ball. We performed regression with the linear restriction kernel on the parts/subvectors (Local-LS) on the "full" auxiliary dataset ([x_i]_p, [y_i]_p) with 1 ≤ i ≤ n and 1 ≤ p ≤ |P|, and compared it with standard linear regression (Global-LS) on the original dataset (x_i, y_i)_{i=1}^n and linear regression performed independently for each (local) subdataset ([x_i]_p, [y_i]_p)_{i=1}^n (IndependentParts-LS). The parameter λ was chosen by hold-out cross-validation in [10^{−6}, 10] (log spaced).

Figure 5: Effect of within-locality w.r.t. γ and |P| (γ ∈ {0.0, 0.1, 0.2, 0.6, 1.6, 4.0, 10.0}): Global-LS vs. IndependentParts-LS vs. Local-LS (ours). Mean squared error (log scale) against the number of parts.

Fig. 5 reports the (log scale) mean squared error (MSE) across 100 runs of the estimators for increasing values of γ and |P|. In line with Thm. 4, when γ and |P| are large, Local-LS significantly outperforms both i) Global-LS, which solves one single problem jointly and does not benefit from within-locality, and ii) IndependentParts-LS, which is insensitive to the between-locality across parts and solves each local prediction problem in isolation. For a smaller γ, such an advantage becomes less prominent even when the number of parts is large. This is expected, since for γ = 0 the input parts are extremely correlated and there is no within-locality that can be exploited.

7 Conclusion

We proposed a novel approach for structured prediction in the presence of locality in the data. Our method builds on [8] by incorporating knowledge of the parts directly within the learning model. We proved the benefits of locality by showing that, under a low-correlation assumption on the parts of the input (within-locality), the learning rates of our estimator can improve proportionally to the number of parts in the data. To obtain this result we additionally introduced a natural assumption on the conditional independence between input-output parts (between-locality), which also provides a formal justification for the adoption of the so-called "restriction kernel", previously proposed in the literature, as a means to lower the sample complexity of the problem. Empirical evaluation on synthetic as well as real data shows that our approach offers significant advantages when few training points are available and leveraging structural information such as locality is crucial to achieve good prediction performance. We identify two main directions for future work: 1) consider settings where the parts are unknown (or "latent") and need to be discovered/learned from data; 2) consider more general locality assumptions. In particular, we argue that Assumption 2 (WL) might be weakened to account for different (but related) local input-output relations across adjacent parts.

References

[1] Karteek Alahari, Pushmeet Kohli, and Philip H. S. Torr. Reduce, reuse & recycle: Efficiently solving multi-label MRFs.
7 Conclusion

We proposed a novel approach for structured prediction in the presence of locality in the data. Our method builds on [8] by incorporating knowledge of the parts directly within the learning model. We proved the benefits of locality by showing that, under a low-correlation assumption on the parts of the input (within-locality), the learning rates of our estimator can improve proportionally to the number of parts in the data. To obtain this result we additionally introduced a natural assumption on the conditional independence between input-output parts (between-locality), which also provides a formal justification for adopting the so-called "restriction kernel", previously proposed in the literature, as a means to lower the sample complexity of the problem. Empirical evaluation on synthetic as well as real data shows that our approach offers significant advantages when few training points are available and leveraging structural information such as locality is crucial to achieve good prediction performance. We identify two main directions for future work: 1) consider settings where the parts are unknown (or "latent") and need to be discovered/learned from data; 2) consider more general locality assumptions. In particular, we argue that Assumption 2 (WL) might be weakened to account for different (but related) local input-output relations across adjacent parts.

References

[1] Karteek Alahari, Pushmeet Kohli, and Philip H. S. Torr. Reduce, reuse & recycle: Efficiently solving multi-label MRFs. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 1-8, 2008.

[2] Charalambos D. Aliprantis and Kim Border. Infinite Dimensional Analysis: a Hitchhiker's Guide. Springer Science & Business Media, 2006.

[3] Nachman Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337-404, 1950.

[4] Lalit Bahl, Peter Brown, Peter De Souza, and Robert Mercer. Maximum mutual information estimation of hidden Markov model parameters for speech recognition. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 11, pages 49-52, 1986.

[5] G. H. Bakir, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan. Predicting Structured Data. MIT Press, 2007.

[6] Matthew B. Blaschko and Christoph H. Lampert. Learning to localize objects with structured output regression. In European Conference on Computer Vision, pages 2-15. Springer, 2008.

[7] Andrea Caponnetto and Ernesto De Vito. Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics, 7(3):331-368, 2007.

[8] Carlo Ciliberto, Lorenzo Rosasco, and Alessandro Rudi. A consistent regularization approach for structured prediction. In Advances in Neural Information Processing Systems 29 (NIPS), pages 4412-4420, 2016.

[9] Carlo Ciliberto, Alessandro Rudi, Lorenzo Rosasco, and Massimiliano Pontil. Consistent multitask learning with nonlinear output relations. In Advances in Neural Information Processing Systems, pages 1983-1993, 2017.

[10] Michael Collins. Parameter estimation for statistical parsing models: Theory and practice of distribution-free methods. In New Developments in Parsing Technology, pages 19-55. Springer, 2004.

[11] Corinna Cortes, Vitaly Kuznetsov, and Mehryar Mohri. Ensemble methods for structured prediction. In International Conference on Machine Learning, pages 1134-1142, 2014.

[12] Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, and Scott Yang. Structured prediction theory based on factor graph complexity. In Advances in Neural Information Processing Systems, pages 2514-2522, 2016.

[13] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, pages 248-255. IEEE, 2009.

[14] Aymeric Dieuleveut, Nicolas Flammarion, and Francis Bach. Harder, better, faster, stronger convergence rates for least-squares regression. Journal of Machine Learning Research, 18(1):3520-3570, 2017.

[15] Moussab Djerrab, Alexandre Garcia, Maxime Sangnier, and Florence d'Alché Buc. Output Fisher embedding regression. Machine Learning, 107(8-10):1229-1256, 2018.

[16] John C. Duchi, Lester W. Mackey, and Michael I. Jordan. On the consistency of ranking algorithms. In Proceedings of the International Conference on Machine Learning (ICML), pages 327-334, 2010.

[17] Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester, and Deva Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627-1645, 2010.

[18] Thorsten Joachims, Thomas Hofmann, Yisong Yue, and Chun-Nam Yu. Predicting structured objects with support vector machines. Communications of the ACM, 52(11):97-104, 2009.
[19] Andrej Karpathy and Li Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In Proceedings of the Conference on Computer Vision and Pattern Recognition, pages 3128-3137, 2015.

[20] Anna Korba, Alexandre Garcia, and Florence d'Alché Buc. A structured prediction approach for label ranking. In Advances in Neural Information Processing Systems, pages 8994-9004, 2018.

[21] John Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the International Conference on Machine Learning (ICML), 2001.

[22] Christoph H. Lampert, Matthew B. Blaschko, and Thomas Hofmann. Efficient subwindow search: A branch and bound framework for object localization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12):2129-2142, 2009.

[23] Guy Lever, Luca Baldassarre, Sam Patterson, Arthur Gretton, Massimiliano Pontil, and Steffen Grünewälder. Conditional mean embeddings as regressors. In International Conference on Machine Learning (ICML), volume 5, 2012.

[24] Giulia Luise, Alessandro Rudi, Massimiliano Pontil, and Carlo Ciliberto. Differential properties of Sinkhorn approximation for learning with Wasserstein distance. In Advances in Neural Information Processing Systems, pages 5859-5870, 2018.

[25] Giulia Luise, Dimitris Stamos, Massimiliano Pontil, and Carlo Ciliberto. Leveraging low-rank relations between surrogate tasks in structured prediction. In International Conference on Machine Learning (ICML), 2019.

[26] Sean P. Meyn and Richard L. Tweedie. Markov Chains and Stochastic Stability. Springer Science & Business Media, 2012.

[27] Charles A. Micchelli and Massimiliano Pontil. Kernels for multi-task learning. In Advances in Neural Information Processing Systems, pages 921-928, 2004.

[28] Alex Nowak-Vila, Francis Bach, and Alessandro Rudi. Sharp analysis of learning with discrete losses. In AISTATS, 2018.

[29] Sebastian Nowozin, Christoph H. Lampert, et al. Structured learning and prediction in computer vision. Foundations and Trends in Computer Graphics and Vision, 2011.

[30] Anton Osokin, Francis Bach, and Simon Lacoste-Julien. On structured prediction theory with calibrated convex surrogate losses. In Advances in Neural Information Processing Systems, pages 302-313, 2017.

[31] Nathan D. Ratliff, J. Andrew Bagnell, and Martin A. Zinkevich. Maximum margin planning. In Proceedings of the International Conference on Machine Learning, pages 729-736. ACM, 2006.

[32] Alessandro Rudi, Luigi Carratino, and Lorenzo Rosasco. Falkon: An optimal large scale kernel method. In Advances in Neural Information Processing Systems, pages 3891-3901, 2017.

[33] Alessandro Rudi, Carlo Ciliberto, GianMaria Marconi, and Lorenzo Rosasco. Manifold structured prediction. In Advances in Neural Information Processing Systems, pages 5610-5621, 2018.

[34] Rahul Singh, Maneesh Sahani, and Arthur Gretton. Kernel instrumental variable regression. In Advances in Neural Information Processing Systems, 2019.

[35] Steve Smale and Ding-Xuan Zhou. Learning theory estimates via integral operators and their approximations. Constructive Approximation, 26(2):153-172, 2007.

[36] Le Song, Kenji Fukumizu, and Arthur Gretton. Kernel embeddings of conditional distributions: A unified kernel framework for nonparametric inference in graphical models. IEEE Signal Processing Magazine, 30(4):98-111, 2013.
[37] Florian Steinke, Matthias Hein, and Bernhard Schölkopf. Nonparametric regression between general Riemannian manifolds. SIAM Journal on Imaging Sciences, 3(3):527-563, 2010.

[38] Ingo Steinwart and Andreas Christmann. Support Vector Machines. Information Science and Statistics. Springer New York, 2008.

[39] Kirill Struminsky, Simon Lacoste-Julien, and Anton Osokin. Quantifying learning guarantees for convex but inconsistent surrogates. In Advances in Neural Information Processing Systems, pages 669-677, 2018.

[40] Charles Sutton and Andrew McCallum. An introduction to conditional random fields. Foundations and Trends in Machine Learning, 4(4):267-373, 2012.

[41] Martin Szummer, Pushmeet Kohli, and Derek Hoiem. Learning CRFs using graph cuts. In European Conference on Computer Vision, pages 582-595. Springer, 2008.

[42] Ben Taskar, Carlos Guestrin, and Daphne Koller. Max-margin Markov networks. In Advances in Neural Information Processing Systems, pages 25-32, 2004.

[43] Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann, and Yasemin Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453-1484, 2005.

[44] Devis Tuia, Jordi Munoz-Mari, Mikhail Kanevski, and Gustavo Camps-Valls. Structured output SVM for remote sensing image classification. Journal of Signal Processing Systems, 65(3):301-310, 2011.

[45] Andrea Vedaldi and Andrew Zisserman. Structured output regression for detection with partial truncation. In Advances in Neural Information Processing Systems, pages 1928-1936, 2009.