{"title": "Local Rules for Global MAP: When Do They Work?", "book": "Advances in Neural Information Processing Systems", "page_first": 871, "page_last": 879, "abstract": "We consider the question of computing Maximum A Posteriori (MAP) assignment in an arbitrary pair-wise Markov Random Field (MRF). We present a randomized iterative algorithm based on simple local updates. The algorithm, starting with an arbitrary initial assignment, updates it in each iteration by first, picking a random node, then selecting an (appropriately chosen) random local neighborhood and optimizing over this local neighborhood. Somewhat surprisingly, we show that this algorithm finds a near optimal assignment within $2n\ln n$ iterations on average and with high probability for {\em any} $n$ node pair-wise MRF with {\em geometry} (i.e. MRF graph with polynomial growth) with the approximation error depending on (in a reasonable manner) the geometric growth rate of the graph and the average radius of the local neighborhood -- this allows for a graceful tradeoff between the complexity of the algorithm and the approximation error. Through extensive simulations, we show that our algorithm finds extremely good approximate solutions for various kinds of MRFs with geometry.", "full_text": "Local Rules for Global MAP: When Do They Work?

Kyomin Jung∗
KAIST
Daejeon, Korea
kyomin@kaist.edu

Pushmeet Kohli
Microsoft Research
Cambridge, UK
pkohli@microsoft.com

Devavrat Shah
MIT
Cambridge, MA, USA
devavrat@mit.edu

Abstract

We consider the question of computing the Maximum A Posteriori (MAP) assignment in an arbitrary pair-wise Markov Random Field (MRF). We present a randomized iterative algorithm based on simple local updates.
The algorithm, starting with an arbitrary initial assignment, updates it in each iteration by first picking a random node, then selecting an (appropriately chosen) random local neighborhood and optimizing over this local neighborhood. Somewhat surprisingly, we show that this algorithm finds a near optimal assignment within n log² n iterations with high probability for any n node pair-wise MRF with geometry (i.e. an MRF graph with polynomial growth), with the approximation error depending (in a reasonable manner) on the geometric growth rate of the graph and the average radius of the local neighborhood – this allows for a graceful tradeoff between the complexity of the algorithm and the approximation error. Through extensive simulations, we show that our algorithm finds extremely good approximate solutions for various kinds of MRFs with geometry.

1 Introduction

The abstraction of a Markov random field (MRF) allows one to utilize a graphical representation to capture inter-dependency among a large number of random variables in a succinct manner. MRF based models have been utilized successfully in the context of coding (e.g. low density parity check codes [15]), statistical physics (e.g. the Ising model [5]), natural language processing [13], and image processing in computer vision [11, 12, 19]. In most applications, the primary inference question of interest is that of finding a maximum a posteriori (MAP) solution – e.g. finding a most likely transmitted message based on the received signal.

Related Work. Computing the exact MAP solution in general probabilistic models is an NP-hard problem. This has led researchers to resort to fast approximate algorithms. Various such algorithmic approaches have been developed over more than the past three decades. In essence, all such approaches try to find a locally optimal solution of the problem through an iterative procedure.
These "local update" algorithms start from an initial solution and proceed by making a series of changes which lead to solutions having lower energy (or better likelihood), and hence are also called "move making algorithms". At each step, the algorithm searches the space of all possible local changes that can be made to the current solution (also called the move space), and chooses the one which leads to the solution having the highest probability or lowest energy.

One such algorithm (which has been rediscovered multiple times) is called Iterated Conditional Modes, or ICM for short. Its local update involves selecting (randomly or deterministically) a variable of the problem. Keeping the values of all other variables fixed, the value of the selected variable is chosen which results in a solution with the maximum probability. This process is repeated by selecting other variables until the probability cannot be increased further.

∗This work was partially carried out while the author was visiting Microsoft Research Cambridge, and was partially supported by NSF CAREER project CNS-0546590.

The size of the move space is the defining characteristic of any such move making algorithm. A large move space means that more extensive changes to the current solution can be made. This makes the algorithm less prone to getting stuck in local minima and also results in a faster rate of convergence. Expansion and Swap are move making algorithms which search for the optimal move in a move space of size 2^n, where n is the number of random variables. For energy functions composed of metric pairwise potentials, the optimal move can be found in polynomial time by minimizing a submodular quadratic pseudo-boolean function [3] (or solving an equivalent minimum cost st-cut problem).

The last few years have seen a lot of interest in st-mincut based move algorithms for energy minimization. Komodakis et al.
[9] recently gave an alternative interpretation of the expansion algorithm. They showed that expansion can be seen as solving the dual of a linear programming relaxation of the energy minimization problem. Researchers have also proposed a number of novel move encoding strategies for solving particular forms of energy functions. Veksler [18] proposed a move algorithm in which variables can choose any label from a range of labels, and showed that this move space allows one to obtain better minima for energy functions with truncated convex pairwise terms. Kumar and Torr [10] have since shown that the range move algorithm achieves the same guarantees as the ones obtained by methods based on the standard linear programming relaxation.

A related popular algorithmic approach is based on max-product belief propagation (cf. [14] and [22]). In a sense, it can be viewed as an iterative algorithm that makes local updates by optimizing over the immediate graphical structure. There is a long line of literature on understanding the conditions under which the max-product belief propagation algorithm finds the correct solution. Specifically, in recent years a sequence of results suggests that there is an intimate relation between the max-product algorithm and a natural linear programming relaxation – for example, see [1, 2, 8, 16, 21].

We also note that the Swendsen-Wang algorithm (SW) [17], a local flipping algorithm, has a philosophy similar to ours in that it repeats a process of randomly partitioning the graph and computing an assignment. However, the graph partitioning of SW is fundamentally different from ours, and there is no known guarantee on the approximation error of SW.

In summary, all the approaches thus far with provable guarantees for local update based algorithms are primarily for linear or, more generally, convex optimization setups.

Our Contribution.
As the main result of this paper, we propose a randomized iterative local algorithm that is based on simple local updates. The algorithm, starting with an arbitrary initial assignment, updates it in each iteration by first picking a random node, then its (appropriately chosen) random local neighborhood, and optimizing over this local neighborhood. Somewhat surprisingly, we show that this algorithm finds a near optimal assignment within n log² n iterations with high probability for graphs with geometry – i.e. graphs in which the neighborhood of each node within distance r grows no faster than a polynomial in r. Such graphs can have arbitrary structure subject to this polynomial growth condition. We show that the approximation error depends gracefully on the average radius of the random local neighborhood and the degree of polynomial growth of the graph. Overall, our algorithm can provide an ε-approximate MAP with C(ε) n log² n total computation, with C(ε) depending only on ε and the degree of polynomial growth. The crucial novel feature of our algorithm is that the random local neighborhood is selected appropriately at random, rather than deterministically, in order to achieve a provable performance guarantee.

We note that near optimality of our algorithm does not depend on a convexity property or tree-like structure, as in many of the previous works; it relies only on the geometry of the graphical structure, which is present in many graphical models of interest, such as those arising in image processing, wireless networks, etc.

We verify the performance of our algorithm through simulations. Specifically, we apply our algorithm to two popular settings: (a) a grid graph based pairwise MRF with varying node and edge interaction strengths, and (b) a grid graph based MRF on the weighted independent set (or hardcore) model.
We find that with a very small radius (at most 3), the algorithm finds an assignment within 1% (a 0.99 factor) of the MAP for a large range of parameters and for graphs of up to 1000 nodes.

Organization. We start by formally stating our problem statement and main theorem (Theorem 1) in Section 2. This is followed by a detailed description of the algorithm in Section 3. We present a proof sketch of the main result in Section 4. Finally, we provide detailed simulation results in Section 5.

2 Main Results

We start with the formal problem description and useful definitions/notations, followed by the statement of the main result about the performance of the algorithm. The algorithm itself will be stated in the next section.

Definitions & Problem Statement. Our interest is in a pair-wise MRF, defined next. We note that, formally, all (non pair-wise) MRFs are equivalent to pair-wise MRFs – e.g. see [20].

Definition 1 (Pair-wise MRF). A pair-wise MRF based on a graph G = (V, E) with n = |V| vertices and edge set E is defined by associating a random variable X_v with each vertex v ∈ V, taking values in a finite alphabet set Σ; the joint distribution of X = (X_v)_{v∈V} is defined as

Pr[X = x] ∝ ∏_{v∈V} Ψ_v(x_v) · ∏_{(u,v)∈E} Ψ_uv(x_u, x_v),   (1)

where Ψ_v : Σ → R_+ and Ψ_uv : Σ² → R_+ are called node and edge potential functions.
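To make the factorization (1) concrete, here is a minimal sketch (our own helper names, not from the paper) that evaluates the unnormalized log-probability of an assignment for a tiny chain MRF and finds the most probable assignment by exhaustive search:

```python
import itertools
import math

def unnormalized_log_prob(nodes, edges, node_pot, edge_pot, x):
    """Log of the right-hand side of (1): sum_v ln Psi_v(x_v) + sum_(u,v) ln Psi_uv(x_u, x_v)."""
    score = sum(math.log(node_pot[v][x[v]]) for v in nodes)
    score += sum(math.log(edge_pot[u, v][x[u], x[v]]) for (u, v) in edges)
    return score

def most_probable(nodes, edges, node_pot, edge_pot, alphabet):
    """Exhaustive search over Sigma^n for the most probable assignment."""
    return max(itertools.product(alphabet, repeat=len(nodes)),
               key=lambda x: unnormalized_log_prob(nodes, edges, node_pot, edge_pot, x))

# Toy 2-node chain over Sigma = {0, 1}: node 0 prefers value 1, the edge rewards agreement.
nodes, edges = [0, 1], [(0, 1)]
node_pot = {0: {0: 1.0, 1: 3.0}, 1: {0: 1.0, 1: 1.0}}
edge_pot = {(0, 1): {(a, b): (2.0 if a == b else 1.0) for a in (0, 1) for b in (0, 1)}}
print(most_probable(nodes, edges, node_pot, edge_pot, (0, 1)))  # -> (1, 1)
```

The exhaustive search is exponential in n, of course; it serves only to pin down what "MAP assignment" means for the factorization (1).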
In this paper, the question of interest is to find the maximum a posteriori (MAP) assignment x* ∈ Σⁿ, i.e.

x* ∈ arg max_{x∈Σⁿ} Pr[X = x].

Equivalently, from the optimization point of view, we wish to find an optimal assignment of the problem

maximize H(x) over x ∈ Σⁿ, where H(x) = Σ_{v∈V} ln Ψ_v(x_v) + Σ_{(u,v)∈E} ln Ψ_uv(x_u, x_v).

For completeness and simplicity of exposition, we assume that the function H is finite valued over Σⁿ. However, the results of this paper extend to hard constrained problems such as the hardcore or independent set model.

In this paper, we will design algorithms for finding an approximate MAP. Specifically, we call an assignment x̂ an ε-approximate MAP if

(1 − ε)H(x*) ≤ H(x̂) ≤ H(x*).

Graphs with Geometry. We define the notion of graphs with geometry here. To this end, a graph G = (V, E) induces a natural 'graph metric' on the vertices V, denoted by d_G : V × V → R_+, with d_G(v, u) the length of the shortest path between u and v; it is defined as ∞ if there is no path between them.

Definition 2 (Graph with Polynomial Growth). We call G a graph with polynomial growth of degree (or growth rate) ρ if, for any v ∈ V and r ∈ N,

|B_G(v, r)| ≤ C · r^ρ,

where C > 0 is a universal constant and B_G(v, r) = {w ∈ V | d_G(w, v) < r}.

A large class of graph models naturally falls into the class of graphs with polynomial growth. To begin with, the standard d-dimensional regular grid graphs have polynomial growth rate d – e.g. d = 1 is the line graph.
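The balls B_G(v, r) in Definition 2 can be computed directly by breadth-first search; the following sketch (our own helper code, not from the paper) checks the polynomial-growth bound with ρ = 2 and C = 3 on a finite 2-D grid:

```python
from collections import deque

def ball_size(adj, v, r):
    """|B_G(v, r)|: number of vertices w with d_G(v, w) < r (so r = 1 gives just {v})."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] + 1 >= r:  # vertices at distance >= r lie outside the ball
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return len(dist)

def grid_adjacency(n1, n2):
    """4-neighbor adjacency of an n1 x n2 grid graph."""
    adj = {}
    for i in range(n1):
        for j in range(n2):
            adj[i, j] = [(i + di, j + dj)
                         for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                         if 0 <= i + di < n1 and 0 <= j + dj < n2]
    return adj

# Around the center of a large grid, |B_G(v, r)| = 2(r-1)^2 + 2(r-1) + 1 <= 3r^2,
# i.e. polynomial growth of degree rho = 2 with constant C = 3.
adj = grid_adjacency(41, 41)
center = (20, 20)
assert all(ball_size(adj, center, r) <= 3 * r ** 2 for r in range(1, 15))
```

The same helper works for any adjacency structure, so one can empirically estimate the growth rate ρ of an arbitrary graph by fitting log |B_G(v, r)| against log r.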
More generally, in recent years, in the context of computational geometry and metric embedding, graphs with finite doubling dimension have become a popular object of study [6, 7].

¹We assume the positivity of the Ψ_v's and Ψ_uv's for simplicity of analysis.

It can be checked that a graph with doubling dimension ρ is also a graph with polynomial growth rate ρ. Finally, consider the popular geometric graph model where nodes are placed arbitrarily on a two-dimensional surface with a minimum distance separation and two nodes share an edge if they are within a certain finite distance of each other; this too is a graph with finite polynomial growth rate.

Statement of Main Result. The main result of this paper is a randomized iterative algorithm based on simple local updates. In essence, the algorithm works as follows. It starts with an arbitrary initial assignment. In each iteration, it picks a node, say v, from the n nodes of V uniformly at random, and picks a random radius Q (as per a specific distribution). The algorithm re-assigns values to all nodes within distance Q of node v with respect to the graph distance d_G by finding the optimal assignment for this local neighborhood, subject to keeping the assignment of all other nodes the same. The algorithm LOC-ALGO described in Section 3 repeats this process n log² n times. We show that LOC-ALGO finds a near optimal solution with high probability as long as the graph has finite polynomial growth rate.

Theorem 1.
Given an MRF based on a graph G = (V, E) of n = |V| nodes with polynomial growth rate ρ and an approximation parameter ε ∈ (0, 1), our algorithm LOC-ALGO with O(log(1/δ) n log n) iterations produces a solution x̂ such that

Pr[H(x*) − H(x̂) ≤ 2εH(x*)] ≥ 1 − δ − 1/poly(n).

Each iteration takes at most ζ(ε, ρ) computation, with

ζ(ε, ρ) ≤ |Σ|^{C·K(ε,ρ)^ρ},

where K(ε, ρ) is defined as

K = K(ε, ρ) = (8ρ/φ) log(8ρ/φ) + (4/φ) log C + (4/φ) log(1/φ) + 2, with φ = ε / (5C·2^ρ).

In a nutshell, Theorem 1 says that the complexity of the algorithm for obtaining an ε-approximation scales almost linearly in n and doubly exponentially in 1/ε and ρ. On one hand, this result establishes that it is indeed possible to have a polynomial (in fact, almost linear) time approximation algorithm for an arbitrary pair-wise MRF with polynomial growth. On the other hand, though the theoretical bound on the pre-constant ζ(ε, ρ) as a function of 1/ε and ρ is not very exciting, our simulations suggest (see Section 5) that even for hard problem setups the performance is much better than predicted by these theoretical bounds. Therefore, as a recommendation for a system designer, we suggest using a smaller 'radius' distribution in the algorithm described in Section 3 to obtain a good algorithm.

3 Algorithm Description

In this section, we provide details of the algorithm intuitively described in the previous section. As noted earlier, the algorithm iteratively updates its estimate of the MAP, denoted by x̂. Initially, x̂ is chosen arbitrarily.
Iteratively, at each step, a vertex v ∈ V is chosen uniformly at random, along with a random radius Q that is chosen independently as per distribution Q. Then, select R ⊂ V, the local neighborhood (or ball) of radius Q around v as per the graph distance d_G, i.e. R = {w ∈ V | d_G(v, w) < Q}. Then, while keeping the assignment of all nodes in V\R fixed as per x̂ = (x̂_v)_{v∈V}, find the MAP assignment x^{*,R} restricted to the nodes of R, and update the assignment of the nodes in R as per x^{*,R}. A caricature of an iteration is described in Figure 1. The precise description of the algorithm is given in Figure 2.

In order to have good performance, it is essential to choose an appropriate distribution Q for the selection of the random radius Q at each step. Next, we define this distribution, which is essentially a truncated geometric distribution. Specifically, given the parameter ε ∈ (0, 1) and the polynomial growth rate ρ (with constant C) of the graph, define φ = ε / (5C·2^ρ), and

K = K(ε, ρ) = (8ρ/φ) log(8ρ/φ) + (4/φ) log C + (4/φ) log(1/φ) + 2.

Then the distribution (or random variable) Q is defined over the integers from 1 to K(ε, ρ) as

Pr[Q = i] = φ(1 − φ)^{i−1} if 1 ≤ i < K(ε, ρ), and Pr[Q = K(ε, ρ)] = (1 − φ)^{K−1}.

Figure 1: Pictorial description of an iteration of LOC-ALGO.

LOC-ALGO(ε, K)

(0) Input: MRF G = (V, E) with φ_i(·), i ∈ V, and ψ_ij(·,·), (i, j) ∈ E.
(1) Initially, select x̂ ∈ Σⁿ arbitrarily.
(2) Do the following n log² n times:
(a) Choose
an element u ∈ V uniformly at random.
(b) Draw a random number Q according to the distribution Q.
(c) Let R ← {w ∈ V | d_G(u, w) < Q}.
(d) Through dynamic programming (or exhaustive computation), find an exact MAP x^{*,R} for R, while fixing the assignment outside R to its value under x̂.
(e) Update the values of x̂ on R to x^{*,R}.
(3) Output x̂.

Figure 2: Algorithm for approximate MAP computation.

4 Proof of Theorem 1

In this section, we present the proof of Theorem 1. To that end, we will prove the following lemma.

Lemma 1. If we run LOC-ALGO for (2n ln n) iterations, then with probability at least 1 − 1/n, we have

(1 − ε)H(x*) ≤ E[H(x̂)] ≤ H(x*).

From Lemma 1, we obtain Theorem 1 as follows. Define T = 2 log(1/δ), and consider LOC-ALGO with (2T n ln n) iterations. From the fact that H(x*) − H(x̂) ≥ 0, and by the Markov inequality applied to H(x*) − H(x̂) with Lemma 1, we have that after (2n ln n) iterations,

Pr[H(x*) − H(x̂) ≤ 2εH(x*)] ≥ 1/2.   (2)

Note that (2) is true for any initial assignment of LOC-ALGO. Hence, for each 1 ≤ t ≤ T, after (2tn ln n) iterations, (2) holds independently with probability 1 − 1/n. Also, note that H(x̂) is increasing monotonically. Hence, H(x*) − H(x̂) > 2εH(x*) holds after (2T n ln n) iterations only if the same holds after (2tn ln n) iterations for all 1 ≤ t ≤ T.
Hence, after (2T n ln n) iterations, we have Pr[H(x*) − H(x̂) ≤ 2εH(x*)] ≥ 1 − δ − 1/poly(n), which proves the first part of Theorem 1.

For the total computation bound in Theorem 1, note that each iteration of LOC-ALGO involves dynamic programming over a local neighborhood of radius at most K = K(ε, ρ) around a node. Due to the polynomial growth condition, this neighborhood contains at most CK^ρ nodes. Each variable can take at most |Σ| different values. Therefore, dynamic programming (or exhaustive search) takes at most |Σ|^{CK^ρ} operations, as claimed.

Proof of Lemma 1. First observe that, by the standard argument for the classical coupon collector problem with n coupons (e.g. see [4]), after 2n ln n iterations, with probability at least 1 − 1/n, all the vertices of V will have been chosen as 'ball centers' at least once.

Error bound. Now we prove that if all the vertices of V are chosen as 'ball centers' at least once, the answer x̂ generated by LOC-ALGO after 2n ln n iterations is indeed an ε-approximation on average. To this end, we construct an imaginary set of edges as follows. Imagine that procedure (2) of LOC-ALGO is run with an iteration counter t ∈ Z_+. For each vertex v ∈ V, we record the largest iteration number t such that the ball R chosen at iteration t contains v. That is,

T(v) = max{t ∈ Z_+ | LOC-ALGO chooses v as a member of R at iteration t}.

Clearly, this is well defined when the algorithm is run until each node is chosen as a 'ball center' at least once. Now define the imaginary boundary set of LOC-ALGO as

B = {(u, w) ∈ E | T(u) ≠ T(w)}.

Now consider the graph G' = (V, E\B) obtained by removing the edges B from G. In this graph, nodes in the same connected component have the same T(·) value. Next, we state two lemmas that will be crucial to the proof of the theorem. The proofs of Lemmas 2 and 3 are omitted.

Lemma 2.
Consider two MRFs X₁ and X₂ on the same graph G = (V, E) with identical edge potentials {ψ_ij(·,·)}, (i, j) ∈ E, but distinct node potentials {φ¹_i(·)} and {φ²_i(·)}, i ∈ V, respectively. For each i ∈ V, define φ^D_i = max_{σ∈Σ} |φ¹_i(σ) − φ²_i(σ)|. Finally, for ℓ ∈ {1, 2} and any x ∈ Σⁿ, define H_ℓ(x) = Σ_{i∈V} φ^ℓ_i(x_i) + Σ_{(i,j)∈E} ψ_ij(x_i, x_j), with x^{*,ℓ} being a MAP assignment of MRF X_ℓ. Then, we have

|H₁(x^{*,1}) − H₁(x^{*,2})| ≤ 2 Σ_{i∈V} φ^D_i.

Lemma 3. Given the MRF X defined on G (as in (1)), the algorithm LOC-ALGO produces an output x̂ such that

|H(x*) − H(x̂)| ≤ 5 Σ_{(i,j)∈B} (ψ^U_ij − ψ^L_ij),

where B is the (random) imaginary boundary set of LOC-ALGO, ψ^U_ij ≜ max_{σ,σ'∈Σ} ψ_ij(σ, σ'), and ψ^L_ij ≜ min_{σ,σ'∈Σ} ψ_ij(σ, σ').

Now we state the following lemma, which utilizes the fact that the distribution Q follows a (truncated) geometric distribution with rate (1 − φ) – its proof is omitted.

Lemma 4. For any edge e ∈ E of G, Pr[e ∈ B] ≤ φ.

From Lemma 4, we obtain that

E[ Σ_{(i,j)∈B} (ψ^U_ij − ψ^L_ij) ] ≤ φ Σ_{(i,j)∈E} (ψ^U_ij − ψ^L_ij).   (3)

Finally, we establish the following lemma that bounds Σ_{(i,j)∈E} (ψ^U_ij − ψ^L_ij) – its proof is omitted.

Lemma 5.
If G has maximum vertex degree d*, then

Σ_{(i,j)∈E} (ψ^U_ij − ψ^L_ij) ≤ (d* + 1) H(x*).   (4)

Now recall that the maximum vertex degree d* of G is less than C·2^ρ by the definition of a polynomially growing graph. Therefore, by Lemma 3, (3), and Lemma 5, the output produced by the LOC-ALGO algorithm is such that, in expectation,

H(x*) − H(x̂) ≤ 5(d* + 1)φH(x*) ≤ εH(x*),

where recall that φ = ε / (5C·2^ρ). This completes the proof of Lemma 1.

5 Experiments

Our algorithm provides a provable approximation for any MRF on a polynomially growing graph. In this section, we present experimental evaluations of our algorithm for two popular models: (a) a synthetic Ising model, and (b) the hardcore (independent set) model. As the reader will notice, the experimental results not only confirm the qualitative behavior proved by our theoretical result, but also suggest that much tighter approximation guarantees should be expected in practice than what is guaranteed by the theoretical results.

Setup 1.² Consider a binary (i.e. Σ = {0, 1}) MRF on an n1 × n2 grid G = (V, E):

Pr(x) ∝ exp( Σ_{i∈V} θ_i x_i + Σ_{(i,j)∈E} θ_ij x_i x_j ), for x ∈ {0, 1}^{n1n2}.

We consider the following scenario for choosing the parameters (with the notation U[a, b] for the uniform distribution over the interval [a, b]):

1. For each i ∈ V, choose θ_i independently as per the distribution U[−1, 1].
2. For each (i, j) ∈ E, choose θ_ij independently from U[−α, α].
Here the interaction parameter α is chosen from {0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64}.

Figure 3: (A) plots the error of the local update algorithm for a random Ising model on the grid graph of size 10 × 10, and (B) plots the error on the grid of size 100 × 10 (curves shown for r = 1, 2, 3).

To compare the effectiveness of our algorithm for each size of the local updates, in our simulations we fix the square size as a constant instead of choosing it from a distribution. We run the simulation for local squares of size r × r with r = 1, 2, 3, where r = 1 is the case when each square consists of a single vertex. We computed an exact MAP assignment x* by dynamic programming, and computed the output x̂ of our local update algorithm for each r by performing 4 n1 n2 log(n1 n2) local updates for the n1 × n2 grid graph. We then compare the error as follows:

Error = (H(x*) − H(x̂)) / H(x*).

We run the simulation for 100 trials and compute the average error for each case. Figure 3(A) plots the error for the grid of size 10 × 10, while Figure 3(B) plots the error for the grid of size 100 × 10.

²Though this setup has φ_i, ψ_ij taking negative values, it is equivalent to the setup considered in the paper, since an affine shift will make them non-negative without changing the distribution.

Recall that the approximation guarantee of Theorem 1 is an error bound for the worst case. As the simulation results suggest, for all graphs and the whole range of α, the error of the local update algorithm decreases dramatically as r increases.
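For reference, the Setup 1 protocol can be reproduced in miniature. The sketch below is our own illustration (hypothetical helper names; brute-force search in place of dynamic programming; single-vertex moves, i.e. the r = 1 case; and θ_i drawn from U[0, 1] rather than U[−1, 1] so that H(x*) > 0 and the error ratio is well defined):

```python
import itertools
import random

def grid_instance(n1, n2, rng):
    """Random Ising instance on an n1 x n2 grid: theta_i ~ U[0,1], theta_ij ~ U[-1,1]."""
    nodes = [(i, j) for i in range(n1) for j in range(n2)]
    theta_node = {v: rng.uniform(0.0, 1.0) for v in nodes}
    theta_edge = {}
    for (i, j) in nodes:
        if i + 1 < n1:
            theta_edge[(i, j), (i + 1, j)] = rng.uniform(-1.0, 1.0)
        if j + 1 < n2:
            theta_edge[(i, j), (i, j + 1)] = rng.uniform(-1.0, 1.0)
    return theta_node, theta_edge

def H(theta_node, theta_edge, x):
    """H(x) = sum_i theta_i x_i + sum_(i,j) theta_ij x_i x_j."""
    return (sum(t * x[v] for v, t in theta_node.items())
            + sum(t * x[u] * x[v] for (u, v), t in theta_edge.items()))

def brute_force_map(theta_node, theta_edge):
    """Exact MAP by enumerating all 2^n assignments (feasible only for tiny grids)."""
    nodes = sorted(theta_node)
    return max((dict(zip(nodes, bits))
                for bits in itertools.product((0, 1), repeat=len(nodes))),
               key=lambda x: H(theta_node, theta_edge, x))

def single_vertex_updates(theta_node, theta_edge, iters, rng):
    """r = 1 local updates: re-optimize one uniformly random vertex per iteration."""
    nbrs = {v: [] for v in theta_node}
    for (u, v) in theta_edge:
        nbrs[u].append(v)
        nbrs[v].append(u)
    nodes = sorted(theta_node)
    x = {v: rng.randint(0, 1) for v in nodes}
    for _ in range(iters):
        v = rng.choice(nodes)
        gain = theta_node[v] + sum(
            theta_edge.get((v, w), theta_edge.get((w, v), 0.0)) * x[w] for w in nbrs[v])
        x[v] = 1 if gain > 0 else 0  # optimal value of x_v with all others fixed
    return x

rng = random.Random(0)
theta_node, theta_edge = grid_instance(3, 3, rng)
x_star = brute_force_map(theta_node, theta_edge)
x_hat = single_vertex_updates(theta_node, theta_edge, 60, rng)
error = (H(theta_node, theta_edge, x_star)
         - H(theta_node, theta_edge, x_hat)) / H(theta_node, theta_edge, x_star)
print(round(error, 4))
```

Each single-vertex move can only increase H, so the error is non-negative and typically shrinks toward zero as the number of updates grows; larger r would require optimizing each r × r block exactly, as in the paper's experiments.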
Moreover, even for r as small as 3, the output of the local update algorithm achieves a remarkably good approximation. Hence we observe that our algorithm performs well not only in theory, but also in practice.

Setup 2. We consider the vertex weighted independent set model defined on a grid graph. To this end, we start with a description of the weighted independent set problem as an MRF model. Specifically, consider a binary MRF on an n1 × n2 grid G = (V, E):

Pr(x) ∝ exp( Σ_{i∈V} θ_i x_i + Σ_{(i,j)∈E} Ψ(x_i, x_j) ), for x ∈ {0, 1}^{n1n2}.

Here, the parameters are chosen as follows.

1. For each i ∈ V, θ_i is chosen independently as per the distribution U[0, 1].
2. The function Ψ(·,·) is defined as

Ψ(σ, σ') = −M if (σ, σ') = (1, 1), and 0 otherwise,

where M is a large number.

For this model, we ran simulations for grid graphs of size 10 × 10, 30 × 10, and 100 × 10 respectively. For each graph, we computed the average error as in Setup 1, over 100 trials. The results are shown in the following table. As the results show, our local update algorithm achieves a remarkably good approximation of the MAP – equivalently, in this setup, of the maximum weight independent set – even with very small r values!

       10 × 10    30 × 10    100 × 10
r=1    0.219734   0.205429   0.208446
r=2    0.016032   0.019145   0.019305
r=3    0.001539   0.002616   0.002445

It is worth noting that choosing θ_i from U[0, α] for any α > 0 gives the same approximation result, since x* and x̂ are both linear in α.

6 Conclusion

We considered the question of designing a simple, iterative algorithm with local updates for finding the MAP in any pair-wise MRF. As the main result of this paper, we presented such a randomized, local
As the main result of this paper, we presented such a randomized, local\niterative algorithm that can \ufb01nd \u03b5-approximate solution of MAP in any pair-wise MRF based on\nG within 2n ln n iterations and the computation per iteration is constant C(\u03b5, \u03c1) dependent on the\naccuracy parameter \u03b5 as well as the growth rate \u03c1 of the polynomially growing graph G. That is,\nours is a local, iterative randomized PTAS for MAP problem in MRF with geometry. Our results are\nsomewhat surprising given that thus far the known theoretical justi\ufb01cation for such local algorithms\nstrongly dependended on some form of convexity of the \u2018energy\u2019 function. In contrast, our results\ndo not require any such condition, but only the geometry of the underlying MRF. We believe that\nour algorithm will be of great practical interest in near future as a large class of problems that utilize\nMRF based modeling and inference in practice have the underlying graphical structure possessing\nsome form of geometry naturally.\n\n8\n\n\fReferences\n\n[1] M. Bayati, D. Shah, and M. Sharma. Maximum weight matching via max-product belief\n\npropagation. In IEEE ISIT, 2005.\n\n[2] M. Bayati, D. Shah, and M. Sharma. Max-Product for Maximum Weight Matching: Conver-\ngence, Correctness, and LP Duality. IEEE Transactions on Information Theory, 54(3):1241\u2013\n1251, 2008.\n\n[3] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts.\n\nIEEE Trans. Pattern Anal. Mach. Intell., 23(11):1222\u20131239, 2001.\n\n[4] William Feller. An Introduction to Probability Theory and Its Applications. Wiley, 1957.\n[5] Hans-Otto Georgii. Gibbs measures and phase transitions. Walter de Gruyter, 1988.\n[6] A. Gupta, R. Krauthgamer, and J.R. Lee. Bounded geometries, fractals, and low-distortion\nembeddings. In In Proceedings of the 44th annual Symposium on the Foundations of Computer\nScience, 2003.\n\n[7] S. Har-Peled and M. Mendel. 
Fast construction of nets in low dimensional metrics, and their applications. In Proceedings of the Twenty-First Annual Symposium on Computational Geometry, pages 150–158. ACM, 2005.

[8] B. Huang and T. Jebara. Loopy belief propagation for bipartite maximum weight b-matching. In Artificial Intelligence and Statistics (AISTATS), 2007.

[9] N. Komodakis and G. Tziritas. A new framework for approximate labeling via graph cuts. In International Conference on Computer Vision, pages 1018–1025, 2005.

[10] M. Pawan Kumar and Philip H. S. Torr. Improved moves for truncated convex models. In NIPS, pages 889–896, 2008.

[11] Stan Z. Li. Markov Random Field Modeling in Image Analysis. Springer, 2001.

[12] M. Malfait and D. Roose. Wavelet-based image denoising using a Markov random field a priori model. IEEE Transactions on Image Processing, 6(4):549–565, 1997.

[13] Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, 1999.

[14] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco, CA: Morgan Kaufmann, 1988.

[15] Thomas Richardson and Rüdiger Urbanke. Modern Coding Theory. Cambridge University Press, 2008.

[16] S. Sanghavi, D. Shah, and A. Willsky. Message-passing for maximum weight independent set. In Proceedings of NIPS, 2007.

[17] R. Swendsen and J. Wang. Nonuniversal critical dynamics in Monte Carlo simulations. Phys. Rev. Lett., 58:86–88, 1987.

[18] O. Veksler. Graph cut based optimization for MRFs with truncated convex priors. In CVPR, 2007.

[19] Paul Viola and Michael J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137–154, 2004.

[20] M. Wainwright and M. Jordan. Graphical models, exponential families, and variational inference. UC Berkeley, Dept.
of Statistics, Technical Report 649, 2003.

[21] M. J. Wainwright, T. Jaakkola, and A. S. Willsky. MAP estimation via agreement on (hyper)trees: Message-passing and linear-programming approaches. IEEE Transactions on Information Theory, 2005.

[22] J. Yedidia, W. Freeman, and Y. Weiss. Generalized belief propagation. Mitsubishi Elect. Res. Lab., TR-2000-26, 2000.
", "award": [], "sourceid": 185, "authors": [{"given_name": "Kyomin", "family_name": "Jung", "institution": null}, {"given_name": "Pushmeet", "family_name": "Kohli", "institution": null}, {"given_name": "Devavrat", "family_name": "Shah", "institution": null}]}