{"title": "Bounds on marginal probability distributions", "book": "Advances in Neural Information Processing Systems", "page_first": 1105, "page_last": 1112, "abstract": "We propose a novel bound on single-variable marginal probability distributions in factor graphs with discrete variables. The bound is obtained by propagating bounds (convex sets of probability distributions) over a subtree of the factor graph, rooted in the variable of interest. By construction, the method not only bounds the exact marginal probability distribution of a variable, but also its approximate Belief Propagation marginal (``belief''). Thus, apart from providing a practical means to calculate bounds on marginals, our contribution also lies in providing a better understanding of the error made by Belief Propagation. We show that our bound outperforms the state-of-the-art on some inference problems arising in medical diagnosis.", "full_text": "Bounds on marginal probability distributions\n\nJoris Mooij\nMPI for Biological Cybernetics\nTübingen, Germany\njoris.mooij@tuebingen.mpg.de\n\nBert Kappen\nDepartment of Biophysics\nRadboud University Nijmegen, the Netherlands\nb.kappen@science.ru.nl\n\nAbstract\n\nWe propose a novel bound on single-variable marginal probability distributions in factor graphs with discrete variables. The bound is obtained by propagating local bounds (convex sets of probability distributions) over a subtree of the factor graph, rooted in the variable of interest. By construction, the method not only bounds the exact marginal probability distribution of a variable, but also its approximate Belief Propagation marginal (“belief”). Thus, apart from providing a practical means to calculate bounds on marginals, our contribution also lies in providing a better understanding of the error made by Belief Propagation. 
We show that our bound outperforms the state-of-the-art on some inference problems arising in medical diagnosis.\n\n1 Introduction\n\nGraphical models are used in many different fields. A fundamental problem in the application of graphical models is that exact inference is NP-hard [1]. In recent years, much research has focused on approximate inference techniques, such as sampling methods and deterministic approximation methods, e.g., Belief Propagation (BP) [2]. Although the approximations obtained by these methods can be very accurate, there are only a few useful guarantees on the error of the approximation, and often it is not known (without comparing with the intractable exact solution) how accurate an approximate result is. Thus it is desirable to calculate, in addition to the approximate results, tight bounds on the approximation error.\n\nThere exist various methods to bound the BP error [3, 4, 5, 6], which can be used, in conjunction with the results of BP, to calculate bounds on the exact marginals. Furthermore, upper bounds on the partition sum, e.g., [7, 8], can be combined with lower bounds on the partition sum, such as the well-known mean field bound or higher-order lower bounds [9], to obtain bounds on marginals. Finally, a method called Bound Propagation [10] directly calculates bounds on marginals. However, most of these bounds (with the exception of [3, 10]) have only been formulated for the special case of pairwise interactions, which limits their applicability, excluding for example the interesting class of Bayesian networks.\n\nIn this contribution we describe a novel bound on exact single-variable marginals in factor graphs which is not limited to pairwise interactions. The original motivation for this work was to better understand and quantify the BP error. This has led to bounds which are at the same time bounds for the exact single-variable marginals as well as for the BP beliefs. 
A particularly nice feature of our bounds is that their computational cost is relatively low, provided that the number of possible values of each variable in the factor graph is small. On the other hand, the computational complexity is exponential in the number of possible values of the variables, which limits application to factor graphs in which each variable has a low number of possible values. On these factor graphs, however, our bound can significantly outperform existing methods, either in terms of accuracy or in terms of computation time (or both). We illustrate this on two toy problems and on real-world problems arising in medical diagnosis.\n\nThe basic idea underlying our method is that we recursively propagate bounds over a particular subtree of the factor graph. The propagation rules are similar to those of Belief Propagation; however, instead of propagating messages, we propagate convex sets of messages. This can be done in such a way that the final “beliefs” at the root node of the subtree are convex sets which contain the exact marginal of the root node (and, by construction, also its BP belief). In the next section, we describe our method in more detail. Due to space constraints, we have omitted the proofs and other technical details; these are provided in a technical report [11], which also reports additional experimental results and presents an extension that uses self-avoiding-walk trees instead of subtrees (inspired by [6]).\n\n2 Theory\n\n2.1 Preliminaries\n\nFactorizing probability distributions. Let V := {1, . . . , N} and consider N discrete random variables (xi)i∈V. Each variable xi takes values in a discrete domain Xi. We will use the following multi-index notation: let A = {i1, i2, . . . , im} ⊆ V with i1 < i2 < · · · < im; we write XA := Xi1 × Xi2 × · · · × Xim and for any family (Yi)i∈B with A ⊆ B ⊆ V, we write YA := (Yi1, Yi2, . . . , Yim).\n\nWe consider a probability distribution over x = (x1, . . . , xN) ∈ XV that can be written as a product of factors (ψI)I∈F:\n\nP(x) = (1/Z) ∏I∈F ψI(xNI),   where   Z = ∑x∈XV ∏I∈F ψI(xNI).   (1)\n\nFor each factor index I ∈ F, there is an associated subset NI ⊆ V of variable indices and the factor ψI is a nonnegative function ψI : XNI → [0,∞). For a Bayesian network, the factors are (conditional) probability tables. In case of Markov random fields, the factors are often called potentials. In general, the normalizing constant (“partition sum”) Z is not known and exact computation of Z is infeasible, due to the fact that the number of terms to be summed is exponential in the number of variables N. Similarly, computing marginal distributions P(xA) for subsets of variables A ⊆ V is intractable in general. In this article, we focus on the task of obtaining rigorous bounds on single-variable marginals P(xi) = ∑xV\{i} P(x).\n\nFactor graphs. We can represent the structure of the probability distribution (1) using a factor graph (V,F,E). This is a bipartite graph, consisting of variable nodes i ∈ V, factor nodes I ∈ F, and edges e ∈ E, with an edge {i, I} between i ∈ V and I ∈ F if and only if the factor ψI depends on xi (i.e., if i ∈ NI). We will represent factor nodes visually as rectangles and variable nodes as circles. Figure 1(a) shows a simple example of a factor graph. The set of neighbors of a factor node I is precisely NI; similarly, we denote the set of neighbors of a variable node i by Ni := {I ∈ F : i ∈ NI}. We will assume throughout this article that the factor graph corresponding to (1) is connected.\n\nConvexity. We denote the set of extreme points of a convex set X ⊆ Rd by ext(X). 
For a subset Y ⊆ Rd, the convex hull of Y is defined as the smallest convex set X ⊆ Rd with Y ⊆ X; we denote the convex hull of Y as conv(Y).\n\nMeasures. For A ⊆ V, define MA := [0,∞)^XA as the set of nonnegative functions on XA. Each element of MA can be identified with a finite measure on XA; therefore we will call the elements of MA “measures on A”. We write M∗A := MA \ {0}.\n\nOperations on measures. Adding two measures Ψ, Φ ∈ MA results in the measure Ψ + Φ in MA. For A, B ⊆ V, we can multiply a measure on MA with a measure on MB to obtain a measure on MA∪B; a special case is multiplication with a scalar. Note that there is a natural embedding of MA in MB for A ⊆ B ⊆ V obtained by multiplying a measure Ψ ∈ MA by 1B\A ∈ MB\A, the constant function with value 1 on XB\A. Another important operation is partial summation: given A ⊆ B ⊆ V and Ψ ∈ MB, define ∑xA Ψ to be the measure in MB\A obtained by summing Ψ over all xA ∈ XA, i.e., ∑xA Ψ : xB\A ↦ ∑xA Ψ(xA, xB\A).\n\nOperations on sets of measures. We will define operations on sets of measures by applying the operation on elements of these sets and taking the set of the resulting measures; e.g., if we have two subsets ΞA ⊆ MA and ΞB ⊆ MB for A, B ⊆ V, we define the product of the sets ΞA and ΞB to be the set of the products of elements of ΞA and ΞB, i.e., ΞAΞB := {ΨAΨB : ΨA ∈ ΞA, ΨB ∈ ΞB}.\n\nCompletely factorized measures. For A ⊆ V, we will define QA to be the set of completely factorized measures on A, i.e., QA := ∏a∈A M{a}. Note that MA is the convex hull of QA. Indeed, we can write each measure Ψ ∈ MA as a convex combination of measures in QA which are zero everywhere except at one particular value of their argument. We denote Q∗A := QA \ {0}.\n\nNormalized (probability) measures. We denote with PA the set of probability measures on A, i.e., PA = {Ψ ∈ MA : ∑xA Ψ = 1}. The set PA is called a simplex. Note that a simplex is convex; the simplex PA has precisely #(XA) extreme points, each of which corresponds to putting all probability mass on one of the possible values of xA. We define the normalization operator N which normalizes measures, i.e., for Ψ ∈ M∗A we define N Ψ := (1/Z) Ψ with Z = ∑xA Ψ.\n\nBoxes. Let a, b ∈ Rd such that aα ≤ bα for all components α = 1, . . . , d. Then we define the box with lower bound a and upper bound b by B(a, b) := {x ∈ Rd : aα ≤ xα ≤ bα for all α = 1, . . . , d}. Note that a box is convex; indeed, its extreme points are the “corners”, of which there are 2^d.\n\nSmallest bounding boxes. Let X ⊆ Rd be bounded. The smallest bounding box of X is defined as B(X) := B(a, b), where the lower bound a is given by the pointwise infimum of X and the upper bound b is given by the pointwise supremum of X, that is, aα := inf{xα : x ∈ X} and bα := sup{xα : x ∈ X} for all α = 1, . . . , d. Note that B(X) = B(conv(X)). Therefore, if X is convex, the smallest bounding box for X depends only on the extreme points ext(X), i.e., B(X) = B(ext(X)); this bounding box can be easily calculated if the number of extreme points is not too large.\n\n2.2 The basic tools\n\nTo calculate marginals of subsets of variables in some factor graph, several operations performed on measures are relevant: normalization, taking products of measures, and summing over subsets of variables. 
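These definitions translate directly into code. The following minimal Python sketch (our own illustration, not code from the paper; the helper names bounding_box and simplex_extreme_points are hypothetical) computes the smallest bounding box B(X) of a finite set of measures as componentwise minima and maxima, and checks that the simplex on a 3-valued variable has bounding box [0,1]^3:

```python
# Hypothetical sketch: a measure on a small discrete domain is a tuple of
# nonnegative floats; the smallest bounding box B(X) of a finite set X is
# the pair of componentwise infimum and supremum over X.

def bounding_box(measures):
    """Return (lower, upper): componentwise min and max over the set."""
    lower = tuple(min(col) for col in zip(*measures))
    upper = tuple(max(col) for col in zip(*measures))
    return lower, upper

def simplex_extreme_points(n_values):
    """Extreme points of the simplex P_A: all mass on one value of x_A."""
    return [tuple(1.0 if k == j else 0.0 for k in range(n_values))
            for j in range(n_values)]

# B(X) depends only on ext(X); for the simplex on a 3-valued variable,
# the smallest bounding box is the unit box [0, 1]^3.
lo, up = bounding_box(simplex_extreme_points(3))
```

Since B(X) = B(ext(X)) for convex X, enumerating extreme points suffices; no optimization over the full set is ever needed.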
Here we study the interplay between convexity and these operations. This will turn out to be useful later on, because our bounds make use of convex sets of measures that are propagated over the factor graph.\n\nThe interplay between convexity and normalization, taking products and partial summation is described by the following lemma.\n\nLemma 1 Let A ⊆ V and let Ξ ⊆ M∗A. Then:\n1. conv(N Ξ) = N(conv Ξ);\n2. for all B ⊆ V, Ψ ∈ MB: conv(ΨΞ) = Ψ(conv Ξ);\n3. for all B ⊆ A: conv(∑xB Ξ) = ∑xB conv Ξ. □\n\nThe next lemma concerns the interplay between convexity and taking products; it says that if we take the product of convex sets of measures on different spaces, the resulting set is contained in the convex hull of the product of the extreme points of the convex sets.\n\nLemma 2 Let (At)t=1,...,T be disjoint subsets of V. For each t = 1, . . . , T, let Ξt ⊆ MAt be convex with a finite number of extreme points. Then conv(∏_{t=1}^T Ξt) = conv(∏_{t=1}^T ext Ξt). □\n\nThe third lemma says that the product of several boxes on the same subset A of variables can be easily calculated: the product of the boxes is again a box, with as lower (upper) bound the product of the lower (upper) bounds of the boxes.\n\nLemma 3 Let A ⊆ V and for each t = 1, . . . , T, let Ψt, Φt ∈ MA such that Ψt ≤ Φt. Then ∏_{t=1}^T B(Ψt, Φt) = B(∏_{t=1}^T Ψt, ∏_{t=1}^T Φt). □\n\nWe are now ready to state the basic lemma. 
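Lemma 3 is easy to check numerically. The sketch below (our own illustration under a hypothetical representation of a box as a pair of componentwise bound vectors, not code from the paper) forms the componentwise product of the lower and upper bounds of several random nonnegative boxes and verifies that every componentwise product of box members lies inside the resulting box:

```python
import random

# Hypothetical representation: a box of measures is a pair (lower, upper)
# of componentwise bounds. For nonnegative bounds, the product of boxes is
# again a box (Lemma 3): multiply lower bounds and upper bounds pointwise.

def box_product(boxes):
    lower = [1.0] * len(boxes[0][0])
    upper = [1.0] * len(boxes[0][0])
    for lo, up in boxes:
        lower = [a * b for a, b in zip(lower, lo)]
        upper = [a * b for a, b in zip(upper, up)]
    return lower, upper

# Random nonnegative boxes on a 4-point domain.
random.seed(0)
boxes = []
for _ in range(3):
    lo = [random.uniform(0.0, 1.0) for _ in range(4)]
    up = [l + random.uniform(0.0, 1.0) for l in lo]
    boxes.append((lo, up))
plo, pup = box_product(boxes)

# Any componentwise product of members of the boxes lies in the product box.
for _ in range(100):
    member = [1.0] * 4
    for lo, up in boxes:
        member = [m * random.uniform(l, u) for m, l, u in zip(member, lo, up)]
    assert all(pl - 1e-12 <= m <= pu + 1e-12
               for m, pl, pu in zip(member, plo, pup))
```

Note that nonnegativity is essential here: with mixed signs the product of the bounds would not bound the product of the members.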
It basically says that one can bound the marginal of a variable by replacing a factor depending on some other variables by a product of single-variable factors and bounding the result. This can be exploited to simplify the computational complexity of bounding the marginal. An example of its use will be given in the next subsection.\n\nFigure 1: (a) Example factor graph with three variable nodes (i, j, k) and three factor nodes (J, K, L), with probability distribution P(xi, xj, xk) = (1/Z) ψJ(xi, xj) ψK(xi, xk) ψL(xj, xk); (b) Cloning node j by adding a new variable j' and a factor ψδ(xj, xj') = δxj(xj'); (c) Illustration of the bound on P(xi) based on (b): “what can we say about the range of P(xi) when the factors corresponding to the nodes marked with question marks are arbitrary?”; (d) Subtree of the factor graph; (e) Propagating convex sets of measures (boxes or simplices) on the subtree (d), leading to a bound Bi on the marginal probability of xi in G.\n\nLemma 4 Let A, B, C ⊆ V be mutually disjoint subsets of variables. Let Ψ ∈ MA∪B∪C such that for each xC ∈ XC, ∑xA∪B Ψ > 0. Then:\n\nB(N(∑xB,xC Ψ M∗C)) = B(N(∑xB,xC Ψ Q∗C)).\n\nProof. 
Note that M∗C is the convex hull of Q∗C and apply Lemma 1. □\n\nThe positivity condition is a technical condition, which in our experience is fulfilled for many practically relevant factor graphs.\n\n2.3 A simple example\n\nBefore proceeding to our main result, we first illustrate for a simple case how the basic lemma can be employed to obtain computationally tractable bounds on marginals. We derive a bound for the marginal of the variable xi in the factor graph in Figure 1(a). We start by cloning the variable xj, i.e., adding a new variable xj' that is constrained to take the same value as xj. In terms of the factor graph, we add a variable node j' and a factor node δ, connected to variable nodes j and j', with corresponding factor ψδ(xj, xj') := δxj(xj'); see also Figure 1(b). Clearly, the marginal of xi satisfies:\n\nP(xi) = N(∑xj ∑xk ψJ ψK ψL) = N(∑xj ∑xj' ∑xk ψJ ψK ψL δxj(xj')),\n\nwhere it should be noted that the first occurrence of ψL is shorthand for ψL(xj, xk), but the second occurrence is shorthand for ψL(xj', xk). 
Noting that ψδ ∈ M∗{j,j'} and applying the basic lemma with A = {i}, B = {k}, C = {j, j'} and Ψ = ψJ ψK ψL yields:\n\nP(xi) ∈ N(∑xj ∑xj' ∑xk ψJ ψK ψL M∗{j,j'}), hence P(xi) ∈ B(N(∑xj ∑xj' ∑xk ψJ ψK ψL Q∗{j,j'})).\n\nApplying the distributive law, we obtain (see also Figure 1(c)):\n\nP(xi) ∈ B(N((∑xj ψJ M∗{j}) (∑xk ψK (∑xj' ψL M∗{j'})))),\n\nwhich we relax to\n\nP(xi) ∈ B(N(B(N(∑xj ψJ P{j})) B(N(∑xk ψK B(N(∑xj' ψL P{j'})))))).\n\nNow it may seem that this smallest bounding box would be difficult to compute. Fortunately, we only need to compute the extreme points of these sets because of convexity. Since smallest bounding boxes only depend on extreme points, we conclude that\n\nP(xi) ∈ B(N(B(N(∑xj ψJ ext P{j})) B(N(∑xk ψK B(N(∑xj' ψL ext P{j'})))))),\n\nwhich can be calculated efficiently if the number of possible values of each variable is small.\n\n2.4 The main result\n\nThe example in the previous subsection can be generalized as follows. 
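Before turning to the generalization, the bound of Section 2.3 can be checked numerically. The sketch below (our own illustration; the binary factor tables and the helper names normalize and bbox are hypothetical, not from the paper) enumerates the extreme points of the simplices and the corners of the intermediate boxes for the Figure 1(a) example, and verifies that the resulting box contains the brute-force marginal:

```python
import itertools

# Hypothetical factor tables for Figure 1(a), with binary variables 0/1.
psi_J = {(xi, xj): 1.0 + 0.5 * xi * xj for xi in (0, 1) for xj in (0, 1)}
psi_K = {(xi, xk): 1.0 + 0.3 * xi * xk for xi in (0, 1) for xk in (0, 1)}
psi_L = {(xj, xk): 1.0 + 0.7 * xj * xk for xj in (0, 1) for xk in (0, 1)}

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def bbox(vectors):
    """Smallest bounding box: componentwise min and max."""
    return ([min(c) for c in zip(*vectors)], [max(c) for c in zip(*vectors)])

# Inner box: B N(sum_{xj'} psi_L ext P_{j'}), a function of xk; extreme
# points of the simplex P_{j'} are delta distributions on xj' = v.
lo_L, up_L = bbox([normalize([psi_L[(v, xk)] for xk in (0, 1)])
                   for v in (0, 1)])

# Middle box: B N(sum_{xk} psi_K * corner) over the corners of the box.
mids = []
for corner in itertools.product(*zip(lo_L, up_L)):
    mids.append(normalize([sum(psi_K[(xi, xk)] * corner[xk] for xk in (0, 1))
                           for xi in (0, 1)]))
lo_K, up_K = bbox(mids)

# Outer box: normalize products of the psi_J message (for each extreme
# point of P_j) with each corner of the middle box, then take the box.
outs = []
for v in (0, 1):
    for corner in itertools.product(*zip(lo_K, up_K)):
        outs.append(normalize([psi_J[(xi, v)] * corner[xi] for xi in (0, 1)]))
lo_i, up_i = bbox(outs)

# Exact marginal P(xi) by brute force; it must lie inside the box.
joint = {(xi, xj, xk): psi_J[(xi, xj)] * psi_K[(xi, xk)] * psi_L[(xj, xk)]
         for xi in (0, 1) for xj in (0, 1) for xk in (0, 1)}
Z = sum(joint.values())
P_i = [sum(joint[(xi, xj, xk)] for xj in (0, 1) for xk in (0, 1)) / Z
       for xi in (0, 1)]
assert all(lo_i[x] <= P_i[x] <= up_i[x] for x in (0, 1))
```

With these smooth positive factors the box is fairly loose; the point of the sketch is only that every step reduces to enumerating finitely many extreme points.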
First, one chooses a particular\nsubtree of the factor graph, rooted in the variable for which one wants to calculate a bound on its\nmarginal. Then, one propagates messages (which are either bounding boxes or simplices) over this\nsubtree, from the leaf nodes towards the root node. The update equations resemble those of Belief\nPropagation. The resulting \u201cbelief\u201d at the root node is a box that bounds the exact marginal of\nthe root node. The choice of the subtree is arbitrary; different choices lead to different bounds in\ngeneral. We now describe this \u201cbox propagation\u201d algorithm in more detail.\nDe\ufb01nition 5 Let (V,F,E) be a factor graph. We call the bipartite graph (V, F, E) a subtree of\n(V,F,E) with root i if i \u2208 V \u2286 V, F \u2286 F, E \u2286 E such that (V, F, E) is a tree with root i and for\nall {j, J} \u2208 E, j \u2208 V and J \u2208 F (i.e., there are no \u201cloose edges\u201d).1 We denote the parent of j \u2208 V\naccording to (V, F, E) by par(j) and similarly, we denote the parent of J \u2208 F by par(J).\nAn illustration of a possible subtree of the factor graph in Figure 1(a) is the one shown in Figure\n1(d). The bound that we will obtain using this subtree corresponds to the example described in the\nprevious subsection.\nIn the following, we will use the topology of the original factor graph (V,F,E) whenever we refer\nto neighbors of variables or factors. Each edge of the subtree will carry one message, oriented such\nthat it \u201c\ufb02ows\u201d towards the root node. In addition, we de\ufb01ne messages entering the subtree for all\n\u201cmissing\u201d edges in the subtree (see also Figure 1(e)). Because of the bipartite character of the factor\ngraph, we can distinguish two types of messages: messages BJ\u2192j \u2286 Mj sent to a variable j \u2208 V\nfrom a neighboring factor J \u2208 Nj, and messages Bj\u2192J \u2286 Mj sent to a factor J \u2208 F from a\nneighboring variable j \u2208 NJ. 
The messages entering the subtree are all defined to be simplices; more precisely, we define the incoming messages\n\nBj→J = Pj   for all J ∈ F, {j, J} ∈ E \ E,\nBJ→j = Pj   for all j ∈ V, {j, J} ∈ E \ E.\n\nWe propagate messages towards the root i of the tree using the following update rules (note the similarity with the BP update rules). The message sent from a variable j ∈ V to its parent J = par(j) ∈ F is defined as\n\nBj→J = ∏K∈Nj\J BK→j   if all incoming BK→j are boxes,\nBj→J = Pj             if at least one of the BK→j is the simplex Pj,\n\nwhere the product of the boxes can be calculated using Lemma 3. The message sent from a factor J ∈ F to its parent k = par(J) ∈ V is defined as\n\nBJ→k = B(N(∑xNJ\k ψJ ∏l∈NJ\k Bl→J)) = B(N(∑xNJ\k ψJ ∏l∈NJ\k ext Bl→J)),   (2)\n\nwhere the second equality follows from Lemmas 1 and 2. The final “belief” Bi at the root node i is calculated by\n\nBi = B(N(∏K∈Ni BK→i))   if all incoming BK→i are boxes,\nBi = Pi                 if at least one of the BK→i is the simplex Pi.\n\n¹Note that this corresponds to the notion of subtree of a bipartite graph; for a subtree of a factor graph, one sometimes imposes the additional constraint that for all factors J ∈ F, all its connecting edges {J, j} with j ∈ NJ have to be in E; here we do not impose this additional constraint.\n\nWe can now formulate our main result, which gives a rigorous bound on the exact single-variable marginal of the root node:\n\nTheorem 6 Let (V,F,E) be a factor graph with corresponding probability distribution (1). Let i ∈ V and (V, F, E) be a subtree of (V,F,E) with root i ∈ V. 
Apply the “box propagation” algorithm described above to calculate the final “belief” Bi on the root node i. Then P(xi) ∈ Bi.\n\nProof sketch The first step consists in extending the subtree such that each factor node has the right number of neighboring variables, by cloning the missing variables. The second step consists of applying the basic lemma, where the set C consists of all the variable nodes of the subtree which have connecting edges in E \ E, together with all the cloned variable nodes. Then we apply the distributive law, which can be done because the extended subtree has no cycles. Finally, we relax the bound by adding additional normalizations and smallest bounding boxes at each factor node in the subtree. It should now be clear that the “box propagation” algorithm described above precisely calculates the smallest bounding box at the root node i that corresponds to this procedure. □\n\nBecause each subtree of the original factor graph is also a subtree of the computation tree for i [12], the bounds on the (exact) marginals that we just derived are at the same time bounds on the approximate Belief Propagation marginals (beliefs):\n\nCorollary 7 In the situation described in Theorem 6, the final bounding box Bi also bounds the (approximate) Belief Propagation marginal of the root node i, i.e., PBP(xi) ∈ Bi. □\n\n2.5 Related work\n\nWe briefly discuss the relationship of our bound to previous work. 
More details are provided in [11]. The bound in [6] is related to the bound we present here; however, the bound in [6] differs from ours in that it (i) goes deeper into the computation tree by propagating bounds over self-avoiding-walk (SAW) trees instead of mere subtrees, (ii) uses a different parameterization of the propagated bounds and a different update rule, and (iii) is only formulated for the special case of factors depending on two variables, while it is not entirely obvious how to extend the result to more general factor graphs.\n\nAnother method to obtain bounds on exact marginals is “Bound Propagation” [10]. The idea underlying Bound Propagation is very similar to the one employed in this work, with one crucial difference. For a variable i ∈ V, we define the sets ∆i := ∪Ni (consisting of all variables that appear in some factor in which i participates) and ∂i := ∆i \ {i} (the Markov blanket of i). Whereas our method uses a cavity approach, using as basis equation\n\nP(xi) ∝ ∑x∂i (∏I∈Ni ψI) P\i(x∂i),   where   P\i(x∂i) ∝ ∑xV\∆i ∏I∈F\Ni ψI,\n\nand bounds the quantity P(xi) by optimizing over P\i(x∂i), the basis equation employed by Bound Propagation is P(xi) = ∑x∂i P(xi | x∂i) P(x∂i), and the optimization is over P(x∂i). Unlike in our case, the computational complexity of Bound Propagation is exponential in the size of the Markov blanket, because of the required calculation of the conditional distribution P(xi | x∂i). On the other hand, the advantage of this approach is that a bound on P(xj) for j ∈ ∂i is also a bound on P(x∂i), which in turn gives rise to a bound on P(xi). In this way, bounds can propagate through the graphical model, eventually yielding a new (tighter) bound on P(x∂i). 
Although the iteration can result in rather tight bounds, the main disadvantage of Bound Propagation is its computational cost: it is exponential in the Markov blanket and often many iterations are needed for the bounds to become tight.\n\nFigure 2: Comparisons of various methods on different factor graphs: PROMEDAS (left), a large grid with strong interactions (middle) and a small grid with medium-strength interactions (right).\n\n3 Experiments\n\nIn this section, we present only a few empirical results due to space constraints. More details and additional experimental results are given in [11]. We have compared different methods for calculating bounds on single-variable marginals; for each method and each variable, we calculated the gap (tightness) of the bound, which we defined as the ℓ∞ distance between the upper and lower bound of the bounding box. We have investigated three different types of factor graphs; the results are shown in Figure 2. The factor graphs used for our experiments are provided as supplementary material to the electronic version of this article at books.nips.cc. We also plan to release the source code of several methods as part of a new release of the approximate inference library libDAI [13]. For our method, we chose the subtrees in a breadth-first manner.\n\nFirst, we applied our bound on simulated PROMEDAS patient cases [14]. These factor graphs have binary variables and singleton, pairwise and triple interactions (containing zeros). We generated nine different random instances. For each instance, we calculated bounds for each “finding” variable in that instance using our method (“BOXPROP”) and the method in [10]. Note that the tightness of both bounds varies widely depending on the instance and on the variable of interest. Our bound was tighter than the bound from [10] for all but one out of 1270 variables. 
Furthermore, whereas [10] had only finished on 7 out of 9 instances after running for 75000 s (after which we decided to abort the calculation on the remaining two instances), our method only needed 51 s to calculate all nine instances.\n\nWe also compared our method with the method described in [6] on a large grid of 100 × 100 binary (±1-valued) variables with strong interactions. Note that this is an intractable problem for exact inference methods. The single-variable factors were chosen as exp(θi xi) with θi ∼ N(0, 1), the pair factors were exp(θij xi xj) with θij ∼ N(0, 1). We truncated the subtree to 400 nodes and the SAW tree to 10^5 nodes. Note that our method yields the tightest bound for almost all variables.\n\nFinally, we compared our method with several other methods referred to in Section 1 on a small 8 × 8 grid with medium-strength interactions (similarly chosen as for the large grid, but with θi ∼ N(0, 0.2²) and θij ∼ N(0, 0.2²)). The small size of the grid was necessary because some methods would need several days to handle a large grid. In this case, the method by [6] yields the tightest bounds, followed by [10], and our method gets a third place. Note that many methods return completely uninformative bounds in this case.\n\n4 Conclusion and discussion\n\nWe have described a novel bound on exact single-variable marginals, which is at the same time a bound on the (approximate) Belief Propagation marginals. Contrary to many other existing bounds, it is formulated for the general case of factor graphs with discrete variables and factors depending on an arbitrary number of variables. The bound is calculated by propagating convex sets of measures over a subtree of the factor graph, with update equations resembling those of BP. For variables with a limited number of possible values, the bounds can be computed efficiently. 
We have compared our bounds with existing methods and conclude that our method belongs to the best methods, but that it is difficult to say in general which method will yield the tightest bounds for a given variable in a specific factor graph. Our method could be further improved by optimizing over the choice of the subtree.\n\nAlthough our bounds are a step forward in quantifying the error of Belief Propagation, the actual error made by BP is often at least one order of magnitude lower than the tightness of these bounds. This is due to the fact that (loopy) BP cycles information through loops in the factor graph; this cycling apparently often improves the results. The interesting and still unanswered question is why it makes sense to cycle information in this way and whether this error reduction effect can be quantified.\n\nAcknowledgments\n\nWe thank Wim Wiegerinck for several fruitful discussions, Bastian Wemmenhove for providing the PROMEDAS test cases, and Martijn Leisink for kindly providing his implementation of Bound Propagation. The research reported here was supported by the Interactive Collaborative Information Systems (ICIS) project (supported by the Dutch Ministry of Economic Affairs, grant BSIK03024), the Dutch Technology Foundation (STW), and the IST Programme of the European Community, under the PASCAL2 Network of Excellence, IST-2007-216886.\n\nReferences\n\n[1] G.F. Cooper. The computational complexity of probabilistic inferences. Artificial Intelligence, 42(2-3):393–405, March 1990.\n\n[2] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. 
Morgan Kaufmann, San Francisco, CA, 1988.\n\n[3] M.J. Wainwright, T.S. Jaakkola, and A.S. Willsky. Tree-based reparameterization framework for analysis of sum-product and related algorithms. IEEE Transactions on Information Theory, 49(5):1120–1146, May 2003.\n\n[4] S. C. Tatikonda. Convergence of the sum-product algorithm. In Proceedings 2003 IEEE Information Theory Workshop, pages 222–225, April 2003.\n\n[5] Nobuyuki Taga and Shigeru Mase. Error bounds between marginal probabilities and beliefs of loopy belief propagation algorithm. In MICAI, pages 186–196, 2006.\n\n[6] A. Ihler. Accuracy bounds for belief propagation. In Proceedings of the 23rd Annual Conference on Uncertainty in Artificial Intelligence (UAI-07), July 2007.\n\n[7] T. S. Jaakkola and M. Jordan. Recursive algorithms for approximating probabilities in graphical models. In Proc. Conf. Neural Information Processing Systems (NIPS 9), pages 487–493, Denver, CO, 1996.\n\n[8] M. J. Wainwright, T. Jaakkola, and A. S. Willsky. A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory, 51:2313–2335, July 2005.\n\n[9] M. A. R. Leisink and H. J. Kappen. A tighter bound for graphical models. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems 13 (NIPS*2000), pages 266–272, Cambridge, MA, 2001. MIT Press.\n\n[10] M. Leisink and B. Kappen. Bound propagation. Journal of Artificial Intelligence Research, 19:139–154, 2003.\n\n[11] J. M. Mooij and H. J. Kappen. Novel bounds on marginal probabilities. arXiv.org, arXiv:0801.3797 [math.PR], January 2008. Submitted to Journal of Machine Learning Research.\n\n[12] S. C. Tatikonda and M. I. Jordan. Loopy belief propagation and Gibbs measures. In Proc. of the 18th Annual Conf. 
on Uncertainty in Artificial Intelligence (UAI-02), pages 493–500, San Francisco, CA, 2002. Morgan Kaufmann Publishers.\n\n[13] J. M. Mooij. libDAI: A free/open source C++ library for discrete approximate inference methods, 2008. http://mloss.org/software/view/77/.\n\n[14] B. Wemmenhove, J. M. Mooij, W. Wiegerinck, M. Leisink, H. J. Kappen, and J. P. Neijt. Inference in the Promedas medical expert system. In Proceedings of the 11th Conference on Artificial Intelligence in Medicine (AIME 2007), volume 4594 of Lecture Notes in Computer Science, pages 456–460. Springer, 2007.\n", "award": [], "sourceid": 950, "authors": [{"given_name": "Joris", "family_name": "Mooij", "institution": null}, {"given_name": "Hilbert", "family_name": "Kappen", "institution": null}]}