{"title": "A Differential Semantics for Jointree Algorithms", "book": "Advances in Neural Information Processing Systems", "page_first": 801, "page_last": 808, "abstract": "", "full_text": "A Di\ufb01erential Semantics for Jointree\n\nAlgorithms\n\nJames D. P ark and Adnan Darwic he\n\nComputer Science Department\n\nUniv ersity of California, Los Angeles, CA 90095\n\nfjd,darwicheg@cs.ucla.edu\n\nAbstract\n\nA new approach to inference in belief networks has been recently\nproposed, which is based on an algebraic representation of belief\nnetworks using multi{linear functions. According to this approach,\nthe key computational question is that of representing multi{linear\nfunctions compactly, since inference reduces to a simple process of\nev aluating and di\ufb01erentiating such functions. W e show here that\nmainstream inference algorithms based on jointrees are a special\ncase of this approach in a v ery precise sense. W e use this result to\nprov e new properties of jointree algorithms, and then discuss some\nof its practical and theoretical implications.\n\n1\n\nIntroduction\n\nIt was recently shown that the probability distribution of a belief network can be\nrepresented using a multi{linear function, and that most probabilistic queries of\ninterest can be retriev ed directly from the partial deriv ativ es of this function [2].\nAlthough the multi{linear function has an exponential number of terms, it can\nbe represented using a small arithmetic circuit in certain situations [3].1 Once\na belief network is represented as an arithmetic circuit, probabilistic inference is\nthen performed by ev aluating and di\ufb01erentiating the circuit, using a v ery simple\nprocedure which resembles back{propagation in neural networks.\n\nW e show in this paper that mainstream inference algorithms based on jointrees [14,\n8] are a special-case of the arithmetic{circuit approach proposed in [2]. 
Specifically, we show that each jointree is an implicit representation of an arithmetic circuit; that the inward pass in jointree propagation evaluates this circuit; and that the outward pass differentiates it. Using these results, we prove new useful properties of jointree propagation. We also suggest a new interpretation of the process of factoring graphical models into jointrees, as a process of factoring exponentially sized multi-linear functions into arithmetic circuits of smaller size.

¹For example, it was shown recently that real-world belief networks with treewidth up to 60 can be compiled into arithmetic circuits with a few thousand nodes [3]. Such networks have local structure, and are outside the scope of mainstream algorithms for inference in belief networks, whose complexity is exponential in treewidth.

  A      B      θ_{B|A}
  true   true   θ_{b|a} = .2
  true   false  θ_{b̄|a} = .8
  false  true   θ_{b|ā} = .7
  false  false  θ_{b̄|ā} = .3

  A      θ_A
  true   θ_a = .6
  false  θ_ā = .4

  A      C      θ_{C|A}
  true   true   θ_{c|a} = .8
  true   false  θ_{c̄|a} = .2
  false  true   θ_{c|ā} = .15
  false  false  θ_{c̄|ā} = .85

Figure 1: The CPTs of belief network B ← A → C.

This paper is structured as follows. Sections 2 and 3 are dedicated to a review of inference approaches based on arithmetic circuits and jointrees. Section 4 shows that the jointree approach is a special case of the arithmetic-circuit approach, and discusses some practical implications of this finding. Finally, Section 5 closes with a new perspective on factoring graphical models. Proofs of all theorems can be found in the long version of this paper [11].

2 Belief networks as multi-linear functions

A belief network is a factored representation of a probability distribution. 
It consists of two parts: a directed acyclic graph (DAG) and a set of conditional probability tables (CPTs). For each node X and its parents U, we have a CPT that specifies the distribution of X given each instantiation u of the parents; see Figure 1.²

A belief network is a representational factorization of a probability distribution, not a computational one. That is, although the network compactly represents the distribution, it needs to be processed further if one is to obtain answers to arbitrary probabilistic queries. Mainstream algorithms for inference in belief networks operate on the network to generate a computational factorization, allowing one to answer queries in time which is linear in the factorization size. A most influential computational factorization of belief networks is the jointree [14, 8, 6]. Standard jointree factorizations are structure-based: their size depends only on the network topology and is invariant to local CPT structure. This observation has triggered much research into alternative, finer-grained factorizations, since real-world networks can exhibit significant local structure that leads to smaller factorizations if exploited.

We discuss next one of the latest proposals in this direction, which calls for using arithmetic circuits as a computational factorization of belief networks [2]. This proposal is based on viewing each belief network as a multi-linear function, which can be represented compactly using an arithmetic circuit. The multi-linear function itself contains two types of variables. First, evidence indicators, where for each variable X in the network, we have a variable λ_x for each value x of X. 
Second, network parameters, where for each variable X and its parents U in the network, we have a variable θ_{x|u} for each value x of X and instantiation u of U.

The multi-linear function has a term for each instantiation of the network variables, which is constructed by multiplying all evidence indicators and network parameters that are consistent with that instantiation. For example, the multi-linear function of the network in Figure 1 has eight terms corresponding to the eight instantiations of variables A, B, C:

  f = λ_a λ_b λ_c θ_a θ_{b|a} θ_{c|a} + λ_a λ_b λ_c̄ θ_a θ_{b|a} θ_{c̄|a} + ... + λ_ā λ_b̄ λ_c̄ θ_ā θ_{b̄|ā} θ_{c̄|ā}.

We will often refer to such a multi-linear function as the network polynomial.

²Variables are denoted by upper-case letters (A) and their values by lower-case letters (a). Sets of variables are denoted by bold-face upper-case letters (A) and their instantiations are denoted by bold-face lower-case letters (a). For a variable A with values true and false, we use a to denote A = true and ā to denote A = false. 
Finally, for a variable X and its parents U, we use θ_{x|u} to denote the CPT entry corresponding to Pr(x | u).

Figure 2: On the left: an arithmetic circuit which computes the function λ_a λ_b θ_a θ_{b|a} + λ_a λ_b̄ θ_a θ_{b̄|a} + λ_ā λ_b θ_ā θ_{b|ā} + λ_ā λ_b̄ θ_ā θ_{b̄|ā}. The circuit is a DAG, where leaf nodes represent function variables and internal nodes represent arithmetic operations. On the right: a belief network structure over variables A, B, C, D, E (with families A, B|A, C|A, D|BC, E|C) and its corresponding jointree, with clusters ABC, BCD and CE.

Given the network polynomial f, we can answer any query with respect to the belief network. Specifically, let e be an instantiation of some network variables, and suppose we want to compute the probability of e. We can do this by simply evaluating the polynomial f while setting each evidence indicator λ_x to 1 if x is consistent with e, and to 0 otherwise. For the network in Figure 1, we can compute the probability of evidence e = bc̄ by evaluating its polynomial above under λ_a = 1, λ_ā = 1, λ_b = 1, λ_b̄ = 0 and λ_c = 0, λ_c̄ = 1. This leads to θ_a θ_{b|a} θ_{c̄|a} + θ_ā θ_{b|ā} θ_{c̄|ā}, which equals the probability of b, c̄ in this case. 
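As a concrete check of the evaluation just described, the eight-term polynomial of the network in Figure 1 can be enumerated and evaluated by brute force. This is an illustrative sketch only (the dictionary and function names are ours), and the exponential enumeration it performs is exactly what the compact circuits discussed next avoid:

```python
from itertools import product

# CPTs of the network B <- A -> C from Figure 1.
theta_A = {'a': 0.6, 'na': 0.4}
theta_B = {('b', 'a'): 0.2, ('nb', 'a'): 0.8,
           ('b', 'na'): 0.7, ('nb', 'na'): 0.3}   # entries of theta_{B|A}
theta_C = {('c', 'a'): 0.8, ('nc', 'a'): 0.2,
           ('c', 'na'): 0.15, ('nc', 'na'): 0.85}  # entries of theta_{C|A}

def network_polynomial(lam_A, lam_B, lam_C):
    # One term per instantiation of A, B, C: the product of all evidence
    # indicators and network parameters consistent with that instantiation.
    return sum(lam_A[a] * lam_B[b] * lam_C[c] *
               theta_A[a] * theta_B[(b, a)] * theta_C[(c, a)]
               for a, b, c in product(('a', 'na'), ('b', 'nb'), ('c', 'nc')))

# Evidence e = b, c-bar: indicators inconsistent with e are set to 0.
p_e = network_polynomial({'a': 1, 'na': 1},
                         {'b': 1, 'nb': 0},
                         {'c': 0, 'nc': 1})
# p_e = theta_a theta_{b|a} theta_{nc|a} + theta_na theta_{b|na} theta_{nc|na}
#     = 0.6*0.2*0.2 + 0.4*0.7*0.85 = 0.262
```

With all indicators set to 1 the polynomial sums every term and therefore evaluates to 1, the total probability mass.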
We use f(e) to denote the result of evaluating the polynomial f under evidence e as given above.

This algebraic representation of belief networks is attractive as it allows us to obtain answers to a large number of probabilistic queries directly from the derivatives of the network polynomial [2]. For example, the posterior marginal Pr(x | e) for a variable X ∉ E equals (1/f(e)) ∂f(e)/∂λ_x, where ∂f(e)/∂λ_x is the partial derivative of f with respect to λ_x evaluated at e. Second, the probability of evidence e after having retracted the value of some variable X from e, Pr(e − X), equals Σ_x ∂f(e)/∂λ_x. Third, the posterior marginal Pr(x, u | e) for a variable X and its parents U equals (θ_{x|u}/f(e)) ∂f(e)/∂θ_{x|u}.

The network polynomial has an exponential number of terms, yet one can represent it compactly in certain cases using an arithmetic circuit; see Figure 2. The (first) partial derivatives of an arithmetic circuit can all be computed simultaneously in time linear in the circuit size [2, 12]. The procedure resembles the back-propagation algorithm for neural networks, as it evaluates the circuit in a single upward pass and then differentiates it through a single downward pass.

The main computational question is then that of generating the smallest arithmetic circuit that computes the network polynomial. A structure-based approach for this has been given in [2], which is guaranteed to generate a circuit whose size is bounded by O(n exp(w)), where n is the number of nodes in the network and w is its treewidth. A more recent approach, however, which exploits local structure, has been presented in [3] and was shown experimentally to generate small arithmetic circuits for networks whose treewidth is up to 60. 
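The single upward/downward procedure mentioned above can be made concrete with a minimal circuit evaluator and differentiator. The class and function names below are ours; the circuit built at the bottom is the one shown on the left of Figure 2, for the two-variable network A → B:

```python
import math

class Node:
    # A circuit node: op is 'leaf', '+' or '*'; leaves carry their value.
    def __init__(self, op, children=(), value=None):
        self.op = op
        self.children = list(children)
        self.value = value
        self.deriv = 0.0

def topo(root):
    # Reverse postorder of the DAG: every parent precedes its children.
    order, seen = [], set()
    def visit(v):
        if id(v) in seen:
            return
        seen.add(id(v))
        for ch in v.children:
            visit(ch)
        order.append(v)
    visit(root)
    return order[::-1]

def evaluate(root):
    # Upward pass: compute node values, children before parents.
    for v in reversed(topo(root)):
        if v.op == '+':
            v.value = sum(ch.value for ch in v.children)
        elif v.op == '*':
            v.value = math.prod(ch.value for ch in v.children)
    return root.value

def differentiate(root):
    # Downward pass (back-propagation): set v.deriv = df/dv for every node.
    order = topo(root)
    for v in order:
        v.deriv = 0.0
    root.deriv = 1.0
    for v in order:
        for ch in v.children:
            if v.op == '+':
                ch.deriv += v.deriv
            elif v.op == '*':
                ch.deriv += v.deriv * math.prod(
                    c.value for c in v.children if c is not ch)

def leaf(x):
    return Node('leaf', value=x)

# Circuit of Figure 2 for A -> B, with all evidence indicators set to 1.
la, lna, lb, lnb = leaf(1), leaf(1), leaf(1), leaf(1)
ta, tna = leaf(0.6), leaf(0.4)
tba, tnba, tbna, tnbna = leaf(0.2), leaf(0.8), leaf(0.7), leaf(0.3)
s_a = Node('+', [Node('*', [lb, tba]), Node('*', [lnb, tnba])])
s_na = Node('+', [Node('*', [lb, tbna]), Node('*', [lnb, tnbna])])
f = Node('+', [Node('*', [la, ta, s_a]), Node('*', [lna, tna, s_na])])

evaluate(f)        # f.value = 1.0 under no evidence
differentiate(f)   # lb.deriv = Pr(b) = 0.4, la.deriv = Pr(a) = 0.6
```

Entering evidence amounts to setting the corresponding λ leaves to 0 before the upward pass; the same two passes then yield f(e) and all partial derivatives at once.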
As we show in the rest of this paper, the process of factoring a belief network into a jointree is yet another method for generating an arithmetic circuit for the network. Specifically, we show that the jointree structure is an implicit representation of such a circuit, and that jointree propagation corresponds to circuit evaluation and differentiation. Moreover, the difference between Shenoy-Shafer and Hugin propagation turns out to be a difference in the numeric scheme used for circuit differentiation [11].

3 Jointree Algorithms

We now review jointree algorithms, which are quite influential in graphical models. Let B be a belief network. A jointree for B is a pair (T, L), where T is a tree and L is a function that assigns labels to nodes in T. A jointree must satisfy three properties: (1) each label L(i) is a set of variables in the belief network; (2) each network variable X and its parents U (a family) must appear together in some label L(i); (3) if a variable appears in the labels of i and j, it must also appear in the label of each node k on the path connecting them. The label of edge ij in T is defined as L(i) ∩ L(j). We will refer to the nodes of a jointree (and sometimes their labels) as clusters. We will also refer to the edges of a jointree (and sometimes their labels) as separators. Figure 2 depicts a belief network and one of its jointrees.

Jointree algorithms start by constructing a jointree for a given belief network [14, 8, 6]. They also associate tables (also called potentials) with clusters and separators.³ The conditional probability table (CPT or CP table) of each variable X with parents U, denoted θ_{X|U}, is assigned to a cluster that contains X and U. 
In addition, an evidence table over variable X, denoted λ_X, is assigned to a cluster that contains X. Figure 2 depicts the assignment of evidence and CP tables to clusters. Evidence e is entered into a jointree by initializing evidence tables as follows: we set λ_X(x) to 1 if x is consistent with evidence e, and we set λ_X(x) to 0 otherwise.

Given some evidence e, a jointree algorithm propagates messages between clusters. After passing two messages per edge in the jointree, one can compute the marginals Pr(C, e) for every cluster C. There are two main methods for propagating messages in a jointree: the Shenoy-Shafer architecture [14] and the Hugin architecture [8].

Shenoy-Shafer propagation proceeds as follows [14]. First, evidence e is entered into the jointree. A cluster is then selected as the root, and message propagation proceeds in two phases, inward and outward. In the inward phase, messages are passed toward the root. In the outward phase, messages are passed away from the root. Cluster i sends a message to cluster j only when it has received messages from all its other neighbors k. A message from cluster i to cluster j is a table M_ij defined as follows: M_ij = Σ_{C\S} ψ_i Π_{k≠j} M_ki, where C are the variables of cluster i, S are the variables of separator ij, and ψ_i is the product of all evidence and CP tables assigned to cluster i. Once message propagation is finished, we have Pr(C, e) = ψ_i Π_k M_ki, where C are the variables of cluster i.

Hugin propagation proceeds similarly to Shenoy-Shafer by entering evidence, selecting a cluster as root, and propagating messages in two phases, inward and outward [8]. The Hugin method, however, differs in some major ways. It maintains a table φ_ij with each separator, whose entries are initialized to 1s. It also maintains a table φ_i with each cluster i, initialized to the product of all CPTs and evidence tables assigned to cluster i. 
Cluster i passes a message to neighboring cluster j only when i has received messages from all its other neighbors k. When cluster i is ready to send a message to cluster j, it does the following. First, it saves the table of separator ij, φ_ij, into φ_ij^old. Second, it computes a new separator table φ_ij = Σ_{C\S} φ_i, where C are the variables of cluster i and S are the variables of separator ij. Third, it computes a message to cluster j: M_ij = φ_ij / φ_ij^old. Finally, it multiplies the computed message into the table of cluster j: φ_j = φ_j M_ij. After the inward and outward passes of Hugin propagation are completed, we have Pr(C, e) = φ_i, where C are the variables of cluster i.

³A table is an array which is indexed by variable instantiations. Specifically, a table ψ over variables X is indexed by the instantiations x of X. Its entries ψ(x) are in [0, 1].

4 Jointrees as arithmetic circuits

We now show that every jointree (together with a root cluster and a particular assignment of evidence and CP tables to clusters) corresponds precisely to an arithmetic circuit that computes the network polynomial. We also show that the inward pass of the Shenoy-Shafer architecture evaluates this circuit, while the outward pass differentiates it. 
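This correspondence is easy to see on the tiny jointree A − AB of Figure 2. The sketch below (variable names and layout are ours) runs Shenoy-Shafer propagation with cluster A as root on the network A → B with the Figure 1 CPTs: the inward message carries the values of the separator's addition nodes, the outward message carries the derivatives of the circuit output with respect to them, and combining the two recovers f(e) = Pr(e):

```python
# CPTs of the network A -> B from Figure 1.
theta_A = {'a': 0.6, 'na': 0.4}
theta_B = {('b', 'a'): 0.2, ('nb', 'a'): 0.8,
           ('b', 'na'): 0.7, ('nb', 'na'): 0.3}

def propagate(lam_A, lam_B):
    # Inward pass: cluster AB multiplies its tables (lam_B, theta_{B|A})
    # and sums out B -- one entry per instantiation of separator A.
    m_in = {a: sum(lam_B[b] * theta_B[(b, a)] for b in ('b', 'nb'))
            for a in ('a', 'na')}
    # Outward pass: the root A multiplies its tables (lam_A, theta_A);
    # this message equals the derivative of f with respect to each
    # separator addition node s(a), since f is linear in s(a).
    m_out = {a: lam_A[a] * theta_A[a] for a in ('a', 'na')}
    # Circuit output: f(e) = sum_a s(a) * df/ds(a).
    f_e = sum(m_in[a] * m_out[a] for a in ('a', 'na'))
    return m_in, m_out, f_e

m_in, m_out, f_e = propagate({'a': 1, 'na': 1},   # A unobserved
                             {'b': 1, 'nb': 0})   # evidence e: B = b
```

Here f_e = 0.2·0.6 + 0.7·0.4 = 0.4 = Pr(b), matching a direct evaluation of the network polynomial under the same evidence.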
We show a similar result for the Hugin architecture.

Definition 1 Given a root cluster and a particular assignment of evidence and CP tables to clusters, the arithmetic circuit embedded in a jointree is defined as follows:⁴

Nodes: The circuit includes: an output addition node f; an addition node s for each instantiation of a separator S; a multiplication node c for each instantiation of a cluster C; an input node λ_x for each instantiation x of variable X; an input node θ_{x|u} for each instantiation xu of family XU.

Edges: The children of the output node f are the multiplication nodes generated by the root cluster; the children of an addition node s are all compatible nodes generated by the child cluster; the children of a multiplication node c are all compatible nodes generated by child separators, and all compatible input nodes assigned to cluster C.

Hence, separators contribute addition nodes and clusters contribute multiplication nodes. Moreover, the structure of the jointree dictates how these nodes are connected into a circuit. The arithmetic circuit in Figure 2 is embedded in the jointree A − AB, with cluster A as the root, and with tables λ_A, θ_A assigned to cluster A and tables λ_B and θ_{B|A} assigned to cluster AB.

Theorem 1 The circuit embedded in a jointree computes the network polynomial.

Therefore, by constructing a jointree, one is generating a compact representation of the network polynomial in terms of an arithmetic circuit.

We are now ready to state our basic results on the differential semantics of jointree propagation, but we need some notational conventions first. In the following three theorems: f denotes the circuit embedded in a jointree or its (unique) output node; s denotes a separator instantiation or the addition node generated by that instantiation; and c denotes a cluster instantiation or the multiplication node generated by that instantiation. 
Moreover, the value that a circuit node v takes under evidence e is denoted v(e). Recall that a circuit (or network polynomial) is evaluated under evidence e by setting each input λ_x to 1 if x is consistent with e, and to 0 otherwise. Finally, recall that ∂f/∂v represents the derivative of the circuit output with respect to node v. Our first result relates to Shenoy-Shafer propagation.

Theorem 2 The messages produced using Shenoy-Shafer propagation on a jointree under evidence e have the following semantics. For each inward message M_ij, we have M_ij(s) = s(e). For each outward message M_ji, we have M_ji(s) = ∂f(e)/∂s.

Hence, if we interpret separator instantiations as addition nodes in a circuit as given by Definition 1, we get that a message directed towards the jointree root contains the values of these addition nodes, while a message directed outward from the root contains the partial derivatives of the circuit output with respect to these nodes.

Shenoy-Shafer propagation does not compute derivatives with respect to input nodes λ_x and θ_{x|u}, but these can be obtained using local computations as follows.

⁴Given a root cluster, one can direct the jointree by having arrows point away from the root, which also defines a parent/child relationship between clusters and separators.

Theorem 3 If evidence table λ_X is assigned to cluster i with variables C:

  ∂f(e)/∂λ_x = [ Σ_{C\X} Π_j M_ji Π_{Ψ≠λ_X} Ψ ](x),    (1)

where Ψ ranges over all evidence and CP tables assigned to cluster i. 
Moreover, if CPT θ_{X|U} is assigned to cluster i with variables C:

  ∂f(e)/∂θ_{x|u} = [ Σ_{C\XU} Π_j M_ji Π_{Ψ≠θ_{X|U}} Ψ ](xu),    (2)

where Ψ ranges over all evidence and CP tables assigned to cluster i.

Therefore, even though Shenoy-Shafer propagation does not fully differentiate the embedded circuit, the differentiation process can be completed through local computations after propagation has finished.⁵

We now discuss some applications of the partial derivatives with respect to evidence indicators λ_x and network parameters θ_{x|u}.

Fast retraction & evidence flipping. Suppose jointree propagation has been performed using evidence e, which gives us direct access to the probability of e. Suppose now we are interested in the probability of a different evidence e′, which results from changing the value of some variable X in e to a new value x. The probability of e′ in this case is equal to ∂f(e)/∂λ_x [2], which can be obtained as given by Equation 1. The ability to perform this computation efficiently is crucial for algorithms that try to approximate the maximum a posteriori hypothesis (MAP) using local search [9, 10]. Another application of this derivative is in computing the probability of evidence e′ which results from retracting the value of some variable X from e: Pr(e′) = Σ_x ∂f(e)/∂λ_x. This computation is key to analyzing evidence conflict, as it allows us to determine the extent to which one piece of evidence is contradicted by the remaining pieces.

Sensitivity analysis & parameter learning. The derivative ∂Pr(e)/∂θ_{x|u} is essential for sensitivity analysis: it is the basis for an efficient approach that identifies minimal network parameter changes that are necessary to satisfy constraints on probabilistic queries [1]. 
⁵Hugin propagation also corresponds to circuit evaluation/differentiation:

Theorem 4 Cluster tables, separator tables and messages produced using Hugin propagation under evidence e have the following semantics. For table φ_i of cluster i with variables C: φ_i(c) = c(e) ∂f(e)/∂c. For table φ_ij of separator ij with variables S: φ_ij(s) = s(e) ∂f(e)/∂s. For each inward message M_ij, we have M_ij(s) = s(e). For each outward message M_ji, we have M_ji(s) = ∂f(e)/∂s if s(e) ≠ 0.

Again, Hugin propagation does not compute derivatives with respect to input nodes λ_x and θ_{x|u}. Even for addition and multiplication nodes, it only retains derivatives multiplied by values. Hence, if we want to recover the derivative with respect to, say, multiplication node c, we must know the value of this node and it must be different than zero. In such a case, we have ∂f(e)/∂c = φ_i(c)/c(e), where φ_i is the table associated with the cluster i that generates node c. One can also compute the quantity v ∂f/∂v for input nodes using equations similar to those in Theorem 3. But such quantities will be useful for obtaining derivatives only if the values of such input nodes are not zero. Hence, Shenoy-Shafer propagation is more informative than Hugin propagation as far as the computation of derivatives is concerned.

This derivative is also crucial for gradient-ascent approaches to learning network parameters, as it is required to compute the gradient used for deciding moves in the search space [13]. This derivative equals ∂f(e)/∂θ_{x|u}, and can be obtained as given by Equation 2. The only other method we are aware of to compute this derivative (beyond the one in [2]) is the one using the identity ∂Pr(e)/∂θ_{x|u} = Pr(x, u, e)/θ_{x|u}, which requires θ_{x|u} ≠ 0 [13]. 
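On the two-node network A → B with the Figure 1 CPTs, the agreement between the two methods is easy to check by brute force. This is an illustrative sketch with names of our own choosing; it exploits the fact that the network polynomial is linear in each parameter, so the exact derivative follows from just two evaluations:

```python
from itertools import product

# Network A -> B; evidence e fixes B = b, A is unobserved.
lam = {'a': 1, 'na': 1, 'b': 1, 'nb': 0}
theta = {'a': 0.6, 'na': 0.4,
         ('b', 'a'): 0.2, ('nb', 'a'): 0.8,
         ('b', 'na'): 0.7, ('nb', 'na'): 0.3}

def f(params):
    # Network polynomial of A -> B at the fixed evidence indicators above.
    return sum(lam[a] * lam[b] * params[a] * params[(b, a)]
               for a, b in product(('a', 'na'), ('b', 'nb')))

# f is linear in each parameter, so df/d theta_{b|a} is exactly the
# difference of f evaluated at theta_{b|a} = 1 and at theta_{b|a} = 0.
hi, lo = dict(theta), dict(theta)
hi[('b', 'a')], lo[('b', 'a')] = 1.0, 0.0
df = f(hi) - f(lo)   # dPr(e)/d theta_{b|a}; valid even if the parameter is 0

# The division identity Pr(x,u,e)/theta_{x|u} agrees when theta_{b|a} != 0.
pr_abe = lam['a'] * lam['b'] * theta['a'] * theta[('b', 'a')]  # Pr(a, b, e)
```

Here df = 0.6 and Pr(a, b, e) = 0.12 = θ_{b|a} · df, as the identity predicts; unlike the division identity, the derivative itself remains well defined at θ_{b|a} = 0.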
Hence, our results seem to suggest the first general approach for computing this derivative using standard jointree propagation.

Bounding rounding errors. Jointree propagation gives exact results only when infinite-precision arithmetic is used. In practice, however, finite-precision floating-point arithmetic is typically used, in which case the differential semantics of jointree propagation can be used to bound the rounding error in the computed probability of evidence. See the full paper [11] for details on computing this bound.

5 A new perspective on factoring graphical models

We have shown in this paper that each jointree can be viewed as an implicit representation of an arithmetic circuit which computes the network polynomial, and that jointree propagation corresponds to an evaluation and differentiation of the circuit. These results have been useful in unifying the circuit approach presented in [2] with jointree approaches, and in uncovering more properties of jointree propagation.

Another outcome of these results relates to the level at which it is useful to phrase the problem of factoring graphical probabilistic models. Specifically, the perspective we are promoting here is that probability distributions defined by graphical models should be viewed as multi-linear functions, and the construction of jointrees should be viewed as a process of constructing arithmetic circuits that compute these functions. That is, the fundamental object being factored is a multi-linear function, and the fundamental result of the factorization is an arithmetic circuit. 
A graphical model is a useful abstraction of the multi-linear function, and a jointree is a useful structure for embedding the arithmetic circuit.

This view of factoring is useful since it allows us to cast the factoring problem in more refined terms, which puts us in a better position to exploit the local structure of graphical models in the factorization process. Note that the topology of a graphical model defines the form of the multi-linear function, while the model's local structure (as exhibited in its CPTs) constrains the values of variables appearing in the function. One can factor a multi-linear function without knowledge of such constraints, but the resulting factorizations will not be optimal. For a dramatic example, consider a fully connected network with variables X_1, ..., X_n, where all parameters are equal to 1/2. Any jointree for the network will have a cluster of size n, leading to O(exp(n)) complexity. There is, however, a circuit of O(n) size here, since the network polynomial can be easily factored as: f = (1/2)^n Π_{i=1}^n (λ_{x_i} + λ_{x̄_i}). Hence, in the presence of local structure, it appears more promising to factor the graphical model into the more refined arithmetic circuit, since not every arithmetic circuit can be embedded in a jointree. This promise is made apparent by the results in [3], which we sketch next. First, the multi-linear function of a belief network is 'encoded' using a propositional theory, which is expressive enough to capture the form of the multi-linear function in addition to constraints on its variables. The theory is then compiled into a special logical form, known as deterministic decomposable negation normal form. An arithmetic circuit is finally extracted from that form. 
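The gap between the two factorizations above can be observed numerically for small n. The sketch below (variable names are hypothetical, ours only) evaluates the linear-size product form against a full enumeration of the 2^n polynomial terms and confirms they agree:

```python
import math
from itertools import product

n = 10
# Evidence sets the first three variables to true; the rest are unobserved.
# lam[i] = (indicator for X_i = true, indicator for X_i = false).
lam = [(1, 0)] * 3 + [(1, 1)] * (n - 3)

# O(n)-size circuit: f = (1/2)^n * prod_i (lam_{x_i} + lam_{xbar_i}).
f_circuit = 0.5 ** n * math.prod(lx + lnx for lx, lnx in lam)

# Exponential enumeration of all 2^n terms of the network polynomial.
f_brute = sum(0.5 ** n * math.prod(lam[i][bit] for i, bit in enumerate(bits))
              for bits in product((0, 1), repeat=n))
```

Both evaluate to (1/2)^3 = 0.125, the probability of the three fixed values, but the product form touches each variable once while the enumeration grows as 2^n.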
The method was able to generate relatively small arithmetic circuits for a significant suite of real-world belief networks with treewidths up to 60.

It is worth mentioning here that the above perspective is in harmony with recent approaches that represent probabilistic models using algebraic decision diagrams (ADDs), citing the promise of ADDs in exploiting local structure [5]. ADDs and related representations, such as edge-valued decision diagrams, are known to be compact representations of multi-linear functions. Moreover, each of these representations can be expanded in linear time into an arithmetic circuit that satisfies some strong properties [4]. Hence, such representations are special cases of arithmetic circuits as well.

We finally note that the relationship between multi-linear functions (polynomials in general) and arithmetic circuits is a classical subject of algebraic complexity theory [15]. In this field of complexity, computational problems are expressed as polynomials, and a central question is that of determining the size of the smallest arithmetic circuit that computes a given polynomial, leading to the notion of circuit complexity. Using this notion, it is then meaningful to talk about the circuit complexity of a graphical model: the size of the smallest arithmetic circuit that computes the multi-linear function induced by the model.

Acknowledgment This work has been partially supported by NSF grant IIS-9988543 and MURI grant N00014-00-1-0617.

References

[1] H. Chan and A. Darwiche. When do numbers really matter? JAIR, 17:265-287, 2002.

[2] A. Darwiche. A differential approach to inference in Bayesian networks. In UAI-00, pages 123-132, 2000. To appear in JACM.

[3] A. Darwiche. A logical approach to factoring belief networks. In KR-02, pages 409-420, 2002.

[4] A. Darwiche. On the factorization of multi-linear functions. 
Technical Report D-128, UCLA, Los Angeles, CA 90095, 2002.

[5] J. Hoey, R. St-Aubin, A. Hu, and C. Boutilier. SPUDD: Stochastic planning using decision diagrams. In UAI-99, pages 279-288, 1999.

[6] C. Huang and A. Darwiche. Inference in belief networks: A procedural guide. IJAR, 15(3):225-263, 1996.

[7] M. Iri. Simultaneous computation of functions, partial derivatives and estimates of rounding error. Japan J. Appl. Math., 1:223-252, 1984.

[8] F. V. Jensen, S. L. Lauritzen, and K. G. Olesen. Bayesian updating in recursive graphical models by local computation. Comp. Stat. Quart., 4:269-282, 1990.

[9] J. Park. MAP complexity results and approximation methods. In UAI-02, pages 388-396, 2002.

[10] J. Park and A. Darwiche. Approximating MAP using stochastic local search. In UAI-01, pages 403-410, 2001.

[11] J. Park and A. Darwiche. A differential semantics for jointree algorithms. Technical Report D-118, UCLA, Los Angeles, CA 90095, 2001.

[12] G. Rote. Path problems in graphs. Computing Suppl., 7:155-189, 1990.

[13] S. Russell, J. Binder, D. Koller, and K. Kanazawa. Local learning in probabilistic networks with hidden variables. In IJCAI-95, pages 1146-1152, 1995.

[14] P. P. Shenoy and G. Shafer. Propagating belief functions with local computations. IEEE Expert, 1(3):43-52, 1986.

[15] J. von zur Gathen. Algebraic complexity theory. Ann. Rev. Comp. Sci., 3:317-347, 1988.
", "award": [], "sourceid": 2226, "authors": [{"given_name": "James", "family_name": "Park", "institution": null}, {"given_name": "Adnan", "family_name": "Darwiche", "institution": null}]}