{"title": "First-order Decomposition Trees", "book": "Advances in Neural Information Processing Systems", "page_first": 1052, "page_last": 1060, "abstract": "Lifting attempts to speedup probabilistic inference by exploiting symmetries in the model. Exact lifted inference methods, like their propositional counterparts, work by recursively decomposing the model and the problem. In the propositional case, there exist formal structures, such as decomposition trees (dtrees), that represent such a decomposition and allow us to determine the complexity of inference a priori. However, there is currently no equivalent structure nor analogous complexity results for lifted inference. In this paper, we introduce FO-dtrees, which upgrade propositional dtrees to the first-order level. We show how these trees can characterize a lifted inference solution for a probabilistic logical model (in terms of a sequence of lifted operations), and make a theoretical analysis of the complexity of lifted inference in terms of the novel notion of lifted width for the tree.", "full_text": "First-Order Decomposition Trees\n\nNima Taghipour\n\nJesse Davis\n\nDepartment of Computer Science, KU Leuven\n\nCelestijnenlaan 200A, B-3001 Heverlee, Belgium\n\nHendrik Blockeel\n\nAbstract\n\nLifting attempts to speedup probabilistic inference by exploiting symmetries in the\nmodel. Exact lifted inference methods, like their propositional counterparts, work\nby recursively decomposing the model and the problem. In the propositional case,\nthere exist formal structures, such as decomposition trees (dtrees), that represent\nsuch a decomposition and allow us to determine the complexity of inference a pri-\nori. However, there is currently no equivalent structure nor analogous complexity\nresults for lifted inference. In this paper, we introduce FO-dtrees, which upgrade\npropositional dtrees to the \ufb01rst-order level. 
We show how these trees can characterize a lifted inference solution for a probabilistic logical model (in terms of a sequence of lifted operations), and make a theoretical analysis of the complexity of lifted inference in terms of the novel notion of lifted width for the tree.\n\n1 Introduction\n\nProbabilistic logical models (PLMs) combine elements of first-order logic with graphical models to succinctly model complex, uncertain, structured domains [5]. These domains often involve a large number of objects, making efficient inference a challenge. To address this, Poole [12] introduced the concept of lifted probabilistic inference, i.e., inference that exploits the symmetries in the model to improve efficiency. Various lifted algorithms have been proposed, mainly by lifting propositional inference algorithms [3, 6, 8, 9, 10, 13, 15, 17, 18, 19, 21, 22]. While the relation between the propositional algorithms is well studied, we have far less insight into their lifted counterparts.\nThe performance of propositional inference methods, such as variable elimination [4, 14] or recursive conditioning [2], is characterized in terms of a corresponding tree decomposition of the model, and their complexity is measured based on properties of the decomposition, mainly its width. It is known that standard (propositional) inference has complexity exponential in the treewidth [2, 4]. This allows us to measure the complexity of various inference algorithms based only on the structure of the model and its given decomposition. 
Such analysis is typically done using a secondary structure for\nrepresenting the decomposition of graphical models, such as decomposition trees (dtrees) [2].\nHowever, the existing notion of treewidth does not provide a tight upper bound for the complex-\nity of lifted inference, since it ignores the opportunities that lifting exploits to improve ef\ufb01ciency.\nCurrently, there exists no notion analogous to treewidth for lifted inference to analyze inference\ncomplexity based on the model structure. In this paper, we take a step towards \ufb01lling these gaps.\nOur work centers around a new structure for specifying and analyzing a lifted solution to an inference\nproblem, and makes the following contributions. First, building on the existing structure of dtrees\nfor propositional graphical models, we propose the structure of First-Order dtrees (FO-dtrees) for\nPLMs. An FO-dtree represents both the decomposition of a PLM and the symmetries that lifting\nexploits for performing inference. Second, we show how to determine whether an FO-dtree has\na lifted solution, from its structure alone. Third, we present a method to read a lifted solution (a\nsequence of lifted inference operations) from a liftable FO-dtree, just like we can read a propositional\ninference solution from a dtree. Fourth, we show how the structure of an FO-dtree determines the\n\n1\n\n\fcomplexity of inference using its corresponding solution. We formally analyze the complexity of\nlifted inference in terms of the novel, symmetry-aware notion of lifted width for FO-dtrees. As such,\nFO-dtrees serve as the \ufb01rst formal tool for \ufb01nding, evaluating, and choosing among lifted solutions.1\n\n2 Background\n\nWe use the term \u201cvariable\u201d in both the logical and probabilistic sense. We use logvar for logical\nvariables and randvar for random variables. We write variables in uppercase and their values in\nlowercase. Applying a substitution \u03b8 = {s1 \u2192 t1, . . . 
, sn → tn} to a structure S means replacing each occurrence of si in S by the corresponding ti. The result is written Sθ.\n\n2.1 Propositional and first-order graphical models\n\nProbabilistic graphical models such as Bayesian networks, Markov networks and factor graphs compactly represent a joint distribution over a set of randvars V = {V1, . . . , Vn} by factorizing the distribution into a set of local distributions. For example, factor graphs represent the distribution as a product of factors: Pr(V1, . . . , Vn) = (1/Z) ∏i φi(Vi), where φi is a potential function that maps each configuration of Vi ⊆ V to a real number and Z is a normalization constant.\nProbabilistic logical models use concepts from first-order logic to provide a high-level modeling language for representing propositional graphical models. While many such languages exist (see [5] for an overview), we focus on parametric factors (parfactors) [12], which generalize factor graphs. Parfactors use parametrized randvars (PRVs) to represent entire sets of randvars. For example, the PRV BloodType(X), where X is a logvar, represents one BloodType randvar for each object in the domain of X (written D(X)). Formally, a PRV is of the form P(X)|C, where C is a constraint consisting of a conjunction of inequalities Xi ≠ t, where t ∈ D(Xi) or t ∈ X. It represents the set of all randvars P(x) where x ∈ D(X) and x satisfies C; this set is denoted rv(P(X)|C).\nA parfactor uses PRVs to compactly encode a set of factors. For example, the parfactor φ(Smoke(X), Friends(X, Y), Smoke(Y)) could encode that friends have similar smoking habits. 
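To make the semantics of gr(g) concrete, here is a minimal sketch (the function and the tuple encoding of randvars are our own illustration, not part of the paper) that enumerates the grounding of a parfactor such as φ(Smoke(X), Friends(X, Y), Smoke(Y)) | X ≠ Y over a small domain:

```python
from itertools import product

def ground_parfactor(logvars, domains, constraint, prvs):
    """Enumerate gr(g): all factors obtained by grounding substitutions
    over the parfactor's logvars that are consistent with the constraint C."""
    factors = []
    for objs in product(*(domains[v] for v in logvars)):
        theta = dict(zip(logvars, objs))
        if not constraint(theta):
            continue  # substitution violates C, e.g. X != Y
        factors.append(tuple((p, tuple(theta[x] for x in args)) for p, args in prvs))
    return factors

# phi(Smoke(X), Friends(X, Y), Smoke(Y)) | X != Y, with D(X) = D(Y) = {a, b, c}
doms = {"X": ["a", "b", "c"], "Y": ["a", "b", "c"]}
gr = ground_parfactor(
    ("X", "Y"), doms, lambda t: t["X"] != t["Y"],
    [("Smoke", ("X",)), ("Friends", ("X", "Y")), ("Smoke", ("Y",))])
print(len(gr))  # 6 ground factors: one per ordered pair with X != Y
```

Each element of `gr` stands for one ground factor; a real implementation would also attach the shared potential function φ to every one of them.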
It imposes a symmetry in the model by stating that the probability that, among two friends, both, one or none smoke is the same for all pairs of friends, in the absence of any other information.\nFormally, a parfactor is of the form φ(A)|C, where A = (A1, . . . , An) is a sequence of PRVs, C is a constraint on the logvars appearing in A, and φ is a potential function. The set of logvars occurring in A is denoted logvar(A). A grounding substitution maps each logvar to an object from its domain. A parfactor g represents the set of all factors that can be obtained by applying a grounding substitution to g that is consistent with C; this set is called the grounding of g, and is denoted gr(g). A parfactor model is a set G of parfactors. It compactly defines a factor graph gr(G) = {gr(g) | g ∈ G}.\nFollowing the literature, we assume that the model is in a normal form, such that (i) each pair of logvars has either identical or disjoint domains, and (ii) for each pair of co-domain logvars X, X′ in a parfactor φ(A)|C, (X ≠ X′) ∈ C. Every model can be written into this form in polynomial time [13].\n\n2.2 Inference\n\nA typical inference task is to compute the marginal probability of some variables by summing out the remaining variables, which can be written as: Pr(V′) = Σ_{V\V′} ∏i φi(Vi). This is an instance of the general sum-product problem [1]. Abusing notation, we write this sum of products as Σ_{V\V′} M(V).\n\nInference by recursive decomposition. Inference algorithms exploit the factorization of the model to recursively decompose the original problem into smaller, independent subproblems. This is achieved by a decomposition of the sum-product, according to a simple decomposition rule.\n\nDefinition 1 (The decomposition rule) Let P be a sum-product computation P : Σ_V M(V), and let M = {M1(V1), . . . 
Mk(Vk)} be a partitioning (decomposition) of M(V). Then, the decomposition of P w.r.t. M is an equivalent sum-product formula PM, defined as follows:\n\nPM : Σ_{V′} [ (Σ_{V′1} M1(V1)) · · · (Σ_{V′k} Mk(Vk)) ],\n\nwhere V′ = ∪_{i≠j} (Vi ∩ Vj), and V′i = Vi \ V′.\n\n1Similarly to existing studies on propositional inference [2, 4], our analysis only considers the model's global structure, and makes no assumptions about its local structure.\n\nFigure 1: (a) a factor graph model; (b) a dtree for the model, with its node clusters shown as cutset, [context]; (c) the corresponding factorization of the sum-product computations.\n\nMost exact inference algorithms recursively apply this rule and compute the final result using top-down or bottom-up dynamic programming [1, 2, 4]. The complexity is then exponential only in the size of the largest sub-problem solved. Variable elimination (VE) is a bottom-up algorithm that computes the nested sum-product by repeatedly solving an innermost problem Σ_V M(V, V′) to eliminate V from the model. At each step, VE eliminates a randvar V from the model by multiplying the factors in M(V, V′) into one and summing out V from the resulting factor.\nDecomposition trees. A single inference problem typically has multiple solutions, each with a different complexity. A decomposition tree (dtree) is a structure that represents the decomposition used by a specific solution and allows us to determine its complexity [2]. Formally, a dtree is a rooted tree in which each leaf represents a factor in the model.2 Each node in the tree represents a decomposition of the model into the models under its child subtrees. Properties of the nodes can be used to determine the complexity of inference. 
Child(T) refers to T's child nodes; rv(T) refers to the randvars under T, which are those in its factor if T is a leaf, and rv(T) = ∪_{T′∈Child(T)} rv(T′) otherwise. Using these, the important properties of cutset, context, and cluster are defined as follows:\n\n• cutset(T) = (∪_{{T1,T2}⊆child(T)} rv(T1) ∩ rv(T2)) \ acutset(T), where acutset(T) is the union of cutsets associated with ancestors of T\n• context(T) = rv(T) ∩ acutset(T)\n• cluster(T) = rv(T), if T is a leaf; otherwise cluster(T) = cutset(T) ∪ context(T)\n\nFigure 1 shows a factor graph model, a dtree for it with its clusters, and the corresponding sum-product factorization. Intuitively, the properties of dtree nodes help us analyze the size of subproblems solved during inference. In short, the time complexity of inference is O(n exp(w)), where n is the size (number of nodes) of the tree and w is its width, i.e., its maximal cluster size minus one.\n\n3 Lifted inference: Exploiting symmetries\n\nThe inference approach of Section 2.2 ignores the symmetries imposed by a PLM. Lifted inference aims at exploiting symmetries among a model's isomorphic parts. Two constructs are isomorphic if there is a structure preserving bijection between their components. As PLMs make assertions about whole groups of objects, they contain many isomorphisms, established by a bijection at the level of objects. Building on this, symmetries arise between constructs at different levels [11], such as between: randvars, value assignments to randvars, factors, models, or even sum-product problems.\nAll exact lifted inference methods use two main tools for exploiting symmetries, i.e., for lifting:\n\n1. Divide the problem into isomorphic subproblems, solve one instance, and aggregate\n2. 
Count the number of isomorphic configurations for a group of interchangeable variables instead of enumerating all possible configurations.\n\n2We use a slightly modified definition for dtrees, which were originally defined as full binary rooted trees.\n\nFigure 2: Isomorphic decomposition of a model. Dashed boxes indicate the partitioning into groups.\n\nBelow, we show how these tools are used by lifted variable elimination (LVE) [3, 10, 12, 17, 18].\nIsomorphic decomposition: exploiting symmetry among subproblems. The first lifting tool identifies cases where the application of the decomposition rule results in a product of isomorphic sum-product problems. Since such problems all have isomorphic answers, we can solve one problem and reuse its result for all the others. In LVE, this corresponds to lifted elimination, which uses the operations of lifted multiplication and lifted sum-out on parfactors to evaluate a single representative problem. Afterwards, LVE also attempts to aggregate the results (compute their product) by taking advantage of their isomorphism. For instance, when the results are identical, LVE computes their product simply by exponentiating the result of one problem.\nExample 1. Figure 2 shows the model defined by φ(F(X, Y), F(Y, X)) | X ≠ Y, with D(X) = D(Y) = {a, b, c, d}. The model asserts that the friendship relationship (F) is likely to be symmetric. To sum out the randvars F using the decomposition rule, we partition the ground factors into six groups of the form {φ(F(x, y), F(y, x)), φ(F(y, x), F(x, y))}, i.e., one group for each 2-subset {x, y} ⊆ {a, b, c, d}. 
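The partitioning just described can be sketched in a few lines (our own illustration; the group keys and the representation of groundings are hypothetical): one group per 2-subset of the domain, each holding the 2! partial groundings of the parfactor.

```python
from itertools import combinations, permutations

def dpg_groups(domain, k):
    """Partition the groundings of a parfactor over k co-domain logvars
    into isomorphic groups: one group per k-subset x of the domain,
    holding the k! groundings given by the permutations of x."""
    return {x: list(permutations(x)) for x in combinations(domain, k)}

groups = dpg_groups(("a", "b", "c", "d"), 2)
print(len(groups))         # C(4,2) = 6 isomorphic groups
print(groups[("a", "b")])  # [('a', 'b'), ('b', 'a')]: the 2! groundings in one group
```

Any bijection between two subsets maps one group's groundings onto the other's, which is exactly why the six subproblems are isomorphic.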
Since no randvars are shared between the groups, this decomposes the problem into the product of six isomorphic sums Σ_{F(x,y),F(y,x)} φ(F(x, y), F(y, x)) · φ(F(y, x), F(x, y)). All six sums have the same result c (a scalar). Thus, LVE computes c only once (lifted elimination) and computes the final result by exponentiation as c^6 (lifted aggregation).\nCounting: exploiting interchangeability among randvars. Whereas isomorphic decomposition exploits symmetry among problems, counting exploits symmetries within a problem, by identifying interchangeable randvars. A group of (k-tuples of) randvars is interchangeable if permuting the assignment of values to the group results in an equivalent model. Consider a sum-product subproblem Σ_V M(V, V′) that contains a set of n interchangeable (k-tuples of) randvars V = {(Vi1, Vi2, . . . , Vik)}^n_{i=1}. The interchangeability allows us to rewrite V into a single counting randvar #[V], whose value is the histogram h = {(v1, n1), . . . , (vr, nr)}, where ni is the number of tuples with joint state vi. This allows us to replace a sum over all possible joint states of V with a sum over the histograms for #[V]. That is, we compute M(V′) = Σ^m_{i=1} MUL(hi) × M(hi, V′), where MUL(hi) denotes the number of assignments to V that yield the same histogram hi for #[V]. Since the number of histograms is O(n^{exp(k)}), when n ≫ k, we gain exponential savings over enumerating all the possible joint assignments, whose number is O(exp(nk)). This lifting tool is employed in LVE by counting conversion, which rewrites the model in terms of counting randvars.\nExample 2. Consider the model defined by the parfactor φ(S(X), S(Y)) | X ≠ Y, which is ∏_{i≠j} φ(S(xi), S(xj)). The group of randvars {S(x1), . . . 
, S(xn)} are interchangeable here, since under any value assignment where nt randvars are true and nf randvars are false, the model evaluates to the same value φ′(nt, nf) = φ(t, t)^{nt(nt−1)} · φ(t, f)^{nt·nf} · φ(f, t)^{nf·nt} · φ(f, f)^{nf(nf−1)}. By counting conversion, LVE rewrites this model into φ′(#X[S(X)]).\n\n4 First-order decomposition trees\n\nIn this section, we propose the structure of FO-dtrees, which compactly represent a recursive decomposition for a PLM and the symmetries therein.\n\n4.1 Structure\n\nAn FO-dtree provides a compact representation of a propositional dtree, just like a PLM is a compact representation of a propositional model. It does so by explicitly capturing isomorphic decomposition, which in a dtree corresponds to a node with isomorphic children. Using a novel node type, called a decomposition into partial groundings (DPG) node, an FO-dtree represents the entire set of isomorphic child subtrees with a single representative subtree.\n\nFigure 3: (a) dtree (left) and FO-dtree (right) of Example 3; (b) FO-dtree of Example 1\n\nTo formally introduce the structure, we first show how a PLM can be decomposed into isomorphic parts by DPG.\nDPG of a parfactor model. The DPG of a parfactor g is defined w.r.t. a k-subset X = {X1, . . . , Xk} of its logvars that all have the same domain DX. For example, the decomposition used in Example 1, and shown in Figure 2, is the DPG of φ(F(X, Y), F(Y, X)) | X ≠ Y w.r.t. logvars {X, Y}. Formally, DPG(g, X) partitions the model defined by g into (|DX| choose k) parts: one part Gx for each k-subset x = {x1, . . . , xk} of the objects in DX. Each Gx in turn contains all k! (partial) groundings of g that can result from replacing (X1, . . . , Xk) with a permutation of (x1, . . . 
, xk).\nThe key intuition behind DPG is that for any x, x′ ⊆k DX, Gx is isomorphic to Gx′, since any bijection from x to x′ yields a bijection from Gx to Gx′.\nDPG can be applied to a whole model G = {gi}^m_{i=1}, if G's logvars are (re-)named such that (i) only co-domain logvars share the same name, and (ii) logvars X appear in all parfactors.\nExample 3. Consider G = {φ1(P(X)), φ2(A, P(X))}. DPG(G, {X}) = {Gi}^n_{i=1}, where each group Gi = {φ1(P(xi)), φ2(A, P(xi))} is a grounding of G (w.r.t. X).\nFO-dtrees simply add to dtrees special nodes for representing DPGs in parfactor models.\n\nDefinition 2 (DPG node) A DPG node TX is a triplet (X, x, C), where X = {X1, . . . , Xk} is a set of logvars with the same domain DX, x = {x1, . . . , xk} is a set of representative objects, and C is a constraint, such that for all i ≠ j: (xi ≠ xj) ∈ C. We denote this node as ∀x : C in the tree.\n\nA representative object is simply a placeholder for a domain object.3 The idea behind our FO-dtrees is to use TX to graphically indicate a DPG(G, X). For this, each TX has a single child distinguished as Tx, under which the model is a representative instance of the isomorphic models Gx in the DPG.\n\nDefinition 3 (FO-dtree) An FO-dtree is a rooted tree in which\n\n1. non-leaf nodes may be DPG nodes\n2. each leaf contains a factor (possibly with representative objects)\n3. each leaf with a representative object x is the descendent of exactly one DPG node TX = (X, x, C), such that x ∈ x\n4. each leaf that is a descendent of TX has all the representative objects x, and\n5. for each TX with X = {X1, . . . , Xk}, Tx has k! children {Ti}^{k!}_{i=1}, which are isomorphic up to a permutation of the representative objects x.\n\nSemantics. Each FO-dtree defines a dtree, which can be constructed by recursively grounding its DPG nodes. 
Grounding a DPG node TX yields a (regular) node T′X with (|DX| choose k) children {Tx→x′ | x′ ⊆k DX}, where Tx→x′ is the result of replacing x with objects x′ in Tx.\nExample 4. Figure 3 (a) shows the dtree of Example 3 and its corresponding FO-dtree, which only has one instance Tx of all isomorphic subtrees Txi. Figure 3 (b) shows the FO-dtree for Example 1.\n\n3As such, it plays the same role as a logvar. However, we use both to distinguish between a whole group of randvars (a PRV P(X)), and a representative of this group (a representative randvar P(x)).\n\n4.2 Properties\n\nDarwiche [2] showed that important properties of a recursive decomposition are captured in the properties of dtree nodes. In this section, we define these properties for FO-dtrees. Adapting the definitions of the dtree properties, such as cutset, context, and cluster, for FO-dtrees requires accounting for the semantics of an FO-dtree, which uses DPG nodes and representative objects. More specifically, this requires making the following two modifications: (i) use a function Childθ(T), instead of Child(T), to take into account the semantics of DPG nodes, and (ii) use a function ∩θ that finds the intersection of two sets of representative randvars. First, for a DPG node TX = (X, x, C), we define: Childθ(TX) = {Tx→x′ | x′ ⊆k DX}. Second, for two sets A = {ai}^n_{i=1} and B = {bi}^n_{i=1} of (representative) randvars, we define: A ∩θ B = {ai | ∃θ ∈ Θ : aiθ ∈ B}, with Θ the set of grounding substitutions to their representative objects. 
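A minimal sketch of the ∩θ operation just defined (the tuple encoding of randvars and the helper names are our own, purely illustrative): an element of A survives if some consistent substitution of its representative objects maps it onto an element of B.

```python
def grounds_to(a, b, reps):
    """True if representative randvar a can be mapped onto b by consistently
    substituting a's representative objects (the elements of reps)."""
    if a[0] != b[0] or len(a[1]) != len(b[1]):
        return False
    theta = {}
    for arg, obj in zip(a[1], b[1]):
        if arg in reps:
            if theta.setdefault(arg, obj) != obj:
                return False  # inconsistent binding of a representative object
        elif arg != obj:
            return False      # ground arguments must match exactly
    return True

def intersect_theta(A, B, reps):
    """A ∩θ B: the elements of A that some grounding substitution maps into B."""
    return [a for a in A if any(grounds_to(a, b, reps) for b in B)]

reps = {"x", "y"}
A = [("P", ("x",)), ("Q", ("x", "y"))]
B = [("P", ("a",))]
print(intersect_theta(A, B, reps))  # [('P', ('x',))]
```

This sketch assumes the elements of B are ground; handling representative objects on both sides would need full unification, which we omit here.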
Naturally, this provides a basis to define a '\θ' operator: A \θ B = A \ (A ∩θ B).\nAll the properties of an FO-dtree are defined based on their corresponding definitions for dtrees, by replacing Child, ∩, \ with Childθ, ∩θ, \θ. Interestingly, all the properties can be computed without grounding the model, e.g., for a DPG node TX, we can compute rv(TX) simply as rv(Tx)θX⁻¹, with θX⁻¹ = {x → X}.4 Figure 4 shows examples of FO-dtrees with their node clusters.\n\nFigure 4: Three FO-dtrees with their clusters (shown as cutset, [context]).\n\nCounted FO-dtrees. FO-dtrees capture the first lifting tool, isomorphic decomposition, explicitly in DPG nodes. The second tool, counting, can be simply captured by rewriting interchangeable randvars in clusters of the tree nodes with counting randvars. This can be done in FO-dtrees similarly to the operation of counting conversion on logvars in LVE. We call such a tree a counted FO-dtree. Figure 5(a) shows an FO-dtree (left) and its counted version (right).\n\nFigure 5: (a) an FO-dtree (left) and its counted version (right); (b) lifted operations of each node.\n\n5 Liftable FO-dtrees\n\nWhen inference can be performed using the lifted operations (i.e., without grounding the model), it runs in polynomial time in the domain size of logvars. Formally, this is called a domain-lifted inference solution [19]. Not all FO-dtrees have a lifted solution, which is easy to see since not\n\n4The only non-trivial property is cutset of DPG nodes. 
We can show that cutset(TX) excludes from rv(TX) \ acutset(TX) only those PRVs for which X is a root binding class of logvars [8, 19].\n\nall models are liftable [7], though each model has at least one FO-dtree.5 Fortunately, we can structurally identify the FO-dtrees for which we know a lifted solution.\nWhat models can the lifting tools handle? Lifted inference identifies isomorphic problems and solves only one instance of those. Similar to propositional inference, for a lifted method the difficulty of each sub-problem increases with the number of variables in the problem: those that appear in the clusters of FO-dtree nodes. When each problem has a bounded (domain-independent) number of those, the complexity of inference is clearly independent of the domain size. However, a sub-problem can involve a large group of randvars, when there is a PRV in the cluster. While traditional inference is then intractable, lifting may be able to exploit the interchangeability among the randvars and reduce the complexity by counting. Thus, whether a problem has a lifted solution boils down to whether we can rewrite it such that it only contains a bounded (domain-independent) number of counting randvars and ground randvars. 
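To see numerically why a bounded number of counting randvars keeps things tractable, here is a small sketch (helper names are ours) comparing the number of histograms of a counting randvar against the number of raw joint assignments, using the MUL multiplicities from Section 3:

```python
from math import comb, factorial

def num_histograms(n, r):
    """Number of histograms for n interchangeable randvars with range size r:
    the weak compositions of n into r parts, C(n+r-1, r-1) = O(n^(r-1))."""
    return comb(n + r - 1, r - 1)

def multiplicity(hist):
    """MUL(h): number of joint assignments yielding histogram h = (n_1, ..., n_r),
    i.e. the multinomial coefficient n! / (n_1! ... n_r!)."""
    m = factorial(sum(hist))
    for c in hist:
        m //= factorial(c)
    return m

n, r = 10, 2  # 10 interchangeable boolean randvars
print(num_histograms(n, r))  # 11 histograms, versus 2**10 = 1024 joint states
print(multiplicity((4, 6)))  # C(10,4) = 210 assignments share histogram (4 true, 6 false)
```

Summing MUL(h) over all histograms recovers the full 2^n assignments, which is the identity counting conversion exploits.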
This requires the problem to have enough symmetries\nin it such that all the randvars V = V1, . . . Vn in each cluster can be divided into k groups of\ninterchangeable (tuples of) randvars V1,V2, . . . ,Vk, where k is independent of the domain size.\nTheorem 1 A (non-counted) FO-dtree has a lifted inference solution if its clusters only consist of\n(representative) randvars and 1-logvar PRVs. We call such an FO-dtree a liftable tree.6\n\nProof sketch. Such a tree has a corresponding LVE solution: (i) each sub-problem that we need to\nsolve in such a tree can be formulated as a (sum-out) problem on a model consisting of a parfac-\ntor with 1-logvar PRVs, and (ii) we can count-convert all the logvars in a parfactor with 1-logvar\nPRVs [10, 16], to rewrite all the PRVs into a (bounded) number of counting randvars.7\n6 Lifted inference based on FO-dtrees\n\nA dtree can prescribe the operations performed by propositional inference, such as VE [2]. In this\nsection, we show how a liftable FO-dtree can prescribe an LVE solution for the model, thus providing\nthe \ufb01rst formal method for symbolic operation selection in lifted inference.\nIn VE, each inference procedure can be characterized based on its elimination order. Darwiche [2]\nshows how we can read a (partial) elimination order from a dtree (by assigning elimination of each\nrandvar to some tree node). We build on this result to read an LVE solution from a (non-counted)\nFO-dtree. 
For this, we assign to each node a set of lifted operations, including lifted elimination of PRVs (using multiplication and sum-out), and counting conversion and aggregation of logvars:\n\n• ΣV: A PRV V is eliminated at a node T, if V ∈ cluster(T) \ context(T).\n• AGG(X): A logvar X is aggregated at a DPG node TX = (X, x, C), if (i) X ∈ X, and (ii) X ∉ logvar(cluster(TX)).\n• #X: A logvar X is counted at TX, if (i) X ∈ X, and (ii) X ∈ logvar(cluster(TX)).\n\nA lifted solution can be characterized by a sequence of these operations. For this we simply need to order the operations according to two rules:\n\n1. If node T2 is a descendent of T1, and OPi is performed at Ti, then OP2 ≺ OP1.\n2. For operations at the same node, aggregation and counting precede elimination.\n\nExample 5. From the FO-dtree shown in Figure 5 (a) we can read the following order of operations: Σ F(X, Y) ≺ #Y ≺ Σ S(X) ≺ AGG(X) ≺ Σ #Y[D(Y)], see Figure 5 (b). □\n\n7 Complexity of lifted inference\n\nIn this section, we show how to compute the complexity of lifted inference based on an FO-dtree. Just as the complexity of ground inference for a dtree is parametrized in terms of the tree's width, we define a lifted width for FO-dtrees and use it to parametrize the complexity of lifted inference.\n\n5A basic algorithm for constructing an FO-dtree for a PLM is presented in the supplementary material.\n\n6Note that this only restricts the number of logvars in PRVs appearing in an FO-dtree's clusters, not PRVs in the PLM. For instance, all the liftable trees in this paper correspond to PLMs containing 2-logvar PRVs.\n\n7For a more detailed proof, see the supplementary material.\n\nTo analyze the complexity, it suffices to compute the complexity of the operations performed at each node. 
Similar to standard inference, this depends on the randvars involved in the node's cluster: for each lifted operation at a node T, LVE manipulates a factor involving the randvars in cluster(T), and thus has complexity proportional to O(|range(cluster(T))|), where range denotes the set of possible (joint) values that the randvars can take on. However, unlike in standard inference, this complexity need not be exponential in |rv(cluster(T))|, since the clusters can contain counting randvars that allow us to handle interchangeable randvars more efficiently. To accommodate this in our analysis, we define two widths for a cluster: a ground width wg, which is the number of ground randvars in the cluster, and a counting width w#, which is the number of counting randvars in it. The cornerstone of our analysis is that the complexity of an operation performed at node T is exponential only in wg, and polynomial in the domain size with degree w#. We can thus compute the complexity of the entire inference process by considering the hardest of these operations, and the number of operations performed. We do so by defining a lifted width for the tree.\n\nDefinition 4 (Lifted width) The lifted width of an FO-dtree T is a pair (wg, w#), where wg is the largest ground width among the clusters of T and w# is the largest counting width among them.\n\nTheorem 2 The complexity of lifted variable elimination for a counted liftable FO-dtree T is:\n\nO(nT · log n · exp(wg) · n#^(w#·r#)),\n\nwhere nT is the number of nodes in T, (wg, w#) is its lifted width, n (resp., n#) is the largest domain size among its logvars (resp., counted logvars), and r# is the largest range size among its tuples of counted randvars.\nProof sketch. 
We can prove the theorem by showing that (i) the largest range size among clusters, and thus the largest factor constructed by LVE, is O(exp(wg) · n#^(w#·r#)), (ii) in case of aggregation or counting conversion, each entry of the factor is exponentiated, with complexity O(log n), and (iii) there are at most nT operations. (For a more detailed proof, see the supplementary material.) □\nComparison to ground inference. To understand the savings achieved by lifting, it is useful to compare the above complexity to that of standard VE on the corresponding dtree, i.e., using the same decomposition. The complexity of ground VE is: O(nG · exp(wg) · exp(n#·w#)), where nG is the size of the corresponding propositional dtree. Two important observations are:\n\n1. The number of ground operations is linear in the dtree's size nG, instead of the FO-dtree's size nT (which is polynomially smaller than nG due to DPG nodes). Roughly speaking, lifting allows us to perform nT/nG of the ground operations by isomorphic decomposition.\n2. Ground VE has a factor exp(n#·w#) in its complexity, instead of n#^(w#) for lifted inference. The latter is typically exponentially smaller. These speedups, achieved by counting, are the most significant for lifted inference, and what allows it to tackle high treewidth models.\n\n8 Conclusion\n\nWe proposed FO-dtrees, a tool for representing a recursive decomposition of PLMs. An FO-dtree explicitly shows the symmetry between its isomorphic parts, and can thus show a form of decomposition that lifted inference methods employ. We showed how to decide whether an FO-dtree is liftable (has a corresponding lifted solution), and how to derive the sequence of lifted operations and the complexity of LVE based on such a tree. 
While we focused on LVE, our analysis is also applicable to lifted search-based methods, such as lifted recursive conditioning [13], weighted first-order model counting [21], and probabilistic theorem proving [6]. This allows us to derive an order of operations and complexity results for these methods when operating based on an FO-dtree. Further, we can show the close connection between LVE and search-based methods by analyzing their performance based on the same FO-dtree. FO-dtrees are also useful for approximate lifted inference algorithms, such as lifted blocked Gibbs sampling [22] and RCR [20], which attempt to improve their inference accuracy by identifying liftable subproblems and handling them by exact inference.

Acknowledgements

This research is supported by the Research Fund K.U.Leuven (GOA 08/008, CREA/11/015 and OT/11/051), and FWO-Vlaanderen (G.0356.12).

References
[1] F. Bacchus, S. Dalmao, and T. Pitassi. Solving #-SAT and Bayesian inference with backtracking search. Journal of Artificial Intelligence Research, 34(2):391, 2009.
[2] Adnan Darwiche. Recursive conditioning. Artificial Intelligence, 126(1-2):5–41, 2001.
[3] Rodrigo de Salvo Braz, Eyal Amir, and Dan Roth. Lifted first-order probabilistic inference. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), pages 1319–1325, 2005.
[4] Rina Dechter. Bucket elimination: A unifying framework for reasoning. Artificial Intelligence, 113(1-2):41–85, 1999.
[5] Lise Getoor and Ben Taskar, editors. An Introduction to Statistical Relational Learning. MIT Press, 2007.
[6] Vibhav Gogate and Pedro Domingos. Probabilistic theorem proving. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), pages 256–265, 2011.
[7] Manfred Jaeger and Guy Van den Broeck. 
Liftability of probabilistic inference: Upper and lower bounds. In Proceedings of the 2nd International Workshop on Statistical Relational AI (StaRAI), 2012.
[8] Abhay Jha, Vibhav Gogate, Alexandra Meliou, and Dan Suciu. Lifted inference seen from the other side: The tractable features. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS), pages 973–981, 2010.
[9] Kristian Kersting, Babak Ahmadi, and Sriraam Natarajan. Counting belief propagation. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI), pages 277–284, 2009.
[10] Brian Milch, Luke S. Zettlemoyer, Kristian Kersting, Michael Haimes, and Leslie Pack Kaelbling. Lifted probabilistic inference with counting formulas. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), pages 1062–1068, 2008.
[11] Mathias Niepert. Markov chains on orbits of permutation groups. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI), pages 624–633, 2012.
[12] David Poole. First-order probabilistic inference. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), pages 985–991, 2003.
[13] David Poole, Fahiem Bacchus, and Jacek Kisynski. Towards completely lifted search-based probabilistic inference. CoRR, abs/1107.4035, 2011.
[14] David Poole and Nevin Lianwen Zhang. Exploiting contextual independence in probabilistic inference. Journal of Artificial Intelligence Research, 18:263–313, 2003.
[15] Parag Singla and Pedro Domingos. Lifted first-order belief propagation. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), pages 1094–1099, 2008.
[16] Nima Taghipour and Jesse Davis. Generalized counting for lifted variable elimination. 
In Proceedings of the 2nd International Workshop on Statistical Relational AI (StaRAI), 2012.
[17] Nima Taghipour, Daan Fierens, Jesse Davis, and Hendrik Blockeel. Lifted variable elimination with arbitrary constraints. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1194–1202, 2012.
[18] Nima Taghipour, Daan Fierens, Guy Van den Broeck, Jesse Davis, and Hendrik Blockeel. Completeness results for lifted variable elimination. In Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS), 2013.
[19] Guy Van den Broeck. On the completeness of first-order knowledge compilation for lifted probabilistic inference. In Proceedings of the 24th Annual Conference on Advances in Neural Information Processing Systems (NIPS), pages 1386–1394, 2011.
[20] Guy Van den Broeck, Arthur Choi, and Adnan Darwiche. Lifted relax, compensate and then recover: From approximate to exact lifted probabilistic inference. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI), pages 131–141, 2012.
[21] Guy Van den Broeck, Nima Taghipour, Wannes Meert, Jesse Davis, and Luc De Raedt. Lifted probabilistic inference by first-order knowledge compilation. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), pages 2178–2185, 2011.
[22] Deepak Venugopal and Vibhav Gogate. On lifting the Gibbs sampling algorithm. In Proceedings of the 26th Annual Conference on Advances in Neural Information Processing Systems (NIPS), pages 1–6, 2012.