{"title": "Identification of Conditional Causal Effects under Markov Equivalence", "book": "Advances in Neural Information Processing Systems", "page_first": 11516, "page_last": 11524, "abstract": "Causal identification is the problem of deciding whether a post-interventional distribution is computable from a combination of qualitative knowledge about the data-generating process, which is encoded in a causal diagram, and an observational distribution. A generalization of this problem restricts the qualitative knowledge to a class of Markov equivalent causal diagrams, which, unlike a single, fully-specified causal diagram, can be inferred from the observational distribution.\nRecent work by (Jaber et al., 2019a) devised a complete algorithm for the identification of unconditional causal effects given a Markov equivalence class of causal diagrams. However, there are identifiable conditional causal effects that cannot be handled by that algorithm. In this work, we derive an algorithm to identify conditional effects, which are particularly useful for evaluating conditional plans or policies.", "full_text": "Identi\ufb01cation of Conditional Causal Effects\n\nunder Markov Equivalence\n\nAmin Jaber\n\nPurdue University\n\njaber0@purdue.edu\n\nJiji Zhang\n\nLingnan University\n\njijizhang@ln.edu.hk\n\nElias Bareinboim\nColumbia University\neb@cs.columbia.edu\n\nAbstract\n\nCausal identi\ufb01cation is the problem of deciding whether a post-interventional\ndistribution is computable from a combination of qualitative knowledge about the\ndata-generating process, which is encoded in a causal diagram, and an observational\ndistribution. A generalization of this problem restricts the qualitative knowledge to\na class of Markov equivalent causal diagrams, which, unlike a single, fully-speci\ufb01ed\ncausal diagram, can be inferred from the observational distribution. 
Recent work\nby (Jaber et al., 2019a) devised a complete algorithm for the identi\ufb01cation of\nunconditional causal effects given a Markov equivalence class of causal diagrams.\nHowever, there are identi\ufb01able conditional causal effects that cannot be handled by\nthat algorithm. In this work, we derive an algorithm to identify conditional effects,\nwhich are particularly useful for evaluating conditional plans or policies.\n\n1\n\nIntroduction\n\nThe graphical approach to causal inference is becoming an important tool for assessing the ef\ufb01cacy\nof actions or policies (Pearl, 2000; Bareinboim and Pearl, 2016). In this approach, data from an\nobservational probability distribution P is associated with a causal diagram (e.g., Fig. 1a) in which\nnodes correspond to measured variables, directed edges represent direct causal relations, and bi-\ndirected edges encode spurious associations due to unmeasured confounding variables. Performing\nan action do(X = x) eliminates the impact of other variables on those in X by \ufb01xing the values of\nthe latter and induces an interventional distribution, denoted Px. Whether, and if so how, aspects of\nPx can be determined from the observational distribution together with the causal diagram is known\nas the problem of causal identi\ufb01cation.\nIn this work, we focus on conditional causal effects, of the form Px(y|z), which denotes the\nconditional probability of Y = y given Z = z according to the interventional distribution Px. Such\nconditional effects are particularly useful when what is at stake is the consequence of conditional\nplans or policies, in which what value or probability distribution to impose on X is contingent on\nthe value of Z (Pearl and Robins, 1995). 
When the available knowledge is sufficient to delineate\nthe causal diagram, a number of criteria, including a complete algorithm, for identifying conditional\neffects are known (Pearl, 1995; Spirtes et al., 2000; Tian, 2004; Shpitser and Pearl, 2006). However,\nwe are usually in a position where background knowledge is not nearly enough to give us confidence\non a single causal diagram. In such situations, forcing a single diagram easily leads to false modeling\nassumptions and, consequently, misleading inferences.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nFigure 1: Causal diagram (a, left) and the inferred PAG (b, right).\n\nInstead of specifying the causal diagram based on expert knowledge, one may adopt a more\ndata-driven approach and attempt to learn it from data. However, from observational data,\nit is common that only a Markov equivalence class of causal diagrams can be consistently\nestimated (Verma, 1993; Spirtes et al., 2001; Zhang, 2008b). A distinguished characterization of the\nMarkov equivalence class uses partial ancestral graphs (PAGs). Fig. 1b shows the PAG learnable\nfrom observational data that is consistent with the causal diagram depicted in Fig. 1a. The directed\nedges in a PAG represent causal relations (that are not necessarily direct) and the circle marks stand\nfor structural uncertainty. Labeled edges (with v) signify the absence of unmeasured confounders.\nIn this work, we study the problem of using invariant structural features in a Markov equivalence\nclass (learnable from observational data) to identify conditional causal effects. Identification from an\nequivalence class is considerably more challenging than from a single diagram due to the structural\nuncertainties. Zhang (2007) extended Pearl\u2019s do-calculus to PAGs. 
However, it is computationally\nhard to decide whether there exists (and, if so, to find) a sequence of derivations in the generalized\ncalculus to identify the effect of interest. More recently, a complete algorithm was devised for\nidentifying unconditional causal effects given a PAG (Jaber et al., 2019a). This algorithm can be used\nto identify conditional effects of the form Px(y|z) whenever the joint effect Px(y, z) is identifiable.\nHowever, as we will show, many conditional effects are identifiable while the corresponding joint\neffect is not (see footnote 1). Specifically, we make the following contributions:\n\n1. We establish a novel decomposition that serves to reduce a targeted conditional causal\ndistribution into components that are easier to identify.\n\n2. Based on the decomposition, we develop an algorithm to compute the effect of an arbitrary\nset of intervention variables on an arbitrary outcome set while conditioning on a third disjoint\nset, from a PAG and an observational distribution. We show that this algorithm subsumes\nthat of (Jaber et al., 2019a).\n\n2 Preliminaries\n\nIn this section, we introduce the basic setup and notations. Boldface capital letters denote sets of\nvariables, while boldface lowercase letters stand for value assignments to those variables.\n\nStructural Causal Models. We use Structural Causal Models (SCMs) (Pearl, 2000, pp. 204-207)\nas our basic semantical framework. Formally, an SCM M is a 4-tuple \u27e8U, V, F, P(U)\u27e9, where U is\na set of exogenous (latent) variables and V is a set of endogenous (measured) variables. F represents\na collection of functions F = {fi} such that each endogenous variable Vi \u2208 V is determined by a\nfunction fi \u2208 F, where fi is a mapping from the respective domain of Ui \u222a Pai to Vi, Ui \u2286 U,\nPai \u2286 V \\ Vi. The uncertainty is encoded through a probability distribution over the exogenous\nvariables, P(U). 
Every SCM is associated with one causal diagram where every variable V \u222a U is\na node, and an arrow is drawn from each member of Ui \u222a Pai to Vi. Following standard practice,\nwhen drawing a causal diagram, we omit the exogenous nodes and add a bi-directed arc between\ntwo endogenous nodes if they share an exogenous parent. We restrict our study to recursive systems,\nwhich means that the corresponding diagram will be acyclic. The marginal distribution induced\nover the endogenous variables P(V) is called observational, and factorizes according to the causal\ndiagram, i.e.:\n\nP(v) = \u2211_{u} \u220f_{i} P(v_i | pa_i, u_i) P(u)    (1)\n\nWithin the structural semantics, performing an action X=x is represented through the do-operator,\ndo(X = x), which encodes the operation of replacing the original equation for X by the constant x\nand induces a submodel Mx. The resulting distribution is denoted by Px, which is the main target for\nidentification in this paper. For details on structural models, we refer readers to (Pearl, 2000).\n\nFootnote 1: Another approach is based on SAT (Boolean constraint satisfaction) solvers (Hyttinen et al., 2015). Given\nits somewhat distinct nature, a closer comparison lies outside the scope of this paper.\n\nAncestral Graphs. We now introduce a graphical representation of equivalence classes of causal\ndiagrams. A mixed graph can contain directed and bi-directed edges. A is an ancestor of B if there\nis a directed path from A to B. A is a spouse of B if A \u2194 B is present. An almost directed cycle\nhappens when A is both a spouse and an ancestor of B. An inducing path is a path on which every\nnode (except for the endpoints) is a collider on the path (i.e., both edges incident to the node are\ninto it) and is an ancestor of an endpoint of the path. A mixed graph is ancestral if it does not\ncontain directed or almost directed cycles. 
It is maximal if there is no inducing path between any\ntwo non-adjacent nodes. A Maximal Ancestral Graph (MAG) is a graph that is both ancestral and\nmaximal (Richardson and Spirtes, 2002).\nIn general, a causal MAG represents a set of causal diagrams with the same set of observed variables\nthat entail the same conditional independence and ancestral relations among the observed variables.\nDifferent MAGs may be Markov equivalent in that they entail the exact same independence model.\nA partial ancestral graph (PAG) represents an equivalence class of MAGs [M], which shares the\nsame adjacencies as every MAG in [M] and displays all and only the invariant edge marks (i.e., edge\nmarks that are shared by all members of [M]). A circle indicates an edge mark that is not invariant.\nA PAG is learnable from the independence model over the observed variables, and the FCI algorithm\nis a standard method to learn such an object (Zhang, 2008b). In short, a PAG represents a class of\ncausal diagrams with the same observed variables that entail the same independence model over the\nobserved variables.\n\nGraphical Notions. Given a causal diagram, a MAG, or a PAG, a path between X and Y is\npotentially directed (causal) from X to Y if there is no arrowhead on the path pointing towards X. Y\nis called a possible descendant of X and X a possible ancestor of Y if there is a potentially directed\npath from X to Y . Y is called a possible child of X and X a possible parent of Y if they are adjacent\nand the edge is not into X. For a set of nodes X, let Pa(X) (Ch(X)) denote the union of X and the\nset of possible parents (children) of X, and let An(X) denote the union of X and the set of possible\nancestors of X. Let Pa\u2217(X) denote Pa(X) excluding the possible parents of X due to circle edges\n(\u25e6\u2212\u25e6). 
Similarly, Ch\u2217(X) denotes Ch(X) excluding the possible children of X due to circle edges.\nFor convenience, we use an asterisk (*) as a wildcard to denote any possible mark of a PAG (\u25e6, >,\u2212)\nor a MAG (>,\u2212). If the edge marks on a path between X and Y are all circles, we call the path a\ncircle path. We refer to the closure of nodes connected with circle paths as a bucket. Obviously, given\na PAG, nodes are partitioned into a unique set of buckets.\nA directed edge X \u2192 Y in a MAG or a PAG is visible if there exists no causal diagram in the\ncorresponding equivalence class where there is an inducing path between X and Y that is into X.\nThis implies that a visible edge is not confounded (X \u2190\u2212\u2192 Y doesn\u2019t exist). Which directed edges\nare visible is easily decidable by a graphical condition (Zhang, 2008a), so we simply mark visible\nedges by v. For brevity, we refer to any edge that is not a visible directed edge as invisible.\n\nIdentification in Causal Diagrams. Tian and Pearl (2002) introduced a decomposition of a causal\ndiagram into a set of so-called c-components (confounded components).\nDefinition 1 (C-Component). In a causal diagram, two nodes are said to be in the same c-component\niff they are connected by a bi-directed path, i.e., a path composed solely of bi-directed edges.\n\nThe significance of c-components and their decomposition is evident from (Tian, 2004, Lemmas 2,\n3), which are the basis for the proposed algorithm for identifying conditional causal effects. For any\nset C \u2286 V, Q[C] denotes the post-intervention distribution of C under an intervention on V \\ C.\n\nQ[C] = P_{v\\c}(c) = \u2211_{u} \u220f_{i | Vi \u2208 C} P(v_i | pa_i, u_i) P(u)    (2)\n\nObviously, Q[C] functionally depends on C and the corresponding parents, i.e., Pa(C). Moreover,\nQ[C] decomposes into a product of sub-queries over the c-components in D_C, the induced subgraph\nof the causal diagram D over C. That is, Q[C] = \u220f_{i} Q[Ci], where Ci is a c-component in D_C.\n\n3 Unconditional Causal Effect\n\nIn this section, we review the techniques developed in (Jaber et al., 2019a) for identifying unconditional\ncausal effects. The notion of pc-component (Def. 2) in MAGs and PAGs generalizes that of\nc-component in a causal diagram. Being in the same pc-component is a necessary condition for two\nnodes to be in the same c-component in some causal diagram in the corresponding equivalence class\n(Prop. 1). As a special case of Def. 2, two nodes are in the same definite c-component (dc-component)\nif they are connected with a bi-directed path, i.e., a path composed solely of bi-directed edges.\n\nAlgorithm 1 IDP(x, y) given PAG P\nInput: two disjoint sets X, Y \u2282 V\nOutput: Expression for Px(y) or FAIL\n1: Let D = An(Y)_{P_{V\\X}}\n2: Px(y) = \u2211_{d\\y} IDENTIFY(D, V, P)\n3: function IDENTIFY(C, T, Q = Q[T])\n4:   if C = \u2205 then return 1\n5:   if C = T then return Q\n     /* In P_T, let B denote a bucket, and let C^B denote the pc-component of B */\n6:   if \u2203B \u2282 T \\ C such that C^B \u2229 Ch(B) \u2286 B then\n7:     Compute Q[T \\ B] from Q    \u25b7 via Proposition 2\n8:     return IDENTIFY(C, T \\ B, Q[T \\ B])\n9:   else if \u2203B \u2282 C such that R_B \u2260 C then\n10:    return [IDENTIFY(R_B, T, Q) \u00b7 IDENTIFY(R_{C\\R_B}, T, Q)] / IDENTIFY(R_B \u2229 R_{C\\R_B}, T, Q)    \u25b7 by Proposition 3\n11:  else\n12:    throw FAIL\n\nDefinition 2 (PC-Component). In a MAG, a PAG, or any induced subgraph thereof, two nodes are\nin the same possible c-component (pc-component) if there is a path between them such that (1) all\nnon-endpoint nodes along the path are colliders, and (2) none of the edges is visible.\nProposition 1. Let P be a MAG or a PAG over V, and D be any causal diagram in the equivalence\nclass represented by P. 
For any X, Y \u2208 A \u2286 V, if X and Y are in the same c-component in D_A,\nthen X and Y are in the same pc-component in P_A.\nUsing the above notions, the following identification criterion is derived where the intervention is on\na bucket rather than a single node and the input distribution is possibly interventional. The expression\ndepends on a partial topological order (PTO) over the nodes, which is a topological order over the\nbuckets. A detailed discussion can be found in (Jaber et al., 2018).\nProposition 2. Let P denote a PAG over V, T be a union of a subset of the buckets in P, and X \u2282 T\nbe a bucket. Given P_{v\\t} (i.e., Q[T]), and a partial topological order B1 < \u00b7\u00b7\u00b7 < Bm with respect\nto P_T, Q[T \\ X] is identifiable if and only if, in P_T, there does not exist Z \u2208 X such that Z has a\npossible child C \u2209 X that is in the pc-component of Z. If identifiable, then the expression is given by\n\nQ[T \\ X] = [ P_{v\\t} / \u220f_{i | Bi \u2286 S^X} P_{v\\t}(Bi | B^(i\u22121)) ] \u00d7 \u2211_{x} \u220f_{i | Bi \u2286 S^X} P_{v\\t}(Bi | B^(i\u22121)),\n\nwhere S^X = \u22c3_{Z \u2208 X} S^Z, S^Z being the dc-component of Z in P_T, and B^(i\u22121) denoting the set of\nnodes preceding bucket Bi in the partial order.\nFor example, given the PAG in Fig. 1b, X is not in the same pc-component with any of its possible\nchildren V3, V4, hence Px(v1, . . . , v4) is computable from the observational distribution P(v).\nAnother important result is decomposing a target quantity Q[C] into a product of smaller quantities.\nSuch a decomposition is obtained in Proposition 3 using the Region construct (Def. 3).\nDefinition 3 (Region R^C_A). Given a PAG or a MAG P over V, and A \u2286 C \u2286 V. Let the region of\nA with respect to C, denoted R^C_A, be the union of the buckets that contain nodes in the pc-component\nof A in the induced subgraph P_C.\nProposition 3. Given a PAG P over V and set C \u2286 V, Q[C] can be decomposed as follows.\n\nQ[C] = ( Q[R_A] \u00b7 Q[R_{C\\R_A}] ) / Q[R_A \u2229 R_{C\\R_A}]\n\nwhere A \u2282 C and R_(\u00b7) = R^C_(\u00b7).\n\nFigure 2: Sample PAGs with identifiable conditional causal effects. Panels: (a) Px(y|z); (b) Px(y|z, w); (c) Px(y|z1, z2); (d) Px(y|z1, z2).\n\nPropositions 2 and 3 are utilized in Algorithm 1 which is sound and complete for identifying\nunconditional causal effects given a PAG (Jaber et al., 2019a).\n\n4 Conditional Causal Effects\n\nWe formalize the notion of identifiability from a PAG using the following definition, which generalizes\nthe causal-diagram-specific notion (Tian, 2004).\nDefinition 4 (Causal-Effect Identifiability). The causal effect of a set of variables X on a disjoint set\nof variables Y conditioned on another set Z is said to be identifiable from a PAG P if the quantity\nPx(y|z) can be computed uniquely from the observational distribution P(V) given every causal\ndiagram D (represented by a MAG) in the Markov equivalence class represented by P.\nGiven a PAG P and a conditional causal effect Px(y|z), we can rewrite the quantity as follows.\nHence, if Px(y, z) is identifiable, then Px(y|z) is identifiable as well.\n\nPx(y|z) = Px(y, z) / \u2211_{y} Px(y, z)\n\nFor example, Pz1(y, z2) is identifiable in the PAG of Figure 2c with the following (simplified)\nexpression via Algorithm 1. Hence, both Pz1(y|z2) and Pz1(z2|y) are identifiable.\n\nPz1(y, z2) = Q[Y, Z2] = P(y|z1, z2) P(z2)\n\nHowever, not all identifiable conditional effects can be identified this way. Consider the PAG in Fig. 2d\nand the conditional effect Px(y|z1, z2). 
Whereas Px(y, z1, z2) is not identifiable by Algorithm 1 and\nhence, since Algorithm 1 is complete, the joint effect is not identifiable simpliciter, Px(y|z1, z2) turns out to be identifiable\nas we show later. Therefore, Algorithm 1, though complete for identifying unconditional effects, is\nunable to compute many identifiable conditional effects.\nTo do better, we start by generalizing the notion of Q[\u00b7] to accommodate conditioning.\nDefinition 5. For any pair of disjoint sets C, Z \u2286 V, we define the quantity Q[C|Z], given below, to\nbe the post-intervention distribution of C conditional on Z under an intervention on V \\ (C \u222a Z).\n\nQ[C|Z] = Q[C \u222a Z] / \u2211_{c} Q[C \u222a Z]\n\nIn what follows, we utilize Definition 5 to derive an algorithm for conditional causal effect\nidentification. The following proposition shows a way to rewrite a given conditional effect in terms of the\nnotion in Definition 5 (see footnote 2).\nProposition 4. Given distribution P(V), causal PAG P over V, and target effect Px(y|z) where\nX, Y, Z are disjoint subsets of V, we have the following.\n\nPx(y|z) = \u2211_{d\\y} Q[D|Z]    (3)\n\nwhere D = An(Y \u222a Z)_{P_{V\\X}} \\ Z.\n\nFor example, given the PAG in Figure 2d and query Px(y|z1, z2), we can rewrite the conditional\ncausal effect as \u2211_{w} Q[Y, W|Z1, Z2].\n\nFootnote 2: The proofs can be found in (Jaber et al., 2019b).\n\nAlgorithm 2 Recursive routine to decompose Q[T|Z].\n1: function DECOMPOSE(P, T, Z)\n2:   if T = \u2205 then return \u2205\n     /* Let C^(\u00b7) denote the pc-component of (\u00b7) in P_{T\u222aZ}. */\n3:   Initialize X to an arbitrary node in T\n4:   Let A = Pa\u2217(C^X) \u2229 Pa\u2217(C^{T\u222aZ\\C^X})\n5:   while A \u2288 Z do\n6:     X = X \u222a Ch\u2217(A \u2229 T)\n7:     A = Pa\u2217(C^X) \u2229 Pa\u2217(C^{T\u222aZ\\C^X})\n     /* Let T1 = C^X \u2229 T and T2 = T \\ T1 */\n8:   return \u27e8T1, R_X \\ T1\u27e9 \u222a DECOMPOSE(P, T2, R_{T\u222aZ\\C^X} \\ T2)\n\nThe following fact plays a crucial role in the derivation of our algorithm.\nLemma 1. Given a PAG P over V and any causal diagram D in the equivalence class represented\nby P, suppose X \u2282 A \u2286 V, and let S^X and C^X denote the c-component and pc-component of X in\nD_A and P_A, respectively. Then, for every Y \u2208 A, if Y \u2208 Pa(S^X) in D_A, then Y \u2208 Pa\u2217(C^X) in P_A,\nwhere Pa\u2217(\u00b7) is (the union of the input set and) the set of possible parents due to directed or partially\ndirected edges (\u2192 , \u25e6\u2192).\nIn words, given a PAG P and any diagram D in the equivalence class, if a node Y is a parent of the\nc-component of X in D_A, then Y must be either in the pc-component of X in P_A or a possible parent\nof the pc-component by a non-circle edge. For example, given the PAG in Figure 2a, C^X = {X, Z}\nand Y \u2209 Pa\u2217(C^X), hence Y \u2209 Pa(S^X) in any causal diagram in the equivalence class. It is easy\nto see why in this simple example. First, X is not in the pc-component of Y so they are not in the\nsame c-component in any causal diagram, by Proposition 1. If X and Z are in the same c-component\nin some diagram D and Y is a parent of Z, then we have an (unshielded) collider at Z in D, which\nwould contradict the given PAG. This observation generalizes to more complex cases. 
Note that the\nproperty does not necessarily hold if the input to Pa(\u00b7) and Pa\u2217(\u00b7) are arbitrary subsets of V rather\nthan a c-component and a pc-component.\nNext, we derive a suf\ufb01cient condition for decomposing Q[T|Z] into two sub-queries.\nProposition 5. Given a PAG P over V and Q[T|Z], let X \u2282 T \u222a Z. The following decomposition\nholds if Pa\u2217(C X) \u2229 Pa\u2217(C T\u222aZ\\CX\n) \u2286 Z, where C (\u00b7) is the set of nodes in the pc-component of (\u00b7)\nin PT\u222aZ, R(\u00b7) is with respect to T \u222a Z, T1 = C X \u2229 T, and T2 = T \\ T1.\nQ[T|Z] = Q[T1|RX \\ T1] \u00b7 Q[T2|RT\u222aZ\\CX \\ T2]\n\nFor example, given Q[Y, W|Z1, Z2] and the PAG in Figure 2d, C Y = {Y, Z2}, Pa\u2217(C Y ) =\n{Y, Z2, Z1}, and Pa\u2217(C{W,Z1}) = {W, Z1, Z2}. Hence, Pa\u2217(C Y ) \u2229 Pa\u2217(C{W,Z1}) = {Z1, Z2}\nand the condition of Prop. 5 is satis\ufb01ed. So, we have the following decomposition.\n\nQ[Y, W|Z1, Z2] = Q[Y |Z2, W ] \u00b7 Q[W|Z1, Z2]\n\n(4)\nIt is important to note that the condition is based on the pc-component of X \u2282 T \u222a Z while the Q[\u00b7]\ndecomposition uses the region of X (Def. 3). The decomposition would still be valid by using the\npc-component instead of the region, but using the region has the advantage of keeping together nodes\nin the same bucket (i.e., nodes that share circle edges). For instance, using the region allows us to\nkeep W and Z2 together in each sub-query. This will be useful in the \ufb01nal algorithm.\nAlgorithm 2 decomposes Q[T|Z] into a product of sub-queries by applying Prop. 5 recursively.\nIn each iteration, the routine \ufb01nds a subset X that satis\ufb01es the criterion in the proposition (cf.\nline 5). The \ufb01rst line checks for a base case where T = \u2205. For example, given Q[Y |Z1, Z2]\nand the PAG in Figure 2c, the function assigns X to {Y }. Since Pa\u2217(C X) = {Y, Z2, Z1} and\nPa\u2217(C Z1) = {Z1, Z2}, their intersection satis\ufb01es the criterion. 
Hence, Q[Y|Z1, Z2] = Q[Y|Z2] \u00d7 Q[\u2205|Z1, Z2] where Q[\u2205|Z1, Z2] = 1 by definition. The base case accounts for the recursive call\nDECOMPOSE(P,\u2205,{Z1, Z2}) which yields Q[\u2205|Z1, Z2] = 1. In general, this step simplifies a target\nQ[\u00b7] and facilitates its computation.\n\nAlgorithm 3 CIDP(x, y, z) given PAG P\nInput: three disjoint sets X, Y, Z \u2282 V\nOutput: Expression for Px(y|z) or FAIL\n1: Let D = An(Y \u222a Z)_{P_{V\\X}} \\ Z\n2: Px(y|z) = \u2211_{d\\y} Q[D|Z]    \u25b7 Expand query via Prop. 4\n3: F = DECOMPOSE(P, D, Z)    \u25b7 F is a set of pairs \u27e8Di, Zi\u27e9\n   /* At this point, Px(y|z) = \u2211_{d\\y} \u220f_{i} Q[Di|Zi] = \u220f_{i} \u2211_{di\\y} Q[Di|Zi] */\n4: Let F\u2217 = \u2205\n5: for each \u27e8Di, Zi\u27e9 \u2208 F do\n6:   if Di \u2229 Y \u2260 \u2205 then\n7:     F\u2217 = F\u2217 \u222a DO-SEE(P, Di, Zi)\n8: Px(y|z) = \u220f_{i | \u27e8Di,Zi\u27e9 \u2208 F\u2217} [ \u2211_{di\\y} IDENTIFY(Di \u222a Zi, V, P) / \u2211_{di} IDENTIFY(Di \u222a Zi, V, P) ]\n9: function DO-SEE(P, T, Z)\n   /* Let B denote a bucket in P and C^(\u00b7) denote the pc-component of (\u00b7) in P_{T\u222aZ\u222aB} */\n10:  if \u2203B | B \u2229 (T \u222a Z) \u2260 \u2205 \u2227 B \u2288 (T \u222a Z) then\n11:    if Pa\u2217(C^{B\\(T\u222aZ)}) \u2229 T = \u2205 then\n12:      return DO-SEE(P, T, Z \u222a B \\ T)\n13:    else\n14:      throw FAIL\n15:  return \u27e8T, Z\u27e9\n\nTo derive our identification algorithm, we use one more trick. Lemma 2 below provides a sufficient\ncriterion where given Q[T|Z] and a causal diagram D, we can move a subset X from the intervention\nset V \\ (T \u222a Z) to the conditioning set.\nLemma 2. Given causal diagram D and Q[T|Z], let X \u2286 V \\ (T \u222a Z) and let S^X denote the\nc-component of X in D_{T\u222aZ\u222aX}. 
If Pa(S^X) \u2229 T = \u2205, then Q[T|Z] = Q[T|Z \u222a X].\nThe following proposition generalizes the result in Lemma 2 to PAGs using the property in Lemma 1.\nProposition 6. Given PAG P and Q[T|Z], let X \u2286 V \\ (T \u222a Z) and let C^X denote the pc-component\nof X in P_{T\u222aZ\u222aX}. If Pa\u2217(C^X) \u2229 T = \u2205, then Q[T|Z] = Q[T|Z \u222a X].\nProof. Let D be any diagram in the equivalence class represented by P. By Lemma 1, if Pa\u2217(C^X) \u2229\nT = \u2205 in P_{T\u222aZ\u222aX}, then Pa(S^X) \u2229 T = \u2205 in D_{T\u222aZ\u222aX}. Hence, the proposition follows by Lemma 2\nsince the equation is valid for all the diagrams in the equivalence class.\nFor example, given Q[Y|Z] and the PAG in Figure 2a, Pa\u2217(C^X) \u2229 {Y} = Pa\u2217({X, Z}) \u2229 {Y} = \u2205,\nhence Q[Y|Z] = Q[Y|Z, X]. Similarly, given the PAG in Fig. 2b, Q[Y|Z, W] = Q[Y|Z, W, X].\nFinally, we use the above results to construct Algorithm 3 which identifies conditional causal effects.\nThe algorithm is sound by Theorem 1. It starts by computing set D then expanding the query\naccordingly in lines 1-2. Then, CIDP calls Alg. 2 which decomposes Q[D|Z] to sub-queries as the\ncomment below line 3 elaborates. Lines 4-7 achieve two things. First, we drop every unnecessary\nquery Q[Di|Zi] where Di \u2229 Y = \u2205 since \u2211_{di} Q[Di|Zi] = 1. For each remaining query Q[Di|Zi],\nfunction DO-SEE(\u00b7,\u00b7,\u00b7) searches recursively for a bucket B in P such that a strict subset of B is\nin Di \u222a Zi, and then tries to apply Prop. 6 to obtain Q[Di|Zi \u222a B \\ Di]. Finally, in line 8, we try\nto compute the target conditional effect by computing each Q[Di|Zi] = Q[Di \u222a Zi] / \u2211_{di} Q[Di \u222a Zi] and calling\nIDENTIFY(Di \u222a Zi, V, P) from Alg. 1. CIDP does not identify the target effect if either DO-SEE(\u00b7)\nor IDENTIFY(\u00b7) throws a FAIL.\nTheorem 1. CIDP (Algorithm 3) is sound.\n\nProof Sketch. Line 2 follows from Proposition 4. Function DECOMPOSE(\u00b7,\u00b7,\u00b7) is sound by\nProposition 5. The second equivalence in the comment after line 3 is justified by the proof of Prop. 5. Line\n6 drops from F\u2217 every Q[Di|Zi] where Y \u2229 Di = \u2205 since \u2211_{di} Q[Di|Zi] = 1. The soundness of\nDO-SEE(\u00b7,\u00b7,\u00b7) follows from Proposition 6. Finally, line 8 is sound by Definition 5 and the correctness\nof IDENTIFY(\u00b7,\u00b7,\u00b7) in (Jaber et al., 2019a).\n\n4.1 Illustrative Example\n\nConsider the effect Px(y|z1, z2) and the PAG in Figure 2d. We have the following from Eq. 4 and\nLines 4-7 of the algorithm. Since {Y}, {Z2, W} are buckets in the PAG, DO-SEE(\u00b7) does nothing.\n\nPx(y|z1, z2) = \u2211_{w} Q[Y, W|Z1, Z2] = Q[Y|Z2, W] \u00b7 \u2211_{w} Q[W|Z1, Z2] = Q[Y|Z2, W]\n\nThen, we call IDENTIFY({Y, Z2, W}, V, P) to compute Q[{Y, Z2, W}] from P(V). Node Z1 is\nnot in the same pc-component with its only child Y in P. Hence, Q[{Y, Z2, W, X}] is identifiable\nfrom P(V) by Proposition 2 using the order X < {W, Z2} < Z1 < Y.\n\nQ[{Y, Z2, W, X}] = [ P(v) / P(z1|x, w, z2) ] \u00d7 \u2211_{z1\u2032} P(z1\u2032|x, w, z2) = P(x, w, z2) \u00b7 P(y|x, w, z2, z1)    (5)\n\nNext, X does not have any possible children in P_{V\\{Z1}}, hence Q[{Y, Z2, W}] is identifiable from\nQ[{Y, Z2, W, X}] (Eq. 5) using the partial order X < {W, Z2} < Y. The last equivalence is a\nsimplification obtained by considering the independence relation (X \u22a5\u22a5 {W, Z2}).\n\nQ[{Y, Z2, W}] = [ P_{z1} / P_{z1}(x) ] \u00d7 \u2211_{x\u2032} P_{z1}(x\u2032) = P(x, w, z2) P(y|x, w, z2, z1) / P(x) = P(w, z2) P(y|w, z2, z1)\n\nFinally, the conditional effect simplifies as follows.\n\nPx(y|z1, z2) = Q[{Y, Z2, W}] / \u2211_{y} Q[{Y, Z2, W}] = P(w, z2) \u00b7 P(y|w, z2, z1) / \u2211_{y} P(w, z2) \u00b7 P(y|w, z2, z1) = P(y|z2, z1)\n\n4.2 Expressiveness\n\nTheorem 2 below establishes that CIDP subsumes IDP. Conversely, IDP cannot compute some\nconditional effects that are identifiable by CIDP, such as the cases depicted in Fig. 2, because the\ncorresponding joint effects are not identifiable. Hence, CIDP is strictly more powerful than IDP.\nTheorem 2. CIDP (Alg. 3) subsumes IDP (Alg. 1): if CIDP fails to identify Px(y), IDP fails too.\nProof. Suppose Z = \u2205. The query expansion reduces to that in Alg. 1. Alg. 2 will decompose Q[D]\nonly if the subsets are disjoint in P_D since any adjacency implies the condition of Proposition 5\nwould fail. Such a decomposition is valid for IDP using Prop. 3 where the denominator set would be\nempty. Whenever set B in DO-SEE(\u00b7) exists (line 10), the function fails. We then have B \u2229 X \u2260 \u2205\nand there exists a potentially causal path from X to Y that starts with an invisible edge. Hence, Px(y)\nis not identifiable by (Jaber et al., 2019a, Th. 3), and consequently IDP fails. Finally, CIDP fails if a\ncall to IDENTIFY(\u00b7) fails. It follows that IDP would fail as well which concludes the proof.\n\n5 Conclusion\n\nThe problem of identifying conditional causal effects is of great interest due to its role in evaluating\nconditional plans or policies (Pearl and Robins, 1995). 
We have investigated a challenging version of\nthis problem where in addition to the observational distribution, the available causal information is\nnot a fully speci\ufb01ed causal diagram, but a PAG which represents a Markov equivalence class of causal\ndiagrams and which can be inferred from the observational distribution. We develop an algorithm\nto compute the effect of an arbitrary set of intervention variables X on an arbitrary outcome set Y\nwhile conditioning on a third disjoint set Z, denoted Px(y|z). We show that the proposed algorithm\nsubsumes the state-of-the-art algorithm in (Jaber et al., 2019a), which is complete for unconditional\neffects. Moreover, CIDP identi\ufb01es all the examples in the literature that we are aware of, including\nthe one in Fig. 2b which is not identi\ufb01able by the generalized do-calculus (Zhang, 2008a). Based on\nthese observations, we conjecture that our algorithm is complete.\n\n8\n\n\fAcknowledgements\n\nBareinboim and Jaber are supported in parts by grants from NSF IIS-1704352, IIS-1750807 (CA-\nREER), IBM Research, and Adobe Research. Zhang\u2019s research was supported in part by the Research\nGrants Council of Hong Kong under the General Research Fund LU13602818.\n\nReferences\nBareinboim, E. and Pearl, J. (2016). Causal inference and the data-fusion problem. Proceedings of\n\nthe National Academy of Sciences, 113:7345\u20137352.\n\nHyttinen, A., Eberhardt, F., and J\u00e4rvisalo, M. (2015). Do-calculus when the true graph is unknown.\n\nIn UAI, pages 395\u2013404.\n\nJaber, A., Zhang, J., and Bareinboim, E. (2018). A graphical criterion for effect identi\ufb01cation in\nequivalence classes of causal diagrams. In Proceedings of the 27th International Joint Conference\non Arti\ufb01cial Intelligence, IJCAI\u201918, pages 5024\u20135030.\n\nJaber, A., Zhang, J., and Bareinboim, E. (2019a). Causal identi\ufb01cation under Markov equivalence:\nCompleteness results. 
In Proceedings of the 36th International Conference on Machine Learning,\nICML\u201919.\n\nJaber, A., Zhang, J., and Bareinboim, E. (2019b). Identification of conditional causal effects under\nMarkov equivalence. Technical report, R-50, Columbia CausalAI Lab, Department of Computer\nScience, Columbia University.\n\nPearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82(4):669\u2013688.\n\nPearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press, New\nYork. 2nd edition, 2009.\n\nPearl, J. and Robins, J. M. (1995). Probabilistic evaluation of sequential plans from causal models\nwith hidden variables. In UAI, volume 95, pages 444\u2013453. Citeseer.\n\nRichardson, T. and Spirtes, P. (2002). Ancestral graph Markov models. Annals of Statistics, pages\n962\u20131030.\n\nShpitser, I. and Pearl, J. (2006). Identification of conditional interventional distributions. In Proceedings\nof the 22nd Conference on Uncertainty in Artificial Intelligence, UAI 2006, pages 437\u2013444.\n\nSpirtes, P., Glymour, C. N., and Scheines, R. (2001). Causation, prediction, and search, volume 81.\nMIT Press.\n\nSpirtes, P., Glymour, C. N., Scheines, R., Heckerman, D., Meek, C., Cooper, G., and Richardson, T.\n(2000). Causation, prediction, and search. MIT Press.\n\nTian, J. (2004). Identifying conditional causal effects. In Proceedings of the 20th Conference on\nUncertainty in Artificial Intelligence, pages 561\u2013568. AUAI Press.\n\nTian, J. and Pearl, J. (2002). A general identification condition for causal effects. In AAAI/IAAI,\npages 567\u2013573.\n\nVerma, T. (1993). Graphical aspects of causal models. Technical Report R-191, UCLA.\n\nZhang, J. (2007). Generalized do-calculus with testable causal assumptions. In International\nConference on Artificial Intelligence and Statistics, pages 667\u2013674.\n\nZhang, J. (2008a). Causal reasoning with ancestral graphs. 
Journal of Machine Learning Research,\n9(Jul):1437\u20131474.\n\nZhang, J. (2008b). On the completeness of orientation rules for causal discovery in the presence of\nlatent confounders and selection bias. Artificial Intelligence, 172(16):1873\u20131896.", "award": [], "sourceid": 6138, "authors": [{"given_name": "Amin", "family_name": "Jaber", "institution": "Purdue University"}, {"given_name": "Jiji", "family_name": "Zhang", "institution": "Lingnan University"}, {"given_name": "Elias", "family_name": "Bareinboim", "institution": "Purdue"}]}