{"title": "Graphical Models for Recovering Probabilistic and Causal Queries from Missing Data", "book": "Advances in Neural Information Processing Systems", "page_first": 1520, "page_last": 1528, "abstract": "We address the problem of deciding whether a causal or probabilistic query is estimable from data corrupted by missing entries, given a model of missingness process. We extend the results of Mohan et al, 2013 by presenting more general conditions for recovering probabilistic queries of the form P(y|x) and P(y,x) as well as causal queries of the form P(y|do(x)). We show that causal queries may be recoverable even when the factors in their identifying estimands are not recoverable. Specifically, we derive graphical conditions for recovering causal effects of the form P(y|do(x)) when Y and its missingness mechanism are not d-separable. Finally, we apply our results to problems of attrition and characterize the recovery of causal effects from data corrupted by attrition.", "full_text": "Graphical Models for Recovering Probabilistic\n\nand Causal Queries from Missing Data\n\nKarthika Mohan and Judea Pearl\n\nCognitive Systems Laboratory\nComputer Science Department\n{karthika,judea}@cs.ucla.edu\n\nUniversity of California, Los Angeles, CA 90024\n\nAbstract\n\nWe address the problem of deciding whether a causal or probabilistic query\nis estimable from data corrupted by missing entries, given a model of miss-\ningness process. We extend the results of Mohan et al. [2013] by present-\ning more general conditions for recovering probabilistic queries of the form\nP (y|x) and P (y, x) as well as causal queries of the form P (y|do(x)). We\nshow that causal queries may be recoverable even when the factors in their\nidentifying estimands are not recoverable. Speci\ufb01cally, we derive graphical\nconditions for recovering causal e\ufb00ects of the form P (y|do(x)) when Y and\nits missingness mechanism are not d-separable. 
Finally, we apply our results to problems of attrition and characterize the recovery of causal effects from data corrupted by attrition.\n\n1 Introduction\n\nAll branches of experimental science are plagued by missing data. Improper handling of missing data can bias outcomes and potentially distort the conclusions drawn from a study. Therefore, accurate diagnosis of the causes of missingness is crucial for the success of any research. We employ a formal representation called 'Missingness Graphs' (m-graphs, for short) to explicitly portray the missingness process as well as the dependencies among variables in the available dataset (Mohan et al. [2013]). Apart from determining whether recoverability is feasible, namely whether there exists any theoretical impediment to estimability of queries of interest, m-graphs can also provide a means for communication and refinement of assumptions about the missingness process. Furthermore, m-graphs permit us to detect violations of modeling assumptions even when the dataset is contaminated with missing entries (Mohan and Pearl [2014]).\n\nIn this paper, we extend the results of Mohan et al. [2013] by presenting general conditions under which probabilistic queries such as joint and conditional distributions can be recovered. We show that causal queries of the type P(y|do(x)) can be recovered even when the associated probabilistic relations such as P(y, x) and P(y|x) are not recoverable. In particular, causal effects may be recoverable even when Y is not separable from its missingness mechanism. Finally, we apply our results to recover causal effects when the available dataset is tainted by attrition.\n\nThis paper is organized as follows. Section 2 provides an overview of missingness graphs and reviews the notion of recoverability, i.e., obtaining consistent estimates of a query given a dataset and an m-graph. 
Section 3 refines the sequential factorization theorem presented in Mohan et al. [2013] and extends its applicability to a wider range of problems in which missingness mechanisms may influence each other. In section 4, we present general\n\nFigure 1: Typical m-graph where Vo = {S, X}, Vm = {I, Q}, V* = {I*, Q*}, R = {Ri, Rq} and U is the latent common cause. Members of Vo and Vm are represented by full and hollow circles respectively. The associated missingness process and assumptions are elaborated in appendix 10.1.\n\nalgorithms to recover joint distributions from the class of problems for which the sequential factorization theorem fails. In section 5, we introduce new graphical criteria that preclude recoverability of joint and conditional distributions. In section 6, we discuss recoverability of causal queries and show that, unlike probabilistic queries, P(y|do(x)) may be recovered even when Y and its missingness mechanism (Ry) are not d-separable. In section 7, we demonstrate how we can apply our results to problems of attrition, in which missingness is a severe obstacle to sound inferences. Related work is discussed in section 8 and conclusions are drawn in section 9. Proofs of all theoretical results in this paper are provided in the appendix.\n\n2 Missingness Graph and Recoverability\n\nMissingness graphs as discussed below were first defined in Mohan et al. [2013], and we adopt the same notation. Let G be the causal DAG over the nodes V ∪ U ∪ V* ∪ R, where V is the set of observable nodes. Nodes in the graph correspond to variables in the data set. U is the set of unobserved nodes (also called latent variables). E is the set of edges in the DAG. We use bi-directed edges as a shorthand notation to denote the existence of a U variable as common parent of two variables in V ∪ R. 
V is partitioned into Vo and Vm such that Vo ⊆ V is the set of variables that are observed in all records in the population and Vm ⊆ V is the set of variables that are missing in at least one record. Variable X is termed fully observed if X ∈ Vo, partially observed if X ∈ Vm and substantive if X ∈ Vo ∪ Vm. Associated with every partially observed variable Vi ∈ Vm are two other variables Rvi and V*i, where V*i is a proxy variable that is actually observed, and Rvi represents the status of the causal mechanism responsible for the missingness of V*i; formally,\n\nv*i = f(rvi, vi) = { vi if rvi = 0; m if rvi = 1 }   (1)\n\nV* is the set of all proxy variables and R is the set of all causal mechanisms that are responsible for missingness. R variables may not be parents of variables in V ∪ U. We call this graphical representation a Missingness Graph (or m-graph). An example of an m-graph is given in Figure 1 (a). We use the following shorthand. For any variable X, let X' be a shorthand for X = 0. For any set W ⊆ Vm ∪ Vo ∪ R, let Wr, Wo and Wm be shorthand for W ∩ R, W ∩ Vo and W ∩ Vm respectively. Let Rw be a shorthand for RVm∩W, i.e. Rw is the set containing the missingness mechanisms of all partially observed variables in W. Note that Rw and Wr are not the same. 
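The masking mechanism in equation (1) is straightforward to state operationally. A minimal sketch (values and field names are illustrative; the token "m" plays the role of the missing-value symbol in the definition above):

```python
# Sketch of the proxy mechanism in equation (1): v* equals v when r = 0
# and the special symbol "m" (missing) when r = 1. Values are hypothetical.
M = "m"  # missingness token

def proxy(v, r):
    """Return the observed proxy v* for underlying value v and mask status r."""
    return v if r == 0 else M

values = [3, 1, 4, 1, 5]
masks = [0, 1, 0, 0, 1]  # r = 1 means the entry is censored
proxies = [proxy(v, r) for v, r in zip(values, masks)]
print(proxies)  # [3, 'm', 4, 1, 'm']
```

The manifest dataset exposes only `proxies` and `masks`; recoverability asks when quantities about `values` can still be consistently estimated.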
GX̲ and GX̄ represent the graphs formed by removing from G all edges leaving and entering X, respectively.\n\nA manifest distribution P(Vo, V*, R) is the distribution that governs the available dataset. An underlying distribution P(Vo, Vm, R) is said to be compatible with a given manifest distribution P(Vo, V*, R) if the latter can be obtained from the former using equation 1. A manifest distribution Pm is compatible with a given underlying distribution Pu if for all X ⊆ Vm and Y = Vm \ X, the following equality holds:\n\nPm(R'x, Ry, X*, Y*, Vo) = Pu(R'x, Ry, X, Vo)\n\nwhere R'x denotes Rx = 0 and Ry denotes Ry = 1. Refer to Appendix 10.2 for an example.\n\nFigure 2: (a) m-graph in which P(V) is recoverable by the sequential factorization (b) & (c): m-graphs for which no admissible sequence exists.\n\n2.1 Recoverability\n\nGiven a manifest distribution P(V*, Vo, R) and an m-graph G that depicts the missingness process, query Q is recoverable if we can compute a consistent estimate of Q as if no data were missing. Formally,\n\nDefinition 1 (Recoverability (Mohan et al. [2013])). Given an m-graph G, and a target relation Q defined on the variables in V, Q is said to be recoverable in G if there exists an algorithm that produces a consistent estimate of Q for every dataset D such that P(D) is (1) compatible with G and (2) strictly positive¹, i.e. P(Vo, Vm, R) > 0.\n\nFor an introduction to the notion of recoverability see Pearl and Mohan [2013] and Mohan et al. [2013].\n\n3 Recovering Probabilistic Queries by Sequential Factorization\n\nMohan et al. [2013] (theorem 4) presented a sufficient condition for recovering probabilistic queries such as joint and conditional distributions by using ordered factorizations. 
However, the theorem is not applicable to certain classes of problems, such as those in longitudinal studies in which edges exist between R variables. General ordered factorization, defined below, broadens the concept of ordered factorization (Mohan et al. [2013]) to include the set of R variables. Subsequently, the modified theorem (stated below as theorem 1) will permit us to handle cases in which R variables are contained in separating sets that d-separate partially observed variables from their respective missingness mechanisms (example: X ⊥⊥ Rx | Ry in figure 2 (a)).\n\nDefinition 2 (General Ordered Factorization). Given a graph G and a set O of ordered V ∪ R variables Y1 < Y2 < . . . < Yn, a general ordered factorization relative to G, denoted by f(O), is a product of conditional probabilities f(O) = ∏i P(Yi|Xi) where Xi ⊆ {Yi+1, . . . , Yn} is a minimal set such that Yi ⊥⊥ ({Yi+1, . . . , Yn} \ Xi) | Xi holds in G.\n\nTheorem 1 (Sequential Factorization). A sufficient condition for recoverability of a relation Q defined over substantive variables is that Q be decomposable into a general ordered factorization, or a sum of such factorizations, such that every factor Qi = P(Yi|Xi) satisfies: (1) Yi ⊥⊥ (Ryi, Rxi) | Xi \ {Ryi, Rxi}, if Yi ∈ (Vo ∪ Vm) and (2) Rz ⊥⊥ RXi | Xi if Yi = Rz for any Z ∈ Vm, Z ∉ Xi and Xr ∩ RXm = ∅.\n\nAn ordered factorization that satisfies the condition in Theorem 1 is called an admissible sequence.\n\nThe following example illustrates the use of theorem 1 for recovering the joint distribution. Additionally, it sheds light on the need for the notion of minimality in definition 2.\n\n¹An extension to datasets that are not strictly positive is sometimes feasible (Mohan et al. [2013]).\n\nExample 1. 
We are interested in recovering P(X, Y, Z) given the m-graph in Figure 2 (a). We discern from the graph that definition 2 is satisfied because: (1) P(Y|X, Z, Ry) = P(Y|X, Z) and (X, Z) is a minimal set such that Y ⊥⊥ ({X, Z, Ry} \ (X, Z)) | (X, Z), (2) P(X|Ry, Z) = P(X|Ry) and Ry is the minimal set such that X ⊥⊥ ({Ry, Z} \ Ry) | Ry and (3) P(Z|Ry) = P(Z) and ∅ is the minimal set such that Z ⊥⊥ Ry | ∅. Therefore, the order Y < X < Z < Ry induces a general ordered factorization P(X, Y, Z, Ry) = P(Y|X, Z)P(X|Ry)P(Z)P(Ry). We now rewrite P(X, Y, Z) as follows:\n\nP(X, Y, Z) = Σ_Ry P(Y, X, Z, Ry) = P(Y|X, Z)P(Z) Σ_Ry P(X|Ry)P(Ry)\n\nSince Y ⊥⊥ Ry | X, Z, Z ⊥⊥ Rz, and X ⊥⊥ Rx | Ry, by theorem 1 we have:\n\nP(X, Y, Z) = P(Y|X, Z, R'x, R'y, R'z)P(Z|R'z) Σ_Ry P(X|R'x, Ry)P(Ry)\n\nIndeed, equation 1 permits us to rewrite it as:\n\nP(X, Y, Z) = P(Y*|X*, Z*, R'x, R'y, R'z)P(Z*|R'z) Σ_Ry P(X*|R'x, Ry)P(Ry)\n\nP(X, Y, Z) is recoverable because every term on the right hand side is consistently estimable from the available dataset.\n\nHad we ignored the minimality requirement in definition 2 and chosen to factorize Y < X < Z < Ry using the chain rule, we would have obtained P(X, Y, Z, Ry) = P(Y|X, Z, Ry)P(X|Z, Ry)P(Z|Ry)P(Ry), which is not admissible since X ⊥⊥ (Rz, Rx) | Z does not hold in the graph. 
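The substitution P(Z) = P(Z|R'z) used in Example 1 is licensed by the independence Z ⊥⊥ Rz. A toy numerical check of that logic (not the exact graph of Figure 2 (a); parameters are hypothetical): the complete-case estimate of P(Z = 1) is consistent when the mask is independent of Z, and biased when the mask depends on Z.

```python
import random

random.seed(0)
n = 200_000
p_z = 0.3  # true P(Z = 1), hypothetical

def complete_case_rate(mask_prob):
    """Estimate P(Z = 1) from records with Rz = 0, where
    mask_prob(z) = P(Rz = 1 | Z = z)."""
    seen = []
    for _ in range(n):
        z = 1 if random.random() < p_z else 0
        rz = 1 if random.random() < mask_prob(z) else 0
        if rz == 0:  # record is complete, so Z* = Z
            seen.append(z)
    return sum(seen) / len(seen)

est_indep = complete_case_rate(lambda z: 0.5)                # Z independent of Rz
est_mnar = complete_case_rate(lambda z: 0.8 if z else 0.2)   # Rz depends on Z
print(est_indep)  # close to the true 0.3
print(est_mnar)   # close to 0.06/0.62, i.e. far below 0.3
```

When the independence fails, P(Z|R'z) no longer equals P(Z), which is exactly why Example 1 must verify each d-separation before replacing underlying terms by their proxy versions.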
In other words, the existence of one admissible sequence based on an order O of variables does not guarantee that every factorization based on O is admissible; it is for this reason that we need to impose the condition of minimality in definition 2.\n\nThe recovery procedure presented in example 1 requires that we introduce Ry into the order. Indeed, there is no ordered factorization over the substantive variables {X, Y, Z} that will permit recoverability of P(X, Y, Z) in figure 2 (a). This extension of Mohan et al. [2013] thus permits the recovery of probabilistic queries from problems in which the missingness mechanisms interact with one another.\n\n4 Recoverability in the Absence of an Admissible Sequence\n\nMohan et al. [2013] presented a theorem (refer to appendix 10.4) that stated a necessary and sufficient condition for recovering the joint distribution for the class of problems in which the parent set of every R variable is a subset of Vo ∪ Vm. In contrast to Theorem 1, their theorem can handle problems for which no admissible sequence exists. The following theorem gives a generalization that is applicable to any semi-Markovian model (for example, the m-graphs in figure 2 (b) & (c)). It relies on the notion of a collider path and two new constructs, Rpart: a partition of the R variables, and Mb(R(i)): the substantive variables related to R(i), which we define after stating the theorem.\n\nTheorem 2. Given an m-graph G in which no element in Vm is either a neighbor of its missingness mechanism or connected to its missingness mechanism by a collider path, P(V) is recoverable if no Mb(R(i)) contains a partially observed variable X such that Rx ∈ R(i), i.e. ∀i, R(i) ∩ RMb(R(i)) = ∅. 
Moreover, if recoverable, P(V) is given by:\n\nP(V) = P(V, R = 0) / ∏i P(R(i) = 0 | Mb(R(i)), RMb(R(i)) = 0)\n\nIn theorem 2:\n(i) a collider path p between any two nodes X and Y is a path in which every intermediate node is a collider. Example: X → Z ↔ Y.\n(ii) Rpart = {R(1), R(2), ..., R(N)} is a partition of the R variables such that for every pair of elements Rx and Ry belonging to distinct blocks, the following conditions hold: (i) Rx and Ry are not neighbors and (ii) Rx and Ry are not connected by a collider path. In figure 2 (b): Rpart = {R(1), R(2)} where R(1) = {Rw, Rz}, R(2) = {Rx, Ry}.\n(iii) Mb(R(i)) is the Markov blanket of R(i), comprising all substantive variables that are either neighbors of, or connected by a collider path to, variables in R(i) (Richardson [2003]). In figure 2 (b): Mb(R(1)) = {X, Y} and Mb(R(2)) = {Z, W}.\n\nAppendix 10.6 demonstrates how theorem 2 leads to the recoverability of P(V) in figure 2, to which the theorems in Mohan et al. [2013] do not apply.\n\nThe following corollary yields a sufficient condition for recovering the joint distribution from the class of problems in which no bi-directed edge exists between variables in sets R and Vo ∪ Vm (for example, the m-graph described in Figure 2 (c)). These problems form a subset of the class of problems covered in theorem 2. The subset Pasub(R(i)) used in the corollary is the set of all substantive variables that are parents of variables in R(i). In figure 2 (b): Pasub(R(1)) = ∅ and Pasub(R(2)) = {Z, W}.\n\nCorollary 1. Let G be an m-graph such that (i) ∀X ∈ Vm ∪ Vo, no latent variable is a common parent of X and any member of R, and (ii) ∀Y ∈ Vm, Y is not a parent of Ry. If ∀i, Pasub(R(i)) does not contain a partially observed variable whose missingness mechanism is in R(i), i.e. 
R(i) ∩ RPasub(R(i)) = ∅, then P(V) is recoverable and is given by:\n\nP(V) = P(R = 0, V) / ∏i P(R(i) = 0 | Pasub(R(i)), RPasub(R(i)) = 0)\n\n5 Non-recoverability Criteria for Joint and Conditional Distributions\n\nUp until now, we dealt with sufficient conditions for recoverability. It is important, however, to supplement these results with criteria for non-recoverability in order to alert the user to the fact that the available assumptions are insufficient to produce a consistent estimate of the target query. Such criteria have not been treated formally in the literature thus far. In the following theorem we introduce two graphical conditions that preclude recoverability.\n\nTheorem 3 (Non-recoverability of P(V)). Given a semi-Markovian model G, the following conditions are necessary for recoverability of the joint distribution:\n(i) ∀X ∈ Vm, X and Rx are not neighbors, and\n(ii) ∀X ∈ Vm, there does not exist a path from X to Rx in which every intermediate node is both a collider and a substantive variable.\n\nIn the following corollary, we leverage theorem 3 to yield necessary conditions for recovering conditional distributions.\n\nCorollary 2 (Non-recoverability of P(Y|X)). Let X and Y be disjoint subsets of substantive variables. P(Y|X) is non-recoverable in m-graph G if one of the following conditions is true:\n(1) Y and Ry are neighbors\n(2) G contains a collider path p connecting Y and Ry such that all intermediate nodes in p are in X.\n\n6 Recovering Causal Queries\n\nGiven a causal query and a causal Bayesian network, a complete algorithm exists for deciding whether the query is identifiable (Shpitser and Pearl [2006]). Obviously, a query that is not identifiable in the substantive model is not recoverable from missing data. 
Therefore, a necessary condition for recoverability of a causal query is its identifiability, which we will assume in the rest of our discussion.\n\nDefinition 3 (Trivially Recoverable Query). A causal query Q is said to be trivially recoverable given an m-graph G if it has an estimand (in terms of substantive variables) in which every factor is recoverable.\n\nFigure 3: m-graph in which Y and Ry are not separable but still P(Y|do(Z)) is recoverable.\n\nClasses of problems that fall into the MCAR (Missing Completely At Random) and MAR (Missing At Random) categories are much discussed in the literature (Rubin [1976]) because in such categories probabilistic queries are recoverable by graph-blind algorithms. An immediate but important implication of trivial recoverability is that if data are MAR or MCAR and the query is identifiable, then it is also recoverable by model-blind algorithms.\n\nExample 2. In the gender wage-gap study example in Figure 1 (a), the effect of sex on income, P(I|do(S)), is identifiable and is given by P(I|S). By theorem 2, P(S, X, Q, I) is recoverable. Hence P(I|do(S)) is recoverable.\n\n6.1 Recovering P(y|do(z)) when Y and Ry are inseparable\n\nThe recoverability of P(V) hinges on the separability of a partially observed variable from its missingness mechanism (a condition established in theorem 3). Remarkably, causal queries may circumvent this requirement. The following example demonstrates that P(y|do(z)) is recoverable even when Y and Ry are not separable.\n\nExample 3. Examine Figure 3. By the backdoor criterion, P(y|do(z)) = Σ_w P(y|z, w)P(w). One might be tempted to conclude that the causal relation is non-recoverable because P(w, z, y) is non-recoverable (by theorem 2) and P(y|z, w) is not recoverable (by corollary 2). 
However, P(y|do(z)) is recoverable as demonstrated below:\n\nP(y|do(z)) = P(y|do(z), R'y) = Σ_w P(y|do(z), w, R'y) P(w|do(z), R'y)   (2)\n\nP(y|do(z), w, R'y) = P(y|z, w, R'y) (by Rule 2 of do-calculus (Pearl [2009]))   (3)\n\nP(w|do(z), R'y) = P(w|R'y) (by Rule 3 of do-calculus)   (4)\n\nSubstituting (3) and (4) in (2) we get:\n\nP(y|do(z)) = Σ_w P(y|z, w, R'y) P(w|R'y) = Σ_w P(y*|z, w, R'y) P(w|R'y)\n\nThe recoverability of P(y|do(z)) in the previous example follows from the notion of d*-separability and dormant independence [Shpitser and Pearl, 2008].\n\nDefinition 4 (d*-separation (Shpitser and Pearl [2008])). Let G be a causal diagram. Variable sets X, Y are d*-separated in G given Z, W (written X ⊥w Y | Z), if we can find sets Z, W such that X ⊥ Y | Z in Gw, and P(y, x|z, do(w)) is identifiable.\n\nDefinition 5 (Inducing path (Verma and Pearl [1991])). A path p between X and Y is called an inducing path if every node on the path is a collider and an ancestor of either X or Y.\n\nTheorem 4. 
Given an m-graph in which |Vm| = 1 and Y and Ry are connected by an inducing path, P(y|do(x)) is recoverable if there exist Z, W such that Y ⊥w Ry | Z and, for W1 = W \ X, the following conditions hold:\n(1) Y ⊥⊥ W1 | X, Z in GX̄,W̄1, and\n(2) P(W1, Z|do(X)) and P(Y|do(W1), do(X), Z, R'y) are identifiable.\nMoreover, if recoverable, then\n\nP(y|do(x)) = Σ_W1,Z P(Y|do(W1), do(X), Z, R'y) P(Z, W1|do(X))\n\nWe can quickly conclude that P(y|do(z)) is recoverable in the m-graph in figure 3 by verifying that the conditions in theorem 4 hold in the m-graph.\n\nFigure 4: (a) m-graph in which P(y|do(x)) is not recoverable (b) m-graph in which P(y|do(x)) is recoverable.\n\n7 Attrition\n\nAttrition (i.e. participants dropping out from a study/experiment) is a ubiquitous phenomenon, especially in longitudinal studies. In this section, we shall discuss a special case of attrition called 'Simple Attrition' (for an in-depth treatment see Garcia [2013]). In this problem, a researcher conducts a randomized trial, measures a set of variables (X, Y, Z) and obtains a dataset where the outcome (Y) is corrupted by missing values (due to attrition). Clearly, due to randomization, the effect of treatment (X) on outcome (Y), P(y|do(x)), is identifiable and is given by P(Y|X). We shall now demonstrate the usefulness of our previous discussion in recovering P(y|do(x)). Typical attrition problems are depicted in figure 4. In Figure 4 (b) we can apply theorem 1 to recover P(y|do(x)) as given below: P(Y|X) = Σ_Z P(Y*|X, Z, R'y)P(Z|X). In Figure 4 (a), we observe that Y and Ry are connected by a collider path. 
Therefore, by corollary 2, P(Y|X) is not recoverable; hence P(y|do(x)) is also not recoverable.\n\n7.1 Recovering Joint Distributions under Simple Attrition\n\nThe following theorem yields a necessary and sufficient condition for recovering joint distributions from semi-Markovian models with a single partially observed variable, i.e. |Vm| = 1, which includes models afflicted by simple attrition.\n\nTheorem 5. Let Y ∈ Vm and |Vm| = 1. P(V) is recoverable in m-graph G if and only if Y and Ry are not neighbors and Y and Ry are not connected by a path in which all intermediate nodes are colliders. If both conditions are satisfied, then P(V) is given by:\n\nP(V) = P(Y|Vo, Ry = 0)P(Vo)\n\n7.2 Recovering Causal Effects under Simple Attrition\n\nTheorem 6. P(y|do(x)) is recoverable in the simple attrition case (with one partially observed variable) if and only if Y and Ry are neither neighbors nor connected by an inducing path. Moreover, if recoverable,\n\nP(Y|X) = Σ_Z P(Y*|X, Z, R'y)P(Z|X)   (5)\n\nwhere Z is the separating set that d-separates Y from Ry.\n\n8 Related Work\n\nDeletion based methods such as listwise deletion, which are easy to understand as well as implement, guarantee consistent estimates only for certain categories of missingness such as MCAR (Rubin [1976]). The Maximum Likelihood method is known to yield consistent estimates under the MAR assumption; the expectation maximization algorithm and gradient based algorithms are widely used to search for ML estimates under incomplete data (Lauritzen [1995], Dempster et al. [1977], Darwiche [2009], Koller and Friedman [2009]). Most work in machine learning assumes MAR and proceeds with ML or Bayesian inference. 
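Equation 5 can be exercised end to end on synthetic data. A sketch, assuming a toy version of Figure 4 (b) in which the covariate Z drives both the outcome and the dropout (all parameters hypothetical): the adjusted estimate Σ_Z P(Y*|X, Z, R'y) P(Z|X) stays close to the true P(Y = 1 | X = 1), while the naive complete-case estimate does not.

```python
import random

random.seed(1)
n = 200_000
rows = []
for _ in range(n):
    x = int(random.random() < 0.5)                       # randomized treatment
    z = int(random.random() < 0.5)                       # covariate
    y = int(random.random() < 0.1 + 0.3 * x + 0.4 * z)   # outcome
    ry = int(random.random() < (0.8 if z else 0.1))      # dropout driven by Z only
    rows.append((x, z, y, ry))
# True P(Y=1 | X=1) under this process: 0.4 + 0.4 * 0.5 = 0.6.

def mean_y(pred):
    sel = [y for (x, z, y, ry) in rows if pred(x, z, ry)]
    return sum(sel) / len(sel)

# Naive complete-case estimate of P(Y=1 | X=1): biased, since Z drives dropout.
naive = mean_y(lambda x, z, ry: x == 1 and ry == 0)

# Adjusted estimate per equation (5): sum_z P(Y* | X=1, z, Ry=0) P(z | X=1).
n_x1 = sum(1 for (x, z, y, ry) in rows if x == 1)
pz = [sum(1 for (x, z, y, ry) in rows if x == 1 and z == v) / n_x1 for v in (0, 1)]
adjusted = sum(
    mean_y(lambda x, z, ry, v=v: x == 1 and z == v and ry == 0) * pz[v]
    for v in (0, 1)
)
print(naive, adjusted)  # naive drifts well below 0.6; adjusted stays near 0.6
```

The contrast illustrates theorem 6's premise: because Ry depends on Z alone, Z d-separates Y from Ry, and conditioning on Z inside the sum removes the dropout-induced bias.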
However, there are exceptions, such as recent work on collaborative filtering and recommender systems which develops probabilistic models that explicitly incorporate the missing data mechanism (Marlin et al. [2011], Marlin and Zemel [2009], Marlin et al. [2007]).\n\nOther methods for handling missing data can be classified into two groups: (a) Inverse Probability Weighted methods and (b) Imputation based methods (Rothman et al. [2008]). Inverse Probability Weighted methods analyze and assign weights to complete records based on estimated probabilities of completeness (Van der Laan and Robins [2003], Robins et al. [1994]). Imputation based methods substitute a reasonable guess in the place of a missing value (Allison [2002]); Multiple Imputation (Little and Rubin [2002]) is a widely used imputation method.\n\nMissing data is a special case of coarsened data, and data are said to be coarsened at random (CAR) if the coarsening mechanism is only a function of the observed data (Heitjan and Rubin [1991]). Robins and Rotnitzky [1992] introduced a methodology for parameter estimation from data structures for which full data has a non-zero probability of being fully observed, and their methodology was later extended to deal with censored data in which complete data on subjects are never observed (Van Der Laan and Robins [1998]).\n\nThe use of graphical models for handling missing data is a relatively new development. Daniel et al. [2012] used graphical models for analyzing missing information in the form of missing cases (due to sample selection bias). Attrition is a common occurrence in longitudinal studies and arises when subjects drop out of the study (Twisk and de Vente [2002], Shadish [2002]); Garcia [2013] analysed the problem of attrition using causal graphs. Thoemmes and Rose [2013] and Thoemmes and Mohan [2015] cautioned the practitioner that, contrary to popular belief, not all auxiliary variables reduce bias. 
Both Garcia [2013] and Thoemmes and Rose [2013] associate missingness with a single variable; interactions among several missingness mechanisms are unexplored.\n\nMohan et al. [2013] employed a formal representation called Missingness Graphs to depict the missingness process, defined the notion of recoverability and derived conditions under which queries would be recoverable when datasets are categorized as Missing Not At Random (MNAR). Tests to detect misspecifications in the m-graph are discussed in Mohan and Pearl [2014].\n\n9 Conclusion\n\nGraphical models play a critical role in portraying the missingness process, encoding and communicating assumptions about missingness, and deciding recoverability given a dataset afflicted with missingness. We presented graphical conditions for recovering joint and conditional distributions and sufficient conditions for recovering causal queries. We exemplified the recoverability of causal queries of the form P(y|do(x)) despite the existence of an inseparable path between Y and Ry, which is an insurmountable obstacle to the recovery of P(Y). We applied our results to problems of attrition and presented necessary and sufficient graphical conditions for recovering causal effects in such problems.\n\nAcknowledgement\n\nThis paper has benefited from discussions with Ilya Shpitser. This research was supported in part by grants from NSF #IIS1249822 and #IIS1302448, and ONR #N00014-13-1-0153 and #N00014-10-1-0933.\n\nReferences\n\nP.D. Allison. Missing data series: Quantitative applications in the social sciences, 2002.\n\nR.M. Daniel, M.G. Kenward, S.N. Cousens, and B.L. De Stavola. Using causal diagrams to guide analysis in missing data problems. Statistical Methods in Medical Research, 21(3):243–256, 2012.\n\nA. Darwiche. Modeling and reasoning with Bayesian networks. Cambridge University Press, 2009.\n\nA.P. Dempster, N.M. Laird, and D.B. Rubin. 
Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pages 1–38, 1977.\n\nF.M. Garcia. Definition and diagnosis of problematic attrition in randomized controlled experiments. Working paper, April 2013. Available at SSRN: http://ssrn.com/abstract=2267120.\n\nD.F. Heitjan and D.B. Rubin. Ignorability and coarse data. The Annals of Statistics, pages 2244–2253, 1991.\n\nD. Koller and N. Friedman. Probabilistic graphical models: principles and techniques. 2009.\n\nS.L. Lauritzen. The EM algorithm for graphical association models with missing data. Computational Statistics & Data Analysis, 19(2):191–201, 1995.\n\nR.J.A. Little and D.B. Rubin. Statistical analysis with missing data. Wiley, 2002.\n\nB.M. Marlin and R.S. Zemel. Collaborative prediction and ranking with non-random missing data. In Proceedings of the third ACM conference on Recommender systems, pages 5–12. ACM, 2009.\n\nB.M. Marlin, R.S. Zemel, S. Roweis, and M. Slaney. Collaborative filtering and the missing at random assumption. In UAI, 2007.\n\nB.M. Marlin, R.S. Zemel, S.T. Roweis, and M. Slaney. Recommender systems: missing data and statistical model estimation. In IJCAI, 2011.\n\nK. Mohan and J. Pearl. On the testability of models with missing data. Proceedings of AISTAT, 2014.\n\nK. Mohan, J. Pearl, and J. Tian. Graphical models for inference with missing data. In Advances in Neural Information Processing Systems 26, pages 1277–1285. 2013.\n\nJ. Pearl. Causality: models, reasoning and inference. Cambridge Univ Press, New York, 2009.\n\nJ. Pearl and K. Mohan. Recoverability and testability of missing data: Introduction and summary of results. Technical Report R-417, UCLA, 2013. Available at http://ftp.cs.ucla.edu/pub/stat_ser/r417.pdf.\n\nT. Richardson. Markov properties for acyclic directed mixed graphs. 
Scandinavian Journal of Statistics, 30(1):145–157, 2003.\n\nJ.M. Robins and A. Rotnitzky. Recovery of information and adjustment for dependent censoring using surrogate markers. In AIDS Epidemiology, pages 297–331. Springer, 1992.\n\nJ.M. Robins, A. Rotnitzky, and L.P. Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427):846–866, 1994.\n\nK.J. Rothman, S. Greenland, and T.L. Lash. Modern epidemiology. Lippincott Williams & Wilkins, 2008.\n\nD.B. Rubin. Inference and missing data. Biometrika, 63:581–592, 1976.\n\nW.R. Shadish. Revisiting field experimentation: field notes for the future. Psychological Methods, 7(1):3, 2002.\n\nI. Shpitser and J. Pearl. Identification of conditional interventional distributions. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, pages 437–444. 2006.\n\nI. Shpitser and J. Pearl. Dormant independence. In AAAI, pages 1081–1087, 2008.\n\nF. Thoemmes and K. Mohan. Graphical representation of missing data problems. Structural Equation Modeling: A Multidisciplinary Journal, 2015.\n\nF. Thoemmes and N. Rose. Selection of auxiliary variables in missing data problems: Not all auxiliary variables are created equal. Technical Report R-002, Cornell University, 2013.\n\nJ. Twisk and W. de Vente. Attrition in longitudinal studies: how to deal with missing data. Journal of Clinical Epidemiology, 55(4):329–337, 2002.\n\nM.J. Van Der Laan and J.M. Robins. Locally efficient estimation with current status data and time-dependent covariates. Journal of the American Statistical Association, 93(442):693–701, 1998.\n\nM.J. Van der Laan and J.M. Robins. Unified methods for censored longitudinal data and causality. Springer Verlag, 2003.\n\nT.S. Verma and J. Pearl. Equivalence and synthesis of causal models. 
In Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, pages 220–227. Association for Uncertainty in AI, 1991.\n", "award": [], "sourceid": 817, "authors": [{"given_name": "Karthika", "family_name": "Mohan", "institution": "UCLA"}, {"given_name": "Judea", "family_name": "Pearl", "institution": "UCLA"}]}