{"title": "Triad Constraints for Learning Causal Structure of Latent Variables", "book": "Advances in Neural Information Processing Systems", "page_first": 12883, "page_last": 12892, "abstract": "Learning causal structure from observational data has attracted much attention, and it is notoriously challenging to find the underlying structure in the presence of confounders (hidden direct common causes of two variables). In this paper, by properly leveraging the non-Gaussianity of the data, we propose to estimate the structure over latent variables with the so-called Triad constraints: we design a form of \"pseudo-residual\" from three variables, and show that when causal relations are linear and noise terms are non-Gaussian, the causal direction between the latent variables for the three observed variables is identifiable by checking a certain kind of independence relationship. In other words, the Triad constraints help us to locate latent confounders and determine the causal direction between them. This goes far beyond the Tetrad constraints and reveals more information about the underlying structure from non-Gaussian data. Finally, based on the Triad constraints, we develop a two-step algorithm to learn the causal structure corresponding to measurement models. 
Experimental results on both synthetic and real data demonstrate the effectiveness and reliability of our method.", "full_text": "Triad Constraints for Learning Causal Structure of Latent Variables

Ruichu Cai∗1, Feng Xie∗1, Clark Glymour2, Zhifeng Hao1,3, Kun Zhang2
1 School of Computer Science, Guangdong University of Technology, Guangzhou, China
2 Department of Philosophy, Carnegie Mellon University, Pittsburgh, USA
3 School of Mathematics and Big Data, Foshan University, Foshan, China
cairuichu@gdut.edu.cn, xiefeng009@gmail.com, cg09@andrew.cmu.edu, zfhao@gdut.edu.cn, kunz1@cmu.edu

Abstract

Learning causal structure from observational data has attracted much attention, and it is notoriously challenging to find the underlying structure in the presence of confounders (hidden direct common causes of two variables). In this paper, by properly leveraging the non-Gaussianity of the data, we propose to estimate the structure over latent variables with the so-called Triad constraints: we design a form of "pseudo-residual" from three variables, and show that when causal relations are linear and noise terms are non-Gaussian, the causal direction between the latent variables for the three observed variables is identifiable by checking a certain kind of independence relationship. In other words, the Triad constraints help us to locate latent confounders and determine the causal direction between them. This goes far beyond the Tetrad constraints and reveals more information about the underlying structure from non-Gaussian data. Finally, based on the Triad constraints, we develop a two-step algorithm to learn the causal structure corresponding to measurement models.
Experimental results on both synthetic and real data demonstrate the effectiveness and reliability of our method.

∗ These authors contributed equally to this work.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

1 Introduction

Traditional methods for causal discovery, which aim to find causal relations from (purely) observational data, can be roughly divided into two categories, namely constraint-based methods, including PC [Spirtes and Glymour, 1991] and FCI [Spirtes et al., 1995; Colombo et al., 2012], and score-based ones, such as GES [Chickering, 2002] and GES with generalized scores [Huang et al., 2018]. A number of methods focus on estimating causal relationships between observed variables and fail to recover the underlying causal structure of latent variables. For example, from large enough data generated by the structure in Figure 1, where the Li are latent variables and the Xi are observed ones, we may only get a complete graph using the PC algorithm [Spirtes and Glymour, 1991], a widely used constraint-based method, since there is no d-separation relation among the observed variables (although {X1} and {X2, X3} are d-separated by L1, which is latent). Besides, in reality we can measure only a limited number of variables and the causal influences may happen at the level of latent variables, so we are often concerned about the causal structure of latent variables; see, e.g., Bartholomew et al. [2008].

There exist several methods for causal discovery in the case with confounders. Spirtes et al. [2000] attempt to resolve this problem using the so-called Tetrad constraints [Spearman, 1928]. Inspired by Tetrad constraints, various contributions have been made towards estimating structure over latent variables. For instance, Silva and Scheines [2005] presented testable statistical conditions to identify d-separations in linear latent variable models, Silva et al.
[2006] propose the BPC algorithm, which uses Tetrad constraints to discover the causal structure of latent variables, and Shimizu et al. [2009] further applied analysis based on the Linear, Non-Gaussian, Acyclic Model (LiNGAM) [Shimizu et al., 2006] to the recovered latent variables to improve the estimated causal relations between them; Sullivant et al. [2010] showed that a low-rank sub-matrix of the covariance matrix corresponds to conditional independence constraints on collections of Gaussian data, and proposed a trek-separation criterion to learn causal structure. Recently, Kummerfeld and Ramsey [2016] used the extended t-separation [Spirtes, 2013] to infer causal relations of latent variables, with the FindOneFactorClusters (FOFC) algorithm. However, these methods fail to work when latent variables have fewer than three pure measurement variables. Furthermore, even when this condition holds, Tetrad and its variants may not be able to find the causal direction between latent variables. Over-complete independent component analysis offers another approach [Hoyer et al., 2008], as an extension of the LiNGAM analysis; however, this analysis is generally hard to perform, especially when there are relatively many latent variables, and the method does not focus on the structure of latent variables. More recently, Zhang et al. [2017] and Huang et al. [2015] deal with a specific type of confounders, which can be written as functions of the time/domain index in nonstationary/heterogeneous data. Overall, learning the structure of latent variables is a challenging problem; for instance, none of the above methods is able to recover the causal structure shown in Figure 1.

It is desirable to develop testable conditions on the observed data to estimate the structure of latent variables.
Interestingly, we find that given three variables in the non-Gaussian case, the independence condition between one of them and a certain linear combination of the remaining two variables gives hints as to the causal structure even in the presence of latent confounders. In particular, given a set of three distinct and dependent variables {Xi, Xj, Xk}, we define a particular type of "regression residual," E(i,j|k) := Xi − (Cov(Xi, Xk)/Cov(Xj, Xk)) · Xj. Then whether E(i,j|k) is independent from Xk contains information regarding where latent confounders might be and the causal relationships among them. We term this condition the Triad constraint.

We further extend our Triad constraints to learn the structure of a wide class of linear latent structure models from non-Gaussian data. Specifically, we propose a two-phase algorithm to discover the causal relationships of latent variables. It first finds pure clusters (clusters of variables having only one common latent variable and no observed parent) from observed data in Phase I. Then, in Phase II, it learns the causal order of the latent variables based on the clusters. Compared with Tetrad constraints, Triad constraints can reveal more information about the causal structure involving latent variables for non-Gaussian data.
For instance, Triad constraints can be used to locate the latent variables Li, i = 1, ..., 5, in Figure 1 and identify their structure, including the causal directions, but Tetrad constraints cannot (see the details in Section 4). Our main contributions include 1) proposing a novel constraint involving only three non-Gaussian variables, namely the Triad constraint, and showing the connection between this constraint and the underlying causal structure, which helps identify causal information of latent confounders, and 2) developing a two-phase algorithm to learn the causal structure of latent variables, including the causal skeleton and the causal directions, based on the Triad constraints.

Figure 1: A causal structure involving 5 latent variables.

2 Problem Definition

In this work, we focus on a particular type of linear latent structure model. Let X = {X1, X2, ..., Xm} denote the observed variable set, L = {L1, L2, ..., Ln} denote the latent variable set, and V = X ∪ L denote the full variable set. In the linear latent structure model, the data generation process satisfies: 1) the structure of V can be represented by a Directed Acyclic Graph (DAG); 2) no observed variable in X is an ancestor of any latent variable in L; 3) each variable is generated as Vi = Σ_{Vk ∈ Pa(Vi), k ≠ i} bik Vk + εVi, i = 1, 2, ..., m + n, where Pa(Vi) contains all the parent variables of Vi and bik is the causal strength from Vk to Vi; and 4) all εVi are noise (disturbance) variables that are independent of each other.

BPC, FOFC, and their variants [Silva et al., 2006; Kummerfeld and Ramsey, 2016] have been shown to be able to recover a certain amount of causal information for some linear latent structure models from observed data.
These methods usually assume that each latent variable has at least three pure measurement variables, which may not hold in practice, e.g., for the example given in Figure 1; furthermore, they cannot always recover the causal direction between latent variables. Here, pure measurement variables are defined as measured variables that have only one latent parent and no observed parent.

We greatly relax the structural assumption of Tetrad: we consider the case where each latent variable has two or more pure variables as children, under the assumption of non-Gaussianity of the noise terms. Here, pure variables are variables that may be latent or observed but have only one parent. The model is defined as follows.

Definition 1 (Non-Gaussian Two-Pure Linear Latent Structure Model). A linear latent structure model is called a Non-Gaussian Two-Pure (NG2P) linear latent structure model if it further satisfies the following three assumptions:

1) [Purity Assumption] there are no direct edges between the observed variables;

2) [Two-Pure Child Variable Assumption] each latent variable has at least two pure variables as children;

3) [Non-Gaussianity Assumption] the noise terms are non-Gaussian.

One may wonder how restrictive the above assumptions are and how to interpret the result produced by our proposed method when the assumptions, especially assumption 1), are violated. We discuss such issues in Section 5.

3 Triad Constraints: A Brief Formulation

We begin with the definition of Triad constraints, the independence relationship between the "pseudo-residual" and the observed variables.
It is worth noting that some related work also exploits concepts similar to the "pseudo-residual," e.g., in the context of auxiliary variables (or instrumental variables) [Chen et al., 2017] or pseudo-variables [Drton and Richardson, 2004], but to the best of our knowledge, it has not been realized that the independence property involving such pseudo-residuals reflects structural asymmetry of the latent variables.

Definition 2 (Triad constraints). Suppose Xi, Xj, and Xk are distinct and correlated variables and that all noise variables are non-Gaussian. Define the pseudo-residual of {Xi, Xj} relative to Xk, which is called the reference variable, as

E(i,j|k) := Xi − (Cov(Xi, Xk)/Cov(Xj, Xk)) · Xj.    (1)

We say that {Xi, Xj} and Xk satisfy the Triad constraint if and only if E(i,j|k) is independent of Xk; accordingly, {Xi, Xj} and Xk violate the Triad constraint if and only if E(i,j|k) is dependent on Xk.

The following two theorems show some interesting properties of the Triad constraints, which are further exploited to discover the causal structure among the latent variables. We first aim at the identification of the causal direction of latent variables by analyzing the variables in the clusters. The following theorem shows the asymmetry between the latent variables in light of the Triad condition in the non-Gaussian case.

Theorem 1. Let La and Lb be two directly connected latent variables without confounders, and let {Xi} and {Xj, Xk} be their children, respectively. Then if {Xi, Xj} and Xk violate the Triad constraint, La → Lb holds. In other words, if the Triad condition is violated and the latent variables have no confounders, then the latent variable of the reference variable is a child of the other latent variable.

The proof is given in the Supplementary Material; it heavily relies on the Darmois-Skitovich theorem [Kagan et al., 1973], which essentially says that as long as two variables share any non-Gaussian independent component, they cannot be statistically independent. The following example shows that Triad constraints help find the causal direction between two latent variables from their pure clusters.

Example 1. Consider the example in Figure 1: clusters {X1} and {X4, X5} have corresponding latent variables L1 and L2, respectively. Because L1 → L2 without a confounder, any Triad condition with any child of L2 as the reference is violated, i.e., E(1,4|5) is dependent on X5 and E(1,5|4) is dependent on X4, but E(4,5|1) is independent of X1. This shows the asymmetry between L1 and L2, implied by the three observed variables.

One might wonder whether we can make use of the Triad constraints in the Gaussian case to infer the causal direction between L1 and L2 in the above example. Unfortunately, one can show that E(1,2|3) ⊥ X3, E(1,3|2) ⊥ X2, and E(2,3|1) ⊥ X1 when the variables are jointly Gaussian, and thus the asymmetry between L1 and L2 disappears.

The second theorem concerns the property of clusters in terms of the Triad constraints. Here we say a set of observed variables is a cluster if these variables have the same latent variable as the parent. Intuitively, if such variables are pure variables, they are equivalent under the Triad constraints. For example, X2 and X3 in Figure 1 have the same constraints. Theorem 2 formalizes this property of clusters and gives the criterion for finding clusters.

Theorem 2. Let S be a correlated variable set. If for all Xi, Xj ∈ S and all Xk ∈ X \ S, {Xi, Xj} and Xk satisfy the Triad constraint, then S is a cluster.

The following example illuminates how the theorem can be used to distinguish the clusters of variables.

Example 2. Consider the example in Figure 1: for {X4, X5}, one may find that {X4, X5} and Xi satisfy the Triad constraint for i = 1, 2, 3, 6, 7, 8, so {X4, X5} is a cluster. But for {X1, X4}, E(1,4|5) is not independent of X5, so {X1, X4} is not a cluster.

The proof is given in the Supplementary Material.
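The asymmetry in Theorem 1 (and Example 1) can be checked numerically. The sketch below is illustrative rather than the paper's implementation: it simulates L1 → L2 with X1 a child of L1 and X4, X5 children of L2, fixes all causal coefficients to 1, uses uniform (hence non-Gaussian) noises, and replaces a proper independence test such as HSIC with the crude proxy |corr(u², v²)|; all of these choices are assumptions made only for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Ground truth: L1 -> L2; X1 measures L1, while X4 and X5 measure L2.
# Unit causal coefficients; uniform noises are non-Gaussian.
e1, e2, n1, n4, n5 = rng.uniform(-1, 1, (5, n))
L1 = e1
L2 = L1 + e2
X1, X4, X5 = L1 + n1, L2 + n4, L2 + n5

def pseudo_residual(xi, xj, xk):
    """E(i,j|k) = Xi - Cov(Xi, Xk) / Cov(Xj, Xk) * Xj."""
    return xi - (np.cov(xi, xk)[0, 1] / np.cov(xj, xk)[0, 1]) * xj

def dep(u, v):
    """Crude higher-order dependence proxy: |corr(u^2, v^2)|.
    E(i,j|k) is uncorrelated with Xk by construction, so only
    higher-order statistics can reveal any remaining dependence."""
    return abs(np.corrcoef(u ** 2, v ** 2)[0, 1])

# {X4, X5} and X1 satisfy the Triad constraint: E(4,5|1) is independent of X1.
sat = dep(pseudo_residual(X4, X5, X1), X1)
# {X1, X4} and X5 violate it: E(1,4|5) still shares noise terms with X5.
# The asymmetry reveals the direction L1 -> L2.
vio = dep(pseudo_residual(X1, X4, X5), X5)
print(f"satisfied: {sat:.3f}, violated: {vio:.3f}")  # sat near 0, vio clearly larger
```

With a kernel independence test in place of the proxy, this is exactly the check that both phases of the algorithm in Section 4 apply repeatedly.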
4 Triad Constraint-Based Causal Latent Structure Discovery

In this section, we extend the above results to estimate the NG2P linear latent structure. To this end, we propose a two-phase algorithm to Learn the Structure of latent variables based on Triad Constraints (LSTC). It first finds pure clusters from the observed data (Phase I), and then it learns the structure of the latent variables behind these clusters (Phase II).

4.1 Phase 1: Finding Clusters

Theorem 2 has paved the way to discover the clusters of the variables. It also enables us to use a cluster-fusion-like method to discover the clusters of observed variables and of latent variables that have already been found, i.e., we recursively find clusters of variables and merge the overlapping clusters. Here we consider two practical issues involved in such a recursive fusion algorithm. The first is which clusters are to be merged, and the second is how to check whether Triad constraints involving latent variables hold, given that these variables are hidden.

For the merge problem, we find that overlapping clusters can be directly merged into one cluster. This is because overlapping clusters have the same latent variable as the parent under the NG2P linear latent structure. The validity of the merge step is guaranteed by Proposition 1.

Proposition 1. Let C1 and C2 be two clusters. If C1 and C2 are overlapping, C1 and C2 share the same latent parent.

This proposition holds true because of the equivalence of the pure variables in terms of Triad constraints.
In particular, as shown in Theorem 2, all variables in a cluster have the same Triad constraints.

After we find and merge clusters, we associate each cluster with a latent variable and, in fact, replace the variables in the cluster by the corresponding latent variable. We then continue finding and merging clusters. Since we replace variables in the same cluster with the associated latent variable, subsequent Triad constraints to be checked may involve latent variables. How can we check such constraints without knowing the values of the latent variables? Thanks to the linearity assumption and the transitivity of linear causal relations, one can use a child of a latent variable to test the Triad constraints.

Algorithm 1 FindClusters
Input: Data set X = {X1, ..., Xm}
Output: Partial causal structure G
1: Initialize C = ∅, G = ∅, V = X;
2: repeat
3:   for each {Vi, Vj} ⊆ V do
4:     if E(i,j|k) ⊥ Vk holds for all Vk ∈ V \ {Vi, Vj} then
5:       C = C ∪ {{Vi, Vj}};
6:     end if
7:   end for
8:   Merge all the overlapping sets in C;
9:   for each S ∈ C do
10:    Introduce a latent variable L for S and initialize L with the value of any variable of S;
11:    V = (V \ S) ∪ {L};
12:    G = G ∪ {L → Vi | Vi ∈ S};
13:  end for
14: until V contains only latent variables.
15: Return: G

Consider the example in Figure 1. Suppose we have already found the cluster {X2, X3} and associated it with a latent variable, say L4. Then one can see that if only one variable in this cluster, say X2, is kept (i.e., X3 is removed), then any subsequent Triad constraint, e.g., that of {X1, L4} and X5, holds true if and only if that of {X1, X2} and X5 does, because X3 is not in the variable set, and L4 and its only child, X2, have the same Triad properties relative to any other remaining variable.
That means, we can just use the observed values of X2 as the values of L4 and ignore all the other variables in the same cluster for the purpose of checking Triad constraints.

Consideration of the above two issues directly leads to the following algorithm, which includes three main steps: 1) find the clusters according to Theorem 2; 2) merge the overlapping clusters according to Proposition 1; 3) introduce a new latent variable to represent a newly discovered cluster, and use the values of an arbitrary variable in the cluster as the observed values of the latent variable for subsequent Triad condition checking. This procedure is illustrated with the following example.

Example 3. Consider the example in Figure 1. First, we find the clusters {X2, X3}, {X4, X5}, {X7, X8} based on Theorem 2 (lines 3-7). Second, we introduce L4, L2, and L5 as the parents of {X2, X3}, {X4, X5}, and {X7, X8}, respectively, whose values are set to those of X2, X4, and X7, respectively. Third, we find the clusters {X1, L4} and {X6, L5} on the updated V, again based on Theorem 2 (lines 3-7). Fourth, we introduce L1 and L3 as the parents of {X1, L4} and {X6, L5}, respectively. Finally, we return the clusters of the variables in the form of the partial graph G = {L1 → {X1, L4}, L4 → {X2, X3}, L2 → {X4, X5}, L3 → {X6, L5}, L5 → {X7, X8}}.

4.2 Phase 2: Learning the Structure of Latent Variables

Given the clusters discovered in the previous step, we aim to recover the structure among the root latent variables of each cluster. Since various independence test methods are available for the latent variables, the causal order is the focus of this learning procedure. As an immediate extension of Theorem 1, the root latent variable can be identified by checking the Triad constraints, as stated in the following proposition.

Proposition 2.
Given a latent variable Lr and its two children {Vi, Vj}, Lr is a root latent variable if and only if E(k,i|j) ⊥ Vj holds for each Vk, where Vk is a child of any other latent variable.

This proposition inspires us to use a recursive approach to discover the causal order: we recursively identify the root latent variable and update the data by removing the root variable's effect, until the causal order over all latent variables is determined. The key concern with such a recursive approach is whether Proposition 2 still works on the updated data.

Fortunately, we find that there is still asymmetry implied by the Triad constraints if we update the data as follows: let {Vi, Vj} be two pure variables of the root latent variable Lr; for any other remaining latent variable L, we update the value of Vk, a child of L, as Vk := E(k,i|j), and keep the values of L's other children unchanged. On the updated data, the property of the root, i.e., that E(k,i|j) is independent of Vj, still holds. Recall the example given in Figure 1: although such a removal step introduces common effects into the updated variables, i.e., E(4,1|2) and E(6,1|2) share the common noise term εX1, as seen in Figure 2, {E(4,1|2), E(6,1|2)} and X5 satisfy the Triad constraint, while {E(4,1|2), E(6,1|2)} and X7 violate it.

Figure 2: Structure obtained after removing the effects of L1 through {X1, X2}, where L'2 = εL2; the influences of the noise terms are shown by dashed lines.
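Proposition 2 can likewise be checked numerically. The sketch below relies on illustrative assumptions only (unit coefficients, uniform noises, and |corr(u², v²)| as a crude stand-in for a kernel independence test such as HSIC): it simulates L1 → L2 with clusters {X1, X2} under L1 and {X3, X4} under L2, and scores each latent variable as a root candidate.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000

# Ground truth: L1 -> L2, with clusters {X1, X2} under L1 and {X3, X4} under L2.
e1, e2, n1, n2, n3, n4 = rng.uniform(-1, 1, (6, n))
L1 = e1
L2 = L1 + e2
X1, X2 = L1 + n1, L1 + n2
X3, X4 = L2 + n3, L2 + n4

def pseudo_residual(xi, xj, xk):
    """E(i,j|k) = Xi - Cov(Xi, Xk) / Cov(Xj, Xk) * Xj."""
    return xi - (np.cov(xi, xk)[0, 1] / np.cov(xj, xk)[0, 1]) * xj

def dep(u, v):
    # Crude higher-order dependence proxy standing in for HSIC.
    return abs(np.corrcoef(u ** 2, v ** 2)[0, 1])

def root_score(children, others):
    """Proposition 2: the parent of (vi, vj) is a root iff E(k,i|j) is
    independent of vj for every child vk of the other latent variables.
    Returns the worst violation, so a near-zero score marks a root."""
    vi, vj = children
    return max(dep(pseudo_residual(vk, vi, vj), vj) for vk in others)

print(root_score((X1, X2), [X3, X4]))  # near 0: L1 is identified as the root
print(root_score((X3, X4), [X1, X2]))  # clearly larger: L2 is not a root
```

In the recursion, once the root is found, each remaining child Vk would be replaced by E(k,i|j) and the same score recomputed on the updated data.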
More detail is given in the Supplementary Material.

Given the causal order of the latent variables, we can find the causal structure simply by removing redundant edges from the full acyclic graph using independence tests. Here we adopt the independence test method proposed in [Silva et al., 2006] (see Theorem 19 therein for the details). Finally, we present the following recursive algorithm for learning the structure over latent variables, and give an example for illustration.

Algorithm 2 LearnLatentStructure
Input: Partial causal structure G
Output: Complete causal structure G
1: Initialize L with the root variables of each subgraph in G, and set Lr = ∅;
2: Select two pure children for each L ∈ L;
3: repeat
4:   Find the root node Lr with its children Lchild being the largest set satisfying Proposition 2, and add Lr into Lr;
5:   L = L \ ({Lr} ∪ Lchild), L' = {Lr} ∪ Lchild;
6:   while L' ≠ ∅ do
7:     Find the root node L'r from L' according to Proposition 2;
8:     L' = L' \ {L'r};
9:     Let Vi, Vj be the children of L'r;
10:    for each L'' ∈ L' do
11:      G = G ∪ {L'r → L''};
12:      Update Vk (a child of L'') as Vk = E(k,i|j);
13:    end for
14:  end while
15: until L = ∅
16: if |Lr| > 1 then
17:   Construct a new latent variable L;
18:   G = G ∪ {L → Lr} for all Lr ∈ Lr;
19: end if
20: Remove the redundant edges of G using the method given in [Silva et al., 2006];
21: Return: G

Example 4. Continue to consider the example in Figure 1.
Given the partial structure discovered in the previous phase, i.e., L1 → {X1, L4}, L4 → {X2, X3}, L2 → {X4, X5}, L3 → {X6, L5}, and L5 → {X7, X8}, the algorithm proceeds as follows. First, we find the three latent variables {L1, L2, L3} in the partial graph G that cannot be further merged (Line 1). Second, we find that the latent variable L1 is the root variable (Line 4). Third, we update the data making use of {X1, X2} (Line 12); the results are given in Figure 2. Fourth, we find that L2 is a root latent variable of L3 (Line 7), because {E(4,1|2), E(6,1|2)} and X5 satisfy the Triad constraint, while {E(4,1|2), E(6,1|2)} and X7 violate it. Finally, the whole structure is L1 → {L4, L2, L3}, L2 → L3, and L3 → L4.

5 Discussion of the Assumptions of Our Model

To understand the applicability of our model (Definition 1), we discuss the plausibility of the three assumptions involved and what may happen if they are violated. If the Purity Assumption is violated, i.e., there are direct links between observed variables, there may exist pure models equivalent to the underlying causal structure in terms of Triad constraints. For example, if we have enough data generated by the non-pure structure given in Figure 3, the estimated structure would be the one given in Figure 1. In the result, one essentially uses another latent variable (e.g., L4) to replace the direct causal relation between the observed variables (e.g., X2 and X3).

Figure 3: A non-pure latent causal structure, which can be transformed into the equivalent pure structure in Figure 1 by simply using a latent variable to represent the direct causal relation among the observed variables.
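A small simulation is consistent with this equivalence claim (again under illustrative assumptions: unit coefficients, uniform noises, and a crude |corr(u², v²)| dependence proxy in place of HSIC): with a direct edge X2 → X3 under a single latent L1, the pair {X2, X3} still passes the cluster test of Theorem 2, while, e.g., {X1, X2} does not, so the recovered clustering matches the equivalent pure structure.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Non-pure ground truth: L1 -> {X1, X2, X4} plus a direct edge X2 -> X3,
# which violates the purity assumption.
e1, n1, n2, n3, n4 = rng.uniform(-1, 1, (5, n))
L1 = e1
X1, X2, X4 = L1 + n1, L1 + n2, L1 + n4
X3 = X2 + n3  # direct edge between observed variables

def pseudo_residual(xi, xj, xk):
    """E(i,j|k) = Xi - Cov(Xi, Xk) / Cov(Xj, Xk) * Xj."""
    return xi - (np.cov(xi, xk)[0, 1] / np.cov(xj, xk)[0, 1]) * xj

def dep(u, v):
    # Crude higher-order dependence proxy standing in for HSIC.
    return abs(np.corrcoef(u ** 2, v ** 2)[0, 1])

X = {"X1": X1, "X2": X2, "X3": X3, "X4": X4}

def cluster_score(a, b):
    """Worst-case Triad violation of pair {a, b} over all reference variables;
    a near-zero score means the pair passes the cluster test of Theorem 2."""
    return max(dep(pseudo_residual(X[a], X[b], X[k]), X[k])
               for k in X if k not in (a, b))

# {X2, X3} passes: the direct edge is absorbed into a new latent (L4 in
# Figure 1), yielding the equivalent pure structure.
print(cluster_score("X2", "X3"))  # near 0
print(cluster_score("X1", "X2"))  # clearly larger: X3 reveals the impurity
```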
It is challenging but desirable to give a characterization of the result given by our procedure and its connection to the underlying causal structure in the general case.

For the Two-Pure Child Variable Assumption, our requirement is much milder than that of Tetrad: we only need two pure variables for each latent variable, while Tetrad needs three pure observed variables for each latent variable. For the Non-Gaussianity Assumption, we note that this assumption can easily be tested from the observed data. Furthermore, non-Gaussian distributions, unlike Gaussian ones, are expected to be ubiquitous, due to the Cramér Decomposition Theorem [Cramér, 1962], as argued in Spirtes and Zhang [2016]. In fact, for our algorithm, this assumption can be relaxed to allow at most one Gaussian noise term among the observed variables, but none for the latent confounders.

6 Simulation

For fair comparison, we simulate data following the linear latent structure model. There are four typical cases: Cases 1 and 2 have two latent variables L1 and L2, with L1 → L2, and Cases 3 and 4 have three latent variables L1, L2, and L3, with L2 ← L1 → L3 and L2 → L3. Note that the simulated structure does not necessarily satisfy the purity assumption of our model (e.g., X2 → X5 violates it); we then simply recover the equivalent pure latent variable model for such a structure, as discussed in Section 5. In all four cases, the causal strength b is sampled from a uniform distribution on [−2, −0.5] ∪ [0.5, 2], the noise terms are generated as the fifth power of uniform(−1, 1) variables, and the sample size is selected from {500, 1000, 2000}.
The details of these networks are as follows.

• Case 1: L1 and L2 both have two pure measurement variables, i.e., L1 → {X1, X2} and L2 → {X3, X4}.

• Case 2: adding impure variables to Case 1. We add X5 and X6 to L1 and L2, respectively, and add the edges {X2 → X5, X4 → X6}.

• Case 3: each latent variable has two measurement variables, i.e., L1 → {X1, X2}, L2 → {X3, X4}, L3 → {X5, X6}.

• Case 4: adding impurities to Case 3. In detail, we add two measurement variables to each latent variable, i.e., X7, X8 to L1, X9, X10 to L2, and X11, X12 to L3, and further add the edges {X9 → X10, X11 → X12}.

Given that the data have non-Gaussian noise variables, we choose the Hilbert-Schmidt Independence Criterion (HSIC) test [Gretton et al., 2008] as the independence test. We compared the proposed algorithm with the BPC [Silva et al., 2006] and FOFC [Kummerfeld and Ramsey, 2016] algorithms². The method by Shimizu et al. [2009] exploits BPC as its first step, so it is not used for comparison, given that BPC is included. All the following experimental results are based on 10 runs of the algorithms over randomly generated data.

In the experiments, the discovered measurement model and the reconstructed structure model are compared with the ground truth to evaluate the performance of the algorithms. To evaluate the quality of the measurement model, we use Latent omission = OL/TL, Latent commission = FL/TL, and Mismeasurement = MO/TO as the evaluation metrics, where OL is the number of omitted latent variables, FL is the number of false latent variables, and TL is the total number of latent variables in the ground-truth graph (see the details in [Silva et al., 2006]). To evaluate the quality of the reconstructed structure model, we further use F1 = 2P×R/(P+R) as our metric.
Here P and R are the precision and recall, respectively.

As shown in Table 1, our algorithm, LSTC, achieves the best performance (the lowest errors) on all cases of the measurement model. Notably, when the sample size reaches 2000, the latent omission, latent commission, and mismeasurements of our method all reach 0. The BPC and FOFC algorithms (with the Delta test, a distribution-free test) do not perform well. These findings demonstrate that our algorithm requires only two pure variables in the measurement model, which is a clear advantage over the compared methods. Because of the clear performance gap, we only report the results of our method on structure learning in Figure 4.

² We used the implementations in the TETRAD package, which can be downloaded at http://www.phil.cmu.edu/tetrad/.

Table 1: Evaluation of output latent variables

                      Latent omission                 Latent commission               Mismeasurements
        Size    LSTC     BPC      FOFC       LSTC     BPC      FOFC       LSTC     BPC      FOFC
Case 1  500     0.00(0)  -        -          0.00(0)  -        -          0.00(0)  -        -
        1000    0.00(0)  -        -          0.00(0)  -        -          0.00(0)  -        -
        2000    0.00(0)  -        -          0.00(0)  -        -          0.00(0)  -        -
Case 2  500     0.10(0)  -        -          0.05(0)  -        -          0.03(0)  -        -
        1000    0.05(0)  0.50(2)  -          0.00(0)  0.00(2)  -          0.00(0)  0.06(2)  -
        2000    0.00(0)  0.65(3)  0.90(8)    0.00(0)  0.00(3)  0.00(8)    0.00(0)  0.05(3)  0.03(8)
Case 3  500     0.20(0)  -        -          0.03(0)  -        -          0.17(0)  -        -
        1000    0.13(0)  0.86(6)  -          0.00(0)  0.00(6)  -          0.00(0)  0.71(6)  -
        2000    0.00(0)  0.93(8)  0.96(9)    0.00(0)  0.00(8)  0.00(9)    0.00(0)  0.85(8)  0.93(9)
Case 4  500     0.00(0)  0.10(0)  0.13(0)    0.00(0)  0.00(0)  0.00(0)    0.00(0)  0.04(0)  0.04(0)
        1000    0.00(0)  0.00(0)  0.16(0)    0.26(0)  0.00(0)  0.00(0)    0.00(0)  0.00(0)  0.00(0)
        2000    0.00(0)  0.00(0)  0.50(0)    0.00(0)  0.00(0)  0.00(0)    0.00(0)  0.00(0)  0.01(0)

Note: The number in parentheses indicates the number of runs in which the algorithm could not correctly solve the problem. If the result of a method is always wrong, we use the symbol '-' to indicate it.

As shown in Figure 4, the F1 score gradually increases to 1 as the sample size increases in all four cases, which illustrates that our algorithm can recover the complete structure of the latent variables, including their causal directions.

7 Application to Stock Market Data

We now apply our algorithm to discover the causal network behind the Hong Kong stock market. The data set contains 1331 daily returns of 14 major stocks. Although some interesting results have been discovered on these data [Zhang and Chan, 2008], the latent variables behind the stocks are still unexplored. The kernel width in the HSIC test [Gretton et al., 2008] is set to 0.1.
Note that the condition for finding clusters (Theorem 2) might be partially violated in the real world; we therefore choose the candidate clusters that satisfy the largest number of Triad constraints. The procedure runs as follows. First, {X4, X7, X12}, {X2, X3, X6}, {X1, X10, X11}, {X5, X8, X13}, and {X9, X14} are identified as clusters by the FindClusters algorithm. These five clusters are associated with the latent variables L2, L3, L4, L5, and L6, respectively. We then run Algorithm 2 over the five clusters and obtain the final result, shown in Figure 5.

Figure 4: The F1 scores of the LSTC algorithm.

We make a number of observations from the discovered structure, which are consistent with our understanding of the stock market. 1) All stocks are affected by a major latent variable (L1), which may be related to government policy, the overall risk in the market, etc. 2) Companies in the same sub-index tend to gather under a common latent variable. For example, the cluster {X5, X8, X13} is in the Finance Sub-index; the cluster {X2, X3, X6} is in the Utilities Sub-index; the cluster {X1, X10, X11} is in the Properties Sub-index. 3) Ownership relations tend to share one common latent variable: X1 holds about 50% of X10, and they have the common cause L4; similarly, X5 holds about 60% of X8, and they have the common cause L5.

8 Conclusion

In this paper, we proposed the so-called Triad constraints for estimating a particular type of linear non-Gaussian latent variable model. The constraints help locate latent variables and identify their causal structure. We then applied these constraints to discover the whole structure over the latent variables with a two-phase algorithm.
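The Triad constraint used for the clustering above can be illustrated with a small numerical sketch. For observed variables (Xi, Xj, Xk), the pseudo-residual Xi - (cov(Xi, Xk)/cov(Xj, Xk)) * Xj is, by construction, uncorrelated with Xk; the Triad constraint asks whether it is fully independent of Xk, which non-Gaussianity makes testable. The paper uses the HSIC test for this; the `dep_score` surrogate, the exponential noise model, and all names below are our simplifications for the demo, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def exp_noise(size, scale):
    """Centered exponential noise: skewed, hence non-Gaussian."""
    return scale * (rng.exponential(1.0, size) - 1.0)

def pseudo_residual(xi, xj, xk):
    """Triad pseudo-residual Xi - (cov(Xi,Xk)/cov(Xj,Xk)) * Xj.

    With sample covariances, the residual has exactly zero sample
    covariance with Xk; only full independence is informative."""
    w = np.cov(xi, xk)[0, 1] / np.cov(xj, xk)[0, 1]
    return xi - w * xj

def dep_score(a, b):
    """Crude dependence surrogate |corr(a^2, b)|: zero when a and b are
    independent, and typically nonzero under dependence with skewed
    noise. (A kernel test such as HSIC is the principled choice.)"""
    return abs(np.corrcoef(a ** 2, b)[0, 1])

# One latent L with three pure indicators: the Triad constraint holds,
# so the pseudo-residual is independent of x3.
L = exp_noise(n, 1.0)
x1 = 1.0 * L + exp_noise(n, 0.5)
x2 = 0.8 * L + exp_noise(n, 0.5)
x3 = 1.2 * L + exp_noise(n, 0.5)
print(dep_score(pseudo_residual(x1, x2, x3), x3))  # close to 0

# If instead x3 is caused directly by x1, the residual is still
# *uncorrelated* with x3 but no longer *independent* of it.
y3 = 1.0 * x1 + exp_noise(n, 0.5)
print(dep_score(pseudo_residual(x1, x2, y3), y3))  # clearly nonzero
```

The second score is visibly larger than the first, which is the asymmetry the Triad constraints exploit to locate latent confounders.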
Theoretical analysis showed asymptotic correctness of the proposed approach under our assumptions. Experimental results further verified the usefulness of our algorithm. Our future work is to 1) characterize properties of the results of our procedure for general causal structures with latent variables and 2) further relax our assumptions for better applicability of the method.

Figure 5: Causal diagram of the stocks.

Acknowledgments

This research was supported in part by the NSFC-Guangdong Joint Fund (U1501254), the Natural Science Foundation of China (61876043), the Natural Science Foundation of Guangdong (2014A030306004, 2014A030308008), the Guangdong High-level Personnel of Special Support Program (2015TQ01X140), the Science and Technology Planning Project of Guangzhou (201902010058), and the Outstanding Young Scientific Research Talents International Cultivation Project Fund of the Department of Education of Guangdong Province (40190001). KZ would like to acknowledge the support by NIH under Contract No. NIH-1R01EB022858-01, FAIN R01EB022858, NIH-1R01LM012087, NIH-5U54HG008540-02, and FAIN U54HG008540, by the United States Air Force under Contract No. FA8650-17-C-7715, and by NSF EAGER Grant No. IIS-1829681. The NIH, the U.S. Air Force, and the NSF are not responsible for the views reported here. KZ also benefited from funding from the Living Analytics Research Center and Singapore Management University. Feng would like to thank Shohei Shimizu for his insightful discussions and suggestions on the original draft. We appreciate the comments from the anonymous reviewers, which greatly helped to improve the paper.

References

David Bartholomew, Fiona Steele, Irini Moustaki, and Jane Galbraith. The analysis and interpretation of multivariate data for social scientists. Routledge, 2nd edition, 2008.

Bryant Chen, Daniel Kumor, and Elias Bareinboim. Identification and model testing in linear structural equation models using auxiliary variables.
In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 757–766. JMLR.org, 2017.

David Maxwell Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3(Nov):507–554, 2002.

Diego Colombo, Marloes H Maathuis, Markus Kalisch, and Thomas S Richardson. Learning high-dimensional directed acyclic graphs with latent and selection variables. The Annals of Statistics, pages 294–321, 2012.

H. Cramér. Random variables and probability distributions. Cambridge University Press, Cambridge, 2nd edition, 1962.

Mathias Drton and Thomas S Richardson. Iterative conditional fitting for Gaussian ancestral graph models. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pages 130–137. AUAI Press, 2004.

Arthur Gretton, Kenji Fukumizu, Choon H Teo, Le Song, Bernhard Schölkopf, and Alex J Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems, pages 585–592, 2008.

Patrik O Hoyer, Shohei Shimizu, Antti J Kerminen, and Markus Palviainen. Estimation of causal effects using linear non-Gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49(2):362–378, 2008.

Biwei Huang, Kun Zhang, and Bernhard Schölkopf. Identification of time-dependent causal model: A Gaussian process treatment.
In Twenty-Fourth International Joint Conference on Artificial Intelligence, pages 3561–3568, 2015.

Figure 5 legend: X1 Cheung Kong (0001.hk), X2 CLP Hldgs (0002.hk), X3 HK & China Gas (0003.hk), X4 Wharf (Hldgs) (0004.hk), X5 HSBC Hldg (0005.hk), X6 HK Electric (0006.hk), X7 Hang Lung Dev (0010.hk), X8 Hang Seng Bank (0011.hk), X9 Henderson Land (0012.hk), X10 Hutchison (0013.hk), X11 Sun Hung Kai Prop (0016.hk), X12 Swire Pacific 'A' (0019.hk), X13 Bank of East Asia (0023.hk), X14 Cathay Pacific Air (0293.hk).

Biwei Huang, Kun Zhang, Yizhu Lin, Bernhard Schölkopf, and Clark Glymour. Generalized score functions for causal discovery. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1551–1560. ACM, 2018.

Abram M Kagan, Calyampudi Radhakrishna Rao, and Yurij Vladimirovich Linnik. Characterization problems in mathematical statistics. 1973.

Erich Kummerfeld and Joseph Ramsey. Causal clustering for 1-factor measurement models. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1655–1664. ACM, 2016.

Shohei Shimizu, Patrik O Hoyer, Aapo Hyvärinen, and Antti Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7(Oct):2003–2030, 2006.

Shohei Shimizu, Patrik O Hoyer, and Aapo Hyvärinen. Estimation of linear non-Gaussian acyclic models for latent factors. Neurocomputing, 72(7-9):2024–2027, 2009.

Ricardo Silva and Richard Scheines. New d-separation identification results for learning continuous latent variable models. In Proceedings of the 22nd International Conference on Machine Learning, pages 808–815. ACM, 2005.

Ricardo Silva, Richard Scheines, Clark Glymour, and Peter Spirtes. Learning the structure of linear latent variable models. Journal of Machine Learning Research, 7(Feb):191–246, 2006.

Charles Spearman.
Pearson's contribution to the theory of two factors. British Journal of Psychology. General Section, 19(1):95–101, 1928.

Peter Spirtes and Clark Glymour. An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review, 9(1):62–72, 1991.

Peter Spirtes and Kun Zhang. Causal discovery and inference: concepts and recent methodological advances. In Applied Informatics, volume 3, page 3. SpringerOpen, 2016.

Peter Spirtes, Christopher Meek, and Thomas Richardson. Causal inference in the presence of latent variables and selection bias. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pages 499–506. Morgan Kaufmann Publishers Inc., 1995.

Peter Spirtes, Clark N Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT Press, 2000.

Peter Spirtes. Calculation of entailed rank constraints in partially non-linear and cyclic models. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, pages 606–615. AUAI Press, 2013.

Seth Sullivant, Kelli Talaska, and Jan Draisma. Trek separation for Gaussian graphical models. The Annals of Statistics, 38(3):1665–1685, 2010.

Kun Zhang and Laiwan Chan. Minimal nonlinear distortion principle for nonlinear independent component analysis. Journal of Machine Learning Research, 9(Nov):2455–2487, 2008.

Kun Zhang, Biwei Huang, Jiji Zhang, Clark Glymour, and Bernhard Schölkopf. Causal discovery from nonstationary/heterogeneous data: Skeleton estimation and orientation determination.
In IJCAI: Proceedings of the Conference, pages 1347–1353, 2017.