{"title": "Computationally and statistically efficient learning of causal Bayes nets using path queries", "book": "Advances in Neural Information Processing Systems", "page_first": 10931, "page_last": 10941, "abstract": "Causal discovery from empirical data is a fundamental problem in many scientific domains. Observational data allows for identifiability only up to Markov equivalence class. In this paper we first propose a polynomial time algorithm for learning the exact correctly-oriented structure of the transitive reduction of any causal Bayesian network with high probability, by using interventional path queries. Each path query takes as input an origin node and a target node, and answers whether there is a directed path from the origin to the target. This is done by intervening on the origin node and observing samples from the target node. We theoretically  show the logarithmic sample complexity for the size of interventional data per path query, for continuous and discrete networks. We then show how to learn the transitive edges using also logarithmic sample complexity (albeit in time exponential in the maximum number of parents for discrete networks), which allows us to learn the full network. We further extend our work by reducing the number of interventional path queries for learning rooted trees. We also provide an analysis of imperfect interventions.", "full_text": "Computationally and statistically ef\ufb01cient learning of\n\ncausal Bayes nets using path queries\n\nDepartment of Computer Science\n\nDepartment of Computer Science\n\nJean Honorio\n\nPurdue University\n\nWest Lafayette, IN, USA\njhonorio@purdue.edu\n\nKevin Bello\n\nPurdue University\n\nWest Lafayette, IN, USA\nkbellome@purdue.edu\n\nAbstract\n\nCausal discovery from empirical data is a fundamental problem in many scien-\nti\ufb01c domains. Observational data allows for identi\ufb01ability only up to Markov\nequivalence class. 
In this paper we \ufb01rst propose a polynomial time algorithm\nfor learning the exact correctly-oriented structure of the transitive reduction of\nany causal Bayesian network with high probability, by using interventional path\nqueries. Each path query takes as input an origin node and a target node, and\nanswers whether there is a directed path from the origin to the target. This is done\nby intervening on the origin node and observing samples from the target node. We\ntheoretically show the logarithmic sample complexity for the size of interventional\ndata per path query, for continuous and discrete networks. We then show how\nto learn the transitive edges using also logarithmic sample complexity (albeit in\ntime exponential in the maximum number of parents for discrete networks), which\nallows us to learn the full network. We further extend our work by reducing the\nnumber of interventional path queries for learning rooted trees. We also provide an\nanalysis of imperfect interventions.\n\n1\n\nIntroduction\n\nMotivation. Scientists in diverse areas (e.g., epidemiology, economics, etc.) aim to unveil causal\nrelationships within variables from collected data. For instance, biologists try to discover the causal\nrelationships between genes. By providing a speci\ufb01c treatment to a particular gene (origin), one can\nobserve whether there is an effect in another gene (target). This effect can be either direct (if the two\ngenes are connected with a directed edge) or indirect (if there is a directed path from the origin to the\ntarget gene).\nBayesian networks (BNs) are powerful representations of joint probability distributions. BNs are also\nused to describe causal relationships among variables [14]. The structure of a causal BN (CBN) is\nrepresented by a directed acyclic graph (DAG), where nodes represent random variables, and an edge\nbetween two nodes X and Y (i.e., X \u2192 Y ) represents that the former (X) is a direct cause of the\nlatter (Y ). 
Learning the DAG structure of a CBN is highly relevant in several domains, and is a\nproblem that has been studied for decades.\nFrom observational data alone (i.e., passively observed data from an undisturbed system), DAGs\nare only identifiable up to Markov equivalence.1 However, since our goal is causal discovery, this\nis inadequate as two BNs might be Markov equivalent and yet make different predictions about the\nconsequences of interventions (e.g., X \u2190 Y and X \u2192 Y are Markov equivalent, but make very\ndifferent assertions about the effect of changing X on Y). In general, the only way to distinguish\ncausal graphs from the same Markov equivalence class is to use interventional data [10, 11, 19]. This\ndata is produced after performing an experiment (intervention) [21], in which one or several random\nvariables are forced to take some specific values, irrespective of their causal parents.\n\n1Two graphs are Markov equivalent if they imply the same set of (conditional) independences. In general,\ntwo graphs are Markov equivalent iff they have the same structure ignoring arc directions, and have the same\nv-structures [31]. (A v-structure consists of converging directed edges into the same node, such as X \u2192 Y \u2190 Z.)\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nRelated work. Several methods have been proposed for learning the structure of Bayesian networks\nfrom observational data. Approaches including score-maximizing heuristics, exact exponential-time\nscore-maximizing methods, ordering-based search methods using MCMC, and test-based methods have\nbeen developed, to name a few. The tools for structure learning of Bayesian networks\nrange from exact methods (exponential-time with convergence/consistency guarantees) to heuristic\nmethods (polynomial-time without any convergence/consistency guarantee). 
[12] provide a score-\nmaximizing algorithm that is likelihood consistent, but that needs super-exponential time. [27, 3]\nprovide polynomial-time test-based methods that are structure consistent, but results hold only in\nthe in\ufb01nite-sample limit (i.e., when given an in\ufb01nite number of samples). [5] show that greedy hill-\nclimbing is structure consistent in the in\ufb01nite sample limit, with unbounded time. [34] show structure\nconsistency of a single network and do not provide uniform consistency for all candidate networks (the\nauthors discuss the issue of not using the union bound in their manuscript). From the active learning\nliterature, most of the works \ufb01rst \ufb01nd a Markov equivalence class (or assume that they have one)\nfrom purely observational data and then orient the edges by using as few interventions as possible.\n[19, 28] propose an exponential-time Bayesian approach relying on structural priors and MCMC.\n[10, 11, 25] present methods to \ufb01nd an optimal set of interventions in polynomial time for a class of\nchordal DAGs. Unfortunately, \ufb01nding the initial Markov equivalence class remains exponential-time\nfor general DAGs [4, 21]. [7] propose an exponential-time dynamic programming algorithm for\nlearning DAG structures exactly. [29] propose a constraint-based method to combine heterogeneous\n(observational and interventional) datasets but rely on solving instances of the (NP-hard) boolean\nsatis\ufb01ability problem. [8] analyzed the number of interventions suf\ufb01cient and in the worst-case\nnecessary to determine the structure of any DAG, although no algorithm or sample complexity\nanalysis was provided. Literature on learning structural equation models from observational data,\ninclude the work on continuous [23, 26] and discrete [22] additive noise models. Correctness was\nshown for the continuous case [23] but only in the in\ufb01nite-sample limit. 
[13] propose a method to\nlearn the exact observable graph by using O(log n) multiple-vertex interventions, where n is the\nnumber of variables, through the use of pairwise conditional independence tests and assuming access\nto the post-interventional graph. However, the size of the intervened set is O(n/2), which leads to\nO(2^(n/2)) experiments in the worst case. In contrast to this work, we perform single-vertex\ninterventions as a first step and then multiple-vertex interventions while keeping a small sample\ncomplexity. While this increases the number of interventions to n, we have better control of the\nnumber of experiments.\nRemark 1. In this paper we consider one intervention as one selection of variables to intervene.\nHowever, we consider an experiment as the actual setting of values to the variables. For example, if a\nvariable X takes p different values, then one experiment is X taking one specific value. To intervene\none binary variable, it is common to make 2 experiments, one under treatment, and one under no\ntreatment.\n\nFor a discussion of learning from purely interventional data, as well as availability of purely\ninterventional data, see Appendix A.\n\nContributions. We propose a polynomial time algorithm with provable guarantees for exact\nlearning of the transitive reduction of any CBN by using interventional path queries. We emphasize that\nmodeling the problem of structure learning of CBNs as a problem of reconstructing a graph using\npath queries is also part of our contributions. We analyze the sample complexity for answering\nevery interventional path query and show that for CBNs of discrete random variables with maximum\ndomain size r, the sample complexity is O(log(nr)); whereas for CBNs of sub-Gaussian random\nvariables, the sample complexity is O(\u03c3^2_ub log n), where \u03c3^2_ub is an upper bound of the variable\nvariances (marginally as well as after interventions). 
Then, we introduce a new type of query to learn the\ntransitive edges (i.e., the edges that are part of the true network but not of the transitive reduction).\nWhile the learning is not in polynomial time for discrete CBNs in the worst case (exponential in the\nmaximum number of parents), we show that the sample complexity is still polynomial. We also\npresent two extensions: for learning rooted trees the number of path queries is reduced to O(n log n),\nwhich is an improvement from the n^2 for general DAGs. We also provide an analysis of imperfect\ninterventions. We summarize our main results in Table 1 and compare them to one of the closest\nrelated works [13].\n\nTable 1: Here n is the number of variables, \u03c3^2_ub is an upper bound of the variable variances (marginally\nas well as after interventions), t is the maximum number of parents, r is the maximum number of\nvalues a discrete variable can take, and B denotes the time complexity of an independence-test\noracle. Note that B \u2208 O(2^n) in the worst case and not O(2^t) because [13] can select an intervention\nset of n/2 nodes (see for example Appendix F.2). 
In this table, novel indicates that no prior work\nprovided results on the respective subject. Finally, C and D denote continuous and discrete variables\nrespectively.\n\nGraph | Var. | Algorithms | Sample complexity | Time complexity\nGeneral DAGs | D | 1, 5, 3, 7 (our work) | O(n^2 2^t log(nr)) (Novel, see Thms. 1, 3) | O(n^2 2^t log(nr))\nGeneral DAGs | | 1, 3 in [13] | - | O(B t n^2 log^2 n) (B \u2208 O(2^n))\nGeneral DAGs | C | 1, 6, 3, 8 (our work) | O(n^2 \u03c3^2_ub log n) (Novel, see Thms. 2, 4) | O(n^2 \u03c3^2_ub log n)\nRooted trees | D | See Section 4 | O(n log^2(nr)) (Novel, see Section 4) | O(n log^2(nr))\n\nGraph | Var. | Algorithms | # of interventions | # of experiments\nGeneral DAGs | D | 1, 5, 3, 7 (our work) | O(n^2) | O(n^2 2^t)\nGeneral DAGs | | 1, 3 in [13] | O(log n) | O(2^n log n) (see Appendix F.2)\nGeneral DAGs | C | 1, 6, 3, 8 (our work) | O(n^2) | O(n^2)\nRooted trees | D | See Section 4 | O(n) | O(nr)\n\n2 Preliminaries\n\nIn this section, we introduce our formal definitions and notations. Vectors and matrices are\ndenoted by lowercase and uppercase bold faced letters respectively. Random variables are\ndenoted by italicized uppercase letters and their values by lowercase italicized letters. Vector\n\u2113_p-norms are denoted by \u2016\u00b7\u2016_p. For matrices, \u2016\u00b7\u2016_{p,q} denotes the entrywise \u2113_{p,q} norm, i.e.,\n\u2016A\u2016_{p,q} = \u2016(\u2016(A_{1,1}, . . . , A_{m,1})\u2016_p, . . . , \u2016(A_{1,n}, . . . , A_{m,n})\u2016_p)\u2016_q.\nLet G = (V, E) be a directed acyclic graph (DAG) with vertex set V = {1, . . . , n} and edge set\nE \u2282 V \u00d7 V, where (i, j) \u2208 E implies the edge i \u2192 j. For a node i \u2208 V, we denote \u03c0G(i) as the\nparent set of the node i. In addition, a directed path of length k from node i to node j is a sequence\nof nodes (i, v1, v2, . . . , vk\u22121, j) such that {(i, v1), (v1, v2), . . . , (vk\u22122, vk\u22121), (vk\u22121, j)} is a subset\nof the edge set E.\nLet X = {X1, . . . , Xn} be a set of random variables, with each variable Xi taking values in some\ndomain Dom[Xi]. A Bayesian network (BN) over X is a pair B = (G,PG) that represents a\ndistribution over the joint space of X. 
Here, G is a DAG, whose nodes correspond to the random\nvariables in X and whose structure encodes conditional independence properties about the joint\ndistribution, while PG quanti\ufb01es the network by specifying the conditional probability distributions\n(CPDs) P (Xi|X \u03c0G(i)). We use X \u03c0G(i) to denote the set of random variables which are parents of\nXi. A Bayesian network represents a joint probability distribution over the set of variables X, i.e.,\n\nP (X1, . . . , Xn) =(cid:81)n\n\ni=1 P (Xi|X \u03c0G(i)).\n\nViewed as a probabilistic model, a BN can answer any \u201cconditioning\u201d query of the form P (Z|E = e)\nwhere Z and E are sets of random variables and e is an assignment of values to E. Nonetheless, a\nBN can also be viewed as a causal model or causal BN (CBN) [21]. Under this perspective, the CBN\ncan also be used to answer interventional queries, which specify probabilities after we intervene in the\nmodel, forcibly setting one or more variables to take on particular values. The manipulation theorem\n[27, 21] states that one can compute the consequences of such interventions (perfect interventions) by\n\u201ccutting\u201d all the arcs coming into the nodes which have been clamped by intervention, and then doing\ntypical probabilistic inference in the \u201cmutilated\u201d graph (see Figure 1 as an example). We follow the\nstandard notation [21] for denoting the probability distribution of a variable Xj after intervening\nXi, that is, P (Xj|do(Xi = xi)). In this case, the joint distribution after intervention is given by\n\nP (X1, . . . , Xi\u22121, Xi+1, . . . 
, Xn|do(Xi = xi)) = 1[Xi = xi](cid:81)\n\nj(cid:54)=i P (Xj|X \u03c0G(j)).\n\nWe refer to CBNs in which all random variables Xi have \ufb01nite domain, Dom[Xi], as discrete CBNs.\nIn this case, we will denote the probability mass function (PMF) of a random variable as a vector.\n\n3\n\n\f1\n\n2\n\n3\n\n4\n\n5\n\n2\n\n1\n\nx4\n4\n\n3\n\n5\n\n6\n\n6\n\nFigure 1: (Left) A CBN of 6 variables, where the joint distribution, P (X), is factorized as(cid:81)\nx4](cid:81)\n\ni P (Xi|X \u03c0G(i)).\n(Right) The mutilated CBN after intervening X4 with value x4. Note that the edges {(1, 4), (2, 4)} are\nnot part of the CBN after the intervention, thus, the new joint is P (X|do(X4 = x4)) = 1[X4 =\n\ni(cid:54)=4 P (Xi|X \u03c0G(i)).\n\nThat is, a PMF, P (Y ), can be described as a vector p(Y ) \u2208 [0, 1]|Dom[Y ]| indexed by the elements\nof Dom[Y ], i.e., pj(Y ) = P (Y = j),\u2200j \u2208 Dom[Y ]. We refer to networks with variables that have\ncontinuous domains as continuous CBNs.\nNext, we formally de\ufb01ne transitive edges.\nDe\ufb01nition 1 (Transitive edge). Let G = (V, E) be a DAG. We say that an edge (i, j) \u2208 E is transitive\nif there exists a directed path from i to j of length greater than 1.\n\nThe algorithm for removing transitive edges from a DAG is called transitive reduction and it was\nintroduced in [1]. The transitive reduction of a DAG G, TR(G), is then G without any of its transitive\nedges. Our proposed methods also make use of path queries, which we de\ufb01ne as follows:\nDe\ufb01nition 2 (Path query). Let G = (V, E) be a DAG. A path query is a function QG : V \u00d7 V \u2192\n{0, 1} such that QG(i, j) = 1 if there exists a directed path in G from i to j, and QG(i, j) = 0\notherwise.\n\nGeneral DAGs are identi\ufb01able only up to their transitive reduction by using path queries.\nIn\ngeneral, DAGs can be non-identi\ufb01able by using path queries. We will use Q(i, j) to denote QG(i, j)\nsince for our problem, the DAG G is \ufb01xed (but unknown). 
For instance, consider the two graphs\nshown in Figure 2. In both cases, we have that Q(1, 2) = Q(1, 3) = Q(2, 3) = 1. Thus, by using\npath queries, it is impossible to discern whether the edge (1, 3) exists or not. Later in Subsection 3.3\nwe focus on the recovery of transitive edges, which requires a different type of query.\n\n1\n\n1\n\n2\n\n3\n\n2\n\n3\n\nFigure 2: Two directed acyclic graphs that produce the same answers when using path queries.\n\nHow to answer path queries is a key step in this work. Since we answer path queries by using a \ufb01nite\nnumber of interventional samples, we require a noisy path query, which is de\ufb01ned below.\nDe\ufb01nition 3 (\u03b4-noisy partially-correct path query). Let G = (V, E) be a DAG, and let QG be a path\nquery. Let \u03b4 \u2208 (0, 1) be a probability of error. A \u03b4-noisy partially-correct path query is a function\n\u02dcQG : V \u00d7 V \u2192 {0, 1} such that \u02dcQG(i, j) = QG(i, j) with probability at least 1 \u2212 \u03b4 if i \u2208 \u03c0G(j) or\nif there is no directed path from i to j.\n\nWe will use the term noisy path query to refer to \u03b4-noisy partially-correct path query. Note that\nDe\ufb01nition 3 requires a noisy path query to be correct only in certain cases, when one variable is\nparent of the other, or when there is no directed path between them. We do not require correctness\nwhen there is a directed path between i and j and i is not a parent of j, that is, when the path length\nis greater than 1. Note that the uncertainty of the exact recovery of the transitive reduction relies on\nanswering multiple noisy path queries.\n\n2.1 Assumptions\n\nHere we state the main set of assumptions used throughout our paper.\nAssumption 1. Let G = (V, E) be a DAG. All nodes in G are observable, furthermore, we can\nperform interventions on any node i \u2208 V.\n\n4\n\n\fAssumption 2 (Causal Markov). The data is generated from an underlying CBN (G,PG) over X.\nAssumption 3 (Faithfulness). 
The distribution P over X induced by (G,PG) satisfies no\nindependences beyond those implied by the structure of G. We also assume faithfulness in the\npost-interventional distribution.\n\nAssumption 1 implies the availability of purely interventional data, and has been widely used in\nthe literature [19, 28, 11, 10, 25, 13]. We consider only observed variables because we perform\ninterventions on each node; thus, our method is robust to latent confounders (see Appendix E for\nmore details). With Assumption 2, we assume that any population produced by a causal graph has the\nindependence relations obtained by applying d-separation to it, while with Assumption 3, we ensure\nthat the population has exactly these and no additional independences [27, 28, 25, 11, 29].\n\n3 Algorithms and Sample Complexity\n\nNext, we present our first set of results and provide a formal analysis of the sample complexity.\n\n3.1 Algorithm for Learning the Transitive Reduction of CBNs\n\n[13] show that by using O(log n) multiple-vertex interventions, one can recover the transitive\nreduction of a DAG. However, in this case, each set of intervened variables has a size of O(n/2),\nwhich means that the method of [13] has to perform a total of O(2^(n/2) log n) experiments, one for\neach possible setting of the O(n/2) intervened variables (see an example of this in Appendix D). Thus,\nin this part we work with single-vertex interventions to avoid the exponential number of experiments.\nWe can then learn the transitive reduction as follows (see more details in Appendix B.1).\nAlgorithm 1. Start with a set of edges \u02c6E = \u2205. Then for each pair of nodes i, j \u2208 V, compute the\nnoisy path query \u02dcQ(i, j) and add the edge (i, j) to \u02c6E if the query returns 1. Finally, compute the\ntransitive reduction of \u02c6E in poly-time [1], and return \u02c6E.\n\nAs seen in the next section, each query is computed using single-vertex interventions. 
In fact, for\neach intervened node, we can compute n queries, i.e., while the number of queries is n^2, the number\nof interventions is n. This number of single-vertex interventions is necessary in the worst case [8].\nIt is natural to ask what would be the benefit of using path queries. A query \u02dcQ(i, j) can be interpreted\nas observing the variable Xj after intervening Xi. Under this viewpoint, if one could reduce\nthe number of queries for learning certain classes of graphs, then not only might the number of\ninterventions decrease but the number of variables to observe too. That is, if one knows a priori that\nthe topology of the graph belongs to a certain family of graphs then it may be possible to reduce\nthe number of queries (see for example Section 4). This is important in practice as both performing\ninterventions and observing variables might be costly. We first focus on learning general DAGs, in\nwhich \u2126(n^2) path queries are necessary in the worst case for any conceivable algorithm\n(see Theorems 7 and 8 in [32]). Later we show that O(n log n) noisy path queries2\nsuffice for learning rooted trees.\n\n3.2 Noisy Path Query Algorithm\n\nThe next two propositions are important for answering a path query.\nProposition 1. Let B = (G,PG) be a CBN with Xi, Xj \u2208 X being any two random variables in G.\nIf there is no directed path from i to j in G, then P (Xj|do(Xi = xi)) = P (Xj).\nProposition 2. Let B = (G,PG) be a CBN and let Xi and Xj be two random variables in G, such\nthat i \u2208 \u03c0G(j). Then, there exist xi and x'i such that:\n\n1. P (Xj) \u2260 P (Xj|do(Xi = xi)) and 2. P (Xj|do(Xi = xi)) \u2260 P (Xj|do(Xi = x'i))\n\nSee Appendix F for details of all proofs. 
Proposition 2 motivates the idea that we can search\nfor two different values of Xi to determine the causal dependence on Xj (Claim 2), which is\narguably useful for discrete CBNs. Alternatively, we can use the expected value of Xj, since\nE[Xj] \u2260 E[Xj|do(Xi = xi)] implies that P (Xj) \u2260 P (Xj|do(Xi = xi)) (Claim 1).\nNext, we propose a polynomial time algorithm for answering a noisy path query. Algorithm 2 presents\nthe procedure in an intuitive way. Here, the type of statistic is motivated by Lemmas 1 and 2, and\nthe value of interventions and threshold t are motivated by Theorems 1 and 2. See Appendix B.2\n(Algorithms 5 and 6) for the specific details of the algorithms for discrete and continuous CBNs.\n\n2This path query requires a \u201cstronger\u201d version of Definition 3. See for instance Definition 6.\n\nAlgorithm 2 Noisy path query algorithm\nInput: Nodes i and j, number of interventional samples m, and threshold t.\nOutput: \u02dcQ(i, j)\n1: Intervene Xi by setting its value to xi \u2208 Dom[Xi], and observe m samples of Xj\n2: Compute a statistic of Xj and return 1 if it is greater than t, and 0 otherwise.\n\nDiscrete random variables. In this paper we use conditional probability tables (CPTs) as the\nrepresentation of the CPDs for discrete CBNs. Next, we present a theorem that provides the sample\ncomplexity of a noisy path query.\nTheorem 1. Let B = (G,PG) be a discrete CBN, such that each random variable Xj has a finite\ndomain Dom[Xj], with |Dom[Xj]| \u2264 r. Furthermore, let\n\n\u03b3 = min_{j\u2208V, i\u2208\u03c0G(j)} min_{xi,x'i\u2208Dom[Xi] : p(Xj|do(Xi=xi)) \u2260 p(Xj|do(Xi=x'i))} \u2016p(Xj|do(Xi = xi)) \u2212 p(Xj|do(Xi = x'i))\u2016_\u221e,\n\nand let \u02c6G = (V, \u02c6E) be the learned graph by using Algorithm 1. 
Then for \u03b3 > 0 and a fixed probability\nof error \u03b4 \u2208 (0, 1), we have P(TR(G) = \u02c6G) \u2265 1 \u2212 \u03b4, provided that m \u2208 O((1/\u03b3^2)(ln n + ln(r/\u03b4)))\ninterventional samples are used per \u03b4-noisy partially-correct path query in Algorithm 5.\n\nIntuitively, the value \u03b3 characterizes the minimum causal effect among all pairs of parent-child\nnodes. Due to Assumption 3, and the fact that an edge represents a causal relationship, we have\n\u03b3 > 0. This value is used for deciding whether two empirical PMFs are equal or not in our path query\nalgorithm (Algorithm 5), which implements Claim 2 in Proposition 2. Finally, in practice, the value\nof \u03b3 is unknown3. Fortunately, knowing a lower bound of \u03b3 suffices for structure recovery.\n\nContinuous random variables. For continuous CBNs, our algorithm compares two empirical\nexpected values for answering a path query. This is related to Claim 1 in Proposition 2, since\nE[Xj] \u2260 E[Xj|do(Xi = xi)] implies P (Xj) \u2260 P (Xj|do(Xi = xi)). We analyze continuous CBNs\nwhere every random variable is sub-Gaussian. The class of sub-Gaussian variates includes for instance\nGaussian variables, any bounded random variable (e.g., uniform), any random variable with strictly\nlog-concave density, and any finite mixture of sub-Gaussian variables. Note that sample complexity\nusing sub-Gaussian variables has been studied in the past for other models, such as Markov random\nfields [24]. Next, we present a theorem that formally characterizes the class of continuous CBNs that\nour algorithm can learn, and provides the sample complexity for each noisy path query.\nTheorem 2. Let B = (G,PG) be a continuous CBN such that each variable Xj is a sub-Gaussian\nrandom variable with full support on R, with mean \u00b5j = 0 and variance \u03c3^2_j. 
Let \u00b5_{j|do(Xi=z)} and\n\u03c3^2_{j|do(Xi=z)} denote the expected value and variance of Xj after intervening Xi with value z, assuming\nalso that the variables remain sub-Gaussian after performing an intervention. Furthermore, let\n\n\u00b5(B, z) = min_{j\u2208V, i\u2208\u03c0G(j)} |\u00b5_{j|do(Xi=z)}|,\n\n\u03c3^2(B, z) = max( max_{j\u2208V, i\u2208\u03c0G(j)} \u03c3^2_{j|do(Xi=z)}, max_{j\u2208V} \u03c3^2_j ),\n\nand let \u02c6G = (V, \u02c6E) be the learned graph by using Algorithm 1. If there exists an upper bound \u03c3^2_ub\nand a finite value z such that \u03c3^2(B, z) \u2264 \u03c3^2_ub and \u00b5(B, z) \u2265 1, then for a fixed probability of error\n\u03b4 \u2208 (0, 1), we have P(TR(G) = \u02c6G) \u2265 1 \u2212 \u03b4, provided that m \u2208 O(\u03c3^2_ub log(n/\u03b4)) interventional\nsamples are used per \u03b4-noisy partially-correct path query in Algorithm 6.\n\n3Several prior works from leading experts also have \u02dcO(1/\u03b3^2) sample complexity for an unknowable constant\n\u03b3. See for instance, [2, 20, 24].\n\nNote that the conditions \u00b5j = 0, \u2200j \u2208 V, and \u00b5(B, z) \u2265 1 are set to offer clarity in the derivations.\nOne could for instance set an upper bound for the magnitude of \u00b5j, assume \u00b5(B, z) to be greater than\nthis upper bound plus 1, and still have the same sample complexity. Finally, our motivation for giving\nsuch conditions is that of guaranteeing a proper separation of the expected values in cases where\nthere is an effect of a variable Xi over another variable Xj, versus cases where there is no effect at all.\nNext, we define the additive sub-Gaussian noise model (ASGN).\nDefinition 4. 
Let G = (V, E) be a DAG, let W \u2208 R^{n\u00d7n} be the matrix of edge weights and let\nS = {\u03c3^2_i \u2208 R+ | i \u2208 V} be the set of noise variances. An additive sub-Gaussian noise network is a\ntuple (G,P(W,S)) where each variable Xi can be written as follows: Xi = \u2211_{j\u2208\u03c0G(i)} Wij Xj + Ni,\n\u2200i \u2208 V, with Ni being an independent sub-Gaussian noise with full support on R, with zero\nmean and variance \u03c3^2_i for all i \u2208 V, and Wij \u2260 0 iff (j, i) \u2208 E.\nRemark 2. Let B = (G,P(W,S)) be an ASGN network. We can rewrite the model in vector form as:\nx = Wx + n or equivalently x = (I \u2212 W)^{\u22121}n, where x = (X1, . . . , Xn) and n = (N1, . . . , Nn)\nare the vector of random variables and the noise vector respectively. Additionally, we denote \u2299_i W as\nthe weight matrix W with its i-th row set to 0. This means that we can interpret \u2299_i W as the weight\nmatrix after performing an intervention on node i (mutilated graph).\n\nWe now present a corollary that fulfills the conditions presented in Theorem 2.\nCorollary 1 (Additive sub-Gaussian noise model). Let B = (G,P(W,S)) be an ASGN network\nas in Definition 4, such that \u03c3^2_j \u2264 \u03c3^2_max, \u2200j \u2208 V. Also, let wmin = min_{(i,j)\u2208E} |{(I \u2212 \u2299_i W)^{\u22121}}_{ji}|,\nand wmax = max(\u2016(I \u2212 W)^{\u22121}\u2016^2_{\u221e,2}, max_{i\u2208V} \u2016(I \u2212 \u2299_i W)^{\u22121}\u2016^2_{\u221e,2}). If z = 1/wmin and \u03c3^2_ub =\n\u03c3^2_max wmax, then for a fixed probability of error \u03b4 \u2208 (0, 1), we have P(TR(G) = \u02c6G) \u2265 1 \u2212 \u03b4,\nwhere \u02c6G = (V, \u02c6E) is the learned graph by using Algorithm 1, provided that m \u2208 O(\u03c3^2_ub log(n/\u03b4))\ninterventional samples are used per \u03b4-noisy partially-correct path query in Algorithm 6.\n\nThe values of wmin and wmax follow the specifications of Theorem 2. In addition, the value of wmin\nis guaranteed to be greater than 0 because of the faithfulness assumption (see Assumption 3). 
For an\nexample about our motivation to use the faithfulness assumption, see Appendix D.\n\n3.3 Recovery of Transitive Edges\n\nIn this section, we show a method to recover the transitive edges by using multiple-vertex interventions.\nThis allows us to learn the full network. For this purpose, we present a new query de\ufb01ned as follows.\nDe\ufb01nition 5 (\u03b4-noisy transitive query). Let G = (V, E) be a DAG, and let \u03b4 \u2208 (0, 1) be a probability\nof error. A \u03b4-noisy transitive query is a function \u02dcTG : V\u00d7 V\u00d7 2V \u2192 {0, 1} such that \u02dcTG(i, j, S) = 1\nwith probability at least 1 \u2212 \u03b4 if (i, j) \u2208 E is a transitive edge (where the additional path from i to j\ngoes through S), and 0 otherwise. Here S \u2286 \u03c0G(j) is an auxiliary set necessary to answer the query,\nin order to block any in\ufb02uence from i to S, and to unveil the direct effect from i to j.\n\nAlgorithms 7 and 8 (see Appendix B.3) show how to answer a transitive query for discrete and\ncontinuous CBNs respectively. Both algorithms are motivated on a property of CBNs, that is, \u2200i \u2208 V\nand for every set S disjoint of {i, \u03c0G(i)}, we have P (Xi|do(X\u03c0G(i) = x\u03c0G(i)), do(XS = xS)) =\nP (Xi|do(X\u03c0G(i) = x\u03c0G(i))). Thus, both algorithms intervene all the variables in S, if S is the parent\nset of j, then i will have no effect on j and they return 0, and 1 otherwise.\nRecall that by using Algorithm 1 we obtain the transitive reduction of the CBN, thus, we have the true\ntopological ordering of the CBN, and also for each node i \u2208 V, we know its parent set or a subset\nof it. Using these observations, we can cleverly set the input i, j, and S of a noisy transitive query,\nas done in Algorithm 3. It is clear that Algorithm 3 makes O(n2) noisy transitive queries in total.\nThe time complexity to answer a transitive query for a discrete CBN is exponential in the maximum\nnumber of parents in the worst case. 
However, the sample complexity for queries in discrete and continuous CBNs remains polynomial in $n$, as prescribed in the following theorems.\nTheorem 3. Let $B = (G, P_G)$ be a discrete CBN, such that each random variable $X_j$ has a finite domain $Dom[X_j]$, with $|Dom[X_j]| \\leq r$. Furthermore, let\n\n$$\\gamma = \\min_{\\substack{j \\in V \\\\ S \\subseteq \\pi_G(j),\\, |S| \\geq 1}} \\; \\min_{\\substack{x_S, x'_S \\in \\times_{i \\in S} Dom[X_i] \\\\ p(X_j \\mid do(X_S = x_S)) \\neq p(X_j \\mid do(X_S = x'_S))}} \\| p(X_j \\mid do(X_S = x_S)) - p(X_j \\mid do(X_S = x'_S)) \\|_\\infty,$$\n\nand let $\\tilde{G} = (V, \\tilde{E})$ be the output of Algorithm 3. Then for $\\gamma > 0$ and a fixed probability of error $\\delta \\in (0, 1)$, we have $P(G = \\tilde{G}) \\geq 1 - \\delta$, provided that $m \\in O(\\frac{1}{\\gamma^2}(\\ln n + \\ln \\frac{r}{\\delta}))$ interventional samples are used per $\\delta$-noisy transitive query in Algorithm 7.\n\nAlgorithm 3 Learning the transitive edges by using noisy transitive queries\nInput: Transitively reduced DAG $\\hat{G} = (V, \\hat{E})$ (output of Algorithm 1)\nOutput: DAG $\\tilde{G} = (V, \\tilde{E})$\n1: $\\Psi \\leftarrow$ TopologicalOrder($\\hat{G}$); $\\hat{\\pi}(i) \\leftarrow \\{u \\in V \\mid (u, i) \\in \\hat{E}\\}$ (current parents of $i$); $\\tilde{E} \\leftarrow \\hat{E}$\n2: for $j = 2 \\ldots n$ do\n3:   for $i = j - 1, j - 2, \\ldots, 1$ do\n4:     if $\\tilde{T}(\\Psi_i, \\Psi_j, \\hat{\\pi}(\\Psi_j)) = 1$ then $\\tilde{E} \\leftarrow \\tilde{E} \\cup \\{(\\Psi_i, \\Psi_j)\\}$ and $\\hat{\\pi}(\\Psi_j) \\leftarrow \\hat{\\pi}(\\Psi_j) \\cup \\{\\Psi_i\\}$\n\nTheorem 4. Let $B = (G, P_G)$ be a continuous CBN such that each variable $X_j$ is a sub-Gaussian random variable with full support on $\\mathbb{R}$, with mean $\\mu_j = 0$ and variance $\\sigma_j^2$. Let $\\mu_{j \\mid do(X_S = \\mathbf{1}z)}$ and $\\sigma^2_{j \\mid do(X_S = \\mathbf{1}z)}$ denote the expected value and variance of $X_j$ after intervening on each node of $X_S$ with value $z$. Furthermore, let\n\n$$\\mu(B, z_1, z_2) = \\min_{j \\in V,\\, S \\subseteq \\pi_G(j),\\, |S| \\geq 2,\\, i \\in S} \\left| \\mu_{j \\mid do(X_{S \\setminus \\{i\\}} = \\mathbf{1}z_1,\\, X_i = z_2)} \\right|,$$\n\n$$\\sigma^2(B, z_1, z_2) = \\max\\left( \\max_{j \\in V} \\sigma_j^2,\\; \\max_{j \\in V,\\, S \\subseteq \\pi_G(j),\\, |S| \\geq 2,\\, i \\in S} \\sigma^2_{j \\mid do(X_{S \\setminus \\{i\\}} = \\mathbf{1}z_1,\\, X_i = z_2)} \\right),$$\n\nand let $\\tilde{G} = (V, \\tilde{E})$ be the output of Algorithm 3. If there exist an upper bound $\\sigma_{ub}^2$ and finite values $z_1, z_2$ such that $\\sigma^2(B, z_1, z_2) \\leq \\sigma_{ub}^2$ and $\\mu(B, z_1, z_2) \\geq 1$, then for a fixed probability of error $\\delta \\in (0, 1)$, we have $P(G = \\tilde{G}) \\geq 1 - \\delta$, provided that $m \\in O(\\sigma_{ub}^2 \\log \\frac{n}{\\delta})$ interventional samples are used per $\\delta$-noisy transitive query in Algorithm 8.\n\nNext, we show that ASGN networks can fulfill the conditions in Theorem 4.\nCorollary 2. Let $B = (G, P(\\mathbf{W}, S))$ and $\\sigma_{\\max}^2$ follow the same definition as in Corollary 1. Let $w_{\\min} = \\min_{ij} |W_{ij}|$ and $w_{\\max} = \\max(\\|(\\mathbf{I} - \\mathbf{W})^{-1}\\|_{\\infty,2}^2, \\max_{j \\in V, S \\subseteq \\pi_G(j)} \\|(\\mathbf{I} - \\odot_S \\mathbf{W})^{-1}\\|_{\\infty,2}^2)$. If $z_1 = 0$, $z_2 = 1/w_{\\min}$, and $\\sigma_{ub}^2 = \\sigma_{\\max}^2 w_{\\max}$, then for a fixed probability of error $\\delta \\in (0, 1)$, we have $P(G = \\tilde{G}) \\geq 1 - \\delta$, provided that $m \\in O(\\sigma_{ub}^2 \\log \\frac{n}{\\delta})$ interventional samples are used per $\\delta$-noisy transitive query in Algorithm 8.\n\n4 Extensions\n\n\u03b4\n\n1\n\n1\n\n(1/2\u2212\u03b5)2 n log dn\n\n(1/2\u2212\u03b5)2 dn log2 n log dn\n\nLearning rooted trees. Here we make use of the results in [32], for rooted trees of node degree at most $d$. 
Theorem 4 in [32] states that for a fixed probability of error $\\delta \\in (0, 1)$, one can reconstruct a rooted tree with probability $1 - \\delta$ in O( 1\n\u03b4 ) time provided that a total of\nO(\n\u03b4 ) noisy path queries are used, where $\\epsilon$ relates to the confidence of the noisy path query. The number of queries is improved with respect to the $n^2$ queries used for general DAGs in the previous section. Finally, recall that in the previous section we made use of partially-correct path queries; for this part we require a stronger version of noisy path query, which is defined below.\nDefinition 6 ($\\epsilon$-noisy path query). Let $G = (V, E)$ be a DAG, and let $Q_G$ be a path query. Let $\\epsilon \\in (0, 1/2)$ be a probability of error. An $\\epsilon$-noisy path query is a function $\\tilde{Q}_G : V \\times V \\to \\{0, 1\\}$ such that $\\tilde{Q}_G(i, j) = Q_G(i, j)$ with probability at least $1 - \\epsilon$, and $\\tilde{Q}_G(i, j) = 1 - Q_G(i, j)$ with probability at most $\\epsilon$.\n\nThe following states the sample complexity for exact learning of rooted trees in the discrete case.\nProposition 3. Let $B = (G, P_G)$ be a discrete CBN, such that each random variable $X_j$ has a finite domain $Dom[X_j]$, with $|Dom[X_j]| \\leq r$. Furthermore, let\n\n$$\\gamma = \\min_{\\substack{j \\in V \\\\ i \\in V}} \\; \\min_{\\substack{x_i, x'_i \\in Dom[X_i] \\\\ p(X_j \\mid do(X_i = x_i)) \\neq p(X_j \\mid do(X_i = x'_i))}} \\| p(X_j \\mid do(X_i = x_i)) - p(X_j \\mid do(X_i = x'_i)) \\|_\\infty,$$\n\nand let $\\hat{G} = (V, \\hat{E})$ be the graph learned by using Algorithm 7 in [32]. Then for $\\gamma > 0$ and a fixed probability of error $\\delta \\in (0, 1)$, we have $P(G = \\hat{G}) \\geq 1 - \\delta$, provided that $m \\in O(\\frac{1}{\\gamma^2}(\\ln n + \\ln \\frac{r}{\\delta}))$ interventional samples are used per $\\delta$-noisy path query in Algorithm 5.\n\nWe use the same Algorithm 5 to answer an $\\epsilon$-noisy path query. 
The difference is that now $\\gamma$ represents the minimum causal effect among all pairs of nodes and not only parent-child pairs.\n\nOn Imperfect Interventions. Here we state some results on imperfect interventions. In Appendix C, we show that the sample complexity for discrete CBNs is scaled by $\\alpha^{-1}$, where $\\alpha$ accounts for the degree of uncertainty in the intervention. For CBNs of sub-Gaussian random variables, the sample complexity still has the same dependence on an upper bound of the variances.\n\n5 Experiments\n\nIn Appendix G.1, we tested our algorithms for perfect and imperfect interventions in synthetic networks, in order to empirically show the logarithmic phase transition of the number of interventional samples (see Figure 3 as an example). Appendix G.2 shows that in several benchmark BNs, most of the graph belongs to its transitive reduction, meaning that one can learn most of the network in polynomial time. Appendix G.3 shows experiments on some of these benchmark networks, using the aforementioned algorithms and also our algorithm for learning transitive edges, thus recovering the full networks. Finally, in Appendix G.4, as an illustration of the availability of interventional data, we show experimental evidence using three gene perturbation datasets from [33, 9].\n\nFigure 3: (Left) Probability of correct structure recovery of the transitive reduction of a discrete CBN vs. number of samples per query, where the latter was set to $e^C \\log(nr)$, with all CBNs having $r = 5$ and $\\gamma \\geq 0.01$. (Right) Similarly, for continuous CBNs, the number of samples per query was set to $e^C \\log n$, with all CBNs having $\\|(\\mathbf{I} - \\mathbf{W})^{-1}\\|_{2,\\infty}^2 \\leq 20$. Finally, we observe that there is a sharp phase transition from recovery failure to success in all cases, and the $\\log n$ scaling holds in practice, as prescribed by Theorems 1 and 2.\n\n6 Future Work\n\nThere are several ways of extending this work. 
For instance, it would be interesting to analyze other classes of interventions with uncertainty, as in [7]. For continuous CBNs, we opted to use expected values and not to compare continuous distributions directly. The fact that the conditioning is with respect to a continuous random variable makes this task more complex than the typical comparison of continuous distributions. Still, it would be interesting to see whether kernel density estimators [16] could be beneficial.\n\nReferences\n[1] Aho, A., Garey, M., and Ullman, J. The transitive reduction of a directed graph. SIAM Journal on Computing, 1(2):131\u2013137, 1972.\n\n[2] Brenner, E. and Sontag, D. SparsityBoost: A new scoring function for learning Bayesian network structure. UAI, 2013.\n\n[3] Cheng, J., Greiner, R., Kelly, J., Bell, D., and Liu, W. Learning Bayesian networks from data: An information-theory based approach. Artificial Intelligence Journal, 2002.\n\n[4] Chickering, D. Learning Bayesian networks is NP-complete. In Learning from data, pp. 121\u2013130. Springer, 1996.\n\n[5] Chickering, D. and Meek, C. Finding optimal Bayesian networks. UAI, 2002.\n\n[6] Dvoretzky, A., Kiefer, J., and Wolfowitz, J. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, pp. 642\u2013669, 1956.\n\n[7] Eaton, D. and Murphy, K. Exact Bayesian structure learning from uncertain interventions. In Artificial Intelligence and Statistics, pp. 107\u2013114, 2007.\n\n[8] Eberhardt, F., Glymour, C., and Scheines, R. On the number of experiments sufficient and in the worst case necessary to identify all causal relations among N variables. In UAI, pp. 
178\u2013184.\nAUAI Press, 2005.\n\n[9] Harbison, Christopher T, Gordon, D Benjamin, Lee, Tong Ihn, Rinaldi, Nicola J, Macisaac,\nKenzie D, Danford, Timothy W, Hannett, Nancy M, Tagne, Jean-Bosco, Reynolds, David B,\nYoo, Jane, et al. Transcriptional regulatory code of a eukaryotic genome. Nature, 2004.\n\n[10] Hauser, A. and B\u00fchlmann, P. Two optimal strategies for active learning of causal models from\ninterventions. In Proceedings of the 6th European Workshop on Probabilistic Graphical Models,\n2012.\n\n[11] He, Y. and Geng, Z. Active learning of causal networks with intervention experiments and\n\noptimal designs. Journal of Machine Learning Research, 9(Nov), 2008.\n\n[12] H\u00f6ffgen, K. Learning and robust learning of product distributions. COLT, 1993.\n[13] Kocaoglu, Murat, Shanmugam, Karthikeyan, and Bareinboim, Elias. Experimental design for\nlearning causal graphs with latent variables. In Advances in Neural Information Processing\nSystems, pp. 7021\u20137031, 2017.\n\n[14] Koller, D. and Friedman, N. Probabilistic Graphical Models: Principles and Techniques. The\n\nMIT Press, 2009.\n\n[15] LeGall, F. Powers of tensors and fast matrix multiplication.\n\nIn Proceedings of the 39th\ninternational symposium on symbolic and algebraic computation, pp. 296\u2013303. ACM, 2014.\n[16] Liu, H., Wasserman, L., and Lafferty, J. Exponential concentration for mutual information\nestimation with application to forests. In Advances in Neural Information Processing Systems,\npp. 2537\u20132545, 2012.\n\n[17] Louizos, Christos, Shalit, Uri, Mooij, Joris, Sontag, David, Zemel, Richard, and Welling, Max.\n\nCausal effect inference with deep latent-variable models. NIPS, 2017.\n\n[18] Massart, P. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. The Annals of\n\nProbability, pp. 1269\u20131283, 1990.\n\n[19] Murphy, K. Active learning of causal Bayes net structure. 
Technical report, 2001.\n\n[20] Obozinski, Guillaume R, Wainwright, Martin J, and Jordan, Michael I. High-dimensional support union recovery in multivariate regression. In Advances in Neural Information Processing Systems, 2009.\n\n[21] Pearl, J. Causality: Models, Reasoning and Inference. Cambridge University Press, 2nd edition, 2009.\n\n[22] Peters, J., Janzing, D., and Sch\u00f6lkopf, B. Identifying cause and effect on discrete data using additive noise models. In AIStats, pp. 597\u2013604, 2010.\n\n[23] Peters, J., Mooij, J., Janzing, D., Sch\u00f6lkopf, B., et al. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15(1):2009\u20132053, 2014.\n\n[24] Ravikumar, P., Wainwright, M., Raskutti, G., Yu, B., et al. High-dimensional covariance estimation by minimizing $\\ell_1$-penalized log-determinant divergence. Electronic Journal of Statistics, 5:935\u2013980, 2011.\n\n[25] Shanmugam, K., Kocaoglu, M., Dimakis, A., and Vishwanath, S. Learning causal graphs with small interventions. In Advances in Neural Information Processing Systems, pp. 3195\u20133203, 2015.\n\n[26] Shimizu, Shohei, Hoyer, Patrik O, Hyv\u00e4rinen, Aapo, and Kerminen, Antti. A linear non-gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7(Oct):2003\u20132030, 2006.\n\n[27] Spirtes, P., Glymour, C., and Scheines, R. Causation, Prediction and Search. The MIT Press, second edition, 2000.\n\n[28] Tong, S. and Koller, D. Active learning for structure in Bayesian networks. In International joint conference on artificial intelligence, 2001.\n\n[29] Triantafillou, S. and Tsamardinos, I. Constraint-based causal discovery from multiple interventions over overlapping variable sets. Journal of Machine Learning Research, 16:2147\u20132205, 2015.\n\n[30] Tsamardinos, I., Brown, L., and Aliferis, C. The max-min hill climbing Bayesian network structure learning algorithm. 
Machine Learning, 2006.\n\n[31] Verma, T. and Pearl, J. Equivalence and synthesis of causal models. In Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence, UAI \u201990. Elsevier Science Inc., 1991.\n\n[32] Wang, Z. and Honorio, J. Reconstructing a bounded-degree directed tree using path queries. arXiv preprint arXiv:1606.05183, 2016.\n\n[33] Xiao, Yun, Gong, Yonghui, Lv, Yanling, Lan, Yujia, Hu, Jing, Li, Feng, Xu, Jinyuan, Bai, Jing, Deng, Yulan, Liu, Ling, et al. Gene perturbation atlas (GPA): a single-gene perturbation repository for characterizing functional mechanisms of coding and non-coding genes. Scientific Reports, 2015.\n\n[34] Zuk, O., Margel, S., and Domany, E. On the number of samples needed to learn the correct structure of a Bayesian network. UAI, 2006.", "award": [], "sourceid": 7994, "authors": [{"given_name": "Kevin", "family_name": "Bello", "institution": "Purdue University"}, {"given_name": "Jean", "family_name": "Honorio", "institution": "Purdue University"}]}