__ Summary and Contributions__: This paper is about extending invariant causal prediction (ICP) to the
active online setting. ICP is based on what has been described as the Most Useful Tautology Ever
(MUTE): if we do not intervene on Y, then we do not intervene on Y.
In other words, as long as we do not intervene on the response variable Y,
any intervention on the other variables leaves the conditional distribution of Y given its parents invariant.
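To make the invariance principle concrete, here is a minimal sketch of an ICP-style procedure (my own toy illustration, not the authors' implementation): on a hypothetical linear SCM X1 → Y → X2 with a mean-shift intervention on X1, it accepts exactly those predictor subsets whose pooled-regression residuals look identically distributed across environments, and intersects the accepted sets.

```python
import numpy as np
from itertools import chain, combinations
from scipy import stats

rng = np.random.default_rng(0)

def sample_env(n, x1_mean):
    # Toy linear SCM X1 -> Y -> X2; the environment shifts the mean of X1.
    x1 = rng.normal(x1_mean, 1.0, n)
    y = 2.0 * x1 + rng.normal(0.0, 1.0, n)
    x2 = y + rng.normal(0.0, 1.0, n)
    return np.column_stack([x1, x2]), y

def invariant(X, y, env, S, alpha=1e-3):
    # Pooled OLS of y on X[:, S] (plus intercept), then test whether the
    # residuals have the same mean and variance in both environments.
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in S])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    r0, r1 = r[env == 0], r[env == 1]
    p_mean = stats.ttest_ind(r0, r1, equal_var=False).pvalue
    p_var = stats.levene(r0, r1).pvalue
    return min(p_mean, p_var) > alpha

# One observational and one interventional environment (mean shift on X1).
X_obs, y_obs = sample_env(2000, 0.0)
X_int, y_int = sample_env(2000, 2.0)
X = np.vstack([X_obs, X_int])
y = np.concatenate([y_obs, y_int])
env = np.repeat([0, 1], 2000)

# ICP estimate: intersect all subsets that pass the invariance test.
subsets = chain.from_iterable(combinations(range(2), k) for k in range(3))
accepted = [set(S) for S in subsets if invariant(X, y, env, list(S))]
parents = set.intersection(*accepted) if accepted else set()
print(parents)  # the intersection should reduce to {0}, i.e. X1
```

Here the empty set and {X2} fail the invariance test (their residual distributions shift with the environment), while {X1} and {X1, X2} pass, so the intersection recovers the true parent.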
The authors exploit this fact, along with certain properties of
intervention stable sets, to contribute three strategies for finding
maximally informative interventions for discovering the parents of the
response variable Y. These strategies consider both the population
setting, where we can assume infinite data, and the finite-sample setting,
where testing errors can occur. The authors claim these strategies
can be adapted to both linear and nonparametric settings, and to
single as well as batch interventions. The authors
then perform experiments demonstrating several different
policy implementations of their strategies on simulated datasets from
randomly chosen linear SCMs using single interventions, where the
number of samples in each dataset ranges from the finite setting to
the population setting. In the population setting, they compare two of their policies against a
random intervention policy; in the finite setting, they compare seven policies against a random intervention policy.
They also compare the previously published Active Budgeted Causal Design Strategy (ABCD) against one of their policies.

__ Strengths__: Theoretical grounding:
The authors provide well-motivated examples and counterexamples,
proofs, and algorithms for ICP and A-ICP, along with well-documented Python
implementations, not only of A-ICP but also of ICP and ABCD, in the appendices.
Empirical evaluation:
The results the authors report indicate robust performance
improvements in comparison to random strategies. They successfully
demonstrate their policies are capable of finding the direct causes of
the response variable with fewer interventions than a random policy.
Compared to the ABCD strategy, they are more likely to find the full
set of direct causes of the response variable. The relevance to
the NeurIPS community is greatly enhanced by providing Python
notebooks that enable the reproduction of the figures and results.

__ Weaknesses__: Theoretical grounding:
The main manuscript is light on details about the algorithms due to
space limitations, but the appendices more than make up for it.
Empirical evaluation:
The authors argue that their goal is to achieve good performance after as few interventions as possible.
The ABCD strategy finds more direct causes with fewer interventions than any of the A-ICP policies.
If I had a limited budget, I would probably use ABCD.

__ Correctness__: The claims and methods appear to be correct. The presentation of the results is somewhat compressed due to space limitations.
Evaluated together with the appendices, the empirical methodology seems thorough.

__ Clarity__: The paper assumes a lot of prior knowledge on the part of the
reader. Fortunately, the authors do provide enough breadcrumbs to
follow the citations to fill in missing gaps.
For this paper, clarity is somewhat impeded due to space limitations,
as there are a lot of results to report and not much space to report
them.
As a consequence, many of the most interesting contributions are
buried in the appendices, as described above.

__ Relation to Prior Work__: This paper stands on the theoretical results of two previously
published papers: ICP and "Stabilizing Variable Selection and
Regression" and more fundamentally, on MUTE. This distinguishes it
from prior work in the active learning field that relies on Bayesian or
graphical approaches. The most similar work in terms of output type
is ABCD, but they differ quite a bit in their input assumptions, which
the authors try to resolve in order to make fair comparisons.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: The authors propose A-ICP, an active causal learning framework based on ICP [27], which assumes that the conditional distribution of the response, given its direct causes, remains invariant when intervening on arbitrary other variables in the system. They propose several active learning policies for performing interventions and use the results of the interventions to identify the parent set of the target variable.

__ Strengths__: - Compared with previous work in experiment design, the proposed approach does not need to know the parameters of the model (as in Bayesian approaches) or the MEC (as in graph-theoretic approaches).
- Proposition 1 provides a necessary condition for being an ancestor of the target variable, which is an interesting result.

__ Weaknesses__: ===After rebuttal===
I read the rebuttal and I think the proposed method has computational complexity issues; it should also be compared with the naive solution of intervening on the target variable (estimating the MEC from a finite sample). Thus, I decided to keep my score unchanged.
================
- The main assumption of ICP may not be satisfied in real-world scenarios. In particular, in the linear model, it means that the variance of the exogenous noise of the target variable must not change across environments.
- It is not clear to me why we cannot intervene on the target variable and obtain enough samples to recover its parents. It might be a good idea to compare this solution with the proposed policies.
- The time complexity of the proposed policies mentioned in Section 4 should be analyzed.
- It is not clear whether, in the proposed policies, each experiment consists of a single intervention. This should be clarified.

__ Correctness__: It seems that the claims and proposed methods are correct. However, I did not go into details of the proofs.

__ Clarity__: The paper is generally well written.

__ Relation to Prior Work__: The authors compared the proposed framework with other approaches in Introduction section.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: The paper proposes an algorithm for active learning in causal models.
In particular, the authors rely on Invariant Causal Prediction (ICP) to
select which experiments should be performed. They further
characterize causal effects on stable sets and then propose
intervention selection policies.

__ Strengths__: The paper considers a very important problem in causal inference: that
of active learning on causal graphs. In particular, the goal is to
efficiently construct the intervention stable sets as plausible causal
predictors. The key tool is invariance: while the full causal graph
may be non-identifiable, the conditional distribution of the response,
given its direct causes, remains invariant.
The proposed algorithm shows performance gains in empirical studies.
The empty set policy outperforms the others across the different
sample sizes and a large range of intervention numbers.

__ Weaknesses__: One concern is the relationship between the sample size of the intervention
experiments and the size of the causal graph. How much active learning
can help with the analysis will depend on the sample size obtained in
the intervention experiments relative to the size of the relevant
variables (or loosely the signal to noise ratio in the analysis).
While some experiments are performed in finite sample settings, it
remains unclear how the "signal-to-noise" ratio might affect the
relative performance of the algorithm.
Further, empirical studies are limited mostly to linear settings.
However, there exists some development in invariant causal prediction
in nonlinear settings, and causal inference these days routinely handles
nonlinearity. How the conclusions drawn from linear settings
generalize would be an important and
interesting discussion in the paper.

__ Correctness__: The paper seems correct.

__ Clarity__: The paper is quite well-written.

__ Relation to Prior Work__: The prior work is adequately discussed.

__ Reproducibility__: Yes

__ Additional Feedback__: See above.
--------------
Thank you to the authors for the rebuttal. I have read the rebuttal and my evaluation stays the same.

__ Summary and Contributions__: This paper considers finding the parents of the variable of interest (a response variable Y) through the invariant causal prediction (ICP) principle. Often the given data from multiple environments are not suitable to pinpoint what the parents are. Hence, to achieve the goal, it is desirable to obtain data from different environments through interventions. The authors define intervention-stable sets and explore their properties to finally propose a few criteria for obtaining new data for the active learning of the parents under ICP. Empirical results show the usefulness of these criteria compared to a baseline.

__ Strengths__: The paper is simple(-looking) yet provides fundamental criteria of what can be done with ICP given the capability to obtain more data through active learning. The proposed criteria seem novel and the results are sound. The NeurIPS community has recently embraced causality, and this paper, which sits at the intersection of ML (prediction models) and causal inference, will be of interest to many members of the community.

__ Weaknesses__: It may be inherent to ICP that there exist no unobserved confounders between two variables (e.g., the response variable and one of its parents). But it is hard to conceive of cases where one can strongly claim that there exist no unobserved confounders.
L195 “For now, only single-variable interventions are considered.“ The whole point of active learning is getting a (near-)optimal number of interventions to achieve the goal (estimating the parents of the response variable). The sentence makes me think there will be a later section where intervening on multiple variables is considered. It would be desirable to discuss at least what makes multiple-variable interventions challenging. How would they differ from single-variable interventions?
An atomic intervention is different from a stochastic intervention, since it cuts the incoming edges onto the intervened variable. For example, Example A.3 in the supplementary material will not include edges from X0 and X1 to X2 if the intervention is atomic. Then {S2} and {S3} will be in \mathbb{S}_e and the intersection will not include {X0, X1}. Hence, one cannot just casually say that “do” or “different” types of interventions are feasible. (Further, a “conditional” intervention is deterministic, and the application of d-separation is more subtle.)
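A minimal simulation sketch of this distinction (my own hypothetical SCM X0 → X2 ← X1, not the paper's Example A.3): under a stochastic shift intervention the intervened variable still covaries with its parents, whereas under an atomic do-intervention that covariance vanishes because the incoming edges are cut.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

def sample_x2(kind):
    # Hypothetical SCM X0 -> X2 <- X1, used only to illustrate the distinction.
    x0 = rng.normal(0.0, 1.0, n)
    x1 = rng.normal(0.0, 1.0, n)
    noise = rng.normal(0.0, 1.0, n)
    if kind == "atomic":       # do(X2 := 1): incoming edges are cut
        x2 = np.full(n, 1.0)
    elif kind == "shift":      # stochastic shift: mechanism is kept
        x2 = x0 + x1 + noise + 1.0
    else:                      # observational
        x2 = x0 + x1 + noise
    return x0, x2

# Under a shift intervention X2 still covaries with its parent X0 (cov ~ 1);
# under an atomic intervention that covariance is exactly zero.
for kind in ["shift", "atomic"]:
    x0, x2 = sample_x2(kind)
    print(kind, round(float(np.cov(x0, x2)[0, 1]), 2))
```

This is why an invariance test that conditions on the intervened variable's parents behaves differently depending on which type of intervention is assumed.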

__ Correctness__:
I checked the theorems and their proofs and found no specific problems, except that intervening on a variable does not remove the incoming edges onto it.

__ Clarity__: The paper is written clearly in general.
Footnote 1 says that the authors consider any type of intervention, but this is not well specified. What is “a different type”? (Also see the weaknesses regarding other intervention types.)

__ Relation to Prior Work__: The paper summarized prior work well.

__ Reproducibility__: Yes

__ Additional Feedback__: I have read the authors' feedback.
=================================
Thanks for a simple and elegant new work on active learning with ICP.
Causal sufficiency could be mentioned explicitly. In L115–116, the authors mention that the noise variables are jointly independent. If the noise variables are unobserved variables, as used in an SCM, joint independence does not imply that the Xs and Y are unconfounded. Stating the causal sufficiency assumption explicitly seems clearer (to me).
Minor:
L319: “One might ask whether”
L323: “It would be interesting”