NeurIPS 2020
### Generative causal explanations of black-box classifiers

### Meta Review

This paper presents a generative model to "explain" any given black-box classifier and its training dataset. Explanation is through a hidden factor that can control or intervene in the output of the classifier. The discovery is based on an objective with two terms: 1) a proposed Information Flow term that quantifies the causal effect of the hidden factor on the classifier output, and 2) a distribution-similarity term that ensures the discovered hidden factor can regenerate the feature space.
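For concreteness, the two-term objective summarized above can be sketched as follows. The notation is assumed for illustration and is not taken from the paper: $z$ is the hidden factor, $g$ the generative model, $f$ the black-box classifier with output $\hat{Y} = f(g(z))$, $I$ the mutual information used as the information-flow measure, $\mathcal{D}$ a similarity between the generated and the data distribution (larger meaning closer), and $\lambda > 0$ a trade-off weight.

```latex
% Sketch under assumed notation; the paper's exact formulation may differ.
\max_{g} \;
  \underbrace{I\big(z;\, \hat{Y}\big)}_{\text{information flow}}
  \;+\; \lambda \,
  \underbrace{\mathcal{D}\big(p(g(z)),\, p(X)\big)}_{\text{distribution similarity}},
\qquad \hat{Y} = f\big(g(z)\big)
```

Written this way, the first term rewards hidden factors whose changes alter the classifier output, while the second keeps the generated samples close to the training data.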
Reviewers found this a borderline paper. After the discussion phase, all reviewers are leaning towards acceptance. They pointed out as strengths that this is a very well-written paper, presenting a simple yet effective method, with extensive ablative experiments. However, they also pointed out several weaknesses, including a somewhat overstated storyline, which frames the method as causal inference when this does not really seem to be needed (as R1 points out and R2 agrees, their setup “renders almost all techniques of causal inference such as do-operation and counterfactuals trivial”). The author response alleviated some concerns (in particular from R1 and R4), but the feeling that causality is not really needed here remains. The authors need to make it clearer why mutual information is a good metric for causal influence (as pointed out by R1 and R2), since MI measures a type of correlation. R3 also points out that “the theoretical results, propositions 2 and maybe also 3, are almost immediate observations rather than "surprising" results.” It is not clear how the method would scale to problems where the classifier is complex and derives its output from many causally relevant factors. A discussion of this should be added (see R3’s comments).
Overall, this paper could be accepted, but it is not outstanding and it needs considerable revision (doable in camera-ready time) to incorporate the necessary improvements. I lean towards accepting but urge the authors to follow the reviewers’ suggestions to improve the paper. In particular, I second R1’s suggestion to update the storyline and to compare against and discuss the related work on disentanglement, which seems to be more closely linked to the main contribution of this paper.