NeurIPS 2020

Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models

Meta Review

The Shapley-value-based methodology for explaining a model treats features as players whose coalitions jointly produce the prediction. Formally, the impact of a feature is estimated as the difference between the average value of the coalitions containing this feature and that of the coalitions not containing it. This paper introduces a distinction between the direct and indirect effects of a feature, where the distinction lies in whether or not the out-of-coalition variables are subject to an intervention on the target feature.

This paper generated an intensive discussion. Some reviewers were enthusiastic about the paper, while others found it extremely hard to follow. The discussion somewhat clarified the difference between causal and interventional Shapley values, and the point of replacing the simulated removal of features with a simulated intervention on features. The terminology needs to be simplified: distinguishing between asymmetric conditional Shapley and asymmetric interventional Shapley is confusing, as the asymmetric interventional approach (this paper) considers the averaging over all permutations.

The paper was found very inspirational by some reviewers. Some reviewers regret that the experimental illustration is very sketchy. The authors promised "additional empirical analyses on (deep) NNs", and the AC strongly expects that they will do so if the paper is accepted. The difference with "Asymmetric Shapley Values" should be discussed in more depth and illustrated on more complex cases than the biking example. One would like to see complex cases where it is *not* preferable to give the credit to the root cause only.
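
For readers less familiar with the attribution scheme summarized at the start of this review, a minimal sketch of the standard (symmetric) Shapley computation may help. It averages each feature's marginal contribution over all orderings of the features. It is not the authors' causal variant: the toy `model`, the input `x`, the `baseline`, and the choice to replace out-of-coalition features by baseline values are all assumptions made here for illustration only.

```python
from itertools import permutations


def shapley_values(value, n_features):
    """Exact Shapley values: average each feature's marginal
    contribution over all orderings of the features."""
    phi = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        coalition = set()
        for i in order:
            before = value(frozenset(coalition))
            coalition.add(i)
            after = value(frozenset(coalition))
            phi[i] += after - before
    return [p / len(orderings) for p in phi]


# Hypothetical toy model with an interaction between features 0 and 2.
def model(x):
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[0] * x[2]


x = [1.0, 2.0, 1.0]         # instance to explain (assumed)
baseline = [0.0, 0.0, 0.0]  # reference values for "removed" features (assumed)


def value(coalition):
    # Features in the coalition keep their observed value;
    # out-of-coalition features are set to the baseline.
    z = [x[i] if i in coalition else baseline[i] for i in range(len(x))]
    return model(z)


phi = shapley_values(value, len(x))
print(phi)
```

In this sketch the attributions sum to `model(x) - model(baseline)`, and the interaction term is split evenly between features 0 and 2. The paper under review changes how `value` treats out-of-coalition features (intervention rather than simulated removal), which this baseline-replacement sketch does not attempt to reproduce.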