NeurIPS 2020

Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability

Meta Review

Shapley-value-based explanation treats features as players in a cooperative game whose coalitions jointly produce the prediction. Formally, the impact of a feature is estimated from its marginal contribution, that is, the difference between the value of a coalition containing the feature and the value of the same coalition without it, averaged over coalitions. The paper describes how to refine the Shapley values using (possibly incomplete) knowledge of the causal diagram relating the features, by restricting the average to coalitions compatible with this knowledge.

This paper generated a heated discussion. The rebuttal clarified the distinction between causality of the prediction process and causality of the data, which was helpful. One concern remains unaddressed: whether and when the causal or the anti-causal ordering should be used. This ambiguity undermines the clarity of the approach. A more in-depth argument for the intuition behind restricting to coalitions compatible with causal notions is required. Further elaboration on "resolving variables" and "sensitive attributes" would be appreciated. The AC leans positive; the paper is very original and timely.
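
As a reading aid for the summary above, the effect of restricting the average to causally compatible orderings can be sketched on a toy example. This is a minimal illustration, not the authors' implementation: the function `shapley`, the toy value function `value`, the feature names "a" and "b", and the constraint `a_before_b` are all hypothetical, and uniform weighting over the permitted orderings is an assumption made for simplicity.

```python
from itertools import permutations


def shapley(players, value, allowed=lambda order: True):
    """Average each player's marginal contribution over permitted orderings.

    With the default `allowed`, every ordering counts and the result is the
    ordinary (symmetric) Shapley value. Passing a predicate that accepts only
    some orderings gives a restricted, asymmetric variant (assumed here to
    weight the permitted orderings uniformly).
    """
    totals = {p: 0.0 for p in players}
    count = 0
    for order in permutations(players):
        if not allowed(order):
            continue
        count += 1
        seen = frozenset()
        for p in order:
            # Marginal contribution of p given the players that precede it.
            totals[p] += value(seen | {p}) - value(seen)
            seen = seen | {p}
    return {p: total / count for p, total in totals.items()}


def value(coalition):
    # Hypothetical toy game: "a" and "b" are redundant, either one suffices.
    return 1.0 if ("a" in coalition or "b" in coalition) else 0.0


def a_before_b(order):
    # Hypothetical causal knowledge: "a" is a causal ancestor of "b".
    return order.index("a") < order.index("b")


symmetric = shapley(["a", "b"], value)
asymmetric = shapley(["a", "b"], value, allowed=a_before_b)
print(symmetric)
print(asymmetric)
```

In this toy game the symmetric average splits the credit equally between the two redundant features, while the restricted average assigns all of it to the feature placed first. Reversing the constraint would assign all of it to "b" instead, which is why the choice between the causal and the anti-causal ordering matters.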