NeurIPS 2020

### Review 1

Strengths:
- To my knowledge, the problem formulation is novel; I don't believe other works have specifically investigated the issue of distorted attributions in compressed models.
- The methodology makes sense, and the empirical results seem convincing in terms of matching the attributions to the parent model and also obtaining higher-quality attributions w.r.t. the image segmentation ground truth.

Correctness: My main concern about the methodology, as described in the previous section, is that the child model may be superficially mimicking the saliency map of the parent model without actually adopting the parent model's decision-making process (particularly if a "local" explanation method like Grad-CAM is used). Perturbation experiments could address this concern. However, I do not expect this phenomenon to be likely, so it is not a major concern. Another small concern: in line 193, the authors describe applying a ReLU operation to the channel importance obtained from Grad-CAM. In general, discarding negative gradients (as is done in, e.g., Guided Backprop and DeconvNet) has been shown to diminish the quality of attributions (e.g., by making them prone to failing sanity checks; https://arxiv.org/abs/1912.09818). I am thus somewhat concerned that the authors felt the need to discard negative channel importance here, because negative importance can still be relevant for classification.
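To make the rectification concern concrete, here is a minimal numpy sketch of Grad-CAM-style channel importance with an optional rectification step. The function name, shapes, and `rectify` flag are my own illustration, not the authors' code; the point is only that rectifying the channel weights silences channels whose importance is negative.

```python
import numpy as np

def gradcam_channel_importance(activations, gradients, rectify=True):
    """Grad-CAM-style map: global-average-pool the gradients per channel.

    activations, gradients: arrays of shape (C, H, W).
    With rectify=True, negative channel importances are discarded (the
    operation questioned above); a channel that suppresses the class
    score then contributes nothing to the attribution map.
    """
    weights = gradients.mean(axis=(1, 2))             # (C,) channel importance
    if rectify:
        weights = np.maximum(weights, 0.0)            # drop negative importance
    cam = np.tensordot(weights, activations, axes=1)  # (H, W) weighted sum
    return np.maximum(cam, 0.0)                       # standard final ReLU
```

With a channel whose gradient is uniformly negative, the two variants give visibly different maps, which is exactly why the choice deserves justification.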

Clarity: For the most part, yes. One piece I was unclear on was whether only correctly-predicted examples from the parent model were used for regularization during training (the authors wrote that "we only consider the samples that each model correctly predicted" in the context of the test set because those are the attributions that are likely to be reliable; I was unsure whether this was also leveraged during training).

Relation to Prior Work: To my knowledge, yes (I am not very familiar with the model compression literature, hence my lower confidence rating).

Reproducibility: Yes

Additional Feedback: I have mentioned some suggestions under "Weaknesses"; what's listed here are more minor issues:
(1) Assuming that the Grad-CAM backpropagation was started w.r.t. the logits of the softmax layer, it may be a good idea to normalize the logits so that their mean across all classes is 0. Normalizing the logits does not change the output of the softmax, but it does change the attributions (if a particular channel contributes equally to all softmax logits, it is effectively contributing to none of them). This is also mentioned in the section "Adjustments for softmax layers" of the DeepLIFT paper: https://arxiv.org/pdf/1704.02685.pdf
(2) I think it is worth reflecting on the extent to which image segmentation is a good "ground truth" explanation, because background pixels can often be relevant for a class prediction (for example, if the background is green, a prediction of "cow" is more likely than if the background is pink). That said, I agree that, broadly speaking, the "pointing game" measure is likely valid (i.e. the peak attribution should fall within the segmented region).
(3) (minor) I would be curious how Grad-CAM (which averages the gradients over a channel) performs relative to simply computing "activation*gradient" at each individual neuron in the convolutional layer.
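Point (1) can be illustrated with a toy linear model (the matrix, input, and values below are hypothetical, chosen only to make the effect visible): mean-centering the logits leaves the softmax output untouched but changes gradient-based attributions.

```python
import numpy as np

# Toy linear "network": logits = W @ x.  Feature 0 contributes equally
# to every class logit; feature 1 discriminates between classes.
W = np.array([[1.0, 2.0],
              [1.0, -1.0],
              [1.0, 0.5]])
x = np.array([3.0, 1.0])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

centered_W = W - W.mean(axis=0)  # centre logits across classes

# Softmax is shift-invariant, so the prediction is unchanged...
assert np.allclose(softmax(W @ x), softmax(centered_W @ x))

# ...but gradient attributions for class 0 are not: feature 0, which
# contributes identically to all logits, gets zero attribution after
# centering (cf. the DeepLIFT "Adjustments for softmax layers" section).
grad_raw = W[0]           # d logit_0 / dx  -> [1.0, 2.0]
grad_centered = centered_W[0]  # -> [0.0, 1.5]
```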

### Review 2

Summary and Contributions: The paper starts from the observation that compressed networks can produce attribution maps significantly different from the corresponding original uncompressed networks, despite having comparable accuracy. The authors argue that this is problematic, as similar accuracy does not necessarily mean that the two networks process information in the same way. They propose an attribution-based regularization term to steer the fine-tuning towards local minima that have both high predictive accuracy and good matching of attributions between the original and the compressed network.

Strengths: As neural networks will be increasingly used in safety-critical domains, the problem of understanding how they process input information is important. To the best of my knowledge, the observation that compression techniques might shift the attention of the network towards less relevant input features, despite preserving the model accuracy, is novel and therefore potentially relevant for the XAI and security communities. The authors show empirically on VOC and ImageNet that it is possible to mitigate the problem of "attribution shift" by employing attributions as a regularization term, and that this often produces better results than simple activation matching, as proposed by Zagoruyko et al. 2017. The paper and the proposed method are easy to understand. The proposed regularization technique is based on a well-known attribution method and is easy to implement. The framework can be readily applied to several compression techniques, such as structured/unstructured pruning and KD.

Correctness: Some claims require clarification, in particular the connection between stochastic matching and dropout. Claims such as "preserves the interpretation of the original networks" and "significant performance gains" cannot be assessed without error bounds in the experimental section.

Clarity: While the paper is mostly understandable, I have the feeling that it would benefit from professional proofreading, as several sentences sound odd to me (as a non-native speaker).

Relation to Prior Work: The paper should better explain what is inherited from Zagoruyko et al.: currently it seems that Zagoruyko et al. only investigated equally weighted activation matching, while in fact they also investigated sensitivity-based regularizers. There is also a line of work [1-3] that investigates training neural networks with attributions as regularizers. The authors might want to compare and contrast with these works.
[1] https://arxiv.org/abs/1703.03717
[2] https://arxiv.org/abs/1906.10670
[3] https://arxiv.org/pdf/1909.13584.pdf

Reproducibility: Yes

### Review 3

Summary and Contributions: This paper highlights the surprising fact that network compression, while maintaining the accuracy of the original network, changes the regions of attention of the network, making it less explainable. This is addressed by introducing a regularization term that encourages the attribution maps of the student network to match those of the teacher.

Strengths:
- As far as I am aware, this paper is the first work to notice that the regions on which a network focuses are affected by compression/distillation. I find this surprising and interesting.
- The experiments demonstrate convincingly that the proposed SWA regularizer addresses this issue.
- The paper is clearly written and could be relatively easily reproduced (not to mention that the code is provided).

Weaknesses:
Technical novelty: As acknowledged by the authors, [4] proposed a very similar regularizer (see Eq. 2 in [4]). In fact, the form of Eq. 2 in [4] is quite general, as any function F() could potentially be used. In practice, the authors of [4] studied several functions, not only the one referred to as EWA in this submission, although that one was the best-performing in [4]. Altogether, I acknowledge that the motivation behind [4] was different from the one here and that the proposed formulation is somewhat more general and more effective than the one in [4]. However, I feel that the technical novelty remains on the weak side.

Presentation: While the paper is clearly written, it could benefit from some additional analysis. In particular:
- As mentioned above, I do appreciate the interest of observing that attribution maps are affected by compression. However, I feel that the authors fail to study and explain why this happens. In particular, in the context of pruning, I find it particularly surprising that the fine-tuning stage does not address this issue. I would be glad to hear some hypothetical explanations from the authors.
- What is the motivation behind the rectification function V()? Why does one need it, and why is a ReLU an appropriate choice (better than the alternatives)?
- What is the motivation behind the stochastic matching approach?

Experiments: The experiments are in general convincing. However:
- It would be interesting to study the sensitivity to \beta.
- The additional results on ImageNet in the supplementary material (Table 3) show that the compressed networks have a higher AUC than the full network. Can the authors explain this?

#### POST-REBUTTAL COMMENTS ####
I would like to thank the authors for their responses. I acknowledge that there are some differences w.r.t. [4]. However, I still feel that the similarities make the novelty on the weak side for NeurIPS. Furthermore, while the rebuttal indeed clarifies a few points, others remain unclear, such as why fine-tuning post-pruning doesn't solve the problem by itself, the motivation behind the function V(.), and the influence of \beta. Therefore, while this paper is essentially borderline, I tend to remain slightly on the rejection side.

Correctness: The claims and methodology are correct.

Clarity: The paper is clearly written, but one point nonetheless bothers me: at the beginning of Section 4, the authors mention that Grad-CAM is used to generate the attribution maps. However, it seems to me that these maps depend on the regularizer used, i.e., they are generated using Eq. 3 for EWA, using Eq. 4 for SWA, and using the stochastic variant for SSWA. Is Grad-CAM used for some other purpose?

Relation to Prior Work: The relation to prior work is acknowledged, although, as discussed above, the technical novelty over [4] is limited.

Reproducibility: Yes

Additional Feedback: - Strictly speaking, experimental evaluation is not a contribution and should thus not be listed as such in the introduction. - In unstructured pruning, is the regularizer used in every fine-tuning step?

### Review 4

Summary and Contributions: This paper aims to compress neural networks while preserving visual attribution. The authors observe that existing network compression methods only focus on matching the performance of the target network, so their attributions do not match those of the target network. An attribution-aware compression method is proposed and evaluated on the PASCAL VOC 2012 and ImageNet datasets, under several network compression techniques: structured pruning, unstructured pruning, and knowledge distillation.

Strengths:
+ This paper is well-written and easy to follow.
+ It is interesting to find that existing network compression methods do not preserve the attribution map, and the method to address the problem is well-motivated.
+ Evaluation is done on several network compression techniques and several datasets.