NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 9087
Title: Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty

Reviewer 1

The authors present a self-supervised auxiliary learning scheme in which the images in the training set are rotated by one of 4 different rotations, and the neural network has to predict which rotation was applied. The authors show in various experiments that this type of SSL increases robustness against many kinds of perturbations, ranging from adversarial attacks to motion blur and fog. In addition, the outputs indicating the rotation can be used for detecting outliers. The article makes a good case for both contributions.

One main remark is that the title of the article talks about uncertainty estimation, while the experiments focus on outlier detection. These two tasks are related but not identical. Outlier detection boils down to binary classification, while uncertainty estimation produces continuous real values. In principle, the scheme introduced by the authors could be used to produce real-valued scores (the sum of outputs is such a number), but additional experiments should then show that the magnitude of this "uncertainty" makes sense.

One of the enticing properties of the proposed method is that it is so simple and elegant. The other OOD detection approaches in the comparison have all been designed explicitly for this task. However, how well would it work to just take the max probability of the normal neural network (instead of the auxiliary outputs)? Should the probability of the max class not also be lower for outliers? And what about estimating the uncertainty via dropout at test time? How well does that work?

Concerning robustness, I wonder why the proposed auxiliary learning works. Does also rotating the images lead to more "blurry" features, so that the network can deal with blur better? Put differently: would it work for any auxiliary self-supervised task? What is special about rotations?

Finally, when introducing this training, the "normal" performance almost always drops. For example, in Table 1 the performance goes from 94.8 to 83.5. What do the authors think of this matter? Can it be remedied in the future, or is this just a trade-off between robustness and performance on clean samples?

Overall, the article is well-written and makes a clear contribution to the field.
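
To make the max-probability baseline asked about above concrete, here is a minimal sketch in plain Python. It is only an illustration and is not taken from the paper: the function names `softmax` and `msp_anomaly_score` and the example logits are hypothetical. The score is one minus the largest softmax probability, so a flatter prediction scores as more outlier-like.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_anomaly_score(logits):
    """Hypothetical helper: 1 - max softmax probability.

    Higher values mean the classifier is less confident, which the
    baseline treats as evidence that the input is an outlier.
    """
    return 1.0 - max(softmax(logits))


# Made-up logits for illustration only (not results from the paper).
confident_logits = [8.0, 0.5, 0.1, -1.0]   # peaked prediction
flat_logits = [0.3, 0.2, 0.25, 0.1]        # nearly uniform prediction

print(msp_anomaly_score(confident_logits))
print(msp_anomaly_score(flat_logits))
```

Thresholding this score gives a binary outlier detector, while the raw value is a continuous quantity; whether its magnitude is meaningful as an uncertainty estimate is exactly what the remark above asks the authors to verify.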

Reviewer 2

- This is an experimental paper with a quite clear message: the auxiliary rotation-based self-supervision may not improve accuracy, but it does help robustness (both adversarial and corruption robustness) and, especially, OOD detection.
- Despite its simplicity, the idea is generally well-presented, with good experimental results.
- However, I feel the current manuscript does not provide enough motivation or insight into why the auxiliary rotation task could improve robustness and OOD detection. A more detailed analysis of the results would greatly help readers understand the significance of this work, e.g. an ablation study, or a comparison of the characteristics of the original and proposed networks from the viewpoint of robustness/OOD.
- Another concern of mine is a lack of novelty: given that "using pre-training can improve robustness and uncertainty" [1], the results in this paper may not be that surprising for some readers, as self-supervision might be just another form of supervised bias in training a network. Again, I think this issue could be eased by providing a more detailed analysis of the results: Does the claim generalize to other forms of self-supervision apart from rotation? How about other auxiliary tasks, in the general framework of multi-task learning? Such questions should be addressed.
- L117 (Section 3.1): In the adversarial robustness part, it seems the proposed method makes two different modifications to the original adversarial training: adding the SS loss (a) to the training objective, and (b) to the PGD objective. Which of the two is more critical for the observed gain?

[1] Hendrycks et al., Using Pre-Training Can Improve Model Robustness and Uncertainty, ICML 2019.

----------After rebuttal----------

The authors may have discovered an interesting finding. But even after reading the rebuttal, I am still unclear on why predicting rotations improves robustness and uncertainty. They provide some reasoning/insight in the rebuttal letter, which is nice (hence I increase my score a bit, though it remains on the negative side), but it is not enough for me; e.g., to support it the authors should also show whether other, non-rotational self-supervised learning works or not. I think the paper has good potential for the future, but it is not ready to publish in its current form.
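
The two modifications raised in the L117 comment can be written out as a sketch. The notation here is an assumption made for illustration and is not copied from the paper's equations: $\mathcal{L}_{CE}$ is the classification loss, $\mathcal{L}_{SS}$ the rotation-prediction loss, $\lambda$ a weighting factor, and $\delta$ a perturbation bounded by $\epsilon$.

```latex
% (a) SS loss added to the training objective, evaluated on the adversarial input
\mathcal{L}_{\text{train}}(x, y; \theta)
  = \mathcal{L}_{CE}\bigl(y, p(y \mid x_{\text{adv}}); \theta\bigr)
  + \lambda \, \mathcal{L}_{SS}(x_{\text{adv}}; \theta)

% (b) SS loss added to the objective that PGD maximises to find x_adv
x_{\text{adv}} = x + \arg\max_{\|\delta\|_\infty \le \epsilon}
  \Bigl[ \mathcal{L}_{CE}\bigl(y, p(y \mid x + \delta); \theta\bigr)
  + \lambda \, \mathcal{L}_{SS}(x + \delta; \theta) \Bigr]
```

Under this reading, the requested ablation would train with (a) only, generating $x_{\text{adv}}$ from $\mathcal{L}_{CE}$ alone, and compare against the variant that uses both (a) and (b).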

Reviewer 3

The authors propose using self-supervised learning to improve the generalization properties of deep learning methods, such as robustness against adversarial attacks, and to improve uncertainty estimates. They propose a simple extension to an already established algorithm, i.e. PGD, by adding a term (the second term in eq. 3) to the loss. They show that the proposed method brings improvements on different problems, e.g. out-of-distribution detection, and they use several different datasets, e.g. CIFAR and ImageNet. The paper is well written and the experiments are well conducted. However, the novelty of the proposed method is limited and the improvements are not surprising. Using self-supervised learning as a "data augmentation" method will improve the generalization performance of methods. Adding "extra" data points that are not strictly in the original dataset will also improve the performance metrics. Hence, in my opinion, although the paper is well written and the experiments are well conducted, the novelty of the paper is limited.
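
As an illustration of the "extra" data points mentioned above, the following sketch builds the rotated copies and rotation labels for one image. It assumes the four rotations are multiples of 90 degrees and represents an image as a list of rows; the helper names `rotate90` and `rotation_examples` are hypothetical and do not come from the paper.

```python
def rotate90(image):
    """Rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotation_examples(image):
    """Return four (rotated_image, label) pairs.

    Label k stands for a clockwise rotation of k * 90 degrees
    (an assumption about which four rotations are used).
    """
    examples = []
    current = [list(row) for row in image]  # copy so the input is untouched
    for label in range(4):
        examples.append((current, label))
        current = rotate90(current)
    return examples


# Tiny made-up 2x2 "image" for illustration.
toy_image = [[1, 2],
             [3, 4]]

for rotated, label in rotation_examples(toy_image):
    print(label, rotated)
```

Each original image yields four training inputs whose labels are known for free, which is the sense in which the auxiliary task resembles data augmentation.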