NeurIPS 2020

Noise2Same: Optimizing A Self-Supervised Bound for Image Denoising

Meta Review

"** What happened in the review phases: In the initial reviews, 3 out of of 4 reviewers (R2, R3, and R4) recommended acceptance, with R2 and R3 in particular judging the paper very positively (Top 50% of accepted NeurIPS papers). R4 was more mildly positive, bothered by felt overclaiming from a suggestion that the method be less sub-optimal than ohers, but still finding the idea of the paper worthy. R1 was more critical, in particular pointing out "confusion on the concepts of J-invariance/dependencies between training and test". Considering the author's response, the AC judges that the points, questions and criticisms raised in each initial review were overall well addressed (e.g. committing to revise the inappropriate “sub-optimal” statement pointed by R4). Except for one important point of contention that remained: authors tried to clarify the misunderstanding of R1 regarding J-invariance/dependencies in training and test connected to results in Table 1. This was brought up in the discussion phase between R1 and R2. R1 expressed that the author response had not clarified the issue, which severely undermined the motivation of their work. R2 shared his own understanding of the matter, which led to clearing the confusion and to R1 understanding the point the authors were making in the paper. ** AC REQUEST 1**: Based on this discussion, the AC concludes that the root of this confusion could be easily corrected if the paper carefully and explicity distinguished that mask-based blind-spot is used during training but not during use/testing (for Table 1, and the N2S model). Similarly (pointed out by R1) the sentence ""the denoising function f trained through mask-based blind-spot approaches is not strictly J-invariant, making Equation (2) not valid"" is confusing/incorrect, because the function thus-trained *if considered to include the masking, which we can also do at usage/test* is J-invariant and so respects Eq. 2. 
It ceases to be J-invariant only when used without the masking (but then the paper should explain why one would do so). This needs to be clarified. The AC however judges that it should be an easy fix.

***********************************************
** New concern raised during the discussion phase: **

In the discussion phase, R1 found that the results reported in the literature for N2S and Laine et al. [12] (confirmed by R1 rerunning the original implementation) noticeably outperformed the proposed method, contrary to what the paper's results table reports. R2 checked the results reported in the literature and agreed. This perceived incorrect reporting of literature results is -- from the AC's reading of the situation -- what prompted all reviewers to significantly lower their initial scores. Reviewers updated their reviews accordingly.

Once these updates were made available to the authors, they contacted the AC through CMT with clarifications on this point (see the authors' message copied below), explaining that the reported results are results without the post-processing steps -- which is the fair comparison with the approach in the paper, as it does not assume knowledge of a noise model (which the post-processing steps in the other methods use). The authors also point to several places in the paper where this is stated. As this rebuttal was received by the AC past the end of the discussion phase, he could not bring it up to the reviewers for their consideration. The AC however judges that the authors' rebuttal of this concern is correct, and that the reported results are thus a fair comparison with the related methods from the literature.

** AC request 2: **

However, the AC asks that the paper be updated to clarify this explicitly in the captions of the results tables: each caption should report the performance that appears in the literature, obtained with post-processing informed by a noise model, and clearly state why the table gives the performance without post-processing.
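As an illustration of the distinction in AC request 1, here is a minimal toy sketch, not the paper's network or training procedure. The names `g`, `mask_pixel` and `f_masked` are hypothetical, the "network" is just a fixed 3-tap filter on a 1D signal, and replacing a masked pixel by the mean of its neighbours is an assumed masking strategy chosen for simplicity. It shows that the same function is J-invariant when the masking is kept at use/test time, and stops being so when the masking is dropped.

```python
def g(x):
    """Toy stand-in for a denoising network: a 3-tap smoothing filter."""
    n = len(x)
    return [
        0.25 * x[max(i - 1, 0)] + 0.5 * x[i] + 0.25 * x[min(i + 1, n - 1)]
        for i in range(n)
    ]


def mask_pixel(x, j):
    """Return a copy of x in which x[j] is replaced by the mean of its neighbours."""
    neighbours = [x[k] for k in (j - 1, j + 1) if 0 <= k < len(x)]
    masked = list(x)
    masked[j] = sum(neighbours) / len(neighbours)
    return masked


def f_masked(x):
    """Masking kept at use/test time: output j never sees the original x[j]."""
    return [g(mask_pixel(x, j))[j] for j in range(len(x))]


x = [1.0, 2.0, 3.0, 4.0]
x_changed = [1.0, 2.0, 100.0, 4.0]  # only pixel 2 differs

# With masking, output 2 does not react to its own input pixel.
print(f_masked(x)[2] == f_masked(x_changed)[2])
# Without masking, output 2 does depend on its own input pixel.
print(g(x)[2] == g(x_changed)[2])
```

In this toy setting each subset J is a single pixel; in the methods under discussion the masking and the trained network are more elaborate, but the dependence structure being illustrated is the same.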
**********************
** AC's final judgment **
**********************

Based on the expressed reviews, the discussions, and his own reading, the AC judges that the paper contributes significantly to the field of self-supervised (image) denoising (i.e. when the data consists only of noisy images, and the only assumption on the noise is that it is independent across dimensions and zero-mean). It does so by highlighting and demonstrating limitations of prior related approaches, proposing a theoretically well-grounded novel approach, and experimentally showing its superiority (for the setting without assumption of a specific noise model).

The AC judges that the new issues that appeared during the discussion phase, and that caused the reviewers to significantly lower their scores, are the result of misunderstandings that can easily be avoided by simple fixes in the paper. The AC thus recommends acceptance, provided the simple clarifications described above (AC request 1 and AC request 2, as well as the clarifications promised in the initial author response) are implemented in the revised version of the paper.

Minor point: the AC also suggests better *highlighting* that an input normalization is applied for Theorem 1 (specifying that it consists of subtracting the mean and dividing by the standard deviation, as normalization can mean different things), and that it is *not applied* for Theorem 2, because it otherwise feels very strange that one has a sigma in the equation while the other has none.

***************************************
Message sent by the authors to the AC through CMT after the end of the discussion phase, after seeing the updated reviews:

Dear Area Chair,

In the reviewer feedback, the four reviewers lowered their scores from 5-8-8-6 to 3-6-6-5. The only reason is the concern about the baseline performance raised by Reviewer 1. However, we argue that the concern is invalid and misleading.
We follow the experimental settings in the original papers exactly, except for excluding the post-processing in [1, 12]. The key reason Reviewer 1 obtained different PSNRs is whether the post-processing is used. As we state in Section 5.1, we do not include the post-processing step in [1, 12]. That is because using the post-processing in [1, 12] requires specifying the noise type (from Gaussian, Poisson, and impulse) [12] or the variance of the noise [1], which is unavailable under our problem setting. We clearly explain this difference in our paper, as quoted below.

Lines 93-99, Page 3: “In practice, it is common to have unknown noise models, inconsistent noises, or combined noises with different types, where the Bayesian post-processing is no longer applicable. In contrast, our proposed Noise2Same can make use of the entire input image without any post-processing. Most importantly, Noise2Same does not require the noise model to be known and thus can be used in a much wider range of denoising applications.”

Lines 243-246, Page 7: “Note that ImageNet and HànZì have combined noises and Planaria has unknown noise models. As a result, the post-processing steps in Noise2Self [1] and the convolutional blind-spot neural network [12] are not applicable, as explained in Section 2”

Lines 254-256, Page 7: “On the other hand, the convolutional blind-spot neural network [12] suffers from significant performance losses without the Bayesian post-processing, which requires information about the noise models that are unknown.” "