{"title": "Extending Stein's unbiased risk estimator to train deep denoisers with correlated pairs of noisy images", "book": "Advances in Neural Information Processing Systems", "page_first": 1465, "page_last": 1475, "abstract": "Recently, Stein's unbiased risk estimator (SURE) has been applied to unsupervised training of deep neural network Gaussian denoisers that outperformed classical non-deep learning based denoisers and yielded comparable performance to those trained with ground truth. While SURE requires only one noise realization per image for training, it does not take advantage of having multiple noise realizations per image when they are available (e.g., two uncorrelated noise realizations per image for Noise2Noise). Here, we propose an extended SURE (eSURE) to train deep denoisers with correlated pairs of noise realizations per image and applied it to the case with two uncorrelated realizations per image to achieve better performance than SURE based method and comparable results to Noise2Noise. Then, we further investigated the case with imperfect ground truth (i.e., mild noise in ground truth) that may be obtained considering painstaking, time-consuming, and even expensive processes of collecting ground truth images with multiple noisy images. For the case of generating noisy training data by adding synthetic noise to imperfect ground truth to yield correlated pairs of images, our proposed eSURE based training method outperformed conventional SURE based method as well as Noise2Noise. 
Code is available at https://github.com/Magauiya/Extended_SURE", "full_text": "Extending Stein's unbiased risk estimator to train deep denoisers with correlated pairs of noisy images

Magauiya Zhussip, Shakarim Soltanayev, Se Young Chun
Ulsan National Institute of Science and Technology (UNIST)
{mzhussip, shakarim, sychun}@unist.ac.kr

Abstract

Recently, Stein's unbiased risk estimator (SURE) has been applied to unsupervised training of deep neural network Gaussian denoisers that outperformed classical non-deep learning based denoisers and yielded comparable performance to those trained with ground truth. While SURE requires only one noise realization per image for training, it does not take advantage of having multiple noise realizations per image when they are available (e.g., two uncorrelated noise realizations per image for Noise2Noise). Here, we propose an extended SURE (eSURE) to train deep denoisers with correlated pairs of noise realizations per image and applied it to the case with two uncorrelated realizations per image to achieve better performance than the SURE based method and comparable results to Noise2Noise.
Then, we\nfurther investigated the case with imperfect ground truth (i.e., mild noise in ground\ntruth) that may be obtained considering painstaking, time-consuming, and even\nexpensive processes of collecting ground truth images with multiple noisy images.\nFor the case of generating noisy training data by adding synthetic noise to imperfect\nground truth to yield correlated pairs of images, our proposed eSURE based training\nmethod outperformed conventional SURE based method as well as Noise2Noise.\nCode is available at https://github.com/Magauiya/Extended_SURE\n\n1\n\nIntroduction\n\nPowerful deep neural networks (DNNs) have been created and investigated for high-level computer\nvision tasks such as image classi\ufb01cation [1, 2], object detection [3, 4], and semantic segmentation [5]\nas well as for low-level computer vision tasks such as image denoising [6, 7, 8, 9]. Initially, it was\nchallenging for DNNs to outperform powerful classical denoisers such as BM3D [7]. However, recent\nworks with DNNs proposed and demonstrated that it is possible for DNNs to outperform classical\ndenoisers for synthetic Gaussian noise [8] as well as for real noise [10]. All aforementioned DNN\nbased denoisers were trained in a supervised way with noiseless ground truth images.\nCollecting high-quality noiseless images for training and evaluating DNN denoisers is challenging.\nPlotz and Roth collected high-quality benchmark data for denoising by averaging 19 independent\nnoise realizations per one image [11]. It is a painstaking process to take tens of photos for one high-\nresolution image of static objects and to perform post-processing to compensate for lighting changes.\nIt seems even more challenging, time-consuming, and even expensive to collect noiseless high-quality\nground truth data for slowly moving objects (e.g., animals, humans), for medical imaging, and for\nairborne hyper-spectral imaging. 
Even though it is possible to take 19 pictures, it may be inevitable that such a ground truth image contains mild noise if each picture is relatively noisy. For example, an average of 19 pictures contaminated with Gaussian noise of σ = 25 yields a picture with a noise level of σ = 25/√19 ≈ 5.74, assuming temporal independence of the noise. Thus, it seems desirable to have methods that deal with imperfect ground truth data (i.e., mild noise in the ground truth) and/or that train DNN denoisers without clean ground truth.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Recently, there have been several works on unsupervised training of DNN denoisers with noisy images only. Deep image prior (DIP) exploited the structure of a generator network and minimized the mean-squared error (MSE) between the output of a DNN and a given noisy image to denoise [12]. While DIP does not train the DNN, it requires solving an MSE minimization for each noisy image, which is slower than other DNN denoisers. Noise2Noise was proposed to train DNNs for image restoration with a set of two (independent) noise realizations per image in an unsupervised way for various noise models including the Gaussian distribution, and for a wide range of applications including compressive sensing MR recovery [13]. There have been several self-supervised training works with only one noise realization per image. A Stein's unbiased risk estimator (SURE) based training method for Gaussian denoisers was proposed to train DNNs with a set of a single noise realization per image [14]. The SURE based training method has been extended to train DNNs with a set of undersampled compressive sensing measurements [15]. Noise2Void was proposed to train denoiser DNNs using a blind-spot network and yielded good performance, but it did not outperform conventional methods such as BM3D, especially at low noise levels [16].
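The √K noise-averaging arithmetic above (σ = 25 averaged over 19 shots leaving σ ≈ 5.74) can be verified numerically; the following is our own illustrative sketch, not code from any of the cited works:

```python
import numpy as np

# Averaging K independent zero-mean Gaussian noise realizations of std sigma
# reduces the residual noise std to sigma / sqrt(K).
rng = np.random.default_rng(0)
sigma, K, n_pixels = 25.0, 19, 1_000_000

noise = rng.normal(0.0, sigma, size=(K, n_pixels))
residual = noise.mean(axis=0)  # noise left in the averaged "ground truth"

print(round(residual.std(), 2))      # empirical estimate, close to the theory below
print(round(sigma / np.sqrt(K), 2))  # prints 5.74, as stated in the text
```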
Noise2Self was also\nproposed to train denoiser DNNs using J-invariant masks and yielded comparable performance to the\nnetwork trained with ground truth for Hanzi dataset, but it yielded lower performance for CellNet\ndataset, possibly due to non-optimal selection of J-partition [17]. Laine et al. further improved\nNoise2Void by proposing blind-spot convolutional network architecture with restricted directional\nreceptive \ufb01eld and Bayesian distribution prediction on output colors, yielding excellent performance\ncomparable to the network trained with ground truth for Gaussian denoising [18]. Noise2Void has\nbeen extended to Noise2Boosting, an unbiased boosting estimator to train networks for more general\nrange of applications such as super-resolution and accelerated MRI [19], and probabilistic Noise2Void\nusing blind-spots to predict posterior probability of a pixel [20].\nBoth Noise2Noise and SURE based method outperformed classical denoising methods such as BM3D\nfor synthetic Gaussian noise and they often yielded comparable performance to DNN denoisers that\nwere trained with clean ground truth. Noise2Noise has demonstrated powerful performance in various\nimage denoising and restoration tasks for zero-mean contaminations [13]. However, it required two\nindependent noise realizations per image empirically, while there was no theoretical explanation on\nthe relationship between two noise realizations. Thus, it is not clear if Noise2Noise can be used for\nthe case with a single noise realization or for the case with imperfect ground truth data. 
Moreover, assuming slowly moving objects or slowly varying light conditions, there is a trade-off between low noise in the ground truth and an identical underlying true image over multiple realizations.
Even though the SURE based training method is limited to Gaussian noise [14], it has the potential to be extended to more general noise models such as the mixed Poisson-Gaussian model [21], the exponential family [22], or a non-parametric model [23]. It has also been extended to unsupervised learning in inverse problems [24]. Moreover, several recent works on real noise denoising exploited a heteroscedastic Gaussian model y ∼ N(x, α+βx) with image generation procedures [10] or local AWGN with pixel-shuffle down-sampling [25]. Since SURE is a point-wise estimator and can deal with heteroscedastic Gaussian / local AWGN models, SURE based training schemes could be potentially useful for them. SURE could also be more robust to the noise-blur trade-off than Noise2Noise by using a single noise realization per image for slowly moving objects.
However, the SURE based method is also limited since it does not take advantage of having multiple noise realizations per image when they are available (e.g., two uncorrelated noise realizations per image for Noise2Noise, or an imperfect ground truth image with mild noise). In this paper, we address the following questions: 1) can Noise2Noise deal with correlated noise realizations and imperfect ground truth? 2) can SURE be extended to take advantage of having two uncorrelated noise realizations per image as in Noise2Noise? 3) can the extended SURE handle correlated noisy images and utilize imperfect ground truth well?
Here, we propose eSURE to train deep Gaussian denoisers with correlated pairs of noise realizations per image and apply it to the case with two uncorrelated realizations per image, achieving better performance than the original SURE based method and comparable results to Noise2Noise.
Then, we further investigated the case of training Gaussian denoisers with imperfect ground truth (i.e., mild noise in the ground truth) by adding synthetic noise. For the case of adding noise to imperfect ground truth to yield correlated pairs of images, our proposed eSURE based training method outperformed the conventional SURE based method as well as Noise2Noise.
Here is the summary of the contributions of this paper:

• Analyzing Noise2Noise theoretically and empirically for correlated pairs of noise realizations per image.
• Extending SURE to take advantage of having a pair of noise realizations per image for training, unlike the conventional SURE that can take only a single noise realization per image.
• Investigating eSURE, which exploits two independent or correlated noise realizations per image and can also utilize imperfect ground truth with mild noise.

We will show that there is a clear theoretical link between Noise2Noise and eSURE in the limited case of Gaussian denoising. In fact, it turns out that Noise2Noise is, in theory, a special case of eSURE for independent Gaussian noise. However, while the performance of Noise2Noise degraded with correlated pairs of noisy images, the performance of the proposed eSURE remains the same.
This paper is organized as follows: Section 2 briefly reviews SURE, Monte-Carlo SURE (MC-SURE), and SURE based denoiser training. Then, Section 3 revisits the theoretical derivation of the Noise2Noise training method, proposes an extended version of MC-SURE (called eSURE), and shows a clear link between our eSURE and Noise2Noise. These are followed by experimental results in Section 4 that validate the effectiveness of our proposed unsupervised DNN training method for the case of having two independent realizations per image and for the case with imperfect ground truth.
Finally, Section 5 draws a conclusion of this work.

2 Background

2.1 Stein's unbiased risk estimator (SURE)

Typically, a Gaussian contaminated signal (or image) is modeled as a linear equation:

y = x + n,   (1)

where x ∈ R^N is an unknown signal, y ∈ R^N is a known measurement, n ∈ R^N is i.i.d. Gaussian noise such that n ∼ N(0, σ²I), and I is an identity matrix. We denote n ∼ N(0, σ²I) as n ∼ N_{0,σ²}. In general, given an estimator h(y) of x, SURE has the following form:

η(h(y)) = (1/N)‖y − h(y)‖² − σ² + (2σ²/N) Σ_{i=1}^{N} ∂h_i(y)/∂y_i.   (2)

Assuming x to be a deterministic signal (or image), the following theorem for (2) holds.

Theorem 1. [26, 27] The random variable η(h(y)) is an unbiased estimator of

MSE(h(y)) = (1/N)‖x − h(y)‖²,   (3)

or

E_{n∼N_{0,σ²}} { (1/N)‖x − h(y)‖² } = E_{n∼N_{0,σ²}} { η(h(y)) },   (4)

where E_{n∼N_{0,σ²}}{·} is the expectation operator with respect to the random vector n.

Although (2) looks appealing for optimizing the parameters of an estimator h(y), an analytical solution for the last divergence term in (2) exists only in some special cases, such as when the estimator h(y) is a non-local mean or a linear filter [28, 29]. Thus, in order to utilize (2), one needs to find at least an approximate solution of the divergence term for more general cases.

2.2 Monte-Carlo SURE (MC-SURE)

A fast Monte-Carlo approximation of the divergence term was developed by Ramani et al. in [30]. This method yielded accurate unbiased estimates of the MSE for many denoising methods h(y).

Theorem 2. [30] Let ñ ∼ N_{0,1} ∈ R^N be independent of n or y.
Then,

Σ_{i=1}^{N} ∂h_i(y)/∂y_i = lim_{ε→0} E_ñ { ñ^T ( (h(y + εñ) − h(y)) / ε ) },

provided that h(y) admits a well-defined second-order Taylor expansion. If not, this is still valid in the weak sense provided that h(y) is tempered.

Consequently, by applying Theorem 2 to the divergence term in (2), the divergence approximation for the denoiser h(y) is:

(1/N) Σ_{i=1}^{N} ∂h_i(y)/∂y_i ≈ (1/(εN)) ñ^T ( h(y + εñ) − h(y) ),   (5)

where ñ^T is a transposed i.i.d. Gaussian vector (ñ ∼ N_{0,1}) and ε is a fixed small positive value.

2.3 SURE based deep denoiser training

Recently, SURE was used as a surrogate for the MSE between the output of a DNN and the ground truth for unsupervised training of DNN based Gaussian denoisers [14]. Specifically, MC-SURE allows a DNN to learn large-scale weights by minimizing MC-SURE with no noiseless ground truth images for Gaussian denoising.
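As a concrete illustration of (2) with the Monte-Carlo divergence (5), the following NumPy sketch (our own, with hypothetical names, not the authors' released code) estimates the MSE of a simple denoiser without ever touching the clean signal x:

```python
import numpy as np

def mc_sure(denoiser, y, sigma, eps=1e-3, rng=None):
    """Monte-Carlo SURE: estimate MSE(h(y)) from y alone, per eqs. (2) and (5)."""
    rng = rng or np.random.default_rng()
    n = y.size
    n_tilde = rng.standard_normal(n)               # single probe vector n~
    hy = denoiser(y)
    div = n_tilde @ (denoiser(y + eps * n_tilde) - hy) / (eps * n)
    return np.sum((y - hy) ** 2) / n - sigma**2 + 2 * sigma**2 * div

# Toy check with a linear shrinkage "denoiser" h(y) = a*y, whose divergence is known.
rng = np.random.default_rng(0)
sigma, a = 0.5, 0.8
x = np.linspace(-1.0, 1.0, 10_000)                 # deterministic signal
y = x + rng.normal(0.0, sigma, x.size)             # noisy observation
h = lambda v: a * v

true_mse = np.mean((x - h(y)) ** 2)                # needs x; SURE does not
estimate = mc_sure(h, y, sigma, rng=rng)           # close to true_mse
```

Averaged over noise realizations, the estimate matches the true MSE, which is what makes it usable as a training loss in place of the MSE.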
The equation (2), combined with the approximation (5), was reformulated for the DNN h_θ(·) as follows:

η(h_θ(y)) = (1/M) Σ_{j=1}^{M} { ‖y^(j) − h_θ(y^(j))‖² − Nσ² + (2σ²/ε) (ñ^(j))^T ( h_θ(y^(j) + εñ^(j)) − h_θ(y^(j)) ) },   (6)

where θ is the set of DNN denoiser parameters, M is the mini-batch size, ε is a small fixed positive constant, and ñ^(j) is a single realization from the standard normal distribution for each training sample j. This approach has been demonstrated to yield state-of-the-art performance in denoising with synthetic Gaussian noise, giving qualitative and quantitative results comparable to, or slightly worse than, those of MSE-trained DNNs.

3 Methods

In this section, we first revisit the Noise2Noise method [13] and re-derive it in a different way from [13]. Then, we propose to extend the original SURE and MC-SURE to deal with a pair of correlated noisy images instead of a single noisy image, and to use the result for training deep learning based denoisers with pairs of correlated Gaussian noise realizations per image. We also show that Noise2Noise is a special case of our proposed extended SURE for Gaussian denoising. Lastly, we demonstrate that our proposed method is more robust to correlated noise realization pairs in the training set than Noise2Noise. Our eSURE is especially useful for the case of using imperfect ground truth images with mild noise.

3.1 Revisiting Noise2Noise

The Noise2Noise method was proposed to train DNNs for image processing with noisy images only, where two noise realizations per image were required [13]. Its theoretical justification required zero-mean noise, but there was no clear assumption on the independence or uncorrelatedness of the two realizations.
However, two independent noise realizations per image were used empirically.
Assuming that the triplet (x, y, z) follows a joint distribution and that the expectations of the two noise vectors y − x and z − x are both zero vectors, the MSE for infinite data is as follows:

E_{(x,y)} { ‖x − h_θ(y)‖² } = E_x [ E_{(y,z)|x} { ‖x − z + z − h_θ(y)‖² | x } ]
                            = E_x [ E_{(y,z)|x} { ‖z − h_θ(y)‖² + 2(z − x)^T h_θ(y) | x } ] + const.   (7)

Therefore, for a fixed x, if y and (z − x) are uncorrelated or independent such that (z − x) has a zero mean vector, then (7) is equivalent to the following Noise2Noise loss function in terms of θ:

E_{(x,y,z)} ‖z − h_θ(y)‖².   (8)

Consequently, the optimal network parameters θ of a denoiser using (8) will yield the same solution as MSE based training with clean ground truth. Noise2Noise achieved outstanding performance in various image restoration tasks including Gaussian noise removal, as long as a set of two noisy images per ground truth image is available [13].
Therefore, this analysis of Noise2Noise predicts that if there are imperfect ground truth images x̃ with mild noise, then denoiser training with x̃ plus additional synthetic noise may not yield performance comparable to the case using two independent noise realizations, or to the case using perfect ground truth data with additional synthetic noise, due to the possibly non-negligible term E_{(x,y,z)} { (z − x)^T h_θ(y) } in (7).

3.2 Extended SURE and MC-SURE

The original SURE in (2) works well with a single noise realization per image, but it cannot take advantage of having multiple noise realizations per image the way Noise2Noise does.
Thus, we propose to extend the original SURE to handle pairs of noisy images per ground truth image. The extended SURE (eSURE) can be formulated in the following way:

Theorem 3. Let y1 ∼ N(x, σ²_{y1} I) be an imperfect ground-truth image, z ∼ N(0, σ²_z I) be AWGN, and y2 ≜ (y1 + z) ∼ N(x, (σ²_{y1} + σ²_z) I) be a noisy image, where y1 and z are independent (or uncorrelated). Then, the random variable γ(h_θ(y2), y1) is an unbiased estimator of the MSE:

E_{y2} { (1/N)‖x − h_θ(y2)‖² } = E_{y2} { γ(h_θ(y2), y1) },

where

γ(h_θ(y2), y1) = (1/N)‖y1 − h_θ(y2)‖² − σ²_{y1} + (2σ²_{y1}/N) Σ_{i=1}^{N} ∂h_i(y2)/∂(y2)_i.   (9)

Theorem 3 is developed for the general case where imperfect ground truth images with mild Gaussian noise are available and one needs to train a DNN for denoising images contaminated with a larger noise level. Moreover, one can train DNN denoisers using the following corollary of Theorem 3:

Corollary. Given a pair of noisy realizations (y3, y4) of a clean image, both from the same distribution N(x, σ²_y I), we calculate the less noisy image w = (1/2)(y3 + y4) ∼ N(x, (1/2)σ²_y I). Then, we add i.i.d. Gaussian noise z ∼ N(0, (1/2)σ²_y I) to w, so that v = (w + z) ∼ N(x, σ²_y I). Finally, by applying Theorem 3 and replacing the divergence term with its Monte-Carlo approximation (5), one can minimize the extended MC-SURE with respect to θ:

γ(h_θ(v), w) = (1/N)‖w − h_θ(v)‖² − (1/2)σ²_y + (σ²_y/(εN)) ñ^T ( h_θ(v + εñ) − h_θ(v) ).   (10)

For a training dataset of M noisy pairs {(y3^(1), y4^(1)), ..., (y3^(M), y4^(M))}, we can generate {(w^(j), v^(j))}, j ∈ [1, M], to train deep learning based denoisers with the proposed eSURE method. In simulations, our proposed method will show better performance than the original MC-SURE based training approach [14] for both grayscale and color image denoising. The proof of Theorem 3 and other details can be found in the supplementary material.

3.3 Link between eSURE and Noise2Noise

The proposed eSURE framework can be applied to a pair of uncorrelated Gaussian noisy images (y ∼ N(x, σ²_y I) and z ∼ N(x, σ²_z I)). In that case, the divergence term vanishes, leaving the following expression:

E_{z,y} { γ(h_θ(y), z) } = E_{z,y} { (1/N)‖z − h_θ(y)‖² } − σ²_z.   (11)

From this expression, one clearly sees that the first term corresponds to the cost function of Noise2Noise for the i.i.d. Gaussian denoising case [13], while the second term σ²_z is a constant. Minimizing (11) with respect to the set of denoiser parameters θ gives the same solution for both Noise2Noise and our eSURE. Although the relationship between the two approaches is not easy to notice at first sight, it turns out that Noise2Noise is a special case of the proposed extended MC-SURE based training method for i.i.d. Gaussian denoising. A complete derivation can be found in the supplementary material.

4 Results

4.1 Experimental setup

We conducted two experiments to evaluate our proposed method. In the first experiment, we show experimentally that eSURE efficiently utilizes the given two uncorrelated realizations per image to outperform SURE and that it is a generalization of Noise2Noise for i.i.d. Gaussian noise. The second experiment investigates the effect of noise correlation on Noise2Noise and eSURE with imperfect ground truth.
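The Corollary's recipe, averaging the two realizations into w and re-injecting half the noise variance to obtain v before evaluating (10), can be sketched in a few lines of NumPy (our own illustration with hypothetical names, not the released code):

```python
import numpy as np

def esure_pair(y3, y4, sigma, rng):
    """Build (w, v) from two noisy realizations of x, as in the Corollary of Theorem 3."""
    w = 0.5 * (y3 + y4)                                  # noise std drops to sigma/sqrt(2)
    z = rng.normal(0.0, sigma / np.sqrt(2.0), w.shape)   # fresh noise, independent of w
    v = w + z                                            # noise std is back to sigma
    return w, v

def esure_loss(denoiser, w, v, sigma, eps=1e-3, rng=None):
    """Extended MC-SURE of eq. (10), with the Monte-Carlo divergence of eq. (5)."""
    rng = rng or np.random.default_rng()
    n = v.size
    n_tilde = rng.standard_normal(v.shape)
    hv = denoiser(v)
    div = n_tilde.ravel() @ (denoiser(v + eps * n_tilde) - hv).ravel()
    return np.sum((w - hv) ** 2) / n - 0.5 * sigma**2 + sigma**2 * div / (eps * n)
```

Minimizing this quantity over the denoiser parameters then plays the role of the MSE loss even though no clean image ever appears.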
Our proposed method was compared with BM3D [31], DnCNN trained with MC-SURE [14], Noise2Noise [13], and DnCNN trained with MSE using noiseless ground truth data.
We used DnCNN [6, 8] as the deep denoising network for grayscale and RGB color images. DnCNN consists of 20 CNN layers with batch normalization followed by ReLU as the non-linear function. For benchmark test images, we chose Berkeley's BSD-68 [32] dataset and the widely used standard test images known as Set 12 [31]. All experiments were implemented in the TensorFlow framework [33] and run on an NVidia Titan X GPU.
It is worth noting that ε in (2) and (9) should be carefully chosen for stable training and high performance. As mentioned in [14] and [34], ε should be directly proportional to the noise standard deviation σ. Therefore, ε was fine tuned; for our proposed eSURE it was set to ε = 1.6 × 10⁻⁴ × σ. The results of ablation studies can be found in the supplements.

4.2 Case I: two uncorrelated noise realizations per one image

Given two uncorrelated noisy realizations for each image in the training set, we trained DnCNNs with MC-SURE, Noise2Noise, and our proposed eSURE, respectively. More precisely, following the procedures described in the DnCNN paper [8], we generated 128×2,919 patches of size 50×50 from BSD-400 [32] and produced two independent noisy patches (noise level range σ ∈ [0, 55]) per clean patch. Using the aforementioned Corollary, we trained DnCNN with eSURE. It is worth noting that MC-SURE requires only a single noisy image per ground truth image in the training set, compared to eSURE and Noise2Noise. Therefore, for a fair comparison, we concatenated both noisy datasets to train DnCNN-SURE with twice as much data and denote it as DnCNN-SURE*. More precisely, we treated two different realizations of the same image as different images.
Once patches were extracted, we randomly permuted all patches for every epoch and optimized the network with them. DnCNN denoisers were trained for blind denoising with the noise level range σ ∈ [0, 55] using the Adam optimizer [35]. The initial learning rate was set to 10⁻³, which was dropped to 10⁻⁴ after 40 epochs, and the network was trained for 10 more epochs.
The performance of our approach along with the state-of-the-art methods is tabulated in Table 1. Our network training approach demonstrates almost identical quantitative results to Noise2Noise-trained DnCNN (DnCNN-N2N) on both test sets. These results are consistent with our theoretical understanding: 1) eSURE efficiently utilized two uncorrelated realizations compared to SURE*, and 2) (11) holds for an uncorrelated noisy training set, so Noise2Noise is a special case of the extended MC-SURE. Moreover, quantitative analysis on the BSD68 test set reveals that our eSURE is consistently better than conventional BM3D by about 0.5 dB and outperforms DnCNN-SURE by about 0.15 dB in both lower and higher noise cases. The performance gaps between the proposed method, BM3D, and DnCNN-SURE are similar on the Set12 test set. In addition, we observe that minimizing MC-SURE with twice as much data (DnCNN-SURE*) provides a small improvement, but not enough to reach DnCNN-eSURE and DnCNN-N2N.
In terms of visual comparison, our proposed eSURE method effectively removed noise from an image while preserving texture and edges. In Figure 1, conventional BM3D yielded blurry results. A similar trend is observed for DnCNN-SURE, where details of the denoised test image from BSD68 were not fully recovered (see Figure 1).
One may also observe visually similar denoising performance for DnCNN-N2N and our method.
To sum up, it was experimentally shown that for two uncorrelated noise realizations in the noisy training set, DnCNN-eSURE and DnCNN-N2N yield almost identical performance. Also, since eSURE uses less noisy data for training (see the Corollary), it better approximates the MSE and accordingly outperforms MC-SURE, yielding results that are closer to the MSE-trained DnCNN.

Table 1: PSNR (dB) results of blind denoisers on the BSD68 and Set12 datasets.

Dataset   σ    BM3D   DnCNN-SURE  DnCNN-SURE*  DnCNN-N2N  DnCNN-eSURE  DnCNN-MSE
BSD-68    25   28.56  28.92       29.00        29.08      29.08        29.20
BSD-68    50   25.62  26.00       26.07        26.13      26.15        26.22
Set 12    25   29.97  30.04       30.13        30.30      30.31        30.42
Set 12    50   26.67  26.87       26.97        27.07      27.07        27.16
Noisy dataset  -      1           2            2          2            ∞

Figure 1: Denoised test (BSD68) results of BM3D and DnCNN trained with various methods for σ = 50. (a) Ground Truth, (b) BM3D / 25.40 dB, (c) SURE / 25.83 dB, (d) N2N / 26.01 dB, (e) Ours / 26.01 dB.

4.3 Case II: two correlated noise realizations per one image - imperfect ground truth

In many practical applications, collecting noiseless ground truth images is challenging, expensive, or even infeasible because of camera/object motion and long exposure times. Thus, we may have a limited number of photo shots of a particular scene, and by averaging them we obtain a ground truth in which a small amount of noise still remains. Adding synthetic noise to imperfect ground truth to produce noisy images for DNN training yields a dataset in which the noise in a noisy image may be correlated with the noise in the ground truth. In order to investigate how correlated noise affects our eSURE, Noise2Noise, and the original SURE, we conducted two experiments: DnCNN denoisers were trained for a fixed noise level for denoising (e.g.
σnoisy = 25, 50) on grayscale BSD-400 and for blind noise (σnoisy ∈ [σgt, 55]) on color BSD-432 [32].
We simulated imperfect ground truth images (i.e., slightly noisy ground truth) by adding synthetic Gaussian noise with standard deviation σgt to the noiseless clean oracle images. Noisy training images were then generated by adding i.i.d. Gaussian noise on top of the imperfect ground truth dataset. Following the same procedures as in Section 4.2, 128×2,919 patches of size 50×50 were generated for the grayscale case and 128×2,019 patches for the RGB case. DnCNN denoisers were trained using the Adam optimizer [35]. The initial learning rate was set to 10⁻³, which was dropped to 10⁻⁴ after 40 epochs, and the network was trained for 10 more epochs.
Table 2 shows the performance of denoising methods trained for a fixed noise level given noisy ground-truth images with σgt = {1, 5, 10, 20}. The higher σgt is, the more correlated the noise in the training set is. We notice that at a low level of ground-truth noise (σgt = 1), both Noise2Noise (DnCNN-N2N) and eSURE (DnCNN-eSURE) yield the best PSNR results, even comparable to DnCNN-MSE. However, as the noise correlation gets severe (σgt = 5, 10), Noise2Noise fails to achieve high performance, which is consistent with our theoretical derivation. In contrast, the proposed DnCNN-eSURE produces the best quantitative results in a stable manner. Although DnCNN-SURE is not susceptible to noise correlation, it still yielded worse performance than DnCNN-eSURE.
The experimental results for the RGB color image denoising case are tabulated in Table 3. In this case, we observe the same performance degradation pattern for CDnCNN-N2N as the noise in the ground truth image increases (more results in the supplementary material).
The visual assessment also demonstrates that the eSURE-trained CDnCNN was able to provide high-quality images with preserved texture and color. Moreover, we can see from Figure 2 that the denoised output image of CBM3D is highly smoothed out and the CDnCNN-N2N recovered image still has some noise, while our eSURE denoised image shows sharp edges with almost no noise.

Table 2: PSNR (dB) of denoising methods on the BSD68 and Set 12 datasets.

BSD-68, σnoisy = 25 (σgt = 1 / 5 / 10):
BM3D 28.56; DnCNN-SURE 29.05 / 29.01 / 29.02; DnCNN-N2N 29.23 / 29.15 / 28.37; DnCNN-eSURE 29.23 / 29.23 / 29.21; DnCNN-MSE 29.23.
BSD-68, σnoisy = 50 (σgt = 1 / 5 / 10 / 20):
BM3D 25.62; DnCNN-SURE 25.95 / 25.97 / 25.90 / 25.92; DnCNN-N2N 26.28 / 26.24 / 25.91 / 24.69; DnCNN-eSURE 26.27 / 26.24 / 26.27 / 26.25; DnCNN-MSE 26.28.
Set 12, σnoisy = 25 (σgt = 1 / 5 / 10):
BM3D 29.97; DnCNN-SURE 30.23 / 30.19 / 30.19; DnCNN-N2N 30.41 / 30.39 / 29.46; DnCNN-eSURE 30.47 / 30.48 / 30.44; DnCNN-MSE 30.47.
Set 12, σnoisy = 50 (σgt = 1 / 5 / 10 / 20):
BM3D 26.67; DnCNN-SURE 26.77 / 26.85 / 26.73 / 26.74; DnCNN-N2N 27.28 / 27.20 / 27.08 / 25.39; DnCNN-eSURE 27.27 / 27.27 / 27.25 / 27.23; DnCNN-MSE 27.28.

Table 3: PSNR (dB) of denoising methods on the RGB color BSD68 dataset.

σnoisy = 25 (σgt = 1 / 5 / 10):
CBM3D 30.70; CDnCNN-SURE 30.97 / 30.98 / 30.99; CDnCNN-N2N 31.18 / 31.08 / 29.83; CDnCNN-eSURE 31.20 / 31.18 / 31.19; CDnCNN-MSE 31.20.
σnoisy = 50 (σgt = 1 / 5 / 10 / 20):
CBM3D 27.38; CDnCNN-SURE 27.63 / 27.68 / 27.64 / 27.63; CDnCNN-N2N 27.89 / 27.87 / 27.61 / 25.62; CDnCNN-eSURE 27.94 / 27.91 / 27.90 / 27.78; CDnCNN-MSE 27.93.

To imitate a more practical case, we experimented with varied σgt ∈ [1, 10] to train denoisers for blind color image denoising (σnoisy ∈ [10.1, 55]) and tested on images with a fixed noise level (similar to Table 1), as shown in Table 4.
This additional experiment yielded consistent results, and our proposed eSURE method still outperforms the other methods.

Table 4: PSNR (dB) of denoising methods on the RGB color dataset with varied ground-truth noise.

Method          σ = 25   σ = 50
CBM3D           30.70    27.38
CDnCNN-SURE     30.92    27.62
CDnCNN-N2N      30.73    27.70
CDnCNN-eSURE    31.15    27.91
CDnCNN-MSE      31.20    27.93

Figure 2: CBM3D and CDnCNN results on a test image from BSD-68 with noise σ = 25. CDnCNN was trained with imperfect ground truth with σgt = 10 for the blind noise denoising task using various approaches. (a) Ground Truth, (b) Noisy / 20.68 dB, (c) BM3D / 30.82 dB, (d) SURE / 30.90 dB, (e) N2N / 29.89 dB, (f) Ours / 31.17 dB.

5 Conclusion

We have investigated properties of Noise2Noise and proposed eSURE, which extends the original SURE to efficiently handle correlated pairs of noisy images for training DNN denoisers. For two uncorrelated noisy realizations per image, eSURE yielded better performance than SURE, implying more efficient utilization of the two uncorrelated noisy realizations compared to SURE and SURE*. Our eSURE also yielded performance comparable to Noise2Noise, which is consistent with our theoretical analysis. For two correlated noisy realizations per image, or imperfect ground truth, eSURE still yielded the best performance among all compared methods, including BM3D, SURE, and Noise2Noise. However, Noise2Noise did not yield good performance with correlated noisy realizations, as predicted by our theoretical analysis.

6 Acknowledgments

This work was supported partly by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2017R1D1A1B05035810), the Technology Innovation Program or Industrial Strategic Technology Development Program (10077533, Development of robotic manipulation algorithm for grasping/assembling with the machine learning using visual and tactile sensing information) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea), and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI18C0316).

References

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25 (NIPS), pages 1097–1105, 2012.

[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.

[3] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems 28 (NIPS), pages 91–99, 2015.

[4] Joseph Redmon and Ali Farhadi. YOLO9000: better, faster, stronger.
In IEEE Conference on Computer\n\nVision and Pattern Recognition (CVPR), pages 7263\u20137271, 2017.\n\n[5] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmen-\ntation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3431\u20133440,\n2015.\n\n[6] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre Antoine Manzagol. Stacked\ndenoising autoencoders: Learning useful representations in a deep network with a local denoising criterion.\nJournal of Machine Learning Research, 11:3371\u20133408, December 2010.\n\n[7] Harold C Burger, Christian J Schuler, and Stefan Harmeling. Image denoising: Can plain neural networks\ncompete with BM3D? In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages\n2392\u20132399, 2012.\n\n[8] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a Gaussian Denoiser: Resid-\nual Learning of Deep CNN for Image Denoising. IEEE Transactions on Image Processing, 26(7):3142\u2013\n3155, May 2017.\n\n[9] Stamatios Lefkimmiatis. Non-local Color Image Denoising with Convolutional Neural Networks. In IEEE\n\nConference on Computer Vision and Pattern Recognition (CVPR), pages 5882\u20135891, 2017.\n\n[10] Tim Brooks, Ben Mildenhall, Tianfan Xue, Jiawen Chen, Dillon Sharlet, and Jonathan T Barron. Un-\nprocessing Images for Learned Raw Denoising. In IEEE Conference on Computer Vision and Pattern\nRecognition (CVPR), 2019.\n\n[11] Tobias Plotz and Stefan Roth. Benchmarking denoising algorithms with real photographs. In IEEE\n\nConference on Computer Vision and Pattern Recognition (CVPR), pages 1586\u20131595, 2017.\n\n[12] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Deep image prior. In IEEE Conference on\n\nComputer Vision and Pattern Recognition, pages 9446\u20139454, 2018.\n\n[13] Jaakko Lehtinen, Jacob Munkberg, Jon Hasselgren, Samuli Laine, Tero Karras, Miika Aittala, and Timo\nAila. 
Noise2Noise: Learning image restoration without clean data. In International Conference on Machine\nLearning 35 (ICML), pages 2965\u20132974, 2018.\n\n[14] Shakarim Soltanayev and Se Young Chun. Training deep learning based denoisers without ground truth\n\ndata. In Advances in Neural Information Processing Systems 31 (NeurIPS), pages 3261\u20133271, 2018.\n\n[15] Magauiya Zhussip, Shakarim Soltanayev, and Se Young Chun. Training deep learning based image\ndenoisers from undersampled measurements without ground truth and without image prior. In IEEE\nConference on Computer Vision and Pattern Recognition (CVPR), pages 10247\u201356, 2019.\n\n[16] Alexander Krull, Tim-Oliver Buchholz, and Florian Jug. Noise2void-learning denoising from single noisy\nimages. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2129\u20132137,\n2019.\n\n[17] Joshua Batson and Loic Royer. Noise2Self: Blind denoising by self-supervision.\n\nConference on Machine Learning (ICML), pages 524\u2013533, 2019.\n\nIn International\n\n[18] Samuli Laine, Jaakko Lehtinen, and Timo Aila. Improved self-supervised deep image denoising. In\n\nInternational Conference on Learning Representations (ICLR) Workshops, 2019.\n\n[19] Eunju Cha, Jaeduck Jang, Junho Lee, Eunha Lee, and Jong Chul Ye. Boosting CNN beyond Label in\n\nInverse Problems. arXiv preprint arXiv:1906.07330, 2019.\n\n[20] Alexander Krull, Tomas Vicar, and Florian Jug. Probabilistic Noise2Void: Unsupervised Content-Aware\n\nDenoising. arXiv preprint arXiv:1906.00651, 2019.\n\n[21] Yoann Le Montagner, Elsa D Angelini, and Jean-Christophe Olivo-Marin. An Unbiased Risk Estimator\nfor Image Denoising in the Presence of Mixed Poisson\u2013Gaussian Noise. IEEE Transactions on Image\nProcessing, 23(3):1255\u20131268, August 2014.\n\n[22] Y C Eldar. Generalized SURE for Exponential Families: Applications to Regularization. 
IEEE Transactions\n\non Signal Processing, 57(2):471\u2013481, January 2009.\n\n10\n\n\f[23] Martin Raphan and Eero P Simoncelli. Learning to be Bayesian without supervision. In Advances in\n\nNeural Information Processing Systems (NIPS), pages 1145\u20131152, 2007.\n\n[24] C. Metzler, A. Mousavi, R. Heckel, and R. Baraniuk. Unsupervised Learning with Stein\u2019s Unbiased Risk\nEstimator. In International Biomedical and Astronomical Signal Processing (BASP) Frontiers workshop,\n2019.\n\n[25] Yuqian Zhou, Jianbo Jiao, Haibin Huang, Yang Wang, Jue Wang, Honghui Shi, and Thomas Huang. When\n\nAWGN-based Denoiser Meets Real Noises. arXiv preprint arXiv:1904.03485, 2019.\n\n[26] C M Stein. Estimation of the mean of a multivariate normal distribution. The Annals of Statistics,\n\n9(6):1135\u20131151, November 1981.\n\n[27] T Blu and F Luisier. The SURE-LET Approach to Image Denoising. IEEE Transactions on Image\n\nProcessing, 16(11):2778\u20132786, October 2007.\n\n[28] Dimitri Van De Ville and Michel Kocher. SURE-Based Non-Local Means. IEEE Signal Processing Letters,\n\n16(11):973\u2013976, 2009.\n\n[29] Dimitri Van De Ville and Michel Kocher. Nonlocal means with dimensionality reduction and SURE-based\n\nparameter selection. IEEE Transactions on Image Processing, 20(9):2683\u20132690, March 2011.\n\n[30] S Ramani, T Blu, and M Unser. Monte-Carlo Sure: A Black-Box Optimization of Regularization\nParameters for General Denoising Algorithms. IEEE Transactions on Image Processing, 17(9):1540\u20131554,\nAugust 2008.\n\n[31] Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. Image denoising by sparse\n3-D transform-domain collaborative \ufb01ltering. IEEE Transactions on Image Processing, 16(8):2080\u20132095,\nAugust 2007.\n\n[32] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its\napplication to evaluating segmentation algorithms and measuring ecological statistics. 
In IEEE International\nConference on Computer Vision (ICCV), pages 416\u2013423, July 2001.\n\n[33] Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin,\nSanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga,\nSherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin\nWicke, Yuan Yu, and Xiaoqiang Zheng. Tensor\ufb02ow: A system for large-scale machine learning. In\nProceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, pages\n265\u2013283, 2016.\n\n[34] Charles-Alban Deledalle, Samuel Vaiter, Jalal Fadili, and Gabriel Peyr\u00e9. Stein Unbiased GrAdient\nestimator of the Risk (SUGAR) for multiple parameter selection. SIAM Journal on Imaging Sciences,\n7(4):2448\u20132487, 2014.\n\n[35] Diederik P. Kingma and Jimmy Lei Ba. Adam: A method for stochastic optimization. In International\n\nConference on Learning Representations (ICLR), 2015.\n\n11\n\n\f", "award": [], "sourceid": 835, "authors": [{"given_name": "Magauiya", "family_name": "Zhussip", "institution": "UNIST"}, {"given_name": "Shakarim", "family_name": "Soltanayev", "institution": "Ulsan National Institute of Science and Technology"}, {"given_name": "Se Young", "family_name": "Chun", "institution": "UNIST"}]}
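To make the objectives compared in the conclusion more concrete, below is a minimal NumPy sketch of our own (not the paper's released code) of a Monte-Carlo SURE-style loss with a cross-covariance term. Under the assumptions stated in the comments, it reduces to ordinary SURE when the input and target share the same noise realization, and to a Noise2Noise-style loss when their noises are uncorrelated; the function and argument names are ours, and the exact eSURE expression in the paper may differ in details.

```python
import numpy as np

def mc_esure_loss(h, y_in, y_tgt, var_tgt, cov_in_tgt, eps=1e-3, rng=None):
    """Monte-Carlo estimate of a SURE-style denoising loss (illustrative sketch).

    h          : denoiser, maps an image array to an image array
    y_in       : noisy input fed to the denoiser
    y_tgt      : noisy target (may equal y_in, as in plain SURE)
    var_tgt    : variance of the target noise
    cov_in_tgt : cross-covariance E[n_in * n_tgt] of the two noise realizations
                 (= var_tgt when y_in == y_tgt, = 0 for uncorrelated pairs)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = y_in.size
    # data-fidelity term, debiased by the target noise variance
    fidelity = np.sum((y_tgt - h(y_in)) ** 2) / n - var_tgt
    # Monte-Carlo divergence (Ramani et al.): div h ~ b^T (h(y + eps*b) - h(y)) / eps
    b = rng.standard_normal(y_in.shape)
    div = np.sum(b * (h(y_in + eps * b) - h(y_in))) / eps
    return fidelity + 2.0 * cov_in_tgt * div / n
```

As a sanity check, for the identity "denoiser" h(y) = y with y_in = y_tgt and unit noise variance, the estimate evaluates to approximately var_tgt, i.e., the risk of leaving the noise untouched, as expected.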