{"title": "Space-Variant Single-Image Blind Deconvolution for Removing Camera Shake", "book": "Advances in Neural Information Processing Systems", "page_first": 829, "page_last": 837, "abstract": "Modelling camera shake as a space-invariant convolution simplifies the problem of removing camera shake, but often insufficiently models actual motion blur such as those due to camera rotation and movements outside the sensor plane or when objects in the scene have different distances to the camera. In order to overcome such limitations we contribute threefold: (i) we introduce a taxonomy of camera shakes, (ii) we show how to combine a recently introduced framework for space-variant filtering based on overlap-add from Hirsch et al.~and a fast algorithm for single image blind deconvolution for space-invariant filters from Cho and Lee to introduce a method for blind deconvolution for space-variant blur. And (iii), we present an experimental setup for evaluation that allows us to take images with real camera shake while at the same time record the space-variant point spread function corresponding to that blur. Finally, we demonstrate that our method is able to deblur images degraded by spatially-varying blur originating from real camera shake.", "full_text": "Space-Variant Single-Image Blind Deconvolution\n\nfor Removing Camera Shake\n\nStefan Harmeling, Michael Hirsch, and Bernhard Sch\u00a8olkopf\n\nMax Planck Institute for Biological Cybernetics, T\u00a8ubingen, Germany\n\nfirstname.lastname@tuebingen.mpg.de\n\nAbstract\n\nModelling camera shake as a space-invariant convolution simpli\ufb01es the problem\nof removing camera shake, but often insuf\ufb01ciently models actual motion blur such\nas those due to camera rotation and movements outside the sensor plane or when\nobjects in the scene have different distances to the camera. 
In an effort to address these limitations, (i) we introduce a taxonomy of camera shakes, (ii) we build on a recently introduced framework for space-variant filtering by Hirsch et al. and a fast algorithm for single image blind deconvolution for space-invariant filters by Cho and Lee to construct a method for blind deconvolution in the case of space-variant blur, and (iii) we present an experimental setup for evaluation that allows us to take images with real camera shake while at the same time recording the space-variant point spread function corresponding to that blur. Finally, we demonstrate that our method is able to deblur images degraded by spatially varying blur originating from real camera shake, even without using additional motion sensor information.\n\n1 Introduction\n\nCamera shake is a common problem of handheld, long-exposure photographs, occurring especially in low-light situations, e.g., inside buildings. With a few exceptions such as panning photography, camera shake is unwanted, since it often destroys details and blurs the image. The effect of a particular camera shake can be described by a linear transformation on the sharp image, i.e., the image that would have been recorded using a tripod. Denoting for simplicity images as column vectors, the recorded blurry image y can be written as a linear transformation of the sharp image x, i.e., as y = Ax, where A is an unknown matrix describing the camera shake. The task of blind image deblurring is to recover x given only the blurred image y, but not A.\nMain contributions. (i) We present a taxonomy of camera shakes; (ii) we propose an algorithm for deblurring space-variant camera shakes; and (iii) we introduce an experimental setup that allows us to simultaneously record images blurred by real camera shake and an image of the corresponding spatially varying point spread functions (PSFs).\nRelated work. 
Our work combines ideas of three papers: (i) Hirsch et al.\u2019s work [1] on efficient space-variant filtering, (ii) Cho and Lee\u2019s work [2] on single frame blind deconvolution, and (iii) Krishnan and Fergus\u2019s work [3] on fast non-blind deconvolution.\nPrevious approaches to single image blind deconvolution have dealt only with space-invariant blurs. This includes the works of Fergus et al. [4], Shan et al. [5], as well as Cho and Lee [2] (see Kundur and Hatzinakos [6] and Levin et al. [7] for overviews and further references).\nTai et al. [8] represent space-variant blurs as projective motion paths and propose a non-blind deconvolution method. Shan et al. [9] consider blindly deconvolving rotational object motion, yielding a particular form of space-variant PSFs. Blind deconvolution of space-variant blurs in the context of star fields has been considered by Bardsley et al. [10]. Their method estimates PSFs separately (and not simultaneously) on image patches using phase diversity, and deconvolves the overall image using [11]. Joshi et al. [12] recently proposed a method that estimates the motion path using inertial sensors, leading to high-quality image reconstructions.\nThere is also some work on images in which different segments have different blur: Levin [13] and Cho et al. [14] segment images into layers where each layer has a different motion blur. Both approaches consider uniform object motion, but not non-uniform ego-motion (of the camera). Hirsch et al. [1] require multiple images to perform blind deconvolution with space-variant blur, as do \u0160orel and \u0160roubek [15].\n\n2 A taxonomy of camera shakes\n\nCamera shake can be described from two perspectives: (i) how the PSF varies across the image, i.e., how point sources would be recorded at different locations on the sensor, and (ii) by the trajectory of the camera and how the depth of the scene varies. 
Throughout this discussion we assume the scene to be static, i.e., only the camera moves (only ego-motion), and none of the photographed objects (no object motion).\nPSF variation across the image. We distinguish three classes:\n\u2022 Constant: The PSF is constant across the image. In this case the linear transformation is a convolution matrix. Most algorithms for blind deconvolution are restricted to this case.\n\u2022 Smooth: The PSF is smoothly varying across the image. Here, the linear transformation is no longer a convolution matrix, but a more general framework is needed such as the smoothly space-varying filters in the multi-frame method of Hirsch et al. [1]. For this case, our paper proposes an algorithm for single image deblurring.\n\u2022 Segmented: The PSF varies smoothly within segments of the image, but between segments it may change abruptly.\nDepth variation across the scene. The depth in a scene, i.e., the distance of the camera to objects at different locations in the scene, can be classified into three categories:\n\u2022 Constant: All objects have the same distance to the camera. Example: photographing a picture hanging on the wall.\n\u2022 Smooth: The distance to the camera is smoothly varying across the scene. Example: photographing a wall at an angle.\n\u2022 Segmented: The scene can be segmented into different objects each having a different distance to the camera. Example: photographing a scene with different objects partially occluding each other.\nCamera trajectories. The motion of the camera can be represented by a six-dimensional trajectory with three spatial and three angular coordinates. We denote the two coordinates inside the sensor plane as a and b, and the coordinate corresponding to the distance to the scene as c. 
Furthermore, \u03b1 and \u03b2 describe the camera tilting up/down and left/right, and \u03b3 the camera rotation around the optical axis.\nIt is instructive to picture how different trajectories correspond to different PSF variations in different depth situations. As examples we consider the following trajectories:\n\u2022 Pure shift: The camera moves inside the sensor plane without rotation; only a and b vary.\n\u2022 Rotated shift: The camera moves inside the sensor plane with rotation; a, b, and \u03b3 vary.\n\u2022 Back and forth: The distance between camera and scene is changing; only c varies.\n\u2022 Pure tilt: The camera is tilted up and down and left and right; only \u03b1 and \u03b2 vary.\n\u2022 General trajectory: All coordinates might vary as a function of time.\nTable 1 shows all possible combinations. Note that only \u201cpure shifts\u201d in combination with \u201cconstant depths\u201d lead to a constant PSF across the image, which is the case most methods for camera unshaking are designed for. Thus, extending blind deconvolution to smoothly space-varying PSFs can increase the range of possible applications. Furthermore, we see that for segmented scenes, camera shake usually leads to blurs that are non-smoothly changing across the image. 
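The trajectory/depth classification summarized in Table 1 can be written down as a tiny lookup. The following sketch uses our own string labels and is purely illustrative; it is not code from the paper:

```python
# Sketch of Table 1: PSF variation class as a function of camera trajectory
# and scene depth. The string labels are ours, chosen for illustration.

def psf_variation(trajectory: str, depth: str) -> str:
    """Return the PSF variation class: 'constant', 'smooth', or 'segmented'."""
    if depth == "segmented":
        return "segmented"   # segmented scenes give segmented PSFs for every trajectory
    if trajectory == "pure shift" and depth == "constant":
        return "constant"    # the only combination yielding a space-invariant PSF
    return "smooth"          # all remaining combinations vary smoothly

assert psf_variation("pure shift", "constant") == "constant"
assert psf_variation("rotated shift", "constant") == "smooth"
assert psf_variation("back and forth", "segmented") == "segmented"
```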
Even though in this case the model of smoothly varying PSFs is incorrect, it might still lead to better results than constant PSFs.\n\nTable 1: How the PSF varies for different camera trajectories and for different depth situations.\n\n                 Pure shift   Rotated shift   Back and forth   Pure tilt   General trajectory\nConstant depth   constant     smooth          smooth           smooth      smooth\nSmooth depth     smooth       smooth          smooth           smooth      smooth\nSegmented depth  segmented    segmented       segmented        segmented   segmented\n\n3 Smoothly varying PSF as Efficient Filter Flow\n\nTo obtain a generalized image deblurring method we represent the linear transformation y = Ax by the recently proposed efficient filter flow (EFF) method of Hirsch et al. [1] that can handle smoothly varying PSFs. For convenience, we briefly describe EFF, using the notation and results from [1].\nSpace-invariant filters. As our starting point we consider space-invariant filters (aka convolutions), which are an efficient, but restrictive class of linear transformations. We denote by y the recorded image, represented as a column vector of length m, by a a column vector of length k representing the space-invariant PSF, and by x the true image, represented as a column vector of length n = m + k \u2212 1 (we consider the valid part of the convolution). Then the usual convolution can be written as $y_i = \sum_{j=0}^{k-1} a_j x_{i-j}$ for $0 \le i < m$. This transformation is linear in x, and thus an instance of the general linear transformation y = Ax, where the column vector a parametrizes the transformation matrix A. Furthermore, the transformation is linear in a, which implies that there exists a matrix X such that y = Ax = Xa. Using fast Fourier transforms (FFTs), these matrix-vector multiplications (MVMs) can be calculated in O(n log n).\nSpace-variant filters. 
Although being efficient, the (space-invariant) convolution applies only to camera shakes which are pure shifts of flat scenes. This is generalized to space-variant filtering by employing Stockham\u2019s overlap-add (OLA) trick [16]. The idea is (i) to cover the image with overlapping patches, (ii) to apply to each patch a different PSF, and (iii) to add the patches to obtain a single large image. The transformation can be written as\n\n$y_i = \sum_{r=0}^{p-1} \sum_{j=0}^{k-1} a^{(r)}_j w^{(r)}_{i-j} x_{i-j}$ for $0 \le i < m$, where $\sum_{r=0}^{p-1} w^{(r)}_i = 1$ for $0 \le i < m$.  (1)\n\nHere, $w^{(r)} \ge 0$ smoothly fades the r-th patch in and masks out the others. Note that at each pixel the weights must sum to one.\nNote that this method does not simply apply a different PSF to different image regions, but instead yields a different PSF for each pixel. The reason is that usually the patches are chosen to overlap by at least 50%, so that the PSF at a pixel is a linear combination of several filters, where the weights are chosen to smoothly blend filters in and out, and thus the PSF tends to be different at each pixel. Fig. 1 shows that a PSF array as small as 3 \u00d7 3, corresponding to p = 9 and nine overlapping patches (right panel of the bottom row), can parametrize smoothly varying blurs (middle column) that closely mimic real camera shake (left column).\nEfficient implementation. As is apparent from Eq. (1), EFF is linear in x and in a, the vector obtained by stacking $a^{(0)}, \ldots, a^{(p-1)}$. This implies that there exist matrices A and X such that y = Ax = Xa. Using Stockham\u2019s ideas [16] to speed up large convolutions, Hirsch et al. 
derive expressions for these matrices, namely\n\n$A = Z_y^T \sum_{r=0}^{p-1} C_r^T F^H \mathrm{Diag}(F Z_a a^{(r)}) F C_r \mathrm{Diag}(w^{(r)})$,  (2)\n\n$X = Z_y^T \sum_{r=0}^{p-1} C_r^T F^H \mathrm{Diag}(F C_r \mathrm{Diag}(w^{(r)}) x) F Z_a B_r$,  (3)\n\nwhere $\mathrm{Diag}(w^{(r)})$ is the diagonal matrix with vector $w^{(r)}$ along its diagonal, $C_r$ is a matrix that crops out the r-th patch, F is the discrete Fourier transform matrix, $Z_a$ is a matrix that zero-pads $a^{(r)}$ to the size of the patch, $F^H$ performs the inverse Fourier transform, and $Z_y^T$ chops out the valid part of the space-variant convolution.\n\n[Figure 1 panels, left to right: hand-shaked photo of grid; artificially blurred grid; PSFs used for artificial blur.]\n\nFigure 1: A small set of PSFs can parametrize smoothly varying blur: (left) grid photographed with real camera shake, (middle) grid blurred by the EFF framework parametrized by nine PSFs (right).\n\nReading Eqs. (2) and (3) forward and backward yields efficient implementations for A, $A^T$, X, and $X^T$ with running times O(n log q), where q is the patch size; see [1] for details. The overlap increases the computational cost by a constant factor and is thus omitted. 
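To make Eq. (1) concrete, here is a minimal 1-D numpy sketch of the EFF forward model. This is our own simplified code, not the implementation used in the paper (which is in Matlab and uses the FFT-based forms of Eqs. (2) and (3)):

```python
import numpy as np

def eff_forward_1d(x, psfs, weights):
    """1-D sketch of the EFF forward model, Eq. (1):
    y_i = sum_r sum_j a_j^(r) w_{i-j}^(r) x_{i-j},
    where the per-patch weights w^(r) are >= 0 and sum to 1 at every pixel."""
    n = x.shape[0]
    k = psfs.shape[1]
    m = n - k + 1                                  # 'valid' output size
    assert np.allclose(weights.sum(axis=0), 1.0)   # weights sum to one per pixel
    y = np.zeros(m)
    for a_r, w_r in zip(psfs, weights):
        # each patch is faded in by w^(r), filtered with its own PSF, and added up
        y += np.convolve(w_r * x, a_r, mode="valid")
    return y

# With a single patch (p = 1, w = 1 everywhere) EFF reduces to an ordinary
# space-invariant convolution:
x = np.random.rand(32)
a = np.array([0.25, 0.5, 0.25])
y = eff_forward_1d(x, a[None, :], np.ones((1, 32)))
assert np.allclose(y, np.convolve(x, a, mode="valid"))
```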
The EFF framework thus implements space-variant convolutions which are as efficient to compute as space-invariant convolutions, while being much more expressive.\nNote that each of the MVMs with A, $A^T$, X, and $X^T$ is needed for blind deconvolution: A and $A^T$ for the estimation of x given a, and X and $X^T$ for the estimation of a.\n\n4 Blind deconvolution with smoothly varying PSF\n\nWe now outline a single image blind deconvolution algorithm for space-variant blur, generalizing the method of Cho and Lee [2], that aims to recover a sharp image in two steps: (i) first estimate the parameter vector a of the EFF transformation, and (ii) then perform space-variant non-blind deconvolution by running a generalization of Krishnan and Fergus\u2019 algorithm [3].\n(i) Estimation of the linear transformation: initializing x with the blurry image y, the estimation of the linear transformation A, parametrized as an EFF, is performed by iterating over the following four steps:\n\u2022 Prediction step: remove noise in flat regions of x by edge-preserving bilateral filtering and overemphasize edges by shock filtering. 
To counter noise enhanced by shock filtering, we apply spatially adaptive gradient magnitude thresholding.\n\u2022 PSF estimation step: update the PSFs given the blurry image y and the current estimate of the predicted x, using only the gradient images of x (resulting in a preconditioning effect) and enforcing smoothness between neighboring PSFs.\n\u2022 Propagation step: identify regions of poorly estimated PSFs and replace them with neighboring PSFs.\n\u2022 Image estimation step: update the current deblurred image x by minimizing a least-squares cost function using a smoothness prior on the gradient image.\n(ii) Non-blind deblurring: given the linear transformation, we estimate the final deblurred image x by alternating between the following two steps:\n\u2022 Latent variable estimation: estimate latent variables, regularized with a sparsity prior, that approximate the gradient of x. This can be efficiently solved with look-up tables; see the \u201cw sub-problem\u201d of [3] for details.\n\u2022 Image estimation step: update the current deblurred image x by minimizing a least-squares cost function while penalizing the Euclidean distance of the gradient image to the latent variables of the previous step; see the \u201cx sub-problem\u201d of [3] for details.\nThe steps of (i) are repeated seven times on each scale of a multi-scale image pyramid. We always start with flat PSFs of size 3 \u00d7 3 pixels and the correspondingly downsampled observed image. For up- and downsampling we employ a simple linear interpolation scheme. The resulting PSFs in a and the resulting image x at each scale are upsampled and initialize the next scale. 
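The coarse-to-fine structure of step (i) can be sketched as follows. The routine names are ours, not the authors'; the four steps are replaced by trivial stand-in stubs just to show the control flow:

```python
import numpy as np

# Hypothetical stand-ins for the four steps of (i); real versions are described
# in the text. Names and signatures are ours, for illustration only.
def predict(x):             return x               # bilateral + shock filtering, thresholding
def estimate_psfs(y, x, a): return a               # minimize the PSF cost function
def propagate(a):           return a               # replace poorly estimated PSFs
def estimate_image(y, a):   return y               # regularized least squares
def downsample(img, s):     return img[::2**s, ::2**s]
def upsample(img):          return np.kron(img, np.ones((2, 2)))

def estimate_transformation(y, n_scales=3, n_iters=7):
    """Coarse-to-fine loop of step (i), as described above (sketch only)."""
    a = np.ones((3, 3)) / 9.0                      # flat 3x3 PSF initialization
    x = downsample(y, n_scales - 1)                # init x with the blurry image
    for scale in reversed(range(n_scales)):        # coarse to fine
        y_s = downsample(y, scale)
        for _ in range(n_iters):                   # seven iterations per scale
            x_pred = predict(x)
            a = estimate_psfs(y_s, x_pred, a)
            a = propagate(a)
            x = estimate_image(y_s, a)
        if scale > 0:                              # initialize the next finer scale
            x = upsample(x)
    return a, x
```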
The final output of this iterative procedure is the set of PSFs that parametrize the spatially varying linear transformation.\nHaving obtained an estimate of the linear transformation in the form of an array of PSFs, the alternating steps of (ii) perform space-variant non-blind deconvolution of the recorded image y using a natural image statistics prior (as in [13]). To this end, we adapt the recently proposed method of Krishnan and Fergus [3] to deal with linear transformations represented as EFF.\nWhile our procedure is based on Cho and Lee\u2019s [2] and Krishnan and Fergus\u2019 [3] methods for space-invariant single image blind deconvolution, it differs in several important aspects which we presently explain.\nDetails of the Prediction step. The prediction step of Cho and Lee [2] is a clever trick to avoid the nonlinear optimizations which would be necessary if the image features emphasized by the nonlinear filtering operations (namely shock and bilateral filtering and gradient magnitude thresholding) had to be implemented by an image prior on x. Our procedure also profits from this trick, and we set the hyper-parameters exactly as Cho and Lee do (see [2] for details on the nonlinear filtering operations). However, we note that for linear transformations represented as EFF, the gradient thresholding must be applied in a spatially adaptive way, i.e., on each patch separately. This is necessary because otherwise a large gradient in some region might totally wipe out the gradients in regions that are less textured, leading to poor PSF estimates in those regions.\nDetails on the PSF estimation step. 
Given the thresholded gradient images of the nonlinearly filtered image x as the output of the prediction step, the PSF estimation minimizes a regularized least-squares cost function,\n\n$\sum_z \|\partial_z y - A \partial_z x\|^2 + \lambda \|a\|^2 + \nu g(a)$,  (4)\n\nwhere z ranges over the set {h, v, hh, vv, hv}, i.e., the first and second, horizontal and vertical derivatives of y and x are considered. Omitting the zeroth derivative (i.e., the images x and y themselves) has a preconditioning effect as discussed in Cho and Lee [2]. Matrix A depends on the vector of PSFs a as well. For the EFF framework we added the regularization term g(a), which encourages similarity between neighboring PSFs,\n\n$g(a) = \sum_{r=0}^{p-1} \sum_{s \in N(r)} \|a^{(r)} - a^{(s)}\|^2$,  (5)\n\nwhere $s \in N(r)$ if patches r and s are neighbors.\nDetails on the Propagation step. Since high-frequency information, i.e., image details, is required for PSF estimation, we cannot estimate reasonable PSFs everywhere in images with less structured areas (such as sky). The problem stems from the finding that even though some area might be less informative about the local PSF, it can look blurred, and thus would require deconvolution. These areas are identified by thresholding the entropy of the corresponding PSFs (similar to \u0160orel and \u0160roubek [15]). The rejected PSFs are replaced by the average of their neighboring PSFs. Since there might be areas for which the neighboring PSFs have been rejected as well, we perform a simple recursive procedure which propagates the accepted PSFs to the rejected ones.\nDetails on the Image estimation step. In both Cho and Lee\u2019s and Krishnan and Fergus\u2019 work, the image estimation step involves direct deconvolution, which corresponds to a simple pixelwise division of the blurry image by the zero-padded PSF in the Fourier domain. 
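In the space-invariant setting, this direct deconvolution is exactly a pixelwise division in the Fourier domain. A minimal numpy sketch (our own code, assuming circular boundary conditions; the small eps guards against near-zero kernel frequencies and stands in for the regularization used in practice):

```python
import numpy as np

def direct_deconvolve(y, a, shape, eps=1e-8):
    """Space-invariant direct deconvolution: pixelwise division of the blurry
    image by the zero-padded PSF in the Fourier domain (sketch; assumes a
    circular blur, eps regularizes near-zero frequencies)."""
    A = np.fft.rfft2(a, shape)                  # zero-pad PSF to image size
    Y = np.fft.rfft2(y, shape)
    X = Y * np.conj(A) / (np.abs(A) ** 2 + eps)
    return np.fft.irfft2(X, shape)

# Round trip with a circular blur (illustration only):
rng = np.random.default_rng(0)
x = rng.random((64, 64))
a = np.outer([0.2, 0.6, 0.2], [0.2, 0.6, 0.2])  # PSF without spectral zeros
y = np.fft.irfft2(np.fft.rfft2(x) * np.fft.rfft2(a, x.shape), x.shape)
x_hat = direct_deconvolve(y, a, x.shape)
assert np.allclose(x, x_hat, atol=1e-4)
```

For an EFF transformation no such closed-form division exists, which is why the optimization described next is needed.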
Unfortunately, a direct deconvolution does not exist in general for linear transformations represented as EFF, since it involves summations over patches. However, we can replace the direct deconvolution by an optimization of a regularized least-squares cost function $\|y - Ax\|^2 + \alpha \|\nabla x\|_p^p$.\nWhile estimating the linear transformation in (i), the regularizer is Tikhonov on the gradient image, i.e., p = 2. As the estimated x is subsequently processed in the prediction step, one might consider regularization redundant in the image estimation step of (i). However, the regularization is crucial for suppressing ringing due to insufficient estimation of a. In (ii), during the final non-blind deblurring procedure, we employ a sparsity prior for x by choosing p = 1/2.\nThe main difference of the image estimation steps to [2] and [3] is that the linear transformation A is no longer a convolution but instead a space-variant filter implemented by the EFF framework.\n\nFigure 2: How to simultaneously capture an image blurred with real camera shake and its space-varying PSF; (a) the true image and a grid of dots are combined into (b) an RGB image, which is (c) photographed with camera shake, and (d) split into blue and red channels to separate the PSF depicting the blur from the blurred image.\n\n5 Experiments\n\nWe present results on several example images with space-variant blur, for which we are able to recover a deblurred image while a state-of-the-art method for single image blind deconvolution is not. We begin by describing the image capture procedure.\nCapturing a gray scale image blurred with real camera shake along with the set of spatially varying PSFs. 
The idea is to create a color image where the gray scale image is shown in the red channel, a grid of dots (for recording the PSFs) is shown in the blue channel, and the green channel is set to zero. We display the resulting RGB image on a computer screen and take a photo with real hand shake. We split the recorded raw image into its red and blue parts. The red part shows only the image blurred with camera shake, and the blue part shows the spatially varying PSFs that depict the effect of the camera shake. To avoid a Moir\u00e9 effect, the distance between the camera and the computer screen must be chosen carefully such that the discrete structure of the computer screen cannot be resolved by the (discrete) image sensor of the camera. We verified that the spectral characteristics of the screen and the camera\u2019s Bayer array filters are such that there is no cross-talk, i.e., the blue PSFs are not visible in the red image. Fig. 2 shows the whole process.\nThree example images with real camera shake. We applied our method, Cho and Lee\u2019s [2] method, and a custom patch-wise variant of Cho and Lee to three examples captured as explained above. For all experiments, photos were taken with a hand-held Canon EOS 1000D digital single lens reflex camera with a zoom lens (Canon zoom lens EF 24-70 mm 1:2.8 L USM). The exposure time was 1/4 second, and the distance to the screen was about two meters. The input to the deblurring algorithm was only the red channel of the RAW file, which we treat as if it were a captured gray-scale image. The image sizes are: vintage car 455 \u00d7 635, butcher shop 615 \u00d7 415, elephant 625 \u00d7 455.\nTo assess the accuracy of estimating the linear transformation (i.e., of step (i) in Sec. 4), we compare our estimated PSFs evaluated on a regular grid of dots to the true PSFs recorded in the blue channel during the camera shake. 
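The channel arithmetic behind this capture trick is simple; a sketch (our own code), assuming a demosaicked H x W x 3 float array:

```python
import numpy as np

def make_display(gray, dots):
    """Compose the screen image: gray-scale image in red, dot grid in blue,
    green set to zero (as described above; sketch only)."""
    return np.stack([gray, np.zeros_like(gray), dots], axis=-1)

def split_capture(rgb):
    """Split a captured frame into the blurred image (red channel) and the
    PSF-depicting blurred dot grid (blue channel)."""
    return rgb[..., 0], rgb[..., 2]

# Absent blur and cross-talk, splitting inverts the composition:
gray = np.random.rand(4, 6)
dots = np.random.rand(4, 6)
img, psfs = split_capture(make_display(gray, dots))
assert np.allclose(img, gray) and np.allclose(psfs, dots)
```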
This comparison has been made for the vintage car example and is included in the supplementary material.\nWe compare with Cho and Lee\u2019s [2] method, which we consider the current state-of-the-art method for single image blind deconvolution. This method assumes space-invariant blurs, and thus we also compare to a modified version of this algorithm that is applied to the patches of our method and that finally blends the individually deblurred patches carefully into one final output image.\nFig. 3 shows, from top to bottom, the blurry captured image, the result of our method, Cho and Lee\u2019s [2] result, and a patch-wise variant of Cho and Lee. In our method we used for the linear transformation estimation step (step (i) in Sec. 4) for all examples the hyper-parameters detailed in [2]. Our additional hyper-parameters were set as follows: the regularization constant \u03bd weighting the regularization term in cost function (4) that measures the similarity between neighboring PSFs is set to 5e4 for all three examples. The entropy threshold for identifying poorly estimated PSFs is set to 0.7, with the entropy normalized to range between zero and one. In all experiments, the size of a single PSF kernel is allowed to be 15 \u00d7 15 pixels.\n\n[Figure 3 rows, top to bottom: hand-shaked photo; our result; Cho and Lee [2]; patch-wise Cho and Lee. Columns: Butcher Shop, Vintage Car, Elephant.]\n\nFigure 3: Deblurring results and comparison.\n\nThe space-variant blur was modelled for the vintage car example by an array of 6 \u00d7 7 PSF kernels, for the butcher shop by an array of 4 \u00d7 6 PSF kernels, and for the elephant by an array of 5 \u00d7 6 PSF kernels. These settings were also used for the patch-wise Cho and Lee variant. For the blending function $w^{(r)}$ in Eq. 
(1) we used a Bartlett-Hanning window with 75% overlap in the vintage car example and 50% in the butcher shop and elephant examples. We chose a larger overlap for the vintage car to keep the patch size reasonably large. For the final non-blind deconvolution (step (ii) in Sec. 4) hyper-parameter \u03b1 was set to 2e3 and p was set to 0.5. On the three example images our algorithm took about 30 minutes for space-variant image restoration.\nIn summary, our experiments show that our method is able to deblur space-variant blurs that are too difficult for Cho and Lee\u2019s method. In particular, our results reveal greater detail and fewer restoration artifacts, especially noticeable in the regions of the closeup views. The comparison with the patch-wise version of Cho and Lee is interesting: looking at the details (such as the house number 117 at the butcher shop, the licence plate of the vintage car, or the trunk of the elephant), our method is better. At the door frame in the vintage car image, we see that the patch-wise version of Cho and Lee has alignment problems. Our experience was that this gets more severe for larger blur kernels.\n\n[Figure 4 columns, left to right: blurry image; our result; Joshi et al. [12]; Shan et al. [5]; Fergus et al. [18].]\n\nFigure 4: Our blind method achieves results comparable to Joshi et al. [12], who additionally require motion sensor information which we do not use. All images apart from our own algorithm\u2019s results are taken from [12]. This figure is best viewed on screen rather than in print.\n\nComparison with Joshi et al.\u2019s recent results. Fig. 4 compares the results from [12] with our method on their example images. Even though our method does not exploit the motion sensor data utilized by Joshi et al., we obtain comparable results.\n\nRun-time. The running time of our method is about 30 minutes for the images in Fig. 3 and about 80 minutes on the larger images of [12] (1123 \u00d7 749 pixels in size). 
How does this compare with Cho and Lee\u2019s method for fast deblurring, which works in seconds? There are several reasons for the discrepancy: (i) Cho and Lee implemented their method on the GPU, while our implementation is in Matlab, logging lots of intermediate results for debugging and studying the code behaviour. (ii) A space-variant blur has more parameters; e.g., for 6 by 7 patches we need to estimate 42 times as many parameters as for a single kernel. Even though calculating the forward model is almost as fast as for a single kernel, convergence for that many parameters appeared to be slower. (iii) Cho and Lee are able to use direct deconvolution (division in Fourier space) for the image estimation step, while we have to solve an optimization problem, because we currently do not know how to perform direct deconvolution for space-variant filters.\n\n6 Discussion\n\nBlind deconvolution of images degraded by space-variant blur is a much harder problem than blind deconvolution under the assumption of space-invariant blur. Our experiments show that even state-of-the-art algorithms such as Cho and Lee\u2019s [2] are not able to recover image details for such blurs without unpleasant artifacts. We have proposed an algorithm that is able to tackle space-variant blurs with encouraging results.\nPresently, the main limitation of our approach is that it can fail if the blurs are too large or if they vary too quickly across the image. We believe there are two main reasons for this: (i) on the one hand, if the blurs are large, the patches need to be large as well to obtain enough statistics for estimating the blur. On the other hand, if at the same time the PSF is varying too quickly, the patches need to be small enough. Our method only works if we can find a patch size and overlap setting that is a good trade-off for both requirements. (ii) The method of Cho and Lee [2], which is an important component of ours, does not work for all blurs. 
For instance, a PSF that looks like a thick horizontal line is challenging, because the resulting image feature might be misunderstood by the prediction step to be horizontal lines in the image. Improving the method of Cho and Lee [2] to deal with such blurs would be worthwhile.\nAnother limitation of our method is image areas with little structure. On such patches it is difficult to infer a reasonable blur kernel, and our method propagates the results from the neighboring patches to these cases. However, this propagation is heuristic, and we hope to find a more rigorous approach to this problem in future work.\n\nReferences\n[1] M. Hirsch, S. Sra, B. Sch\u00f6lkopf, and S. Harmeling. Efficient filter flow for space-variant multiframe blind deconvolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010.\n[2] S. Cho and S. Lee. Fast motion deblurring. ACM Transactions on Graphics (SIGGRAPH ASIA 2009), 28(5), 2009.\n[3] D. Krishnan and R. Fergus. Fast image deconvolution using hyper-Laplacian priors. In Advances in Neural Information Processing Systems (NIPS), 2009.\n[4] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman. Removing camera shake from a single photograph. In ACM SIGGRAPH, page 794. ACM, 2006.\n[5] Q. Shan, J. Jia, and A. Agarwala. High-quality motion deblurring from a single image. ACM Transactions on Graphics (SIGGRAPH), 2008.\n[6] D. Kundur and D. Hatzinakos. Blind image deconvolution. IEEE Signal Processing Mag., 13(3):43\u201364, May 1996.\n[7] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Understanding and evaluating blind deconvolution algorithms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009.\n[8] Y. W. Tai, P. Tan, L. Gao, and M. S. Brown. Richardson-Lucy deblurring for scenes under projective motion path. Technical report, KAIST, 2009.\n[9] Qi Shan, Wei Xiong, and Jiaya Jia. 
Rotational motion deblurring of a rigid object from a single image. In Proc. Int. Conf. on Computer Vision, 2007.\n[10] J. Bardsley, S. Jeffries, J. Nagy, and B. Plemmons. A computational method for the restoration of images with an unknown, spatially-varying blur. Optics Express, 14(5):1767\u20131782, 2006.\n[11] J. G. Nagy and D. P. O\u2019Leary. Restoring images degraded by spatially variant blur. SIAM Journal on Scientific Computing, 19(4):1063\u20131082, 1998.\n[12] N. Joshi, S. B. Kang, C. L. Zitnick, and R. Szeliski. Image deblurring using inertial measurement sensors. In ACM SIGGRAPH 2010 Papers. ACM, 2010.\n[13] A. Levin. Blind motion deblurring using image statistics. In Advances in Neural Information Processing Systems (NIPS), 2006.\n[14] S. Cho, Y. Matsushita, and S. Lee. Removing non-uniform motion blur from images. In IEEE 11th International Conference on Computer Vision, 2007.\n[15] M. \u0160orel and F. \u0160roubek. Space-variant deblurring using one blurred and one underexposed image. In Proceedings of the International Conference on Image Processing (ICIP), 2009.\n[16] T. G. Stockham Jr. High-speed convolution and correlation. In Proceedings of the Spring Joint Computer Conference, pages 229\u2013233. ACM, 1966.\n[17] N. Joshi, R. Szeliski, and D. J. Kriegman. Image/video deblurring using a hybrid camera. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.\n[18] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman. Removing camera shake from a single image. ACM Transactions on Graphics (SIGGRAPH), 2006.\n", "award": [], "sourceid": 687, "authors": [{"given_name": "Stefan", "family_name": "Harmeling", "institution": null}, {"given_name": "Michael", "family_name": "Hirsch", "institution": null}, {"given_name": "Bernhard", "family_name": "Sch\u00f6lkopf", "institution": null}]}