Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
RESEARCH Open Access
Robust flash denoising/deblurring by iterative guided filtering
Hae-Jong Seo1* and Peyman Milanfar2
Abstract
A practical problem addressed recently in computational photography is that of producing a good picture of a poorly lit scene. The consensus approach for solving this problem involves capturing two images and merging them. In particular, using a flash produces one (typically high signal-to-noise ratio [SNR]) image and turning off the flash produces a second (typically low SNR) image. In this article, we present a novel approach for merging two such images. Our method is a generalization of the guided filter approach of He et al., significantly improving its performance. In particular, we analyze the spectral behavior of the guided filter kernel using a matrix formulation, and introduce a novel iterative application of the guided filter. These iterations consist of two parts: a nonlinear anisotropic diffusion of the noisier image, and a nonlinear reaction-diffusion (residual) iteration of the less noisy one. The results of these two processes are combined in an unsupervised manner. We demonstrate that the proposed approach outperforms state-of-the-art methods for both flash/no-flash denoising, and deblurring.
1 Introduction
Recently, several techniques [1-5] to enhance the quality of flash/no-flash image pairs have been proposed. While the flash image is better exposed, the lighting is not soft, and generally results in specularities and unnatural appearance. Meanwhile, the no-flash image tends to have a relatively low signal-to-noise ratio (SNR) while containing the natural ambient lighting of the scene. The key idea of flash/no-flash photography is to create a new image that is closest to the look of the real scene by having details of the flash image while maintaining the ambient illumination of the no-flash image. Eisemann and Durand [3] used bilateral filtering [6] to give the flash image the ambient tones from the no-flash image. On the other hand, Petschnigg et al. [2] focused on reducing noise in the no-flash image and transferring details from the flash image to the no-flash image by applying joint (or cross) bilateral filtering [3]. Agrawal et al. [4] removed flash artifacts, but did not test their method on no-flash images containing severe noise. As opposed to a visible flash used by [2-4], recently Krishnan and Fergus [7] used both near-infrared and near-ultraviolet illumination for low light image enhancement. Their so-called dark flash provides high-frequency detail in a less intrusive
way than a visible flash does even though it results in incomplete color information. All these methods ignored any motion blur by either depending on a tripod setting or choosing sufficiently fast shutter speed. However, in practice, the captured images under low-light conditions using a hand-held camera often suffer from motion blur caused by camera shake.
More recently, Zhuo et al. [5] proposed a flash deblur-ring method that recovers a sharp image by combining a blurry image and a corresponding flash image. They integrated a so-called flash gradient into a maximum-a-posteriori framework and solved the optimization problem by alternating between blur kernel estimation and sharp image reconstruction. This method outperformed many states-of-the-art single image deblurring [8-10] and color transfer methods [11]. However, the final output of this method looks somewhat blurry because the model only deals with a spatially invariant motion blur.
Others have used multiple pictures of a scene taken at different exposures to generate high dynamic range images. This is called multi-exposure image fusion [12] which shares some similarity with our problem in that it seeks a new image that is of better quality than any of the input images. However, the flash/no-flash photography is generally more difficult due to the fact that there are only a pair of images. Enhancing a low SNR no-flash image
* Correspondence: mailto:[email protected]
Web End [email protected]
1Sharp Labs of America, Camas, WA 98683, USAFull list of author information is available at the end of the article
2012 Seo and Milanfar; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0
Web End =http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 2 of 19
with a spatially variant motion blur only with the help of a single flash image is still a challenging open problem.
2 Overview of the proposed approach
We address the problem of generating a high quality image from two captured images: a flash image (Z ) and a no-flash image (Y ; Figure 1). We treat these two images, Z and Y , as random variables. The task at hand is to generate a new image (X) that contains the ambient lighting of the no-flash image (Y ) and preserves the details of the flash-image (Z ). As in [2], the new image X can be decomposed into two layers: a base layer and a detail layer;
X = Y
base
+ (Z
(2)
where G() is the guided filtering (LMMSE) operator,
zi, zi, zi are samples of
Y, Z,Z respectively, at pixel i, and (a, b, c, d) are coefficients assumed to be constant in k (a square window of size p p) and space-variant. Once we estimate a, b, c, d, Equation 1 can be rewritten as
xi =
yi + (zi
Z)
. (1)
Here, Y might be noisy or blurry (possibly both), and
Y is an estimated version of Y, enhanced with the help of Z. Meanwhile,
Z represents a nonlinear, (low-pass) filtered version of Z so that Z
Z can provide details.
Note that is a constant that strikes a balance between the two parts. In order to estimate
Y and Z, we employ
local linear minimum mean square error (LMMSE) predictorsa which explain, justify, and generalize the idea of guided filteringb as proposed in [1]. More specifically, we assumed that
Y and Z are a liner (affine) function of Z in a window k centered at the pixel k:
yi = G(yi, zi) = azi + b,
zi = G(zi, zi) = czi + d, i k,
zi),= azi + b + zi czi d, = (a c + )zi + b d, = zi + .
det ail
(3)
In fact, xi is a linear function of zi. While it is not pos
sible to estimate a and b directly from (equation (3);
Figure 1 Flash/no-flash pairs. No-flash image can be noisy or blurry.
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 3 of 19
since they in turn depend on xi), the coefficients a, b can be expressed in terms of a, b, c, d which are optimally estimated from two different local linear models shown in Equation 2. Naturally, the simple linear model has its limitations in capturing complex behavior. Hence, we propose an iterative approach to boost its performance as follows:
xi,n = G(xi,n1, zi) + n(zi zi) = nzi + n, (4) where xi,0 = yi and an, bn, and n are functions of the
iteration number n. A block-diagram of our approach is shown in Figure 2. The proposed method effectively removes noise and deals well with spatially variant motion blur without the need to estimate any blur kernel or to accurately register flash/no-flash image pairs when there is a modest displacement between them.
A preliminary version [13] of this article is appeared in the IEEE International Conference on Computer Vision (ICCV 11) workshop. This article is different from [13] in the following respects:
(1) We have provided a significantly expanded statistical derivation and description of the guided filter and its properties in Section 3 and Appendix.(2) Figures 3 and 4 are provided to support the key idea of iterative guided filtering.(3) We provide many more experimental results for both flash/no-flash denoising and de- blurring in Section 5.(4) We describe the key ideas of diffusion and residual iteration and their novel relevance to iterative guided filtering in the Appendix.(5) We prove the convergence of the proposed iterative estimator in the Appendix.(6) As supplemental material, we share our project websitec where flash/no-flash relighting examples are also presented.
In Section 3, we outline the guided filter and study its statistical properties. We describe how we actually estimate the linear model coefficients a, b, c, d and a, b, and we provide an interpretation of the proposed iterative framework in matrix form in Section 4. In Section 5, we demonstrate the performance of the system with some experimental results, and finally we conclude the article in Section 6.
3 The guided filter and its properties
In general, space-variant, nonparametric filters such as the bilateral filter [6], nonlocal means filter [14], and locally adaptive regression kernels filter [15] are estimated from the given corrupted input image to perform denoising. The guided filter can be distinguished from these in the sense that the filter kernel weights are computed from a (second) guide image which is presumably cleaner. In other words, the idea is to apply filter kernels Wij computed from the guide (e.g., flash) image Z to the more noisy (e.g., no-flash) image Y. Specifically, the filter output sample
y a pixel i is computed as a weighted averaged:
yi =
jWij(Z)yj. (5)
Note that the filter kernel Wij is a function of the guide image Z, but is independent of Y. The guided filter kernele can be explicitly written as
Wij(Z) = 1
||
, i, j k, (6)
where || is the total number of pixels (= p2) in k, is a global smoothing parameter, E[Z]k
2 k:(i,j)
1 + (zi E[Z]k)(zj E[Z]k) var(Z)k +
1 lk zl,
and var(Z)k
1
||
lkz2l E[Z]2k. Note that Wij are nor
malized weights, that is, jWij(Z) = 1 Figure 5 shows examples of guided filter weights in four different
Figure 2 System overview. Overview of our algorithm for flash/no-flash enhancement.
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 4 of 19
Figure 3 Iteration improves accuracy. Accuracy of (
20,
20) is improved upon (
1,
1).
patches. We can see that the guided filter kernel weights neatly capture underlying geometric structures as do other data-adaptive kernel weights [6,14,15]. It is worth noting that the use of the specific form of the guided filter here may not be critical in the sense that any other data-adaptive kernel weights such as non-local means kernels [16] and locally adaptive regression kernels [15] could be used.
Next, we study some fundamental properties of the guided filter kernel in matrix form.
We adopt a convenient vector form of Equation 5 as follows:
yi = wTiy, (7)
where y is a column vector of pixels in Y and wTi = [W(i, 1), W(i, 2), . . . , W(i, N)] is a vector of weights for each i. Note that N is the dimensionf of y.
Writing the above at once for all i we have,
y =
wT1 wT2
... wTN
=
W(1, 1) W(1, 2) . . . W(1, N) W(2, 1) W(2, 2) . . . W(2, N)
... ... ... ...W(N, 1) W(N, 2) . . . W(N, N)
= W(z)y, (8)
where z is a vector of pixels in Z and W is only a function of z. The filter output can be analyzed as the product of a matrix of weights W with the vector of the given the input image y.
Figure 4 Effect of iteration. As opposed to
X1, noise in Y was more effectively removed in
X20.
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 5 of 19
Figure 5 Examples of guided filter kernel weights in four different patches. The kernel weights well represent underlying structures.
The matrix W is symmetric as shown in Equation 8 and the sum of each row of W is equal to one (W1N =
1N ) by definition. However, as seen in Equation 6, the definition of the weights does not necessarily imply that the elements of the matrix W are positive in general. While this is not necessarily a problem in practice, we find it useful for our purposes to approximate this kernel with a proper admissible kernel [17]. That is, for the purposes of analysis, we approximate W as a positive valued, symmetric positive definite matrix with rows summing to one, as similarly done in [18]. For the details, we refer the reader to the Appendix A.
With this technical approximation in place, all eigenvalues li (i = 1, ..., N) are real, and the largest eigenvalue of W is exactly one (l1 = 1), with corresponding eigenvector v1 = (1/N)[1, 1, . . . , 1]T = (1/N)1N as shown in Figure 6. Intuitively, this means that filtering by W will leave a constant signal (i.e., a flat image) unchanged. In fact, with the rest of its spectrum inside the unit disk, powers of W converge to a matrix of rank one, with identical rows, which (still) sum to one:
lim
n
Wn = 1NuT1. (9)
So u1 summarizes the asymptotic effect of applying the filter W many times. Figure 7 shows what a typical u1 looks like.
Figure 8 shows examples of the (center) row vector (wT) from Ws powers in one patch of size 25 25. The vector was reshaped into an image for illustration purposes. We can see that powers of W provide even better structure by generating larger (and more sophisticated) kernels. This insight reveals that applying W multiple times can improve the guided filtering performance, which leads us to the iterative use of the guided filter. This approach will produce the evolving coefficients an,
bn introduced in (4). In the following section, we describe how we actually compute these coefficients based on Bayesian mean square error (MSE) predictions.
4 Iterative application of local LMMSE predictors
The coefficientsg ak, bk, ck, dk in (3) are chosen so that
on average the estimated value
Y is close to the observed value of Y (= yi) in k, and the estimated value
Z is close to the observed value of Z (=zi) in k. More specifically, we adopt a stabilized MSE criterion in the window k as our measure of closenessh:
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 6 of 19
Figure 6 Examples of W in one patch of size 25 25. All eigenvalues of the matrix W are nonnegative, thus the guided filter kernel matrix is positive definite matrix. The largest eigenvalue of W is one and the rank of W asymptotically becomes one. This figure is better viewed in color.
(10)
where 1 and 2 are small constants that preventk,k
from being too large. Note that ck and dk become simply 1 and 0 by setting 2 = 0. By setting partial derivatives of MSE(ak, bk) with respect to ak, bk, and partial derivatives of MSE(ck, dk) with respect to ck, dk, respectively, to zero, the solutions to minimum MSE prediction in (10) are
ak = E[ZY] E[Z]E[Y]
E[Z2] E2[Z] + 1
=
MSE(ak, bk) = E[(Y
Y)2] + 1a2k = E[(Y akZ bk)2] + 1a2k, MSE(ck, dk) = E[(Z
Z)2] + 2c2k = E[(Z ckZ dk)2] + 2c2k,
cov(Z, Y) var(Z) + 1
k ,
bk. (12)
As such, the resulting prediction of
Y given the out-
come Z = zi is
ak, b =
1
||
||
k=1
bk = E[Y] kE[Z] = E[Y]k
cov(Z, Y) var(Z) + 1
k E[Z]k,
(11)
ck = E[Z2] E2[Z] E[Z2] E2[Z] + 2
=
var(Z) var(Z) + 2
k ,
||
yi =zi + b =
1
||
var(Z) var(Z) + 2
k E[Z]k,
k=1(kzi + bk),
zi =zi + d =
dk = E[Z] kE[Z] = E[Z]k
(13)
where we compute
E[Z] 1||
1
||
||
k=1(kzi + dk).
l zl, E[Y] 1||
l yl, E[ZY] 1||
l zlyl, E[Z2] 1||
l z2l.
Note that the use of different k results in different predictions of these coefficients. Hence, one must compute an aggregate estimate of these coefficients coming from all windows that contain the pixel of interest. As an illustration, consider a case where we predict
yi using observed values of Y in k of size 3 3 as shown in Figure 9. There are nine possible windows that involve the pixel of interest i. Therefore, one takes into account all nine ak, bks to predict
yi. The simple strategy suggested by He et al. [1] is to average them as follows:
a = 1 ||
||
k=1
Figure 7 Examples of the first left eigenvector u in three patches. The vector was reshaped into an image for illustration purpose.
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 7 of 19
Figure 8 The guided filter kernel matrix. The guided filter kernel matrix W captures the underlying data structure, but powers of W provides even better structure by generating larger (but more sophisticated) kernel shapes. w is the (center) row vector of W. w was reshaped into an image for illustration purposes.
The idea of using these averaged coefficients, b is
analogous to the simplest form of aggregating multiple local estimates from overlapped patches in image denoising and super-resolution literature [19]. The aggregation helps the filter output look locally smooth and contain fewer artifacts.i Recall that
yi and zi zi correspond to the base layer and the detail layer, respectively. The effect of the regularization parameters 1 and 2 is quite the opposite in each case in the sense that the higher 2 is, the more detail through zi zi can be
obtained; whereas the lower 1 ensures that the image content in
Y is not over-smoothed.
These local linear models work well when the window size p is small and the underlying data have a simple pattern. However, the linear models are too simple to deal effectively with more complicated structures, and thus there is a need to use larger window sizes. As we alluded to earlier, the estimation of these linear coefficients in an iterative fashion can deal well with more complex behavior of the image content. More
Figure 9 Explanation of LMMSE.
ak, bk are estimated from nine different windows
predicti
. This figure is better viewed in color.
k and averaged coefficients, b
are used to
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 8 of 19
specifically, by initializing xi,0 = yi, Equation 3 can be
updated as follows
xi,n = G(xi,n1, zi) + n(zi zi), = (an nc + n)zi + bn nd, = nzi + n,
(14)
where n is the iteration number and n >0 is set to be a monotonically decaying functionk of n such that
n=1 n converges. Figure 3 shows an example to illustrate that the resulting coefficients at the 20th iteration predict the underlying data better than a1, b1 do. Similarly,
X20
improves upon
X1 as shown in Figure 4. This iteration is closely related to diffusion and residual iteration which are two important methods [18] which we describe briefly below, and with more detail in Appendix.
Recall that Equation 14 can also be written in matrix form as done in Section 3:
xn = Wxn1
base layer
, (15)
where W and Wd are guided filter kernel matrices composed of the guided filter kernels W and Wd respectively.l Explicitly writing the iterations, we observe
x0 = y
x1 = Wy + 1(I Wd)z,
x2 = Wx1 + 2(I Wd)z = W2y + (1W + 2I)(I Wd)z,
...
xn = Wxn1 + n(I Wd)z = Wny + (1Wn1 + 2Wn2 + + nI)(I Wd)z,
= Wny
diffusion
+n (z Wdz)
det ail layer
(16)
where Pn is a polynomial function of W. The block-diagram in Figure 2 can be redrawn in terms of the matrix formulation as shown in Figure 10. The first termn in
Equation 16 is called the diffusion process that enhances SNR. The net effect of each application of W is essentially a step of anisotropic diffusion [20]. Note that this diffusion is applied to the no-flash image y which has a low SNR. On the other hand, the second term zn is con
nected with the idea of residual iteration [21]. The key idea behind this iteration is to filter the residual signalsm to extract detail. We refer the reader to Appendix B and[18] for more detail. By effectively combining the diffusion and residual iteration [as in (16)], we can achieve the goal of flash/no-flash pair enhancement which is to generate an image somewhere between the flash image z and the no-flash image y, but of better quality than both.n
5 Experimental results
In this section, we apply the proposed approach to flash/ no-flash image pairs for denoising and deblurring. We convert images Z and Y from RGB color space to CIE
Lab, and perform iterative guided filtering separately in each resulting channel. The final result is converted back to RGB space for display. We used the implementation of the guided filter [1] from the authors website.o All figures in this section are best viewed in color.p
5.1 Flash/no-flash denoising5.1.1 Visible flash [2]
We show experimental results on a couple of flash/no-flash image pairs where no-flash images suffer from noiseq. We compare our results with the method based on the joint bilateral filter [2] in Figures 11, 12, and 13. Our proposed method effectively denoised the no-flash image while transferring the fine detail of the flash image and maintaining the ambient lighting of the no-flash image. We point out that the proposed iterative application of the guided filtering in terms of diffusion and residual iteration yielded much better results than one application of either the joint bilateral filtering [2] or the guided filter [1].5.1.2 Dark flash [7]
In this section, we use the dark flash method proposed in [7]. Let us call the dark flash image Z. Dark flash may introduce shadows and specularities in images, which affect the results of both the denoising and detail transfer. We detect those regions using the same methods proposed by [2]. Shadows are detected by finding the regions where |Z - Y| is small, and specularities are found by detecting saturated pixels in Z. After combining the shadow and specularities mask, we blur it using a Gaussian filter to feather the boundaries. By using the resulting mask, the output
Xn at each iteration is alpha-blended with a low-pass filter version of Y as similarly done in [2,7]. In order to realize ambient lighting conditions, we applied the same mapping function to the final output as in [7]. Figures 14, 15, 16, 17, 18, and 19 show that our results yield better detail with less color artifacts than the results of [7].
5.2 Flash/no-flash deblurringMotion blur due to camera shake is an annoying yet common problem in low-light photography. Our proposed method can also be applied to flash/no-flash deblurringr.
Here, we show experimental results on a couple of flash/ no-flash image pairs where no-flash images suffer from mild noise and strong motion blur. We compare our method with Zhuo et al. [5]. As shown in Figures 20, 21, 22, 23, and 24, our method outperforms the method by[5], obtaining much finer details with better color contrast even though our method does not estimate a blur kernel at all. The results by Zhuo et al. [5] tend to be somewhat blurry and distort the ambient lighting of the real scene. We point out that we only use a single blurred image in
+ Pn(W)(I Wd)z
residual iteration
=n + zn,
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 9 of 19
Figure 10 Diagram of the proposed iterative approach in matrix form. Note that the iteration can be divided into two parts: diffusion and residual iteration process.
Figure 24 while Zhuo et al. [5] used two blurred images and one flash image.
6 Summary and future work
The guided filter has proved to be more effective than the joint bilateral filter in several applications. Yet we
have shown that it can be improved significantly more still. We analyzed the spectral behavior of the guided filter kernel using a matrix formulation and improved its performance by applying it iteratively. Iterations of the proposed method consist of a combination of diffusion and residual iteration. We demonstrated that the
Figure 11 Flash/no-flash denoising example compared to the state of the art method [2]. The iteration n for this example is 10.
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 10 of 19
Figure 12 Flash/no-flash denoising example compared to the state of the art method [2]. The iteration n for this example is 10.
Figure 13 Flash/no-flash denoising example compared to the state of the art method [2]. The iteration n for this example is 2.
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 11 of 19
Figure 14 Dark flash/no-flash (low noise) denoising example compared to the state of the art method [7]. The iteration n for this example is 10.
Figure 15 Dark flash/no-flash (mid noise) denoising example compared to the state of the art method [7]. The iteration n for this example is 10.
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 12 of 19
Figure 16 Dark flash/no-flash (high noise) denoising example compared to the state of the art method [7]. The iteration n for this example is 10.
Figure 17 Dark flash/no-flash (low noise) denoising example compared to the state of the art method [7]. The iteration n for this example is 10.
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 13 of 19
Figure 18 Dark flash/no-flash (mid noise) denoising example compared to the state of the art method [7]. The iteration n for this example is 10.
Figure 19 Dark flash/no-flash (high noise) denoising example compared to the state-of-the-art method [7]. The iteration n for this example is 10.
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 14 of 19
Figure 20 Flash/no-flash deblurring example compared to the state-of-the-art method [5]. The iteration n for this example is 20.
proposed approach yields outputs that not only preserve fine details of the flash image, but also the ambient lighting of the no-flash image. The proposed method outperforms state-of-the-art methods for flash/no-flash image denoising and deblurring. It would be interesting to see if the performance of other nonparametric filer kernels such as bilateral filters and locally adaptive regression kernels [15] can be further improved in our iterative framework. It is also worthwhile to explore several other applications such as joint upsampling [22], image matting [23], mesh smoothing [24,25], and specular highlight removal [26] where the proposed approach can be employed.
Appendix
Positive definite and symmetric row-stochastic approximation of WIn this section, we describe how we approximate W with a symmetric, positive definite, and row-stochastic
matrix. First, as we mentioned earlier, the matrix W can contain negative values as shown in Figure 3. We employ the Taylor series approximation (exp (t) 1 + t) to ensure that W has both positive elements, and is positive-definite. To be more concrete, consider a simple example; namely, a local patch of size 5 5 as follows:
Z =
z1 z6 z11 z16 z21
z2 z7 z12 z17 z22
z3 z8 z13 z18 z23
z4 z9 z14 z19 z24
z5 z10 z15 z20 z25
(17)
In this case, we can have up to 9 k of size || = 3
3. Let M
zi, zj
= k:(i,jk)
zi E[Z]k zj E[Z]k
var (Z)k+in
Equation 6. Then, W centered at the index 13 can be written and approximated as follows:
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 15 of 19
Figure 21 Flash/no-flash deblurring example compared to the state-of-the-art method [5]. The iteration n for this example is 20.
Figure 22 Flash/no-flash deblurring example compared to the state-of-the-art method [5]. The iteration n for this example is 20.
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 16 of 19
Figure 23 Flash/no-flash deblurring example compared to the state-of-the-art method [5]. The iteration n for this example is 5.
Figure 24 Flash/no-flash deblurring example compared to the state-of-the-art method [5]. The iteration n for this example is 20.
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 17 of 19
W = 1
81
1 + M(z , z )2(1 + M(z , z ))3(1 + M(z , z ))2(1 + M(z , z ))1 + M(z , z )2(1 + M(z , z ))4(1 + M(z , z ))6(1 + M(z , z ))4(1 + M(z , z ))2(1 + M(z , z )) 3(1 + M(z , z ))6(1 + M(z , z ))9(1 + M(z , z ))6(1 + M(z , z ))3(1 + M(z , z )) 2(1 + M(z , z ))4(1 + M(z , z ))6(1 + M(z , z ))4(1 + M(z , z ))2(1 + M(z , z )) 1 + M(z , z )2(1 + M(z , z ))3(1 + M(z , z ))2(1 + M(z , z ))1 + M(z , z ),
1 81
exp(M(z , z ))2 exp( M(z , z ))3 exp( M(z , z ))2 exp( M(z , z ))exp(M(z , z ))2 exp( M(z , z ))4 exp( M(z , z ))6 exp( M(z , z ))4 exp( M(z , z ))2 exp( M(z , z )) 3 exp( M(z , z ))6 exp( M(z , z ))9 exp( M(z , z ))6 exp( M(z , z ))3 exp( M(z , z )) 2 exp( M(z , z ))4 exp( M(z , z ))6 exp( M(z , z ))4 exp( M(z , z ))2 exp( M(z , z )) exp(M(z , z ))2 exp( M(z , z ))3 exp( M(z , z ))2 exp( M(z , z ))exp(M(z , z ))
,
Next, we convert the matrix W (composed of strictly positive elements now) to a doubly-stochastic, symmetric, positive definite matrix as again done in [18]. The algorithm we use to effect this approximation is due to Sinkhorn [27,28], who proved that given a matrix with strictly positive elements, there exist diagonal matrices R = diag(r) and C = diag(c) such that
W = R W Cis doubly stochastic. That is,
W1N = 1N and 1TN W = 1TN (18)
Furthermore, the vectors r and c are unique to within a scalar (i.e., a r, c/a.) Sinkhorns algorithm for obtaining r and c in effect involves repeated normalization of the rows and columns (see Algorithm 1 for details) so that they sum to one, and is provably convergent and optimal in the cross-entropy sense [29].
Algorithm 1 Algorithm for scaling a matrix A to a nearby doubly-stochastic matrix
Given a matrix A, let (N, N) be size(A) and initialize r = ones(N, 1);for k = 1 : iter;c = 1./(AT r);
r = 1./(A c);endC = diag(c); R = diag(r);
A = RACDiffusion and residual iteration
DiffusionHere, we describe how multiple direct applications of the filter given by W is in effect equivalent to a nonlinear anisotropic diffusion process [20,30]. We define
y0 = y, and
yn = Wn1 = Wny. (19) From the iteration (19) we have
yn = Wn1, (20)
=n1 n1 + Wn1, (21)
=n1 + (W I)n1, (22) which we can rewrite as
yn n1 = (W I)n1
(25)
where Fn is a polynomial function of W of order n +1. The first iterate z1 is precisely the twicing estimate
of Tukey [21].
Convergence of the proposed iterative estimator Recall the iterations:
xn = Wny + (1Wn1 + 2Wn2 + + nI)
(I Wd)z. (26) where Pn is a polynomial function of W. The first (direct diffusion) term in the iteration is clearly stable and convergent as described in (9). Hence, we need to show the stability of the second part. To do this, we note that the spectrum of Pn can be written, and bounded in terms of the eigenvalues li of W as follows:
i(Pn) = eig(Pn) = 1n1i + 2n2i + + n,
1n11 + 2n21 + + n = 1 + 2 + 3 + + n c.
(27)
Pn(W)
y(t)
t = 2y(t) Diffusion Equation (23)
where y is a scaled version of by, and therefore the
left-hand side of the above is a discretization of the deri
vative operator y(t)
t , and as detailed in [18], W - I is effectively the nonlinear Laplacian operator corresponding to the kernel in (6).
Residual iterationAn alternative to repeated applications of the filter W is to consider the residual signals, defined as the difference between the estimated signal and the measured signal. This results in a variation of the diffusion estimator which uses the residuals as an additional forcing term. The net result is a type of reaction-diffusion process[31]. In statistics, the use of the residuals in improving estimates has a rather long history, dating at least back to the study of Tukey [21] who termed the idea twicing. More recently, the idea has been suggested in the applied mathematics community under the rubric of Bregman iterations [32], and in the machine learning and statistics literature [33] as L2-boosting.
Formally, the residuals are defined as the difference between the estimated signal and the measured signal: rn = z zn1, where here we define the initializations z0 = Wz. With this definition, we write the iterated estimates as
zn = zn1 + Wrn = zn1 + W(z zn1). (24) Explicitly writing the iterations, we observe:
z1 = z0 + W(z z0) = Wz + W (z Wz) = (2W W2)z,
z2 = z1 + W(z z1) = (W3 3W2 + 3W)z,
...
zn = Fn(W)Z,
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 18 of 19
is a different initialization than the one used in the diffusion iterations.
where the inequality follows from the knowledge that 0 lN ... l3 l2 <l1 = 1. Furthermore, in Section 4 we defined n to be a monotonically decreasing sequence such that
n=1 n = c < . Hence, all eigenvalues li(Pn) are upper bounded by the constant c, independent of the number of iterations n, ensuring the stability of the iterative process.
End notes
aMore detail is provided in Section 4. bThe guided filter[1] reduces noise while preserving edges as bilateral filter [6] does. However, the guided filter outperforms the bilateral filter by avoiding the gradient reversal artifacts that may appear in such applications as detail enhancement, high dynamic range (HDR) compression, and flash/no-flash denoising. chttp://users.soe.ucsc.edu/~milanfar/research/rokaf/.html/IGF/
Web End =http://users.soe.ucsc.edu/ http://users.soe.ucsc.edu/~milanfar/research/rokaf/.html/IGF/
Web End =~milanfar/research/rokaf/.html/IGF/ http://users.soe.ucsc.edu/~milanfar/research/rokaf/.html/IGF/
Web End =. d in Equation 2
can be rewritten in terms of filtering and we refer the reader to a supplemental material http://personal.ie.cuhk.edu.hk/~hkm007/eccv10/eccv10supp.pdf
Web End =http://personal.ie. http://personal.ie.cuhk.edu.hk/~hkm007/eccv10/eccv10supp.pdf
Web End =cuhk.edu.hk/~hkm007/eccv10/eccv10supp.pdf for derivation. eCross (or joint) bilateral filter [2,3] is defined in a similar way. f N is different from the window size p (N p). g Note that k is used to clarify that the coefficients are estimated for the window k. h Shan et al. [8]
recently proposed a similar approach for high dynamic range compression. i It is worthwhile to note that we can benefit from more adaptive way of combining multiple estimates of coefficients, but this subject is not trea
ted in this article. k We use n = 1
n2 throughout the all experiments. l Recall W in Equation 6. The difference between W and Wd lies in the parameter as follows (2 > 1):
W (Z) = 1||2 k:(i,j)k
1 + (zi E[z]k)
AcknowledgementsWe thank Dilip Krishnan for sharing the post-processing code [7] for dark-flash examples. This study was supported by the AFOSR Grant FA 9550-07-01-0365 and NSF Grant CCF-1016018. This study was done while the first author was at the University of California.
Author details
1Sharp Labs of America, Camas, WA 98683, USA 2University of California-Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
Authors contributionsHS carried out the design of iterative guided filtering and drafted the manuscript. PM participated in the design of iterative guided filtering and performed the statistical analysis. All authors read and approved the final manuscript.
Competing interestsThe authors declare that they have no competing interests.
Received: 23 June 2011 Accepted: 6 January 2012 Published: 6 January 2012
References1. K He, J Sun, X Tang, Guided image filtering, in Proceedings of European Conference Computer Vision (ECCV) (2010)
2. G Petschnigg, M Agrawala, H Hoppe, R Szeliski, M Cohen, K Toyama, Digital photography with flash and no-flash image pairs. ACM Trans Graph. 21(3), 664672 (2004)
3. E Eisemann, F Durand, Flash photography enhancement via intrinsic relighting. ACM Trans Graph. 21(3), 673678 (2004)
4. A Agrawal, R Raskar, S Nayar, Y Li, Removing photography artifacts using gradient projection and flash-exposure sampling. ACM Trans Graph. 24, 828835 (2005). doi:10.1145/1073204.1073269
5. S Zhuo, D Guo, T Sim, Robust flash deblurring. IEEE Conference on Computer Vison and Pattern Recognition (2010)
6. C Tomasi, R Manduchi, Bilateral Filtering for Gray and Color Images, in Proceedings of the 1998 IEEE International Conference of Compute Vision Bombay, India, pp. 836846 (1998)
7. D Krishnan, R Fergus, Dark flash photography. ACM Trans Graph. 28(4), 594611 (2009)
8. Q Shan, J Jia, MS Brown, http://www.ncbi.nlm.nih.gov/pubmed/20467063?dopt=Abstract
Web End =Globally optimized linear windowed tone- http://www.ncbi.nlm.nih.gov/pubmed/20467063?dopt=Abstract
Web End =mapping. IEEE Trans Vis Comput Graph. 16(4), 663675 (2010)
9. R Fergus, B Singh, A Hertsmann, ST Roweis, WT Freeman, Removing camera shake from a single image. ACM Trans Graph (SIGGRAPH). 25, 787794 (2006). doi:10.1145/1141911.1141956
10. L Yuan, J Sun, L Quan, HY Shum, Progressive inter-scale and intra-scale non-blind image deconvolution. ACM Trans Graph. 27(3), 110 (2008)
11. YW Tai, J Jia, CK Tang, Local color transfer via probabilistic segmentation by expectation-maximization. IEEE Conference on Computer Vison and Pattern Recognition (2005)
12. W Hasinoff, Variable-aperture photography, (PhD Thesis, Department of Computer Science, University of Toronto, 2008)
13. H Seo, P Milanfar, Computational photography using a pair of flash/no-flash images by iterative guided filtering. IEEE International Conference on Computer Vision (ICCV) (2011). Submitted
14. A Buades, B Coll, JM Morel, A review of image denoising algorithms, with a new one. Multi-scale Model Simulat (SIAM). 4(2), 490530 (2005)
15. H Takeda, S Farsiu, P Milanfar, http://www.ncbi.nlm.nih.gov/pubmed/17269630?dopt=Abstract
Web End =Kernel regression for image processing and http://www.ncbi.nlm.nih.gov/pubmed/17269630?dopt=Abstract
Web End =reconstruction. IEEE Trans Image Process. 16(2), 349366 (2007)
16. A Buades, B Coll, JM Morel, Nonlocal image and movie denoising. Int J Comput Vis. 76(2), 123139 (2008). doi:10.1007/s11263-007-0052-1
17. T Hofmann, B Scholkopf, AJ Smola, Kernel methods in machine learning. Ann Stat. 36(3), 11711220 (2008). doi:10.1214/00905360700000067718. P Milanfar, A tour of modern image processing. IEEE Signal Process Mag http://users.soe.ucsc.edu/~milanfar/publications/journal/ModernTour_FinalSubmission.pdf
Web End =http://users.soe.ucsc.edu/~milanfar/publications/journal/ http://users.soe.ucsc.edu/~milanfar/publications/journal/ModernTour_FinalSubmission.pdf
Web End =ModernTour_FinalSubmission.pdf (2011)
zj E[Z]k
var(Z)k + 1
,
(28)
m This is generally defined as the difference between the estimated signal
Z and the measured signal Z, but in our context refers to the detail signal. n We refer the reader to Appendix C for proof of convergence of the proposed iterative estimator. o http://personal.ie.cuhk.edu.hk/~hkm007/
Web End =http://personal.ie.cuhk. http://personal.ie.cuhk.edu.hk/~hkm007/
Web End =edu.hk/~hkm007/ . p We refer the reader to the project Website http://users.soe.ucsc.edu/~milanfar/research/rokaf/.html/IGF/
Web End =http://users.soe.ucsc.edu/~milanfar/research/ http://users.soe.ucsc.edu/~milanfar/research/rokaf/.html/IGF/
Web End =rokaf/.html/IGF/ . q The window size p for Wd and W was set to 21 and 5 respectively for all the denoising examples. r The window size p for Wd and W was set to 41 and 81, respectively, for the deblurring examples to deal with displacement between the flash and no-flash image. s Note that due to the use of residuals, this
Wd (Z) = 1
||2 k:(i,j)k
1 + (zi E[z]k)
zj E[Z]k
var(Z)k + 2
.
Seo and Milanfar EURASIP Journal on Advances in Signal Processing 2012, 2012:3 http://asp.eurasipjournals.com/content/2012/1/3
Page 19 of 19
19. M Protter, M Elad, H Takeda, P Milanfar, http://www.ncbi.nlm.nih.gov/pubmed/19095517?dopt=Abstract
Web End =Generalizing the non-local-means http://www.ncbi.nlm.nih.gov/pubmed/19095517?dopt=Abstract
Web End =to super-resolution reconstruction. IEEE Trans Image Process. 18, 3651 (2009)
20. P Perona, J Malik, Scale-space and edge detection using anistropic diffusion. IEEE Trans Pattern Anal Mach Intell. 12(9), 629639 (1990)
21. JW Tukey, Exploratory Data Analysis (Addison Wesley, Reading, MA, 1977)22. J Kopf, MF Cohen, D Lischinski, M Uyttendaele, Joint bilateral upsampling. ACM Trans Graph. 26(3), 96 (2007). doi:10.1145/1276377.1276497
23. K He, J Sun, X Tang, Fast matting using large kernel matting Laplacian matrices. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)
24. S Fleishman, I Drori, D Cohen-Or, Bilateral mesh denoising. ACM Trans Graph. 22(3), 950953 (2003). doi:10.1145/882262.882368
25. T Jones, F Durand, M Desbrun, Non-iterative feature preserving mesh smoothing. ACM Trans Graph. 22(3), 943949 (2003). doi:10.1145/ 882262.882367
26. Q Yang, S Wang, N Ahuja, Real-time specular highlight removal using bilateral filtering. ECCV (2010)
27. PA Knight, The Sinkhorn-Knopp algorithm: convergence and applications. SIAM J Matrix Anal Appl. 30, 261275 (2008). doi:10.1137/06065962428. R Sinkhorn, A relationship between arbitrary positive matrices and doubly stochastic matrices. Ann Math Stat. 35(2), 876879 (1964). doi:10.1214/ aoms/1177703591
29. J Darroch, D Ratcliff, Generalized iterative scaling for log-linear models. Ann Math Stat. 43, 14701480 (1972). doi:10.1214/aoms/117769237930. A Singer, Y Shkolinsky, B Nadler, Diffusion interpretation of nonlocal neighborhood filters for signal denoising. SIAM J Imaging Sci. 2, 118139 (2009). doi:10.1137/070712146
31. G Cottet, L Germain, Image processing through reaction combined with nonlinear diffusion. Math Comput. 61, 659673 (1993). doi:10.1090/S0025-5718-1993-1195422-2
32. S Osher, M Burger, D Goldfarb, J Xu, W Yin, An iterative regularization method for total variation-based image restoration. Multiscale Model Simulat. 4(2), 460489 (2005). doi:10.1137/040605412
33. P Buhlmann, B Yu, Boosting with the L2 loss: regression and classification. J Am Stat Assoc. 98(462), 324339 (2003). doi:10.1198/016214503000125
doi:10.1186/1687-6180-2012-3Cite this article as: Seo and Milanfar: Robust flash denoising/deblurring by iterative guided filtering. EURASIP Journal on Advances in Signal Processing 2012 2012:3.
Submit your manuscript to a journal and benet from:
7 Convenient online submission7 Rigorous peer review7 Immediate publication on acceptance7 Open access: articles freely available onlihttp://www.springeropen.com/
Web End =ne 7 High visibility within the eld7 Retaining the copyright to your article
Submit your next manuscript at 7 http://www.springeropen.com/
Web End =springeropen.com
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Springer International Publishing AG 2012
Abstract
A practical problem addressed recently in computational photography is that of producing a good picture of a poorly lit scene. The consensus approach for solving this problem involves capturing two images and merging them. In particular, using a flash produces one (typically high signal-to-noise ratio [SNR]) image and turning off the flash produces a second (typically low SNR) image. In this article, we present a novel approach for merging two such images. Our method is a generalization of the guided filter approach of He et al., significantly improving its performance. In particular, we analyze the spectral behavior of the guided filter kernel using a matrix formulation, and introduce a novel iterative application of the guided filter. These iterations consist of two parts: a nonlinear anisotropic diffusion of the noisier image, and a nonlinear reaction-diffusion (residual) iteration of the less noisy one. The results of these two processes are combined in an unsupervised manner. We demonstrate that the proposed approach outperforms state-of-the-art methods for both flash/no-flash denoising, and deblurring.[PUBLICATION ABSTRACT]
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer