Mithun Das Gupta 1 and Shyamsundar Rajaram 1 and Nemanja Petrovic 2 and Thomas S. Huang 1
Recommended by Simon Lucey
1 Beckman Institute, Department of Electrical and Computer Engineering (ECE), University of Illinois at Urbana-Champaign (UIUC), IL 61801, USA
2 Google Inc., NY 10011, USA
Received 29 April 2008; Accepted 24 October 2008
1. Introduction
Restoration is a classic showcase of the ill-posedness of computer vision problems. Given a blurred image, there can be several sharp natural images which, when blurred, generate the same observation. The inherent ambiguities in restoration are usually overcome by different regularization techniques or by Bayesian methods. In several important applications, such as surveillance, tracking, and license plate recognition systems, images are often severely blurred. Moreover, the nature of the blur is usually application specific, hence the need for dedicated restoration systems.
There are numerous methods for inferring the sharp image from the blurred input. A reasonable estimate of the high-resolution image may be obtained if we have a priori knowledge of the blurring kernel. If no additive noise is present, the Wiener filter reduces to the inverse filter and recovers the image exactly; in the noisy case, it gives the minimum mean-squared error (MMSE) solution. Restoration can be made easier by incorporating several images, as in [1]. Further, image restoration can be thought of as a special case of superresolution, and as such image deblurring and superresolution have been treated concurrently by many authors. Superresolution algorithms can be classified into many categories based on different criteria, such as frequency/image domain, single/multiple images, and so on. Earlier works in this field exploited the band-limitedness of images to interpolate subpixel values from a series of aliased images. More recently, image-domain methods have been the principal line of research; among these, two broad classes are iterative methods and learning-based methods. Iterative methods [2-6] mostly use a Bayesian framework, where an initial guess of the high-resolution frame is refined at each iteration. The image prior is usually assumed to be a smoothness prior.
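For reference, a minimal frequency-domain Wiener deconvolution sketch, assuming a known, circularly applied blur kernel and a constant noise-to-signal ratio (illustrative background, not the method proposed in this paper):

```python
import numpy as np

def wiener_deconvolve(blurred, kernel, nsr=0.01):
    """Frequency-domain Wiener deconvolution with a known blur kernel.

    nsr is an (assumed constant) noise-to-signal power ratio; with
    nsr -> 0 this reduces to the inverse filter, which is exact when
    no additive noise is present and H has no zeros.
    """
    H = np.fft.fft2(kernel, s=blurred.shape)   # kernel spectrum, zero-padded
    Y = np.fft.fft2(blurred)
    G = np.conj(H) / (np.abs(H) ** 2 + nsr)    # Wiener transfer function
    return np.real(np.fft.ifft2(G * Y))
```

With a circularly blurred impulse and a tiny `nsr`, the filter recovers the impulse almost exactly.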
However, machine learning and, specifically, probabilistic inference techniques currently seem the most promising line of research. The principal idea of the machine learning approach is to use a set of high-resolution (sharp) images and their corresponding low-resolution (blurred) images to build a compatibility model. The images are stored as patches or as coefficients of other feature representations. Recently, an impressive amount of work has been reported in this field [7-11], to name a few examples. In [11], PCA-based techniques were used to capture the relationship between the high-resolution and low-resolution patches, while nonparametric modeling was used to estimate the missing details. In [9], an example-based learning method was employed for superresolving natural images up to a zoom factor of 8. Along the same lines, Bishop et al. [10] performed video superresolution by considering additional priors that take into account the temporal coherence between successive frames. Machine learning-based restoration methods can be made more powerful and robust if the images are restricted to be of the same type, as in [7], where face images are hallucinated. The spirit of our work is similar to that of Freeman et al. [9], with several important differences.
In our previous work [12, 13] on restoration using MRFs over a patch image model, we introduced the ideas of partial messages and the restoration-recognition loop. Our algorithm was built on the notion of partial message propagation, where any given node (patch) in an MRF is only partially influenced by a neighbor, depending on the spatial orientation of the two nodes. Restoration and recognition were performed in an iterative loop, which resulted in a localization of the search space.
The limitation of our previous work is the need for segmented images before performing restoration and recognition. Further, the recognition was performed as a separate block. In this work, we suggest a unified framework for performing restoration and recognition without prior segmentation. We present a multilayer Markov random field model having two interconnected layers where restoration is performed in the bottom layer and recognition in the top layer. The messages propagated bottom-up help in improving recognition/segmentation and the messages propagated top-down aid in better restoration.
This paper is organized as follows. In Section 2, we present the image model, define the problem statement, and introduce the notation used in the rest of the paper. In Section 3, we describe the features used for learning the potential functions of the multilayer MRF structure, and briefly review the belief propagation algorithm and its nonparametric extension. In Section 4, we illustrate the use of the multilayer MRF structure for two different applications and explain how recognition is performed in each. We propose an analogous model in the frequency domain and show improved handling of compression artifacts in images and videos in Section 5. Finally, we present experiments and results in Section 6 and conclude in Section 7.
2. Model
We propose a multilayer architecture to perform restoration and recognition of blurred images (Figure 1). The lower layer (restoration layer) is an undirected graphical model over image patches with compatibility functions represented as nonparametric kernel densities. The top layer (recognition layer) is an undirected graphical model in which each node represents a multinomial distribution over the 10 hypothesized digits present in the neighboring patch nodes of the lower layer. The compatibility functions in the restoration layer are learned using nonparametric kernel density estimation techniques (Section 3). We use an extended version of the nonparametric belief propagation algorithm [14, 15] in the restoration layer, and the belief propagation algorithm [16] in the recognition layer.
Figure 1: Multilayer MRF model. The restoration layer consists of nonoverlapping hidden image patches, indicated by dark gray circles; each node has a local observation, indicated by empty boxes. The recognition layer consists of class membership nodes, denoted by light gray boxes.
[figure omitted; refer to PDF]
2.1. Problem Statement and Notation
Consider a training set of triplets {(X1, Y1, W1), (X2, Y2, W2), ..., (Xn, Yn, Wn)}, where Xi is a sharp image of one particular digit, Yi is its blurred version, Wi ∈ {1, ..., K} indicates the class of the ith training image, and K denotes the number of classes. Let f be an unknown blurring function such that Yi = f(Xi). The objective is to learn a model that can infer the sharp image X (which contains more than one digit) and the classes W1, ..., Wm of all m objects (digits) in it, from a blurred input image Y whose digit fonts are not present in the training set.
We model the image X as an undirected graphical model, or more specifically a Markov random field (MRF) [9]. This MRF is referred to as the restoration field and forms the bottom layer in Figure 1. The restoration field is defined by the bottom graph Gb = {Vb, Eb}, where each node represents a random variable xib, ib ∈ [1, ..., N] (with N the total number of nodes in the restoration layer), corresponding to a patch in the unknown sharp image. Each hidden node is associated with an observation node yib representing the corresponding patch in the observed image. An edge between nodes xib and xjb indicates that they are spatial neighbors.
We model the class membership of the m objects W1, ..., Wm as another MRF, defined by the top graph Gt = {Vt, Et}. We refer to this MRF as the recognition field; it is shown as the top layer in Figure 1. Each node in this MRF represents a random variable cit, it ∈ [1, ..., L], where cit ∈ {1, ..., K} and L is the total number of nodes in the recognition layer. We note that Gt and Gb denote the graph structures of the top and bottom layers, respectively, and do not include the interlayer connectivity information. Further, for every node i in our two-layer graphical model, we denote its neighborhood in the restoration layer by Ebi and its neighborhood in the recognition layer by Eti. The model is illustrated in Figure 2.
Figure 2: The neighborhood model. The node i is the black node. The dark gray nodes are the neighbors which belong to the top layer defined by Eti and the light gray nodes are the neighbors in the bottom layer defined by Ebi .
[figure omitted; refer to PDF]
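The neighborhood bookkeeping above can be sketched as follows (the grid dimensions and the strip-shaped recognition regions are illustrative assumptions, not the paper's exact layout):

```python
def build_two_layer_graph(rows, cols, patches_per_node=3):
    """Adjacency for the two-layer MRF: a rows x cols grid of restoration
    nodes (4-connected), recognition nodes covering vertical strips of
    `patches_per_node` columns each, and the interlayer links."""
    E_b = {}   # restoration-layer neighborhoods Ebi
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            E_b[i] = [(r + dr) * cols + (c + dc)
                      for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                      if 0 <= r + dr < rows and 0 <= c + dc < cols]
    n_rec = (cols + patches_per_node - 1) // patches_per_node
    # recognition-layer neighborhoods Eti: a chain over the strips
    E_t = {t: [u for u in (t - 1, t + 1) if 0 <= u < n_rec]
           for t in range(n_rec)}
    # interlayer links El: recognition node -> its restoration patches
    E_l = {t: [r * cols + c for r in range(rows)
               for c in range(t * patches_per_node,
                              min((t + 1) * patches_per_node, cols))]
           for t in range(n_rec)}
    return E_b, E_t, E_l
```

For a 3 x 6 patch grid with 3 columns per recognition node, this yields 18 restoration nodes and 2 recognition nodes, each linked to a 3 x 3 strip of patches.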
The potential functions modeling the various interactions are summarized in Table 1.
Table 1
Interacting nodes | Potential | Notation |
(xib ,xjb ) | Patch interaction | ψ(xib ,xjb ) |
(xib ,yib ) | Association | φ(xib ,yib ) |
(xib ,cjt ) | Classification | ϕ(xib ,cjt ) |
(cit ,cjt ) | Class interaction | θ(cit ,cjt ) |
The association and patch interaction potentials can be modeled as continuous parametric distributions that are tractable for the belief propagation algorithm [16] (mixtures of Gaussians, etc.). However, as noted in our previous work [17], parametric approaches introduce averaging effects that run against the spirit of the restoration problem. Hence, we use nonparametric kernel density estimation techniques for learning the association, patch interaction, and classification potentials. We elaborate on the learning of these potentials in Section 3.
During the learning phase, the potentials φ, ψ, θ, ϕ are learned from the training data. The inference phase involves computing the marginals of the posterior distribution: p(xib | Y) for all nodes ib ∈ Vb, and p(cjt | Y) for all nodes jt ∈ Vt. In Section 3.1, we discuss the application of the belief propagation algorithm for approximate inference in the multilayer MRF model.
3. Learning Potentials
We model the association potential φ(xib, yib) as a function over the vectorized representations of the patches xib and yib, of the form φ(xib, yib) = (1/M) ∑_{m=1}^{M} 𝒩([xib, yib]ᵀ; μm, Λ), where M is the number of components and 𝒩([xib, yib]ᵀ; μ, Λ) is the multivariate normal distribution with mean μ and covariance Λ over the random vector [xib, yib]ᵀ. From the training set, the patch association vectors corresponding to the image and its blurred version are constructed and pruned to avoid redundancy. The potential is constructed by placing a Gaussian kernel at each patch association vector; the covariances are chosen using the leave-one-out cross-validation technique [18].
The patch interaction potential ψ(xib, xjb) is a function over the vectorized two-pixel-thick nonoverlapping patch boundary and is learned using the above-mentioned nonparametric estimation technique. The classification potential ϕ(xib, cjt) is split into a conditional term ϕ(xib | cjt) and a marginal term ς(cjt). The conditional term is represented nonparametrically as a function over vectorized sharp patches obtained from images belonging to class cjt, and the marginal term ς(cjt) is set to 1/K. The class interaction potential θ(cit, cjt) is modeled as a probability table given by [figure omitted; refer to PDF] where K is the number of classes; we set q = 0.7 in our experiments.
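The class interaction table (whose exact entries appear in the omitted equation) plausibly has a Potts-like form, with mass q on the diagonal and the remaining mass spread uniformly off the diagonal; a sketch under that assumption:

```python
import numpy as np

def class_interaction(K, q=0.7):
    """K x K class interaction table: probability q that neighboring
    recognition nodes share a class, (1 - q)/(K - 1) otherwise.
    The uniform off-diagonal mass is our assumed reading of the
    omitted probability table."""
    theta = np.full((K, K), (1.0 - q) / (K - 1))
    np.fill_diagonal(theta, q)
    return theta
```

Each row sums to 1, so the table is a valid conditional distribution over the neighbor's class.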
3.1. Belief Propagation
For acyclic graphs, the marginal distributions can be computed efficiently and exactly by a local message-passing algorithm known as belief propagation (BP) [16]. For graphs with cycles, BP is not exact: the iterative ("loopy") version produces beliefs that need not converge to the true marginals. However, it has been shown empirically that loopy BP produces excellent results on several hard problems. Yedidia et al. [19] established the link between the fixed points of the belief propagation algorithm and the stationary points of the "variational free energy" defined on the graphical model, an important result that sheds light on the properties of the loopy BP approximation.
In our multilayer MRF model, BP is first performed in the restoration layer, followed by BP in the recognition layer. There are four types of messages: messages passed between neighboring nodes (ib, jb) ∈ Eb in the restoration layer, messages passed between neighboring nodes (it, jt) ∈ Et in the recognition layer, and the bidirectional messages passed between neighboring nodes (ib, jt) ∈ El across the interface between the recognition and restoration layers.
Recognition-to-Restoration Layer Messages
The message propagated from a node in the recognition layer cit to a node in the restoration layer xjb is given by [figure omitted; refer to PDF]
Intrarestoration Layer Messages
The message propagated from node ib to node jb in the restoration layer during the nth iteration, represented as mib,jb n (xjb), is given by [figure omitted; refer to PDF]
Restoration-to-Recognition Layer Messages
The message propagated from node ib to jt , mib ,jt (cjt ) is [figure omitted; refer to PDF]
Intrarecognition Layer Messages
The message propagated from node it to node jt during the nth iteration, represented as mit,jt n (cjt), is given by [figure omitted; refer to PDF]
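The intrarecognition layer update is a standard discrete sum-product message; a sketch (the containers and the normalization step are illustrative implementation choices):

```python
import numpy as np

def bp_message(theta, local, incoming, exclude):
    """Sum-product message from node i to node j over discrete classes.

    theta[a, b] is the class interaction potential theta(c_i = a, c_j = b),
    `local` is the local evidence at node i, and `incoming` maps each
    neighbor index to its message vector into node i. The message from
    the receiver `exclude` is left out, then we marginalize over c_i."""
    prod = local.copy()
    for k, msg in incoming.items():
        if k != exclude:
            prod *= msg
    m = theta.T @ prod          # sum over c_i of theta(c_i, c_j) * prod(c_i)
    return m / m.sum()          # normalize for numerical stability
```

With certain local evidence for class 0 and no other neighbors, the outgoing message is simply the corresponding row of theta.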
3.2. Nonparametric Belief Propagation (NBP)
We note that the messages computed using (4) are mixtures of Gaussians: computing a message mib,jb n (xjb) involves the product of the interaction potential ψ(xib, xjb), the association potential φ(xib, yib), the messages mkt,ib n-1 (xib) from the neighbors kt ∈ Eti in the recognition layer, and the messages mhb,ib n-1 (xib) from the neighbors hb ∈ Ebi, hb ≠ jb, in the restoration layer, where each term is itself a mixture of Gaussians. To evaluate (4), the mixture components in the potentials and messages would have to be pruned drastically so that the number of components in the product stays within tractable limits for solving the integral. Such an approximation is unsuitable for the restoration problem, so we instead use the nonparametric extension proposed by Sudderth et al. [15] and independently by Isard [14].
The interaction potential can be decomposed into a marginal influence term ξ(xib) := ∫ψ(xib, xjb)dxjb and an interaction term ψ(xjb | xib). The message update equation (4) can then be solved in two phases. The first phase computes the product πib,jb n (xib): [figure omitted; refer to PDF] Each term in the product πib,jb n (xib) is a mixture of Gaussians; if each term has M mixture components and there are L terms, the product is a mixture of M^L Gaussians. Exact computation of the product is possible but infeasible because of the O(M^L) cost. Pruning the mixture components restricts the number of computations, but turns out to be too coarse an approximation for the restoration problem. Sequential Gibbs sampling [20] and importance weighting were used in [14, 15] to generate M asymptotically unbiased samples without explicitly computing the product. In this work, we use alternating Gibbs sampling (parallel sampling) [21] to obtain asymptotically unbiased samples xib 1, xib 2, ..., xib M from the product πib,jb n (xib). The second phase integrates the product πib,jb n (xib) against the interaction term. Sudderth et al. [15] and Isard [14] proposed Gibbs sampling for the first phase, handled the second phase using stochastic integration, and represented the messages nonparametrically with a kernel density estimate [figure omitted; refer to PDF] where wjb m, μjb m, Λjb m are the weight, mean, and covariance of the mth kernel. The message update is performed by stochastic integration: every sample xib m is propagated to node jb by sampling xjb m from the interaction potential ψ(xib, xjb). Nonparametric density estimation then yields a kernel density estimate (8) for the message mib,jb n (xjb), with the propagated samples as kernel means. Covariances are chosen to be diagonal and identical and are obtained using leave-one-out cross-validation [18].
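The elementary operation inside these message products is the product of Gaussian components: precisions add and the new mean is the precision-weighted combination of the means. A sketch of the two-component case (dimensions illustrative):

```python
import numpy as np

def gaussian_product(mu1, L1, mu2, L2):
    """Mean and covariance of the (unnormalized) product of two Gaussians
    N(x; mu1, L1) * N(x; mu2, L2): the product is again Gaussian, with
    precision matrices adding and a precision-weighted mean."""
    P1, P2 = np.linalg.inv(L1), np.linalg.inv(L2)
    L = np.linalg.inv(P1 + P2)          # combined covariance
    mu = L @ (P1 @ mu1 + P2 @ mu2)      # precision-weighted mean
    return mu, L
```

Repeating this pairwise over M-component mixtures is what yields the M^L blow-up that the Gibbs sampling above sidesteps.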
3.3. Data-Driven Local Retraining
Nonparametric belief propagation can be used to approximately evaluate (4); however, the first two terms in (4), ψ(xib, xjb) and φ(xib, yib), are nonparametric densities, each with more than 10^6 components. Computing mib,jb n (xjb) directly is computationally infeasible, so we need a technique for pruning the nonparametric densities to a computationally reasonable and meaningful size. We adopt a data-driven retraining step in which the potentials ψ(xib, xjb) and φ(xib, yib) are relearned from a subset sampled from the training set based on the posterior marginal p(ckt | Y) at the recognition layer node ckt, where kt is the common neighbor of ib and jb in the recognition layer.
3.4. Multilayer Message Passing
Belief propagation for our two-layer model is given by the following schedule.
(1) Intrarestoration layer messages given by (4) are computed using nonparametric belief propagation and the data-driven local retraining algorithm.
(2) Restoration-to-recognition layer message is approximately evaluated at the maximum posterior marginal estimate argmaxxib p(xib |Y) owing to computational difficulties involved in evaluating the exact integral.
(3) Intrarecognition layer messages are evaluated using (6).
(4) Recognition-to-restoration layer messages are computed using (3) and the resulting nonparametric message is pruned based on the weight of the components.
The above steps are performed iteratively until convergence.
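The schedule can be sketched as a fixed-point loop (the four update methods are illustrative placeholders for the message computations above; each is assumed to return the largest change in its messages):

```python
def multilayer_bp(model, max_iters=10, tol=1e-4):
    """Message-passing schedule for the two-layer model.

    Runs the four message phases in order until the largest message
    change in a full sweep falls below `tol`. The method names on
    `model` are illustrative placeholders, not the paper's API."""
    for _ in range(max_iters):
        delta = 0.0
        delta = max(delta, model.update_restoration_messages())        # step 1: NBP + retraining
        delta = max(delta, model.update_restoration_to_recognition())  # step 2: MPM approximation
        delta = max(delta, model.update_recognition_messages())        # step 3: discrete BP
        delta = max(delta, model.update_recognition_to_restoration())  # step 4: pruned messages
        if delta < tol:   # converged: messages stopped changing
            break
    return model
```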
3.5. Restoration and Recognition
After convergence of the message passing algorithm, the marginal distributions p(xib |Y) and p(cit |Y) can be evaluated using [figure omitted; refer to PDF] The restored image is obtained by estimating the maximum posterior marginal (MPM) estimate argmaxxib p(xib |Y) at all nodes ib in the restoration layer and similarly recognition results are obtained by evaluating the MPM estimate argmaxcit p(cit |Y) at all nodes it in the recognition layer.
4. Recognition Layer
The recognition layer is incorporated into the model to aid the restoration layer. The basic need for this layer arises from the fact that the training dataset for any practical application is extremely large, and we need an efficient way of subdividing this set when searching through the space of training examples.
One important feature of this kind of model is that it does not require the nodes or their interactions to be modeled in any particular form. We investigate two different applications, with two different representations of the recognition layer and of the interactions within it.
4.1. License Plate Recognition
For this class of applications, each recognition layer node is represented by a multinomial distribution over the 10 digits. In essence, this layer captures the digit class to which the restoration nodes (the xi's) belong. For example, consider restoring an image of the number 39, and assume there are 3 rows of x nodes under each recognition layer node. After a few iterations of our NBP algorithm for the multilayer MRF, the nodes in the two layers converge to the states shown in Figure 3.
Figure 3: The restoration nodes x converge to the digits. The recognition layer multinomials peak at the digit locations.
[figure omitted; refer to PDF]
4.2. Face Image Restoration
We note that face images have a spatial coherence which can be exploited for recognition. If the face is divided into 4 quadrants with the nose tip as the center, then we can expect the eyes to lie in the 1st and 2nd quadrants and the cheeks in the other two. This information can be incorporated into our restoration algorithm: while restoring the eye regions, we search for patches only in the first two quadrants (Figure 4).
Figure 4: The division of a face into regions for better restoration. Overlapping nodes ensure better interaction for the boundary patches.
(a) [figure omitted; refer to PDF]
(b) [figure omitted; refer to PDF]
The recognition layer in this model is simply the index of the grid structure imposed over the face image. The interactions between the nodes in this layer are captured by the overlapping structure of the grids.
5. Compression Artifacts Removal by Frequency Domain MRF Model
We propose a new, fast, progressive approach based on a transform-domain MRF model. The L×L DCT is interpreted as an L^2-subband transform, and the transform coefficients at the same subband are modeled as an image. This decomposition is similar to the wavelet transform techniques proposed in [29]. We pose the problem of finding the optimal coefficients as an inference problem in Markov random fields (MRFs). Example-based inference techniques have been used in [9, 30] for image superresolution. In a similar vein, we learn the mapping between uncompressed and compressed images from many training patches; the novelty is that the patches are no longer image patches but contiguous blocks of DCT coefficients which form the subband image for a particular frequency. We learn the potential functions for each subband image separately, and inference is carried out for each subband image independently of the others. An attractive feature of this technique (originally introduced by Li and Delp [28]) is that not all the subband coefficient images need to be inferred.
5.1. Block DCT Basics
Classical BDCT coding techniques divide the input image of size M×N into blocks of size L×L, and the 2-D DCT is then computed for each block. The 2-D DCT coefficients of the (i,j)th block B(i,j) are given by B(i,j)(u,v) = α(u)α(v) ∑_{m=0}^{L-1} ∑_{n=0}^{L-1} f(i,j)(m,n) cos((2m+1)uπ/2L) cos((2n+1)vπ/2L), where f(i,j)(m,n) is the pixel value in B(i,j), u,v = 0,...,L-1, i = 0,...,M/L-1, j = 0,...,N/L-1, and α(u) = √(1/L) for u = 0 and α(u) = √(2/L) for u > 0.
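The blockwise transform can be sketched with SciPy's orthonormal DCT-II, which matches the normalization above (block size and input here are illustrative):

```python
import numpy as np
from scipy.fftpack import dct

def block_dct(image, L=8):
    """2-D DCT (type II, orthonormal) of each nonoverlapping L x L block.
    Assumes the image dimensions are multiples of L."""
    M, N = image.shape
    out = np.empty_like(image, dtype=float)
    for i in range(0, M, L):
        for j in range(0, N, L):
            blk = image[i:i + L, j:j + L]
            # separable 2-D DCT: transform columns, then rows
            out[i:i + L, j:j + L] = dct(dct(blk, norm='ortho', axis=0),
                                        norm='ortho', axis=1)
    return out
```

For a constant 8 x 8 block of ones, all energy lands in the DC coefficient, which equals L with this normalization.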
After the DCT transform, the coefficients are quantized independently in each block (for simplicity of analysis, we assume no intraprediction of DCT coefficients across blocks). The quantized coefficients are given by B̃(u,v) = round(B(u,v)/(γQ(u,v))), where Q(u,v) is a quantization table and γ is a quantization parameter which controls the overall quality level.
"Dequantization" and the inverse DCT (IDCT) are determined by [figure omitted; refer to PDF] [figure omitted; refer to PDF]
Since quantization is not a lossless operation, it can be shown that the true DCT coefficients can lie anywhere inside the quantization constraint set (QCS) rather than at the dequantized value. Values lying outside the QCS are usually clipped. The independent reconstruction of the DCT coefficients of each block leads to discontinuities along the block boundaries, which causes the blocky artifacts. It has been recognized that DCT coefficients at the same frequency are highly correlated from block to block. On the other hand, DCT coefficients at different frequencies are nearly uncorrelated, which can be attributed to the suboptimality of the DCT in energy compaction and decorrelation. As a result, the DCT coefficients at the same frequency in different blocks can be regrouped into one coherent group. We define each group as F(u,v), where (u,v) is the frequency or order of the coefficients. F(u,v) is then considered a subband image of size (M/L)×(N/L). A 2×2 BDCT-coded image and its subband images are shown in Figure 5 (clipping is applied to ensure proper display).
Figure 5: Subband representation of the 2×2 block DCT coefficients.
(a) [figure omitted; refer to PDF]
(b) [figure omitted; refer to PDF]
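The regrouping of same-frequency coefficients into subband images can be sketched as a pure reshape (block size illustrative):

```python
import numpy as np

def to_subbands(coeffs, L=2):
    """Regroup block DCT coefficients: F[u, v] is the (M/L) x (N/L)
    subband image collecting the (u, v)-th coefficient of every block.
    Assumes M and N are multiples of L."""
    M, N = coeffs.shape
    # axes: (block row, u, block col, v)
    blocks = coeffs.reshape(M // L, L, N // L, L)
    # reorder to (u, v, block row, block col)
    return blocks.transpose(1, 3, 0, 2)
```

Indexing `F[u, v]` then yields one subband image per coefficient order, ready for independent per-subband inference.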
6. Experiments and Results
In this section, we will provide results for the different methods we have illustrated throughout this paper. The images are always represented as patches of size 4×4 . Our training set consists of face images as well as license plate images. The low-resolution training dataset consists of gray scale images obtained by convolving Laplacian kernels of different scales with the high-resolution images. For generating the training data for the digit images, we collected blurred as well as sharp images from 20 different font families. For the face images, we use a proprietary face database which has 5 different poses for the same person. The database consists of aligned faces and hence any test face which is not in the training set needs to be aligned with respect to the training images.
The first set of experiments relates to the multilayer MRF structure elaborated in Section 2. We present results for both applications mentioned above, namely, license plate image restoration and face image restoration. For the digit case, the results are shown in Figure 7. We test the recognition accuracy and confidence of the proposed alternating restoration and recognition algorithm. As mentioned earlier, the potentials during the first iteration are learned using random images from the training set; these potentials are then refined after the recognition algorithm completes, using the confidence scores to obtain representative samples. For this experiment, the training set consists of 200 synthetic images and the test set of 200 real images of blurred license plates. We report recognition accuracy and the improvement in confidence scores, before and after restoration, after 5 runs of the restoration and recognition loop. The recognition rate improved significantly, from 40% before restoration to 92% after restoration. In Figure 6, we present the average confidence scores for the true digit class before and after restoration. There is a clear improvement in the confidence score for most digits ("0", "3", "4", "5", "6", "8"); for some ("1", "7"), the gain is not significant because the confidence scores are already high. For the experiments with face images, the results are shown in Figure 8.
Figure 6: Confidence scores versus true digit class, before and after restoration.
[figure omitted; refer to PDF]
Figure 7: Recognition and restoration results. (Top left) original license plate; (top right, top to bottom) blurred input, deblurred using deconvolution methods, result of our algorithm, actual sharp image. (Bottom, left to right) multinomial distributions (X-axis: digit class, Y-axis: confidence score) over the nodes in the recognition layer. The top row is the result after 3 iterations and the bottom row is the result after 6 iterations.
[figure omitted; refer to PDF]
[figure omitted; refer to PDF]
[figure omitted; refer to PDF]
[figure omitted; refer to PDF]
[figure omitted; refer to PDF]
[figure omitted; refer to PDF]
[figure omitted; refer to PDF]
[figure omitted; refer to PDF]
[figure omitted; refer to PDF]
Figure 8: Multilayer MRF-based face superresolution. (Left) blurred input; (center) deblurred using deconvolution methods; (right) our method.
(a) [figure omitted; refer to PDF]
(b) [figure omitted; refer to PDF]
(c) [figure omitted; refer to PDF]
We demonstrate the superiority of the BP technique for inferring the DCT coefficients over other existing methods proposed in recent years. We have tried different block sizes, resulting in different numbers of subband images; the overall performance depends much more on the compression bit rate than on the block size. Also, for most blocks, only around 60% of the subband images are modified, thereby saving runtime. The training data for these experiments were generated from the FERET face recognition dataset [31] and the CMU PIE dataset [32]. The experiments mainly target face images, but the technique can easily be extended to other domains, as we show in the final set of experiments. We used 200 images for training. For color images, we process the different channels separately. Each training image is compressed at 5 different compression rates, resulting in a total of 1000 training pairs of images.
The experiments are divided into two categories. First, we try our method on the Lena image because of the wealth of published results for it. We report an improvement over the method proposed by Zou and Yan [22], which itself reported improvements over most other existing methods. The comparative results are shown in Figure 9. Note that the compared results have been taken from published images and have not been regenerated. The eye region in our result is sharper than in the result obtained by Zou and Yan [22]. Though PSNR may not be the best evaluation metric, it has been widely used to claim improvement, so we report the PSNR values obtained for the Lena image in Figure 10.
Figure 9: Comparative results. (Top row from left to right) LENA, a part of JPEG-coded LENA at 0.188 bpp, result of the proposed method, and Zou and Yan result [22]. (Bottom row) Yang and Galatsanos result [23], Paek et al. result [24], Park and Kim result [25], Jeong et al. result [26], and MPEG-4 result [27].
[figure omitted; refer to PDF]
Figure 10: PSNR values for the methods mentioned in Figure 9.
[figure omitted; refer to PDF]
In the second set of experiments, we test on frames from a real video for which no ground truth was available. The closest existing method to the one proposed in this work is that of Li and Delp [28]. The results shown in Figure 11 were produced by Zhen Li using his copyrighted software. The result for a frame of the same video using the method proposed in this paper is shown in Figure 12. The face looks much sharper with our method, and the reconstructed images show much better quality near the eye and teeth areas. Figure 13 shows face results generated from images compressed at higher bit rates.
Figure 11: (Top row, left) compressed image; (right) reconstruction using the Li and Delp [28] method (8×8 BDCT). (Bottom row, left) compressed image; (right) reconstruction using the Li and Delp method (2×2 BDCT).
(a) [figure omitted; refer to PDF]
(b) [figure omitted; refer to PDF]
(c) [figure omitted; refer to PDF]
(d) [figure omitted; refer to PDF]
Figure 12: (Top) compressed image; (bottom) reconstruction using our method (2×2 BDCT).
(a) [figure omitted; refer to PDF]
(b) [figure omitted; refer to PDF]
(c) [figure omitted; refer to PDF]
(d) [figure omitted; refer to PDF]
Figure 13: (Left) compressed image; (right) reconstruction using our method (2×2 BDCT).
(a) [figure omitted; refer to PDF]
(b) [figure omitted; refer to PDF]
(c) [figure omitted; refer to PDF]
(d) [figure omitted; refer to PDF]
(e) [figure omitted; refer to PDF]
(f) [figure omitted; refer to PDF]
We also compare our method to that of Yang and Galatsanos [23] on non-face experiments, as shown in Figure 14. The training data used for these experiments were still based on face images. Yang and Galatsanos [23] used explicit horizontal, vertical, and diagonal smoothness constraints, which are expensive to compute; we instead try to capture all this information in the potential functions of the MRF.
Figure 14: (Left) compressed image; (center) reconstruction using our method (2×2 BDCT); (right) reconstruction using the Yang and Galatsanos [23] method.
(a) [figure omitted; refer to PDF]
(b) [figure omitted; refer to PDF]
(c) [figure omitted; refer to PDF]
7. Conclusion and Future Directions
In this paper, we have addressed the problem of image restoration through multiple methods. We model images as MRFs over a patch-based representation. The two spatial-domain methods support the idea that high-level concepts like recognition can be used to aid low-level operations like restoration. We also introduce a transform-domain method analogous to the spatial-domain patch-based MRF and apply it to removing compression artifacts from images and videos.
[1] B. Bascle, A. Blake, A. Zisserman, "Motion deblurring and super-resolution from an image sequence," in Proceedings of the 4th European Conference on Computer Vision (ECCV '96), pp. 573-581, Cambridge, UK, April 1996.
[2] M. Irani, S. Peleg, "Improving resolution by image registration," CVGIP: Graphical Models and Image Processing , vol. 53, no. 3, pp. 231-239, 1991.
[3] S. P. Kim, N. K. Bose, H. M. Valenzuela, "Recursive reconstruction of high resolution image from noisy undersampled multiframes," IEEE Transactions on Acoustics, Speech, and Signal Processing , vol. 38, no. 6, pp. 1013-1027, 1990.
[4] R. R. Schultz, R. L. Stevenson, "Extraction of high-resolution frames from video sequences," IEEE Transactions on Image Processing , vol. 5, no. 6, pp. 996-1011, 1996.
[5] H. Stark, P. Oskoui, "High-resolution image recovery from image-plane arrays, using convex projections," Journal of the Optical Society of America A , vol. 6, no. 11, pp. 1715-1726, 1989.
[6] R. C. Hardie, K. J. Barnard, E. E. Armstrong, "Joint MAP registration and high-resolution image estimation using a sequence of undersampled images," IEEE Transactions on Image Processing , vol. 6, no. 12, pp. 1621-1633, 1997.
[7] S. Baker, T. Kanade, "Limits on super-resolution and how to break them," IEEE Transactions on Pattern Analysis and Machine Intelligence , vol. 24, no. 9, pp. 1167-1183, 2002.
[8] D. Capel, A. Zisserman, "Super-resolution from multiple views using learnt image models," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), vol. 2, pp. 627-634, Kauai, Hawaii, USA, December 2001.
[9] W. T. Freeman, E. C. Pasztor, O. T. Carmichael, "Learning low-level vision," International Journal of Computer Vision , vol. 40, no. 1, pp. 25-47, 2000.
[10] C. M. Bishop, A. Blake, B. Marthi, "Super-resolution enhancement of video," in Proceedings of the 9th International Workshop on Artificial Intelligence and Statistics (AI&Stats '03), Key West, Fla, USA, January 2003.
[11] C. Liu, H.-Y. Shum, C.-S. Zhang, "A two-step approach to hallucinating faces: global parametric model and local nonparametric model," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), vol. 1, pp. 192-198, Kauai, Hawaii, USA, December 2001.
[12] M. Das Gupta, S. Rajaram, N. Petrovic, T. S. Huang, "Restoration and recognition in a loop," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 638-644, San Diego, Calif, USA, June 2005.
[13] S. Rajaram, M. Das Gupta, N. Petrovic, T. S. Huang, "Learning-based nonparametric image super-resolution," EURASIP Journal on Applied Signal Processing , vol. 2006, 2006.
[14] M. Isard, "PAMPAS: real-valued graphical models for computer vision," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '03), vol. 1, pp. 613-620, Madison, Wis, USA, June 2003.
[15] E. B. Sudderth, A. T. Ihler, W. T. Freeman, A. S. Willsky, "Nonparametric belief propagation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '03), vol. 1, pp. 605-612, Madison, Wis, USA, June 2003.
[16] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference , Morgan Kaufmann, San Francisco, Calif, USA, 1988.
[17] "The frobnicatable foo filter," ECCV '06 submission ID 324. Supplied as additional material eccv06.pdf, 2006
[18] B. W. Silverman, Density Estimation for Statistics and Data Analysis , CRC Press, Boca Raton, Fla, USA, 1986.
[19] J. S. Yedidia, W. T. Freeman, Y. Weiss, "Understanding belief propagation and its generalizations," Exploring Artificial Intelligence in the New Millennium , pp. 239-269, Morgan Kaufmann, San Francisco, Calif, USA, 2003.
[20] S. Geman, D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Transactions on Pattern Analysis and Machine Intelligence , vol. 6, no. 6, pp. 721-741, 1984.
[21] G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation , vol. 14, no. 8, pp. 1771-1800, 2002.
[22] J. J. Zou, H. Yan, "A deblocking method for BDCT compressed images based on adaptive projections," IEEE Transactions on Circuits and Systems for Video Technology , vol. 15, no. 3, pp. 430-435, 2005.
[23] Y. Yang, N. P. Galatsanos, "Removal of compression artifacts using projections onto convex sets and line process modeling," IEEE Transactions on Image Processing , vol. 6, no. 10, pp. 1345-1357, 1997.
[24] H. Paek, R.-C. Kim, S.-U. Lee, "On the POCS-based postprocessing technique to reduce the blocking artifacts in transform coded images," IEEE Transactions on Circuits and Systems for Video Technology , vol. 8, no. 3, pp. 358-367, 1998.
[25] S. H. Park, D. S. Kim, "Theory of projection onto the narrow quantization constraint set and its application," IEEE Transactions on Image Processing , vol. 8, no. 10, pp. 1361-1373, 1999.
[26] Y. Jeong, I. Kim, H. Kang, "A practical projection-based postprocessing of block-coded images with fast convergence rate," IEEE Transactions on Circuits and Systems for Video Technology , vol. 10, no. 4, pp. 617-623, 2000.
[27] "MPEG-4 video verification model version 18.0," ISO/IEC JTC1/SC29/WG11, 2001.
[28] Z. Li, E. J. Delp, "Block artifact reduction using a transform-domain Markov random field model," IEEE Transactions on Circuits and Systems for Video Technology , vol. 15, no. 12, pp. 1583-1593, 2005.
[29] Z. Xiong, M. T. Orchard, Y.-Q. Zhang, "A deblocking algorithm for JPEG compressed images using overcomplete wavelet representations," IEEE Transactions on Circuits and Systems for Video Technology , vol. 7, no. 2, pp. 433-437, 1997.
[30] T. A. Stephenson, T. Chen, "Adaptive Markov random fields for example-based super-resolution of faces," EURASIP Journal on Applied Signal Processing , vol. 2006, 2006.
[31] P. J. Phillips, H. Moon, S. A. Rizvi, P. J. Rauss, "The FERET evaluation methodology for face-recognition algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence , vol. 22, no. 10, pp. 1090-1104, 2000.
[32] T. Sim, M. Bsat, "The CMU pose, illumination, and expression database," IEEE Transactions on Pattern Analysis and Machine Intelligence , vol. 25, no. 12, pp. 1615-1618, 2003.
Copyright © 2009 Mithun Das Gupta et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
We present a supervised learning approach for object-category specific restoration, recognition, and segmentation of images which are blurred using an unknown kernel. The novelty of this work is a multilayer graphical model which unifies the low-level vision task of restoration and the high-level vision task of recognition in a cooperative framework. The graphical model is an interconnected two-layer Markov random field. The restoration layer accounts for the compatibility between sharp and blurred images and models the association between adjacent patches in the sharp image. The recognition layer encodes the entity class and its location in the underlying scene. The potentials are represented using nonparametric kernel densities and are learnt from training data. Inference is performed using nonparametric belief propagation. Experiments demonstrate the effectiveness of our model for the restoration and recognition of blurred license plates as well as face images.
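For intuition only, the message-passing inference described above can be sketched in a simplified discrete form; the paper itself uses nonparametric belief propagation over continuous kernel densities, and the state space and compatibility values here are purely illustrative:

```python
import numpy as np

def bp_message(phi_i, psi_ij, incoming):
    """One discrete belief-propagation message from node i to node j.

    phi_i    : (K,) local evidence at node i (e.g., blurred-patch likelihood)
    psi_ij   : (K, K) pairwise compatibility between candidate patches
    incoming : list of (K,) messages into i from neighbors other than j
    Returns the normalized (K,) message m_{i->j}.
    """
    prod = phi_i.copy()
    for m in incoming:
        prod *= m
    msg = psi_ij.T @ prod          # marginalize over the states of node i
    return msg / msg.sum()

def belief(phi_i, messages):
    """Approximate marginal at node i: evidence times incoming messages."""
    b = phi_i.copy()
    for m in messages:
        b *= m
    return b / b.sum()

# Toy example with K = 3 candidate sharp patches per node.
phi = np.array([0.7, 0.2, 0.1])           # evidence favors candidate 0
psi = np.eye(3) * 0.8 + 0.1               # compatibility favors agreement
m = bp_message(phi, psi, incoming=[])
print(np.argmax(m))  # 0: the neighbor is pushed toward the same candidate
```

In the nonparametric setting, each message is a kernel density over continuous patch values rather than a length-K vector, but the product-and-marginalize structure is the same.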