1. Introduction
In recent years, smartphones have become an integral part of our lives, serving crucial roles as communication devices, media consumption tools, and gateways to the virtual world. The pursuit of larger and more immersive displays has catalyzed significant advancements in smartphone design. For instance, bezel sizes have decreased, and innovative solutions, such as notch and punch-hole designs, have been developed to accommodate front-facing cameras. Among these advancements, the under-display camera (UDC) system [1,2] stands out as a groundbreaking technology aimed at achieving full-screen displays. This is accomplished by reducing pixel density to enhance light transmittance and integrating the front-facing camera beneath the display panel, resulting in an aesthetically pleasing effect in which the camera is virtually invisible to the naked eye.
However, UDC systems face challenges related to poor light transmittance and diffraction, leading to degraded images with undesirable effects, such as noise, ghost, haze, fog, and blur [3,4,5], as illustrated in Figure 1. These degradation phenomena occur due to the UDC panel positioned in front of the camera lens. They can be mathematically described using the convolution theorem as follows:
$$ I_{\mathrm{deg}}(x, y) = \big(k \ast I_{0}\big)(x, y) \tag{1} $$
where $\ast$ denotes the convolution operation, $I_{\mathrm{deg}}$ represents the resulting degraded image, $k$ denotes the point spread function (PSF) of the UDC system, and $I_{0}$ represents the original clear image. The PSF characterizes how a focused optical imaging system responds to a point source [6,7,8,9]. To address the challenges arising from poor light transmittance and diffraction in UDC systems, research has focused on the structural optimization of the display panel [10,11,12]. Additionally, a computational approach leveraging artificial intelligence (AI) techniques has been explored to enhance image quality in UDC systems, complementing structural optimization efforts. To mitigate undesired artifacts in degraded images, numerous deep neural networks, predominantly based on U-Net architecture, have been proposed [13,14,15,16,17,18,19]. These studies have primarily utilized a U-Net-based structure to analyze the mapping relationship between original and degraded images within the spatial domain.
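As a concrete illustration of Equation (1), the sketch below convolves each color channel of a pristine image with its PSF kernel to synthesize a degraded image. The function and array names are illustrative, and the use of scipy.signal.fftconvolve is an implementation choice for this sketch rather than the authors' code.

```python
import numpy as np
from scipy.signal import fftconvolve

def degrade_with_psf(clean: np.ndarray, psf: np.ndarray) -> np.ndarray:
    """Simulate UDC degradation per Eq. (1): each RGB channel of `clean`
    (H, W, 3) is convolved with its per-channel PSF kernel (h, w, 3)."""
    degraded = np.empty_like(clean, dtype=np.float64)
    for c in range(3):
        kernel = psf[..., c] / psf[..., c].sum()   # energy-normalized PSF
        degraded[..., c] = fftconvolve(clean[..., c], kernel, mode="same")
    return np.clip(degraded, 0.0, 1.0)

# Toy usage with random data standing in for a real image and a simulated PSF.
rng = np.random.default_rng(0)
clean = rng.random((256, 256, 3))
psf = rng.random((31, 31, 3))
degraded = degrade_with_psf(clean, psf)
```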
In this study, we propose a hybrid frequency–spatial domain learning approach to realize a more effective image restoration process in UDC systems. By training an AI model not only in the spatial (image) domain but also in the frequency (Fourier) domain, we address the technical challenges inherent in UDC systems. The integration of frequency domain learning enables the model to directly analyze and restore global patterns and structural degradations, such as those introduced by diffraction. In addition, spatial domain learning focuses on refining local textures and correcting visible artifacts. This hybrid domain approach allows the model to learn a more intuitive correlation between degraded and original images, significantly enhancing its predictive accuracy and reliability compared to traditional spatial domain-only methods.
For training in most image restoration tasks, a large amount of training data consisting of degraded images and their corresponding original images is essential, and UDC scenarios are no exception. However, obtaining degraded images and original images simultaneously in UDC systems is not straightforward. In outdoor shooting, unless images are captured simultaneously, the subjects may change, making it impossible to obtain exactly corresponding data. Even if an optical system capable of simultaneous shooting is implemented, collecting such data directly is time-consuming and labor-intensive. This challenge is further compounded in UDC systems, where degradation is often specific to the unique optical properties of the display panel.
To effectively address this limitation, we propose the generation of an extensive dataset of virtual degraded images from pristine originals using optical simulations. This simulation-based approach allows us to accurately replicate degradation phenomena, including diffraction effects, that are specific to UDC systems. The resulting synthetic dataset provides diverse and comprehensive training data, reducing the need for laborious data acquisition and enabling the creation of datasets with varying PSFs to represent different UDC imaging scenarios. By leveraging virtual data augmentation, we facilitate the expansion of the dataset to encompass a broad spectrum of UDC scenarios, making the training process both efficient and representative of real-world conditions. This use of synthetic data addresses the fundamental challenges of limited paired datasets in UDC image restoration tasks.
The effectiveness of our trained AI models has also been evaluated in restoring experimentally obtained degraded images. This validation indicates the utility and robustness of the proposed image restoration methodology in UDC systems.
The remainder of this paper is organized as follows: Section 2 outlines the proposed methodology, including the process of generating virtual degraded images through optical simulations and the design of the hybrid frequency–spatial domain learning model. Section 3 presents the experimental setup, dataset preparation, and results of the image restoration tasks, comparing the performance of the proposed model with other existing approaches. Section 4 serves as the conclusion of this study, encompassing the implications of the findings, and potential future directions for enhancing image restoration in UDC systems. It summarizes the key contributions and highlights the significance of introducing hybrid frequency–spatial domain learning as a robust solution for addressing the unique challenges of UDC imaging.
2. Methodology
In this study, as shown in Figure 2, we go through three processes to achieve effective image restoration in UDC systems. The first process involves generating virtual degraded images, simulating the unique degradation characteristics specific to UDC systems. The second process focuses on restoring these images by learning the degradation phenomena in the frequency domain, while the final process involves image restoration through learning in the spatial domain. In this section, each process in our effective image restoration methodology is described in detail, outlining the specific techniques used to address the challenges in UDC imaging.
2.1. Generation of the Virtual Degraded Image
2.1.1. Light Propagation Simulation Using the Angular Spectrum Method
To comprehensively evaluate the propagation and diffraction of light through the UDC panel, we employed the angular spectrum method (ASM). The ASM is a powerful tool for understanding how light propagates from one plane to another, making it particularly effective for analyzing complex optical systems, such as UDC panels [20,21,22,23,24,25]. This method simulates light propagation by decomposing the optical field into an infinite number of plane waves of the same frequency traveling in different directions using a two-dimensional Fourier transform. Mathematically, inverse Fourier and Fourier transformations are expressed as [26,27]
$$ U(x, y; 0) = \iint A(f_x, f_y; 0)\, e^{\,i 2\pi (f_x x + f_y y)}\, df_x\, df_y, \qquad A(f_x, f_y; 0) = \iint U(x, y; 0)\, e^{-i 2\pi (f_x x + f_y y)}\, dx\, dy \tag{2} $$
In Equation (2), $U(x, y; 0)$ represents the optical field at an initial propagation distance of zero, where $x$ and $y$ are spatial coordinates. The term $A(f_x, f_y; 0)$ corresponds to the weight associated with the plane wave $e^{\,i 2\pi (f_x x + f_y y)}$, which represents the angular spectrum of the optical field.
By analyzing individual plane waves and their propagation characteristics, the ASM enables the independent propagation of each Fourier component. This property allows us to analyze the behavior of light as it interacts with the various components and structures of a UDC panel separately. The propagation of each Fourier component is expressed mathematically as follows [28,29]:
$$ U(x, y; z) = \iint A(f_x, f_y; 0)\, \exp\!\left[\,i 2\pi z \sqrt{\frac{1}{\lambda^{2}} - f_x^{2} - f_y^{2}}\,\right] e^{\,i 2\pi (f_x x + f_y y)}\, df_x\, df_y \tag{3} $$
In Equation (3), the exponential term $\exp\!\left[\,i 2\pi z \sqrt{1/\lambda^{2} - f_x^{2} - f_y^{2}}\,\right]$ describes the propagation of a plane wave over the distance $z$, $U(x, y; z)$ represents the optical field at the distance $z$ from the initial plane, and $\lambda$ is the wavelength of light. To reconstruct the field at the distance $z$, an inverse Fourier transform is applied to combine the propagated Fourier components with their respective weights $A(f_x, f_y; 0)$. By using Equation (3), we can analyze how light propagates through, is diffracted by, and interacts with the UDC panel.
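As a reference implementation of Equations (2) and (3), the sketch below propagates a complex field with the angular spectrum method using NumPy FFTs; suppressing evanescent components with a hard cutoff is an assumption of this sketch.

```python
import numpy as np

def asm_propagate(field: np.ndarray, wavelength: float, pixel_size: float,
                  distance: float) -> np.ndarray:
    """Propagate a complex field over `distance` with the angular spectrum
    method (Eqs. (2)-(3)): FFT to plane waves, multiply by the propagation
    transfer function, inverse FFT back to the spatial domain."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)          # spatial frequencies [1/m]
    fy = np.fft.fftfreq(ny, d=pixel_size)
    FX, FY = np.meshgrid(fx, fy)
    # Longitudinal component; evanescent waves (arg < 0) are suppressed.
    arg = (1.0 / wavelength) ** 2 - FX ** 2 - FY ** 2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.exp(1j * kz * distance) * (arg > 0)
    spectrum = np.fft.fft2(field)                  # angular spectrum A(fx, fy; 0)
    return np.fft.ifft2(spectrum * transfer)       # U(x, y; z)
```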
2.1.2. Generation of the Point Spread Function
In this section, we outline our methodology for generating the PSF of the UDC panel, combining both optical experiments and numerical simulations. Our goal is to validate the consistency between these two approaches, thereby offering a comprehensive understanding of the optical behavior of UDC panels and ensuring the accuracy of the simulation model for replicating degradation phenomena.
To begin, we designed a simplified UDC panel structure based on a commercial smartphone panel, focusing on the essential pattern structures that affect light behavior, such as transparent and opaque regions (see Figure 3a). For the optical experiment, we fabricated a simplified UDC panel pattern by laser etching on chrome-coated glass (see Figure 3b). This setup allowed us to conduct experiments to observe how light behaves as it passes through the panel. Concurrently, we digitally recreated the UDC panel structure at a resolution of 1024 (H) × 1024 (V) pixels with a pixel size of 2.544 µm for use in numerical simulations. Here, we utilized the ASM to simulate the PSF of the UDC panel, as explained previously.
The experimental and simulation conditions are illustrated in Figure 4. To capture the PSF, we placed the image plane at the focal point of the lens, positioning the pinhole (point source) 300 mm away. The image distance was calculated using the lens formula:
$$ \frac{1}{d_o} + \frac{1}{d_i} = \frac{1}{f} \tag{4} $$
where $d_o$ is the object distance, $d_i$ is the image distance, and $f$ is the focal length of the lens. Consequently, the image sensor was placed 3.439 mm behind a lens with a focal length of 3.4 mm. In this setup, the point source first propagates 300 mm using the ASM before interacting with the UDC panel, which is represented by its transfer function. The field then passes through the lens, described by its pupil function and a quadratic phase factor, and propagates an additional 3.439 mm to the CCD plane using the ASM. This sequence of transformations effectively models the optical behavior of the UDC system and enables precise observation of the PSF formation. Both the numerical simulation and the optical experiment confirmed the PSF formation by the UDC panel within the designated optical imaging system, as shown in Figure 4. For the experiment, we measured PSF fields at wavelengths of 634 nm, 551 nm, and 445 nm and then simulated the corresponding PSF fields under identical conditions. By comparing the simulated and experimental PSF fields, we assessed their alignment. After validating the consistency between the results of the numerical simulation and the optical experiment for each monochromatic PSF field, we extended this method to simulate PSFs for all wavelengths. Building on this, we describe the generation of a three-channel (3ch) PSF for the UDC. The 3ch PSF, denoted as $\mathrm{PSF}_{c}$, is calculated as follows:
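Under the geometry just described (300 mm pinhole distance, 3.4 mm focal length, 3.439 mm image distance, and the 1024 × 1024 panel grid with 2.544 µm pixels), a monochromatic PSF can be simulated roughly as follows. The sketch reuses the asm_propagate helper from the previous snippet; the panel mask file name and the 1 mm pupil radius are hypothetical placeholders.

```python
import numpy as np
# Assumes the asm_propagate() helper from the previous sketch is in scope.

wavelength = 551e-9          # green-channel example [m]
pixel = 2.544e-6             # simulation grid pitch [m]
n = 1024                     # 1024 x 1024 grid, as in the panel model
f = 3.4e-3                   # lens focal length [m]
z1, z2 = 300e-3, 3.439e-3    # pinhole-to-panel and lens-to-sensor distances [m]

yy, xx = np.mgrid[-n // 2:n // 2, -n // 2:n // 2] * pixel
point_source = np.zeros((n, n), dtype=complex)
point_source[n // 2, n // 2] = 1.0                  # pinhole modeled as a point source

panel_mask = np.load("udc_panel_mask.npy")          # hypothetical 0/1 transmission map
field = asm_propagate(point_source, wavelength, pixel, z1)
field *= panel_mask                                  # diffraction at the UDC panel
pupil = (xx ** 2 + yy ** 2) <= (1.0e-3) ** 2         # assumed 1 mm pupil radius
field *= pupil * np.exp(-1j * np.pi / (wavelength * f) * (xx ** 2 + yy ** 2))
field = asm_propagate(field, wavelength, pixel, z2)
psf_mono = np.abs(field) ** 2                        # monochromatic PSF at the sensor
psf_mono /= psf_mono.sum()
```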
$$ \mathrm{PSF}_{c}(x, y) = \int S_{c}(\lambda)\, \mathrm{PSF}(x, y; \lambda)\, d\lambda, \qquad c \in \{R, G, B\} \tag{5} $$
Here, $S_{c}(\lambda)$ represents the camera spectral sensitivity for each color channel R, G, and B, and $\mathrm{PSF}(x, y; \lambda)$ is the PSF field for each wavelength [30,31,32,33]. The calibrated camera spectral sensitivity and the simulated 3ch PSF are shown in Figure 5a and Figure 5b, respectively.
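A compact sketch of Equation (5): each monochromatic PSF is weighted by the calibrated spectral sensitivity of the corresponding color channel and summed over wavelength. The simulate_psf callable and the sensitivity table are assumed inputs.

```python
import numpy as np

def rgb_psf(wavelengths_nm, sensitivity_rgb, simulate_psf):
    """Eq. (5): weight each monochromatic PSF by the camera spectral
    sensitivity S_c(lambda) and integrate over wavelength per channel.
    `sensitivity_rgb` has shape (len(wavelengths_nm), 3)."""
    psf_stack = np.stack([simulate_psf(w) for w in wavelengths_nm])   # (L, H, W)
    psf_rgb = np.tensordot(sensitivity_rgb.T, psf_stack, axes=1)      # (3, H, W)
    psf_rgb /= psf_rgb.sum(axis=(1, 2), keepdims=True)                # per-channel norm
    return np.moveaxis(psf_rgb, 0, -1)                                # (H, W, 3)
```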
2.2. Deep Neural Network Approach Based on Hybrid Frequency–Spatial Domain Learning for Image Restoration
In this study, we adopted a conditional generative adversarial network (cGAN) for image restoration, as cGANs have demonstrated remarkable effectiveness in generating realistic and high-quality images [34,35,36,37,38,39]. A cGAN model comprises two main components: a generator and a discriminator network. The generator synthesizes images by using a combination of random noise and conditional information, aiming to generate images that closely mimic real data. The discriminator, in contrast, receives both real and generated data, along with the conditional information, working to distinguish between them. During training, the generator and discriminator engage in an adversarial process: the generator strives to produce realistic images, while the discriminator attempts to accurately classify images as real or generated.
A distinctive feature of our approach is the integration of hybrid frequency–spatial domain learning within this adversarial framework. Unlike traditional cGANs, which operate primarily in the spatial domain, our model incorporates information from the frequency domain of degraded images. This is essential for capturing structural nuances that are often compromised in UDC systems due to panel-induced degradation.
Our approach involves two separate but complementary cGAN networks: one for the frequency domain and the other for the spatial domain. Each network includes a generator and a discriminator, denoted as $G_f$ and $D_f$ for the frequency domain, and $G_s$ and $D_s$ for the spatial domain. Analyzing images in the frequency domain enables us to directly address the loss or alteration of specific frequency components, which often leads to image degradation issues such as blurring and aliasing. In the frequency domain, missing or degraded information appears as gaps or irregularities in k-space—the Fourier transform representation of spatial data.
As explained in Equation (1), image degradation in a UDC system can be described using the convolution theorem. When the PSF of an imaging system is defined, it becomes possible to analyze how input signals are transformed into output signals. While degradation trends can be understood in the spatial domain for individual images, it is challenging to identify consistent patterns across multiple images. In contrast, the k-space domain provides clearer insights into where degradation occurs across diverse images. This clarity is due to the convolution theorem’s representation in the k-space as a simple pixel-wise multiplication [40,41].
In Figure 6, the left panels represent the Fourier domain information for four random sample images, each showing the lost or altered frequency components caused by degradation. While the overall patterns differ across the four images, it is evident that the loss consistently occurs within similar frequency bands. The rightmost panel displays the averaged k-space modulation data computed from all training images, providing a comprehensive visualization of the frequency regions that are systematically impacted by the UDC system.
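A minimal sketch of how such an averaged k-space modulation map can be computed from paired original and degraded images (array and function names assumed):

```python
import numpy as np

def kspace_modulation(originals, degradeds, eps=1e-8):
    """Average ratio of degraded to original spectral magnitudes over a list of
    image pairs; values well below 1 mark frequency bands lost to the UDC panel."""
    acc = None
    for x, y in zip(originals, degradeds):          # grayscale arrays (H, W)
        X = np.fft.fftshift(np.fft.fft2(x))
        Y = np.fft.fftshift(np.fft.fft2(y))
        ratio = np.abs(Y) / (np.abs(X) + eps)
        acc = ratio if acc is None else acc + ratio
    return acc / len(originals)
```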
This analysis guided our model design, enabling it to learn in the Fourier domain by identifying the specific regions where k-space components are predominantly lost or altered. By doing so, the model gains an intuitive understanding of the degradation caused by the defined UDC system, thereby enhancing its ability to effectively restore degraded images.
Building on these insights, we design a frequency domain network to predict and fill missing or altered values in k-space. First, we convert degraded images to the frequency domain using the discrete Fourier transform (DFT), which allows direct analysis and modification of frequency components. The DFT of an image is given by
$$ F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-i 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)} \tag{6} $$
where $u$ and $v$ are the frequency domain coordinates, $f(x, y)$ is the spatial-domain image, and $M$ and $N$ are the dimensions of the image [42,43,44]. In the frequency domain, the generator $G_f$ takes degraded frequency data as input, learning to predict and fill in the missing k-space components. The discriminator $D_f$ evaluates the authenticity of reconstructed frequency data by comparing it with the original data, thus encouraging $G_f$ to generate frequency data indistinguishable from authentic data.
After estimating and filling in the missing frequency components, we reconstruct the frequency domain data. We then apply the inverse discrete Fourier transform (IDFT) to convert the restored frequency domain data back into the spatial domain:
$$ \hat{f}(x, y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} \hat{F}(u, v)\, e^{\,i 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)} \tag{7} $$
where $\hat{F}(u, v)$ is the reconstructed frequency domain data, and $\hat{f}(x, y)$ is the restored spatial domain image. In parallel, we employ a spatial domain network with generator $G_s$ and discriminator $D_s$. This network directly processes spatial images, correcting artifacts such as blur, noise, and aliasing. $G_s$ applies anti-aliasing filtering to mitigate effects from under-sampling and other limitations, while $D_s$ evaluates the restored images, guiding $G_s$ to generate high-quality images closely resembling the originals. $D_s$ evaluates input pairs of degraded and restored images by outputting a probability score that reflects how well the restored image matches real data. This evaluation leverages convolutional layers to extract and compare key features, including local textures, edge sharpness, and structural patterns. While metrics such as SSIM or PSNR are not directly measured using $D_s$, its adversarial feedback during training effectively guides $G_s$ to enhance visual fidelity and maintain consistency in restored images. Furthermore, the loss function of $D_s$, detailed in Appendix A.2, ensures that $G_s$ generates images indistinguishable from real ones by refining both local and global features.
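In an implementation, Equations (6) and (7) can be realized with torch.fft; the sketch below stacks the real and imaginary parts of the spectrum as channels to form the input of $G_f$ and converts the restored spectrum back to an image. Representing k-space as two real-valued channels is an assumption of this sketch rather than a detail reported above.

```python
import torch

def to_kspace(images: torch.Tensor) -> torch.Tensor:
    """DFT of an image batch (Eq. (6)): (B, C, H, W) -> (B, 2C, H, W),
    stacking real and imaginary parts as extra channels for G_f."""
    spectrum = torch.fft.fft2(images, norm="ortho")
    return torch.cat([spectrum.real, spectrum.imag], dim=1)

def from_kspace(kdata: torch.Tensor) -> torch.Tensor:
    """Inverse DFT (Eq. (7)): reassemble the complex spectrum and return
    the restored spatial-domain image."""
    c = kdata.shape[1] // 2
    spectrum = torch.complex(kdata[:, :c], kdata[:, c:])
    return torch.fft.ifft2(spectrum, norm="ortho").real
```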
Both the frequency and spatial domain networks are trained simultaneously yet independently. Each pair of generator and discriminator works within its domain, minimizing discrepancies with the ground truth. Frequency domain training focuses on restoring structural integrity, addressing global features, while spatial domain training corrects visible artifacts and refines local details, enhancing textures and visual quality. The cGAN networks in the frequency domain ($G_f$ and $D_f$) and spatial domain ($G_s$ and $D_s$) share an identical architecture. The primary distinction lies in the input data and the domain-specific focus: the frequency domain network processes k-space representations, while the spatial domain network handles pixel-level images. This unified architecture ensures consistency while enabling domain-specific training for optimal restoration performance. The architectural details of the generator and discriminator components are described in Appendix A.1 and visually summarized in Figure A1. Additionally, the combined structure of a single cGAN network unit, comprising both the generator and the discriminator, is illustrated in Figure A2.
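The simultaneous-but-independent training of the two units can be pictured with the skeleton below, which reuses the to_kspace helper from the sketch above. The tiny stand-in modules, loss weighting, and optimizer settings are placeholders for the U-Net/PatchGAN components and hyperparameters described in Appendix A; only the overall flow follows the text.

```python
import torch
import torch.nn as nn

class TinyG(nn.Module):                      # stand-in for the U-Net generator
    def __init__(self, ch):
        super().__init__()
        self.net = nn.Conv2d(ch, ch, 3, padding=1)
    def forward(self, x):
        return self.net(x)

class TinyD(nn.Module):                      # stand-in for the PatchGAN discriminator
    def __init__(self, ch):
        super().__init__()
        self.net = nn.Conv2d(2 * ch, 1, 4, stride=2, padding=1)
    def forward(self, a, b):                 # conditional input: (degraded, candidate) pair
        return self.net(torch.cat([a, b], dim=1))

bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

def train_unit(G, D, opt_g, opt_d, inp, target, lambda_pix=100.0):
    """One adversarial step of a single domain unit (frequency or spatial)."""
    fake = G(inp)
    # Discriminator: real pair labeled 1, generated pair labeled 0.
    opt_d.zero_grad()
    pred_real, pred_fake = D(inp, target), D(inp, fake.detach())
    d_loss = bce(pred_real, torch.ones_like(pred_real)) + \
             bce(pred_fake, torch.zeros_like(pred_fake))
    d_loss.backward()
    opt_d.step()
    # Generator: fool the discriminator while staying close to the ground truth.
    opt_g.zero_grad()
    pred = D(inp, fake)
    g_loss = bce(pred, torch.ones_like(pred)) + lambda_pix * l1(fake, target)
    g_loss.backward()
    opt_g.step()
    return fake.detach()

# Frequency-domain unit works on 6-channel k-space tensors, the spatial unit on
# RGB images; each keeps its own optimizers, and the two are stepped each batch.
G_f, D_f, G_s, D_s = TinyG(6), TinyD(6), TinyG(3), TinyD(3)
opts = [torch.optim.Adam(m.parameters(), lr=2e-4, betas=(0.5, 0.999))
        for m in (G_f, D_f, G_s, D_s)]
degraded, clean = torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256)
_ = train_unit(G_f, D_f, opts[0], opts[1], to_kspace(degraded), to_kspace(clean))
_ = train_unit(G_s, D_s, opts[2], opts[3], degraded, clean)
# At inference, from_kspace() maps the frequency branch's output back to an image.
```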
By analyzing both domains, our model effectively addresses image degradation in UDC systems. Most deep learning methods primarily rely on spatial domain learning, which is intuitive and aligns with human visual perception. Spatial domain learning focuses on extracting features through convolutional operations, which model relationships between neighboring pixels. This approach is particularly effective at capturing local textures and details, as convolutional kernels operate within limited receptive fields [45,46,47]. However, this same reliance on limited receptive fields makes spatial domain learning less effective at capturing global contextual information. In contrast, frequency domain learning transforms images into a representation in which global patterns are directly accessible, as each frequency component reflects contributions from the entire spatial field [48,49].
By combining these approaches, our hybrid learning strategy enables comprehensive restoration by leveraging complementary information from both domains. This approach achieves higher predictive accuracy and reliability, while reducing the likelihood of overlooked degradations that could occur if only one domain were used.
The light blue and dark yellow blocks in Figure 2 illustrate the AI model pipeline, demonstrating our hybrid frequency–spatial domain learning approach within the cGAN architecture. This model processes virtual degraded images across both frequency and spatial domains, providing a robust framework for understanding and restoring degraded images. Further technical details on the network structure and training process can be found in Appendix A.
3. Results and Discussion
3.1. Image Dataset Establishment
3.1.1. Virtual Degraded Image Generation
Utilizing the derived 3ch PSF in conjunction with the process outlined in Figure 7, we generated virtual degraded images from various types of images. To create these images, every pixel in each channel of an original color image was convolved with its respective PSF kernel, and the results were integrated to form the degraded image. This procedure was repeated to compile a large dataset of virtual degraded images. The original image datasets comprised self-taken photographs from daily life and images from the DIV2K ('DIVerse 2K resolution high-quality images') dataset released for the CVPR challenges. To quantify image degradation, we used two metrics: the structural similarity index measure (SSIM) and the peak signal-to-noise ratio (PSNR). SSIM is widely recognized for its ability to assess perceptual similarity between two images [50,51,52,53,54], while PSNR quantifies the ratio between the maximum possible signal power (the original image) and the noise power that distorts its representation (the reconstructed or processed image) [55,56]. For the virtual image set, the average SSIM was 0.8168, and the average PSNR was 27.981 dB. An SSIM value of 1 indicates an original, pristine image, and an average SSIM value of 0.8168 reflects approximately 18.3% image degradation caused by the UDC panel. PSNR provides an additional perspective on image quality, with higher values indicating greater fidelity. Generally, PSNR values between 30 and 50 dB are considered acceptable for most practical applications [57,58]. A PSNR value below 30 dB suggests noticeable quality degradation, while values above 40 dB imply that the processed image is nearly indistinguishable from the original. The virtual data generated in this manner were used alongside actual degraded images as training data, serving to compensate for the limitations of real data.
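For reference, both metrics can be computed per image pair with scikit-image (version 0.19 or later assumed), given float RGB arrays scaled to [0, 1]:

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def image_quality(reference: np.ndarray, test: np.ndarray):
    """SSIM and PSNR between a pristine image and its degraded or restored
    counterpart; both arrays are float RGB images in [0, 1]."""
    ssim = structural_similarity(reference, test, channel_axis=-1, data_range=1.0)
    psnr = peak_signal_noise_ratio(reference, test, data_range=1.0)
    return ssim, psnr
```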
3.1.2. Actual Degraded Image Acquisition
The acquisition of actual degraded images and their corresponding original images necessitated precise alignment of the optical axis, ensuring that no variation occurred during the process. To accurately replicate a real UDC system, we constructed a holding frame using 3D printing, designed to align with our simulation conditions. This frame allowed us to securely position a commercial smartphone, facilitating the capture of original images. Subsequently, we carefully inserted the UDC panel into the frame, ensuring that there was no alteration to the optical axis or field of view, to obtain the degraded images. The capturing processes for the setups without and with the UDC panel are illustrated in Figure 8a and Figure 8b, respectively. This method ensured consistency and accuracy in our image acquisition process.
3.2. Image Restoration of Degraded Images
The validation process in this study comprised two critical steps to confirm the reliability of the proposed approach. Initially, we validated both the standard cGAN and the hybrid frequency–spatial domain learning cGAN models using datasets not included in the training phase. The dataset encompassed a total of 3300 images, consisting of 3000 virtually augmented images and 300 experimentally acquired degraded images. Both datasets were divided into training and validation sets in an 8:2 ratio. Specifically, for training, we used 2400 virtual images and 240 real images, and for validation, we used 600 virtual images and 60 real images. This approach allowed us to train the models on both virtual and real data and to assess their performance comprehensively on validation sets that included both types of data. Both the cGAN and the hybrid frequency–spatial domain learning cGAN models were trained under identical conditions. The technical setup included Visual Studio Code IDE, Python 3.9.7, CUDA 11.8, and PyTorch 2.1.2, ensuring stable and efficient performance. Detailed information on hyperparameters is available in Appendix A.2.
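As a sketch of the 8:2 split described above, assuming virtual_pairs and real_pairs are paired PyTorch datasets of 3000 and 300 samples, respectively:

```python
import torch
from torch.utils.data import ConcatDataset, random_split

def split_80_20(dataset, seed=0):
    """Reproducible 8:2 split of a paired dataset."""
    n_train = int(0.8 * len(dataset))
    return random_split(dataset, [n_train, len(dataset) - n_train],
                        generator=torch.Generator().manual_seed(seed))

virtual_train, virtual_val = split_80_20(virtual_pairs)   # 2400 / 600 images
real_train, real_val = split_80_20(real_pairs)            # 240 / 60 images
train_set = ConcatDataset([virtual_train, real_train])    # 2640 training pairs
val_set = ConcatDataset([virtual_val, real_val])          # 660 validation pairs
```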
The image restoration results on the virtual degraded images are shown in Figure 9. The figure presents the original pristine, degraded, cGAN-restored, and hybrid domain learning cGAN-restored results, which are highlighted by green, red, yellow, and blue boxes, respectively. A qualitative analysis visually comparing the restored images to their pristine counterparts showed that both models produced visually appealing results. However, a closer examination of the enlarged sections of the images revealed that the images restored by the hybrid domain learning cGAN exhibited enhanced clarity in darker areas and reduced granular-patterned noise, in contrast to those restored by the standard cGAN. The average SSIM values for the virtual degraded images and the images restored by the two models are summarized in Figure 9b, where each case is displayed in a different color: blue for the hybrid domain learning cGAN, yellow for the standard cGAN, and red for the degraded images. Quantitatively, the hybrid domain learning cGAN achieved an average SSIM value of 0.9813, compared to 0.9604 for the standard cGAN, corresponding to an improvement of approximately 20.7% over the degraded images.
Similarly, Figure 9c presents the PSNR values for the same dataset and models, following the same color scheme. The degraded images showed an average PSNR of 27.112 dB, which was significantly improved to 33.420 dB by the standard cGAN. The hybrid domain learning cGAN achieved the highest average PSNR of 36.683 dB, further demonstrating its ability to enhance image quality. These results are consistent with the SSIM analysis, reaffirming the superior restoration capabilities of the hybrid domain learning approach compared to the standard cGAN. This evaluation verified the efficacy of hybrid frequency–spatial domain learning in image restoration.
To provide a more comprehensive evaluation of the experimental results, we present additional summary statistics, including the minimum, maximum, and median values of SSIM and PSNR for each case. These statistics, summarized in Table 1 and Table 2, offer deeper insights into the variability and consistency of the models’ performance across the dataset.
As shown in Table 1, the hybrid domain learning cGAN consistently outperformed both the standard cGAN and the degraded images across all SSIM metrics. Notably, the median SSIM value for the hybrid domain learning cGAN (0.9813) closely approaches the maximum value of 1, indicating its strong capability to produce high-quality reconstructions with minimal variability. Similarly, Table 2 highlights the PSNR results, where the hybrid domain learning cGAN also achieves the highest scores across all statistics. The significant improvement in the minimum PSNR value (20.715 dB) compared to the standard cGAN (18.602 dB) reflects its robustness in handling even the most challenging degraded images. The maximum PSNR value (46.256 dB) further underscores the model’s ability to produce near-pristine reconstructions under optimal conditions.
Subsequently, we comprehensively evaluated the models’ capabilities using actual degraded images obtained from the optical experiment. This evaluation aimed to verify the hybrid frequency–spatial domain-learning cGAN’s ability to effectively restore actual degraded images. Figure 10 presents the image restoration results using 60 actual degraded images. The results for actual degraded images of the hybrid domain learning cGAN and the standard cGAN showed a trend similar to the results for the virtual degraded images. The hybrid domain learning cGAN achieved an average SSIM value of 0.9608, while the standard cGAN scored 0.9345.
As illustrated in Figure 10c, the degraded images exhibited an average PSNR value of 26.383 dB. The standard cGAN improved this to 32.706 dB, while the hybrid domain learning cGAN further enhanced the results to 36.046 dB. This trend aligns with the SSIM findings, reaffirming the hybrid domain learning cGAN’s superior performance in restoring actual degraded images compared to the standard cGAN.
However, the SSIM and PSNR values for the reconstructed images from the actual degraded images were slightly lower than those from the virtual degraded images. This difference is attributed to additional factors, such as background noise, tilting, or aberration present in the actual acquired degraded images, which were not included in the virtual degraded image generation. Despite these challenges, the relative improvements compared to the degraded images indicate consistent quality enhancement, with the hybrid frequency–spatial domain learning cGAN demonstrating robust performance, even in actual degradation scenarios.
Table 3 and Table 4 provide detailed summary statistics, including the minimum, maximum, and median values of SSIM and PSNR, respectively. Table 3 reveals that the hybrid domain learning cGAN consistently produced higher SSIM values across all statistics compared to the standard cGAN and degraded images. The median SSIM for the hybrid model reached 0.9714, showcasing its ability to deliver high-quality restorations, even for real-world degraded images. This performance, combined with its higher minimum SSIM value (0.8702), underscores the hybrid model’s robustness and reliability. In Table 4, the PSNR results further validate the hybrid model’s superiority. While the degraded images had a median PSNR of 26.704 dB, the hybrid domain learning cGAN improved this to 37.558 dB, outperforming the standard cGAN’s 33.061 dB. Additionally, the hybrid model demonstrated a notable advantage in handling difficult cases, with a minimum PSNR of 20.625 dB compared to the standard cGAN’s 18.712 dB.
3.3. Computational Complexity of the Hybrid Frequency–Spatial Domain Learning cGAN
The proposed hybrid domain learning cGAN comprises two identical cGAN network units: one operating in the frequency domain and the other in the spatial domain. Each unit consists of a generator based on U-Net architecture and a discriminator based on a PatchGAN structure. To facilitate the transitions between the spatial and frequency domains, fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT) operations are integrated into the model. The computational complexity of the entire model is analyzed in terms of floating-point operations (FLOPs), calculated using the PyTorch library.
The generator has a total complexity of 18.25 GFLOPs, while the discriminator contributes 968.43 MFLOPs. The FFT and IFFT operations, required for domain transitions, each add 31.5 MFLOPs. Combining these, the computational complexity of the frequency domain unit, which includes the generator, discriminator, and FFT, amounts to 19.25 GFLOPs. Similarly, the spatial domain unit, comprising the generator, discriminator, and IFFT, also totals 19.25 GFLOPs.
By summing the complexities of both units, the total computational complexity of the proposed hybrid frequency–spatial domain cGAN is calculated as
$$ C_{\mathrm{total}} = C_{\mathrm{freq}} + C_{\mathrm{spatial}} = 19.25~\mathrm{GFLOPs} + 19.25~\mathrm{GFLOPs} = 38.5~\mathrm{GFLOPs} \tag{8} $$
The detailed theoretical calculations and methodology for deriving the computational complexity can be found in Appendix A.3. Based on the total complexity of 38.5 GFLOPs, processing the model on a high-performance GPU, such as an NVIDIA RTX 3060, requires approximately 3.02 milliseconds. It is important to note that this timing is specific to the given hardware and resolution.
4. Conclusions
In this study, we presented a novel image restoration method for the UDC system of smartphones by introducing hybrid frequency–spatial domain learning into a deep neural network. This method, which incorporates additional information from the Fourier domain during the training process, has demonstrated substantial improvement in image quality. By analyzing images in the frequency domain, the model directly addresses the loss or alteration of specific frequency components, which are often responsible for image degradation phenomena such as blurring, aliasing, and other artifacts. Spatial domain learning complements this by correcting the distortions and degradations that are directly observable in the spatial domain, such as noise and blur.
The effectiveness of this approach is highlighted by the increase in the SSIM index from 0.8130 to 0.9813 for the virtual validation datasets, compared to the standard cGAN’s increase from 0.8130 to 0.9604. Similarly, the PSNR values improved significantly, with the hybrid domain learning cGAN achieving an average PSNR of 36.683 dB compared to 33.420 dB for the standard cGAN.
Moreover, the model has shown its capability to restore experimentally obtained degraded images, achieving an SSIM index increase from 0.8047 to 0.9608 and a PSNR increase from 26.383 dB to 36.046 dB. In contrast, the standard cGAN, while also effective, exhibited slightly lower performance compared to the hybrid domain learning cGAN. The consistent outperformance of the hybrid frequency–spatial domain learning cGAN over the standard cGAN, in both SSIM and PSNR metrics, demonstrates the value and effectiveness of the hybrid domain learning approach in image restoration.
In addition to developing the restoration model, we generated a dataset of virtual degraded images using simulations based on the ASM, facilitating efficient training. This data augmentation strategy not only allows for the expansion of the dataset to cover a broad range of UDC imaging system scenarios but also reduces the need for laborious manual data acquisition.
Finally, the computational complexity of the proposed model was analyzed. The proposed model’s computational complexity for processing a 256 × 256 resolution image is calculated as 38.5 GFLOPs, as detailed in this study. When scaling this to the resolution of a standard smartphone front-facing camera, assumed to be 2560 × 1440 (commonly referred to as QHD resolution), the computational complexity increases by approximately 56 times, reaching nearly 2.17 TFLOPs per frame. Based on the hardware specifications of the current commercial UDC products, the processing time is estimated to be approximately 0.85 s per frame, suggesting that the proposed model could be integrated into commercial UDC products. Moreover, by applying techniques such as resolution optimization, region of interest (ROI) processing, model compression, and cloud-based processing for server offloading, the computational load can be significantly reduced, thereby increasing the feasibility of utilizing the model effectively.
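The resolution scaling quoted above follows directly from the pixel-count ratio between the two image sizes:
$$ \frac{2560 \times 1440}{256 \times 256} = \frac{3{,}686{,}400}{65{,}536} \approx 56.25, \qquad 38.5~\mathrm{GFLOPs} \times 56.25 \approx 2.17~\mathrm{TFLOPs}~\text{per frame}. $$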
The synergy of optical-theory-based simulation and advanced deep learning techniques, as demonstrated in our study, may provide new approaches for enhancing image quality in the UDC system of smartphones. Our proposed methodology can potentially be extended to other imaging systems facing similar degradation challenges, paving the way for future advancements in display and camera technologies.
Conceptualization, K.K.; data curation, K.K. and Y.K.; formal analysis, K.K.; funding acquisition, Y.-J.K.; investigation, K.K. and Y.K.; methodology, K.K.; project administration, K.K. and Y.-J.K.; resources, K.K. and Y.-J.K.; software, K.K.; supervision, Y.-J.K.; validation, K.K. and Y.K.; visualization, K.K. and Y.K.; writing—original draft preparation, K.K.; writing—review and editing, K.K. and Y.-J.K. All authors have read and agreed to the published version of the manuscript.
Dataset available on request from the authors.
This research was supported by the Commercialization Promotion Agency for R&D Outcomes (COMPA) (NTIS, 2710007979) and LG Display (C2023001955).
The authors declare no conflicts of interest.
Figure 1. Conceptual pixel layouts of (a) a typical punch-hole camera system and (b) a UDC system, respectively, and their respective image quality.
Figure 2. Overall flowchart of the proposed method for image restoration in the UDC system.
Figure 3. Mimicking the UDC panel structure for the experiment: (a) observed image from a commercial smartphone and (b) fabricated UDC panel pattern.
Figure 4. Setup for the simulation and optical experiment to observe PSF fields generated by the UDC panel.
Figure 6. Representation of k-space regions showing lost or altered data for four different degraded sample images and the averaged loss across the entire dataset.
Figure 8. Image acquisition with the camera fixed by the holding frame (a) without the UDC panel and (b) with the UDC panel.
Figure 9. Image restoration result: (a) Visual comparison of pristine images (green), degraded images (red), images restored by the cGAN (yellow), and images restored by the hybrid domain learning cGAN (blue) in cases of validation using a virtual validation dataset. Quantified results of the image restoration quality based on (b) SSIM index and (c) PSNR index for the virtual validation dataset.
Figure 10. Image restoration result: (a) Visual comparison of pristine images (green), degraded images (red), images restored by the cGAN (yellow), and images restored by the hybrid domain learning cGAN (blue) in cases of validation using an experimentally obtained degraded dataset. Quantified results of the image restoration quality based on (b) SSIM index and (c) PSNR index for experimentally obtained degraded dataset.
Summary statistics of SSIM values for degraded images and images reconstructed by the standard cGAN and by the hybrid domain learning cGAN on virtual degraded images. The highest value in each row is shown in bold.

| SSIM | Degraded Image | Standard cGAN | Hybrid Domain Learning cGAN |
|---|---|---|---|
| Minimum | 0.5313 | 0.7347 | **0.8097** |
| Maximum | 0.9754 | 0.9903 | **0.9953** |
| Median | 0.8169 | 0.9603 | **0.9813** |
Summary statistics of PSNR values for degraded images and images reconstructed by the standard cGAN and by the hybrid domain learning cGAN on virtual degraded images. The highest value in each row is shown in bold.

| PSNR (dB) | Degraded Image | Standard cGAN | Hybrid Domain Learning cGAN |
|---|---|---|---|
| Minimum | 18.072 | 18.602 | **20.715** |
| Maximum | 36.097 | 40.066 | **46.256** |
| Median | 27.112 | 34.603 | **37.786** |
Summary statistics of SSIM values for degraded images, images reconstructed by the standard cGAN, and images reconstructed by the hybrid domain learning cGAN on actual degraded datasets. The highest value in each row is shown in bold.

| SSIM | Degraded Image | Standard cGAN | Hybrid Domain Learning cGAN |
|---|---|---|---|
| Minimum | 0.6404 | 0.8502 | **0.8702** |
| Maximum | 0.8972 | 0.9752 | **0.9865** |
| Median | 0.8099 | 0.9416 | **0.9714** |
Summary statistics of PSNR values for degraded images and images reconstructed by the standard cGAN and by the hybrid domain learning cGAN on actual degraded datasets. The highest value in each row is shown in bold.

| PSNR (dB) | Degraded Image | Standard cGAN | Hybrid Domain Learning cGAN |
|---|---|---|---|
| Minimum | 18.163 | 18.712 | **20.625** |
| Maximum | 30.9682 | 39.188 | **44.692** |
| Median | 26.704 | 33.061 | **37.558** |
Appendix A. Details of Proposed AI Model Architecture and Process
Appendix A.1. Network Architecture
The generator, based on the U-Net architecture, is designed for image-to-image translation tasks. The down-sampling path, which acts as an encoder, is composed of eight U-Net down-sampling blocks. Each block performs a convolution followed by instance normalization (when enabled) and leaky ReLU activation. Starting with an input image of size 3 × 256 × 256, each down-sampling step halves the spatial dimensions (height and width) and potentially increases the number of channels. Dropout is applied in the later down-sampling layers to prevent overfitting. The up-sampling path mirrors this structure with transposed convolutions, and skip connections link each decoder layer to its corresponding encoder layer.
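A minimal PyTorch sketch of one such down-sampling block is given below; the 4 × 4 kernel, stride of 2, and LeakyReLU slope of 0.2 are typical pix2pix-style choices assumed here rather than values reported in this appendix.

```python
import torch.nn as nn

class UNetDown(nn.Module):
    """One encoder block: strided convolution halves H and W, followed by
    optional instance normalization, LeakyReLU, and optional dropout."""
    def __init__(self, in_ch, out_ch, normalize=True, dropout=0.0):
        super().__init__()
        layers = [nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1, bias=False)]
        if normalize:
            layers.append(nn.InstanceNorm2d(out_ch))
        layers.append(nn.LeakyReLU(0.2))
        if dropout:
            layers.append(nn.Dropout(dropout))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)
```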
The discriminator is designed to evaluate the authenticity of the generated images based on given input conditions. It processes concatenated pairs of images, with the input comprising six channels: the degraded image and the corresponding original or generated image stacked along the channel dimension. Each block in the discriminator includes a convolutional layer followed by optional instance normalization and leaky ReLU activation. The convolutional layers use a 2D convolution with a specific kernel size, stride, and padding to reduce spatial dimensions while increasing channel numbers. Instance normalization, applied in certain blocks, aids in stabilizing the training process.
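The discriminator can be sketched in the same spirit; the layer widths below are assumptions, while the six-channel concatenated input and the patch-wise output follow the description above.

```python
import torch
import torch.nn as nn

def disc_block(in_ch, out_ch, normalize=True):
    """Convolution -> optional instance norm -> LeakyReLU, halving spatial size."""
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)]
    if normalize:
        layers.append(nn.InstanceNorm2d(out_ch))
    layers.append(nn.LeakyReLU(0.2))
    return layers

class PatchDiscriminator(nn.Module):
    """Scores concatenated (degraded, candidate) pairs patch by patch."""
    def __init__(self, in_ch=6):
        super().__init__()
        self.model = nn.Sequential(
            *disc_block(in_ch, 64, normalize=False),
            *disc_block(64, 128),
            *disc_block(128, 256),
            *disc_block(256, 512),
            nn.Conv2d(512, 1, kernel_size=4, padding=1),   # patch-wise output map
        )

    def forward(self, degraded, candidate):
        return self.model(torch.cat([degraded, candidate], dim=1))
```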
The architecture of both the generator and the discriminator is detailed in Figure A1.
Figure A1. Common architecture of the generator and the discriminator for both frequency and spatial domains in the hybrid domain learning framework. The numbers above each tensor (N, M²) denote the tensor structure: N channels with a spatial size of M × M.
Figure A2. The architecture of a single cGAN network unit used in both the frequency and spatial domains.
Appendix A.2. Training Process
First, the weights of the convolutional layers were initialized using a normal distribution with a mean of 0 and a standard deviation of 0.02. Binary cross-entropy (BCE) loss was used for the adversarial objective; it measures how well the discriminator can distinguish between real and fake images.
In the proposed model, the generator and the discriminator of each domain unit are trained against this adversarial objective within their respective domains.
The total generator loss is composed of two components: the BCE loss for the generator and a pixel-wise loss. The BCE loss for the generator takes the same form as the discriminator's loss but with the objective of fooling the discriminator into classifying the generated images as real, while the pixel-wise loss penalizes the difference between the generated images and the ground truth.
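For reference, a standard formulation of these two objectives, which may differ in weighting from the exact losses used here, is
$$ \mathcal{L}_{D} = -\,\mathbb{E}_{(x,y)}\big[\log D(x, y)\big] - \mathbb{E}_{x}\big[\log\big(1 - D(x, G(x))\big)\big], $$
$$ \mathcal{L}_{G} = -\,\mathbb{E}_{x}\big[\log D(x, G(x))\big] + \lambda\, \mathbb{E}_{(x,y)}\big[\lVert y - G(x) \rVert_{1}\big], $$
where $x$ is the degraded input, $y$ the corresponding ground truth, and $\lambda$ weights the pixel-wise term.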
For optimization, two separate Adam optimizers are used for the generator and the discriminator, chosen for their efficiency and adaptive learning rate capabilities.
During each batch, the generator’s gradients are zeroed out, producing a fake image from the input. The discriminator then evaluates this fake image alongside the original image. The generator’s loss is calculated, followed by backpropagation and weight updates. Similarly, the discriminator’s gradients are zeroed out for each batch. It evaluates real images and the fake images generated by the generator, calculates its loss, and undergoes backpropagation and weight updates.
Appendix A.3. Computational Complexity
The computational complexity of our proposed model is derived from its architectural components and their associated operations. The model comprises two identical cGAN network units: one operating in the frequency domain and the other in the spatial domain. Each unit includes a generator and a discriminator, both of which are crucial for the adversarial training process. Additionally, the model incorporates fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT) operations to transition between the spatial and frequency domains.
The total computational complexity of the model is analyzed in terms of floating-point operations (FLOPs), providing a precise measure of the computational workload.
Appendix A.3.1. Generator Complexity
The generator is designed using U-Net architecture, which includes multiple down-sampling (encoder) blocks and up-sampling (decoder) blocks. Each down-sampling block employs a 2D convolution operation, and each up-sampling block uses a transposed convolution operation. Additionally, skip connections link the corresponding encoder and decoder layers, ensuring efficient feature reuse.
For the down-sampling path, the computational complexity of the $i$-th down-sampling block scales with the kernel size, the channel counts, and the size of the output feature map:
$$ C_{\mathrm{down},i} \propto K^{2}\, C_{\mathrm{in},i}\, C_{\mathrm{out},i}\, H_{i} W_{i} $$
Here, $K$ is the convolution kernel size, $C_{\mathrm{in},i}$ and $C_{\mathrm{out},i}$ are the numbers of input and output channels, and $H_{i} \times W_{i}$ is the spatial size of the output feature map of the $i$-th block. Similarly, in the up-sampling path, the computational complexity of the $j$-th up-sampling block follows the same form, with the transposed convolution doubling the spatial resolution at each stage. The up-sampling blocks restore the spatial resolution of the feature maps, with the total complexity across all up-sampling layers obtained by summing these per-block costs. By combining the complexities of the down-sampling and up-sampling paths, the overall computational complexity of the generator is determined as
$$ C_{G} = \sum_{i} C_{\mathrm{down},i} + \sum_{j} C_{\mathrm{up},j} \approx 18.25~\mathrm{GFLOPs} $$
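The per-block cost above can be tallied with a small helper; counting one multiply-accumulate as two FLOPs is an assumption of this sketch.

```python
def conv2d_flops(c_in, c_out, k, h_out, w_out, macs_to_flops=2):
    """FLOPs of one convolution layer: kernel area x channel product x output size."""
    return macs_to_flops * (k * k) * c_in * c_out * h_out * w_out

# Example: a 4x4, stride-2 first encoder layer on a 256x256 RGB input
# (3 -> 64 channels, 128x128 output) under the assumed counting convention.
first_layer = conv2d_flops(3, 64, 4, 128, 128)   # ~100.7 MFLOPs
```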
Appendix A.3.2. Discriminator Complexity
The discriminator is based on a PatchGAN architecture, which consists of multiple convolutional layers, followed by a final output layer that predicts the authenticity of each image patch. For the $i$-th convolutional layer, the computational complexity takes the same form as in the generator, scaling with the kernel size, the channel counts, and the size of the output feature map. Here, the first layer processes the six-channel concatenated input pair, and each subsequent layer reduces the spatial dimensions while increasing the number of channels. The total complexity of the discriminator is calculated by summing the complexities of all convolutional layers, including the final layer that produces the patch-wise output:
$$ C_{D} = \sum_{i} C_{\mathrm{disc},i} \approx 968.43~\mathrm{MFLOPs} $$
Appendix A.3.3. FFT and IFFT Complexity
The FFT and IFFT operations are integral to the frequency domain network, enabling the transformation of images between the spatial and frequency domains. The computational complexity of a 2D FFT or IFFT operation for an image of size $M \times N$ scales as
$$ C_{\mathrm{FFT}} = O\!\left(MN \log (MN)\right) $$
The same complexity applies to the IFFT operation:
$$ C_{\mathrm{IFFT}} = C_{\mathrm{FFT}} $$
Appendix A.3.4. Total Complexity
The proposed model comprises two cGAN networks: one operating in the frequency domain and the other in the spatial domain. The total computational complexity accounts for the generator, discriminator, FFT, and IFFT operations in both domains. It is expressed as
$$ C_{\mathrm{total}} = 2\left(C_{G} + C_{D} + C_{\mathrm{FFT}}\right) = 38.5~\mathrm{GFLOPs} $$
Here, the factor of 2 reflects the presence of both frequency and spatial domain networks. This formula encapsulates the computational requirements for the entire model, providing a comprehensive measure of its complexity.
References
1. Wang, H.; Lin, Y.; Li, Y.; Yang, Y.; Zhou, T.; Chen, R.; Zhu, Y.; Chen, B.; Li, J. P-132: An Under-Display Camera Optical Structure for Full-Screen LCD. SID Symp. Dig. Tech. Pap.; 2020; 51, pp. 1881-1882. [DOI: https://dx.doi.org/10.1002/sdtp.14274]
2. Chu, P.T.; Wan, C.C. Full-Screen Display with Sub-Display Camera. U.S. Patent; 11,115,596, 7 September 2021.
3. Feng, R.; Li, C.; Chen, H.; Li, S.; Loy, C.C.; Gu, J. Removing Diffraction Image Artifacts in Under-Display Camera via Dynamic Skip Connection Network. Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); Virtual, 19–25 June 2021.
4. Qin, Z.; Tsai, Y.-H.; Yeh, Y.-W.; Huang, Y.-P.; Shieh, H.-P.D. See-Through Image Blurring of Transparent Organic Light-Emitting Diodes Display: Calculation Method Based on Diffraction and Analysis of Pixel Structures. J. Disp. Technol.; 2016; 12, pp. 1242-1249. [DOI: https://dx.doi.org/10.1109/JDT.2016.2594815]
5. Wang, Y.; Wan, R.; Yang, W.; Wen, B.; Chau, L.-P.; Kot, A.C. Removing Image Artifacts From Scratched Lens Protectors. arXiv; 2023; arXiv: 2302.05746
6. Lee, J.W. 39-48: 3. Evaluation of Optical Design (last episode). Opt. J.; 2000; 12, 6.
7. Long, M.; Soubo, Y.; Weiping, N.; Feng, X.; Jun, Y. Point-spread Function Estimation for Adaptive Optics Imaging of Astronomical Extended Objects. Astrophys. J.; 2019; 888, 20. [DOI: https://dx.doi.org/10.3847/1538-4357/ab55ea]
8. Heath, M.T. Scientific Computing: An Introductory Survey; 2nd ed. SIAM: Philadelphia, PA, USA, 2018.
9. McNally, J.G.; Karpova, T.; Cooper, J.; Conchello, J.A. Three-dimensional imaging by deconvolution microscopy. Methods; 1999; 19, pp. 373-385. [DOI: https://dx.doi.org/10.1006/meth.1999.0873] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/10579932]
10. Wang, L.; Ma, Y.; Dai, H.; Wu, F.; Kim, M.; Qin, F.; Peng, D.Z. 7-3: OLED Camera-Under Panels with Improved Imaging Quality. SID Symp. Dig. Tech. Pap.; 2022; 53, pp. 51-53. [DOI: https://dx.doi.org/10.1002/sdtp.15413]
11. Wang, Z.; Chang, Y.; Wang, Q.; Zhang, Y.; Qiu, J.; Helander, M. 55-1: Invited Paper: Self-Assembled Cathode Patterning in AMOLED for Under-Display Camera. SID Symp. Dig. Tech. Pap.; 2020; 51, pp. 811-814. [DOI: https://dx.doi.org/10.1002/sdtp.13993]
12. Xu, C.X.; Yao, Q.; Li, X.H.; Shu, S.; He, W.; Xu, Z.Q.; Dong, L.W.; Wang, W.J.; Gao, Z.K.; Yuan, G.C. 7-2: High Transmittance Under-Display Camera Structure with COE. SID Symp. Dig. Tech. Pap.; 2022; 53, pp. 48-50. [DOI: https://dx.doi.org/10.1002/sdtp.15412]
13. Liao, C.-C.; Su, C.-W.; Chen, M.-Y. Mitigation of image blurring for performance enhancement in transparent displays based on polymer-dispersed liquid crystal. Displays; 2019; 56, pp. 30-37. [DOI: https://dx.doi.org/10.1016/j.displa.2018.11.001]
14. Yang, Q.; Liu, Y.; Tang, J.; Ku, T. Residual and Dense UNet for Under-Display Camera Restoration. Computer Vision—ECCV 2020 Workshops; Springer: Cham, Switzerland, 2020; pp. 398-408.
15. Liu, X.; Hu, J.; Chen, X.; Dong, C. UDC-UNet: Under-Display Camera Image Restoration via U-shape Dynamic Network. European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2022; pp. 113-129.
16. Zhou, Y.; Kwan, M.; Tolentino, K.; Emerton, N.; Lim, S.; Large, T.; Fu, L.; Pan, Z.; Li, B.; Yang, Q. UDC 2020 challenge on image restoration of under-display camera: Methods and results. Proceedings of the Computer Vision–ECCV 2020 Workshops; Glasgow, UK, 23–28 August 2020; Proceedings, Part V 16 Springer: Berlin/Heidelberg, Germany, 2020; pp. 337-351.
17. Conde, M.V.; Vasluianu, F.; Nathan, S.; Timofte, R. Real-time under-display cameras image restoration and hdr on mobile devices. European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2022; pp. 747-762.
18. Zhu, Y.; Wang, X.; Fu, X.; Hu, X. Enhanced coarse-to-fine network for image restoration from under-display cameras. European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2022; pp. 130-146.
19. Feng, R.; Li, C.; Chen, H.; Li, S.; Gu, J.; Loy, C.C. Generating Aligned Pseudo-Supervision from Non-Aligned Data for Image Restoration in Under-Display Camera. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; Vancouver, BC, Canada, 17–24 June 2023; pp. 5013-5022.
20. Kozacki, T.; Falaggis, K. Angular spectrum-based wave-propagation method with compact space bandwidth for large propagation distances. Opt. Lett.; 2015; 40, pp. 3420-3423. [DOI: https://dx.doi.org/10.1364/OL.40.003420] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26176484]
21. Gbur, G.; Korotkova, O. Angular spectrum representation for the propagation of arbitrary coherent and partially coherent beams through atmospheric turbulence. JOSA A; 2007; 24, pp. 745-752. [DOI: https://dx.doi.org/10.1364/JOSAA.24.000745] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/17301863]
22. Zhang, W.; Zhang, H.; Jin, G. Band-extended angular spectrum method for accurate diffraction calculation in a wide propagation range. Opt. Lett.; 2020; 45, pp. 1543-1546. [DOI: https://dx.doi.org/10.1364/OL.385553]
23. Yu, X.; Xiahui, T.; Hao, P.; Wei, W. Wide-window angular spectrum method for diffraction propagation in far and near field. Opt. Lett.; 2012; 37, pp. 4943-4945. [DOI: https://dx.doi.org/10.1364/OL.37.004943] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/23202098]
24. Matsushima, K.; Schimmel, H.; Wyrowski, F. Fast calculation method for optical diffraction on tilted planes by use of the angular spectrum of plane waves. JOSA A; 2003; 20, pp. 1755-1762. [DOI: https://dx.doi.org/10.1364/JOSAA.20.001755] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/12968648]
25. Suh, K.D.; Dalrymple, R.A.; Kirby, J.T. An angular spectrum model for propagation of Stokes waves. J. Fluid Mech.; 1990; 221, pp. 205-232. [DOI: https://dx.doi.org/10.1017/S0022112090003548]
26. Nicola, S.D.; Finizio, A.; Pierattini, G.; Ferraro, P.; Alfieri, D. Angular spectrum method with correction of anamorphism for numerical reconstruction of digital holograms on tilted planes. Opt. Express; 2005; 13, pp. 9935-9940. [DOI: https://dx.doi.org/10.1364/OPEX.13.009935]
27. Shen, F.; Wang, A. Fast-Fourier-transform based numerical integration method for the Rayleigh-Sommerfeld diffraction formula. Appl. Opt.; 2006; 45, pp. 1102-1110. [DOI: https://dx.doi.org/10.1364/AO.45.001102] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/16523770]
28. Matsushima, K.; Shimobaba, T. Band-Limited Angular Spectrum Method for Numerical Simulation of Free-Space Propagation in Far and Near Fields. Opt. Express; 2009; 17, pp. 19662-19673. [DOI: https://dx.doi.org/10.1364/OE.17.019662] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19997186]
29. Ritter, A. Modified shifted angular spectrum method for numerical propagation at reduced spatial sampling rates. Opt. Express; 2014; 22, pp. 26265-26276. [DOI: https://dx.doi.org/10.1364/OE.22.026265]
30. Jiang, J.; Liu, D.; Gu, J.; Süsstrunk, S. What is the space of spectral sensitivity functions for digital color cameras?. Proceedings of the 2013 IEEE Workshop on Applications of Computer Vision (WACV); Clearwater Beach, FL, USA, 15–17 January 2013; pp. 168-179.
31. Darrodi, M.M.; Finlayson, G.; Goodman, T.; Mackiewicz, M. Reference data set for camera spectral sensitivity estimation. J. Opt. Soc. Am. A; 2015; 32, pp. 381-391. [DOI: https://dx.doi.org/10.1364/JOSAA.32.000381]
32. Zhu, J.; Xie, X.; Liao, N.; Zhang, Z.; Wu, W.; Lv, L. Spectral sensitivity estimation of trichromatic camera based on orthogonal test and window filtering. Opt. Express; 2020; 28, pp. 28085-28100. [DOI: https://dx.doi.org/10.1364/OE.401496] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32988087]
33. Gao, K.; Chang, M.; Jiang, K.; Wang, Y.; Xu, Z.; Feng, H.; Li, Q.; Hu, Z.; Chen, Y. Image restoration for real-world under-display imaging. Opt. Express; 2021; 29, pp. 37820-37834. [DOI: https://dx.doi.org/10.1364/OE.441256] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34808847]
34. Wang, K.; Gou, C.; Duan, Y.; Lin, Y.; Zheng, X.; Wang, F.-Y. Generative adversarial networks: Introduction and outlook. IEEE/CAA J. Autom. Sin.; 2017; 4, pp. 588-598. [DOI: https://dx.doi.org/10.1109/JAS.2017.7510583]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
In the rapidly advancing realm of mobile technology, under-display camera (UDC) systems have emerged as a promising solution for achieving seamless full-screen displays. Despite their innovative potential, UDC systems face significant challenges, including low light transmittance and pronounced diffraction effects that degrade image quality. This study addresses these issues by examining the degradation phenomena through optical simulation and by employing a deep neural network model that incorporates hybrid frequency–spatial domain learning. To train the model effectively, we generated a substantial synthetic dataset that virtually reproduces the distinctive image degradation characteristics of UDC systems, using the angular spectrum method for optical simulation. This approach enabled the creation of a diverse and comprehensive set of virtual degraded images by accurately replicating the degradation process applied to pristine images. The synthetic data were combined with experimentally acquired degraded images for training, compensating for the limited availability of real paired data. With the proposed methods, we achieved a marked improvement in image quality on an experimentally degraded image dataset: the average structural similarity index measure (SSIM) increased from 0.8047 to 0.9608, and the peak signal-to-noise ratio (PSNR) improved from 26.383 dB to 36.046 dB. These results highlight the potential of our integrated optics and AI-based methodology for addressing image restoration challenges in UDC systems and advancing the quality of smartphone display technology.
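The pipeline summarized above can be illustrated with a short, self-contained sketch. The code below is not the authors' implementation: the wavelength, sampling pitch, propagation distance, and the periodic aperture mask standing in for the display's pixel openings are all assumed values chosen only for demonstration. It propagates a plane wave through such a mask with the angular spectrum method to obtain an approximate PSF, degrades a clean image by convolving it with that PSF (following the convolution-based degradation model), and then scores the result with PSNR and SSIM.

```python
# Minimal, illustrative sketch (not the authors' code) of the simulation-and-evaluation loop:
# angular spectrum propagation through a display-like aperture -> approximate PSF ->
# synthetic degraded image -> PSNR/SSIM scoring. All optical parameters are assumptions.
import numpy as np
from scipy.signal import fftconvolve
from skimage import data
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def angular_spectrum_propagate(field, wavelength, pitch, distance):
    """Propagate a complex field over `distance` using the angular spectrum method."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pitch)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = 2 * np.pi / wavelength * np.sqrt(np.maximum(arg, 0.0))
    H = np.where(arg > 0, np.exp(1j * kz * distance), 0.0)  # discard evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * H)


# Hypothetical UDC-style aperture: sparse periodic openings -> low transmittance, strong diffraction.
n, pitch = 512, 2e-6                  # grid size and 2 µm sampling pitch (assumed)
wavelength, distance = 550e-9, 2e-3   # green light, 2 mm panel-to-lens gap (assumed)
y, x = np.mgrid[0:n, 0:n]
mask = (((x // 8) % 2 == 0) & ((y // 8) % 2 == 0)).astype(complex)

# Approximate PSF: intensity of a plane wave after passing the mask and propagating onward.
psf = np.abs(angular_spectrum_propagate(mask, wavelength, pitch, distance)) ** 2
psf /= psf.sum()

# Degrade a pristine image by convolving it with the simulated PSF, then quantify the damage.
clean = data.camera().astype(float) / 255.0
degraded = fftconvolve(clean, psf, mode="same")
print("PSNR:", peak_signal_noise_ratio(clean, degraded, data_range=1.0))
print("SSIM:", structural_similarity(clean, degraded, data_range=1.0))
```

Pairs of `clean` and `degraded` images produced this way are the kind of synthetic training data the abstract refers to, and the same PSNR/SSIM calls can be reused to evaluate a restoration model against a held-out set of experimentally degraded images.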