Abstract
This study introduces a novel lightweight image super-resolution reconstruction network aimed at mitigating the computational complexity and memory consumption of existing super-resolution reconstruction networks. The proposed network optimizes its architecture through feature reuse and structural reparameterization, making it better suited to deployment in edge computing environments. Specifically, we have developed a new lightweight reparameterization layer that derives redundant features from intrinsic features using low-cost operations and integrates them with reparameterization techniques to enhance efficient feature utilization. Furthermore, an efficient deep feature extraction module named RGAB has been designed, which retains dense connections, local feature integration, and local residual learning mechanisms while incorporating addition operations for feature integration. The resulting network, termed R2GDN, exhibits a significant reduction in model parameters and improved inference speed. Compared to performance-oriented super-resolution algorithms, our model reduces the number of parameters by approximately 95% and improves inference speed by 86.8% on edge devices. When benchmarked against lightweight super-resolution algorithms, our model maintains a lower parameter count and achieves a 0.74% improvement in the structural similarity index (SSIM) on the BSD100 dataset for ×4 super-resolution reconstruction. Experimental results demonstrate that R2GDN effectively balances network performance and complexity.
1. Introduction
Image super-resolution reconstruction is a classic and challenging low-level vision task in computer vision. It is an ill-posed problem: a given low-resolution input admits many plausible high-resolution reconstructions, so recovering a unique high-resolution image is difficult. The concept of super-resolution was first introduced by Harris [1] and has since gained increasing attention, becoming a hot research topic in image processing. The development of efficient computing hardware and sophisticated algorithms [2] has unlocked the great potential of deep learning in handling unstructured data, which in turn has driven the rapid advancement of deep-learning-based image super-resolution methods. SRCNN (Super-Resolution Convolutional Neural Network) [3] was the first to apply convolutional neural networks to this task.
Research on deep learning theory [4] has shown that the solution space of deep neural networks can be expanded by deepening and widening the network structure, and many network designs therefore increase depth and width for better performance. However, as networks get deeper, training VGG-based networks becomes harder, which led to the development of ResNet with skip connections [5–6]. EDSR (Enhanced Deep Super-Resolution) [7] by Lim et al. removed the BN layers for better performance in image super-resolution and increased both network depth and feature dimensions. RDN (Residual Dense Network) [8] by Zhang et al. uses residual dense blocks with densely connected convolutional layers to extract rich local features. SwinIR (Swin Transformer for Image Restoration) [9] leverages the hierarchical structure and sliding-window mechanism of the Swin Transformer for effective feature extraction and reconstruction. HAT (Hybrid Attention Transformer) [10] combines channel attention and self-attention mechanisms to activate more input pixels for reconstruction.
With technological progress, high-resolution images produced by super-resolution reconstruction contain richer details and are now widely used in medical imaging [11], remote sensing satellite imaging [12], industrial product inspection [13], and microscopic imaging [14–15].
In the domain of image super-resolution reconstruction, although performance-oriented algorithms [7–10] can obtain outstanding image reconstruction outcomes, they typically feature complex model architectures and large parameter sizes. These elements increase both the storage demands for model parameters and the memory usage of intermediate features, which restricts their deployment on devices with limited memory. Hence, developing lightweight models to strike a balance between image reconstruction quality and computational resource consumption is of vital importance [16].
FSRCNN (Fast Super-Resolution Convolutional Neural Networks) [17] is the first network to use deconvolution layers to reconstruct HR images from LR feature maps. EDSR-baseline [7] simplifies the network while maintaining the original model’s performance. DRCN (Deeply-Recursive Convolutional Network) [18] enhances network depth through its recursive structure, expanding the receptive field to capture global image information and effectively reducing the number of parameters. CARN (Cascading Residual Network) [19] uses a cascading structure and residual connections to improve feature propagation efficiency, making it suitable for lightweight super-resolution tasks with fast inference speed. IMDN (Information Multi-Distillation Network) [20] gradually extracts hierarchical features through an information distillation mechanism, enhancing feature extraction capabilities, particularly for complex scene super-resolution reconstruction. RepRFN (Reparameterized Residual Feature Network) [21] designs a multi-scale feature fusion structure, enabling the network to learn and fuse features of various scales and high-frequency edges. SwinIR-light [9], a lightweight version of SwinIR, inherits the advantages of the Swin Transformer while reducing model size and computational requirements, making it suitable for mobile devices.
Although these networks lighten super-resolution models to varying degrees, several challenges persist: gradient vanishing or explosion; the extra computation and parameters introduced by information distillation, which make such models unsuitable for resource-constrained devices; increased model complexity due to multi-scale structures; high demand for computational resources; and the potential for significant degradation in reconstruction performance.
In this study, we first utilized intrinsic features to generate redundant features [22] through low-cost operations and developed a reparameterized layer for feature reuse (RG-Layer) using structural reparameterization techniques [23–24]. We then designed an efficient deep feature extraction module (RGAB, RepGhost and addition based residual dense block) that retains dense connections, local feature integration, and local residual learning, using addition operations for feature fusion. Finally, we constructed R2GDN, an efficient image super-resolution reconstruction network. Compared with performance-oriented image super-resolution reconstruction algorithms (e.g., RDN), our algorithm reduces the number of parameters by 96% and shortens the inference time on the edge device by 86.83%. Compared with typical lightweight super-resolution algorithms, our algorithm shows improvements in the structural similarity index (SSIM) in benchmark testing.
The method proposed in this paper is not a simple assembly of existing structures. Whereas traditional dense connection blocks exploit front-end features through convolution and concat operations, the RG-Layer designed in this paper moves the fusion step, in which redundant features are derived from intrinsic features through inexpensive operations, from the feature space to the weight space. During inference, the parallel structure can be merged into a single, efficient convolutional layer, achieving rich feature representation at minimal operational cost. The RGAB module combines the RG-Layer, which efficiently generates diverse features, with addition operations that refine features within a fixed dimension, creating an efficient local learning environment: each RG-Layer fine-tunes and enhances the fused output of the previous layers rather than simply increasing the number of features. This encourages the network to learn more representative and informative features while avoiding unbounded expansion of the feature maps, and the addition operation provides a more direct path for gradient propagation, in line with residual learning, which helps stabilize the training of deep networks. The effectiveness of R²GDN stems from this treatment of feature redundancy and feature fusion, and from the combination of structural reparameterization with additive dense connections, which yields a super-resolution reconstruction model that is "wide during training and narrow during inference."
In this paper, our main work is as follows:
1) We combine feature reuse and structural reparameterization techniques in image super-resolution reconstruction to develop a reparameterized layer for feature reuse (RG-Layer). This approach effectively reduces the number of model parameters and the computational complexity through feature fusion.
2) We propose an efficient deep feature extraction module (RGAB) for fast and accurate image super-resolution reconstruction. This module achieves competitive results with a lower number of parameters, enhancing feature extraction and utilization efficiency.
3) We explore the effect of the addition operation in image super-resolution reconstruction through experiments. The addition operation, which combines features additively, enables the extraction of local dense features in images and can serve as a reference for lightweight network design. Our model achieves an excellent balance between visual quality and inference speed.
2. Methods
2.1. Structural reparameterization and feature reuse
Structural reparameterization, a deep learning optimization technique, enhances model performance, efficiency, and generalization [25] by adjusting the network structure to reduce parameters and computational cost. Chen et al. [24] describe structural reparameterization as employing multiple linear operators during training to generate diverse feature maps; during inference, these operators are merged into a single operator via parameter fusion for fast inference. The process of structural reparameterization is schematically illustrated in Fig 1.
[Figure omitted. See PDF.]
The specific reparameterization process is shown in Fig 2. It involves not only the fusion of convolutional kernels but also the merging of bias terms. Taking a parallel structure containing a 3 × 3 convolution and a 1 × 1 convolution as an example, Fig 2(a) illustrates the macroscopic structural changes between training and inference. The core parameter fusion process is shown in Fig 2(b): the 3 × 3 convolutional kernel is denoted W3, and the kernel of the 1 × 1 convolution is expanded to a 3 × 3 size through zero-padding to obtain W1'. Subsequently, the equivalent convolutional kernels W3 and W1' of all parallel branches are added element-wise to obtain the fused convolutional kernel W_rep; the equivalent biases b3 and b1 of all parallel branches are added directly to obtain the fused bias b_rep. The entire parallel structure is then replaced by a single convolutional layer with the fused weight W_rep and bias b_rep. During the training phase, the model retains the complex parallel structure to capture rich image features and ensure high performance. During inference, this reparameterization is applied to all eligible branches in the network, transforming the entire network into a functionally equivalent but much faster and more memory-efficient architecture.
[Figure omitted. See PDF.]
(Fig 2(a) is a schematic diagram of the parallel structure in the network; Fig 2(b) illustrates the fusion process of parallel structures during network inference.)
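The following minimal PyTorch sketch illustrates this fusion for a single pair of parallel branches. The layer sizes are illustrative and are not those of R2GDN; only the folding arithmetic follows the description above.

```python
# Minimal sketch of the parallel-branch fusion in Fig 2(b): a 3x3 and a 1x1 convolution
# acting on the same input are merged into one 3x3 convolution by zero-padding the 1x1
# kernel and summing kernels and biases.
import torch
import torch.nn as nn
import torch.nn.functional as F

c_in, c_out = 16, 16
conv3 = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
conv1 = nn.Conv2d(c_in, c_out, kernel_size=1)

# training-time output of the parallel structure
x = torch.randn(1, c_in, 32, 32)
y_parallel = conv3(x) + conv1(x)

# fuse: pad the 1x1 kernel to 3x3, then add kernels and biases
w1_padded = F.pad(conv1.weight, (1, 1, 1, 1))          # shape (c_out, c_in, 3, 3)
fused = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
with torch.no_grad():
    fused.weight.copy_(conv3.weight + w1_padded)
    fused.bias.copy_(conv3.bias + conv1.bias)

# the single fused convolution reproduces the parallel structure
assert torch.allclose(y_parallel, fused(x), atol=1e-5)
```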
Feature reuse is a key strategy for improving the efficiency and performance of deep learning models. By reusing already computed feature maps, it avoids redundant calculations and reduces the computational load of the model. With multiple layers or operations able to share the same feature representations, feature reuse can decrease model parameters and prompt the model to learn richer and more robust feature representations. Also, it can speed up data processing and improve training efficiency. Common feature reuse methods include residual connections [6,8], dense connections [8,26], feature pyramids [27], and multi-scale feature fusion [7] and so on.
As the performance of neural network models improves, the scale of the network and the number of features increase. The issue of feature redundancy has gradually received attention from researchers [22]. In the task of image super-resolution reconstruction, there are also large numbers of similar features [28]. Obtaining these similar features repeatedly through regular convolution operations leads to a significant waste of computational resources. Therefore, some scholars have proposed using regular convolution to generate some intrinsic features and then using low-cost operations (such as depth-wise convolution, shift operations, etc.) to obtain redundant features. By concatenating these two parts of features, the completeness of the output features can be guaranteed while effectively reducing the number of model parameters and computational requirements. Structural reparameterization moves the fusion process from the feature space to the weight space, and is considered an implicit feature reuse method.
In deep learning, feature fusion is commonly achieved through concatenation and addition operations. Concatenation combines two or more tensors along a specified dimension to generate a larger tensor, increasing the number of channels or feature dimensions; this enables later layers to better capture the relationships between different features. Addition sums two tensors element-wise and produces a tensor of the same shape; it facilitates gradient flow and enhances training stability while mitigating gradient vanishing in deep networks.
Although these two feature fusion methods do not introduce additional parameters or FLOPs in the network, researchers [24] have verified that, under the same batch size, the Addition operation has a lower computational cost and shorter runtime compared to the Concatenation operation, as shown in Fig 3.
[Figure omitted. See PDF.]
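The snippet below only illustrates the shape difference between the two fusion operations; actual runtimes depend on hardware and memory layout, and the tensor sizes here are arbitrary.

```python
# Addition keeps the channel count fixed, whereas concatenation doubles it, so any layer
# that consumes the concatenated result must also grow.
import torch

a = torch.randn(8, 64, 48, 48)
b = torch.randn(8, 64, 48, 48)

added = a + b                       # shape (8, 64, 48, 48): no extra channels
concat = torch.cat([a, b], dim=1)   # shape (8, 128, 48, 48): doubled channels
print(added.shape, concat.shape)
```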
2.2. RepGhost and addition based residual dense blocks—RGABs
Unlike traditional convolution-activation operations, the Ghost module [22] uses regular convolution to generate a set of intrinsic features and then employs low-cost operations to obtain redundant features. The two sets of features are concatenated, which preserves the completeness of the output features while effectively reducing the number of model parameters and the computational requirements, as depicted in Fig 4.
[Figure omitted. See PDF.]
(Fig 4(a) shows the structure of a traditional convolution; Fig 4(b) shows the structure of the Ghost module.)
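A minimal sketch of a Ghost-style module as described above is given below. The channel split and the choice of a depthwise convolution as the cheap operation are illustrative assumptions rather than the exact configuration used in [22].

```python
# Ghost-style module sketch: a regular convolution produces intrinsic features, a cheap
# depthwise convolution derives redundant ("ghost") features from them, and the two sets
# are concatenated so the output channel count is preserved.
import torch
import torch.nn as nn

class GhostModuleSketch(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, ratio: int = 2):
        super().__init__()
        init_ch = out_ch // ratio                      # intrinsic feature channels
        cheap_ch = out_ch - init_ch                    # redundant feature channels
        self.primary = nn.Sequential(                  # ordinary convolution
            nn.Conv2d(in_ch, init_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(                    # low-cost depthwise operation
            nn.Conv2d(init_ch, cheap_ch, 3, padding=1, groups=init_ch),
            nn.ReLU(inplace=True))

    def forward(self, x):
        intrinsic = self.primary(x)
        ghost = self.cheap(intrinsic)
        return torch.cat([intrinsic, ghost], dim=1)    # concat preserves all channels

y = GhostModuleSketch(16, 32)(torch.randn(1, 16, 24, 24))
print(y.shape)  # torch.Size([1, 32, 24, 24])
```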
Based on this, we made several adjustments and derived our RG-Layer, as shown in Fig 5. Fig 5(a) shows the RepGhost module [24] used in classification networks, which replaces the concat operation in Fig 4(b) with an addition operation to improve efficiency. To comply with the rules of structural reparameterization, the nonlinear ReLU operation is moved after the addition operation, and batch normalization (BN) is added to the identity-mapping branch to introduce nonlinearity during training, making the structure more flexible.
[Figure omitted. See PDF.]
(To highlight the changes in functional structure, we omit the 1 × 1 convolutions used for channel-number transformation at the input and output. Fig 5(a) shows the RepGhost module used in classification networks; Fig 5(b) shows the RG-Layer of the SR network with the gated BN unit; Fig 5(c) shows the further lightweight RG-Layer; Fig 5(d) is the reparameterized structure used during inference.)
The RG-Layer of our image super-resolution reconstruction network is depicted in Fig 5 (b) and (c). While batch normalization (BN) can ease network training and prevent overfitting, for image super-resolution reconstruction, it normalizes image color distribution, degrading the original contrast information and thus the super-resolution network’s output quality. Instead of removing the BN layer, we designed a gated structure with trainable parameters to control its effect range, mitigating BN’s negative impact in this task, as shown in Fig 5 (b).
Gated mechanisms are widely used in recurrent neural networks (RNNs) [29] to control the flow of information and improve the model’s ability to learn long-term dependencies. This mechanism dynamically adjusts the transmission of information by learning gating parameters. Inspired by these successes, recent studies have explored the application of gating mechanisms in other domains. For instance, Wang et al. [30] combined gating mechanisms with normalization methods in the context of image restoration tasks, demonstrating that such combinations can enhance feature extraction and improve overall performance.
Drawing inspiration from these advances, we designed a Gated BN Unit (GBU) to address the limitations of traditional BN in image super-resolution tasks. Specifically, we created two gating mechanisms, each consisting of a linear layer followed by an activation function (e.g., sigmoid), to output a value between 0 and 1. This value indicates the degree to which the BN layer should be activated. The process can be mathematically expressed as:
y = α · x + β · BN(x)    (1)
where α and β are trainable gating parameters that determine how much of the input is retained and the extent to which the BN layer's normalization effect is applied.
As noted above, the GBU draws on the gated normalization of Wang et al. [30]. Its form allows the network to determine adaptively, through the learnable parameters α and β, the proportion of the original input to retain and the extent to which the BN features are used. Our intention was to let the network find the optimal balance between the BN layer and the original identity mapping on its own. Imposing constraints on α and β (such as restricting their values to [0, 1] through a sigmoid) amounts to presetting the answer in advance and may limit the model's expressive power; without such constraints, the network can discover feature combinations beyond human intuition. From the perspective of exploratory research, we therefore leave the gates unconstrained.
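The sketch below shows one way to implement the GBU as we read Eq. (1); whether the gates are squashed by a sigmoid is left as a configurable assumption, reflecting the discussion above, and the gate granularity (a single scalar per gate) is also an assumption.

```python
# Gated BN Unit sketch: the output mixes the raw input and its batch-normalized version
# through two trainable gates alpha and beta, y = alpha * x + beta * BN(x).
import torch
import torch.nn as nn

class GatedBNUnit(nn.Module):
    def __init__(self, channels: int, constrain: bool = False):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)
        self.alpha = nn.Parameter(torch.ones(1))   # gate on the identity path
        self.beta = nn.Parameter(torch.ones(1))    # gate on the BN path
        self.constrain = constrain                 # optionally squash gates to [0, 1]

    def forward(self, x):
        a, b = self.alpha, self.beta
        if self.constrain:
            a, b = torch.sigmoid(a), torch.sigmoid(b)
        return a * x + b * self.bn(x)

gbu = GatedBNUnit(32)
print(gbu(torch.randn(4, 32, 48, 48)).shape)
```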
To further advance model lightweighting, we designed the structure in Fig 5 (c). Compared to Fig 5 (a) and 5 (b), we replaced the parallel structure with a 1 × 1 convolution branch. The 1 × 1 convolution has trainable parameters that can learn linear transformations of input features and help improve gradient flow [31] in deep networks. The structure in Fig 5 (c) can be re-parameterized to Fig 5 (d) during inference, enabling fast inference for the super-resolution network. Our RG-Layer only contains depth-wise separable convolutions (DConv) and activation functions during inference, making it more suitable for edge devices [23]. Based on the dense connection layer, local feature fusion, and local residual learning [8], we designed a lightweight RGAB (RepGhost and addition based residual dense blocks) module, as shown in Fig 6. The continuous memory mechanism is implemented by passing the features from the previous RGAB’s output to each RG-Layer of the current RGAB.
[Figure omitted. See PDF.]
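Before turning to the RGAB formulation, the following sketch illustrates the "wide during training, narrow during inference" behaviour of an RG-Layer-style block. It assumes, for illustration only, that the parallel 1 × 1 branch acts per channel so that it can be folded into the centre tap of the 3 × 3 depthwise kernel; the exact branch layout of the RG-Layer in Fig 5(c) may differ.

```python
# Training-time sketch of an RG-Layer-style block: a 3x3 depthwise branch plus a cheap
# per-channel 1x1 branch, fused by addition with the ReLU placed after the addition.
# reparameterize() folds the 1x1 branch into the 3x3 depthwise kernel for inference.
import torch
import torch.nn as nn

class RGLayerSketch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.dw3 = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.dw1 = nn.Conv2d(channels, channels, 1, groups=channels)  # per-channel scale
        self.act = nn.ReLU(inplace=True)   # nonlinearity placed after the addition
        self.fused = None                  # filled by reparameterize()

    def forward(self, x):
        if self.fused is not None:                     # inference: single depthwise conv
            return self.act(self.fused(x))
        return self.act(self.dw3(x) + self.dw1(x))     # training: parallel branches

    @torch.no_grad()
    def reparameterize(self):
        """Fold the 1x1 branch into the centre tap of the 3x3 depthwise kernel."""
        c = self.dw3.out_channels
        fused = nn.Conv2d(c, c, 3, padding=1, groups=c)
        w = self.dw3.weight.clone()                    # (C, 1, 3, 3)
        w[:, :, 1, 1] += self.dw1.weight[:, :, 0, 0]   # add per-channel 1x1 weight
        fused.weight.copy_(w)
        fused.bias.copy_(self.dw3.bias + self.dw1.bias)
        self.fused = fused

x = torch.randn(1, 32, 48, 48)
layer = RGLayerSketch(32)
y_train = layer(x)
layer.reparameterize()
layer.eval()
assert torch.allclose(y_train, layer(x), atol=1e-5)    # identical outputs after fusion
```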
Let F_{n-1} denote the output of the (n-1)-th RGAB module (the input of the n-th RGAB module), and F_n denote the output of the n-th RGAB module, as shown in Fig 6. Both have G feature maps. The output of the l-th RG-Layer in the n-th RGAB module, F_{n,l}, can be expressed as:
F_{n,l} = σ(W_{n,l} * (F_{n-1} + F_{n,1} + … + F_{n,l-1}))    (2)
In formula (2), σ represents the activation function and W_{n,l} denotes the weight of the l-th RG-Layer in the n-th RGAB block. The output features of the (n-1)-th RGAB block are combined with the output features of the 1st, …, (l-1)-th RG-Layers in the n-th block to produce G feature maps. This structure, in which the output of the previous RGAB and of every preceding RG-Layer in the current block is added into all subsequent layers, enables the extraction of local dense features of the image [8] while reducing network complexity.
Compared with a block that uses the concat operation to extract local dense features of the image, the RGAB reduces the number of feature maps carried forward after the l-th RG-Layer by a factor of (l + 1) (when the input and output features of each RGAB block are equal), greatly reducing the model's parameters and computational complexity (as mentioned in Section 2.1) and thus achieving model lightweighting. Moreover, the output of each RG-Layer is passed on in an additive manner to extract local dense features of the image.
The design of RGAB represents a paradigm shift from feature accumulation to feature refinement. In traditional concatenation-based dense blocks, the l-th layer has to process an input whose channel number has grown to l·G (assuming the block input also has G channels), leading to escalating computational demands. In contrast, our RGAB maintains a constant channel width. This allows the network to continuously enhance the most salient information within a fixed channel capacity, rather than simply expanding its representational space. This approach is inherently more parameter-efficient and computationally economical, directly addressing the core objectives of lightweight model design.
After passing through the L RG-Layers of the n-th RGAB block, local feature fusion is applied. As shown in Fig 6, F_{n-1} is the output of the (n-1)-th RGAB block. F_{n-1} and the outputs of the L RG-Layers in the n-th RGAB are concatenated and then passed through a 1 × 1 convolution to control the output feature information:
F_{n,LF} = H_{LFF,n}([F_{n-1}, F_{n,1}, …, F_{n,L}])    (3)
where H_{LFF,n} denotes the 1 × 1 convolution in the n-th RGAB and [·] denotes concatenation. The numbers of features in F_{n-1} and F_{n,LF} are the same. To further improve the information flow during network inference, the RGAB introduces local residual learning after local feature fusion. F_n, the output of the n-th RGAB block, can be expressed as:
F_n = F_{n-1} + F_{n,LF}    (4)
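The following sketch summarizes Eqs. (2)–(4) with a simplified depthwise-separable stand-in for the RG-Layer; the channel width and the number of layers are illustrative, not the configuration used in our experiments.

```python
# RGAB-style block sketch: L lightweight layers share a fixed width G, each layer adds its
# output to the running sum of earlier outputs (Eq. 2), a concat + 1x1 convolution performs
# local feature fusion (Eq. 3), and a local residual gives the block output (Eq. 4).
import torch
import torch.nn as nn

def rg_layer(channels: int) -> nn.Module:
    # stand-in RG-Layer: depthwise 3x3 + pointwise 1x1, ReLU after fusion
    return nn.Sequential(
        nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
        nn.Conv2d(channels, channels, 1),
        nn.ReLU(inplace=True))

class RGABSketch(nn.Module):
    def __init__(self, growth: int = 32, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(rg_layer(growth) for _ in range(num_layers))
        # local feature fusion: concat of F_{n-1} and the L layer outputs -> 1x1 conv
        self.fuse = nn.Conv2d(growth * (num_layers + 1), growth, 1)

    def forward(self, x):
        outputs = [x]                      # F_{n-1}
        state = x                          # additive running fusion of earlier features
        for layer in self.layers:
            out = layer(state)             # Eq. (2): layer sees the sum of earlier features
            outputs.append(out)
            state = state + out            # addition keeps the channel width fixed at G
        fused = self.fuse(torch.cat(outputs, dim=1))  # Eq. (3): local feature fusion
        return x + fused                   # Eq. (4): local residual learning

print(RGABSketch()(torch.randn(1, 32, 48, 48)).shape)
```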
2.3. R2GDN network structure
The R2GDN network is illustrated in Fig 7. The network is primarily composed of four components: the shallow feature extraction (SFE) block, the RepGhost and addition based residual dense blocks (RGABs), the dense feature fusion (DFF) block, and the up-sampling block.
[Figure omitted. See PDF.]
We denote the input image of the network as I_LR and the output image as I_SR. Initially, shallow feature extraction is performed on the input image using two convolutional layers. The first convolutional layer extracts the feature F_{-1} from I_LR:
F_{-1} = H_{SFE1}(I_LR)    (5)
where H_{SFE1} denotes the first convolution operation. The features F_{-1} are used for further shallow feature extraction and for global residual learning. The second convolutional layer takes F_{-1} as input and further extracts shallow features; its output, denoted F_0, is used as the input to the first RGAB:
F_0 = H_{SFE2}(F_{-1})    (6)
where H_{SFE2} denotes the second convolution operation. Assuming the entire network has N RGAB blocks, we define the output features of the n-th RGAB block as F_n, which can be expressed as:
F_n = H_{RGAB,n}(F_{n-1})    (7)
In formula (7), H_{RGAB,n} denotes the operation of the n-th RGAB. H_{RGAB,n} is a composite function composed of depth-wise convolution, pointwise convolution, and a non-linear activation function. F_n is obtained from the internal convolution operations within the n-th RGAB and is thus referred to as a local feature.
The dense feature fusion (DFF) is an operation that fuses the local features F_1, …, F_N output by the N RGAB modules with the global residual feature F_{-1}. In equation (8), F_{DF} represents the feature output by the DFF operation, and H_{DFF} denotes a series of 1 × 1 and 3 × 3 convolutional operations.
F_{DF} = F_{-1} + H_{DFF}([F_1, F_2, …, F_N])    (8)
After obtaining the fused features in the low-resolution space, we perform the up-sampling operation, which can be expressed as:
I_SR = H_{UP}(F_{DF})    (9)
where H_{UP} denotes the up-sampling operation.
We use sub-pixel convolution for upsampling. This technique overcomes the checkerboard effect problem associated with conventional upsampling methods (e.g., transposed convolution) by reordering the channels of low-resolution feature maps to generate high-resolution outputs. The fundamental concept is to create a feature map with a channel number equal to the square of the upsampling factor through convolution. The pixel shuffle operation subsequently reorganizes these channels to form the high-resolution output.
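A minimal sketch of such a sub-pixel upsampling head is shown below; the channel sizes are illustrative and not those of R2GDN.

```python
# Sub-pixel (pixel-shuffle) upsampler: a convolution expands the channel count to r^2 times
# the output channels, and PixelShuffle rearranges those channels into an r-times larger
# spatial grid.
import torch
import torch.nn as nn

def subpixel_upsampler(in_ch: int, out_ch: int, scale: int) -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch * scale ** 2, 3, padding=1),  # channels = out_ch * r^2
        nn.PixelShuffle(scale))                                # reorder channels to space

up = subpixel_upsampler(64, 3, 4)                 # x4 super-resolution head
lr_feat = torch.randn(1, 64, 48, 48)
print(up(lr_feat).shape)                          # torch.Size([1, 3, 192, 192])
```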
3. Experimental verification
3.1. Experimental settings
In this study, we used 800 high-quality RGB images from the DIV2K [32] dataset and 2000 high-quality RGB images from the Flickr2K [33] dataset as the training set. We evaluated the performance of our model on five benchmark datasets: Set5 [34] with 5 natural images representing different scenes and content; Set14 [35] containing 14 natural images with more diverse scenes and objects such as architecture, animals, and plants; BSD100 [36] with 100 high-resolution and high-quality natural scene images; Urban100 [37] with 100 urban landscape images featuring various buildings and streets; and Manga109 [38], a super-resolution dataset for manga images, including 109 high-quality manga images.
Low-resolution (LR) images were generated by down-sampling the high-resolution (HR) images using bicubic interpolation. The super-resolution results were evaluated using the Y-channel component of the images in the YCbCr color space, with metrics including Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM).
We set the size of the LR images to 64 × 64. For networks with different upscaling factors, the corresponding HR image patches were automatically cropped from the training images. During each training iteration, one HR image patch was cropped from each training image, and data augmentation was performed by randomly applying one of the following transformations: 90° rotation, horizontal flipping, or vertical flipping. The super-resolution network was implemented using the PyTorch framework and updated using the Adam optimizer. The learning rate for all layers was initialized to 10⁻⁴. After 750 training epochs, the learning rate was updated to 10⁻⁵; after 900 epochs, it was updated to 10⁻⁶; and the training was completed after 1000 epochs.
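The optimizer and step-wise learning-rate schedule above can be reproduced with a few lines of PyTorch, as sketched below; the model and training loop are placeholders, and only the milestone epochs follow the text.

```python
# Adam with initial lr 1e-4, dropped to 1e-5 after epoch 750 and 1e-6 after epoch 900,
# for 1000 epochs in total.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)             # placeholder for R2GDN
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[750, 900], gamma=0.1)

for epoch in range(1000):
    # ... one pass over the randomly rotated/flipped 64x64 LR patches would go here ...
    optimizer.step()                              # placeholder parameter update
    scheduler.step()                              # lr: 1e-4 -> 1e-5 (750) -> 1e-6 (900)
```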
3.2. Network performance comparison
We compared our network with state-of-the-art methods: FSRCNN [17], VDSR [39], DRCN [18], EDSR-baseline [7], CARN [19], IMDN [20], RepRFN [21], SwinIR-light [9], RDN [8], CFIN [40], FIWHN [41], and DMNet [42]. The network using the structure shown in Fig 5(b) is denoted R2GDN-GBU, while the network using the structure shown in Fig 5(c) is denoted R2GDN. Tables 1, 2, and 3 present the quantitative comparisons of super-resolution reconstruction results for the ×2, ×3, and ×4 upscaling factors, respectively. R2GDN performs well on most datasets, particularly excelling in the Structural Similarity Index (SSIM) metric, where it surpasses the majority of the models.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
The Structural Similarity Index (SSIM) measures the similarity between images based on three relatively independent metrics: luminance, contrast, and structure. The superior performance of our model in this metric indicates its ability to better restore structural information in images, such as edges and textures, thereby providing a more natural and visually pleasing result that aligns better with human perception.
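For reference, the standard SSIM formulation underlying this metric compares two image patches x and y through their local means, variances, and covariance:

```latex
\mathrm{SSIM}(x, y) =
\frac{(2\mu_x \mu_y + C_1)\,(2\sigma_{xy} + C_2)}
     {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)}
```

where μ_x and μ_y are local means, σ_x² and σ_y² are variances, σ_xy is the covariance, and C_1, C_2 are small constants that stabilize the division.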
As shown in Table 1, R²GDN achieves better objective evaluation metrics in the 5 benchmark test sets for the × 2 reconstruction task. Compared with performance-oriented models (such as RDN), the number of model parameters of R²GDN is reduced by 96.4%, but the SSIM of R²GDN (taking BSD100 as an example) is 0.9058, which is higher than that of RDN (0.9017). Compared with other lightweight models (such as CARN, IMDN, RepRFN), the structural similarity index of R²GDN (taking BSD100 as an example) is increased by 0.89%, 0.69%, and 0.54%, respectively.
As shown in Table 2, compared with performance-oriented models (such as RDN), the SSIM of R²GDN (taking BSD100 as an example) is 0.8319, which is higher than that of RDN (0.8093). Compared with other lightweight models (such as CARN, IMDN, RepRFN), the structural similarity index of R²GDN (taking BSD100 as an example) is increased by 3.55%, 3.41%, and 3.11%, respectively.
As shown in Table 3, compared with performance-oriented models (such as RDN), the SSIM of R²GDN (taking BSD100 as an example) is 0.7461, which is higher than that of RDN (0.7419). Compared with other lightweight models (such as CARN, IMDN, RepRFN), the structural similarity index of R²GDN (taking BSD100 as an example) is increased by 1.53%, 1.47%, and 0.97%, respectively.
On the basis of achieving good super-resolution reconstruction results, our model has found a better balance between performance and lightweight design. Compared with high-performance super-resolution networks, our R2GDN can achieve comparable PSNR and SSIM values with fewer parameters and even outperforms some of these state-of-the-art models in certain metrics. Compared with advanced lightweight networks, R2GDN not only has a small number of parameters but also achieves better evaluation metrics on some datasets.
GhostSR [28] directly employs the original Ghost module, which maintains the same structure during both training and inference: regular convolutions generate "essential features," inexpensive shift operations create "ghost features," and the two are merged via concatenation. In contrast, the fundamental unit of R²GDN, the RG-Layer, integrates structural reparameterization with the Ghost concept. During training, the RG-Layer has a richer set of branches and stronger expressive power; during inference, it consolidates into a single standard convolution. When stacking block structures, we opt for the more efficient addition operation over concatenation, reducing latency and promoting feature refinement.
RepRFN [21] explicitly fuses multi-scale features through a complex multi-branch structure, which offers strong performance at the cost of increased structural complexity. In contrast, R²GDN adopts a "refinement" mode, repeatedly enhancing features within a fixed channel dimension through additive dense connections, thereby achieving greater simplicity and efficiency. Unlike the concatenation operation used by RepRFN (and most traditional dense networks), we employ addition for feature fusion within the RGAB module. This fundamentally prevents channel expansion, significantly reducing computational complexity and memory access overhead. Experimental results indicate that, despite having slightly more parameters than RepRFN, R²GDN achieves a notable improvement in structural similarity (for example, in ×4 super-resolution on BSD100, the SSIM of R²GDN is 0.7461 compared to 0.7389 for RepRFN).
Figs 8 and 9 present ×4 visual comparisons on the Urban100 dataset. For the images "img_18" and "img_83" from the Urban100 dataset, our method restores the grid structures better than the other methods, which further demonstrates the effectiveness of R2GDN.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
We selected typical networks for a comparative experiment on partial image reconstruction results at ×4 super-resolution. As shown in the images “img_18” and “img_83” from the Urban100 dataset, the subjective visual effects reveal that our method can achieve results comparable to or even better than state-of-the-art methods such as RDN, as well as lightweight methods like FSRCNN, VDSR, EDSR-baseline, CARN, and IMDN. Our approach effectively restores image edges and textures, making details clear and visible. Particularly for the reconstruction of “img_83,” our network accurately reconstructs the textures of the glass reflections. This visual comparison further illustrates that the super-resolution results of our proposed R2GDN have reached an advanced level and can meet the requirements of practical applications.
As shown in Tables 1–3, R²GDN achieves higher SSIM than other models but relatively lower PSNR values. From an analysis of the definitions of SSIM and PSNR, together with an in-depth study of the reconstructed images, we conclude that the reconstruction behaviour of R²GDN favours enhancing the structural information of images rather than pursuing absolute pixel-value matching. Previous studies have confirmed that, in image super-resolution reconstruction, a higher PSNR does not necessarily correspond to better reconstruction quality; some reconstructed images have high PSNR values but overly smooth details, leading to a worse visual impression. We performed edge extraction on the reconstructed images, and the results are shown in Fig 10. The results indicate that the images reconstructed by R²GDN contain richer edge information than those of other models. At the pixel level, these restored details may not be entirely consistent with the original image, which increases the mean squared error (MSE) and decreases the PSNR.
[Figure omitted. See PDF.]
3.3. Ablation investigation
We designed a series of ablation experiments to analyze the effectiveness of each module in the model. The network was trained using images of size 64 × 64 and updated with the Adam optimizer, with an initial learning rate of 10⁻⁴. After 750 training epochs, the learning rate was updated to 10⁻⁵; after 900 epochs, it was updated to 10⁻⁶, and the training was completed after 1000 epochs. We evaluated the ×4 super-resolution reconstruction performance on three benchmark datasets: Set14, BSD100, and Urban100. The results of the ablation experiments for R2GDN are summarized in Table 4.
[Figure omitted. See PDF.]
To further validate the contribution of redundant features generated by different low-cost operations to model performance, and to demonstrate the merit of the proposed structure, we conducted replacement tests on the low-cost operation in the lightweight layer of the R2GDN model, comparing identity mapping, batch normalization, the GBU, and 1 × 1 convolution. The experiments show that the lightweight layer using a 1 × 1 convolution, as adopted in this paper, achieves better results in both objective metrics and subjective evaluation; the metric statistics are presented in Table 5.
[Figure omitted. See PDF.]
With the model structure otherwise unchanged, adopting the 1 × 1 convolution as the low-cost operation improves the objective evaluation metrics by 3.4% on average compared with the BN structure, by 3.56% on average compared with identity mapping, and by 0.17% on average compared with the GBU. The ablation experiments show that the proposed design offers structural novelty and performance benefits in the task of image super-resolution reconstruction.
Through our in-depth ablation studies, we discovered that the complexity of the Gated BN Unit (GBU) does not correspond to the performance improvements it offers. Consequently, we conducted a more comprehensive optimization: we eliminated the BN and the entire GBU structure and determined that employing a parallel 1 × 1 convolution branch as a low-cost operation achieves superior outcomes. The 1 × 1 convolution is a lightweight yet parameter-rich operation that not only offers linear transformation capabilities beyond identity mapping, aiding in gradient flow improvement, but also integrates seamlessly and cleanly into our reparameterization framework—during inference, it can be seamlessly merged with the main branch’s 3 × 3 convolution into a single, standard convolutional layer, without leaving any additional, non-standard operations.
As shown in Tables 1–3, a small average difference in the SSIM metric alone is not always sufficient to demonstrate the superiority of a model. We therefore conducted paired-sample t-tests on the reconstruction objective metric SSIM of R2GDN, RepRFN, FIWHN, CFIN, and SwinIR-light on the BSD100 test set at the ×4 scale, and calculated 95% confidence intervals for all key comparisons to assess the statistical significance of the differences. The results are shown in Table 6.
[Figure omitted. See PDF.]
Compared with RepRFN, R2GDN has an average SSIM difference of +0.0170, with a 95% confidence interval of [0.00005, 0.03391] and a two-sided p-value of 0.049. Statistics indicate that the difference in the reconstruction objective metric SSIM between R2GDN and RepRFN reaches the level of statistical significance (p < 0.05). More importantly, the lower limit of its 95% confidence interval is greater than zero, which provides us with 95% confidence that the performance advantage of R2GDN is real. Although this advantage may seem small numerically, statistical tests confirm its systematic nature rather than randomness.
Compared with FIWHN, R2GDN has an average SSIM difference of +0.0443, with a 95% confidence interval of [0.0098, 0.0788] and a two-sided p-value of 0.012. Statistics show that the difference in the reconstruction objective metric SSIM between R2GDN and FIWHN reaches the level of statistical significance (p < 0.05) and is significant. Compared with SwinIR-light, R2GDN has an average SSIM difference of +0.0532, with a 95% confidence interval of [0.0082, 0.0981] and a two-sided p-value of 0.021. Statistics indicate that the difference in the reconstruction objective metric SSIM between R2GDN and SwinIR-light reaches the level of statistical significance (p < 0.05) and is significant.
Compared with CFIN, R2GDN has an average SSIM difference of +0.0096, with a 95% confidence interval of [−0.0243, 0.0435] and a two-sided p-value of 0.576. Statistics show that the difference in the reconstruction objective metric SSIM between R2GDN and CFIN does not reach the level of statistical significance (p > 0.05). Statistical conclusions indicate that there is no significant difference in performance between R2GDN and CFIN.
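The paired test and confidence interval can be computed as sketched below; the SSIM arrays here are random placeholders standing in for the per-image scores of two models on the 100 BSD100 images.

```python
# Paired-sample t-test plus a 95% confidence interval on the mean per-image SSIM difference.
import numpy as np
from scipy import stats

ssim_r2gdn = np.random.rand(100) * 0.1 + 0.70     # placeholder per-image SSIM values
ssim_other = np.random.rand(100) * 0.1 + 0.69

diff = ssim_r2gdn - ssim_other
t_stat, p_value = stats.ttest_rel(ssim_r2gdn, ssim_other)   # paired two-sided t-test

mean_diff = diff.mean()
sem = stats.sem(diff)
ci_low, ci_high = stats.t.interval(0.95, len(diff) - 1, loc=mean_diff, scale=sem)
print(f"mean diff {mean_diff:+.4f}, 95% CI [{ci_low:.4f}, {ci_high:.4f}], p = {p_value:.3f}")
```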
In summary, the statistical evidence supports the conclusion that R2GDN significantly outperforms RepRFN, FIWHN, and SwinIR-light in terms of the SSIM metric, while no significant difference is observed between R2GDN and CFIN.
3.4. Inference time
The inference time is an extremely important metric for lightweight image super-resolution reconstruction algorithms. To further validate that the proposed R2GDN network is lightweight, we conducted comparative experiments on reconstruction speed using both high-performance GPU devices and edge devices. The experiments were performed on the BSD100 dataset (×4) with test images of size 64 × 64 pixels. We first performed a model warm-up operation, as the initial inference time may include network loading time. Subsequently, we ran the model 10 times to measure the average inference time. The results are shown in Fig 11, Tables 7 and 8.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
The high-performance GPU inference tests were profiled on an NVIDIA GeForce RTX 4090. Edge-hardware testing was performed on the Jetson Nano B01 developer kit (NVIDIA Jetson family), which integrates a quad-core ARM Cortex-A57 application processor and a 128-core NVIDIA Maxwell graphics processing unit.
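The timing protocol can be sketched as follows; the model is a placeholder, and the measured values will of course differ across devices.

```python
# Warm-up runs first, then the average of repeated timed forward passes on a 64x64 input,
# with CUDA synchronization so GPU kernels are fully counted.
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Conv2d(3, 3, 3, padding=1).to(device).eval()   # placeholder for R2GDN
x = torch.randn(1, 3, 64, 64, device=device)

with torch.no_grad():
    for _ in range(5):                       # warm-up: exclude model/kernel loading time
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    runs = 10
    for _ in range(runs):                    # average over repeated runs
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    print(f"average inference time: {(time.perf_counter() - start) / runs * 1e3:.3f} ms")
```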
3.5. Computational complexity
The computational complexity of a neural network model refers to the computational resources and time required for the model to operate, and typically includes both time complexity and space complexity. Time complexity measures the speed of model operation, i.e., the amount of computation required to complete one forward pass or training iteration; a common metric is FLOPs (floating-point operations), the total number of floating-point additions, multiplications, and other operations required during computation. Space complexity measures the memory or storage resources required by the model; a common metric is the number of parameters, as reported in Tables 1–3. Table 9 presents the FLOPs of R2GDN and other advanced models at a magnification factor of 2. The comparison shows that R2GDN achieves low computational complexity while maintaining stable image reconstruction performance.
[Figure omitted. See PDF.]
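The complexity figures can be obtained as sketched below; the model here is a placeholder, and the use of the thop profiler for FLOPs is an assumption, since any equivalent counter would serve.

```python
# Parameter count follows directly from the model; FLOPs can be estimated with a
# third-party profiler such as thop (pip install thop).
import torch
import torch.nn as nn

model = nn.Sequential(                              # placeholder for R2GDN
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 3, 3, padding=1))

num_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {num_params / 1e3:.1f} K")

try:
    from thop import profile                        # optional FLOPs counter (assumption)
    flops, _ = profile(model, inputs=(torch.randn(1, 3, 64, 64),), verbose=False)
    print(f"FLOPs for a 64x64 input: {flops / 1e9:.2f} G")
except ImportError:
    print("install a FLOPs counter (e.g., thop) to estimate FLOPs")
```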
3.6. Evaluation of datasets in different fields
In order to validate the generalizability of R2GDN, we performed super-resolution reconstruction network training and image reconstruction experiments using IC microscopic images. The training set is composed of REFICS [13] and a portion of self-collected images. REFICS is a large-scale synthetic scanning electron microscope (SEM) dataset, which includes 800,000 SEM images spanning 32nm and 90nm node technologies. We chose 5,000 images with minimal noise and high clarity from the active area, polysilicon, and metal layers in REFICS, and combined them with 3,000 self-collected high-definition integrated circuit microscopic images to form the training set. Self-collected high-definition micrographs of IC were acquired from two distinct devices. The first device is fabricated in a 0.18 µm 1P6M Bipolar-CMOS-DMOS (BCD) process; its images were captured at 1,800 × magnification using an optical electron microscope. The second device is manufactured in a 55 nm 1P5M Bipolar-CMOS technology; its images were obtained at 200,000 × magnification via scanning electron microscopy.
We used 80 self-collected images that do not overlap with the training set as the overall test set, while 50 metal-layer images, 50 poly-layer images, and 50 diffusion-area (DF) images that do not overlap with the training set form the independent test sets. We retrained several typical networks on our IC training dataset for reconstruction performance comparison. The objective performance of our model at the ×4 scale is compared with that of other typical models in Table 10. The visual results of super-resolution reconstruction of IC microscopic images by R2GDN are shown in Fig 12.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
As indicated in Table 10, in the × 4 reconstruction task for IC circuit microscopic images, R²GDN attains the highest SSIM index in the overall test set, metal test set, and DF test set, and the second-highest SSIM index in the poly test set, compared with other top-performing models. We also performed a perceptual evaluation comparison of the IC reconstruction results. R²GDN achieves the lowest LPIPS (Learned Perceptual Image Patch Similarity) value in the overall test set and active region test set, and the second-lowest LPIPS value in the metal test set and poly test set. The results show that R²GDN has achieved outstanding performance in both structural fidelity (highest SSIM) and visual perceptual quality (lowest LPIPS).
The primary focus for engineers inspecting the microscopic structure of integrated circuits is on the structural characteristics of the circuit, such as linewidths and edges within the circuit. The structural similarity measured by SSIM directly meets these requirements. A high SSIM index indicates that the reconstructed IC microscopic images are more consistent with the actual situation of key structural information, including line shapes and edge locations. If a model that solely aims to achieve high PSNR results in overly smooth edges in the microscopic images, integrating such a model into the IC microscopic image acquisition process would be counterproductive.
4. Conclusions
Aiming to address the issues of high computational complexity and substantial memory consumption in existing super-resolution reconstruction networks, this paper proposes a lightweight image super-resolution network based on feature reuse and structural reparameterization techniques. This approach makes super-resolution networks more suitable for deployment on edge devices, enabling fast and accurate extraction of local-global deep features from images. Specifically, we leverage intrinsic features to generate redundant features via inexpensive operations and employ structural reparameterization techniques to design a feature-reusing reparameterized layer, termed the RG-Layer. Building on this, we design an efficient deep feature extraction module, the RepGhost and addition based residual dense block (RGAB), which maintains dense connections, local feature fusion, and local residual learning while using addition-based feature fusion. These components are integrated into a highly efficient image super-resolution network, R2GDN. Experimental results demonstrate that R2GDN achieves a good balance between performance and network complexity compared to other state-of-the-art algorithms. However, a comparison of the reconstruction results reveals that the proposed model still exhibits some issues, such as blurred edges and artifacts in the reconstructed images. In the future, we will focus on improving the image quality of the reconstructed results while further ensuring the network's lightweight nature to better fit edge devices.
Acknowledgments
This work is supported by the general projects of educational department of Liaoning province (JYTMS20231212).
References
1. Glasner D, Bagon S, Irani M. Super-resolution from a single image. In: 2009 IEEE 12th International Conference on Computer Vision, 2009.
2. Yang W, Zhang X, Tian Y, Wang W, Xue J-H, Liao Q. Deep Learning for Single Image Super-Resolution: A Brief Review. IEEE Trans Multimedia. 2019;21(12):3106–21.
3. Dong C, Loy CC, He K. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2016;38(2):295–307.
4. Ozekinci T, Atmaca S, Dal T. Effect of routine Hepatitis B vaccination program in Southeast of Turkey? Comparing of the results of HBV DNA in terms of age groups for the years 2002 and 2012. Cent Eur J Immunol. 2014;39(1):122–3. pmid:26155112
5. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 770–8.
6. Ledig C, Theis L, Huszar F, Caballero J, Cunningham A, Acosta A, et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 105–14.
7. Lim B, Son S, Kim H, Nah S, Lee KM. Enhanced Deep Residual Networks for Single Image Super-Resolution. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017. 1132–40.
8. Zhang Y, Tian Y, Kong Y, Zhong B, Fu Y. Residual Dense Network for Image Super-Resolution. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018. 2472–81.
9. Liang J, Cao J, Sun G, Zhang K, Van Gool L, Timofte R. SwinIR: Image Restoration Using Swin Transformer. In: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021. 1833–44.
10. Chen X, Wang X, Zhou J. Activating more pixels in image super-resolution transformer. arXiv e-prints. 2022.
11. Li G, Zhao L, Sun J. Rethinking multi-contrast MRI super-resolution: rectangle-window cross-attention transformer and arbitrary-scale upsampling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023. 21230–40.
12. Rajaei A, Abiri E, Helfroush MS. Self-supervised spectral super-resolution for a fast hyperspectral and multispectral image fusion. Sci Rep. 2024;14(1):29820. pmid:39616217
13. Wilson R, Lu H, Zhu M, Forte D, Woodard DL. REFICS: Assimilating Data-Driven Paradigms Into Reverse Engineering and Hardware Assurance on Integrated Circuits. IEEE Access. 2021;9:131955–76.
14. Qiao C, Li D, Guo Y, Liu C, Jiang T, Dai Q, et al. Evaluation and development of deep neural networks for image super-resolution in optical microscopy. Nat Methods. 2021;18(2):194–202. pmid:33479522
15. Liu S, Weng X, Gao X, Xu X, Zhou L. A Residual Dense Attention Generative Adversarial Network for Microscopic Image Super-Resolution. Sensors (Basel). 2024;24(11):3560. pmid:38894350
16. Rezvani S, Soleymani Siahkar F, Rezvani Y, Alavi Gharahbagh A, Abolghasemi V. Single Image Denoising via a New Lightweight Learning-Based Model. IEEE Access. 2024;12:121077–92.
17. Dong C, Loy CC, Tang X. Accelerating the Super-Resolution Convolutional Neural Network. Lecture Notes in Computer Science. Springer International Publishing. 2016. 391–407.
18. Kim J, Lee JK, Lee KM. Deeply-Recursive Convolutional Network for Image Super-Resolution. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
19. Ahn N, Kang B, Sohn KA. Fast, accurate, and lightweight super-resolution with cascading residual network. 2018.
20. Hui Z, Gao X, Yang Y, Wang X. Lightweight Image Super-Resolution with Information Multi-distillation Network. In: Proceedings of the 27th ACM International Conference on Multimedia, 2019. 2024–32.
21. Deng W, Yuan H, Deng L. Reparameterized residual feature network for lightweight image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023. 1712–21.
22. Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C. GhostNet: More Features From Cheap Operations. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. 1577–86.
23. Ding X, Zhang X, Ma N, Han J, Ding G, Sun J. RepVGG: Making VGG-style ConvNets Great Again. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. 13728–37.
24. Chen C, Guo Z, Zeng H. RepGhost: A hardware-efficient ghost module via re-parameterization. arXiv preprint. 2022.
25. Luo G, Huang M, Zhou Y, et al. Towards Efficient Visual Adaption via Structural Re-parameterization. arXiv. 2023. https://arxiv.org/abs/2302.08106
26. Tong T, Li G, Liu X, Gao Q. Image Super-Resolution Using Dense Skip Connections. In: 2017 IEEE International Conference on Computer Vision (ICCV), 2017. 4809–17.
27. Lai W-S, Huang J-B, Ahuja N, Yang M-H. Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 5835–43.
28. Nie Y, Han K, Liu Z. GhostSR: Learning Ghost Features for Efficient Image Super-Resolution. 2021.
29. Krishnamurthy K, Can T, Schwab DJ. Theory of gating in recurrent neural networks. 2020.
30. Wang Q, Wang H, Zang L, Jiang Y, Wang X, Liu Q, et al. Gated normalization unit for image restoration. Pattern Anal Applic. 2025;28(1).
31. Howard AG, Zhu M, Chen B, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. 2017.
32. Agustsson E, Timofte R. NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017. 1122–31.
33. Wang YQ, Wang LG, Yang JG. Flickr1024: A large-scale dataset for stereo image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019.
34. Bevilacqua M, Roumy A, Guillemot C, Morel MA. Low-Complexity Single-Image Super-Resolution based on Nonnegative Neighbor Embedding. In: Proceedings of the British Machine Vision Conference 2012, 2012. 135.1–135.10.
35. Zeyde R, Elad M, Protter M. On Single Image Scale-Up Using Sparse-Representations. Lecture Notes in Computer Science. Springer Berlin Heidelberg. 2012. 711–30.
36. Martin D, Fowlkes C, Tal D, Malik J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings Eighth IEEE International Conference on Computer Vision (ICCV 2001). 416–23.
37. Huang J-B, Singh A, Ahuja N. Single image super-resolution from transformed self-exemplars. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. 5197–206.
38. Matsui Y, Ito K, Aramaki Y, Fujimoto A, Ogawa T, Yamasaki T, et al. Sketch-based manga retrieval using manga109 dataset. Multimed Tools Appl. 2016;76(20):21811–38.
39. Kim J, Lee JK, Lee KM. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
40. Li W, Li J, Gao G. Cross-receptive focused inference network for lightweight image super-resolution. arXiv. 2022.
41. Li W, Li J, Gao G, Deng W, Yang J, Qi G-J, et al. Efficient Image Super-Resolution With Feature Interaction Weighted Hybrid Network. IEEE Trans Multimedia. 2025;27:2256–67.
42. Li W, Guo H, Hou Y. Dual-domain modulation network for lightweight image super-resolution. arXiv. https://arxiv.org/abs/2503.10047
43. Zhang Y, Li K, Li K. Image super-resolution using very deep residual channel attention networks. In: Proceedings of the European Conference on Computer Vision (ECCV), 2018. 286–301.
Citation: Li T, Jin X, Liu Q, Liu X, Yuan Z, Liang T, et al. (2025) R2GDN: RepGhost based residual dense network for image super-resolution. PLoS One 20(12): e0338432. https://doi.org/10.1371/journal.pone.0338432
About the Authors:
Tianyu Li
Roles: Conceptualization, Software, Validation, Writing – original draft, Writing – review & editing
Affiliation: School of Information Science and Engineering, Shenyang University of Technology, Shenyang, China
Xiaoshi Jin
Roles: Supervision
E-mail: [email protected]
Affiliation: School of Information Science and Engineering, Shenyang University of Technology, Shenyang, China
ORCID: https://orcid.org/0000-0003-0476-7527
Qiang Liu
Roles: Project administration
Affiliation: Engineering Training Center, Shenyang University of Technology, Shenyang, China
Xi Liu
Roles: Investigation
Affiliation: School of Information Science and Engineering, Shenyang University of Technology, Shenyang, China
Zehang Yuan
Roles: Data curation
Affiliation: School of Artificial Intelligence, Shenyang University of Technology, Shenyang, China
Tianyang Liang
Roles: Data curation
Affiliation: School of Artificial Intelligence, Shenyang University of Technology, Shenyang, China
Jia Lou
Roles: Data curation
Affiliation: School of Artificial Intelligence, Shenyang University of Technology, Shenyang, China
Yangfan Rao
Roles: Investigation
Affiliation: School of Information Science and Engineering, Shenyang University of Technology, Shenyang, China
1. Glasner D, Bagon S, Irani M. Super-resolution from a single image. In: 2009 IEEE 12th International Conference on Computer Vision, 2009.
2. Yang W, Zhang X, Tian Y, Wang W, Xue J-H, Liao Q. Deep Learning for Single Image Super-Resolution: A Brief Review. IEEE Trans Multimedia. 2019;21(12):3106–21.
3. Dong C, Loy CC, He K. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2016;38(2):295–307.
4. Ozekinci T, Atmaca S, Dal T. Effect of routine Hepatitis B vaccination program in Southeast of Turkey? Comparing of the results of HBV DNA in terms of age groups for the years 2002 and 2012. Cent Eur J Immunol. 2014;39(1):122–3. pmid:26155112
5. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 770–8.
6. Ledig C, Theis L, Huszar F, Caballero J, Cunningham A, Acosta A, et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 105–14.
7. Lim B, Son S, Kim H, Nah S, Lee KM. Enhanced Deep Residual Networks for Single Image Super-Resolution. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017. 1132–40.
8. Zhang Y, Tian Y, Kong Y, Zhong B, Fu Y. Residual Dense Network for Image Super-Resolution. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018. 2472–81.
9. Liang J, Cao J, Sun G, Zhang K, Van Gool L, Timofte R. SwinIR: Image Restoration Using Swin Transformer. In: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021. 1833–44.
10. Chen X, Wang X, Zhou J. Activating more pixels in image super-resolution transformer. arXiv preprint. 2022.
11. Li G, Zhao L, Sun J. Rethinking multi-contrast MRI super-resolution: rectangle-window cross-attention transformer and arbitrary-scale upsampling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023. 21230–40.
12. Rajaei A, Abiri E, Helfroush MS. Self-supervised spectral super-resolution for a fast hyperspectral and multispectral image fusion. Sci Rep. 2024;14(1):29820. pmid:39616217
13. Wilson R, Lu H, Zhu M, Forte D, Woodard DL. REFICS: Assimilating Data-Driven Paradigms Into Reverse Engineering and Hardware Assurance on Integrated Circuits. IEEE Access. 2021;9:131955–76.
14. Qiao C, Li D, Guo Y, Liu C, Jiang T, Dai Q, et al. Evaluation and development of deep neural networks for image super-resolution in optical microscopy. Nat Methods. 2021;18(2):194–202. pmid:33479522
15. Liu S, Weng X, Gao X, Xu X, Zhou L. A Residual Dense Attention Generative Adversarial Network for Microscopic Image Super-Resolution. Sensors (Basel). 2024;24(11):3560. pmid:38894350
16. Rezvani S, Soleymani Siahkar F, Rezvani Y, Alavi Gharahbagh A, Abolghasemi V. Single Image Denoising via a New Lightweight Learning-Based Model. IEEE Access. 2024;12:121077–92.
17. Dong C, Loy CC, Tang X. Accelerating the Super-Resolution Convolutional Neural Network. Lecture Notes in Computer Science. Springer International Publishing. 2016. 391–407.
18. Kim J, Lee JK, Lee KM. Deeply-Recursive Convolutional Network for Image Super-Resolution. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
19. Ahn N, Kang B, Sohn KA. Fast, accurate, and lightweight super-resolution with cascading residual network. In: Proceedings of the European Conference on Computer Vision (ECCV), 2018.
20. Hui Z, Gao X, Yang Y, Wang X. Lightweight Image Super-Resolution with Information Multi-distillation Network. In: Proceedings of the 27th ACM International Conference on Multimedia, 2019. 2024–32.
21. Deng W, Yuan H, Deng L. Reparameterized residual feature network for lightweight image super-resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023. 1712–21.
22. Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C. GhostNet: More Features From Cheap Operations. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. 1577–86.
23. Ding X, Zhang X, Ma N, Han J, Ding G, Sun J. RepVGG: Making VGG-style ConvNets Great Again. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. 13728–37.
24. Chen C, Guo Z, Zeng H. Repghost: A hardware-efficient ghost module via re-parameterization. arXiv preprint. 2022.
25. Luo G, Huang M, Zhou Y, et al. Towards Efficient Visual Adaption via Structural Re-parameterization. arXiv preprint. 2023. https://arxiv.org/abs/2302.08106
26. Tong T, Li G, Liu X, Gao Q. Image Super-Resolution Using Dense Skip Connections. In: 2017 IEEE International Conference on Computer Vision (ICCV), 2017. 4809–17.
27. Lai W-S, Huang J-B, Ahuja N, Yang M-H. Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 5835–43.
28. Nie Y, Han K, Liu Z. GhostSR: Learning Ghost Features for Efficient Image Super-Resolution. arXiv preprint. 2021.
29. Krishnamurthy K, Can T, Schwab DJ. Theory of gating in recurrent neural networks. arXiv preprint. 2020.
30. Wang Q, Wang H, Zang L, Jiang Y, Wang X, Liu Q, et al. Gated normalization unit for image restoration. Pattern Anal Applic. 2025;28(1).
31. Howard AG, Zhu M, Chen B, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv preprint. 2017.
32. Agustsson E, Timofte R. NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017. 1122–31.
33. Wang YQ, Wang LG, Yang JG. Flickr1024: A large-scale dataset for stereo image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2019.
34. Bevilacqua M, Roumy A, Guillemot C, Morel MA. Low-Complexity Single-Image Super-Resolution based on Nonnegative Neighbor Embedding. In: Proceedings of the British Machine Vision Conference 2012, 2012. 135.1-135.10.
35. Zeyde R, Elad M, Protter M. On Single Image Scale-Up Using Sparse-Representations. Lecture Notes in Computer Science. Springer Berlin Heidelberg. 2012. 711–30.
36. Martin D, Fowlkes C, Tal D, Malik J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001. 416–23.
37. Huang J-B, Singh A, Ahuja N. Single image super-resolution from transformed self-exemplars. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. 5197–206.
38. Matsui Y, Ito K, Aramaki Y, Fujimoto A, Ogawa T, Yamasaki T, et al. Sketch-based manga retrieval using manga109 dataset. Multimed Tools Appl. 2016;76(20):21811–38.
39. Kim J, Lee JK, Lee KM. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
40. Li W, Li J, Gao G. Cross-receptive focused inference network for lightweight image super-resolution. arXiv preprint. 2022.
41. Li W, Li J, Gao G, Deng W, Yang J, Qi G-J, et al. Efficient Image Super-Resolution With Feature Interaction Weighted Hybrid Network. IEEE Trans Multimedia. 2025;27:2256–67.
42. Li W, Guo H, Hou Y. Dual-domain modulation network for lightweight image super-resolution. arXiv preprint. https://arxiv.org/abs/2503.10047
43. Zhang Y, Li K, Li K. Image super-resolution using very deep residual channel attention networks. In: Proceedings of the European conference on computer vision (ECCV), 2018. 286–301.
© 2025 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.