This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
Deep learning has recently attracted attention due to its outstanding performance in computer vision (e.g., image classification and object detection), natural language processing (NLP), and reinforcement learning. In the military domain, unmanned aerial vehicles (UAVs) play a significant role in jamming and reconnaissance. Bai et al. [1] established a 3D UAV air combat model and a UAV maneuvering decision algorithm based on deep reinforcement learning, aiming at autonomous UAV operation in the future. Saqlain et al. [2] applied deep learning and computer vision to retail management to boost retail sales, proposing a hybrid approach that effectively monitors retail shelves and checks planogram compliance. In face recognition systems, Yang and Song [3] combined face recognition with a deep learning algorithm to improve recognition under different light intensities, which is of great practical value.
The success of deep learning is mainly attributed to three factors: powerful computing resources, complex network frameworks, and large-scale datasets. However, obtaining sufficient labeled data is difficult or even impossible in many application scenarios, such as rare diseases, new species, and defective industrial products. When annotated data are scarce, traditional deep learning methods generally perform unsatisfactorily. Considering that humans can rapidly grasp novel concepts from just a single example or a handful of examples, we hope that a network can likewise learn to recognize visual objects of novel classes with high accuracy and good generalization from only a few samples.
Few-shot learning, and few-shot classification (FSC) in particular, was proposed to shrink the gap between human intelligence and artificial intelligence. FSC aims to learn an effective classifier from the target dataset, which contains only a few labeled images for novel classes. However, unlike in general deep learning, it is impossible to train an effective classification model from scratch using only the target dataset because of its limited size. Therefore, current FSC methods usually employ a base dataset, which contains abundant labeled images for base classes and has no category intersection with the target dataset. The model is first pretrained on the base dataset to learn a feature extractor and is then transferred to the target domain for fine-tuning to boost the performance of FSC. At the pretraining stage, the feature extractor is pretrained either on the base dataset directly or by meta-learning, which constructs a large number of few-shot tasks to imitate the target scenarios. At the fine-tuning stage, current methods choose the fine-tuning settings by experience, e.g., how to set the learning rate, which layers to freeze, and how many training epochs to run. They typically set the learning rate to 0.001 [4, 5], select either linear probing (updating only the last linear layer) [6] or full fine-tuning (updating all the model parameters) [7–9], and rarely report the number of training epochs. Since the target dataset contains no validation or test images, the performance of the fine-tuned model cannot be evaluated, so how to set hyperparameters beyond experience remains an open problem. In addition, under few-shot conditions, the classifier parameters tend to converge quickly to a suboptimal solution, which further reduces the classification performance.
To address the problems mentioned above, in this work, we propose a hybrid fine-tuning strategy (HFT) for FSC, as shown in Figure 1. We first pretrain on the base dataset to get the pretrained model and then fine-tune it on the target dataset according to the hybrid fine-tuning strategy acquired by HFT. The proposed HFT includes an FSLDA module and an AFT module. FSLDA constructs the optimal linear classifier by fully exploiting the domain knowledge contained in the target dataset. It gives the last fully connected layer of the pretrained model a better starting point than fine-tuning with backpropagation is likely to reach, thus guaranteeing a lower bound on the model accuracy. AFT performs adaptive epoch learning on the validation classes of the base dataset, using an adaptive fine-tuning termination rule to obtain the optimal number of training epochs. Therefore, AFT sets hyperparameters by learning instead of by experience and can prevent the model from overfitting. AFT also evaluates model performance to obtain the hybrid fine-tuning strategy. Finally, we update the pretrained model on the target dataset with the acquired hybrid fine-tuning strategy to get the HFT model. In summary, the main contributions of this study are as follows:
(1) We improve linear discriminant analysis for FSC and propose the FSLDA module, which can be used to initialize the parameters of the last fully connected layer of the pretrained model and guarantees a lower bound on the model accuracy. Ablation studies on the mini-ImageNet dataset show that the Meta-Baseline method [10] with the FSLDA module alone has an average performance improvement of 3.07% and 2.99% under the layer frozen policies “Last1” and “All,” respectively.
(2) We introduce adaptive epoch learning to the fine-tuning stage and propose the AFT module, which can prevent the model from overfitting and outputs the hybrid fine-tuning strategy under different sample sizes and different layer frozen policies. Ablation results on the mini-ImageNet dataset show that the Meta-Baseline method [10] with AFT under the layer frozen policy “All” brings further performance improvements of 0.40%, 0.99%, and 0.79% for sample sizes of 10-shot, 20-shot, and 30-shot, respectively.
(3) The acquired hybrid fine-tuning strategy is evaluated under three recently proposed few-shot classification methods. Comparative experiments show that the proposed HFT has an average performance improvement of 2.30% on the mini-ImageNet dataset and 2.78% on the tiered-ImageNet dataset over current experience-based fine-tuning methods.
[figure(s) omitted; refer to PDF]
2. Related Works
2.1. Few-Shot Classification
Currently, many works have been proposed to address FSC [11–19], which can be mainly divided into three categories: initialization-based methods, metric-based methods, and hallucination-based methods. Initialization-based methods use the target dataset to fine-tune the pretrained model with a small number of gradient backpropagation steps [20, 21]. Metric-based methods extract features from both the labeled and unlabeled images and predict the class labels by computing the similarity metric function, such as cosine similarity [22], Euclidean distance [23], and relation modules [24]. Hallucination-based methods [25] focus on data augmentation by learning a generator from the base dataset and applying it to novel classes to expand the capacity of the target dataset. Recently, some works have employed self-supervision [26, 27], knowledge distillation [28, 29], and distribution calibration [30, 31] to strengthen the feature extractor or the last classifier. Our work is built on the metric-based pretraining methods and improves the initialization-based fine-tuning methods by introducing a hybrid fine-tuning strategy.
2.2. Fine-Tuning Strategy
Before fine-tuning the model with the target dataset, key hyperparameters need to be set, such as the layer frozen policy, the learning rate, and the number of training epochs. Because the target dataset is so small, we cannot judge whether the model is suboptimal, overfitted, or underfitted. Thus, current methods usually set these hyperparameters by experience. There are two popular strategies for the layer frozen policy: running gradient descent on all model parameters [7–9] and fine-tuning the head while freezing the lower layers [32]. Some works [33, 34] claim that fine-tuning all model parameters leads to better accuracy than fine-tuning only the head, but there is no consensus on this point. For the learning rate, the mainstream FSC methods [35, 36] set it to 0.001. As for the training epochs, current methods use fixed settings, and their values are rarely reported. Recently, an evolutionary algorithm [37] has been proposed for searching the best fine-tuning configuration, focusing on the learning rate and the layer frozen policy. Our work emphasizes learning the best number of training epochs, which is essential to prevent the model from overfitting or underfitting and is complementary to the work in [37]. In addition, we propose the FSLDA module to construct the optimal linear classifier for FSC to avoid suboptimal solutions.
3. Methods
This section first introduces the preliminary foundations, including problem definition and model pretraining for FSC. We then give the technical details for the FSLDA and AFT modules, respectively.
3.1. Preliminary Foundations
3.1.1. Problem Definition
In the standard FSC task, we generally have a base dataset
3.1.2. Model Pretraining
A fundamental step for FSC is pretraining the model on the base dataset to provide a suitable feature extractor
3.2. Few-Shot LDA Module
Linear discriminant analysis (LDA) is a supervised dimensionality reduction technique that is mainly used for classification. The core idea of LDA is to project high-dimensional data samples into a subspace in which interclass distances are larger and intraclass distances are smaller. LDA needs to calculate the covariance matrix from the feature vectors of the data samples in the support set or the target dataset. For FSC tasks, the feature dimension is usually larger than the number of data samples; thus, the covariance matrix is singular and cannot be inverted. To address this issue, FSLDA is proposed to initialize the head of the pretrained model by constructing the optimal linear classification function under few-shot conditions. As shown in Figure 2, we introduce the rank factor
[figure(s) omitted; refer to PDF]
Formally, the CNN model we train can be expressed as
According to the LDA theory (details are shown in the Appendix section), given a C-way K-shot task, the optimal linear classifier for class t is given by
To this end, we compute the precision matrix
Once the precision matrix
Finally, we use FSLDA classifier to compute
The FSLDA enables to initialize the parameters in
3.3. Adaptive Fine-Tuning Module
Drawing on the experience of meta-learning-based pretraining methods, we propose the AFT module to obtain the hybrid fine-tuning strategy. AFT first performs adaptive epoch learning “chunk by chunk” on the validation classes of the base dataset: it evaluates the model’s performance for each chunk and applies an adaptive termination rule to output the adaptive epoch to be used at the fine-tuning stage. Then, whichever of the FSLDA model and the adaptively fine-tuned model achieves the higher accuracy is retained, which yields the optimal hybrid epoch. Finally, the above procedure is executed on a large number of pseudofine-tuning tasks to output the final hybrid fine-tuning strategy, ensuring that most tasks converge to higher performance.
Specifically, massive pseudofine-tuning tasks, each of which includes a support set and a query set, are randomly sampled from the validation classes of the base dataset to imitate the fine-tuning task. Like metric-based meta-learning, the support set here is also of the C-way K-shot style. All the remaining samples in the selected classes are used as the query set to evaluate the performance of the model. As shown in Figure 3, we first use the support set to get the FSLDA model and obtain its accuracy
[figure(s) omitted; refer to PDF]
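The construction of one pseudofine-tuning task described above (a C-way K-shot support set, with all remaining samples of the selected classes used as the query set) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `sample_pseudo_task`, the dictionary layout `images_by_class`, and the toy image identifiers are all assumptions made for the example.

```python
import random


def sample_pseudo_task(images_by_class, n_way, k_shot, rng):
    """Draw one C-way K-shot pseudofine-tuning task.

    images_by_class: dict mapping a class name to a list of image ids
    (a hypothetical in-memory layout of the validation classes).
    Returns (support, query) as lists of (image_id, task_label) pairs.
    """
    # Randomly pick the C classes of this task.
    classes = rng.sample(sorted(images_by_class), n_way)
    support, query = [], []
    for task_label, cls in enumerate(classes):
        items = list(images_by_class[cls])
        if len(items) <= k_shot:
            raise ValueError(f"class {cls!r} needs more than {k_shot} images")
        rng.shuffle(items)
        # K images per class go to the support set.
        support += [(img, task_label) for img in items[:k_shot]]
        # All remaining images of the selected classes form the query set.
        query += [(img, task_label) for img in items[k_shot:]]
    return support, query


# Toy stand-in for the validation classes: 16 classes with 20 images each.
val_classes = {
    f"class_{c}": [f"class_{c}/img_{i}" for i in range(20)] for c in range(16)
}
support, query = sample_pseudo_task(
    val_classes, n_way=5, k_shot=1, rng=random.Random(0)
)
print(len(support), len(query))  # 5 support images, 95 query images
```

Passing an explicit `random.Random` instance keeps task sampling reproducible, which is convenient when the same set of pseudo tasks has to be reused across layer frozen policies.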
Then, we combine the advantages of the FSLDA model and the adaptive epoch learning and set the optimal hybrid epoch as
When the optimal hybrid epochs for M pseudofine-tuning tasks are ready, the optimal hybrid finetuning strategy can be finally acquired by
Algorithm 1: Pseudocode for the AFT module.
Input: the validation dataset:
Output: hybrid fine-tuning strategy represented by adaptive epoch:
Hyper-parameters: the total number of pseudofine-tuning tasks
for
for each chunk
for each node
end for
if
if
break;
end if
end for
end for
Set
if
else
The pipeline for AFT is summarized as Algorithm 1.
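The overall flow of AFT can be illustrated with the short sketch below. Since the exact termination rule and the aggregation over tasks are not reproduced here, both are stand-ins chosen for illustration: training is assumed to stop once the mean query accuracy of a chunk no longer improves on the best previous chunk, and the final strategy is assumed to be the most frequent hybrid epoch over all tasks. The function names and parameters (`adaptive_epoch`, `hybrid_epoch`, `hybrid_strategy`, `nodes_per_chunk`, `epochs_per_node`) are hypothetical.

```python
from collections import Counter
from statistics import mean


def adaptive_epoch(node_accs, nodes_per_chunk, epochs_per_node):
    """Scan query accuracies chunk by chunk and stop adaptively.

    node_accs[i] is the query accuracy after (i + 1) * epochs_per_node epochs.
    Returns (epoch, accuracy) of the best node seen before termination.
    """
    best_chunk_mean = float("-inf")
    best_epoch, best_acc = 0, float("-inf")
    for start in range(0, len(node_accs), nodes_per_chunk):
        chunk = node_accs[start:start + nodes_per_chunk]
        chunk_mean = mean(chunk)
        # Assumed termination rule: stop when a chunk no longer improves.
        if chunk_mean <= best_chunk_mean:
            break
        best_chunk_mean = chunk_mean
        for offset, acc in enumerate(chunk):
            if acc > best_acc:
                best_acc = acc
                best_epoch = (start + offset + 1) * epochs_per_node
    return best_epoch, best_acc


def hybrid_epoch(fslda_acc, node_accs, nodes_per_chunk, epochs_per_node):
    """Keep the better of the FSLDA model (epoch 0) and the fine-tuned model."""
    epoch, acc = adaptive_epoch(node_accs, nodes_per_chunk, epochs_per_node)
    return epoch if acc > fslda_acc else 0


def hybrid_strategy(task_epochs):
    """Assumed aggregation: the most frequent hybrid epoch over all tasks."""
    return Counter(task_epochs).most_common(1)[0][0]


# Toy accuracy curve of one pseudo task, evaluated every 100 epochs.
curve = [0.70, 0.72, 0.74, 0.75, 0.74, 0.73]
print(adaptive_epoch(curve, nodes_per_chunk=2, epochs_per_node=100))  # (400, 0.75)
print(hybrid_epoch(0.71, curve, 2, 100))  # 400: fine-tuning beats FSLDA
print(hybrid_epoch(0.80, curve, 2, 100))  # 0: FSLDA alone is retained
print(hybrid_strategy([400, 400, 0, 600]))  # 400
```

A hybrid epoch of 0 encodes the case in which the FSLDA-initialized model is used without any further fine-tuning.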
4. Experiments
In this section, we first briefly describe the experimental setup. Then, HFT experiments are carried out to give the hands-on hybrid fine-tuning strategy under different sample sizes and layer frozen policies. Finally, extensive comparison and ablation experiments on the benchmark datasets are conducted to demonstrate the effectiveness of our strategy.
4.1. Experimental Setup
4.1.1. Dataset
We employ mini-ImageNet [22] and tiered-ImageNet [38] datasets. Mini-ImageNet is a subset of ImageNet. It consists of 100 classes, and each class has 600 images with a size of 84
4.1.2. Implementation Details
Following the settings in [10], for the pretraining stage, we first train for 100 epochs with batch size 128 on mini-ImageNet, and the learning rate decays at epoch 90. We use the SGD optimizer with momentum 0.9, learning rate 0.1, decay factor 0.1, and weight decay 0.0005. For the meta-learning stage, we use the SGD optimizer with weight decay 0.0005 and learning rate 0.001. For the fine-tuning stage, we set up two kinds of layer frozen policies following [40], namely, fine-tuning all layers (“All,” updating all parameters of the model) and fine-tuning the last layer (“Last1,” updating only the last fully connected layer of the model). We use the SGD optimizer with momentum 0.9, weight decay 0.0005, and learning rate 0.001. We use ResNet-18 as the backbone network and apply standard data augmentation, including random resized crop and random horizontal flip.
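For reference, the optimizer settings and the two layer frozen policies above can be written down as a small framework-independent configuration sketch. The class and function names, the 0-indexed epoch convention, and the `fc.` prefix used to identify the last fully connected layer are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SGDConfig:
    lr: float
    momentum: float
    weight_decay: float


# Values taken from the implementation details above.
PRETRAIN = SGDConfig(lr=0.1, momentum=0.9, weight_decay=5e-4)
FINETUNE = SGDConfig(lr=1e-3, momentum=0.9, weight_decay=5e-4)


def pretrain_lr(epoch, base_lr=0.1, decay_epoch=90, decay_factor=0.1):
    """Step schedule: the learning rate is multiplied by 0.1 at epoch 90.

    Epochs are assumed to be counted from 0 here.
    """
    return base_lr * decay_factor if epoch >= decay_epoch else base_lr


def trainable_params(param_names, policy, head_prefix="fc."):
    """Select the parameters updated under a layer frozen policy.

    "All" updates every parameter; "Last1" updates only the last fully
    connected layer, identified here by a hypothetical name prefix.
    """
    if policy == "All":
        return list(param_names)
    if policy == "Last1":
        return [name for name in param_names if name.startswith(head_prefix)]
    raise ValueError(f"unknown layer frozen policy: {policy!r}")


# Hypothetical parameter names of a small backbone with a linear head.
names = ["conv1.weight", "layer1.0.conv1.weight", "fc.weight", "fc.bias"]
print(trainable_params(names, "Last1"))  # ['fc.weight', 'fc.bias']
print(len(trainable_params(names, "All")))  # 4
```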
For the hyperparameter
[figure(s) omitted; refer to PDF]
4.2. HFT Experiments
Following Algorithm 1, we perform experiments on mini-ImageNet to give the hands-on hybrid fine-tuning strategy under different sample sizes (1, 5, 10, 20, 30) and different layer frozen policies (“Last1,” “All”).
The main results are shown in Table 1. For the layer frozen policy “Last1,” the optimal adaptive epoch is always 0 under all sample sizes, which means that FSLDA initializes the head of the pretrained model so well that fine-tuning only the last layer cannot improve the model further. Thus, the hands-on hybrid fine-tuning strategy under the layer frozen policy “Last1” is FSLDA alone, which has already constructed the optimal solution for the classifier; in this case, further fine-tuning may lead to suboptimal solutions. In contrast, the hands-on hybrid fine-tuning strategy for the layer frozen policy “All” varies with the sample size. For sample sizes of 1-shot and 5-shot, the strategy is again FSLDA alone. A plausible explanation is that so few samples in the support set are not enough to update all the model parameters for better performance. For sample sizes of 10-shot, 20-shot, and 30-shot, however, the optimal adaptive epoch is no longer 0. Moreover, as the sample size increases, the optimal adaptive epoch increases (1400, 1600, and 1800, respectively), but it is always smaller than the maximum number of epochs. Thus, the hands-on hybrid fine-tuning strategy for these sample sizes contains both FSLDA and AFT. This indicates that adaptive fine-tuning can achieve better performance under the layer frozen policy “All” as the sample size increases.
Table 1
The hands-on hybrid fine-tuning strategy acquired by the proposed method under different sample sizes and layer frozen policies.
Layer frozen policy | 1-Shot | 5-Shot | 10-Shot | 20-Shot | 30-Shot |
Last1 | 0 | 0 | 0 | 0 | 0 |
All | 0 | 0 | 1400 | 1600 | 1800 |
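In practice, Table 1 can be used as a simple lookup when fine-tuning on a new target task. The sketch below encodes the table and returns a plan for a given layer frozen policy and sample size; the names `HFT_EPOCHS` and `fine_tuning_plan` and the layout of the returned dictionary are hypothetical.

```python
# Adaptive epochs from Table 1, indexed by layer frozen policy and shot count.
HFT_EPOCHS = {
    "Last1": {1: 0, 5: 0, 10: 0, 20: 0, 30: 0},
    "All": {1: 0, 5: 0, 10: 1400, 20: 1600, 30: 1800},
}


def fine_tuning_plan(policy, shots):
    """Return the hybrid fine-tuning plan for one target task.

    The head is always initialized by FSLDA; an epoch count of 0 means that
    the FSLDA model is used as is, without backpropagation.
    """
    epochs = HFT_EPOCHS[policy][shots]
    return {"head_init": "FSLDA", "epochs": epochs, "fine_tune": epochs > 0}


print(fine_tuning_plan("Last1", 30))
# {'head_init': 'FSLDA', 'epochs': 0, 'fine_tune': False}
print(fine_tuning_plan("All", 20))
# {'head_init': 'FSLDA', 'epochs': 1600, 'fine_tune': True}
```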
Furthermore, Figure 5 shows typical convergence curves of testing accuracy during adaptive epoch learning on mini-ImageNet under different layer frozen policies and sample sizes. Here, FT-All and FT-Last1 refer to updating all parameters of the model and updating only the head, respectively, where the head is initialized randomly and a fixed epoch is set by experience. HFT-All and HFT-Last1 refer to fine-tuning under the corresponding layer frozen policies “All” and “Last1,” where the head is initialized by FSLDA and the epoch is set according to the acquired hands-on hybrid fine-tuning strategy. FSLDA refers to the testing accuracy of the FSLDA model without fine-tuning. Note that we show the full curves for HFT-All and HFT-Last1 in Figure 5 for better comparison. For sample sizes of 1-shot and 5-shot, the FSLDA model (purple dotted horizontal line) always performs better than the other methods, indicating that FSLDA is enough when the sample size is no more than 5. For sample sizes of 10-shot, 20-shot, and 30-shot, the FSLDA model outperforms FT-Last1 (blue lines) and HFT-Last1 (green lines) but is not as good as FT-All (black lines) and HFT-All (red lines), with HFT-All being slightly better than FT-All. These observations support the acquired hands-on hybrid fine-tuning strategy.
[figure(s) omitted; refer to PDF]
4.3. Comparative Experiments
Based on the hands-on hybrid fine-tuning strategy obtained in Section 4.2, we now compare the performance of the hybrid fine-tuning strategy (HFT-Last1/HFT-All) with that of the traditional fine-tuning strategy (FT-Last1/FT-All) under different pretraining methods including RFS-simple [29], SKD-GEN0 [41], and R2D2 [42]. For the sake of fairness, the training epoch for FT-Last1/FT-All is set as
Table 2 shows the comparison results on mini-ImageNet. We can see that the accuracy of HFT-Last1/HFT-All is consistently higher than its corresponding accuracy of FT-Last1/FT-All under all sample sizes, layer frozen policies, and pretraining methods. Compared with FT-Last1/FT-All, HFT-Last1/HFT-All has an average performance improvement of 2.30% on the whole, which proves the effectiveness of combining the advantages of FSLDA and AFT. In addition, the results show that the average performance gains of the layer frozen policy “Last1” are higher than those of the layer frozen policy “All” (3.83% vs. 1.90%, 2.36% vs. 1.19%, and 1.38% vs. 0.86%). Since HFT-Last1 is indeed FSLDA, this phenomenon validates that the linear classifier constructed by FSLDA is much better than that acquired by fine-tuning. Thirdly, for sample size from 1-shot to 30-shot, HFT-Last1/HFT-All achieves an average performance improvement of 1.78%
Table 2
Comparison results under different pretraining methods on mini-ImageNet. “Pre-tra” and “Lay-fro” are short for the pretraining method and the layer frozen policy, respectively. We report the mean accuracy of 600 episodes and the 95% confidence intervals.
Pre-tra | Lay-fro | 1-Shot | 5-Shot | 10-Shot | 20-Shot | 30-Shot | Average gain |
R2D2 | FT-Last1 | 50.58 | 66.15 | 71.07 | 75.63 | 76.56 | 3.83 |
R2D2 | HFT-Last1 | 53.47 | 70.13 | 74.72 | 79.67 | 81.16 | |
R2D2 | FT-All | 51.39 | 68.63 | 73.38 | 79.43 | 80.84 | 1.90 |
R2D2 | HFT-All | 53.47 | 70.13 | 75.49 | 81.41 | 82.66 | |
SKD-GEN0 | FT-Last1 | 57.83 | 73.91 | 78.19 | 85.03 | 87.01 | 2.36 |
SKD-GEN0 | HFT-Last1 | 60.74 | 77.45 | 81.30 | 86.31 | 87.96 | |
SKD-GEN0 | FT-All | 59.94 | 74.34 | 81.69 | 86.96 | 87.43 | 1.19 |
SKD-GEN0 | HFT-All | 60.74 | 77.45 | 82.34 | 87.15 | 88.63 | |
RFS-simple | FT-Last1 | 56.99 | 72.43 | 76.27 | 82.97 | 84.02 | 1.38 |
RFS-simple | HFT-Last1 | 58.41 | 73.66 | 78.85 | 83.58 | 85.10 | |
RFS-simple | FT-All | 57.10 | 72.79 | 79.31 | 83.01 | 85.23 | 0.86 |
RFS-simple | HFT-All | 58.41 | 73.66 | 79.69 | 83.81 | 86.16 | |
Average gain | | 2.29 | 2.85 | 2.50 | 1.78 | 2.12 | 2.30 |
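The per-row "Average gain" entries of Table 2 are the mean accuracy difference between each HFT row and its FT counterpart over the five sample sizes. The short check below recomputes them from the accuracies listed in the table; the variable and function names are illustrative only.

```python
# (pretraining method, policy) -> (FT accuracies, HFT accuracies) for
# 1-, 5-, 10-, 20-, and 30-shot, copied from Table 2.
TABLE2 = {
    ("R2D2", "Last1"): ([50.58, 66.15, 71.07, 75.63, 76.56],
                        [53.47, 70.13, 74.72, 79.67, 81.16]),
    ("R2D2", "All"): ([51.39, 68.63, 73.38, 79.43, 80.84],
                      [53.47, 70.13, 75.49, 81.41, 82.66]),
    ("SKD-GEN0", "Last1"): ([57.83, 73.91, 78.19, 85.03, 87.01],
                            [60.74, 77.45, 81.30, 86.31, 87.96]),
    ("SKD-GEN0", "All"): ([59.94, 74.34, 81.69, 86.96, 87.43],
                          [60.74, 77.45, 82.34, 87.15, 88.63]),
    ("RFS-simple", "Last1"): ([56.99, 72.43, 76.27, 82.97, 84.02],
                              [58.41, 73.66, 78.85, 83.58, 85.10]),
    ("RFS-simple", "All"): ([57.10, 72.79, 79.31, 83.01, 85.23],
                            [58.41, 73.66, 79.69, 83.81, 86.16]),
}


def average_gain(ft, hft):
    """Mean of (HFT accuracy - FT accuracy) over the sample sizes."""
    return sum(h - f for f, h in zip(ft, hft)) / len(ft)


gains = {key: average_gain(ft, hft) for key, (ft, hft) in TABLE2.items()}
for (method, policy), gain in gains.items():
    print(f"{method:10s} {policy:5s} {gain:.2f}")
# R2D2       Last1 3.83
# R2D2       All   1.90
# SKD-GEN0   Last1 2.36
# SKD-GEN0   All   1.19
# RFS-simple Last1 1.38
# RFS-simple All   0.86
```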
For tiered-ImageNet dataset, the category correlations between the training set and the test set are weak, and thus, it is more suitable for testing the generalization ability to novel few-shot classification tasks. The comparison results are shown in Table 3. Overall, we can see an average performance improvement of 2.78% for HFT-Last1/HFT-All, surpassing the average gain of 2.30% on mini-ImageNet. This shows that the proposed algorithm has strong generalization ability and can better adapt to novel few-shot classification scenarios. For layer frozen policies “Last1” and “All”, HFT-Last1/HFT-All achieves an average performance improvement of 2.66%
Table 3
Comparison results under different pretraining methods on tiered-ImageNet. “Pre-tra” and “Lay-fro” are short for the pretraining method and the layer frozen policy, respectively. We report the mean accuracy of 600 episodes and the 95% confidence intervals.
Pre-tra | Lay-fro | 1-Shot | 5-Shot | 10-Shot | 20-Shot | 30-Shot | Average gain |
R2D2 | FT-Last1 | 52.10 | 68.99 | 73.21 | 76.82 | 80.38 | 2.66 |
R2D2 | HFT-Last1 | 55.18 | 72.26 | 75.19 | 80.35 | 81.82 | |
R2D2 | FT-All | 52.90 | 70.87 | 75.02 | 80.04 | 84.69 | 1.45 |
R2D2 | HFT-All | 55.18 | 72.26 | 76.57 | 81.50 | 85.24 | |
SKD-GEN0 | FT-Last1 | 60.51 | 76.28 | 80.54 | 83.84 | 86.10 | 3.58 |
SKD-GEN0 | HFT-Last1 | 64.17 | 79.42 | 83.75 | 87.60 | 90.25 | |
SKD-GEN0 | FT-All | 61.05 | 76.37 | 83.46 | 87.01 | 90.45 | 1.58 |
SKD-GEN0 | HFT-All | 64.17 | 79.42 | 83.79 | 87.76 | 91.09 | |
RFS-simple | FT-Last1 | 60.45 | 74.09 | 78.86 | 83.36 | 83.51 | 2.86 |
RFS-simple | HFT-Last1 | 63.76 | 77.74 | 81.27 | 85.35 | 86.45 | |
RFS-simple | FT-All | 60.56 | 75.39 | 80.18 | 83.90 | 88.04 | 1.77 |
RFS-simple | HFT-All | 63.76 | 77.74 | 81.42 | 84.98 | 89.01 | |
Average gain | | 3.37 | 3.37 | 2.41 | 2.51 | 2.13 | 2.78 |
4.4. Ablation Experiments
In this section, we analyze the effects of FSLDA and AFT modules in our HFT, respectively. The experiments are carried out on mini-ImageNet under the two layer frozen policies “Last1” and “All,” employing the Meta-Baseline pretraining method [10]. The results are shown in Table 4. For the layer frozen policy “Last1,” HFT is indeed FSLDA; thus, AFT is useless (✓/
Table 4
Ablation experiments on mini-ImageNet employing the meta-baseline pretraining method. We report the mean accuracy of 600 episodes and the 95% confidence intervals.
Layer frozen policy | FSLDA | AFT | 1-Shot | 5-Shot | 10-Shot | 20-Shot | 30-Shot |
Last1 | | | 48.14 | 69.32 | 74.02 | 81.66 | 86.39 |
Last1 | ✓ | ✓/ | 50.40 | 73.67 | 78.05 | 84.48 | 88.27 |
All | | | 47.06 | 68.20 | 73.92 | 82.94 | 87.82 |
All | ✓ | | 50.40 | 73.67 | 78.05 | 84.48 | 88.27 |
All | ✓ | ✓ | - | - | 78.45 | 85.47 | 89.06 |
We can see that FSLDA alone performs consistently better than traditional fine-tuning methods under different sample sizes and layer frozen policies. For the layer frozen policy “Last1,” FSLDA alone achieves gains of 2.26%, 4.35%, 4.03%, 2.82%, and 1.88% under the sample sizes of 1-shot, 5-shot, 10-shot, 20-shot, and 30-shot, respectively, for an average performance improvement of 3.07%. For the layer frozen policy “All,” FSLDA also achieves gains of 3.34%, 5.47%, 4.13%, 1.54%, and 0.45% under the corresponding sample sizes, even though FSLDA is designed only for the last layer. The resulting average increase of 2.99% is close to that under the layer frozen policy “Last1.” A plausible explanation is that fine-tuning the classifier with the few samples in the support set usually converges to a suboptimal solution, leading to poor performance of the fine-tuned model. FSLDA gives the classifier an optimal solution by fully exploiting the domain knowledge of the novel classes, so the FSLDA model outperforms the model obtained with the experience-based fine-tuning method, even without fine-tuning. For the layer frozen policy “All,” AFT brings performance improvements of 0.40%, 0.99%, and 0.79% over FSLDA alone under the sample sizes of 10-shot, 20-shot, and 30-shot, respectively, and the average gain reaches 0.72%. This is because the adaptive epoch obtained by AFT lets the FSLDA model update its parameters through backpropagation while preventing underfitting and overfitting, which enables the model to achieve better performance than the FSLDA model alone. Interestingly, for the traditional fine-tuning method, the accuracies of the policy “All” under sample sizes of 1-shot, 5-shot, and 10-shot are lower than those of the policy “Last1,” which is not consistent with the conclusions of [33, 34] and brings uncertainty to the choice of the layer frozen policy.
5. Conclusion
In this study, we have introduced a hybrid fine-tuning strategy (HFT) for FSC, including the FSLDA and AFT modules. FSLDA constructs the optimal linear classifier, and AFT outputs the hybrid fine-tuning strategy based on the FSLDA model. HFT addresses the problem that the linear classifier is suboptimal under few-shot conditions and prevents the model from overfitting and underfitting by using the acquired hands-on hybrid fine-tuning strategy. Extensive experiments show that HFT achieves consistent performance improvements over traditional fine-tuning methods under different sample sizes, layer frozen policies, and few-shot classification frameworks. We believe that HFT has considerable potential for FSC and for few-shot learning more broadly. In the future, we will explore automatic learning methods for more hyperparameters of the fine-tuning stage.
Acknowledgments
This work was partly supported by the National Natural Science Foundation of China (Grant nos. U19B2033 and 62076020).
Appendix
LDA classifier: LDA is a classical optimal linear classifier using Bayes’ theorem. For a C-way K-shot classification task, let
To simplify the problem, LDA assumes that
Thus, the posterior probability can be written as
Then, LDA takes the logarithm of the posterior probability (ignores the constant item):
For a C-way K-shot classification task,
Equations (A.3), (A.4), and (A.8) form the LDA classifier as used in Section 3.2.
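To make the use of an LDA classifier as a head initializer concrete, the sketch below builds a linear head from support-set features using the textbook LDA discriminant, i.e., weights Σ⁻¹μ_t and bias −½ μ_tᵀΣ⁻¹μ_t + log π_t for class t. It is not the FSLDA module itself: to keep the covariance matrix invertible when there are fewer samples than feature dimensions, it substitutes a simple shrinkage toward the identity matrix for the rank-factor construction of Section 3.2. The function name `lda_head` and the `shrinkage` parameter are assumptions for illustration.

```python
import numpy as np


def lda_head(features, labels, num_classes, shrinkage=0.5):
    """Return (W, b) such that logits = features @ W.T + b.

    features: (n, d) array of support-set feature vectors.
    labels:   (n,) integer labels in [0, num_classes); every class must occur.
    shrinkage: weight of the identity matrix in the regularized covariance
               (an assumed stand-in for the few-shot treatment in FSLDA).
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n, d = features.shape
    # Class means (one prototype per class).
    means = np.stack(
        [features[labels == c].mean(axis=0) for c in range(num_classes)]
    )
    # Pooled within-class covariance; singular whenever n < d.
    centered = features - means[labels]
    cov = centered.T @ centered / max(n - num_classes, 1)
    # Shrink toward the identity so that the matrix can be inverted.
    cov_reg = (1.0 - shrinkage) * cov + shrinkage * np.eye(d)
    # Row c of W is cov_reg^{-1} mu_c.
    W = np.linalg.solve(cov_reg, means.T).T
    priors = np.bincount(labels, minlength=num_classes) / n
    b = -0.5 * np.sum(W * means, axis=1) + np.log(priors)
    return W, b


# Toy 2-way 2-shot task with 8-dimensional features (n < d).
rng = np.random.default_rng(0)
centers = np.zeros((2, 8))
centers[0, 0], centers[1, 0] = 3.0, -3.0
toy_labels = np.array([0, 0, 1, 1])
toy_features = centers[toy_labels] + 0.1 * rng.standard_normal((4, 8))

W, b = lda_head(toy_features, toy_labels, num_classes=2)
predictions = np.argmax(toy_features @ W.T + b, axis=1)
print(W.shape, b.shape, (predictions == toy_labels).mean())
```

The returned `W` and `b` have the shapes of the weight and bias of a fully connected layer, so they could be copied into the head of a pretrained model before any gradient step is taken.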
[1] S. Bai, S. Song, S. Liang, J. Wang, B. Li, E. Neretin, "Uav maneuvering decision-making algorithm based on twin delayed deep deterministic policy gradient algorithm," Journal of Artificial Intelligence and Technology, vol. 2 no. 1, pp. 16-22, 2022.
[2] M. Saqlain, S. Rubab, M. M. Khan, N. Ali, S. Ali, "Hybrid approach for shelf monitoring and planogram compliance (hyb-smpc) in retails using deep learning and computer vision," Mathematical Problems in Engineering, vol. 2022,DOI: 10.1155/2022/4916818, 2022.
[3] Y. Yang, X. Song, "Research on face intelligent perception technology integrating deep learning under different illumination intensities," Journal of Computational and Cognitive Engineering, vol. 1, pp. 32-36, 2022.
[4] Z. Lu, S. He, X. Zhu, L. Zhang, Y.-Z. Song, T. Xiang, "Simpler is better: few-shot semantic segmentation with classifier weight transformer," Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8741-8750, DOI: 10.1109/ICCV48922.2021.00862, .
[5] L. Yang, W. Zhuo, L. Qi, Y. Shi, Y. Gao, "Mining latent classes for few-shot segmentation," Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8721-8730, DOI: 10.1109/ICCV48922.2021.00860, .
[6] D. Hendrycks, K. Lee, M. Mazeika, "Using pre-training can improve model robustness and uncertainty," Proceedings of the International Conference on Machine Learning, pp. 2712-2721, .
[7] R. Girshick, J. Donahue, T. Darrell, J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580-587, .
[8] P. Agrawal, R. Girshick, J. Malik, "Analyzing the performance of multilayer neural networks for object recognition," pp. 329-344, .
[9] K. Chatfield, K. Simonyan, A. Vedaldi, A. Zisserman, "Return of the Devil in the Details: Delving Deep into Convolutional Nets," 2014. https://arxiv.org/abs/1405.3531
[10] Y. Chen, X. Wang, Z. Liu, H. Xu, T. Darrell, "A New Meta-Baseline for Few-Shot Learning," 2020. https://arxiv.org/abs/2003.04390
[11] P. Ma, Z. Zhang, J. Wang, W. Zhang, J. Liu, Q. Lu, Z. Wang, "Review on the application of metalearning in artificial intelligence," Computational Intelligence and Neuroscience, vol. 2021,DOI: 10.1155/2021/1560972, 2021.
[12] P. Bateni, R. Goyal, V. Masrani, F. Wood, L. Sigal, "Improved few-shot visual classification," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,DOI: 10.1109/CVPR42600.2020.01450, .
[13] D. Chen, Y. Chen, Y. Li, F. Mao, Y. He, H. Xue, "Self-supervised learning for few-shot image classification," pp. 1745-1749, .
[14] J. Hong, P. Fang, W. Li, T. Zhang, C. Simon, M. Harandi, L. Petersson, "Reinforced attention for few-shot learning and beyond," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 913-923, .
[15] A. Li, T. Luo, T. Xiang, W. Huang, L. Wang, "Few-shot learning with global class representations," Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9715-9724, DOI: 10.1109/ICCV.2019.00981, .
[16] H.-J. Ye, H. Hu, D.-C. Zhan, F. Sha, "Few-shot learning via embedding adaptation with set-to-set functions," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8808-8817, .
[17] J. Zhang, C. Zhao, B. Ni, M. Xu, X. Yang, "Variational few-shot learning," Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1685-1694, .
[18] D. Zhang, T. Yang, "Visual object tracking algorithm based on biological visual information features and few-shot learning," Computational Intelligence and Neuroscience, vol. 2022,DOI: 10.1155/2022/3422859, 2022.
[19] Z.-M. Wang, J.-Y. Tian, J. Qin, H. Fang, L.-M. Chen, "A few-shot learning-based siamese capsule network for intrusion detection with imbalanced training data," Computational Intelligence and Neuroscience, vol. 2021,DOI: 10.1155/2021/7126913, 2021.
[20] C. Finn, P. Abbeel, S. Levine, "Model-agnostic meta-learning for fast adaptation of deep networks," Proceedings of the International Conference on Machine Learning, pp. 1126-1135, .
[21] C. Finn, K. Xu, S. Levine, "Probabilistic model-agnostic meta-learning," Advances in Neural Information Processing Systems, vol. 31, 2018.
[22] O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, K. Kavukcuoglu, "Matching networks for one shot learning," Advances in Neural Information Processing Systems, vol. 29,DOI: 10.48550/arXiv.1606.04080, 2016.
[23] J. Snell, K. Swersky, R. Zemel, "Prototypical networks for few-shot learning," Advances in Neural Information Processing Systems, vol. 30, 2017.
[24] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, T. M. Hospedales, "Learning to compare: relation network for few-shot learning," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1199-1208, .
[25] B. Hariharan, R. Girshick, "Low-shot visual recognition by shrinking and hallucinating features," Proceedings of the IEEE International Conference on Computer Vision, pp. 3018-3027, .
[26] P. Mangla, N. Kumari, A. Sinha, M. Singh, B. Krishnamurthy, V. N. Balasubramanian, "Charting the right manifold: manifold mixup for few-shot learning," Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 2218-2227, .
[27] C. Liu, Y. Fu, C. Xu, S. Yang, J. Li, C. Wang, L. Zhang, "Learning a few-shot embedding model with contrastive learning," vol. 35 no. 10, pp. 8635-8643, DOI: 10.1609/aaai.v35i10.17047, .
[28] C. Xu, Y. Fu, C. Liu, C. Wang, J. Li, F. Huang, L. Zhang, X. Xue, "Learning dynamic alignment via meta-filter for few-shot learning," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5182-5191, DOI: 10.1109/CVPR46437.2021.00514, .
[29] Y. Tian, Y. Wang, D. Krishnan, J. B. Tenenbaum, P. Isola, "Rethinking few-shot image classification: a good embedding is all you need?," pp. 266-282, .
[30] N. Lai, M. Kan, C. Han, X. Song, S. Shan, "Learning to learn adaptive classifier–predictor for few-shot learning," IEEE Transactions on Neural Networks and Learning Systems, vol. 32 no. 8, pp. 3458-3470, DOI: 10.1109/tnnls.2020.3011526, 2021.
[31] J. Liu, L. Song, Y. Qin, "Prototype rectification for few-shot learning," pp. 741-756.
[32] J. P. Miller, R. Taori, A. Raghunathan, S. Sagawa, P. W. Koh, V. Shankar, P. Liang, Y. Carmon, L. Schmidt, "Accuracy on the line: on the strong correlation between out-of-distribution and in-distribution generalization," pp. 7721-7735.
[33] S. Kornblith, J. Shlens, Q. V. Le, "Do better imagenet models transfer better?," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2661-2671.
[34] K. He, H. Fan, Y. Wu, S. Xie, R. Girshick, "Momentum contrast for unsupervised visual representation learning," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729-9738.
[35] X. Wang, T. Huang, J. Gonzalez, T. Darrell, F. Yu, "Frustratingly simple few-shot object detection," Proceedings of the International Conference on Machine Learning, pp. 9919-9928.
[36] C. Zhu, F. Chen, U. Ahmed, Z. Shen, M. Savvides, "Semantic relation reasoning for shot-stable few-shot object detection," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8782-8791.
[37] Z. Shen, Z. Liu, J. Qin, M. Savvides, K.-T. Cheng, "Partial is better than all: revisiting fine-tuning strategy for few-shot learning," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35 no. 11, pp. 9594-9602, DOI: 10.1609/aaai.v35i11.17155, 2021.
[38] M. Ren, E. Triantafillou, S. Ravi, J. Snell, K. Swersky, J. B. Tenenbaum, H. Larochelle, R. S. Zemel, "Meta-learning for semi-supervised few-shot classification," Training, vol. 1 no. 2, 2018.
[39] S. Ravi, H. Larochelle, Optimization as a Model for Few-Shot Learning, 2017.
[40] Y. Guo, N. C. Codella, L. Karlinsky, J. V. Codella, J. R. Smith, K. Saenko, T. Rosing, R. Feris, "A broader study of cross-domain few-shot learning," pp. 124-141.
[41] J. Rajasegaran, S. Khan, M. Hayat, F. S. Khan, M. Shah, "Self-supervised knowledge distillation for few-shot learning," 2020. https://arxiv.org/abs/2006.09785
[42] L. Bertinetto, J. F. Henriques, P. Torr, A. Vedaldi, "Meta-learning with differentiable closed-form solvers," Proceedings of the International Conference on Learning Representations.
Copyright © 2022 Lei Zhao et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0/
Abstract
Few-shot classification aims to enable a network to extract features and predict labels for target categories given only a few labeled samples. Current few-shot classification methods focus on the pretraining stage and either fine-tune by experience or do not fine-tune at all. Absent or insufficient fine-tuning may yield low accuracy on the given tasks, while excessive fine-tuning leads to poor generalization to unseen samples. To solve these problems, this study proposes a hybrid fine-tuning strategy (HFT) comprising a few-shot linear discriminant analysis module (FSLDA) and an adaptive fine-tuning module (AFT). FSLDA constructs the optimal linear classification function under few-shot conditions to initialize the parameters of the last fully connected layer, which fully exploits the task-specific knowledge of the given tasks and guarantees a lower bound on model accuracy. AFT adopts an adaptive fine-tuning termination rule to obtain the optimal number of training epochs and prevent the model from overfitting. AFT is built on FSLDA and outputs the final optimal hybrid fine-tuning strategy for a given sample size and layer-freezing policy. We conducted extensive experiments on mini-ImageNet and tiered-ImageNet to demonstrate the effectiveness of the proposed method. It achieves consistent performance improvements over existing fine-tuning methods across different sample sizes, layer-freezing policies, and few-shot classification frameworks.
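As a rough illustration of the idea of initializing the last fully connected layer from a linear discriminant, the sketch below computes the weights and biases of classical linear discriminant analysis from a small support set. It is not the FSLDA module itself, whose exact formulation is not given in this abstract. The function name `lda_init`, the `shrinkage` parameter, and the small ridge term are assumptions introduced here so that the pooled covariance stays invertible when there are far fewer samples than feature dimensions.

```python
import numpy as np


def lda_init(features, labels, num_classes, shrinkage=0.5):
    """Return (W, b) of a linear classifier derived from classical LDA.

    features:  (n, d) array of support-set embeddings.
    labels:    (n,) integer class ids in [0, num_classes); every class
               is assumed to have at least one sample.
    shrinkage: hypothetical blend factor between the pooled covariance
               and a scaled identity (an assumption, not from the paper).
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n, d = features.shape

    # One mean vector per class, stacked as rows.
    means = np.stack(
        [features[labels == c].mean(axis=0) for c in range(num_classes)]
    )

    # Pooled within-class covariance shared by all classes.
    centered = features - means[labels]
    cov = centered.T @ centered / max(n - num_classes, 1)

    # Shrink toward a scaled identity and add a tiny ridge so the
    # matrix is invertible even with very few samples.
    target = (np.trace(cov) / d) * np.eye(d)
    cov = (1.0 - shrinkage) * cov + shrinkage * target + 1e-6 * np.eye(d)

    # Discriminant: g_c(x) = mu_c^T S^-1 x - 0.5 mu_c^T S^-1 mu_c + log(prior_c)
    W = np.linalg.solve(cov, means.T).T  # shape (num_classes, d)
    priors = np.bincount(labels, minlength=num_classes) / n
    b = -0.5 * np.sum(W * means, axis=1) + np.log(priors)
    return W, b


# Toy 3-way 5-shot support set with well-separated synthetic classes.
rng = np.random.default_rng(0)
centers = np.array([[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]])
labels = np.repeat(np.arange(3), 5)
support = centers[labels] + 0.1 * rng.standard_normal((15, 3))

W, b = lda_init(support, labels, num_classes=3)
pred = np.argmax(support @ W.T + b, axis=1)
```

In a fine-tuning pipeline, `W` and `b` would be copied into the final linear layer before gradient updates begin, so that training starts from a classifier already fitted to the support set and not from a random one.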
1 Institute of Automation, Chinese Academy of Sciences, Beijing, China; University of Electronic Science and Technology of China, Chengdu, China
2 University of Electronic Science and Technology of China, Chengdu, China
3 Institute of Automation, Chinese Academy of Sciences, Beijing, China