Abstract

Time series forecasting, particularly within the Internet of Things (IoT) and hydrological domains, plays a critical role in predicting future events based on historical data, which is essential for strategic decision making. Effective flood forecasting is pivotal for optimal water resource management and for mitigating the adverse impacts of flood events. While deep learning methods have demonstrated exceptional performance in time series prediction through advanced feature extraction and pattern recognition, they encounter significant limitations when applied to scenarios with sparse data, especially in flood forecasting. The scarcity of historical data can severely hinder the generalization capabilities of traditional deep learning models, presenting a notable challenge in practical flood prediction applications. To address this issue, we introduce MetaTrans-FSTSF, a pioneering meta-learning framework that redefines few-shot time series forecasting. By innovatively integrating Model-Agnostic Meta-Learning (MAML) and Transformer architectures, our framework provides a specialized solution tailored for the unique challenges of flood prediction, including data scarcity and complex temporal patterns. This framework goes beyond standard implementations, delivering significant improvements in predictive accuracy and adaptability. Our approach leverages MAML to enable rapid adaptation to new forecasting tasks with minimal historical data. Our inner architecture is a Transformer-based meta-predictor capable of capturing intricate temporal dependencies inherent in flood time series data. Our framework was evaluated using diverse datasets, including a real-world hydrological dataset from a small catchment area in Wuyuan, China, and other benchmark time series datasets. These datasets were preprocessed to align with the meta-learning approach, ensuring their suitability for tasks with limited data availability. Through extensive evaluation, we demonstrate that MetaTrans-FSTSF substantially improves predictive accuracy, achieving reductions of 16%, 19%, and 8% in MAE compared to state-of-the-art methods. This study highlights the efficacy of meta-learning techniques in overcoming the limitations posed by data scarcity and enhancing flood forecasting accuracy where historical data are limited.

1. Introduction

Smart water technology is essential for managing and allocating water resources, especially in flood forecasting, which has gained importance due to the significant and consistent increase in floods and extreme precipitation caused by global warming [1]. In addition to climate change, urbanization has significantly increased urban flood hazards in China’s 293 major cities, with 70% experiencing increased hazard in the last decade [2], emphasizing the importance of accurate flood prediction. To address this issue, smart water technology integrates distributed water resource management systems into a comprehensive hydraulic network [3] using popular technologies such as 5G/6G [4,5], sensor networks [6,7], artificial intelligence, etc. Building on this foundation, the Internet of Things (IoT) revolutionizes water management by enabling self-assessment and addressing issues through the analysis of data collected from these smart water systems [8,9]. IoT technologies are increasingly being integrated into flood early warning systems since these technologies facilitate the prediction, monitoring, and detection of flood events [10]. Although they cannot prevent flood disasters, they are valuable tools for transmitting data for disaster preparedness.

The transmitted data can be used for time series forecasting (TSF), which refers to the process of forecasting future values of a time series dataset based on historical data [11]. Consequently, TSF can provide short-term and long-term flood forecasts, which are vital for decision making to ensure accurate assessments of water resource infrastructure capabilities [12]. Notable examples include the European Centre for Medium-Range Weather Forecasts (ECMWF) Global Flood Awareness System (GloFAS) [13], and Google Flood Hub [14]. ECMWF GloFAS integrates numerical weather prediction with hydrological modeling to provide global flood forecasts, offering comprehensive spatial coverage and high temporal resolution. Similarly, Google Flood Hub employs machine learning techniques, leveraging vast datasets such as satellite imagery, meteorological data, and topographical information to deliver precise and timely flood risk assessments. These models have set benchmarks in the field, demonstrating the potential of data-driven and hybrid approaches to mitigate flood impacts effectively. Deep learning’s ability to automatically extract features, recognize complex patterns, and handle large datasets has made it dominant in time series forecasting [15,16]. These advantages significantly enhance the accuracy and effectiveness of TSF in various IoT applications.

However, with their numerous parameters, deep learning models require substantial amounts of data for effective training, as shown in Figure 1a. In data-scarce scenarios, these models may lack sufficient information to learn meaningful representations. Specifically, flood events are infrequent, and collecting and maintaining large datasets is costly. Additionally, the scarcity of comprehensive hydrological data in remote areas complicates flood prediction, as shown in Figure 1b.

Few-shot learning (FSL) offers a promising solution to these issues by enabling models to adapt quickly to new tasks with limited data [17]. In the past decade, considerable effort has been dedicated to FSL, especially in the area of computer vision. One common method for addressing data scarcity is through data augmentation techniques, which expand the training dataset by generating samples through rotation, scaling, or flipping [18,19,20,21]. Another common strategy is transfer learning, where models are pre-trained on large datasets from related domains and then fine-tuned on the target task [22,23,24]. Compared to time series data, image data are generally stationary, meaning that the statistical properties of local regions remain relatively constant across spatial positions. This characteristic also facilitates the transfer of feature extractors between different image datasets. In contrast, flood time series data often involve rare and complex events, which are infrequent and exhibit intricate patterns. Data augmentation struggles to replicate the complexity of these rare events, and transfer learning depends on the similarity between source and target domains [25]. Significant differences in climate, terrain, or hydrological conditions between source and target flood data can render transfer learning ineffective or introduce biases.

Meta-learning is a promising alternative that focuses on developing the ability to learn from limited data. Unlike traditional approaches, meta-learning is designed to adapt quickly to new tasks with minimal data by leveraging knowledge from previous tasks. Meta-learning frameworks, such as Model-Agnostic Meta-Learning (MAML) [26] and Matching Networks [27], have demonstrated their effectiveness in enhancing model performance across various domains. By optimizing the model’s ability to generalize from limited data and adapt to various scenarios, meta-learning can enhance performance and overcome the limitations of data augmentation and transfer learning in time series. This capability is particularly useful for flood time series prediction, where data scarcity and variability pose significant challenges. In contrast to traditional approaches that heavily rely on extensive historical data, references [28,29] have attempted to leverage meta-learning to enhance the adaptability of hydrological models with limited data availability. However, these models often focus narrowly on specific types of hydrological data and do not address the broader challenges of generalizability across diverse flood scenarios. Similarly, the authors of [30] offer valuable insights into the application of meta-learning in regulated environments, yet their approach falls short in contexts where data irregularities and environmental variability predominate.

Therefore, considering the challenges brought by the dynamic and non-stationary nature of time series data for few-shot learning and the complexity of hyperparameter selection in deep learning models, we develop a meta-predictor based on meta-learning for few-shot time series prediction (MetaTrans-FSTSF). Our MetaTrans-FSTSF framework not only addresses these issues through a robust Transformer-based meta-predictor capable of capturing complex temporal dependencies but also demonstrates superior adaptability across a broader spectrum of flood prediction tasks. The main contributions of this paper are as follows:

  • We reformulate the flood time series prediction as a few-shot learning problem, enabling the model to generalize and adapt to flood forecasting with limited historical data by leveraging meta-learning techniques.

  • To address the few-shot learning challenge in time series forecasting, we propose MetaTrans-FSTSF, which is a novel framework that integrates the strengths of MAML and Transformer models into a unified architecture specifically designed for few-shot time series forecasting. Unlike existing applications, our framework emphasizes the rapid adaptation and precise handling of sparse flood time series data through innovative meta-learning and attention mechanisms.

  • Extensive experiments are conducted on various benchmark flood datasets, demonstrating the superior performance of our MAML-based meta-predictor compared to state-of-the-art methods and validating the effectiveness of individual components within the proposed framework.

The remainder of this paper is organized as follows. Section 2 reviews the related works. Section 3 covers the preliminaries essential for understanding our approach. Section 4 presents our proposed MetaTrans-FSTSF framework, detailing the model architecture and its components. Section 5 describes the experimental setup and a comprehensive analysis of the experimental results. Finally, Section 6 concludes the paper and discusses potential directions for future research.

2. Related Work

2.1. Deep Learning for Time Series Forecasting

Time series forecasting is a fundamental scientific problem that is essential for extracting patterns from time series data to accurately predict future values. Recent studies have demonstrated the profound impact of time series forecasting across various fields. For instance, Chen et al. [31] proposed an attention-based deep learning framework to predict the remaining useful life of machinery. In [32], the authors developed a deep learning model that combines attention-based Conv-LSTM modules with Bidirectional LSTM (Bi-LSTM), effectively capturing spatial features, short-term temporal dynamics, and long-term periodic patterns in traffic flow data to enhance prediction accuracy. In [33], Yi et al. proposed the FreTS model, a frequency-domain Multi-Layer Perceptron (MLP) method that enhances short- and long-term prediction performance by learning signals in the frequency domain. Moreover, in [34], the authors introduced a novel Fourier Graph Neural Network (FourierGNN) based on Graph Neural Networks (GNNs) and Fourier Graph Operators (FGOs), converting time series forecasting into predictions on hypervariate graphs to achieve efficient and effective forecasting.

Deep learning models excel in time series forecasting with abundant data but face challenges in real-world scenarios, such as data scarcity and imbalance, which limit their generalization and practical applicability. This work aims to improve forecasting accuracy and robustness with limited samples.

2.2. Few-Shot Learning

FSL has shown significant potential in addressing the challenges of data scarcity, particularly in fields such as image classification and natural language processing. Existing FSL methods can be broadly categorized into three types: data-based, feature-based, and task-based, with a predominant focus on computer vision, particularly image classification datasets.

2.2.1. Data-Based Methods

Data-based methods typically utilize data augmentation techniques, such as rotation, scaling, and flipping, to expand the training dataset by generating modified versions of existing samples [35,36,37,38]. Additionally, methods based on Generative Adversarial Networks (GANs) are widely applied to further enrich the training data by generating new samples [39,40,41,42]. However, applying these techniques to time series data is challenging because time series data often contain rare events and complex patterns, which data augmentation struggles to replicate with the same complexity.

2.2.2. Feature-Level Methods

Feature-level approaches, such as transfer learning, are widely utilized: models are pre-trained on large datasets from related domains and then fine-tuned on the target task to improve generalization and performance [43,44]. Researchers have demonstrated the effectiveness of transfer learning for time series classification [45,46,47] and regression [48,49,50]. For instance, Kimura et al. [51] introduced a transfer-learning approach to a convolutional neural network flood model for East Asian regions to predict time series variables such as water level, showing that transfer learning improves the prediction accuracy of a classification task. Xu et al. [50] employed a Transfer Learning framework based on a Transformer (TL-Transformer) to transfer hydrological knowledge from data-rich to data-sparse basins, demonstrating that transfer learning can improve flood forecast accuracy in areas with limited observations. However, their methods were trained only on task-specific data, indicating an area for further research and generalization.

Transfer learning can improve flood prediction accuracy, but data scarcity and regional variability in time series data can render it ineffective or introduce bias. In our framework, these differences are mitigated by designing a diverse task distribution during meta-training. By sampling time series prediction tasks from a variety of domains, the meta-learner is trained to adapt effectively to new, unseen tasks even if they differ significantly from the training distribution. This diversity ensures that the learned initialization parameters encapsulate broad adaptability rather than being biased toward a specific domain. Moreover, the regression-based approaches discussed above fail to achieve two critical objectives simultaneously: adapting easily to unfamiliar tasks without relying on ad hoc fine-tuning strategies, and overcoming the extreme challenges that arise in few-shot time series forecasting.

2.2.3. Task-Based Methods

In contrast to the aforementioned methods, task-based few-shot learning, such as meta-learning, extracts meta-knowledge from both data and tasks [52]. Meta-knowledge leverages experience from previous tasks to improve learning capabilities for new tasks, enabling the model to quickly adapt and optimize performance in novel scenarios or contexts [53]. The authors in [54] used a meta-learning approach for classifying time series with few shots. Their research emphasizes the importance of fast adaptation to an arbitrary time series classification task with a limited number of labeled samples. However, the paper focuses solely on classification and does not address regression.

3. Preliminaries

3.1. Problem Definition

In traditional machine learning, we typically work with a single dataset $D$, which is split into a training set $D_{\text{train}}$ and a test set $D_{\text{test}}$ for model training and evaluation, respectively. However, in few-shot learning, the limited samples in the dataset pose challenges for deep learning models. To address this, we use meta-learning for few-shot time series forecasting. Meta-learning enables the model to quickly adapt to new tasks with few samples by learning how to learn from a variety of tasks $\mathcal{T} = \{T_i\}_{i=1}^{N}$, where each task $T_i$ has limited samples. Specifically, we work with a meta-set $\mathcal{D}$ that comprises multiple episodes [27], where each episode $\mathcal{D}_{T_i} = (\mathcal{D}_{T_i}^{s}, \mathcal{D}_{T_i}^{q}) \in \mathcal{D}$ includes a support set $\mathcal{D}_{T_i}^{s}$ and a query set $\mathcal{D}_{T_i}^{q}$.

The meta-set D is divided into three components:

(1) $\mathcal{D} = \{\mathcal{D}_{\text{meta-tr}},\ \mathcal{D}_{\text{meta-val}},\ \mathcal{D}_{\text{meta-te}}\},$

where $\mathcal{D}_{\text{meta-tr}}$ is used to train the meta-learner that, given a support set $\mathcal{D}^{s}$, can generate a predictor capable of achieving high performance on the corresponding query set $\mathcal{D}^{q}$. $\mathcal{D}_{\text{meta-val}}$ is used to fine-tune the hyperparameters of the meta-learner, ensuring that the predictor does not overfit, while $\mathcal{D}_{\text{meta-te}}$ evaluates the performance of the meta-learner, confirming its effectiveness in flood forecasting characterized by data scarcity.

An example of meta-learning in few-shot time series forecasting is illustrated in Figure 2. The meta-learning paradigm treats flood prediction as one task among many in the meta-test set. The meta-predictor leverages shared knowledge across tasks (other time series datasets) to adapt efficiently to each flood prediction task, even with limited historical data. This adaptability is crucial in regions with sparse hydrological records. Each gray rectangle represents a meta-testing task corresponding to a specific flood prediction scenario and contains two dashed-line boxes: the support set is used to fine-tune the meta-predictor for that task, while the query set evaluates its generalization capability. This division mimics the training–testing paradigm for individual tasks within a meta-learning context. In each dashed box, blue waveforms represent input sequences (e.g., past water level and rainfall data) fed into the meta-predictor; these inputs encapsulate the historical time series data essential for modeling temporal dependencies in flood events. Red waveforms represent the predicted sequences (e.g., future water levels) produced by the meta-predictor.
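To make the episode structure concrete, the sketch below shows one way such a support/query episode could be assembled from a task's pool of windowed samples. It is a minimal illustration in PyTorch, not the paper's released code; the names `make_episode` and `windows` are ours.

```python
import torch

def make_episode(windows, n_support=15, n_query=5, seed=None):
    """Sample one episode D_Ti = (D_Ti^s, D_Ti^q) from a task's pool of
    (input sequence, label sequence) pairs."""
    g = torch.Generator()
    if seed is not None:
        g.manual_seed(seed)
    idx = torch.randperm(len(windows), generator=g)[: n_support + n_query]
    chosen = [windows[i] for i in idx]
    support = chosen[:n_support]  # used to adapt the task-specific predictor
    query = chosen[n_support:]    # used to evaluate the adapted predictor
    return support, query
```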

3.2. Meta-Learning Paradigm

In traditional deep time series forecasting, a model is trained on a large dataset where each sample consists of input features $x_i$ and corresponding labels $y_i$. The goal is to find a set of parameters $\theta$ that minimizes the prediction error between the model’s output $f(x_i; \theta)$ and the label $y_i$. This error is typically quantified by a loss function $\mathcal{L}$. The optimization objective is to find the optimal parameters $\theta^*$ that minimize the average loss over all samples:

(2) $\theta^* = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\big(f(x_i; \theta), y_i\big).$

Meta-learning operates at a higher level, focusing on learning to learn. Instead of individual samples, the fundamental unit of data in meta-learning is the task. Each task $T_i$ includes a dataset $\mathcal{D}_{T_i}$, which is further divided into a support set and a query set. The core objective of meta-learning is to optimize a learning strategy (meta-learner) that can rapidly train an effective task-specific predictor $f(\theta)$ using the support set of each task. Specifically, the meta-learner $\mathcal{A}(\mathcal{T}; \Phi)$ obtains optimal parameters $\Phi^*$ across multiple tasks $T_i \in \mathcal{T}_{\text{train}}$. When encountering a new task $T_j \in \mathcal{T}_{\text{test}}$, the meta-learner $\mathcal{A}(\cdot; \Phi^*)$ uses the support set $\mathcal{D}_{T_j}^{s}$ to train a task-specific predictor $f(\theta_j)$, which can predict the time series well on the query set $\mathcal{D}_{T_j}^{q}$. The process of training the predictor $f(\theta_j)$ is expressed as

(3) $f(\theta_j) = \mathcal{A}(\mathcal{D}_{T_j}^{s}; \Phi^*),$

where $\mathcal{A}(\mathcal{D}_{T_j}^{s}; \Phi^*)$ denotes the process of using the optimal meta-learning strategy $\mathcal{A}(\Phi^*)$ to train on the support set $\mathcal{D}_{T_j}^{s}$, yielding the task-specific parameters $\theta_j$.

The optimization objective of meta-learning is to find the optimal meta-learner parameters Φ* such that the task-specific models trained with few support samples can minimize the loss on the query set:  

(4) $\min_{\Phi} \ \mathbb{E}_{T_i \sim p(T)} \ \mathcal{L}(\mathcal{D}_{T_i}^{q}; \theta_i^*; \Phi)$

(5) $\text{s.t.} \quad \theta_i^* := \arg\min_{\theta_i} \mathcal{L}(\mathcal{D}_{T_i}^{s}; \theta_i; \Phi), \quad i = 1, \ldots, N,$

where $T_i \sim p(T)$ denotes tasks sampled from a task distribution $p(T)$, and $\mathcal{L}$ is the loss function that measures the error between predictions and labels.

In contrast to traditional deep learning, which directly optimizes model parameters θ for a single task, meta-learning optimizes meta-parameters Φ across multiple tasks, enabling the model to quickly adapt and generate effective models for new tasks. The strength of meta-learning lies in its ability to generalize across tasks, making it particularly suited for few-shot learning scenarios. By learning the commonalities across tasks, the meta-learning strategy allows the model to learn quickly and effectively with minimal data when faced with new tasks.

4. The Proposed Method

In this section, we present our proposed MetaTrans-FSTSF framework designed for few-shot time series forecasting based on meta-learning. The overall architecture of MetaTrans-FSTSF is shown in Figure 3. The architecture of our framework comprises two core components: the meta-predictor and the meta-learner. The meta-predictor is a deep-learning-based network that models the temporal dependencies in time series data, while the meta-learner governs the adaptation process across tasks.

4.1. Meta-Learner

In time series forecasting, we focus on training a meta-learner that can efficiently adapt to different forecasting tasks. The meta-learner explicitly simulates real-world data-scarce scenarios by training on a variety of tasks, including those with limited historical data and high variability. This ensures that the model learns to generalize patterns and dependencies from diverse, low-data conditions. The general objective in meta-learning is to optimize the meta-learner’s parameters so that for any new task Tj, the model can quickly learn an optimal task-specific predictor using the support set, leading to minimal loss on the query set.

Our approach leverages the MAML-based algorithm as our meta-learner, which enables rapid adaptation to new tasks with minimal data. While our approach leverages the MAML algorithm for its generalization capability, we advance beyond standard MAML by specifically adapting its meta-training to address the unique challenges of few-shot time series forecasting, particularly in flood prediction scenarios. The key distinction between MAML and other meta-learning methods lies in its optimization-based approach. While metric-based methods focus on learning a similarity function [55] and model-based methods often rely on specialized architectures with meta-parameters [56], MAML directly optimizes the model’s initial parameters for rapid adaptation. In traditional MAML, the goal is to learn an initialization of model parameters $\theta$ that can be adapted to new tasks with a few gradient updates. MAML is adapted for time series forecasting as follows: given a task $T_i$, the model parameters $\theta$ are updated by performing $K$ iterations of gradient descent on the support set $\mathcal{D}_{T_i}^{s}$:

(6) $\theta_i^{(k+1)} = \theta_i^{(k)} - \alpha \nabla_{\theta} \mathcal{L}(\mathcal{D}_{T_i}^{s}; \theta_i^{(k)}), \quad k = 0, 1, \ldots, K-1,$

where $\theta_i^{(0)} = \theta$ and $\alpha$ is the predictor learning rate. The parameter $K$ denotes the number of adaptation steps and is a critical hyperparameter in time series forecasting, as different tasks may require different levels of adaptation.

After adapting the parameters $\theta_i^{(K)}$ for each task $T_i$, the performance of these adapted parameters is evaluated on the query set $\mathcal{D}_{T_i}^{q}$. The gradients from this evaluation are used to update the initial parameters $\theta$:

(7) $\theta^* \leftarrow \theta - \beta \nabla_{\theta} \sum_{i=1}^{N} \mathcal{L}(\mathcal{D}_{T_i}^{q}; \theta_i^{(K)}),$

where $\beta$ is the meta-learning rate. Our meta-learner adheres to the general meta-learning paradigm, but it specifically utilizes a non-parametric, gradient-based meta-learner. Consequently, building upon the formulation in Equation (5), the optimization objective for the meta-learner in time series forecasting is to find the initialization $\theta$ that minimizes the expected loss over the distribution of tasks:

(8) $\min_{\theta} \ \mathbb{E}_{T_i \sim p(T)} \ \mathcal{L}\big(f(\mathcal{D}_{T_i}^{q,\text{input}}; \theta_i^{(K)}); \mathcal{D}_{T_i}^{q,\text{label}}\big)$

(9) $\text{s.t.} \quad \theta_i^{(K)} := \theta - \alpha \sum_{k=1}^{K} \nabla_{\theta} \mathcal{L}\big(f(\mathcal{D}_{T_i}^{s,\text{input}}; \theta_i^{(k-1)}); \mathcal{D}_{T_i}^{s,\text{label}}\big), \quad i = 1, \ldots, N,$

where $f(\cdot)$ represents the meta-predictor, and $\mathcal{D}_{T_i}^{s,\text{input}}$ and $\mathcal{D}_{T_i}^{s,\text{label}}$ represent the input time series and the corresponding label sequence in the support set, respectively. Figure 4 shows the framework of the MAML-based meta-learner.
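The inner and outer updates of Equations (6)–(9) can be sketched with PyTorch's functional API as below. This is a minimal second-order MAML loop under our own naming and batching assumptions, not the paper's released implementation; the actual meta-predictor $f$ is the Transformer of Section 4.2, and `meta_opt` would be, e.g., `torch.optim.Adam(model.parameters(), lr=beta)`.

```python
import torch
from torch.func import functional_call

def adapt(model, params, support_x, support_y, loss_fn, alpha, K):
    """Inner loop (Eq. 6): K gradient steps on the support set, starting
    from the shared initialization theta held in `params`."""
    for _ in range(K):
        pred = functional_call(model, params, (support_x,))
        loss = loss_fn(pred, support_y)
        # create_graph=True keeps the update path differentiable so the
        # meta-update (Eq. 7) can backpropagate through the K inner steps.
        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
        params = {name: p - alpha * g
                  for (name, p), g in zip(params.items(), grads)}
    return params

def meta_step(model, tasks, loss_fn, meta_opt, alpha=1e-4, K=5):
    """Outer loop (Eq. 7): aggregate query-set losses of the adapted
    parameters and update the shared initialization theta."""
    meta_opt.zero_grad()
    meta_loss = 0.0
    for sx, sy, qx, qy in tasks:               # one (support, query) tuple per task T_i
        theta_i = dict(model.named_parameters())
        theta_i = adapt(model, theta_i, sx, sy, loss_fn, alpha, K)
        meta_loss = meta_loss + loss_fn(functional_call(model, theta_i, (qx,)), qy)
    meta_loss.backward()                       # gradients flow back to model.parameters()
    meta_opt.step()
    return float(meta_loss)
```

Dropping `create_graph=True` would yield the cheaper first-order approximation of MAML at the cost of the exact meta-gradient.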

4.2. Meta-Predictor

In time series forecasting, the task-specific adaptation process can be further tailored by considering the sequential nature of the data. During the inner loop, the model could leverage attention mechanisms to effectively capture the temporal patterns during the gradient updates. Transformers have gained significant attention in time series forecasting due to their ability to capture complex dependencies across different time steps through self-attention mechanisms [57]. In our framework, we design a novel Transformer-based meta-predictor that not only extracts sequential features efficiently but also incorporates task-specific rapid adaptation mechanisms. This innovation enables the model to effectively handle the temporal complexities and variability of flood time series data, setting it apart from standard Transformer applications. The meta-predictor includes an encoder–decoder structure, with the encoder processing the input time series and the decoder generating the forecast, where self-attention layers are used to model temporal dependencies.

This section provides a detailed description of how each component of the Transformer is adapted to time series forecasting based on meta-learning with an emphasis on the role of the meta-learner in initializing parameters and the unique benefits it brings to time series forecasting. As shown in Figure 5, the input embedding layer translates raw time series data into a higher-dimensional space suitable for subsequent layers in the encoder. Embedding is crucial for capturing the intricate patterns present in time series data.

(10) $Z_{t,i} = W_{e,\text{meta}} X_{t,i} + b_{e,\text{meta}},$

where $W_{e,\text{meta}}$ and $b_{e,\text{meta}}$ are initialization parameters provided by the meta-learner. These parameters are optimized to produce embeddings that effectively represent the time series data. To incorporate the temporal features of the time series data, positional encoding is added to the embeddings.

(11) $PE_{(t,2m)} = \sin\left(\frac{t}{10000^{2m/d}}\right),$

(12) $PE_{(t,2m+1)} = \cos\left(\frac{t}{10000^{2m/d}}\right).$

$PE_t$ encodes the positional information for time step $t$, where $m$ indexes the encoding dimension and $d$ is the embedding dimension. The positional encoding vector $PE_t$ is then added to the embedded input:

(13) $Z_{t,i} = Z_{t,i} + PE_t.$
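A compact sketch of the embedding and positional encoding of Equations (10)–(13) is shown below. The module name and dimension defaults are illustrative assumptions; in MetaTrans-FSTSF the linear weights correspond to the meta-initialized $W_{e,\text{meta}}$ and $b_{e,\text{meta}}$.

```python
import math
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    """Eqs. (10)-(13): linear embedding of raw values plus sinusoidal
    positional encoding (assumes an even d_model)."""
    def __init__(self, n_features, d_model, max_len=512):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)   # W_e, b_e (meta-initialized)
        pe = torch.zeros(max_len, d_model)
        t = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(t * div)             # PE(t, 2m),   Eq. (11)
        pe[:, 1::2] = torch.cos(t * div)             # PE(t, 2m+1), Eq. (12)
        self.register_buffer("pe", pe)

    def forward(self, x):                            # x: (batch, time, n_features)
        z = self.proj(x)                             # Eq. (10)
        return z + self.pe[: x.size(1)]              # Eq. (13)
```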

Then, the multi-head self-attention layer enables the meta-predictor to focus on different parts of the time series simultaneously, capturing complex dependencies across various time steps. The self-attention mechanism is defined as

(14) $\text{head}_h^i = \text{Attention}(Q_h^i, K_h^i, V_h^i) = \text{softmax}\left(\frac{Q_h^i (K_h^i)^\top}{\sqrt{d_k}}\right) V_h^i,$

where $Q_h^i = W_{Q,\text{meta}}^i Z_{t,i}$, $K_h^i = W_{K,\text{meta}}^i Z_{t,i}$, and $V_h^i = W_{V,\text{meta}}^i Z_{t,i}$. The matrices $W_{Q,\text{meta}}^i$, $W_{K,\text{meta}}^i$, and $W_{V,\text{meta}}^i$ are initialized by the meta-learner. The multi-head mechanism then aggregates the attention outputs from multiple heads.

(15) $\text{MultiHead}(Q^i, K^i, V^i) = \text{Concat}(\text{head}_1, \ldots, \text{head}_H) \, W_{O,\text{meta}}^i,$

where each head calculates attention independently, and $W_{O,\text{meta}}^i$ is a learnable matrix that combines these outputs. Following attention, the feed-forward neural network (FFN) processes each position’s output. This layer applies non-linear transformations to capture additional complex patterns in the time series data.

(16) $\text{FFN}(Z) = \text{ReLU}(Z W_{1,\text{meta}}^i + b_{1,\text{meta}}^i) \, W_{2,\text{meta}}^i + b_{2,\text{meta}}^i.$

The parameters $W_{1,\text{meta}}^i$, $W_{2,\text{meta}}^i$, $b_{1,\text{meta}}^i$, and $b_{2,\text{meta}}^i$ are also initialized by the meta-learner. The FFN layer enhances the ability of the meta-predictor to learn complex mappings from input features to outputs.

To prevent the vanishing and exploding gradient problems, layer normalization and residual connections are used. These techniques ensure that each layer’s output maintains a consistent scale, which aids the convergence of the model.

(17) $l_{t,i} = \text{LN}\big(Z_{t,i} + \text{MultiHead}(Q^i, K^i, V^i)\big),$

(18) $Y_{t,i} = \text{LN}\big(l_{t,i} + \text{FFN}(l_{t,i})\big).$
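Equations (14)–(18) together describe one standard encoder block, which could be sketched as follows. The layer sizes here are placeholder assumptions; in MetaTrans-FSTSF all of these weights would be initialized by the meta-learner rather than randomly.

```python
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder block of the meta-predictor (Eqs. (14)-(18)):
    multi-head self-attention, then a position-wise FFN, each wrapped
    in a residual connection and layer normalization."""
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, z):                    # z: (batch, time, d_model)
        a, _ = self.attn(z, z, z)            # Eqs. (14)-(15)
        l = self.norm1(z + a)                # Eq. (17)
        return self.norm2(l + self.ffn(l))   # Eq. (18)
```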

Thus, the encoder parameters are $\theta_i^{\text{enc}} = \{W_{e,\text{meta}}^{i,\text{enc}}, b_{e,\text{meta}}^{i,\text{enc}}, W_{Q,\text{meta}}^{i,\text{enc}}, W_{K,\text{meta}}^{i,\text{enc}}, W_{V,\text{meta}}^{i,\text{enc}}, W_{O,\text{meta}}^{i,\text{enc}}, W_{1,\text{meta}}^{i,\text{enc}}, b_{1,\text{meta}}^{i,\text{enc}}, W_{2,\text{meta}}^{i,\text{enc}}, b_{2,\text{meta}}^{i,\text{enc}}\}$. The decoder in the meta-predictor architecture is responsible for generating predictions based on the encoded sequence and prior outputs, so the task-specific parameters are $\theta_i = \{\theta_i^{\text{enc}}, \theta_i^{\text{dec}}\}$. Finally, the output layer maps the final hidden representations to the prediction space, providing the forecasted values for future time steps:

(19) $\hat{Y}_{t,i} = \text{Decoder}(Y_{t,i}; \theta_i^{\text{dec}}).$

In the proposed MetaTrans-FSTSF, various parameters can be updated through gradients during the training process. These parameters are distributed across different layers of the Transformer architecture and are initialized by the meta-learner as part of the MAML framework.

This comprehensive parameter set $\theta$ encompasses all the learnable parameters in MetaTrans-FSTSF, which are updated during training through gradient-based optimization. The meta-learner initializes the parameters of the meta-predictor, while the meta-predictor updates the parameters for each task according to Equation (6). Algorithm 1 summarizes the training procedure for MetaTrans-FSTSF.

Algorithm 1 Meta Transformer for Few-Shot Time Series Forecasting (MetaTrans-FSTSF)

Require: $\mathcal{T}_{\text{meta-tr}}$: set of training tasks; $\mathcal{T}_{\text{meta-val}}$: set of validation tasks; $\mathcal{T}_{\text{meta-te}}$: set of test tasks; $\alpha$: meta-predictor learning rate; $\beta$: meta-learner learning rate; $K$: number of inner update iterations.

1: Initialize meta-predictor parameters $\theta$
2: for each episode in the meta-training phase do
3:   Sample a batch of tasks $\{T_i\}_{i=1}^{N} \subset \mathcal{T}_{\text{meta-tr}}$
4:   for each task $T_i$ do
5:     Split the task data into support set $\mathcal{D}_{T_i}^{s}$ and query set $\mathcal{D}_{T_i}^{q}$
6:     Initialize task-specific parameters $\theta_i^{(0)} \leftarrow \theta$
7:     for $k = 0$ to $K-1$ do
8:       Compute the support loss $\mathcal{L}(\mathcal{D}_{T_i}^{s}; \theta_i^{(k)})$
9:       Update the task-specific parameters:
10:        $\theta_i^{(k+1)} = \theta_i^{(k)} - \alpha \nabla_{\theta_i^{(k)}} \mathcal{L}(\mathcal{D}_{T_i}^{s}; \theta_i^{(k)})$
11:    end for
12:    Compute the query-set loss $\mathcal{L}(\mathcal{D}_{T_i}^{q}; \theta_i^{(K)})$
13:  end for
14:  Update the meta-parameters $\theta$ using the query-set losses:
15:    $\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{i=1}^{N} \mathcal{L}(\mathcal{D}_{T_i}^{q}; \theta_i^{(K)})$
16: end for
17: Evaluate the meta-learner on $\mathcal{T}_{\text{meta-val}}$ to fine-tune hyperparameters
18: for each task $T_j \in \mathcal{T}_{\text{meta-te}}$ do
19:   Split the task data into support set $\mathcal{D}_{T_j}^{s}$ and query set $\mathcal{D}_{T_j}^{q}$
20:   Initialize task-specific parameters $\theta_j^{(0)} \leftarrow \theta$
21:   for $k = 0$ to $K-1$ do
22:     Fine-tune the parameters:
23:       $\theta_j^{(k+1)} = \theta_j^{(k)} - \alpha \nabla_{\theta_j^{(k)}} \mathcal{L}(\mathcal{D}_{T_j}^{s}; \theta_j^{(k)})$
24:   end for
25:   Evaluate on the query set $\mathcal{D}_{T_j}^{q}$ with the final parameters $\theta_j^{(K)}$
26: end for
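The meta-testing half of Algorithm 1 (steps 18–26) then amounts to cloning the learned initialization, fine-tuning on each test task's support set, and scoring on its query set. A minimal sketch, reusing the `adapt` helper from the meta-learner sketch in Section 4.1:

```python
import torch
from torch.func import functional_call

def evaluate_meta_test(model, test_tasks, loss_fn, alpha=1e-4, K=5):
    """Meta-testing (Algorithm 1, steps 18-26): start each task from the
    learned initialization theta*, fine-tune on its support set, and
    report the average query-set loss."""
    losses = []
    for sx, sy, qx, qy in test_tasks:
        theta_j = {n: p.detach().clone().requires_grad_(True)
                   for n, p in model.named_parameters()}   # theta_j^(0) <- theta*
        theta_j = adapt(model, theta_j, sx, sy, loss_fn, alpha, K)
        with torch.no_grad():                               # query-set evaluation only
            pred = functional_call(model, theta_j, (qx,))
            losses.append(loss_fn(pred, qy).item())
    return sum(losses) / len(losses)
```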

5. Experiments

5.1. Datasets

We aim to develop a model for flood prediction, particularly for small catchment areas in China, where data are often scarce for deep learning models. To address this challenge, we collected time series data from various domains to create a comprehensive dataset for deep meta-learning. Each dataset represents a different time series forecasting problem with varying sample sizes, time stamps, and features. By training our meta-learner on these diverse datasets, the model can quickly adapt and learn effectively from small sample sizes.

In total, we used 30 datasets for this purpose. Twenty of these datasets were sourced from the UCR [58], which provides a wide variety of time series tasks. Additionally, we selected three datasets from the Informer [59], specifically designed for long-term forecasting, and six datasets from the Monash [60], which is known for its diverse TSF challenges. To ensure the practicality of our framework, we utilized hydrological and meteorological datasets from a small catchment area in Wuyuan, China, encompassing hourly records of precipitation, water level, and flow rate from distributed sensors. The dataset is collected in Shangrao City, Jiangxi Province, China. The geographical location is 117°22′ to 118°11′E and 29°01′ to 29°35′N. The total land area is 2947 km2, consisting mainly of low hills and mountains. Mountains and hills account for more than 83% of the total area. Due to the mountain rivers’ slope and the rapid confluence rate, Wuyuan is prone to flash floods and geological disasters. The monsoon season is from April to June, with a monthly rainfall of 200 mm to 300 mm, accounting for 47.9% of the annual rainfall. There are 35 meteorological stations in the basin to collect precipitation. The distribution of the basin with hydrological and meteorological stations is shown in Figure 6.

The dataset spans multiple flood events, offering a diverse yet sparse time series for testing. These datasets present challenges such as data sparsity, variability in temporal patterns, and non-stationarity, reflecting real-world flood forecasting scenarios.

All datasets were preprocessed to ensure compatibility with our meta-learning framework. This preprocessing included normalization and random splitting into $\mathcal{D}_{\text{meta-tr}}$, $\mathcal{D}_{\text{meta-val}}$, and $\mathcal{D}_{\text{meta-te}}$. The lengths of these sets, representing the number of tasks, are $|\mathcal{D}_{\text{meta-tr}}| = 21$, $|\mathcal{D}_{\text{meta-val}}| = 7$, and $|\mathcal{D}_{\text{meta-te}}| = 2$. Each task consisted of $N_s = 15$ support samples and $N_q = 5$ query samples, totaling 20 samples per task. We set the history length $l_{his}$ to 72 h and the forecasting horizon $l_{pre}$ to 1 h, 3 h, 6 h, and 12 h to evaluate the model’s performance across different prediction intervals.
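The windowing implied by $l_{his}$ and $l_{pre}$ could look like the following sketch; `make_windows` and the normalization shown are our illustrative assumptions, not the paper's exact preprocessing pipeline.

```python
import numpy as np

def make_windows(series, l_his=72, l_pre=1):
    """Slice an hourly series into (history, horizon) pairs:
    l_his = 72 input steps predicting the next l_pre steps (1/3/6/12 h)."""
    X, Y = [], []
    for t in range(len(series) - l_his - l_pre + 1):
        X.append(series[t : t + l_his])
        Y.append(series[t + l_his : t + l_his + l_pre])
    return np.asarray(X), np.asarray(Y)

# Example usage with a hypothetical 1-D array of hourly water-level readings:
# water_level = (water_level - water_level.mean()) / water_level.std()  # z-score
# X, Y = make_windows(water_level, l_his=72, l_pre=6)                   # 6 h horizon
```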

5.2. Evaluation Metrics

To evaluate the performance of our deep meta-learning model on the TSF tasks, we employ several standard metrics commonly used in regression analysis, including Mean Absolute Error (MAE), Mean Squared Error (MSE), and Symmetric Mean Absolute Percentage Error (SMAPE). These metrics are essential for assessing the accuracy and robustness of the model, especially in few-shot scenarios where data are scarce. MAE captures the average magnitude of errors without considering their direction, offering a measure robust to outliers. MSE is sensitive to large errors because it averages the squared differences between predictions and labels, and SMAPE expresses prediction accuracy in percentage terms.

These metrics were computed according to the following equations and provide a comprehensive assessment of model performance across different aspects of the TSF tasks.

  • Mean Absolute Error (MAE):

(20) $\text{MAE} = \frac{1}{|\mathcal{D}_{T_i}^{s}|} \sum_{(x^{(j)}, y^{(j)}) \in \mathcal{D}_{T_i}^{s}} \big| f(x^{(j)}; \theta) - y^{(j)} \big|.$

  • Mean Squared Error (MSE):

(21) $\text{MSE} = \frac{1}{|\mathcal{D}_{T_i}^{s}|} \sum_{(x^{(j)}, y^{(j)}) \in \mathcal{D}_{T_i}^{s}} \big( f(x^{(j)}; \theta) - y^{(j)} \big)^2.$

  • Symmetric Mean Absolute Percentage Error (SMAPE):

(22) $\text{SMAPE} = \frac{200\%}{|\mathcal{D}_{T_i}^{s}|} \sum_{(x^{(j)}, y^{(j)}) \in \mathcal{D}_{T_i}^{s}} \frac{\big| f(x^{(j)}; \theta) - y^{(j)} \big|}{\big| f(x^{(j)}; \theta) \big| + \big| y^{(j)} \big|},$

where $x^{(j)}$ and $y^{(j)}$ are the features and labels in task $T_i$. These metrics provide a comprehensive evaluation of both the accuracy and robustness of the models under different conditions.
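For reference, the three metrics of Equations (20)–(22) can be computed directly in NumPy as below; the `eps` guard against zero denominators in SMAPE is our addition and not part of the paper's formula.

```python
import numpy as np

def mae(pred, true):
    """Mean Absolute Error, Eq. (20)."""
    return np.mean(np.abs(pred - true))

def mse(pred, true):
    """Mean Squared Error, Eq. (21)."""
    return np.mean((pred - true) ** 2)

def smape(pred, true, eps=1e-8):
    """Symmetric MAPE in percent, Eq. (22); eps avoids division by zero."""
    return 200.0 * np.mean(np.abs(pred - true)
                           / (np.abs(pred) + np.abs(true) + eps))
```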

5.3. Baseline Methods

We compare our model with the following baseline methods. The benchmark models are used to predict the water level 1, 3, 6, and 12 steps ahead.

  • Multitask [61]: Extends few-shot learning to diverse multimodal tasks by quantifying knowledge transfer between different tasks and leveraging hard parameter sharing.

  • HetNet [62]: A meta-learning model that handles tasks with varying attribute spaces by inferring latent representations from a few labeled instances.

  • TimeHetNet [63]: Utilizes permutation-invariant deep set blocks with temporal embeddings to handle heterogeneous multivariate data for few-shot time series forecasting.

  • Meta-LSTM [64]: Long Short-Term Memory network using a generalizable meta-model that can be quickly adapted to new applications.

  • Meta-GRU [65]: A Gated Recurrent Unit network that leverages a generalizable meta-model.

5.4. Implementation Details

We conduct our experiments using PyTorch 2.3.0 on an NVIDIA GeForce RTX 4090 GPU (NVIDIA, Santa Clara, CA, USA), and the programming language is Python 3.8. For the meta-learning framework, we used the Adam optimizer with a meta-learning rate of 0.001, ensuring effective parameter updates during the meta-training phase. The predictor was optimized using SGD with a learning rate of 0.4, facilitating rapid convergence during task-specific adaptation. Training was conducted for 1000 epochs, with early stopping implemented to prevent overfitting. The number of update steps $K$ in the inner loop is 5. All hyperparameters were optimized via grid search. A summary of the hyperparameters used in all approaches is provided in Table 1.

5.5. Results

Table 2 summarizes the results of our proposed model and other baseline methods in few-shot time series forecasting. Compared to Meta-GRU and Meta-LSTM, our model achieves a significantly lower MAE of 0.1411, representing a reduction of approximately 16% and 19%, respectively. Similarly, the MSE of 0.0328 achieved by our model is nearly 44% lower than that of Meta-LSTM and 48% lower than that of Meta-GRU. The SMAPE metric also shows a substantial improvement, with our model achieving 44.1472%, which is about 25% lower than Meta-LSTM and 28% lower than Meta-GRU. To provide a more intuitive understanding of the results, Figure 7 presents a histogram based on the data from Table 2; this visualization clearly demonstrates that our proposed model outperforms the baseline models across all metrics. Our model leverages self-attention mechanisms to weigh the importance of each time step in a sequence, which is particularly advantageous in few-shot learning scenarios: the model can focus on the most relevant parts of the input data, leading to better generalization and more accurate predictions. When compared to the multitask model, our model shows an 8% lower MAE and a 40% lower MSE. While the multitask model is trained to perform well across a broad set of tasks, it does not adapt to new tasks as effectively as a meta-learning model. MetaTrans-FSTSF, in contrast, is designed to quickly fine-tune its parameters for each specific task using the support set, leading to more tailored and accurate predictions.

Few-Shot Learning Evaluation: To evaluate the model’s ability to adapt to new tasks with limited data, we varied the number of samples in the support set $N_s$ across different few-shot settings: 5, 10, 15, and 20. The results are shown in Figure 8.

At Ns=5, the model achieves competitive accuracy, confirming the superior design of our MetaTrans-FSTSF framework for few-shot learning. The consistent decrease in MAE with increasing support samples highlights the model’s ability to generalize effectively even with minimal data. This capability is rooted in our innovative meta-predictor design, which incorporates meta-initialization and task-specific fine-tuning mechanisms, setting a new benchmark in data-scarce time series forecasting. As more support data are provided, the adaptation ability improves substantially, leading to lower error rates. Specifically, the MAE decreases from 0.1411 to 0.1285 as the support set size increases from Ns=5 to Ns=20. This trend indicates that the MetaTrans-FSTSF becomes more accurate in its predictions with larger support sets because more samples allow the model to better capture the underlying temporal patterns within the time series. These results demonstrate that the Meta-Transformer can quickly adapt to new tasks with varying levels of data availability. The consistent decrease in all error metrics as Ns increases suggests that the model is not only learning effectively from the support set but is also efficiently transferring this learning to new query samples.

Task Adaptation Efficiency: To evaluate the adaptation efficiency of our proposed framework with varying numbers of gradient updates in the Meta-Predictor, we conducted experiments with 1-step, 5-step, and 10-step updates. The results are shown in Figure 9.

With only one gradient update, the model achieves an MSE of 0.1189, which is higher compared to scenarios with more updates. This indicates that a single update is often insufficient for the model to effectively adapt to new tasks. As K increases to 5, the MSE decreases to 0.0328, which shows that more updates facilitate better task-specific adaptation. Further increasing to 10 gradient updates results in an MAE of 0.1089, reflecting a continued improvement in performance. The results show that increasing the number of gradient updates K leads to better model accuracy. This suggests that more gradient updates provide the model with more opportunities to refine its parameters, resulting in improved prediction quality. While more updates lead to better performance, they also increase computational time and resources. Therefore, it is essential to balance the number of updates with practical constraints such as training time and computational cost [66].

Hyperparameter Sensitivity Analysis: We conducted experiments with different hyperparameters to evaluate how the balance between the meta-predictor learning rate ($\alpha$) and the meta-learner learning rate ($\beta$) affects generalization in few-shot learning scenarios.

As shown in Table 3, the model performs best when the meta-learner learning rate $\beta$ is greater than the meta-predictor learning rate $\alpha$, with the MAE reduced to 0.1411, compared to 0.8414 when $\alpha > \beta$. This suggests that keeping the task-level (inner) learning rate smaller than the meta-level rate enables more effective adaptation to specific tasks, leading to improved accuracy even with limited data. Additionally, the results shown in Figure 10 demonstrate that the model reaches lower loss values more quickly under the $\alpha < \beta$ setting, further highlighting the benefits of this learning rate configuration. These findings underscore the importance of carefully tuning learning rates to balance rapid task adaptation with stable cross-task performance, which is crucial for optimizing few-shot learning in time series forecasting tasks.

Visualization: We visualized the prediction results for five query time series samples across various prediction horizons for flood prediction. The results, shown in Table 4, reveal that the accuracy of the model decreases as the prediction horizon increases. For shorter horizons, such as 1 h, the predictions closely match the actual values, as shown in Figure 11. However, at longer horizons, such as 6 h, the discrepancies between predicted and actual values become more pronounced, reflecting increased uncertainty. The performance deteriorates with extended forecasting horizons, indicating that while the model performs well for short-term predictions, its accuracy diminishes for longer-term forecasts.

When implementing MetaTrans-FSTSF for time series forecasting, it is essential to experiment with different values of K to determine the optimal trade-off between rapid adaptation and stability. Additionally, the choice of meta-learner and meta-predictor learning rates (α and β) should be carefully tuned to balance convergence speed and generalization across tasks. By focusing on the initialization of parameters and their rapid adaptation through gradient descent, MAML provides a flexible and effective framework for time series forecasting, especially in few-shot scenarios where data are limited and task-specific fine-tuning is critical.
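As a concrete illustration of that tuning advice, the sketch below enumerates a small grid over $K$ and the learning-rate pair and keeps the configuration with the lowest meta-validation MAE. `run_meta_training` is a hypothetical helper standing in for a full meta-training run followed by evaluation on the meta-validation tasks, and the candidate values merely mirror those explored in Section 5.

```python
from itertools import product

# Hypothetical candidate sets; the paper reports grid search but not the
# exact grids it swept.
grid_K = [1, 5, 10]
grid_alpha = [1e-4, 4e-4, 1e-3]   # meta-predictor (inner) learning rate
grid_beta = [4e-4, 1e-3]          # meta-learner (outer) learning rate

best_mae, best_cfg = float("inf"), None
for K, alpha, beta in product(grid_K, grid_alpha, grid_beta):
    val_mae = run_meta_training(K=K, alpha=alpha, beta=beta)  # assumed helper
    if val_mae < best_mae:
        best_mae, best_cfg = val_mae, {"K": K, "alpha": alpha, "beta": beta}
```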

Despite the promising results, several limitations of the proposed model must be acknowledged. One notable challenge is the issue of data heterogeneity, as the model may struggle to generalize effectively when applied to regions with vastly different climatic or hydrological conditions. This limitation underscores the need for further refinement in transferring learned knowledge across diverse environments. Moreover, the reliance on Transformer-based architectures, although effective, introduces complexity that can hinder interpretability. The lack of transparency in the decision-making process may pose challenges in critical applications such as flood forecasting, where stakeholders require clear explanations for predictions.

To overcome these challenges, future work could focus on improving model interpretability by integrating attention visualization techniques or explainable AI frameworks, which would enhance trust and usability among decision-makers. Additionally, developing adaptive mechanisms that leverage domain adaptation or region-specific fine-tuning could better tailor the model to the unique characteristics of different geographical areas. Beyond flood prediction, the framework could be extended to other spatiotemporal prediction tasks, including traffic forecasting, ecological monitoring, and human activity analysis, thereby broadening its applicability and impact.

6. Conclusions

In this study, we proposed MetaTrans-FSTSF, a transformative meta-learning framework that reimagines few-shot time series forecasting. By seamlessly integrating MAML and an enhanced Transformer-based architecture, we address critical challenges such as data scarcity and complex temporal dependencies in flood prediction. Our framework not only achieves state-of-the-art performance but also sets a foundation for future exploration in meta-learning-based time series analysis. By reformulating flood forecasting as a few-shot learning problem, we effectively address the critical challenge of limited historical data, which is essential for accurate and reliable predictions. Our framework integrates a Transformer-based meta-predictor that harnesses the capabilities of the model to capture complex temporal patterns, thereby enhancing predictive performance. The experimental results on various real-world datasets reveal that MetaTrans-FSTSF outperforms several state-of-the-art methods with notable improvements in forecasting accuracy and generalization capability. Specifically, our method demonstrates average reductions in MAE of 16%, 19%, and 8% compared to baseline methods and shows robustness in few-shot flood forecasting. These findings highlight the potential of meta-learning techniques to advance flood forecasting in data-constrained environments. For future work, we plan to explore the interpretability of meta-learning across different regional flood prediction tasks, aiming to better understand how meta-knowledge is transferred between diverse geographic areas and hydrological conditions. In addition, the methodology from this paper will also be introduced into different application scenarios with spatial and temporal sequence predictions, such as transportation traffic forecasting [67], communication flow prediction [68], human behavior deduction [69], etc.

Author Contributions

Methodology, C.C. and S.D.; Software, J.J. and W.L.; Validation, A.L. and Q.P.; Formal analysis, Q.P.; Investigation, Q.P.; Resources, C.C., H.L. and W.L.; Data curation, A.L. and H.L.; Writing—original draft, J.J.; Writing—review & editing, C.C. and S.D.; Project administration, C.C. and S.D. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Huimin Li and Wan Li were employed by the company The Goldenwater Information Technology Development Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables

Figure 1. Trade-offs in time series forecasting models and hydrological station distribution. The area of each model represents the size of the dataset required for training it. A larger area indicates that the model requires a larger dataset.

Figure 2. Example of meta-learning in few-shot time series forecasting.

Figure 3. The overall architecture of the proposed MetaTrans-FSTSF. The meta-learner can find an optimal meta-predictor for each task.

Figure 4. The MAML-based meta-learner framework. The step sizes for the inner update and the meta update are the predictor learning rate $\alpha$ and the meta-learning rate $\beta$, respectively.

Figure 5. The structure of the meta-predictor.

Figure 6. The distribution of the basin with hydrological and meteorological stations.

Figure 7. Performance comparison of different baseline methods.

Figure 8. Performance metrics with varying $N_s$ in few-shot learning.

Figure 9. Performance metrics with varying numbers of gradient updates K in the Meta-Predictor.

Figure 10. Training loss for different learning rates.

Figure 11. The visualization of five samples for different lengths of prediction.

Table 1. Summary of hyperparameters.

Hyperparameter Value
$N$ 30
$|\mathcal{D}_{\text{meta-tr}}|$ 21
$|\mathcal{D}_{\text{meta-val}}|$ 7
$|\mathcal{D}_{\text{meta-te}}|$ 2
$N_s$ 5
$N_q$ 5
$l_{his}$ 72
$l_{pre}$ 1/3/6/12
$\alpha$ 0.0001
$\beta$ 0.0004
$K$ 5
Epochs 1000
Meta-optimizer Adam
Predictor optimizer SGD

Table 2. The results of different baseline methods. Bold values represent the best performance.

Model MAE MSE SMAPE (%)
Multitask 0.1534 ± 0.0023 0.0437 ± 0.0012 52.1464 ± 0.1243
HetNet 0.1591 ± 0.0019 0.0489 ± 0.0016 56.5227 ± 0.1098
TimeHetNet 0.1628 ± 0.0027 0.0516 ± 0.0021 53.9869 ± 0.1345
Meta-LSTM 0.1685 ± 0.0031 0.0552 ± 0.0029 59.1241 ± 0.1567
Meta-GRU 0.1743 ± 0.0035 0.0594 ± 0.0033 61.2452 ± 0.1762
Ours 0.1411 ± 0.0018 0.0328 ± 0.0010 44.1472 ± 0.0987

Table 3. Performance metrics for different $\alpha$ and $\beta$.

Learning Rate Configuration MAE MSE SMAPE (%)
$\alpha = 0.0001 < \beta = 0.0004$ 0.1411 0.0328 44.1472
$\alpha = \beta = 0.0004$ 0.1604 0.0401 58.8443
$\alpha = 0.001 > \beta = 0.0004$ 0.8414 1.2124 159.1874

Table 4. Performance metrics with different lengths of prediction.

Length of Prediction ($l_{pre}$) MAE MSE SMAPE (%)
T + 1 0.1411 0.0328 44.1472
T + 3 0.2212 0.1421 50.0933
T + 6 0.2603 0.2129 52.9727
T + 12 0.3041 0.2244 64.1454

References

1. Tabari, H. Climate change impact on flood and extreme precipitation increases with water availability. Sci. Rep.; 2020; 10, 13768. [DOI: https://dx.doi.org/10.1038/s41598-020-70816-2] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32792563]

2. Tang, Z.; Wang, P.; Li, Y.; Sheng, Y.; Wang, B.; Popovych, N.; Hu, T. Contributions of climate change and urbanization to urban flood hazard changes in China’s 293 major cities since 1980. J. Environ. Manag.; 2024; 353, 120113. [DOI: https://dx.doi.org/10.1016/j.jenvman.2024.120113] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38286069]

3. Jiang, J.; Chen, C.; Zhou, Y.; Berretti, S.; Liu, L.; Pei, Q.; Zhou, J.; Wan, S. Heterogeneous dynamic graph convolutional networks for enhanced spatiotemporal flood forecasting by remote sensing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.; 2024; 17, pp. 3108-3122. [DOI: https://dx.doi.org/10.1109/JSTARS.2023.3349162]

4. Bhat, J.R.; Alqahtani, S.A. 6G ecosystem: Current status and future perspective. IEEE Access; 2021; 9, pp. 43134-43167. [DOI: https://dx.doi.org/10.1109/ACCESS.2021.3054833]

5. Chen, C.; Wang, W.; Liu, Z.; Wang, Z.; Li, C.; Lu, H.; Pei, Q.; Wan, S. RLFN-VRA: Reinforcement Learning-based Flexible Numerology V2V Resource Allocation for 5G NR V2X Networks. IEEE Trans. Intell. Veh.; 2024; pp. 1-11. [DOI: https://dx.doi.org/10.1109/TIV.2024.3427399]

6. Shahra, E.Q.; Wu, W. Water contaminants detection using sensor placement approach in smart water networks. J. Ambient. Intell. Humaniz. Comput.; 2023; 14, pp. 4971-4986. [DOI: https://dx.doi.org/10.1007/s12652-020-02262-x]

7. Ramos, H.M.; Kuriqi, A.; Besharat, M.; Creaco, E.; Tasca, E.; Coronado-Hernández, O.E.; Pienika, R.; Iglesias-Rey, P. Smart water grids and digital twin for the management of system efficiency in water distribution networks. Water; 2023; 15, 1129. [DOI: https://dx.doi.org/10.3390/w15061129]

8. Jan, F.; Min-Allah, N.; Düştegör, D. Iot based smart water quality monitoring: Recent techniques, trends and challenges for domestic applications. Water; 2021; 13, 1729. [DOI: https://dx.doi.org/10.3390/w13131729]

9. Chen, C.; Si, J.; Li, H.; Han, W.; Kumar, N.; Berretti, S.; Wan, S. A High Stability Clustering Scheme for the Internet of Vehicles. IEEE Trans. Netw. Serv. Manag.; 2024; 21, pp. 4297-4311. [DOI: https://dx.doi.org/10.1109/TNSM.2024.3390117]

10. Samikwa, E.; Voigt, T.; Eriksson, J. Flood prediction using IoT and artificial neural networks with edge computing. Proceedings of the 2020 International Conferences on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics); Rhodes Island, Greece, 2–6 November 2020; pp. 234-240.

11. Shi, J.; Jain, M.; Narasimhan, G. Time series forecasting (tsf) using various deep learning models. arXiv; 2022; arXiv: 2204.11115

12. Kao, I.F.; Zhou, Y.; Chang, L.C.; Chang, F.J. Exploring a Long Short-Term Memory based Encoder-Decoder framework for multi-step-ahead flood forecasting. J. Hydrol.; 2020; 583, 124631. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2020.124631]

13. Alfieri, L.; Burek, P.; Dutra, E.; Krzeminski, B.; Muraro, D.; Thielen, J.; Pappenberger, F. GloFAS–global ensemble streamflow forecasting and flood early warning. Hydrol. Earth Syst. Sci.; 2013; 17, pp. 1161-1175. [DOI: https://dx.doi.org/10.5194/hess-17-1161-2013]

14. Nevo, S.; Morin, E.; Gerzi Rosenthal, A.; Metzger, A.; Barshai, C.; Weitzner, D.; Voloshin, D.; Kratzert, F.; Elidan, G.; Dror, G. et al. Flood forecasting with machine learning models in an operational framework. Hydrol. Earth Syst. Sci.; 2022; 26, pp. 4013-4032. [DOI: https://dx.doi.org/10.5194/hess-26-4013-2022]

15. Liu, Y.; Wu, H.; Wang, J.; Long, M. Non-stationary transformers: Exploring the stationarity in time series forecasting. Adv. Neural Inf. Process. Syst.; 2022; 35, pp. 9881-9893.

16. Jensen, V.; Bianchi, F.M.; Anfinsen, S.N. Ensemble conformalized quantile regression for probabilistic time series forecasting. IEEE Trans. Neural Netw. Learn. Syst.; 2022; 35, pp. 9014-9025. [DOI: https://dx.doi.org/10.1109/TNNLS.2022.3217694]

17. Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a few examples: A survey on few-shot learning. ACM Comput. Surv. (CSUR); 2020; 53, pp. 1-34. [DOI: https://dx.doi.org/10.1145/3386252]

18. Zhao, A.; Balakrishnan, G.; Durand, F.; Guttag, J.V.; Dalca, A.V. Data augmentation using learned transformations for one-shot medical image segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; Long Beach, CA, USA, 15–20 June 2019; pp. 8543-8553.

19. Zhou, J.; Zheng, Y.; Tang, J.; Li, J.; Yang, Z. Flipda: Effective and robust data augmentation for few-shot learning. arXiv; 2021; arXiv: 2108.06332

20. Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random erasing data augmentation. Proceedings of the AAAI Conference on Artificial Intelligence; New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13001-13008.

21. Xie, Q.; Luong, M.T.; Hovy, E.; Le, Q.V. Self-training with noisy student improves ImageNet classification. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; Seattle, WA, USA, 13–19 June 2020; pp. 10687-10698.

22. Zhou, K.; Liu, Z.; Qiao, Y.; Xiang, T.; Loy, C.C. Domain generalization: A survey. IEEE Trans. Pattern Anal. Mach. Intell.; 2022; 45, pp. 4396-4415. [DOI: https://dx.doi.org/10.1109/TPAMI.2022.3195549]

23. Peng, Z.; Li, Z.; Zhang, J.; Li, Y.; Qi, G.J.; Tang, J. Few-shot image recognition with knowledge transfer. Proceedings of the IEEE/CVF International Conference on Computer Vision; Seoul, Republic of Korea, 27 October–2 November 2019; pp. 441-449.

24. Li, W.; Wang, Z.; Wang, Y.; Wu, J.; Wang, J.; Jia, Y.; Gui, G. Classification of high-spatial-resolution remote sensing scenes method using transfer learning and deep convolutional neural network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.; 2020; 13, pp. 1986-1995. [DOI: https://dx.doi.org/10.1109/JSTARS.2020.2988477]

25. Wu, Y.; Schuster, M.; Chen, Z.; Le, Q.V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K. et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv; 2016; arXiv: 1609.08144

26. Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. Proceedings of the International Conference on Machine Learning (PMLR); Sydney, Australia, 6–11 August 2017; pp. 1126-1135.

27. Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching networks for one shot learning. Proceedings of the 30th International Conference on Neural Information Processing Systems; Barcelona, Spain, 5–10 December 2016; Volume 29.

28. Cai, K.; He, J.; Li, Q.; Shangguan, W.; Li, L.; Hu, H. Meta-LSTM in hydrology: Advancing runoff predictions through model-agnostic meta-learning. J. Hydrol.; 2024; 639, 131521. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2024.131521]

29. Sayari, S.; Meymand, A.M.; Aldallal, A.; Zounemat-Kermani, M. Meta-learner methods in forecasting regulated and natural river flow. Arab. J. Geosci.; 2022; 15, 1051. [DOI: https://dx.doi.org/10.1007/s12517-022-10274-4]

30. Mao, J.; Yun, O.; Kim, H.; Chang, H.; Sun, X. MeWP: Meta-learning based Water-Level Prediction. Proceedings of the 2022 IEEE International Conference on Big Data (Big Data); Osaka, Japan, 17–20 December 2022; pp. 1886-1891.

31. Chen, Z.; Wu, M.; Zhao, R.; Guretno, F.; Yan, R.; Li, X. Machine remaining useful life prediction via an attention-based deep learning approach. IEEE Trans. Ind. Electron.; 2020; 68, pp. 2521-2531. [DOI: https://dx.doi.org/10.1109/TIE.2020.2972443]

32. Zheng, H.; Lin, F.; Feng, X.; Chen, Y. A hybrid deep learning model with attention-based conv-LSTM networks for short-term traffic flow prediction. IEEE Trans. Intell. Transp. Syst.; 2020; 22, pp. 6910-6920. [DOI: https://dx.doi.org/10.1109/TITS.2020.2997352]

33. Yi, K.; Zhang, Q.; Fan, W.; Wang, S.; Wang, P.; He, H.; An, N.; Lian, D.; Cao, L.; Niu, Z. Frequency-domain MLPs are more effective learners in time series forecasting. arXiv; 2023; arXiv: 2311.06184

34. Yi, K.; Zhang, Q.; Fan, W.; He, H.; Hu, L.; Wang, P.; An, N.; Cao, L.; Niu, Z. FourierGNN: Rethinking multivariate time series forecasting from a pure graph perspective. arXiv; 2023; arXiv: 2311.06190

35. Maharana, K.; Mondal, S.; Nemade, B. A review: Data pre-processing and data augmentation techniques. Glob. Transit. Proc.; 2022; 3, pp. 91-99. [DOI: https://dx.doi.org/10.1016/j.gltp.2022.04.020]

36. Ding, Y.; Yu, X.; Yang, Y. Modeling the probabilistic distribution of unlabeled data for one-shot medical image segmentation. Proceedings of the AAAI Conference on Artificial Intelligence; Virtual Event, 2–9 February 2021; Volume 35, pp. 1246-1254.

37. Li, J.; Wang, Z.; Hu, X. Learning intact features by erasing-inpainting for few-shot classification. Proceedings of the AAAI Conference on Artificial Intelligence; Virtual Event, 2–9 February 2021; Volume 35, pp. 8401-8409.

38. Hu, R.; Ruan, G.; Xiang, S.; Huang, M.; Liang, Q.; Li, J. Automated diagnosis of COVID-19 using deep learning and data augmentation on chest CT. medRxiv; 2020.

39. Karras, T.; Aittala, M.; Hellsten, J.; Laine, S.; Lehtinen, J.; Aila, T. Training generative adversarial networks with limited data. Adv. Neural Inf. Process. Syst.; 2020; 33, pp. 12104-12114.

40. Li, W.; Chen, J.; Cao, J.; Ma, C.; Wang, J.; Cui, X.; Chen, P. EID-GAN: Generative adversarial nets for extremely imbalanced data augmentation. IEEE Trans. Ind. Inform.; 2022; 19, pp. 3208-3218. [DOI: https://dx.doi.org/10.1109/TII.2022.3182781]

41. Zhou, X.; Hu, Y.; Wu, J.; Liang, W.; Ma, J.; Jin, Q. Distribution bias aware collaborative generative adversarial network for imbalanced deep learning in industrial IoT. IEEE Trans. Ind. Inform.; 2022; 19, pp. 570-580. [DOI: https://dx.doi.org/10.1109/TII.2022.3170149]

42. Choi, J.; Kim, T.; Kim, C. Self-ensembling with gan-based data augmentation for domain adaptation in semantic segmentation. Proceedings of the IEEE/CVF International Conference on Computer Vision; Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6830-6840.

43. Shen, Z.; Liu, Z.; Qin, J.; Savvides, M.; Cheng, K.T. Partial is better than all: Revisiting fine-tuning strategy for few-shot learning. Proceedings of the AAAI Conference on Artificial Intelligence; Virtual Event, 2–9 February 2021; Volume 35, pp. 9594-9602.

44. Zhang, T.; Wu, F.; Katiyar, A.; Weinberger, K.Q.; Artzi, Y. Revisiting few-sample BERT fine-tuning. arXiv; 2020; arXiv: 2006.05987

45. Gu, Y.; Han, X.; Liu, Z.; Huang, M. PPT: Pre-trained prompt tuning for few-shot learning. arXiv; 2021; arXiv: 2109.04332

46. Wang, Y.; Yan, J.; Ye, X.; Jing, Q.; Wang, J.; Geng, Y. Few-shot transfer learning with attention mechanism for high-voltage circuit breaker fault diagnosis. IEEE Trans. Ind. Appl.; 2022; 58, pp. 3353-3360. [DOI: https://dx.doi.org/10.1109/TIA.2022.3159617]

47. Ganesha, H.; Gupta, R.; Gupta, S.H.; Rajan, S. Few-shot transfer learning for wearable IMU-based human activity recognition. Neural Comput. Appl.; 2024; 36, pp. 10811-10823. [DOI: https://dx.doi.org/10.1007/s00521-024-09645-7]

48. He, Q.Q.; Pang, P.C.I.; Si, Y.W. Transfer Learning for Financial Time Series Forecasting. Proceedings of the Pacific Rim International Conference on Artificial Intelligence; Yanuca Island, Cuvu, Fiji, 26–30 August 2019.

49. Lackinger, A.; Morichetta, A.; Dustdar, S. Time Series Predictions for Cloud Workloads: A Comprehensive Evaluation. Proceedings of the 2024 IEEE International Conference on Service-Oriented System Engineering (SOSE); Shanghai, China, 15–18 July 2024.

50. Xu, Y.; Lin, K.; Hu, C.; Wang, S.; Wu, Q.; Zhang, L.; Ran, G. Deep transfer learning based on transformer for flood forecasting in data-sparse basins. J. Hydrol.; 2023; 625, 129956. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2023.129956]

51. Kimura, N.; Yoshinaga, I.; Sekijima, K.; Azechi, I.; Baba, D. Convolutional Neural Network Coupled with a Transfer-Learning Approach for Time-Series Flood Predictions. Water; 2020; 12, 96. [DOI: https://dx.doi.org/10.3390/w12010096]

52. Tian, P.; Wu, Z.; Qi, L.; Wang, L.; Shi, Y.; Gao, Y. Differentiable meta-learning model for few-shot semantic segmentation. Proceedings of the AAAI Conference on Artificial Intelligence; New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12087-12094.

53. Oreshkin, B.N.; Carpov, D.; Chapados, N.; Bengio, Y. Meta-learning framework with applications to zero-shot time-series forecasting. Proceedings of the AAAI Conference on Artificial Intelligence; Virtual Event, 2–9 February 2021; Volume 35, pp. 9242-9250.

54. Narwariya, J.; Malhotra, P.; Vig, L.; Shroff, G.; Vishnu, T.V. Meta-Learning for Few-Shot Time Series Classification. Proceedings of the 7th ACM IKDD CoDS and 25th COMAD; Hyderabad, India, 5–7 January 2020; pp. 28-36. [DOI: https://dx.doi.org/10.1145/3371158.3371162]

55. Chen, J.; Zhan, L.M.; Wu, X.M.; Chung, F.L. Variational metric scaling for metric-based meta-learning. Proceedings of the AAAI Conference on Artificial Intelligence; New York, NY, USA, 7–12 February 2020; Volume 34, pp. 3478-3485.

56. Yoon, J.; Kim, T.; Dia, O.; Kim, S.; Bengio, Y.; Ahn, S. Bayesian model-agnostic meta-learning. Adv. Neural Inf. Process. Syst.; 2018; 31, pp. 7343-7353.

57. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst.; 2017; 30, pp. 6000-6010.

58. Dau, H.A.; Bagnall, A.; Kamgar, K.; Yeh, C.C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.A.; Keogh, E. The UCR time series archive. IEEE/CAA J. Autom. Sin.; 2019; 6, pp. 1293-1305. [DOI: https://dx.doi.org/10.1109/JAS.2019.1911747]

59. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. Proceedings of the AAAI Conference on Artificial Intelligence; Virtual Event, 2–9 February 2021; Volume 35, pp. 11106-11115.

60. Godahewa, R.; Bergmeir, C.; Webb, G.I.; Hyndman, R.J.; Montero-Manso, P. Monash time series forecasting archive. arXiv; 2021; arXiv: 2105.06643

61. Abdollahzadeh, M.; Malekzadeh, T.; Cheung, N.M.M. Revisit multimodal meta-learning through the lens of multi-task learning. Adv. Neural Inf. Process. Syst.; 2021; 34, pp. 14632-14644.

62. Iwata, T.; Kumagai, A. Meta-learning from tasks with heterogeneous attribute spaces. Adv. Neural Inf. Process. Syst.; 2020; 33, pp. 6053-6063.

63. Brinkmeyer, L.; Drumond, R.R.; Burchert, J.; Schmidt-Thieme, L. Few-shot forecasting of time-series with heterogeneous channels. Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Grenoble, France, 19–23 September 2022; pp. 3-18.

64. Srivastava, A.; Wang, T.Y.; Zhang, P.; De Rose, C.A.F.; Kannan, R.; Prasanna, V.K. Memmap: Compact and generalizable meta-lstm models for memory access prediction. Proceedings of the Advances in Knowledge Discovery and Data Mining: 24th Pacific-Asia Conference, PAKDD 2020; Singapore, 11–14 May 2020; pp. 57-68.

65. Zhang, H.; Qian, S.; Fang, Q.; Xu, C. Multi-modal meta multi-task learning for social media rumor detection. IEEE Trans. Multimed.; 2021; 24, pp. 1449-1459. [DOI: https://dx.doi.org/10.1109/TMM.2021.3065498]

66. Liu, Z.; Chen, C.; Huang, Z.; Chang, Y.C.; Liu, L.; Pei, Q. A Low-Cost and Lightweight Real-Time Object-Detection Method Based on UAV Remote Sensing in Transportation Systems. Remote Sens.; 2024; 16, 3712. [DOI: https://dx.doi.org/10.3390/rs16193712]

67. Chen, J.; Xu, M.; Xu, W.; Li, D.; Peng, W.; Xu, H. A flow feedback traffic prediction based on visual quantified features. IEEE Trans. Intell. Transp. Syst.; 2023; 24, pp. 10067-10075. [DOI: https://dx.doi.org/10.1109/TITS.2023.3269794]

68. Chen, C.; Jiang, J.; Fu, R.; Chen, L.; Li, C.; Wan, S. An intelligent caching strategy considering time-space characteristics in vehicular named data networks. IEEE Trans. Intell. Transp. Syst.; 2021; 23, pp. 19655-19667. [DOI: https://dx.doi.org/10.1109/TITS.2021.3128012]

69. Fang, J.; Wang, F.; Xue, J.; Chua, T.S. Behavioral intention prediction in driving scenes: A survey. IEEE Trans. Intell. Transp. Syst.; 2024; 25, pp. 8334-8355. [DOI: https://dx.doi.org/10.1109/TITS.2024.3374342]

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).