TCN-QV: an attention-based deep learning method

1 Introduction

Gold is a traditional safe-haven asset, and fluctuations in its price directly affect investors’ confidence and market stability. In an uncertain economic environment, gold is widely used as a hedging tool [1], and demand for it often rises during periods of economic turmoil. For example, gold prices soared after the 2016 Brexit referendum, reflecting market concerns about economic uncertainty as many investors turned to gold to mitigate risk. Similarly, when the global economy faces recession or inflationary pressure, demand for gold tends to increase, driving up its price. Gold also forms part of national currency and foreign exchange reserves used in international trade, and it plays a crucial role in the financial stability and creditworthiness of many nations. Fluctuations in gold prices directly affect the value of a nation’s foreign exchange reserves, which in turn influences monetary policy. Closely monitoring gold prices therefore enables China to better understand international market dynamics, adjust its economic strategies promptly, maintain financial security, and ensure stable economic development [2].

Given these effects on markets and the economy, researchers recognize that accurately predicting changes in gold prices is crucial for both investors and policymakers. Gold prices form a time series, so time series analysis can reveal historical trends in price changes and offer insight into the factors driving these fluctuations. Time series models are also well suited to the dynamic characteristics of economic data, which makes them indispensable for predicting gold prices.

In recent years, scholars have developed various models for predicting metal prices. Traditional time series models, such as Autoregressive Integrated Moving Average (ARIMA), Seasonal Autoregressive Integrated Moving Average (SARIMA), exponential smoothing, vector autoregression (VAR), and state space models, effectively capture basic trends and seasonal fluctuations.

However, these linear models often struggle to capture the complex nonlinear factors that influence metal prices. Consequently, many researchers have turned to deep learning methods based on neural networks. For instance, long short-term memory networks (LSTM) [3], gated recurrent units (GRU) [4], and Temporal Convolutional Networks (TCN) have proven effective in capturing dependencies and volatility patterns in price fluctuations. In particular, TCNs propagate information through dilated causal convolutions, which offers significant advantages in training speed and stability [5]. Despite these advances, recurrent neural networks (RNNs) and their variants still suffer from vanishing and exploding gradients, which limit their ability to learn long-term dependencies. TCNs, in turn, may struggle with long-range dependencies that extend beyond their finite receptive field.

Beyond these neural network models, the Transformer architecture has attracted increasing attention in price prediction. Compared with recurrent neural networks, Transformers process large-scale datasets more efficiently and accurately [6]. However, their computational complexity is relatively high: the cost of self-attention grows quadratically with sequence length, so computation and memory requirements rise rapidly for long sequences, which limits their applicability in resource-constrained environments [7].

Consequently, many researchers have optimized the self-attention mechanism of Transformers and integrated it into recurrent neural network models to improve predictive performance. However, these optimized models still face computational costs that grow rapidly (quadratically) with sequence length. It is therefore important to develop a model that extracts contextual relationships effectively while keeping predictions efficient and precise.

To address these challenges in time series prediction, we introduce a novel model that integrates TCNs, Multi-Layer Perceptrons (MLPs), and an optimized global attention mechanism. The model stacks dilated causal convolutions in multiple layers, enabling efficient parallel processing of long sequences. To overcome the limitations of TCNs in capturing long-range dependencies, the TCN output is transformed through dimensional adjustments into a query matrix (Q) and a value matrix (V), which strengthens the model’s ability to interpret long sequences. Multiplying Q and V yields a three-dimensional tensor that encodes contextual relationships, and residual connections link the output to the input to complete the attention mechanism. This approach improves the performance of TCNs in long-sequence prediction while being more computationally efficient than traditional self-attention.

The main contributions of this paper are as follows:

1. To improve learning on metal price time series data, an optimized configuration named TCN-QV is proposed, which combines the TCN with a simplified Transformer structure. This model captures long-term dependencies while maintaining computational efficiency, improving the accuracy of metal price predictions.

2. To address the high computational complexity of traditional models and their limited ability to extract contextual relationships, a TCN module with residual connections is adopted. This module uses dilated causal convolution to strengthen the network’s capacity to learn spatiotemporal representations.

3. To address the quadratic growth in computational cost of traditional self-attention, the QV Attention module is introduced. This module uses a simplified, key-free self-attention mechanism that improves the model’s ability to focus on different features while reducing computation time.

4. Overall, combining TCN with QV Attention lets the TCN strengthen the model’s capacity to capture contextual relationships while the attention module improves feature recognition, all at high computational efficiency, enabling more accurate temporal predictions.

The organizational structure of this study is as follows: The second section reviews current research related to time series prediction. The third section details our TCN-QV method. The fourth section introduces the experimental environment and dataset, while the fifth section presents the experimental results and analyses. Finally, the sixth section summarizes the study and includes conclusions and discussions.

2 Related work

2.1 Time series forecasting

Time series forecasting is a statistical approach that uses historical data to predict future events or trends. Time series data consist of observations arranged in chronological order and typically exhibit regularities such as trends and seasonality. Over the years, researchers have proposed a variety of statistical and machine learning methods to improve predictive performance on time series data [8], and many studies build on classical models such as ARIMA and SARIMA [9]. For instance, Zhidan Luo et al. enhanced the ARIMA model with a Taylor expansion to capture relevant information [10]. Jinhai Yao et al. developed a combined prediction model integrating ARIMA with information-granulation Support Vector Regression (SVR) to forecast stock market index prices and returns [11]. Zhichao He et al. decomposed the original price sequence into Intrinsic Mode Functions (IMF) using Variational Mode Decomposition (VMD), optimized the VMD parameters using Improved Firefly Algorithm-based Support Vector Regression (IFASSA), classified the VMD and Wavelet Packet Decomposition (WPD) components by zero-crossing rate, and finally used ARIMA for prediction [12].

Numerous scholars have also improved prediction accuracy by combining different types of models with neural networks or by integrating several neural network architectures. For example, Rahim Barzegar et al. combined the boundary-corrected (BC) Maximum Overlap Discrete Wavelet Transform (MODWT) with a hybrid CNN-LSTM model for water level prediction [13]. Naman Bhoj et al. highlighted the accuracy advantages of combined models by comparing traditional neural networks with CNN-GRU models [14]. Xiao Chen et al. used the LASSO algorithm to obtain optimal feature combinations and then enriched the information extracted from individual features with a Cascaded Long Short-Term Memory (Ca-LSTM) network [15]. By learning patterns from historical data, such models can generate more accurate price forecasts, and an increasing number of financial institutions and traders are consequently adopting neural network-based prediction models.

Despite significant progress in optimizing statistical and machine learning models, traditional deep learning models still struggle to capture contextual relationships. Scholars commonly strengthen neural networks by pairing them with optimization algorithms, but little research has sought to improve predictive performance by modifying the network architectures themselves. When prediction performance is limited, addressing these challenges requires targeted adjustments to the neural network architecture.

2.2 Transformer and attention mechanism

With the emergence of the Transformer architecture, its self-attention mechanism has demonstrated strong performance and flexibility in image recognition and temporal prediction tasks [16]. Through self-attention and parallel processing, the Transformer can better capture complex patterns in temporal data. As research has intensified, Transformers and their variants have shown significant value for time series prediction across many industries and fields [17]. For example, Mehmet Burukanli et al. obtained promising results in predicting COVID-19 mutations using Transformer models combined with the Adam optimization algorithm [18]. Many scholars combine Transformers with RNN architectures. For instance, the FDG Transformer, a hybrid neural network by Chengyu Li and Guoqi Qian, integrates GRU, LSTM, and Multi-Head Attention (MHA) components [19]. This model exploits the advantages of both RNNs and Transformers and achieves commendable results in stock prediction: by employing a selective sensing mechanism during context extraction, it can quickly focus attention on important regions of the data, significantly increasing its sensitivity to relevant information. Wenjie Lu and Jiazheng Li used a CNN to extract features from the data, employed an attention mechanism to assess the impact of stock data at different times on stock prices, and used a GRU for stock price prediction, which proved more suitable than traditional neural networks [20]. Jindian Liu, Bo Zhang, and colleagues decomposed soybean futures price data into multiple IMFs and residual sequences using Ensemble Empirical Mode Decomposition (EEMD) [21]. They developed the NAGU model, which embeds an attention mechanism within the GRU structure and achieves strong results in predicting soybean futures prices.

Compared with traditional machine learning algorithms, combining self-attention mechanisms with neural networks offers significant advantages. Although many models combine the two, scholars tend to focus on GRU or LSTM architectures. For example, Hongfeng Xu, Lei Chai, and others developed a stock price trend prediction network based on a reinforcement learning (RL) bidirectional GRU architecture, with an attention mechanism to improve prediction accuracy [22]. The introduction of Transformers and self-attention has fundamentally transformed deep learning; their efficiency, flexibility, and scalability underpin many cutting-edge research initiatives and applications. However, while scholars frequently combine neural network architectures (particularly RNNs) with Transformers to improve sensitivity to time series data, there has been relatively little exploration of flexibly combining the Transformer’s self-attention mechanism with convolutional neural networks, as opposed to simply running an entire Transformer model in parallel with them [23]. This gap motivates our investigation of combining TCNs with attention mechanisms [24].

In summary, individual neural networks still have limitations and struggle to perform temporal forecasting tasks efficiently [25]. Although the parallel processing capabilities of Transformers can be enhanced, redundant calculations continue to hinder their efficiency in extracting information [26]. Modifying neural networks and integrating them with attention mechanisms for effective parallelism therefore appears to be a more promising solution [27]. While researchers frequently employ RNN architectures for prediction tasks, they may overlook the advantages of TCNs in time series forecasting [28]. This article therefore proposes the TCN-QV model, which aims to extract contextual information more efficiently, make accurate predictions, and remain easy to train.

This study proposes a model that combines an MLP, a TCN, and an attention mechanism to improve the processing of time series data and multidimensional features. The model first applies a dilated causal convolution (DC-Conv) layer to capture complex internal dependencies. A global attention mechanism then computes the query (Q) and value (V), allowing the model to draw on information from the entire sequence during decision-making and thereby incorporate richer contextual information. To limit the computational and memory overhead of long time series, a local sliding window method is adopted, and dilated convolutions extract local features within each window, reducing computational complexity while preserving the flow of global information.

Overall, this model increases its adaptability and representational capacity for temporal data through a flexible architectural design, significantly improving its learning performance in complex feature spaces. The TCN-QV time series prediction model is illustrated in Fig 1.

[Fig 1 omitted: TCN-QV time series prediction model.]

3 Methodology

3.1 Sliding window block

Traditional full-sequence processing requires loading the entire sequence at once, which is often impractical: every computation runs over the whole sequence, so computational and memory costs grow with sequence length (quadratically, in the case of full self-attention). To address the computational and memory challenges posed by long time series inputs, a sliding window mechanism was implemented for data processing. A fixed-size window focuses the model on relationships between nearby time steps and avoids the overhead of processing the entire sequence at once, extracting local features while preserving the flow of global information.

The model’s sliding window partitioning is illustrated in Fig 2. This local sliding window strategy lets the model extract features efficiently over a small number of time steps, substantially reducing computational complexity while retaining key global contextual information, so that it can learn useful patterns in the data more effectively.

[Fig 2 omitted: sliding window partitioning of the input sequence.]

3.2 Query attention block

This block integrates the QV attention mechanism with the TCN module to achieve efficient feature extraction and modeling. The input data first pass through a Dilated Causal Convolution (DC-Conv) layer, which captures local features within the time series. The output is then projected through a linear layer to generate queries (Q) and values (V), which are processed by a self-attention mechanism in which the queries are combined with the values through element-wise multiplication, helping the model focus on important features.

To address issues related to gradient vanishing and network degradation, the expressive capability and stability of the model are enhanced through the incorporation of residual connections throughout the entire architecture. This enables effective collaboration between the attention mechanisms and TCN, allowing the model to capture complex patterns and long-range dependencies in the data.

3.2.1 TCN attention.

Compared to traditional RNNs, one-dimensional convolutional networks offer more stable gradient propagation, reducing problems related to gradient vanishing or exploding and providing strong parallel computing capabilities [29]. However, the convolutional layer, specifically Causal Convolution (C-Conv), is still limited by the local receptive field during the convolution process, making it insufficient for modeling longer sequences and recognizing long-term dependencies [30].

To extract more comprehensive temporal information, using a larger receptive field or increasing network depth can improve feature extraction. However, these modifications significantly increase the training time of the network and are often accompanied by overfitting. This requires the exploration of improved model architectures to address these challenges.

To alleviate these issues, C-Conv can be extended to DC-Conv, which spaces the kernel taps d steps apart (the dilation rate), widening the span of the kernel, and hence the receptive field, without adding parameters. The implementation is illustrated in Fig 3. Because the receptive field grows without increasing network depth, the model can process long sequences more effectively.

[Fig 3 omitted: implementation of the dilated causal convolution (DC-Conv).]

The TCN Attention Block presented in this article is designed to process sequential data and capture long-term dependencies in time series. Its receptive field depends on the dilation rate, the kernel size, and the network depth. To obtain more stable feature extraction, dilation is introduced and the network is deepened.
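The dependence of the receptive field on kernel size, dilation, and depth can be made concrete with a small calculation. The sketch below is a minimal illustration, not the authors’ code: it assumes one or more dilated causal convolutions per block, and the kernel size 3 and dilations 1, 2, 4 are illustrative values rather than settings reported in this paper.

```python
def receptive_field(kernel_size, dilations, convs_per_block=1):
    """Receptive field (in time steps) of stacked dilated causal convolutions.

    Each causal convolution with kernel size k and dilation d extends the view
    into the past by (k - 1) * d steps; the current step itself adds 1.
    `dilations` lists one dilation rate per block (illustrative values).
    """
    if kernel_size < 1:
        raise ValueError("kernel_size must be >= 1")
    span = sum((kernel_size - 1) * d * convs_per_block for d in dilations)
    return 1 + span

# Three blocks with dilation doubling per block (an assumed, common TCN schedule).
print(receptive_field(3, [1, 2, 4]))                     # 15
print(receptive_field(3, [1, 2, 4], convs_per_block=2))  # 29
print(receptive_field(3, [1, 1, 1]))                     # 7 (no dilation)
```

Doubling the dilation at each block makes the receptive field grow exponentially with depth, whereas undilated layers only grow it linearly.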

The module contains three sub-blocks, each consisting of convolutional layers, clipping (cropping) layers, which in standard TCN implementations trim the extra padding so that the convolution remains causal, ReLU activation functions, and dropout layers; together these form a single TCN block. Dilation and the causal constraint expand the span of each convolution kernel over past time steps, increasing the model’s receptive field and its capability to process longer sequences. The dilated causal convolution is given in Eq (1), where $y[t]$ is the output of the convolution at time step $t$, $w[m]$ is the $m$-th weight of a kernel of size $k$, $x[t - d \cdot m]$ is an element of the input sequence, and $d$ is the dilation rate:

$$y[t] = \sum_{m=0}^{k-1} w[m]\, x[t - d \cdot m] \tag{1}$$
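Eq (1) can be checked with a direct, loop-based implementation. This is a minimal single-channel sketch assuming zero padding on the left; it shows which input positions each output sees and is not the authors’ implementation.

```python
def dilated_causal_conv1d(x, w, d):
    """Compute y[t] = sum_{m=0}^{k-1} w[m] * x[t - d*m], as in Eq (1).

    Positions t - d*m < 0 are treated as zeros (left/causal padding), so the
    output has the same length as the input and y[t] never uses x[t+1:].
    """
    k = len(w)
    y = []
    for t in range(len(x)):
        acc = 0.0
        for m in range(k):
            idx = t - d * m
            if idx >= 0:              # implicit zero padding on the left
                acc += w[m] * x[idx]
        y.append(acc)
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(dilated_causal_conv1d(x, [1.0, 1.0], d=1))  # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
print(dilated_causal_conv1d(x, [1.0, 1.0], d=2))  # [1.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```

With d = 2 the same two-tap kernel reaches two steps back instead of one. In deep learning frameworks the same effect is usually obtained with a 1D convolution whose dilation is set to d and whose input is padded by (k − 1)·d steps on the left only, or padded on both sides and then cropped, which is the role of the clipping layers described above.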

To avoid vanishing gradients and network degradation, these layers are stacked and linked by residual connections, enabling the TCN to capture features at different time scales and fuse them. The ReLU activation function used in the module is given in Eq (2). The calculations within each layer are given in Eq (3), and Eq (4) summarizes the stacking of convolutional layers and the residual connections in the TCN.

$$\mathrm{ReLU}(x) = \max(0, x) \tag{2}$$

(3)

(4)
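Since Eqs (3) and (4) are not reproduced here, the following sketch shows one common form of a residual TCN block, modeled on widely used TCN implementations rather than on the paper’s exact equations: two dilated causal convolutions with ReLU, followed by an identity shortcut. It is single-channel, omits dropout and weight normalization, and all weights are hypothetical.

```python
def relu(v):
    """Eq (2): ReLU(x) = max(0, x), applied element-wise to a list."""
    return [max(0.0, a) for a in v]

def causal_conv(x, w, d):
    """Dilated causal convolution of Eq (1) with implicit left zero padding."""
    return [sum(w[m] * x[t - d * m] for m in range(len(w)) if t - d * m >= 0)
            for t in range(len(x))]

def residual_block(x, w1, w2, d):
    """Two dilated causal convs with ReLU, then ReLU(x + F(x)).

    Dropout and weight normalization are omitted (inference-time sketch).
    """
    h = relu(causal_conv(x, w1, d))
    h = relu(causal_conv(h, w2, d))
    return relu([a + b for a, b in zip(x, h)])   # identity (residual) shortcut

def tcn(x, blocks):
    """Apply a stack of residual blocks; `blocks` is a list of (w1, w2, dilation)."""
    for w1, w2, d in blocks:
        x = residual_block(x, w1, w2, d)
    return x

# Hypothetical weights: three blocks with dilations 1, 2, 4.
blocks = [([0.5, 0.5], [1.0], 1), ([0.5, 0.5], [1.0], 2), ([0.5, 0.5], [1.0], 4)]
print(tcn([0.1, 0.4, 0.2, 0.8, 0.5, 0.9, 0.3, 0.7], blocks))
```

With all convolution weights at zero, F(x) = 0 and the block passes a non-negative input through unchanged; this identity path is what keeps gradients flowing through deep stacks.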

3.2.2 QV(Query Value) attention.

Compared to traditional sequence processing models such as RNNs and LSTMs [31], self-attention mechanisms are more efficient in capturing long-range dependencies within sequences. This is because the self-attention mechanism calculates the correlation between each element in the sequence and assigns a weight to each element, enabling the model to consider all related elements when processing the current element. Additionally, self-attention mechanisms exhibit strong parallelism, significantly enhancing training speed.

However, self-attention has drawbacks. Its computational complexity grows quadratically with sequence length, because every element attends to every other element, which leads to substantial resource consumption for very long sequences [32]. Furthermore, self-attention may be less effective than RNNs or LSTMs on short sequences because it is inherently insensitive to the order of elements, so in certain tasks it may not fully exploit the temporal information in the sequence.

Although the traditional self-attention mechanism in Transformers effectively captures long-range dependencies through the QKV structure, it may introduce redundancy for long sequences and show lower sensitivity for short sequences. Therefore, this article replaces the traditional self-attention mechanism with an improved QV (Query Value) Attention mechanism.
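For reference, the conventional query-key-value attention that QV Attention replaces can be written in a few lines. This is the standard scaled dot-product formulation, softmax(QKᵀ/√d_k)V, shown for contrast and not as part of TCN-QV; the point to notice is the n × n weight matrix, whose size makes the cost grow quadratically with sequence length n. The toy inputs and identity projections are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X, Wq, Wk, Wv):
    """Standard scaled dot-product self-attention for one sequence X of shape (n, d)."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(Wk[0])
    K_T = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, K_T)]  # n x n
    weights = [softmax(row) for row in scores]   # every step attends to every step
    return matmul(weights, V), weights

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]   # n = 4 time steps, d = 2
I = [[1.0, 0.0], [0.0, 1.0]]                            # identity projections (demo only)
out, w = self_attention(X, I, I, I)
print(len(w), len(w[0]))   # 4 4
```

For the 32-step windows used in this study the weight matrix has 32 × 32 = 1,024 entries per head; doubling the window to 64 steps quadruples it to 4,096.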

Compared with traditional self-attention, QV Attention removes the Key (K) component, giving up the explicit query-key modeling of contextual relationships. In exchange, eliminating the query-key computation of traditional self-attention improves the model’s computational and processing efficiency. The QV attention module computes self-attention weights within a sequence to capture correlated information. It receives the output of the TCN module, and its processing steps are given in Eqs (5), (6), and (7).

(5)

(6)

(7)

This module also lets the model attend simultaneously to information from multiple time steps at different positions. The Q and V matrices have the same dimensions, and the output is computed as shown in Eq (8).

(8)
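Because Eqs (5) to (8) are not reproduced here, the exact QV formulation cannot be recovered from this text. The sketch below is one possible reading and should be treated as an assumption: Q and V are linear projections of the TCN output, a softmax over the time axis of Q gives per-feature weights, these weights are multiplied element-wise with V (the element-wise combination described for this block), and a residual connection adds the input back. All weight matrices are hypothetical. Its key property is that no n × n matrix is ever formed.

```python
import math

def linear(X, W):
    """Project every time step (row of X) with a weight matrix W of shape (d_in, d_out)."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*W)] for row in X]

def softmax_over_time(M):
    """Softmax down each column of M, so each feature's weights over time sum to 1."""
    cols = []
    for col in zip(*M):
        m = max(col)
        exps = [math.exp(v - m) for v in col]
        total = sum(exps)
        cols.append([e / total for e in exps])
    return [list(row) for row in zip(*cols)]

def qv_attention(X, Wq, Wv):
    """HYPOTHETICAL key-free attention: softmax(Q over time) * V element-wise, plus residual.

    X is the (n, d) TCN output; Wq and Wv are (d, d). Projections cost O(n * d^2)
    and the mixing step O(n * d); no n x n score matrix is built.
    """
    Q, V = linear(X, Wq), linear(X, Wv)
    A = softmax_over_time(Q)                                            # (n, d) weights
    mixed = [[a * v for a, v in zip(ra, rv)] for ra, rv in zip(A, V)]
    return [[x + m for x, m in zip(rx, rm)] for rx, rm in zip(X, mixed)]  # residual

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]   # n = 4 steps, d = 2 (toy values)
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(qv_attention(X, zeros, identity))
# [[1.25, 2.5], [3.75, 5.0], [6.25, 7.5], [8.75, 10.0]]
```

With all-zero query weights the softmax is uniform (1/n per step), so with identity value weights the output is X + X/n. In general the mixing step costs O(n·d), compared with O(n²·d) for the score matrix of standard attention.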

Next, the model constructs a residual connection between the input and output of the module. Residual connections let gradients propagate directly back to the original input, mitigating vanishing and exploding gradients. Because each block only needs to learn a residual correction to its input, deep networks converge faster and are easier to optimize, which accelerates training and improves the model’s generalization ability.

To further enrich the model’s expressive power, the output undergoes a nonlinear transformation through two MLPs, with dropout to prevent overfitting [33], and is finally mapped to the desired output dimension. Residual connections around these layers maintain information flow and improve the model’s adaptability.

3.3 MLP block

In traditional Transformer architectures, self-attention is commonly used to process sequential data. Although self-attention excels at capturing global dependencies, it is built from linear transformations, which limits its ability to model complex nonlinear patterns and feature interactions and can hurt performance across diverse tasks. The MLP layer introduced in this study addresses these limitations by strengthening the model’s feature extraction and nonlinear pattern recognition [34]. It consists of two linear transformations and a Gaussian Error Linear Unit (GELU) [35] activation function.

Although traditional ReLU activation functions are favored for their computational simplicity and fast convergence, they possess a significant limitation: negative input values yield zero output and a zero first derivative, which can lead to the phenomenon of "dead neurons." These inactive neurons impede parameter updates during training, diminishing the network’s plasticity and constraining its capacity to learn complex patterns.

To overcome these limitations, the GELU serves as a viable alternative. Built upon the Gaussian error function, GELU permits non-zero gradients for negative input values, facilitating continuous learning and parameter updates within neurons. Consequently, GELU frequently surpasses ReLU in tasks that require fine feature extraction and complex pattern recognition.

Therefore, compared to traditional self-attention mechanisms, the MLP layer in this model can better simulate complex nonlinear relationships by using the GELU activation function. The introduction of the GELU activation function not only improves the model’s expressive power but also enhances its adaptability to complex data distributions. The GELU calculation equation is presented in Eq (9).

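$$\mathrm{GELU}(x) = x\,\Phi(x) = \frac{x}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right] \tag{9}$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution.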

The model maps the input from the hidden-layer dimension to an expanded dimension and then back to the hidden-layer dimension. This design enhances the model’s representational capacity, enabling it to capture complex nonlinear relationships, and its feature aggregation promotes stronger learning of both local and global dependencies within the data.
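The MLP block and the GELU of Eq (9) can be sketched as follows for a single time step. The dimensions (a hidden size of 2 expanded to 4) and all weights are hypothetical placeholders, since the paper’s actual dimensions are not reproduced in this text. The sketch also compares the derivatives of GELU and ReLU at a negative input to illustrate the "dead neuron" point made above.

```python
import math

def gelu(x):
    """Eq (9): GELU(x) = x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def relu(x):
    return max(0.0, x)

def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference derivative of f at x."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

def mlp_block(h, W1, b1, W2, b2):
    """Linear (d -> d_ff), GELU, Linear (d_ff -> d) applied to one vector h.

    W1 has shape (d, d_ff) and W2 has shape (d_ff, d), stored as lists of rows.
    """
    z = [gelu(sum(hi * w for hi, w in zip(h, col)) + b) for col, b in zip(zip(*W1), b1)]
    return [sum(zi * w for zi, w in zip(z, col)) + b for col, b in zip(zip(*W2), b2)]

print(round(gelu(-1.0), 4))                 # -0.1587: negative inputs are not zeroed
print(round(numeric_grad(relu, -1.0), 4))   # 0.0: ReLU passes no gradient at x < 0
print(round(numeric_grad(gelu, -1.0), 4))   # -0.0833: GELU still passes a gradient

W1 = [[0.5, -0.5, 1.0, 0.0], [0.25, 0.5, -1.0, 1.0]]   # hypothetical, d = 2 -> d_ff = 4
W2 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-0.5, 0.5]]  # hypothetical, d_ff = 4 -> d = 2
print(len(mlp_block([1.0, -2.0], W1, [0.0] * 4, W2, [0.0] * 2)))  # 2
```

Expanding to a wider intermediate dimension before projecting back gives the nonlinearity more room to form feature interactions, while the output keeps the hidden size so it can be added back through a residual connection.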

3.4 Long-term prediction framework

Long-term forecasting not only requires models to capture short-term patterns but also to accurately predict future behaviors and trends, which is particularly important in the field of financial analysis. Effective long-term forecasting can provide a foundation for decision-making, reduce risks, and optimize resource allocation.

The classic TCN architecture effectively captures contextual features through the DC-Conv module to achieve long-term prediction. In contrast, RNN models, such as LSTM and GRU, employ gating mechanisms for long-term prediction. Given the differences in network structures, it is crucial to select the most appropriate prediction method for long sequence forecasting and quantify the prediction results.

The long-term prediction method implemented in this article is shown in Fig 4. To achieve long-term prediction, the feature sequence and the true values were offset in time so that each input corresponds one-to-one with a future target value. To ensure that the model extracts contextual relationships accurately, a TCN was chosen for the initial stage of data processing. However, although the TCN achieved better overall accuracy than RNN models, its predictions were more volatile, and it had difficulty identifying key turning points in price during long-term forecasting. The QV Attention module was introduced to reduce this volatility and improve the model’s performance.
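The time offset between features and targets can be illustrated with a short helper. This is a hypothetical sketch of the alignment step, assuming a fixed forecast horizon h (the horizon used in the experiments is not stated in this text): features at step t are paired with the true value at step t + h, so each input row corresponds to exactly one future target.

```python
def align_for_horizon(features, targets, horizon):
    """Pair features[t] with targets[t + horizon] (direct h-step-ahead alignment).

    The last `horizon` feature rows and the first `horizon` targets have no
    partner and are dropped, so both returned lists have len(features) - horizon items.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    if len(features) != len(targets):
        raise ValueError("features and targets must have the same length")
    return features[:-horizon], targets[horizon:]

# Hypothetical daily rows: (open, high, low) features and closing prices.
feats = [(1.0, 1.2, 0.9), (1.1, 1.3, 1.0), (1.2, 1.4, 1.1), (1.3, 1.5, 1.2)]
close = [1.1, 1.2, 1.3, 1.4]
X, y = align_for_horizon(feats, close, horizon=2)
for f, target in zip(X, y):
    print(f, "->", target)   # day t's features paired with day t+2's close
```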

[Fig 4 omitted: long-term prediction method.]

This integration enables the TCN’s long-term prediction method to fully utilize the advantages of QV Attention in long sequence forecasting, as shown in Fig 5. This approach can more accurately predict long sequences, thereby fully exploiting the model’s potential.

[Fig 5 omitted: long-term prediction with the TCN combined with QV Attention.]

4 Experimental section

4.1 Experimental environment

The server used for this experiment runs on a 64-bit Windows 11 system. It is equipped with a 12th generation Intel Core i7-1260P CPU with a clock speed of 2.10 GHz. The development tool utilized in this experiment is PyCharm 2020.1.3, and the programming language is Python 3.8.10.

4.2 Experimental dataset

The dataset chosen for this experiment consists of trading data for Shanghai gold in China, spanning from October 30, 2002, to July 24, 2024. The gold samples in the dataset include purities of 99.99% and 99.95%. Additionally, the experiment involves trading data for 100 grams of gold in Shanghai from December 25, 2005, to July 24, 2006, as well as trading data for Shanghai Gold (T+D) from September 27, 2004, to July 24, 2024. The data was obtained from the CBC Metal Mesh platform website through a third-party data interface.

The daily trading data encompasses various parameters, such as the opening price, highest price, lowest price, and closing price of Shanghai Gold Futures. Notably, the price of gold in Shanghai has risen steadily from approximately 85 yuan to around 550 yuan between 2002 and 2024, showing significant fluctuations during this upward trend, which complicates predictive modeling. Table 1 presents a selection of gold price data in Shanghai. In this study, the selected features are date, opening price, highest price, and lowest price, with the prediction target being the closing price.

In Table 1, the columns are defined as follows:

Date: Represents the transaction time.

Open: Indicates the daily opening price of Shanghai gold.

Highest: Signifies the highest price of Shanghai gold for that day.

Lowest: Denotes the lowest price of Shanghai gold for that day.

Close: Represents the closing price of Shanghai gold for that day.

[Table 1 omitted: sample of Shanghai gold price data.]

4.3 Data preprocessing

This study uses the first 80% of the dataset as the training set and the remaining 20% as the test set. To prevent features with larger magnitudes from dominating model training, the data are normalized: the training and test sets are each scaled separately to the range [0, 1] with Min-Max scaling. This balances weight updates and makes training more effective by reducing the impact of differences in order of magnitude between features. The scaling equation is given in Eq (10).

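$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{10}$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature.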
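The chronological split and the per-set Min-Max scaling of Eq (10) can be sketched as below. This is a minimal pure-Python illustration with made-up prices; as in the text, each split is scaled with its own minimum and maximum. The inverse transform is an addition not described in the text, included to show how scaled predictions can be mapped back to price units.

```python
def chronological_split(values, train_frac=0.8):
    """First train_frac of the rows for training, the rest for testing (no shuffling)."""
    cut = int(len(values) * train_frac)
    return values[:cut], values[cut:]

def min_max_scale(values):
    """Eq (10): x' = (x - x_min) / (x_max - x_min); returns scaled values and (min, max)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:                       # constant column: map everything to 0
        return [0.0] * len(values), (lo, hi)
    return [(v - lo) / span for v in values], (lo, hi)

def min_max_inverse(scaled, lo, hi):
    """Undo Eq (10) so predictions can be reported in the original units."""
    return [v * (hi - lo) + lo for v in scaled]

prices = [85.0, 120.0, 160.0, 210.0, 260.0, 300.0, 380.0, 450.0, 500.0, 550.0]  # made-up
train, test = chronological_split(prices)
train_scaled, (lo, hi) = min_max_scale(train)   # each split scaled separately, as in the text
test_scaled, _ = min_max_scale(test)
print(train_scaled[0], train_scaled[-1])   # 0.0 1.0
print(test_scaled)                          # [0.0, 1.0]
```

Scaling the test split with its own extremes maps the test prices onto the full [0, 1] range independently of the training range; a common alternative is to reuse the training set’s minimum and maximum for the test set.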

The dataset can be represented as $D = \{x_1, x_2, \ldots, x_n\}$, where $n$ is the number of data entries and each $x_i \in \mathbb{R}^d$ is a $d$-dimensional vector of price data for the Shanghai gold market in China on a given day. A sliding window partitions the data into a finer structure, $X_t = \{x_{t-s+1}, \ldots, x_t\}$, where $X_t$ is the window ending at time point $t$ and $s$ is the window size, so each window contains the input data from the $s$ most recent time steps.

The dataset therefore has dimensions $(n, d)$, and the generated time series sample set is three-dimensional with dimensions $(n - s, s, d)$, where $n - s$ is the number of time slices (samples), $s$ is the number of time steps in each sample, and $d$ is the number of data items in the Shanghai gold pricing data. With an experimental data length of $n$, the three-dimensional data are created using a window of $s = 32$ time steps and $d$ features. The construction process is illustrated in Fig 6:

[Fig 6 omitted: construction of the three-dimensional sample set.]

Rows 1 to 32 form the first layer, rows 2 to 33 form the second layer, and so on, giving a total of $n - 32$ layers. Each row represents one time step. On the input side each time step has $d = 3$ features, while on the output side $d = 1$. The three-dimensional data are thus constructed through sliding window operations.
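The window construction described above (rows 1 to 32, rows 2 to 33, and so on) can be sketched as follows. This is an illustration under stated assumptions: each window is paired with the closing price of the row that immediately follows it, which is one way to obtain exactly n − 32 samples, and the random inputs are placeholders rather than the Shanghai gold dataset.

```python
import random

def build_windows(features, targets, s=32):
    """Slide an s-step window over the series one row at a time.

    Window i covers rows i .. i + s - 1 (0-based) and is paired with
    targets[i + s], the row right after the window, giving n - s samples.
    Returns X with shape (n - s, s, d_in) and y with length n - s.
    """
    n = len(features)
    if n != len(targets) or n <= s:
        raise ValueError("need equal-length inputs with more than s rows")
    X = [features[i:i + s] for i in range(n - s)]
    y = [targets[i + s] for i in range(n - s)]
    return X, y

random.seed(0)
n = 100
rows = [[random.random() for _ in range(3)] for _ in range(n)]  # d_in = 3 (open, high, low)
close = [random.random() for _ in range(n)]                      # d_out = 1 (close)
X, y = build_windows(rows, close)
print(len(X), len(X[0]), len(X[0][0]), len(y))   # 68 32 3 68
```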

To evaluate the model’s predictive capability comprehensively, we use multiple metrics: RMSE (Root Mean Square Error), MAE (Mean Absolute Error), NRMSE (Normalized Root Mean Square Error), MAPE (Mean Absolute Percentage Error), and R² (Coefficient of Determination). Together these metrics provide a multifaceted view of the model’s performance, helping to identify potential issues and guide optimization, thereby improving the model’s practicality and reliability.

Let $y_i$ denote the true value, $\hat{y}_i$ the predicted value, n the sample size, and $\bar{y}$ the mean of the true values. The indicators are then calculated as follows:

RMSE: This metric is the square root of the mean squared error between predicted and actual values. Because errors are squared before averaging, RMSE weights large errors heavily, which makes it especially informative in highly nonlinear or volatile scenarios where large deviations matter. The indicator is calculated using Eq (11).

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (11)$$

MAE: This metric is the average absolute error between predicted and actual values. It expresses the typical size of the deviation in the original units of the data and is less sensitive to occasional large errors than RMSE. The indicator is calculated using Eq (12).

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \qquad (12)$$

NRMSE: This metric normalizes RMSE so that errors can be compared across datasets with different scales. The indicator is calculated using Eq (13).

(13)

MAPE: This metric expresses the prediction error as a percentage of the actual values. Because it is scale-independent, it is suitable for comparing relative errors in time series whose levels vary over long periods. The indicator is calculated using Eq (14).

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \qquad (14)$$

R²: This value measures the proportion of the variability in the data that the model explains. For a reasonable model it lies between 0 and 1, and a value close to 1 indicates strong explanatory power; it can become negative when the model fits worse than simply predicting the mean $\bar{y}$. The indicator is calculated using Eq (15).

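$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \qquad (15)$$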
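The five metrics can be computed directly from the definitions above. The NumPy sketch below is illustrative rather than the paper's code. In particular, the text does not state how NRMSE is normalized, so dividing RMSE by the mean of the true values is an assumption here (normalizing by the range or the standard deviation of the true values is also common).

```python
import numpy as np

def regression_metrics(y_true, y_pred) -> dict:
    """Return RMSE, MAE, NRMSE, MAPE (in %) and R^2 as plain floats."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))                 # root of the mean squared error
    mae = np.mean(np.abs(err))                        # mean absolute error
    # Assumption: NRMSE = RMSE / mean(y_true); the paper's normalization is not given.
    nrmse = rmse / np.mean(y_true)
    # MAPE in percent; every true value must be nonzero (true for prices).
    mape = 100.0 * np.mean(np.abs(err / y_true))
    ss_res = np.sum(err ** 2)                         # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    pairs = [("RMSE", rmse), ("MAE", mae), ("NRMSE", nrmse), ("MAPE", mape), ("R2", r2)]
    return {name: float(value) for name, value in pairs}

# Toy values for illustration only (not data from the paper).
m = regression_metrics([100.0, 102.0, 104.0], [101.0, 101.0, 105.0])
print(m)
```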

5 Experimental result

5.1 Prediction experiment

In this experiment, a comparative evaluation was conducted to validate the advantages of integrating the TCN with the QV Attention model. Seven widely used time series prediction models were evaluated: LSTM, GRU, TCN, Transformer, MLP, TCN-LSTM, and Informer [36]. Assessing these models on the same data allows a comprehensive analysis of each model's predictive capability in practical applications and of their respective strengths and weaknesses.

Four datasets were used for training and evaluation, and each model was carefully tuned to optimize its performance on each task. The results of these models were then compared with those of the TCN-QV model. This comparison identifies the advantages of combining the TCN with the QV Attention model and provides an empirical basis for future research. The experimental results for the different datasets are presented in Table 2, and the predicted outcomes are illustrated in Fig 7.

[Figure omitted. See PDF.]

Among the compared models, the TCN-QV model performs best. Its predicted curve overlaps closely with the true value curve, indicating that the differences between its predicted and true values are very small. This consistency shows that the TCN-QV model captures both the trends and the finer details of the data, accurately reflecting its fluctuations and changes.

In contrast, models such as LSTM and Informer perform relatively poorly. Although they detect that the future data will fluctuate, they fail to reproduce the magnitude of these fluctuations. The TCN-LSTM model, which uses a TCN to capture temporal correlations in the data, improves substantially over the plain LSTM, although its errors remain noticeable.

The prediction curves of MLP, TCN, GRU, and Transformer deviate relatively little from the actual values overall. However, notable discrepancies remain in periods of strong fluctuation, where the prediction errors become considerable. For instance, although the TCN predictions are closer to the true values, they still fluctuate noticeably.

Overall, in this experiment the TCN-QV model outperformed the other models across the performance indicators and produced more accurate and stable predictions, highlighting its potential for time series prediction tasks and providing support for future research and applications.

5.2 Processing latency experiment

Different models also differ in testing speed. This experiment compares the TCN-QV model with commonly used models (TCN, LSTM, GRU, and Transformer) under consistent conditions: the same test dataset and approximately the same number of trainable parameters (about 84,300). Comparing their testing times shows how quickly each model generates outputs once it has been trained, which matters for practical deployment.
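The paper does not describe its timing procedure in detail. As a hedged sketch, one common way to measure test time on a CPU is to run a few warm-up calls and then report the median of repeated timed calls. `measure_latency` and `dummy_predict` are hypothetical names; any trained model's prediction function could be passed in place of the stand-in.

```python
import statistics
import time
from typing import Any, Callable

def measure_latency(predict: Callable[[Any], Any], batch: Any,
                    warmup: int = 3, repeats: int = 20) -> float:
    """Return the median wall-clock time (in seconds) of predict(batch)."""
    for _ in range(warmup):            # discard warm-up runs (caches, lazy initialization)
        predict(batch)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()    # high-resolution monotonic clock
        predict(batch)
        timings.append(time.perf_counter() - start)
    # The median is less affected by occasional OS scheduling spikes than the mean.
    return statistics.median(timings)

# Hypothetical stand-in for a trained model's prediction function.
def dummy_predict(xs):
    return [sum(x) for x in xs]

latency = measure_latency(dummy_predict, [[1.0, 2.0, 3.0]] * 1000)
print(f"median latency: {latency * 1e3:.3f} ms")
```

Reporting a median over repeated runs is one way to reduce the CPU timing noise that the Discussion lists as a limitation.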

For each of the four datasets, the testing duration and prediction metrics of each model are presented in Fig 8, where the line shows the testing duration and the bars show the metrics used to assess the prediction results.

[Figure omitted. See PDF.]

Fig 8 shows that the TCN model has the shortest testing time of all the models, indicating high processing efficiency. The TCN-QV model used in this study follows closely: its testing duration is similar to that of LSTM and generally shorter than that of GRU, indicating good efficiency on these datasets. The Transformer model has the longest computation time, likely because of its more complex structure and higher computational requirements.

In terms of accuracy, both TCN-QV and TCN have relatively low MAE and MAPE values compared with the other models, and the values of TCN-QV are substantially lower than those of TCN. In contrast, LSTM and GRU show high error metrics, particularly MAE, and the Transformer's values are also elevated, indicating that LSTM, GRU, and Transformer predicted less effectively in this experiment.

In summary, although the TCN-QV model takes longer to test than TCN, its accuracy is substantially higher. Its testing time is similar to that of LSTM, yet its performance is notably better, and it is generally both faster and more accurate than GRU and Transformer. The TCN-QV model therefore offers a favorable balance between accuracy and efficiency for practical applications.

5.3 Time step experiment

This experiment evaluates how the length of the predicted sequence (the prediction time step) affects the performance of the proposed model and of several existing models: LSTM, GRU, TCN, and Transformer. These models perform well on time series data, but their effectiveness may vary with the prediction length. Data from the Au (T+D) dataset are segmented according to the prediction time step, so that each training input is aligned with a target sequence spanning the corresponding number of future time steps. Several prediction lengths are defined, and a separate experiment is run for each. For fairness, every model is trained and tested on the same training and test sets. The experimental results are presented in Table 3, and the predicted curves are shown in Fig 9.
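One possible way to build such multi-step samples is to pair each input window of s steps with the following h target values. The sketch below assumes the target price is stored in column 0 and uses random data; the function name and the horizons 1, 4 and 16 are illustrative choices, not necessarily the paper's exact settings.

```python
import numpy as np

def make_multistep_windows(series: np.ndarray, s: int, h: int):
    """Pair each input window of s steps with the next h target steps.

    series: (n, d) array; column 0 is assumed to hold the target price.
    Returns X with shape (n - s - h + 1, s, d) and Y with shape (n - s - h + 1, h).
    """
    n = series.shape[0]
    num_samples = n - s - h + 1
    if num_samples <= 0:
        raise ValueError("series too short for the chosen s and h")
    X = np.stack([series[k:k + s] for k in range(num_samples)])
    # Targets are the h values immediately after each input window.
    Y = np.stack([series[k + s:k + s + h, 0] for k in range(num_samples)])
    return X, Y

data = np.random.default_rng(0).normal(size=(100, 3))
for horizon in (1, 4, 16):
    X, Y = make_multistep_windows(data, s=32, h=horizon)
    print(horizon, X.shape, Y.shape)
```

With h = 1 this reduces to single-step windows and n − s samples; each additional step of horizon removes one sample from the end of the series.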

[Figure omitted. See PDF.]

For short prediction horizons, the TCN-QV model performs markedly better than the other models, with RMSE and MAE values notably lower than those of LSTM and GRU. Fig 9 shows that the TCN-QV prediction curve (dark blue) almost completely overlaps the actual price curve, highlighting its advantage in short-term forecasting. As the prediction time step increases, the TCN and Transformer models become more competitive relative to the other baselines; at a time step of 16 in particular, TCN achieves strong RMSE and MAE values, suggesting potential for mid-term prediction. Although the accuracy of TCN-QV declines somewhat, it remains higher than that of the other models, and TCN-QV is still the best performer overall.

As the prediction time step extends further, the accuracy of all models generally decreases. The RMSE and MAE of LSTM and GRU increase sharply, and TCN begins to mispredict the overall trend, revealing its limitations in long-term forecasting. Although the accuracy of TCN-QV declines slightly, its predictions remain more accurate than those of the other models.

Fig 9 also shows that the gap between the actual price curve and each model's predicted curve varies with the time step. The TCN-QV prediction curve overlaps the actual price curve most closely, underscoring its superior predictive ability. In contrast, the LSTM and GRU prediction curves fluctuate strongly at long time steps and deviate considerably from the true prices, indicating instability in long-term forecasting.

In summary, the TCN-QV model performs well across all time steps, particularly in short-term forecasting, whereas LSTM and GRU perform moderately in the short term and degrade considerably in the long term. The TCN-QV model therefore shows robust predictive capability in price forecasting, especially in scenarios that demand high-precision short-term predictions.

5.4 Model ablation experiment

The model captures temporal context by combining dilated causal convolutions (DC-Conv, within the TCN) with QV Attention, so the contribution of each component to overall performance needs to be examined. To do so, ablation experiments were conducted: specific components were removed or replaced while the other relevant parameters were held constant, and the resulting change in performance was recorded. This isolates the impact of the QV Attention and TCN components on the final results. The results of the ablation experiment are presented in Table 4.

[Figure omitted. See PDF.]

The TCN-QV Attention model significantly outperforms traditional time series prediction models across the four datasets. On the Au99.99% dataset it achieves particularly high accuracy, with very low RMSE and MAE values and an R² of 0.9985, indicating a near-perfect fit.

By contrast, although the TCN model outperforms traditional RNN and CNN models, its RMSE and MAE remain relatively high, indicating a notable loss of accuracy compared with the full model. The QV Attention model also shows some improvement but still lags behind the TCN-QV Attention model in accuracy. As shown in Fig 10, the predicted curve of the TCN-QV Attention model nearly overlaps the actual gold price trend, even during sharp fluctuations, so the model captures these dynamic changes and maintains high accuracy under both stable and volatile conditions.

[Figure omitted. See PDF.]

The ablation experiment further confirms the advantages of the TCN-QV Attention model in time series prediction: it combines the long-term dependency modeling of temporal convolutional networks with the QV Attention mechanism, which focuses the model on critical time points and yields significant improvements in accuracy and stability.
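The long-range dependency capability attributed to the TCN comes from stacking dilated causal convolutions with residual connections. The NumPy sketch below illustrates this generic building block in the spirit of Bai et al. [5], not the paper's exact layer configuration: a single-channel dilated causal convolution, a simplified residual block around it, and the receptive field of a stack with kernel size k and dilations $d_i$, namely $1 + \sum_i (k-1)\,d_i$. The kernel size 2, the dilations 1, 2, 4, 8, the weights, and the ReLU activation are illustrative assumptions.

```python
import numpy as np

def dilated_causal_conv1d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """Single-channel dilated causal convolution without bias.

    y[t] = sum_j w[j] * x[t - j * dilation]; x is left-padded with zeros,
    so y[t] never depends on inputs after time t.
    """
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        for j in range(k):
            y[t] += w[j] * xp[pad + t - j * dilation]
    return y

def residual_block(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """Simplified residual block: input plus ReLU(dilated causal convolution)."""
    return x + np.maximum(dilated_causal_conv1d(x, w, dilation), 0.0)

def receptive_field(kernel_size: int, dilations) -> int:
    """Number of time steps (including t) that can influence the output at t."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

dilations = [1, 2, 4, 8]
x = np.zeros(32)
x[0] = 1.0                       # impulse at the first time step
h = x
for d in dilations:              # stack residual blocks with growing dilation
    h = residual_block(h, np.array([0.5, 0.5]), d)
print(receptive_field(2, dilations))  # 16
```

In this toy stack the impulse at t = 0 reaches outputs up to t = 15 and no further, which matches the computed receptive field of 16; doubling the dilation at each layer lets the receptive field grow exponentially with depth.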

6 Discussion

The experimental results indicate that the TCN-QV model performs strongly in predicting gold prices. On these small-sample metal price time series, the optimized TCN-QV model reduced MAE relative to the benchmark models by at least approximately 5.47% and by up to about 33.69% across the four experimental datasets. The model also maintains good performance as the prediction time step increases. Furthermore, with the number of trainable parameters controlled, the TCN-QV model is relatively efficient compared with the recurrent neural network benchmarks, although its testing time is longer than that of TCN.

However, the experiments in this paper have certain limitations. The selected gold price dataset is small, which may explain why recurrent neural networks such as LSTM and GRU predict poorly while convolutional networks, particularly TCN, perform well. This study did not evaluate TCN-QV on larger datasets, so its generalization ability remains untested; in addition, the training data consist only of weekday gold prices and are limited in size. The experiments were run on a CPU, which may introduce some error into the time measurements. In the processing latency experiment, the parameter counts of the models were made similar but not identical. These factors limit the strength of the experimental conclusions. Overall, the TCN-QV model shows promising results in gold price prediction and could offer valuable insights for investment decisions and national risk management.
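The MAE improvements of 5.47% to 33.69% reported in this section are presumably relative reductions with respect to each baseline. Assuming that definition, improvement = (MAE_baseline − MAE_TCN-QV) / MAE_baseline × 100%. A minimal Python illustration with made-up numbers (not results from the paper):

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage reduction of an error metric relative to a baseline value."""
    return 100.0 * (baseline - proposed) / baseline

# Hypothetical MAE values, for illustration only.
print(relative_reduction(2.0, 1.5))  # 25.0
```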

7 Conclusion

In this article, the TCN-QV Attention model is proposed, integrating convolutional networks with a variant of the self-attention mechanism to extract temporal features efficiently. Dilated causal convolution (DC-Conv) layers stacked with residual connections extract contextual temporal features, improving computational speed without sacrificing accuracy, and a novel QV Attention mechanism enhances the extraction of important information. The effects of QV Attention and of the number of residual blocks on overall accuracy are also examined in order to achieve optimal model performance. Tests across different prediction time steps show that, as the time step increases, the performance of the other models deteriorates significantly, while the TCN-QV Attention model maintains strong performance in long-term predictions. The TCN-QV Attention model may therefore serve as a versatile solution for a range of temporal prediction scenarios. In future research, I will focus on reducing model complexity and testing time.

References

1. Mantegna RN. Hierarchical structure in financial markets. Eur Phys J B. 1999;11(1):193–7.

2. Li Y, Du Q. Oil price volatility and gold prices volatility asymmetric links with natural resources via financial market fluctuations: implications for green recovery. Resources Policy. 2024;88:104279.

3. Esangbedo MO, Taiwo BO, Abbas HH, Hosseini S, Sazid M, Fissha Y. Enhancing the exploitation of natural resources for green energy: an application of LSTM-based meta-model for aluminum prices forecasting. Resources Policy. 2024;92:105014.

4. Huang Y, Bai Y, Ding L, Zhu Y-J, Ma Y-J. Application of a hybrid model based on ICEEMDAN, Bayesian hyperparameter optimization GRU and the ARIMA in nonferrous metal price prediction. Cybern Syst. 2022;54(1):27–59.

5. Bai S, Kolter JZ, Koltun V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint. 2018.

6. Vaswani A, et al. Attention is all you need. Advances in Neural Information Processing Systems. 2017.

7. Tong J, Zhang Y. A real-time label-free self-supervised deep learning intrusion detection for handling new type and few-shot attacks in IoT networks. IEEE Internet Things J. 2024.

8. Zhang Y, Liang M, Ou H. Prediction of precious metal index based on ensemble learning and SHAP interpretable method. Comput Econ. 2024:1–36.

9. Gao J, Cao Q, Chen Y. Auto-regressive moving diffusion models for time series forecasting. arXiv preprint. 2024.

10. Luo Z, Guo W, Liu Q, Tse Y. A hybrid prediction model with time-varying gain tracking differentiator in Taylor expansion: evidence from precious metals. J Forecasting. 2022;42(5):1138–49.

11. Yao J. Study on stock index prediction based on ARIMA and information granular SVR combination. Oper Res Manag Sci. 2022;31(5):214.

12. He Z, Huang J. A novel non-ferrous metal price hybrid forecasting model based on data preprocessing and error correction. Resources Policy. 2023;86:104189.

13. Barzegar R, Aalami MT, Adamowski J. Coupling a hybrid CNN-LSTM deep learning model with a Boundary Corrected Maximal Overlap Discrete Wavelet Transform for multiscale Lake water level forecasting. J Hydrol. 2021;598:126196.

14. Bhoj N, Singh Bhadoria R. Time-series based prediction for energy consumption of smart home data using hybrid convolution-recurrent neural network. Telematics Inf. 2022;75:101907.

15. Chen X, Cao L, Cao Z, Zhang H. A multi-feature stock price prediction model based on multi-feature calculation, LASSO feature selection, and Ca-LSTM network. Connect Sci. 2024;36(1).

16. Brauwers G, Frasincar F. A general survey on attention mechanisms in deep learning. IEEE Trans Knowl Data Eng. 2023;35(4):3279–98.

17. Wang C, Chen Y, Zhang S, Zhang Q. Stock market index prediction using deep Transformer model. Expert Syst Appl. 2022;208:118128.

18. Burukanli M, Yumuşak N. TfrAdmCov: a robust transformer encoder based model with Adam optimizer algorithm for COVID-19 mutation prediction. Connect Sci. 2024;36(1).

19. Li C, Qian G. Stock price prediction using a frequency decomposition based GRU transformer neural network. Appl Sci. 2022;13(1):222.

20. Lu W, Li J, Wang J, Wu S. A novel model for stock closing price prediction using CNN-attention-GRU-attention. Econ Comput Econ Cybern Stud Res. 2022;56(3).

21. Liu J, Zhang B, Zhang T, Wang J. Soybean futures price prediction model based on eemd-nagu. IEEE Access. 2023.

22. Xu H, Chai L, Luo Z, Li S. Stock movement prediction via gated recurrent unit network based on reinforcement learning with incorporated attention mechanisms. Neurocomputing. 2022;467:214–28.

23. Guo X, Hua D, Bao P, Li T, Yao N, Cao Y, et al. A short-term electricity price forecasting method based on improved VMD-PSO-CNN-LSTM. J Electric Power Sci Technol. 2024;39(2):35–43.

24. Wang N, Zhao X. Time series forecasting based on convolution transformer. IEICE Trans Inf Syst. 2023;E106.D(5):976–85.

25. Zhao Y, Chen J, Shimada H, Sasaoka T. Non-ferrous metal price point and interval prediction based on variational mode decomposition and optimized LSTM network. Mathematics. 2023;11(12):2738.

26. Dao T, Gu A. Transformers are SSMs: generalized models and efficient algorithms through structured state space duality. arXiv preprint arXiv:2405.21060. 2024.

27. Li F, Zhou H, Liu M, Ding L. A medium to long-term multi-influencing factor copper price prediction method based on CNN-LSTM. IEEE Access. 2023;11:69458–73.

28. Luo D, Wang X. ModernTCN: a modern pure convolution structure for general time series analysis. The Twelfth International Conference on Learning Representations. 2024.

29. Mohsin M, Jamaani F. A novel deep-learning technique for forecasting oil price volatility using historical prices of five precious metals in context of green financing: a comparison of deep learning, machine learning, and statistical models. Resources Policy. 2023;86:104216.

30. Tian J, Shen C, Wang B, Xia X, Zhang M, Lin C. Lesson: multi-label adversarial false data injection attack for deep learning locational detection. IEEE Trans Dependable Secure Comput. 2024.

31. Jia Y, Lin Y, Yu J, Wang S, Liu T, Wan H. PGN: the RNN's new successor is effective for long-range time series forecasting. arXiv preprint. 2024.

32. Zhang Z, Han Y, Ma B, Liu M, Geng Z. Temporal chain network with intuitive attention mechanism for long-term series forecasting. IEEE Trans Instrum Meas. 2023.

33. Tian J, Shen C, Wang B, Ren C, Xia X, Dong R, Cheng T. Evade: targeted adversarial false data injection attacks for state estimation in smart grid. IEEE Trans Sustainable Comput. 2024.

34. Abu-Doush I, Ahmed B, Awadallah MA, Al-Betar MA, Rababaah AR. Enhancing multilayer perceptron neural network using archive-based harris hawks optimizer to predict gold prices. J King Saud Univ Comput Inf Sci. 2023;35(5):101557.

35. Hendrycks D, Gimpel K. Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415. 2016.

36. Zhou H, Zhang S, Peng J, Zhang S, Li J, Xiong H, Zhang W. Informer: beyond efficient transformer for long sequence time-series forecasting. 2021.

Citation: Yang Y (2025) TCN-QV: an attention-based deep learning method for long sequence time-series forecasting of gold prices. PLoS One 20(5): e0319776. https://doi.org/10.1371/journal.pone.0319776

About the Authors:

Yishuai Yang

Roles: Data curation, Formal analysis, Writing – original draft, Writing – review & editing


Affiliation: College of Management Science, Chengdu University of Technology, Erxianqiao, Chengdu 610059, Sichuan, P.R.China

ORCID: https://orcid.org/0009-0004-9592-7382

References

1. Mantegna RN. Hierarchical structure in financial markets. Eur Phys J B. 1999;11(1): 193–7.

2. Li Y, Du Q. Oil price volatility and gold prices volatility asymmetric links with natural resources via financial market fluctuations: implications for green recovery. Resources Policy. 2024; 88104279.

3. Esangbedo MO, Taiwo BO, Abbas HH, Hosseini S, Sazid M, Fissha Y. Enhancing the exploitation of natural resources for green energy: an application of LSTM-based meta-model for aluminum prices forecasting. Resources Policy. 2024; 92105014.

4. Huang Y, Bai Y, Ding L, Zhu Y-J, Ma Y-J. Application of a hybrid model based on ICEEMDAN, Bayesian hyperparameter optimization GRU and the ARIMA in nonferrous metal price prediction. Cybern Syst. 2022; 54(1):27–59.

5. Bai S, Kolter JZ, Koltun V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint. 2018.

6. Vaswani A. Attention is all you need. Advances in Neural Information Processing Systems: 2017.

7. Tong J, Zhang Y. A real-time label-free self-supervised deep learning intrusion detection for handling new type and few-shot attacks in iot networks. IEEE Internet Things J. 2024.

8. Zhang Y, Liang M, Ou H. Prediction of precious metal index based on ensemble learning and shap interpretable method. Comput Econ. 2024; 1–36.

9. Gao J, Cao Q, Chen Y. Auto-regressive moving diffusion models for time series forecasting. arXiv preprint. 2024.

10. Luo Z, Guo W, Liu Q, Tse Y. A hybrid prediction model with time?varying gain tracking differentiator in Taylor expansion: evidence from precious metals. J Forecasting. 2022; 42(5):1138–49.

11. YAO J. Study on stock index prediction based on arima and information granular svr combination. Oper Res Manag Sci. 2022; 31(5):214.

12. He Z, Huang J. A novel non-ferrous metal price hybrid forecasting model based on data preprocessing and error correction. Resources Policy. 2023; 86104189.

13. Barzegar R, Aalami MT, Adamowski J. Coupling a hybrid CNN-LSTM deep learning model with a Boundary Corrected Maximal Overlap Discrete Wavelet Transform for multiscale Lake water level forecasting. J Hydrol. 2021; 598126196.

14. Bhoj N, Singh Bhadoria R. Time-series based prediction for energy consumption of smart home data using hybrid convolution-recurrent neural network. Telematics Inf. 2022; 75101907.

15. Chen X, Cao L, Cao Z, Zhang H. A multi-feature stock price prediction model based on multi-feature calculation, LASSO feature selection, and Ca-LSTM network. Connect Sci. 2024; 36(1).

16. Brauwers G, Frasincar F. A general survey on attention mechanisms in deep learning. IEEE Trans Knowl Data Eng. 2023; 35(4):3279–98.

17. Wang C, Chen Y, Zhang S, Zhang Q. Stock market index prediction using deep Transformer model. Expert Syst Appl. 2022; 208118128.

18. Burukanli M, Yumuşak N. TfrAdmCov: a robust transformer encoder based model with Adam optimizer algorithm for COVID-19 mutation prediction. Connect Sci. 2024; 36(1).

19. Li C, Qian G. Stock price prediction using a frequency decomposition based GRU transformer neural network. Appl Sci. 2022; 13(1):222.

20. Lu W, Li J, Wang J, Wu S. A novel model for stock closing price prediction using cnn-attention-gru-attention. Econ Comput Econ Cybern Stud Res. 2022; 56(3).

21. Liu J, Zhang B, Zhang T, Wang J. Soybean futures price prediction model based on eemd-nagu. IEEE Access. 2023.

22. Xu H, Chai L, Luo Z, Li S. Stock movement prediction via gated recurrent unit network based on reinforcement learning with incorporated attention mechanisms. Neurocomputing. 2022; 467214–28.

23. Guo X, Hua D, Bao P, Li T, Yao N, Cao Y, et al. A short-term electricity price forecasting method based on improved vmd-pso-cnn-lstm. J Electric Power Sci Technol. 2024; 39(2):35–43.

24. Wang N, Zhao X. Time series forecasting based on convolution transformer. IEICE Trans Inf Syst. 2023; E106.D(5):976–85.

25. Zhao Y, Chen J, Shimada H, Sasaoka T. Non-ferrous metal price point and interval prediction based on variational mode decomposition and optimized LSTM network. Mathematics. 2023; 11(12):2738.

26. Tri D, Gu A. Transformers are ssms: generalized models and efficient algorithms through structured state space duality. arXiv preprint 2405.21060. 2024.

27. Li F, Zhou H, Liu M, Ding L. A medium to long-term multi-influencing factor copper price prediction method based on CNN-LSTM. IEEE Access. 2023; 1169458–73.

28. Luo D, Wang X. Moderntcn: a modern pure convolution structure for general time series analysis. The Twelfth International Conference on Learning Representations. 2024.

29. Mohsin M, Jamaani F. A novel deep-learning technique for forecasting oil price volatility using historical prices of five precious metals in context of green financing – A comparison of deep learning, machine learning, and statistical models. Resources Policy. 2023; 86104216.

30. Tian J, Shen C, Wang B, Xia X, Zhang M, Lin C. Lesson: multi-label adversarial false data injection attack for deep learning locational detection. IEEE Trans Dependable Secure Comput. 2024.

31. Jia Y, Lin Y, Yu J, Wang S, Liu T, Wan H. Pgn: the rnn’s new successor is effective for long-range time series forecasting. arXiv preprint. 2024.

32. Zhang Z, Han Y, Ma B, Liu M, Geng Z. Temporal chain network with intuitive attention mechanism for long-term series forecasting. IEEE Trans Instrum Meas. 2023.

33. Tian J, Shen C, Wang B, Ren C, Xia X, Dong R, Cheng T. Evade: targeted adversarial false data injection attacks for state estimation in smart grid. IEEE Trans Sustainable Comput. 2024.

34. Abu-Doush I, Ahmed B, Awadallah MA, Al-Betar MA, Rababaah AR. Enhancing multilayer perceptron neural network using archive-based harris hawks optimizer to predict gold prices. J. King Saud Univ Comput Inf Sci. 2023; 35(5):101557.

35. Hendrycks D, Gimpel K. Gaussian error linear units (gelus). arXiv preprint; arXiv:1606.08415. 2016.

36. Zhou H, Zhang S, Peng J, Zhang S, Li J, Xiong H, Zhang W. Informer: beyond efficient transformer for long sequence time-series forecasting, 2021.

Word count: 8674


© 2025 Yang. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract


Accurate prediction of gold prices is crucial for investment decision-making and national risk management. Gold price time series exhibit random fluctuations, nonlinear characteristics, and high volatility, which make prediction extremely challenging. Various methods, from classical statistics to machine learning techniques such as Random Forests, Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), have achieved high accuracy, but each has inherent limitations. To address these issues, this paper proposes a model that combines Temporal Convolutional Networks (TCN) with a Query (Q) and Value (V) attention mechanism (TCN-QV) to improve the accuracy of gold price predictions. The model first employs stacked dilated causal convolution layers within the TCN framework to extract temporal features from the sequence data. An attention mechanism then assigns adaptive weights according to the information features, and a dense layer produces the final predictions. The method is applied to gold price time series from the Shanghai market. Compared with the baseline models, the optimized model reduces Mean Absolute Error (MAE) by approximately 5.47% in the least favorable case and by up to 33.69% in the most favorable case across four experimental datasets. The model is also tested across different prediction time steps and performs satisfactorily in long-sequence prediction. Finally, ablation experiments confirm the contribution of each model component.

Details

Title: TCN-QV: an attention-based deep learning method for long sequence time-series forecasting of gold prices

Author: Yang, Yishuai

First page: e0319776

Section: Research Article

Publication year: 2025

Publication date: May 2025

Publisher: Public Library of Science

e-ISSN: 1932-6203

Source type: Scholarly Journal

Language of publication: English

DOI: https://doi.org/10.1371/journal.pone.0319776

ProQuest document ID: 3200693507
