Introduction
Forecasting chaotic time series is vital for understanding and predicting the behavior of nonlinear dynamical systems in fields such as weak-signal detection in sensors [33], power and energy systems [4], financial and economic planning [16], and weather and natural disasters [22]. The difficulty in forecasting these systems arises from their extreme sensitivity to initial conditions, where even slight perturbations can result in substantial disparities in long-term predictions. Time series prediction began with traditional linear models such as the Autoregressive (AR) [10], Autoregressive Moving Average (ARMA) [38], and Autoregressive Integrated Moving Average (ARIMA) [20] models. In addition, Su Liyun proposed a method that uses optimal kernel functions for multivariate local polynomial fitting, employing local approximations to accurately capture nonlinear characteristics in time series data [32]. However, these approaches frequently produce significant errors when applied to intricate, nonlinear, chaotic time series. As a result, researchers have increasingly developed and applied nonlinear models to forecast chaotic time series. With advances in computing power and the evolution of machine learning and deep learning, research on predicting chaotic time series has incorporated methods such as Artificial Neural Networks (ANN) [17], RNN [31], and LSTM [31]. Sangiorgio M and Dercole F assessed the robustness of LSTM neural networks in predicting chaotic time series over multiple steps [29]. Babu CN and Reddy BE combined a moving-average filter with ARIMA and ANN to build a hybrid model for forecasting time series data [3]. Owing to their remarkable nonlinear approximation capabilities and memory mechanisms, these approaches have shown significant potential for accurately capturing time series properties. Nevertheless, they may not consistently conform to the modalities of chaotic systems. Neural network models tailored to specific application scenarios have demonstrated exceptional generalization ability, uncovering intricate nonlinear connections even without considerable prior knowledge [36]. The complexity and unpredictability of chaotic systems still present significant difficulties in improving prediction accuracy, which calls for ongoing research into neural network structures that enhance the models' ability to generalize and adapt.
Chaotic systems possess inherent characteristics such as linear and nonlinear correlations, chaos mechanisms, and noise [24, 37]. Extracting important dynamic information from chaotic time series is therefore essential. The introduction of multi-head attention techniques in deep learning has led to significant progress in capturing crucial dynamical information in chaotic systems and improving prediction performance. The multi-head attention mechanism, which originates from the Transformer model [39], effectively extracts and utilizes essential features of time series data by mimicking the distribution of human attention during information processing. Attention model variants have advanced rapidly across application domains. Google researchers released BERT [9], a bidirectional model built on the Transformer architecture; by pre-training on extensive collections of text and then fine-tuning, BERT obtained exceptional results in several natural language processing (NLP) tasks, particularly text comprehension. OpenAI's latest iteration of its general-purpose language model, GPT-4 [28], also constructed on the Transformer framework, can emulate human writing styles, generate images, and compose code, with potentially significant impact on scientific research. The attention mechanisms in these models capture essential semantic information, enabling effective information transfer. Moreover, multi-head attention mechanisms perform especially well by capturing intricate and comprehensive information across several levels, highlighting their remarkable capability in managing complex information flows.
Many researchers are committed to combining the self-attention mechanism with diverse modules to enhance model performance. The self-attention mechanism, known for its robust feature extraction and contextual modeling abilities, has become a central focus of deep learning research; appropriately integrated with other modules, it can leverage its advantages to raise performance in various tasks. Fu K and Li H introduced FGNet, built around a new Fourier attention module [12, 13]. This module extracts frequency-domain features using the Fourier transform, combines sequence and channel characteristics through channel swapping, and integrates these features in the frequency domain to enhance feature representation. They also proposed an information interaction module called MixFormer [12, 13], which enhances feature communication by expanding and contracting dimensions, overcoming the communication barrier between sequence and channel information to improve feature expressiveness. Su Liyun and his team integrated multi-head attention mechanisms with the Broad Learning System to introduce Multi-Attn BLS, a method for forecasting chaotic time series [34]. In deep learning image processing, researchers such as SF Abbasi have presented an enhanced VGG16 algorithm for recognizing AI-generated medical images: the study synthesized 10,000 skin-lesion images using Generative Adversarial Networks (GANs), then trained the model on real photographs to enhance its classification capability [2]. The same team proposed an image encryption method that integrates an intertwining logistic map with chaotic systems for pixel obfuscation, diffusion, and dissimilarity processing, ultimately producing a ciphertext image via a grayscale substitution box (S-Box) and thereby strengthening the privacy protection of images in transmission [1]. These accomplishments offer novel perspectives and approaches.
Continuous advancements in quantum computing are driving breakthroughs in quantum machine learning. The growing capability and stability of quantum hardware provide fresh motivation for optimizing machine learning models with quantum algorithms. In the era of Noisy Intermediate-Scale Quantum (NISQ) [25] machines, VQC algorithms [14, 15, 19, 30] have gained significant attention from researchers, sitting at the forefront of efforts to explore shallow quantum algorithms and surpass traditional computational limits. In the resource-limited NISQ context, the VQC algorithm has the benefit of requiring substantially fewer parameters than other quantum computing algorithms [5]. Researchers such as Yu have proposed approaches using Quantum Long Short-Term Memory networks [40] to forecast solar radiation one hour ahead; by combining quantum computing and deep learning, this strategy produces precise predictions of solar radiation levels from historical data and expands the range of applications for VQC in deep learning. Qi J and colleagues developed an improved model called TTN-VQC [26] within quantum learning and assessed its capacity to represent target functions and generalize to unseen data. TTN-VQC performed exceptionally well in functional regression studies on the MNIST handwritten-digit classification dataset, representing the data effectively and generalizing well across the training and testing sets. The notion of quantum states provides a novel way to express intricate relationships, which is especially suitable for analyzing the complex dynamics within chaotic systems. The VQC model exploits the polymorphic encoding of qubits to model complex nonlinear relationships in high-dimensional spaces. Quantum computing shows significant promise in fields such as quantum chemistry, combinatorial optimization, and machine learning, and VQC has the potential to exceed the expressive capabilities of conventional neural networks, particularly in generative modeling and probability-distribution learning tasks; nevertheless, it requires structural optimization to realize practical quantum advantages [11]. We therefore adapt the VQC module to predict chaotic time series, aiming to improve the accuracy and reliability of the predictions and thereby the overall performance of the prediction task. This study examines the applicability of the VQC model to chaotic time series prediction and develops it further by improving its attention to detail and structural efficiency without compromising its high predictive performance.
Drawing inspiration from the use of VQC in classification problems, this study presents a novel approach that combines multi-head attention mechanisms with VQC to forecast chaotic time series. The method distills relevant information at several levels by extending the VQC model, introducing residual connections to keep the information flow coherent, and integrating multi-head attention mechanisms. Combining the strengths of quantum learning with deep learning technology, it improves forecast accuracy and the model's attention to detail, making the model better suited to the complexity and uncertainty of chaotic systems.
The main contributions of our work can be briefly described as follows:
The QMulti-Attn approach is a combination of VQC used in quantum learning and the multi-attention mechanism used in deep learning. It is designed to forecast chaotic time series.
The VQC module operates on the high-dimensional sequence that has undergone phase space projection rather than the original sequence. This approach enhances contextual information extraction from chaotic time series reconstructed in phase space. It utilizes residual connectivity to preserve information integrity, improving network stability and training efficiency.
This study presents a novel approach combining the multi-head attention mechanism with position coding to detect and filter chaotic features, remove irrelevant information, and effectively capture the reconstructed features, VQC-generated features, and their long-distance relationships. This approach aims to model the dynamic behavior of chaotic systems comprehensively.
The experiments were conducted on two simulated chaotic time series datasets, Lorenz and Rossler, and a real-world chaotic time series dataset, sea clutter. On the real-world chaotic system, the root mean square error (RMSE) of QMulti-Attn shows relative improvements of 7.46% and 3.50% over LSTM and TSMixer [7], respectively.
The rest of this paper is organized as follows: "Related works" explores the theoretical basis of the QMulti-Attn model and provides detailed explanations of the leading technologies used in the QMulti-Attn technique. "QMulti-Attn model" offers an elaborate explanation of the fundamental stages involved in implementing the QMulti-Attn paradigm. "Experiments and results" provides a comprehensive examination of the impacts of the proposed model by utilizing three experimental datasets. In "Conclusion", we will provide a summary of the research findings and discuss potential areas for future research.
Related works
This section presents a summary of the theoretical basis that supports our suggested QMulti-Attn model. The goal is to make it easier to understand the model. This encompasses the relevant ideas of Variational Quantum Classifiers and multi-head attention mechanisms.
Reconstruction of chaotic time series in multiple dimensions
Chaotic time series are produced by chaotic systems and may involve a single variable or several variables. Given the inherent intricacy of chaotic systems, scholars such as Packard [23] and Takens [35] established the theoretical basis for studying chaotic time series by developing the notion of phase space reconstruction. Phase space reconstruction converts time series data into a collection of points in a phase space: lagged coordinates are extracted from the time series and aggregated into vectors, each of which defines a position in the phase space. This transforms the original time series into a set of phase-space points, allowing the dynamic properties of the data to be uncovered.
Selecting a suitable delay time and embedding dimension is crucial for phase space reconstruction. The delay time is the interval between successive samples drawn from the time series, while the embedding dimension dictates the number of coordinates used to build each vector in phase space. Takens' embedding theorem [35] asserts that, with an appropriate delay time and embedding dimension, a low-dimensional time series can be unfolded into a high-dimensional phase space in which the dynamic properties of the original data become visible. This enables a thorough understanding and analysis of system behavior, especially when studying chaotic systems or nonlinear dynamics. For an observed chaotic time series \(\{x_1, x_2, \ldots, x_n\}\), the reconstructed phase space points are defined as:

$$X_i = [x_i,\; x_{i+\tau},\; x_{i+2\tau},\; \ldots,\; x_{i+(m-1)\tau}], \quad i = 1, 2, \ldots, M \tag{1}$$

where \(M = n - (m-1)\tau\), \(\tau\) is the delay time, and \(m\) is the embedding dimension. The reconstructed time series is further represented as

$$X = [X_1, X_2, \ldots, X_M]^{\mathrm{T}} \tag{2}$$

where \(X\) is precisely expressed as:

$$X = \begin{bmatrix} x_1 & x_{1+\tau} & \cdots & x_{1+(m-1)\tau} \\ x_2 & x_{2+\tau} & \cdots & x_{2+(m-1)\tau} \\ \vdots & \vdots & \ddots & \vdots \\ x_M & x_{M+\tau} & \cdots & x_{M+(m-1)\tau} \end{bmatrix} \tag{3}$$

Therefore, there is a mapping function \(F\) that establishes the relationship between \(X_i\) and the next observation, represented as \(x_{i+1+(m-1)\tau} = F(X_i)\). This work exclusively focuses on the single-step forecasting of chaotic time series.
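As a concrete illustration, the short sketch below (a minimal NumPy version, not the authors' code) builds the phase-space matrix of Eq. (3) and the single-step targets used for forecasting. Reserving one step for the target leaves \(M = n - (m-1)\tau - 1\) usable points, while the raw embedding yields \(n - (m-1)\tau\) points (3,936 for the paper's Lorenz settings).

```python
import numpy as np

def phase_space_reconstruct(x, tau, m):
    """Delay-coordinate embedding of a scalar series x with delay tau and
    embedding dimension m (Takens). Returns the phase-space matrix X and
    the single-step-ahead targets y."""
    n = len(x)
    M = n - (m - 1) * tau - 1                     # usable phase points, one step reserved
    X = np.stack([x[i : i + (m - 1) * tau + 1 : tau] for i in range(M)])
    y = x[(m - 1) * tau + 1 : (m - 1) * tau + 1 + M]   # next value after each window
    return X, y

# Example with the paper's Lorenz settings (tau = 16, m = 5) on a stand-in series
x = np.sin(0.1 * np.arange(4000))
X, y = phase_space_reconstruct(x, tau=16, m=5)
print(X.shape, y.shape)                           # (3935, 5) (3935,)
```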
VQC
The core principle of VQC is to encode input data into quantum states, manipulate those states with quantum gates and measurement operations to extract the characteristics essential for classification, and analyze the measurement outputs to categorize the incoming data. The VQC architecture consists of three layers: an encoding layer, a variational layer, and a measurement layer.
Encoding layer
The encoding layer of a quantum circuit converts input data into quantum states. To keep the network model simple, this research uses single-qubit rotation gate encoding, in which the rotation angles represent the different input properties. Single-qubit rotation gate encoding is distinguished by its exceptional flexibility, robust capacity for quantum state representation, minimal gate error, and easy adjustability. The encoding process relies on \(R_y\) gates and Hadamard (\(H\)) gates, which can be written as:

$$R_y(\theta) = \begin{pmatrix} \cos(\theta/2) & -\sin(\theta/2) \\ \sin(\theta/2) & \cos(\theta/2) \end{pmatrix} \tag{4}$$

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \tag{5}$$

The encoding proceeds as follows. The Hadamard gate first transforms each qubit from the ground state into a uniform superposition, as represented by Eq. (6):

$$H^{\otimes n}\, |0\rangle^{\otimes n} = \left( \frac{|0\rangle + |1\rangle}{\sqrt{2}} \right)^{\otimes n} \tag{6}$$

The classical features then set the angle parameters of the \(R_y\) gates on each qubit, yielding a quantum state \(|x\rangle = \bigotimes_{i=1}^{n} R_y(x_i)\, H\, |0\rangle\), where \(N\) is the dimension of the classical data and \(n\) is the number of qubits.
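Expressed in code, the encoding layer might look as follows. This is a minimal sketch using PennyLane; the paper does not name its quantum-simulation framework, so the library choice and the arctan feature rescaling are assumptions.

```python
import numpy as np
import pennylane as qml

n_qubits = 5                                  # one qubit per input feature
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def encode(x):
    """Eqs. (4)-(6): H places each qubit in uniform superposition,
    then RY rotates it by the corresponding classical feature."""
    for i in range(n_qubits):
        qml.Hadamard(wires=i)
        qml.RY(x[i], wires=i)
    return qml.state()                        # the encoded quantum state

features = np.arctan(np.array([0.1, 0.2, 0.3, 0.4, 0.5]))  # arctan rescaling is one common choice
print(encode(features).shape)                 # (32,) amplitudes for 5 qubits
```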
Variational layer
The variational layer is the central element of the VQC structure and carries a sequence of adjustable parameters. After a series of rotation gates and CNOT gates is applied in the variational layer, measurements are performed on the qubits using Pauli gates. The measurement outcomes, together with the outputs of the parametric single-qubit gates that accompany the multiple CNOT two-qubit gates, form the Fubini-Study metric tensor used to update the network parameters and improve the optimization of the network model. The entangling pattern of the variational layer can be represented by Eq. (7) for even \(i\) and Eq. (8) for odd \(i\), where \(\mathrm{CNOT}_{i,\,i+1}\) denotes the two-qubit gate acting on the \(i\)-th and \((i+1)\)-th qubits:

$$U_{\text{even}} = \prod_{i \in \{0, 2, 4, \ldots\}} \mathrm{CNOT}_{i,\, i+1} \tag{7}$$

$$U_{\text{odd}} = \prod_{i \in \{1, 3, 5, \ldots\}} \mathrm{CNOT}_{i,\, i+1} \tag{8}$$
Measurement layer
The measurement layer of a quantum circuit converts the states of the qubits into classical bits to obtain the final output. To retain as much information from the input data as possible, this work measures each qubit with a Pauli-\(Z\) gate, since expectation values effectively extract the useful information carried by a quantum circuit. The resulting expectation values are then used as parameters for the subsequent layer of gates, connecting consecutive quantum circuits and exploiting the measurement information within the circuit:

$$\langle \sigma_z^{(i)} \rangle = \langle 0 |^{\otimes n}\, U^{\dagger}(x, \theta)\, \sigma_z^{(i)}\, U(x, \theta)\, | 0 \rangle^{\otimes n}, \quad i = 1, \ldots, n \tag{9}$$

where \(\sigma_z\) denotes the Pauli-\(Z\) gate, \(n\) the number of qubits in the VQC, and \(\langle \sigma_z^{(i)} \rangle\) the expectation of the Pauli-\(Z\) measurement on the \(i\)-th qubit; \(U(x, \theta)\) collects the single-qubit rotation gates of the VQC encoding layer (parameterized by \(x\)) and of the variational layer (parameterized by \(\theta\)). The VQC comprises a series of identical layers of quantum circuits, as shown in Fig. 1.

Fig. 1 The structure of VQC
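Putting the three layers together, a complete VQC forward pass can be sketched as below. This is an illustrative PennyLane implementation under the assumptions already noted (library choice, and the even/odd CNOT pairing read from Eqs. (7)-(8)), not the authors' exact circuit.

```python
import numpy as np
import pennylane as qml

n_qubits, n_layers = 5, 5                     # n and L as in Table 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def vqc(x, weights):
    # Encoding layer (Eqs. 4-6): superposition, then angle encoding
    for i in range(n_qubits):
        qml.Hadamard(wires=i)
        qml.RY(x[i], wires=i)
    # Variational layers (Eqs. 7-8): trainable rotations, then CNOTs on
    # even-indexed pairs followed by odd-indexed pairs
    for l in range(n_layers):
        for i in range(n_qubits):
            qml.Rot(*weights[l, i], wires=i)
        for i in range(0, n_qubits - 1, 2):
            qml.CNOT(wires=[i, i + 1])
        for i in range(1, n_qubits - 1, 2):
            qml.CNOT(wires=[i, i + 1])
    # Measurement layer (Eq. 9): Pauli-Z expectation on every qubit
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

weights = np.random.uniform(0, 2 * np.pi, size=(n_layers, n_qubits, 3))
print(vqc(np.array([0.1, 0.2, 0.3, 0.4, 0.5]), weights))   # 5 expectation values
```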
Attention mechanism
The attention mechanism, inspired by the human visual system, enhances neural networks by enabling them to concentrate on crucial information while processing sequential input. The fundamental idea is to compare a query (\(Q\)) with a sequence of key-value pairs (\(K\)-\(V\)), producing a dynamically weighted output. \(Q\), \(K\), and \(V\) are all represented as vectors. \(Q\) is associated with a particular task and directs attention; \(K\) represents the feature representation of the data, with its form and dimensions determined by the task requirements and the network architecture; \(V\) holds the corresponding feature values used to create the output. In the self-attention mechanism, \(Q\), \(K\), and \(V\) are obtained through distinct transformations of the same input data, reflecting its self-directed character [39].

The mechanism computes the similarity between \(Q\) and \(K\), then combines \(V\) by weighted summation to produce an output vector with a comprehensive representation. This weight-based focusing improves the flexibility and efficiency of information processing and strengthens the model's capacity to capture essential elements in complex sequences or time-series data.
Scaled dot-product attention
Following the conventional self-attention formulation [8], scaled dot-product attention is computed as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^{\mathrm{T}}}{\sqrt{d_k}} \right) V \tag{10}$$

where \(Q \in \mathbb{R}^{n \times d_q}\), \(K \in \mathbb{R}^{n \times d_k}\), and \(V \in \mathbb{R}^{n \times d_v}\); \(d_q\), \(d_k\), and \(d_v\) respectively denote the dimensions of \(Q\), \(K\), and \(V\). The query and key vectors interact through dot products, producing attention scores between each query and key. The raw scores are then scaled, typically by the square root of the key dimension, to stabilize gradients and improve the numerical behavior of training. The \(\mathrm{softmax}\) function subsequently normalizes the scaled scores so that the attention weights at each position sum to 1, highlighting the relative importance of different positions. Finally, the attention output is obtained by weighting and summing the value vectors with the normalized attention weights, yielding a representation that aggregates the significant information. The arrangement of scaled dot-product attention is illustrated on the left side of Fig. 2.

Fig. 2 Multi-head attention
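Eq. (10) translates directly into a few lines of PyTorch; the sketch below is illustrative, with hypothetical tensor shapes.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Eq. (10): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # raw, scaled attention scores
    weights = F.softmax(scores, dim=-1)              # each row sums to 1
    return weights @ V

Q = torch.randn(2, 10, 16)   # (batch, sequence, d_q)
K = torch.randn(2, 10, 16)
V = torch.randn(2, 10, 32)
print(scaled_dot_product_attention(Q, K, V).shape)   # torch.Size([2, 10, 32])
```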
Multi-head self-attention
\(Q\), \(K\), and \(V\) are passed through distinct trainable linear projections into dimensions \(d_q\), \(d_k\), and \(d_v\), respectively, to capture significant semantic information at several levels. Each projected copy is treated as a separate attention head. As depicted in Fig. 2, the outputs of the individual attention heads are concatenated and reprojected to obtain the final multi-head attention value:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O}, \quad \mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q}, K W_i^{K}, V W_i^{V}) \tag{11}$$

where \(W^{O}\) is the output projection matrix and \(h\) signifies the number of heads. The arrangement of multi-head self-attention is illustrated on the right side of Fig. 2.
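For completeness, a compact multi-head self-attention module implementing Eq. (11) might look as follows: a standard Transformer-style sketch with assumed dimensions, not the paper's exact layer.

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Eq. (11): h parallel heads, concatenated and reprojected by W^O."""
    def __init__(self, d_model=64, h=8):
        super().__init__()
        assert d_model % h == 0
        self.h, self.d_head = h, d_model // h
        self.W_q = nn.Linear(d_model, d_model)   # per-head projections, fused
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)   # output projection W^O

    def forward(self, x):
        B, T, _ = x.shape
        # project and split into h heads: (B, h, T, d_head)
        q, k, v = (W(x).view(B, T, self.h, self.d_head).transpose(1, 2)
                   for W in (self.W_q, self.W_k, self.W_v))
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        out = scores.softmax(dim=-1) @ v
        out = out.transpose(1, 2).reshape(B, T, -1)  # concatenate the heads
        return self.W_o(out)

attn = MultiHeadSelfAttention(d_model=64, h=8)       # Head = 8 as in the paper
print(attn(torch.randn(4, 20, 64)).shape)            # torch.Size([4, 20, 64])
```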
QMulti-Attn model

This section presents the motivation for, and the framework of, the QMulti-Attn model for predicting chaotic time series.
Model motivation
Chaotic time series possess distinctive and intricate qualities such as chaotic mechanisms, self-similarity, fractal properties, nonlinear determinism, and high disorderliness. Considering these properties, constructing models for chaotic time series prediction is especially tough and complex. When addressing prediction problems involving large-scale chaotic time series, it is essential to take into account the following modeling components:
Characterizing and restoring the state of chaotic dynamical systems is extremely difficult: these systems display intricate and unpredictable dynamic behaviors because of their nonlinearity and extreme sensitivity to initial conditions. This work uses the C–C approach [6], which is based on phase space reconstruction, to uncover the dynamic structure of chaotic systems. By carefully choosing the delay time τ and embedding dimension m, the original time series is transformed into trajectories in a high-dimensional space, enabling us to investigate the evolutionary processes and behavioral characteristics of chaotic systems through their phase space trajectories.
Nonlinear mapping and multi-level semantic information extraction: The time series trajectories acquired by reconstructing the phase space provide insights into the distinct characteristics of chaotic systems, including the arrangement of singular points, bifurcation phenomena, and chaotic attractors. This study creatively incorporates a multi-head attention mechanism into VQC, drawing inspiration from using quantum learning in categorization problems. This fusion method effectively captures chaotic characteristics from the reconstructed sequences while preserving the original structure's integrity, resulting in outstanding performance and improved model interpretability.
QMulti-Attn model framework
We introduce the framework of our QMulti-Attn model, which effectively analyzes and predicts the intricate dynamics of chaotic systems. This is accomplished through three primary stages: employing phase space reconstruction, utilizing VQC to capture chaotic characteristics, and extracting high-level semantic information using a multi-head attention mechanism.
Using phase space reconstruction theory [35] and the C–C approach, we recover the state of chaotic systems by selecting a suitable delay time τ and embedding dimension m. This converts the original chaotic time series into a more predictable form X.
Algorithm 1 C–C Method for Phase Space Reconstruction
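The listing of Algorithm 1 is not reproduced in this extraction. As an illustration of the C–C idea, the sketch below estimates the delay time from the averaged \(\Delta S(t)\) statistic (Kim-Eykholt-Salas construction). It is deliberately simplified (fixed radii, capped subseries length, a small set of embedding dimensions), so treat it as a rough sketch of the method rather than the authors' exact procedure.

```python
import numpy as np

def cc_delay(x, max_t=30, m_list=(2, 3, 4, 5), n_r=4):
    """Simplified C-C statistic: the delay time is taken at the first
    local minimum of the averaged Delta-S(t) curve."""
    x = (x - x.mean()) / x.std()                   # standardize; sigma = 1
    r_vals = [0.5 * (k + 1) for k in range(n_r)]   # radii 0.5s .. 2.0s

    def corr_integral(s, m, r):
        """Correlation integral C(m, r) of a subseries with unit delay."""
        s = s[:400]                                # cap length: O(M^2) distance matrix
        M = len(s) - m + 1
        if M < 2:
            return 0.0
        pts = np.stack([s[i:i + m] for i in range(M)])
        d = np.max(np.abs(pts[:, None, :] - pts[None, :, :]), axis=-1)
        return float((d[np.triu_indices(M, 1)] < r).mean())

    delta_S = []
    for t in range(1, max_t + 1):
        subs = [x[i::t] for i in range(t)]         # t disjoint subseries
        S = np.array([[np.mean([corr_integral(s, m, r) - corr_integral(s, 1, r) ** m
                                for s in subs])
                       for r in r_vals] for m in m_list])
        delta_S.append(float(np.mean(S.max(axis=1) - S.min(axis=1))))
    for t in range(1, len(delta_S) - 1):           # first local minimum
        if delta_S[t] < delta_S[t - 1] and delta_S[t] <= delta_S[t + 1]:
            return t + 1
    return int(np.argmin(delta_S)) + 1
```

Running `tau = cc_delay(x)` on the standardized series fixes \(\tau\); in the full method, the embedding dimension is chosen analogously from the \(S_{cor}(t)\) statistic.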
This stage not only establishes a strong basis for further detailed analysis but also improves the model's comprehension of the inherent dynamic characteristics of chaotic time series.
Given the intricate nonlinear dynamics of chaotic systems, we apply the VQC to the reconstructed time series:

$$F = \mathrm{VQC}_{L}(X) \tag{12}$$

where \(X\) is the phase-space reconstructed input and \(\mathrm{VQC}_{L}\) is a collection of \(L\) identical layers of quantum circuits stacked together. We choose the number of qubits \(n\) to match the embedding dimension \(m\). The VQC exploits quantum superposition and quantum parallelism to evaluate the chaotic relationships within the sequence and extract chaotic features. The features generated by the VQC are then combined with the original reconstructed data through a Hadamard product and concatenated with it:

$$\tilde{F} = F \odot X \tag{13}$$

$$Z = \mathrm{Concat}(X, \tilde{F}) \tag{14}$$

It is essential to note that we add positional embeddings (PE) to the concatenated features \(Z\) before passing them to the multi-head attention layer. By reintroducing the phase space reconstruction features, this residual design improves the model's capacity to detect essential characteristics while preserving the structure of the phase space reconstruction data. The VQC layer enriches the fusion of chaotic attributes with quantum properties, expanding the representational dimensions of the data.
We combine the fused features with the original data to create a comprehensive input vector, which is fed to the multi-head attention mechanism of the QMulti-Attn model. The computation is as follows:

$$Q = Z W^{Q}, \quad K = Z W^{K}, \quad V = Z W^{V} \tag{15}$$

where \(W^{Q}\), \(W^{K}\), and \(W^{V}\) are the learnable parameters of three linear projection layers, and \(d_q\), \(d_k\), and \(d_v\) denote the dimensions of \(Q\), \(K\), and \(V\), respectively.

$$\mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q}, K W_i^{K}, V W_i^{V}) \tag{16}$$

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O} \tag{17}$$

where \(W^{O}\) is the output projection matrix and \(h\) represents the number of attention heads. The multi-head attention mechanism enhances the extraction of intricate spatiotemporal relationships in the sequence, including linear correlations and nonlinear determinism, improving both the overall prediction accuracy and the interpretability of the model. Figure 3 illustrates the proposed QMulti-Attn framework.

Fig. 3 Framework of the QMulti-Attn model
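To make the pipeline concrete, here is a hedged end-to-end sketch of the forward pass: VQC feature extraction (Eq. 12), Hadamard-product fusion (Eq. 13), residual concatenation (Eq. 14), positional embeddings, and multi-head attention (Eqs. 15-17), followed by a linear single-step read-out. It substitutes PennyLane's `StronglyEntanglingLayers` template for the paper's exact even/odd CNOT pattern, and the model width `d_model = 64` and the read-out head are illustrative assumptions.

```python
import torch
import torch.nn as nn
import pennylane as qml

m, L, d_model, heads = 5, 5, 64, 8             # embedding dim, VQC layers, width, heads
dev = qml.device("default.qubit", wires=m)

@qml.qnode(dev)
def qcircuit(inputs, weights):
    for i in range(m):
        qml.Hadamard(wires=i)                              # superposition
    qml.AngleEmbedding(inputs, wires=range(m), rotation="Y")   # encoding layer
    qml.StronglyEntanglingLayers(weights, wires=range(m))      # variational layers
    return [qml.expval(qml.PauliZ(i)) for i in range(m)]       # measurement layer

class QMultiAttn(nn.Module):
    def __init__(self):
        super().__init__()
        shapes = {"weights": qml.StronglyEntanglingLayers.shape(n_layers=L, n_wires=m)}
        self.vqc = qml.qnn.TorchLayer(qcircuit, shapes)
        self.embed = nn.Linear(1, d_model)                 # lift scalars to d_model
        self.pe = nn.Parameter(torch.zeros(2 * m, d_model))
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.out = nn.Linear(2 * m * d_model, 1)           # single-step forecast head

    def forward(self, X):                                  # X: (batch, m) phase points
        F = self.vqc(X) * X                                # Eq. 13: Hadamard-product fusion
        Z = torch.cat([X, F], dim=-1)                      # Eq. 14: residual concatenation
        Z = self.embed(Z.unsqueeze(-1)) + self.pe          # add positional embedding
        A, _ = self.attn(Z, Z, Z)                          # Eqs. 15-17
        return self.out(A.flatten(1))

model = QMultiAttn()
print(model(torch.rand(8, m)).shape)                       # torch.Size([8, 1])
```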
Experiments and results
This work used two commonly adopted simulated chaotic time series datasets, Lorenz and Rossler, and a real-world dataset of sea clutter chaotic time series. Each initial dataset comprised 10,000 values; to guarantee the existence of chaotic properties and exclude potential non-stationarity and noise influences, we removed the initial and final 3,000 samples and kept only the middle 4,000 samples as the primary dataset for further experiments. Of these, the first 3,000 values were used for training, with 70% allocated to the training set proper, and the remaining 1,000 values were designated as test samples.
Experimental setup
Datasets
This study involved a comprehensive experimental analysis of three distinct chaotic time series datasets. Below are precise explanations of each dataset and the corresponding experimental parameters.
Lorenz chaotic time series [21], formulated as:

$$\frac{dx}{dt} = \sigma (y - x), \quad \frac{dy}{dt} = x(\rho - z) - y, \quad \frac{dz}{dt} = x y - \gamma z \tag{18}$$

where \(\dot{x}\), \(\dot{y}\), \(\dot{z}\) denote the derivatives of \(x\), \(y\), and \(z\) with respect to time, and \(\sigma\), \(\rho\), and \(\gamma\) are system parameters that determine the behavior of the Lorenz system. When \(\sigma = 10\), \(\rho = 28\), \(\gamma = 8/3\), the Lorenz system exhibits chaotic behavior. We employed the Runge–Kutta method to produce 10,000 data points for the variables \(x\), \(y\), and \(z\). We chose \(x(t)\) as the experimental target and excluded the initial and final 3,000 sample points, keeping the remaining 4,000 as the chaotic time series. Using the C–C approach, the delay time was set to 16 and the embedding dimension to 5. After phase space reconstruction, we obtained 3,936 phase points. The first 3,000 sample points were allocated for training and the remaining points for testing; the training data was then randomly partitioned into training and validation sets at a ratio of 7:3.
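The simulated data can be generated as in the following sketch, which integrates the Lorenz equations with SciPy's adaptive Runge-Kutta solver; the integration horizon and initial condition are assumptions, as the paper specifies only the parameters and sample counts.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, rho=28.0, gamma=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - gamma * z]

# Integrate with an adaptive Runge-Kutta scheme (RK45), sampling 10,000 points
t_eval = np.linspace(0, 100, 10_000)
sol = solve_ivp(lorenz, (0, 100), [1.0, 1.0, 1.0], t_eval=t_eval, method="RK45")
x = sol.y[0]                 # the x(t) component is the prediction target
x = x[3000:-3000]            # drop 3,000 samples at each end, keeping 4,000
print(x.shape)               # (4000,)
```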
Rossler chaotic time series [27], formulated as:

$$\frac{dx}{dt} = -y - z, \quad \frac{dy}{dt} = x + a y, \quad \frac{dz}{dt} = b + z(x - c) \tag{19}$$
Using the same approach as for the Lorenz system, the parameters are \(a = 0.2\), \(b = 0.2\), \(c = 5.9\). We created 4,000 sample points and chose \(x(t)\) as the experimental target. The delay time for the Rossler chaotic time series was set to 38 and the embedding dimension to 5, determined with the C–C method. Following phase space reconstruction, we obtained 3,848 phase points. The same partitioning procedure as for the Lorenz system produced the training, validation, and test sets.
Sea Clutter chaotic time series [18]
In addition, to verify the model's accuracy in real-world chaotic systems, we analyzed sea clutter data collected in 1993 by the IPIX radar at McMaster University, Canada. The processing strategy was identical to that used for the two simulated chaotic time series datasets. The delay time was set to 16 and the embedding dimension to 5, determined with the C–C approach, and 3,936 phase points were recovered after phase space reconstruction. The initial 3,000 sample points were randomly partitioned into training and validation sets at a ratio of 7:3, and the remaining points were used as the test set.
Implementation setup
Tables 1 and 2 present the crucial parameter settings and experimental conditions. Table 2 defines the variables used in the QMulti-Attn model: lr refers to the learning rate, batch size to the number of samples per training batch, \(n\) to the number of qubits in the VQC, and \(L\) to the number of stacked variational quantum circuit layers.
Table 1. Experiment environments
Description | Detailed configuration |
---|---|
CUDA | 12.3 |
Python | 3.11.5 |
Pytorch | 2.1.0 |
GPU | RTX 4070 |
Table 2. Parameter settings of experiments

 | lr | Batch size | Heads | n | L
---|---|---|---|---|---
Lorenz | 0.00022 | 50 | 8 | 5 | 5
Rossler | 0.0002 | | | |
Sea Clutter | 0.000065 | | | |

Blank cells share the value shown in the Lorenz row.
Evaluation metrics
The primary evaluation metrics were RMSE, RMSPE, MAE, and MAPE, which together assess prediction error, including sensitivity to extremely high and low values. They are defined as follows:
1) Root-Mean-Square Error (RMSE)

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \tag{20}$$

2) Root-Mean-Square Percentage Error (RMSPE)

$$\mathrm{RMSPE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i - \hat{y}_i}{y_i} \right)^{2}} \times 100\% \tag{21}$$

3) Mean Absolute Error (MAE)

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \tag{22}$$

4) Mean Absolute Percentage Error (MAPE)

$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\% \tag{23}$$

where \(N\) is the number of samples in the test data, \(y_i\) denotes the actual target value, and \(\hat{y}_i\) the predicted value.
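A direct NumPy transcription of Eqs. (20)-(23) is given below (a small helper sketch, assuming the targets contain no zeros for the percentage metrics):

```python
import numpy as np

def metrics(y_true, y_pred):
    """RMSE, RMSPE, MAE, MAPE as in Eqs. (20)-(23); percentage errors in %."""
    err = y_true - y_pred
    rel = err / y_true                      # assumes y_true has no zeros
    return {
        "RMSE":  np.sqrt(np.mean(err ** 2)),
        "RMSPE": np.sqrt(np.mean(rel ** 2)) * 100,
        "MAE":   np.mean(np.abs(err)),
        "MAPE":  np.mean(np.abs(rel)) * 100,
    }

print(metrics(np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.2])))
```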
Comparison with state-of-the-art models

To assess the efficacy of the QMulti-Attn model, this study utilized the following comparative models for performance evaluation:
TSMixer: A time series mixer architecture based on a multilayer perceptron.
LSTM: A neural network architecture incorporating gating mechanisms to control information flow, effectively capturing long-term dependencies.
RNN: Comprising two layers of simple recurrent neural network units followed by a dense connection layer.
VQC: A combination of a variational quantum circuit and a multilayer perceptron.

Before comparing models, Table 3 characterizes the three datasets with nonlinear dynamics metrics.
Table 3. The table presents the outcomes of evaluating all datasets based on the three nonlinear dynamics metrics: Correlation Dimension, Kolmogorov entropy, and Hurst Exponent
 | Correlation dimension | Kolmogorov entropy | Hurst exponent
---|---|---|---
Lorenz | 0.9985 | 0.1474 | 0.8332 |
Rossler | 0.9796 | 0.0291 | 0.9406 |
Sea clutter | 0.7241 | 0.9336 | 0.7451 |
Both Lorenz and Rossler exhibit correlation dimensions near 1, signifying that these two archetypal chaotic systems possess the intricate, low-dimensional attractor structures characteristic of low-dimensional chaos. Conversely, the lower correlation dimension of Sea Clutter suggests comparatively simpler dynamical behavior and a limited phase-space dimension for reconstruction. The Kolmogorov entropy of Sea Clutter significantly exceeds that of Lorenz and Rossler, indicating a greater rate of information production in the Sea Clutter sequence and a higher level of chaos in the system. The Rossler system has minimal entropy, signifying local chaotic behavior within a predominantly deterministic overall evolution. All three sequences have Hurst exponents above 0.7, signifying pronounced long-range positive correlation. Lorenz and Rossler exemplify typical low-dimensional chaotic systems with intricate dynamics and a robust trend; their lower entropy suggests they are chaotic yet evolve in a stable pattern. Sea Clutter, by contrast, shows greater uncertainty and reduced structural complexity while maintaining a discernible trend, implying a chaotic system that combines predictable structure with increased noise, in line with its physical characteristics.
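For reference, invariants of this kind can be estimated with off-the-shelf tools. The sketch below uses the third-party `nolds` package, with sample entropy standing in for Kolmogorov entropy; the paper does not state which estimators it used, so both the library and the proxy are assumptions.

```python
import numpy as np
import nolds  # third-party nonlinear-dynamics estimators; an assumed choice

# stand-in series; in practice, use the 4,000-sample chaotic sequence
x = np.sin(np.linspace(0, 60, 4000)) + 0.1 * np.random.rand(4000)

D2 = nolds.corr_dim(x, emb_dim=5)   # correlation dimension (Grassberger-Procaccia)
K2 = nolds.sampen(x)                # sample entropy, a common proxy for Kolmogorov entropy
H = nolds.hurst_rs(x)               # Hurst exponent via rescaled-range analysis
print(f"D2 = {D2:.4f}, K ~ {K2:.4f}, H = {H:.4f}")
```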
The tuning experiments indicate that a model with Head = 8 consistently attains optimal or near-optimal MAE, RMSE, and other key error metrics across the three representative chaotic datasets: the Lorenz system, the Rossler system, and the Sea Clutter data. On Lorenz and Rossler, Head = 8 yields the smallest errors; on Sea Clutter, the RMSE for Head = 8 approaches that of Head = 16, with the error varying only gradually. Although Head = 16 offers greater expressiveness, its errors increase on several datasets, suggesting dispersed information and heightened training difficulty. Head = 8 thus achieves a good balance between accuracy and efficiency, making it a sensible choice for trading off performance against computational cost. The Transformer architecture commonly employs eight attention heads and has demonstrated robust performance across many workloads, so adopting Head = 8 throughout also preserves consistency with the classical framework, enables side-by-side comparisons with other models and studies, and improves the reproducibility and generality of the model. In conclusion, considering experimental effect, computational efficiency, and general applicability, setting the number of attention heads to 8 strikes an appropriate balance between practical efficacy and engineering feasibility, ensuring stable and efficient predictive performance across the various chaotic systems.
This study comprehensively examined the performance of the QMulti-Attn model and several other models on the prediction task, with the experimental results compared in Fig. 4 and Table 4. The main plot presents the complete time series, illustrating the predictive performance of the models over multiple cycles, while two insets provide more detailed views. The inset in the top-left corner zooms into time steps 0 to 500, highlighting the disparities between the models' predictions and the data within this window; the inset in the bottom-right corner examines time steps 400 to 410, showcasing the nuanced differences in the models' predictions at these particular steps. The results demonstrate that the QMulti-Attn model captures the data trends outstandingly well and keeps the discrepancies between predicted and actual values within a minimal margin of error. Despite showing some learning capability during training, however, the pure quantum learning model VQC falls short in accurately forecasting peak and valley values in the prediction sequence, highlighting its limits in handling complicated time series data. This implies that, in certain prediction scenarios, the pure quantum learning model must be integrated with additional approaches or algorithms to enhance its performance.
Fig. 4 Comparative prediction curves on the Lorenz time series
Table 4. Comparative outcomes of the three tests, with the superior metrics in bold

Model | Lorenz (MAE/MAPE/RMSE/RMSPE) | Rossler (MAE/MAPE/RMSE/RMSPE) | Sea clutter (MAE/MAPE/RMSE/RMSPE)
---|---|---|---
TSMixer | 0.0186/0.1682/0.0279/0.4101 | 0.0087/0.0413/0.0121/0.2032 | 0.1011/0.6590/0.1286/0.8118 |
RNN | 0.0034/0.0446/0.0048/0.2112 | 0.0009/0.0071/0.0011/0.0843 | 0.1201/0.8372/0.1553/0.9150 |
LSTM | 0.0136/0.1533/0.0261/0.3915 | 0.0088/0.0295/0.0156/0.1718 | 0.1050/0.6696/0.1341/0.8183 |
VQC | 0.1230/1.5828/0.1475/1.2581 | 0.0167/0.1021/0.0242/0.3195 | 0.1198/0.9321/0.1517/0.9655 |
QMulti-Attn | 0.0027/0.0209/0.0036/0.1446 | 0.0004/0.0024/0.0004/0.0491 | 0.0980/0.6509/0.1241/0.8068 |
MAPE and RMSPE are expressed as percentages (%) in all measures
Bold values indicate better results than other prediction methods
Based on Table 4, the QMulti-Attn model demonstrates superior performance to models such as TSMixer and LSTM across all prediction error metrics (RMSE, RMSPE, MAE, and MAPE) on the Lorenz dataset, achieving notably lower prediction errors than the non-quantum learning models, particularly LSTM and TSMixer. By incorporating the multi-head attention mechanism, the QMulti-Attn model improves upon the VQC quantum learning model through efficient extraction of global information and multi-level semantic information from chaotic time series, and thus achieves superior prediction performance. These findings confirm the efficacy of the QMulti-Attn model and the soundness of its design architecture.
In addition, we investigated how the number of heads in the multi-head attention mechanism affects the predictive accuracy of chaotic time series. Table 5 demonstrates that the Lorenz system has a high level of sensitivity to variations in the number of heads in the multi-head attention mechanism. The model's performance is optimal when the number of self-attention heads is set to 8. As the number of heads increases or decreases, the prediction error measures (such as MAE, MAPE, RMSE, and RMSPE) consistently show an upward trend.
Table 5. Effect of the number of attention heads on MAE/MAPE/RMSE/RMSPE for the Lorenz, Rossler, and Sea Clutter test sets

Head | Lorenz (MAE/MAPE/RMSE/RMSPE) | Rossler (MAE/MAPE/RMSE/RMSPE) | Sea clutter (MAE/MAPE/RMSE/RMSPE)
---|---|---|---
2 | 0.0110/0.1924/0.0124/0.4386 | 0.0072/0.0548/0.0075/0.2341 | 0.0996/0.6964/0.1270/0.8345 |
4 | 0.0056/0.1170/0.0065/0.3420 | 0.0020/0.0084/0.0028/0.0919 | 0.1039/0.7305/0.1332/0.8547 |
8 | 0.0027/0.0209/0.0036/0.1446 | 0.0004/0.0024/0.0004/0.0491 | 0.0980/0.6509/0.1241/0.8068 |
16 | 0.0053/0.0520/0.0079/0.2280 | 0.0041/0.0292/0.0046/0.1709 | 0.0976/0.6804/0.1231/0.8249 |
Bold values indicate better results than the other head numbers
Combining the multi-head attention mechanism model with VQC shows exceptional performance in predicting chaotic time series. This is due to the numerous advantages of quantum computing, such as its potential for nonlinear feature extraction, strong generalization abilities, and excellent adaptability. Compared to traditional machine learning methods, the VQC model outperforms them regarding predictive accuracy. Furthermore, efficient techniques for extracting features, such as multi-head attention processes, play a crucial role in dealing with intricate, non-linear, and high-dimensional data. Hence, the predictive technique suggested in this research demonstrates outstanding accuracy in forecasting chaotic time series of the Lorenz system.
This study also thoroughly examined the efficacy of the QMulti-Attn model in forecasting Rossler chaotic sequences. Figure 5 and the data in Table 4 show that the QMulti-Attn model exhibits exceptional predictive capability, with small prediction errors and close agreement with the actual values. The model's MAE, MAPE, RMSE, and RMSPE values are 0.0004, 0.0024, 0.0004, and 0.0491, respectively, the lowest among all compared models.
Fig. 5 Comparative prediction curves on the Rossler time series
In contrast to the Lorenz system, the time series of the Rossler system exhibits fewer peak and valley points, which simplifies its prediction and improves the performance of all models. The pure quantum learning model VQC performs on the Rossler chaotic time series much as it does on the Lorenz system: it learns only the general trend without achieving precise predictions. This outcome highlights the difficulty of pure quantum learning for time series prediction in the absence of structures such as the multi-head attention mechanism. It should be emphasized that VQC was primarily designed for classification problems; its design and training methods may not suit time series prediction, which requires handling temporal dependency and data continuity, and these factors create challenges when VQC is applied directly to time series prediction.
However, our experimental findings also show that integrating VQC with the multi-head attention mechanism can significantly enhance performance, resulting in a considerable improvement in prediction accuracy compared to the VQC model without this mechanism. This confirms the efficacy of the multi-head attention mechanism in improving VQC's capability to address time series prediction difficulties.
In addition, we examined how varying the number of self-attention heads affects the performance of the QMulti-Attn model. Table 5 shows that as the number of self-attention heads increases from 2 to 8, the model's MAE, MAPE, RMSE, and RMSPE all decline, further confirming the efficacy of the multi-head self-attention mechanism. These findings highlight the effectiveness and precision of our prediction method for chaotic time series of the Rossler system.
This study also investigates the efficacy of the QMulti-Attn model on real-world Sea Clutter data. The results in Fig. 6 demonstrate that the model effectively forecasts both the fine detail and the general patterns of the sea clutter data, indicating strong predictive capacity. Its reduced prediction errors and high accuracy further confirm its ability to model the intricate, nonlinear, chaotic dynamics of the Sea Clutter system.
Fig. 6 Comparative prediction curves on the Sea Clutter time series
The study also examined the influence of varying numbers of self-attention heads on the predictive accuracy of the QMulti-Attn model for this dataset. The findings indicate that changing the number of heads had minimal impact on prediction performance: increasing the number of heads to 16 reduced MAE and RMSE slightly but did not significantly improve the other indicators, MAPE and RMSPE. This implies that the chaotic characteristics of sea clutter are extremely intricate, and merely increasing the number of attention heads may not significantly enhance predictive ability.
The work emphasizes the significance of deep learning approaches for extracting characteristics from chaotic time series data such as sea clutter. Moreover, it highlights the potential of integrating VQC with multi-head attention mechanisms, including the study of how increasing the number of attention heads affects performance (Table 6).
Table 6. An ablation study was conducted on the VQC and Multi-Head Attention (Multi-Attn) models using three chaotic time series datasets
Model | VQC | Multi-Attn | Lorenz | Rossler | Sea Clutter |
---|---|---|---|---|---|
QMulti-Attn | √ | √ | 0.0036 | 0.0004 | 0.1241 |
QMulti-Attn | √ | × | 0.1475 | 0.0242 | 0.1522 |
QMulti-Attn | × | √ | 0.0042 | 0.0006 | 0.1284 |
In our QMulti-Attn model, we conduct separate ablations on VQC and Multi-Head Attention modules. RMSE metrics have been reported
Bold values indicate better results than other ablation methods
This study further investigated the performance of the QMulti-Attn model through ablation experiments focused on the quantum computing component (VQC) and the multi-head attention mechanism (Multi-Attn). With both VQC and Multi-Attn engaged, the model achieved its best results: an RMSE of 0.0036 on the Lorenz dataset, 0.0004 on the Rossler dataset, and 0.1241 on the sea clutter dataset. The model thus maintained consistently low error rates on all three datasets, with especially strong performance on the simulated Lorenz and Rossler data.
When only VQC was activated, without Multi-Attn, performance degraded on all three datasets; the most substantial degradation occurred on the Lorenz and Rossler datasets, where the RMSE rose to 0.1475 and 0.0242, respectively. This highlights the essential role of Multi-Attn in the model's overall performance, particularly on these two categories of data.
When only Multi-Attn was used, the model's performance slightly improved on the Lorenz and Rossler datasets compared to the prior scenario. However, it still significantly fell behind the performance achieved when VQC and Multi-Attn were used together. This finding demonstrates that while Multi-Attn has a beneficial impact on improving model performance, its effectiveness is constrained when VQC support is unavailable.
This ablation study emphasizes the crucial roles of VQC and Multi-Attn in improving the model's capacity to handle intricate datasets. The QMulti-Attn model greatly enhances processing performance across different datasets, excelling particularly in the analysis of complex data such as the Lorenz and Rossler datasets.
Conclusion
This work presents a novel approach to forecasting chaotic time series by integrating VQC with the multi-head attention mechanism. The QMulti-Attn model operates on the multi-dimensional sequence obtained by projecting chaotic time series data into phase space. The VQC module first extracts the rich contextual information in the phase-space reconstructed chaotic time series. Residual concatenation then reconnects the phase-space reconstructed features, preserving the integrity of the reconstructed information, streamlining the training of the deep network, and enhancing its stability. Afterward, a multi-head attention mechanism with positional coding precisely detects and separates significant chaotic characteristics while disregarding irrelevant information; the attention mechanism captures the phase space reconstruction features, the VQC-generated features, and long-distance dependencies, allowing comprehensive and accurate modeling of the dynamic behavior of chaotic systems across multiple critical positions in the sequence. Our method has been extensively validated experimentally. Incorporating the VQC module into the multi-head attention model improves the prediction performance (RMSE) by 14.29%, 33.33%, and 3.35% on Lorenz, Rossler, and Sea Clutter, respectively. The experimental findings indicate that the QMulti-Attn model is more effective than VQC alone or the standalone multi-attention mechanism, and it outperformed LSTM and the state-of-the-art TSMixer model in experiments on all three datasets, as substantiated by the reported accuracy measures. This study offers a comprehensive investigation of the combined impact of quantum computing and attention mechanisms on processing complex, nonlinear, high-dimensional data.
It provides a solid theoretical basis and practical recommendations for understanding and effectively exploiting the integration of these advanced technologies. The empirical findings show that the QMulti-Attn model performs commendably on the Lorenz and Rossler systems, underscoring its capability in chaotic time series tasks. The results suggest that the model is applicable to financial market research, weather forecasting, and other highly nonlinear and complex dynamic domains, offering enhanced accuracy and reliability for forecasting and decision-making in the relevant industries.
The study's shortcomings mostly stem from the intrinsic disparities between quantum and conventional computing paradigms. Quantum computing processes cannot be simply and equivalently translated into classical FLOPs metrics, and simulating quantum processes on classical computers incurs exponential resource overheads: for \(n\) qubits the simulation requires memory on the order of \(2^n\) amplitudes, and once the qubit count grows large the calculation time for a single forward propagation surpasses 24 h, making it difficult to demonstrate the prospective efficiency benefits of quantum computing. The convergence mechanism of quantum optimization also differs markedly from that of classical methods. Quantum gradient descent exploits the quantum tunneling effect for parameter updates, with a convergence efficiency positively correlated with the decoherence time of the qubits and negatively correlated with the error rate of the quantum gates. These critical quantities can only be evaluated on genuine quantum devices and are presently difficult to ascertain accurately through simulation.
Furthermore, a promising avenue for research is optimizing the co-design of quantum circuits and classical neural networks. By implementing meticulous design, we can maximize the benefits of both paradigms while reducing their drawbacks, hence boosting the performance and versatility of the models. This hybrid quantum–classical technique is anticipated to advance the development of chaotic time series prediction technology and potentially unlock novel solutions for a broader spectrum of complex system issues.
Acknowledgements
This paper was supported by the Natural Science Foundation of Chongqing in China (Grant No. CSTB2023NSCQ-MSX0374, Grant No. CSTB2023NSCQ-LZX0048) and Program for Chongqing Scholars and Innovative Research Team in University Support from Chongqing intelligent finance research collaborative innovation team.
Data availability
Data will be made available on request.
Declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Abbasi SF, Ahmad J, Khan JS, Khan MA, Sheikh SA (2019) Visual meaningful encryption scheme using intertwinning logistic map. In: Intelligent computing: proceedings of the 2018 computing conference, vol 2
2. Abbasi SF, Bilal M, Mukherjee T, Churm J, Pournik O, Epiphaniou G, Arvanitis TN (2024) Deep learning-based synthetic skin lesion image classification. In: Digital health and informatics innovations for sustainable health care systems. IOS Press, pp 1145–1150
3. Babu CN, Reddy BE (2014) A moving-average filter based hybrid ARIMA–ANN model for forecasting time series data. Appl Soft Comput 23:27–38
4. Božić M, Stojanović M, Stajić Z, Floranović N (2013) Mutual information-based inputs selection for electric load time series forecasting. Entropy 15
5. Bravo-Prieto C, Lumbreras-Zarapico J, Tagliacozzo L, Latorre JI (2020) Scaling of variational quantum circuit depth for condensed matter systems. Quantum 4:272
6. Chen CP, Liu Z, Feng S (2018) Universal approximation capability of broad learning system and its structural variations. IEEE Trans Neural Netw Learn Syst 30
7. Chen SA, Li CL, Yoder N, Arik SO, Pfister T (2023) TSMixer: an all-MLP architecture for time series forecasting. arXiv preprint arXiv:2303.06053
8. Corbetta M, Shulman GL (2002) Control of goal-directed and stimulus-driven attention in the brain. Nat Rev Neurosci 3
9. Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
10. Ding J, Han L, Chen X (2010) Time series AR modeling with missing observations based on the polynomial transformation. Math Comput Model 51
11. Du Y, Hsieh M-H, Liu T, Tao D (2020) Expressive power of parametrized quantum circuits. Phys Rev Res 2
12. Fu K, Li H, Bai Y (2024) MixFormer: an improved self-attention architecture applied to multivariate chaotic time series prediction. Expert Syst Appl 241:122484
13. Fu K, Li H, Shi X (2024) An encoder–decoder architecture with Fourier attention for chaotic time series multi-step prediction. Appl Soft Comput 156:111409
14. Gard BT, Zhu L, Barron GS, Mayhall NJ, Economou SE, Barnes E (2020) Efficient symmetry-preserving state preparation circuits for the variational quantum eigensolver algorithm. npj Quantum Inf 6
15. Griol-Barres I, Milla S, Cebrián A, Mansoori Y, Millet J (2021) Variational quantum circuits for machine learning. An application for the detection of weak signals. Appl Sci 11
16. Gu Z, Xu Y (2021) Chaotic dynamics analysis based on financial time series. Complexity 2021
17. Guijo-Rubio D, Durán-Rosal AM, Gómez-Orellana AM, Fernández JC (2023) An evolutionary artificial neural network approach for spatio-temporal wave height time series reconstruction. Appl Soft Comput 146:110647
18. Haykin S (2001) The Dartmouth database of IPIX radar
19. He Z, Li L, Zheng S, Li Y, Situ H (2021) Variational quantum compiling with double Q-learning. New J Phys 23
20. Khashei M, Bijari M (2011) A novel hybridization of artificial neural networks and ARIMA models for time series forecasting. Appl Soft Comput 11
21. Lorenz EN (1963) Deterministic nonperiodic flow. J Atmos Sci 20
22. McGovern A, Rosendahl DH, Brown RA, Droegemeier KK (2011) Identifying predictive multi-dimensional time series motifs: an application to severe weather prediction. Data Min Knowl Disc 22:232–258
23. Packard NH, Crutchfield JP, Farmer JD, Shaw RS (1980) Geometry from a time series. Phys Rev Lett 45
24. Pérez G, Cerdeira HA (1995) Extracting messages masked by chaos. Phys Rev Lett 74
25. Preskill J (2018) Quantum computing in the NISQ era and beyond. Quantum 2:79
26. Qi J, Yang C-HH, Chen P-Y, Hsieh M-H (2023) Theoretical error performance analysis for variational quantum circuit based functional regression. npj Quantum Inf 9
27. Rosso OA, Larrondo H, Martin MT, Plastino A, Fuentes MA (2007) Distinguishing noise from chaos. Phys Rev Lett 99
28. Sanderson K (2023) GPT-4 is here: what scientists think. Nature 615
29. Sangiorgio M, Dercole F (2020) Robustness of LSTM neural networks for multi-step forecasting of chaotic time series. Chaos Solitons Fractals 139:110045
30. Schuld M, Bocharov A, Svore KM, Wiebe N (2020) Circuit-centric quantum classifiers. Phys Rev A 101
31. Sherstinsky A (2020) Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys D 404:132306
32. Su L-Y (2010) Prediction of multivariate chaotic time series with local polynomial fitting. Comput Math Appl 59
33. Su L, Ling X (2020) Estimating weak pulse signal in chaotic background with Jordan neural network. Complexity 2020
34. Su L, Xiong L, Yang J (2023) Multi-Attn BLS: multi-head attention mechanism with broad learning system for chaotic time series prediction. Appl Soft Comput 132:109831
35. Takens F (2006) Detecting strange attractors in turbulence. In: Dynamical systems and turbulence, Warwick 1980: proceedings of a symposium held at the University of Warwick 1979/80
36. Tang L-H, Bai Y-L, Yang J, Lu Y-N (2020) A hybrid prediction method based on empirical mode decomposition and multiple model fusion for chaotic time series. Chaos Solitons Fractals 141:110366
37. Theiler J, Eubank S, Longtin A, Galdrikian B, Farmer JD (1992) Testing for nonlinearity in time series: the method of surrogate data. Phys D 58
38. Toque C, Terraza V (2011) Time series factorial models with uncertainty measures: applications to ARMA processes and financial data. Commun Stat Theory Methods 40
39. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30
40. Yu Y, Hu G, Liu C, Xiong J, Wu Z (2023) Prediction of solar irradiance one hour ahead based on quantum long short-term memory network. IEEE Trans Quantum Eng 4:1–15
Abstract
Traditional deep learning approaches that analyze single-dimensional time sequences face extraordinary challenges in predicting chaotic time sequences due to their principal properties: considerable nonlinearity, high sensitivity to initial conditions, and dynamic variability. This paper presents a novel Quantum Multi-head attention (QMulti-Attn) prediction model for chaotic time sequences. The model combines a variational quantum circuit (VQC) with multi-headed self-attention mechanisms and is specifically designed to integrate the diversity and complexity of the VQC quantum state space with multi-headed self-attention, enabling it to identify and handle critical dynamic features in chaotic time sequences while improving predictive accuracy and generalization. The received data is first transformed into groups of a predetermined size: the input to QMulti-Attn consists of a multidimensional array built with an embedding dimension and a time delay. The model's capacity for chaotic time series prediction is augmented by the incorporation of the VQC, which improves its ability to identify and resolve intricate patterns in chaotic time sequence prediction tasks while maintaining the integrity of the original features and simplifying the deep network's training through residual connections. The long-term dependency mechanism is employed to replicate the dynamic behavior of the chaotic system. The QMulti-Attn model outperforms the Recurrent Neural Network (RNN), Time-Series Mixer (TSMixer), and Long Short-Term Memory (LSTM) models on two simulated chaotic time sequence datasets (Lorenz and Rossler) and a real chaotic time sequence dataset, Sea Clutter. The model's root mean square error on the Sea Clutter test set exhibits a 7.46% relative improvement over LSTM. The QMulti-Attn model synergistically integrates quantum learning with deep learning to achieve remarkable performance in predicting chaotic temporal sequences and is anticipated to significantly enhance our comprehension and forecasting of intricate nonlinear dynamic systems in the real world.
Details

1 School of Science, Chongqing University of Technology, Chongqing, China
2 School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing, China