Abstract
Improving financial time series forecasting is challenging because models often struggle to identify diverse patterns in unseen data. This issue is critical in fintech, where accurate and reliable forecasting of financial data is essential for effective risk management and informed investment strategies. This work addresses these challenges by initializing the weights and biases of two proposed models, a Gated Recurrent Unit (GRU) network and an Echo State Network (ESN), with different chaotic sequences to enhance prediction accuracy and generalization capability. We compare reservoir computing (RC) and recurrent neural network (RNN) models with and without the integration of chaotic systems, using standard initialization as the baseline. The models are validated on six different datasets: the 500 largest publicly traded companies in the US (S&P 500), the Irish Stock Exchange Quotient (ISEQ) dataset, the gold/US Dollar forex pair (XAU/USD), the US Dollar/Japanese Yen exchange rate (USD/JPY), Chinese daily stock prices, and the index of the top 100 UK companies (FTSE 100). The ESN model combined with the Lorenz system achieves the lowest error among all evaluated models, reinforcing the effectiveness of chaos-trained models for prediction. The proposed ESN model, accelerated on the Kintex-UltraScale KCU105 FPGA board, achieves a maximum frequency of 83.5 MHz with a total power consumption of 0.677 W.
1. Introduction
Intelligent decision-making in financial market prediction is essential for a stock trading strategy [1]. Time series data are used across various fields of the natural and social sciences, including biologically inspired systems, population dynamics, financial markets, climate studies, and global flow models [2]. Time series data can be classified into linear and nonlinear categories; however, many real time series exhibit significant nonlinear characteristics [3]. Predicting chaotic and non-periodic dynamic data is a challenging task; however, deep learning models, as part of machine learning, significantly enhance training effectiveness, leading to high performance and accurate results. Stock price prediction faces several challenges, such as capturing patterns, dealing with noisy data, and handling non-stationarity, in which statistical properties vary over time [4]. In volatile financial markets, prediction models rely on time-domain or frequency-domain analysis, which limits their ability to capture non-stationary and multi-scale dynamics. Fourier transform (FT) methods are efficient for analyzing stationary data, but they do not apply to non-stationary financial time series [5]. Deep learning models can be integrated with chaotic systems to enhance the accuracy of time series forecasting [6] by initializing the neural network’s weights and biases with a chaotic sequence, thereby improving the generalization capabilities of the prediction model. RNNs are effective at predicting dynamical systems, but they require significant time and resources for training [7,8]. Reservoir computing (RC) models offer an effective method for time series prediction; however, the lack of additional optimization beyond the output layer leads to significant variance in training outcomes [7]. The performance of RC improves when multiple reservoir layers are used rather than a single one. RC utilizes dynamical systems as nonlinear generalizations to improve the learning of features and hidden patterns in complex time series [9]. The Echo State Network (ESN), as a type of RC model, is efficient for time series prediction and requires few resources for both training and prediction. Hyperparameter optimization reduces variance across initializations [7].
Various RNN and RC models have been presented in the literature to improve time series prediction accuracy. A new reservoir computing (RC) approach for mapping input data into the reservoir’s state space was proposed in [7]. This approach increased the parallelizability, depth, and predictive capabilities of the neural network model while reducing the dependence on randomness. A novel deep learning method based on the Chen system, called neural basis expansion analysis for interpretable time series forecasting (N-BEATS), was proposed in [6]. This approach improved the performance and efficiency of deep learning models for time-series forecasting, and the experiments were validated on thirteen available time series datasets. A novel Leverage Convolution (LC) ARFIMA–GARCH model was proposed in [10] to address the challenges of high-dimensional, noisy, and non-stationary time series with complex fractal dynamics. The model demonstrated high performance in capturing long-range dependence, whereas the autoregressive approach often fails to capture fragile patterns, producing white noise instead. A modeling strategy, named the vector autoregression (VAR)-based rolling prediction model based on machine learning (ML) techniques, was proposed in [2] for predicting stock prices. A deep learning method combined with a proportional-integral-derivative (PID) error corrector was proposed in [11] for accurate prediction on chaotic time series and an actual pond aquaculture water environment dataset. Three machine learning methods, including RC-ESN, a deep feed-forward artificial neural network (ANN), and RNN–LSTM, were investigated in [12] for predicting the short-term evolution and reproducing the long-term statistics of a multiscale spatiotemporal Lorenz 96 system. The RC–ESN method achieved the highest accuracy in forecasting chaotic trajectories. For the ESN’s reservoir design, an optimization approach was presented in [13] to adapt the reservoir weights based on input data properties, thereby addressing the influence of the topology and weights on prediction accuracy; the performance of an ESN depends on the reservoir structure, which otherwise relies on random weights that are independent of the input data characteristics. A mirrored echo state network (MESN) was constructed in [14] based on a mirrored algorithm to optimize the input weights. This methodology mirrors the traditional ESN by exchanging the order of weight determination, thereby creating a mirror symmetry with it. The model was efficient in predicting the Mackey–Glass system (MGS) with large chaotic factors. A sparse compressed deep echo state network (SCDESN) was proposed in [1] with an arithmetic optimization algorithm to optimize the hyperparameters of the SCDESN model. This model was validated on a classic benchmark dataset and two real-world chaotic time series datasets. Grouped vector autoregressive reservoir computing (GVARC) was proposed in [15] for time-series forecasting. This model relies on the theory of randomly distributed embedding (RDE) to improve the selection of large-scale parameters in deep RCs. Another method, called the Hierarchical Echo State Network with Sparse Learning (HESN-SL), was proposed in [16] to reduce the approximate collinearity among echo-state information. This model ensured stability when applied to time series forecasting, as it satisfies the echo state property.
An emerging ML approach, the dendritic neuron model (DNM) with a scale-free differential evolution (SFDE) training algorithm, was applied in [17] to achieve better predictive performance in forecasting financial time series. A novel dual-attention-based sequential auto-encoding architecture, called DAttAE, was proposed in [8] to effectively learn and predict new COVID-19 cases as a chaotic, non-smooth time-series dataset.
Hardware accelerators speed up the complex operations of neural networks compared to general-purpose CPUs [18,19]. Deep learning models comprise numerous complex operations, such as intensive matrix calculations [20], making FPGAs an efficient platform for executing these tasks. The field-programmable gate array (FPGA)-based acceleration of neural networks offers a distinct alternative to Graphics Processing Units (GPUs), Tensor Processing Units (TPUs), and Application-Specific Integrated Circuits (ASICs) [21]. FPGA accelerators are characterized by their programmability, flexibility, improved power efficiency, parallelism, and pipelining in computations [22], as well as scalability [23].
This study explores the potential of deterministic chaotic sequences to outperform traditional random initialization methods in terms of convergence and generalization. The main focus is to improve time-series forecasting by initializing the two proposed models (GRU and ESN) with chaotic sequences and comparing their performance with that of standard initialization. The two proposed models are applied to six different datasets, including the S&P 500 dataset in [6]; ISEQ overall historical data in [10]; and XAU/USD, USD/JPY, Pci-Suntek Tech stock, and FTSE 100 historical data in [2]. The prediction accuracy of the proposed ESN and GRU models is calculated both with and without chaos training. The proposed ESN model, trained using the Lorenz system, achieves the lowest prediction error compared to the GRU model and other models in the literature. The FPGA implementation of the proposed model on Kintex-Ultrascale KCU105 achieves high performance with low power consumption.
This paper is organized as follows: Section 2 describes the datasets employed in this study. Section 3 presents an overview of the chaotic systems and standard initialization, and the mechanism of chaos training. Section 4 outlines the architectures of the proposed models. Section 5 outlines the algorithms for integrating the chaotic system with neural network training and provides a comparison of the proposed models to identify the best prediction model. Section 6 details the hardware implementation of the proposed ESN architecture and its resource utilization results. Finally, Section 7 summarizes the conclusions and suggests future research directions.
2. Overview of the Employed Datasets
2.1. S&P 500
The S&P 500 dataset contains the daily prices of the 500 largest publicly traded companies in the US [6]. These companies account for approximately 80% of the US equity market value. The dataset measures the US large-cap market from 31 March 2015 to 7 April 2025 and comprises 2522 points, as shown in Figure 1a.
2.2. ISEQ Overall Historical Data
The Irish Stock Exchange Quotient (ISEQ) dataset represents the daily open price index from 4 January 2021 to 8 March 2023 [10]. It contains 796 points, as shown in Figure 1b. It exhibits chaotic, rapid shocks; non-stationary seasonality; and a negatively skewed distribution. These challenges make forecasting complex.
2.3. XAU/USD Historical Data
XAU/USD is the forex pair of XAU, the ISO 4217 [24] code for gold (where X denotes a non-currency asset and AU is the code’s symbol for gold), and USD, the US Dollar pricing currency [2]. This data includes the trading day, daily closing price, opening price, highest and lowest prices during the trading day, and the daily percentage price change. The dataset represents the gold/US Dollar exchange rate from 29 December 2020 to 20 January 2022, with 524 points. The time series in Figure 1c illustrates the volatility patterns in the dataset.
2.4. USD/JPY Historical Data
The USD/JPY represents the forex pair consisting of the US Dollar as the base currency and the Japanese Yen as the quote currency [2]. This dataset covers the period from 29 December 2020 to 20 January 2022, with 524 points, as shown in Figure 1d. The data follows standard daily financial market conventions, containing the daily closing, opening, maximum, minimum, and the percentage change in the exchange rate.
2.5. Pci-Suntek Tech Stock Prices (600728.SS)
This dataset comes from the Shanghai Stock Exchange (SSE) and provides daily historical prices of Chinese stocks [2]. The symbol 600728 is the unique stock code assigned to Pci-Suntek Technology, a Chinese IT company, and .SS is the suffix for the SSE, China’s primary stock market. Each data entry contains the date, closing price, opening price, maximum intraday price, minimum intraday price, volume, and percentage price change. Figure 1e shows the dataset values from 28 January 2019 to 26 January 2022, capturing 730 trading days of market activity.
2.6. FTSE 100 Historical Data (TSCO.L)
This dataset uses the symbol TSCO.L, which refers to the UK-based multinational grocery retailer Tesco PLC [2]. The first part, TSCO, is the Tesco PLC ticker symbol, and the second part, .L, is the suffix for the London Stock Exchange (LSE). The dataset represents daily historical data for the FTSE 100 index, which comprises the market capitalization of the top 100 UK companies. Figure 1f covers the dataset period from 28 January 2019 to 26 January 2022, with 760 points. Each entry includes the daily closing price, opening price, high, low, trading volume, and daily percentage change.
3. Overview of the Employed Chaotic Sequences and Standard Initialization in Training Neural Network Models
3.1. The Employed Chaotic Systems
The main characteristics of chaotic systems are determinism and sensitivity to initial conditions [1]. The chaotic initialization mechanism uses a specific base value and then randomly perturbs it within a small range [6]. This mechanism generates diverse initial weight configurations, thereby improving prediction accuracy. All chaotic sequences are normalized, and the number of samples drawn from each system equals the total number of weights and biases to be initialized:
$N_{\text{samples}} = N_{\text{weights}} + N_{\text{biases}}$ (1)
3.1.1. Standard Chen System
The Chen system is a three-dimensional continuous dynamical system [25]. The chaotic sequence of the Chen system is generated by
$\dot{x} = a\,(y - x)$ (2a)

$\dot{y} = (c - a)\,x - xz + cy$ (2b)

$\dot{z} = xy - bz$ (2c)
where x, y, and z are the state variables. The parameters a, b, and c have the values 35, 3, and 28, respectively, with a small integration time step. The initial conditions are represented by

(3)
The three-phase plot of the Chen system is shown in Figure 2a.
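As an illustration of how such a chaotic sequence can be produced in practice, the following minimal Python/NumPy sketch integrates the Chen system with a simple forward-Euler scheme and normalizes the trajectory to [−1, 1]. The step size, sample count, and initial conditions are illustrative assumptions, not necessarily the values used in this work.

```python
import numpy as np

def chen_sequence(n_samples, dt=0.001, a=35.0, b=3.0, c=28.0, x0=(-0.1, 0.5, -0.6)):
    """Generate a normalized chaotic sequence from the Chen system (forward Euler)."""
    x, y, z = x0                      # illustrative initial conditions
    traj = np.empty(n_samples)
    for i in range(n_samples):
        dx = a * (y - x)
        dy = (c - a) * x - x * z + c * y
        dz = x * y - b * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        traj[i] = x                   # keep one state variable as the sequence
    # normalize to [-1, 1] so the samples can later serve as weight initializers
    return 2.0 * (traj - traj.min()) / (traj.max() - traj.min()) - 1.0
```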
3.1.2. Perturbed Chen Oscillator
The perturbed Chen oscillator (PCO) is a chaotic system derived from the well-known Chen system, a type of continuous-time dynamical system. The system is described as follows:
(4a)
(4b)
(4c)
where d is the perturbation term, and a, b, and c are the oscillator parameters. The three-phase plot of the PCO oscillator is shown in Figure 2b. Chaotic behavior is observed when the PCO parameters a, b, c, and d are configured appropriately, with the chosen time step and the initial conditions:

(5)
3.1.3. The Lorenz System
The Lorenz oscillator is a well-known, widely used chaotic system that arises from simplified models of atmospheric convection [26]. The system is defined as follows:
$\dot{x} = \sigma\,(y - x)$ (6a)

$\dot{y} = x\,(\rho - z) - y$ (6b)

$\dot{z} = xy - \beta z$ (6c)
where $\sigma$, $\rho$, and $\beta$ are the system parameters governing the oscillator’s behavior. The chaotic behavior of the Lorenz oscillator is observed for the classical parameter choice (typically $\sigma = 10$, $\rho = 28$, and $\beta = 8/3$) with the chosen time step and the initial conditions:

(7)
The three-phase plot of the Lorenz system is shown in Figure 2c.
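To show how a normalized chaotic trajectory can replace random numbers during weight initialization, the sketch below fills a weight matrix from consecutive samples of such a sequence. The scaling factor and the reshape order are illustrative assumptions, following the clip-and-scale mechanism used later in Algorithm 1 (Section 5.1).

```python
import numpy as np

def chaotic_init(chaotic_seq, shape, scale=0.1, offset=0):
    """Fill a weight matrix of the given shape from a normalized chaotic sequence."""
    n_needed = int(np.prod(shape))
    samples = np.asarray(chaotic_seq)[offset:offset + n_needed]
    if samples.size < n_needed:
        raise ValueError("chaotic sequence is too short for the requested shape")
    # clip to [-1, 1] and scale to a small range, e.g. [-0.1, 0.1]
    samples = np.clip(samples, -1.0, 1.0) * scale
    return samples.reshape(shape)
```

A GRU or ESN layer would then consume consecutive, non-overlapping slices of the same sequence for each of its weight matrices, as described in Section 5.1.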
3.1.4. Mackey–Glass Time Series
The Mackey–Glass system is a time-delay differential equation represented by [7,13]:
$\dfrac{dx}{dt} = \dfrac{\beta\, x(t-\tau)}{1 + x(t-\tau)^{n}} - \gamma\, x(t)$ (8)
where $\beta$ and $\gamma$ are the production and decay parameters, $\tau$ is the time delay, and n is the nonlinearity exponent; in this work, $\tau = 17$ and $n = 10$. The term $x(t-\tau)$ represents the delayed value of x at time $t-\tau$, evaluated with the chosen time step. The time series of the Mackey–Glass equation depends on the value of the delay $\tau$, displaying a range of periodic and chaotic dynamics: the system shows non-chaotic or periodic behavior for smaller delays and chaotic behavior for larger delays such as $\tau = 17$, as shown in Figure 3.
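Because Equation (8) involves a time delay, generating its trajectory requires a history buffer. The following sketch uses a simple Euler step with the full trajectory acting as the delay history; the values β = 0.2 and γ = 0.1 are the commonly used textbook choices, assumed here for illustration only.

```python
import numpy as np

def mackey_glass(n_samples, tau=17, n=10, beta=0.2, gamma=0.1, dt=1.0, x_init=1.2):
    """Integrate the Mackey-Glass delay differential equation with forward Euler."""
    delay_steps = int(round(tau / dt))
    # store the full trajectory; the first `delay_steps` entries act as constant history
    x = np.empty(n_samples + delay_steps)
    x[:delay_steps] = x_init
    for i in range(delay_steps, n_samples + delay_steps):
        x_delayed = x[i - delay_steps]
        dx = beta * x_delayed / (1.0 + x_delayed ** n) - gamma * x[i - 1]
        x[i] = x[i - 1] + dt * dx
    return x[delay_steps:]
```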
3.2. He and Xavier Standard Initialization
The standard He initialization was proposed by Kaiming He for networks using ReLU activation functions [6]. It addresses the vanishing gradient problem by scaling weights based on the number of neurons. Xavier Glorot proposed the standard Xavier initialization for sigmoid/tanh activations. It balances the variance between the input and output dimensions. The initialization of weights with the He and Xavier methods is represented by the following:
$W_{\text{He}} \sim \mathcal{N}\!\left(0,\ \tfrac{2}{n_{\text{in}}}\right)$ (9a)

$W_{\text{Xavier}} \sim \mathcal{N}\!\left(0,\ \tfrac{2}{n_{\text{in}} + n_{\text{out}}}\right)$ (9b)

where $n_{\text{in}}$ and $n_{\text{out}}$ denote the numbers of input and output units of a layer.
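For comparison with the chaotic schemes, a minimal sketch of the two standard initializers is shown below. It assumes the Gaussian variants of He and Xavier initialization; frameworks also offer uniform variants, so the exact form is an assumption.

```python
import numpy as np

def he_init(n_in, n_out, rng=None):
    """He (Kaiming) initialization: variance 2/n_in, suited to ReLU layers."""
    rng = rng or np.random.default_rng()
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))

def xavier_init(n_in, n_out, rng=None):
    """Xavier (Glorot) initialization: variance 2/(n_in + n_out), suited to tanh/sigmoid."""
    rng = rng or np.random.default_rng()
    return rng.normal(0.0, np.sqrt(2.0 / (n_in + n_out)), size=(n_in, n_out))
```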
3.3. Chaotic System Distribution
Figure 4 illustrates the histogram analysis for the chaotic and standard initializations using MATLAB’s built-in histogram function. The standard Chen system in Figure 4a shows a broad distribution of values extending up to about 25. The perturbed Chen oscillator (PCO) shown in Figure 4b widens the distribution to about 30 compared to the standard Chen system; the additional perturbation introduces more extreme values and increases chaotic variability. The Lorenz system in Figure 4c shows a similarly broad distribution extending to about 25. The Mackey–Glass system in Figure 4d shows a concentrated distribution with a heavy-tailed skew, reflecting its time-delay-driven quasi-periodic chaos. He initialization in Figure 4e shows a narrow Gaussian distribution centered at 0, scaled by $\sqrt{2/n_{\text{in}}}$ for ReLU networks; this ensures stable gradient flow by preventing variance explosion. Xavier initialization in Figure 4f is similar to He but scaled by the combined fan-in and fan-out for tanh/sigmoid activations; the distribution is slightly narrower and optimized for saturating activation functions.
Figure 5 provides a comparative analysis between chaotic and random initialization distributions. Integrating chaotic systems into deep learning models for weight initialization helps prevent vanishing gradients, enabling faster convergence and reduced overfitting [6]. These models capture long-term dependencies and nonlinear behavior in time-series data because of the fundamental structural and dynamical properties of chaotic systems. Chaotic initialization therefore improves predictive performance compared to standard random initialization.
4. Recurrent Neural Network and Reservoir Computing Model: General Architectures
4.1. Gated Recurrent Unit Model
The GRU architecture extends conventional RNNs to address the challenge of capturing long-term temporal dependencies [22,25]. The GRU architecture is simpler than the LSTM since it does not include a separate memory cell [27]. GRU consists of two gates: the update gate and the reset gate [28]. The update gate regulates the amount of retained past information, while the reset gate governs the proportion of the previous hidden state to discard. The general architecture of the GRU model is shown in Figure 6a. The internal operations of a GRU cell are defined as:
$z_t = \sigma\!\left(W_z x_t + U_z h_{t-1} + b_z\right)$ (10a)

$r_t = \sigma\!\left(W_r x_t + U_r h_{t-1} + b_r\right)$ (10b)

$\tilde{h}_t = \tanh\!\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right)$ (10c)

$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t$ (10d)
where W, U, and b represent the input weights, recurrent weights, and bias matrices, respectively. The symbol $z_t$ refers to the update gate, $r_t$ is the reset gate, $\tilde{h}_t$ is the candidate hidden state, $h_{t-1}$ is the previous hidden state, and $h_t$ is the current hidden state, with $\sigma(\cdot)$ the sigmoid function and $\odot$ the element-wise product. The hidden state at the final time step is used to generate predictions.
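A minimal NumPy sketch of one GRU cell step, directly following Equations (10a)–(10d), is given below; the dictionary-based weight layout and the sigmoid helper are illustrative assumptions rather than the exact implementation of the proposed model.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step. W, U, b are dicts with keys 'z', 'r', 'h' (Equations (10a)-(10d))."""
    z = sigmoid(W['z'] @ x_t + U['z'] @ h_prev + b['z'])             # update gate
    r = sigmoid(W['r'] @ x_t + U['r'] @ h_prev + b['r'])             # reset gate
    h_cand = np.tanh(W['h'] @ x_t + U['h'] @ (r * h_prev) + b['h'])  # candidate state
    return (1.0 - z) * h_prev + z * h_cand                           # new hidden state
```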
4.2. Echo State Network Model
An ESN is a variant of RNNs that consists of an input layer, a dynamic reservoir (hidden layer), and an output layer [18,29]. The hidden layer provides complex nonlinear mappings and encodes the input signals from a low-dimensional input space into a high-dimensional state space. Computation occurs mainly within the reservoir neurons, which reduces training time [30,31]. The ESN architecture is shown in Figure 6b. The main ESN hyperparameters are the spectral radius $\rho$, which controls the stability of the reservoir dynamics; the input scaling s, which regulates the input magnitude; and the sparsity p, which determines the reservoir connectivity. The reservoir state transforms inputs nonlinearly while retaining historical information. The state update is calculated by
$\mathbf{x}(t) = \left(1 - \alpha\right)\mathbf{x}(t-1) + \alpha \tanh\!\left(W_{\text{in}}\,\mathbf{u}(t) + W\,\mathbf{x}(t-1)\right)$ (11)
where $\mathbf{u}(t)$ is the input at time t and $\alpha$ is the leakage rate that balances the previous state and the new input-driven activation. The tanh nonlinearity ensures that the state values remain between −1 and 1. The input weights $W_{\text{in}}$ and the reservoir weights $W$ are randomly generated. The training phase is separated into two stages. The first stage collects the extended reservoir states over the training horizon into a state matrix:

$\mathbf{X} = \left[\mathbf{x}(1),\ \mathbf{x}(2),\ \ldots,\ \mathbf{x}(T)\right]$ (12)
The second stage is the ridge regression for the output weights. Only the output weights are learned during training [32]. The output weights are computed by
$W_{\text{out}} = Y_{\text{target}}\,\mathbf{X}^{\top}\left(\mathbf{X}\,\mathbf{X}^{\top} + \lambda I\right)^{-1}$ (13)
where $\lambda$ is the regularization coefficient that prevents overfitting in the ridge regression. The symbol I is the identity matrix of size $N_r \times N_r$, where $N_r$ is the number of neurons in the reservoir layer, and $Y_{\text{target}}$ is the target output. The autoregressive prediction generates next-step predictions using the learned weights. The normalized predicted output is calculated by

$\hat{y}(t) = W_{\text{out}}\,\mathbf{x}(t)$ (14)
The state update during prediction is computed by
$\mathbf{x}(t+1) = \left(1 - \alpha\right)\mathbf{x}(t) + \alpha \tanh\!\left(W_{\text{in}}\,\hat{y}(t) + W\,\mathbf{x}(t)\right)$ (15)
5. Prediction Models Validation
5.1. Chaotic Initialization Methodology
The GRU model is a fully trainable RNN with a much larger parameter set, while the ESN model has fewer trainable parameters, as shown in Table 1. The architecture of the GRU model consists of two hidden layers, each with 64 neurons. Both models use a sequential forecasting strategy, where multiple independent predictions are generated across the test dataset using a window size of 10 and a forecast horizon of 1. Data preprocessing begins by handling the missing values in the original dataset using linear interpolation. The activation functions are sensitive to their input range: the tanh function outputs values in the range [−1, 1], while the sigmoid outputs values in the range [0, 1]. Normalizing the input data ensures that values fall within appropriate ranges for these activation functions, maintaining effective gradient propagation during training. The input features are normalized using min–max scaling to ensure stable and efficient network training. The normalization is computed using the following formula:
$x_{\text{norm}} = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$ (16)
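A small sketch of this preprocessing pipeline (linear interpolation of missing values, min–max scaling as in Equation (16), and a sliding window of length 10 with a one-step horizon) is shown below; it is a minimal NumPy illustration and not the authors’ exact code.

```python
import numpy as np

def preprocess(series, window=10, horizon=1):
    """Interpolate gaps, min-max normalize, and build (input window, next value) pairs."""
    x = np.asarray(series, dtype=float)
    # linear interpolation of missing values
    nans = np.isnan(x)
    x[nans] = np.interp(np.flatnonzero(nans), np.flatnonzero(~nans), x[~nans])
    # min-max normalization (Equation (16))
    x_norm = (x - x.min()) / (x.max() - x.min())
    # sliding windows of length `window`, predicting `horizon` steps ahead
    inputs = np.stack([x_norm[i:i + window]
                       for i in range(len(x_norm) - window - horizon + 1)])
    targets = x_norm[window + horizon - 1:]
    return inputs, targets
```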
Algorithm 1 demonstrates the chaotic initialization for weights in ESN training. Multiple runs are necessary to ensure the model’s prediction accuracy [7]. For chaotic weight initialization, samples are taken from a normalized chaotic time series. The same chaotic sequence is used for all weight matrices in the network.
Algorithm 1 Chaotic Sequence-Initiated ESN
1: Input: normalized chaotic sequence, reservoir size $N_r$, spectral radius $\rho$, leakage rate $\alpha$, regularization $\lambda$, sparsity s, training ratio
2: procedure InitializeWeights
3:   draw consecutive samples from the chaotic sequence for the input weights $W_{\text{in}}$ ▹ clip to [−1, 1] and scale to [−0.1, 0.1]
4:   draw further samples for the reservoir matrix $W$
5:   select the indices of the sparse connections ▹ approximately a fraction s of the entries are non-zero
6:   place the samples at the selected indices ▹ extract non-zero values
7:   rescale $W$ so that its largest eigenvalue magnitude equals the spectral radius $\rho$
8:   return $W_{\text{in}}$, $W$
9: end procedure
10: procedure Train($W_{\text{in}}$, $W$, training data)
11:   initialize the reservoir state
12:   for each training sequence do
13:     for t = 1 to W (window length) do
14:       update the reservoir state with Equation (11)
15:     end for
16:     collect the resulting state
17:   end for
18:   compute the output weights $W_{\text{out}}$ with the ridge regression of Equation (13)
19:   return $W_{\text{out}}$
20: end procedure
21: procedure Evaluate
22:   for run = 1 to 10 do
23:     for each initialization method m do
24:       if m is a chaotic method then
25:         generate the chaotic sequence and call InitializeWeights
26:       else
27:         initialize $W_{\text{in}}$ and $W$ with the standard (He/Xavier) method
28:       end if
29:       train and test the ESN and record the error metrics
30:     end for
31:   end for
32: end procedure
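The following compact Python sketch mirrors the structure of Algorithm 1: it builds a sparse reservoir from a chaotic sequence, runs teacher-forced state updates (Equation (11)), solves the ridge regression of Equation (13), and predicts with Equation (14). Hyperparameter values and helper names (build_esn, train_esn, predict) are illustrative assumptions.

```python
import numpy as np

def build_esn(chaos, n_res=100, rho=0.9, sparsity=0.1, in_scale=0.1, n_in=1, seed=0):
    """Initialize ESN weights from a normalized chaotic sequence (cf. Algorithm 1)."""
    chaos = np.asarray(chaos)
    rng = np.random.default_rng(seed)
    take = lambda k, n: np.clip(chaos[k:k + n], -1, 1)      # consecutive chaotic samples
    w_in = take(0, n_res * n_in).reshape(n_res, n_in) * in_scale
    w = take(n_res * n_in, n_res * n_res).reshape(n_res, n_res)
    mask = rng.random((n_res, n_res)) < sparsity            # keep ~sparsity of connections
    w = w * mask
    w *= rho / np.max(np.abs(np.linalg.eigvals(w)))         # rescale to spectral radius rho
    return w_in, w

def train_esn(w_in, w, u, y, alpha=0.3, lam=1e-6):
    """Teacher-forced state collection and ridge-regression readout (Equations (11), (13))."""
    n_res, T = w.shape[0], len(u)
    states = np.zeros((n_res, T))
    x = np.zeros(n_res)
    for t in range(T):
        x = (1 - alpha) * x + alpha * np.tanh(w_in @ np.atleast_1d(u[t]) + w @ x)
        states[:, t] = x
    w_out = y @ states.T @ np.linalg.inv(states @ states.T + lam * np.eye(n_res))
    return w_out, x

def predict(w_out, x):
    """Readout of the normalized prediction (Equation (14))."""
    return w_out @ x
```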
5.2. Prediction Result Comparison
Figure 7 and Figure 8 illustrate how different chaotic systems influence the prediction performance of the GRU and ESN models across ten independent runs. The ESN model with the Lorenz system achieves the best predictive performance, as shown in Table 2. Performance metrics are computed on the normalized prediction values using the following equations:
$\text{MSE} = \dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$ (17a)

$\text{MAE} = \dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$ (17b)

$\text{MAPE} = \dfrac{100\%}{N}\displaystyle\sum_{i=1}^{N}\left|\dfrac{y_i - \hat{y}_i}{y_i}\right|$ (17c)

(17d)

where MSE is the mean square error, MAE is the mean absolute error, and MAPE is the mean absolute percentage error, with $y_i$ the actual value, $\hat{y}_i$ the predicted value, and N the number of test samples.
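A short sketch of the normalized error metrics in Equations (17a)–(17c), assuming NumPy arrays of actual and predicted values (the small eps guard against division by zero is an added assumption):

```python
import numpy as np

def metrics(y_true, y_pred, eps=1e-12):
    """Return MSE, MAE, and MAPE (in percent) on normalized predictions."""
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err / (y_true + eps)))   # eps guards against division by zero
    return mse, mae, mape
```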
6. Hardware Implementation of the Proposed ESN Model
6.1. Precision Analysis
The fixed-point implementation of the ESN prediction process is modeled in MATLAB using its built-in fixed-point conversion function. The floating-point representation provides the reference values, while the fixed-point representation simulates hardware behavior. This approach enables evaluation of numerical accuracy under realistic hardware constraints. The fractional bit width is selected based on the error analysis between the floating-point reference and the fixed-point implementation. As illustrated in Figure 9, a 16-bit fractional precision provides the lowest error across the tested range of 5 to 20 fractional bits.
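The effect of the fractional bit width can be reproduced in software by quantizing the floating-point ESN outputs, as in the minimal sketch below; the round-to-nearest behavior is a simplifying assumption and does not model every detail of MATLAB’s fixed-point toolbox.

```python
import numpy as np

def quantize(x, frac_bits):
    """Round values to a fixed-point grid with the given number of fractional bits."""
    scale = 2.0 ** frac_bits
    return np.round(np.asarray(x) * scale) / scale

def precision_sweep(y_float, frac_range=range(5, 21)):
    """Mean absolute error between floating-point and fixed-point outputs per bit width."""
    return {f: float(np.mean(np.abs(y_float - quantize(y_float, f)))) for f in frac_range}
```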
6.2. Hyperbolic Tangent Function Implementation
The hyperbolic tangent activation function (tanh) in the ESN model is implemented using a polynomial approximation of order five, as shown in Figure 10. The maximum error between the fixed-point implementation and MATLAB’s built-in tanh function remains small. This polynomial approximation achieves high accuracy and low complexity using Horner’s method, as shown in the following equations:
$\tanh(y) \approx \left(\left(\left(\left(c_5\, y + c_4\right) y + c_3\right) y + c_2\right) y + c_1\right) y + c_0$ (18)
where y is the input signal and $c_i$ denotes the polynomial coefficients. These coefficients are calculated using a built-in polynomial-fitting function in MATLAB, as shown in Table 3. The coefficients represent the positive region $y \ge 0$, with the negative region ($y < 0$) derived from the symmetry property:

$\tanh(-y) = -\tanh(y)$ (19)
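A software model of the order-five Horner evaluation in Equation (18), with the odd symmetry of Equation (19) applied for negative inputs, is sketched below; the placeholder coefficients stand in for the fitted values of Table 3 and are not the actual ones.

```python
def tanh_poly(y, coeffs):
    """Order-five polynomial tanh approximation via Horner's method (Equations (18)-(19))."""
    c0, c1, c2, c3, c4, c5 = coeffs          # placeholder coefficients, see Table 3
    a = abs(y)                               # evaluate on the positive region only
    p = ((((c5 * a + c4) * a + c3) * a + c2) * a + c1) * a + c0
    return p if y >= 0 else -p               # odd symmetry: tanh(-y) = -tanh(y)
```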
The tanh function is applied element-wise. The hardware architecture of the tanh implementation consists of only one multiplier and one adder, as shown in Figure 11. The timing diagram of the tanh module in Figure 12 shows two complete computation cycles for two successive inputs, demonstrating the pipelined nature of the hyperbolic tangent calculations. The module is driven by a positive-edge clock and a negative-edge reset, and it asserts an output-valid flag when a result is ready. The controller state takes values from 0 to 4. The vertical dashed lines mark the clock edges, and the arrows indicate the computation latency, which is five cycles. The labels represent the computation phases. The input is latched in the first cycle (state 0), and the polynomial is evaluated through states 1 to 4. The first output is produced at the fifth cycle, with the output flag asserted at state 4. The second input is latched in the sixth cycle in state 0.
6.3. Matrix Multiplication Implementation
Multiplication operations are computationally intensive and dominate resource utilization. The only two multiplication operations are the state prediction ‘S_pred’ and the output prediction ‘Y_pred’ in Equations (11) and (14). The multiplication implementation for two 100-element vectors is described by the following equation:
$P = \displaystyle\sum_{i=0}^{99} A_i \times B_i$ (20)
where $A_i$ is the i-th element of the first 100-element input vector, $B_i$ is the i-th element of the second, and the index i iterates from 0 to 99. The best implementation to reduce hardware complexity is to use one shared multiplication block instead of two. This design avoids duplication and reduces overall resource consumption by ∼50% compared to the dual-block design. The hardware block in Figure 13 uses 100 multipliers, each with a bit-width of 19 bits.
6.4. Reservoir State Implementation
Figure 14 illustrates the top module of the proposed ESN for the updated state and the predicted output calculations. The weights $W_{\text{in}}$, $W$, and $W_{\text{out}}$ and the previous state are obtained from the software training process and then stored in memory blocks (RAMs); their dimensions follow from the reservoir size of 100 neurons. The RAM addresses for the weight matrices are incremented when the corresponding enable signal is activated. The shared block ‘Matrix_MU’ calculates the matrix multiplication operations of size 100 for the updated and prediction states. The 8-bit counter ‘Counter_100’ counts from 0 to 99 for each neuron index. This counter manages the state update and the output prediction operations, generating a completion signal after reaching the value 99.
The control unit of the proposed top module in Figure 15 consists of three states: an end state that indicates the completion of the prediction process, a state-update state that computes the new reservoir state, and a prediction-output state that computes the final predicted output once the state-update stage signals completion.
6.5. Utilization Results and Performance Evaluation Comparison
The architecture of the proposed ESN model is implemented on the Kintex-UltraScale KCU105 FPGA board, achieving a maximum frequency of 83.5 MHz. The total power consumption of the design equals 0.677 W, with a dynamic power of 0.197 W. The total prediction latency is 496 cycles, corresponding to 4.96 µs. Table 4 presents the resource utilization of the design, which includes look-up tables (LUTs), block RAMs (BRAMs), flip-flops (FFs), and digital signal processing (DSP) blocks. The DSP utilization is 5.4% (104 units), and only a small number of BRAM blocks is required, demonstrating an efficient mapping of reservoir computations to FPGA hardware. The low LUT/FF utilization confirms the minimal overhead of the control logic, highlighting the efficiency for latency-critical edge inference.
Table 5 states that the proposed ESN achieves lower power than the models in [18]. The performance calculations are given by
$\text{GOPS} = \dfrac{N_{\text{ops}} \times f_{\max}}{10^{9}}$ (21)
where GOPS is the number of Giga Operations Per Second, $N_{\text{ops}}$ is the number of operations per cycle, and $f_{\max}$ is the maximum operating frequency. The number of operations is 102 per cycle, indicating the maximum number of adders and multipliers that operate simultaneously. The resulting performance is therefore 102 × 83.5 MHz ≈ 8.52 GOPS, and the corresponding efficiency with respect to the total power of 0.677 W is approximately 12.6 GOPS/W.
7. Conclusions
This study examined the efficacy of chaotic dynamical systems in initializing the weights and biases of neural network models, including GRU and ESN. This work presented a hardware-accelerated ESN model with Lorenz chaos training for accurate time series prediction. The proposed model was compared with the proposed GRU model, both with and without chaotic integration, to validate the prediction accuracy. The ESN and GRU models were trained with four different chaotic systems, namely the standard Chen system, PCO, Lorenz system, and Mackey–Glass equation, and two standard schemes, He and Xavier initialization. The models were validated on six different datasets: S&P 500, ISEQ overall historical data, XAU/USD, USD/JPY, Pci-Suntek Tech stock, and FTSE 100 historical data. The prediction results of the proposed ESN model in hardware simulation were compared with floating- and fixed-point representations. The proposed ESN model, trained by the Lorenz system, delivered the lowest prediction errors. The hardware implementation of the proposed ESN architecture achieved a power consumption of 0.677 W at a maximum frequency of 83.5 MHz on the Kintex-UltraScale KCU105 FPGA board. Future work will explore different models, such as transformer and autoencoder architectures, to further improve accuracy and generalization in financial time series prediction tasks. Additionally, we plan to explore different time horizons to achieve a more efficient and stable model.
Conceptualization, Z.A.H., M.H.Y., and L.A.S.; methodology, Z.A.H., M.H.Y., and L.A.S.; software, Z.A.H. and M.H.Y.; validation, Z.A.H. and M.H.Y.; formal analysis, Z.A.H. and M.H.Y.; investigation, Z.A.H. and M.H.Y.; resources, L.A.S.; data curation, Z.A.H. and M.H.Y.; writing—original draft preparation, Z.A.H.; writing—review and editing, M.H.Y. and L.A.S.; visualization, Z.A.H. and M.H.Y.; supervision, L.A.S.; project administration, L.A.S.; funding acquisition, L.A.S. All authors have read and agreed to the published version of the manuscript.
S&P 500 dataset:
The authors declare no conflicts of interest.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Figure 1 Time series vs. actual prices for the employed datasets: (a) S&P 500, (b) ISEQ overall historical data, (c) XAU USD historical data, (d) USD JPY historical data, (e) Pci-Suntek Tech stock prices, and (f) FTSE 100 historical data.
Figure 2 Chaotic system’s three phases: (a) standard Chen system, (b) perturbed Chen oscillator, and (c) Lorenz system.
Figure 3 Mackey–Glass equation time series for different values of the delay τ.
Figure 4 The distribution for the following: (a) standard Chen system, (b) perturbed Chen oscillator, (c) Lorenz system, (d) Mackey–Glass equation, (e) He initialization, and (f) Xavier initialization.
Figure 5 Chaotic vs. random distribution analysis: (a) power spectrum, (b) autocorrelation comparison, (c) gradient patterns, and (d) phase space.
Figure 6 General architectures of: (a) the GRU model and (b) the ESN model.
Figure 7 The proposed models’ predictions overlaid on the actual data with all chaotic systems: (a) GRU for S&P 500 dataset, (b) ESN for S&P 500 dataset, (c) GRU for ISEQ dataset, (d) ESN for ISEQ dataset, (e) GRU for XAU/USD dataset, and (f) ESN for XAU/USD dataset.
Figure 8 The proposed models’ predictions overlaid on the actual data with all chaotic systems: (a) GRU for USD/JPY dataset, (b) ESN for USD/JPY dataset, (c) GRU for Pci-Suntek dataset, (d) ESN for Pci-Suntek dataset, (e) GRU for FTSE 100 dataset, and (f) ESN for FTSE 100 dataset.
Figure 9 Error comparison between the floating-point reference and the fixed-point implementation of the ESN prediction for different fractional bit widths.
Figure 10 The polynomial approximation of the hyperbolic tangent (tanh) activation function.
Figure 11 The hardware implementation of the polynomial approximation of the tanh function.
Figure 12 Timing diagram of the pipelined hyperbolic tangent (tanh) module.
Figure 13 Matrix multiplication block of size 100.
Figure 14 Top-module hardware architecture of the proposed ESN for time series forecasting prediction.
Figure 15 Control unit of the proposed top-module ESN.
Quantitative comparison of the key characteristic parameters of the proposed GRU and ESN models.
| Model | Key Parameters | Trainable Parameters | Optimization |
|---|---|---|---|
| ESN | Reservoir size = 100 | Only output weights (W_out) | No backpropagation; ridge regression |
| GRU | Hidden size = 64 (two layers) | Weights and biases of all gates | Gradient-based with backpropagation through time |
Performance metric comparisons for different datasets with and without chaos training models.
| Employed Dataset | Models | MSE | | MAE | | MAPE | |
|---|---|---|---|---|---|---|---|
| | | Chaotic | Non-Chaotic | Chaotic | Non-Chaotic | Chaotic | Non-Chaotic |
| S&P 500 | GRU (proposed) | 0.000183 | 0.000129 | 0.043727 | 0.009958 | 5.0875% | 1.2055% |
| ESN (proposed) | | | 0.0017 | 0.0627 | | | |
| N-BEATS in [6] | | | ____ | ____ | | | |
| ISEQ | GRU (proposed) | 0.00211053 | 0.000080316 | 0.0342342 | 0.0068244 | 6.07399% | 2.1758% |
| ESN (proposed) | | | | 0.089523 | | ____ | |
| LC ARFIMA–GARCH in [10] | ____ | | ____ | | ____ | | |
| XAU/USD | GRU (proposed) | 0.000568 | 0.0006516 | 0.018667 | 0.02112 | 3.26004% | 3.6862% |
| ESN (proposed) | | | 0.017021 | 0.0959 | 3.0093% | ____ | |
| VAR in [2] | ____ | | ____ | ____ | ____ | ____ | |
| USD/JPY | GRU (proposed) | 0.004955 | 0.002373 | 0.057375 | 0.037068 | 4.62643% | 3.0454% |
| ESN (proposed) | | 1.8405 | 0.03501 | 1.1495 | 2.8868 | ____ | |
| VAR in [2] | ____ | | ____ | ____ | ____ | ____ | |
| Pci-Suntek | GRU (proposed) | 0.00078 | 0.000855 | 0.0199 | 0.02194 | 7.67069% | 8.1976% |
| ESN (proposed) | | 0.017508 | 0.018295 | 0.093627 | 6.93355% | ____ | |
| VAR in [2] | ____ | | ____ | ____ | ____ | ____ | |
| FTSE 100 | GRU (proposed) | 0.000775 | 0.0005818 | 0.02130 | 0.018116 | 2.58066% | 2.2101% |
| ESN (proposed) | | 0.008423 | 0.015755 | 0.068239 | 1.8993% | 13.6599% | |
| VAR in [2] | ____ | | ____ | ____ | ____ | ____ | |
The highlighted entries indicate the best results compared to those of the other models.
The polynomial coefficients for the tanh approximation.
| | $c_0$ | $c_1$ | $c_2$ | $c_3$ | $c_4$ | $c_5$ |
|---|---|---|---|---|---|---|
| Coefficients | | | | | | |
The resource utilization on FPGA of the proposed ESN.
| Resource | Utilization | Available | Utilization % |
|---|---|---|---|
| LUTs | 15,833 | 242,400 | 6.53% |
| FFs | 4306 | 484,800 | 0.89% |
| BRAMs | | 600 | |
| DSPs | 104 | 1920 | 5.42% |
Hardware implementation results of the proposed ESN model and the other ESN models in [18].
| Metric | ESN in [18] | | | intESN in [18] | | | ESN (Proposed) |
|---|---|---|---|---|---|---|---|
| Neurons | 100 | 200 | 300 | 100 | 200 | 300 | 100 |
| Device | Xilinx Zynq-7000 FPGA | Xilinx Zynq-7000 FPGA | Kintex-UltraScale KCU105 | ||||
| Max Frequency (MHz) | 100 MHz | 100 MHz | 83.5 MHz | ||||
| Power (W) | 1.73 | 1.78 | 1.95 | 1.59 | 1.6 | 1.6 | 0.677 W (dynamical power 0.197 W) |
| Precision | 32-bit float | 3-bit integer | 19-bit (16-bit fractional, 3-bit integer) | ||||
1. Wang, H.; Mo, Y. Sparse compressed deep echo state network with improved arithmetic optimization algorithm for chaotic time series prediction. Expert Syst. Appl.; 2025; 259, 125249. [DOI: https://dx.doi.org/10.1016/j.eswa.2024.125249]
2. Chen, J.; Wen, Y.; Nanehkaran, Y.; Suzauddola, M.; Chen, W.; Zhang, D. Machine learning techniques for stock price prediction and graphic signal recognition. Eng. Appl. Artif. Intell.; 2023; 121, 106038. [DOI: https://dx.doi.org/10.1016/j.engappai.2023.106038]
3. Wen, S.C.; Yang, C.H. Time series analysis and prediction of nonlinear systems with ensemble learning framework applied to deep learning neural networks. Inf. Sci.; 2021; 572, pp. 167-181. [DOI: https://dx.doi.org/10.1016/j.ins.2021.04.094]
4. Yu, H.; Dong, J.; Li, B. Economic insight through an optimized deep learning model for stock market prediction regarding Dow Jones industrial Average index. Expert Syst. Appl.; 2025; 291, 128473. [DOI: https://dx.doi.org/10.1016/j.eswa.2025.128473]
5. Huang, Y.; Pei, Z.; Yan, J.; Zhou, C.; Lu, X. A combined Adaptive Gaussian Short-Term Fourier Transform and Mamba framework for stock price prediction. Eng. Appl. Artif. Intell.; 2025; 162, 112588. [DOI: https://dx.doi.org/10.1016/j.engappai.2025.112588]
6. Jia, B.; Wu, H.; Guo, K. Chaos theory meets deep learning: A new approach to time series forecasting. Expert Syst. Appl.; 2024; 255, 124533. [DOI: https://dx.doi.org/10.1016/j.eswa.2024.124533]
7. Viehweg, J.; Walther, D.; Mäder, P. Temporal convolution derived multi-layered reservoir computing. Neurocomputing; 2025; 617, 128938. [DOI: https://dx.doi.org/10.1016/j.neucom.2024.128938]
8. Pham, P.; Pedrycz, W.; Vo, B. Dual attention-based sequential auto-encoder for COVID-19 outbreak forecasting: A case study in Vietnam. Expert Syst. Appl.; 2022; 203, 117514. [DOI: https://dx.doi.org/10.1016/j.eswa.2022.117514] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35607612]
9. Yan, M.; Huang, C.; Bienstman, P.; Tino, P.; Lin, W.; Sun, J. Emerging opportunities and challenges for the future of reservoir computing. Nat. Commun.; 2024; 15, 2056. [DOI: https://dx.doi.org/10.1038/s41467-024-45187-1]
10. Bukhari, A.H.; Raja, M.A.Z.; Alquhayz, H.; Almazah, M.M.; Abdalla, M.Z.; Hassan, M.; Shoaib, M. Predictive analysis of stochastic stock pattern utilizing fractional order dynamics and heteroscedastic with a radial neural network framework. Eng. Appl. Artif. Intell.; 2024; 135, 108687. [DOI: https://dx.doi.org/10.1016/j.engappai.2024.108687]
11. Zhou, X.; Hao, Y.; Liu, Y.; Dang, L.; Qiao, B.; Zuo, X. Short-term prediction of dissolved oxygen and water temperature using deep learning with dual proportional-integral-derivative error corrector in pond culture. Eng. Appl. Artif. Intell.; 2025; 142, 109964. [DOI: https://dx.doi.org/10.1016/j.engappai.2024.109964]
12. Chattopadhyay, A.; Hassanzadeh, P.; Subramanian, D. Data-driven predictions of a multiscale Lorenz 96 chaotic system using machine-learning methods: Reservoir computing, artificial neural network, and long short-term memory network. Nonlinear Process. Geophys.; 2020; 27, pp. 373-389. [DOI: https://dx.doi.org/10.5194/npg-27-373-2020]
13. Gonbadi, L.; Rostami, H.; Sahafizadeh, E.; Rostami, S.; Nejad, M.M.; Shirzadi, A. Input driven optimization of echo state network parameters for prediction on chaotic time series. Sci. Rep.; 2025; 15, 33005. [DOI: https://dx.doi.org/10.1038/s41598-025-18261-x]
14. Chen, X.; Chen, L.; Li, S.; Jin, L. A mirrored echo state network with application to time series prediction. Inf. Sci.; 2025; 716, 122260. [DOI: https://dx.doi.org/10.1016/j.ins.2025.122260]
15. Wang, H.; Wang, Z.; Yu, M.; Liang, J.; Peng, J.; Wang, Y. Grouped Vector Autoregression Reservoir Computing Based on Randomly Distributed Embedding for Multistep-Ahead Prediction. IEEE Trans. Neural Netw. Learn. Syst.; 2025; 36, pp. 17265-17279. [DOI: https://dx.doi.org/10.1109/TNNLS.2025.3553060]
16. Na, X.; Ren, W.; Liu, M.; Han, M. Hierarchical Echo State Network With Sparse Learning: A Method for Multidimensional Chaotic Time Series Prediction. IEEE Trans. Neural Netw. Learn. Syst.; 2023; 34, pp. 9302-9313. [DOI: https://dx.doi.org/10.1109/TNNLS.2022.3157830] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35333719]
17. Tang, Y.; Song, Z.; Zhu, Y.; Hou, M.; Tang, C.; Ji, J. Adopting a dendritic neural model for predicting stock price index movement. Expert Syst. Appl.; 2022; 205, 117637. [DOI: https://dx.doi.org/10.1016/j.eswa.2022.117637]
18. Kleyko, D.; Frady, E.P.; Kheffache, M.; Osipov, E. Integer Echo State Networks: Efficient Reservoir Computing for Digital Hardware. IEEE Trans. Neural Netw. Learn. Syst.; 2022; 33, pp. 1688-1701. [DOI: https://dx.doi.org/10.1109/TNNLS.2020.3043309] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33351770]
19. Hassaan, Z.A.; Yacoub, M.H.; Said, L.A. Gated recurrent unit accelerators for financial time series prediction on field-programmable gate array. Eng. Appl. Artif. Intell.; 2025; 162, 112534. [DOI: https://dx.doi.org/10.1016/j.engappai.2025.112534]
20. Sharobim, B.K.; Yacoub, M.H.; Sayed, W.S.; Radwan, A.G.; Said, L.A. Artificial Neural Network Chaotic PRNG and simple encryption on FPGA. Eng. Appl. Artif. Intell.; 2023; 126, 106888. [DOI: https://dx.doi.org/10.1016/j.engappai.2023.106888]
21. Mao, G.; Rahman, T.; Maheshwari, S.; Pattison, B.; Shao, Z.; Shafik, R.; Yakovlev, A. Dynamic Tsetlin Machine Accelerators for On-Chip Training Using FPGAs. IEEE Trans. Circuits Syst. I Regul. Pap.; 2025; 72, pp. 6962-6975. [DOI: https://dx.doi.org/10.1109/TCSI.2025.3564875]
22. Wasef, M.; Rafla, N. SoC Reconfigurable Architecture for Implementing Software Trained Recurrent Neural Networks on FPGA. IEEE Trans. Circuits Syst. I Regul. Pap.; 2023; 70, pp. 2497-2510. [DOI: https://dx.doi.org/10.1109/TCSI.2023.3262479]
23. Liu, F.; Li, H.; Hu, W.; He, Y. Review of neural network model acceleration techniques based on FPGA platforms. Neurocomputing; 2024; 610, 128511. [DOI: https://dx.doi.org/10.1016/j.neucom.2024.128511]
24. Foreign Currency Financial Reporting from Euro to Yen to Yuan; Wiley: Hoboken, NJ, USA, 2012; pp. 251-266. [DOI: https://dx.doi.org/10.1002/9781119202448.app1]
25. Zhang, X.; Zhu, Y.; Lou, X. Reconfigurable and Energy-Efficient Architecture for Deploying Multi-Layer RNNs on FPGA. IEEE Trans. Circuits Syst. I Regul. Pap.; 2024; 71, pp. 5969-5982. [DOI: https://dx.doi.org/10.1109/TCSI.2024.3464687]
26. AbdElbaky, M.H.; Yacoub, M.H.; Sayed, W.S.; Said, L.A. High-performance FPGA-accelerated LSTM neural network for chaotic time series prediction. AEU Int. J. Electron. Commun.; 2025; 199, 155845. [DOI: https://dx.doi.org/10.1016/j.aeue.2025.155845]
27. Xu, H.; Chai, L.; Luo, Z.; Li, S. Stock movement prediction via gated recurrent unit network based on reinforcement learning with incorporated attention mechanisms. Neurocomputing; 2022; 467, pp. 214-228. [DOI: https://dx.doi.org/10.1016/j.neucom.2021.09.072]
28. Farhadi, A.; Zamanifar, A.; Alipour, A.; Taheri, A.; Asadolahi, M. A Hybrid LSTM-GRU Model for Stock Price Prediction. IEEE Access; 2025; 13, pp. 117594-117618. [DOI: https://dx.doi.org/10.1109/ACCESS.2025.3586558]
29. Li, T.; Guo, Z.; Li, Q. Decomposition-based deep projection-encoding echo state network for multi-scale and multi-step wind speed prediction. Expert Syst. Appl.; 2025; 266, 126074. [DOI: https://dx.doi.org/10.1016/j.eswa.2024.126074]
30. Gong, Y.; Lun, S.; Li, M. Broad-ESN Based on Radical Activation Function for Predicting Time Series With Multiple Variables. IEEE Trans. Neural Netw. Learn. Syst.; 2025; 36, pp. 17310-17321. [DOI: https://dx.doi.org/10.1109/TNNLS.2025.3563937]
31. Delshad, A.; Cherry, E.M. Predicting complex time series with deep echo state networks. Chaos; 2025; 35, 093126. [DOI: https://dx.doi.org/10.1063/5.0283425]
32. Sharifi Ghazijahani, M.; Cierpka, C. On the spatial prediction of the turbulent flow behind an array of cylinders via echo state networks. Eng. Appl. Artif. Intell.; 2025; 144, 110079. [DOI: https://dx.doi.org/10.1016/j.engappai.2025.110079]
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).