1. Introduction
With global carbon emissions continually rising, climate change impacts are increasingly pronounced. Achieving carbon peaking and carbon neutrality has thus become a shared international objective [1]. Traditional fossil fuel combustion emits substantial greenhouse gases, exacerbating global warming and associated environmental issues such as rising sea levels [2,3]. Consequently, countries are pivoting to clean energy, emphasising renewables such as hydropower, wind, and solar [4,5].
Pumped-storage hydropower plants (PSHPs) are vital to grids dominated by renewables. By pumping water from the lower reservoir to the upper reservoir during off-peak electricity demand periods, PSHPs convert electrical energy into potential energy. At peak demand, or when renewables fall short, stored water is released through turbines, improving grid flexibility and stability [6,7]. PSHPs effectively mitigate the intermittency of renewable energy sources and improve overall power system reliability. Compared to fossil fuel generation, PSHPs offer higher round-trip efficiency (typically exceeding 70%), longer service life, lower maintenance costs, and near-zero direct carbon emissions, making them an environmentally friendly energy solution.
Short-term PSHP scheduling distributes load across units to minimise water use while respecting constraints such as vibration zones [8]. The process includes unit commitment (startup/shutdown status) [9] and load dispatch (assigned power output) [10]. The strong nonlinearity of PSHPs makes scheduling complex, so researchers rely on the Equal Incremental Rate method, Dynamic Programming (DP), and various intelligent algorithms.
The Equal Incremental Rate method [11] is relatively easy to implement, but its use is largely limited to low-dimensional problems. In hydropower load allocation, ref. [12] proposed an equal-incremental-rate algorithm that identifies multiple solution regions; the method was validated with a hybrid model and applied to the Geheyan Hydropower Station. Ref. [13] developed a short-term scheduling model for large hydropower stations and solved it with multi-core parallel DP. Ref. [14] merged progressive-structure and progressive-step DP into a hybrid approach and validated it on hydropower load allocation. Although DP is popular for PSHP scheduling, state-space discretisation leads to a curse of dimensionality in high-dimensional cases.
In recent years, optimisation algorithms have proliferated because of their simplicity and versatility [15,16]. Efficient methods such as Particle Swarm Optimisation [17] and Simulated Annealing [18] have demonstrated outstanding performance. Meanwhile, novel algorithms continue to emerge. Ref. [19] proposed a two-stage firefly optimisation algorithm with distinct encoding schemes for the economic load-dispatch problem. The authors added a dynamic patching mechanism to escape local optima, and numerical simulations confirmed its effectiveness. Ref. [20] introduced a two-layer optimisation framework that utilised the Cuckoo Search algorithm to solve a hydropower short-term scheduling model considering photovoltaic generation uncertainty. Compared with actual operations, the proposed deterministic and stochastic schemes reduced water consumption by 1.5% and 1.0%, respectively.
Of the three categories, the Equal Incremental Rate method is simple but increasingly unsuitable for nonlinear, high-dimensional problems. The DP method requires a high degree of discretisation, with its accuracy increasing with finer granularity. However, this also leads to a significant increase in computational demand and may result in the “curse of dimensionality”. Although intelligent algorithms offer relatively high solving efficiency, their accuracy often depends on manually tuned parameters and prior experience. Moreover, these methods tend to exhibit instability, and their results may not be reliably reproducible.
Recent advances in artificial intelligence, particularly Deep Reinforcement Learning (DRL), have offered promising tools for tackling complex, nonlinear optimisation problems. DRL fuses deep learning’s representation strength with reinforcement learning’s search efficiency, yielding strong results in energy scheduling. Ref. [21] employed the Deep Q-Network (DQN) approach to construct a short-term optimal scheduling framework for a multi-energy power system integrating hydropower, wind, and solar energy. Using real wind- and solar-power data, the framework markedly improved generation efficiency and decision quality. Ref. [22] integrated the twin-delayed deep deterministic policy gradient (TD3) with learning-rate annealing and hindsight prioritised replay to build a battery-degradation-aware model. The proposed system reduced costs and proved more adaptable. Nonetheless, the application of DRL to PSHP load optimisation has so far been limited. Widely used DRL variants such as DQNs, actor–critic schemes, and Proximal Policy Optimisation (PPO) [23] have encountered convergence difficulties, significant implementation complexity, or constraints in continuous-action spaces. The Deep Deterministic Policy Gradient (DDPG) algorithm, expressly devised for continuous, high-dimensional state–action domains, therefore emerges as a promising alternative [24].
To address these challenges, this study presents an intelligent scheduling framework for pumped-storage hydropower units by combining an Atomic Orbital Search-optimised Long Short-Term Memory network with the Deep Deterministic Policy Gradient algorithm (AOS-LSTM-DDPG). The AOS-LSTM model accurately fits unit flow characteristics using power output, water head, and discharge as inputs, significantly improving prediction accuracy and response efficiency. These fitted curves are embedded into a Markov decision process to guide the DDPG agent in learning water-efficient scheduling strategies. Operational constraints, including vibration zone avoidance, are incorporated into the reward design to enhance real-world applicability. Under a representative daily load scenario, the proposed method exhibits stable convergence and effectively reduces both water consumption and vibration events.
The main innovations and contributions of this study are as follows:
(1) High-precision flow-efficiency curve fitting: AOS tunes LSTM hyperparameters to model nonlinear flow-efficiency curves more accurately than traditional methods, providing reliable inputs for scheduling.
(2) Constraint-aware DRL embedded with fitted flow curves: embedding the fitted curves into a DDPG-based Markov decision process yields a policy that minimises water use while respecting constraints such as vibration zone avoidance. The model offers fast inference, stable convergence, and superior water-saving and economic outcomes.
Other parts of this study are as follows: Section 2 models the short-term scheduling problem, detailing the objective and constraints. Section 3 describes the AOS-LSTM-DDPG framework. Section 4 presents case studies and comparative analysis. Finally, Section 5 concludes with key findings, limitations, and future work.
2. Problem Description
This study investigates a short-term load optimisation scheduling model for pumped-storage hydropower units.
2.1. Objective Function
A PSHP must meet grid demand by adjusting output for peak-shaving and frequency regulation. At the same time, total water use should be minimised to improve efficiency. Accordingly, the optimisation objective is to minimise water consumption over the scheduling horizon:
\min W = \sum_{t=1}^{T} \sum_{n=1}^{N} q_{n,t}(H_t, P_{n,t}) \quad (1)
where W is the total water consumption over the scheduling horizon (m³); T is the total number of time intervals within the scheduling period; N is the number of operable units in the pumped-storage power station; and q_{n,t}(H_t, P_{n,t}) is the water consumption of unit n in interval t predicted by the fitting model, taking the operational water head H_t and the unit's power output P_{n,t} as inputs. The plant under study uses fixed-speed pump turbines whose pumping power is either rated or zero; pumping-mode water use is therefore excluded, and only generation-mode consumption is optimised.
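For illustration, the objective of Equation (1) can be evaluated for a candidate schedule as in the following sketch; `flow_model` is a hypothetical stand-in for the fitted AOS-LSTM predictor of Section 3, and the numbers in the toy example are arbitrary.
```python
# Minimal sketch (not the authors' code): evaluating the objective of Equation (1)
# for a candidate schedule. `flow_model` is a hypothetical stand-in for the fitted
# AOS-LSTM predictor, mapping (head, power) to the water use of one unit over one
# 15-min interval.
import numpy as np

def total_water_consumption(power, head, flow_model):
    """power: (T, N) array of unit outputs in MW (0 when offline);
    head: (T,) array of operating heads in m;
    flow_model: callable (head_t, p_nt) -> water use of one unit in one interval (m^3)."""
    T, N = power.shape
    W = 0.0
    for t in range(T):
        for n in range(N):
            if power[t, n] > 0.0:            # only generating units consume water
                W += flow_model(head[t], power[t, n])
    return W

# Toy usage with a crude placeholder model (numbers are purely illustrative).
toy_model = lambda h, p: 900.0 * p / h * 900.0      # m^3 per 15-min interval
W = total_water_consumption(np.full((96, 4), 250.0), np.full(96, 500.0), toy_model)
```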
2.2. Constraints
When optimising PSHP generation schedules, multiple operational constraints must be considered. These constraints ensure safe and stable system operation, meet grid load requirements, optimise operational costs, and improve economic performance. Specific descriptions of these constraints are described below.
2.2.1. Load Balance Constraint:
The sum of power outputs from all units must satisfy the grid load demand at any given time [25]:
\sum_{n=1}^{N} u_{n,t} P_{n,t} = L_t \quad (2)
where P_{n,t} is the power output of the n-th PSHP unit at time t (MW); L_t is the grid load demand at time t (MW); and u_{n,t} is a binary variable representing the operating status of the n-th PSHP unit at time t, defined as follows:
u_{n,t} = \begin{cases} 1, & \text{if unit } n \text{ is generating at time } t \\ 0, & \text{otherwise} \end{cases} \quad (3)
2.2.2. Unit Output Constraints:
The power output of each unit must remain within its permissible operating range:
u_{n,t} P_n^{\min} \le P_{n,t} \quad (4)
P_{n,t} \le u_{n,t} P_n^{\max} \quad (5)
where P_n^{\min} and P_n^{\max} are the minimum and maximum allowable power outputs for unit n (MW).

2.2.3. Generating Flow Rate Constraints
The generating flow rate for each unit must be maintained within its allowable operational range:
Q_n^{\min} \le Q_{n,t} \le Q_n^{\max} \quad (6)
where Q_{n,t} is the generating flow rate of unit n at time t (m³/s), and Q_n^{\min} and Q_n^{\max} are the minimum and maximum allowable generating flow rates for unit n (m³/s).

2.2.4. Unit Vibration Zone Constraints
Units should avoid operating within vibration zones to the greatest extent possible:
P_{n,t} \le Z_{n,i}^{\mathrm{down}}(H) \;\; \text{or} \;\; P_{n,t} \ge Z_{n,i}^{\mathrm{up}}(H) \quad (7)
where Z_{n,i}^{\mathrm{up}}(H) and Z_{n,i}^{\mathrm{down}}(H) represent the upper and lower power output limits (MW), respectively, of the i-th vibration zone of unit n under water head H.

2.2.5. Vibration Zone Crossing Risk Constraint
To reduce risks associated with frequent transitions through vibration zones during unit operation, the frequency of units crossing vibration zones must be constrained. This constraint can be expressed as follows:
R_t \le R_t^{\max} \quad (8)
where R_t is the vibration zone crossing risk index for the units at time t, and R_t^{\max} is the maximum allowable vibration zone crossing risk index at time t. Given a determined load allocation plan, the risk index is calculated as follows:
For each time interval t,
(9)
where c_{n,t} denotes the number of vibration zone crossings for unit n within the interval (t − 1, t); p_{n,t} is the probability that unit n operates continuously within the recommended power range [A, B] during the interval (t − 1, t); and [A, B] is the recommended operational power output range (MW).
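A minimal feasibility check over a candidate plan, covering the load-balance, output-limit, and vibration-zone constraints (Equations (2), (4)–(5), and (7)), might look as follows; the flow-rate constraint (6) is omitted because it requires the fitted flow model, and all variable names are illustrative rather than the authors'.
```python
# Minimal sketch (illustrative names, not the authors' implementation): checking a
# candidate load-allocation plan against the load-balance, output-limit, and
# vibration-zone constraints.
import numpy as np

def feasible(power, status, load, p_min, p_max, vib_zones, tol=1e-3):
    """power, status: (T, N) arrays; load: (T,) array; p_min, p_max: (N,) arrays;
    vib_zones: per-unit list of (lower, upper) MW bounds at the current head."""
    T, N = power.shape
    for t in range(T):
        if abs(np.sum(status[t] * power[t]) - load[t]) > tol:   # Eq. (2)
            return False
        for n in range(N):
            if status[t, n]:
                if not (p_min[n] <= power[t, n] <= p_max[n]):   # Eqs. (4)-(5)
                    return False
                for low, up in vib_zones[n]:                    # Eq. (7)
                    if low < power[t, n] < up:
                        return False
    return True
```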
3. Materials and Methods
3.1. Refined Fitting Strategy for Unit Flow Characteristic Curves Based on AOS-LSTM
To accurately calculate water consumption for the load allocation of pumped-storage hydropower (PSHP) units, this study proposes a refined fitting strategy based on an Atomic Orbital Search (AOS)-optimised Long Short-Term Memory (LSTM) neural network, abbreviated as AOS-LSTM. When integrated into the load optimisation scheduling model, the trained model enhances computational precision and response speed.
Flow characteristic curves of pump-turbine units are essential for PSHP operation and design, describing relationships between power output and flow rate under various conditions. Traditional approaches for constructing these curves involve fitting discrete measurement data [26] or using specialised simulation software [27]. However, these methods have limitations such as dependency on measured data, limited fitting accuracy, and complex modelling processes. Recent advances in artificial intelligence have explored intelligent algorithms and deep learning [28] to enhance accuracy and generalisation performance.
In this work, the AOS algorithm was employed to optimise key hyperparameters of an LSTM network, including hidden-layer width, batch size, and iteration count. The AOS-LSTM approach significantly improved prediction accuracy and response efficiency, thereby providing reliable inputs for PSHP load-optimisation scheduling.
3.1.1. LSTM Neural Network Structure
Recurrent Neural Networks (RNNs) are widely used for processing sequential data but often suffer from gradient vanishing and limited long-term memory. Long Short-Term Memory (LSTM) networks use input, forget, and output gates to overcome these problems and have proven effective in power-system analysis and fault diagnosis [29,30]. In this study, the LSTM is used to fit the flow characteristic curves of hydropower units. The LSTM memory cell (Figure 1) comprises three gates.
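To make the fitting setup concrete, the following is a minimal PyTorch sketch (not the authors' exact architecture) of an LSTM regressor that maps short windows of (power output, water head) to unit discharge; the hidden size of 56 is the tuned value reported in Section 4.1, while the window length and batch shape are placeholders.
```python
# Minimal sketch of an LSTM regressor for the flow characteristic fitting task;
# most settings are placeholders later tuned by AOS.
import torch
import torch.nn as nn

class FlowLSTM(nn.Module):
    def __init__(self, n_features=2, hidden_size=56, n_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, n_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)     # one flow value per window

    def forward(self, x):                          # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])            # use the last time step

model = FlowLSTM()
y_hat = model(torch.randn(32, 8, 2))               # toy batch: 32 windows of length 8
```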
3.1.2. Atomic Orbital Search (AOS) Algorithm
To tune the LSTM’s hyperparameters (hidden neurons, batch size, training epochs), we adopt AOS—a new population-based metaheuristic inspired by electron motion in quantum orbitals [31]. AOS offers robust global search with few parameters and has proved effective in feature selection [32], photovoltaic parameter estimation [33], and path planning.
Basic Principle
AOS models each candidate solution as an electron moving within hypothetical concentric shells around an atomic nucleus. Electron transitions follow quantum rules: absorbing limited energy moves the electron to a higher shell; otherwise, it drops to a lower shell and emits a photon. This process guides the exploration and exploitation dynamics in the search space.
Initialisation
The initial positions of solution candidates (electrons) are randomly assigned based on the following:
x_s^j = x_j^{\min} + \mathrm{rand} \times (x_j^{\max} - x_j^{\min}) \quad (10)
where x_s^j is the position of the s-th candidate in the j-th dimension, x_j^{\min} and x_j^{\max} are the lower and upper bounds of that dimension, and rand is a uniformly distributed random number in [0, 1]. The candidate distribution is further shaped by a probability density function (PDF), which defines the likelihood of electrons appearing at different radial distances from the nucleus (see Figure 2) [31].
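As a concrete illustration of Equation (10), the snippet below initialises a candidate population within given hyperparameter bounds; the bounds, population size, and the layer (orbital) assignment via the PDF of Figure 2 are assumptions or omissions, not the paper's settings.
```python
# Illustrative initialisation of AOS candidates following Equation (10).
import numpy as np

rng = np.random.default_rng(0)
lower = np.array([16, 16, 50])        # hidden neurons, batch size, epochs (assumed ranges)
upper = np.array([128, 256, 400])

m = 30                                 # population size (assumed)
population = lower + rng.random((m, lower.size)) * (upper - lower)
population = np.round(population).astype(int)      # hyperparameters are integers
```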
Binding State and Energy
To simulate atomic behaviour, the algorithm evaluates the average position and energy (fitness value) of the candidates within each shell:
BS^{L} = \frac{1}{k} \sum_{i=1}^{k} X_i^{L}, \qquad BE^{L} = \frac{1}{k} \sum_{i=1}^{k} E_i^{L} \quad (11)
The global binding state and energy of the population are given by the following:
BS = \frac{1}{m} \sum_{i=1}^{m} X_i, \qquad BE = \frac{1}{m} \sum_{i=1}^{m} E_i \quad (12)
where k and m denote the number of candidates in a shell and the total population size, respectively; X_i and E_i are the position and energy (fitness value) of the i-th candidate, and the superscript L indicates quantities evaluated within a single layer (shell).

Search and Update Mechanism
The search process is controlled by a photon rate parameter, PR, representing the probability of photon–electron interaction. For each candidate, a uniformly distributed random number φ is generated and compared with PR:
If φ < PR, the electron performs exploratory (random) motion:
(13)
where the candidate moves along a randomly generated direction vector within the search space. If φ ≥ PR, the electron either emits or absorbs a photon depending on its fitness:
Photon emission (for higher-energy candidates):
(14)
Photon absorption (for lower-energy candidates):
(15)
Here, the attractor is the position of the candidate with the lowest energy in the corresponding layer, and the step coefficients are randomly generated numbers in [0, 1].
Advantages and Integration of AOS in LSTM Training
Compared with traditional optimisation algorithms such as Particle Swarm Optimisation (PSO), the AOS algorithm offers several notable advantages for hyperparameter optimisation:
Strong global search capability: the probabilistic modelling of electron motion enables effective exploration of the search space;
Ability to escape local optima: the energy-level transition mechanism allows the algorithm to overcome local minima;
Well-balanced exploration and exploitation: the multi-layer energy structure inherently maintains a balance between global exploration and local exploitation;
Low parameter dependency: AOS requires fewer control parameters, making it easy to integrate with deep learning models.
Therefore, this study integrates AOS into the LSTM training framework, forming the AOS-LSTM method to enhance the predictive accuracy and training stability in modelling the flow characteristics of pumped-storage hydropower units.
3.1.3. LSTM Neural Network Model Optimised by AOS
The structural hyperparameters of an LSTM network strongly influence its prediction accuracy. AOS tunes three of them: the number of hidden neurons, the batch size, and the number of training epochs. Each hyperparameter set is treated as a candidate solution and updated via the electron-orbital transition model [34]. In each iteration, the positions of the candidate solutions are adjusted to minimise the fitness function (prediction error), thereby obtaining the optimal hyperparameter combination. The dependent variables refer to the LSTM model’s prediction performance under a given set of hyperparameters; the primary indicators are the mean absolute error (MAE), the root mean square error (RMSE), and the convergence speed of training, which together constitute the fitness function that guides the AOS search. The detailed optimisation procedure of AOS-LSTM is as follows (a code sketch follows below):
(1) Initialisation: randomly generate an initial set of candidate LSTM hyperparameter solutions, analogous to electron positions within an atomic system.
(2) Orbital updating: reassign electrons to orbitals according to fitness (validation loss) and inter-electron spacing.
(3) Position updating: perturb each electron’s position (hyperparameters) by a stochastic step drawn from its current orbital.
(4) Energy (fitness) evaluation: re-compute the fitness; retain the new position if the loss decreases, otherwise revert or downgrade its energy level.
(5) Iterative optimisation: repeat steps (2) to (4) until the predefined maximum number of iterations is reached or a convergence criterion is satisfied.
The specific algorithm flowchart is illustrated in Figure 3.
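The following is a compressed Python sketch of this loop under simplifying assumptions: the orbital/photon mechanics of Section 3.1.2 are collapsed into a guided random perturbation with greedy acceptance, and `train_and_validate` is a hypothetical helper that trains an LSTM with the given hyperparameters and returns its validation loss; it is not the authors' implementation.
```python
# Compressed, illustrative AOS-style hyperparameter search.
import numpy as np

def aos_lstm_search(train_and_validate, lower, upper, pop=20, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    X = lower + rng.random((pop, 3)) * (upper - lower)                  # Equation (10)
    E = np.array([train_and_validate(*np.round(x).astype(int)) for x in X])
    for _ in range(iters):
        best = X[np.argmin(E)]                                          # lowest-energy candidate
        for i in range(pop):
            step = rng.random(3) * (best - X[i]) \
                   + rng.normal(0.0, 0.05, 3) * (upper - lower)
            cand = np.clip(X[i] + step, lower, upper)
            e = train_and_validate(*np.round(cand).astype(int))
            if e < E[i]:                                                # keep improvements only
                X[i], E[i] = cand, e
    return np.round(X[np.argmin(E)]).astype(int), E.min()
```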
3.1.4. Model Evaluation Metrics
Model accuracy is quantified with the mean absolute error (MAE) and the root mean square error (RMSE) [35]; computational efficiency is gauged by training time. Smaller MAE/RMSE values imply higher predictive accuracy, whereas shorter training time indicates better computational efficiency. The error measures are defined as follows: Mean absolute error (MAE):
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right| \quad (16)
Root mean square error (RMSE):
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2} \quad (17)
where n is the number of samples, and \hat{y}_i and y_i denote the predicted and actual values for the i-th sample, respectively.
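As a quick reference, both metrics can be computed as follows (illustrative NumPy code, not tied to the paper's pipeline):
```python
import numpy as np

def mae(y_hat, y):
    return np.mean(np.abs(y_hat - y))            # Equation (16)

def rmse(y_hat, y):
    return np.sqrt(np.mean((y_hat - y) ** 2))    # Equation (17)
```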
3.2. DDPG-Based Load Optimisation Scheduling Model for Pumped-Storage Units
3.2.1. DDPG Model
In reinforcement learning, a Markov decision process (MDP) describes how an agent interacts with its environment [36].
An MDP consists of five key components: the state space S, the action space A, the state transition probability P, the reward function R, and the discount factor γ. At each time step t, the agent observes the current environment state s_t and selects an action a_t. The environment then returns a reward r_t and transitions to the next state s_{t+1}. Through continuous interaction, the agent gradually learns the optimal policy to maximise the cumulative long-term reward.
In this study, the load optimisation scheduling problem for pumped-storage units is formulated as a Markov decision process (MDP) and solved using the DDPG algorithm to handle continuous action spaces. The actor network outputs each unit’s power set-point at every step. Policy gradients, guided by the critic, continually refine these set-points.
The dependent variables comprise the feedback from the environment, specifically the following:
Reward: a composite metric reflecting unit water consumption, load balance, and vibration zone avoidance;
System state: information such as the current water head, load demand, and previous outputs.
The reward and state feedback evaluate actions and drive policy updates via back-propagation.
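The interaction loop can be sketched as a minimal environment interface; the class below (hypothetical `PSHPEnv`, not the authors' code) only illustrates the structure, with the water-use and penalty terms left as placeholders for the fitted flow model and the penalties of the reward design.
```python
# Minimal sketch of the MDP interface assumed in Section 3.2.
import numpy as np

class PSHPEnv:
    def __init__(self, load, head, n_units=4, p_max=300.0):
        self.load, self.head = load, head          # arrays over the 96 intervals
        self.n_units, self.p_max, self.t = n_units, p_max, 0

    def reset(self):
        self.t = 0
        return self._state()

    def _state(self):
        return np.array([self.load[self.t], self.head[self.t]])   # simplified state

    def step(self, action):
        p = np.clip(action, 0.0, self.p_max)       # continuous outputs per unit
        reward = -self._water_use(p) - self._penalties(p)   # stand-in for Eq. (22)
        self.t += 1
        done = self.t >= len(self.load)
        return (None if done else self._state()), reward, done

    def _water_use(self, p):    # placeholder for the AOS-LSTM flow model
        return 1e-3 * float(np.sum(p))

    def _penalties(self, p):    # placeholder for the vibration-zone penalty terms
        return 0.0
```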
Action Space Design
The action space is defined as the continuous output power range of pumped-storage units:
a_t = [P_{1,t}, P_{2,t}, \ldots, P_{N,t}], \qquad 0 \le P_{n,t} \le P_n^{\max} \quad (18)
where P_{n,t} denotes the output power of unit n at time t (MW), and P_n^{\max} represents the maximum permissible output power of unit n (MW).

State Space Design
The state space should fully capture environmental information to enhance the decision-making efficiency and stability of the agent. At any given time step t, the state space is defined as follows:
s_t = [L_t, H_t, Z_t^{\mathrm{up}}, Z_t^{\mathrm{down}}, Q^{\min}, Q^{\max}] \quad (19)
where each element is consistent with the definitions provided earlier, representing the grid load demand, the water head, the upper and lower power bounds of the vibration zones, and the allowable minimum and maximum flow rates, respectively.

Reward Function Design
The design of the reward function directly influences the agent’s learning performance in load optimisation scheduling for pumped-storage units. In this study, operational constraints are transformed into penalty terms within the reward function to achieve the optimisation objectives:
Vibration Zone Penalty Term:
(20)
Vibration Zone Crossing Risk Penalty Term:
(21)
where the three penalty coefficients are positive constants. Consequently, the reward function for the pumped-storage unit load optimisation scheduling model is defined as follows:
(22)
where W_t denotes the water consumption index at the current time step t, and W_{\max} refers to a theoretical baseline of maximum water consumption (the total water usage when all units operate at full load); subtracting the actual water consumption from this baseline yields a water-saving index that is incorporated into the reward function. To balance the magnitudes of the water-saving term and the penalty terms, the reward function was scaled by a normalisation factor. Maximising this reward function therefore corresponds to minimising both water consumption and operational risk. As shown in Figure 4, the overall framework of the DDPG algorithm for pumped-storage unit load optimisation scheduling integrates the plant environment with the agent’s learning process.
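An illustrative reward in the spirit of Equation (22) is sketched below; the scaling factor and penalty coefficients are assumptions, not the paper's tuned values.
```python
# Illustrative reward: water-saving term minus constraint penalties.
def reward(water_use, w_max, in_zone_count, crossing_count,
           scale=1e-4, lam_zone=50.0, lam_cross=20.0):
    saving = w_max - water_use                         # water-saving term
    penalty = lam_zone * in_zone_count + lam_cross * crossing_count
    return scale * saving - penalty

r = reward(water_use=2.0e5, w_max=2.6e5, in_zone_count=0, crossing_count=1)
```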
4. Results and Discussion
To validate the applicability and effectiveness of the proposed model, a case study was conducted using real operational data from a pumped-storage hydropower station in China. The plant has a total installed capacity of 1200 MW, consisting of four Francis-type reversible pump-turbine units, each rated at 300 MW.
4.1. Fitting of Unit Flow Characteristic Curves
This study proposes a high-precision method for fitting pump-turbine flow curves using an AOS-optimised LSTM network. First, the collected NHQ (output, head, and discharge) data covering all operating conditions were preprocessed as follows:
(1) all data features were normalised to eliminate the influence of scale differences;
(2) missing values in the dataset were imputed using interpolation;
(3) the time-series data were transformed into a sliding-window format, facilitating their input into the LSTM model for subsequent prediction.
These steps improved training efficiency and helped the model capture key patterns. After preprocessing, the data dimensions were (3135, 3).
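A minimal pandas/NumPy sketch of these three steps is shown below; the column names and window length are illustrative assumptions rather than the paper's exact settings.
```python
# Illustrative preprocessing: interpolation of missing values, min-max scaling,
# and sliding-window construction.
import numpy as np
import pandas as pd

def preprocess(df, window=8):
    """df: DataFrame with columns ['power', 'head', 'flow'] (names assumed)."""
    df = df.interpolate(limit_direction="both")            # fill missing values
    norm = (df - df.min()) / (df.max() - df.min())         # min-max normalisation
    X, y = [], []
    for i in range(len(norm) - window):
        X.append(norm[["power", "head"]].iloc[i:i + window].to_numpy())
        y.append(norm["flow"].iloc[i + window])            # next-step discharge
    return np.stack(X), np.array(y)
```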
The preprocessed data were fed into the established LSTM neural network model. This study then applies AOS to tune three key hyperparameters: hidden-layer size, batch size, and training epochs. Figure 5 compares the fitness-value convergence of AOS-LSTM and PSO-LSTM.
Figure 5 shows both algorithms’ fitness values drop with each epoch and then plateau, confirming convergence. AOS-LSTM stabilises by the seventh epoch, whereas PSO-LSTM needs about twenty, indicating faster convergence. These results highlight AOS’s stronger global search and local refinement. Moreover, in terms of final converged values, the fitness value of AOS-LSTM stabilised around 0.0245, slightly lower than that of PSO-LSTM. The corresponding optimal network parameters obtained by the AOS-LSTM were determined to be 56 hidden-layer neurons, a batch size of 96, and 279 training epochs.
To ensure robust evaluation under the inherent uncertainty of intelligent algorithms, ten independent experiments were conducted with identical initial conditions. For additional validation, the model was compared with Back Propagation Neural Network (BPNN), LSTM, and PSO-LSTM on MAE, RMSE, and computation time (Table 1). The results confirm the method’s superior performance.
The prediction results of the four models are illustrated in Figure 6.
Table 1 shows that the AOS-LSTM outperforms traditional BPNN and LSTM in accuracy while keeping computation time reasonable. Specifically, the MAE values of the proposed model decreased by 87.11% and 84.25%, and the minimum RMSE values decreased by 86.68% and 81.49%, respectively. Against PSO-LSTM, AOS-LSTM cuts runtime and improves MAE and minimum RMSE by 69% and 65%, respectively. Figure 6 confirms that AOS-LSTM tracks both overall trends and extremes. Furthermore, the deviation in results across ten independent experiments did not exceed 0.6%, confirming the stability of the proposed approach. Therefore, the AOS-LSTM method demonstrates excellent adaptability and substantial potential for future precise predictions.
4.2. Training Procedure of the DDPG Model
In deep reinforcement learning, the proper selection and tuning of parameters significantly influence algorithm performance and convergence speed, among which the learning rate is particularly critical. An excessive learning rate makes the reward oscillate and prevents stable convergence. Conversely, a learning rate that is too small can slow down the improvement of rewards, prolonging the training process. Therefore, the learning rate often requires careful, repeated tuning to obtain good results. After conducting extensive parameter optimisation experiments, the final parameter values employed in this study are summarised in Table 2.
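For reference, a minimal PyTorch sketch of actor and critic networks consistent with the Table 2 settings (three layers of 64 neurons, ReLU hidden activations, Tanh actor output, learning rates of 3e-5 and 4e-5, soft-update coefficient 0.01) is given below; the state and action dimensions are placeholders.
```python
# Actor/critic sketch matching the Table 2 hyperparameters; sizes are placeholders.
import torch.nn as nn
import torch.optim as optim

def mlp(in_dim, out_dim, out_act=None):
    layers = [nn.Linear(in_dim, 64), nn.ReLU(),
              nn.Linear(64, 64), nn.ReLU(),
              nn.Linear(64, out_dim)]
    if out_act is not None:
        layers.append(out_act)
    return nn.Sequential(*layers)

state_dim, action_dim = 8, 4                      # placeholders
actor = mlp(state_dim, action_dim, nn.Tanh())     # Tanh output, rescaled to [0, P_max]
critic = mlp(state_dim + action_dim, 1)           # Q(s, a)
actor_opt = optim.Adam(actor.parameters(), lr=3e-5)
critic_opt = optim.Adam(critic.parameters(), lr=4e-5)
tau = 0.01                                        # soft-update coefficient
```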
In addition to the network-related hyperparameters, the reward function incorporates several penalty terms governed by corresponding coefficients, namely the vibration zone penalty coefficient and the two vibration-zone crossing penalty coefficients. A sensitivity analysis assessed how these coefficients affect training stability and performance. For efficiency, the sensitivity runs were capped at 80,000 iterations, which is sufficient to reveal clear performance trends. Figure 7 plots the cumulative rewards for three representative penalty-coefficient sets.
A small penalty set speeds early convergence but ends with a low cumulative reward, indicating weak constraint enforcement and many infeasible schedules. A large penalty set slows convergence and produces volatile rewards; the agent over-penalises exploration, becoming conservative and less generalisable. A moderate penalty set gives stable learning and the highest final reward, balancing water saving and operational constraints.
The penalty choice is therefore critical: coefficients that are too small weaken the constraints, whereas coefficients that are too large hamper learning. Accordingly, the moderate penalty coefficient combination is adopted for the final model configuration in the subsequent experiments.
Training used stochastic exploration of the action space. Random noise added to actions encouraged early exploration and avoided local optima. The model underwent 150,000 training iterations, and the results are illustrated in Figure 8. As depicted, the agent’s learning process can be clearly divided into several phases: an initial exploration phase (iterations 0 to approximately 30,000), during which the agent continuously adapted to and explored the environment; a subsequent learning and optimisation phase (approximately 30,000–100,000 iterations), in which the agent progressively improved its decision-making policy; and finally, after about 100,000 iterations, the cumulative reward began to stabilise, indicating that the agent had effectively learned to make optimal decisions in the stochastic environment, thus maximising cumulative rewards. However, slight fluctuations in the stabilised cumulative reward values were still observable due to the introduction of random noise during action selection, which enhanced the model’s capability to avoid local optima and thereby improved its generalisation and adaptability.
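The exploration described here can be sketched as Gaussian noise added to the actor's output and clipped to the feasible range; the noise scale below is an assumption, not the paper's setting.
```python
import numpy as np

def explore(action, p_max=300.0, sigma=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    noisy = action + rng.normal(0.0, sigma * p_max, size=action.shape)
    return np.clip(noisy, 0.0, p_max)      # keep set-points within unit limits
```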
The model was developed using the Python programming language (version 3.10.13) and the PyTorch deep learning framework (version 2.3.1) and was implemented in the PyCharm (2024.1.4) development environment. All training and testing processes were executed using an NVIDIA (Santa Clara, CA, USA) GeForce RTX 4060 GPU.
4.3. Decision Analysis of the Proposed Model
A real-world case study on a Chinese PSHP station tests the proposed strategy. The selected daily scenario represents a typical summer weekday in the southern power grid, where the pumped-storage station operates in a “one pumping and two generating” mode. Specifically, electricity is generated during the morning and evening peak hours to meet high grid demand, and water is pumped during the night when electricity consumption is low.
Figure 9 plots the daily load across 96 fifteen-minute intervals. The time axis (0–96) spans the full 24 h cycle. Positive values denote generation; negative values denote pumping. Figure 10 illustrates the head variation under the selected representative daily scheduling scenario.
To benchmark AOS-LSTM-DDPG, three baseline methods are compared: Dynamic Programming (DP), Particle Swarm Optimisation (PSO), and standard DDPG. Figure 11 illustrates the unit power output distributions of the pumped-storage hydropower units under the daily load curve for the DP, PSO, and AOS-LSTM-DDPG strategies, where Unit1# through Unit4# denote the four pumped-storage units. Table 3 compares the economic performance and operational risk of each strategy. The results are discussed below in terms of economy and operational security.
1. Economic Performance Evaluation
As shown in Figure 11, both AOS-LSTM-DDPG and DP achieve stepwise load allocation across four generating units. During the morning and evening peaks, all units run steadily at 70–100% of rated output, boosting overall efficiency. In contrast, the output of PSO appears disordered, exhibiting significant imbalance; some units are either overloaded or underloaded, resulting in energy efficiency losses.
In terms of quantitative metrics (Table 3), the water consumption of AOS-LSTM-DDPG is 1.983 × 10⁷ m³, which is approximately 0.85%, 1.78%, and 2.36% lower than that of standard DDPG, PSO, and DP, respectively, indicating superior water-saving performance. Moreover, the inference time of both AOS-LSTM-DDPG and standard DDPG is within 1 s, satisfying the requirements for fast-response scheduling. By comparison, DP and PSO take 206.59 s and 10.82 s, respectively; DP's discrete search in particular struggles with real-time, continuous inputs.
2. Operational Safety and Risk Assessment
From an operational risk perspective, the AOS-LSTM-DDPG method clearly outperforms the other approaches. Table 3 reports only two vibration-zone operations and two crossings for AOS-LSTM-DDPG. In contrast, DP resulted in 29 such operations and 14 crossings, approximately 1350% and 600% higher, respectively. PSO yielded 22 operations and 18 crossings, about 1000% and 800% higher. Even the standard DDPG, which lacks flow-efficiency guidance, recorded six and five occurrences, respectively, around 200% and 150% higher. These results show that flow-efficiency guidance markedly improves constraint awareness and risk avoidance.
As shown in Figure 11a, AOS-LSTM-DDPG ensures that all units operate steadily within the high-efficiency range, avoiding inefficient low-load conditions. It also minimises the number of startups, thereby reducing equipment wear. In contrast, the PSO strategy illustrated in Figure 11c exhibits frequent switching and load fluctuations, causing abrupt changes in operational states, significantly increasing vibration risk, and reducing scheduling stability.
In summary, the proposed AOS-LSTM-DDPG framework offers superior economic efficiency and operational robustness. By combining flow-curve-informed state representation with deep reinforcement learning, the model effectively balances optimal scheduling with constraint satisfaction. It not only achieves lower water consumption and faster decision speed but also greatly mitigates vibration-zone risk, affirming its practical value for the intelligent dispatch of pumped-storage hydropower units.
5. Conclusions
This study proposes AOS-LSTM-DDPG, which combines high-precision flow-curve fitting with a constraint-aware DRL scheduler for pumped-storage units. The method accounts for real-world operational constraints, including system load, vibration zone avoidance, unit startup/shutdown duration, and operational state transitions. The main conclusions are as follows:
(1) High-accuracy flow-curve fitting: the proposed AOS-optimised LSTM model accurately captures the flow-efficiency characteristics of PSH units. Compared with traditional fitting methods, it improves prediction accuracy by at least 65.35%, providing physical guidance and enhancing the agent's constraint awareness during dispatch.
(2) Efficient DRL-based scheduling: with the inclusion of flow-feature guidance, the AOS-LSTM-DDPG model demonstrates stable convergence over 150,000 training iterations and supports real-time inference within 1 s. Under a representative daily load scenario, it achieves the lowest water consumption (1.983 × 10⁷ m³), outperforming standard DDPG (−0.85%), PSO (−1.78%), and DP (−2.36%) in economic efficiency.
(3) Significant improvement in operational safety: the proposed method records only two vibration-zone operations and two crossings, representing reductions of over 93.1%/85.7% compared with DP (29/14 events) and 90.9%/88.9% compared with PSO (22/18 events). This highlights its superior capability in constraint compliance and operational stability.
In terms of limitations, the tests covered a single representative day; future work will extend the framework to multi-day and seasonal cases using multi-timescale scheduling. In addition, the fixed penalty weights may limit adaptability, so dynamic weighting and multi-objective DRL formulations will be explored next.
In conclusion, the proposed AOS-LSTM-DDPG method demonstrates significant advantages in economic performance, safety, and decision-making efficiency, making it a promising tool for intelligent and real-time scheduling in complex PSHP systems.
Software, X.M. and C.H.; writing—original draft, X.M. and C.H.; writing—review and editing, H.P., Y.Z. and X.W.; data curation, L.L. All authors have read and agreed to the published version of the manuscript.
Data are contained within the article.
Author Chenyang Hang was employed by the company NARI Group Corporation (State Grid Electric Power Research Institute). The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Figure 1 Structure of an LSTM memory cell.
Figure 2 Schematic diagram of PDF determining the distribution of the candidate solutions.
Figure 3 Flowchart of the AOS-LSTM algorithm.
Figure 4 The framework of the DDPG algorithm for pumped-storage load optimisation scheduling.
Figure 5 The fitness-value convergence curve.
Figure 6 Comparison of prediction results among four models.
Figure 7 Cumulative reward curves under different penalty coefficient combinations.
Figure 8 Cumulative reward evolution during agent training.
Figure 9 Daily load curve in “one pumping and two generating” modes.
Figure 10 Head variation curve during the scheduling period.
Figure 11 Comparison of unit power-dispatch results obtained with three optimisation methods.
Table 1 Comparison of prediction performance among four models.
Model | MAE | Minimum RMSE | Average RMSE | Computation Time (s) |
---|---|---|---|---|
BPNN | 1.017 | 1.187 | 1.954 | 12.859 |
LSTM | 0.832 | 0.854 | 1.246 | 13.032 |
PSO-LSTM | 0.422 | 0.456 | 1.086 | 68.452 |
AOS-LSTM | 0.131 | 0.158 | 0.625 | 51.365 |
Table 2 Specific parameters of the DDPG model.
Parameter | Critic-Network | Actor-Network |
---|---|---|
Learning rate | 0.00004 | 0.00003 |
Soft update coefficient | 0.01 | 0.01 |
Number of network layers | 3 | 3 |
Neurons per layer | 64 | 64 |
Hidden-layer activation | ReLU | ReLU |
Output-layer activation | / | Tanh |
Training episodes | 1500 | 1500 |
Table 3 Comparison of calculation indicators for four models.
Model | Model Training Time (h) | Decision Time (s) | Water Consumption (×10⁷ m³) | In-Zone Operations | Zone Crossings |
---|---|---|---|---|---|
DP | ____ | 206.59 | 2.031 | 29 | 14 |
PSO | ____ | 10.82 | 2.019 | 22 | 18 |
DDPG | 2.8 | 0.7 | 2.000 | 6 | 5 |
AOS-LSTM-DDPG | 2.9 | 0.74 | 1.983 | 2 | 2 |
1. Zhang, F.; Wang, X.; Liu, G. Allocation of carbon emission quotas based on global equality perspective. Environ. Sci. Pollut. Res.; 2022; 29, pp. 53553-53568. [DOI: https://dx.doi.org/10.1007/s11356-022-19619-8]
2. Guo, X.; Huang, K.; Li, L.; Wang, X. Renewable energy for balancing carbon emissions and reducing carbon transfer under global value chains: A way forward. Sustainability; 2022; 15, 234. [DOI: https://dx.doi.org/10.3390/su15010234]
3. Han, X.; Ding, L.; Chen, G.; Liu, J.; Lin, J. Key technologies and research prospects for cascaded hydro-photovoltaic-pumped storage hybrid power generation system. Trans. China Electrotech. Soc.; 2020; 35, pp. 2711-2722.
4. Azarova, E.; Jun, H. Investigating determinants of international clean energy investments in emerging markets. Sustainability; 2021; 13, 11843. [DOI: https://dx.doi.org/10.3390/su132111843]
5. Chai, R.; Li, G. Renewable clean energy and clean utilization of traditional energy: An evolutionary game model of energy structure transformation of power enterprises. Syst. Eng. Theory Pract.; 2022; 42, pp. 184-197.
6. Han, M.; Chang, X.; Li, J.; Yang, G.; Shang, T. Application and development of pumped storage technology. Sci. Technol. Rev.; 2016; 34, pp. 57-67.
7. Xu, R.; Zhang, J.; Liu, M.; Cao, C.; Chao, X. Life cycle cost of electrochemical energy storage and pumped storage. Adv. Technol. Electr. Eng. Energy; 2021; 40, pp. 10-18.
8. Zhao, Z.; Jin, C.C.X.; Liu, L.; Yan, L. A MILP model for hydro unit commitment with irregular vibration zones based on the constrained Delaunay triangulation method. Int. J. Electr. Power Energy Syst.; 2020; 123, 106241. [DOI: https://dx.doi.org/10.1016/j.ijepes.2020.106241]
9. Vieira, D.A.G.; Costa, E.E.; Campos, P.H.F.; Mendonça, M.O.; Silva, G.R.L. A real-time nonlinear method for a single hydropower plant unit commitment based on analytical results of dual decomposition optimization. Renew. Energy; 2022; 192, pp. 513-525. [DOI: https://dx.doi.org/10.1016/j.renene.2022.04.080]
10. Cheng, X.; Feng, S.; Zheng, H.; Wang, J.; Liu, S. A hierarchical model in short-term hydro scheduling with unit commitment and head-dependency. Energy; 2022; 251, 123908. [DOI: https://dx.doi.org/10.1016/j.energy.2022.123908]
11. Shi, C.; Wei, T.; Tang, X.; Zhou, L.; Zhang, T. Charging–discharging control strategy for a flywheel array energy storage system based on the equal incremental principle. Energies; 2019; 12, 2844. [DOI: https://dx.doi.org/10.3390/en12152844]
12. Liao, S.; Liu, J.; Liu, B.; Cheng, C.; Zhou, L.; Wu, H. Multicore parallel dynamic programming algorithm for short-term hydro-unit load dispatching of huge hydropower stations serving multiple power grids. Water Resour. Manag.; 2020; 34, pp. 359-376. [DOI: https://dx.doi.org/10.1007/s11269-019-02455-w]
13. Li, J.; Moe Saw, M.M.; Chen, S.; Yu, H. Short-term optimal operation of Baluchaung II hydropower plant in Myanmar. Water; 2020; 12, 504. [DOI: https://dx.doi.org/10.3390/w12020504]
14. Wang, W.; Wang, P.; Dong, Y. Modified dynamic programming algorithm and its application in distribution of power plant load. E3S Web of Conferences, Proceedings of the 2019 International Conference on Building Energy Conservation, Thermal Safety and Environmental Pollution Control (ICBTE 2019), Hefei, China, 1–3 November 2019; EDP Sciences: Les Ulis, France, 2019; Volume 136, 01005.
15. Hashim, F.A.; Houssein, E.H.; Hussain, K.; Mabrouk, M.S.; Al-Atabany, W. Honey badger algorithm: New metaheuristic algorithm for solving optimization problems. Math. Comput. Simul.; 2022; 192, pp. 84-110. [DOI: https://dx.doi.org/10.1016/j.matcom.2021.08.013]
16. MiarNaeimi, F.; Azizyan, G.; Rashki, M. Horse herd optimization algorithm: A nature-inspired algorithm for high-dimensional optimization problems. Knowl. Based Syst.; 2021; 213, 106711. [DOI: https://dx.doi.org/10.1016/j.knosys.2020.106711]
17. Zhang, X.; Wang, Z.; Lu, Z. Multi-objective load dispatch for microgrid with electric vehicles using modified gravitational search and particle swarm optimization algorithm. Appl. Energy; 2022; 306, 118018. [DOI: https://dx.doi.org/10.1016/j.apenergy.2021.118018]
18. Shang, Y.; Fan, Q.; Shang, L.; Sun, Z.; Xiao, G. Modified genetic algorithm with simulated annealing applied to optimal load dispatch of the Three Gorges Hydropower Plant in China. Hydrol. Sci. J.; 2019; 64, pp. 1129-1139. [DOI: https://dx.doi.org/10.1080/02626667.2019.1625052]
19. Wang, X.; Yang, K.; Zhou, X. Two-stage glowworm swarm optimization for economical operation of hydropower station. IET Renew. Power Gener.; 2018; 12, pp. 992-1003. [DOI: https://dx.doi.org/10.1049/iet-rpg.2017.0466]
20. Ming, B.; Liu, P.; Guo, S.; Cheng, L.; Zhou, Y.; Gao, S.; He, L. Robust hydroelectric unit commitment considering integration of large-scale photovoltaic power: A case study in China. Appl. Energy; 2018; 228, pp. 1341-1352. [DOI: https://dx.doi.org/10.1016/j.apenergy.2018.07.019]
21. Jiang, W.; Liu, Y.; Fang, G.; Ding, Z. Research on short-term optimal scheduling of hydro-wind-solar multi-energy power system based on deep reinforcement learning. J. Clean. Prod.; 2023; 385, 135704. [DOI: https://dx.doi.org/10.1016/j.jclepro.2022.135704]
22. Zhou, Y.; Huang, Y.; Mao, X.; Kang, Z.; Huang, X.; Xuan, D. Research on energy management strategy of fuel cell hybrid power via an improved TD3 deep reinforcement learning. Energy; 2024; 293, 130564. [DOI: https://dx.doi.org/10.1016/j.energy.2024.130564]
23. Liang, T.; Sun, B.; Tan, J.; Cao, X.; Sun, H. Scheduling scheme of wind-solar complementary renewable energy hydrogen production system based on deep reinforcement learning. High Volt. Eng.; 2023; 49, pp. 2264-2275.
24. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y. Continuous control with deep reinforcement learning. arXiv; 2015; arXiv: 1509.02971
25. Ha, P.T.; Tran, D.T.; Nguyen, T.T. Electricity Generation Cost Reduction for Hydrothermal Systems with the Presence of Pumped Storage Hydroelectric Plants. Neural Comput. Appl.; 2022; 34, pp. 9931-9953. [DOI: https://dx.doi.org/10.1007/s00521-022-06977-0]
26. Novara, D.; McNabola, A. A model for the extrapolation of the characteristic curves of pumps as turbines from a datum best efficiency point. Energy Convers. Manag.; 2018; 174, pp. 1-7. [DOI: https://dx.doi.org/10.1016/j.enconman.2018.07.091]
27. Cavazzini, G.; Zanetti, G.; Santolin, A.; Ardizzon, G. Characterization of the hydrodynamic instabilities in a pump-turbine operating at part load in turbine mode. IOP Conference Series: Earth and Environmental Science, Proceedings of the 31st IAHR Symposium on Hydraulic Machinery and Systems, Trondheim, Norway, 26 June 2022–1 July 2022; IOP Publishing: New York, NY, USA, 2022; Volume 1079, 012033.
28. Pan, H.; Hang, C.; Feng, F.; Zheng, Y.; Li, F. Improved neural network algorithm based flow characteristic curve fitting for hydraulic turbines. Sustainability; 2022; 14, 10757. [DOI: https://dx.doi.org/10.3390/su141710757]
29. Fang, Z.; Yang, Z.; Peng, H.; Chen, G. Prediction of Ultra-Short-Term power system based on LSTM-Random forest combination model. Journal of Physics: Conference Series, Proceedings of the 2nd International Conference on Electronics, Electrical and Information Engineering, Changsha, China, 11–14 August 2022; IOP Publishing: New York, NY, USA, 2022; Volume 2387, 012033.
30. Zhou, Y.; Kumar, A.; Gandhi, C.P.; Vashishtha, G.; Tang, H.; Kundu, P.; Xiang, J. Discrete entropy-based health indicator and LSTM for the forecasting of bearing health. J. Braz. Soc. Mech. Sci. Eng.; 2023; 45, 120. [DOI: https://dx.doi.org/10.1007/s40430-023-04042-y]
31. Azizi, M. Atomic orbital search: A novel meta-heuristic algorithm. Appl. Math. Model.; 2021; 93, pp. 657-683. [DOI: https://dx.doi.org/10.1016/j.apm.2020.12.021]
32. Abd Elaziz, M.; Ouadfel, S.; Abd El-Latif, A.A.; Ali Ibrahim, R. Feature selection based on modified bio-inspired atomic orbital search using arithmetic optimization and opposite-based learning. Cogn. Comput.; 2022; 14, pp. 2274-2295. [DOI: https://dx.doi.org/10.1007/s12559-022-10022-6]
33. Ali, F.; Sarwar, A.; Bakhsh, F.I.; Ahmad, S.; Shah, A.A.; Ahmed, H. Parameter extraction of photovoltaic models using atomic orbital search algorithm on a decent basis for novel accurate RMSE calculation. Energy Convers. Manag.; 2023; 277, 116613. [DOI: https://dx.doi.org/10.1016/j.enconman.2022.116613]
34. Ha, P.T.; Tran, D.T.; Phan, T.M.; Nguyen, T.T. Maximization of Total Profit for Hybrid Hydro-Thermal-Wind-Solar Power Systems Considering Pumped Storage, Cascaded Systems, and Renewable Energy Uncertainty in a Real Zone, Vietnam. Sustainability; 2024; 16, 6581. [DOI: https://dx.doi.org/10.3390/su16156581]
35. Hodson, T.O. Root mean square error (RMSE) or mean absolute error (MAE): When to use them or not. Geosci. Model Dev. Discuss.; 2022; 2022, pp. 1-10. [DOI: https://dx.doi.org/10.5194/gmd-15-5481-2022]
36. Papadimitriou, C.H.; Tsitsiklis, J.N. The complexity of Markov decision processes. Math. Oper. Res.; 1987; 12, pp. 441-450. [DOI: https://dx.doi.org/10.1287/moor.12.3.441]
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
The short-term scheduling of pumped-storage hydropower plants is characterised by high dimensionality and nonlinearity and is subject to multiple operational constraints. This study proposes an intelligent scheduling framework that integrates an Atomic Orbital Search (AOS)-optimised Long Short-Term Memory (LSTM) network with the Deep Deterministic Policy Gradient (DDPG) algorithm to minimise water consumption during the generation period while satisfying constraints such as system load and safety states. Firstly, the AOS-LSTM model simultaneously optimises the number of hidden neurons, batch size, and training epochs to achieve high-precision fitting of unit flow–efficiency characteristic curves, reducing the fitting error by more than 65.35% compared with traditional methods. Subsequently, the high-precision fitted curves are embedded into a Markov decision process to guide DDPG in performing constraint-aware load scheduling. Under a typical daily load scenario, the proposed scheduling framework achieves fast inference decisions within 1 s, reducing water consumption by 0.85%, 1.78%, and 2.36% compared to standard DDPG, Particle Swarm Optimisation, and Dynamic Programming methods, respectively. In addition, only two vibration-zone operations and two vibration-zone crossings are recorded, representing a reduction of more than 90% compared with the above two traditional optimisation methods, significantly improving scheduling safety and operational stability. The results validate the proposed method’s economic efficiency and reliability in high-dimensional, multi-constraint pumped-storage scheduling problems and provide strong technical support for intelligent scheduling systems.
1 School of Electrical and Power Engineering, Hohai University, Nanjing 211100, China; [email protected] (X.M.); [email protected] (X.W.);
2 NARI Group Corporation (State Grid Electric Power Research Institute), Nanjing 211106, China; [email protected]