1. Introduction
The analysis of power market equilibrium is pivotal in various domains, including market operation simulation, market mechanism design, trading decision-making by market entities, market power analysis, electricity price forecasting, and source-grid-load investment planning [1]. The bilevel optimization model is the predominant approach for modeling power market equilibrium. Solutions are primarily derived through two methods: the model transformation method, which leverages the Karush-Kuhn-Tucker (KKT) conditions or the strong duality theorem to convert the bilevel model into a single-level mathematical program with equilibrium constraints (MPEC) or an equilibrium problem with equilibrium constraints (EPEC) [2,3]; and the iterative solution method, which employs optimization algorithms based on agent-based models, such as reinforcement learning and deep reinforcement learning. The model transformation method presupposes that the lower-level model is a convex optimization problem, which restricts its applicability to unit commitment and clearing models in the power market and is more suited for single-period market equilibrium problems [4,5,6]. In contrast, the iterative solution method is adept at addressing multi-period market equilibrium problems. Notably, deep reinforcement learning circumvents the “curse of dimensionality” that can arise from the discretization of the action space in reinforcement learning, making it suitable for large-scale system models.
Currently, power market modeling predominantly employs Nash game equilibrium, with insufficient attention given to collusion equilibrium scenarios [7,8]. This oversight hinders a comprehensive understanding and effective regulation of the power market’s actual operations. Collusion among power producers is a real possibility in the electricity market, yet most existing research concentrates on Nash game equilibrium, neglecting the collusive equilibrium scenario. Due to the lack of research on collusion scenarios, market regulators cannot quickly identify collusion behaviors in the electricity market. Collusion can lead to market price distortions, inefficient resource allocation, consumer harm, and reduced market efficiency. Furthermore, the inadequate consideration of collusive equilibrium results in deficiencies in market mechanism design and policy formulation [9]. Existing market rules and regulatory policies may fail to identify and prevent collusive practices, thus jeopardizing fair competition and the stable operation of the electricity market. There is also a dearth of in-depth research on monitoring and punitive mechanisms for collusion, which hampers the creation of a strong deterrent against potential collusive activities. Additionally, overlooking the collusive equilibrium scenario restricts the assessment and management of power market risks. Collusion can induce market volatility and increased uncertainty, affecting the safe and stable operation of the power system. The absence of quantitative analysis and countermeasures for market risks under collusive equilibrium leaves the power market exposed to collusive risks.
Considering the problem mentioned above, this paper introduces an innovative application of the Deep Deterministic Policy Gradient (DDPG) algorithm to the collusion equilibrium of the power market, validated through testing on the IEEE three-bus model. Using the DDPG model to simulate the power market, the participants converge to the market equilibrium point under different scenarios. The results show that, compared with the Nash equilibrium, the collusive equilibrium significantly raises the bidding prices of market participants. Concurrently, such collusive practices significantly elevate the nodal marginal price of electricity. The increased nodal marginal price not only indicates the distortion inflicted by collusion on the electricity market’s price formation mechanism but also underscores the detrimental effects of collusion on the market’s fairness and efficiency.
2. Problem Modeling
2.1. Electrical Market
Bidding curve and cost: Considering actual market operations, this paper employs slope bidding, assuming linearity between the unit’s power generation cost and its output. Given that unit costs are typically represented in quadratic form in existing literature, we adopt the secant method for approximation in this study [10].
Assume that the true cost of generator i is a quadratic function of its output power. This quadratic model captures the relationship between the power output and the associated cost. The cost function is given by
$C_i^{\mathrm{q}}(P_{i,t}) = a_i P_{i,t}^2 + b_i P_{i,t}, \qquad P_i^{\min} \le P_{i,t} \le P_i^{\max}$ (1)
In this equation, $C_i^{\mathrm{q}}$ represents the quadratic cost of generator i, $a_i$ is the coefficient that determines the curvature of the cost function, $P_{i,t}$ is the output power of generator i at a given time interval, and $b_i$ is the linear coefficient that affects the cost. The constraints ensure that the output power is within the operational limits of the generator, where $P_i^{\min}$ is the minimum power output and $P_i^{\max}$ is the maximum power output.
In this paper, we employ secant functions to approximate the quadratic cost functions. The quadratic true cost is approximated using the secant approximation as follows:
$C_i(P_{i,t}) \approx C_i^{\mathrm{q}}(P_i^{\min}) + c_i\,(P_{i,t} - P_i^{\min}), \qquad c_i = \dfrac{C_i^{\mathrm{q}}(P_i^{\max}) - C_i^{\mathrm{q}}(P_i^{\min})}{P_i^{\max} - P_i^{\min}}$ (2)
Here, $c_i$ represents the marginal cost, which is the slope of the secant line between the maximum and minimum power outputs. This approximation allows us to estimate the cost at any output level within the range $[P_i^{\min}, P_i^{\max}]$. Furthermore, the bidding function is defined as
$B_{i,t}(P_{i,t}) = \alpha_{i,t}\,P_{i,t}, \qquad 0 \le \alpha_{i,t} \le 3 c_i$ (3)
In this context, $\alpha_{i,t}$ denotes the bidding marginal cost, which is restricted to lie between zero and three times the marginal cost $c_i$. This bounded range reflects a strategic decision in bidding within the market.
Figure 1 illustrates the cost function for a unit, depicting both the quadratic curve and the secant line approximation, effectively capturing the relationship between power output and cost.
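To make the approximation concrete, a minimal Python sketch of the secant (linearized) cost is given below; the cost coefficients and power limits are hypothetical and are not the parameters of the case study in Section 3.

```python
# Sketch of the secant approximation of a quadratic generation cost.
# The coefficients a, b and the power limits below are hypothetical.

def quadratic_cost(p, a, b):
    """True cost C(p) = a*p**2 + b*p."""
    return a * p**2 + b * p

def secant_marginal_cost(a, b, p_min, p_max):
    """Slope of the secant line between (p_min, C(p_min)) and (p_max, C(p_max))."""
    return (quadratic_cost(p_max, a, b) - quadratic_cost(p_min, a, b)) / (p_max - p_min)

def secant_cost(p, a, b, p_min, p_max):
    """Linearized cost along the secant line, valid for p_min <= p <= p_max."""
    c = secant_marginal_cost(a, b, p_min, p_max)
    return quadratic_cost(p_min, a, b) + c * (p - p_min)

if __name__ == "__main__":
    a, b, p_min, p_max = 0.02, 15.0, 10.0, 100.0   # hypothetical values
    for p in (10.0, 50.0, 100.0):
        print(p, quadratic_cost(p, a, b), secant_cost(p, a, b, p_min, p_max))
```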
Market model: at each time interval, the electrical consumer demand can be accurately modeled as a linear function, represented by the equation
(4)
where f is the slope of the demand function, D is the total demand, and $d_t$ is the quantity demanded. This linear model helps in understanding the relationship between the consumer price and the quantity demanded. Subsequently, the consumer utility curve, which measures the satisfaction derived from consuming a certain quantity of electricity, is expressed as follows:
(5)
This quadratic utility function captures the diminishing marginal utility, where the second term indicates that as the quantity demanded increases, the additional satisfaction gained from consuming more electricity decreases.
In the pursuit of maximizing total social welfare, the Independent System Operator (ISO) clears the market under the constraints of nodal power balance, branch flow limitations, and generator output boundaries. This market clearing process at discrete time intervals can be effectively modeled using a DC power flow approach. The optimization problem is formulated as follows:
$\begin{aligned} \max_{P_{i,t},\,d_{j,t}} \;& \sum_{j} U_j(d_{j,t}) - \sum_{i} \alpha_{i,t} P_{i,t} \\ \text{s.t.}\;& \sum_{i} P_{i,t} = \sum_{j} d_{j,t}, \\ & -F \le \mathrm{PTDF}\,(P_{G,t} - P_{D,t}) \le F, \\ & P_i^{\min} \le P_{i,t} \le P_i^{\max} \end{aligned}$ (6)
In this formulation, the vector F represents the maximum allowable flow on each line within the network, $U_j(\cdot)$ is the utility of consumer j, and $\alpha_{i,t} P_{i,t}$ is the bid cost of generator i. The matrix PTDF, known as the Power Transfer Distribution Factor matrix, maps nodal injections to line flows. The vectors $P_{G,t}$ and $P_{D,t}$ denote the power generation and demand across all buses, respectively, and are linear combinations of the $P_{i,t}$ and $d_{j,t}$.
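For illustration, the sketch below sets up a clearing problem of this form with scipy.optimize.linprog, treating demand as inelastic so that welfare maximization reduces to bid-cost minimization. The PTDF matrix, line limits, bids, and demands are hypothetical values and not the data of the case study in Section 3.

```python
# Simplified market-clearing sketch (DC power flow, PTDF line limits).
# Demand is treated as inelastic here, so welfare maximization reduces to
# minimizing total bid cost; all numeric data are hypothetical.
import numpy as np
from scipy.optimize import linprog

bids    = np.array([22.0, 25.0])          # bid marginal costs alpha_i (USD/MWh)
p_min   = np.array([0.0, 0.0])            # generator lower limits (MW)
p_max   = np.array([80.0, 80.0])          # generator upper limits (MW)
demand  = np.array([0.0, 0.0, 100.0])     # nodal demand (MW), load at bus 3
gen_bus = np.array([0, 1])                # generator i is located at bus gen_bus[i]

# Hypothetical PTDF matrix (lines x buses, bus 3 as reference) and line limits F.
PTDF = np.array([[ 0.33, -0.33, 0.0],
                 [ 0.67,  0.33, 0.0],
                 [ 0.33,  0.67, 0.0]])
F = np.array([25.0, 60.0, 60.0])

# Decision variables: generator outputs P. Nodal injection = generation - demand.
G = np.zeros((3, 2)); G[gen_bus, np.arange(2)] = 1.0   # bus-to-generator map

# Power balance: sum(P) = sum(demand)
A_eq = np.ones((1, 2)); b_eq = np.array([demand.sum()])

# Line limits: -F <= PTDF @ (G @ P - demand) <= F
A_ub = np.vstack([PTDF @ G, -(PTDF @ G)])
b_ub = np.concatenate([F + PTDF @ demand, F - PTDF @ demand])

res = linprog(c=bids, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=list(zip(p_min, p_max)))
print("dispatch:", res.x, "total bid cost:", res.fun)
```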
At each time interval, the profit for each generator i is determined by maximizing the revenue less the cost. This can be mathematically represented as
$R_{i,t} = \lambda_{i,t}\,P_{i,t} - C_i(P_{i,t})$ (7)
This formulation captures the economic profit of power generation, taking into account both the market price and the generation cost. In Equation (7), $C_i(P_{i,t})$ represents the cost function of generator i, the subscript indicates that the generator is located at bus i, and $\lambda_{i,t}$ denotes the nodal price at bus i.
2.2. DDPG
In contrast to game-theoretic methods that require full information, the agent-based simulation technique models the market as a partially observable Markov decision process and employs a reinforcement learning approach. Within this framework, each generation agent g perceives an observable state $s_{g,t}$ at time t, selects an action $a_{g,t}$, and subsequently receives a reward $r_{g,t}$. The objective of each agent is to maximize its cumulative reward over the entire time horizon T, although the specific formula for this cumulative reward is omitted here for brevity.
The variables are interpreted in conjunction with the market mechanism as follows. The state $s_{g,t}$, which defines the decision-making context of each generation agent g at time interval t, is constituted by the nodal prices of the previous time interval and the total load demand of the current time interval. Specifically, it is represented by the combination of these factors as
$s_{g,t} = \left(\lambda_{1,t-1}, \lambda_{2,t-1}, \ldots, \lambda_{N,t-1}, D_t\right)$
where $\lambda_{i,t-1}$ denotes the nodal price at bus i during the previous time interval, N is the number of buses, and $D_t$ represents the total load demand at time t. In this context, the strategic variable $\alpha_{g,t}$ corresponds to the action $a_{g,t}$ generated by the GenCo agent g. However, their ranges may differ and require subsequent scaling. The payoff $R_{g,t}$, often termed the reward in the RL domain, is assumed to reflect the GenCos’ rational consideration of their own payoffs; payoffs are therefore equated with rewards in this study.
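As a concrete illustration, the observation can be assembled as a vector of the previous interval's nodal prices followed by the current total demand; the numeric values in the sketch below are hypothetical.

```python
# Illustrative construction of the observation s_{g,t}: previous-interval
# nodal prices concatenated with the current total load demand.
import numpy as np

def build_state(prev_nodal_prices, total_demand_t):
    """prev_nodal_prices: array of lambda_{i,t-1}; total_demand_t: scalar D_t."""
    return np.concatenate([np.asarray(prev_nodal_prices, dtype=np.float32),
                           np.array([total_demand_t], dtype=np.float32)])

s = build_state([24.8, 23.1, 25.6], 95.0)   # hypothetical values
print(s.shape)   # (4,) for a three-bus system
```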
The Deep Deterministic Policy Gradient (DDPG) algorithm, an actor-critic and model-free approach, is based on the Deterministic Policy Gradient and operates in continuous state and action spaces. The DDPG algorithm is built around two main networks: the actor network, described by the policy function $\mu(s \mid \theta^{\mu})$ with parameters $\theta^{\mu}$, and the critic network, described by the action-value function $Q(s, a \mid \theta^{Q})$ with parameters $\theta^{Q}$. To facilitate training, target networks are created: the actor target network $\mu'$ with parameters $\theta^{\mu'}$ and the critic target network $Q'$ with parameters $\theta^{Q'}$. The network architectures used in this study are depicted in Figure 2.
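As an illustration of this actor-critic structure, a minimal PyTorch sketch follows; the layer sizes and activations are assumptions made for the sketch and are not necessarily the architectures shown in Figure 2.

```python
# Illustrative actor and critic networks for DDPG (PyTorch).
# Layer sizes are assumptions for the sketch, not the architectures of Figure 2.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),   # output in [-1, 1]
        )

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),                       # Q(s, a)
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))
```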
The DDPG agent-based learning method leverages the Bellman equation to establish a recursive relationship that is fundamental to the learning process. This relationship is expressed as $Q^{\mu}(s_t, a_t) = \mathbb{E}\left[r_t + \gamma\, Q^{\mu}\!\left(s_{t+1}, \mu(s_{t+1})\right)\right]$, where $Q^{\mu}$ represents the action-value function under policy $\mu$, $r_t$ is the immediate reward, and $\gamma$ is the discount factor.
The loss function of the Q network, which is crucial for training, is given by $L(\theta^{Q}) = \mathbb{E}\left[\left(Q(s_t, a_t \mid \theta^{Q}) - y_t\right)^2\right]$. Here, $y_t$ is the target value used for training, computed as $y_t = r_t + \gamma\, Q'\!\left(s_{t+1}, \mu'(s_{t+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right)$. This formulation allows the agent to learn the optimal policy by minimizing the difference between the predicted and target values of the Q function.
The policy network is updated by applying the chain rule to the action-value function with respect to the policy parameters, as detailed in Equations (8)–(10) below.
In the DDPG algorithm, the sequence of operations is as follows. The policy network produces an action $\mu(s_t \mid \theta^{\mu})$ given the current state $s_t$, and exploration noise $n_t$ is added to this action, yielding $a_t = \mu(s_t \mid \theta^{\mu}) + n_t$. Upon taking action $a_t$, the environment transitions the agent to the next state $s_{t+1}$ and returns a reward $r_t$. This experience, encapsulated as the tuple $(s_t, a_t, r_t, s_{t+1})$, is saved in the replay buffer. For training, a mini-batch of N randomly selected experiences from the buffer, denoted $\{(s_i, a_i, r_i, s_i')\}_{i=1}^{N}$, is utilized.
Utilizing the mini-batch, the policy network computes $\mu(s_i \mid \theta^{\mu})$, which is then fed into the action-value network Q. Within this framework, the gradients are calculated via automatic differentiation. Specifically, the gradient with respect to the action a is determined as
$\nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s = s_i,\, a = \mu(s_i)}$ (8)
and the gradient with respect to the parameters $\theta^{\mu}$ is found by
$\nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s = s_i}$ (9)
These gradients facilitate the approximation of the policy gradient:
$\nabla_{\theta^{\mu}} J \approx \dfrac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s = s_i,\, a = \mu(s_i)}\; \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s = s_i}$ (10)
Employing a small learning rate $\eta_{\mu}$, the parameters of the policy network $\mu$ are updated via gradient ascent to refine the policy's performance:
$\theta^{\mu} \leftarrow \theta^{\mu} + \eta_{\mu}\, \nabla_{\theta^{\mu}} J$ (11)
Lastly, the target networks are softly updated with a small update rate $\tau$: $\theta^{Q'} \leftarrow \tau\,\theta^{Q} + (1 - \tau)\,\theta^{Q'}$ and $\theta^{\mu'} \leftarrow \tau\,\theta^{\mu} + (1 - \tau)\,\theta^{\mu'}$.
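The sketch below summarizes one DDPG training step: the critic regression toward the target value, the policy-gradient update of Equations (8)–(11), and the soft target update. It assumes the Actor and Critic modules sketched above; the hyperparameter values are placeholders, not the settings used in this study.

```python
# One DDPG update step: critic regression toward the target y, actor update
# by the deterministic policy gradient, then soft target-network updates.
# Hyperparameter values here are assumptions for the sketch.
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_targ, critic_targ,
                actor_opt, critic_opt, gamma=0.99, tau=0.005):
    s, a, r, s_next = batch          # tensors; r has shape (N,) or (N, 1)
    r = r.view(-1, 1)

    # Critic loss: L = mean((Q(s,a) - y)^2), with y = r + gamma * Q'(s', mu'(s')).
    with torch.no_grad():
        y = r + gamma * critic_targ(s_next, actor_targ(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor loss: maximize Q(s, mu(s)), i.e. minimize its negative
    # (autograd applies the chain rule of Equations (8)-(10)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft update of target networks: theta' <- tau*theta + (1 - tau)*theta'.
    with torch.no_grad():
        for p, p_t in zip(actor.parameters(), actor_targ.parameters()):
            p_t.mul_(1 - tau).add_(tau * p)
        for p, p_t in zip(critic.parameters(), critic_targ.parameters()):
            p_t.mul_(1 - tau).add_(tau * p)
```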
2.3. Infinitely Repeated Games
Electricity market auctions are conceptualized as an infinite series of identical static games, each incorporating a discount factor $\delta$ that reflects the present value of future payoffs. This factor is crucial for a generation company (GenCo) g, whose present value of payoffs is given by the discounted sum $\sum_{t=0}^{\infty} \delta^{t} R_{g,t}$. The closer $\delta$ is to 1, the greater the weight of future payoffs, signifying greater patience on the part of the generator. When these static games are repeated consistently over time, they constitute an infinitely repeated game $G(\infty, \delta)$. Gibbons’ [11] analysis of the infinitely repeated Prisoner’s Dilemma shows that the grim-trigger strategy, characterized by initial cooperation and permanent defection once a betrayal occurs, constitutes a Nash Equilibrium when $\delta \ge \delta^{*}$, with $\delta^{*}$ being the critical discount factor that depends on the payoffs from cooperation, betrayal, and mutual defection. The Folk Theorem [12] states that in such scenarios, highly patient players can secure higher payoffs than in single-stage games, with cooperative strategies emerging as Nash Equilibria as $\delta$ approaches 1.
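To make the threshold concrete, a brief derivation of the critical discount factor under a grim-trigger strategy is given below, using generic stage payoffs (R for mutual cooperation, T for a one-shot deviation, P for mutual punishment); these symbols are generic textbook notation rather than the paper's own.

```latex
% Grim trigger: cooperate until the opponent deviates, then play the
% stage-game Nash action forever. With per-period payoffs
%   R (mutual cooperation), T (one-shot gain from deviating), P (punishment),
% cooperation is sustained when the discounted value of cooperating is at
% least the value of deviating once and being punished thereafter:
\frac{R}{1-\delta} \;\ge\; T + \frac{\delta P}{1-\delta}
\quad\Longleftrightarrow\quad
\delta \;\ge\; \delta^{*} = \frac{T - R}{T - P}.
```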
2.4. Nash Equilibrium and Collusion Equilibrium
We categorize observed episodes and agent strategies along a spectrum from competitive to collusive by defining an episodic collusion measure. In the context of a Markov game, a collection of agent policies $(\pi_1, \ldots, \pi_n)$ is identified as competitive, or a Nash equilibrium, if no agent i can unilaterally select a different policy to improve its expected total episode profit, given the fixed strategies of its opponents; that is, each agent maximizes its individual expected reward:
$\pi_i^{*} \in \arg\max_{\pi_i}\; \mathbb{E}\!\left[\sum_{t=1}^{T} r_{i,t} \,\middle|\, \pi_i, \pi_{-i}^{*}\right], \qquad \forall i$ (12)
Conversely, the same collection of policies is deemed collusive, or a monopolistic equilibrium, if it maximizes the expected total profit across all agents, i.e., the sum of the expected rewards of all agents:
$(\pi_1^{*}, \ldots, \pi_n^{*}) \in \arg\max_{\pi_1, \ldots, \pi_n}\; \mathbb{E}\!\left[\sum_{i=1}^{n} \sum_{t=1}^{T} r_{i,t}\right]$ (13)
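To connect these definitions to the simulation, the sketch below shows one way the reward signal could differ between the two scenarios: each agent receives its own profit in the competitive case and the joint total profit in the collusive case. The function names and the use of the plain sum are illustrative assumptions, not the paper's exact reward definition.

```python
# Illustrative reward assignment for the two equilibrium scenarios.
# profits[g] is the per-interval profit of GenCo g computed from the
# market-clearing result; the function names are illustrative only.

def nash_rewards(profits):
    """Competitive / Nash setting: each agent is rewarded with its own profit."""
    return list(profits)

def collusive_rewards(profits):
    """Collusive setting: every agent is rewarded with the total joint profit,
    so the learned policies jointly maximize the sum of all agents' profits."""
    total = sum(profits)
    return [total for _ in profits]

print(nash_rewards([120.0, 80.0]))       # [120.0, 80.0]
print(collusive_rewards([120.0, 80.0]))  # [200.0, 200.0]
```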
3. Results
3.1. Setting
We employ multiple Deep Deterministic Policy Gradient (DDPG) agents to simulate the electricity market, where each agent independently learns its own policy and treats the other agents as part of the environment. Centralized training, opponent modeling, and communication can improve convergence, but they require knowledge of other agents’ actions or policies, which conflicts with market realities. We focus on the market strategy of each generation company (GenCo) without prior knowledge of the other GenCos’ behaviors, and thus opt for an independent-learner (IQL-like) architecture. Moreover, considerable empirical evidence indicates that such architectures are often effective in practice, a finding our experimental results corroborate.
The process of employing multiple DDPG agents to simulate the electricity market follows the DDPG procedure described above, with the subscript g distinguishing the parameters of the individual agents. It is posited in this study that the strategic variable is allowed to vary freely within the interval from 0 to $3 c_g$, i.e., $\alpha_{g,t} \in [0, 3 c_g]$. As given by the following equation, these bounds require the output a of the actor network, which ranges from −1 to 1 via the hyperbolic tangent activation, to be scaled accordingly. This is shown in Figure 1.
$\alpha_{g,t} = \dfrac{a_{g,t} + 1}{2} \cdot 3 c_g$ (14)
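A small sketch of this scaling is given below, with additive Gaussian exploration noise applied after scaling and clipped back into the feasible range; applying the noise post-scaling and the clipping step are assumptions of the sketch rather than details stated in the paper.

```python
# Map the actor output a in [-1, 1] to a bid coefficient in [0, alpha_max],
# then add Gaussian exploration noise and clip back into the feasible range
# (post-scaling noise and clipping are assumptions of this sketch).
import numpy as np

def scale_action(a, alpha_max):
    """Scaling of the tanh output: alpha = (a + 1) / 2 * alpha_max."""
    return (a + 1.0) / 2.0 * alpha_max

def explore(a, alpha_max, sigma, rng=np.random.default_rng()):
    alpha = scale_action(a, alpha_max) + rng.normal(0.0, sigma)
    return float(np.clip(alpha, 0.0, alpha_max))

alpha_max = 3 * 17.5     # three times a marginal cost of 17.5 USD/MWh
print(explore(0.2, alpha_max, sigma=1.0))
```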
The capacity of the replay buffer is fixed in advance, and training of the agents begins once the buffer has been filled. Gaussian noise is used as the agents’ exploration mechanism. The remaining DDPG hyperparameters (discount factor, soft-update rate, learning rates, and batch size) are kept at their default values, with the time horizon T = 10,000. The exploration noise is characterized by
$n_t \sim \mathcal{N}(0, \sigma_t^{2})$ (15)
The noise standard deviation $\sigma_t$ is initialized at a value of 1 and is designed to decrease gradually to 0.02, with this minimum threshold preventing excessively small values once the training phase begins. The decay of $\sigma_t$ is governed by the following piecewise function:
(16)
In this context, the termination point for training is denoted by $t_{\mathrm{end}}$. Once t surpasses $t_{\mathrm{end}}$, training and the addition of noise cease so that the network’s unaided output can be observed, marking the transition into the testing phase. For a detailed account of the parameters of the three-node power market model, the reader is referred to [13]. The IEEE three-bus system model is depicted in Figure 3. The marginal costs of the two generators are 17.5 USD/MWh and 20 USD/MWh, respectively. The maximum power flow on the line from Bus 1 to Bus 3 is 25 MW.
3.2. Equilibrium Result
Within the electricity market model, we deployed both the Particle Swarm Optimization (PSO) and the Deep Deterministic Policy Gradient (DDPG) algorithms to simulate two distinct equilibrium scenarios: Nash equilibrium and collusive equilibrium. Under the Nash equilibrium, the PSO algorithm achieved an equilibrium point of (22.70, 24.65) as depicted in Figure 4, whereas the DDPG algorithm arrived at (22.59, 24.83) as shown in Figure 5. In the context of collusive equilibrium, the PSO algorithm determined the point (28.93, 29.58) presented in Figure 4, whereas the DDPG algorithm’s convergence was to (28.81, 29.14) illustrated in Figure 5. These results demonstrate the algorithms’ capability to converge to equilibrium points within their respective scenarios, with the DDPG algorithm exhibiting tighter convergence in the collusive equilibrium scenario. This performance is attributed to DDPG’s adaptability and robustness in managing continuous action spaces and high-dimensional state spaces. Consequently, the DDPG algorithm emerges as a potent tool for pinpointing equilibrium points within the electricity market model, thereby providing precise decision support for market participants. Additionally, Figure 6a,b indicate the presence of optimal collusion when blocking occurs.
3.3. Collusion Effect
Figure 7 demonstrates the substantial influence of collusive behavior on the profits of two power-generating units within the electricity market. Under collusive equilibrium conditions, these units enhance their profits by decreasing their generation levels and increasing the nodal marginal price. This conduct contravenes the tenets of market competition, potentially diminishing market efficiency and curtailing consumer surplus. Given that collusive behavior could lead to market power abuse and unfair competition, it is imperative for regulatory bodies to intensify their oversight to safeguard market fairness and transparency.
4. Conclusions
To address the limitation that the collusion equilibrium scenario is not fully considered in current electricity market research, this study applies the Deep Deterministic Policy Gradient (DDPG) algorithm to solve the collusion equilibrium in the electricity market, verified on the IEEE three-bus model. The results of the DDPG and PSO algorithms show that both can converge to the market Nash equilibrium and the collusion equilibrium. Further observation shows that when the units collude, the nodal marginal price rises significantly, total social welfare decreases, and the profits of the colluding units increase significantly. This study improves the understanding of collusive behavior in the electricity market and lays a foundation for regulatory intervention.
Data curation, Writing—original draft, Resource, Project administration, Y.L., J.C., M.C. and Z.H.; writing—review and editing, Y.G. and C.L. All authors have read and agreed to the published version of the manuscript.
Data are unavailable due to privacy restrictions.
Authors Yifeng Liu, Jingpin Chen, Meng Chen and Zhongshi He were employed by the company Hubei Electric Power Co., Ltd. Power Exchange Center. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Figure 4. PSO for electrical market equilibrium. (a) Nash equilibrium. (b) Collusion equilibrium.
Figure 5. DDPG for electrical market equilibrium. (a) Nash equilibrium. (b) Collusion equilibrium.
Figure 6. Collusive benefit distribution. (a) Collusive benefit distribution without line blocking. (b) Collusive benefit distribution with line blocking.
References
1. Boyd, S.; Vandenberghe, L.; Faybusovich, L. Convex optimization. IEEE Trans. Autom. Control.; 2006; 51, 1859.
2. Hassan, A.; Dvorkin, Y. Energy storage siting and sizing in coordinated distribution and transmission systems. IEEE Trans. Sustain. Energy; 2018; 9, pp. 1692-1701. [DOI: https://dx.doi.org/10.1109/TSTE.2018.2809580]
3. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G. et al. Human-level control through deep reinforcement learning. Nature; 2015; 518, pp. 529-533. [DOI: https://dx.doi.org/10.1038/nature14236] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25719670]
4. Ruiz, C.; Conejo, A.J. Pool strategy of a producer with endogenous formation of locational marginal prices. IEEE Trans. Power Syst.; 2009; 24, pp. 1855-1866. [DOI: https://dx.doi.org/10.1109/TPWRS.2009.2030378]
5. Zhang, X.P. Restructured Electric Power Systems: Analysis of Electricity Markets with Equilibrium Models; John Wiley & Sons: Hoboken, NJ, USA, 2010.
6. Ye, Y.; Qiu, D.; Sun, M.; Papadaskalopoulos, D.; Strbac, G. Deep reinforcement learning for strategic bidding in electricity markets. IEEE Trans. Smart Grid; 2019; 11, pp. 1343-1355. [DOI: https://dx.doi.org/10.1109/TSG.2019.2936142]
7. Chow, Y.; Ghavamzadeh, M. Algorithms for CVaR optimization in MDPs. Adv. Neural Inf. Process. Syst.; 2014; 27, pp. 1-9.
8. Wu, C.; Gu, W.; Yi, Z.; Lin, C.; Long, H. Non-cooperative differential game and feedback Nash equilibrium analysis for real-time electricity markets. Int. J. Electr. Power Energy Syst.; 2023; 144, 108561. [DOI: https://dx.doi.org/10.1016/j.ijepes.2022.108561]
9. Moitre, D. Nash equilibria in competitive electric energy markets. Electr. Power Syst. Res.; 2002; 60, pp. 153-160. [DOI: https://dx.doi.org/10.1016/S0378-7796(01)00174-2]
10. Astero, P.; Choi, B.J. Electrical market management considering power system constraints in smart distribution grids. Energies; 2016; 9, 405. [DOI: https://dx.doi.org/10.3390/en9060405]
11. Gibbons, R. A Primer in Game Theory; Prentice Hall: Saddle River, NJ, USA, 1992.
12. Friedman, J.W. A non-cooperative equilibrium for supergames. Rev. Econ. Stud.; 1971; 38, pp. 1-12. [DOI: https://dx.doi.org/10.2307/2296617]
13. Waheed, M.; Jasim Sultan, A.; al Bakry, A.A.A.; Saeed, F.N. Harmonic reduction in IEEE-3-bus using hybrid power filters. IOP Conf. Ser. Mater. Sci. Eng.; 2021; 1105, 012012. [DOI: https://dx.doi.org/10.1088/1757-899X/1105/1/012012]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
The evolution of the electricity market has brought the issues of market equilibrium and collusion to the forefront. This paper applies the Deep Deterministic Policy Gradient (DDPG) algorithm to an IEEE three-bus electricity market model. Specifically, it simulates the behavior of market participants through reinforcement learning (DDPG), and the Nash equilibrium and the collusive equilibrium of the power market are obtained by setting different reward functions. The results show that, compared with the Nash equilibrium, the collusive equilibrium increases the nodal marginal electricity price and reduces total social welfare.
1 Hubei Electric Power Co., Ltd., Power Exchange Center, Wuhan 430073, China
2 Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China