Designing a realistic RL environment for power systems

Introduction

Power grids are critical infrastructure: ensuring they are reliable, robust, and secure is essential to humanity, to everyday life, and to progress. With increasing renewable generation, growing electricity demand, and more severe weather events due to climate change, the task of maintaining efficient and robust power distribution poses a tremendous challenge to grid operators. In recent years, Reinforcement Learning (‘RL’) has shown substantial progress on highly complex, nonlinear problems, such as the game of Go solved by AlphaGo [1], and it is now feasible that an RL agent could address the growing challenge of grid control. Learning to Run a Power Network (‘L2RPN’) is a competition, organized by Réseau de Transport d’Electricité and the Electric Power Research Institute, aimed at testing the capabilities of RL and other algorithms to safely control electricity transportation in power grids. In 2020, L2RPN’s winners used a Semi-Markov Afterstate Actor-Critic (‘SMAAC’) approach to successfully manage a grid. L2RPN represents an important first step in commercializing AI for the power grid, but additional refinement of the RL environment is necessary to make it realistic for real-world application.

Power Grid

A power grid consists of four main physical layers:

  • Generation: where electricity is produced,
  • Transmission: the primary pathways for generators to move electricity,
  • Distribution: the ancillary pathways connecting transmission lines to local load, and
  • Load: where the electricity is consumed.


Fig.1 - The physical layers of the grid from generation to consumption of load [2]

There are two physical laws the grid must obey at all times:

  • Power balance: supply (generation) and demand (load) must be in balance at all times, and
  • Kirchhoff’s Law: at every individual bus, the amount of electricity injected must equal the amount withdrawn. Here, a bus refers to a node on the grid that connects lines and can contain components such as a generator or a load.

Apart from these two physical laws, other components of the grid are also governed by physical constraints. Each line on the grid has a thermal limit that caps the amount of power flowing through it. Similarly, generators have physical characteristics that vary by generation type, such as ramp-up/ramp-down rates and minimum/maximum operating capacity, which dictate their ability to dispatch more power when needed.
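
To make these constraints concrete, the short Python sketch below checks power balance, Kirchhoff's law at each bus, and thermal limits on a toy, lossless grid described with plain dictionaries; the data structure and the numbers are illustrative assumptions, not part of any L2RPN environment.

```python
# Toy, lossless grid: injections are positive for generation, negative for load (MW).
bus_injection_mw = {
    "bus_1": 150.0,   # generator
    "bus_2": -90.0,   # load
    "bus_3": -60.0,   # load
}

# Line flows in MW: (from_bus, to_bus, flow, thermal_limit)
line_flows = [
    ("bus_1", "bus_2", 90.0, 100.0),
    ("bus_1", "bus_3", 60.0, 80.0),
]

# Power balance: total generation must equal total load (net injection of zero).
assert abs(sum(bus_injection_mw.values())) < 1e-6, "generation and load are not balanced"

# Kirchhoff's law: at each bus, injection plus incoming flow equals outgoing flow.
for bus, injection in bus_injection_mw.items():
    inflow = sum(f for (_, to, f, _) in line_flows if to == bus)
    outflow = sum(f for (frm, _, f, _) in line_flows if frm == bus)
    assert abs(injection + inflow - outflow) < 1e-6, f"imbalance at {bus}"

# Thermal limits: the flow on each line must stay within its capacity.
for frm, to, flow, limit in line_flows:
    assert abs(flow) <= limit, f"line {frm}-{to} exceeds its thermal limit"
```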

Fig.2 - The electricity grid as a graph: total generation (in green) always equals total load consumption (in red), and at bus 5 the total incoming power equals the total outgoing power

Role of Grid Operator

The most important objective for a grid operator is to dispatch generators so that load is met without violating physical limits, and so that the generation dispatched is the ‘cheapest’ or most cost-effective. Grid operators must ensure that the grid is stable at all times and that blackouts are avoided at all costs. They do this by planning, generally one day ahead, for the amount of expected load and required generation, and by analyzing various contingency scenarios to assess the impact of potential outages or overloaded lines. If grid stability risks are found, the grid operator adjusts generator dispatch instructions to minimize those risks. Anticipating grid stability risk ahead of time avoids blackouts and their associated impact on the grid operator, grid asset owners, and electricity consumers; the February 2021 winter storm in Texas, for instance, is estimated to have caused financial losses of 80 to 130 billion dollars and contributed to at least 210 deaths [3].

Summary of RL Environment, Actions and Rewards

The L2RPN challenge uses the grid2op platform to simulate the power grid, translating grid control into an RL environment:

  • State: The RL environment consists of a topological graph of the electrical grid, where each node represents a substation and each edge represents a transmission line. Along with the graph, there is data representing the current state of the grid, such as active power, reactive power, thermal limits of lines, line status, and voltages.
  • Action: The environment has two major categories of actions:
    (i) Discrete topological actions: connecting and disconnecting lines, and switching where generators and loads connect to buses in the grid.
    (ii) Continuous dispatch actions: adjusting generator production levels. L2RPN treats all generator types as dispatchable units, with agents having the ability to adjust generation levels to satisfy load.

  • Reward: L2RPN doesn’t explicitly define a reward [4]; the goal of the agent is to ensure power balance, minimize the performance score, and avoid disconnecting the grid, which triggers the game-over condition.
  • Score: To evaluate how well the agent performs, L2RPN defines a scoring metric to minimize, where $ Score = \sum_{t=0}^{t_{over}} (prod_t - load_t) + \sum_{t=t_{over}}^{t_{end}} penalty + \sum_{t=0}^{t_{over}} redispatch_t $

Here, $prod_t$ is the total supply from generators, $load_t$ is the total load consumption, the penalty term applies in case of early termination due to game-over conditions, and $redispatch_t$ is the total adjustment in generation levels. A minimal interaction sketch with the grid2op environment follows below.
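
The sketch below, a rough outline rather than competition-ready code, shows how this environment is typically driven from Python with the grid2op package: create an environment, read state information from the observation, apply a topological or redispatch action, and accumulate the score terms defined above. The environment name and observation attribute names (e.g. gen_p, actual_dispatch) are assumptions that may differ across grid2op versions.

```python
import grid2op

# Environment name is illustrative; L2RPN provides several grid2op environments.
env = grid2op.make("l2rpn_case14_sandbox")
obs = env.reset()

# State: the observation exposes the grid graph plus electrical quantities.
print(obs.rho)          # per-line loading relative to thermal limit
print(obs.load_p)       # active power consumed by each load (MW)
print(obs.gen_p)        # active power produced by each generator (MW)

score, done = 0.0, False
while not done:
    if not obs.line_status[0]:
        # (i) Discrete topological action: reconnect line 0 (arbitrary choice).
        action = env.action_space({"set_line_status": [(0, +1)]})
    else:
        # (ii) Continuous dispatch action: ask generator 0 (assumed dispatchable) for +1 MW.
        action = env.action_space({"redispatch": [(0, 1.0)]})

    obs, reward, done, info = env.step(action)

    # Accumulate the score terms: (production - load) plus the redispatch magnitude;
    # an early-termination penalty would be added here on game over.
    score += float(obs.gen_p.sum() - obs.load_p.sum())
    score += float(abs(obs.actual_dispatch).sum())

print("episode score (to be minimized):", score)
```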

Semi Markov Afterstate Actor-Critic (SMAAC)

Yoon, Deunsol, et al. (2020) [5] use a Semi-Markov Afterstate Actor-Critic (SMAAC) method to create an RL agent that manages the power grid. They introduce the clever idea of an afterstate representation, which refers to the state obtained after the agent has acted but before the environment reacts. They combine this with a hierarchical policy framework consisting of a high-level policy and a low-level policy, where (i) the goal of the high-level policy is to find the best possible topology and (ii) the goal of the low-level policy is to figure out the sequence of actions required to reach the topology desired by the high-level policy. SMAAC outperformed all other agents in the L2RPN challenge, achieving the lowest overall cost-of-operation score.
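
The division of labor between the two policies can be pictured with the schematic sketch below; this is not the authors' implementation, and every class, method, and dimension here is invented purely for illustration.

```python
from typing import Dict, Iterator, List

class HighLevelPolicy:
    """Proposes a goal topology: a desired bus assignment for every grid element."""

    def propose_goal(self, afterstate_embedding) -> List[int]:
        # In SMAAC this choice is learned with an actor-critic over afterstates;
        # here we simply return a placeholder topology vector of illustrative size.
        return [1] * 57


class LowLevelPolicy:
    """Turns the goal topology into a sequence of per-element switching actions."""

    def actions_to_goal(self, current: List[int], goal: List[int]) -> Iterator[Dict]:
        # Yield one bus-switching action at a time until the goal topology is reached.
        for element, (now, target) in enumerate(zip(current, goal)):
            if now != target:
                yield {"set_bus": {element: target}}


# Usage: the high-level policy picks the destination, the low-level policy walks there.
high, low = HighLevelPolicy(), LowLevelPolicy()
goal = high.propose_goal(afterstate_embedding=None)
for action in low.actions_to_goal(current=[1] * 57, goal=goal):
    pass  # each action would be applied to the environment, one per time step
```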

Despite the array of possible actions the agent can take, SMAAC acts only in hazardous conditions (i.e., when a line becomes overloaded), and its action is limited to the discrete topological action of switching bus connections. The authors found that disconnecting lines was not useful, as a fully connected grid was beneficial to maintaining grid stability. Similarly, the authors did not consider redispatching generators, due to the penalty associated with this action in L2RPN’s evaluation score. While encouraging, these restrictions and the resulting limited actions taken by the SMAAC agent do not reflect the reality of grid operations and suggest that the environment could be better defined to address this reality.

Suggestions to make environment more realistic

Making RL work in practice is difficult. Many factors can contribute to an RL algorithm failing outside of a synthetic research environment, but as it relates to the L2RPN challenge there are two specific issues: (i) the realism of the set of agent actions, and (ii) the alignment of the reward function with the overall goal of solving the intended problem.

  • The realism of the set of agent actions:

    Although grid operators can perform topological actions, in practice these actions are rarely taken. Grid operators have traditionally viewed grid topology as fixed. Including transmission switching in power system modeling adds binary variables to an already complex non-linear optimization problem, an increase in computational requirements that is rarely worth the effort [6] (a standard formulation is sketched below). Furthermore, not all components of grids are equipped with switches [7]. Until all elements of the grid are switchable, the RL environment should allow topological actions only very conservatively, if at all, to better reflect this reality.
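
    To see where the binary variables come from, a textbook big-M formulation of line switching in a DC power-flow model (an illustrative sketch, not the exact formulation reviewed in [6]) adds one binary variable $z_l$ per switchable line:

    $ -M(1 - z_l) \le f_l - B_l(\theta_i - \theta_j) \le M(1 - z_l), \qquad -F_l^{max} z_l \le f_l \le F_l^{max} z_l, \qquad z_l \in \{0, 1\} $

    Here $f_l$ is the flow on line $l$ between buses $i$ and $j$, $B_l$ is the line susceptance, $\theta_i, \theta_j$ are bus voltage angles, $F_l^{max}$ is the thermal limit, and $M$ is a large constant; the binary $z_l$ switches the line in or out, turning the dispatch problem into a mixed-integer program.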

    In contrast, generation dispatch is a vital tool for grid operators, but this action is under-utilized by participants in L2RPN’s competition due to the penalty imposed.

    The grid operator’s objective is to dispatch the most cost-effective generation while maintaining grid stability, making dispatch instructions a vital control mechanism. Adding different types of generators to the RL environment, such as dispatchable generation like natural gas and coal plants alongside renewable generators, would be a big step towards making the problem more realistic. Generator type information can then act as a filter on which generators can be redispatched. Furthermore, for dispatchable generators we propose adding information such as maximum capacity, minimum capacity, and ramp-up/ramp-down rates, so that when making dispatch actions the agent is forced to follow the same physical constraints as a real generator (a minimal sketch of such a constraint check follows below).
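
    The sketch below assumes a simple generator record with illustrative field names (they are not grid2op attributes); it clips a requested redispatch to the unit's ramp rate and operating range, which is one way the environment could force the agent to respect these physical constraints.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Generator:
        kind: str                 # e.g. "gas", "coal", "wind", "solar"
        p_now_mw: float           # current output
        p_min_mw: float           # minimum stable output when committed
        p_max_mw: float           # maximum capacity
        ramp_mw_per_step: float   # maximum change per dispatch interval
        dispatchable: bool

    def apply_redispatch(gen: Generator, requested_change_mw: float) -> float:
        """Return the feasible new setpoint after enforcing the unit's physical limits."""
        if not gen.dispatchable:
            return gen.p_now_mw  # generator type acts as a filter: renewables are skipped
        # Limit the change to the ramp rate, then clamp to the operating range.
        change = max(-gen.ramp_mw_per_step, min(gen.ramp_mw_per_step, requested_change_mw))
        return max(gen.p_min_mw, min(gen.p_max_mw, gen.p_now_mw + change))

    # Example: a gas unit asked for +50 MW can only move by its 20 MW ramp limit.
    gas = Generator("gas", p_now_mw=200.0, p_min_mw=80.0, p_max_mw=400.0,
                    ramp_mw_per_step=20.0, dispatchable=True)
    print(apply_redispatch(gas, +50.0))  # -> 220.0
    ```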

  • Realistic reward function through N-1 security constraint:

    Agents are penalized in the case of grid failure, i.e., when load is not met by generators; while this penalty is necessary, it is not sufficient for ensuring robust and secure grid operations, and it fails to quantify the risk of blackouts before they occur. We propose making the grid N-1 secure, meaning it can withstand the loss of any single component, by adding slack variables to the objective function to encourage agents to avoid risky grid states.

    The Texas grid operator, ERCOT, uses Network Constrained Unit Commitment (NCUC) to determine unit commitment [8], minimizing total cost while meeting transmission and resource constraints. NCUC employs penalty factors on violations of the security constraints to keep the solution feasible. NCUC is defined as

    $ \text{Minimize} \; \sum_{i=1}^{NG} \sum_{t=1}^{T} \left[ SUC_{i,t} + MEC_{i,t} + C_{i,t}(P_{i,t}) \right] + Penalty_{pb} \times \sum_{t=1}^{T} \left( Slack_{el,t} + Slack_{es,t} \right) + \sum_{i=1}^{NG} \sum_{t=1}^{T} Penalty_{lc} \times Slack_{lc,t} $

    where
    • $SUC_{i,t}$ = start-up cost of unit $i$ at interval $t$
    • $MEC_{i,t}$ = minimum energy cost of unit $i$ at interval $t$
    • $C_{i,t}$ = incremental cost of unit $i$ at interval $t$
    • $P_{i,t}$ = dispatched MW of unit $i$ at interval $t$
    • $Penalty_{pb}$ = penalty cost of a power balance violation
    • $Slack_{el,t}$ = slack variable for energy long
    • $Slack_{es,t}$ = slack variable for energy short
    • $Penalty_{lc}$ = penalty cost of a line constraint violation
    • $Slack_{lc,t}$ = slack variable for the line constraint

    We propose using the above cost function as part of the score used to evaluate the agent, given that it takes into account the physical parameters of the grid and more closely follows the objective of current grid operators. A sketch of such a penalized scoring function is shown below.
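
    The sketch below shows how such a penalized score could be computed for a single step, with slack variables measuring the distance from power balance and from line limits; the penalty weights, function signature, and numbers are illustrative assumptions, not values from ERCOT's NCUC.

    ```python
    import numpy as np

    # Illustrative penalty weights (cost per MW of violation).
    PENALTY_POWER_BALANCE = 1000.0   # Penalty_pb
    PENALTY_LINE = 500.0             # Penalty_lc

    def step_score(gen_cost, gen_p, load_p, line_flow, line_limit):
        """Production cost plus penalties on power-balance and line-limit slacks."""
        gen_p, load_p = np.asarray(gen_p), np.asarray(load_p)
        line_flow, line_limit = np.asarray(line_flow), np.asarray(line_limit)

        production_cost = float(np.dot(gen_cost, gen_p))  # stands in for SUC + MEC + C(P)

        imbalance = gen_p.sum() - load_p.sum()
        slack_energy_long = max(imbalance, 0.0)    # Slack_el: surplus energy
        slack_energy_short = max(-imbalance, 0.0)  # Slack_es: unserved energy

        slack_line = np.maximum(np.abs(line_flow) - line_limit, 0.0)  # Slack_lc per line

        return (production_cost
                + PENALTY_POWER_BALANCE * (slack_energy_long + slack_energy_short)
                + PENALTY_LINE * float(slack_line.sum()))

    # Example: 5 MW of unserved load and one line 10 MW over its thermal limit.
    print(step_score(gen_cost=[30.0, 45.0], gen_p=[60.0, 35.0], load_p=[100.0],
                     line_flow=[110.0, 40.0], line_limit=[100.0, 80.0]))
    ```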

Conclusion

RL is well suited as an automated control approach for managing the power grid, and L2RPN has accomplished much by designing a preliminary RL environment. The winners of the challenge used a clever algorithm to handle the large number of actions and states in power system control. The next step in applying RL to power grids is to reformulate the environment's actions and scoring function, making the problem more realistic and suitable for commercial deployment.

References

[1] D. Silver, A. Huang, C. J. Maddison et al., “Mastering the game of go with deep neural networks and tree search,” Nature, vol. 529, no.7587, pp. 484–489, (2016).

[2] Conejo, Antonio J., and Luis Baringo. Power system operations. Switzerland: Springer, (2018).

[3] Winter Storm Uri 2021- The Economic Impact of the Storm https://comptroller.texas.gov/economy/fiscal-notes/2021/oct/winter-storm-impact.php

[4] Marot, Antoine, et al. “L2RPN: Learning to Run a Power Network in a Sustainable World NeurIPS2020 challenge design.” (2020).

[5] Yoon, Deunsol, et al. “Winning the L2RPN Challenge: Power Grid Management via Semi-Markov Afterstate Actor-Critic.” International Conference on Learning Representations. (2020).

[6] Hedman, Kory W., Shmuel S. Oren, and Richard P. O’Neill. “A review of transmission switching and network topology optimization.” 2011 IEEE power and energy society general meeting. IEEE, (2011).

[7] A. V. Ramesh and X. Li, “Security Constrained Unit Commitment with Corrective Transmission Switching,” 2019 North American Power Symposium (NAPS), 2019, pp. 1-6, doi: 10.1109/NAPS46351.2019.9000308, (2019).

[8] Hui, Hailong. “Reliability unit commitment in ERCOT nodal market.” (2013).