<!DOCTYPE html>
<html lang="en-us">

  <head>
  <link href="http://gmpg.org/xfn/11" rel="profile">
  <meta http-equiv="content-type" content="text/html; charset=utf-8">

  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1">

  <title>
    
      Designing realistic RL environment for power systems &middot; The ICLR Blog Track
    
  </title>

  
  <link rel="canonical" href="https://iclr.iro.umontreal.ca/c0c70012-ea59-4d2f-b293-ce53cb261486_1642192092/2021/12/01/reinforcement-learning/">
  

  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/c0c70012-ea59-4d2f-b293-ce53cb261486_1642192092/public/css/poole.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/c0c70012-ea59-4d2f-b293-ce53cb261486_1642192092/public/css/syntax.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/c0c70012-ea59-4d2f-b293-ce53cb261486_1642192092/public/css/lanyon.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/c0c70012-ea59-4d2f-b293-ce53cb261486_1642192092/public/css/custom.css">
  <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=PT+Serif:400,400italic,700%7CPT+Sans:400">

  <link rel="apple-touch-icon-precomposed" sizes="144x144" href="https://iclr.iro.umontreal.ca/c0c70012-ea59-4d2f-b293-ce53cb261486_1642192092/public/apple-touch-icon-precomposed.png">
  <link rel="shortcut icon" href="https://iclr.iro.umontreal.ca/c0c70012-ea59-4d2f-b293-ce53cb261486_1642192092/public/favicon.ico">

  <link rel="alternate" type="application/rss+xml" title="RSS" href="https://iclr.iro.umontreal.ca/c0c70012-ea59-4d2f-b293-ce53cb261486_1642192092/atom.xml">

  

  <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript" ></script>
 <!-- <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { equationNumbers: { autoNumber: "AMS" } } }); </script> -->
  <script type="text/x-mathjax-config">
      MathJax.Hub.Config({
        tex2jax: { inlineMath: [ ['$','$'], ["\\(","\\)"] ],
         processEscapes: false
        }
      });
</script>
</head>


  <body>

    <!-- Target for toggling the sidebar `.sidebar-checkbox` is for regular
     styles, `#sidebar-checkbox` for behavior. -->
<input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox">
<!-- <input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox" > -->

<!-- Toggleable sidebar -->
<div class="sidebar" id="sidebar">
  <div class="sidebar-item">
    <p>For short-term, peer-sourced tests of time, generalizations, specializations, reproductions, etc.!</p>
  </div>

  <nav class="sidebar-nav">

    

    
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/c0c70012-ea59-4d2f-b293-ce53cb261486_1642192092/">ICLR 2022 Blog Track</a>
        
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/c0c70012-ea59-4d2f-b293-ce53cb261486_1642192092/about/">About</a>
        
      
    
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/c0c70012-ea59-4d2f-b293-ce53cb261486_1642192092/submitting/">Submitting</a>
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/c0c70012-ea59-4d2f-b293-ce53cb261486_1642192092/tags/">Tags</a>
        
      
    

    <a class="sidebar-nav-item" href="https://github.com/iclr-blog-track/iclr-blog-track.github.io">GitHub project</a>
    <span class="sidebar-nav-item">Currently vICLR Spring 2021</span>
  </nav>

  <div class="sidebar-item">
    <p>
      &copy; 2022. All rights reserved.
    </p>
  </div>
</div>


    <!-- Wrap is the content to shift when toggling the sidebar. We wrap the
         content to avoid any CSS collisions with our real content. -->
    <div class="wrap">
      <div class="masthead">
        <div class="container">
          <h3 class="masthead-title">
            <a href="/" title="Home">The ICLR Blog Track</a>
            <small></small>
          </h3>
        </div>
      </div>

      <div class="container content">
        <div class="post">
  <h1 id="iclr-post-title" class="post-title">Designing realistic RL environment for power systems</h1>
  <span class="post-date">01 Dec 2021 | 
    <a class="content-tag" href="/tags/#reinforcement-learning"> Reinforcement Learning </a>
  
    <a class="content-tag" href="/tags/#power-systems"> Power Systems </a>
  
    <a class="content-tag" href="/tags/#simulation"> Simulation </a>
  </span>

  <span id="iclr-post-authors" class="post-date">Anonymous</span>
  <h2 id="introduction">Introduction</h2>

<p>Power grids are critical infrastructure: ensuring they are reliable, robust and secure is essential to humanity,
to everyday life, and to progress. With increasing renewable generation, growing electricity demand, and more severe
weather events due to climate change, the task of maintaining efficient and robust power distribution poses a
tremendous challenge to grid operators. In recent years, Reinforcement Learning (‘RL’) has made substantial progress on
highly complex, nonlinear problems, such as the game of Go with AlphaGo <a href="#1">[1]</a>,
and it is now plausible that an RL agent could address the growing challenge of grid control.
Learning to Run a Power Network (‘L2RPN’) is a competition, organized by Réseau de Transport d’Electricité and the
Electric Power Research Institute, aimed at testing the capabilities of RL and other algorithms to safely control
electricity transportation in power grids. In 2020, L2RPN’s winners used a Semi-Markov Afterstate Actor-Critic (‘SMAAC’)
approach to successfully manage a grid. L2RPN represents an important first step in commercializing AI for the power
grid, but additional refinement of the RL environment is necessary to make it realistic enough for real-world application.</p>

<h2 id="power-grid">Power Grid</h2>
<p>A power grid consists of four main physical layers:</p>
<ul>
  <li>Generation: where electricity is produced,</li>
  <li>Transmission: the primary pathways for generators to move electricity,</li>
  <li>Distribution: the ancillary pathways connecting transmission lines to local load, and</li>
  <li>Load: where the electricity is consumed.</li>
</ul>

<p><img src="https://iclr.iro.umontreal.ca/c0c70012-ea59-4d2f-b293-ce53cb261486_1642192092/public/images/2021-12-01-reinforcement-learning/power_grid.png" alt="Power system" /></p>

<p><em>Fig.1 - The physical layers of the grid from generation to consumption of load <a href="#2">[2]</a></em></p>

<p>There are two physical laws the grid must obey at all times:</p>
<ul>
  <li>Power balance: supply (generation) and demand (load) must be in balance at all times, and</li>
  <li>Kirchhoff’s current law: at every individual bus, the amount of electricity injected must equal the
amount withdrawn. Here a bus is a node on the grid that connects lines and can host
 components such as generators or loads.</li>
</ul>
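<p>As a minimal sketch of these two laws, assuming a lossless three-bus toy network with made-up numbers, both conditions can be checked directly. The bus names, flows and helper functions below are hypothetical and are not part of Grid2Op or L2RPN.</p>

```python
import math

# Toy network: every name and number here is hypothetical, for illustration only.
# Net injection at each bus = generation minus load, in MW.
injections = {"bus1": 50.0, "bus2": -20.0, "bus3": -30.0}

# Line flows in MW, keyed by (from_bus, to_bus); positive means from -> to.
flows = {("bus1", "bus2"): 30.0, ("bus1", "bus3"): 20.0, ("bus2", "bus3"): 10.0}


def power_balance_holds(injections, tol=1e-6):
    """Total generation equals total load when net injections sum to zero."""
    return math.isclose(sum(injections.values()), 0.0, abs_tol=tol)


def kirchhoff_residuals(injections, flows):
    """Per-bus mismatch: net injection minus net power leaving on lines."""
    residual = dict(injections)
    for (src, dst), mw in flows.items():
        residual[src] -= mw  # power leaving the sending bus
        residual[dst] += mw  # power arriving at the receiving bus
    return residual


residuals = kirchhoff_residuals(injections, flows)
balanced = power_balance_holds(injections)
kcl_ok = all(math.isclose(r, 0.0, abs_tol=1e-6) for r in residuals.values())
print(balanced, kcl_ok)
```

<p>Line losses are ignored in this sketch; on a real grid, generation slightly exceeds load to cover them.</p>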

<p>Apart from these two physical laws, other components of the grid are also governed by
physical constraints. Each line has a thermal limit that caps the amount of power it can carry.
Similarly, generators have physical characteristics that vary by generation type, such as ramp-up and ramp-down
rates and minimum and maximum operating capacity, which dictate their ability to dispatch more power when needed.</p>
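<p>These component limits can be sketched as simple checks. The example below is an illustration under stated assumptions: the <code>Generator</code> fields, the helper names and all numbers are hypothetical, and the current output is assumed to already lie within the capacity limits.</p>

```python
from dataclasses import dataclass


@dataclass
class Generator:
    # Hypothetical fields; values used below are illustrative, not L2RPN data.
    p_min: float      # minimum operating capacity (MW)
    p_max: float      # maximum operating capacity (MW)
    ramp_up: float    # largest increase per time step (MW)
    ramp_down: float  # largest decrease per time step (MW)


def feasible_setpoint(gen, current_mw, requested_mw):
    """Clamp a requested output to the ramp window and the capacity limits."""
    lowest = max(gen.p_min, current_mw - gen.ramp_down)
    highest = min(gen.p_max, current_mw + gen.ramp_up)
    return min(max(requested_mw, lowest), highest)


def line_overloaded(flow_mw, thermal_limit_mw):
    """A line is overloaded when the magnitude of its flow exceeds its limit."""
    return abs(flow_mw) > thermal_limit_mw


gas_plant = Generator(p_min=50.0, p_max=200.0, ramp_up=30.0, ramp_down=40.0)
print(feasible_setpoint(gas_plant, current_mw=100.0, requested_mw=180.0))
print(line_overloaded(flow_mw=-120.0, thermal_limit_mw=100.0))
```

<p>Here a request to jump from 100 MW to 180 MW is cut back to what the ramp-up rate allows in one step.</p>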

<p><img src="https://iclr.iro.umontreal.ca/c0c70012-ea59-4d2f-b293-ce53cb261486_1642192092/public/images/2021-12-01-reinforcement-learning/power_grid_graph.png" alt="Grid as graph" />
<em>Fig.2 - Electricity grid as a graph - The total generation (in green) always equals
the total load consumption (in red), and at bus 5 the total incoming power is the same as the outgoing power</em></p>

<h2 id="role-of-grid-operator">Role of Grid Operator</h2>
<p>The most important objective for a grid operator is to dispatch generators so that load
is met without violating physical limits, using the ‘cheapest’, most
cost-effective generation available. Grid operators must keep the grid stable at all times and avoid blackouts
at all costs. They do this by planning, generally one day ahead, for the expected load and required
generation, and by analyzing contingency scenarios to assess the impact of potential outages or overloaded
lines. If grid stability risks are found, the grid operator adjusts generator dispatch instructions to
minimize them. Anticipating grid stability risk ahead of time avoids blackouts and their associated
impact on the grid operator, grid asset owners and electricity consumers; the February 2021 winter storm
in Texas, for instance, is estimated to have caused financial losses of 80 to 130 billion dollars and
contributed to at least 210 deaths <a href="#3">[3]</a>.</p>

<h2 id="summary-of-rl-environment-actions-and-rewards">Summary of RL Environment, Actions and Rewards</h2>
<p>The L2RPN challenge uses the Grid2Op platform to simulate the power grid, translating grid control into an
RL environment:</p>

<ul>
  <li>State: The RL environment consists of a topological graph of the electrical grid in which each node represents a
 substation and each edge represents a transmission line. Along with the graph, there is data describing the
 current state of the grid, such as active power, reactive power, thermal limits of lines, line status and
 voltages.</li>
  <li>
    <p>Action: The environment has two major categories of actions: <br />
 (i) Discrete topological actions: connecting and disconnecting
 lines, and switching which bus each generator and load connects to <br />
 (ii) Continuous dispatch actions: adjusting generator production levels.
 L2RPN treats all generator types as dispatchable units, so agents can adjust any generation level
 to satisfy load.</p>
  </li>
  <li>Reward: L2RPN doesn’t explicitly define any reward <a href="#4">[4]</a>; the goal of the agent is to ensure power balance,
 minimize the performance score, and avoid disconnecting the grid, which triggers the game over conditions.</li>
  <li>Score: To evaluate how well the agent performed, L2RPN defines a scoring metric to be minimized:
 $ Score = \sum_{t=0}^{t_{over}} \ (prod_t - load_t) + \sum_{t=t_{over}}^{t_{end}} \ penalty + \sum_{t=0}^{t_{over}} \ redispatch_t $</li>
</ul>

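<p>Here $ prod_t $ is the total supply from generators, $ load_t $ is the total load consumption, $ penalty $ is the
penalty applied in case of early termination due to game over conditions, and $ redispatch_t $ is the total adjustment in
generation level. The index $ t_{over} $ is the time step at which the episode ends and $ t_{end} $ is the last scheduled time step.</p>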
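<p>The score can be sketched as a small function. This is an illustration rather than the official L2RPN scoring code: the function name and the numbers are hypothetical, and the sketch assumes the penalty is charged once for each step strictly after $ t_{over} $, so that an episode played to the end incurs no penalty.</p>

```python
def episode_score(prod, load, redispatch, t_over, t_end, penalty):
    """Score to minimize, following the three sums in the formula above.

    prod, load, redispatch: per-step values in MW, indexed from t = 0.
    Assumption: penalty is charged for each step strictly after t_over.
    """
    survived = range(t_over + 1)
    # Surplus of generation over load on the steps that were played.
    surplus = sum(prod[t] - load[t] for t in survived)
    # Total generation adjustment requested by the agent.
    redispatch_cost = sum(redispatch[t] for t in survived)
    # Flat penalty for every scheduled step lost to an early game over.
    early_stop_cost = penalty * (t_end - t_over)
    return surplus + redispatch_cost + early_stop_cost


# Hypothetical three-step episode that ends two steps early.
score = episode_score(
    prod=[105.0, 103.0, 102.0],
    load=[100.0, 100.0, 100.0],
    redispatch=[0.0, 2.0, 1.0],
    t_over=2,
    t_end=4,
    penalty=50.0,
)
print(score)
```

<p>Because redispatch is charged directly, an agent can lower its score simply by never redispatching.</p>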

<h2 id="semi-markov-afterstate-actor-critic-smaac">Semi Markov Afterstate Actor-Critic (SMAAC)</h2>
<p>Yoon et al. (2020) <a href="#5">[5]</a> use a Semi-Markov Afterstate Actor-Critic (SMAAC) method to create an RL agent to manage the
power grid. They introduce the idea of an afterstate representation, which refers to the state obtained after
the agent has taken an action but before the environment reacts. They combine this with a hierarchical policy
framework consisting of a high-level policy and a low-level policy, where
(i) the goal of the high-level policy is to find the best possible topology and
(ii) the goal of the low-level policy is to figure out the sequence of actions required to reach the topology
desired by the high-level policy. SMAAC outperformed all other agents in the L2RPN challenge,
achieving the lowest overall cost of operation score.</p>

<p>Despite the array of possible actions available, the SMAAC agent only acts in hazardous conditions
(i.e., when a line becomes overloaded), and its actions are limited to the discrete topological action of switching bus
connections. The authors found that disconnecting lines was not useful, as a fully connected grid was
beneficial to maintaining grid stability. Similarly, the authors did not consider dispatching generators
because of the penalty associated with this action in L2RPN’s evaluation score. While the results are encouraging, these
restrictions and the resulting narrow set of actions taken by the SMAAC agent do not reflect the reality of grid operators
and suggest that the environment could be better defined to address this reality.</p>
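<p>The "act only in hazardous conditions" behavior can be sketched as a gate in front of the policy. This is a simplified illustration and not the authors' implementation: <code>topology_policy</code> is a hypothetical stand-in for the learned SMAAC policy, and the default threshold of 1.0 (flow at the thermal limit) is an assumption based on the description above.</p>

```python
def loading_ratios(flows_mw, thermal_limits_mw):
    """Flow magnitude on each line as a fraction of its thermal limit."""
    return [abs(f) / limit for f, limit in zip(flows_mw, thermal_limits_mw)]


def choose_action(flows_mw, thermal_limits_mw, topology_policy, threshold=1.0):
    """Do nothing unless at least one line reaches the hazard threshold.

    topology_policy is a hypothetical callable standing in for the learned
    policy; it is only consulted when the grid is in a hazardous state.
    """
    ratios = loading_ratios(flows_mw, thermal_limits_mw)
    if max(ratios) >= threshold:
        return topology_policy(ratios)
    return "do_nothing"


def dummy_policy(ratios):
    # Placeholder decision: name the most heavily loaded line.
    worst = ratios.index(max(ratios))
    return f"switch_bus_near_line_{worst}"


print(choose_action([50.0, 80.0], [100.0, 100.0], dummy_policy))
print(choose_action([50.0, 120.0], [100.0, 100.0], dummy_policy))
```

<p>With a gate like this, the agent is idle on most time steps, which illustrates how narrow the resulting behavior is compared with a human operator's day-ahead planning.</p>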

<h2 id="suggestions-to-make-environment-more-realistic">Suggestions to make the environment more realistic</h2>

<p>Making RL work in practice is difficult. Many factors can cause an RL algorithm to fail outside
of a synthetic research environment, but for the L2RPN challenge there are two specific issues:
(i) the realism of the set of agent actions, and
(ii) the alignment of the reward function with the overall goal of solving the intended problem.</p>

<ul>
  <li>
    <p>The realism of the set of agent actions:</p>

    <p>Although grid operators can perform topological actions, in practice these actions are rarely taken.
  Grid operators have traditionally viewed grid topology as fixed. Including transmission switching in power
  system modeling adds binary variables to an already complex non-linear optimization problem, an increase in
  computation requirements which is rarely worth the effort <a href="#6">[6]</a>. Furthermore, not all components of grids are equipped
  with switches <a href="#7">[7]</a>. Until all elements of the grid are switchable, the RL environment should use topological
  actions very conservatively, if at all, to better reflect this reality.</p>

    <p>In contrast, generation dispatch is a vital tool for grid operators, but this action is under-utilized by
  participants in L2RPN’s competition due to the penalty imposed.</p>

    <p>The grid operator’s objective is to dispatch the most cost-effective generation while maintaining grid stability,
  making dispatch instructions a vital control mechanism. Adding different types of generators to the RL environment,
  such as dispatchable generation like natural gas and coal plants, along with renewable generators, would be a big
  step towards making the problem more realistic. Generator type information can then act as a filter on
  which generators can be re-dispatched. Furthermore, for dispatchable generators we propose adding information such
  as maximum capacity, minimum capacity, and ramp-up and ramp-down times, so that when making dispatch actions the
  agent is forced to follow the same physical constraints as a real generator.</p>
  </li>
  <li>
    <p>Realistic reward function through N-1 security constraint:</p>

    <p>Agents are penalized in the case of grid failure, i.e., when load is not met by generators; while this is a
  necessary penalty, it is not sufficient for ensuring robust and secure grid operations, because it does not
  quantify the risk of blackouts before they occur. We propose making the grid N-1 secure, meaning that it can
  withstand the loss of any single component, by adding slack variables to the
  objective function to encourage agents to avoid risky grid states.</p>

    <p>The Texas grid operator, ERCOT, uses Network Constrained Unit Commitment (NCUC) to determine unit commitment <a href="#8">[8]</a>,
  minimizing total cost while meeting transmission and resource constraints. NCUC applies penalty factors to
  violations of the security constraints to ensure a solution is feasible. The NCUC objective is defined as</p>

    <p>$ \text{Minimize} \ \sum_{i=1}^{NG} \sum_{t=1}^{T} \left( SUC_{i,t} + MEC_{i,t} + C_{i,t}(P_{i,t}) \right) + Penalty_{pb} \times \sum_{t=1}^{T} (Slack_{el,t} + Slack_{es,t}) + \sum_{lc} \sum_{t=1}^{T} Penalty_{lc} \times Slack_{lc,t} $
  where
  \[\begin{array}{l}
      NG = \text{number of generation units} \\
      T = \text{number of intervals} \\
      SUC_{i,t} = \text{startup cost of unit } i \\
      MEC_{i,t} = \text{minimum energy cost of unit } i \\
      C_{i,t} = \text{incremental unit cost of unit } i \text{ at interval } t \\
      P_{i,t} = \text{dispatch MW of unit } i \text{ at interval } t \\
      Penalty_{pb} = \text{penalty cost of power balance violation} \\
      Slack_{el,t} = \text{slack variable of energy long at interval } t \\
      Slack_{es,t} = \text{slack variable of energy short at interval } t \\
      Penalty_{lc} = \text{penalty cost of line violation} \\
      Slack_{lc,t} = \text{slack variable for line constraint } lc \text{ at interval } t
  \end{array}\]</p>

    <p>We propose using the above cost function as part of the score used to evaluate success for the agent,
  given it takes into account all physical parameters related to the grid and more closely follows the
  objective of current grid operators.</p>
  </li>
</ul>
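<p>To make the proposed score concrete, the NCUC objective can be evaluated for a fixed set of decisions. The sketch below is an illustration under stated assumptions: it treats the line-violation term as a sum over line constraints, takes the incremental cost $ C_{i,t}(P_{i,t}) $ as an already computed number, and uses hypothetical function names and made-up values throughout. It evaluates the objective only; it does not solve the unit commitment problem.</p>

```python
def line_slack(flow_mw, limit_mw):
    """Amount by which a line flow exceeds its thermal limit (0 if within it)."""
    return max(0.0, abs(flow_mw) - limit_mw)


def ncuc_objective(unit_costs, slack_el, slack_es, slack_lc, penalty_pb, penalty_lc):
    """Evaluate the NCUC-style cost for fixed decisions.

    unit_costs: per unit, a list of (startup, min_energy, incremental) per interval.
    slack_el, slack_es: energy-long and energy-short slack per interval.
    slack_lc: per line constraint, a list of slack values per interval.
    penalty_lc: one penalty per line constraint.
    """
    # Startup + minimum-energy + incremental cost over all units and intervals.
    generation = sum(suc + mec + inc for unit in unit_costs for suc, mec, inc in unit)
    # Power balance violations, all priced at the same penalty.
    balance = penalty_pb * sum(el + es for el, es in zip(slack_el, slack_es))
    # Line limit violations, each line constraint with its own penalty.
    lines = sum(pen * s for pen, slacks in zip(penalty_lc, slack_lc) for s in slacks)
    return generation + balance + lines


# Hypothetical two-unit, two-interval, one-line example.
cost = ncuc_objective(
    unit_costs=[[(100.0, 20.0, 300.0), (0.0, 20.0, 320.0)],
                [(0.0, 10.0, 150.0), (0.0, 10.0, 150.0)]],
    slack_el=[0.0, 0.0],
    slack_es=[0.0, 5.0],
    slack_lc=[[0.0, line_slack(110.0, 100.0)]],
    penalty_pb=1000.0,
    penalty_lc=[500.0],
)
print(cost)
```

<p>In this example the two penalty terms dominate the generation cost, which is the intended effect: an agent scored this way is pushed to avoid constraint violations before it optimizes for cost.</p>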

<h2 id="conclusion">Conclusion</h2>
<p>RL is well suited as an automated control algorithm to manage the power grid and L2RPN has accomplished much
by designing a preliminary RL environment. The winners of the challenge used a clever algorithm to handle the
problem of a large number of actions and states in power system control. The next step to applying RL to power
grids is to reformulate the environment actions and scoring function, making the problem more realistic and
suitable for commercial deployment.</p>

<h2 id="references">References</h2>
<p><a id="1">[1]</a>
D. Silver, A. Huang, C. J. Maddison et al., “Mastering the game of go with deep neural networks and tree search,”
 Nature, vol. 529, no.7587, pp. 484–489, (2016).</p>

<p><a id="2">[2]</a>
Conejo, Antonio J., and Luis Baringo. Power system operations. Switzerland: Springer, (2018).</p>

<p><a id="3">[3]</a>
Winter Storm Uri 2021- The Economic Impact of the Storm
https://comptroller.texas.gov/economy/fiscal-notes/2021/oct/winter-storm-impact.php</p>

<p><a id="4">[4]</a>
Marot, Antoine, et al. “L2RPN: Learning to Run a Power Network in a Sustainable World NeurIPS2020 challenge
design.” (2020).</p>

<p><a id="5">[5]</a>
Yoon, Deunsol, et al. “Winning the L2RPN Challenge: Power Grid Management via Semi-Markov Afterstate Actor-Critic.”
International Conference on Learning Representations. (2020).</p>

<p><a id="6">[6]</a>
Hedman, Kory W., Shmuel S. Oren, and Richard P. O’Neill. “A review of transmission switching and network
topology optimization.” 2011 IEEE power and energy society general meeting. IEEE, (2011).</p>

<p><a id="7">[7]</a>
A. V. Ramesh and X. Li, “Security Constrained Unit Commitment with Corrective Transmission Switching,”
2019 North American Power Symposium (NAPS), 2019, pp. 1-6, doi: 10.1109/NAPS46351.2019.9000308, (2019).</p>

<p><a id="8">[8]</a>
Hui, Hailong. “Reliability unit commitment in ERCOT nodal market.” (2013).</p>


</div>

<div id="bibtex-container" class="related">
  For attribution in academic contexts, please cite this work as
  <pre id="bibtex-academic-attribution">

  </pre>

  BibTeX citation
  <pre id="bibtex-box">

  </pre>
</div>
<script>
  let authorsSpan = document.getElementById("iclr-post-authors");
  let authorsText = authorsSpan.textContent;
  let lnameFnameInstitution = authorsText.split(";");
  let lfiList = lnameFnameInstitution.map(lfi => lfi.split(",").map(item => item.trim()));
  let bibtexLFI = lfiList.map(lfi => lfi.slice(0, 2).filter(Boolean).join(", ")).join(" and ");
  let academicLFI = lfiList.map(lfi => lfi[0]);
  {
    if(academicLFI.length > 2) academicLFI = academicLFI[0] + ", et al.";
    else if(academicLFI.length == 2) academicLFI = academicLFI[0] + " & " + academicLFI[1];
    else academicLFI = academicLFI[0];
  }

  let titleSpan = document.getElementById("iclr-post-title");
  let titleText = titleSpan.textContent.trim();
  let bibtexTitleShorthand = (lfiList[0][1]+
    "2022"+
    titleText.split(" ").slice(0, 3).join("")
  ).replace(" ", "").replace(/[\p{P}$+<=>^`|~]/gu, '').toLowerCase().trim();

  let bibtexTemplate = `
@inproceedings{${bibtexTitleShorthand},
  author = {${bibtexLFI}},
  title = {${titleText}},
  booktitle = {ICLR Blog Track},
  year = {2022},
  note = {${window.location.href}},
  url  = {${window.location.href}}
}
  `.trim();
  document.getElementById("bibtex-box").innerText = bibtexTemplate;

  let academicTemplate = `
${academicLFI}, "${titleText}", ICLR Blog Track, 2022.
`.trim();
  document.getElementById("bibtex-academic-attribution").innerText = academicTemplate;

</script>




<script src="https://utteranc.es/client.js"
        repo="iclr-blog-track/iclr-blog-track.github.io"
        issue-term="pathname"
        label="utterance"
        theme="boxy-light"
        crossorigin="anonymous"
        >
</script>


      </div>
    </div>

    <label for="sidebar-checkbox" class="sidebar-toggle"></label>

    <script src='/public/js/script.js'></script>
  </body>
</html>
