Human-in-the-loop Reinforcement Learning Method for Volt/Var Control in Active Distribution Network with Safe Operation Mechanism

Yuechuan Tao, Junhua Zhao

Published: 18 Jun 2025, Last Modified: 21 Jul 2025, IEEE Transactions on Sustainable Energy, CC BY-NC 4.0

Abstract: In recent years, distributed energy resources (DERs) have been increasingly integrated into distribution networks. DERs improve the flexibility and economy of active distribution networks (ADNs) but make stable and efficient system operation more complex and harder to maintain. Traditional voltage regulation methods struggle to cope with these complexities, highlighting the need for more advanced and adaptive control strategies for fast-response PVs and battery energy storage systems (BESS). This paper proposes a novel human-in-the-loop deep reinforcement learning (HITL-DRL) framework for Volt/Var control in ADNs, which addresses the limitations of existing approaches by integrating human experience and knowledge into the learning process. Additionally, a Security-Clipped Proximal Policy Optimization (SC-PPO) algorithm is introduced to ensure safe operation during reinforcement learning. The paper explores three human-intervention strategies: human demonstration, human feedback, and setting an adversary, which enhance the learning process by leveraging expert knowledge and experience. The proposed HITL-DRL framework demonstrates improved convergence speed and robustness, reduced exploration risk, and increased interpretability and trust, paving the way for more effective voltage regulation in complex power systems. The proposed HITL-DRL method is verified on the IEEE 33-bus system, where it outperforms standard DRL algorithms in training speed and robustness, achieving the highest average reward and the second-fastest computational time. Compared to traditional PPO, our method significantly excels in managing unforeseen contingencies, resulting in a lower voltage violation rate of 73.4%. Compared with the model-based method, the HITL-DRL strategy comes very close to the optimization results in terms of energy loss and voltage violation rates. However, HITL-DRL has an advantage in decision-making time: it responds within 1 millisecond, allowing it to adapt rapidly to time-varying changes in ADNs.