
Real-Time Resilient Power System Operation with Defender-Attacker Soft Actor-Critic Reinforcement Learning

  • Xiang Wei
  • Ka Wing Chan
  • Khaled Al Jaafari
  • Xian Zhang
  • Guibin Wang
  • Ahmed Rabee Sayed
  • The Hong Kong Polytechnic University Shenzhen Research Institute
  • College of Chemistry and Environmental Engineering of Shenzhen University

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Threatened by weather disasters and operational uncertainties, power systems require resilient and cost-effective decision making to ensure security. This article proposes a novel deep reinforcement learning algorithm, namely defender-attacker soft actor-critic (DA-SAC), designed for contingency-constrained optimal power flow under N-k security criteria. A two-agent Markov decision process is formulated, where the defender learns robust control actions and the attacker identifies worst-case contingencies. The core soft actor-critic algorithm is enhanced by integrating constraint violation levels into the reward function and employing a two-timescale learning scheme to improve feasibility and stability. The proposed method is validated on the IEEE 30-bus and 118-bus systems. Simulation results show that DA-SAC significantly reduces unserved energy, load shedding, and constraint violations, outperforming conventional and deep-reinforcement-learning-based benchmarks under N-1, N-2, and N-3 scenarios. These results demonstrate that DA-SAC offers a fast, resilient, and practical solution for real-time power system operation under severe contingencies.
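The reward shaping and the attacker's N-k contingency choice described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the relative-overload measure, the penalty weight, the zero-sum attacker reward, and every function name below are assumptions made for illustration only.

```python
from itertools import combinations


def violation_level(flows, limits):
    """Sum of relative line overloads; 0.0 when every flow is within its limit.

    Assumed measure: the abstract only says violation levels enter the reward.
    """
    return sum(max(0.0, abs(f) - lim) / lim for f, lim in zip(flows, limits))


def defender_reward(cost, flows, limits, penalty=10.0):
    """Negative operating cost minus a penalty that scales with the violation level.

    The penalty weight of 10.0 is a hypothetical value.
    """
    return -cost - penalty * violation_level(flows, limits)


def attacker_reward(cost, flows, limits, penalty=10.0):
    """Assumed zero-sum setting: the attacker gains what the defender loses."""
    return -defender_reward(cost, flows, limits, penalty)


def n_k_contingencies(num_lines, k):
    """Every set of exactly k simultaneous line outages (a candidate attacker action set)."""
    return list(combinations(range(num_lines), k))


if __name__ == "__main__":
    flows, limits = [90.0, 120.0], [100.0, 100.0]
    print(violation_level(flows, limits))        # overload on the second line only
    print(defender_reward(50.0, flows, limits))  # cost plus scaled violation, negated
    print(len(n_k_contingencies(4, 2)))          # number of N-2 outage pairs on 4 lines
```

In this sketch a larger violation lowers the defender's reward smoothly rather than through a fixed infeasibility penalty, which is one plausible reading of "integrating constraint violation levels into the reward function".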

Original language: British English
Pages (from-to): 684-695
Number of pages: 12
Journal: IEEE Transactions on Industrial Informatics
Volume: 22
Issue number: 2
State: Published - 2026

Keywords

  • Deep reinforcement learning (DRL)
  • optimal power flow (OPF)
  • real-time decision making
  • robust optimization
  • secure operation
