Abstract
Threatened by weather disasters and operational uncertainties, power systems require resilient and cost-effective decision making to ensure security. This article proposes a novel deep reinforcement learning algorithm, namely defender-attacker soft actor-critic (DA-SAC), designed for contingency-constrained optimal power flow under N-k security criteria. A two-agent Markov decision process is formulated, where the defender learns robust control actions and the attacker identifies worst-case contingencies. The core soft actor-critic algorithm is enhanced by integrating constraint violation levels into the reward function and employing a two-timescale learning scheme to improve feasibility and stability. The proposed method is validated on the IEEE 30-bus and 118-bus systems. Simulation results show that DA-SAC significantly reduces unserved energy, load shedding, and constraint violations, outperforming conventional and deep-reinforcement-learning-based benchmarks under N-1, N-2, and N-3 scenarios. These results demonstrate that DA-SAC offers a fast, resilient, and practical solution for real-time power system operation under severe contingencies.
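The abstract does not give the reward formulation, so the following is only a minimal sketch of how a constraint violation level might enter a defender-attacker reward. The function names, the linear penalty form, the `penalty` weight, and the zero-sum attacker reward are all illustrative assumptions, not the article's actual definitions.

```python
from itertools import combinations


def violation_level(values, lower, upper):
    """Total distance by which the values fall outside their [lower, upper] bands."""
    return sum(
        max(lo - v, 0.0) + max(v - hi, 0.0)
        for v, lo, hi in zip(values, lower, upper)
    )


def defender_reward(cost, violation, penalty=10.0):
    """Assumed form: negative operating cost minus a weighted violation level."""
    return -cost - penalty * violation


def attacker_reward(cost, violation, penalty=10.0):
    """Zero-sum assumption: the attacker gains exactly what the defender loses."""
    return -defender_reward(cost, violation, penalty)


def contingencies(lines, k):
    """All outage sets with between 1 and k simultaneous line failures."""
    return [c for r in range(1, k + 1) for c in combinations(lines, r)]


# Hypothetical bus voltages (per unit) checked against a 0.95-1.05 band.
v = violation_level([1.10, 0.90, 1.00], [0.95] * 3, [1.05] * 3)
r_def = defender_reward(cost=100.0, violation=0.5)
n_outages = len(contingencies(["a", "b", "c", "d"], k=2))
```

In this sketch a larger violation level lowers the defender's reward and raises the attacker's, which is one simple way to express the worst-case search described above. The contingency list grows combinatorially with k, which is a plausible motivation for learning the attacker rather than enumerating every outage set.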
| Original language | British English |
|---|---|
| Pages (from-to) | 684-695 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Industrial Informatics |
| Volume | 22 |
| Issue number | 2 |
| State | Published - 2026 |
Keywords
- Deep reinforcement learning (DRL)
- optimal power flow (OPF)
- real-time decision making
- robust optimization
- secure operation
Fingerprint
Dive into the research topics of 'Real-Time Resilient Power System Operation with Defender-Attacker Soft Actor-Critic Reinforcement Learning'. Together they form a unique fingerprint.