Reinforcement learning for enhancing the stability and management of power systems with new resources

Modern power systems face numerous challenges due to uncertainties arising from renewable energy intermittency, stochastic load demand, and evolving grid dynamics. These uncertainties can cause imbalances between power supply and demand, resulting in frequency and voltage deviations and, in extreme cases, blackouts. To address these challenges, advanced control and optimization techniques, particularly reinforcement learning (RL), have attracted significant interest as a means of ensuring efficient and reliable power system operation. RL offers a promising approach to decision-making under uncertainty, enabling agents to learn optimal policies without explicit uncertainty modeling. This thesis explores the application of RL to two classes of operational problems in power systems. The first class concerns power system resource management, including optimal battery control (OBC) and electric vehicle charging station (EVCS) operation. Challenges arise when formulating these problems as Markov decision processes (MDPs) so that RL can be applied. For example, incorporating cycle-based degradation costs into the MDP for OBC is not straightforward because these costs depend on past state of charge (SoC) trajectories. Similarly, the state and action spaces in the EVCS problem scale with the number of EVs, leading to high-dimensional MDP formulations. This thesis proposes RL-based solutions for these resource management problems, addressing the challenges by incorporating a precise battery degradation model and efficient aggregation schemes into the MDP formulations. The second class of problems deals with wide-area dynamics control for power system stability enhancement. Here, it is crucial that offline-trained RL policies account for risk, given the uncertainties and perturbations encountered in practice.
The thesis focuses on load frequency control (LFC), which is vulnerable to variability due to high load perturbations, especially in small-scale systems such as networked microgrids. Additionally, wide-area damping control (WADC) relies on communication networks, and communication delays can degrade its performance given its fast time scale. Moreover, the increasing integration of grid-forming inverters (GFMs) makes it difficult to model the overall system dynamics accurately, which results in high variability in the system. To address these uncertainties and perturbations, this thesis integrates a mean-variance risk constraint into classic linear quadratic regulator (LQR) problems with linearized dynamics, limiting deviations of state costs from their expected values and reducing system variability in worst-case scenarios. In addition, structured feedback controllers need to be considered to match specific information-exchange graphs, which complicates the geometry of the feasible region. To design risk-aware controllers for constrained LQR problems, a stochastic gradient descent with max-oracle (SGDmax) algorithm is developed. This algorithm converges to a stationary point with high probability. It is computationally efficient because the inner-loop problem of the dual formulation is easy to solve and because it uses zero-order policy gradients (ZOPG) to obtain unbiased gradient estimates without computing first-order derivatives. The policy gradient nature of SGDmax also allows controller structure to be incorporated by considering only the non-zero entries in the ZOPG. In summary, this thesis presents RL applications for effectively managing emerging energy resources and enhancing the stability of interconnected power systems. The analytical and numerical results offer efficient and reliable solutions for handling uncertainty, supporting the transition towards a sustainable and resilient electricity infrastructure.
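
The idea of a structured zero-order policy gradient can be illustrated with a minimal sketch. It is not the thesis's SGDmax algorithm: the risk constraint and the max-oracle step are omitted, plain gradient descent is swapped in, and the system matrices, the diagonal sparsity mask, the smoothing radius, and the step size are all hypothetical values chosen for illustration. The sketch perturbs only the entries of the gain matrix that the information-exchange structure allows, and it uses only cost evaluations.

```python
import numpy as np

def lqr_cost(K, A, B, Q, R, x0, horizon=50):
    """Finite-horizon quadratic cost of the feedback u = -K x from x0."""
    x, cost = x0.copy(), 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost

def zopg(K, mask, cost_fn, radius=0.05, n_samples=200, rng=None):
    """Two-point zero-order gradient estimate restricted to mask == 1."""
    rng = np.random.default_rng(0) if rng is None else rng
    d = int(mask.sum())                      # number of free gain entries
    grad = np.zeros_like(K)
    for _ in range(n_samples):
        U = rng.standard_normal(K.shape) * mask   # perturb free entries only
        U /= np.linalg.norm(U)                    # unit-norm direction
        delta = cost_fn(K + radius * U) - cost_fn(K - radius * U)
        grad += (d / (2.0 * radius)) * delta * U
    return grad / n_samples

# Hypothetical two-state, two-input system (assumed values, not from the thesis).
A = np.array([[0.5, 0.1], [0.1, 0.4]])
B = np.eye(2)
Q = np.eye(2)
R = 0.1 * np.eye(2)
x0 = np.array([1.0, 1.0])
mask = np.eye(2)                  # assumed structure: each input sees its own state

cost_fn = lambda K: lqr_cost(K, A, B, Q, R, x0)
rng = np.random.default_rng(0)

K = np.zeros((2, 2))
initial_cost = cost_fn(K)
for _ in range(40):               # plain gradient descent on the free entries
    K = K - 0.05 * zopg(K, mask, cost_fn, rng=rng)
final_cost = cost_fn(K)
```

Because every perturbation direction is multiplied by the mask, the entries outside the assumed structure never move away from zero, which is how a policy-gradient method can respect a sparsity pattern without a projection step.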

