Spring 2024 ECE Distinguished Seminar Series
Reinforcement Learning for Safety-Critical Systems
Dr. Enrique Mallada
Associate Professor, Department of Electrical and Computer Engineering
Johns Hopkins University
February 16, 2024, 11:00 am
ENGR 4201
Abstract
Integrating Reinforcement Learning (RL) in safety-critical applications, such as autonomous vehicles, healthcare, and industrial automation, necessitates an
increased focus on safety and reliability. In this talk, we consider two complementary mechanisms to augment RL's suitability for safety-critical systems.
Firstly, we consider a constrained reinforcement learning (C-RL) setting, wherein agents aim to maximize rewards while adhering to required constraints on
secondary specifications. Several algorithms rooted in sample-based primal-dual methods have been recently proposed to solve this problem in policy space. However, such methods exhibit a discrepancy between the behavioral and optimal policies due to their
reliance on stochastic gradient descent-ascent algorithms. We propose a novel algorithm for constrained RL that does not suffer from this limitation. Leveraging recent results on regularized saddle-flow dynamics, we develop a stochastic gradient descent-ascent
algorithm whose trajectories almost surely converge to the optimal policy.
Secondly, we study the problem of incorporating safety-critical constraints into RL that allow an agent to avoid unsafe regions of the state space. Though
such a safety goal can be captured by an action-value-like function, a.k.a. a safety critic, the associated operator lacks the desired contraction and uniqueness properties that the classical Bellman operator enjoys. In this work, we overcome the non-contractiveness
of safety critic operators by leveraging the fact that safety is a binary property. To that end, we study the properties of the binary safety critic associated with a deterministic dynamical system that seeks to avoid reaching an unsafe region. We formulate the corresponding
binary Bellman equation (B2E) for safety and study its properties. While the resulting operator is still non-contractive, we fully characterize its fixed points, which, except for a spurious solution, represent maximal persistently safe regions of the state space
that can always avoid failure. We provide an algorithm that, by design, leverages axiomatic knowledge of safe data to avoid spurious fixed points.
Bio: Enrique Mallada has been an associate professor of electrical and computer engineering at Johns Hopkins University since
2022. Before joining Hopkins in 2016 as an assistant professor, he was a post-doctoral fellow at the Center for the Mathematics of Information at the California Institute of Technology from 2014 to 2016. He received his telecommunications engineering degree
from Universidad ORT, Uruguay, in 2005 and his Ph.D. degree in electrical and computer engineering with a minor in applied mathematics from Cornell University in 2014. Dr. Mallada was awarded the Johns Hopkins Alumni Association Excellence in Teaching award
in 2021, the NSF CAREER award in 2018, the ECE Director's Ph.D. Thesis Research Award for his dissertation in 2014, Cornell University's Jacobs Fellowship in 2011, and the Organization of American States scholarship from 2008 to 2010. His research interests
lie in the areas of control, dynamical systems, optimization, and machine learning, with applications to infrastructure networks and autonomous systems.