Offline reinforcement learning via an optimization lens




Feng, Yihao


In this dissertation we develop new methodologies and frameworks to address challenges in offline reinforcement learning. More specifically, we study four perspectives for improving the efficiency and reliability of offline policy evaluation and learning: 1) We improve the accuracy of off-policy evaluation by proposing a doubly robust estimator for long-horizon reinforcement learning problems. 2) We develop a surrogate loss for value function learning that avoids the double-sample problem and can be easily estimated and optimized from off-policy data. 3) We provide two different approaches to constructing non-asymptotic confidence bounds for off-policy evaluation, which are significantly tighter than prior importance-sampling-based bounds. 4) We derive a tractable lower bound for offline policy learning that can be used for unified model and policy learning, thereby stabilizing policy training in the offline setting. Together, these contributions make offline reinforcement learning techniques more applicable to real-world problems.
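To illustrate the first contribution, a generic (single-step) doubly robust off-policy estimator can be sketched as below. This is a minimal illustration of the standard doubly robust form, not the dissertation's long-horizon estimator; all function and variable names (e.g. `doubly_robust_ope`, `q_hat`, `v_hat`) are illustrative assumptions. The estimate combines a learned value model with an importance-weighted correction, and remains unbiased if either the behavior propensities or the value model is accurate.

```python
import numpy as np

def doubly_robust_ope(rewards, behavior_probs, target_probs, q_hat, v_hat):
    """Doubly robust off-policy value estimate from logged transitions.

    rewards:        observed rewards for the logged actions
    behavior_probs: probability the behavior policy assigned to each logged action
    target_probs:   probability the target policy assigns to each logged action
    q_hat:          model estimate of Q(state, logged action)
    v_hat:          model estimate of the target policy's value at each state,
                    i.e. E_{a ~ target}[Q(state, a)]
    """
    rho = target_probs / behavior_probs          # per-transition importance ratio
    # Model-based term plus importance-weighted residual correction.
    return np.mean(v_hat + rho * (rewards - q_hat))

# Toy check: action 0 yields reward 1, action 1 yields reward 0;
# behavior is uniform, target always picks action 0, so the true value is 1.
rewards        = np.array([1.0, 0.0, 1.0, 0.0])
behavior_probs = np.array([0.5, 0.5, 0.5, 0.5])
target_probs   = np.array([1.0, 0.0, 1.0, 0.0])
q_hat          = np.array([1.0, 0.0, 1.0, 0.0])  # exact Q-model for this toy case
v_hat          = np.array([1.0, 1.0, 1.0, 1.0])
estimate = doubly_robust_ope(rewards, behavior_probs, target_probs, q_hat, v_hat)
```

With an exact value model, the correction term vanishes and the estimate equals the true value; with a misspecified model, the importance-weighted residual removes the resulting bias in expectation.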

