Boosting deep reinforcement learning algorithms with deep probabilistic models
Abstract
This thesis develops new methodologies that boost deep reinforcement learning algorithms from a probabilistic point of view. Specifically, three approaches are studied to improve the sample efficiency of deep reinforcement learning algorithms: 1) We impose a hierarchical structure on the policy construction to obtain a flexible policy that can capture complex distributions and make more appropriate decisions. 2) We reduce the variance of the Monte Carlo policy gradient estimate by designing a "self-critic" baseline function; the resulting gradient estimator has smaller variance and leads to better empirical performance. 3) We apply the distributional reinforcement learning framework to the continuous-action setting with a stochastic policy, and stabilize the training process with double generative networks. All three methods bring clear gains, demonstrating the benefits of applying deep probabilistic models to improve deep reinforcement learning algorithms.
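The thesis's "self-critic" baseline is specific to its own setting, but the underlying principle it relies on is the standard one: subtracting a state-independent baseline from the return leaves a score-function (REINFORCE) gradient estimator unbiased while reducing its variance. The following is a minimal illustrative sketch on a toy Gaussian-bandit problem (the reward function, policy, and variable names are illustrative assumptions, not the thesis's method):

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(a):
    # Toy reward, peaked at a = 3 (an illustrative assumption).
    return -(a - 3.0) ** 2

def grad_estimates(theta, n, baseline):
    # Score-function (REINFORCE) gradient estimates for a Gaussian
    # policy a ~ N(theta, 1):  (R(a) - b) * d/dtheta log pi(a),
    # where d/dtheta log pi(a) = (a - theta) for unit variance.
    a = rng.normal(theta, 1.0, size=n)
    score = a - theta
    return (reward(a) - baseline) * score

theta = 0.0
# No baseline (b = 0):
samples = grad_estimates(theta, 100_000, baseline=0.0)
# Baseline = average reward under the current policy, estimated
# from an independent batch of samples.
b = reward(rng.normal(theta, 1.0, size=100_000)).mean()
samples_b = grad_estimates(theta, 100_000, baseline=b)

# Both estimators target the same gradient (analytically 6 for this
# setup), but the baseline version has much smaller variance.
print(samples.mean(), samples_b.mean())
print(samples.var(), samples_b.var())
```

Because the baseline does not depend on the sampled action, the extra term has zero expectation, so the estimate stays unbiased; only the variance changes. A more sophisticated (e.g. action-dependent or "self-critic") baseline pushes this idea further, which is the direction the thesis pursues.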