Imitation learning with auxiliary, suboptimal, and task-agnostic data




Goo, Wonjoon




For robots and other autonomous agents to be widely deployed in our daily lives, they must be able to adapt to new environments and learn according to the end user's preferences. Reinforcement learning (RL) is one common way to learn adaptive behaviors, in which the agent learns automatically from its own experience within an environment. However, RL is often impractical to apply to real-world tasks due to its trial-and-error nature; RL algorithms require exploration, which can be time-consuming and even unsafe. Furthermore, specifying task requirements (i.e., designing a reward function) requires expert knowledge of both RL and the task domain, which hinders novice end users from using RL to customize the agent as desired.

By contrast, the imitation learning (IL) framework can avoid these problems by (1) directly obtaining example solutions to a task in the form of demonstrations, and (2) finding a policy that mimics the given demonstrations. This framework provides a natural, expressive way for novice users to teach an autonomous agent, but its real-world applicability is also limited by burdensome requirements on both the quantity and the quality of data; imitation learning algorithms often require a large number of demonstrations and assume that the provided demonstrations are near-optimal. Moreover, imitation learning algorithms often rely on RL to find the imitating policy, so they inherit the exploration problems of RL.

This dissertation aims to address the shortcomings of imitation learning by utilizing more widely available data that are auxiliary, suboptimal, and task-agnostic. First, we introduce an imitation learning method that utilizes unsegmented, unordered video demonstrations to address the quantity issue in IL. Second, we develop an inverse reinforcement learning algorithm that leverages suboptimal demonstrations to tackle the quality issue. Third, we investigate offline RL methods so that imitation learning can be performed with pre-generated, task-agnostic experience. Finally, we combine the presented inverse reinforcement learning and offline RL methods to build a practical imitation learning algorithm, which requires only a handful of preference labels from an end user along with a task-agnostic experience set that can be gathered effortlessly.

Together, the works presented in this dissertation take significant steps toward realizing the full potential of the imitation learning framework, which promises easy programming by non-expert end users; this is the main contribution of the dissertation.

