Imitation learning with auxiliary, suboptimal, and task-agnostic data

dc.contributor.advisor: Niekum, Scott David
dc.contributor.committeeMember: Stone, Peter
dc.contributor.committeeMember: Zhu, Yuke
dc.contributor.committeeMember: Lim, Joseph
dc.creator: Goo, Wonjoon
dc.date.accessioned: 2023-03-04T04:48:39Z
dc.date.available: 2023-03-04T04:48:39Z
dc.date.created: 2022-12
dc.date.issued: 2022-11-30
dc.date.submitted: December 2022
dc.date.updated: 2023-03-04T04:48:40Z
dc.description.abstract: In order for robots or other autonomous agents to be widely deployed in our daily lives, they must be able to adapt to new environments and learn according to the end-user's preferences. Reinforcement learning (RL) is one common way to learn adaptive behaviors, in which the agent learns automatically from its own experience within an environment. However, RL is often impractical to apply to real-world tasks due to its trial-and-error nature; RL algorithms require exploration, which can be time-consuming and even unsafe. Furthermore, specifying task requirements (i.e., designing a reward function) requires expert knowledge of both RL and the task domain, preventing novice end-users from using RL to customize an agent as desired. By contrast, the imitation learning (IL) framework avoids these problems by (1) directly obtaining example solutions to a task in the form of demonstrations and (2) finding a policy that mimics the given demonstrations. This framework provides a natural, expressive way for novice users to teach an autonomous agent, but it also has limited real-world applicability due to its burdensome data requirements in both quantity and quality: imitation learning algorithms often require a large number of demonstrations and assume that the provided demonstrations are near-optimal. Moreover, imitation learning algorithms often rely on RL to find the imitating policy, so they inherit the exploration problems of RL. This dissertation aims to address these shortcomings of imitation learning by utilizing more widely available data, which is auxiliary, suboptimal, and task-agnostic. First, we introduce an imitation learning method that utilizes unsegmented, unordered video demonstrations to address the quantity issue in IL. Second, we develop an inverse reinforcement learning algorithm that leverages suboptimal demonstrations to tackle the quality issue. Third, we investigate offline RL methods that allow imitation learning to be performed with pre-generated, task-agnostic experience. Finally, we combine the presented inverse reinforcement learning and offline RL methods to build a practical imitation learning algorithm that requires only a handful of preference labels from an end user along with a task-agnostic experience set that can be gathered effortlessly. Together, the works presented in this dissertation take significant steps toward realizing the full potential of the imitation learning framework, which promises easy programming of autonomous agents by non-expert end-users; this is the main contribution of this thesis.
dc.description.department: Computer Science
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2152/117578
dc.identifier.uri: http://dx.doi.org/10.26153/tsw/44458
dc.language.iso: en
dc.subject: Reinforcement learning
dc.subject: Imitation learning
dc.subject: Learning from demonstration
dc.title: Imitation learning with auxiliary, suboptimal, and task-agnostic data
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Computer Sciences
thesis.degree.discipline: Computer Science
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
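
As a rough illustration of the imitation learning setup summarized in the abstract above (collect demonstrations, then find a policy that mimics them), the following is a minimal behavioral-cloning sketch in Python. It is not the dissertation's method; the toy data, the linear softmax policy, and every hyperparameter are assumptions made purely for illustration.

# Minimal behavioral-cloning sketch (illustrative only, not the dissertation's algorithm).
# Demonstrations are assumed to be (state, action) pairs for a discrete-action task;
# the "expert" here is a made-up rule used only to generate toy labels.
import numpy as np

rng = np.random.default_rng(0)

# Toy demonstrations: 500 four-dimensional states with expert action labels.
states = rng.normal(size=(500, 4))
actions = (states[:, 0] + states[:, 1] > 0).astype(int)

# Linear softmax policy trained by gradient descent on the negative log-likelihood
# of the demonstrated actions (the "mimic the demonstrations" step).
W = np.zeros((4, 2))
for _ in range(200):
    logits = states @ W
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    grad = states.T @ (probs - np.eye(2)[actions]) / len(states)
    W -= 0.5 * grad

accuracy = (np.argmax(states @ W, axis=1) == actions).mean()
print(f"imitation accuracy on the demonstrations: {accuracy:.2f}")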

Access full-text files

Original bundle
Name: GOO-DISSERTATION-2022.pdf
Size: 3.59 MB
Format: Adobe Portable Document Format

License bundle
Name: PROQUEST_LICENSE.txt
Size: 4.45 KB
Format: Plain Text

Name: LICENSE.txt
Size: 1.84 KB
Format: Plain Text