Estimation and control of visitation distributions for reinforcement learning

dc.contributor.advisor: Stone, Peter, 1971-
dc.contributor.committeeMember: Niekum, Scott
dc.contributor.committeeMember: Liu, Qiang
dc.contributor.committeeMember: Krähenbühl, Philipp
dc.contributor.committeeMember: Bellemare, Marc
dc.creator: Durugkar, Ishan
dc.creator.orcid: 0000-0002-6222-7054
dc.date.accessioned: 2023-06-13T00:48:02Z
dc.date.available: 2023-06-13T00:48:02Z
dc.date.created: 2023-05
dc.date.issued: 2023-03-09
dc.date.submitted: May 2023
dc.date.updated: 2023-06-13T00:48:03Z
dc.description.abstract: In sequential decision-making tasks, an agent needs to make decisions and interact with the world in order to maximize its long-term expected utility. These tasks are complex because the agent's actions determine not just the immediate utility but also its future data stream. The distribution of this data – the state, state-action, or transition visitation distribution – can be considered a signature of the agent's policy. This thesis investigates novel techniques for incorporating the estimation and control of the state and state-action distributions induced by the policy into reinforcement learning (RL) algorithms. The main hypothesis is that these techniques improve the effectiveness of RL algorithms. This hypothesis is tested in four main settings. First, for sim-to-real transfer, the dissertation introduces and analyzes a method that adjusts the simulator so that the state-action visitation distribution in the simulator is similar to that in the real world for the same agent behavior. Second, when the task itself can be specified as a target distribution, the dissertation examines whether the Wasserstein distance can be used to learn a reward function that guides the agent's state visitation distribution towards this target. Third, it investigates whether estimating the data distribution improves the accuracy of algorithms that evaluate the agent's behavior from a fixed batch of data. Fourth, it investigates whether a target distribution derived from demonstrations of a coordinated team can be used by agents in a multi-agent environment to learn to coordinate. In addition to these main contributions, the thesis lays out further directions where the estimation and control of distributions could enhance RL algorithms. All methods in the thesis are fully implemented and are analyzed both theoretically and empirically.
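(For context, a minimal sketch of the central quantity the abstract refers to, written in standard RL notation; the symbols below are conventional and not taken from the record itself.) The discounted state visitation distribution of a policy \pi with start-state distribution \rho_0 and discount factor \gamma is

    d^{\pi}(s) = (1 - \gamma) \sum_{t=0}^{\infty} \gamma^{t} \, \Pr\left(s_t = s \mid s_0 \sim \rho_0, \; a_t \sim \pi(\cdot \mid s_t)\right),

and its state-action counterpart is d^{\pi}(s, a) = d^{\pi}(s) \, \pi(a \mid s). Under these definitions, the goal-specification setting described in the abstract can be read as selecting \pi to reduce the Wasserstein distance W_1(d^{\pi}, d^{*}) to a given target distribution d^{*}.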
dc.description.department: Computer Science
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2152/119261
dc.identifier.uri: http://dx.doi.org/10.26153/tsw/46139
dc.language.iso: en
dc.subject: Reinforcement learning
dc.subject: Distribution control
dc.subject: Distribution matching
dc.subject: Generative adversarial techniques
dc.title: Estimation and control of visitation distributions for reinforcement learning
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Computer Sciences
thesis.degree.discipline: Computer Science
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Access full-text files

Original bundle (1 file):
  DURUGKAR-DISSERTATION-2023.pdf (8.11 MB, Adobe Portable Document Format)

License bundle (2 files):
  PROQUEST_LICENSE.txt (4.45 KB, Plain Text)
  LICENSE.txt (1.84 KB, Plain Text)