Reducing sampling error in batch temporal difference learning

dc.contributor.advisor: Stone, Peter, 1971-
dc.creator: Pavse, Brahma Suneil
dc.date.accessioned: 2021-09-20T16:59:06Z
dc.date.available: 2021-09-20T16:59:06Z
dc.date.created: 2020-05
dc.date.issued: 2020-05-05
dc.date.submitted: May 2020
dc.date.updated: 2021-09-20T16:59:06Z
dc.description.abstract: Temporal difference (TD) learning is one of the main foundations of modern reinforcement learning. This thesis studies the use of TD(0), a canonical TD algorithm, to estimate the value function of a given evaluation policy from a batch of data. In this batch setting, we show that TD(0) may converge to an inaccurate value function because the update following an action is weighted according to the number of times that action occurred in the batch, not the true probability of the action under the evaluation policy. To address this limitation, we introduce policy sampling error corrected-TD(0) (PSEC-TD(0)). PSEC-TD(0) first estimates the empirical distribution of actions in each state in the batch and then uses importance sampling to correct for the mismatch between the empirical weighting and the correct weighting for updates following each action. We refine the concept of a certainty-equivalence estimate and argue that PSEC-TD(0) converges to a more desirable fixed point than TD(0) for a fixed batch of data. Finally, we conduct a thorough empirical evaluation of PSEC-TD(0) on three batch value function learning tasks in a variety of settings and show that PSEC-TD(0) produces value function estimates with lower mean squared error than the standard TD(0) algorithm.
dc.description.department: Computer Science
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2152/87909
dc.identifier.uri: http://dx.doi.org/10.26153/tsw/14853
dc.language.iso: en
dc.subject: Reinforcement learning
dc.subject: Machine learning
dc.subject: Artificial intelligence
dc.subject: Temporal difference learning
dc.subject: Importance sampling
dc.subject: Off-policy learning
dc.subject: Value function learning
dc.subject: Batch machine learning
dc.title: Reducing sampling error in batch temporal difference learning
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Computer Sciences
thesis.degree.discipline: Computer Science
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Masters
thesis.degree.name: Master of Science in Computer Sciences
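
The correction described in the abstract can be illustrated with a small tabular sketch. This is not the thesis's implementation. It assumes a count-based estimate of the empirical action distribution, a batch of `(state, action, reward, next_state)` tuples with `None` marking a terminal next state, and a hypothetical helper name `batch_td0`. The step size, sweep count, and per-state normalization are illustrative choices.

```python
from collections import Counter


def batch_td0(batch, pi, n_states, gamma=1.0, alpha=0.1, sweeps=500, psec=False):
    """Tabular batch TD(0); with psec=True each update is reweighted by
    pi(a|s) / pi_hat(a|s), where pi_hat is the count-based empirical policy.

    batch: list of (s, a, r, s_next) tuples; s_next is None at termination.
    pi:    pi[s][a] is the evaluation policy's probability of action a in s.
    """
    state_counts = Counter(s for s, _, _, _ in batch)
    sa_counts = Counter((s, a) for s, a, _, _ in batch)
    V = [0.0] * n_states
    for _ in range(sweeps):
        # Accumulate TD errors over the whole batch before updating V.
        delta = [0.0] * n_states
        for s, a, r, s_next in batch:
            rho = 1.0
            if psec:
                pi_hat = sa_counts[(s, a)] / state_counts[s]
                rho = pi[s][a] / pi_hat  # importance weight (assumed form)
            target = r if s_next is None else r + gamma * V[s_next]
            delta[s] += rho * (target - V[s])
        for s in range(n_states):
            if state_counts[s]:
                V[s] += alpha * delta[s] / state_counts[s]
    return V


# One state, two equally likely actions under the evaluation policy,
# but action 0 happens to appear three times out of four in the batch.
batch = [(0, 0, 1.0, None)] * 3 + [(0, 1, 0.0, None)]
pi = [[0.5, 0.5]]
print(round(batch_td0(batch, pi, 1)[0], 3))             # plain batch TD(0)
print(round(batch_td0(batch, pi, 1, psec=True)[0], 3))  # PSEC-weighted
```

In this toy batch the true value of the state under the evaluation policy is 0.5. The unweighted update settles near the batch average of 0.75, while the reweighted update settles near 0.5, which is the kind of mismatch the abstract describes.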

Access full-text files

Original bundle

Name: PAVSE-THESIS-2020.pdf
Size: 767.54 KB
Format: Adobe Portable Document Format