Abstract

In reinforcement learning, importance sampling is a widely used method for evaluating an expectation under the data distribution of one policy when the data has in fact been generated by a different policy. Importance sampling requires computing the likelihood ratio between the action probabilities of a target policy and those of the data-producing behavior policy. In this article, we study importance sampling where the behavior policy action probabilities are replaced by their maximum likelihood estimate under the observed data. We show that this general technique reduces variance due to sampling error in Monte Carlo style estimators. We introduce two novel estimators that use this technique to estimate expected values that arise in the RL literature. We find that these general estimators reduce the variance of Monte Carlo sampling methods, leading to faster learning for policy gradient algorithms and more accurate off-policy policy evaluation. We also provide theoretical analysis showing that our new estimators are consistent and have asymptotically lower variance than Monte Carlo estimators.
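The idea in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: it assumes a one-step (bandit) setting with a small discrete action set, and it uses a simple count-based estimate as the maximum likelihood estimate of the behavior policy. The function names `ordinary_is` and `estimated_behavior_is` and the toy data are hypothetical.

```python
from collections import Counter


def ordinary_is(actions, rewards, target_probs, behavior_probs):
    """Importance sampling estimate using the true behavior probabilities."""
    n = len(actions)
    total = 0.0
    for a, r in zip(actions, rewards):
        # Likelihood ratio between target and behavior action probabilities.
        total += target_probs[a] / behavior_probs[a] * r
    return total / n


def estimated_behavior_is(actions, rewards, target_probs):
    """Same estimator, but the behavior probabilities are replaced by
    their count-based maximum likelihood estimate from the observed data.
    Assumption: context-free policy over discrete actions."""
    n = len(actions)
    counts = Counter(actions)
    mle_probs = {a: c / n for a, c in counts.items()}
    total = 0.0
    for a, r in zip(actions, rewards):
        total += target_probs[a] / mle_probs[a] * r
    return total / n


if __name__ == "__main__":
    # Toy data (hypothetical): the behavior policy is uniform over two
    # actions, but action 0 happened to be sampled three times out of four.
    actions = [0, 0, 0, 1]
    rewards = [1.0, 1.0, 1.0, 2.0]
    target = {0: 0.5, 1: 0.5}
    behavior = {0: 0.5, 1: 0.5}
    print(ordinary_is(actions, rewards, target, behavior))
    print(estimated_behavior_is(actions, rewards, target))
```

In this toy case the true expected reward under the target policy is 1.5. The estimate with the true behavior probabilities is pulled toward the over-sampled action, while the estimate with the count-based probabilities reweights each action by how often it was actually observed, which is the sampling-error correction the abstract describes.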

Details

Title

Importance sampling in reinforcement learning with an estimated behavior policy

Author

Hanna, Josiah P¹; Niekum, Scott²; Stone, Peter²

¹ University of Edinburgh, School of Informatics, Edinburgh, UK (GRID:grid.4305.2) (ISNI:0000 0004 1936 7988)
² University of Texas at Austin, Department of Computer Science, Austin, USA (GRID:grid.89336.37) (ISNI:0000 0004 1936 9924)

Pages

1267-1317

Publication year

2021

Publication date

Jun 2021

Publisher

Springer Nature B.V.

ISSN

08856125

e-ISSN

15730565

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1007/s10994-020-05938-9

ProQuest document ID

2542532415

© The Author(s) 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
