Content area

Abstract

The contextual bandit literature has traditionally focused on algorithms that address the exploration-exploitation tradeoff. In particular, greedy algorithms that exploit current estimates without any exploration may be sub-optimal in general. However, exploration-free greedy algorithms are desirable in practical settings where exploration may be costly or unethical (e.g., clinical trials). Surprisingly, we find that a simple greedy algorithm can be rate optimal (achieves asymptotically optimal regret) if there is sufficient randomness in the observed contexts (covariates). We prove that this is always the case for a two-armed bandit under a general class of context distributions that satisfy a condition we term covariate diversity. Furthermore, even absent this condition, we show that a greedy algorithm can be rate optimal with positive probability. Thus, standard bandit algorithms may unnecessarily explore. Motivated by these results, we introduce Greedy-First, a new algorithm that uses only observed contexts and rewards to determine whether to follow a greedy algorithm or to explore. We prove that this algorithm is rate optimal without any additional assumptions on the context distribution or the number of arms. Extensive simulations demonstrate that Greedy-First successfully reduces exploration and outperforms existing (exploration-based) contextual bandit algorithms such as Thompson sampling or upper confidence bound (UCB).

Details

1009240
Identifier / keyword
Title
Mostly Exploration-Free Algorithms for Contextual Bandits
Publication title
arXiv.org; Ithaca
Publication year
2020
Publication date
Apr 19, 2020
Section
Computer Science; Statistics
Publisher
Cornell University Library, arXiv.org
Source
arXiv.org
Place of publication
Ithaca
Country of publication
United States
University/institution
Cornell University Library arXiv.org
e-ISSN
2331-8422
Source type
Working Paper
Language of publication
English
Document type
Working Paper
Publication history
 
 
Online publication date
2020-04-21
Milestone dates
2017-04-28 (Submission v1); 2017-06-19 (Submission v2); 2018-02-15 (Submission v3); 2018-02-27 (Submission v4); 2018-10-02 (Submission v5); 2019-08-30 (Submission v6); 2019-11-25 (Submission v7); 2020-04-19 (Submission v8)
Publication history
 
 
   First posting date
21 Apr 2020
ProQuest document ID
2071664005
Document URL
https://www.proquest.com/working-papers/mostly-exploration-free-algorithms-contextual/docview/2071664005/se-2?accountid=208611
Full text outside of ProQuest
Copyright
© 2020. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2020-04-22
Database
2 databases
  • ProQuest One Academic
  • ProQuest One Academic