
Abstract

The Constrained Markov Decision Process (CMDP) formulation makes it possible to solve safety-critical decision-making tasks that are subject to constraints. While CMDPs have been extensively studied in the Reinforcement Learning literature, little attention has been given to sampling-based planning algorithms such as Monte Carlo Tree Search (MCTS) for solving them. Previous approaches behave conservatively with respect to costs because they avoid constraint violations by relying on Monte Carlo cost estimates that suffer from high variance. We propose Constrained MCTS (C-MCTS), which estimates cost using a safety critic that is trained with Temporal Difference learning in an offline phase prior to agent deployment. During deployment, the critic limits exploration by pruning unsafe trajectories within MCTS. C-MCTS satisfies cost constraints while operating closer to the constraint boundary, achieving higher rewards than previous work. As a byproduct, the planner is more efficient with respect to planning steps. Most importantly, under model mismatch between the planner and the real world, C-MCTS is less susceptible to cost violations than previous work.
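
The pruning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract states only that a learned safety critic prunes unsafe trajectories during search. The function name `prune_unsafe_actions`, the per-action cost budget check, and the lookup-table critic are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Hashable, Iterable, List


def prune_unsafe_actions(
    state: Hashable,
    actions: Iterable[Hashable],
    cost_critic: Callable[[Hashable, Hashable], float],
    cost_so_far: float,
    cost_budget: float,
) -> List[Hashable]:
    """Keep only actions whose predicted total cost stays within the budget.

    `cost_critic(state, action)` is assumed to return an estimate of the
    future cost of taking `action` in `state` (hypothetical interface).
    """
    safe = []
    for action in actions:
        predicted_total = cost_so_far + cost_critic(state, action)
        if predicted_total <= cost_budget:
            safe.append(action)
    return safe


# Toy critic (hypothetical): a lookup table standing in for a learned model.
toy_costs = {("s0", "left"): 0.2, ("s0", "right"): 0.9, ("s0", "stay"): 0.5}


def toy_critic(state: Hashable, action: Hashable) -> float:
    return toy_costs[(state, action)]


safe_actions = prune_unsafe_actions(
    "s0", ["left", "right", "stay"], toy_critic, cost_so_far=0.3, cost_budget=1.0
)
print(safe_actions)
```

In a tree search, a filter of this kind would be applied when a node is expanded, so that branches predicted to exceed the budget are never explored. How the actual C-MCTS planner integrates the critic is specified in the paper itself, not in this record.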

Details

Title
C-MCTS: Safe Planning with Monte Carlo Tree Search
Publication title
arXiv.org; Ithaca
Publication year
2024
Publication date
Oct 27, 2024
Section
Computer Science
Publisher
Cornell University Library, arXiv.org
Source
arXiv.org
Place of publication
Ithaca
Country of publication
United States
University/institution
Cornell University Library arXiv.org
e-ISSN
2331-8422
Source type
Working Paper
Language of publication
English
Document type
Working Paper
Publication history
Online publication date
2024-10-29
Milestone dates
2023-05-25 (Submission v1); 2023-09-29 (Submission v2); 2024-06-05 (Submission v3); 2024-10-27 (Submission v4)
First posting date
29 Oct 2024
ProQuest document ID
2819550588
Document URL
https://www.proquest.com/working-papers/c-mcts-safe-planning-with-monte-carlo-tree-search/docview/2819550588/se-2?accountid=208611
Copyright
© 2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-02-13
Database
ProQuest One Academic