Copyright © 2009 Quanxin Zhu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

We study the policy iteration algorithm (PIA) for continuous-time jump Markov decision processes in general state and action spaces. The transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. The optimality criterion is the expected average reward. We propose a set of conditions under which we first establish the average reward optimality equation and present the PIA. Then, under two slightly different sets of conditions, we show that the PIA yields the optimal (maximum) reward, an average optimal stationary policy, and a solution to the average reward optimality equation.

Details

Title
Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces
Author
Zhu, Quanxin; Yang, Xinsong; Huang, Chuangxia
Publication year
2009
Publication date
2009
Publisher
John Wiley & Sons, Inc.
ISSN
1085-3375
e-ISSN
1687-0409
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
855220555