A PROBABILISTIC ANALYSIS OF BIAS OPTIMALITY IN UNICHAIN MARKOV DECISION PROCESSES

Mark E. Lewis
Industrial and Operations Engineering
University of Michigan
1205 Beal Avenue
Ann Arbor, MI 48109-2117

Martin L. Puterman
Faculty of Commerce and Business Administration
The University of British Columbia
2053 Main Mall
Vancouver, BC V6T 1Z2 Canada

This paper focuses on bias optimality in unichain Markov decision processes with finite state and action spaces. Using relative value functions, we present new methods for evaluating optimal bias. This leads to a probabilistic analysis that transforms the original reward problem into a minimum average cost problem. The result is an explanation of how and why bias implicitly discounts future rewards.
