Martin L. Puterman
Faculty of Commerce and Business Administration
The University of British Columbia
2053 Main Mall
Vancouver, BC V6T 1Z2 Canada
This paper focuses on bias optimality in unichain Markov decision processes with finite state and action spaces. Using relative value functions, we present new methods for evaluating the optimal bias. This leads to a probabilistic analysis that transforms the original reward problem into a minimum average cost problem. The result is an explanation of how and why bias implicitly discounts future rewards.
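
As an illustration of how a relative value function can be used to evaluate bias, the following minimal sketch handles a single fixed stationary policy on a unichain model. It is not the method developed in the paper. It relies on the standard evaluation equations g + h(s) = r(s) + sum_j P(s, j) h(j), solved with the normalization h(ref) = 0, and then shifts h so that its stationary expectation is zero, which yields the bias. The transition matrix P, the reward vector r, and all function names are hypothetical and chosen only for this example.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination in exact arithmetic.

    Raises StopIteration if A is singular (no nonzero pivot is found).
    """
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for i in range(n):
            if i != col and M[i][col] != 0:
                factor = M[i][col]
                M[i] = [a - factor * p for a, p in zip(M[i], M[col])]
    return [M[i][n] for i in range(n)]


def gain_and_relative_values(P, r, ref=0):
    """Solve g + h(s) - sum_j P[s][j] h(j) = r(s) with h(ref) = 0."""
    n = len(P)
    others = [s for s in range(n) if s != ref]
    A = []
    for s in range(n):
        row = [1]  # coefficient of the gain g
        for j in others:
            row.append((1 if j == s else 0) - P[s][j])
        A.append(row)
    x = solve(A, r)
    h = [Fraction(0)] * n
    for j, value in zip(others, x[1:]):
        h[j] = value
    return x[0], h


def stationary_distribution(P):
    """Solve pi P = pi, replacing one redundant equation by sum(pi) = 1."""
    n = len(P)
    A = [[P[j][i] - (1 if i == j else 0) for j in range(n)] for i in range(n - 1)]
    A.append([1] * n)
    return solve(A, [0] * (n - 1) + [1])


def bias(P, r):
    """Return (gain, bias) of the policy described by P and r."""
    g, h_rel = gain_and_relative_values(P, r)
    pi = stationary_distribution(P)
    shift = sum(p * h for p, h in zip(pi, h_rel))
    # The bias is the relative value function recentred so that pi . h = 0.
    return g, [h - shift for h in h_rel]


# Hypothetical two-state example, used only for illustration.
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
r = [Fraction(1), Fraction(0)]

g, h = bias(P, r)
print("gain:", g)
print("bias:", h)
```

Exact rational arithmetic is used here only to keep the small example free of rounding error; a floating-point linear solver would serve equally well for larger models.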