BIAS OPTIMALITY


(A chapter to appear in Markov Decision Processes: Models, Methods, Directions, and Open Problems,
edited by Eugene Feinberg and Adam Shwartz)


Mark E. Lewis
Industrial and Operations Engineering
University of Michigan
1205 Beal Avenue
Ann Arbor, MI 48109-2117

Martin L. Puterman
Faculty of Commerce and Business Administration
The University of British Columbia
2053 Main Mall
Vancouver, BC V6T 1Z2 Canada

The use of the long-run average reward, or gain, as an optimality criterion has received considerable attention in the literature. However, for many practical models the gain has the undesirable property of being underselective; that is, there may be several gain optimal policies. After finding the set of policies that achieve the primary objective of maximizing the long-run average reward, one might search within that set for a policy that maximizes the "short-run" or transient reward. This reward, called the bias, aids in distinguishing among multiple gain optimal policies. This chapter focuses on the usefulness of the bias in distinguishing among multiple gain optimal policies, its computation, and the implicit discounting captured by bias on recurrent states.