Martin L. Puterman
Faculty of Commerce and Business Administration
The University of British Columbia
2053 Main Mall
Vancouver, BC V6T 1Z2 Canada
The use of the long-run average reward, or gain, as an optimality criterion has received considerable attention in the literature. However, for many practical models the gain has the undesirable property of being underselective; that is, there may be several gain optimal policies. After finding the set of policies that achieve the primary objective of maximizing the long-run average reward, one might search for the policy that maximizes the "short-run" or transient reward. This reward, called the bias, aids in distinguishing among multiple gain optimal policies. This chapter focuses on the usefulness of the bias in distinguishing among multiple gain optimal policies, its computation, and the implicit discounting captured by the bias on recurrent states.
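
To make the idea of underselectivity concrete, the following is a minimal sketch, not taken from the chapter, that compares two stationary policies on a hypothetical two-state model. It assumes each policy induces an aperiodic Markov chain, so that the gain can be approximated by the expected reward at a distant epoch and the bias by the accumulated difference between the expected rewards and the gain. The function names, transition matrices, rewards, and the truncation horizon are all illustrative assumptions.

```python
def mat_vec(P, v):
    """Multiply a transition matrix (list of rows) by a vector."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def gain_and_bias(P, r, horizon=200):
    """Approximate the gain and bias of a stationary policy.

    Assumes the chain with transition matrix P is aperiodic, so that
    P^t r converges to the gain and the sum of (P^t r - gain) over t
    converges to the bias. Both are truncated at `horizon` epochs.
    """
    expected_rewards = []
    v = list(r)
    for _ in range(horizon):
        expected_rewards.append(v)  # v holds P^t r at epoch t
        v = mat_vec(P, v)
    gain = v  # P^horizon r, used as an approximation of the gain
    bias = [
        sum(step[s] - gain[s] for step in expected_rewards)
        for s in range(len(r))
    ]
    return gain, bias


# Hypothetical model: state 1 is absorbing with reward 0 under both policies.
# Policy A: from state 0, earn 1 and move to state 1 immediately.
P_a, r_a = [[0.0, 1.0], [0.0, 1.0]], [1.0, 0.0]
# Policy B: from state 0, earn 0.3 and stay in state 0 with probability 0.5.
P_b, r_b = [[0.5, 0.5], [0.0, 1.0]], [0.3, 0.0]

gain_a, bias_a = gain_and_bias(P_a, r_a)
gain_b, bias_b = gain_and_bias(P_b, r_b)
print("gain A:", gain_a, "bias A:", bias_a)
print("gain B:", gain_b, "bias B:", bias_b)
```

In this sketch both policies have (approximately) zero gain in every state, so the gain criterion cannot separate them, while the bias in state 0 is larger under policy A because it collects more transient reward before absorption.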