Uniform turnpike planning horizon theorems for finite Markov decision processes

Mark E. Lewis
Cornell University
School of Operations Research and Information Engineering
221 Rhodes Hall
Ithaca, New York 14853

 

Anand Paul

University of Florida

Warrington College of Business

Information Systems and Operations Management Department

Gainesville, FL 32611-7169

Shapiro (1968) formulated and proved a turnpike theorem for finite discounted Markov decision processes. This states that for any fixed discount factor there exists a positive integer N such that the first period decision in some optimal n-period policy is identical to the first period decision in an optimal infinite horizon policy for all n ≥ N. Let us call the smallest integer N with this property the turnpike integer at that discount rate. Shapiro investigated turnpike integers as point functions of the discount factor. We are interested in viewing turnpike integers as interval functions of the discount rate rather than as point functions. Given a finite state, finite action Markov decision process, we show that there exists a finite turnpike integer that is valid for all discount factors in the entire interval of [0,1) of discount rates. This is a finding of theoretical significance since past research has hinted at the lack of a uniform upper bound on the turnpike integer at discount factors close to unity. The practical significance of the result is that rolling horizon policies are attractive even in situations in which it is difficult to narrow down the discount factor to a single point estimate, provided that the horizon is sufficiently long. We also establish a uniform convergence theorem for the difference between the value of an optimal infinite horizon policy and the value of an optimal n-horizon policy that may be of independent interest.