Approximation Algorithms for Stochastic Inventory Control Models Retsef Levi∗ Martin Pál† Robin Roundy‡ David B. Shmoys§ Submitted January 2005, Revised August 2005. Abstract We consider two classical stochastic inventory control models, the periodic-review stochastic inventory control problem and the stochastic lot-sizing problem. The goal is to coordinate a sequence of orders of a single commodity, aiming to supply stochastic demands over a discrete, finite horizon with minimum expected overall ordering, holding and backlogging costs. In this paper, we address the important problem of finding computationally efficient and provably good inventory control policies for these models in the presence of correlated and non-stationary (time-dependent) stochastic demands. This problem arises in many domains and has many practical applications in supply chain management. Our approach is based on a new marginal cost accounting scheme for stochastic inventory control models, combined with novel cost-balancing techniques. Specifically, in each period, we balance the expected cost of over-ordering (i.e., costs incurred by excess inventory) against the expected cost of under-ordering (i.e., costs incurred by not satisfying demand on time). This leads to what we believe to be the first computationally efficient policies with constant worst-case performance guarantees for a general class of important stochastic inventory models. That is, there exists a constant C such that, for any instance of the problem, the expected cost of the policy is at most C times the expected cost of an optimal policy. In particular, we provide a worst-case guarantee of 2 for the periodic-review stochastic inventory control problem and a worst-case guarantee of 3 for the stochastic lot-sizing problem. Our results are valid for all of the currently known approaches in the literature to model correlation and non-stationarity of demands over time. ∗retsef@us.ibm.com. IBM T. J. Watson Research Center, P.O.
Box 218, Yorktown Heights, NY 10598. This research was conducted while the author was a Ph.D. student in the ORIE department at Cornell University. Research supported partially by a grant from Motorola and NSF grants CCR-9912422 and CCR-0430682. †mpal@dimacs.rutgers.edu. DIMACS Center, Rutgers University, Piscataway, NJ 08854-8018. This research was conducted while the author was a Ph.D. student in the CS department at Cornell University. Research supported by ONR grant N00014-98-1-0589. ‡robin@orie.cornell.edu. School of ORIE, Cornell University, Ithaca, NY 14853. Research supported partially by a grant from Motorola, NSF grant DMI-0075627, and the Querétaro Campus of the Instituto Tecnológico y de Estudios Superiores de Monterrey. §shmoys@cs.cornell.edu. School of ORIE and Dept. of Computer Science, Cornell University, Ithaca, NY 14853. Research supported partially by NSF grants CCR-9912422 and CCR-0430682. 1 Introduction In this paper we address the fundamental problem of finding computationally efficient and provably good inventory control policies in supply chains with correlated and non-stationary (time-dependent) stochastic demands. This problem arises in many domains and has many practical applications (see, for example, [8, 14]). We consider two classical models, the periodic-review stochastic inventory control problem and the stochastic lot-sizing problem with correlated and non-stationary demands. Here the correlation is inter-temporal, i.e., what we observe in period s changes our forecast for the demand in future periods. We provide what we believe to be the first computationally efficient policies with constant worst-case performance guarantees; that is, there exists a constant C such that, for any instance of the problem, the expected cost of the policy is at most C times the expected cost of an optimal policy.
A major domain of applications in which demand correlation and non-stationarity are commonly observed is where dynamic demand forecasts are used as part of the supply chain. Demand forecasts often serve as an essential managerial tool, especially when the demand environment is highly dynamic. The problem of how to use a demand forecast that evolves over time to devise an efficient and cost-effective inventory control policy is of great interest to managers, and has attracted the attention of many researchers over the years (see, for example, [11, 6]). However, it is well known that such environments often induce high correlation between demands in different periods, which makes it very hard to compute the optimal inventory policy. Another relevant and important domain of applications is for new products and/or new markets. These scenarios are often accompanied by an intensive promotion campaign and involve many uncertainties, which create high levels of correlation and non-stationarity in the demands over time. Correlation and non-stationarity also arise for products with strong cyclic demand patterns, and as products are phased out of the market. The two classical stochastic inventory control models considered in this paper capture many, if not most, of the application domains in which correlation and non-stationarity arise. More specifically, we consider single-item models with one location and a finite planning horizon of T discrete periods. The demands over the T periods are random variables that can be non-stationary and correlated. In the periodic-review stochastic inventory control problem, the cost consists of a per-unit, time-dependent ordering cost, a per-unit holding cost for carrying excess inventory from period to period, and a per-unit backlogging cost, which is a penalty we incur for each unit of unsatisfied demand (where all shortages are fully backlogged).
In addition, there is a lead time between the time an order is placed and the time that it actually arrives. In the stochastic lot-sizing problem, we consider, in addition, a fixed ordering cost that is incurred in each period in which an order is placed (regardless of its size), but with no lead time. In both models, the goal is to find a policy of orders with minimum expected overall discounted cost over the given planning horizon. The assumptions that we make on the demand distributions are very mild and generalize all of the currently known approaches in the literature to model correlation and non-stationarity of demands over time. This includes classical approaches like the martingale model of forecast evolution (MMFE), exogenous Markovian demand, time series, order-one auto-regressive demand and random walks. For an overview of the different approaches and models, and for relevant references, we refer the reader to [11, 6]. Moreover, we believe that the models we consider are general enough to capture almost any other reasonable way of modelling correlation and non-stationarity of demands over time. These models have attracted the attention of many researchers over the years and there exists a huge body of related literature. The dominant paradigm in almost all of the existing literature has been to formulate these models using a dynamic programming framework. The optimization problem is defined recursively over time using subproblems for each possible state of the system. The state usually consists of a given time period t, the level of the echelon inventory at the beginning of the period, a given conditional distribution on the future demands over the rest of the horizon, and possibly more information that is available by time t. For each subproblem, we compute an optimal solution to minimize the expected overall discounted cost from time t until the end of the horizon.
This framework has turned out to be very effective in characterizing the optimal policy of the overall system. Surprisingly, the optimal policies for these rather complex models follow simple forms. In the models with only per-unit ordering cost, the optimal policy is a state-dependent base-stock policy. In each period, there exists an optimal target base-stock level that is determined only by the given conditional distribution (at that period) on future demands and possibly by additional information that is available, but it is not a function of the starting inventory level at the beginning of the period. The optimal policy aims to keep the inventory level in each period as close as possible to the target base-stock level. That is, it orders up to the target level whenever the inventory level at the beginning of the period is below that level, and orders nothing otherwise. The optimality of base-stock policies has been proven in many settings, including models with correlated demand and forecast evolution (see, for example, [11, 22]). For the models with fixed ordering cost, the optimal policy follows a slightly more complicated pattern. Now, in each period, there are lower and upper thresholds that are again determined only by the given conditional distribution (at that period) on future demands. Following an optimal policy, an order is placed in a certain period if and only if the inventory level at the beginning of the period has dropped below the lower threshold. Once an order is placed, the inventory level is increased up to the upper threshold. This class of policies is usually called state-dependent (s, S) policies. The optimality of state-dependent (s, S) policies has been proven for the case of non-stationary but independent demand [30]. Gallego and Özer [9] have established their optimality for a model with correlated demands.
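As an illustration of these two policy structures (a minimal sketch, not part of the models' formal development; the threshold values used are hypothetical), the ordering rules can be written as:

```python
def base_stock_order(x, R):
    """State-dependent base-stock rule: if the inventory position x is
    below the target level R (determined by the current information set),
    order up to R; otherwise order nothing."""
    return max(R - x, 0.0)

def s_S_order(x, s, S):
    """State-dependent (s, S) rule: place an order only if the inventory
    position x has dropped below the lower threshold s; the order then
    raises the position to the upper threshold S."""
    return S - x if x < s else 0.0
```

For example, with a hypothetical target R = 10 and inventory position x = 3, the base-stock rule orders 7 units; the (s, S) rule with s = 5 and S = 12 orders 9 units at x = 3 and nothing at x = 6.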
We refer the reader to [6, 11, 30, 9] for the details on some of the results along these lines, as well as a comprehensive discussion of relevant literature. Unfortunately, the rather simple forms of these policies do not always lead to efficient algorithms for computing the optimal policies. The corresponding dynamic programs are relatively straightforward to solve if the demands in different periods are independent. The dynamic programming approach can still be tractable for uncapacitated models with exogenous Markov-modulated demand, but under rather strong assumptions on the structure and the size of the state space of the underlying Markov process (see, for example, [27, 4]). However, in many scenarios with a more complex demand structure, the state space of the corresponding dynamic programs grows exponentially and explodes very fast. Thus, solving the corresponding dynamic programs becomes practically (and often also theoretically) intractable (see [11, 6] for relevant discussions on the MMFE model). This is especially true in the presence of a complex demand structure where demands in different periods are correlated and non-stationary. The difficulty essentially comes from the fact that we need to solve 'too many' subproblems. This phenomenon is known as the curse of dimensionality. Moreover, because of this phenomenon, it seems unlikely that there exists an efficient algorithm to solve these huge dynamic programs to optimality. This gap between the excellent knowledge on the structure of the optimal policies and the inability to compute them efficiently provides the stimulus for future theoretical interest in these problems. For the periodic-review stochastic inventory control problem, Muharremoglu and Tsitsiklis [21] have proposed an alternative approach to the dynamic programming framework.
They have observed that this problem can be decoupled into a series of unit supply-demand subproblems, where each subproblem corresponds to a single unit of supply and a single unit of demand that are matched together. This novel approach enabled them to substantially simplify some of the dynamic programming based proofs on the structure of optimal policies, as well as to prove several important new structural results. In particular, they have established the optimality of state-dependent base-stock policies for the uncapacitated model with general Markov-modulated demand. Using this unit decomposition, they have also suggested new methods to compute the optimal policies. However, their computational methods are essentially dynamic programming approaches applied to the unit subproblems, and hence they suffer from similar problems in the presence of correlated and non-stationary demand. Although our approach is very different from theirs, we use some of their ideas as technical tools in some of the proofs in the paper. As a result of this apparent computational intractability, many researchers have attempted to construct computationally efficient (but suboptimal) heuristics for these problems. However, we are aware of very few attempts to analyze the worst-case performance of these heuristics (see, for example, [17]). Moreover, we are aware of no computationally efficient policies for which there exist constant performance guarantees. For details on some of the proposed heuristics and a discussion of others, see [6, 17, 11]. One specific class of suboptimal policies that has attracted a lot of attention is the class of myopic policies. In a myopic policy, in each period, we attempt to minimize the expected cost for that period, ignoring the potential effect on the cost in future periods. The myopic policy is attractive since it yields a base-stock policy that is easy to compute on-line; that is, it does not require information on the control policy in the future periods.
In each period, we need to solve a single-variable convex minimization problem. In many cases, the myopic policy seems to perform well. However, in many other cases, especially when the demand can drop significantly from period to period, the myopic policy performs poorly. Veinott [29] and Ignall and Veinott [10] have shown that the myopic policy can be optimal even in models with non-stationary demand, as long as the demands are stochastically increasing over time. Iida and Zipkin [11] and Lu, Song and Regan [17] have focused on the martingale model of forecast evolution (MMFE) and shown necessary conditions and rather strong sufficient conditions for myopic policies to be optimal. They have also used myopic policies to compute upper and lower bounds on the optimal base-stock levels, as well as bounds on the relative difference between the optimal cost and the cost of different heuristics. However, the bounds they provide on this relative error are not constants. Chan and Muckstadt [2] have considered a different way of approximating huge dynamic programs that arise in the context of inventory control problems. More specifically, they have considered uncapacitated and capacitated multi-item models. Instead of solving the one-period problem (as in the myopic policy), they have added to the one-period problem a penalty function which they call the Q-function. This function accounts for the holding cost incurred by the inventory left at the end of the period over the entire horizon. Their look-ahead approach with respect to the holding cost is somewhat related to our approach, though significantly different. We note that our work is also related to a huge body of approximation results for stochastic and on-line combinatorial problems. The work on approximation results for stochastic combinatorial problems goes back to the work of Möhring, Radermacher and Weiss [18, 19] and the more recent work of Möhring, Schulz and Uetz [20].
They have considered stochastic scheduling problems. However, their performance guarantees are dependent on the specific distributions (namely, on second-moment information). Recently, there has been a growing stream of approximation results for several 2-stage stochastic combinatorial problems. For a comprehensive literature review we refer the reader to [28, 7, 25, 3]. We note that the problems we consider in this work are by nature multi-stage stochastic problems, which are usually much harder (see [5] for a recent result on the stochastic knapsack problem). Another approach that has been applied to these models is the robust optimization approach (see [1]). Here the assumption is of a distribution-free model, where instead the demands are assumed to be drawn from some specified uncertainty set. Each policy is then evaluated with respect to the worst possible sequence of demands within the given uncertainty set. The goal is to find the policy with the best worst-case performance (i.e., a min-max approach). This objective is very different from the objective of minimizing the expected (average) cost discussed in most of the existing literature, including this work. Our work is distinct from the existing literature in several significant ways, and is based on three key ideas: Marginal cost accounting scheme. We introduce a novel approach for cost accounting in uncapacitated stochastic inventory control problems. The standard dynamic programming approach directly assigns to the decision of how many units to order in each period only the expected holding and backlogging costs incurred in that period, although this decision might affect the costs in future periods. Instead, our new cost accounting scheme assigns to the decision in each period all the expected costs that, once this decision is made, become independent of any decision made in future periods, and are dependent only on the future demands. Specifically, we introduce a marginal holding cost accounting approach.
This approach is based on the key observation that once we place an order for a certain number of units in some period, then the expected ordering and holding cost that these units are going to incur over the rest of the planning horizon is a function only of the realized demands over the rest of the horizon, not of future orders. Hence, with each period, we can associate the overall expected ordering and holding cost that is incurred by the units ordered in this period, over the entire horizon. We note that similar ideas of holding cost accounting were previously used in the context of models with continuous time, infinite horizon and stationary (Poisson distributed) demand (see, for example, the work of Axsäter and Lundell [24] and Axsäter [23]). In addition, in an uncapacitated model, the decision of how many units to order in each period affects the expected backlogging cost in only a single future period, namely, a lead time ahead. Thus, our cost accounting approach is marginal, i.e., it associates with each period the overall expected costs that become independent of any future decision. We believe that this new approach will have more applications in the future in analyzing stochastic inventory control problems. Cost balancing. The idea of cost balancing was used in the past to construct heuristics with constant performance guarantees for deterministic inventory problems. The most well-known examples are the Silver-Meal Part-Period balancing heuristic for the lot-sizing problem (see [26]) and the Cost-Covering heuristic of Joneja for the joint-replenishment problem [13]. We are not aware of any application of these ideas to stochastic inventory control problems. The key observation is that any policy in any period incurs potential expected costs due to over-ordering (namely, expected holding costs of carrying excess inventory) and under-ordering (namely, expected backlogging costs incurred when demand is not met on time).
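To make the marginal holding cost accounting concrete, the following sketch estimates, by Monte Carlo over sampled demand paths, the expected holding cost that an order of q units placed in period s incurs over the rest of the horizon. The demand sampler and cost parameters are assumed inputs, and the accounting follows the observation above that this cost depends only on realized demands, not on future orders:

```python
def marginal_holding_cost(q, x, s, L, T, h, sample_demand_path, n_samples=1000):
    """Estimate the expected holding cost charged to the q units ordered in
    period s, given inventory position x just before the order.  Of those q
    units, the number still in stock at the end of period t >= s + L is
    (q - (D[s,t] - x)^+)^+, where D[s,t] is the accumulated demand over
    [s, t] -- a function of realized demands only.
    `sample_demand_path()` returns a list d with d[t] = demand of period t
    (1-indexed, length T + 1); h[t] is the per-unit holding cost."""
    total = 0.0
    for _ in range(n_samples):
        d = sample_demand_path()
        cum = 0.0                       # accumulated demand D[s,t]
        for t in range(s, T + 1):
            cum += d[t]
            if t >= s + L:
                held = max(0.0, q - max(0.0, cum - x))
                total += h[t] * held
    return total / n_samples
```

Because the estimate requires only demand samples from the conditional distribution at period s, it can be computed on-line without reference to any future ordering decision.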
For the periodic-review stochastic inventory control problem, we use the marginal cost accounting approach to construct a policy that, in each period, balances the expected (marginal) holding cost against the expected (marginal) backlogging cost. For the stochastic lot-sizing problem, we construct a policy that balances the expected fixed ordering cost, holding cost and backlogging cost over each interval between consecutive orders. As we shall show, the simple idea of balancing is powerful and leads to policies that have constant expected worst-case performance guarantees. We again believe that the balancing idea will have more applications in constructing and analyzing algorithms for other stochastic inventory control models (see [16] and [15] for follow-up work). Non-base-stock policies. Our policies are not state-dependent base-stock policies, in that the order-up-to level of the policy in each period does depend on the inventory control in past periods, i.e., it depends on the inventory position at the beginning of the period. However, this enables us to use, in each period, the distributional information about the future demands beyond the current period (unlike the myopic policy), without the burden of solving huge dynamic programs. Moreover, our policies can be easily implemented on-line (like the myopic policy) and are simple, both conceptually and computationally (see [12]). Using these ideas we provide what is called a 2-approximation algorithm for the uncapacitated periodic-review stochastic inventory control problem; that is, the expected cost of our policies is no more than twice the expected cost of an optimal policy. Note that this is not the same requirement as stipulating that, for each realization of the demands, the cost of our policy is at most twice the expected cost of an optimal policy, which is a much more stringent requirement.
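A single balancing step can be sketched generically: the expected marginal holding cost is nondecreasing in the order size q, while the expected backlogging cost a lead time ahead is nonincreasing in q, so the balancing quantity can be located by bisection. The two expectation estimators are assumed inputs here, so this is a sketch of the computation, not the paper's formal algorithm:

```python
def balancing_order(holding_cost, backlog_cost, q_max, tol=1e-6):
    """Bisection for the order size q at which the (nondecreasing) expected
    marginal holding cost equals the (nonincreasing) expected marginal
    backlogging cost.  Assumes the two curves cross in [0, q_max]."""
    lo, hi = 0.0, q_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if holding_cost(mid) < backlog_cost(mid):
            lo = mid          # holding still below backlogging: order more
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, with toy cost curves holding_cost(q) = q and backlog_cost(q) = 1 - q, the balancing order is q = 1/2.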
We also note that these guarantees refer only to the worst-case performance, and it is likely that the typical performance would be significantly better (see [12]). We then use a standard cost transformation to achieve significantly better guarantees if the ordering cost is the dominant part of the overall cost, as is the case in many real-life situations. Our results are valid for all known approaches used to model correlated and non-stationary demands. We note that the analysis of the worst-case performance is tight. In particular, we describe a family of examples for which the ratio between the expected cost of the balancing policy and the expected cost of the optimal policy is asymptotically 2. We also present an extended class of myopic policies that provides easily computed upper bounds and lower bounds on the optimal base-stock levels. As shown in [12], these bounds combined with the balancing techniques lead to improved balancing policies. These policies have a worst-case performance guarantee of 2 and they seem to perform significantly better in practice. An interesting question that is not addressed in the current literature is whether the myopic policy has a constant worst-case performance guarantee. We provide a negative answer to this question, by showing a family of examples in which the expected cost of the myopic policy can be arbitrarily more expensive than the expected cost of an optimal policy. Our example provides additional insight into situations in which the myopic policy performs poorly. For the stochastic lot-sizing problem, we provide a 3-approximation algorithm. This is again a worst-case analysis and we would expect the typical performance to be much better. The rest of the paper is organized as follows. In Section 2 we present a mathematical formulation of the periodic-review stochastic inventory control problem. Then in Section 3 we explain the details of our new marginal cost accounting approach.
In Section 4 we describe a 2-approximation algorithm for the periodic-review stochastic inventory control problem. In Section 5 we present an extended class of myopic policies for this problem, develop upper and lower bounds on the optimal base-stock levels, and discuss the example in which the performance of the myopic policy is arbitrarily bad. The stochastic lot-sizing problem is discussed in Section 6, where we present a 3-approximation algorithm for the problem. We then conclude with some remarks and open research questions. 2 The Periodic-Review Stochastic Inventory Control Problem In this section, we provide the mathematical formulation of the periodic-review stochastic inventory control problem and introduce some of the notation used throughout the paper. As a general convention, we distinguish between a random variable and its realization using capital letters and lower case letters, respectively. Script font is used to denote sets. We consider a finite planning horizon of T periods numbered t = 1, . . . , T (note that t and T are both deterministic, unlike the convention above). The demands over these periods are random variables, denoted by D1, . . . , DT. As part of the model, we assume that at the beginning of each period s, we are given what we call an information set, denoted by fs. The information set fs contains all of the information that is available at the beginning of time period s. More specifically, the information set fs consists of the realized demands (d1, . . . , ds−1) over the interval [1, s), and possibly some more (external) information denoted by (w1, . . . , ws). The information set fs in period s is one specific realization in the set of all possible realizations of the random vector Fs = (D1, . . . , Ds−1, W1, . . . , Ws). This set is denoted by Fs. In addition, we assume that in each period s, there is a known conditional joint distribution of the future demands (Ds, . . . , DT), denoted by Is := Is(fs), which is determined by fs (i.e., knowing fs, we also know Is(fs)).
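As a concrete (purely illustrative) instance of this setup, consider order-one auto-regressive demand, one of the demand models mentioned in Section 1. Here the conditional distribution Is(fs) depends on the information set fs only through the last realized demand; all parameter values below are assumptions of the example:

```python
import random

MU, RHO, SIGMA = 10.0, 0.6, 2.0      # illustrative AR(1) parameters

def conditional_mean(f_s):
    """For AR(1) demand D_t = MU + RHO*(D_{t-1} - MU) + eps_t with mean-zero
    noise eps_t, the conditional expectation E[D_s | f_s] is determined by
    the last realized demand in f_s = (d_1, ..., d_{s-1})."""
    return MU + RHO * (f_s[-1] - MU)

def sample_next(f_s):
    """Sample D_s from the conditional distribution I_s(f_s)."""
    return conditional_mean(f_s) + random.gauss(0.0, SIGMA)
```

For example, with f_s = (8, 12, 14), the conditional expectation is 10 + 0.6·(14 − 10) = 12.4. Note that the same realized history always yields the same conditional distribution, independent of the inventory control policy, as required above.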
For ease of notation, Dt will always denote the random demand in period t conditioned on some information set fs ∈ Fs for some s ≤ t, where it will be clear from the context to which period s we refer. We will use t as the general index for time, and s will always refer to the current period. The only assumption on the demands is that, for each s = 1, . . . , T and each fs ∈ Fs, the conditional expectation E[Dt|fs] is well defined and finite for each period t ≥ s. In particular, we allow non-stationarity and correlation between the demands in different periods. We note again that by allowing correlation we let Is be dependent on the realization of the demands over the periods 1, . . . , s − 1 and possibly on some other information that becomes available by time s (i.e., Is is a function of fs). However, the information set fs, as well as the conditional joint distribution Is, are assumed to be independent of the specific inventory control policy being considered. In the periodic-review stochastic inventory control problem, our goal is to supply each unit of demand while attempting to avoid ordering it either too early or too late. In period t (t = 1, . . . , T), three types of costs are incurred: a per-unit ordering cost ct for ordering any number of units at the beginning of period t, a per-unit holding cost ht for holding excess inventory from period t to t + 1, and a per-unit backlogging penalty pt that is incurred for each unsatisfied unit of demand at the end of period t. Unsatisfied units of demand are usually called backorders. The assumption is that backorders fully accumulate over time until they are satisfied. That is, each unit of unsatisfied demand will stay in the system and will incur a backlogging penalty in each period until it is satisfied. In addition, we consider a model with a lead time of L periods between the time an order is placed and the time at which it actually arrives. We first assume that the lead time is a known integer L.
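The cost structure just described can be illustrated by evaluating a fixed sequence of orders against one realized demand path (a minimal sketch with discount factor α = 1; all inputs are hypothetical, and negative net inventory encodes accumulated backorders):

```python
def trajectory_cost(orders, demands, c, h, p, L):
    """Total ordering, holding and backlogging cost of a fixed order
    sequence on one realized demand path.  All sequences are 1-indexed
    lists of length T + 1 (index 0 unused).  An order placed in period t
    arrives in period t + L; unsatisfied demand is fully backlogged, so a
    unit short pays the penalty p[t] in every period until it is served."""
    T = len(demands) - 1
    ni = 0.0                                  # net inventory at end of period
    total = 0.0
    for t in range(1, T + 1):
        arrived = orders[t - L] if t - L >= 1 else 0.0
        ni += arrived - demands[t]
        total += c[t] * orders[t]             # per-unit ordering cost
        total += h[t] * max(ni, 0.0)          # holding cost on excess inventory
        total += p[t] * max(-ni, 0.0)         # backlogging penalty on shortfall
    return total
```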
In Sub-Section 4.4, we will show that our policy can be modified to handle stochastic lead times under the assumption of no order crossing (i.e., any order arrives no later than orders placed later in time). There is also a discount factor α ≤ 1. The cost incurred in period t is discounted by a factor of α^t. Since the horizon is finite and the cost parameters are time-dependent, we can assume without loss of generality that α = 1. We also assume that there are no speculative motivations for holding inventory or having backorders in the system. To enforce this, we assume that, for each t = 2, . . . , T − L, the inequalities ct ≤ ct−1 + ht+L−1 and ct ≤ ct+1 + pt+L are maintained (where cT+1 = 0). (In case the discount factor is smaller than 1, we require that αct ≤ ct−1 + α^L ht+L−1 and ct ≤ αct+1 + α^L pt+L.) We also assume that the parameters ht, pt and ct are all non-negative. Note that the parameters hT and pT can be defined to take care of excess inventory and backorders at the end of the planning horizon. In particular, pT can be set to be high enough to ensure that there are very few backorders at the end of time period T. The goal is to find an ordering policy that minimizes the overall expected discounted ordering cost, holding cost and backlogging cost. We consider only policies that are non-anticipatory, i.e., at time s, the information that a feasible policy can use consists only of fs and the current inventory level. In particular, given any feasible policy P and conditioning on a specific information set fs, we know the inventory level xs^P deterministically. We will use D[s,t] to denote the accumulated demand over the interval [s, t], i.e., D[s,t] := Ds + · · · + Dt. We will also use superscripts P and OPT to refer to a given policy P and the optimal policy, respectively. Given a feasible policy P, we describe the dynamics of the system using the following terminology.
We let NIt denote the net inventory at the end of period t, which can be either positive (in the presence of physical on-hand inventory) or negative (in the presence of backorders). Since we consider a lead time of L periods, we also consider the orders that are on the way. The sum of the units included in these orders, added to the current net inventory, is referred to as the inventory position of the system. We let Xt be the inventory position at the beginning of period t before the order in period t is placed, i.e., Xt := NIt−1 + Q[t−L,t−1] (for t = 1, . . . , T), where Qj denotes the number of units ordered in period j and Q[t−L,t−1] := Qt−L + · · · + Qt−1. Similarly, we let Yt be the inventory position after the order in period t is placed, i.e., Yt = Xt + Qt. Note that once we know the policy P and the information set fs ∈ Fs, we can easily compute nis−1, xs and ys, where again these are the realizations of NIs−1, Xs and Ys, respectively. Since time is discrete, we next specify the sequence of events in each period s:
1. The order of qs−L units placed in period s − L arrives, and the net inventory level increases accordingly to nis−1 + qs−L.
2. The decision of how many units to order in period s is made. Following a given policy P, qs units are ordered (qs ≥ 0). Consequently, the inventory position is raised by qs units (from xs to ys). This incurs a linear cost csqs.
3. We observe the demand in period s, which is realized according to the conditional joint distribution Is. We also observe the new information set fs+1 ∈ Fs+1, and hence we also know the updated conditional joint distribution Is+1. The net inventory and the inventory position each decrease by ds units. In particular, we have xs+1 = xs + qs − ds and nis = nis−1 + qs−L − ds.
4. If nis > 0, then we incur a holding cost hs·nis (this means that there is excess inventory that needs to be carried to time period s + 1). On the other hand, if nis < 0, we incur a backlogging penalty ps|nis| (this means that there are currently unsatisfied units of demand).
3 Marginal Cost Accounting In this section, we present a new approach to the holding cost accounting in stochastic inventory control problems, which leads to what we call a marginal cost accounting scheme. Our approach differs from the traditional dynamic programming based approach. In particular, we account for the holding cost incurred by a feasible policy in a different way, which enables us to design and analyze new approximation algorithms. We believe that this approach will be useful in other stochastic inventory models. 3.1 Dynamic Programming Framework Traditionally, stochastic inventory control problems of the kind described in Section 2 are formulated using a dynamic programming framework. For simplicity, we discuss the case with L = 0, where xs = nis−1 (for a detailed discussion see Zipkin [30]). In a dynamic programming framework, the problem is defined recursively over time through subproblems that are defined for each possible state. A state usually consists of a time period t, an information set ft ∈ Ft and the inventory position at the beginning of period t, denoted by xt. For each subproblem, let Vt(xt, ft) be the optimal expected cost over the interval [t, T] given that the inventory position at the beginning of period t was xt and the observed information set was ft. We seek to compute an optimal policy in period t that minimizes the expected cost over [t, T] (i.e., minimizes Vt(xt, ft)) under the assumption that we are going to make optimal decisions in future periods. The space of feasible decisions consists of all orders of size qt ≥ 0, or alternatively the level yt to which the inventory position is raised, where xt ≤ yt (and qt = yt − xt).
Assuming that the optimal policy for all subproblems of states with periods t + 1, . . . , T has already been computed, the dynamic programming formulation for computing the optimal policy for the subproblem of period t is

Vt(xt, ft) = min_{yt ≥ xt} { ct(yt − xt) + E[ht(yt − Dt)+ + pt(Dt − yt)+ | ft] + E[Vt+1(yt − Dt, Ft+1) | ft] }.

As can be seen, the cost of any feasible decision yt ≥ xt is divided into two parts. The first part is the period cost associated with period t, namely the ordering cost incurred by the order placed in period t and the resulting expected holding and backlogging costs in this period, i.e., ct(yt − xt) + E[ht(yt − Dt)+ + pt(Dt − yt)+ | ft]. In addition, there are the future costs over [t + 1, T] (again, assuming that optimal decisions are made in future periods). The impact of the decision in period t on the future costs is captured through the state in the next period, namely yt − Dt. In particular, in a standard dynamic programming framework, the cost accounted directly in each period t is only the expected period cost, although the decision made in this period might imply additional costs in future periods. We note that if L > 0, then the period cost is always computed a lead time ahead. That is, the period cost associated with the decision to order up to yt in period t is ct(yt − xt) + E[ht+L(yt − D[t,t+L])+ + pt+L(D[t,t+L] − yt)+ | ft], where D[t,t+L] is the accumulated demand over the lead time. The dynamic programming approach has turned out to be very effective in characterizing the structure of optimal policies. As was noted in Section 1, this yields an optimal base-stock policy, {R(ft) : ft ∈ Ft}. Given that the information set at time s is fs, the optimal base-stock level is R(fs). The optimal policy then follows the following pattern: in case the inventory position at the beginning of period s is lower than R(fs) (i.e., xs < R(fs)), it is increased to R(fs); otherwise, no order is placed. In the analysis of the dual-balancing policy B against an optimal policy OPT, we use the sets TH := {t : Y_t^B ≤ Y_t^OPT} and TΠ := {t : Y_t^B > Y_t^OPT}, where Y_t^B and Y_t^OPT denote the respective inventory positions after ordering in period t.
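The recursion above can be sketched compactly for the special case of independent, discrete demands with L = 0, in which the information set can be dropped from the state. This is our illustrative sketch, not the paper's algorithm; `solve_dp`, `y_max`, and the 0-indexed periods are our assumptions.

```python
from functools import lru_cache

def solve_dp(T, demand_pmfs, c, h, p, y_max):
    """Backward dynamic program V_t(x) for independent discrete demands, L = 0.

    demand_pmfs[t] maps demand values to probabilities in period t (0-indexed);
    c, h, p are per-period ordering, holding, and backlogging cost lists.
    Order-up-to levels are searched over the integers x..y_max.
    Returns the optimal expected cost starting with zero inventory.
    """
    @lru_cache(maxsize=None)
    def V(t, x):
        if t == T:
            return 0.0                       # no cost beyond the horizon
        best = float('inf')
        for y in range(x, y_max + 1):        # feasible decisions: y >= x
            cost = c[t] * (y - x)            # ordering cost
            for d, pr in demand_pmfs[t].items():
                # expected period cost plus cost-to-go from the next state y - d
                cost += pr * (h[t] * max(y - d, 0) + p[t] * max(d - y, 0)
                              + V(t + 1, y - d))
            best = min(best, cost)
        return best

    return V(0, 0)
```

For a single period this reduces to the newsvendor problem; the state space grows with the support of cumulative demand, which is exactly why the curse of dimensionality motivates the marginal cost accounting developed in this section.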
Observe that for each period s, conditioned on some fs ∈ Fs, we know x_s^B and q_s^B deterministically and, if the optimal policy is deterministic, we also know y_s^OPT. Therefore, we know whether s ∈ TH or s ∈ TΠ. If the optimal policy is a randomized policy, we condition not only on fs but also on the decision made by the optimal policy in period s. Moreover, if s ∈ TH, then, with probability 1, Y_s^B ≤ Y_s^OPT, and if s ∈ TΠ, then, with probability 1, Y_s^B ≥ Y_s^OPT. This implies that Lemmas 4.2 and 4.3 are also still valid. The following theorem is now established (the proof is identical to that of Theorem 4.4 above).

Theorem 4.5 The randomized dual-balancing policy has a worst-case performance guarantee of 2, i.e., for each instance of the uncapacitated periodic-review stochastic inventory control problem, the expected cost of the randomized dual-balancing policy is at most twice the expected cost of an optimal solution, i.e., E[C(B)] ≤ 2E[C(OPT)].

4.3.1 Dual-Balancing - Bad Example

We shall show that the above analysis is tight. Consider an instance with h > 0 and p = h√L, where L > 0 is again a positive integer that denotes the lead time between the time an order is placed and the time it arrives. Also assume that T = 1 + 2L and α = 1. The random demands have the following structure. There is one unit of demand that is going to occur with equal probability either in period L + 1 or in period 2L + 1. For each t ≠ L + 1, 2L + 1, we have Dt = 0 with probability 1. Fractional orders are allowed. It is readily verified that the optimal policy orders 1 unit in period 1 and incurs expected cost of (1/2)hL. On the other hand, the dual-balancing policy will order, in each one of the periods 1, . . . , L + 1, just a small amount of the commodity. In particular, in period 1, the dual-balancing policy orders 1/(√L + 1) of a unit (this can be calculated by equating (1/2)(√L h)(1 − q) = (1/2)Lhq, where q is the size of the order).
It can be easily verified that, as L goes to ∞, the ratio between the expected cost of the dual-balancing policy and the expected cost of the optimal policy converges to 2 (the calculations are rather messy to present but can be easily coded).

4.4 Stochastic Lead Times

Next, we consider the more general model, where the lead time of an order placed in period s is some integer-valued random variable Ls (here we deviate from our convention in that ls is not the realization of Ls but the function defined above). However, we assume that the random variables L1, . . . , LT are correlated, and in particular, that s + Ls ≤ t + Lt for each s ≤ t. In other words, we assume that any order placed at time s will arrive no later than any other order placed after period s. This is a very common assumption in the inventory literature, usually described as "no order crossing". We next describe a version of the dual-balancing policy that provides a 2-approximation algorithm for this more general model. Let As be the set of all periods t ≥ s such that the order placed in s is the latest order to arrive by time period t. More precisely, As := {t ≥ s : s + Ls ≤ t and t′ + Lt′ > t for all t′ ∈ (s, t]}. Clearly, As is a random set. Observe that the sets {As}_{s=1}^T induce a partition of the planning horizon. Hence, we can write the cost of each feasible policy P in the following way:

C(P) = Σ_{s=1}^T (H_s^P + Σ_{t∈As} Π_t^P).

Now let Π̃_s^P := Σ_{t∈As} Π_t^P and write C(P) = Σ_{s=1}^T (H_s^P + Π̃_s^P). Similar to the previous case, we consider in each period s the two functions E[H_s^B | fs] and E[Π̃_s^B | fs], where again fs is the information set observed in period s. Here the expectation is with respect to future demands as well as future lead times. Finally, we order q_s^B units to balance these two functions. By arguments identical to those in Lemmas 4.1, 4.2 and 4.3, we conclude that this policy yields a worst-case performance guarantee of 2.
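In each period, the balancing step amounts to finding the crossing point of an increasing expected-holding function and a decreasing expected-backlogging function of the order size q. A generic bisection sketch (ours; the function names and the cap `q_hi` are assumptions, not the paper's notation):

```python
def balance(holding, backlog, q_hi=1e6, tol=1e-9):
    """Find q >= 0 with holding(q) = backlog(q).

    Assumes holding is continuous and increasing with holding(0) = 0, and
    backlog is continuous and decreasing in q, so the difference changes
    sign at most once on [0, q_hi]; bisect on that difference.
    """
    lo, hi = 0.0, q_hi
    if holding(hi) < backlog(hi):      # no crossing within [0, q_hi]
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if holding(mid) < backlog(mid):
            lo = mid                   # still under-ordering at mid
        else:
            hi = mid                   # already over-ordering at mid
    return 0.5 * (lo + hi)
```

As a sanity check against the bad example above, with h = 1 and L = 4 the period-1 functions are holding(q) = (1/2)Lhq = 2q and backlog(q) = (1/2)(√L h)(1 − q) = 1 − q, and the balancer returns q = 1/3 = 1/(√L + 1).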
Observe that, in order to implement the dual-balancing policy in this case, we have to know in period s the conditional distributions of the lead times of future orders (as seen from period s, conditioned on some fs ∈ Fs). This is required in order to evaluate the function E[Π̃_s^B | fs].

4.5 Cost Transformation

In this section, we discuss in detail the cost transformation that enables us to assume, without loss of generality, that, for each period t = 1, . . . , T, we have ct = 0 and ht, pt ≥ 0. Consider any instance of the problem with cost parameters that imply no speculative motivation for holding inventory or backorders (as discussed in Section 2). We use a simple standard transformation of the cost parameters (see [30]) to construct an equivalent instance with the property that, for each period t = 1, . . . , T, we have ct = 0 and ht, pt ≥ 0. The modified instance has the same set of optimal policies. Applying the dual-balancing policy to that instance will provide a policy that is feasible and also has a performance guarantee of at most 2 with respect to the original problem. We shall also show that this cost transformation can improve the performance guarantee of the dual-balancing policy in cases where the ordering cost is the dominant part of the overall cost. In practice this is often the case. We now describe the transformation for the case with no lead time (L = 0) and α = 1; the extension to the case of arbitrary lead time and α < 1 is straightforward. Recall that any feasible policy P satisfies, for each t = 1, . . . , T, Qt = NIt − NIt−1 + Dt (for ease of notation we omit the superscript P). Using these equations, we can express the ordering cost in each period t as ct(NIt − NIt−1 + Dt). Now replace NIt with NIt+ − NIt−, its respective positive and negative parts. This leads to the following transformation of cost parameters. We let ĉt := 0, ĥt := ht + ct − ct+1 (where cT+1 := 0) and p̂t := pt − ct + ct+1.
Note that the assumptions on the cost parameters ct, ht and pt discussed in Section 2, and in particular the assumption that there is no speculative motivation for holding inventory or backorders, imply that ĥt and p̂t above are non-negative (t = 1, . . . , T). Observe that the parameters ĥt and p̂t will still be non-negative even if some of the parameters ct, ht and pt are negative, as long as the above assumption holds. Moreover, this enables us to incorporate into the model a negative salvage cost at the end of the planning horizon (after the cost transformation we will have non-negative cost parameters). It is readily verified that the induced problem is equivalent to the original one. More specifically, for each realization of the demands, the cost of each feasible policy P in the modified input decreases by exactly Σ_{t=1}^T ct dt (compared to its cost in the original input). Therefore, any optimal policy for the modified input is also optimal for the original input. Now apply the dual-balancing policy to the modified problem. We have seen that the assumptions on ct, ht and pt ensure that ĥt and p̂t are non-negative, and hence the analysis presented above is valid. Let opt and ōpt be the optimal expected costs of the original and modified inputs, respectively. Clearly, opt = ōpt + E[Σ_{t=1}^T ct Dt]. Now the expected cost of the dual-balancing policy in the modified input is at most 2ōpt. Its cost in the original input is then at most 2ōpt + E[Σ_{t=1}^T ct Dt] = 2opt − E[Σ_{t=1}^T ct Dt]. This implies that if E[Σ_{t=1}^T ct Dt] is a large fraction of opt, then the performance guarantee of the expected cost of the dual-balancing policy might be significantly better than 2. For example, in case E[Σ_{t=1}^T ct Dt] ≥ 0.5opt, we can conclude that the expected cost of the dual-balancing policy would be at most 1.5opt. It is indeed the case in some real-life problems that a major fraction of the total cost is due to the ordering cost.
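The claimed equivalence can be checked numerically: for any order quantities and demand path (with NI0 = 0, L = 0, α = 1), the original cost minus the transformed cost equals Σ ct dt. The following sketch is ours; the helper names are hypothetical.

```python
def transform_costs(c, h, p):
    """Return (c_hat, h_hat, p_hat) with c_hat_t = 0,
    h_hat_t = h_t + c_t - c_{t+1}, p_hat_t = p_t - c_t + c_{t+1},
    using the convention c_{T+1} = 0."""
    T = len(c)
    c_next = list(c[1:]) + [0.0]
    h_hat = [h[t] + c[t] - c_next[t] for t in range(T)]
    p_hat = [p[t] - c[t] + c_next[t] for t in range(T)]
    return [0.0] * T, h_hat, p_hat

def policy_cost(q, d, c, h, p):
    """Total cost of order quantities q against demand path d
    (L = 0, alpha = 1, starting net inventory 0)."""
    ni, total = 0.0, 0.0
    for t in range(len(q)):
        ni += q[t] - d[t]                      # inventory balance equation
        total += c[t] * q[t]
        total += h[t] * ni if ni > 0 else p[t] * (-ni)
    return total
```

The identity follows from summation by parts on Σ ct(NIt − NIt−1) and holds path by path, not just in expectation, which is why the modified instance has the same set of optimal policies.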
The intuition behind the above transformation is that Σ_{t=1}^T ct Dt is a cost that any feasible policy must pay. As a result, we treat it as an invariant in the cost of any policy and apply the approximation algorithm to the rest of the cost. In the case where we have a lead time L, we use the equations Qt := NIt+L − NIt+L−1 + Dt+L, for each t = 1, . . . , T − L, to get the same cost transformation. The transformation for α < 1 is also straightforward.

5 A Class of Myopic Policies

For many periodic-review inventory control problems with inter-temporally correlated demands, there is no known computationally tractable procedure for finding the optimal base-stock inventory levels. As a result, various simpler heuristics have been considered in the literature. In particular, many researchers have considered a myopic policy. In the myopic policy, we follow a base-stock policy {Rmy(ft) : ft ∈ Ft}. For each period t and possible information set in period t, the target inventory level Rmy(ft) is computed as the minimizer of a one-period problem. More specifically, in each period s = 1, . . . , T − L, we focus only on minimizing the expected immediate cost that is going to be incurred in this period (or in period s + L in the presence of a lead time L). In other words, the target inventory level Rmy(fs) minimizes the expected holding and backlogging costs in period s + L, while ignoring the cost over the rest of the horizon (i.e., the cost over (s + L, T + 1]). This optimization problem has been proven to be convex and hence easy to solve (see [30]). It is then possible to implement the myopic policy on-line, where in each period s we compute the base-stock level based on the currently observed information set fs. For each period t and each ft ∈ Ft, the myopic base-stock level provides an upper bound on the optimal base-stock level (see [30] for a proof). The intuition is that the myopic policy underestimates the holding cost, since it considers only the one-period holding cost.
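For discrete demand, the one-period minimizer described above is the standard newsvendor quantile: the smallest y with P(D ≤ y) ≥ p/(p + h). A small sketch (ours; this is the textbook newsvendor fact, not notation from the paper):

```python
def myopic_base_stock(demand_values, demand_probs, h, p):
    """Myopic order-up-to level for one period: the minimizer of
    E[h * (y - D)^+ + p * (D - y)^+] over y, i.e. the smallest demand
    value y with P(D <= y) >= p / (p + h)."""
    ratio = p / (p + h)
    cum = 0.0
    for v, pr in sorted(zip(demand_values, demand_probs)):
        cum += pr
        if cum >= ratio - 1e-12:       # tolerance guards against float rounding
            return v
    return max(demand_values)          # unreachable if the probabilities sum to 1
```

For example, with D uniform on {0, 1}, h = 1 and p = 2, the critical ratio is 2/3 > 1/2, so the myopic level is 1 unit; this is exactly the ordering behavior exploited by the bad example in the next subsection.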
Therefore, it always orders at least as many units as the optimal policy. Clearly, this policy might not be optimal in general, though in many cases it seems to perform extremely well. Under rather strong conditions it might even be optimal (see [29, 10, 11, 17]). A natural question to ask is whether the myopic policy yields a constant performance guarantee for the periodic-review inventory control problem, i.e., whether its expected cost is always bounded by some constant times the optimal expected cost. In this section, we provide a negative answer to this question. We show that the expected cost of the myopic policy can be arbitrarily more expensive than the optimal expected cost, even in the case where the demands are independent and the costs are stationary. The example that we construct provides important intuition concerning the cases in which the myopic policy performs poorly. In addition, we describe an extended class of myopic policies that generalizes the myopic policy discussed above. It is interesting that this class of policies also provides a lower bound on the optimal base-stock levels.

5.1 Myopic Policy - Bad Example

Consider the following set of instances parameterized by T, the number of periods. We have a per-unit ordering cost of c = 0, a per-unit holding cost h = 1 and a per-unit backlogging penalty p = 2. The demands are specified as follows: D1 ∈ {0, 1} with probability 0.5 each; for t = 2, . . . , T − 1, Dt := 0 with probability 1; and DT := 1 with probability 1. The lead time is equal to 0, and α = 1. It is easy to verify that the myopic policy will order 1 unit in period 1 and that this will result in an expected cost of 0.5T. On the other hand, if we do not order in period 1, then the expected cost is 1. This implies that, as T becomes larger, the expected cost of the myopic policy is Ω(T) times as expensive as the expected cost of the optimal policy.
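The two expected costs can be computed exactly by enumerating the two equally likely demand scenarios. This is our sketch (all names hypothetical); it gives (T − 1)/2 for the myopic policy, matching the 0.5T figure quoted above up to one period's holding cost, versus a constant cost of 1 for the plan that skips the period-1 order.

```python
def expected_cost(order_plan, T, h=1.0, p=2.0):
    """Exact expected cost on the bad example: D1 is 1 in scenario 'early'
    and 0 in scenario 'late' (probability 1/2 each), D_t = 0 for
    t = 2..T-1, D_T = 1; lead time 0, c = 0, alpha = 1.

    order_plan(scenario, t, ni) returns the order quantity in period t
    (1-indexed) given the current net inventory ni; the scenario is
    observable from period 2 on, once D1 has been seen.
    """
    total = 0.0
    for scen in ('early', 'late'):          # each scenario has probability 1/2
        ni = 0.0
        for t in range(1, T + 1):
            ni += order_plan(scen, t, ni)   # order arrives immediately (L = 0)
            ni -= 1.0 if (t == T or (t == 1 and scen == 'early')) else 0.0
            total += 0.5 * (h * ni if ni > 0 else p * (-ni))
    return total

# Myopic plan: order up to the one-period newsvendor level
# (1 unit in periods 1 and T, 0 otherwise).
T = 20
myopic = lambda scen, t, ni: max((1.0 if t in (1, T) else 0.0) - ni, 0.0)
# Alternative plan: skip the period-1 order, clear any backlog in period 2.
skip_first = lambda scen, t, ni: (0.0 if t == 1 else
                                  max((1.0 if t == T else 0.0) - ni, 0.0))
```

With T = 20 the myopic plan costs 9.5 in expectation while the alternative costs 1, and the gap grows linearly in T, as claimed.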
The above example indicates that the myopic policy may perform poorly in cases where the demand can vary a lot from period to period and forecasts can go down. There are indeed many real-life situations where this is exactly the case, including new markets, volatile markets and end-of-life products.

5.2 A Class of Myopic Policies

As we mentioned before, by considering only the one-period problem, the myopic policy described above underestimates the actual holding cost that each unit ordered in period t is going to incur. This results in base-stock levels that are higher than the optimal base-stock levels. We now describe an alternative myopic base-stock policy that we call a minimizing policy. We shall show that the base-stock levels of this policy are always smaller than the corresponding optimal base-stock levels. Thus, combined with the classical myopic base-stock levels, we derive both lower and upper bounds on the optimal base-stock levels. These bounds can be used in computing good inventory control policies. In particular, in follow-up work, Hurley, Jackson, Levi, Roundy and Shmoys [?] have shown that these bounds can be used to derive improved balancing policies. These policies have a worst-case guarantee of 2, and their typical performance is significantly better than that of the dual-balancing policy described in this paper. In addition, they have shown that each policy that deviates from these respective lower and upper bounds can be improved. Recall the functions l_s^P(qs) and π_s^P(qs) defined in Section 4 for each period s = 1, . . . , T − L, where qs ≥ 0. Since at each period s we know xs, we can equivalently write l_s^P(ys − xs) and π_s^P(ys − xs), where ys ≥ xs. We now consider in each period s the problem: minimize l_s^P(ys − xs) + π_s^P(ys − xs) subject to ys ≥ xs, i.e., minimize the expected ordering and holding costs incurred by the units ordered in period s over [s, T] and the backlogging cost incurred in period s + L, conditioned on some fs ∈ Fs.
We have already seen that this function is convex in ys. Observe that l_s^P(ys − xs) − l_s^P(ys) and π_s^P(ys − xs) − π_s^P(ys) do not depend on ys for ys ≥ xs. This gives rise to the following equivalent one-period problem: min_{ys ≥ xs} (l_s^P(ys) + π_s^P(ys)). That is, both problems have the same minimizer. It is also clear that the new minimization problem is convex in ys and easy to solve, in many cases as easily as the one-period problem solved by the myopic policy described above. We note that the function we minimize was used by Chan and Muckstadt [2]. For each t = 1, . . . , T and ft ∈ Ft, let RM(ft) be the smallest base-stock level resulting from the minimizing policy in period t, for a given observed information set ft. We now show that for each period t and ft ∈ Ft, we have RM(ft) ≤ ROPT(ft), where ROPT(ft) is the optimal base-stock level.

Theorem 5.1 For each period t and ft ∈ Ft, we have RM(ft) ≤ ROPT(ft).

Proof: Recall the dynamic-programming-based framework described in Section 3. Observe that for each state (xt, ft), we know that ROPT(ft) is the optimal base-stock level that results from the optimal solution of the corresponding subproblem defined over the interval [t, T]. It is enough to show that the optimal solution for each such subproblem must be at least RM(ft). Assume otherwise, i.e., that ROPT(ft) < RM(ft).

6 The Stochastic Lot-Sizing Problem

We now consider an extension of the model in which, in addition to the per-unit ordering cost, each order incurs a fixed ordering cost K (K > 0). For ease of notation, we will assume again, without loss of generality, that ct = 0. We call this model the stochastic lot-sizing problem. The goal is again to find a policy that minimizes the expected discounted overall ordering, holding and backlogging costs. Naturally, this model is more complicated. Here we will assume that L = 0, α = 1 and that, in each period t = 1, . . . , T, the conditional joint distribution It of (Dt, . . . , DT) is such that the demand Dt is known deterministically (i.e., with probability 1).
The underlying assumption here is that at the beginning of period t our forecast for the demand in that period is sufficiently accurate, so that we can assume it is given deterministically. A primary example is make-to-order systems. As noted in Section 1, for many settings it is known that the optimal policy can be described by a set {(st, St) = (st(ft), St(ft))}t: in each period t, place an order if and only if the current inventory level is below st; if an order is placed in period t, increase the inventory level up to St. We next describe a policy, which we call the triple-balancing policy and denote by TB, and analyze its worst-case expected performance. Specifically, we show that its expected cost is at most 3 times the expected cost of an optimal policy. We note that in this case the policy and its analysis are identical for discrete and continuous demands.

6.1 The Triple-Balancing Policy

The policy follows two rules that specify when to place an order and how many units to order once an order is placed.

Rule 1: When to order. At the beginning of period s, we let s∗ be the period in which the triple-balancing policy last placed an order, i.e., s∗ is the latest order placed so far. Thus, s∗ < s. The policy places its next order in the first period in which not ordering would cause the backlogging cost accumulated since s∗ to exceed K; in this way, the backlogging cost incurred between consecutive orders never exceeds K.

Rule 2: How much to order. Once an order is placed in period s, the policy orders the maximal number of units such that the conditional expected holding cost these units will incur over the rest of the horizon is at most K.

However, by the construction of the triple-balancing policy, we know that if OPT has more physical inventory, then the expected holding cost it will incur over [si, Si+1) is at least K. We conclude that, in both cases, E[Z′i · 1(Si+1 ≤ T) | si, fsi] ≥ K · 1(Si+1 ≤ T | si, fsi). Taking expectations, we have E[Z′i · 1(Si+1 ≤ T) | Si, FSi] ≥ K · Pr(Si+1 ≤ T | Si, FSi). This implies that E[C(OPT)] ≥ K · E[Σi 1(Si+1 ≤ T)]. Moreover, conditioned on si and fsi,

(Zi − Z′i) ≤ 1(y_si^TB ≤ y_si^OPT) · (K + 1(Si+1 ≤ T | si, fsi) · K) + 1(y_si^TB > y_si^OPT) · (Hi|si,fsi + 1(Si+1 ≤ T | si, fsi) · K).

In the first case, where y_si^TB ≤ y_si^OPT, we know that OPT will incur over [si, Si+1) at least as much holding cost as the TB policy. By the construction of the algorithm, we know that the TB policy will not incur more than K backlogging cost and will place at most one order over (si, Si+1].
In the second case, where y_si^TB > y_si^OPT, we know that the ordering and backlogging costs of OPT over (si, Si+1] are at least K, which is more than the backlogging cost the TB policy incurs on that interval. In addition, TB will incur holding cost Hi|si,fsi over [si, Si+1) and will place at most one order over (si, Si+1]. Taking expectations on both sides, we conclude that:

E[(Zi − Z′i) | Si, FSi] ≤ E[1(Y_Si^TB ≤ Y_Si^OPT) · (K + 1(Si+1 ≤ T) · K) | Si, FSi] + E[1(Y_Si^TB > Y_Si^OPT) · (Hi + 1(Si+1 ≤ T) · K) | Si, FSi] ≤ E[K + 1(Si+1 ≤ T) · K | Si, FSi].

The last inequality is by the construction of the algorithm (E[Hi | si, fsi] ≤ K for each Si = si and fsi ∈ Fsi). This implies that, for each i = 2, . . . , T, we have E[(Zi − Z′i) | Si, FSi] ≤ E[K + 1(Si+1 ≤ T) · K | Si, FSi]. Finally, we have that Σ_{i=1}^T E[(Zi − Z′i) · 1(Si