Sampling-based Approximation Algorithms for Multi-stage Stochastic Optimization*

Chaitanya Swamy (cswamy@ist.caltech.edu)    David B. Shmoys (shmoys@cs.cornell.edu)

Abstract

Stochastic optimization problems provide a means to model uncertainty in the input data, where the uncertainty is modeled by a probability distribution over the possible realizations of the actual data. We consider a broad class of these problems in which the realized input is revealed through a series of stages, and hence are called multi-stage stochastic programming problems. Our main result is to give the first fully polynomial approximation scheme for a broad class of multi-stage stochastic linear programming problems with any constant number of stages. The algorithm analyzed, known as the sample average approximation (SAA) method, is quite simple, and is the one most commonly used in practice. The algorithm accesses the input by means of a "black box" that can generate, given a series of outcomes for the initial stages, a sample of the input according to the conditional probability distribution (given those outcomes). We use this to obtain the first polynomial-time approximation algorithms for a variety of k-stage generalizations of basic combinatorial optimization problems.

1. Introduction

Stochastic optimization problems provide a means to model uncertainty in the input data, where the uncertainty is modeled by a probability distribution over the possible realizations of the actual data. We shall consider a broad class of these problems in which the realized input is revealed through a series of stages, and hence are called multi-stage stochastic programming problems. Multi-stage stochastic linear programming is an area that has received a great deal of attention within the Operations Research community, both in terms of asymptotic convergence results and computational work in a wide variety of application domains.
*A full version is available at ist.caltech.edu/~cswamy/papers/multistage.ps. Chaitanya Swamy: Center for Mathematics of Information, Caltech, Pasadena, CA 91125. David B. Shmoys: Dept. of Computer Science, Cornell University, Ithaca, NY 14853. Research supported partially by NSF grants CCF-0430682, DMI-0500263.

A classic example of such a model seeks to minimize the expected cost of operating a water reservoir, where one can decide, in each time period, the amount of irrigation water to be sold while maintaining the level of the reservoir within a specified range (where penalties are incurred for violating this constraint). The source of uncertainty is, of course, the variability in rainfall, and there is a simulation model that provides a means to sample from the distribution of inputs (of rainfall amounts per time period within the planning horizon) [2]. Observe that it is important to model this as a multi-stage process, rather than as a 2-stage one, since doing so captures essential conditional information: given a drought over the previous period, for example, the next period is more likely to continue these conditions. Furthermore, within multi-stage stochastic linear programming, most work has focused on applications in which there are a small number of stages, including forest planning models, electricity investment planning, bond investment planning, and currency options selection, as discussed in the recent survey of Ariyawansa and Felt [1].

Our main result is to give the first fully polynomial randomized approximation scheme (FPRAS) for a broad class of multi-stage stochastic linear programming problems with any constant number of stages. Although our results are much more general, we shall focus on a canonical example of the class of problems, a 3-stage stochastic variant of the fractional set covering problem. We are given a family of sets over a ground set, and a probability distribution over subsets of the ground set that specifies a target set of ground elements that must be covered.
We can view the three stages as specified by a scenario tree with 3 levels of nodes: the root, internal nodes, and leaves. The root corresponds to the initial state, each leaf is labeled with a target subset of elements that must be covered, and for each node in the tree there is a conditional distribution of the target sets at leaves within this subtree (where we condition on the fact that we have reached that node). One can buy (fractionally) sets at any node, paying a cost that depends both on the set and on the node at which it is bought. We want to be able to compute, given a node in the tree, the desired action, so as to minimize the expected total cost of fractionally covering the realized target set. This problem can be modeled as an exponentially large linear program (LP) in which there is, for each set S and each node in the tree, a variable that indicates the fraction of S that is bought at that node. The constraints say that for each leaf and each ground element e in its corresponding target set, the total fraction bought, along this root-leaf path, of sets that contain e must be at least 1. If we view the probability of reaching a node as specified, it is straightforward to express the expected total cost as a linear function of these decision variables. As a corollary of our FPRAS, we also give the first approximation algorithms for the analogous class of multi-stage stochastic integer programs (IPs), such as the integer version of this set covering problem.
For a rich class of k-stage stochastic linear programming problems, where k is assumed to be constant and not part of the input, we show that, for any ε > 0, we can compute, with high probability, a solution with expected cost guaranteed, for any probability distribution over inputs, to be within a factor (1 + ε) of the optimal expected cost, in time bounded by a polynomial in the input size, 1/ε, and a parameter λ that is an upper bound on the ratio between the cost of the same action (e.g., buying the set S) in successive stages. The algorithm accesses the input by means of a "black-box" (simulation) procedure that can generate, for any node in the scenario tree, a sample of the input according to the conditional distribution for this node. This is an extremely general model of the distribution, since it allows all types of correlated effects within different parts of the input. We improve upon our earlier work [13], which handles the very special case k = 2, not only by being able to handle any fixed number of stages, but also in the following respect: whereas the earlier algorithm is based on the ellipsoid method, we can now show that the algorithm most commonly used in practice, the sample average approximation (SAA) method, also yields the claimed approximation scheme. The algorithm of Shmoys and Swamy [13] for 2-stage problems is based on computing an approximate subgradient with respect to a compact convex programming formulation; this is done by estimating each component of the subgradient sufficiently accurately, and then applying the ellipsoid method using these approximate subgradients. In the sample average approximation method, we merely sample scenarios a given (polynomial) number of times N, and by computing the frequencies of occurrence in these samples, we derive a new LP that is a polynomial-sized approximation to the original exponential-sized LP, and then solve this compact LP explicitly.
We first argue that using (approximate) subgradients one can establish a notion of closeness between two functions (e.g., the objective functions of the "true" LP and the SAA LP), so that if two functions are "close" in terms of their subgradients, then minimizing one function is equivalent to approximately minimizing the other. Next, we show that with a polynomially bounded sample size, the objective functions of the "true" problem and the sample-average problem satisfy this "closeness-in-subgradients" property with high probability, and therefore minimizing the sample-average problem yields a near-optimal solution to the true problem; thus we prove the polynomial-time convergence of the SAA method. Our proof does not rely on anything specific to discrete probability distributions, and therefore extends to the case of continuous distributions.

Compare now the 3-stage and 2-stage problems. In the 2-stage fractional set-covering problem, the compact convex program has variables corresponding only to the decisions made at the root to (fractionally) buy sets. Each component of the subgradient at the current point can be estimated by sampling a leaf from the scenario tree and using the optimal dual solution for the linear program that minimizes the cost to cover each element in this leaf's target set to the extent it is not already covered by the root variables. In the 3-stage version, a 2-stage stochastic LP plays the analogous role of this linear program, and we need to obtain a near-optimal dual solution for this exponentially large mathematical program to show the closeness property. Moreover, one difficulty that is not encountered in the 2-stage case is that this 2-stage recourse LP is different in the sample-average and the "true" problems, since the conditional distribution of scenarios given a second-stage outcome is only approximated in the sample-average problem.
Thus to show the closeness property one has to argue that solving the dual of the sample-average 2-stage recourse LP yields a near-optimal solution to the "true" 2-stage recourse LP. We introduce a novel compact non-linear formulation of this dual for which we can prove such a statement, and thereby obtain the "closeness-in-subgradients" property for the 3-stage problem. In fact, this formulation yields a new means to provide lower bounds on 2-stage stochastic LPs, which might be of interest in its own right. The analogous idea can be applied inductively to obtain the FPRAS for any fixed number of stages. We believe that our proof is of independent interest and that our approach of using subgradients will find applications in proving convergence results in other stochastic models as well.

Due to its simplicity and its use in practice, the SAA method has been studied extensively in the stochastic programming literature. Although it has been shown that the SAA method produces solutions that converge to the optimal solution as the number of samples gets sufficiently large (see, e.g., [11] and its references), no results were known that bound the number of samples needed to obtain a (1+ε)-optimal solution by a polynomial in the input size, 1/ε and λ. Prior to our work, for 2-stage stochastic optimization, bounds on the sample size required by the SAA method were proved in [9], but this bound depends on the variance of a certain quantity that need not depend polynomially on the input size or λ. Recently, Nemirovski and Shapiro (personal communication) showed that for 2-stage set cover with non-scenario-dependent second-stage costs, the bound of [9] is a polynomial bound, provided that one applies the SAA method after some preprocessing to eliminate certain first-stage decisions.
For multi-stage problems with arbitrary distributions, to the best of our knowledge, no results were known about the rate of convergence of the sample average approximation to the true optimal solution (with high probability). In fact, we are not aware of any work (even outside of the sample-average approach) that proves worst-case bounds on the sample size required for solving multi-stage stochastic linear programs with arbitrary distributions in the black-box model. Very recently, Shapiro [12] proved bounds on the sample size required by the SAA method for multi-stage problems, under the strong assumption that the distributions in the different stages are independent. In particular, this implies that the distribution of the outcomes in any stage, and hence of the scenarios in the final stage, does not depend on the outcomes in the previous stages, which fails to capture the notion of learning new information about the uncertainty as one proceeds through the stages. Moreover, as in the 2-stage case, the bounds in [12] are not polynomial in the input size or λ, even when the number of stages is fixed. It is important to note that we prove that an optimal solution to the SAA LP is a near-optimal solution to the true LP, not that the optimal value of the SAA LP is a good approximation to the true optimal value. Indeed, one interesting question is to show, for any class of stochastic IPs and LPs, that if one can obtain an approximation algorithm for the case in which there are only a polynomial number of scenarios, then one can also obtain an approximation algorithm for the general case. Subsequent to the dissemination of early versions of our work [14], Charikar, Chekuri and Pál [3] have obtained such a result for 2-stage problems.

There has been a series of recent papers on approximation algorithms for 2-stage stochastic integer programming problems.
Most of this work has focused on more restricted mechanisms for specifying the distribution of inputs [4, 10, 8]; Gupta, Pál, Ravi, and Sinha [5] were the first to consider the "black-box" model, and gave approximation algorithms for various 2-stage problems, but with the restriction that the second-stage costs be proportional to the first-stage costs. Shmoys and Swamy [13] showed that one could derive approximation algorithms for most of the stochastic integer programming problems considered in [4, 10, 8, 5] by adopting a natural LP rounding approach that, in effect, converted an LP-based approximation guarantee for the deterministic analogue into a guarantee for the stochastic generalization (where the performance guarantee degraded by a factor of 2 in the process). An immediate consequence of our approximation scheme for multi-stage stochastic linear programs is that we obtain approximation algorithms for several natural multi-stage stochastic integer programming problems, by extending the rounding approach of [13]. The only other work on multi-stage problems in the black-box model is due to Hayrapetyan, Swamy, and Tardos [7], and Gupta et al. [6] (done concurrently with this work). Both present O(k)-approximation algorithms for a k-stage version of the Steiner tree problem under some restrictions on the costs; the latter also gives algorithms for the k-stage versions of the vertex cover and facility location problems under the same cost restrictions, but their approximation ratio is exponential in k. In contrast, in the black-box model without any cost restrictions, we obtain performance guarantees of O(k log n) for k-stage set cover, 2k for k-stage vertex cover and k-stage multicut on trees, and O(k) for the k-stage facility location problem. Finally, we obtain an FPRAS for a k-stage multicommodity flow problem as a direct consequence of our stochastic linear programming result.

2. Preliminaries

We state some definitions and basic facts that we will use frequently. Let ‖u‖ denote the ℓ2 norm of u.
We say that a function g : ℝ^m → ℝ has Lipschitz constant (at most) K if |g(u) − g(v)| ≤ K‖u − v‖ for all u, v ∈ ℝ^m.

Definition 2.1 We say that d is a subgradient of a function g : ℝ^m → ℝ at the point u if the inequality g(v) − g(u) ≥ d·(v − u) holds for every v ∈ ℝ^m. We say that d̂ is an ω-subgradient of g at u if for every v in the feasible region P, we have g(v) − g(u) ≥ d̂·(v − u) − ω·g(u).

The above definition is slightly weaker than the notion of an ω-subgradient as defined in [13], but it is easy to see that one can also implement the algorithm in [13] using the notion of an approximate subgradient given by Definition 2.1. The following claim will be useful in bounding the Lipschitz constant of the functions encountered.

Claim 2.2 ([13]) Let d(u) denote a subgradient of a function g : ℝ^m → ℝ at each point u. Suppose ‖d(u)‖ ≤ K for every u. Then g has Lipschitz constant (at most) K.

We will consider both convex minimization problems and concave maximization problems, in which we optimize over a polytope P contained in the nonnegative orthant of ℝ^m. Analogous to Definition 2.1, we define a max-subgradient, and an approximate version of it, that we use for concave maximization problems.

Definition 2.3 We say that d is a max-subgradient of a function g at the point u ∈ P if for every point v ∈ P, we have g(v) − g(u) ≤ d·(v − u). We say that d̂ is an (ω, Δ)-max-subgradient of g at u if for every v ∈ P we have g(v) − g(u) ≤ d̂·(v − u) + ω·g(u) + Δ. When P is clear from the context, we omit it, and when Δ = 0 we drop it from the notation; thus we will frequently use (ω, 0)-max-subgradients, which we abbreviate to ω-max-subgradients.

We need the following sampling lemma, which is proved using simple Chernoff bounds.

Lemma 2.4 Let X_1, …, X_N be iid random variables with each X_i ∈ [−a, a], let X = (Σ_i X_i)/N and μ = E[X], and let c > 0 be an arbitrary positive number. If N = Θ((1/c^2)·log(1/δ)), then Pr[|X − μ| > c·a] ≤ δ.

3. The Sample Average Approximation method

Suppose we have a black box that can generate, for any sequence of outcomes for the initial stages, independent samples from the conditional distribution of scenarios given those initial outcomes.
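Lemma 2.4 is a standard Chernoff-Hoeffding-type bound; the following sketch (with an illustrative Hoeffding constant, not one taken from the paper) shows the estimator it analyzes: averaging N = O((1/c^2)·log(1/δ)) iid samples of a random variable bounded in [−a, a] puts the empirical mean within ±c·a of the true mean with probability at least 1 − δ.

```python
import math
import random

def sample_mean_estimate(sampler, c, delta):
    """Average N = O((1/c^2) * log(1/delta)) iid samples of a bounded
    random variable, as analyzed in Lemma 2.4.  The constant 2 below
    comes from Hoeffding's inequality for variables in [-a, a] and is
    illustrative; it guarantees |estimate - mean| <= c*a w.p. >= 1-delta.
    """
    n = math.ceil((2.0 / c ** 2) * math.log(2.0 / delta))
    return sum(sampler() for _ in range(n)) / n, n

random.seed(0)
# X uniform on [-1, 1], so a = 1 and the true mean is 0.
est, n = sample_mean_estimate(lambda: random.uniform(-1.0, 1.0),
                              c=0.05, delta=0.01)
```

The key point for the black-box model is that n depends only on c and δ, not on any structural property of the sampled distribution.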
A natural approach to computing near-optimal solutions given such sampling access is the sample average approximation (SAA) approach: sample some number N of times from the distribution on scenarios, estimate the actual distribution by the distribution induced by the samples, and solve the multi-stage problem specified by this approximate distribution. For 2-stage programs, we just estimate the probability of a scenario by its frequency in the sampled set. For k-stage programs we construct an approximate k-level distribution tree by sampling repeatedly at each level: we sample the stage-2 outcomes some number of times; for each sampled outcome we sample from the conditional distribution given that outcome, and so on; and for each sampled outcome we estimate its conditional probability of occurrence, given the previous-stage outcome, by its frequency in the sampled set. The multi-stage problem specified by this approximate distribution is called the sample average problem, and its objective function is called the sample average function. If the total number of samples T is polynomially bounded, then since the approximate distribution has support of size at most T, the sample average problem can be solved efficiently by solving a polynomial-size linear program. The issue here is the sample size T required to guarantee that every optimal solution to the sample-average problem is a near-optimal solution to the original problem with high probability. We show that for any fixed k (which is not part of the input), for a large class of k-stage stochastic LPs, one can bound T by a polynomial in the input size, the inverse of the desired accuracy, and the maximum ratio λ between the cost of an action in successive stages. Intuitively, to prove such a theorem, we need to show that the sample-average function is a close approximation to the true function in some sense.
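The tree-construction step just described can be sketched as follows; the black-box interface `sample_outcome(history)` and the rainfall toy model are hypothetical stand-ins for the paper's sampling oracle.

```python
import random

def build_saa_tree(sample_outcome, history, stages_left, n_samples):
    """Build the approximate distribution tree used by the SAA method:
    at each node, draw n_samples next-stage outcomes from the black box
    and estimate each conditional probability by its sample frequency."""
    if stages_left == 0:
        return {}
    counts = {}
    for _ in range(n_samples):
        o = sample_outcome(history)
        counts[o] = counts.get(o, 0) + 1
    return {o: (c / n_samples,
                build_saa_tree(sample_outcome, history + (o,),
                               stages_left - 1, n_samples))
            for o, c in counts.items()}

random.seed(1)

def weather(history):
    # Correlated rainfall toy model: a dry period tends to stay dry.
    p_dry = 0.8 if (history and history[-1] == "dry") else 0.3
    return "dry" if random.random() < p_dry else "wet"

# Two sampled levels below the root, i.e. a 3-stage scenario tree.
tree = build_saa_tree(weather, (), stages_left=2, n_samples=200)
```

The empirical conditional probabilities at each node sum to 1 by construction, so the sampled tree is itself a valid finite-support multi-stage distribution, which is exactly what the sample average problem optimizes over.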
One obvious approach would be to argue that, with high probability, the values of the sample average function and the true function are close to each other at a sufficiently dense set of points. This, however, immediately runs into problems, since the variance in the scenario costs could be quite (indeed, exponentially) large, so one cannot estimate the true function value, that is, the expected scenario cost, to within a reasonable accuracy using a small (polynomial) number of samples. The basic problem is that there could be very low-probability outcomes that contribute significantly towards the cost in the true problem, but will almost never be sampled with only a polynomial number of samples (so they contribute nothing to the sample average function). The key insight is that such rare outcomes do not much influence the optimal first-stage decisions, since one would defer decisions for such outcomes until later. The minimizer of a convex function is determined by its "slope" (i.e., its gradient or subgradient), which suggests that perhaps we should compare the slopes of the sample-average and the true objective functions, show that they are close to each other, and argue that this is sufficient to prove the near-equivalence of the corresponding minimization problems. Our proof builds upon this intuition. A subgradient is the analogue of a gradient for a non-differentiable function, and is a measure of the "slope" of the function. We identify a notion of closeness between any two functions based on their (approximate) subgradients, so that if two functions are close under this criterion, then minimizing one is approximately equivalent to minimizing the other. Next, we show that the objective functions of the original multi-stage problem and of the sample average problem with polynomially bounded sample size satisfy this "closeness-in-subgradients" property, and thus we obtain the desired result.

Proof details. The proof is organized as follows.
First, in Lemma 3.1 we show that given functions g and ĝ that agree in terms of their (approximate) subgradients at the points of a sufficiently dense grid in the polytope P, every optimal solution to min_{x∈P} ĝ(x) is a near-optimal solution to min_{x∈P} g(x). Some intuition about why this closeness-in-subgradients property suffices can be obtained by considering the ellipsoid-based algorithm for convex minimization given in [13]. This algorithm uses only (approximate) subgradient information about the convex function to be minimized, using a subgradient or an ω-subgradient of the function to derive a cut passing through the center of the current ellipsoid at a feasible point and make progress. Suppose that at every feasible point x, there is a vector d̂ that is both a subgradient of ĝ and an ω-subgradient of g at x. One can then use d̂ to generate the cut at x, so that the ellipsoid-based algorithm will run identically on both min_{x∈P} g(x) and min_{x∈P} ĝ(x) and return a point that is simultaneously near-optimal for both objective functions. Lemma 3.1 makes this intuition precise while weakening the assumption and strengthening the conclusion: we only require that at every point x in a sufficiently dense finite set G there be a vector d̂ that is both a subgradient of ĝ and an ω-subgradient of g at x, and we prove that every optimal solution to min_{x∈P} ĝ(x) is a near-optimal solution to min_{x∈P} g(x). Lemma 3.2 proves an analogous result for maximization problems.

The second part of the proof is to show that the objective functions of the true problem and the sample average problem (with polynomially many samples) satisfy this closeness-in-subgradients property. This is divided into three parts. For the class of 2-stage linear programs considered in [13], this is easy to show (Theorem 3.5), because the subgradient at any point is the expectation (according to the scenario distribution) of a quantity derived from the optimal solutions to the dual of the recourse LP for each scenario, and this recourse LP is the same in both the sample-average and the true problems.
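A toy 1-dimensional illustration of why closeness in subgradients suffices (this is not the paper's ellipsoid argument, just the phenomenon it exploits): here g and ĝ are convex and their subgradients coincide everywhere except on the tiny interval between their minimizers, so the exact minimizer of ĝ is automatically near-optimal for g.

```python
def ternary_min(f, lo, hi, iters=200):
    """Minimize a 1-D convex function over [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0

g    = lambda x: abs(x - 2.0) + 1.0    # "true" objective, minimum value 1.0 at x = 2.0
ghat = lambda x: abs(x - 2.05) + 1.0   # "sample average" objective; its subgradients
                                       # (+/- 1) agree with g's outside (2.0, 2.05)
xhat = ternary_min(ghat, 0.0, 10.0)    # optimize ghat only
ratio = g(xhat) / 1.0                  # g-cost of xhat relative to the true optimum
```

Minimizing ĝ exactly returns x̂ ≈ 2.05, whose true cost g(x̂) ≈ 1.05 is within 5% of min g, even though no value of g was ever evaluated while optimizing ĝ.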
Thus, since the subgradient components have bounded variance [13], the closeness property follows. For the k-stage problem, however, one needs several substantial new ideas to show this closeness property, even when k = 3. We introduce these ideas in Section 4 by focusing on 3-stage problems, and in particular on the LP relaxation of 3-stage set cover as an illustrative example. We then generalize these ideas to prove an SAA theorem for a large class of 3-stage linear programs, and in Section 5 inductively apply the arguments to a broad class of k-stage problems. The main difficulty, and the essential difference from the 2-stage case, is that now the recourse problem for each second-stage outcome is a 2-stage stochastic LP whose underlying distribution is only estimated in the sample average problem. So the sample average problem and the true problem solve different recourse problems for each stage-2 outcome. An (approximate) subgradient is obtained from the (approximately) optimal solutions to the dual of the 2-stage recourse LP for each outcome; therefore, to show closeness in subgradients we need to argue that maximizing the sample-average dual yields a near-optimal solution to the true dual, that is, to prove an SAA theorem for the dual of a 2-stage stochastic primal program! Mimicking the approach for the primal problem, we prove this by showing that the two dual objective functions agree in terms of their max-subgradients. However, simply considering the LP dual of the 2-stage primal recourse LP does not work: a max-subgradient of the linear dual objective function is just the constant vector specifying the conditional probabilities of the stage-3 scenarios given the outcome in stage 2, and one cannot estimate the true conditional distribution using only a polynomial number of samples, in particular because rare scenarios will almost never be sampled.
To circumvent this problem, we introduce a novel compact, non-linear formulation of the dual, which turns the dual objective function into a concave function whose max-subgradient can be computed by solving a 2-stage primal stochastic problem. We use the earlier SAA theorem for 2-stage programs to show that any optimal solution to this 2-stage LP in the sample-average dual is a near-optimal solution to the 2-stage LP in the true dual. This shows that the two dual objective functions (in this new representation) are close in terms of their max-subgradients, thereby proving that an optimal solution to the sample-average dual is a near-optimal solution to the true dual. This in turn establishes the closeness in subgradients of the objective functions of the 3-stage sample average problem and the true 3-stage problem, and yields the SAA theorem.

Sufficiency of closeness in subgradients. Let g, ĝ : ℝ^m → ℝ be two functions with Lipschitz constant (at most) K. Let P, contained in the nonnegative orthant of ℝ^m, be the bounded feasible region, and let R be a radius such that P is contained in the ball B(0, R). Let ω and γ ≤ 1 be two parameters. Choose a grid spacing small enough that moving a point by one grid step changes g and ĝ by at most a small additive error ε′ (so the spacing is polynomially related to ε′, 1/K and 1/m), and let G, the extended ε′-grid of the polytope P, consist of the points of P whose coordinates are all integer multiples of this spacing, together with enough additional points that every x ∈ P has a point x′ ∈ G within one grid step of it.

We first consider minimization problems. We say that functions g and ĝ satisfy property (A) if:

    at every point x ∈ G there is a vector d̂ = d̂(x) that is both a subgradient of ĝ at x and an ω-subgradient of g at x.    (A)

Lemma 3.1 Suppose functions g and ĝ satisfy property (A). Let x* and x̂ be points in P that respectively minimize g and ĝ, with g(x*) ≥ 0, and let x̄ be a point such that ĝ(x̄) is within a small additive error of ĝ(x̂). Then both x̂ and x̄ are near-optimal solutions to min_{x∈P} g(x): their g-cost exceeds g(x*) by at most a multiplicative factor that tends to 1 as ω → 0, plus an additive error proportional to ε′.

Proof: We prove the claim for x̂; the claim for x̄ is proved almost identically. Let x′ be the point in G closest to x*, so that ‖x* − x′‖ is at most one grid step and hence |g(x*) − g(x′)| ≤ ε′. Consider the vector d̂ given by property (A) at x′. It must be that d̂·(x̂ − x′) ≤ 0: otherwise, since d̂ is a subgradient of ĝ at x′, we would have ĝ(x̂) ≥ ĝ(x′) + d̂·(x̂ − x′) > ĝ(x′), contradicting the optimality of x̂ for ĝ. Applying the ω-subgradient inequality for g at the relevant grid points in a similar fashion, together with the Lipschitz bound, and treating separately the case in which x̂ itself lies close to a grid point, yields the claimed bounds; we defer the details to the full version of the paper.

Lemma 3.2 states an analogous result for maximization problems. We say that g and ĝ satisfy property (B) if:

    at every point x ∈ G there is a vector d̂ = d̂(x) that is both a max-subgradient of ĝ at x and an (ω, Δ)-max-subgradient of g at x.    (B)

Lemma 3.2 Suppose functions g and ĝ satisfy property (B). Let x* and x̂ be points in P that respectively maximize g and ĝ, with g(x*) ≥ 0. Then x̂ is a near-optimal solution to max_{x∈P} g(x), up to a multiplicative factor that tends to 1 as ω → 0 and an additive error proportional to ε′ + Δ.

Lemma 3.3 Let G be the extended ε′-grid of P. Then log |G| is bounded by a polynomial in m and log(R/ε′).

The SAA bound for 2-stage programs. We now prove a polynomial SAA bound for the class of 2-stage programs considered in [13] (the class was stated there with some extra constraints, but these are handled below). The true problem is

    min_{x∈P}  h(x) = w·x + Σ_A p_A f_A(x),    (P)

where the sum ranges over scenarios A, p_A is the probability of scenario A, and f_A(x) is the optimal value of the recourse LP for scenario A, whose cost vector is w_A. Here (a) the recourse costs are nonnegative for every scenario A, and (b) for every x ∈ P, f_A(x) ≥ 0 and the primal and dual problems corresponding to f_A(x) are feasible for every scenario A. It is assumed that w_A ≤ λ·w componentwise for every scenario A, where λ is polynomially bounded. Let OPT denote the optimum value and I denote the size of the input. The sample average problem is to minimize, over x ∈ P, the sample average function ĥ(x) = w·x + Σ_A (N_A/N)·f_A(x), where N is the total number of samples and N_A is the number of times scenario A is sampled.

Lemma 3.4 Let d be a subgradient of h at the point x, and suppose that d̂ is a vector such that |d̂_S − d_S| ≤ ω·w_S for every component S. Then d̂ is an O(ω)-subgradient of h at x.

It is shown in [13] that at any point x ∈ P, if one takes optimal solutions to the duals of the recourse LPs f_A(x), then (i) the vector d obtained by combining w with the p_A-weighted contributions of these dual solutions is a subgradient of h at x; (ii) for any component S and any scenario A, the contribution of scenario A to component S of this vector is bounded in magnitude by λ·w_S; and therefore (iii) ‖d‖ ≤ λ‖w‖. The same construction with the sampled distribution yields a vector d̂ that is a subgradient of ĥ at x with ‖d̂‖ ≤ λ‖w‖, so (by Claim 2.2) the Lipschitz constant of ĥ is at most K = λ‖w‖. Observe that d̂ is just the defining expression of d averaged over the random scenarios sampled to construct ĥ, and E[d̂] = d, where the expectation is over these random samples.

Theorem 3.5 For any ε, δ > 0, with probability at least 1 − δ, any optimal solution x̂ to the sample average problem constructed with poly(I, λ, 1/ε, log(1/δ)) samples satisfies h(x̂) ≤ (1 + ε)·OPT + ε.

Proof: We show that h and ĥ satisfy property (A) with probability 1 − δ with the stated sample size; the rest follows from Lemma 3.1. Let G be the extended grid with a suitably small, polynomially bounded spacing parameter; note that log |G| is polynomially bounded in the input size. Using Lemma 2.4, if we sample poly(λ, 1/ω, log(|G|/δ)) times to construct ĥ, then at a given point x ∈ G the subgradient d̂ of ĥ is component-wise close to its expectation d with probability at least 1 − δ/|G|, so by Lemma 3.4, d̂ is an O(ω)-subgradient of h at x with high probability. Taking the union bound over the points of G, with probability at least 1 − δ, d̂ is an O(ω)-subgradient of h at every point x ∈ G, that is, property (A) holds; using Lemma 3.3 to bound |G| gives the stated sample size.

Under the mild assumption that (a) the point x = 0 (i.e., deferring all decisions to stage 2) lies in P, and (b) for every scenario A, either f_A(x) is minimized at x = 0, or its minimum value is bounded away from zero, it was shown in [13] that by sampling poly(λ, log(1/δ)) times initially, one can detect with probability at least 1 − δ either that x = 0 is an optimal solution to (P), or that OPT is large enough for the additive error above to be charged to OPT. So if we detect that OPT is large, then we can set the parameters appropriately and obtain a (1 + ε)-optimal solution with high probability using the SAA method with poly(I, λ, 1/ε, log(1/δ)) samples.

4. 3-stage stochastic programs

3-stage stochastic set cover. Our techniques yield a polynomial-sample bound for a broad class of 3-stage programs, but before considering a generic 3-stage program, we introduce and explain the main ideas involved by focusing on the 3-stage stochastic set cover problem. In the stochastic set cover problem, we are given a universe U of elements and a family of subsets of U, and the set of elements to be covered is determined by a probability distribution. In the 3-stage problem this distribution is specified by a 3-level tree. We use i to denote an outcome in stage 2, and (i, A) to denote a stage-3 scenario in which i was the stage-2 outcome. Let Ω be the set of all stage-2 outcomes, and for each i ∈ Ω let S_i = {A : (i, A) is a scenario}.
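Stepping back briefly before the 3-stage development: the 2-stage pipeline behind Theorem 3.5 can be made concrete on a toy fractional set-cover instance. All data below are illustrative; with disjoint singleton sets the sample-average LP separates per set and is solved by a simple threshold rule, so no LP solver is needed in this sketch.

```python
import random

# Toy 2-stage fractional set cover: disjoint singleton sets, with
# stage-2 costs inflated by a factor LAM (the paper's lambda).
SETS = {"S1": {"e1"}, "S2": {"e2"}}
W = {"S1": 1.0, "S2": 3.0}
LAM = 10.0
TRUE_SCEN = [({"e1"}, 0.68), ({"e1", "e2"}, 0.02), (set(), 0.30)]

def recourse_cost(x, scenario):
    """f_A(x): cheapest stage-2 completion of the cover (exact here
    because the sets are disjoint singletons)."""
    cost = 0.0
    for e in scenario:
        s = next(s for s in SETS if e in SETS[s])
        cost += max(0.0, 1.0 - x[s]) * LAM * W[s]
    return cost

def true_cost(x):
    return sum(W[s] * x[s] for s in SETS) + \
           sum(p * recourse_cost(x, A) for A, p in TRUE_SCEN)

def saa_solve(n_samples, seed=0):
    """SAA method: sample scenarios, then minimize the sample-average
    objective.  Separability makes the minimizer a per-set threshold:
    buy set s in stage 1 iff its expected stage-2 cost exceeds W[s]."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        u = rng.random()
        samples.append({"e1"} if u < 0.68 else
                       ({"e1", "e2"} if u < 0.70 else set()))
    x = {}
    for s in SETS:
        e = next(iter(SETS[s]))
        freq = sum(1 for A in samples if e in A) / n_samples
        x[s] = 1.0 if LAM * W[s] * freq > W[s] else 0.0
    return x

xhat = saa_solve(400)
```

Here the true optimum buys S1 up front (element e1 is frequent) and defers S2 (element e2 is rare), for an expected cost of 1 + 0.02·30 = 1.6; a few hundred samples suffice for the SAA solution to match it, even though the rare scenario is estimated only crudely.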
Let p_i and p_{i,A} denote the probabilities of outcome i and scenario (i, A) respectively, and let q_{i,A} = p_{i,A}/p_i be the conditional probability of A given i. Note that Σ_{A∈S_i} q_{i,A} = 1 for every i. We have to cover the (random) set of elements E(i, A) in scenario (i, A), and we can buy a set S in stage 1, in stage-2 outcome i, or in scenario (i, A), incurring costs w_S, w^i_S and w^{i,A}_S respectively. We use x, y_i and z_{i,A} respectively to denote the decisions in stage 1, outcome i and scenario (i, A), and consider the following fractional relaxation:

    min  h(x) = w·x + Σ_{i∈Ω} p_i f_i(x)   subject to x ≥ 0,    (3SSC-P)

where the recourse cost f_i(x) of outcome i is given by the 2-stage problem

    f_i(x) = min { w^i·y_i + Σ_{A∈S_i} q_{i,A}·g_{i,A}(x, y_i) : y_i ≥ 0 },    (3Rec-P)

and g_{i,A}(x, y_i) is the minimum cost of a fractional purchase z_{i,A} ≥ 0 that, together with x and y_i, covers each element of E(i, A) to an extent of at least 1. Let λ = max( max_{i,S} w^i_S/w_S, max_{i,A,S} w^{i,A}_S/w^i_S ) and OPT = min_x h(x). The total sample size in the sample average problem is N·N′, where (i) N is the sample size used to estimate the probability p_i by the frequency p̂_i, and (ii) N′ is the number of samples generated from the conditional distribution of scenarios in S_i, for each i with p̂_i > 0, to estimate q_{i,A} by q̂_{i,A}. The sample average problem is similar to (3SSC-P), with p̂_i replacing p_i, and q̂_{i,A} replacing q_{i,A} in the recourse problem. We use f̂_i(x) and ĥ(x) = w·x + Σ_i p̂_i f̂_i(x) to denote the sample average recourse problem for outcome i and the sample average function, respectively.

As mentioned earlier, the main difficulty in showing that the sample average and the true functions satisfy the closeness-in-subgradients property is that these two problems now solve different recourse problems, f̂_i(x) and f_i(x) respectively, for an outcome i. Since the subgradient is obtained from a dual solution, this entails first proving an SAA theorem for the dual, showing that solving the dual of f̂_i(x) yields a near-optimal solution to the dual of f_i(x). To achieve this, we first formulate the dual as a compact concave maximization problem, then show that by slightly modifying the two dual programs, the dual objective functions become close in terms of their max-subgradients, and then use Lemma 3.2 to obtain the required SAA theorem (for the duals). A max-subgradient of the dual objective function is obtained from the optimal solution of a 2-stage primal problem, and we use Theorem 3.5 to prove the closeness in max-subgradients of the sample average dual and the true dual.
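The nested structure of (3SSC-P)/(3Rec-P) can be seen in a minimal toy instance: one set covering one element, where each recourse problem is piecewise linear in a single variable and can be solved by checking its breakpoints. The instance data below are illustrative, not taken from the paper.

```python
def recourse_3stage(x, w2, w3, q_demand):
    """f_i(x) = min over 0 <= y <= 1-x of  w2*y + q_demand*w3*(1-x-y):
    buy y now at cost w2, or leave z = 1-x-y for stage 3, paid at cost
    w3 only in the (conditional probability q_demand) scenario that
    demands the element.  Linear in y, so the optimum is at a breakpoint."""
    best = float("inf")
    for y in (0.0, max(0.0, 1.0 - x)):
        z = max(0.0, 1.0 - x - y)
        best = min(best, w2 * y + q_demand * w3 * z)
    return best

def h(x, w1, outcomes):
    """Expected cost w1*x + sum_i p_i * f_i(x); `outcomes` lists
    (p_i, w2_i, w3_i, q_demand_i) for each stage-2 outcome i."""
    return w1 * x + sum(p * recourse_3stage(x, w2, w3, q)
                        for p, w2, w3, q in outcomes)

# Drought-style correlation: after a "dry" stage-2 outcome, the stage-3
# scenario that demands the element is much more likely.
outcomes = [(0.4, 2.0, 4.0, 0.9),   # dry: demand with conditional prob 0.9
            (0.6, 2.0, 4.0, 0.1)]   # wet: demand with conditional prob 0.1
cost_defer = h(0.0, 1.5, outcomes)  # buy nothing in stage 1
cost_buy   = h(1.0, 1.5, outcomes)  # buy the whole set in stage 1
```

With these numbers, deferring is optimal (expected cost 0.4·2 + 0.6·0.4 = 1.04 versus 1.5): in the dry outcome one buys in stage 2, while in the wet outcome one gambles on stage 3. Replacing p_i and q_demand by sampled frequencies p̂_i, q̂_{i,A} turns h into the sample average function ĥ.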
In Section 5 we show that this argument can be applied inductively to prove an SAA bound for a large class of k-stage stochastic LPs.

Let f_A(x; W) (respectively f̂_A(x; W)) denote the recourse problem f_A(x) (respectively f̂_A(x)) with the stage-2 costs w^A replaced by W, that is, f_A(x; W) = min_{y_A ≥ 0} ( W·y_A + Σ_{B∈ℬ_A} q_{A,B} f_{A,B}(x + y_A) ). We formulate the following dual of the true and sample average recourse problems:

  LD_A(x) = max_{α∈D} l_A(x; α)   and   L̂D_A(x) = max_{α∈D} l̂_A(x; α),

where l_A(x; α) and l̂_A(x; α) are obtained from f_A(x; ·) and f̂_A(x; ·) respectively, with costs determined by α.

Lemma 4.1 At any point x ∈ P and outcome A, f_A(x) = LD_A(x) and f̂_A(x) = L̂D_A(x).

Lemma 4.2 Fix x ∈ P. Let α_A be a near-optimal solution to LD_A(x) for every A ∈ 𝒜. Then, (i) the vector d obtained from the α_A is an ω-subgradient of h(·) at x, with ω determined by the near-optimality guarantee; (ii) if d̂ is a vector with d_S − ω·w^1_S ≤ d̂_S ≤ d_S for all S, then d̂ is an ω-subgradient of h(·) at x.

Lemma 4.1 proves strong duality (in this new dual representation), which is used by Lemma 4.2. Lemma 4.2 also shows that any optimal solution α̂_A to L̂D_A(x) yields a subgradient of ĥ(·) at x, so to prove the closeness in subgradients of h(·) and ĥ(·) it suffices to argue that each α̂_A is a near-optimal solution to LD_A(x). (Note that both h(·) and ĥ(·) have Lipschitz constant at most K.) We could try to argue this by showing that l_A(x; ·) and l̂_A(x; ·) are close in terms of their ω-subgradients (i.e., satisfy property (B)); however, some technical difficulties arise here. An ω-subgradient of l_A(x; ·) at α is obtained from a solution to the 2-stage problem given by f_A(x; α), and to show closeness in ω-subgradients at α we need to argue that an optimal solution to the sample average problem f̂_A(x; α) is a near-optimal solution to f_A(x; α). But this need not be true, since the ratio of the second- and first-stage costs in the 2-stage problem f_A(x; α) could be unbounded. To tackle this, we consider instead the modified dual problems max_{α∈D} l_A(x; α; ρ) and max_{α∈D} l̂_A(x; α; ρ) for a suitable ρ > 0, and define h_ρ(·) and ĥ_ρ(·) to be the corresponding analogues of h(·) and ĥ(·). As in Lemma 4.2, one can show that near-optimal solutions to the modified dual for every A ∈ 𝒜 yield an approximate subgradient of h_ρ(·) at x. But now the cost ratio in the underlying 2-stage problem is at most 2λ for any α ∈ D, and this gives the SAA bound stated in Lemma 4.3 below; we present the proof after Theorem 4.6. Using Lemma 4.3, we show the closeness in subgradients of h_ρ(·) and ĥ_ρ(·), and this suffices to show that if x̂ minimizes ĥ(·) then it is a near-optimal solution to (3SSC-P).
Lemma 4.3 For any parameters ρ > 0 and ε > 0, any x ∈ P, and any outcome A, if we use poly(I, λ, 1/ε, ln(1/δ)) samples to construct the recourse problem f̂_A(x), then any optimal solution α̂ to L̂D_A(x) is a near-optimal solution to LD_A(x) with probability at least 1 − δ.

Lemma 4.4 Consider the sample average function ĥ(·) generated using poly(I, λ, 1/ε, ln(1/δ)) samples from stage 2, and as many samples from stage 3 for each outcome A with p̂_A > 0. At any point x ∈ P, any subgradient d̂ of ĥ_ρ(·) is an ω-subgradient of h_ρ(·) with probability at least 1 − δ.

Claim 4.5 For any x ∈ P, the values h(x) and h_ρ(x) differ by at most a term proportional to ρ; similarly for ĥ(x) and ĥ_ρ(x).

Theorem 4.6 For any ε, δ ∈ (0, 1), one can construct ĥ(·) with poly(I, λ, 1/ε, ln(1/δ)) samples, and with probability at least 1 − δ, any optimal solution x̂ to min_{x∈P} ĥ(x) satisfies h(x̂) ≤ (1 + ε)·OPT plus a small additive error term.

Proof: Assume without loss of generality that ε ≤ 1, and choose ρ and ω suitably as functions of ε; note that the Lipschitz constant K is polynomially bounded in the input size. We show that (i) a near-optimal solution to min_{x∈P} ĥ_ρ(x) yields a near-optimal solution to min_{x∈P} ĥ(x), and (ii) minimizing h(·) and ĥ(·) over P is roughly the same as approximately minimizing h_ρ(·) and ĥ_ρ(·) respectively over P. Let x̂ be an optimal solution to min_{x∈P} ĥ(x). By Claim 4.5, ĥ_ρ(x̂) is close to ĥ(x̂), and min_{x∈P} h_ρ(x) is close to OPT. Let G be the extended grid of P, whose size is polynomial in I and 1/ε, where we use Lemma 3.3 to bound K. We construct ĥ(·) using the sample bounds of Lemma 4.4; since every parameter involved is polynomially bounded, Lemma 4.3 shows that the total number of samples is polynomial as well. Using Lemma 4.4 and the union bound over all points in G, with probability at least 1 − δ, at every point x ∈ G, any subgradient d̂ of ĥ_ρ(·) is an ω-subgradient of h_ρ(·). So by parts (i) and (ii) of Lemma 3.1, x̂ approximately minimizes h_ρ(·), and hence also ĥ_ρ(·), with high probability (since ĥ_ρ(x̂) is close to ĥ(x̂)). Combining this with the bound on min_x h_ρ(x), the bounds of Claim 4.5, and plugging in the choices of ρ and ω, we get that h(x̂) ≤ (1 + ε)·OPT plus a small additive error.

Under the very mild assumption that for every scenario (A, B) with p_{A,B} > 0, the costs in successive stages are within a factor λ of each other componentwise, we have the following.

Lemma 4.7 By sampling polynomially many times, one can detect with probability at least 1 − δ that either x = 0 is an ε-optimal solution to (3SSC-P), or that OPT is at least a polynomially related threshold. Thus, as in the 2-stage case, we can obtain a (1 + ε)-optimal solution with the SAA method (with high probability) using polynomially many samples.

Proof of Lemma 4.3: Fix the point x ∈ P and the outcome A.
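The union-bound bookkeeping used in the proof above follows a standard pattern: passing from one point to a grid of G points costs only an additive logarithmic factor in the per-point sample bound. A small arithmetic sketch of our own (the constant `C` and the log-form sample bound are placeholders, not the paper's bounds):

```python
import math

# If each grid point fails with probability at most delta', and there are
# grid_size points, taking delta' = delta / grid_size makes the overall
# failure probability at most delta. A per-point sample bound of the form
# N(delta') = C * log(1/delta') then grows only by roughly C * log(grid_size).

def per_point_samples(C, delta_prime):
    return math.ceil(C * math.log(1.0 / delta_prime))

def grid_samples(C, delta, grid_size):
    # split the failure probability evenly over the grid (union bound)
    return per_point_samples(C, delta / grid_size)

single = per_point_samples(C=50, delta_prime=0.01)          # one point
whole_grid = grid_samples(C=50, delta=0.01, grid_size=10**6)  # a million points
print(single, whole_grid)   # the second exceeds the first only additively
```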
Clearly we may assume that the same modified costs are used in the two 2-stage problems f_A(x; ·) and f̂_A(x; ·). We want to show that if α̂ solves L̂D_A(x), then α̂ is a near-optimal solution to LD_A(x) with high probability. By a now-familiar approach, we will show that l̂_A(x; ·) and l_A(x; ·) are close in terms of their ω-subgradients and then use Lemma 3.2. We only consider ω-subgradients over the dual feasible region D, so we drop the explicit dependence on D from now on. An ω-subgradient of l_A(x; ·) (respectively l̂_A(x; ·)) at α is obtained from the solution to the 2-stage problem f_A(x; α) (respectively f̂_A(x; α)):

Lemma 4.8 Fix x ∈ P and α ∈ D. If (y, z) is a near-optimal solution to the 2-stage problem f_A(x; α), then the corresponding vector is an ω-subgradient of l_A(x; ·) at α.

We can bound the Lipschitz constant of l_A(x; ·) and l̂_A(x; ·) polynomially. Since α ∈ D, the ratio of costs in the two stages of the 2-stage problem f_A(x; α) is at most 2λ. Choose the grid spacing and error parameters suitably, and observe that each is polynomially bounded. Recall that α̂ is an optimal solution to L̂D_A(x). Let G′ be the extended grid of D. By Theorem 3.5, if we use polynomially many samples from the conditional distribution of scenarios in ℬ_A to construct f̂_A, then with the failure probability shared over G′, at a given point α, any optimal solution to f̂_A(x; α) is a near-optimal solution to f_A(x; α). So by applying Lemma 4.8 and the union bound over all points in G′, with probability at least 1 − δ, at each point α ∈ G′, the ω-subgradient of l̂_A(x; ·) at α is an ω-subgradient of l_A(x; ·) at α. By Lemma 3.2, α̂ is then a near-optimal solution to LD_A(x); since all the parameters involved are polynomially bounded, the total number of samples is poly(I, λ, 1/ε, ln(1/δ)).

A class of solvable 3-stage programs. The above arguments can be adapted to prove an SAA bound for the following broad class of 3-stage stochastic programs:

  min_{x∈P} h(x) = c·x + Σ_{A∈𝒜} p_A f_A(x),   (3G-P)

where f_A(x) = min { c^A·y_A + Σ_{B∈ℬ_A} q_{A,B} f_{A,B}(x, y_A) : y_A ≥ 0 feasible given x }, and f_{A,B}(x, y_A) is the optimal value of a linear program in the stage-3 decisions z_{A,B} ≥ 0 whose constraints are linear in x, y_A and z_{A,B}; here, for every scenario (A, B), (a) the costs are nonnegative, and (b) the constraint coefficients coupling the stages are suitably sign-restricted. Let λ be the maximum ratio of costs in successive stages; we assume that λ is known. Let OPT be the optimum value and I denote the input size. We assume that there is some R with ln R polynomially bounded such that P is contained in the box [0, R]^m, and further we assume that for any x and any outcome A, there is an optimal solution to the recourse problem lying in [0, R]^m as well. These assumptions are fairly mild and unrestrictive; in particular, they hold trivially for the fractional relaxations of 0-1 integer programs and many combinatorial optimization problems.
We obtain the guarantee stated in Theorem 4.9 below; as before, we can use Lemma 4.7 to convert this to a (1 + ε)-guarantee under the mild assumption that 0 ∈ P, and that for every scenario (A, B) either the recourse cost is minimized by the all-zero decisions, or the costs in successive stages are within a factor λ of each other.

Theorem 4.9 For any ε, δ ∈ (0, 1), one can construct the sample average approximation ĥ(·) to (3G-P) with poly(I, λ, 1/ε, ln(1/δ)) samples, and with probability at least 1 − δ, any optimal solution x̂ to min_{x∈P} ĥ(x) satisfies h(x̂) ≤ (1 + ε)·OPT plus a small additive error term.

Remark: Theorem 4.9 also handles programs with additional linear constraints coupling x and y_A in the stage-2 problem.

5. The bound for k-stage stochastic programs

We now extend our techniques to solve k-stage stochastic linear programs. Here k is a constant that is not part of the input; the running time will be exponential in k. In the k-stage problem, the scenario distribution is specified by a k-level tree, called the scenario tree. We start at the root r of this tree at level 1, which represents the first stage. Each node u at level i represents an outcome in stage i. At a leaf node, the uncertainty has completely resolved itself and we know the input precisely. A scenario always refers to a stage-k outcome, i.e., a leaf of the tree. The goal is to choose the first-stage decisions so as to minimize the sum of the first-stage cost and the expected cost of the later stages, where the expectation is taken over all scenarios. Let child(u) be the set of all children of u, and path(u) be the set of all ancestors of u (including u). Let p_u be the probability that outcome u occurs, and q_u be the conditional probability that u occurs given the previous-stage outcome. The distribution can be arbitrary, and can incorporate correlation effects from previous stages. Let w^u denote the costs in outcome u, y_u be the decisions taken in outcome u, and Y_u = (y_v)_{v ∈ path(û)}, where û is u's parent, be all the decisions taken in the previous stages; we write x = y_r for the root r. We consider the following generic k-stage LP.
  min_{x∈P} h(x) = w^r·x + Σ_{u∈child(r)} q_u f_u(Y_u),   (kG-P)

where, for a node u at level i < k, f_u(Y_u) = min { w^u·y_u + Σ_{v∈child(u)} q_v f_v(Y_v) : y_u ≥ 0 feasible given Y_û }, and for a leaf u, f_u(Y_u) is the optimal value of a linear program in the stage-k decisions.

We need that for every u with parent û and feasible decisions Y_û, the recourse problem f_u(·) is feasible and has nonnegative value, and that there is some R with ln R polynomially bounded such that each problem f_u(Y_u) has an optimal solution in [0, R]^m. Let I be the input size, and λ be the maximum ratio of costs in successive stages. The sample average problem is of the same form as (kG-P), where the probability q_u is estimated by the frequency q̂_u of outcome u in the appropriate sampled set. The SAA bound for programs of the form (kG-P) follows by induction. Theorem 3.5 supplies the base case; the induction step, where we show that an SAA bound for (k−1)-stage programs of the form f_u yields an SAA bound for (kG-P), follows by dovetailing the arguments in Section 4. As in the 3-stage case, we formulate a concave-maximization problem LD_u that is dual to f_u, which is a max-min problem with a (k−1)-stage primal problem embedded inside it. Using the induction hypothesis, we argue that the objective functions of the (slightly modified) true and sample-average dual programs are close in terms of their ω-subgradients, so that any optimal solution to the (modified) sample-average dual is a near-optimal solution to the (modified) true dual. This in turn allows us to show (essentially) the closeness in subgradients of the sample average and true functions. Lemma 3.1 completes the induction step. We obtain the following result, which also yields a (1 + ε)-guarantee under some mild assumptions.

Theorem 5.1 For any ε, δ ∈ (0, 1), with probability at least 1 − δ, any optimal solution x̂ to the sample average problem constructed using a number of samples that is polynomial in I, λ, 1/ε and ln(1/δ) (for fixed k) satisfies h(x̂) ≤ (1 + ε)·OPT plus a small additive error term.

6. Applications

We consider a number of k-stage stochastic optimization problems, where k is a constant, for which we prove the first known performance guarantees. Our algorithms do not assume anything about the distribution or the cost structure of the input. Previously, algorithms for these problems were known only in the 2-stage setting, initially in restricted settings [10, 8, 5], and later without any restrictions [13].
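For intuition, once all decisions are held fixed, the objective of a program like (kG-P) is just an expectation computed by backward recursion over the scenario tree: the cost at a node is its own stage cost plus the conditional-probability-weighted costs of its children. A toy sketch of our own (node names and numbers are made up):

```python
# With all decisions fixed, the expected cost decomposes over the tree:
#   cost(u) = (stage cost at u) + sum over children v of q_v * cost(v).

tree = {                       # node -> (stage cost w^u . y_u, {child: q_v})
    'r':  (10.0, {'u1': 0.6, 'u2': 0.4}),
    'u1': (4.0,  {'l1': 0.5, 'l2': 0.5}),
    'u2': (6.0,  {}),
    'l1': (2.0,  {}),
    'l2': (8.0,  {}),
}

def expected_cost(node):
    """Backward recursion over the scenario tree."""
    stage_cost, children = tree[node]
    return stage_cost + sum(q * expected_cost(v) for v, q in children.items())

print(expected_cost('r'))   # ~17.8 = 10 + 0.6*(4 + 0.5*2 + 0.5*8) + 0.4*6
```

In the sample average problem, the true conditional probabilities q_v in this recursion are simply replaced by the empirical frequencies q̂_v.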
For each of these k-stage problems, we can write a k-stage LP for the linear relaxation of the problem to which Theorem 5.1 applies, and round the near-optimal fractional solution obtained by solving the sample average problem using an extension of the rounding scheme in [13].

Multicommodity flow. We consider a stochastic version of the concurrent multicommodity flow problem where we have to buy capacity to install on the edges so that one can concurrently ship the demand of each commodity i from its source s_i to its sink t_i. The demand is uncertain and is revealed in k stages. We can buy capacity on edge e in any stage-i outcome u at a cost of w^u_e, and the total amount of capacity that we can install on an edge is limited by its capacity u_e. The goal is to minimize the expected capacity installation cost. We obtain a (1 + ε)-optimal solution.

Covering problems. We consider the k-stage versions of set cover, vertex cover and the multicut problem on trees. In each of these problems, there are elements in some universe that need to be covered by sets. In the k-stage stochastic problem, the target set of elements to cover is determined by a probability distribution, and becomes known after a sequence of k stages. In each outcome u, we can purchase a set S at a price of w^u_S. We have to determine which sets to buy in each stage so as to minimize the expected total cost of buying sets. We can generalize the rounding theorem in [13] to show that a ρ-approximation algorithm for the deterministic analogue that uses the natural LP relaxation as a lower bound yields a kρ-approximation algorithm for the k-stage problem. In general, to compute the decisions in a stage-i outcome, we solve a (k − i + 1)-stage problem, and round the solution. We get performance guarantees of O(k log n) for k-stage set cover, and 2k for k-stage vertex cover and k-stage multicut on trees.
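To illustrate the flavor of the rounding step (a simplified sketch of our own, not the actual scheme of [13]): if an element is fractionally covered to total extent at least 1 across the k stages, then some stage covers it to extent at least 1/k; scaling that stage's fractional solution by k and applying the standard deterministic threshold rounding then loses only a factor k on top of the deterministic ratio. For vertex cover, where every element (edge) belongs to exactly two sets (its endpoints), the threshold 1/2 gives the classical deterministic factor 2, hence 2k overall.

```python
# Threshold rounding for fractional vertex cover (illustrative toy instance).

def round_vertex_cover(x, edges, threshold=0.5):
    """Pick every vertex whose fractional value meets the threshold."""
    cover = {v for v, xv in x.items() if xv >= threshold}
    assert all(u in cover or v in cover for (u, v) in edges), "not a cover"
    return cover

# Single-stage: on a triangle, x_v = 0.5 on every vertex is LP-feasible
# (each edge gets 0.5 + 0.5 >= 1), and thresholding at 1/2 yields a cover.
edges = [(1, 2), (2, 3), (1, 3)]
x = {1: 0.5, 2: 0.5, 3: 0.5}
print(sorted(round_vertex_cover(x, edges)))   # prints [1, 2, 3]

# k-stage flavour: a (hypothetical) stage solution covering an element to
# extent >= 1/k is scaled by k before the same threshold step.
k = 3
x_stage = {1: 0.2, 2: 0.4, 3: 0.1}
scaled = {v: min(1.0, k * xv) for v, xv in x_stage.items()}
print(sorted(round_vertex_cover(scaled, [(1, 2)])))   # prints [1, 2]
```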
Facility location problems. In the k-stage uncapacitated facility location (UFL) problem, we are given some candidate facility locations, a set of clients, and a probability distribution on the client demands that evolves over k stages. In each stage, one can buy facilities, paying a certain facility opening cost; in stage k, we know the exact demands and we have to assign each client's demand to an open facility, incurring a client assignment cost. The goal is to minimize the expected total cost. Adapting the procedure in [13], we obtain O(k)-approximation algorithms for k-stage UFL, and for k-stage UFL with penalties, or with soft capacities.

References

[1] K. A. Ariyawansa and A. J. Felt. On a new collection of stochastic linear programming test problems. INFORMS Journal on Computing, 16(3):291–299, 2004.
[2] J. R. Birge and F. V. Louveaux. Introduction to Stochastic Programming. Springer-Verlag, NY, 1997.
[3] M. Charikar, C. Chekuri, and M. Pál. Sampling bounds for stochastic optimization. Proc. RANDOM, 2005. To appear.
[4] S. Dye, L. Stougie, and A. Tomasgard. The stochastic single resource service-provision problem. Naval Research Logistics, 50(8):869–887, 2003. Also appeared as COSOR-Memorandum 99-13, Dept. of Mathematics and Computer Sc., Eindhoven Tech. Univ., Eindhoven, 1999.
[5] A. Gupta, M. Pál, R. Ravi, and A. Sinha. Boosted sampling: approximation algorithms for stochastic optimization. Proc. 36th STOC, pages 417–426, 2004.
[6] A. Gupta, M. Pál, R. Ravi, and A. Sinha. What about Wednesday? Approximation algorithms for multistage stochastic optimization. Proc. 8th APPROX, 2005. To appear.
[7] A. Hayrapetyan, C. Swamy, and É. Tardos. Network design for information networks. Proc. 16th SODA, pages 933–942, 2005.
[8] N. Immorlica, D. Karger, M. Minkoff, and V. Mirrokni. On the costs and benefits of procrastination: approximation algorithms for stochastic combinatorial optimization problems. Proc. 15th SODA, pages 684–693, 2004.
[9] A. J. Kleywegt, A. Shapiro, and T. Homem-De-Mello. The sample average approximation method for stochastic discrete optimization. SIAM J. Optim., 12:479–502, 2001.
[10] R. Ravi and A. Sinha. Hedging uncertainty: approximation algorithms for stochastic optimization problems. Proc. 10th IPCO, pages 101–115, 2004.
[11] A. Shapiro. Monte Carlo sampling methods. In A. Ruszczynski and A. Shapiro, editors, Stochastic Programming, volume 10 of Handbooks in Oper. Res. and Management Sc., North-Holland, Amsterdam, 2003.
[12] A. Shapiro. On complexity of multistage stochastic programs. Optimization Online, 2005. http://www.optimization-online.org/DB_FILE/2005/01/1041.pdf.
[13] D. B. Shmoys and C. Swamy. Stochastic optimization is (almost) as easy as deterministic optimization. Proc. 45th FOCS, pages 228–237, 2004.
[14] C. Swamy and D. B. Shmoys. The sample average approximation method for 2-stage stochastic optimization. November 2004. http://ist.caltech.edu/~cswamy/papers/SAAproof.pdf.