Sampling-based Approximation Algorithms for Multi-stage Stochastic Optimization*

Chaitanya Swamy (cswamy@ist.caltech.edu)    David B. Shmoys (shmoys@cs.cornell.edu)

Abstract

Stochastic optimization problems provide a means to model uncertainty in the input data, where the uncertainty is modeled by a probability distribution over the possible realizations of the actual data. We consider a broad class of these problems in which the realized input is revealed through a series of stages, and hence are called multi-stage stochastic programming problems. Our main result is to give the first fully polynomial approximation scheme for a broad class of multi-stage stochastic linear programming problems with any constant number of stages. The algorithm analyzed, known as the sample average approximation (SAA) method, is quite simple, and is the one most commonly used in practice. The algorithm accesses the input by means of a "black box" that can generate, given a series of outcomes for the initial stages, a sample of the input according to the conditional probability distribution (given those outcomes). We use this to obtain the first polynomial-time approximation algorithms for a variety of k-stage generalizations of basic combinatorial optimization problems.

1. Introduction

Stochastic optimization problems provide a means to model uncertainty in the input data, where the uncertainty is modeled by a probability distribution over the possible realizations of the actual data. We shall consider a broad class of these problems in which the realized input is revealed through a series of stages, and hence are called multi-stage stochastic programming problems. Multi-stage stochastic linear programming is an area that has received a great deal of attention within the Operations Research community, both in terms of asymptotic convergence results and computational work in a wide variety of application domains.
*A full version is available at ist.caltech.edu/~cswamy/papers/multistage.ps. Chaitanya Swamy: Center for Mathematics of Information, Caltech, Pasadena, CA 91125. David B. Shmoys: Dept. of Computer Science, Cornell University, Ithaca, NY 14853. Research supported partially by NSF grants CCF-0430682, DMI-0500263.

A classic example of such a model seeks to minimize the expected cost of operating a water reservoir, where one can decide, in each time period, the amount of irrigation water to be sold while maintaining the level of the reservoir within a specified range (where penalties are incurred for violating this constraint). The source of uncertainty is, of course, the variability in rainfall, and there is a simulation model that provides a means to sample from the distribution of inputs (of rainfall amounts per time period within the planning horizon) [2]. Observe that it is important to model this as a multi-stage process, rather than as a 2-stage one, since doing so captures essential conditional information: given a drought over the previous period, for example, the next period is more likely to continue these conditions. Furthermore, within multi-stage stochastic linear programming, most work has focused on applications in which there are a small number of stages, including forest planning models, electricity investment planning, bond investment planning, and currency options selection, as discussed in the recent survey of Ariyawansa and Felt [1].

Our main result is to give the first fully polynomial randomized approximation scheme (FPRAS) for a broad class of multi-stage stochastic linear programming problems with any constant number of stages. Although our results are much more general, we shall focus on a canonical example of the class of problems, a 3-stage stochastic variant of the fractional set covering problem. We are given a family of sets over a ground set, and a probability distribution over subsets of the ground set that specifies a target set of ground elements that must be covered.
We can view the three stages as specified by a scenario tree with 3 levels of nodes: the root, internal nodes, and leaves. The root corresponds to the initial state, each leaf is labeled with a target subset of elements that must be covered, and for each node in the tree there is a conditional distribution of the target sets at leaves within this subtree (where we condition on the fact that we have reached that node). One can buy (fractionally) sets at any node, paying a cost that depends both on the set and on the node at which it is bought. We want to be able to compute, given a node in the tree, the desired action, so as to minimize the expected total cost of fractionally covering the realized target set. This problem can be modeled as an exponentially large linear program (LP) in which there is, for each set S and each node in the tree, a variable that indicates the fraction of S that is bought at that node. The constraints say that for each leaf and each ground element e in its corresponding target set, the total fraction bought, along this root-leaf path, of sets that contain e must be at least 1. If we view the probability of reaching a node as specified, it is straightforward to express the expected total cost as a linear function of these decision variables. As a corollary of our FPRAS, we also give the first approximation algorithms for the analogous class of multi-stage stochastic integer programs (IPs), such as the integer version of this set covering problem.
For a rich class of k-stage stochastic linear programming problems, where k is assumed to be constant and not part of the input, we show that, for any ε > 0, we can compute, with high probability, a solution with expected cost guaranteed, for any probability distribution over inputs, to be within a factor (1 + ε) of the optimal expected cost, in time bounded by a polynomial in the input size, 1/ε, and a parameter λ that is an upper bound on the ratio between the cost of the same action (e.g., buying the set S) in successive stages. The algorithm accesses the input by means of a "black-box" (simulation) procedure that can generate, for any node in the scenario tree, a sample of the input according to the conditional distribution for this node. This is an extremely general model of the distribution, since it allows all types of correlated effects within different parts of the input. We improve upon our earlier work [13], which handles the very special case k = 2, not only by being able to handle any fixed number of stages, but also in the following respect: whereas the earlier algorithm is based on the ellipsoid method, we can now show that the algorithm most commonly used in practice, the sample average approximation (SAA) method, also yields the claimed approximation scheme. The algorithm of Shmoys and Swamy [13] for 2-stage problems is based on computing an approximate subgradient with respect to a compact convex programming formulation; this is done by estimating each component of the subgradient sufficiently accurately, and then applying the ellipsoid method using these approximate subgradients. In the sample average approximation method, we merely sample scenarios a given (polynomial) number of times N, and by computing the frequencies of occurrence in these samples, we derive a new LP that is a polynomial-sized approximation to the original exponential-sized LP, and then solve this compact LP explicitly.
We first argue that using (approximate) subgradients one can establish a notion of closeness between two functions (e.g., the objective functions of the "true" LP and the SAA LP), so that if two functions are "close" in terms of their subgradients, then minimizing one function is equivalent to approximately minimizing the other. Next, we show that with a polynomially bounded sample size, the objective functions of the "true" problem and the sample-average problem satisfy this "closeness-in-subgradients" property with high probability, and therefore minimizing the sample-average problem yields a near-optimal solution to the true problem; thus we prove the polynomial-time convergence of the SAA method. Our proof does not rely on anything specific to discrete probability distributions, and therefore extends to the case of continuous distributions.

Compare now the 3-stage and 2-stage problems. In the 2-stage fractional set-covering problem, the compact convex program has variables corresponding only to the decisions made at the root to (fractionally) buy sets. Each component of the subgradient at the current point can be estimated by sampling a leaf from the scenario tree and using the optimal dual solution for the linear program that minimizes the cost to cover each element in this leaf's target set to the extent it is not already covered by the root variables. In the 3-stage version, a 2-stage stochastic LP plays the analogous role of this linear program, and we need to obtain a near-optimal dual solution for this exponentially large mathematical program to show the closeness property. Moreover, one difficulty that is not encountered in the 2-stage case is that this 2-stage recourse LP is different in the sample-average and the "true" problems, since the conditional distribution of scenarios given a second-stage outcome is only approximated in the sample-average problem.
Thus to show the closeness property one has to argue that solving the dual of the sample-average 2-stage recourse LP yields a near-optimal solution to the "true" 2-stage recourse LP. We introduce a novel compact non-linear formulation of this dual for which we can prove such a statement, and thereby obtain the "closeness-in-subgradients" property for the 3-stage problem. In fact, this formulation yields a new means to provide lower bounds on 2-stage stochastic LPs, which might be of interest in its own right. The analogous idea can be applied inductively to obtain the FPRAS for any fixed number of stages. We believe that our proof is of independent interest and that our approach of using subgradients will find applications in proving convergence results in other stochastic models as well.

Due to its simplicity and its use in practice, the SAA method has been studied extensively in the stochastic programming literature. Although it has been shown that the SAA method produces solutions that converge to the optimal solution as the number of samples gets sufficiently large (see, e.g., [11] and its references), no results were known that bound the number of samples needed to obtain a (1+ε)-optimal solution by a polynomial in the input size, 1/ε and λ. Prior to our work, for 2-stage stochastic optimization, bounds on the sample size required by the SAA method were proved in [9], but this bound depends on the variance of a certain quantity that need not depend polynomially on the input size or λ. Recently, Nemirovski and Shapiro (personal communication) showed that for 2-stage set cover with non-scenario-dependent second-stage costs, the bound of [9] is a polynomial bound, provided that one applies the SAA method after some preprocessing to eliminate certain first-stage decisions.
For multi-stage problems with arbitrary distributions, to the best of our knowledge, no results were known about the rate of convergence of the sample average approximation to the true optimal solution (with high probability). In fact, we are not aware of any work (even outside of the sample-average approach) that proves worst-case bounds on the sample size required for solving multi-stage stochastic linear programs with arbitrary distributions in the black-box model. Very recently, Shapiro [12] proved bounds on the sample size required by the SAA method for multi-stage problems, under the strong assumption that the distributions in the different stages are independent. In particular, this implies that the distribution of the outcomes in any stage, and hence of the scenarios in the final stage, does not depend on the outcomes in the previous stages, which fails to capture the notion of learning new information about the uncertainty as one proceeds through the stages. Moreover, as in the 2-stage case, the bounds in [12] are not polynomial in the input size or λ, even when the number of stages is fixed. It is important to note that we prove that an optimal solution to the SAA LP is a near-optimal solution to the true LP, not that the optimal value of the SAA LP is a good approximation to the true optimal value. Indeed, one interesting question is to show, for any class of stochastic IPs and LPs, that if one can obtain an approximation algorithm for the case in which there are only a polynomial number of scenarios, then one can also obtain an approximation algorithm for the general case. Subsequent to the dissemination of early versions of our work [14], Charikar, Chekuri and Pál [3] have obtained such a result for 2-stage problems.

There has been a series of recent papers on approximation algorithms for 2-stage stochastic integer programming problems.
Most of this work has focused on more restricted mechanisms for specifying the distribution of inputs [4, 10, 8]; Gupta, Pál, Ravi, and Sinha [5] were the first to consider the "black-box" model, and gave approximation algorithms for various 2-stage problems, but with the restriction that the second-stage costs be proportional to the first-stage costs. Shmoys and Swamy [13] showed that one could derive approximation algorithms for most of the stochastic integer programming problems considered in [4, 10, 8, 5] by adopting a natural LP rounding approach that, in effect, converted an LP-based approximation guarantee for the deterministic analogue into a guarantee for the stochastic generalization (where the performance guarantee degraded by a factor of 2 in the process). An immediate consequence of our approximation scheme for multi-stage stochastic linear programs is that we obtain approximation algorithms for several natural multi-stage stochastic integer programming problems, by extending the rounding approach of [13]. The only other work on multi-stage problems in the black-box model is due to Hayrapetyan, Swamy, and Tardos [7], and Gupta et al. [6] (done concurrently with this work). Both present O(k)-approximation algorithms for a k-stage version of the Steiner tree problem under some restrictions on the costs; the latter also gives algorithms for the k-stage versions of the vertex cover and facility location problems under the same cost restrictions, but their approximation ratio is exponential in k. In contrast, in the black-box model without any cost restrictions, we obtain performance guarantees of O(k log n) for k-stage set cover, 2k for k-stage vertex cover and k-stage multicut on trees, and O(k) for the k-stage facility location problem. Finally, we obtain an FPRAS for a k-stage multicommodity flow problem as a direct consequence of our stochastic linear programming result.

2. Preliminaries

We state some definitions and basic facts that we will use frequently. Let ‖u‖ denote the ℓ2 norm of u.
We say that a function g : ℝ^m → ℝ has Lipschitz constant (at most) K if |g(u) − g(v)| ≤ K‖u − v‖ for all u, v ∈ ℝ^m.

Definition 2.1 We say that d is a subgradient of a function g : ℝ^m → ℝ at the point u if the inequality g(v) − g(u) ≥ d·(v − u) holds for every v ∈ ℝ^m. We say that d̂ is an ω-subgradient of g at u if for every v in the feasible region P, we have g(v) − g(u) ≥ d̂·(v − u) − ω·g(u).

The above definition is slightly weaker than the notion of an ω-subgradient as defined in [13], but it is easy to see that one can also implement the algorithm in [13] using the notion of an approximate subgradient given by Definition 2.1. The following claim will be useful in bounding the Lipschitz constant of the functions encountered.

Claim 2.2 ([13]) Let d(u) denote a subgradient of a function g : ℝ^m → ℝ at each point u. Suppose ‖d(u)‖ ≤ K for every u. Then g has Lipschitz constant (at most) K.

We will consider both convex minimization problems and concave maximization problems, in which we optimize over a polytope P contained in the nonnegative orthant of ℝ^m. Analogous to Definition 2.1, we define a max-subgradient, and an approximate version of it, that we use for concave maximization problems.

Definition 2.3 We say that d is a max-subgradient of a function g at the point u ∈ P if for every point v ∈ P, we have g(v) − g(u) ≤ d·(v − u). We say that d̂ is an (ω, Δ)-max-subgradient of g at u if for every v ∈ P we have g(v) − g(u) ≤ d̂·(v − u) + ω·g(u) + Δ. When P is clear from the context, we omit it, and when Δ = 0 we drop it from the notation; thus we will frequently use (ω, 0)-max-subgradients, which we abbreviate to ω-max-subgradients.

We need the following sampling lemma, which is proved using simple Chernoff bounds.

Lemma 2.4 Let X_1, …, X_N be iid random variables with each X_i ∈ [−a, a], let X = (Σ_i X_i)/N and μ = E[X], and let c > 0 be an arbitrary positive number. If N = Θ((1/c^2)·log(1/δ)), then Pr[|X − μ| > c·a] ≤ δ.

3. The Sample Average Approximation method

Suppose we have a black box that can generate, for any sequence of outcomes for the initial stages, independent samples from the conditional distribution of scenarios given those initial outcomes.
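Lemma 2.4 is a standard Chernoff-Hoeffding-type bound; the following sketch (with an illustrative Hoeffding constant, not one taken from the paper) shows the estimator it analyzes: averaging N = O((1/c^2)·log(1/δ)) iid samples of a random variable bounded in [−a, a] puts the empirical mean within ±c·a of the true mean with probability at least 1 − δ.

```python
import math
import random

def sample_mean_estimate(sampler, c, delta):
    """Average N = O((1/c^2) * log(1/delta)) iid samples of a bounded
    random variable, as analyzed in Lemma 2.4.  The constant 2 below
    comes from Hoeffding's inequality for variables in [-a, a] and is
    illustrative; it guarantees |estimate - mean| <= c*a w.p. >= 1-delta.
    """
    n = math.ceil((2.0 / c ** 2) * math.log(2.0 / delta))
    return sum(sampler() for _ in range(n)) / n, n

random.seed(0)
# X uniform on [-1, 1], so a = 1 and the true mean is 0.
est, n = sample_mean_estimate(lambda: random.uniform(-1.0, 1.0),
                              c=0.05, delta=0.01)
```

The key point for the black-box model is that n depends only on c and δ, not on any structural property of the sampled distribution.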
A natural approach to computing near-optimal solutions given such sampling access is the sample average approximation (SAA) approach: sample some number N of times from the distribution on scenarios, estimate the actual distribution by the distribution induced by the samples, and solve the multi-stage problem specified by this approximate distribution. For 2-stage programs, we just estimate the probability of a scenario by its frequency in the sampled set. For k-stage programs we construct an approximate k-level distribution tree by sampling repeatedly at each level: we sample the stage-2 outcomes some number of times; for each sampled outcome we sample from the conditional distribution given that outcome, and so on; and for each sampled outcome we estimate its conditional probability of occurrence, given the previous-stage outcome, by its frequency in the sampled set. The multi-stage problem specified by this approximate distribution is called the sample average problem, and its objective function is called the sample average function. If the total number of samples T is polynomially bounded, then since the approximate distribution has support of size at most T, the sample average problem can be solved efficiently by solving a polynomial-size linear program. The issue here is the sample size T required to guarantee that every optimal solution to the sample-average problem is a near-optimal solution to the original problem with high probability. We show that for any fixed k (which is not part of the input), for a large class of k-stage stochastic LPs, one can bound T by a polynomial in the input size, the inverse of the desired accuracy, and the maximum ratio λ between the cost of an action in successive stages. Intuitively, to prove such a theorem, we need to show that the sample-average function is a close approximation to the true function in some sense.
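The tree-construction step just described can be sketched as follows; the black-box interface `sample_outcome(history)` and the rainfall toy model are hypothetical stand-ins for the paper's sampling oracle.

```python
import random

def build_saa_tree(sample_outcome, history, stages_left, n_samples):
    """Build the approximate distribution tree used by the SAA method:
    at each node, draw n_samples next-stage outcomes from the black box
    and estimate each conditional probability by its sample frequency."""
    if stages_left == 0:
        return {}
    counts = {}
    for _ in range(n_samples):
        o = sample_outcome(history)
        counts[o] = counts.get(o, 0) + 1
    return {o: (c / n_samples,
                build_saa_tree(sample_outcome, history + (o,),
                               stages_left - 1, n_samples))
            for o, c in counts.items()}

random.seed(1)

def weather(history):
    # Correlated rainfall toy model: a dry period tends to stay dry.
    p_dry = 0.8 if (history and history[-1] == "dry") else 0.3
    return "dry" if random.random() < p_dry else "wet"

# Two sampled levels below the root, i.e. a 3-stage scenario tree.
tree = build_saa_tree(weather, (), stages_left=2, n_samples=200)
```

The empirical conditional probabilities at each node sum to 1 by construction, so the sampled tree is itself a valid finite-support multi-stage distribution, which is exactly what the sample average problem optimizes over.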
One obvious approach would be to argue that, with high probability, the values of the sample average function and the true function are close to each other at a sufficiently dense set of points. This, however, immediately runs into problems, since the variance in the scenario costs could be quite (indeed, exponentially) large, so one cannot estimate the true function value, that is, the expected scenario cost, to within a reasonable accuracy using a small (polynomial) number of samples. The basic problem is that there could be very low-probability outcomes that contribute significantly towards the cost in the true problem, but will almost never be sampled with only a polynomial number of samples (so they contribute nothing to the sample average function). The key insight is that such rare outcomes do not much influence the optimal first-stage decisions, since one would defer decisions for such outcomes until later. The minimizer of a convex function is determined by its "slope" (i.e., its gradient or subgradient), which suggests that perhaps we should compare the slopes of the sample-average and the true objective functions, show that they are close to each other, and argue that this is sufficient to prove the near-equivalence of the corresponding minimization problems. Our proof builds upon this intuition. A subgradient is the analogue of a gradient for a non-differentiable function, and is a measure of the "slope" of the function. We identify a notion of closeness between any two functions based on their (approximate) subgradients, so that if two functions are close under this criterion, then minimizing one is approximately equivalent to minimizing the other. Next, we show that the objective functions of the original multi-stage problem and of the sample average problem with polynomially bounded sample size satisfy this "closeness-in-subgradients" property, and thus we obtain the desired result.

Proof details. The proof is organized as follows.
First, in Lemma 3.1 we show that given functions g and ĝ that agree in terms of their (approximate) subgradients at the points of a sufficiently dense grid in the polytope P, every optimal solution to min_{x∈P} ĝ(x) is a near-optimal solution to min_{x∈P} g(x). Some intuition about why this closeness-in-subgradients property suffices can be obtained by considering the ellipsoid-based algorithm for convex minimization given in [13]. This algorithm uses only (approximate) subgradient information about the convex function to be minimized, using a subgradient or an ω-subgradient of the function to derive a cut passing through the center of the current ellipsoid at a feasible point and make progress. Suppose that at every feasible point x, there is a vector d̂ that is both a subgradient of ĝ and an ω-subgradient of g at x. One can then use d̂ to generate the cut at x, so that the ellipsoid-based algorithm will run identically on both min_{x∈P} g(x) and min_{x∈P} ĝ(x) and return a point that is simultaneously near-optimal for both objective functions. Lemma 3.1 makes this intuition precise while weakening the assumption and strengthening the conclusion: we only require that at every point x in a sufficiently dense finite set G there be a vector d̂ that is both a subgradient of ĝ and an ω-subgradient of g at x, and we prove that every optimal solution to min_{x∈P} ĝ(x) is a near-optimal solution to min_{x∈P} g(x). Lemma 3.2 proves an analogous result for maximization problems.

The second part of the proof is to show that the objective functions of the true problem and the sample average problem (with polynomially many samples) satisfy this closeness-in-subgradients property. This is divided into three parts. For the class of 2-stage linear programs considered in [13], this is easy to show (Theorem 3.5), because the subgradient at any point is the expectation (according to the scenario distribution) of a quantity derived from the optimal solutions to the dual of the recourse LP for each scenario, and this recourse LP is the same in both the sample-average and the true problems.
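A toy 1-dimensional illustration of why closeness in subgradients suffices (this is not the paper's ellipsoid argument, just the phenomenon it exploits): here g and ĝ are convex and their subgradients coincide everywhere except on the tiny interval between their minimizers, so the exact minimizer of ĝ is automatically near-optimal for g.

```python
def ternary_min(f, lo, hi, iters=200):
    """Minimize a 1-D convex function over [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0

g    = lambda x: abs(x - 2.0) + 1.0    # "true" objective, minimum value 1.0 at x = 2.0
ghat = lambda x: abs(x - 2.05) + 1.0   # "sample average" objective; its subgradients
                                       # (+/- 1) agree with g's outside (2.0, 2.05)
xhat = ternary_min(ghat, 0.0, 10.0)    # optimize ghat only
ratio = g(xhat) / 1.0                  # g-cost of xhat relative to the true optimum
```

Minimizing ĝ exactly returns x̂ ≈ 2.05, whose true cost g(x̂) ≈ 1.05 is within 5% of min g, even though no value of g was ever evaluated while optimizing ĝ.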
Thus, since the subgradient components have bounded variance [13], the closeness property follows. For the k-stage problem, however, one needs several substantial new ideas to show this closeness property, even when k = 3. We introduce these ideas in Section 4 by focusing on 3-stage problems, and in particular on the LP relaxation of 3-stage set cover as an illustrative example. We then generalize these ideas to prove an SAA theorem for a large class of 3-stage linear programs, and in Section 5 inductively apply the arguments to a broad class of k-stage problems. The main difficulty, and the essential difference from the 2-stage case, is that now the recourse problem for each second-stage outcome is a 2-stage stochastic LP whose underlying distribution is only estimated in the sample average problem. So the sample average problem and the true problem solve different recourse problems for each stage-2 outcome. An (approximate) subgradient is obtained from the (approximately) optimal solutions to the dual of the 2-stage recourse LP for each outcome; therefore, to show closeness in subgradients we need to argue that maximizing the sample-average dual yields a near-optimal solution to the true dual, that is, to prove an SAA theorem for the dual of a 2-stage stochastic primal program! Mimicking the approach for the primal problem, we prove this by showing that the two dual objective functions agree in terms of their max-subgradients. However, simply considering the LP dual of the 2-stage primal recourse LP does not work: a max-subgradient of the linear dual objective function is just the constant vector specifying the conditional probabilities of the stage-3 scenarios given the outcome in stage 2, and one cannot estimate the true conditional distribution using only a polynomial number of samples, in particular because rare scenarios will almost never be sampled.
To circumvent this problem, we introduce a novel compact, non-linear formulation of the dual, which turns the dual objective function into a concave function whose max-subgradient can be computed by solving a 2-stage primal stochastic problem. We use the earlier SAA theorem for 2-stage programs to show that any optimal solution to this 2-stage LP in the sample-average dual is a near-optimal solution to the 2-stage LP in the true dual. This shows that the two dual objective functions (in this new representation) are close in terms of their max-subgradients, thereby proving that an optimal solution to the sample-average dual is a near-optimal solution to the true dual. This in turn establishes the closeness in subgradients of the objective functions of the 3-stage sample average problem and the true 3-stage problem, and yields the SAA theorem.

Sufficiency of closeness in subgradients. Let g, ĝ : ℝ^m → ℝ be two functions with Lipschitz constant (at most) K. Let P, contained in the nonnegative orthant of ℝ^m, be the bounded feasible region, and let R be a radius such that P is contained in the ball B(0, R). Let ω and γ ≤ 1 be two parameters. Choose a grid spacing small enough that moving a point by one grid step changes g and ĝ by at most a small additive error ε′ (so the spacing is polynomially related to ε′, 1/K and 1/m), and let G, the extended ε′-grid of the polytope P, consist of the points of P whose coordinates are all integer multiples of this spacing, together with enough additional points that every x ∈ P has a point x′ ∈ G within one grid step of it.

We first consider minimization problems. We say that functions g and ĝ satisfy property (A) if:

    at every point x ∈ G there is a vector d̂ = d̂(x) that is both a subgradient of ĝ at x and an ω-subgradient of g at x.    (A)

Lemma 3.1 Suppose functions g and ĝ satisfy property (A). Let x* and x̂ be points in P that respectively minimize g and ĝ, with g(x*) ≥ 0, and let x̄ be a point such that ĝ(x̄) is within a small additive error of ĝ(x̂). Then both x̂ and x̄ are near-optimal solutions to min_{x∈P} g(x): their g-cost exceeds g(x*) by at most a multiplicative factor that tends to 1 as ω → 0, plus an additive error proportional to ε′.

Proof: We prove the claim for x̂; the claim for x̄ is proved almost identically. Let x′ be the point in G closest to x*, so that ‖x* − x′‖ is at most one grid step and hence |g(x*) − g(x′)| ≤ ε′. Consider the vector d̂ given by property (A) at x′. It must be that d̂·(x̂ − x′) ≤ 0: otherwise, since d̂ is a subgradient of ĝ at x′, we would have ĝ(x̂) ≥ ĝ(x′) + d̂·(x̂ − x′) > ĝ(x′), contradicting the optimality of x̂ for ĝ. Applying the ω-subgradient inequality for g at the relevant grid points in a similar fashion, together with the Lipschitz bound, and treating separately the case in which x̂ itself lies close to a grid point, yields the claimed bounds; we defer the details to the full version of the paper.

Lemma 3.2 states an analogous result for maximization problems. We say that g and ĝ satisfy property (B) if:

    at every point x ∈ G there is a vector d̂ = d̂(x) that is both a max-subgradient of ĝ at x and an (ω, Δ)-max-subgradient of g at x.    (B)

Lemma 3.2 Suppose functions g and ĝ satisfy property (B). Let x* and x̂ be points in P that respectively maximize g and ĝ, with g(x*) ≥ 0. Then x̂ is a near-optimal solution to max_{x∈P} g(x), up to a multiplicative factor that tends to 1 as ω → 0 and an additive error proportional to ε′ + Δ.

Lemma 3.3 Let G be the extended ε′-grid of P. Then log |G| is bounded by a polynomial in m and log(R/ε′).

The SAA bound for 2-stage programs. We now prove a polynomial SAA bound for the class of 2-stage programs considered in [13] (the class was stated there with some extra constraints, but these are handled below). The true problem is

    min_{x∈P}  h(x) = w·x + Σ_A p_A f_A(x),    (P)

where the sum ranges over scenarios A, p_A is the probability of scenario A, and f_A(x) is the optimal value of the recourse LP for scenario A, whose cost vector is w_A. Here (a) the recourse costs are nonnegative for every scenario A, and (b) for every x ∈ P, f_A(x) ≥ 0 and the primal and dual problems corresponding to f_A(x) are feasible for every scenario A. It is assumed that w_A ≤ λ·w componentwise for every scenario A, where λ is polynomially bounded. Let OPT denote the optimum value and I denote the size of the input. The sample average problem is to minimize, over x ∈ P, the sample average function ĥ(x) = w·x + Σ_A (N_A/N)·f_A(x), where N is the total number of samples and N_A is the number of times scenario A is sampled.

Lemma 3.4 Let d be a subgradient of h at the point x, and suppose that d̂ is a vector such that |d̂_S − d_S| ≤ ω·w_S for every component S. Then d̂ is an O(ω)-subgradient of h at x.

It is shown in [13] that at any point x ∈ P, if one takes optimal solutions to the duals of the recourse LPs f_A(x), then (i) the vector d obtained by combining w with the p_A-weighted contributions of these dual solutions is a subgradient of h at x; (ii) for any component S and any scenario A, the contribution of scenario A to component S of this vector is bounded in magnitude by λ·w_S; and therefore (iii) ‖d‖ ≤ λ‖w‖. The same construction with the sampled distribution yields a vector d̂ that is a subgradient of ĥ at x with ‖d̂‖ ≤ λ‖w‖, so (by Claim 2.2) the Lipschitz constant of ĥ is at most K = λ‖w‖. Observe that d̂ is just the defining expression of d averaged over the random scenarios sampled to construct ĥ, and E[d̂] = d, where the expectation is over these random samples.

Theorem 3.5 For any ε, δ > 0, with probability at least 1 − δ, any optimal solution x̂ to the sample average problem constructed with poly(I, λ, 1/ε, log(1/δ)) samples satisfies h(x̂) ≤ (1 + ε)·OPT + ε.

Proof: We show that h and ĥ satisfy property (A) with probability 1 − δ with the stated sample size; the rest follows from Lemma 3.1. Let G be the extended grid with a suitably small, polynomially bounded spacing parameter; note that log |G| is polynomially bounded in the input size. Using Lemma 2.4, if we sample poly(λ, 1/ω, log(|G|/δ)) times to construct ĥ, then at a given point x ∈ G the subgradient d̂ of ĥ is component-wise close to its expectation d with probability at least 1 − δ/|G|, so by Lemma 3.4, d̂ is an O(ω)-subgradient of h at x with high probability. Taking the union bound over the points of G, with probability at least 1 − δ, d̂ is an O(ω)-subgradient of h at every point x ∈ G, that is, property (A) holds; using Lemma 3.3 to bound |G| gives the stated sample size.

Under the mild assumption that (a) the point x = 0 (i.e., deferring all decisions to stage 2) lies in P, and (b) for every scenario A, either f_A(x) is minimized at x = 0, or its minimum value is bounded away from zero, it was shown in [13] that by sampling poly(λ, log(1/δ)) times initially, one can detect with probability at least 1 − δ either that x = 0 is an optimal solution to (P), or that OPT is large enough for the additive error above to be charged to OPT. So if we detect that OPT is large, then we can set the parameters appropriately and obtain a (1 + ε)-optimal solution with high probability using the SAA method with poly(I, λ, 1/ε, log(1/δ)) samples.

4. 3-stage stochastic programs

3-stage stochastic set cover. Our techniques yield a polynomial-sample bound for a broad class of 3-stage programs, but before considering a generic 3-stage program, we introduce and explain the main ideas involved by focusing on the 3-stage stochastic set cover problem. In the stochastic set cover problem, we are given a universe U of elements and a family of subsets of U, and the set of elements to be covered is determined by a probability distribution. In the 3-stage problem this distribution is specified by a 3-level tree. We use i to denote an outcome in stage 2, and (i, A) to denote a stage-3 scenario in which i was the stage-2 outcome. Let Ω be the set of all stage-2 outcomes, and for each i ∈ Ω let S_i = {A : (i, A) is a scenario}.
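Stepping back briefly before the 3-stage development: the 2-stage pipeline behind Theorem 3.5 can be made concrete on a toy fractional set-cover instance. All data below are illustrative; with disjoint singleton sets the sample-average LP separates per set and is solved by a simple threshold rule, so no LP solver is needed in this sketch.

```python
import random

# Toy 2-stage fractional set cover: disjoint singleton sets, with
# stage-2 costs inflated by a factor LAM (the paper's lambda).
SETS = {"S1": {"e1"}, "S2": {"e2"}}
W = {"S1": 1.0, "S2": 3.0}
LAM = 10.0
TRUE_SCEN = [({"e1"}, 0.68), ({"e1", "e2"}, 0.02), (set(), 0.30)]

def recourse_cost(x, scenario):
    """f_A(x): cheapest stage-2 completion of the cover (exact here
    because the sets are disjoint singletons)."""
    cost = 0.0
    for e in scenario:
        s = next(s for s in SETS if e in SETS[s])
        cost += max(0.0, 1.0 - x[s]) * LAM * W[s]
    return cost

def true_cost(x):
    return sum(W[s] * x[s] for s in SETS) + \
           sum(p * recourse_cost(x, A) for A, p in TRUE_SCEN)

def saa_solve(n_samples, seed=0):
    """SAA method: sample scenarios, then minimize the sample-average
    objective.  Separability makes the minimizer a per-set threshold:
    buy set s in stage 1 iff its expected stage-2 cost exceeds W[s]."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        u = rng.random()
        samples.append({"e1"} if u < 0.68 else
                       ({"e1", "e2"} if u < 0.70 else set()))
    x = {}
    for s in SETS:
        e = next(iter(SETS[s]))
        freq = sum(1 for A in samples if e in A) / n_samples
        x[s] = 1.0 if LAM * W[s] * freq > W[s] else 0.0
    return x

xhat = saa_solve(400)
```

Here the true optimum buys S1 up front (element e1 is frequent) and defers S2 (element e2 is rare), for an expected cost of 1 + 0.02·30 = 1.6; a few hundred samples suffice for the SAA solution to match it, even though the rare scenario is estimated only crudely.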
Let p_i and p_{i,A} denote the probabilities of outcome i and scenario (i, A) respectively, and let q_{i,A} = p_{i,A}/p_i be the conditional probability of A given i. Note that Σ_{A∈S_i} q_{i,A} = 1 for every i. We have to cover the (random) set of elements E(i, A) in scenario (i, A), and we can buy a set S in stage 1, in stage-2 outcome i, or in scenario (i, A), incurring costs w_S, w^i_S and w^{i,A}_S respectively. We use x, y_i and z_{i,A} respectively to denote the decisions in stage 1, outcome i and scenario (i, A), and consider the following fractional relaxation:

    min  h(x) = w·x + Σ_{i∈Ω} p_i f_i(x)   subject to x ≥ 0,    (3SSC-P)

where the recourse cost f_i(x) of outcome i is given by the 2-stage problem

    f_i(x) = min { w^i·y_i + Σ_{A∈S_i} q_{i,A}·g_{i,A}(x, y_i) : y_i ≥ 0 },    (3Rec-P)

and g_{i,A}(x, y_i) is the minimum cost of a fractional purchase z_{i,A} ≥ 0 that, together with x and y_i, covers each element of E(i, A) to an extent of at least 1. Let λ = max( max_{i,S} w^i_S/w_S, max_{i,A,S} w^{i,A}_S/w^i_S ) and OPT = min_x h(x). The total sample size in the sample average problem is N·N′, where (i) N is the sample size used to estimate the probability p_i by the frequency p̂_i, and (ii) N′ is the number of samples generated from the conditional distribution of scenarios in S_i, for each i with p̂_i > 0, to estimate q_{i,A} by q̂_{i,A}. The sample average problem is similar to (3SSC-P), with p̂_i replacing p_i, and q̂_{i,A} replacing q_{i,A} in the recourse problem. We use f̂_i(x) and ĥ(x) = w·x + Σ_i p̂_i f̂_i(x) to denote the sample average recourse problem for outcome i and the sample average function, respectively.

As mentioned earlier, the main difficulty in showing that the sample average and the true functions satisfy the closeness-in-subgradients property is that these two problems now solve different recourse problems, f̂_i(x) and f_i(x) respectively, for an outcome i. Since the subgradient is obtained from a dual solution, this entails first proving an SAA theorem for the dual, showing that solving the dual of f̂_i(x) yields a near-optimal solution to the dual of f_i(x). To achieve this, we first formulate the dual as a compact concave maximization problem, then show that by slightly modifying the two dual programs, the dual objective functions become close in terms of their max-subgradients, and then use Lemma 3.2 to obtain the required SAA theorem (for the duals). A max-subgradient of the dual objective function is obtained from the optimal solution of a 2-stage primal problem, and we use Theorem 3.5 to prove the closeness in max-subgradients of the sample average dual and the true dual.
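The nested structure of (3SSC-P)/(3Rec-P) can be seen in a minimal toy instance: one set covering one element, where each recourse problem is piecewise linear in a single variable and can be solved by checking its breakpoints. The instance data below are illustrative, not taken from the paper.

```python
def recourse_3stage(x, w2, w3, q_demand):
    """f_i(x) = min over 0 <= y <= 1-x of  w2*y + q_demand*w3*(1-x-y):
    buy y now at cost w2, or leave z = 1-x-y for stage 3, paid at cost
    w3 only in the (conditional probability q_demand) scenario that
    demands the element.  Linear in y, so the optimum is at a breakpoint."""
    best = float("inf")
    for y in (0.0, max(0.0, 1.0 - x)):
        z = max(0.0, 1.0 - x - y)
        best = min(best, w2 * y + q_demand * w3 * z)
    return best

def h(x, w1, outcomes):
    """Expected cost w1*x + sum_i p_i * f_i(x); `outcomes` lists
    (p_i, w2_i, w3_i, q_demand_i) for each stage-2 outcome i."""
    return w1 * x + sum(p * recourse_3stage(x, w2, w3, q)
                        for p, w2, w3, q in outcomes)

# Drought-style correlation: after a "dry" stage-2 outcome, the stage-3
# scenario that demands the element is much more likely.
outcomes = [(0.4, 2.0, 4.0, 0.9),   # dry: demand with conditional prob 0.9
            (0.6, 2.0, 4.0, 0.1)]   # wet: demand with conditional prob 0.1
cost_defer = h(0.0, 1.5, outcomes)  # buy nothing in stage 1
cost_buy   = h(1.0, 1.5, outcomes)  # buy the whole set in stage 1
```

With these numbers, deferring is optimal (expected cost 0.4·2 + 0.6·0.4 = 1.04 versus 1.5): in the dry outcome one buys in stage 2, while in the wet outcome one gambles on stage 3. Replacing p_i and q_demand by sampled frequencies p̂_i, q̂_{i,A} turns h into the sample average function ĥ.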
In Section 5 we show that this argument can be applied inductively to prove an SAA bound for a large class of k-stage stochastic LPs.

Let f_A(x; W) (respectively f̂_A(x; W)) denote the recourse problem f_A(x) (respectively f̂_A(x)) with the stage-2 costs w^A replaced by W, that is, f_A(x; W) = min_{y_A ≥ 0} ( W·y_A + Σ_{B∈ℬ_A} q_{A,B} f_{A,B}(x + y_A) ). We formulate the following dual of the true and sample average recourse problems:

  LD_A(x) = max_{α∈D} l_A(x; α)   and   L̂D_A(x) = max_{α∈D} l̂_A(x; α),

where l_A(x; α) and l̂_A(x; α) are obtained from f_A(x; ·) and f̂_A(x; ·) respectively, with costs determined by α.

Lemma 4.1 At any point x ∈ P and outcome A, f_A(x) = LD_A(x) and f̂_A(x) = L̂D_A(x).

Lemma 4.2 Fix x ∈ P. Let α_A be a near-optimal solution to LD_A(x) for every A ∈ 𝒜. Then, (i) the vector d obtained from the α_A is an ω-subgradient of h(·) at x, with ω determined by the near-optimality guarantee; (ii) if d̂ is a vector with d_S − ω·w^1_S ≤ d̂_S ≤ d_S for all S, then d̂ is an ω-subgradient of h(·) at x.

Lemma 4.1 proves strong duality (in this new dual representation), which is used by Lemma 4.2. Lemma 4.2 also shows that any optimal solution α̂_A to L̂D_A(x) yields a subgradient of ĥ(·) at x, so to prove the closeness in subgradients of h(·) and ĥ(·) it suffices to argue that each α̂_A is a near-optimal solution to LD_A(x). (Note that both h(·) and ĥ(·) have Lipschitz constant at most K.) We could try to argue this by showing that l_A(x; ·) and l̂_A(x; ·) are close in terms of their ω-subgradients (i.e., satisfy property (B)); however, some technical difficulties arise here. An ω-subgradient of l_A(x; ·) at α is obtained from a solution to the 2-stage problem given by f_A(x; α), and to show closeness in ω-subgradients at α we need to argue that an optimal solution to the sample average problem f̂_A(x; α) is a near-optimal solution to f_A(x; α). But this need not be true, since the ratio of the second- and first-stage costs in the 2-stage problem f_A(x; α) could be unbounded. To tackle this, we consider instead the modified dual problems max_{α∈D} l_A(x; α; ρ) and max_{α∈D} l̂_A(x; α; ρ) for a suitable ρ > 0, and define h_ρ(·) and ĥ_ρ(·) to be the corresponding analogues of h(·) and ĥ(·). As in Lemma 4.2, one can show that near-optimal solutions to the modified dual for every A ∈ 𝒜 yield an approximate subgradient of h_ρ(·) at x. But now the cost ratio in the underlying 2-stage problem is at most 2λ for any α ∈ D, and this gives the SAA bound stated in Lemma 4.3 below; we present the proof after Theorem 4.6. Using Lemma 4.3, we show the closeness in subgradients of h_ρ(·) and ĥ_ρ(·), and this suffices to show that if x̂ minimizes ĥ(·) then it is a near-optimal solution to (3SSC-P).
Lemma 4.3 For any parameters ρ > 0 and ε > 0, any x ∈ P, and any outcome A, if we use poly(I, λ, 1/ε, ln(1/δ)) samples to construct the recourse problem f̂_A(x), then any optimal solution α̂ to L̂D_A(x) is a near-optimal solution to LD_A(x) with probability at least 1 − δ.

Lemma 4.4 Consider the sample average function ĥ(·) generated using poly(I, λ, 1/ε, ln(1/δ)) samples from stage 2, and as many samples from stage 3 for each outcome A with p̂_A > 0. At any point x ∈ P, any subgradient d̂ of ĥ_ρ(·) is an ω-subgradient of h_ρ(·) with probability at least 1 − δ.

Claim 4.5 For any x ∈ P, the values h(x) and h_ρ(x) differ by at most a term proportional to ρ; similarly for ĥ(x) and ĥ_ρ(x).

Theorem 4.6 For any ε, δ ∈ (0, 1), one can construct ĥ(·) with poly(I, λ, 1/ε, ln(1/δ)) samples, and with probability at least 1 − δ, any optimal solution x̂ to min_{x∈P} ĥ(x) satisfies h(x̂) ≤ (1 + ε)·OPT plus a small additive error term.

Proof: Assume without loss of generality that ε ≤ 1, and choose ρ and ω suitably as functions of ε; note that the Lipschitz constant K is polynomially bounded in the input size. We show that (i) a near-optimal solution to min_{x∈P} ĥ_ρ(x) yields a near-optimal solution to min_{x∈P} ĥ(x), and (ii) minimizing h(·) and ĥ(·) over P is roughly the same as approximately minimizing h_ρ(·) and ĥ_ρ(·) respectively over P. Let x̂ be an optimal solution to min_{x∈P} ĥ(x). By Claim 4.5, ĥ_ρ(x̂) is close to ĥ(x̂), and min_{x∈P} h_ρ(x) is close to OPT. Let G be the extended grid of P, whose size is polynomial in I and 1/ε, where we use Lemma 3.3 to bound K. We construct ĥ(·) using the sample bounds of Lemma 4.4; since every parameter involved is polynomially bounded, Lemma 4.3 shows that the total number of samples is polynomial as well. Using Lemma 4.4 and the union bound over all points in G, with probability at least 1 − δ, at every point x ∈ G, any subgradient d̂ of ĥ_ρ(·) is an ω-subgradient of h_ρ(·). So by parts (i) and (ii) of Lemma 3.1, x̂ approximately minimizes h_ρ(·), and hence also ĥ_ρ(·), with high probability (since ĥ_ρ(x̂) is close to ĥ(x̂)). Combining this with the bound on min_x h_ρ(x), the bounds of Claim 4.5, and plugging in the choices of ρ and ω, we get that h(x̂) ≤ (1 + ε)·OPT plus a small additive error.

Under the very mild assumption that for every scenario (A, B) with p_{A,B} > 0, the costs in successive stages are within a factor λ of each other componentwise, we have the following.

Lemma 4.7 By sampling polynomially many times, one can detect with probability at least 1 − δ that either x = 0 is an ε-optimal solution to (3SSC-P), or that OPT is at least a polynomially related threshold. Thus, as in the 2-stage case, we can obtain a (1 + ε)-optimal solution with the SAA method (with high probability) using polynomially many samples.

Proof of Lemma 4.3: Fix the point x ∈ P and the outcome A.
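The union-bound bookkeeping used in the proof above follows a standard pattern: passing from one point to a grid of G points costs only an additive logarithmic factor in the per-point sample bound. A small arithmetic sketch of our own (the constant `C` and the log-form sample bound are placeholders, not the paper's bounds):

```python
import math

# If each grid point fails with probability at most delta', and there are
# grid_size points, taking delta' = delta / grid_size makes the overall
# failure probability at most delta. A per-point sample bound of the form
# N(delta') = C * log(1/delta') then grows only by roughly C * log(grid_size).

def per_point_samples(C, delta_prime):
    return math.ceil(C * math.log(1.0 / delta_prime))

def grid_samples(C, delta, grid_size):
    # split the failure probability evenly over the grid (union bound)
    return per_point_samples(C, delta / grid_size)

single = per_point_samples(C=50, delta_prime=0.01)          # one point
whole_grid = grid_samples(C=50, delta=0.01, grid_size=10**6)  # a million points
print(single, whole_grid)   # the second exceeds the first only additively
```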
Clearly we may assume that the same modified costs are used in the two 2-stage problems f_A(x; ·) and f̂_A(x; ·). We want to show that if α̂ solves L̂D_A(x), then α̂ is a near-optimal solution to LD_A(x) with high probability. By a now-familiar approach, we will show that l̂_A(x; ·) and l_A(x; ·) are close in terms of their ω-subgradients and then use Lemma 3.2. We only consider ω-subgradients over the dual feasible region D, so we drop the explicit dependence on D from now on. An ω-subgradient of l_A(x; ·) (respectively l̂_A(x; ·)) at α is obtained from the solution to the 2-stage problem f_A(x; α) (respectively f̂_A(x; α)):

Lemma 4.8 Fix x ∈ P and α ∈ D. If (y, z) is a near-optimal solution to the 2-stage problem f_A(x; α), then the corresponding vector is an ω-subgradient of l_A(x; ·) at α.

We can bound the Lipschitz constant of l_A(x; ·) and l̂_A(x; ·) polynomially. Since α ∈ D, the ratio of costs in the two stages of the 2-stage problem f_A(x; α) is at most 2λ. Choose the grid spacing and error parameters suitably, and observe that each is polynomially bounded. Recall that α̂ is an optimal solution to L̂D_A(x). Let G′ be the extended grid of D. By Theorem 3.5, if we use polynomially many samples from the conditional distribution of scenarios in ℬ_A to construct f̂_A, then with the failure probability shared over G′, at a given point α, any optimal solution to f̂_A(x; α) is a near-optimal solution to f_A(x; α). So by applying Lemma 4.8 and the union bound over all points in G′, with probability at least 1 − δ, at each point α ∈ G′, the ω-subgradient of l̂_A(x; ·) at α is an ω-subgradient of l_A(x; ·) at α. By Lemma 3.2, α̂ is then a near-optimal solution to LD_A(x); since all the parameters involved are polynomially bounded, the total number of samples is poly(I, λ, 1/ε, ln(1/δ)).

A class of solvable 3-stage programs. The above arguments can be adapted to prove an SAA bound for the following broad class of 3-stage stochastic programs:

  min_{x∈P} h(x) = c·x + Σ_{A∈𝒜} p_A f_A(x),   (3G-P)

where f_A(x) = min { c^A·y_A + Σ_{B∈ℬ_A} q_{A,B} f_{A,B}(x, y_A) : y_A ≥ 0 feasible given x }, and f_{A,B}(x, y_A) is the optimal value of a linear program in the stage-3 decisions z_{A,B} ≥ 0 whose constraints are linear in x, y_A and z_{A,B}; here, for every scenario (A, B), (a) the costs are nonnegative, and (b) the constraint coefficients coupling the stages are suitably sign-restricted. Let λ be the maximum ratio of costs in successive stages; we assume that λ is known. Let OPT be the optimum value and I denote the input size. We assume that there is some R with ln R polynomially bounded such that P is contained in the box [0, R]^m, and further we assume that for any x and any outcome A, there is an optimal solution to the recourse problem lying in [0, R]^m as well. These assumptions are fairly mild and unrestrictive; in particular, they hold trivially for the fractional relaxations of 0-1 integer programs and many combinatorial optimization problems.
We obtain the guarantee stated in Theorem 4.9 below; as before, we can use Lemma 4.7 to convert this to a (1 + ε)-guarantee under the mild assumption that 0 ∈ P, and that for every scenario (A, B) either the recourse cost is minimized by the all-zero decisions, or the costs in successive stages are within a factor λ of each other.

Theorem 4.9 For any ε, δ ∈ (0, 1), one can construct the sample average approximation ĥ(·) to (3G-P) with poly(I, λ, 1/ε, ln(1/δ)) samples, and with probability at least 1 − δ, any optimal solution x̂ to min_{x∈P} ĥ(x) satisfies h(x̂) ≤ (1 + ε)·OPT plus a small additive error term.

Remark: Theorem 4.9 also handles programs with additional linear constraints coupling x and y_A in the stage-2 problem.

5. The bound for k-stage stochastic programs

We now extend our techniques to solve k-stage stochastic linear programs. Here k is a constant that is not part of the input; the running time will be exponential in k. In the k-stage problem, the scenario distribution is specified by a k-level tree, called the scenario tree. We start at the root r of this tree at level 1, which represents the first stage. Each node u at level i represents an outcome in stage i. At a leaf node, the uncertainty has completely resolved itself and we know the input precisely. A scenario always refers to a stage-k outcome, i.e., a leaf of the tree. The goal is to choose the first-stage decisions so as to minimize the sum of the first-stage cost and the expected cost of the later stages, where the expectation is taken over all scenarios. Let child(u) be the set of all children of u, and path(u) be the set of all ancestors of u (including u). Let p_u be the probability that outcome u occurs, and q_u be the conditional probability that u occurs given the previous-stage outcome. The distribution can be arbitrary, and can incorporate correlation effects from previous stages. Let w^u denote the costs in outcome u, y_u be the decisions taken in outcome u, and Y_u = (y_v)_{v ∈ path(û)}, where û is u's parent, be all the decisions taken in the previous stages; we write x = y_r for the root r. We consider the following generic k-stage LP.
  min_{x∈P} h(x) = w^r·x + Σ_{u∈child(r)} q_u f_u(Y_u),   (kG-P)

where, for a node u at level i < k, f_u(Y_u) = min { w^u·y_u + Σ_{v∈child(u)} q_v f_v(Y_v) : y_u ≥ 0 feasible given Y_û }, and for a leaf u, f_u(Y_u) is the optimal value of a linear program in the stage-k decisions.

We need that for every u with parent û and feasible decisions Y_û, the recourse problem f_u(·) is feasible and has nonnegative value, and that there is some R with ln R polynomially bounded such that each problem f_u(Y_u) has an optimal solution in [0, R]^m. Let I be the input size, and λ be the maximum ratio of costs in successive stages. The sample average problem is of the same form as (kG-P), where the probability q_u is estimated by the frequency q̂_u of outcome u in the appropriate sampled set. The SAA bound for programs of the form (kG-P) follows by induction. Theorem 3.5 supplies the base case; the induction step, where we show that an SAA bound for (k−1)-stage programs of the form f_u yields an SAA bound for (kG-P), follows by dovetailing the arguments in Section 4. As in the 3-stage case, we formulate a concave-maximization problem LD_u that is dual to f_u, which is a max-min problem with a (k−1)-stage primal problem embedded inside it. Using the induction hypothesis, we argue that the objective functions of the (slightly modified) true and sample-average dual programs are close in terms of their ω-subgradients, so that any optimal solution to the (modified) sample-average dual is a near-optimal solution to the (modified) true dual. This in turn allows us to show (essentially) the closeness in subgradients of the sample average and true functions. Lemma 3.1 completes the induction step. We obtain the following result, which also yields a (1 + ε)-guarantee under some mild assumptions.

Theorem 5.1 For any ε, δ ∈ (0, 1), with probability at least 1 − δ, any optimal solution x̂ to the sample average problem constructed using a number of samples that is polynomial in I, λ, 1/ε and ln(1/δ) (for fixed k) satisfies h(x̂) ≤ (1 + ε)·OPT plus a small additive error term.

6. Applications

We consider a number of k-stage stochastic optimization problems, where k is a constant, for which we prove the first known performance guarantees. Our algorithms do not assume anything about the distribution or the cost structure of the input. Previously, algorithms for these problems were known only in the 2-stage setting, initially in restricted settings [10, 8, 5], and later without any restrictions [13].
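For intuition, once all decisions are held fixed, the objective of a program like (kG-P) is just an expectation computed by backward recursion over the scenario tree: the cost at a node is its own stage cost plus the conditional-probability-weighted costs of its children. A toy sketch of our own (node names and numbers are made up):

```python
# With all decisions fixed, the expected cost decomposes over the tree:
#   cost(u) = (stage cost at u) + sum over children v of q_v * cost(v).

tree = {                       # node -> (stage cost w^u . y_u, {child: q_v})
    'r':  (10.0, {'u1': 0.6, 'u2': 0.4}),
    'u1': (4.0,  {'l1': 0.5, 'l2': 0.5}),
    'u2': (6.0,  {}),
    'l1': (2.0,  {}),
    'l2': (8.0,  {}),
}

def expected_cost(node):
    """Backward recursion over the scenario tree."""
    stage_cost, children = tree[node]
    return stage_cost + sum(q * expected_cost(v) for v, q in children.items())

print(expected_cost('r'))   # ~17.8 = 10 + 0.6*(4 + 0.5*2 + 0.5*8) + 0.4*6
```

In the sample average problem, the true conditional probabilities q_v in this recursion are simply replaced by the empirical frequencies q̂_v.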
For each of these k-stage problems, we can write a k-stage LP for the linear relaxation of the problem to which Theorem 5.1 applies, and round the near-optimal fractional solution obtained by solving the sample average problem using an extension of the rounding scheme in [13].

Multicommodity flow. We consider a stochastic version of the concurrent multicommodity flow problem where we have to buy capacity to install on the edges so that one can concurrently ship the demand of each commodity i from its source s_i to its sink t_i. The demand is uncertain and is revealed in k stages. We can buy capacity on edge e in any stage-i outcome u at a cost of w^u_e, and the total amount of capacity that we can install on an edge is limited by its capacity u_e. The goal is to minimize the expected capacity installation cost. We obtain a (1 + ε)-optimal solution.

Covering problems. We consider the k-stage versions of set cover, vertex cover and the multicut problem on trees. In each of these problems, there are elements in some universe that need to be covered by sets. In the k-stage stochastic problem, the target set of elements to cover is determined by a probability distribution, and becomes known after a sequence of k stages. In each outcome u, we can purchase a set S at a price of w^u_S. We have to determine which sets to buy in each stage so as to minimize the expected total cost of buying sets. We can generalize the rounding theorem in [13] to show that a ρ-approximation algorithm for the deterministic analogue that uses the natural LP relaxation as a lower bound yields a kρ-approximation algorithm for the k-stage problem. In general, to compute the decisions in a stage-i outcome, we solve a (k − i + 1)-stage problem, and round the solution. We get performance guarantees of O(k log n) for k-stage set cover, and 2k for k-stage vertex cover and k-stage multicut on trees.
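To illustrate the flavor of the rounding step (a simplified sketch of our own, not the actual scheme of [13]): if an element is fractionally covered to total extent at least 1 across the k stages, then some stage covers it to extent at least 1/k; scaling that stage's fractional solution by k and applying the standard deterministic threshold rounding then loses only a factor k on top of the deterministic ratio. For vertex cover, where every element (edge) belongs to exactly two sets (its endpoints), the threshold 1/2 gives the classical deterministic factor 2, hence 2k overall.

```python
# Threshold rounding for fractional vertex cover (illustrative toy instance).

def round_vertex_cover(x, edges, threshold=0.5):
    """Pick every vertex whose fractional value meets the threshold."""
    cover = {v for v, xv in x.items() if xv >= threshold}
    assert all(u in cover or v in cover for (u, v) in edges), "not a cover"
    return cover

# Single-stage: on a triangle, x_v = 0.5 on every vertex is LP-feasible
# (each edge gets 0.5 + 0.5 >= 1), and thresholding at 1/2 yields a cover.
edges = [(1, 2), (2, 3), (1, 3)]
x = {1: 0.5, 2: 0.5, 3: 0.5}
print(sorted(round_vertex_cover(x, edges)))   # prints [1, 2, 3]

# k-stage flavour: a (hypothetical) stage solution covering an element to
# extent >= 1/k is scaled by k before the same threshold step.
k = 3
x_stage = {1: 0.2, 2: 0.4, 3: 0.1}
scaled = {v: min(1.0, k * xv) for v, xv in x_stage.items()}
print(sorted(round_vertex_cover(scaled, [(1, 2)])))   # prints [1, 2]
```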
Facility location problems. In the k-stage uncapacitated facility location (UFL) problem, we are given some candidate facility locations, a set of clients, and a probability distribution on the client demands that evolves over k stages. In each stage, one can buy facilities, paying a certain facility opening cost; in stage k, we know the exact demands and we have to assign each client's demand to an open facility, incurring a client assignment cost. The goal is to minimize the expected total cost. Adapting the procedure in [13], we obtain O(k)-approximation algorithms for k-stage UFL, and for k-stage UFL with penalties, or with soft capacities.

References

[1] K. A. Ariyawansa and A. J. Felt. On a new collection of stochastic linear programming test problems. INFORMS Journal on Computing, 16(3):291–299, 2004.
[2] J. R. Birge and F. V. Louveaux. Introduction to Stochastic Programming. Springer-Verlag, NY, 1997.
[3] M. Charikar, C. Chekuri, and M. Pál. Sampling bounds for stochastic optimization. Proc. RANDOM, 2005. To appear.
[4] S. Dye, L. Stougie, and A. Tomasgard. The stochastic single resource service-provision problem. Naval Research Logistics, 50(8):869–887, 2003. Also appeared as COSOR-Memorandum 99-13, Dept. of Mathematics and Computer Sc., Eindhoven Tech. Univ., Eindhoven, 1999.
[5] A. Gupta, M. Pál, R. Ravi, and A. Sinha. Boosted sampling: approximation algorithms for stochastic optimization. Proc. 36th STOC, pages 417–426, 2004.
[6] A. Gupta, M. Pál, R. Ravi, and A. Sinha. What about Wednesday? Approximation algorithms for multistage stochastic optimization. Proc. 8th APPROX, 2005. To appear.
[7] A. Hayrapetyan, C. Swamy, and É. Tardos. Network design for information networks. Proc. 16th SODA, pages 933–942, 2005.
[8] N. Immorlica, D. Karger, M. Minkoff, and V. Mirrokni. On the costs and benefits of procrastination: approximation algorithms for stochastic combinatorial optimization problems. Proc. 15th SODA, pages 684–693, 2004.
[9] A. J. Kleywegt, A. Shapiro, and T. Homem-De-Mello. The sample average approximation method for stochastic discrete optimization. SIAM J. Optim., 12:479–502, 2001.
[10] R. Ravi and A. Sinha. Hedging uncertainty: approximation algorithms for stochastic optimization problems. Proc. 10th IPCO, pages 101–115, 2004.
[11] A. Shapiro. Monte Carlo sampling methods. In A. Ruszczynski and A. Shapiro, editors, Stochastic Programming, volume 10 of Handbooks in Oper. Res. and Management Sc., North-Holland, Amsterdam, 2003.
[12] A. Shapiro. On complexity of multistage stochastic programs. Optimization Online, 2005. http://www.optimization-online.org/DB_FILE/2005/01/1041.pdf.
[13] D. B. Shmoys and C. Swamy. Stochastic optimization is (almost) as easy as deterministic optimization. Proc. 45th FOCS, pages 228–237, 2004.
[14] C. Swamy and D. B. Shmoys. The sample average approximation method for 2-stage stochastic optimization. November 2004. http://ist.caltech.edu/~cswamy/papers/SAAproof.pdf.