An Approximation Scheme for Stochastic Linear Programming and its Application to Stochastic Integer Programs*

David B. Shmoys†    Chaitanya Swamy‡

Abstract

Stochastic optimization problems attempt to model uncertainty in the data by assuming that the input is specified by a probability distribution. We consider the well-studied paradigm of 2-stage models with recourse: first, given only distributional information about (some of) the data, one commits on initial actions, and then, once the actual data is realized (according to the distribution), further (recourse) actions can be taken. We show that for a broad class of 2-stage linear models with recourse, one can, for any $\epsilon > 0$, in time polynomial in $\frac{1}{\epsilon}$ and the size of the input, compute a solution of value within a factor $(1+\epsilon)$ of the optimum, in spite of the fact that exponentially many second-stage scenarios may occur. In conjunction with a suitable rounding scheme, this yields the first approximation algorithms for 2-stage stochastic integer optimization problems where the underlying random data is given by a "black box" and no restrictions are placed on the costs in the two stages. Our rounding approach for stochastic integer programs shows that an approximation algorithm for a deterministic analogue yields, with a small constant-factor loss, provably near-optimal solutions for the stochastic generalization. Among the range of applications we consider are stochastic versions of the multicommodity flow, set covering, and facility location problems.

1 Introduction

The study of stochastic optimization problems dates back to the 1950s and the work of Dantzig [7] and Beale [2], and attempts to model uncertainty in the data by assuming that (part of) the input is specified in terms of a probability distribution, rather than by deterministic data given in advance. Stochastic optimization techniques and models have become an important paradigm in a wide range of application areas, including transportation models, logistics, financial instruments, and network design. Stochastic models are often computationally quite difficult, both from a practical perspective and from the point of view of computational complexity theory; even extremely specialized (sub)problems are #P-complete.

We focus on an important and widely used model in stochastic programming, the 2-stage recourse model: first, given only distributional information about (some of) the data, one commits on initial (first-stage) actions, and then, once the actual data is realized according to the distribution, further recourse actions can be taken, so that one can augment the earlier solution to satisfy the revealed requirements, if necessary. Typically the recourse actions entail making decisions in rapid reaction to the observed scenario, that is, at the "last minute," and are therefore costlier than decisions made ahead of time. Many applications fit this setting, and much of the textbook of Birge and Louveaux [4] is devoted to models and algorithms for this class of problems. Consider, for example, the problem of opening facilities to serve a

----
*A preliminary version [25] appeared in the Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, 2004.
†shmoys@orie.cornell.edu. School of ORIE, Cornell University, Ithaca, NY 14853. Research supported partially by NSF grant CCR-9912422.
‡cswamy@ist.caltech.edu. Work done while the author was a student at the Department of Computer Science, Cornell University, Ithaca, NY 14853. Research supported partially by NSF grant CCR-9912422.
set of clients. Initially given only distributional information (likelihood estimates) about the client demands (in addition to deterministic data for the facility and assignment costs), the first-stage decisions consist of deciding which facilities to open initially; but then, once the actual input (the client demands) is realized according to this distribution, one can extend the solution (in a second stage) by opening more facilities (if necessary), incurring a certain recourse cost. These recourse costs are typically more expensive than the original ones (e.g., because opening a facility at the last minute to handle excess demand might involve deploying resources with a much smaller lead time), may differ across facilities, and could even depend on the scenario that materializes.

We shall initially focus on the following stochastic generalization of the set cover problem: we are given a family of sets $\mathcal{S}$ over a ground set $U$, where each set $S$ has an a priori weight $w_S$ and an a posteriori weight $w^A_S$ associated with it. In the first stage, one selects some of these sets, incurring each associated weight $w_S$; then a subset $A \subseteq U$ is drawn according to a specified distribution, and then additional sets may be selected (incurring their second-stage weights) so as to ensure that $A$ is contained in the union of the selected sets (in both stages). The aim is to minimize the expected cost of the solution. Note that an explicit representation of a feasible solution would require specifying an exponential amount of information (since there are $2^{|U|}$ possible choices for $A$), and so it will be necessary to give more compact, algorithmic representations.

An important issue left ambiguous in the description above is the way in which the probability distribution is specified; several approaches have recently been considered in papers that address related 2-stage stochastic optimization problems. Dye, Stougie, and Tomasgard [8], and later, Ravi and Sinha [22], assume that there are only a polynomial number of scenarios, i.e., choices for $A$, that occur with positive probability. Independently, Immorlica, Karger, Minkoff, and Mirrokni [14] consider both this assumption, as well as the model where each element occurs with its own independent probability, and in so doing they enlarge the space of scenarios to be exponentially large. This is done with the rather severe restriction of assuming that the costs in the two stages are proportional, that is, there is a parameter $\lambda$ such that $w^A_S = \lambda w_S$ for each set $S$. Gupta, Pál, Ravi, and Sinha [13] also require this assumption, but give a more general way to specify the distribution, which we shall call the black-box model: they assume that the algorithm may make use of samples that are drawn according to the distribution of scenarios.

Our Results. We achieve the best qualities of all of these approaches: we work in the black-box model and obtain results in settings where the costs in the two stages need not be proportional, and the second-stage costs may even depend on the particular scenario realized (as in [22]). We obtain the first approximation algorithms for a variety of applications, including the stochastic versions of the multicommodity flow, set covering, and facility location problems, without placing any restrictions on the underlying probability distribution or the cost structure of the input.
Moreover, the performance guarantees we obtain in some of these applications improve on the results obtained by [22, 14, 13] in weaker models. We show that, given a class of (deterministic) set cover instances (e.g., the vertex cover problem) for which we have a $\rho$-approximation guarantee with respect to the natural linear programming relaxation, we can, for any $\epsilon > 0$, obtain a randomized $(2\rho+\epsilon)$-approximation algorithm for the stochastic generalization. This generalizes and improves upon the performance guarantees of [13]. Our result has two principal components.

First, we show that if we formulate the stochastic set cover problem as an (exponentially large) integer program and solve its linear programming (LP) relaxation, then a surprisingly simple rounding approach suffices to prove the guarantee claimed above. The essence of our approach is that the relaxation indicates, for each element, either that it is at least half covered in the first stage, or else that it must be at least half covered in each scenario in which it occurs in the second stage. This allows us to decouple the two stages (and indeed each of the scenarios for the second stage), and apply the deterministic result to each separately. Thus, the fact that we lose a factor of 2 is tied precisely to the fact that we are considering a 2-stage problem. It is important to note that we need to examine only the first-stage variables to decouple the two stages, and not the entire (exponentially large) LP solution. In fact, this decomposition can be applied to a number of stochastic integer optimization problems, allowing one to "reduce" the stochastic problem to its deterministic analogue. Furthermore, if we consider the case when there are only a polynomial number of scenarios, then this rounding approach already suffices to yield polynomial-time algorithms with strong performance guarantees for a wide range of applications.

Second — and this is the more technically difficult part of the paper — we give a fully polynomial randomized approximation scheme for solving these stochastic linear programming problems, in spite of the fact that they are #P-hard. (Not surprisingly, the LP has an exponential number of both variables and constraints.) We believe that this is a tool of independent interest, in particular in the stochastic programming literature, and will find application in the design of approximation algorithms for other stochastic integer optimization problems. We approximately solve this linear program by working with an equivalent (but compact) convex programming formulation, and show that the ellipsoid algorithm can be adapted to yield such a scheme. In the ellipsoid algorithm for convex programming, the algorithm generates a sequence of ellipsoids, starting with an ellipsoid that contains the entire feasible region; in each iteration, a hyperplane is generated that is intersected with the current ellipsoid, and the next ellipsoid generated is the minimum-volume ellipsoid containing this intersection. If the current ellipsoid center is infeasible, then one uses a violated inequality to generate the hyperplane; otherwise, the hyperplane is obtained by computing the subgradient of the objective function at the ellipsoid center. We introduce a notion of approximate subgradients that is sufficient to preserve the convergence of the algorithm. Furthermore, we show that such an approximate subgradient can be computed in randomized polynomial time using samples from the distribution (obtained from a black box).
Finally, the ellipsoid algorithm outputs the iterate with the best objective function value. However, evaluating the objective function value at a given point for our class of stochastic programs may be #P-hard; nonetheless, approximate subgradient information is sufficient to efficiently compute a point of cost close to the cost of the minimum-cost iterate (without, however, computing these costs). Note that for the rounding algorithm, one only needs the convex program's (near-)optimal solution to compute the solution for the stochastic integer optimization problem. Our algorithm returns a solution of objective function value within a factor $(1+\epsilon)$ of the optimum, for any $\epsilon > 0$, in running time bounded by a polynomial in the input size, $\frac{1}{\epsilon}$, and the maximum ratio $\lambda$ between the second- and first-stage costs. The algorithm works for both discrete and continuous distributions, and does not require any assumptions about the probability distribution (or the cost structure of the input). This result should be viewed as indicative of the fact that an exponential number of scenarios is not an insurmountable impediment to the design of efficient algorithms for these problems. (For example, in [3], this viewpoint was used as a motivation for considering so-called "robust" versions of deterministic optimization problems, as opposed to their stochastic versions.) We believe that this work could lead both to the development of algorithms with provable guarantees for more general stochastic optimization models, such as multistage problems, and to better computational procedures for solving 2-stage stochastic LPs.

Related Work. Two-stage stochastic programs, both linear and integer, have been extensively studied in the stochastic programming literature, but relatively little is known about polynomial-time algorithms that deliver solutions that are provably good approximations to the optimum stochastic linear or integer program objective value. It is useful to compare our result with some work in the stochastic programming literature on the sample average approximation (SAA) method for solving stochastic programs, where one samples from the distribution on scenarios and then solves an approximate problem, estimating the probability of a scenario by its frequency in the sample. Note that the estimated distribution has support of size at most the number of samples. While there are results that prove asymptotic convergence to the (true) optimal solution as the number of samples goes to infinity (see [23] and the references therein), and it has been reported that some of these algorithms converge quickly in practice [18, 29], fewer results are known that bound the rate of convergence and the sample size required to obtain a near-optimal solution (with high probability). Kleywegt, Shapiro, and Homem-de-Mello [16] (see also [23]) prove a bound on the sample size required to obtain a near-optimal solution that is polynomial in the dimension, but depends on the variance of a certain quantity (calculated using the scenario distribution) that might be exponential in the input size. Whereas the running time of our algorithm depends on the parameter $\lambda$, it does not depend on the underlying probability distribution. The algorithm of [13] also has the same dependence on $\lambda$. In fact, this dependence on $\lambda$ is unavoidable; we show that a performance guarantee of $c$ requires $\Omega(\lambda/c)$ samples. The dependence on $\frac{1}{\epsilon}$ is also necessary in light of the known #P-hardness results.
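To make the SAA method discussed above concrete, here is a minimal sketch (ours, not from the literature cited; `sample_scenario` and `solve_two_stage_lp` are hypothetical helpers standing in for the black box and for an exact solver of the resulting polynomial-scenario LP):

    from collections import Counter

    def saa_first_stage(sample_scenario, solve_two_stage_lp, num_samples):
        """Sample average approximation: replace the true distribution by the
        empirical one, then solve the resulting polynomial-scenario problem.

        sample_scenario()  -- one draw from the black box (must be hashable)
        solve_two_stage_lp -- exact solver for a 2-stage LP whose second stage
                              ranges over an explicit list of (scenario, prob) pairs
        """
        counts = Counter(sample_scenario() for _ in range(num_samples))
        # The empirical distribution has support of size <= num_samples.
        empirical = [(A, k / num_samples) for A, k in counts.items()]
        return solve_two_stage_lp(empirical)   # first-stage decisions

As the discussion above indicates, the difficulty lies not in this reduction but in bounding the number of samples needed for the empirical problem's optimum to be near-optimal for the true distribution.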
To the best of our knowledge, this is the first result to show that (a broad class of) 2-stage stochastic LPs can be solved in time polynomial in the input size, $\lambda$, and $\frac{1}{\epsilon}$. Dyer, Kannan, and Stougie [9], and Nesterov and Vial [21], also give algorithms for stochastic optimization, and we now compare our algorithm with their work. Dyer et al. focus on computing an estimate of the objective function value at a given point (for a maximization problem) by sampling from the distribution sufficiently many times. But this yields a running time that is only polynomial in the maximum value attained by any scenario, and does not yield a polynomial approximation scheme for our setting. In contrast, our algorithm is based on approximating the subgradient at a given point, and consequently the running time depends on the variation in the subgradient vector components, which we show is bounded by $\lambda$. The algorithm of Nesterov and Vial also employs subgradients and uses a subgradient-descent approach; they obtain a running time that depends on the maximum variation in the objective function value over the feasible region, which in general is not polynomially bounded.

The first worst-case analysis of approximation algorithms for 2-stage stochastic integer programming problems appears to be due to Dye et al. [8], who give a constant performance guarantee for a resource provisioning problem (a maximization problem) in the polynomial scenarios setting, based on rounding a linear program. Ravi and Sinha [22], and independently Immorlica et al. [14], and subsequently Gupta et al. [13], consider various 2-stage problems, including applications that follow from our general setting. We now contrast our results in these applications with previous results. For the vertex cover problem, our technique yields a $(4+\epsilon)$-approximation algorithm in the black-box distribution model. In contrast, [13] give an 8-approximation algorithm in the black-box model with proportional costs, and [22] give a guarantee of 2 in the polynomial scenarios setting. We also give extensions to multi-covering generalizations of the set cover problem. For the stochastic uncapacitated facility location problem, we give a 3.225-approximation algorithm, whereas [13] give a guarantee of 8.45 in the black-box model with proportional costs, and [22] give a guarantee of 8 in the polynomial scenarios setting. Our approach yields constant-factor performance guarantees for several facility location variants, including facility location with penalties, or soft capacities, or service installation costs. As in [22], our results also extend to the case with scenario-dependent second-stage costs.

The rest of the paper is organized as follows. In Section 2, we focus on a stochastic generalization of the set cover problem to illustrate our rounding approach and to motivate a compact convex programming relaxation of the problem. In Sections 3 and 4, we describe the algorithm used to solve the resulting convex program and prove that it computes a near-optimal fractional solution to the stochastic set cover problem; Section 5 extends this analysis to show that the algorithm can be used to solve a rich class of 2-stage programs to near-optimality. In Section 6, we consider various applications and present approximation algorithms for 2-stage stochastic integer problems. Section 7 proves a lower bound on the sample size required in the black-box model, and we conclude in Section 8.
2 An illustrative example: the stochastic set cover problem

The deterministic weighted set cover problem (DSC) is the following: given a universe $U$ of elements $e_1, \dots, e_n$ and a collection of subsets of $U$, $S_1, \dots, S_m$, with set $S$ having weight $w_S$, we want to choose a minimum-weight collection of sets so that every element is included in some chosen set. The problem can be formulated as an integer program, and the integrality constraints can be relaxed to yield the following linear program:

\[ \min \;\; \sum_S w_S x_S \quad \text{s.t.} \quad \sum_{S: e\in S} x_S \ge 1 \ \text{ for all } e \in U; \qquad x_S \ge 0 \ \text{ for all } S. \tag{SC-P} \]

Let $\mathrm{OPT}_{\mathrm{SC}}$ denote the optimal value of (SC-P). In the 2-stage stochastic generalization of the problem, abbreviated SSC, the elements to be covered are not known in advance. There is a probability distribution over scenarios, and each scenario specifies the actual set of elements $A \subseteq U$ to be covered. For our purposes, a scenario is just some subset of the elements of $U$. We will assume without loss of generality that the set of all possible scenarios is the power set of $U$ (including the empty set $\emptyset$). We use $p_A$ to denote the probability of scenario $A$ ($p_A$ could be 0, if scenario $A$ never occurs). We will never explicitly use the quantities $p_A$ in the algorithm. Throughout, we will use $A$ to index the scenarios.

Each set $S$ has two weights associated with it, an a priori weight $w_S$ and an a posteriori weight $w^A_S$. In the first stage, one selects some of these sets, incurring a cost of $w_S$ for choosing set $S$; then a scenario $A \subseteq U$ is drawn according to a specified distribution, and then additional sets may be selected, incurring their second-stage weights, so as to ensure that $A$ is contained in the union of the sets selected (in both stages). The aim is to minimize the expected total cost of the solution, that is, the sum of the cost incurred in stage I and the expectation, over all stage II scenarios $A$, of the stage II cost of scenario $A$. The two-stage problem can also be formulated as an integer program, and the integrality constraints can be relaxed to yield a linear program:

\[ \min \;\; \sum_S w_S x_S + \sum_{A\subseteq U} p_A \sum_S w^A_S r_{A,S} \tag{SSC-P1} \]
\[ \text{s.t.} \quad \sum_{S: e\in S} x_S + \sum_{S: e\in S} r_{A,S} \;\ge\; 1 \qquad \text{for all } A \subseteq U,\; e \in A, \tag{1} \]
\[ \qquad\;\; x_S,\; r_{A,S} \;\ge\; 0 \qquad \text{for all } S, A. \]

Variable $x_S$ indicates whether set $S$ is chosen in stage I, and $r_{A,S}$ indicates whether set $S$ is chosen in scenario $A$. Constraint (1) says that in every scenario $A$, every element in that scenario has to be covered by a set chosen either in stage I or in stage II. A $\{0,1\}$-solution corresponds exactly to a solution to our problem. The following theorem forms the basis of our methodology for tackling various 2-stage stochastic optimization problems. Let $\mathrm{OPT}$ denote the optimal value of (SSC-P1).

Theorem 2.1 Suppose that we have a procedure that, for every instance of DSC, produces a solution of cost at most $\rho\cdot\mathrm{OPT}_{\mathrm{SC}}$. Then one can convert any solution $(x, r)$ to (SSC-P1) to an integer solution losing a factor of at most $2\rho$. Thus, an optimal solution to (SSC-P1) gives a $2\rho$-approximation algorithm.

Proof: Let $h = \sum_S w_S x_S + \sum_A p_A \sum_S w^A_S r_{A,S}$ denote the objective function value of $(x, r)$. We will argue that we can obtain an integer solution of cost at most $2\rho h$. Observe the following simple fact: an element $e$ is either covered to an extent of at least $\frac12$ in the first stage by the variables $x_S$, or it is covered to an extent of at least $\frac12$ by the variables $r_{A,S}$ in every scenario $A$ containing $e$. Let $E = \{e : \sum_{S: e\in S} x_S \ge \frac12\}$. Then $(2x_S)_S$ is a fractional set cover solution (i.e., a solution to (SC-P)) for the instance with universe set $E$, and so one can obtain an integer set cover for this instance of cost at most $2\rho\sum_S w_S x_S$. Similarly, for any scenario $A$, $(2r_{A,S})_S$ is a fractional set cover for the elements in $A \setminus E$, since for each such element $e$ we have $\sum_{S: e\in S} r_{A,S} \ge \frac12$. Therefore, one can cover these elements at a cost of at most $2\rho\sum_S w^A_S r_{A,S}$. So if we output the integer cover of $E$ as the first-stage decisions, we get a solution of cost at most $2\rho h$.
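The proof translates directly into an algorithm once the first-stage part of a fractional solution is in hand. A minimal sketch of the decoupling (ours; `round_set_cover(frac, elems)` is a hypothetical deterministic subroutine that converts a fractional cover of `elems` into an integer cover at a factor-$\rho$ loss, e.g., via the greedy algorithm of [6]):

    def round_stage1(x, cover, elements, round_set_cover):
        """Stage I of the 2*rho rounding in Theorem 2.1.

        x[S] is the first-stage value of set S in a solution to (SSC-P1).
        E = elements covered to extent >= 1/2 by x; doubling x gives a
        feasible fractional cover of E for the deterministic subroutine.
        """
        E = [e for e in elements
             if sum(x[S] for S in cover if e in cover[S]) >= 0.5]
        x2 = {S: min(1.0, 2.0 * x[S]) for S in cover}
        return round_set_cover(x2, E)          # sets bought in stage I

    def round_scenario(x, r_A, A, cover, round_set_cover):
        """Stage II in scenario A: elements of A not half-covered by x must
        be half-covered by r_A, so 2*r_A is a fractional cover for them."""
        rest = [e for e in A
                if sum(x[S] for S in cover if e in cover[S]) < 0.5]
        r2 = {S: min(1.0, 2.0 * r_A[S]) for S in cover}
        return round_set_cover(r2, rest)

Note that the stage I rounding looks only at the vector $x$; the scenario rounding is invoked only for the one scenario that actually materializes.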
Corollary 2.2 If the integrality gap of (SC-P) is $\rho$, then the integrality gap of (SSC-P1) is at most $2\rho$.

It is well known [6] that "the greedy algorithm" returns a solution to the deterministic set cover problem of weight at most $O(\log n)\cdot\mathrm{OPT}_{\mathrm{SC}}$. Thus, Theorem 2.1 shows that if we could solve (SSC-P1), then we could get an $O(\log n)$-approximation algorithm for SSC. In particular, when there are only a polynomial number of scenarios with non-zero probability, we obtain an $O(\log n)$-approximation algorithm. In general, however, it seems difficult to find an optimal solution to (SSC-P1) in its present form, since it has both an exponential number of variables and an exponential number of constraints; even writing out an optimal solution might take exponential space.

Observe that in the proof of Theorem 2.1, we needed to examine only the stage I variables of the fractional solution in order to round it to an integer solution. This is important because, if the rounding algorithm required information about each stage II scenario $A$ — that is, an exponential amount of information (as in [22]) — then one would not get a polynomial-time algorithm even if one could "solve" (SSC-P1) efficiently. In contrast, Theorem 2.1 shows that if we could somehow compactly express (SSC-P1), solve the resulting program efficiently, and find a (near-)optimal (fractional) first-stage vector $x$, then one could obtain a $(2\rho+\epsilon)$-approximation algorithm. This motivates the following formulation with only the stage I variables $x$:

\[ \min \;\; h(x) = \sum_S w_S x_S + \sum_{A\subseteq U} p_A f_A(x) \quad \text{subject to} \quad 0 \le x_S \le 1 \ \text{ for all } S, \tag{SSC-P2} \]

where

\[ f_A(x) \;=\; \min \;\; \sum_S w^A_S r_{A,S} \quad \text{s.t.} \quad \sum_{S: e\in S} r_{A,S} \ge 1 - \sum_{S: e\in S} x_S \ \text{ for all } e \in A; \qquad r_{A,S} \ge 0 \ \text{ for all } S. \]

It is straightforward to show that (SSC-P2) and (SSC-P1) are equivalent mathematical programs, and that the objective function $h(x)$ of (SSC-P2) is convex.

3 Solving the convex program: algorithm overview

We now leverage the fact that the objective function of (SSC-P2) is convex to show that the ellipsoid method can be adapted to find a near-optimal solution to (SSC-P2) in polynomial time. In doing so, a significant difficulty that we need to overcome is the fact that evaluating $f_A(x)$, and hence the objective function, may in general be #P-hard. Section 5 generalizes the arguments to show that the algorithm can be applied to a more general class of 2-stage stochastic programs.

The ellipsoid method starts by containing the feasible region within a ball and generates a sequence of ellipsoids, each of successively smaller volume. In each iteration, one examines the center of the current ellipsoid and obtains a specific half-space defined by a hyperplane passing through the current ellipsoid center. If the current ellipsoid center is infeasible, then one uses a violated inequality as the hyperplane; otherwise, one uses an objective function cut, to eliminate (some or all) feasible points whose objective function value is no better than that of the current center, and thus make progress. A new ellipsoid is then generated by finding the minimum-volume ellipsoid containing the half-ellipsoid obtained by the intersection of the current one with this half-space. Continuing in this way, using the fact that the volume of the successive ellipsoids decreases by a significant factor, one can show that after a certain number of iterations, the feasible point generated with the best objective function value is a near-optimal solution.

The above description makes clear that the inability to evaluate $h(x)$ is an obstacle to applying the ellipsoid method in this case. Let $\mathcal{P} = \{x \in \mathbb{R}^m : 0 \le x_S \le 1 \text{ for all } S\}$ denote the polytope, and let $h : \mathcal{P} \to \mathbb{R}_{\ge 0}$ be the (convex) objective function of (SSC-P2).
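Although $h(x)$ averages over exponentially many scenarios, each individual $f_A(x)$ is just a small linear program. A minimal sketch (ours, using scipy; an illustration, not the authors' code) that computes $f_A(x)$ together with the optimal dual solution — the dual is exactly what the subgradient computations of Section 4.2 will need:

    import numpy as np
    from scipy.optimize import linprog

    def scenario_cost(x, w2, cover, scenario):
        """Compute f_A(x): the cheapest fractional second-stage cover of the
        elements of scenario A left uncovered by the first-stage vector x.

        x, w2  -- dicts over set ids: first-stage LP values, stage II weights
        cover  -- cover[S] = collection of elements contained in set S
        Returns (f_A(x), duals) with duals[e] = optimal dual value z_{A,e}.
        """
        sets, elems = list(cover), list(scenario)
        if not elems:
            return 0.0, {}
        # One constraint per element e in A, written in <= form:
        #   -sum_{S : e in S} r_S  <=  -(1 - sum_{S : e in S} x_S)
        A_ub = np.array([[-1.0 if e in cover[S] else 0.0 for S in sets]
                         for e in elems])
        b_ub = np.array([-(1.0 - sum(x[S] for S in sets if e in cover[S]))
                         for e in elems])
        res = linprog(c=[w2[S] for S in sets], A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0, None)] * len(sets), method="highs")
        duals = dict(zip(elems, -res.ineqlin.marginals))  # z_{A,e} >= 0
        return res.fun, duals

The hard part is not any single such solve but the expectation over scenarios, which is what the machinery below is designed to avoid evaluating.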
If the current iterate $x_i$ is feasible, then one could add the constraint $h(x) \le h(x_i)$ while maintaining the convexity of the feasible region. But then, in subsequent iterations, one would need to check whether the current iterate satisfies this constraint, and generate a separating hyperplane if it does not. Without the ability to evaluate (or even estimate) the objective function value, we cannot even decide whether the current point is feasible (or even almost feasible), and finding a separating hyperplane appears to pose a formidable difficulty. An alternative possibility is to use cuts generated by a subgradient, which essentially plays the role of the gradient when the function is not differentiable.

Definition 3.1 Let $g : \mathbb{R}^m \to \mathbb{R}$ be a function. We say that $d \in \mathbb{R}^m$ is a subgradient of $g$ at the point $u$ if the inequality $g(v) - g(u) \ge d\cdot(v - u)$ holds for every $v \in \mathbb{R}^m$.

Note that the subgradient at a given point need not be unique. It is known (see [5]) that if a function is convex then it has a subgradient at every point. If $d$ is the subgradient at the point $x_i$, one can add the subgradient cut $d\cdot(x - x_i) \le 0$ and proceed with the (smaller) polytope $\{x \in \mathcal{P} : d\cdot(x - x_i) \le 0\}$. Unfortunately, even computing a subgradient at a point seems hard to do in polynomial time for the objective functions that arise in stochastic programs. To circumvent this obstacle, we define the following notion of an approximate subgradient, which is crucial to the working of our algorithm.

Definition 3.2 We say that $\hat d \in \mathbb{R}^m$ is an $(\omega, \mathcal{D})$-subgradient of a function $g : \mathbb{R}^m \to \mathbb{R}_{\ge 0}$ at the point $u \in \mathcal{D}$ if, for every $v \in \mathcal{D}$, we have $g(v) - g(u) \ge \hat d\cdot(v - u) - \omega g(u)$.

We will only use $(\omega, \mathcal{P})$-subgradients in the algorithm, which we abbreviate and denote as $\omega$-subgradients from now on. We show that one can compute, with high probability, an $\omega$-subgradient of $h$ at any point $x$, by sampling from the black box on scenarios. At a feasible point $x_i$, we compute an $\omega$-subgradient $\hat d_i$ and add the inequality $\hat d_i\cdot(x - x_i) \le 0$ to chop off a region of $\mathcal{P}$ and get the polytope $\mathcal{P}' = \{x \in \mathcal{P} : \hat d_i\cdot(x - x_i) \le 0\}$. Since we use an approximate subgradient to generate the cut, we might discard points with objective function value better than that at the current iterate $x_i$; but one can show that every discarded point $y \in \mathcal{P}\setminus\mathcal{P}'$ satisfies $h(y) > (1-\omega)h(x_i)$, implying that a discarded point has function value not much better than $h(x_i)$. Continuing this way, we obtain a polynomial number of points $x_0, x_1, \dots, x_k$ such that each discarded point has value at least $(1-\omega)h(x_i)$ for some $i$, and the volume of the final ellipsoid containing the current polytope (and hence of that polytope) is "small" (this is made precise later). Now, if the function has bounded variation on nearby points, then one can show that $\min_i h(x_i)$ is close to the optimal value with high probability. Since we approximate the subgradient at a feasible point (and not the function value $h(x_i)$), our running time depends only on the variation in the subgradient vector components, which we show is bounded by the maximum ratio of the stage II and stage I costs.

One last hurdle remains, however. Since we cannot compute $h(x_i)$, we will not be able to identify the iterate $\arg\min_i h(x_i)$. Nonetheless, by using approximate subgradients, we will find a point $\bar x$ in the convex hull of $x_0, \dots, x_k$ at which the objective function value is close to $\min_i h(x_i)$ (without, however, computing these values). At the heart of this procedure is a subroutine that, given two points $y_1, y_2$, returns a point $y$ on the line segment joining $y_1$ and $y_2$ such that $h(y)$ is close to $\min(h(y_1), h(y_2))$. We find $y$ by performing a bisection search, using an $\omega$-subgradient at the midpoint to infer in which direction to move along the line segment. By repeatedly calling the above subroutine with $y_1$ (initialized to $x_0$) and $y_2 = x_i$ for $i = 1, \dots, k$, each time updating $y_1$ to the point returned by the subroutine, at the end we get a point $\bar x$ such that $h(\bar x)$ is close to $\min_i h(x_i)$.
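A minimal sketch (ours) of this bisection subroutine; `approx_subgradient(y)` stands for the $\omega$-subgradient computation developed in Section 4.2:

    import numpy as np

    def segment_min(y1, y2, approx_subgradient, num_rounds):
        """Bisection search of Section 3: find a point on the segment [y1, y2]
        whose objective value nearly matches min(h(y1), h(y2)), without ever
        evaluating h.  At the midpoint y, the sign of d.(y2 - y1), for an
        omega-subgradient d at y, tells (up to the omega error) on which side
        of y the function is no smaller, so half of the segment is discarded.
        """
        y1, y2 = np.asarray(y1, dtype=float), np.asarray(y2, dtype=float)
        for _ in range(num_rounds):
            y = (y1 + y2) / 2.0
            d = np.asarray(approx_subgradient(y))
            if np.dot(d, y2 - y1) >= 0:   # h does not (appreciably) decrease towards y2
                y2 = y
            else:                         # h does not (appreciably) decrease towards y1
                y1 = y
        # y1 and y2 are now within ||initial segment|| / 2^num_rounds; either works.
        return y1

Each round halves the segment, so the Lipschitz continuity of $h$ (made quantitative in Section 4) turns the remaining segment length into an additive error bound.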
4 Algorithm details and analysis

Let $\mathrm{OPT} = \min_{x\in\mathcal{P}} h(x)$ denote the optimal solution value. We describe the algorithm for an arbitrary convex function $h$ and an arbitrary (rational) polytope $\mathcal{P}$ (so the feasible region is bounded). We use $\|u\|$ to denote the $\ell_2$ norm of $u$, i.e., $\|u\| = \sqrt{u\cdot u}$. The following definition makes precise the notion of bounded variation.

Definition 4.1 (Lipschitz Condition) Given a function $g : \mathbb{R}^m \to \mathbb{R}$, we say that $g$ has Lipschitz constant (at most) $K$ if $|g(u) - g(v)| \le K\|u - v\|$ for all $u, v \in \mathbb{R}^m$.

Let the objective function $h : \mathbb{R}^m \to \mathbb{R}_{\ge 0}$ have Lipschitz constant $K$. We shall assume that $x \ge 0$ is a defining inequality of $\mathcal{P}$. We also assume that the polytope $\mathcal{P}$ is contained in the ball $B(0, R) = \{x : \|x\| \le R\}$ and contains a ball of radius $V$, such that $\ln R$ and $\ln\frac{1}{V}$ are polynomially bounded. (For all the optimization problems considered, it is trivial to set $R$ and $V$ so that they satisfy all these properties; moreover, Lemmas 6.2.4–6.2.6 in [12] show that one can always obtain such $R$ and $V$.) Define $\lambda = \max\big(1, \max_{A,S}\frac{w^A_S}{w_S}\big)$. We assume that $\lambda$ (or an upper bound on it) is known to the algorithm.

The bulk of the work is performed by a procedure FindOpt (see Fig. 1). FindOpt takes two parameters $\gamma$ and $\epsilon$ and returns a feasible solution $\bar x$ such that $h(\bar x) \le \frac{\mathrm{OPT}}{1-\gamma} + \epsilon$, where we may assume $\gamma \le \frac12$ without loss of generality, assuming that one can compute $\omega$-subgradients for a sufficiently small $\omega$, in time polynomial in the dimension $m$ and $\ln\big(\frac{KR}{V\epsilon}\big)$ (excluding the time to compute the $\omega$-subgradients). This is the main procedure, which uses the ellipsoid method and the notion of $\omega$-subgradients to get close to an optimal solution, as discussed earlier. It uses a subroutine FindMin which takes a set of feasible points $x_0, \dots, x_k$ and, using $\omega$-subgradients, returns a feasible point having function value close to $\min_i h(x_i)$. To convert this to a purely multiplicative guarantee, we use a procedure ConvOpt to bootstrap algorithm FindOpt. In procedure ConvOpt, we first sample a certain number of times from the distribution on scenarios and determine, with high probability, either that $x = 0$ is an optimal solution — and return this solution — or a lower bound on $\mathrm{OPT}$, in which case we call FindOpt, setting $\gamma$ and $\epsilon$ appropriately. By wrapping FindOpt within procedure ConvOpt, we may assume that FindOpt executes only if $\mathrm{OPT}$ is "large," and thus set $\gamma$ and $\epsilon$ so that FindOpt returns a solution of cost at most $(1+\kappa)\cdot\mathrm{OPT}$.

For ease of understanding, we divide the analysis into two parts. In Section 4.1, we prove that procedure FindOpt returns a solution of objective value at most $\frac{\mathrm{OPT}}{1-\gamma} + \epsilon$ in time polynomial in the dimension $m$ and $\ln\big(\frac{KR}{V\epsilon}\big)$. We also show that, with high probability, ConvOpt correctly determines whether $\mathrm{OPT}$ is large enough and whether FindOpt should be called. By our earlier discussion, one can always choose $R$ and $V$ so that this is polynomial in the input size. In Section 4.2, we show that for the stochastic set cover problem, one can efficiently compute $\omega$-subgradients (with a sufficiently high probability) and bound the parameter $K$, so that the entire procedure runs in polynomial time and delivers a solution of cost at most $(1+\kappa)\cdot\mathrm{OPT}$ with high probability. In Section 5, we generalize these arguments and show that for a large class of 2-stage stochastic programs, one can efficiently compute $\omega$-subgradients and bound the Lipschitz constant $K$, to argue that procedures FindOpt and ConvOpt can be used to find a $(1+\kappa)$-optimal solution in polynomial time.

ConvOpt($\kappa, \delta$)  [Returns $\bar x \in \mathcal{P}$ such that $h(\bar x) \le (1+\kappa)\cdot\mathrm{OPT}$ with probability at least $1-2\delta$. Assume $\delta < \frac12$.]
C1. Define $M = \lambda\ln\frac{1}{\delta}$. Sample $M$ times from the distribution on scenarios. Let $X$ = number of times a non-null scenario occurs.
C2. If $X = 0$, return $x = 0$ as an optimal solution.
C3. Otherwise (with high probability), $\mathrm{OPT} \ge \frac{\delta}{M}$. Set $\gamma \leftarrow \frac{\kappa}{4}$ and $\epsilon \leftarrow \frac{\kappa\delta}{2M}$. Return FindOpt($\gamma, \epsilon$).

FindOpt($\gamma, \epsilon$)  [Returns $\bar x \in \mathcal{P}$ such that $h(\bar x) \le \frac{\mathrm{OPT}}{1-\gamma} + \epsilon$. Assume $\gamma \le \frac12$.]
O1. Set $k \leftarrow 0$, $y_0 \leftarrow 0$, $N = O\big(m^2\ln\frac{KR^2}{V\epsilon}\big)$, $T = O\big(\log\frac{KRN}{\epsilon}\big)$, and $\omega = \Theta\big(\frac{\gamma}{NT}\big)$, as in the analysis below. Let $E_0 \leftarrow B(0, R)$ and $\mathcal{P}_0 \leftarrow \mathcal{P}$.
O2. For $i = 0, \dots, N$, do the following. [We maintain the invariant that $E_i$ is an ellipsoid centered at $y_i$ containing the current polytope $\mathcal{P}_k$.]
  a) If $y_i \in \mathcal{P}_k$, set $x_k \leftarrow y_i$. Let $\hat d_k$ be an $\omega$-subgradient of $h$ at $x_k$. Let $H$ denote the half-space $\{x : \hat d_k\cdot(x - x_k) \le 0\}$. Set $\mathcal{P}_{k+1} \leftarrow \mathcal{P}_k \cap H$ and $k \leftarrow k + 1$.
  b) If $y_i \notin \mathcal{P}_k$, let $a\cdot x \le b$ be a violated inequality, that is, $a\cdot y_i > b$, whereas $a\cdot x \le b$ for all $x \in \mathcal{P}_k$. Let $H$ be the half-space $\{x : a\cdot(x - y_i) \le 0\}$.
  c) Set $E_{i+1}$ to be the ellipsoid of minimum volume containing the half-ellipsoid $E_i \cap H$; let $y_{i+1}$ be its center.
O3. Set $k \leftarrow k - 1$. We now have a collection of points $x_0, \dots, x_k$ such that each $x_i \in \mathcal{P}$. Return $\bar x \leftarrow$ FindMin($x_0, \dots, x_k$).

FindMin($x_0, \dots, x_k$)  [Returns $\bar x \in \mathcal{P}$ with $h(\bar x)$ close to $\min_i h(x_i)$; $\omega$, $T$, $\epsilon$ are as set in FindOpt.]
M1. Set $\bar x \leftarrow x_0$.
M2. For $i = 1, \dots, k$, do the following. [We maintain the invariant that $h(\bar x) \le \frac{\min_{0\le j<i} h(x_j)}{(1-\omega)^{(i-1)T}} + (i-1)\frac{\epsilon}{4k}$.]
  a) We use binary search to find a point on the line segment joining $\bar x$ and $x_i$ with value close to $\min(h(\bar x), h(x_i))$. Initialize $y_1 \leftarrow \bar x$, $y_2 \leftarrow x_i$.
  b) For $j = 1, \dots, T$, do the following. [We maintain that $\min(h(y_1), h(y_2)) \le \frac{\min(h(\bar x), h(x_i))}{(1-\omega)^{j-1}}$.]
    – Let $y \leftarrow \frac{y_1 + y_2}{2}$. Compute an $\omega$-subgradient $\hat d$ of $h$ at the point $y$. If $\hat d\cdot(y_2 - y_1) = 0$, set $y_1 \leftarrow y$ and exit the loop. Otherwise, exactly one of $\hat d\cdot(y_1 - y)$ and $\hat d\cdot(y_2 - y)$ is positive.
    – If $\hat d\cdot(y_1 - y) > 0$, set $y_1 \leftarrow y$; else set $y_2 \leftarrow y$.
  c) Set $\bar x \leftarrow y_1$.
M3. Return $\bar x$.

Figure 1: The convex optimization algorithm.
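The minimum-volume ellipsoid in step O2c) has a standard closed form (see, e.g., [12]); for concreteness, a sketch (ours; these are the textbook central-cut formulas, not specific to this paper), representing an ellipsoid as $E(z, A) = \{x : (x-z)^{\mathsf{T}}A^{-1}(x-z) \le 1\}$ and the cut as $a\cdot(x - z) \le 0$:

    import numpy as np

    def ellipsoid_update(z, A, a):
        """Step O2c): the minimum-volume ellipsoid containing the half of
        E(z, A) on which a.(x - z) <= 0.  Its volume shrinks by a factor of
        at least e^{-1/(2(m+1))} per step (Fact 4.3 below).
        """
        m = len(z)
        Aa = A @ a
        b = Aa / np.sqrt(a @ Aa)      # cut direction, normalized in the metric of E
        z_new = z - b / (m + 1.0)     # center moves into the kept half-space
        A_new = (m * m / (m * m - 1.0)) * (A - (2.0 / (m + 1.0)) * np.outer(b, b))
        return z_new, A_new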
4.1 Analysis of the Generic Algorithm

We first analyze procedure FindOpt. Clearly, we maintain the invariant stated in step O2. We will need the following well-known facts (see, for example, [12]).

Fact 4.2 The volume of the ball $B(x, r) = \{y \in \mathbb{R}^m : \|y - x\| \le r\}$ is $r^m\cdot\mathrm{vol}(B(0,1))$.

Fact 4.3 Let $E \subseteq \mathbb{R}^m$ be an ellipsoid and $H$ be a half-space passing through the center of $E$. Then there is a unique ellipsoid $E'$ of minimum volume containing the half-ellipsoid $E \cap H$, and $\mathrm{vol}(E') \le e^{-\frac{1}{2(m+1)}}\cdot\mathrm{vol}(E)$.

Fact 4.4 Let $F : \mathbb{R}^m \to \mathbb{R}^m$, $F(x) = Dx + b$, be an affine transformation with $\det(D) \ne 0$. Then, for any set $L \subseteq \mathbb{R}^m$, we have $\mathrm{vol}(F(L)) = |\det(D)|\cdot\mathrm{vol}(L)$.

Lemma 4.5 The points $x_0, \dots, x_k$ generated by FindOpt at the end of step O2 satisfy $\min_{0\le i\le k} h(x_i) \le \frac{\mathrm{OPT} + \epsilon/4}{1-\omega}$.

Proof: Let $x^*$ be an optimal solution. If $\hat d_i\cdot(x^* - x_i) \ge 0$ for some $i$, then $h(x^*) - h(x_i) \ge -\omega h(x_i)$, since $\hat d_i$ is an $\omega$-subgradient at $x_i$, and we are done. Otherwise, $x^*$ is never cut off and lies in every polytope $\mathcal{P}_k$. Let $t = \frac{\epsilon}{8KR}$. Consider the affine transformation $F(x) = tIx + (1-t)x^*$, where $I$ is the identity matrix, and let $\bar{\mathcal{P}} = F(\mathcal{P})$; so $\bar{\mathcal{P}}$ is a shrunken version of $\mathcal{P}$ "centered" around $x^*$. Observe that: (1) $\bar{\mathcal{P}} \subseteq \mathcal{P}$, because $\mathcal{P}$ is convex, and any point of $\bar{\mathcal{P}}$ is a convex combination of $x^*$ and a point of $\mathcal{P}$; (2) $\mathrm{vol}(\bar{\mathcal{P}}) = t^m\,\mathrm{vol}(\mathcal{P}) \ge t^m V^m\,\mathrm{vol}(B(0,1))$, using Facts 4.4 and 4.2, and since $\mathcal{P}$ contains a ball of radius $V$ by assumption; and (3) for any $y = F(x) \in \bar{\mathcal{P}}$, we have $\|y - x^*\| = t\|x - x^*\| \le 2tR$, so $h(y) \le h(x^*) + 2tKR = \mathrm{OPT} + \frac{\epsilon}{4}$, since $h$ has Lipschitz constant $K$. By Fact 4.3, $\mathrm{vol}(E_N) \le e^{-\frac{N}{2(m+1)}}\,\mathrm{vol}(B(0,R))$, which by our choice of $N$ is smaller than $\mathrm{vol}(\bar{\mathcal{P}})$. Since $\bar{\mathcal{P}} \subseteq \mathcal{P} \subseteq E_0$, and a feasibility cut generated in step O2b) never removes a point of $\mathcal{P}_k$, plugging things together, there must be a point $y \in \bar{\mathcal{P}}$ that lies on the discarded side of a hyperplane $\hat d_i\cdot(x - x_i) = 0$ generated in step O2a) for some $i$. This implies that $h(y) \ge (1-\omega)h(x_i)$, and hence $h(x_i) \le \frac{\mathrm{OPT} + \epsilon/4}{1-\omega}$.
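Concretely, with $t = \frac{\epsilon}{8KR}$, the volume comparison at the heart of the proof reads

\[ \mathrm{vol}(E_N) \;\le\; e^{-\frac{N}{2(m+1)}}\,R^m\,\mathrm{vol}(B(0,1)) \;<\; t^m V^m\,\mathrm{vol}(B(0,1)) \;\le\; \mathrm{vol}(\bar{\mathcal{P}}), \]

which holds once $N \ge 2m(m+1)\ln\frac{R}{tV} = 2m(m+1)\ln\frac{8KR^2}{\epsilon V}$; this is the choice of $N$ behind the $O(\cdot)$ bound in step O1. (This display is our reconstruction, spelled out from Facts 4.2–4.4.)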
Lemma 4.6 Procedure FindMin returns a point $\bar x \in \mathcal{P}$ such that $h(\bar x) \le \frac{\min_{0\le i\le k} h(x_i)}{(1-\omega)^{kT}} + \frac{\epsilon}{4}$.

Proof: The proof follows from the invariant stated in step M2, applied with $i = k+1$, so we show the invariant. The invariant clearly holds when $i = 1$. Suppose the invariant holds at the beginning of iteration $i$. We will show that the inner "For $j = \dots$" loop returns a point $y_1$ such that $h(y_1) \le \frac{\min(h(\bar x), h(x_i))}{(1-\omega)^T} + \frac{\epsilon}{4k}$. So after we set $\bar x \leftarrow y_1$ in step M2c) at the end of iteration $i$, the invariant holds at the beginning of iteration $i+1$.

To prove the claim about the inner loop, first notice that if at any point we have $\hat d\cdot(y_2 - y_1) = 0$, then since $y$, $y_1$, and $y_2$ all lie on one line segment, we also have $\hat d\cdot(y_1 - y) = \hat d\cdot(y_2 - y) = 0$. This implies that $h(y_1), h(y_2) \ge (1-\omega)h(y)$, so $h(y) \le \frac{\min(h(y_1), h(y_2))}{1-\omega}$, and in this case the claim holds. So assume that this is not the case. We show by induction that $\min(h(y_1), h(y_2)) \le \frac{\min(h(\bar x), h(x_i))}{(1-\omega)^{j-1}}$ at the start of iteration $j$ of the inner loop. This is true at the beginning of the inner loop, when $j = 1$. Suppose it holds at the start of iteration $j$. In iteration $j$, we set $y \leftarrow \frac{y_1+y_2}{2}$, and either $\hat d\cdot(y_1 - y) > 0$ or $\hat d\cdot(y_2 - y) > 0$. In the former case, we have $h(y_1) \ge (1-\omega)h(y)$, i.e., $h(y) \le \frac{h(y_1)}{1-\omega}$, and we update $y_1 \leftarrow y$; similarly, in the latter case we have $h(y) \le \frac{h(y_2)}{1-\omega}$ and we update $y_2 \leftarrow y$. So in either case, at the beginning of iteration $j+1$ we maintain that $\min(h(y_1), h(y_2)) \le \frac{\min(h(\bar x), h(x_i))}{(1-\omega)^{j}}$, and by induction the invariant holds through all iterations. At the end of iteration $T$, we have $\|y_1 - y_2\| \le \frac{\|\bar x - x_i\|}{2^T} \le \frac{2R}{2^T}$, since $\bar x$ and $x_i$ both lie in $B(0, R)$, which implies that $h(y_1) \le \min(h(y_1), h(y_2)) + K\cdot\frac{2R}{2^T} \le \frac{\min(h(\bar x), h(x_i))}{(1-\omega)^T} + \frac{\epsilon}{4k}$, by our choice of $T$. This proves the claim about the inner loop on $j$, and hence the lemma.

Theorem 4.7 Algorithm FindOpt returns a feasible point $\bar x$ satisfying $h(\bar x) \le \frac{\mathrm{OPT}}{1-\gamma} + \epsilon$, in time $O\big(NT\cdot T_\omega + N\cdot\mathrm{poly}(m)\big)$, where $T_\omega$ denotes the time taken to compute an $\omega$-subgradient and $\omega = \Theta\big(\frac{\gamma}{NT}\big)$.

Proof: By Lemmas 4.5 and 4.6, we get that $h(\bar x) \le \frac{\mathrm{OPT} + \epsilon/4}{(1-\omega)^{kT+1}} + \frac{\epsilon}{4}$. Since $k \le N$ and $\omega = \Theta(\frac{\gamma}{NT})$ with a suitably small constant, we have $(1-\omega)^{-(kT+1)} \le e^{2\omega(NT+1)} \le \frac{1}{1-\gamma}$, and $\frac{\epsilon/4}{1-\gamma} + \frac{\epsilon}{4} \le \epsilon$ (since we assumed $\gamma \le \frac12$), which proves the performance guarantee. The running time consists of $N$ ellipsoid iterations, each requiring either one $\omega$-subgradient computation or a separation check, plus $\mathrm{poly}(m)$ work to compute $E_{i+1}$, followed by FindMin, which performs at most $kT \le NT$ $\omega$-subgradient computations.

Now we show that procedure ConvOpt works correctly with high probability. We make the very mild assumption that at any point $x \in \mathcal{P}$, in any non-null scenario, the total stage I cost + stage II cost of the scenario is at least 1, that is, $w\cdot x + f_A(x) \ge 1$ for every non-null scenario $A$. Note that with integer costs, this is simply saying that we incur a non-zero total cost in any non-null scenario. (The constant 1 may be replaced by any other constant by adjusting the number of samples required by ConvOpt accordingly.)

Lemma 4.8 Procedure ConvOpt determines (correctly), with probability at least $1-\delta$, that $\mathrm{OPT} \ge \frac{\delta}{M}$, or that $x = 0$ is an optimal solution.

Proof: Note that $\frac{\delta}{M} \le \frac{1}{\lambda}$, since $M = \lambda\ln\frac{1}{\delta}$ and $\delta \le \ln\frac{1}{\delta}$ for $\delta < \frac12$. Since in every non-null scenario we incur a cost of at least 1, $\mathrm{OPT} \ge q$, where $q$ is the probability of occurrence of a non-null scenario. Recall that $X$ is the number of non-null scenarios encountered among the $M$ samples. If $q \ge \frac{1}{\lambda}$, then $\Pr[X = 0] = (1-q)^M \le e^{-qM} \le \delta$; so, with probability at least $1-\delta$, we will say that $\mathrm{OPT} \ge \frac{\delta}{M}$, which is true since $\mathrm{OPT} \ge q \ge \frac{1}{\lambda} \ge \frac{\delta}{M}$. If $q \le \frac{\delta}{M}$, then $\Pr[X \ge 1] \le qM \le \delta$. We return $x = 0$ as an optimal solution with probability at least $1-\delta$, which is indeed an optimal solution, because $q \le \frac{1}{\lambda}$ implies that it is always at least as good to defer to stage II, since the expected stage II cost of a set $S$ is at most $q\lambda w_S \le w_S$. If $\frac{\delta}{M} \le q \le \frac{1}{\lambda}$, then we always return a correct answer, since it is both true that $x = 0$ is an optimal solution and that $\mathrm{OPT} \ge \frac{\delta}{M}$.

4.2 Computing $\omega$-subgradients and bounding the Lipschitz constant

We now focus on showing that algorithm ConvOpt returns a $(1+\kappa)$-optimal solution to relaxation (SSC-P2) of the stochastic set cover problem in polynomial time. Recall that our objective function is $h(x) = \sum_S w_S x_S + \sum_A p_A f_A(x)$, where

\[ f_A(x) = \min\Big\{\sum_S w^A_S r_{A,S} \;:\; \sum_{S: e\in S} r_{A,S} \ge 1 - \sum_{S: e\in S} x_S \ \ \forall e \in A;\;\; r_{A,S} \ge 0 \ \ \forall S\Big\}. \]

By taking the dual, we can write

\[ f_A(x) = \max\Big\{\sum_{e\in A}\Big(1 - \sum_{S: e\in S} x_S\Big) z_{A,e} \;:\; \sum_{e\in A\cap S} z_{A,e} \le w^A_S \ \ \forall S;\;\; z_{A,e} \ge 0 \ \ \forall e \in A\Big\}. \]

Recall that $\lambda = \max\big(1, \max_{A,S}\frac{w^A_S}{w_S}\big)$. We argue that for the function $h$ one can efficiently compute $\omega$-subgradients and bound its Lipschitz constant $K$, as follows.

1. In Lemma 4.9, we show that any vector that component-wise approximates a subgradient at $x$ to within a certain accuracy is an $\omega$-subgradient at $x$.
2. Next, we show that at any point $x$, there is a "nice" subgradient with components $d_S \in [-\lambda w_S, w_S]$ (Lemma 4.10). This gives a bound on the Lipschitz constant $K$, and will allow us to compute an $\omega$-subgradient by using a sampling procedure.
3. Lemma 4.12 and Corollary 4.13 show that, since the components lie in a range bounded multiplicatively by $\lambda$, $\mathrm{poly}\big(\frac{\lambda}{\omega}\big)$ samples from the scenario distribution suffice to compute a vector that component-wise approximates this subgradient to within the desired accuracy with high probability, and thus to obtain an $\omega$-subgradient.

Using the above procedure to compute $\omega$-subgradients in procedure FindOpt with a small enough error probability, and putting the various pieces together, we show in Theorem 4.15 that ConvOpt returns a point $\bar x$ such that $h(\bar x) \le (1+\kappa)\cdot\mathrm{OPT}$ with probability at least $1-2\delta$, in time polynomial in the input size, $\lambda$, $\frac{1}{\kappa}$, and $\ln\frac{1}{\delta}$.

Lemma 4.9 Let $d$ be a subgradient of $h$ at the point $u \in \mathcal{P}$, and suppose that $\hat d$ is a vector such that $d_S - \omega w_S \le \hat d_S \le d_S$ for all $S$. Then $\hat d$ is an $\omega$-subgradient of $h$ at $u$.

Proof: Let $v$ be any point in $\mathcal{P}$. Since the polytope $\mathcal{P}$ has $x \ge 0$ as a defining constraint, it follows that $v_S \ge 0$ for all $S$. We have $h(v) - h(u) \ge d\cdot(v-u) = \hat d\cdot(v-u) + (d - \hat d)\cdot(v-u)$, so we need to lower-bound the second term by $-\omega h(u)$. Since $0 \le d - \hat d \le \omega w$ and $v \ge 0$, we have $(d-\hat d)\cdot v \ge 0$ and $(d-\hat d)\cdot u \le \omega\,w\cdot u \le \omega h(u)$ (since $w\cdot u \le h(u)$, every $f_A(u)$ being non-negative).

Lemma 4.10 Consider any point $u \in \mathcal{P}$, and let $z^*_A = (z^*_{A,e})_{e\in A}$ be an optimal dual solution for scenario $A$ with $u$ as the stage I decision vector.
Then the vector $d$ with components $d_S = w_S - \sum_A p_A \sum_{e\in A\cap S} z^*_{A,e}$ is a subgradient of $h$ at $u$, and $-\lambda w_S \le d_S \le w_S$ for every $S$.

Proof: Let $v$ be any point in $\mathcal{P}$. We have to show that $h(v) - h(u) \ge d\cdot(v - u)$. We know that $f_A(u) = \sum_{e\in A}\big(1 - \sum_{S: e\in S} u_S\big)z^*_{A,e}$ for every scenario $A$. Also, since $z^*_A$ remains a feasible dual solution when the stage I vector is $v$, we have $f_A(v) \ge \sum_{e\in A}\big(1 - \sum_{S: e\in S} v_S\big)z^*_{A,e}$ for every scenario $A$. So $h(v) \ge w\cdot v + \sum_A p_A \sum_{e\in A}\big(1 - \sum_{S: e\in S} v_S\big)z^*_{A,e}$. The last term can be rewritten as $\sum_A p_A \sum_{e\in A} z^*_{A,e} - \sum_S v_S \sum_A p_A \sum_{e\in A\cap S} z^*_{A,e}$; therefore, we get that $h(v) \ge d\cdot v + \sum_A p_A \sum_{e\in A} z^*_{A,e}$. We can express $h(u)$ in a similar way, with an equality instead of the inequality, and with $u$ replacing $v$. Subtracting the two terms, we get that $h(v) - h(u) \ge d\cdot(v-u)$, where $d_S = w_S - \sum_A p_A \sum_{e\in A\cap S} z^*_{A,e}$. To bound $d_S$: since $\sum_{e\in A\cap S} z^*_{A,e} \le w^A_S \le \lambda w_S$ for all $A$, we have $d_S \ge w_S - \lambda w_S \ge -\lambda w_S$. Also, observe that $\sum_A p_A = 1$, since $p_A \ge 0$ for every scenario $A$, and because some scenario has to materialize (recall that we include the empty set also as a scenario). Therefore $\sum_A p_A \sum_{e\in A\cap S} z^*_{A,e} \ge 0$, and hence $d_S \le w_S$.

Claim 4.11 Suppose $\|d_u\| \le K$ for every $u \in \mathcal{P}$, where $d_u$ is a subgradient of $h$ at the point $u$. Then $h$ has Lipschitz constant (at most) $K$ over $\mathcal{P}$.

Proof: Consider any two points $u, v \in \mathcal{P}$, and let $d_u, d_v$ denote the subgradients at $u, v$ respectively, with $\|d_u\|, \|d_v\| \le K$. Then we have $h(v) - h(u) \ge d_u\cdot(v-u) \ge -K\|v-u\|$, and similarly $h(u) - h(v) \ge d_v\cdot(u-v) \ge -K\|u-v\|$.

Claim 4.11 and Lemma 4.10 show that we can bound the Lipschitz constant of $h$ by $K = \lambda\|w\| \le \lambda\sqrt{m}\max_S w_S$. Note that $\ln K$ is polynomially bounded. We will use the following sampling lemma to show that one can efficiently compute an $\omega$-subgradient of $h$ at any point $u \in \mathcal{P}$.

Lemma 4.12 Let $X \in [-a, b]$, $a, b > 0$, be a random variable computed by sampling from a probability distribution $\pi$. Let $\mu = \mathrm{E}[X]$ and $c = \max(a, b)$. Then, for any $s, \delta \in (0, 1]$, by taking $O\big(\frac{c^2}{s^2}\ln\frac{1}{\delta}\big)$ independent samples from $\pi$, one can compute an estimate $Y$ such that $|Y - \mu| \le s$ with probability at least $1-\delta$.

Proof: The variance of $X$ is $\sigma^2 \le \mathrm{E}[X^2] \le c^2$. We divide the samples into $t = O\big(\ln\frac{1}{\delta}\big)$ groups, each group containing $\frac{4c^2}{s^2}$ samples. Let $Y_i$ be the average of the values computed from the samples of group $i$, for $i = 1, \dots, t$, and set $Y = \mathrm{median}(Y_1, \dots, Y_t)$. The variables $Y_i$ are i.i.d. with mean $\mu$ and variance at most $\frac{s^2\sigma^2}{4c^2} \le \frac{s^2}{4}$. By Chebyshev's inequality, we get $\Pr[|Y_i - \mu| > s] \le \frac14$. Let $Z_i = 1$ if $|Y_i - \mu| > s$, and 0 otherwise, and $Z = \sum_i Z_i$. Then $\mathrm{E}[Z] \le \frac{t}{4}$ and the variables $Z_i$ are independent. If $|Y - \mu| > s$, then at least $\frac{t}{2}$ of the variables $Z_i$ must be set to 1. Therefore, by Chernoff bounds, we have $\Pr[|Y - \mu| > s] \le \Pr\big[Z \ge \tfrac{t}{2}\big] \le e^{-t/12} \le \delta$ for a suitable $t = O\big(\ln\frac{1}{\delta}\big)$.

Corollary 4.13 At any point $u \in \mathcal{P}$, one can compute an $\omega$-subgradient with probability at least $1-\delta$, using at most $O\big(\frac{\lambda^2}{\omega^2}\ln\frac{m}{\delta}\big)$ independent samples from the probability distribution on scenarios.

Proof: The proof is an easy corollary of Lemmas 4.10, 4.9, and 4.12. We use the sampling process described in Lemma 4.12. Each time we sample and get a scenario $A$, we compute the quantities $X_S = w_S - \sum_{e\in A\cap S} z^*_{A,e}$, where $z^*_A$ is an optimal dual solution for scenario $A$ with $u$ as the first-stage vector. Observe that $\mathrm{E}[X_S] = d_S$, so the vector with components $d_S$ is a subgradient at $u$ by Lemma 4.10. Since $X_S \in [-\lambda w_S, w_S]$ for each $S$, using Lemma 4.12 with error probability $\frac{\delta}{m}$, deviation $s = \frac{\omega w_S}{2}$, and $c = \lambda w_S$, we can estimate the expectation $d_S$ by a quantity $\hat d'_S$ using the claimed number of samples, so that for each $S$ individually, we have $|\hat d'_S - d_S| \le \frac{\omega w_S}{2}$ with probability at least $1 - \frac{\delta}{m}$. So the error probability over all sets is at most $\delta$. Setting $\hat d_S = \hat d'_S - \frac{\omega w_S}{2}$, we get $d_S - \omega w_S \le \hat d_S \le d_S$ for every $S$; so, by Lemma 4.9, the vector $\hat d$ is an $\omega$-subgradient at $u$ with probability at least $1-\delta$.
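Putting Lemma 4.10 and Corollary 4.13 together, the $\omega$-subgradient computation is just repeated sampling plus a per-sample dual solve. A minimal sketch (ours), reusing the hypothetical `scenario_cost` routine sketched in Section 3; for brevity it shows plain averaging, whereas the formal guarantee of Lemma 4.12 uses a median of group means:

    import numpy as np

    def omega_subgradient(u, w1, w2, cover, sample_scenario, num_samples, omega):
        """Estimate the subgradient of Lemma 4.10 at the point u by sampling.

        Each sampled scenario A contributes the vector with components
        X_S = w_S - sum_{e in A and e in S} z_{A,e}, whose expectation is d_S
        and which lies in [-lambda*w_S, w_S].  Averaging and shifting each
        component down by omega*w_S/2, as in Corollary 4.13, yields an
        omega-subgradient with high probability.
        """
        sets = list(cover)
        total = np.zeros(len(sets))
        for _ in range(num_samples):
            A = sample_scenario()
            _, z = scenario_cost(u, w2, cover, A)   # optimal duals z_{A,e}
            total += np.array([w1[S] - sum(z[e] for e in A if e in cover[S])
                               for S in sets])
        d_hat = total / num_samples - np.array([omega * w1[S] / 2 for S in sets])
        return dict(zip(sets, d_hat))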
Plugging in the time bound for computing $\omega$-subgradients (with a sufficiently small error probability) in Theorem 4.7, we get the following.

Lemma 4.14 Using the above procedure for computing $\omega$-subgradients, FindOpt finds a feasible solution $\bar x$ such that $h(\bar x) \le \frac{\mathrm{OPT}}{1-\gamma} + \epsilon$ with probability at least $1-\delta$, in time polynomial in the input size, $\lambda$, $\frac{1}{\gamma}$, $\ln\frac{1}{\epsilon}$, and $\ln\frac{1}{\delta}$.

Proof: Theorem 4.7 gives the performance guarantee and accounts for the time taken, excluding the time taken to compute $\omega$-subgradients. We need to show that, with high probability, every vector the algorithm computes is an $\omega$-subgradient, where $\omega = \Theta\big(\frac{\gamma}{NT}\big)$. The total number of times we need to compute an $\omega$-subgradient is at most $N(T+1)$. Setting the error probability of each computation to $\frac{\delta}{N(T+1)}$ in Corollary 4.13, we get that $O\big(\frac{\lambda^2}{\omega^2}\ln\frac{mN(T+1)}{\delta}\big)$ samples suffice to ensure that each individual vector computed is an $\omega$-subgradient with probability at least $1 - \frac{\delta}{N(T+1)}$. So the net error probability over all $\omega$-subgradient computations is at most $\delta$, and the total time taken is polynomial in the input size, $\lambda$, $\frac{1}{\gamma}$, $\ln\frac{1}{\epsilon}$, and $\ln\frac{1}{\delta}$.

Theorem 4.15 Procedure ConvOpt computes a feasible solution to (SSC-P2) of cost at most $(1+\kappa)\cdot\mathrm{OPT}$ with probability at least $1-2\delta$, in time polynomial in the input size, $\lambda$, $\frac{1}{\kappa}$, and $\ln\frac{1}{\delta}$.

Proof: By Lemma 4.8, we know that if ConvOpt calls FindOpt, then with probability at least $1-\delta$ we have $\mathrm{OPT} \ge \frac{\delta}{M}$, where $M = \lambda\ln\frac{1}{\delta}$. The performance guarantee and the time bound now follow from Lemma 4.14, since we set $\gamma = \frac{\kappa}{4}$ and $\epsilon = \frac{\kappa\delta}{2M} \le \frac{\kappa}{2}\,\mathrm{OPT}$: indeed, $\frac{\mathrm{OPT}}{1-\gamma} + \epsilon \le \big(1+\frac{\kappa}{2}\big)\mathrm{OPT} + \frac{\kappa}{2}\,\mathrm{OPT} = (1+\kappa)\,\mathrm{OPT}$. Since FindOpt may err with probability at most $\delta$, the net error probability is at most $2\delta$.

5 A General Class of Solvable Stochastic Programs

We now show that algorithm ConvOpt can be used to solve the following broad class of 2-stage stochastic programs:

\[ \min \;\; h(x) = w\cdot x + \sum_{A\in\mathcal{A}} p_A f_A(x) \quad \text{subject to} \quad x \in \mathcal{P} \subseteq \mathbb{R}^m_{\ge 0}, \tag{Stoc-P} \]

where

\[ f_A(x) = \min \;\; w^A\cdot r_A + q^A\cdot s_A \]
\[ \text{s.t.} \quad B^A s_A \;\ge\; g^A, \tag{2} \]
\[ \qquad\;\; D^A s_A + T^A r_A \;\ge\; j^A - T^A x, \tag{3} \]
\[ \qquad\;\; r_A \in \mathbb{R}^m_{\ge 0}, \; s_A \in \mathbb{R}^n_{\ge 0}. \]

Here $\mathcal{A}$ denotes the set of all possible scenarios, and $\mathcal{P}$ is the feasible-region polytope. We require that (a) $T^A \ge 0$ for every scenario $A$, and (b) $f_A(x) \ge 0$ at every point $x \in \mathcal{P}$, and that the primal and dual problems corresponding to $f_A(x)$ be feasible for every scenario $A$. A sufficient condition for (b) is to insist that the costs $w^A$ and $q^A$ be non-negative at every scenario. We can relax condition (a) somewhat and solve a more general class of programs that allow one to incorporate upper bounds on the second-stage decisions $r_A$, under certain conditions. We assume that $0$ lies in $\mathcal{P}$, since we would like to be able to express the option where one does nothing in the first stage and defers all decisions to stage II.

The essential property of this class of programs is that constraints (3) have the same matrix $T^A$ multiplying both the recourse vector $r_A$ and the first-stage vector $x$ in every scenario $A$. This implies that the stage I decisions, given by the vector $x$, and the stage II decisions for scenario $A$, given by the vector $r_A$, act in the same capacity. All of the stochastic optimization problems we will consider can be expressed as convex programs in the above form, and one can therefore obtain a near-optimal fractional solution for each of these problems in polynomial time. Observe that this class of stochastic programs is rich enough to model stochastic problems with scenario-dependent recourse (that is, stage II) costs. To prevent an exponential blowup in the input, we consider an oracle model where an oracle supplied with scenario $A$ reveals the scenario-dependent data $(w^A, q^A, B^A, D^A, T^A, g^A, j^A)$; procedure ConvOpt will need to query this oracle only a polynomial number of times.
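For reference in the proof of Lemma 5.1 below, the scenario subproblem and its LP dual, written out under the notation and sign conventions above (our transcription), are:

\[ f_A(x) \;=\; \min\big\{\, w^A\cdot r_A + q^A\cdot s_A \;:\; B^A s_A \ge g^A,\;\; D^A s_A + T^A r_A \ge j^A - T^A x,\;\; r_A, s_A \ge 0 \,\big\} \]
\[ \phantom{f_A(x)} \;=\; \max\big\{\, g^A\cdot\beta_A + (j^A - T^A x)\cdot z_A \;:\; (B^A)^{\mathsf{T}}\beta_A + (D^A)^{\mathsf{T}} z_A \le q^A,\;\; (T^A)^{\mathsf{T}} z_A \le w^A,\;\; \beta_A, z_A \ge 0 \,\big\}. \]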
We now show that the analysis in Section 4.2 can be extended to show that algorithm ConvOpt computes a near-optimal solution to (Stoc-P). Let $h(x)$ denote the objective function, which is easily shown to be convex. Define $\lambda = \max\big(1, \max_{A,S}\frac{w^A_S}{w_S}\big)$. As before, we assume that the algorithm knows the value of $\lambda$ (or an upper bound on it). To extend the analysis in Section 4.2, we need to show the following three things: (1) one can compute an $\omega$-subgradient in polynomial time; (2) the Lipschitz constant $K$ can be set so that $\ln K$ is polynomially bounded; and (3) one can detect with high probability that $\mathrm{OPT}$ is large. The third requirement is easily handled by Lemma 4.8. Under the mild assumption that at any $x \in \mathcal{P}$, every "non-null" scenario¹ incurs a total cost of at least 1, Lemma 4.8 holds, and shows that by sampling $M = \lambda\ln\frac{1}{\delta}$ times, one can determine, with probability at least $1-\delta$, whether $\mathrm{OPT} \ge \frac{\delta}{M}$, or whether $x = 0$ is an optimal solution. To show (1) and (2), we proceed exactly as in Section 4.2: we show that at any point, there is a subgradient with a nice structure and bounded norm. By approximating this subgradient component-wise, one obtains an $\omega$-subgradient, and the bound on the norm gives a bound on the Lipschitz constant. The following lemma shows that the components of the subgradient vector have variation that is bounded multiplicatively by $\lambda$, so that one can use the sampling lemma, Lemma 4.12, to compute an $\omega$-subgradient with high probability by repeated sampling.

¹The meaning of a non-null scenario will be intuitively clear from the application; for concreteness, we could define a null scenario as a scenario $A$ for which $f_A(x) = 0$ for every $x \in \mathcal{P}$.

Lemma 5.1 Consider any point $u \in \mathcal{P}$, and let $(\beta^*_A, z^*_A)$ be an optimal dual solution for scenario $A$ with $u$ as the stage I vector, where $z^*_A$ is the vector of dual multipliers corresponding to inequalities (3). Then: (i) the vector $d = w - \sum_A p_A (T^A)^{\mathsf{T}} z^*_A$ is a subgradient of $h$ at $u$; (ii) $-\lambda w \le d \le w$ component-wise, so $\|d\| \le \lambda\|w\|$; and (iii) if $\hat d$ is a vector such that $d_S - \omega w_S \le \hat d_S \le d_S$ for all $S$, then $\hat d$ is an $\omega$-subgradient of $h$ at $u$.

Proof: By taking the dual (see the display above), we can write $f_A(u) = g^A\cdot\beta^*_A + (j^A - T^A u)\cdot z^*_A$. For any other point $v \in \mathcal{P}$, $(\beta^*_A, z^*_A)$ is a feasible dual solution for scenario $A$, given the stage I vector $v$. So $f_A(v) \ge g^A\cdot\beta^*_A + (j^A - T^A v)\cdot z^*_A$, and we have $h(v) \ge w\cdot v + \sum_A p_A\big(g^A\cdot\beta^*_A + (j^A - T^A v)\cdot z^*_A\big)$. As $(T^A v)\cdot z^*_A$ is a scalar, we can replace it by its transpose $v^{\mathsf{T}}(T^A)^{\mathsf{T}} z^*_A$. Substituting above, and combining the terms with $v$, we get that

\[ h(v) \;\ge\; \Big(w - \sum_A p_A (T^A)^{\mathsf{T}} z^*_A\Big)\cdot v + \sum_A p_A\big(g^A\cdot\beta^*_A + j^A\cdot z^*_A\big). \tag{4} \]

Similarly, we have $h(u) = \big(w - \sum_A p_A (T^A)^{\mathsf{T}} z^*_A\big)\cdot u + \sum_A p_A\big(g^A\cdot\beta^*_A + j^A\cdot z^*_A\big)$, with equality since the duals are optimal for $u$. Subtracting, we get that $h(v) - h(u) \ge d\cdot(v-u)$, where $d = w - \sum_A p_A (T^A)^{\mathsf{T}} z^*_A$, showing that $d$ is a subgradient at $u$. For every scenario $A$ we have $(T^A)^{\mathsf{T}} z^*_A \ge 0$, since $T^A \ge 0$ and $z^*_A \ge 0$; so $d \le w$. Observe that the dual of the scenario (primal) optimization problem has the constraint $(T^A)^{\mathsf{T}} z_A \le w^A$. Since $(\beta^*_A, z^*_A)$ is a feasible dual solution, we have $(T^A)^{\mathsf{T}} z^*_A \le w^A$, and since $w^A \le \lambda w$, this shows that $\sum_A p_A (T^A)^{\mathsf{T}} z^*_A \le \lambda w$. So we get that $-\lambda w \le d \le w$. Now, by Claim 4.11 (which holds regardless of the function $h$), one can set $K = \lambda\|w\|$, so that $\ln K$ is polynomially bounded. Finally, to show part (iii), we proceed exactly as in Lemma 4.9: $h(v) - h(u) \ge d\cdot(v-u) = \hat d\cdot(v-u) + (d - \hat d)\cdot(v-u)$. Since $0 \le d - \hat d \le \omega w$ and $v \ge 0$, we have $(d-\hat d)\cdot(v-u) \ge -(d-\hat d)\cdot u \ge -\omega\,w\cdot u \ge -\omega h(u)$, where the last inequality follows since $w\cdot u \le h(u)$.

So, as before, using Lemma 4.12, one can compute an $\omega$-subgradient at any point using $O\big(\frac{\lambda^2}{\omega^2}\ln\frac{m}{\delta}\big)$ samples.

Theorem 5.2 Procedure ConvOpt can be used to obtain a feasible solution to (Stoc-P) of objective function value at most $(1+\kappa)\cdot\mathrm{OPT}$ with probability at least $1-2\delta$, in time polynomial in the input size, $\lambda$, $\frac{1}{\kappa}$, and $\ln\frac{1}{\delta}$.

5.1 2-Stage Programs with a Continuous Distribution

We now consider the class of 2-stage programs specified by (Stoc-P) where the second-stage scenario is specified by a parameter $\xi$ that is continuously distributed with probability density function $g(\xi)$, and show that procedure ConvOpt can be used to obtain a $(1+\kappa)$-optimal solution to this class of programs. Our objective function is $h(x) = w\cdot x + \mathrm{E}_\xi[f_\xi(x)]$, where $\mathrm{E}_\xi[f_\xi(x)] = \int f_\xi(x)\,g(\xi)\,d\xi$ and $f_\xi(x)$ is the cost of scenario $\xi$, determined by the minimization problem in (Stoc-P) with parameters $w^\xi$, $q^\xi$, $B^\xi$, $D^\xi$, $T^\xi$, $g^\xi$, and $j^\xi$. As before, we assume that at every feasible point $x$ and scenario $\xi$, (a) $T^\xi \ge 0$, (b) $f_\xi(x) \ge 0$, and that the primal and dual problems corresponding to $f_\xi(x)$ are feasible.

Notice that the proof of Lemma 5.1 (and hence that of Theorem 5.2) does not rely on the fact that the probability distribution is discrete. Consequently, the statement and proof of Lemma 5.1 extend easily to the continuous setting by replacing each occurrence of the summation $\sum_A p_A(\cdot)$ by the integral $\int g(\xi)(\cdot)\,d\xi$. At any point $u \in \mathcal{P}$, if $(\beta^*_\xi, z^*_\xi)$ is an optimal solution for the dual problem corresponding to $f_\xi(u)$, with $z^*_\xi$ being the dual multipliers for inequalities (3), then the vector $d = w - \int g(\xi)\,(T^\xi)^{\mathsf{T}} z^*_\xi\,d\xi$ is a subgradient at $u$, and parts (ii) and (iii) of Lemma 5.1 hold as is, where we define $\lambda$ to be $\max\big(1, \sup_{\xi,S}\frac{w^\xi_S}{w_S}\big)$, which we assume is known. One thus obtains a bound on the Lipschitz constant, and the fact that $\omega$-subgradients can be computed by sampling. Finally, under the assumption that at every $x \in \mathcal{P}$ and every $\xi$, either $f_\xi(x) = 0$ or $w\cdot x + f_\xi(x) \ge 1$, we can detect that $\mathrm{OPT}$ is large using Lemma 4.8.

Theorem 5.3 Procedure ConvOpt returns a feasible solution to (Stoc-P) with a continuous distribution, of value at most $(1+\kappa)\cdot\mathrm{OPT}$, in polynomial time.
6 Applications

We give a number of applications for which we prove the first known performance guarantees in the black-box model without any restrictions on the costs in the two stages. Our guarantees generalize and, in many cases, improve upon previous results that were obtained by placing restrictions either on the underlying distribution or on the costs of the two stages.

6.1 Multicommodity Flow Problems

We consider a stochastic version of the concurrent multicommodity flow problem where we are given a set of $k$ commodities represented by source-sink pairs $(s_1, t_1), \dots, (s_k, t_k)$, and we need to buy capacity to install on the edges so that in every scenario $A$, one can concurrently ship $d^A_i$ units of each commodity $i$ from its source $s_i$ to its sink $t_i$, where the scenario $A = (d^A_1, \dots, d^A_k)$ is generated by some probability distribution. We can either purchase capacity on an edge $e$ in stage I, paying a lower price $c_e$, or wait until the exact demands are known and buy capacity at a higher price $c^A_e$; the total amount of capacity that we can install on an edge $e$ is limited by $u_e$. The goal is to minimize the total (expected) cost of installing capacity. Let $c$ and $c^A$ denote the cost vectors for buying capacity in stage I and in scenario $A$ in stage II. The 2-stage stochastic multicommodity flow problem can be formulated as follows:

\[ \min \;\; \sum_e c_e x_e + \sum_{A\in\mathcal{A}} p_A f_A(x) \quad \text{subject to} \quad 0 \le x_e \le u_e \ \text{ for each edge } e \]

($\mathcal{A}$ is the set of all scenarios), where $f_A(x)$ is the minimum value of $\sum_e c^A_e r_e$ subject to the constraints that, for each commodity $i$, the total flow routed from $s_i$ to $t_i$ is at least $d^A_i$; for each edge $e$, the total flow on $e$ is at most $x_e + r_e$, and also at most $u_e$ (this ensures that $x_e + r_e \le u_e$); and all flow and capacity variables are non-negative.

Immorlica et al. [14] considered the single-commodity version of this problem and gave an algorithm based on writing an LP that enumerates all scenarios, one for each possible demand value, and solving the LP to compute the optimal first-stage decisions. Consequently, their running time depends on the maximum demand that may be realized. This approach suffers from the "curse of dimensionality" and does not work well in the multicommodity setting, since even if the maximum demand is 1 (i.e., a scenario specifies a set of source-sink pairs that have to be connected to each other), there are still an exponential number of scenarios to enumerate. Note that there are no integrality constraints, that is, one can install fractional amounts of capacity. Since we have formulated the problem as a convex program of the type handled by Theorem 5.2, our fully polynomial approximation scheme can be applied. Whereas our running time depends on $\lambda$, the maximum ratio of stage II and stage I costs, it does not depend on the maximum demand.

Theorem 6.1 For any $\epsilon > 0$, the stochastic concurrent multicommodity flow problem can be approximated to within a factor of $(1+\epsilon)$ in polynomial time.

The algorithm in Figure 1 can be applied to solve other stochastic multicommodity flow variants in polynomial time as well. For example, one could consider a maximum multicommodity flow variant, where a scenario specifies a set of active pairs and we want to install capacity so as to ensure that the total flow routed between the active pairs is at least some threshold $D$. The only change here is that, in the convex program, the scenario minimization problem contains a constraint stating that the net flow routed between the active pairs in $A$ is at least $D$, which replaces the concurrent flow constraints. This convex program can be solved to near-optimality, yielding a $(1+\epsilon)$-approximation algorithm for the problem, for any $\epsilon > 0$.
6.2 Covering problems

Vertex cover. The stochastic vertex cover problem is a special case of the stochastic set cover problem where we want to cover the edges of a graph by vertices. The edge set (i.e., scenario) $A$ to be covered is chosen from a probability distribution and is revealed only in stage II; we may choose a vertex $v$ either in stage I, paying a cost of $w_v$, or in stage II, at a cost of $w^A_v$ in scenario $A$. The previous results known for this problem were an 8-approximation algorithm in the black-box model and a 3-approximation algorithm in the setting where each edge is independently activated, both under the restriction that $w^A_v = \lambda w_v$ for each $v$ and scenario $A$, due to Gupta et al. [13]; Ravi and Sinha [22] gave a 2-approximation algorithm when there are only polynomially many scenarios (but the second-stage costs may be scenario dependent). Since the stochastic vertex cover problem is a special case of the stochastic set cover problem, and the deterministic vertex cover LP is known to have an integrality gap of 2, by Corollary 2.2, we obtain, for any $\epsilon > 0$, a $(4+\epsilon)$-approximation algorithm for the stochastic version with black-box probability distributions and scenario-dependent second-stage costs. This is the first approximation algorithm in this more general model with black-box probability distributions.

Theorem 6.2 For any $\epsilon > 0$, there is a $(4+\epsilon)$-approximation algorithm for the stochastic vertex cover problem with arbitrary probability distributions and scenario-dependent stage II costs.

Minimum multicut problem on trees. In the deterministic minimum multicut problem on trees, we are given a tree with costs on the edges, and $k$ pairs of vertices $(s_1, t_1), \dots, (s_k, t_k)$. The goal is to remove a minimum-cost set of edges so as to disconnect each pair. In the stochastic variant, the pairs to be disconnected are revealed only in the second stage, and we can choose either to "cut" an edge $e$ in stage I or in stage II, paying a cost of $w_e$ or $w^A_e$ in scenario $A$, respectively. The multicut problem on trees is an instance of the set-cover problem, where we want to cover the (unique) $s_i$-$t_i$ path of each pair by picking an edge on it. Garg, Vazirani & Yannakakis [11] gave a primal-dual 2-approximation algorithm for the deterministic problem, which showed that the natural covering LP relaxation of this problem has an integrality gap of 2. Using their 2-approximation algorithm, and applying Corollary 2.2, we get the following result.

Theorem 6.3 For any $\epsilon > 0$, there is a $(4+\epsilon)$-approximation algorithm for the stochastic minimum multicut problem on trees.

General covering problems. Kolliopoulos and Young [17] consider general deterministic covering problems with multiplicity constraints, of the form $\min\{w\cdot x : Ax \ge b,\; x \le d,\; x \in \mathbb{Z}^m_{\ge 0}\}$, where the entries of $A$ and $b$ are all non-negative. An example of such a problem is the multiset multicover problem with multiplicity constraints, which is an extension of the set-cover problem where each element $e$ is required to be covered $b_e$ times by the chosen sets; a set $S$ can cover element $e$ up to $A_{eS}$ times and may be chosen at most $d_S$ times. Kolliopoulos and Young give bicriteria approximation algorithms for these problems. We obtain generalizations of their results for the corresponding stochastic covering problems, where copies of a set $S$ may be purchased either in stage I or in stage II, at a price of $w_S$ in stage I or $w^A_S$ in scenario $A$ in stage II. We remark that, due to the multiplicity constraints, the convex programming relaxation of the stochastic problem is not of the form (SSC-P2) (or, more generally, (Stoc-P)), since the constraint matrix has negative entries; nevertheless, one can show that algorithm ConvOpt in Figure 1 can be used to obtain a $(1+\epsilon)$-optimal solution, which can then be rounded using the procedure detailed in Theorem 2.1.
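To make the recipe of Corollary 2.2 concrete for the simplest case above — vertex cover, where the deterministic LP rounds by picking every vertex of fractional value at least $\frac12$ — the two halving thresholds compose as follows (a sketch, ours):

    def stochastic_vc_stage1(x):
        """Stage I for stochastic vertex cover: an edge (u,v) with
        x[u] + x[v] >= 1/2 is handled now; 2x is a fractional cover for such
        edges, and thresholding 2x at 1/2 -- i.e. x at 1/4 -- rounds it.
        Each chosen vertex v has 4*x[v] >= 1, giving the factor 4."""
        return {v for v, val in x.items() if val >= 0.25}

    def stochastic_vc_stage2(x, r_A, edges_A):
        """Stage II in scenario A: every remaining edge of A is covered to
        extent >= 1/2 by the recourse variables r_A, so the same threshold
        applied to r_A works."""
        leftover = [(u, v) for (u, v) in edges_A if x[u] + x[v] < 0.5]
        return {node for e in leftover for node in e if r_A[node] >= 0.25}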
6.3 Facility location problems

In the deterministic uncapacitated facility location (DUFL) problem, given a set of candidate facility locations $\mathcal{F}$ and a set of clients $\mathcal{D}$, we want to open facilities at a subset of the locations in $\mathcal{F}$ and assign each client to an open facility. Opening a facility at location $i$ incurs a cost of $f_i$, and the cost of assigning client $j$ to facility $i$ is $d_j c_{ij}$, where $d_j$ is the demand of client $j$, $c_{ij}$ is the distance between $i$ and $j$, and the distances form a metric. The goal is to minimize the total facility opening and client assignment costs. In the deterministic problem, one assumes that the client demands are precisely known in advance; the 2-stage stochastic uncapacitated facility location (SUFL) problem handles settings where there is uncertainty in the demand, for example, due to macro-economic factors such as competition, technology, customer purchasing power, etc. We are given a probability distribution on demand tuples $(d_j)_{j\in\mathcal{D}}$, where $d_j \in [0, D]$ specifies the demand of client $j$ and $D$ is some known upper bound on the demand. We can open some facilities in stage I, paying a cost of $f_i$ for opening facility $i$; then the actual scenario $A$ with demands $(d^A_j)_j$ is revealed, and we may choose to open some more facilities in stage II, incurring a cost of $f^A_i$ for each facility $i$ that we open in scenario $A$. As indicated by the notation, the recourse costs may in general be scenario-dependent.

For the special case where $f^A_i = \lambda f_i$ for each $i \in \mathcal{F}$ and each scenario $A$, Gupta et al. [13] gave an 8.45-approximation algorithm in the black-box model, and a 6-approximation algorithm in the setting where each client is activated independently. Ravi and Sinha [22] gave an LP-rounding based 8-approximation algorithm for the polynomial scenarios setting that can handle scenario-dependent facility opening and client assignment costs, where the assignment cost in scenario $A$ is $c^A_{ij}$ for all $i, j$. Their rounding algorithm needs to know the optimal fractional solution for each stage II scenario, which renders it unsuitable when there are exponentially many scenarios. We improve upon all of these results. We consider a convex programming relaxation of the problem and give a different rounding approach that decides which facilities to open in stage I based only on the stage I fractional solution. Combined with our algorithm to solve the convex program, this yields a 3.225-approximation algorithm in the black-box model with scenario-dependent costs. One can write the following convex program for SUFL. We use $i$ to index the facilities in $\mathcal{F}$, $j$ to index the clients in $\mathcal{D}$, and $\mathcal{A}$ to denote the set of all possible scenarios:

\[ \min \;\; \sum_i f_i y_i + \sum_{A\in\mathcal{A}} p_A\, g_A(y) \quad \text{subject to} \quad 0 \le y_i \le 1 \ \text{ for all } i, \tag{SUFL-P} \]

where

\[ g_A(y) \;=\; \min \;\; \sum_i f^A_i y_{A,i} + \sum_j d^A_j \sum_i c_{ij} x_{A,ij} \]
\[ \text{s.t.} \quad \sum_i x_{A,ij} \ge 1 \ \text{ for all } j \text{ such that } d^A_j > 0; \qquad x_{A,ij} \le y_i + y_{A,i} \ \text{ for all } i \in \mathcal{F} \text{ and } j; \qquad x_{A,ij},\, y_{A,i} \ge 0 \ \text{ for all } i, j. \]

Here $y_i$ indicates if facility $i$ is opened in stage I, and $y_{A,i}$ indicates if facility $i$ is opened in the stage II scenario $A$. The variables $x_{A,ij}$ are the usual assignment variables, indicating whether client $j$ is assigned to facility $i$. The minimization problem for a scenario $A$ determines the cost $g_A(y)$ incurred for scenario $A$, and has constraints that enforce that each client with positive demand has to be assigned to a facility that is opened either in stage I or in scenario $A$. The term $\sum_A p_A g_A(y)$ is therefore the expected second-stage cost. Observe that (SUFL-P) lies in the class of 2-stage stochastic programs handled by Theorem 5.2, and therefore one can use the algorithm in Figure 1 to obtain a solution of cost at most $(1+\epsilon)$ times the optimum, in time polynomial in the input size, $\lambda$, and $\frac{1}{\epsilon}$. Let $\rho$ denote the integrality gap of the deterministic problem, which is bounded by 1.52 [20].
Theorem 6.4 For any $\epsilon > 0$, there is a $(3.225+\epsilon)$-approximation algorithm for SUFL based on rounding a near-optimal solution to (SUFL-P). Moreover, the integrality gap of (SUFL-P) is at most $2\rho \le 3.04$. These results hold even with scenario-dependent assignment costs $c_{ij}^A$.

Proof: For notational simplicity, we shall focus on the case in which the demands in any scenario are either 0 or 1, that is, a scenario is now just a subset of the clients that need to be assigned to facilities. The extension to the setting with arbitrary demands requires only cosmetic notational changes. We first show that the integrality gap of (SUFL-P) is at most $2\rho$. The proof is along the lines of the proof of Theorem 2.1. Let $y$ be an optimal solution to (SUFL-P), let $(y_A, x_A)$ be the optimal solution for scenario $A$ given the first-stage decision vector $y$, and let OPT be the optimal solution value. We will show that we can decouple the first-stage and second-stage decisions, so that one can get an integer solution by separately solving a DUFL problem for stage I and a DUFL problem for each stage II scenario.

Fix a scenario $A$ and a client $j \in A$. We write $x_{A,ij} = x^{\mathrm{I}}_{A,ij} + x^{\mathrm{II}}_{A,ij}$, where $x^{\mathrm{I}}_{A,ij} \le y_i$ and $x^{\mathrm{II}}_{A,ij} \le y_{A,i}$. Since $x_{A,ij} \le y_i + y_{A,i}$, we can always split $x_{A,ij}$ in the above way. Observe that $j$ must be assigned to an extent of at least $\frac12$ either by the assignment $x^{\mathrm{I}}_{A,j}$ or by the assignment $x^{\mathrm{II}}_{A,j}$, that is, either $\sum_i x^{\mathrm{I}}_{A,ij} \ge \frac12$ or $\sum_i x^{\mathrm{II}}_{A,ij} \ge \frac12$. In the former case, we will assign $j$ to a facility opened in stage I, and in the latter case we will assign $j$ to a facility opened in stage II. More precisely, for any client $j$, consider the set of scenarios $\mathcal{A}_j = \{A \ni j : \sum_i x^{\mathrm{I}}_{A,ij} \ge \frac12\}$. For our stage I decisions, we shall construct a feasible fractional solution for a DUFL instance in which the facility costs are $f_i$, the assignment costs are $c_{ij}$, and each client $j$ has a demand equal to $\sum_{A \in \mathcal{A}_j} p_A$; we then round this fractional solution to an integer solution using known algorithms for DUFL. In fact, we first construct a feasible solution in which there is a client $(j, A)$ for each scenario $A \in \mathcal{A}_j$, with demand $p_A$, and then coalesce these scenario-dependent clients into one. Consider $(j, A)$ such that $A \in \mathcal{A}_j$. We can obtain a feasible solution by setting $\hat{y}_i = 2y_i$ and $\hat{x}_{(j,A),i} = 2x^{\mathrm{I}}_{A,ij}$ for each $i \in \mathcal{F}$. (Note that a client may be assigned to an extent greater than 1.) However, the facility variables $\hat{y}_i$ do not depend on the scenario, and given the $\hat{y}_i$ values, we can re-optimize the fractional assignment for each client $(j, A)$: first reset $\hat{x}_{(j,A),i} = 0$, and then, considering the facilities in non-decreasing order of the assignment cost $c_{ij}$, set $\hat{x}_{(j,A),i} = \hat{y}_i$ for each facility $i$ until $j$ is assigned to an extent of 1. Note that this new fractional assignment is completely determined by the values of the fractional facility variables and does not depend on $A$, and so we can now view all of these $(j, A)$ clients as one client $j$ with demand $\sum_{A \in \mathcal{A}_j} p_A$. The facility cost of this fractional solution is at most $2\sum_i f_i y_i$, and the assignment cost is no more than the one for the scenario-dependent clients, $2\sum_j \sum_{A \in \mathcal{A}_j} p_A \sum_i c_{ij} x^{\mathrm{I}}_{A,ij}$. Using the fact that the integrality gap of DUFL is $\rho$, given this DUFL instance with a fractional solution, we can now obtain an integer solution of cost at most $\rho$ times as much; this determines the set of facilities to open in stage I, and, for each client $j$, takes care of the scenarios in $\mathcal{A}_j$.

In any scenario $A$, each client $j \in A$ such that $A \in \mathcal{A}_j$ is assigned to the stage I facility given by this integer assignment. To assign the remaining clients, we solve a DUFL instance with client set $\{j \in A : A \notin \mathcal{A}_j\}$. Since $A \notin \mathcal{A}_j$, we have that $\sum_i x^{\mathrm{II}}_{A,ij} \ge \frac12$, and hence if we reset $\hat{y}_{A,i} = 2y_{A,i}$ and $\hat{x}_{A,ij} = 2x^{\mathrm{II}}_{A,ij}$ for each $i \in \mathcal{F}$, we get a feasible fractional solution for this set of clients. Again, we can get an integer solution of cost at most $\rho$ times the cost of this fractional solution. This solution tells us which facilities to open in scenario $A$ and how to assign the clients $j \in A$ with $A \notin \mathcal{A}_j$.
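The re-optimization step used above to coalesce the scenario-dependent clients is just a greedy scan over facilities; a minimal sketch (ours, with demands normalized to 1):

```python
def greedy_assignment(y_hat, costs):
    """Re-optimized fractional assignment for one client, as in the proof:
    scan facilities in non-decreasing order of assignment cost and assign the
    client to extent y_hat[i] at each, until fully (fractionally) assigned.
    Feasibility needs sum(y_hat) >= 1, which the doubled solution guarantees.
    costs[i] is this client's assignment cost to facility i."""
    x = [0.0] * len(y_hat)
    remaining = 1.0
    for i in sorted(range(len(y_hat)), key=lambda i: costs[i]):
        x[i] = min(y_hat[i], remaining)
        remaining -= x[i]
        if remaining <= 0:
            break
    return x  # depends only on y_hat, so the scenario-copies of a client coalesce
```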
Hence, the overall cost of the solution with the above first-stage facilities is at most $2\rho \cdot \mathrm{OPT}$, which implies that the integrality gap is at most $2\rho$.

To obtain the approximation algorithm, we first obtain a near-optimal solution $y$ in polynomial time. The difficulty in converting the proof of the integrality gap into a rounding algorithm is that the algorithm of [20] showing that $\rho \le 1.52$ requires knowledge of the client demands, whereas here we do not know the demand $\sum_{A \in \mathcal{A}_j} p_A$ of a client $j$, and might not even be able to estimate it by sampling, since this probability could be extremely small. We therefore need an approximation algorithm for DUFL that works without explicit knowledge of the client demands. Swamy [27] (see Section 2.4) gives an algorithm with this property, improving upon the algorithm of [26] (which also has this property); the algorithm converts any fractional solution to an integer solution while increasing the cost by a factor of at most 1.705. We use this algorithm to obtain the approximation algorithm. We modify the definition of $\mathcal{A}_j$ slightly, so as to balance the contributions from stages I and II. Let $\theta = \frac{1.705}{1.705 + 1.52}$, and let $\mathcal{A}_j = \{A \ni j : \sum_i x^{\mathrm{I}}_{A,ij} \ge \theta\}$. So now we have a fractional solution in which we set $\hat{y}_i = \min(1, y_i/\theta)$, and, using the re-optimization procedure described earlier, we can find the optimal fractional assignment corresponding to the $\hat{y}_i$ values. We round this using the algorithm of Swamy [27] to get a solution of cost at most $\frac{1.705}{\theta}\bigl(\sum_i f_i y_i + \sum_j \sum_{A \in \mathcal{A}_j} p_A \sum_i c_{ij} x^{\mathrm{I}}_{A,ij}\bigr)$. This determines the facilities to open in stage I. In any scenario $A$, each client $j \in A$ such that $A \in \mathcal{A}_j$ is taken care of by a stage I facility. Next, we determine which facilities to open in scenario $A$, and how to assign the remaining clients, by constructing a feasible fractional solution for a deterministic subproblem with client set $\{j \in A : A \notin \mathcal{A}_j\}$ and "rounding" this solution. We set $\hat{y}_{A,i} = \min(1, y_{A,i}/(1-\theta))$ and, for each client $j \in A$ such that $A \notin \mathcal{A}_j$, set $\hat{x}_{A,ij} = \min(1, x^{\mathrm{II}}_{A,ij}/(1-\theta))$. We "round" this solution using the algorithm of Mahdian et al. [20] (which is not an LP-rounding algorithm), since the issue of unknown demands does not arise in this stage, to get an integer solution of cost at most $\frac{1.52}{1-\theta}\bigl(\sum_i f_i^A y_{A,i} + \sum_{j: A \notin \mathcal{A}_j} \sum_i c_{ij} x^{\mathrm{II}}_{A,ij}\bigr)$. So the total cost incurred, if we open the facilities chosen above in stage I, is at most $(3.225+\epsilon) \cdot \mathrm{OPT}$.

The algorithm remains essentially unchanged if we have arbitrary demands $d_j^A$ and/or scenario-dependent assignment costs $c_{ij}^A$. The only change in the analysis is that, in the feasible fractional solution we exhibit to bound the cost of the stage I decisions computed, we set the demand of a client $j$ equal to $\sum_{A \in \mathcal{A}_j} p_A d_j^A$, and in the fractional solution constructed for a stage II scenario $A$, each client $j$ such that $A \notin \mathcal{A}_j$ has demand $d_j^A$.
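To recover the constant claimed in Theorem 6.4: the threshold $\theta$ is chosen so that the two rounding losses coincide. A one-line calculation (ours, following the proof):

\[
\frac{1.705}{\theta} \;=\; \frac{1.52}{1-\theta}
\;\Longleftrightarrow\;
\theta \;=\; \frac{1.705}{1.705+1.52} \;\approx\; 0.5287,
\qquad\text{giving}\qquad
\frac{1.705}{\theta} \;=\; 1.705 + 1.52 \;=\; 3.225 .
\]

Both the stage I and the per-scenario stage II bounds then carry the same factor 3.225; the $\epsilon$ in the guarantee comes from solving (SUFL-P) only near-optimally.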
Remark 6.5 It is possible to prove an integrality gap of at most 3 by adapting the primal-dual algorithm of Jain & Vazirani [15]. This was also observed by Mahdian [19] and Devanur (personal communication). But this approach requires explicit knowledge of the probability $p_A$ of every scenario $A$, and it seems difficult to obtain a polynomial-time algorithm this way.

Extensions. Our approach yields constant-factor approximation algorithms for various other stochastic facility location problems. In each case, we solve the relaxation of the stochastic integer program using the algorithm in Figure 1, and round the near-optimal solution by using a rounding algorithm for the deterministic problem in conjunction with a variant of the rounding procedure detailed above. We obtain constant performance guarantees for the stochastic versions of the facility location problem with penalties, or soft capacities, or service installation costs. The details may be found in [27].

7 The dependence of the running time on λ

We have remarked previously that the running time of our algorithm (and that of Gupta et al. [13]) depends on the parameter $\lambda$, the maximum ratio between the costs in the two stages. We first argue that in the black-box model this dependence is necessary, and then give stronger conditions on the way in which the distribution is specified that allow this dependence to be avoided.

We show the lower bound by considering a rather simple instance of SSC with a single set and a single element. The example exposes the inherent limitation of using a black box to infer knowledge about the probability distribution on scenarios; it is straightforward to generalize the example to construct lower-bound instances for other stochastic problems. Consider an instance of SSC with universe $U = \{e\}$ and just one set $S = \{e\}$, where $w_S = 1$ and $w_S^A = \lambda$. Let $p$ denote the probability that scenario $\{e\}$ occurs (which is unknown to the algorithm that samples from the distribution on scenarios). The only decision here is whether to buy set $S$ in stage I or to defer buying the set to stage II. Let $\mathcal{A}$ denote an algorithm that draws exactly $k$ samples, and let OPT denote the value of the integer optimum solution.

Theorem 7.1 If $\mathcal{A}$ returns a (fractional) solution of cost at most $c \cdot \mathrm{OPT}$ with probability at least $1-\delta$, where $c \ge 1$ and $\delta < \frac12$, then it must be that $k = \Omega(\lambda/c)$. The bound applies even if $\mathcal{A}$ is only required to return a fractional solution of cost at most $c \cdot \mathrm{OPT}$.

Proof: We may assume $\lambda \ge 4c$, since otherwise the bound is trivial. Let $X$ be a random variable that denotes the number of times scenario $\{e\}$ occurs in the $k$ samples. First suppose $p = 0$, so that $\mathrm{OPT} = 0$ and $X = 0$ with probability 1. If, conditioned on $X = 0$, $\mathcal{A}$ picks a non-zero fraction of set $S$ in stage I with probability more than $\delta$ (the algorithm may flip coins), then it incurs a non-zero cost, that is, a cost greater than $c \cdot \mathrm{OPT} = 0$, with probability more than $\delta$. So, conditioned on $X = 0$, $\mathcal{A}$ must choose to defer to stage II, that is, return the integer solution $x_S = 0$, with probability at least $1-\delta$. Now set $p = \frac{2c}{\lambda}$, so that the solution that defers has cost $p\lambda = 2c$, whereas the optimal solution is to pick $S$ in stage I and incur a cost of 1; thus a deferring solution costs more than $c \cdot \mathrm{OPT}$. Since the behavior of $\mathcal{A}$ conditioned on $X = 0$ cannot depend on $p$, with probability at least $(1-p)^k(1-\delta)$ the algorithm sees no occurrence of the scenario, defers, and incurs a cost of $2c > c \cdot \mathrm{OPT}$. Therefore, for $\mathcal{A}$ to satisfy the required performance guarantee, we must have $(1-p)^k(1-\delta) \le \delta$, that is, $k \ge \ln\bigl(\frac{1-\delta}{\delta}\bigr)\big/\ln\bigl(\frac{1}{1-p}\bigr)$; since $\ln\frac{1}{1-p} \le \frac{p}{1-p} \le 2p$ for $p \le \frac12$, this implies that $k \ge \frac{\lambda}{4c}\ln\frac{1-\delta}{\delta} = \Omega(\lambda/c)$.

Corollary 7.2 If algorithm $\mathcal{A}$ returns a (fractional) solution of expected cost at most $c \cdot \mathrm{OPT}$, then it must be that $k = \Omega(\lambda/c)$.

Proof: By Markov's inequality, $\mathcal{A}$ returns a solution of cost at most $3c \cdot \mathrm{OPT}$ with probability at least $\frac23$. The claim now follows from Theorem 7.1 (applied with $\delta = \frac13$).

Now suppose that, for the stochastic set cover problem, for every element $e$, (a) we know its activation probability $q_e = \Pr[e \in A]$, and (b) we can sample scenarios conditioned on the fact that $e$ is activated, that is, scenarios from $\{A : e \in A\}$, with scenario $A$ generated with probability $p_A/q_e$. It is easy to see that if the elements have independent probabilities of being activated (as considered in [14, 13]), then these conditions are satisfied. More generally, consider the subclass of problems mentioned in Section 5 where, for every scenario $A$, the second-stage cost parameters and the constraint matrix $T$ do not depend on the scenario, and only the right-hand side $b^A$ varies with the scenario. Fix an indexing of the rows of $T$ (and of $b^A$). The columns of $T$ play the role of sets in the set cover problem, while the rows play the role of elements, and, following the notation used in the set cover problem, we will use $S$ to index the columns of $T$ (and the components of $x$), and $e$ to index the rows of $T$ (and the components of $b^A$ and of the dual vector $z_A$). We also require that in every scenario $A$, the component $b_e^A$ is either 0 or a fixed quantity $b_e$. Suppose that (a) we know the "activation" probability $q_e = \Pr[b_e^A \ne 0]$, and (b) we can sample scenarios conditioned on the event that $b_e^A \ne 0$.
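Condition (b) is easy to meet under independent activations, as noted above: conditioning on $e$ being activated leaves the other coordinates unchanged. A minimal sketch (ours):

```python
import random

def sample_conditioned(e, q):
    """Sample a scenario A conditioned on element e being activated, when
    elements are independently activated; q[f] is the activation probability
    of element f. Include e, and each other element independently."""
    return {e} | {f for f in q if f != e and random.random() < q[f]}
```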
As mentioned above, the set cover example fits into this framework with $T$ as the element-set incidence matrix, rows representing elements and columns representing sets, and the vector $b^A$ given by $b_e^A = 1$ if $e \in A$, and 0 otherwise. Using this additional structure, we obtain the following result.

Lemma 7.3 At any point $x \in \mathcal{P}$, an $\omega$-subgradient of $h(\cdot)$ can be computed with probability at least $1-\delta$ in time polynomial in the input size, $\frac{1}{\omega}$, and $\ln\frac{1}{\delta}$.

Proof: Let $m$ denote the number of rows of $T$, so that $z_A \in \mathbb{R}^m_+$ for each scenario $A$. The dual of the scenario minimization problem is the following:

  max $(b^A - Tx) \cdot z_A$  s.t. constraints (5) and (6), $z_A \ge 0$,  (DP)

where constraints (5) and (6) bound the dual vector $z_A$ in terms of the second-stage cost coefficients. Let $\mathcal{A}_e$ denote the set of scenarios $\{A : b_e^A \ne 0\}$, that is, the scenarios in which element $e$ is activated. Let $z^*_A$ be an optimal solution to (DP) for scenario $A$. Note that, since the objective coefficient of $z_{A,e}$ is $b_e^A - (Tx)_e$, one may assume without loss of generality that if $b_e^A = 0$ then $z^*_{A,e} = 0$. Let $\Theta_e = \sum_{A \in \mathcal{A}_e} p_A z^*_{A,e}$. By Lemma 5.1 we know that the vector $d$ with components $d_S = w_S - \sum_A p_A (T^{\top} z^*_A)_S$ is a subgradient at $x$. Since $z^*_{A,e} = 0$ for $A \notin \mathcal{A}_e$, we can write $\sum_A p_A (T^{\top} z^*_A)_S = \sum_e T_{eS}\,\Theta_e$.

The proof is in three parts. First, we show, by adapting the proof of part (i) of Lemma 5.1, that if $\tilde z_A$ is an $\omega$-optimal solution to (DP) for every scenario $A$, then the vector with components $w_S - \sum_A p_A (T^{\top} \tilde z_A)_S$ is an $\omega$-subgradient at $x$. Next, our goal will be to get structured near-optimal solutions for each scenario $A$, so that one can estimate the quantity $\Theta_e$ for each element $e$ to within a certain accuracy. We will show that one can get near-optimal solutions to (DP) such that, for every element $e$, the variation in the values $z_{A,e}$ over $A \in \mathcal{A}_e$ is polynomially bounded. Finally, we estimate each $\Theta_e$ to within this accuracy with high probability, and argue that these estimates induce a near-optimal solution to (DP) for each scenario $A$, and thereby yield an $\omega$-subgradient of $h(\cdot)$ at $x$.

For the first part, let $\tilde z_A$ be any $\omega$-optimal solution to (DP), that is, one of value at least $(1-\omega)$ times the optimum. Consider any point $x' \in \mathcal{P}$. Since $\tilde z_A$ is a feasible dual solution for scenario $A$ with $x'$ as the first-stage vector, as in Lemma 5.1 we have

  $h(x') \;\ge\; w \cdot x' + \textstyle\sum_A p_A (b^A - Tx') \cdot \tilde z_A$.  (7)

Also, since the value of $\tilde z_A$ is at least $(1-\omega)$ times that of $z^*_A$,

  $w \cdot x + \textstyle\sum_A p_A (b^A - Tx) \cdot \tilde z_A \;\ge\; h(x) - \omega\, h(x)$.  (8)

Subtracting (8) from (7), we get that $h(x') - h(x) \ge d \cdot (x' - x) - \omega\, h(x)$, where $d$ is the vector with components $d_S = w_S - \sum_A p_A (T^{\top}\tilde z_A)_S$. So $d$ is an $\omega$-subgradient of $h(\cdot)$ at $x$.

Now we obtain a specific structured near-optimal solution to (DP) for each scenario $A$. Constraints (5) and (6) imply, for each element $e$, an upper bound $ub_e$, polynomially bounded in the input size, on the value of $z_{A,e}$ in any feasible solution. Starting from $z^*_A$, we set $z'_{A,e} = z^*_{A,e}$ if $z^*_{A,e}$ is at least a polynomially small threshold fraction of $ub_e$, and $z'_{A,e} = 0$ otherwise. Zeroing out these tiny components decreases the dual objective by only a small factor, so the value of $z'_A$ is near the optimum value of (DP); moreover, since $z'_A \le z^*_A$ componentwise, constraints (5) and (6) continue to hold, so $z'_A$ is a feasible solution. By construction, for every element $e$, either $z'_{A,e} = 0$ for every scenario $A \in \mathcal{A}_e$, or the non-zero values $z'_{A,e}$, for $A \in \mathcal{A}_e$, all lie between the threshold and $ub_e$, so that the variation in $z'_{A,e}$ is polynomially bounded.

Set $\Theta'_e = \sum_{A \in \mathcal{A}_e} p_A z'_{A,e}$. We estimate $\Theta'_e$ by $Y_e$ as follows. If $q_e = 0$, then we know that $\Theta'_e = 0$ and we set $Y_e = 0$. Otherwise, we sample $N = \mathrm{poly}(\text{input size}, \frac{1}{\omega}, \ln\frac{1}{\delta})$ times from the conditional distribution on $\mathcal{A}_e$, and for each sampled scenario $A$ compute $z'_{A,e}$ as above. Let $Z_\ell$, for $\ell = 1, \ldots, N$, denote the value of the $\ell$-th sample, and let $\bar Z = \frac{1}{N}\sum_\ell Z_\ell$; we set $Y_e = (1-\hat\rho)\, q_e \bar Z$ for a suitable polynomially small $\hat\rho$. Since scenario $A \in \mathcal{A}_e$ is generated with probability $p_A/q_e$, $E[q_e \bar Z]$ is precisely $\Theta'_e$. Note that each $Z_\ell$ lies in $[0, ub_e]$ and the non-zero sample values have polynomially bounded variation, so by standard Chernoff bounds, with probability at least $1 - \frac{\delta}{m}$ we have both $Y_e \le \Theta'_e$ and $Y_e \ge (1-2\hat\rho)\Theta'_e$ up to a polynomially small additive error. Finally, we show that the vector $d$ with components $d_S = w_S - \sum_e T_{eS}\, Y_e$ is an $\omega$-subgradient with probability at least $1-\delta$. From the above analysis, taking a union bound over the $m$ elements, all the estimates are accurate with probability at least $1-\delta$, so we may assume that this event happens. We will show that the values $Y_e$ induce an $\omega$-optimal solution to (DP) for each scenario $A$ such that $\sum_{A \in \mathcal{A}_e} p_A \hat z_{A,e} = Y_e$ for every element $e$. Therefore, as shown in the first part, $d$ is an $\omega$-subgradient of $h(\cdot)$ at the point $x$. To prove the claim, consider the solution given, for every scenario $A$ and element $e$, by $\hat z_{A,e} = z'_{A,e}\, Y_e/\Theta'_e$ (with $\hat z_{A,e} = 0$ when $\Theta'_e = 0$). Clearly $\hat z_{A,e} \le z'_{A,e}$, and so, for every scenario $A$, $\hat z_A$ is a feasible solution to (DP); its value is at least $(1-2\hat\rho)$ times the value of $z'_A$ (since $Y_e \ge (1-2\hat\rho)\Theta'_e$, up to the small additive error), and hence at least $(1-\omega)$ times the value of $z^*_A$, for a suitable choice of the threshold and of $\hat\rho$.
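The estimation step of Lemma 7.3 can be phrased as a small Monte-Carlo routine. The sketch below uses a hypothetical interface (ours, not the paper's): cond_sample(e) draws a scenario conditioned on element e being activated, and dual_value(A, x, e) returns the structured near-optimal dual value $z'_{A,e}$; it assembles the candidate subgradient under the component form $d_S = w_S - \sum_e T_{eS}\,\Theta_e$ from Lemma 5.1, omitting the accuracy and failure-probability bookkeeping of the proof.

```python
def estimate_subgradient(x, w, q, T, cond_sample, dual_value, num_samples):
    """Estimate an approximate subgradient at x.
    w: dict S -> first-stage cost w_S;  q: dict e -> activation probability q_e;
    T: dict (e, S) -> matrix entry (missing entries are 0)."""
    Y = {}
    for e, qe in q.items():
        if qe == 0.0:
            Y[e] = 0.0          # element never activated: its duals are 0
            continue
        total = sum(dual_value(cond_sample(e), x, e) for _ in range(num_samples))
        Y[e] = qe * total / num_samples   # estimates Theta_e = sum_A p_A z'_{A,e}
    # candidate subgradient component for each column (set) S
    return {S: w[S] - sum(T.get((e, S), 0.0) * Y[e] for e in q) for S in w}
```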
Using Lemma 7.3 to compute an $\omega$-subgradient at any point $x$, we can implement procedure FindOpt in Figure 1 to run in time that does not depend on $\lambda$, and return a point of objective function value at most $(1+\epsilon)\cdot\mathrm{OPT}$ with probability at least $1-\delta$. Combining this with a modified ConvOpt procedure, we obtain the following theorem.

Theorem 7.4 For the above subclass of problems, given this extra information about the distribution, one can compute a $(1+\epsilon)$-optimal solution with probability at least $1-\delta$ in time polynomial in the input size, $\frac{1}{\epsilon}$, and $\ln\frac{1}{\delta}$.

Proof: Using the additional information about the probability distribution, we can now detect with probability 1 whether OPT is "large", without any sampling in step C1 of ConvOpt. Define a "null scenario" as a scenario $A$ with $b_e^A = 0$ for all elements $e$. Let $\mathcal{A}'$ be the set of all non-null scenarios. Assuming that we incur a total cost of at least 1 for every non-null scenario, $\mathrm{OPT} \ge \Pr[\text{some } A \in \mathcal{A}' \text{ is realized}]$. Observe that this probability is at least $q_e$ for any element $e$, and is at most $\sum_e q_e$, since the event that some non-null scenario is realized implies that some element is activated. So if $\max_e q_e > 0$, then $\mathrm{OPT} \ge \max_e q_e$, and we can set the lower bound on OPT in step C3 of ConvOpt accordingly and call procedure FindOpt with its parameters set appropriately; otherwise $\Pr[\text{some } A \in \mathcal{A}' \text{ is realized}] = 0$, which implies that $x = 0$ is an optimal solution, and we return this solution. Thus the entire algorithm runs in time that does not depend on $\lambda$.
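The preprocessing in this proof amounts to a few lines; a sketch under the proof's normalization (every non-null scenario incurs total cost at least 1) and our own hypothetical interface:

```python
def bootstrap_lower_bound(q):
    """Deterministic replacement for the sampling step C1 of ConvOpt:
    given the known activation probabilities q[e], either certify that
    x = 0 is optimal (return None) or output a positive lower bound on OPT."""
    qmax = max(q.values(), default=0.0)
    if qmax == 0.0:
        return None   # no non-null scenario ever occurs, so x = 0 is optimal
    # some non-null scenario occurs with probability >= qmax, and each such
    # scenario costs at least 1, hence OPT >= qmax (usable in step C3)
    return qmax
```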
8 Conclusions and discussion

We presented an algorithm to solve a large class of 2-stage stochastic linear programs to within a factor of $(1+\epsilon)$ of the optimum, for any $\epsilon > 0$, in time polynomial in $\frac{1}{\epsilon}$, the size of the input, and the ratio $\lambda$ between the second- and first-stage costs. The algorithm works for both discrete and continuous distributions and requires only a black box to draw independent samples from the underlying probability distribution on scenarios. We show that $\Omega(\lambda)$ is an inherent lower bound on the number of samples required in the black-box model; to the best of our knowledge, this is the first result showing that a broad class of 2-stage stochastic LPs can be solved in time polynomial in the input size and $\lambda$. We used our algorithm to devise the first approximation algorithms for a variety of 2-stage stochastic integer optimization problems in the black-box model, without any assumptions about the cost structure of the input. The performance guarantees are obtained by first solving a fractional relaxation of the problem using our algorithm for stochastic linear programs, and then rounding the near-optimal fractional solution. We show that this rounding step can be performed by utilizing existing algorithms for the deterministic analogue of the problem, thereby reducing, in some sense, the stochastic problem to its deterministic counterpart.

Our algorithm for solving stochastic LPs is based on solving a convex-programming relaxation of the problem by adapting the ellipsoid method. One obvious question is whether one can obtain more efficient algorithms, in theory and/or in practice, for 2-stage stochastic programs by adapting other techniques used for deterministic convex optimization, for example, interior-point methods or projection methods. From a practical perspective, it would be useful to investigate whether one can use the notion of approximate subgradients as defined in this paper, which one can compute efficiently via sampling, within the framework of a cutting-plane algorithm, possibly along with a column-generation procedure (since we have an exponential number of variables), to get an efficient heuristic for solving stochastic linear programs. A very appealing approach, often used in practice, is the sample average approximation (SAA) method, which consists of replacing the original stochastic program, where the underlying scenario distribution may have exponentially large support, by the sample-average problem, a (smaller) linear program of size polynomial in the number of samples that one can then solve efficiently using one's favorite LP solver. The issue here is the number of samples required to ensure that an optimal (or near-optimal) solution to the sample-average problem is, with high probability, a provably near-optimal solution to the original stochastic problem. As mentioned in the Introduction, Kleywegt et al. [16] show that for general stochastic programs the required sample size can be bounded by a polynomial in the dimension of the problem and the variance of a certain quantity; however, this variance need not be polynomially bounded, even for our structured class of LPs. Following our work, various researchers have suggested that it might be possible to argue that the SAA method actually yields an approximation scheme; in forthcoming work [28] we show that this is indeed true. We show that the convergence proof of our algorithm can be adapted to prove a convergence theorem for the SAA method: for the class of problems considered in this paper, the SAA method converges to a $(1+\epsilon)$-optimal solution to the true problem within a number of samples that is polynomial in the input size, $\frac{1}{\epsilon}$, and $\lambda$. Independently of our work, Shapiro and Nemirovski [24] have shown that in the black-box model the bounds on the sample size proved in [16] for the SAA method are tight, up to polynomial factors, for arbitrary 2-stage stochastic programs. In this paper (and in [28]), we consider a structured class (Stoc-P) of 2-stage stochastic LPs that is rich enough to capture the fractional versions of a variety of combinatorial optimization problems, and by exploiting this structure we prove a bound on the sample size that is significantly better than the bound in [16]. In particular, even for the class (Stoc-P) and for small values of $\lambda$, the sample-size bound in [16] can be exponentially large, whereas our bound is polynomial in $\lambda$. Most recently, and subsequent to the publication of a preliminary version of our work [25], Nemirovski and Shapiro (personal communication) showed that for the 2-stage stochastic set cover problem with non-scenario-dependent costs, if one preprocesses the input to eliminate certain first-stage decisions, then the bound of [16] for the SAA method applied to this modified problem becomes polynomial in $\lambda$.

9 Acknowledgments

We are grateful to Mike Todd and Shane Henderson for useful discussions and very helpful suggestions.

References

[1] M. Akgül. Topics in relaxation and ellipsoidal methods. Pitman Advanced Publishing Program, London, 1997.

[2] E. M. L. Beale. On minimizing a convex function subject to linear inequalities. Journal of the Royal Statistical Society, Series B, 17:173–184; discussion 194–203, 1955.

[3] D. Bertsimas and M. Sim. Robust discrete optimization and network flows. Mathematical Programming, Series B, 98:49–71, 2003.
[4] J. R. Birge and F. V. Louveaux. Introduction to Stochastic Programming. Springer-Verlag, NY, 1997.

[5] J. Borwein and A. S. Lewis. Convex Analysis and Nonlinear Optimization. Springer-Verlag, NY, 2000.

[6] V. Chvátal. A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 4:233–235, 1979.

[7] G. B. Dantzig. Linear programming under uncertainty. Management Science, 1:197–206, 1955.

[8] S. Dye, L. Stougie, and A. Tomasgard. The stochastic single resource service-provision problem. Naval Research Logistics, 50(8):869–887, 2003. Also appeared as "The stochastic single node service provision problem", COSOR-Memorandum 99-13, Dept. of Mathematics and Computer Science, Eindhoven Technical University, Eindhoven, 1999.

[9] M. Dyer, R. Kannan, and L. Stougie. A simple randomised algorithm for convex optimisation. SPOR-Report 2002-05, Dept. of Mathematics and Computer Science, Eindhoven Technical University, Eindhoven, 2002.

[10] M. Dyer and L. Stougie. Computational complexity of stochastic programming problems. SPOR-Report 2003-20, Dept. of Mathematics and Computer Science, Eindhoven Technical University, Eindhoven, 2003.

[11] N. Garg, V. Vazirani, and M. Yannakakis. Primal-dual approximation algorithms for integral flow and multicut in trees. Algorithmica, 18(1):3–20, 1997.

[12] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer-Verlag, New York, 1988.

[13] A. Gupta, M. Pál, R. Ravi, and A. Sinha. Boosted sampling: approximation algorithms for stochastic optimization. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing, pages 417–426, 2004.

[14] N. Immorlica, D. Karger, M. Minkoff, and V. Mirrokni. On the costs and benefits of procrastination: approximation algorithms for stochastic combinatorial optimization problems. In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 684–693, 2004.

[15] K. Jain and V. V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. Journal of the ACM, 48(2):274–296, 2001.

[16] A. J. Kleywegt, A. Shapiro, and T. Homem-De-Mello. The sample average approximation method for stochastic discrete optimization. SIAM Journal on Optimization, 12:479–502, 2001.

[17] S. Kolliopoulos and N. Young. Tight approximation results for general covering integer programs. In Proceedings of the 42nd Annual IEEE Symposium on Foundations of Computer Science, pages 522–528, 2001.

[18] J. Linderoth, A. Shapiro, and R. K. Wright. The empirical behavior of sampling methods for stochastic programming. Annals of Operations Research. To appear.

[19] M. Mahdian. Facility location and the analysis of algorithms through factor-revealing programs. Ph.D. thesis, MIT, Cambridge, MA, 2004.

[20] M. Mahdian, Y. Ye, and J. Zhang. Improved approximation algorithms for metric facility location. In Proceedings of the 5th International Workshop on Approximation Algorithms for Combinatorial Optimization, pages 229–242, 2002.

[21] Y. Nesterov and J.-Ph. Vial. Confidence level solutions for stochastic programming. CORE Discussion Papers, 2000. http://www.core.ucl.ac.be/services/psfiles/dp00/dp2000-13.pdf.

[22] R. Ravi and A. Sinha. Hedging uncertainty: approximation algorithms for stochastic optimization problems. In Proceedings of the 10th International Conference on Integer Programming and Combinatorial Optimization, pages 101–115, 2004.
[23] A. Shapiro. Monte Carlo sampling methods. In A. Ruszczynski and A. Shapiro, editors, Stochastic Programming, volume 10 of Handbooks in Operations Research and Management Science, North-Holland, Amsterdam, 2003.

[24] A. Shapiro and A. Nemirovski. On complexity of stochastic programming problems. Published electronically in Optimization Online, 2004. http://www.optimization-online.org/DB_FILE/2004/10/978.pdf.

[25] D. B. Shmoys and C. Swamy. Stochastic optimization is (almost) as easy as deterministic optimization. In Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, pages 228–237, 2004.

[26] D. B. Shmoys, É. Tardos, and K. I. Aardal. Approximation algorithms for facility location problems. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing, pages 265–274, 1997.

[27] C. Swamy. Approximation Algorithms for Clustering Problems. Ph.D. thesis, Cornell University, Ithaca, NY, 2004.

[28] C. Swamy and D. B. Shmoys. The sample average approximation method for 2-stage stochastic optimization. Unpublished manuscript, 2004.

[29] B. Verweij, S. Ahmed, A. J. Kleywegt, G. L. Nemhauser, and A. Shapiro. The sample average approximation method applied to stochastic routing problems: a computational study. Computational Optimization and Applications, 24:289–333, 2003.