IMPROVED APPROXIMATION ALGORITHMS FOR THE UNCAPACITATED FACILITY LOCATION PROBLEM

FABIÁN A. CHUDAK∗ AND DAVID B. SHMOYS†

∗ chudak@ifor.math.ethz.ch. Institute for Operations Research, Swiss Federal Institute of Technology, ETH-Zürich, Switzerland. This research was done while the author was a graduate student at the School of Operations Research & Industrial Engineering, Cornell University, Ithaca, NY 14853. Research partially supported by NSF grants DMS-9505155 and CCR-9700029 and by ONR grant N00014-96-1-0050.
† shmoys@cs.cornell.edu. School of Operations Research & Industrial Engineering and Department of Computer Science, Cornell University, Ithaca, NY 14853. Research partially supported by NSF grants CCR-9912422, CCR-9700029 & DMS-9505155 and ONR grant N00014-96-1-0050.

Abstract. We consider the uncapacitated facility location problem. In this problem, there is a set of locations at which facilities can be built; a fixed cost f_i is incurred if a facility is opened at location i. Furthermore, there is a set of demand locations to be serviced by the opened facilities; if the demand location j is assigned to a facility at location i, then there is an associated service cost proportional to the distance between i and j, c_ij. The objective is to determine which facilities to open and an assignment of demand points to the opened facilities, so as to minimize the total cost. We assume that the distance function c is symmetric and satisfies the triangle inequality. For this problem we obtain a (1 + 2/e)-approximation algorithm, where 1 + 2/e ≈ 1.736, which is a significant improvement on the previously known approximation guarantees. The algorithm works by rounding an optimal fractional solution to a linear programming relaxation. Our techniques use properties of optimal solutions to the linear program, randomized rounding, as well as a generalization of the decomposition techniques of Shmoys, Tardos & Aardal.

Key words. Facility location, approximation algorithms, randomized rounding

AMS subject classifications.

1. Introduction. The study of the location of facilities to serve clients at minimum cost has been one of the most studied themes in the field of Operations Research (see, e.g., the textbook edited by Mirchandani & Francis [MF90]). In this paper, we focus on one of its simplest variants, the uncapacitated facility location problem, also known as the simple plant location problem, which has been extensively treated in the literature (see, e.g., the survey by Cornuéjols, Nemhauser & Wolsey [CNW90]). This problem can be described as follows. There is a set of potential facility locations F; building a facility at location i ∈ F has an associated nonnegative fixed cost f_i, and any open facility can provide an unlimited amount of a certain commodity. There is also a set of clients or demand points D that require service; client j ∈ D has a positive demand d_j of commodity that must be shipped from one of the open facilities. If a facility at location i ∈ F is used to satisfy the demand of client j ∈ D, the service or transportation cost incurred per unit is proportional to the distance from i to j, c_ij. The goal is to determine a subset of the set of potential facility locations at which to open facilities and an assignment of clients to these facilities so as to minimize the overall total cost, that is, the fixed costs of opening the facilities plus the total service cost. We will only consider the metric variant of the problem, in which the distance function c is nonnegative, symmetric, and satisfies the triangle inequality. Throughout this paper, a ρ-approximation algorithm is a polynomial-time algorithm that delivers a feasible solution within a factor of ρ of optimum. Our main result is a 1.736-approximation algorithm for the metric uncapacitated facility location problem.

In contrast to the uncapacitated facility location problem, Cornuéjols, Fisher & Nemhauser [CFN77] studied the problem in which the objective is to maximize the difference between assignment and facility costs. They showed that with this objective, the problem can be thought of as a bank account location problem as follows.
A company seeks to maximize its available funds by paying bills using checks drawn on banks at different locations. More precisely, if a bill is incurred in location j and is paid with a check from location i, there is a delay in the clearing time that generates a profit, such as interest, c_ij. On the other hand, maintaining an account at location i has a fixed cost f_i. Thus the company would like to choose a subset of locations at which to open accounts so as to maximize the difference between the total profit from clearing times and the total fixed cost of maintaining the accounts. Notice that even though the maximization and minimization problems are equivalent from the point of view of optimization, they are not equivalent from the point of view of approximation: the maximization problem can be approximated within a constant factor, whereas the minimization problem with an arbitrary distance function is as hard as the set cover problem, and thus a c-approximation algorithm with c = o(log |D|) is unlikely to exist (see [Fei98] for details). Interestingly, Cornuéjols, Fisher & Nemhauser showed that for the maximization problem, the greedy procedure that iteratively tries to open the facility that most improves the objective function yields a solution of value within a constant factor of optimum. In contrast, Hochbaum [Hoc82] showed that the greedy algorithm is a Θ(log |D|)-approximation algorithm for the minimization problem with an arbitrary distance function. By filtering a linear programming relaxation, Lin & Vitter [LV92b] also obtained an O(log |D|)-approximation algorithm for the uncapacitated facility location problem when the distance function c is arbitrary. In addition, they also considered the k-median problem, in which only k facilities can be opened, but there are no fixed costs in the objective function. They showed how to find a solution with objective function value within (1 + ε) of optimum, but that opens (1 + 1/ε)O(log |D|) k facilities. Under the assumption that the distance function is a metric, they have also shown (see [LV92a]) how to find a solution of cost no more than 2(1 + ε) times optimum, opening at most (1 + 1/ε)k facilities. This latter result was the starting point of most recent work on metric facility location problems; although limited to the k-median problem, it essentially contains the core ideas needed to obtain a constant approximation algorithm for the metric uncapacitated facility location problem. The metric uncapacitated facility location problem is known to be NP-hard (see [CNW90]). Very recently, Guha & Khuller [GK99] and Sviridenko [Svi98] have shown that it is MAX-SNP-hard.
In fact, Guha & Khuller have also shown that the existence of a ρ-approximation algorithm for ρ < 1.463 implies that NP ⊆ DTIME(n^{O(log log n)}) (see also Feige [Fei98]), and, combined with an observation of Sviridenko [Svi98], such an algorithm would also imply that P = NP.

We briefly review previous work on approximation algorithms for the metric uncapacitated facility location problem. The first constant-factor approximation algorithm was given by Shmoys, Tardos & Aardal [STA97], who presented a 3.16-approximation algorithm, based on rounding an optimal solution of a classical linear programming relaxation for the problem, due to Balinski [Bal65]. This bound was subsequently improved by Guha & Khuller [GK99], who provided a 2.408-approximation algorithm. Guha & Khuller's algorithm requires a stronger linear programming relaxation, in which they add to the relaxation a facility budget constraint that separately bounds the total facility cost incurred. After running the algorithms of [STA97], they use a greedy procedure (as in [CFN77] and [Hoc82]) to improve the quality of the solution: iteratively, open one facility at a time if it improves the cost of the solution. However, since the optimal facility cost is unknown, they instead consider all 'reasonable' values of the form (1 + ε)^k for some integer k and ε > 0. As a consequence, they must solve a weakly polynomial number of linear programs, round the optimal solution to each, and then select the best solution found. In contrast, the 1.736-approximation algorithm presented in this paper requires the solution of just one linear program, the one introduced by Balinski [Bal65], providing as a by-product further evidence of the strength of this linear programming relaxation. In a different line of work, Korupolu, Plaxton & Rajaraman [KPR00] showed that a simple local improvement heuristic produces a solution within a constant factor of optimum, though the best constant they obtain is 5. Very recently, Arora, Raghavan & Rao [ARR98] have presented a quasi-polynomial approximation scheme for the case in which demand and facility points are in R^d, where the dimension d is fixed, and the distance function is the usual Euclidean distance. If d = 2, then they obtain a polynomial approximation scheme. Their method is similar to the one used by Arora [Aro98] to produce approximation schemes for the traveling salesman problem. Thus, in contrast with the algorithm presented here, it appears to have only theoretical relevance due to the inefficiency of this approach.

Without loss of generality, we shall assume that the set of potential facility locations F and the set of demand points D are disjoint; let N = F ∪ D and n = |N|. Even though all our results hold for the case of arbitrary nonnegative demands, for the sake of simplicity of the exposition, we will assume that each demand d_j is 1 (j ∈ D); thus, the cost of assigning a client j to an open facility at location i is c_ij. The distance between any two points k, ℓ ∈ N is c_kℓ. We assume that the n × n distance matrix (c_kℓ) is nonnegative, symmetric (that is, c_kℓ = c_ℓk, for all k, ℓ ∈ N), and satisfies the triangle inequality (that is, c_ij ≤ c_ik + c_kj, for all i, j, k ∈ N). The simplest linear programming relaxation (from [Bal65]), which we will refer to as P, is as follows:

(P)  Minimize   ∑_{j∈D} ∑_{i∈F} c_ij x_ij + ∑_{i∈F} f_i y_i
     subject to ∑_{i∈F} x_ij = 1,   for each j ∈ D,            (1.1)
                x_ij ≤ y_i,         for each i ∈ F, j ∈ D,     (1.2)
                x_ij ≥ 0,           for each i ∈ F, j ∈ D.     (1.3)

Any 0-1 feasible solution corresponds to a feasible solution to the uncapacitated facility location problem: y_i = 1 indicates that a facility at location i ∈ F is open, whereas x_ij = 1 means that client j ∈ D is serviced by the facility built at location i ∈ F. Constraints (1.1) state that each demand point j ∈ D must be assigned to some facility, whereas inequalities (1.2) say that clients can only be assigned to open facilities. Thus the linear program P is indeed a relaxation of the problem. Given a feasible fractional solution (x, y), we will say that ∑_{i∈F} f_i y_i and ∑_{i∈F} ∑_{j∈D} c_ij x_ij are, respectively, its fractional facility and service cost.
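The relaxation P is easy to state in any LP modeling language. The following sketch is ours, not part of the paper; it assumes the PuLP library (with its bundled CBC solver), and it names the constraints (1.1) so that their dual values can be read off later.

```python
import pulp

# A minimal sketch (ours, not from the paper) of relaxation (P) in PuLP.
# F, D are index sets; f[i] is the fixed cost; c[i, j] the distance.
def solve_relaxation(F, D, f, c):
    prob = pulp.LpProblem("UFL_relaxation_P", pulp.LpMinimize)
    x = {(i, j): pulp.LpVariable(f"x_{i}_{j}", lowBound=0) for i in F for j in D}
    y = {i: pulp.LpVariable(f"y_{i}", lowBound=0) for i in F}
    prob += (pulp.lpSum(c[i, j] * x[i, j] for i in F for j in D)
             + pulp.lpSum(f[i] * y[i] for i in F))
    for j in D:                          # (1.1), named so duals can be read off
        prob += pulp.lpSum(x[i, j] for i in F) == 1, f"assign_{j}"
    for i in F:
        for j in D:                      # (1.2): assign only to open facilities
            prob += x[i, j] <= y[i]
    prob.solve(pulp.PULP_CBC_CMD(msg=0))
    xv = {(i, j): x[i, j].value() for i in F for j in D}
    yv = {i: y[i].value() for i in F}
    return prob, xv, yv
```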
Given a feasible solution to the linear programming relaxation P, the algorithm of Shmoys, Tardos & Aardal first partitions the demand points into clusters and then, for each cluster, opens exactly one facility, which services all of the points in it. In their analysis, they show that the resulting solution has the property that the total facility cost is within a constant factor of the fractional facility cost and the total service cost is within a constant factor of the fractional service cost. The main drawback of this approach is that most of the time the solution is unbalanced, in the sense that the first constant is approximately three times smaller than the second.

One of the simplest ways to round an optimal solution (x*, y*) to the linear program P is to use the randomized rounding technique of Raghavan & Thompson [RT87], as proposed by Sviridenko [Svi97] for the special case in which all of the distances are 1 or 2. The 1.2785-approximation algorithm of [Svi97], essentially, opens a facility at location i ∈ F with probability y*_i, and then assigns each demand point to its nearest open facility. Guha & Khuller [GK99] also considered this special case, and proved a matching lower bound; that is, no better guarantee is possible unless P = NP. Ageev & Sviridenko [AS97] have recently shown that the randomized rounding analysis for the maximum satisfiability problem of Goemans & Williamson [GW94] can be adapted to obtain improved bounds for the maximization version of the problem studied by [CFN77].

The following simple ideas enable us to develop a rounding procedure for the linear programming relaxation P with an improved performance guarantee. We explicitly exploit optimality conditions of the linear program, and in particular, we use properties of the optimal dual solution and complementary slackness. A key element of our improvement is the use of randomized rounding in conjunction with the approach of Shmoys, Tardos & Aardal. To understand the essence of our approach, suppose that for each location i ∈ F, independently, we open a facility at i with probability y*_i. The difficulty arises when attempting to estimate the expected service cost: the distance from a given demand point to the closest open facility might be too large. However, we could always use the routing of the algorithm of Shmoys, Tardos & Aardal if we knew that each cluster has a facility open. Rather than opening each facility independently with probability y*_i, we instead open exactly one facility in each cluster, where facility i is opened with probability y*_i. The precise algorithm is not much more complicated, but the most refined analysis of it is not quite so simple. Our algorithms are randomized, and can be easily derandomized using the method of conditional expectations. Our main result is the following.

Theorem 1.1.
There is a polynomial-time algorithm that rounds an optimal solution to the linear programming relaxation P to a feasible integer solution whose value is within a factor of (1 + 2/e) ≈ 1.736 of the optimal value of the linear programming relaxation P.

Since the optimal LP value is a lower bound on the integer optimal value, the theorem yields a 1.736-approximation algorithm. The running time of the algorithm is dominated by the time required to solve the linear programming relaxation P.

Since the appearance of the preliminary version of this paper [Chu98], there have been significant strides forward in research on approximation algorithms for this problem. Most notably, Jain & Vazirani [JV01] gave a primal-dual 3-approximation algorithm for this problem, which, by virtue of no longer needing to solve the linear programming relaxation, yields a substantially more efficient algorithm. The starting point of our algorithm is the graph defined by the positive fractional primal assignment variables, for which complementary slackness conditions are then invoked to yield tight dual constraints; the method of Jain & Vazirani first constructs a dual solution, and this serves to define an analogous graph in which the edges have the corresponding dual constraints hold with equality. Subsequent algorithmic results have touched on several different paradigms. Sviridenko [Svi02] has provided a more sophisticated analysis of the approach used here, also incorporating a more clever use of the so-called pipage rounding technique. Another significant contribution of this paper is a much simpler technique for proving one of the crucial probabilistic lemmas of our paper (as well as a generalization needed for this subsequent improvement). A number of papers have given analyses that are primal-dual in flavor. Results of Jain, Mahdian, and Saberi [JMS02] and Mahdian, Markakis, Saberi, and Vazirani [MMSV01] gave a primal-dual based analysis of greedy-style algorithms. Mettu and Plaxton [MP00] gave an algorithm that, at first consideration, does not appear to be a primal-dual algorithm at all, but, by defining "radii" for amortizing the fixed cost needed to open a facility at a particular location, is merely doing so implicitly. At this writing, the best known performance guarantee follows from an analysis of this type: Mahdian, Ye, and Zhang have given a 1.52-approximation algorithm [MYZ02]. Further work has also been done on local search algorithms; most notably, Charikar and Guha [CG99] have given a more sophisticated neighborhood structure that is amenable to analysis yielding stronger bounds than those obtained by Korupolu, Plaxton, and Rajaraman. Kolliopoulos and Rao [KR99] have also improved on the state of the art in constructing polynomial approximation schemes for these problems; most notably, they showed that a polynomial approximation scheme could be obtained in Euclidean metric spaces of constant dimension.

2. A Simple 4-approximation Algorithm. In this section we present a new simple 4-approximation algorithm. Even though the guarantees we will prove in the next section are substantially better, we will use most of the ideas presented here. After stating a few definitions and simple facts, we review the work of Shmoys, Tardos & Aardal [STA97], and introduce the dual of the linear programming relaxation P and properties of primal and dual solutions that will be useful throughout the paper. First we define the neighborhood of a demand point.

Definition 2.1.
If (x, y) is a feasible solution to the linear programming relaxation P, and j ∈ D is any demand point, the neighborhood of j, N(j), is the set of facilities that fractionally service j, that is, N(j) = {i ∈ F : x_ij > 0}.

The following fact is a simple consequence of the previous definition and constraint (1.1).

Fact 1. For each demand point j ∈ D, ∑_{i∈N(j)} x_ij = 1.

The following definition was crucial for the algorithm of Shmoys, Tardos & Aardal [STA97].

Definition 2.2. Suppose that (x, y) is a feasible solution to the linear programming relaxation P and let g_j ≥ 0, for each j ∈ D. Then (x, y) is g-close if x_ij > 0 implies that c_ij ≤ g_j (j ∈ D, i ∈ F).

Notice that if (x, y) is g-close and j ∈ D is any demand point, all the neighbors of j, that is, the facilities that fractionally service j, are inside the ball of radius g_j centered at j. The following lemma is from [STA97].

Lemma 2.3. Given a feasible g-close solution (x, y), we can find, in polynomial time, a feasible integer 3g-close solution (x̂, ŷ) such that ∑_{i∈F} f_i ŷ_i ≤ ∑_{i∈F} f_i y_i.

We briefly sketch the proof below. The algorithm can be divided into two steps: a clustering step and a facility opening step. The clustering step works as follows (see Table 2.1). Let S be the set of demand points that have not yet been assigned to any cluster; initially, S = D. Find the unassigned demand point j◦ with smallest g_j-value and create a new cluster centered at j◦. Then all of the unassigned demand points that are fractionally serviced by facilities in the neighborhood of j◦ (that is, all of the demand points k ∈ S with N(k) ∩ N(j◦) ≠ ∅) are assigned to the cluster centered at j◦; the set S is updated accordingly. Repeat the procedure until all of the demand points are assigned to some cluster (i.e., S = ∅). We will use C to denote the set of centers of the clusters.

1. S ← D, C ← ∅
2. while S ≠ ∅
3.     choose j◦ ∈ S with smallest g_j value (j ∈ S)
4.     create a new cluster Q centered at j◦, C ← C ∪ {j◦}
5.     Q ← {k ∈ S : N(k) ∩ N(j◦) ≠ ∅}
6.     S ← S − Q

Table 2.1. The clustering construction of Shmoys, Tardos & Aardal.

The following fact follows easily from the clustering construction and the definition of neighborhood, and is essential for the success of the algorithm.

Fact 2. Suppose that we run the clustering algorithm of Table 2.1, using any g-close solution (x, y). Then the neighborhoods of distinct centers are disjoint; that is, if j and k are centers, j ≠ k ∈ C, then N(j) ∩ N(k) = ∅.

After the clustering step, the algorithm of [STA97] opens exactly one facility per cluster. For each center j ∈ C, we open the facility i◦ in the neighborhood of j, N(j), with smallest fixed cost f_i and assign all the demand points in the cluster of j to facility i◦. Observe that by inequalities (1.2) and Fact 1, ∑_{i∈N(j)} y_i ≥ 1; thus f_{i◦} ≤ ∑_{i∈N(j)} f_i y_i. Using Fact 2, the total facility cost incurred by the algorithm is never more than the total fractional facility cost ∑_{i∈F} f_i y_i. Next consider any demand point k ∈ D and suppose it belongs to the cluster centered at j◦; let ℓ ∈ N(k) ∩ N(j◦) be a common neighbor and let i◦ be the open facility in the neighborhood of j◦ (see Figure 2.1). Then, the distance from k to i◦ can be bounded by the distance from k to ℓ (which is at most g_k) plus the distance from ℓ to j◦ (which is at most g_j◦) plus the distance from j◦ to i◦ (which is at most g_j◦). Thus, the distance from k to an opened facility is at most 2g_j◦ + g_k, which is at most 3g_k, since j◦ was the remaining demand point with minimum g-value. Hence the total service cost can be bounded by 3 ∑_{k∈D} g_k.

Fig. 2.1. Bounding the service cost of k (4-approximation algorithm). The circles (•) are demand points, whereas the squares (□) are facility locations.
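The clustering and facility opening steps are short enough to state directly in code. The sketch below is our illustration of Table 2.1 and the opening rule of [STA97]; the identifiers (neighbors, cluster_of, and so on) are ours.

```python
# Sketch of the clustering of Table 2.1 plus the opening step of [STA97].
# neighbors[j] is the set N(j) = {i : x_ij > 0}; g[j] is the closeness bound.
def cluster_and_open(D, g, neighbors, f):
    S, centers, cluster_of = set(D), [], {}
    while S:
        j0 = min(S, key=lambda j: g[j])      # unassigned point of smallest g-value
        centers.append(j0)
        Q = {k for k in S if neighbors[k] & neighbors[j0]}
        for k in Q:
            cluster_of[k] = j0               # note that Q contains j0 itself
        S -= Q
    # open the cheapest facility in each neighborhood; they are disjoint (Fact 2)
    open_fac = {j0: min(neighbors[j0], key=lambda i: f[i]) for j0 in centers}
    return {k: open_fac[cluster_of[k]] for k in D}   # client -> open facility
```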
Shmoys, Tardos & Aardal used the filtering technique of Lin & Vitter [LV92b] to obtain g-close solutions (as we will describe in Section 5) and then applied Lemma 2.3 to obtain the first constant factor approximation algorithm for the problem. However, a simpler g-close solution is directly obtained by using the optimal solution to the dual linear program of P. More precisely, the dual of the linear program P is given by

(D)  Maximize   ∑_{j∈D} v_j
     subject to ∑_{j∈D} w_ij ≤ f_i,   for each i ∈ F,           (2.1)
                v_j − w_ij ≤ c_ij,    for each i ∈ F, j ∈ D,    (2.2)
                w_ij ≥ 0,             for each i ∈ F, j ∈ D.    (2.3)

Fix an optimal primal solution (x*, y*) and an optimal dual solution (v*, w*), and let LP* be the optimal LP value. Complementary slackness gives that x*_ij > 0 implies v*_j − w*_ij = c_ij; since w*_ij ≥ 0, we get the following lemma.

Lemma 2.4. If (x*, y*) is an optimal solution to the primal linear program P and (v*, w*) is an optimal solution to the dual linear program D, then (x*, y*) is v*-close.

By applying Lemma 2.3 to the optimal v*-close solution (x*, y*), we obtain a feasible solution for the problem with total facility cost at most ∑_{i∈F} f_i y*_i and with total service cost bounded by 3 ∑_{j∈D} v*_j = 3 LP*. We can bound the sum of these by 4 LP*; thus we have a 4-approximation algorithm. Note the imbalance in bounding facility and service costs.
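Lemma 2.4 can be exercised computationally. The sketch below is ours and depends on two assumptions: that solve_relaxation above is used (so that the constraints (1.1) carry the names assign_j), and that the LP solver reports duals, which PuLP's CBC interface exposes through the constraint attribute pi for pure LPs.

```python
# Sketch: recover v*_j as the dual of constraint (1.1) and verify
# complementary slackness, i.e., x*_ij > 0 implies c_ij <= v*_j.
def duals_and_vclose_check(prob, xv, F, D, c, eps=1e-7):
    v = {j: prob.constraints[f"assign_{j}"].pi for j in D}
    for i in F:
        for j in D:
            if xv[i, j] > eps:           # Lemma 2.4: (x*, y*) is v*-close
                assert c[i, j] <= v[j] + eps
    return v
```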
3. A Randomized Algorithm. After solving the linear program P, a very simple randomized algorithm is the following: open a facility at location i ∈ F with probability y*_i, independently for every i ∈ F, and then assign each demand point to its closest open facility. Notice that the expected facility cost is just ∑_{i∈F} f_i y*_i, the same bound as in the algorithm of Section 2. Focus on a demand point k ∈ D. If it happens that one of its neighbors has been opened, then the service cost of k would be bounded by the optimal dual variable v*_k. However, if we are unlucky and this is not the case (an event that, as we will see, can easily be shown to occur with probability at most 1/e ≈ 0.368, where the bound is tight), the service cost of k could be very large. On the other hand, suppose that we knew, for instance, that, for the clustering computed in Section 2, k belongs to a cluster centered at j, and that one of the facilities in N(j) has been opened. Then in this unlucky case we could bound the service cost of k using the routing cost of the 4-approximation algorithm. Our algorithm is also based on randomized rounding, and the expected facility cost is ∑_{i∈F} f_i y*_i. However, we weaken the randomized rounding step and do not open facilities independently with probability y*_i, but rather in a dependent way that ensures that each cluster center has one of its neighboring facilities opened.

Even though the algorithms presented in this section work for any g-close feasible solution, for the sake of simplicity of the exposition, we will assume, as at the end of Section 2, that we have a fixed optimal primal solution (x*, y*) and a fixed optimal dual solution (v*, w*), so that (x*, y*) is v*-close. It is easy to see that we can assume that y*_i ≤ 1 for each potential facility location i ∈ F.

To motivate the following definition, fix a demand location j ∈ D, and suppose without loss of generality that the neighborhood of j (that is, the facilities i for which x*_ij > 0) is {1, ..., d}, with c_1j ≤ c_2j ≤ ... ≤ c_dj. Then it is clear that we can assume that j is assigned "as much as possible" to facility 1, then to facility 2, and so on; that is, x*_1j = y*_1, x*_2j = y*_2, ..., x*_{d−1,j} = y*_{d−1} (but maybe x*_dj < y*_d). We will say that a feasible solution (x, y) is complete if x_ij > 0 implies that x_ij = y_i, for every i ∈ F, j ∈ D. Thus the optimal solution (x*, y*) is "almost" complete, in the sense that for every j ∈ D there is at most one i ∈ F with 0 < x*_ij < y*_i. Any instance can be transformed into an equivalent one with a complete solution as follows. Suppose that i ∈ F is a location with 0 < x_ij < y_i for some j ∈ D; among all j ∈ D with x_ij > 0, let j◦ be the one with smallest x_ij value. Next create a new facility location i′ which is an exact copy of i (i.e., the same fixed cost and in the same location), and set ŷ_{i′} = y_i − x_{ij◦} and set ŷ_i equal to x_{ij◦}. Next, for every j ∈ D with x_ij > 0, set x̂_ij = x_{ij◦} (= ŷ_i), and set x̂_{i′j} = x_ij − x_{ij◦} (which is nonnegative by the choice of j◦). All of the other components of x̂ and ŷ remain unchanged. Clearly (x̂, ŷ) is a feasible solution to the linear programming relaxation of the new instance; if (x, y) is g-close, so is (x̂, ŷ). It is straightforward to verify that the new instance is equivalent to the old one and that the fractional facility and service costs of the solutions (x, y) and (x̂, ŷ) are the same. Since the number of pairs (i, j) for which 0 < x̂_ij < ŷ_i strictly decreases with each application, the transformation terminates in polynomial time. Hence, from now on, we will assume that the optimal solution (x*, y*) is complete.

The algorithm, which we will call randomized rounding with clustering, works as follows. First run the clustering construction of Table 2.1 with the v*-close solution (x*, y*), taking g_j = v*_j, to obtain the set of centers C. Let L = ∪_{j∈C} N(j) be the set of central facility locations, and let R = F − L be the set of the remaining, noncentral, locations. Facilities are then opened at random: each center j ∈ C opens exactly one facility in its neighborhood, choosing i ∈ N(j) with probability x*_ij (= y*_i); by Fact 1 and completeness these probabilities sum to 1, and by Fact 2 the choices of distinct centers are independent. Each noncentral facility i ∈ R is opened independently with probability y*_i. Finally, each demand point is assigned to its closest open facility. In every case a facility at location i ∈ F is opened with probability y*_i, so we obtain the following lemma and its immediate corollary.

Lemma 3.4. Randomized rounding with clustering opens a facility at location i ∈ F with probability y*_i (i ∈ F).

Corollary 3.5. The expected total facility cost of randomized rounding with clustering is ∑_{i∈F} f_i y*_i.
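The dependent rounding step just described can be sketched as follows (our code, with hypothetical identifiers; xstar and ystar hold the optimal, complete LP solution). Each demand point would then be assigned to the open facility nearest to it.

```python
import random

# One sample of randomized rounding with clustering (illustrative sketch).
def round_with_clustering(centers, neighbors, noncentral, xstar, ystar):
    opened = set()
    for j in centers:
        cands = list(neighbors[j])
        weights = [xstar[i, j] for i in cands]   # sum to 1 by Fact 1 + completeness
        opened.add(random.choices(cands, weights=weights)[0])
    for i in noncentral:
        if random.random() < ystar[i]:           # independent coin for noncentral i
            opened.add(i)
    return opened
```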
For each demand point k ∈ D, let C_k = ∑_{i∈F} c_ik x*_ik denote its fractional service cost. The key step of the analysis is to bound the expected service cost of each demand point.

Lemma 3.6. For each demand point k ∈ D, the expected service cost of k is at most C_k + (3/e) v*_k.

Proof. Fix a demand point k ∈ D, let j◦ be the center of the cluster to which k belongs, and suppose that N(k) = {1, ..., d}, with c_1k ≤ c_2k ≤ ... ≤ c_dk. We first consider the special case in which each center shares at most one neighbor with k, so that the events that the facilities in N(k) are open are mutually independent. Then the probability q that no facility in N(k) is open, together with Fact 1, completeness, and the fact that 1 − x ≤ e^{−x} (x ≥ 0), implies that

q = ∏_{i=1}^d (1 − x*_ik) ≤ ∏_{i=1}^d e^{−x*_ik} = e^{−∑_{i=1}^d x*_ik} = 1/e.

We will bound the expected service cost of k by considering a provably worse algorithm: assign k to its closest open neighbor; if none of the neighbors of k is open, assign k to the open facility i◦ ∈ N(j◦) (exactly as in Section 2). If facility 1 is open, an event which occurs with probability y*_1, the service cost of k is c_1k. If, on the other hand, facility 1 is closed but facility 2 is open, an event which occurs with probability (1 − y*_1) y*_2, the service cost of k is c_2k, and so on. If all of the facilities in the neighborhood of k are closed, which occurs with probability q, then k is assigned to the open facility i◦ ∈ N(j◦). But in this case, the service cost of k is at most 2v*_j◦ + v*_k ≤ 3v*_k, exactly as in Figure 2.1 (Section 2); in fact, this backup routing gives a deterministic bound: the service cost of k is always no more than 3v*_k. Thus the expected service cost of k is at most

c_1k y*_1 + c_2k y*_2 (1 − y*_1) + ··· + c_dk y*_d (1 − y*_1) ··· (1 − y*_{d−1}) + 3 v*_k q
   ≤ ∑_{i=1}^d c_ik x*_ik + (3/e) v*_k = C_k + (3/e) v*_k,

which concludes the proof of the lemma in this special case.

Now we return to the more general case, in which there are centers in C that can share more than one neighbor with k. We assumed that this was not the case in order to ensure that the events of opening facilities in N(k) were independent, but now this is no longer true for facilities i, i′ ∈ N(k) that are neighbors of the same center. However, if one of i or i′ is closed, the probability that the other is open increases; thus the dependencies are favorable for the analysis. The key idea of the proof is to group together those facilities that are neighbors of the same cluster center, so that the independence is retained and the proof of the special case above still works. A more rigorous analysis follows. Let C′ be the subset of centers that share neighbors with k. For each center j ∈ C, let S_j = N(j) ∩ N(k), so that C′ = {j ∈ C : S_j ≠ ∅}. We have already proved the lemma when |S_j| ≤ 1 for each center j ∈ C. For each center j ∈ C′, let E_j be the event that at least one common neighbor of j and k is open (see Figure 3.1). To follow the proof, for each j ∈ C′, it will be convenient to think of the event of choosing a facility i in S_j as a sequence of two events: first j chooses to "open" S_j with probability p_j = ∑_{i∈S_j} x*_ik (i.e., event E_j occurs); and then, if S_j is open, j chooses facility i ∈ S_j with probability x*_ij / p_j (which is the conditional probability of opening i given event E_j). Now let c_j = ∑_{i∈S_j} c_ik x*_ik / p_j; that is, c_j is the conditional expected distance from k to S_j, given the event E_j. For example, if S_j = {r, s, t} are the common neighbors of j and k, the event E_j occurs when one of r, s or t is open, p_j = x*_rk + x*_sk + x*_tk, and c_j = c_rk x*_rk / p_j + c_sk x*_sk / p_j + c_tk x*_tk / p_j. Notice that by Fact 2, the events E_j (j ∈ C′) are independent. This completes the grouping of the central neighbors of k. Consider next the neighbors of k that are noncentral facility locations, that is, locations i ∈ N(k) ∩ R. For each noncentral neighbor i ∈ N(k) ∩ R, let E_i be the event in which facility i is open, let c_i be the distance c_ik, and let p_i = x*_ik. Next notice that all of the events E_ℓ are independent. It follows easily from the definitions that ∑_ℓ p_ℓ = ∑_{i∈F} x*_ik = 1 and that ∑_ℓ c_ℓ p_ℓ = C_k. Now we can argue essentially as in the simple case when |S_j| ≤ 1 for each center j ∈ C. Assume that there are d events E_ℓ and, for notational simplicity, that they are indexed by ℓ ∈ {1, ..., d}, with c_1 ≤ ... ≤ c_d. Let D be the event that none of E_1, ..., E_d occurs; that is, D is precisely the event in which all the facilities in the neighborhood of k, N(k), are closed; let q be the probability of event D. Note that, as in the simple case, the service cost of k is never greater than its backup routing cost 3v*_k; in particular, this bound holds even conditioned on D. As before, we will analyze the expected service cost of a worse algorithm: k is assigned to the open neighboring facility with smallest c_ℓ; and if all the neighbors are closed, k is assigned through its backup routing to the open facility i◦ ∈ N(j◦).

Fig. 3.1. Estimating the expected service cost of k. Here the centers that share a neighbor with k are demand locations 2 and 4 (C′ = {2, 4}). The neighbors of k that are noncentral locations are 1 and 3. Event E_2 (respectively E_4) occurs when a facility in N(k) ∩ N(2) (respectively N(k) ∩ N(4)) is open, while event E_1 (respectively E_3) occurs when facility 1 (respectively 3) is open. Though there are dependencies among the neighbors of a fixed center, the events E_1, E_2, E_3 and E_4 are independent.

If the event E_1 occurs (with probability p_1), the expected service cost of k is c_1. If event E_1 does not occur, but event E_2 occurs (which happens with probability (1 − p_1) p_2), the expected service cost of k is c_2, and so on. If we are in the complementary event D, which occurs with probability q = ∏_{ℓ=1}^d (1 − p_ℓ), the service cost of k is never greater than its backup service cost 3v*_k. Thus the expected service cost of k can be bounded by

c_1 p_1 + c_2 (1 − p_1) p_2 + ··· + c_d (1 − p_1) ··· (1 − p_{d−1}) p_d + 3 v*_k q.     (3.1)

To prove the lemma, we bound the first d terms of (3.1) by C_k, and q by 1/e, arguing exactly as in the special case, since the events E_ℓ are independent and ∑_ℓ p_ℓ = 1.

Notice that even though the clustering construction is deterministic, the backup service cost of k (that is, the distance between k and the facility opened in N(j◦)) is a random variable B. In the proof above, we used the upper bound B ≤ 3v*_k. In fact, the proof of Lemma 3.6 shows that the expected service cost of k is no more than C_k + q E[B|D], where D is the event in which no neighbor of k is open, as in the proof of the lemma.
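The grouping used in the proof is mechanical, and it may help to see it spelled out. The helper below (ours, purely illustrative) builds the pairs (c_ℓ, p_ℓ) for a demand point k and evaluates the bound (3.1).

```python
# Sketch of the grouping: common neighbors of k and a center j collapse
# into one event E_j with probability p_j and conditional distance c_j;
# noncentral neighbors of k stay as singleton events.
def grouped_events(k, centers, neighbors, noncentral, xstar, c):
    events = []
    for j in centers:
        Sj = neighbors[j] & neighbors[k]
        if Sj:
            pj = sum(xstar[i, k] for i in Sj)
            cj = sum(c[i, k] * xstar[i, k] for i in Sj) / pj
            events.append((cj, pj))
    for i in neighbors[k] & noncentral:
        events.append((c[i, k], xstar[i, k]))
    return sorted(events)                    # c_1 <= ... <= c_d as in the proof

def bound_3_1(events, vk):
    # evaluates (3.1): sum_l c_l p_l prod_{m<l}(1 - p_m)  +  3 v*_k q
    total, survive = 0.0, 1.0
    for cl, pl in events:
        total += cl * pl * survive
        survive *= 1.0 - pl
    return total + 3.0 * vk * survive        # survive equals q at the end
```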
As can be easily seen, the upper bound used for expression (3.1) is not tight. In fact, we can get an upper bound of (1 − q) C_k + q E[B|D] as follows. First note the following simple probabilistic interpretation of the first d terms of (3.1). Let Z_ℓ (ℓ = 1, ..., d) be independent 0-1 random variables with Prob{Z_ℓ = 1} = p_ℓ. Consider the set of indices ℓ for which Z_ℓ is 1, and let Z be the minimum c_ℓ value in this set of indices; if all of the Z_ℓ are 0, Z is defined to be 0. Then the expected value of Z is exactly equal to the first d terms of (3.1). Given a set of numbers S, we will use min◦(S) to denote the smallest element of S if S is nonempty, and 0 if S is empty, so that Z = min◦{c_ℓ : ℓ = 1, ..., d and Z_ℓ = 1}. The following intuitive probability lemma, whose proof is given in Section 6, provides a bound on the first d terms of (3.1). (A much simpler proof of this lemma, based on the Chebyshev Integral Inequality, was observed by Sviridenko [Svi02].)

Lemma 3.9. Suppose that 0 ≤ c_1 ≤ ... ≤ c_d and p_1, ..., p_d > 0, with ∑_{ℓ=1}^d p_ℓ = 1. Let Z_1, ..., Z_d be 0-1 independent random variables, with Prob{Z_ℓ = 1} = p_ℓ; let C = ∑_ℓ c_ℓ p_ℓ. Then

E[ min◦{c_ℓ : Z_ℓ = 1} + C ∏_{ℓ=1}^d (1 − Z_ℓ) ] ≤ C.

Applying the lemma to the first d terms of expression (3.1), since E[∏_{ℓ=1}^d (1 − Z_ℓ)] = ∏_{ℓ=1}^d (1 − p_ℓ) = q, we have that

c_1 p_1 + c_2 (1 − p_1) p_2 + ··· + c_d (1 − p_1) ··· (1 − p_{d−1}) p_d ≤ C_k (1 − q).     (3.2)

Thus we have proved the following.

Lemma 3.10. For each demand point k ∈ D, the expected service cost of k is at most (1 − q) C_k + q E[B|D].
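Lemma 3.9 is easy to test numerically. The following check is ours (a Monte Carlo experiment, not a proof): on random data satisfying the hypotheses, the sample mean of min◦{c_ℓ : Z_ℓ = 1} + C ∏(1 − Z_ℓ) should not exceed C.

```python
import random

# Numeric sanity check of Lemma 3.9 on random data.
def check_lemma_3_9(d=5, trials=200000):
    p = [random.random() for _ in range(d)]
    s = sum(p)
    p = [x / s for x in p]                       # p_l > 0, sum over l equals 1
    c = sorted(random.uniform(0, 10) for _ in range(d))
    C = sum(cl * pl for cl, pl in zip(c, p))
    acc = 0.0
    for _ in range(trials):
        z = [random.random() < pl for pl in p]
        hit = [cl for cl, zl in zip(c, z) if zl]
        acc += min(hit) if hit else C            # min_o(...) + C when all Z_l = 0
    print(acc / trials, "<=", C)

check_lemma_3_9()
```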
Finally we introduce the last idea that leads to the (1 + 2/e)-approximation algorithm. In Figure 2.1, we bounded the distance from the center j◦ to the open facility i◦, c_{i◦j◦}, by v*_j◦. However, i◦ is selected (by the center j◦) with probability x*_{i◦j◦} and, thus, the expected length of this leg of the routing is ∑_{i∈F} c_{ij◦} x*_{ij◦} = C_j◦, which in general is smaller than the estimate v*_j◦ used in the proof of Lemma 3.6. Thus, to improve our bounds, we slightly modify the clustering procedure by changing line 3 of Table 2.1 to

3′. choose j◦ ∈ S with smallest v*_j + C_j value (j ∈ S)

We will call the modified algorithm randomized rounding with improved clustering. Notice that Lemmas 3.4 and 3.10 are unaffected by this change. We will show that the modified rule 3′ leads to the bound E[B|D] ≤ 2v*_k + C_k, improving on the bound of 3v*_k we used in the proof of Lemma 3.6.

Lemma 3.11. If we run randomized rounding with improved clustering, the conditional expected backup service cost of k, E[B|D], is at most 2v*_k + C_k.

Proof. Suppose that the clustering partition assigned k to the cluster with center j◦. Deterministically, we divide the proof into two cases.

Fig. 3.2. Bounding the backup service cost of k: (a) the deterministic bound of Case 1; (b) the bound in expected value of Case 2.

Case 1. Suppose that there is a facility ℓ ∈ N(k) ∩ N(j◦) such that c_{ℓj◦} ≤ C_j◦ (see Figure 3.2(a)). Let i be the facility in N(j◦) that was opened by j◦; notice that c_{ij◦} ≤ v*_j◦ (because (x*, y*) is v*-close). Then the service cost of k is at most c_ik ≤ c_{ℓk} + c_{ℓj◦} + c_{j◦i}, which, using again that (x*, y*) is v*-close, is at most v*_k + c_{ℓj◦} + v*_j◦ ≤ v*_k + C_j◦ + v*_j◦ ≤ C_k + 2v*_k, where the last inequality follows from the fact that the center has the minimum (C_j + v*_j) value. In this case, we have a (deterministic) bound, B ≤ C_k + 2v*_k.

Case 2. Assume that c_{ℓj◦} > C_j◦ for every ℓ ∈ N(k) ∩ N(j◦) (see Figure 3.2(b)). First note that when we do not condition on D (i.e., that no facility in N(k) is open), the expected length of the edge from j◦ to the facility that j◦ has selected is C_j◦. However, we are given that all of the facilities in the neighborhood of k are closed; but, in this case, all of these facilities that contribute to the expected service cost of j◦ (the facilities in N(k) ∩ N(j◦)) are at distance greater than the average C_j◦. Thus the conditional expected service cost of j◦ is at most the unconditional expected service cost of j◦, C_j◦. It follows then that if ℓ ∈ N(k) ∩ N(j◦), the conditional expected service cost of k is at most C_j◦ + c_{ℓj◦} + c_{ℓk} ≤ C_j◦ + v*_j◦ + v*_k ≤ C_k + 2v*_k, where again the last inequality follows from the fact that C_j◦ + v*_j◦ ≤ C_k + v*_k. Hence, E[B|D] ≤ C_k + 2v*_k in this case, too.

Thus, using Lemmas 3.10 and 3.11, the expected service cost of k can be bounded by

C_k (1 − q) + q (2v*_k + C_k) = C_k + 2q v*_k ≤ C_k + (2/e) v*_k,

where once again we bound q by 1/e.

Corollary 3.12. The expected total service cost of randomized rounding with improved clustering is at most ∑_{k∈D} C_k + (2/e) ∑_{k∈D} v*_k.

Combining Corollaries 3.5 and 3.12, randomized rounding with improved clustering produces a feasible solution with expected cost no greater than

∑_{i∈F} f_i y*_i + ∑_{k∈D} C_k + (2/e) ∑_{k∈D} v*_k = (1 + 2/e) LP* ≈ 1.736 LP*.

Thus we have proved the following theorem.

Theorem 3.13. There is a polynomial-time randomized algorithm that finds a feasible solution to the uncapacitated facility location problem with expected cost at most (1 + 2/e) LP*.

As a consequence of the theorem, we obtain the following corollary on the quality of the value of the linear programming relaxation of [Bal65].

Corollary 3.14. The optimal value of the linear programming relaxation P is within a factor of 1.736 of the optimal cost.

This improves on the previously best known factor of 3.16 presented in [STA97]. To finish the proof of Theorem 1.1, we will show in the next section that the algorithm of Theorem 3.13 can be derandomized using standard methods.

4. Derandomization. In this section we show how to derandomize the algorithm of Theorem 3.13. To this end, we will use the method of conditional expectations due to Erdős & Selfridge [ES73] (see also Spencer [Spe87]), which works as follows. Suppose that we have m 0-1 random variables U_1, ..., U_m, and know that E[F] = G, where F = g(U_1, ..., U_m) for a nonnegative real-valued function g of m variables. Our task is to determine a set of values u_1, ..., u_m for the random variables U_1, ..., U_m such that g(u_1, ..., u_m) ≤ G. We wish to decide whether to set U_1 to 0 or 1. The expected value E[F] is a convex combination of the conditional expectations E[F|U_1 = 0] and E[F|U_1 = 1]; more precisely, E[F] = Prob{U_1 = 0} E[F|U_1 = 0] + Prob{U_1 = 1} E[F|U_1 = 1]. We would have certainly made the right decision if the conditional expectation decreases. Thus we set u_1 to 1 if E[F|U_1 = 1] ≤ E[F|U_1 = 0], and to 0 otherwise. Note that E[F|U_1 = u_1] ≤ E[F] = G. The process continues inductively. Suppose that we have already decided on the values u_1, ..., u_t ∈ {0, 1} so that the conditional expected value E[F|U_1 = u_1, ..., U_t = u_t] is at most G. Now E[F|U_1 = u_1, ..., U_t = u_t] is a convex combination of E[F|U_1 = u_1, ..., U_t = u_t, U_{t+1} = 0] and E[F|U_1 = u_1, ..., U_t = u_t, U_{t+1} = 1]. Again we choose to set u_{t+1} to 1 if the expected value decreases, that is, if E[F|U_1 = u_1, ..., U_t = u_t, U_{t+1} = 1] ≤ E[F|U_1 = u_1, ..., U_t = u_t, U_{t+1} = 0], and to 0 otherwise. Clearly, at termination, we obtain a set of 0-1 values u_1, ..., u_m such that g(u_1, ..., u_m) ≤ G, as desired. Notice that we only need to compute 2m conditional expected values. However, for this process to work, we have to be able to compute all the intermediary conditional expectations efficiently (i.e., in polynomial time).
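For independent 0-1 variables, the scheme is only a few lines. The sketch below is ours; the oracle cond_exp is hypothetical and must return the conditional expectation E[F | values fixed so far] exactly, which is precisely what the closed form of the estimator constructed next provides.

```python
# Generic method of conditional expectations for m 0-1 variables.
# cond_exp(fixed) must return E[F | U_i = u_i for i in fixed] exactly.
def derandomize(m, cond_exp):
    fixed = {}
    for t in range(m):
        # keep whichever branch does not increase the conditional expectation
        if cond_exp({**fixed, t: 1}) <= cond_exp({**fixed, t: 0}):
            fixed[t] = 1
        else:
            fixed[t] = 0
    return fixed              # guarantees g(u_1, ..., u_m) <= E[F]
```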
We first show that the analysis of randomized rounding with improved clustering also provides a random variable W, a pessimistic estimator [Rag88], such that the cost of the solution delivered by the algorithm is at most W, and the bounds we obtained can be read off as a bound on the expected value of W, that is, E[W] ≤ (1 + 2/e) LP*. Furthermore, the expected value of W can be computed exactly. The upper bound W will let us use the method of conditional expectations to derandomize the algorithm. The random variable W is precisely the cost of the worse algorithm we used to prove Theorem 3.13; this algorithm either assigns each demand point k to the closest open neighbor (in the c_ℓ sense, as in the proof of Lemma 3.6) or, if none of its neighbors is open, it assigns k to the backup facility of the cluster to which k belongs. More concretely, for each potential facility location i ∈ F, let U_i be the 0-1 random variable that is 1 exactly when a facility at location i is open. Notice that the random variables U_i are not independent in general, and that, since we assumed that the feasible solution (x, y) was complete, the expected value of U_i is y_i. It follows immediately that the facility cost of the randomized algorithm is ∑_{i∈F} f_i U_i.

Next we need to find an upper bound for the service costs expressed in terms of the U_i's. We fix a demand location k ∈ D and carefully revise the bounds on the expected service cost of k given in the previous section. Recall the notation used in the proof of Lemma 3.6. For each ℓ ∈ {1, ..., d}, we extend the definition of S_ℓ when ℓ is a noncentral neighbor of k by just setting S_ℓ equal to {ℓ}. Note that in any case c_ℓ p_ℓ = ∑_{i∈S_ℓ} c_ik y_i = E[∑_{i∈S_ℓ} c_ik U_i]. Now let P_ℓ = ∑_{i∈S_ℓ} U_i and T_ℓ = ∑_{i∈S_ℓ} c_ik U_i, for ℓ = 1, ..., d; and let Q = ∏_{ℓ=1}^d (1 − P_ℓ). Observe that, because of the dependencies, the random variables P_ℓ are 0-1 with p_ℓ = E[P_ℓ] = ∑_{i∈S_ℓ} x*_ik. Now suppose that, in the case analysis of Lemma 3.11, we are in Case 1. Then the arguments given to bound the expected service cost of k in effect say that the service cost of k is at most

T_1 + T_2 (1 − P_1) + ··· + T_d (1 − P_1) ··· (1 − P_{d−1}) + Q (v*_k + v*_j◦ + C_j◦);     (4.1)

let Z_k denote this upper bound. Since the factors of the products in each term are independent, it is straightforward to verify that the bound on the service cost of k of Theorem 3.13 implies E[Z_k] ≤ C_k + (2/e) v*_k. For Case 2, the service cost of k is at most

T_1 + T_2 (1 − P_1) + ··· + T_d (1 − P_1) ··· (1 − P_{d−1}) + Q (v*_k + v*_j◦ + ∑_{i∈N(j◦)−N(k)} c_{ij◦} U_i),     (4.2)

which as before will be denoted by Z_k. This case is slightly more complicated than Case 1, since before we used the fact that our upper bound was deterministic. Now dependencies have to be taken into account. As in Case 1, the expected value of the first d terms can be upper bounded, exactly as in the proof of Theorem 3.13, by (1 − q) C_k. For the last term, notice first that the dependencies imply that P_{j◦} U_i = 0 (and thus (1 − P_{j◦}) U_i = U_i) for each i ∈ N(j◦) − N(k), so that, if Q′ = ∏_{ℓ ≠ j◦} (1 − P_ℓ),

Q ∑_{i∈N(j◦)−N(k)} c_{ij◦} U_i = Q′ (1 − P_{j◦}) ∑_{i∈N(j◦)−N(k)} c_{ij◦} U_i = Q′ ∑_{i∈N(j◦)−N(k)} c_{ij◦} U_i.

Now the two factors in the right-most expression are independent, and

E[ ∑_{i∈N(j◦)−N(k)} c_{ij◦} U_i ] ≤ (1 − p_{j◦}) C_j◦,

since, by Case 2, c_{ij◦} > C_j◦ for all i ∈ S_j◦.
Thus the expected value of the last term of (4.2) can be bounded by q (v*_k + v*_j◦ + C_j◦). Putting the pieces together, we have again that E[Z_k] ≤ C_k + (2/e) v*_k, as needed. Notice also that, as in Case 1, Z_k can be written as a sum in which each term is the product of independent random variables.

Clearly, if W = ∑_{i∈F} f_i U_i + ∑_{k∈D} Z_k, the cost of the randomized algorithm can be bounded by W, and E[W] ≤ (1 + 2/e) LP*. Since W is the cost of a worse algorithm, if we were able to find 0-1 values for the U_i's so that the corresponding value of W is at most its expected value E[W], we would have a feasible solution to the problem whose cost is at most (1 + 2/e) LP*, thus proving Theorem 1.1. As mentioned earlier, to find such values for the U_i's we apply the method of conditional expectations. We need to explain how to compute the conditional expected values of W. To understand how to apply the method of conditional expectations, suppose first that all of the U_i's are independent random variables. Then the conditional expected value E[W|U_i = 0] (respectively E[W|U_i = 1]) is easy to compute: simply replace U_i by 0 (respectively by 1) in the expression of W, and compute the expected value directly. It is also easy to see that we can substitute each U_i by u_i ∈ {0, 1} for i ∈ E, for any subset E ⊆ F, and compute E[W|U_i = u_i (i ∈ E)]. Thus, if we did not have dependencies, the derandomization of the algorithm would be quite simple. However, if for instance U_s and U_t (s, t ∈ F, s ≠ t) are dependent, when conditioning on the event {U_s = 1} we cannot just replace U_s by 1 in the expression of W, since the value of U_t might also be affected. This apparent difficulty can be easily overcome in our case, since the dependencies imply that U_t = 0 whenever U_s = 1.

Next we describe the derandomization process in more detail. We first consider the noncentral facilities. If i is a noncentral facility location, i ∈ R, it is easy to compute the conditional expected value of W given that U_i = u (u = 0, 1): simply substitute U_i by u in the expression of W and take expected values. In the same way, we can also compute E[W|U_i = u_i (i ∈ E)] for any subset E ⊆ R, where u_i ∈ {0, 1} (i ∈ E). Now we can determine the values of u_i for i ∈ R by applying the method of conditional expectations as described earlier. For notational simplicity, we will assume that R = {1, ..., r}. In the first step, we set u_1 equal to 1 (or, equivalently, open facility 1) only if E[W|U_1 = 1] ≤ E[W|U_1 = 0]; otherwise we set u_1 equal to 0. Inductively, if we already know u_1, ..., u_{h−1}, in step h we set u_h to 1 if E[W|U_1 = u_1, ..., U_{h−1} = u_{h−1}, U_h = 1] ≤ E[W|U_1 = u_1, ..., U_{h−1} = u_{h−1}, U_h = 0], and to 0 otherwise. When h = r, we have values u_1, ..., u_r such that, if W′ is the conditional random variable W|U_1 = u_1, ..., U_r = u_r, then E[W′] ≤ E[W].

In what follows we find values for the remaining variables U_i, that is, decide which central facilities to open, in such a way that at termination the cost of the solution is at most E[W]. Fix a center j ∈ C. We wish to decide which neighboring facility to open. For each neighbor i ∈ N(j) we can compute the conditional expectation of W′ given that U_i is 1 as follows. Since U_i = 1, for all the other neighbors i′ of j it must be that U_{i′} = 0. Hence we just replace these values in the formula of W′ and compute the corresponding expectation. The key observation is that, since we open exactly one facility in the neighborhood of j, E[W′] is a convex combination of the conditional expected values E[W′|U_i = 1] for i ∈ N(j); that is,

E[W′] = ∑_{i∈N(j)} Prob{U_i = 1} E[W′|U_i = 1].
Thus we open the neighboring facility i◦ ∈ N(j) for which the conditional expected value is smallest; more precisely, we set u_{i◦} equal to 1, and u_i equal to 0 for i ∈ N(j), i ≠ i◦. Using now that the neighborhoods of distinct centers are disjoint, we can essentially argue as in the simple case when all the variables were independent. We repeat the process inductively; at each step we treat a new center and decide which neighboring facility to open, so that the conditional expected value never increases. Finally, notice that there are |R| + |C| steps, and overall we need only to compute 1 + 2|R| + |L| expected values. The most expensive computation, which dominates the whole derandomization process, is to find, initially, the expected value of W, which takes O(|D||F| log |F|) arithmetic operations.

5. Extensions. To motivate the results of this section, suppose that the contribution of the fractional facility cost to the optimal fractional cost is very small. If we run randomized rounding with improved clustering and focus on the analysis, we would have an imbalance between the facility and service cost upper bounds. Hence, it is intuitively clear that we could afford to open facilities with higher probability to balance out the bounds. In this section, we will show how to do this in general, providing a more refined performance guarantee that depends on information related to the cost distribution of the optimal fractional solution. A standard technique to improve the performance of randomized rounding consists of using a nontrivial mapping of the optimal fractional solution into probabilities. One of the most common approaches boosts all the probabilities by a factor of γ, for a fixed parameter γ > 0. For instance, the simplest randomized rounding algorithm would open facility i ∈ F with probability min{γ y*_i, 1}. These ideas can also be applied to our randomized algorithm in a simple fashion that we describe below. First of all, as mentioned in Section 3, the proof of Theorem 3.13 is valid for any g-close solution; that is, if (x, y) is g-close, randomized rounding with improved clustering produces a feasible solution with expected cost at most

∑_{i∈F} f_i y_i + ∑_{j∈D} ∑_{i∈F} c_ij x_ij + (2/e) ∑_{j∈D} g_j.

In fact, it is also easy to see that the derandomization of Section 4 carries over to this more general setting. Now suppose again that (x, y) is g-close and complete, and let γ ≥ 1. We next describe the algorithm γ-randomized rounding with improved clustering, which is a variant of randomized rounding with improved clustering. The only difference is that now we open facilities with higher probabilities. Each noncentral facility i ∈ R is opened independently with probability min{γ y_i, 1}. As in randomized rounding with improved clustering, each center j ∈ C opens one facility in its neighborhood i ∈ N(j) with probability x_ij, in a first phase. Additionally, if a facility i ∈ N(j) has not been opened, it is now opened in a second phase with probability min{γ y_i − x_ij, 1} = min{γ y_i − y_i, 1}.
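The two-phase opening rule can be sketched as follows (our code, hypothetical identifiers). Flipping the second-phase coin independently of the first, as below, opens i with overall probability x_ij + (1 − x_ij) min{γy_i − x_ij, 1}, which coincides exactly with opening i in the second phase only when it was not opened in the first.

```python
import random

# Sketch of gamma-randomized rounding with improved clustering.
def gamma_round(centers, neighbors, noncentral, x, y, gamma):
    opened = set()
    for j in centers:
        cands = list(neighbors[j])
        # first phase: each center opens exactly one neighboring facility
        opened.add(random.choices(cands, weights=[x[i, j] for i in cands])[0])
        for i in cands:          # second phase boosts central facilities
            if random.random() < min(gamma * y[i] - x[i, j], 1.0):
                opened.add(i)
    for i in noncentral:         # noncentral facilities, boosted probability
        if random.random() < min(gamma * y[i], 1.0):
            opened.add(i)
    return opened
```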
The new algorithm has a guarantee given by the following theorem, whose proof is a minor variation of the proof of Theorem 3.13.

Theorem 5.1. For each γ ≥ 1, the expected cost of the solution produced by γ-randomized rounding with improved clustering is at most

γ ∑_{i∈F} f_i y_i + ∑_{j∈D} ∑_{i∈F} c_ij x_ij + (2/e^γ) ∑_{j∈D} g_j.

Proof. For simplicity, we will analyze a worse algorithm in which we are allowed to open more than one facility at each location i ∈ F. More precisely, for each facility i ∈ L, if i is in the neighborhood of center j, we open a facility at location i in the second phase with probability min{γ y_i − y_i, 1}, independently of whether there is already an open facility at i (from the first phase); in this case, we of course take into account the extra facility cost.

First we estimate the expected total facility cost, as in Lemma 3.4. If i ∈ R is a noncentral facility, i is open with probability min{γ y_i, 1}, giving a contribution to the expected total facility cost of f_i min{γ y_i, 1} ≤ γ f_i y_i. Now if i ∈ L is a central facility location, i ∈ N(j) for j ∈ C, a facility is open at i first with probability y_i = x_ij, contributing f_i y_i. In addition, another facility is open at i with probability min{γ y_i − y_i, 1}, contributing an additional f_i min{γ y_i − y_i, 1} to the expected total facility cost. Overall, the contribution of i is f_i y_i + f_i min{γ y_i − y_i, 1} ≤ γ f_i y_i, once again. Thus, the expected total facility cost is at most γ ∑_{i∈F} f_i y_i.

Next, focus on a demand point k ∈ D. As in the proof of Lemma 3.6, for each center j ∈ C, S_j = N(j) ∩ N(k) and C′ = {j ∈ C : S_j ≠ ∅}. Now, for each j ∈ C′, the event E_j occurs when a facility in S_j is open in the first phase; the probability of event E_j is then p_j = ∑_{i∈S_j} x_ik. In addition, for each central neighbor i ∈ N(k) ∩ L, define the event E_i in which a facility is open at location i in the second phase; hence the probability of event E_i is p_i = min{γ y_i − y_i, 1}. For each noncentral neighbor i ∈ N(k) ∩ R, the event E_i occurs when a facility at location i is open, and the probability of event E_i is p_i = min{γ y_i, 1}. As in the proof of Lemma 3.6 and by the assumption at the beginning of the proof, all the events E_ℓ are independent. For each j ∈ C′, c_j is the expected distance from k to S_j given the event E_j, considering only the first phase. For each neighbor i ∈ N(k), c_i is simply the distance c_ik. Without loss of generality, we assume that there are d events E_ℓ, and that they are indexed by ℓ ∈ {1, ..., d} with c_1 ≤ ... ≤ c_d. Let D be the event in which none of the events E_1, ..., E_d occurs, and let q = ∏_{ℓ=1}^d (1 − p_ℓ) be the probability of event D. If B is the backup service cost of k, that is, the distance from k to the facility opened during the first phase in the cluster to which k belongs (as in the proof of Lemma 3.10), then, following the proofs of Lemmas 3.6 and 3.10, the expected service cost of k is at most

c_1 p_1 + c_2 (1 − p_1) p_2 + ··· + c_d (1 − p_1) ··· (1 − p_{d−1}) p_d + E[B|D] q.     (5.1)

It is easy to see that the proof of Lemma 3.11 remains valid, so that E[B|D] ≤ 2g_k + ∑_{i∈F} c_ik x_ik. Following the proof of Theorem 3.13, we only need to show that the first d terms of (5.1) are at most (1 − q) ∑_{i∈F} c_ik x_ik and that q ≤ 1/e^γ. We will use the following generalization of Lemma 3.9, whose proof is postponed to Section 6. (A much simpler proof of this lemma, based on the Chebyshev Integral Inequality, was observed by Sviridenko [Svi02].)

Lemma 5.2. Let 0 ≤ c_1 ≤ ... ≤ c_d and z_1, ..., z_d > 0, with ∑_{ℓ=1}^d z_ℓ = 1, and let γ ≥ 0. Suppose that Z_1, ..., Z_d are 0-1 independent random variables, with Prob{Z_ℓ = 1} = min{γ z_ℓ, 1}; let C = ∑_ℓ c_ℓ z_ℓ. Then

E[ min◦{c_ℓ : Z_ℓ = 1} + C ∏_{ℓ=1}^d (1 − Z_ℓ) ] ≤ C.

If j ∈ C′, let z_j = p_j / γ, so that p_j = γ z_j = min{γ z_j, 1}. If i ∈ N(k) ∩ R, let z_i = y_i = x_ik, so that p_i = min{γ z_i, 1}. Finally, if i ∈ N(k) ∩ L, with i ∈ N(j), j ∈ C′, let z_i = x_ik − x_ik/γ, so that p_i = min{γ z_i, 1}. Now we have that ∑_ℓ z_ℓ = 1 and ∑_ℓ c_ℓ z_ℓ = ∑_{i∈F} c_ik x_ik. As in the proof of Lemma 3.10, using now Lemma 5.2, the first d terms of (5.1) can be bounded by (1 − q) ∑_{i∈F} c_ik x_ik.
To finish the proof of the theorem, notice that if p_ℓ is 1 for some ℓ, then q is 0; otherwise,

q = ∏_{ℓ=1}^d (1 − p_ℓ) = ∏_{ℓ=1}^d (1 − γ z_ℓ) ≤ ∏_{ℓ=1}^d e^{−γ z_ℓ} = e^{−γ ∑_{ℓ=1}^d z_ℓ} = 1/e^γ.

Using arguments similar to those given in Section 4, the algorithm of the theorem can be derandomized.

For the rest of the section, let ρ ∈ [0, 1] be defined by ρ LP* = ∑_{i∈F} f_i y*_i. If ρ is either 0 or 1, there is an optimal solution to P which is integral; that is, all the y_i's are 0 or 1. When ρ = 0, the optimal solution sets y_i to 1 exactly for those facilities i ∈ F for which f_i = 0. The case when ρ = 1 is slightly more complicated and requires that the distance function be symmetric and satisfy the triangle inequality. First, for each facility location i ∈ F, define D_i as the set of demand points that are at distance 0 from i. If i, i′ ∈ F and j, k ∈ D, the inequality c_ij ≤ c_ik + c_{i′k} + c_{i′j} implies that, for i ≠ i′, either D_i = D_{i′} or D_i ∩ D_{i′} = ∅. Hence, we can partition the set of facilities into classes such that if i and i′ belong to the same class, D_i = D_{i′}, and D_i ∩ D_{i′} = ∅ otherwise. Now, since ρ = 1, the sets D_i (i ∈ F) cover all the demand points. Finally, the optimal solution simply opens the cheapest facility in each class (notice that the fact that D_i ∩ D_{i′} = ∅ for i and i′ in different classes is crucial to argue optimality). Thus we will assume that 0 < ρ < 1.

For each fixed ρ, we want to apply Theorem 5.1 to a g-close feasible solution to P with a conveniently chosen γ so as to improve the performance guarantee of γ-randomized rounding with improved clustering. First we consider the optimal solution (x*, y*), which is v*-close. By Theorem 5.1, the expected cost of the solution produced by γ-randomized rounding with improved clustering is at most

γ ∑_{i∈F} f_i y*_i + ∑_{j∈D} ∑_{i∈F} c_ij x*_ij + (2/e^γ) ∑_{j∈D} v*_j.

Since ∑_{j∈D} v*_j = LP* and ∑_{j∈D} ∑_{i∈F} c_ij x*_ij = (1 − ρ) LP*, the expected performance guarantee is at most

γρ + (1 − ρ) + 2/e^γ.     (5.2)

Now we choose γ ≥ 1 so as to minimize (5.2), which gives

γ = ln(2/ρ)   if ρ ≤ 2/e,
γ = 1         if ρ > 2/e.

In this case, we obtain a performance guarantee of

1 + ρ ln(2/ρ)   if ρ ≤ 2/e,
1 + 2/e         if ρ > 2/e.

Figure 5.1 shows the performance of the new algorithm for each value of ρ. Notice that the performance improves when ρ gets closer to 0, but then there is no improvement, for instance, when ρ is close to 1. To improve the guarantees for this range of values of ρ, we will use a different g-close solution, the one proposed by Shmoys, Tardos & Aardal [STA97], which provides better guarantees when ρ is close to 1.

In what follows, we apply Theorem 5.1 to the g-close solutions given in [STA97], which were obtained using the filtering technique of Lin & Vitter [LV92b] applied to the optimal solution (x*, y*). Fix α ∈ (0, 1]. For a demand point j ∈ D, suppose that N(j) = {1, ..., d}, with c_1j ≤ ... ≤ c_dj. Let ℓ* = min{ℓ : 1 ≤ ℓ ≤ d, ∑_{i=1}^ℓ x*_ij ≥ α}; then the α-point of j, c_j(α), is c_{ℓ*j}. For each demand point j ∈ D, let α_j = ∑_{i: c_ij ≤ c_j(α)} x*_ij. The filtered solution (x^α, y^α) is defined in a simple way to ensure g-closeness, for g_j = c_j(α) (j ∈ D), as follows: if j ∈ D, i ∈ F,

x^α_ij = x*_ij / α_j   if c_ij ≤ c_j(α),
x^α_ij = 0             otherwise,

and y^α_i = min{1, y*_i / α} for i ∈ F. It is easy to verify that (x^α, y^α) is a feasible primal solution (because α_j ≥ α) and that (x^α, y^α) is (c_j(α))-close. The total fractional facility cost of the filtered solution is ∑_{i∈F} f_i y^α_i, and it is clearly bounded by ∑_{i∈F} f_i y*_i / α. Next define τ(α) = ∑_{j∈D} c_j(α) / ∑_{j∈D} ∑_{i∈F} c_ij x*_ij.
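The α-point computation is a one-pass scan over the sorted distances. The helper below is our sketch; xstar and c hold the optimal LP values and distances.

```python
# Sketch of the alpha-point of client j: the smallest distance at which
# the cumulative fractional assignment of j reaches alpha.
def alpha_point(j, F, xstar, c, alpha):
    pairs = sorted((c[i, j], xstar[i, j]) for i in F if xstar[i, j] > 0)
    acc = 0.0
    for dist, frac in pairs:
        acc += frac
        if acc >= alpha:
            return dist
    return pairs[-1][0]      # acc reaches 1, so this guards only rounding error
```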
The following lemma follows from Lemma 10 of [STA97] and was observed in [GK99].

Lemma 5.3. The function τ(α) satisfies ∫_0^1 τ(α) dα = 1.

Proof. It was shown in [STA97] that ∫_0^1 c_j(α) dα = ∑_{i∈F} c_ij x*_ij, from which the lemma follows at once according to our definitions.

Also note that τ(α) is a left continuous step function that has at most O(|D||F|) break points. Since, for each demand point k ∈ D, the fractional service cost of the filtered solution is bounded by the fractional transportation cost, that is,

∑_{i∈F} c_ik x^α_ik = ∑_{i: c_ik ≤ c_k(α)} c_ik x*_ik / α_k ≤ ∑_{i∈F} c_ik x*_ik,

the previous theorem implies that the expected total cost of the solution of γ-randomized rounding with improved clustering, when applied to the (c_j(α))-close solution (x^α, y^α), is at most

γ ∑_{i∈F} f_i y*_i / α + ∑_{j∈D} ∑_{i∈F} c_ij x*_ij + (2/e^γ) ∑_{j∈D} c_j(α),

or an expected performance guarantee of

γρ/α + 1 − ρ + (2/e^γ)(1 − ρ) τ(α).     (5.3)

To find the best possible guarantee, we will use binary search on a target guarantee together with Lemma 5.3, in a way similar to that used in [GK99], as follows. Fix c > 1 and suppose that γ-randomized rounding with improved clustering does not produce a c-approximation algorithm for every α ∈ (0, 1), γ ≥ 1. Then

τ(α) > (e^γ / 2) · ((c − 1 + ρ)α − γρ) / ((1 − ρ)α).     (5.4)

We are interested only in values of α for which the expression on the right is nonnegative; hence we will assume that α ≥ γρ/(c − 1 + ρ). For fixed α, the value of γ ≥ 1 that makes the right-hand side of (5.4) greatest is given by

γ(α) = ((c − 1 + ρ)α − ρ)/ρ   if 2ρ/(c − 1 + ρ) ≤ α ≤ 1,
γ(α) = 1                      if ρ/(c − 1 + ρ) ≤ α ≤ 2ρ/(c − 1 + ρ).

Thus, from (5.4), we obtain a lower bound for τ(α), LB_c(α), given by

LB_c(α) = (ρ / (2(1 − ρ)α)) exp{((c − 1 + ρ)α − ρ)/ρ}   if 2ρ/(c − 1 + ρ) ≤ α ≤ 1,
LB_c(α) = (e / (2(1 − ρ)α)) ((c − 1 + ρ)α − ρ)          if ρ/(c − 1 + ρ) ≤ α ≤ 2ρ/(c − 1 + ρ),
LB_c(α) = 0                                              if 0 ≤ α ≤ ρ/(c − 1 + ρ).

Note that LB_c(α) is a continuous and increasing function of α (since c and ρ are fixed). Now, since ∫_0^1 τ(α) dα = 1 and τ(α) > LB_c(α), if we can show that ∫_0^1 LB_c(α) dα > 1, we have a contradiction and, thus, the performance guarantee of the algorithm is no greater than c. Using bisection search we can find the smallest c, say c◦, within a small error tolerance (say 0.0001), for which ∫_0^1 LB_{c◦}(α) dα > 1, and hence show that the algorithm has a performance guarantee of c◦. It is clear that the bisection search takes constant time.

We finally address the issue of how to find the parameters α and γ that yield a c◦-approximation algorithm. We know that

∫_0^1 τ(α) dα = 1 < ∫_0^1 LB_{c◦}(α) dα,     (5.5)

hence there must be a value of α, say α◦, such that

τ(α◦) ≤ LB_{c◦}(α◦).     (5.6)

This implies that when γ = γ(α◦), γ-randomized rounding with improved clustering produces a solution with expected cost within a factor of c◦ of optimum. Thus we only need to argue how to find such an α◦ in polynomial time. Now, we know that τ(α) is a left continuous step function with at most O(|D||F|) break points. Hence we can find a partition of the interval (0, 1] = (0, s_1] ∪ (s_1, s_2] ∪ ... ∪ (s_k, 1], with k = O(|D||F|), such that τ(α) is constant in each subinterval. Suppose that τ(α) = z in the subinterval (s_i, s_{i+1}]. If z > LB_{c◦}(s_{i+1}), since LB_{c◦} is increasing, the inequality must hold for the whole subinterval. Thus, if inequality (5.6) fails for all the right end-points of the subintervals, then τ(α) > LB_{c◦}(α) for each α ∈ (0, 1], contradicting (5.5). This gives a simple way to find a c◦-approximation algorithm. Figure 5.1 shows the performance guarantees obtained using this algorithm.
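The bisection described above is also easy to sketch numerically (our code; the Riemann sum stands in for the integral, and the tolerances are illustrative).

```python
import math

# Numeric sketch of the bisection of Section 5: integrate LB_c over (0, 1]
# and find the smallest c whose integral exceeds 1.
def LB(c, rho, a):
    t = rho / (c - 1 + rho)
    if a <= t:
        return 0.0
    if a <= 2 * t:       # gamma(a) = 1 branch
        return math.e * ((c - 1 + rho) * a - rho) / (2 * (1 - rho) * a)
    return rho * math.exp(((c - 1 + rho) * a - rho) / rho) / (2 * (1 - rho) * a)

def best_c(rho, tol=1e-4, steps=10000):
    lo, hi = 1.0 + tol, 1.0 + 2.0 / math.e    # 1 + 2/e is always achievable
    while hi - lo > tol:
        c = (lo + hi) / 2
        integral = sum(LB(c, rho, (k + 0.5) / steps) for k in range(steps)) / steps
        hi, lo = (c, lo) if integral > 1.0 else (hi, c)
    return hi
```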
We conclude the section by pointing out that there is a minor improvement on the best guarantee we can achieve using the algorithms γ-randomized rounding with improved clustering. Indeed, a careful computation gives a slightly improved overall performance guarantee of 1.73352 < 1.73576 ≈ 1 + 2/e.

Fig. 5.1. Performance guarantees of γ-randomized rounding with improved clustering as a function of ρ. The solid line corresponds to the algorithm that uses the optimal v*-close solution (x*, y*), while the dashed line corresponds to the algorithm run with the filtered solution of [STA97].

6. Proof of Lemmas 3.9 and 5.2. Notice that Lemma 3.9 follows from Lemma 5.2 by simply taking γ = 1. Thus we only need to prove Lemma 5.2. As noted above, Sviridenko [Svi02] (in the proof and discussion around his Lemmas 4 and 5) observed that a much simpler version of this proof can be derived from the Chebyshev Integral Inequality, and cited particular variants in the text of Hardy, Littlewood, and Pólya [HLP52]. The key idea of the proof is to observe that the expected value decreases as a function of γ; thus, since when γ is 0 the expected value is exactly C, we are done.

For γ ≥ 0, let

p(γ) := E[ min◦{c_i : Z_i = 1} + C ∏_{i=1}^d (1 − Z_i) ].

Let γ̄ = min_i (1/z_i). We first prove the lemma for γ ≤ γ̄, so that γ z_i < 1 if γ < γ̄. In this case, p(γ) can be computed as

p(γ) = c_1 γz_1 + c_2 γz_2 (1 − γz_1) + ··· + c_d γz_d (1 − γz_1) ··· (1 − γz_{d−1}) + C ∏_{ℓ=1}^d (1 − γz_ℓ).

Notice that p(γ) is a polynomial in γ, and that p(0) = C. Hence, to prove the lemma in this case, it is enough to show that p′(γ) ≤ 0 for γ ∈ (0, γ̄). Next define, for each ℓ = 0, 1, ..., d,

F_ℓ(γ) = 1                        if ℓ = 0,
F_ℓ(γ) = (1 − γz_ℓ) F_{ℓ−1}(γ)    if ℓ ≥ 1.

Thus we can write

p(γ) = c_1 γz_1 F_0(γ) + c_2 γz_2 F_1(γ) + ··· + c_d γz_d F_{d−1}(γ) + C F_d(γ)
     = ∑_{ℓ=1}^d c_ℓ γz_ℓ F_{ℓ−1}(γ) + F_d(γ) ∑_{ℓ=1}^d c_ℓ z_ℓ
     = ∑_{ℓ=1}^d c_ℓ z_ℓ [γ F_{ℓ−1}(γ) + F_d(γ)].

Then

p′(γ) = ∑_{ℓ=1}^d c_ℓ z_ℓ [F_{ℓ−1}(γ) + γ F′_{ℓ−1}(γ) + F′_d(γ)].

For ℓ = 1, ..., d, let λ_ℓ = F_{ℓ−1}(γ) + γ F′_{ℓ−1}(γ) + F′_d(γ), so that p′(γ) = ∑_{ℓ=1}^d c_ℓ z_ℓ λ_ℓ. Next note that

F′_ℓ(γ) = 0                                            if ℓ = 0,
F′_ℓ(γ) = −z_ℓ F_{ℓ−1}(γ) + (1 − γz_ℓ) F′_{ℓ−1}(γ)     if ℓ ≥ 1.

It is easy to check by induction that F′_ℓ(γ) ≤ 0, ℓ = 0, ..., d. To prove the lemma in this case, we will use the following two claims.

Claim 1. The following equality holds: ∑_{ℓ=1}^d z_ℓ λ_ℓ = 0.

Claim 2. There is an index ℓ◦, 0 ≤ ℓ◦ ≤ d, such that λ_1, ..., λ_{ℓ◦−1} ≥ 0 and λ_{ℓ◦}, ..., λ_d ≤ 0.

To prove that p′(γ) ≤ 0, take ℓ◦ as in Claim 2, then use Claim 1 and the order of the c_ℓ's:

p′(γ) = ∑_{ℓ=1}^{ℓ◦−1} c_ℓ λ_ℓ z_ℓ + ∑_{ℓ=ℓ◦}^d c_ℓ λ_ℓ z_ℓ
      ≤ c_{ℓ◦} ∑_{ℓ=1}^{ℓ◦−1} λ_ℓ z_ℓ + c_{ℓ◦} ∑_{ℓ=ℓ◦}^d λ_ℓ z_ℓ
      = c_{ℓ◦} ∑_{ℓ=1}^d λ_ℓ z_ℓ = 0.

We complete the proof of the lemma for γ ∈ [0, γ̄] by proving the claims.

Proof of Claim 1. Note first that, for ℓ = 1, ..., d,

−z_ℓ F_{ℓ−1}(γ) − γz_ℓ F′_{ℓ−1}(γ) = F′_ℓ(γ) − F′_{ℓ−1}(γ).

Thus

∑_{ℓ=1}^d [−z_ℓ F_{ℓ−1}(γ) − γz_ℓ F′_{ℓ−1}(γ)] = F′_d(γ) − F′_0(γ) = F′_d(γ),

which implies that ∑_ℓ z_ℓ λ_ℓ = 0, since ∑_ℓ z_ℓ = 1.

Proof of Claim 2. If λ_ℓ ≥ 0 (or λ_ℓ ≤ 0) for all ℓ, then, using Claim 1, λ_ℓ = 0 for all ℓ, and any index works. Hence we can assume that at least one λ_ℓ < 0 and at least one λ_ℓ > 0. Suppose the claim does not hold. Then there must exist an index ℓ, 1 ≤ ℓ ≤ d, such that λ_ℓ ≤ 0 but λ_{ℓ+1} > 0. In particular, λ_{ℓ+1} = F_ℓ(γ) + γ F′_ℓ(γ) + F′_d(γ) > 0. Thus,

λ_{ℓ+1} − F′_d(γ) = F_ℓ(γ) + γ F′_ℓ(γ)
                  = (1 − γz_ℓ) F_{ℓ−1}(γ) + (1 − γz_ℓ) γ F′_{ℓ−1}(γ) − γz_ℓ F_{ℓ−1}(γ)
                  = (1 − γz_ℓ) [F_{ℓ−1}(γ) + γ F′_{ℓ−1}(γ)] − γz_ℓ F_{ℓ−1}(γ)
                  > −F′_d(γ) ≥ 0.

Let t := F_{ℓ−1}(γ) + γ F′_{ℓ−1}(γ). Since (1 − γz_ℓ) > 0 and −γz_ℓ F_{ℓ−1}(γ) < 0, it must be the case that t > 0. Since λ_ℓ ≤ 0, t ≤ −F′_d(γ). Thus,

0 < λ_{ℓ+1} − F′_d(γ) = (1 − γz_ℓ) t − γz_ℓ F_{ℓ−1}(γ) < t ≤ −F′_d(γ),

so λ_{ℓ+1} < 0, contradicting λ_{ℓ+1} > 0. This proves the claim.
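As a final numeric sanity check (ours, not part of the proof), one can evaluate the closed form of p(γ) on random data and confirm that it is nonincreasing on [0, γ̄].

```python
import random

# Evaluate p(gamma) from its closed form and check monotonicity on [0, gbar].
def p_of_gamma(c, z, gamma):
    C = sum(ci * zi for ci, zi in zip(c, z))
    total, survive = 0.0, 1.0
    for ci, zi in zip(c, z):
        total += ci * gamma * zi * survive
        survive *= 1.0 - gamma * zi
    return total + C * survive

z = [random.random() for _ in range(6)]
s = sum(z)
z = [x / s for x in z]                           # z_i > 0, sum over i equals 1
c = sorted(random.uniform(0, 10) for _ in range(6))
gbar = min(1.0 / zi for zi in z)
vals = [p_of_gamma(c, z, gbar * k / 50) for k in range(51)]
assert all(a >= b - 1e-9 for a, b in zip(vals, vals[1:]))   # nonincreasing
```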