Facility Location with Service Installation Costs 

(Extended Abstract) 
David B. Shmoys∗ Chaitanya Swamy∗ Retsef Levi† 

Abstract 

We consider a generalization of the uncapacitated facility 
location problem which we call Facility Location with Service 
Installation Costs. We are given a set of facilities, F,a 
set of demands or clients D, and a set of services S.Each 
facility i has a facility opening cost fi, and we have a service 

fl

installation cost of for every facility-service pair (i, l). 
Each client j in Drequests a specific service g(j) ∈Sand 
the cost of assigning a client j to facility i is given by cij . 
We want to open a set of facilities, install services at the 
open facilities, and assign each client j to an open facility at 
which service g(j) is installed, so as to minimize the sum of 
the facility opening costs, the service installation costs and 
the client assignment costs. 

Our main result is a primal-dual 6-approximation algorithm 
under the assumption that there is an ordering on 
the facilities such that if i comes before i in this ordering 
then for every service type l, fil ≤fil . This includes (as 
special cases) the settings where the service installation cost 

i 

fl 

depends only on the service type l, or depends only on 
the location i. With arbitrary service installation costs, the 
problem becomes as hard as the set-cover problem. Our 
algorithm extends the algorithm of Jain & Vazirani [9] in a 
novel way. If the service installation cost depends only on the 
service type and not on the location, we give an LP rounding 
algorithm that attains an improved approximation ratio of 

i 

2.391. The algorithm combines both clustered randomized 
rounding [6] and the filtering based technique of [10, 14]. 
We also consider the k-median version of the problem where 
there is an additional requirement that at most k facilities 
may be opened. We use our primal-dual algorithm to give 
a constant-factor approximation for this problem when the 
service installation cost depends only on the service type. 

1 Introduction 

Facility location problems have been widely studied in 
the Operations Research community (see for e.g. [12]). 
In its simplest version, uncapacitated facility location 
(UFL), we are given a set of facilities, F,and a set 
of demands or clients D. Each facility i has a facility 
opening cost fi and the cost of assigning a client j to 
facility i is given by cij .We want to open some facilities 

∗{shmoys,swamy}@cs.cornell.edu. Dept. of Computer Science, 
Cornell University, Ithaca, NY 14853. Research supported 
partially by NSF grant CCR-9912422. 

†rl227@cornell.edu. School of ORIE, Cornell University, 
Ithaca, NY 14853. Research supported partially by a grant from 
Motorola and NSF grant CCR-9912422. 

from the set F and assign each demand to an open 
facility. The goal is to minimize the sum of the facility 
opening costs and the client assignment costs. This 
problem has a wide range of applications. For example, 
a company might want to open its warehouses at some 
locations so that its total cost of opening warehouses 
and servicing customers is minimized. 

In various applications, the clients are differentiated 
according to the kind of service they require and to 
satisfy the service requirement of a client we have to 
assign it to a facility that can provide the service 
required by the client. For example, in the warehouse 
location problem above, the customers may be retail 
stores that request different kinds of supplies. A 
warehouse may store different kinds of supplies. To 
satisfy a customer we have to assign it to a warehouse 
that holds inventory of the type requested by the 
customer. We model such a setting by saying that 
in addition to facilities and clients, we have a set of 
services S. Each client j in Drequests a specific service 
g(j) ∈S. To satisfy client j we have to assign it to an 
open facility on which service g(j) is installed. Further, 
if we install service l on an open facility i we incur a 
service installation cost of fl.We want to open a set 

i 

of facilities, install services at the open facilities, and 
assign each client j to an open facility i such that service 
g(j) is installed at i. The cost of a solution is the sum of 
the facility opening costs, the service installation costs 
and the client assignment costs, and the goal is to find a 
solution with minimum total cost. We call this problem, 
Facility Location with Service Installation Costs.In 
the warehouse location problem, the service installation 
cost corresponds to the initial cost of setting up the 
warehouse to store the particular kind of inventory. The 
notion of service-dependent fixed costs is also used in 
inventory problems where one incurs a joint setup cost 
to start a new order and an item-dependent fixed cost 
to order a specific item, so one needs to coordinate 
the placement of item orders; see [1] for a survey. We 
assume throughout that the assignment costs cij form 
a metric. 


Applications and Related Work. Facility location 
with service installation costs can be used to model 
a data management/caching problem. Here we are 
given a set of locations in a network at which caches 
may be built and a set of processes located at the 
nodes of the network that request data items. Each 
process requests a specific data item. To satisfy the 
request we must assign the process to a cache that 
stores the requested data item, incurring an access 
cost proportional to the distance between the process 
site and the cache location. Building a cache at a 
location incurs a location dependent cost and storing 
a data item in a cache at a particular location incurs a 
cost that depends on the data item and the location. 
The goal is to build caches, store data items in the 
caches and assign each process to a cache containing the 
data item requested by the process, so as to minimize 
the total cost of building caches, storing data items 
and the access cost of requests. This is exactly the 
facility location problem with service installation costs 
where the caches are facilities, the processes are clients 
and the data items correspond to services. Baev & 
Rajaraman [3] considered a closely related problem 
called the data placement problem and gave a 20.5approximation 
algorithm. Here caches of fixed capacity 
are already built at certain locations and the goal is to 
find a placement of data items to caches respecting the 
cache capacities that minimizes the sum of the access 
costs and the cost of storing data items. The ratio has 
recently been improved by Swamy [16]. 

Facility location with service installation costs is a 
generalization of UFL — if there is just one service 
type then this is simply the uncapacitated facility 
location problem. There is a large body of literature 
that deals with designing approximation algorithms 
for metric UFL and we sample only a few results 
below; see [13] for a survey of this and earlier work. 
Shmoys, Tardos & Aardal [14] gave the first constant-
factor approximation algorithm for this problem using 
the filtering technique of Lin & Vitter [10] to round 
the optimal solution of a linear program. Chudak & 

 


Shmoys [5, 6] gave an LP rounding based 1+ 2 


e 

approximation algorithm. They combined randomized 
rounding and the decomposition results of [14] to get 
a variant that might be called clustered randomized 
rounding. Sviridenko [15] improved the ratio to 1.58. 
Jain & Vazirani [9] gave a combinatorial primal-dual 
3-approximation algorithm where the LP is used only 
in the analysis. The current best ratio for UFL is 

1.52 [11] obtained by building upon a dual-fitting based 
greedy algorithm of Jain, Mahdian, Markakis, Saberi & 
Vazirani [8]. 

Our Results. Our main result is a primal-dual 6approximation 
algorithm for the facility location problem 
with service installation costs under the assumption 
that there is an ordering on the facilities such that if i 
comes before i in this ordering then for every service 
type l, fl ≤ fil . This is reasonable in many settings; 

i 

for example, one expects the inventory setup cost of a 
warehouse in New York city to be less than the inventory 
setup cost in a remote town like Ithaca regardless of 
the kind of inventory. As special cases this includes the 
cases where the service installation cost fl depends only 

i 

on the service type l, or depends only on the location i. 
In the former setting where the service installation cost 
depends only on the service type we give an algorithm 
based on LP rounding that attains a much improved approximation 
ratio of 2.391. We show that with arbitrary 
service installation costs the problem becomes as hard 
as the set-cover problem. Combined with the result of 
Feige [7], this shows that no polynomial-time algorithm 
with a ratio of (1 − )ln |D| exists for this problem in 

O(log log n)]. We 

the general case unless NP ⊆ DTIME[n 
also consider the k-median version of the problem where 
there is an additional requirement that at most k facilities 
may be opened. We use our primal-dual algorithm 
to give a constant-factor approximation for this problem 
when the service installation cost depends only on 
the service type. 

Our Techniques. Facility location with service installation 
costs is a generalization of UFL. It differs however 
from traditional multi-level extensions of facility location 
where we assume that a demand can be assigned to 
any facility in any level. In our problem demand j may 
only be assigned to a level 1 “facility” (i, g(j)) and then 
to facility i in level 2. Moreover, existing techniques 
for UFL and multi-level facility location do not readily 
generalize. If there were no facility opening costs we 
could decouple the problem into several UFL instances, 
one for each service type, and solve each one separately. 
With facility opening costs this approach fares badly 
since we may end up opening a lot of facilities and spend 
too much on the facility opening costs. All known algorithms 
for UFL rely on the fact, either in the design or 
the analysis, that a client j can be moved from a facility 
i to another nearby facility i without increasing its 
assignment cost by much, and leaving the facility opening 
cost unchanged. In our problem, reassigning j to 
i may now require us to install service g(j)on i caus


g(j)

ing us to pay the installation cost f which could be 

i 

large. Technically the hard part is to find a way to reassign 
clients to nearby facilities so that we do not pay 
too much to install services at the new locations. With 
arbitrary service installation costs such a reassignment 


need not be possible since we can encode the constraint 
that a client may only be assigned to a specific set of 
facilities, making the problem set-cover hard. 

We build upon the primal-dual algorithm of [9] in a 
novel way and give a 6-approximation algorithm under 
the assumption that the facilities are ordered so that 
if i comes before i then fl ≤ fl for every service 

ii 

l. At a high level, the idea is to consider an integer 
programming formulation of the problem and the dual 
of its linear programming relaxation, and construct 
simultaneously an integer primal solution and a dual 
solution. Each client j has a dual variable αj which 
can be interpreted intuitively as the payment that j is 
willing to make to get itself assigned to an open facility. 

The Jain-Vazirani (JV) algorithm [9] for UFL works 
in two phases. In phase I we grow each dual variable αj 
uniformly and gradually build a primal feasible solution. 
Once αj becomes equal to cij for some facility i, j starts 
paying toward the facility opening cost of i. When the 
total contribution to i from the various clients equals 
fi, we declare i to be tentatively open and assign all the 
unassigned clients contributing toward i to i.Phase I 
ends when each client is assigned to a tentatively open 
facility. At this point a client could be contributing 
towards multiple tentatively open facilities. We call 
a set of facilities independent if each client contributes 
towards at most one facility in the set. In phase II we 
select a maximal independent subset of tentatively open 
facilities and open these. The analysis shows that if the 
facility i to which a client j was assigned in phase I is 
not opened, then there is a “nearby” open facility i to 
which j can be reassigned. 

Our algorithm also proceeds in phases. During 
phase I we tentatively open some facilities, tentatively 
install service l on some facilities for each service type 
l, and assign each client j to a tentatively open facility 
on which service g(j) is tentatively installed. Phase II 
is more involved. We have to select a set of facilities 
to open and install services in such a way that we can, 

(1) pay for installing the services, and (2) ensure that 
if a client j has to be reassigned there is a nearby 
open facility on which service g(j) is installed. We 
show that we can achieve properties (1) and (2) if we 
look at the facilities in a particular order and pick a 
maximal independent subset greedily. This gives us a 
6-approximation algorithm. 

Our primal-dual algorithm exploits the property 
that in the JV algorithm any maximal independent set 
of tentatively open locations may be picked. Although 
this is a well-known fact, to our knowledge this has 
been used in only a couple of applications previously. 
Bartal, Charikar and Raz [4] consider a clustering 
problem and use the JV algorithm to solve a relaxation 
of the problem by picking an appropriate maximal 
independent set. Archer, Rajagopalan and Shmoys [2] 
pick a maximum independent set and use this to prove 
a bound on the integrality gap of the k-median LP. 

When the service installation cost depends only on 
the service type, we give an LP-rounding algorithm 
that combines clustered randomized rounding [6] and 
the filtering based technique of [10, 14]. A feature 
of the algorithm is that we bound distances cij using 
both the αj bound due to complementary slackness and 
the bound obtained by filtering. This gives a better 
performance guarantee than that obtained by using 
either of the two bounds separately. Sviridenko [15] 
also used the two bounds in conjunction to improve the 

 


approximation ratio for UFL from 1+ 2 to 1.58. 

e 

2 A Linear Program 

We can formulate the problem as an integer program 
and relax the integrality constraints to get a linear 
program. We use i to index the facilities in F, j to 
index the clients in D and l to index the services in S. 

 


l

min fiyi + fil y + cij xij (P) 

i 
i il ji 

 


s.t. xij ≥ 1 ∀j 

ig(j)

xij ≤ y ∀i, j 

i 

xij ≤ yi ∀i, j 

l 

xij ,yi,yi ≥ 0 ∀i, j, l. 

l

Variable yi indicates if facility i is open, y indicates if 

i 
service type l is installed at i,and xij indicates if client j 
is connected to facility i. The first constraint states that 
each client must be assigned to a facility, the second and 
the third constraints say that if client j is assigned to 
facility i, then service g(j)mustbeinstalledon i and i 
must be open. An integral solution corresponds exactly 
to a solution to our problem. Let Gl be the set of clients 

requesting service l. The dual program is, 
max αj (D) 
j 
s.t. αj ≤ cij + βij + θij ∀i, j (2.1) 
θij ≤ fl 
i ∀i, l 
j∈Gl 
βij ≤ fi ∀i (2.2) 
j 
αj ,βij ,θij ≥ 0 ∀i, j. 

Intuitively αj is the budget that j is willing to spend 
to get itself assigned to an open facility. Constraint 
(2.1) says that a part of this goes towards paying for 


the assignment cost cij . The rest gets divided into 
a payment for the service installation cost θij ,and a 
payment for the facility opening cost βij . 

3 A Primal-Dual Algorithm 

We consider instances of the problem where there is an 
ordering on the facilities in F such that if i comes before 
i in this ordering then for every service type l, fl ≤ fl 

ii . 
Equivalently this means that for any two locations i, i , 

 T T

the vectors fl and fl are comparable. 

ii

l=1...|S| l=1...|S| 

Let O denote this total ordering on the facilities. We 
say that i ≤ i if i comes before i in the ordering O. 

The algorithm is strongly motivated by the primal-
dual algorithm of Jain and Vazirani for the traditional 
uncapacitated facility location algorithm. In our algorithm, 
we first construct a feasible dual solution, and 
then use this dual solution to extract a feasible (integer) 
primal solution. As has been the norm, our algorithm 
is a dual ascent algorithm, so all dual variables are only 
increased throughout the execution of the algorithm. 

We next describe the algorithm. There is a notion 
of time around which the algorithm is specified. We 
start at time t = 0, all dual variables are initialized 
to 0, each demand j is said to be unfrozen,and all 
facilities are closed. At first, the variables that are 
increased are the αj s; more precisely, for any unfrozen 
demand j, αj is always equal to the time t.We say 
that demand j is tight with facility i, or has reached i, 
if αj ≥ cij . As time increases, we will freeze demand 
points, tentatively open facilities, and tentatively install 
services at facilities. We increase the αj of each demand 
j until one of the following events happens: 

1. Suppose that demand j becomes tight with facility 

i. If service g(j) is not tentatively installed at i, 
then we start increasing θij at the same rate as αj ; 
that is, if αj = t,then θij = t−cij . If service g(j)is 
tentatively installed, but i is not tentatively open, 
we instead increase βij at the same rate as αj ;that 
is, if αj = t,then θij remains 0, but βij = t − cij . 
Finally, if service g(j) is tentatively installed, and i 
is tentatively open, we freeze demand point j (and 
no longer increase αj ). 

2. Suppose that for a facility i and a service type l,we 

 


get that j∈Gl θij = fil: in this case, we tentatively 
install service l at i.If i is also tentatively open, 
then we freeze each demand j ∈ Gl that is tight 
with i (and no longer increase αj ). If i is not yet 
tentatively open, then for each demand j ∈ Gl that 
is tight with i, we no longer increase θij , but instead 
start increasing βij at the same rate as αj . 

 


3. Suppose that for a facility i, j βij = fi:in this 

case, we tentatively open i. For each demand j,we 
do not increase βij from now on. If demand j is 
tight with i and service g(j) is tentatively installed 
at i, we freeze j (and no longer increase αj ). 

We only raise the αj ,βij ,θij of unfrozen demands. 
Frozen demands do not participate in any events. We 
continue this process until all demands become frozen. 
Let (α, β, θ) denote the final dual solution obtained by 
the above process. Observe that if i is the facility that 
caused j to freeze, then service g(j) must be tentatively 
installed at i,and i must be tentatively open. 

We now specify which facilities to open, how to 
install services on facilities, and how to assign demands 
to facilities. Let F be the set of tentatively open 
facilities, and let Fl ⊆ F be the set of tentatively open 
facilities on which service l is tentatively installed. For 
facility i ∈ F ,let ti be the time at which i became 
tentatively open. If i ∈ Fl,let til be the time at which 
service l was tentatively installed at i. 

Opening facilities. We open a subset of facilities from 
F .We say that i, i ∈ F are dependent if there is a 
demand j such that both βij and βij are positive. We 
consider the facilities in F in the order given by O and 
pick a maximal independent set of facilities, F ⊆ F . 
We open the facilities in F . 

Installing services. Consider service type l and the 
set of facilities Fl. We say that facilities i, i ∈ Fl are 
service-l-dependent if there exists some demand j in Gl 
such that both θij and θij are positive. We pick a 
maximal independent subset Fl by looking at facilities 
in Fl in a particular order: first we consider facilities in 
Fl F in increasing order of til, and then facilities in 
Fl − F in increasing order of ti. 

We first install service l on all facilities in Fl F . 
Furthermore, for each i ∈ Fl − F , we pick a facility 
i ∈ F such that i and i are dependent and i ≤ i (in 
the ordering given by O), and install service l on facility 
i .We say that i is the neighbor of i and denote it by 
nbr(i). Note that nbr(.) depends only on i and not on 
the service type l:if i/∈ F , then we can choose a single 
facility i ∈ F such that i ≤ i regardless of the service 
type l. 

Assigning demands. We assign each client j to the 
nearest open facility at which service g(j) is installed. 

3.1 Analysis. We now bound the performance of our 
algorithm. The following lemma just says what it means 
for a demand j to get frozen. 

Lemma 3.1. Let i be the facility that causes a demand 


j to freeze. Then, i is tentatively open, service g(j) is 
tentatively installed at i,and αj =max(cij ,ti,tig(j)). 

We start by bounding the cost incurred in opening 
facilities, and installing services. Let D be the subset 
of demands {j : ∃i∈ F s.t. βij >0}. 

Lemma 3.2. The cost of opening facilities is at most 

 


αj . Furthermore, the cost of installing services 

j∈D 
is at most j αj . 

Proof. For each facility i that is tentatively opened, we 

 


have that fi = By the construction of F ,

j∈D βij . 
we know that for each demand j, there is at most one 
i ∈ F such that βij > 0. Summing over all facilities, 
and using this fact, we see that 

 


fi = βij = βij ≤ αj . 

i∈Fi∈Fj∈D j∈Di∈Fj∈D 

By the definition of nbr(i), we know that for any service 
type l, fl ≤ fl.For each i ∈ Fl , we install service 

nbr(i) i 

l either at i ∈ Fl or at nbr(i), and so we can upper 
bound the total cost of installing services of type l by 

 


fil . Since each service l is tentatively installed 

i∈F

l 

 


only when fl = j∈Gl θij , wehavethat fl = 

ii∈Fi

l 

 


θij . The notion of service-l-dependence 

j∈Gl i∈F 

insures that 
l for each demand j ∈ Gl,there is at most 
one facility i ∈ Fl for which θij is positive. We obtain 
that the total cost of installing service l is at most 

 


j∈Gl αj , which immediately implies the lemma. 

We next bound the assignment cost incurred by the 
solution computed. The following facts, which follow 
directly from the construction of the algorithm, will be 
useful in this analysis. 

Fact 3.1. Suppose that βik is positive. Then it follows 
that cik ≤ αk − βik and αk ≤ ti. 

Fact 3.2. Suppose that θik is positive. Then cik ≤ 
αk − θik and cik <tig(k).If βik =0 then αk ≤ tig(k). 

For example, we use these in deriving the following 
bounds. 

Claim 3.1. If i and i are dependent facilities in F, 
then cii <2min(ti,ti ). 

Proof. Let k be a client such that βik and βik are 
positive. Applying Fact 3.1 for both of these, and 
applying the triangle inequality, we get that cii < 
2αk ≤ 2min(ti,ti ). 

Claim 3.2. Let i,i ∈ Fl be service-l-dependent due to 
demand k ∈ Gl.Then cii <2max(til,til) and both cik 
and cik are less than αk. 

Proof. From the dependence of i and i, it follows that 
θik and θik are positive. Applying Fact 3.2 for both 
of these, and using the triangle inequality, we get that 
cii <2max(til,til), cik <αk and cik <αk. 

Lemma 3.3. If j ∈ D , then the assignment cost incurred 
for j is at most 3αj ;if j ∈ D , then the assignment 
cost incurred for j is at most 5αj . 

Proof. We will show that there always exists some open 
facility with service g(j) installed that is no further from 
j than the claimed bound (and hence the closest one, 
to which j is assigned, is no further away). 

Consider j ∈ D with g(j)= l.Let i be the unique 
facility in F for which βij is positive. If i ∈ Fl ,then 
we have installed service l at i,and cij ≤ αj − βij . 
Otherwise, i and i are service-l-dependent for some i 
in F Fl with til ≤ til. So, by Claim 3.2, cii < 
2max(til,til)=2til.Since βij > 0, it follows that 
til ≤ αj − βij . So by the triangle inequality, cij ≤ 3αj . 

Now consider a demand j/∈ D , and again let 
g(j)= l.Let i be the facility that caused j to freeze, 
and so αj =max(cij ,ti,til). If i ∈ F Fl ,then 
service l is installed at i,and cij ≤ αj . Suppose that 
i ∈ Fl − F ,and let i = nbr(i) (an open facility at 
which service l is installed, see Fig. 1a). By Claim 3.1, 
cii ≤ 2ti =⇒ cij ≤ 3αj . 

Next suppose that i ∈ Fl .Since i ∈ Fl,there must 
exist i ∈ Fl such that i was not picked in Fl because 
i and i are service-l-dependent due to a client k.If i 
is also in F (Fig. 1b), then service l is installed there. 
Applying Claim 3.2, we obtain that both cik and cik are 
less than αk ≤ max(ti,til) ≤ αj , and hence cij ≤ 3αj . 
However, if i ∈ F (so i/∈ F ), then let i = nbr(i); 
service lis installed at i (Fig. 1c). Since i and iservicel-
dependent, we have that ti ≤ ti, cii ≤ 2min(ti ,ti ) 
(Claim 3.1), and cij ≤ 3αj as above, which implies that 
cij ≤ 5αj . This completes the proof. 

Theorem 3.1. The above algorithm returns a solution 

 


of cost at most 6 j αj ≤ 6 · OPT . 

Proof. Follows from Lemma 3.2 and Lemma 3.3. 

4 An LP-Rounding Algorithm 

We now give an algorithm for the special case where 
fl = fl , i.e., the installation cost depends only on 

i 

the service l and not on the location at which it is 
installed. Note that this case is handled by the primal-
dual algorithm above. Here we adapt the rounding 
procedure of [6] to give deterministic and randomized 
approximation algorithms achieving ratios of 6 and 

2.391 respectively. 

Let (x,y)and (α,β,θ) be the optimal solutions 
to (P) and (D) respectively. We can ensure that for 


(a) (b) 

j 
k 
i 
i 
Facility icaused jto freeze, 
iand i are service-l-dependent, 

j 
k 
i 
i 
(c) 
j 
k 
k 
i 
i ≤ αj 
≤ ti 
≤ αk ≤ αk 
i ≤ ti 
i∈ F − F,i = nbr(i) i/∈ F

ll , 

i = nbr(i)i,iare service-l-dependent 

Figure 1: Bounding the assignment cost of j. (a), (b) Different 3-hop cases, and (c) the 5-hop case. 

g(j) l

every i, j and l, xij =0 or xij = y and y =0or 

ii 

l

y = yi. We will round the fractional solution (x, y)

i 

to an integer solution losing a factor of at most 6. Let 
Fj = {i : xij > 0}. We describe the algorithm briefly. 

A1. First, for every service type l, we consider the 
clients in Gl and cluster the facilities on which 
service l is installed around some cluster centers: 
pick j ∈ Gl with smallest αj value and form a 
cluster around j consisting of the facilities in Fj . 
We remove every client k ∈ Gl (including j)that is 
assigned (fractionally) to some facility in the cluster 
created, and recurse on the remaining set of clients 
until no client in Gl is left. Let Dl be the set of 
cluster centers. 


A2. Let D = Dl. We cannot open a facility in every 

l 

cluster since different clusters could share the same 
fractional facility weight (yi) if the cluster centers 
request different services. Say that j, k ∈ D are 
dependent if Fj Fk = φ. Note that this can 
only happen if j and k request different services. 
We consider clients in D in order of increasing αj 
and pick a maximal independent subset D .For 
each client j ∈ D , we open the facility in Fj with 
smallest fi and install service g(j)on it. Further 
for every k ∈ D − D ,there is some j ∈ D with 
αj ≤ αk such that j and k are dependent. We 
pick some such j and install service g(k)onthe 
facility opened from Fj .Call j the neighbor of k 
and denote it nbr(k). 

A3. We assign demand j to a facility as follows: (i) if 
j ∈ D it is assigned to the facility opened from Fj , 

(ii) if j ∈ D − D ,then nbr(j)= k ∈ D and j is 
assigned to the facility opened from Fk, and (iii) 
if j/∈ D,there is some k ∈ Dg(j) ⊆ D such that 
αk ≤ αj and j was removed from Gl in step A1 
because a cluster was created around k;we assign 
j to the same facility as k. Note that this is a 
feasible assignment of demands to facilities. 

Lemma 4.1. The facility opening cost is at most 

 


fiyi.

i 

Proof. The clusters corresponding to clients in D are 
disjoint and we open the cheapest facility in each cluster 

— the lemma follows. 

Lemma 4.2. Thecost of installingservices is atmost 

 


fll 

i yi.

i,l 

Proof. The cost of installing a service is independent 
of the location at which it is installed, and for 
any service type l, we install service on at most one 
(new) location per cluster center in Dl. So the total 

 


cost of installing services is at most fl = 

lj∈Dl 

 


fl fll

yi ≤ i yi.

lj∈Dl i∈Fj i,l 

Lemma 4.3. The assignment cost of client j is at most 
5αj . 

Proof. If j ∈ D , we assign j to a facility i ∈ Fj ,and 
cij ≤ αj by complementary slackness. If j ∈ D − D 
and nbr(j)= k ∈ D , we assign j to the facility i opened 
from Fk.Since αk ≤ αj and cjk ≤ 2αj , cij ≤ 3αj .If 
j/∈ D, j is assigned to the same facility as k where 
k ∈ Dg(j) ⊆ D and j was removed from Gg(j) in step 
A1 because a cluster was created around k .So αk ≤ αj 
and cjk ≤ 2αj .From above, k is assigned to a facility 
i with cik ≤ 3αk ,so cij ≤ 5αj . 

Thus we have proved the following theorem. 

Theorem 4.1. The cost of the solution returned is at 
most 6 · OPT. 


4.1 Improvement using randomization. We give 
a rounding procedure that combines clustered randomized 
rounding [6] and the filtering based technique 
of [10, 14]. We define some notation first. Let 0 < <1 

1

be a parameter that we will set later and r = .Sort 

γ 
the facilities in Fj by increasing cij .Let i be the first 

 


facility in this ordering such that xij ≥ .

i∈Fj :cij ≤cij 
Let Nj be the subset of Fj consisting of all facilities (including 
i) that come before i in this ordering. Define 

 

¯ 

Cj ( )= cij and Cj = i cij xij . To simplify things 

 


we assume that each yi ≤ and for any j, i∈Nj yi 
is exactly .If some yi > , then we can create at 
most 1/ copies of i and set yi ≤ for each of the 

 


copies so that yi = yi (setting the variables 

copies i 

 

l

xij ,yi accordingly). Similarly if i∈Nj yi > ,we can 
take the facility i in Nj that comes last in the ordering 
in Fj and split it into two copies i1 and i2 setting 

 


yi2 = i∈Nj yi − , yi1 = yi −yi2 (and the other variables 
accordingly). We include only i1 in Nj thus en

 


suring that i∈Nj yi = . The cost of the fractional solution 
remains unchanged by these transformations and 
any solution to the modified instance gives a solution to 
the original instance of no greater cost. 

R1. This is the same as step A1 except that we choose 
j ∈Gl with smallest 2αj +Cj ( )+C ¯ 
j as the cluster 
center. This gives us a set of cluster centers Dl for 
each service type l. 

 


R2. Weprune theset D = as in step A2 but

l Dl 

modify the notion of dependency to say that j,k ∈ 
D are dependent if Nj Nk = φ, and consider the 
clients in D in increasing order of Cj ( )+ C ¯ 
j .For 
k ∈D−D we define nbr(k) as before. We call the 
facilities in Nj for clients j ∈ D central facilities, 
and the rest as non-central facilities. 

R3. For every client j ∈ D we randomly open exactly 
one facility in Nj by choosing facility i with proba

 


bility yi/ i∈Nj yi = r·yi. This facility now serves 
as a backup facility for all the clients that would get 
assigned to this facility in step A3 of the deterministic 
algorithm. 

R4. Independent of step R3, each non-central facility i 
is opened independently with probability r·yi. 

R5. For any facility i, be it a central or a non-central 
facility, if iis opened (in R3 or R4), we install on it 
all services that are installed on it in the fractional 

l

solution, i.e., all l such that yi >0. 

R6. For every client j ∈ D−D , if no facility from Fj is 
open, we install service g(j) on the facility opened 
in R3 from Nnbr(j). 

R7. We assign demand j to the nearest open facility at 
which service g(j) is installed. 

Lemmas 4.1 and 4.2 are modified to the following. 

Lemma 4.4. The expected cost of opening facilities is 

 


r· fiyi. The expected cost of installing services is at

i 


1 fll

most r+ i yi. 

er i,l 

Proof. Each facility i is opened with probability r ·yi. 
The cost of installing services in step R5 is bounded 

 


by Pr[i is opened (in R3 or R4)] l fl = · 

il:y>0 ii r 

i 

 


fl flll l

yi l = r· i y since yi >0=⇒ y = yi.

l:y>0 i i,l i i

i 

Consider client j ∈D−D with g(j)= l. For every 
non-central facility i ∈ Fj ,let Ei be the event that i is 
opened in step R4 and pi =Pr[Ei]= r ·yi. For every 
cluster center k ∈ D such that Sk = Fj Nk = φ,let 
Ek be the event that a facility from Sk is open after 

􀀀 

yi 
step R3. Let pk =Pr[Ek ]= 􀀀 
i∈Sk = r · yi. 

yi i∈Sk

i∈Nk 

Let m be the total number of events. All the events 
Ei are independent. The probability that service l is 
installed in step R6 due to client j, is the probability 
that no facility from 
Fj is open after steps 
􀀀 
R3 and R4, 

m 

− pn −r

which is at most (1 − pn) ≤ e n = e . 

n=1 

So the cost of installing services in step R6 is at most 

 


1 fg(j) ≤ 1 fll l 

i y since y =1 and 

er j∈D−Der i,l i i∈Fj i 
any two clients in Dl have disjoint Fj . 

To bound the assignment cost, we bound the assignment 
cost incurred under a provably worse way of 
assigning demands to facilities. Demand j is assigned to 
a facility as follows. If some facility i ∈ Fj is open, we 
assign demand j to the nearest such facility. Otherwise 
if j ∈ D − D , j is assigned to its backup facility. If 
j/∈ D, there is some client k ∈ Dg(j) ⊆ D such that j 
was removed from Gg(j) because a cluster was formed 
around k in step R1. We assign j to the same facility as 
k;so j may be assigned either to a facility in Fk or to its 
backup facility in Nnbr(k),if k/∈D and no facility from 
Fk is open. Note that service g(j)isinstalledonthe 
facility to which j is assigned. We need the following 
lemma from [6] (see also [15]). 

Lemma 4.5. Let d1 ≤ d2 ≤ ... ≤ dm and 0 ≤ pn ≤ 1 
for n=1,... ,m.Then, 

p1d1 +(1−p1)p2d2 +···+(1−p1) ···(1 −pm−1)pmdm 

 


n≤m pndn 

≤ 
1 − (1 −pn) . 
n≤m pn 

n≤m 

Lemma 4.6. Let j be any demand. Let X be the 
distance between j and the facility assigned to it and 
Z be the event that no facility i ∈ Fj is open. Then, 


¯

(i) If j/∈ D, E X|Z ≤ 3αj + Cj ( )+ Cj , (ii) 

1

E X ≤C ¯ 
j + (3αj + Cj ( )). 

er 

 


Proof. If j ∈ D ,E X = i∈Nj cij xij / i∈Nj xij ≤ 

¯ 

Cj since every facility in Fj −Nj is farther from j than 
every facility in Nj .For j/∈ D , we show (i) and use it 
to prove (ii). 

Suppose j ∈ D−D , k = nbr(j)and A= Nj Nk = 
φ. For any facility i ∈ A we have cij ≤ αj and 
cik ≤ Ck( ). Let B be the distance between j and 
its backup facility in Nk.Event Z implies that j is 
assigned to the backup facility in Nk so conditioned on 
Z, X = B.If there is some i∈ A such that cik ≤C ¯ k we 

¯

have a deterministic bound of B ≤ αj + Ck + Ck( ). 
If there is no such i in A, since the unconditional 
distance between k and the backup facility in Nk is at 

¯

most Ck , by conditioning on Z we are only removing 
weight from facilities that are farther than the average 
distance. So the conditional expected distance between

¯ 

k and the backup facility is at most Ck implying that 
¯

E B|Z =E X|Z ≤ αj + Ck ( )+ Ck.In either case 
¯

E X|Z ≤ αj + Cj ( )+ Cj , where the last inequality 
follows since we look at clients in D in increasing order 
of Cj ( )+ C ¯ 
j and k was picked before j. 

If j/∈ D, there must be a client k ∈ Dg(j) such that 
j was removed from Gg(j) because a cluster was formed 
around k in step R1. So j,k are assigned to the same 

¯¯

facility, 2αk + Ck ( )+ Ck ≤ 2αj + Cj ( )+ Cj and 
cjk ≤αj + αk . If a facility in Fk is open then we have 
a deterministic bound of X ≤ αj +2αk .Otherwise 
j,k are assigned to the backup facility for k and by 
the above bound on E X|Z for k , the conditional 
expected distance from j to the backup facility is at 
most αj +2αk + Ck ( )+ C ¯ 
k ≤3αj + Cj ( )+ C ¯ 
j . 

We now prove part (ii). For a non-central facility 
i∈ Fj ,let pi and Ei be as defined in Lemma 4.4 and let 
di = cij . For every k ∈D such that Sk = Fj Nk = φ, 
let pk,Ek be as defined in Lemma 4.4 and define dk = 

 


E distance from j to Sk|Ek = cij yi/yi.

i∈Sk i∈Sk 

Let the distances be ordered so that d1 ≤d2 ≤...≤dm 
where there are m events in all. Since the events Ei are 

 

m

independent and yi = xij , p=Pr[Z]= (1 −pn) ≤ 

􀀀 n=1 

− pn

n

e = e−r.So, 

E X ≤ p1d1 +(1 −p1)p2d2 + ··· 

+(1 −p1) ···(1 −pm−1)pmdm + p·E X|Z 

 


n≤m nn 

≤ 
pd 
(1 −p)+ p(3αj + Cj ( )+ C ¯ 
j ) 

p

n≤mn 

1¯

≤ Cj + (3αj + Cj ( )). 

r

e 

Theorem 4.2. The randomized algorithm produces a 
where r =1/ .Taking =0.67674 we get a solution of 
cost at most 2.391 ·OPT. 

solution of expected cost at most, 
4 1 
max r+ ,1+ 
r re(1 − )e3 
+ 
re·OPT 

Proof. From the definition of Cj ( )and the Markov 

¯ 

property we have, Cj ( ) ≤ Cj . The proof now follows 

1−γ 

from Lemmas 4.4 and 4.6. 

5The k-Median Variant 

We consider a variant of this problem where we have 
the additional constraint that at most k facilities may 

 


be opened. This adds the constraint i yi ≤ k to the 
linear program (P). The objective function of the dual 

 


(D) gets modified to max j αj −kz and constraint 

 


(2.2) changes to j βij ≤ fi + z.Let (KP) and (KD) 
be the modified primal and dual programs and OPTK 
be the value of an optimal k-median solution. 

5.1 A modified primal-dual algorithm. We first 
modify the primal-dual algorithm of Section 3 to obtain 
the stronger guarantee that we return a solution to (P) 
of cost (O,I,C), and a solution (α,β,θ)to (D) such 

 


that 6O + I + C ≤ 6 j αj ≤ 6 ·OPT ,where O,I,C 
denote respectively the facility opening cost, the service 
installation cost and the client assignment cost. We 
describe the algorithm briefly and sketch the analysis. 
The dual ascent process is the same as in Section 3 but 
we modify the way in which we open facilities and install 
services to ensure that a demand j does not pay for both 
opening a facility and for installing a service at some 
other facility. To do this we consider a more detailed 
notion of dependence between facilities. We classify 4 
types of dependence between facilities. Say that the 
ordered pair (i,i)is, 

(1) ff-dependent (f for facility) if there is a demand j 
such that βij ,βij >0. 
(2) sf-l dependent (s for service) if there exists j ∈ Gl 
such that θij ,βij >0. 
(3) ss-ldependent if for some j ∈Gl,both θij ,θij >0. 
(4) fs-l dependent if for some j ∈Gl,both βij ,θij >0. 


Recall that F is the set of tentatively open facilities, 
Fl ⊆ F the set of tentatively open facilities on which 
service l is tentatively installed, ti is the time at which 
facility i ∈ F became tentatively open, and for i ∈ Fl, 
til is the time at which service lwas tentatively installed 
at i. Initially for each facility i∈F,let Si be the set of 
services that are tentatively installed at i. 


M1. We first pick a set F ⊆ F of facilities to open, 
and for each i ∈ F aset Ti ⊆ Si of services to 
install at facility i. Initially F = φ and Ti = φ for 
all i. We consider facilities in F in the order given 
by O. While F = φ, 

1. Let i ∈F be the currently considered facility. 
Remove i from F ,set F ←F ∪{i},Ti = Si. 
2. For each i ∈ F we do the following. 
a) If (i, i ) is ff-dependent or ∃l ∈ Ti s.t. 
(i, i )is sf-l dependent or ∃l ∈ Si s.t. 
(i, i )is fs-l dependent and til <ti ,set 
F ← F −{i }.Call i the neighbor of i 
and denote it by nbr(i ). Otherwise, 
b) For every l ∈Si ,if (i, i )is fs-l dependent 
(so til ≥ ti ) or l ∈ Ti and (i, i ) is ss-l 
dependent, set Si ←Si −{l}. 






We open the facilities in F and for each i ∈ F 

install all of the services in Ti at i. 

M2. We now install services at some more facilities. 
Consider service type l.Let Al be the facilities in 
F at which service l is installed (i.e., l ∈Ti). Note 
that Al ⊆ F Fl.Let Bl = Fl −F .We remove 
from Bl every facility i for which there is some 
facility i ∈F such that (1) (i, i )is fs-l dependent, 
or (2) i ∈ Al and (i, i ) is ss-l dependent. We say 
that a set of facilities is ss-l independent if no pair 
(i, i ) of facilities from the set is ss-l dependent. We 
pick a maximal ss-l independent subset Fl ⊆ Bl. 
Initially Fl = φ. We consider facilities in Bl in 
increasing order of ti and add facility i to Fl if 
Fl ∪{i}remains ss-l independent. For every i ∈Fl 
we install service l on facility nbr(i) ∈ F . 

M3. Each client j is assigned to the nearest open facility 
at which service g(j) is installed. 

Analysis Sketch. Consider the set of demands D = 

{j : ∃i ∈ F s.t. βij > 0}. By the construction of F ,we 
know that for each demand j thereis atmostone i ∈F 
such that βij > 0; for j ∈D let o(j) denote this unique 
facility in F . By design we ensure that any demand 
j ∈ Gl contributes θij > 0 for at most one facility 
i ∈ Al ∪ Fl and further that if j ∈ D then i = o(j) 
is the only such facility. This gives the following. 

Lemma 5.1. The cost of opening facilities is 

 


βo(j)j . The cost of installing services is at

j∈D 
most θo(j)j + αj .

j∈D j/∈D 

The bound on the assignment cost incurred is 

similar to the bound in Lemma 4.3 and is proved 

similarly. 

Lemma 5.2. If j ∈ D , the assignment cost of j is at 
most 3(αj − βo(j)j ).If j/∈ D the assignment cost 
incurred for j is at most 5αj . 

Combining the above two lemmas and the fact that 
for j ∈ D , αj = co(j)j + βo(j)j + θo(j)j ,we get the 
following theorem. 

Theorem 5.1. The solution returned has cost (O, I, C) 

 


such that 6O + I + C ≤6 j αj ≤6 ·OPT . 

5.2 An 18-approximation algorithm. Suppose we 
fix z and run the above primal-dual algorithm with the 
facility costs modified to fi + z. Suppose the algorithm 
returns a primal solution of cost (O, I, C)thatopens 
k facilities and a dual solution (α, β, θ). Here O is the 
facility opening cost with the original costs fi. Then, 
we can show that we have a solution of cost at most 
6·OPTK .since (α, β, θ, z) is a feasible solution to (KD) 

 


and by Theorem 5.1, 6(O+kz)+I +C+ ≤6 αj =⇒ 

 
j

6O+I+C ≤6( j αj −kz) ≤6·OPTK . So our goal is to 
find such a z in polynomial time. We do not quite know 
how to do this, instead we find two values z1 >z2,close 
together such that the algorithm opens k1 <k facilities 
for z1 and k2 >k facilities for z2.When z is very 
large, e.g., |D|(maxij cij +maxil fl), the algorithm will 

i 

open just one facility and at z = 0 the algorithm opens 
>k (we assume this — otherwise the solution at z =0 
costs at most 6·OPTK since (α, β, θ, 0) is a feasible dual 
solution). We perform a bisection search in this range to 
find z1 and z2. We combine these two solutions to first 
get a fractional solution of cost 6(1 − )−1 ·OPTK ,for 
some small , that (fractionally) opens k facilities and in 
which each demand is assigned to at most two facilities. 
If the service cost depends only on the service type, we 
can round this solution losing a factor of 3(1 − )to 
get an integer solution that opens exactly k facilities, 
thus getting an overall approximation ratio of 18. This 
idea was usedin[9] forthe k-median variant of UFL. 
However our final rounding procedure differs from the 
one in [9] since we also have to pay for installing services 
and need to ensure that demand j is connected to an 
open facility at which service g(j) is installed. 

Theorem 5.2. If the service installation cost depends 
only on the service and not on the location, we can get 
an 18-approximation algorithm for the k-median variant 
of facility location with service installation costs. 

6Extensions 
Arbitrary Demands. Our results carry over to the 
case where instead of unit demands, client j may have a 
demand dj ≥0. We can reduce this to the unit demand 
case by making dj copies of client j, but this makes 


the algorithm run in pseudo-polynomial time. We can 
simulate this reduction however. In the primal-dual 
algorithm we raise αj at rate dj and say that j has 
reached i if αj ≥ dj cij . In the LP rounding algorithms 
of Section 4 the only change is that αj gets replaced 
with αj /dj in steps A1, A2 and R1. The analysis in 
Sections 3, 4 and 5 extends in a straightforward way 
and we get the same approximation guarantees. 

Acknowledgments. We thank Robin Roundy for 
stimulating discussions and helpful suggestions. 

References 

[1] N. Aksoy and S. Erenguc. Multi-item inventory models 
with coordinated replenishments: a survey. International 
Journal of Operations & Production Management, 
8:63–73, 1988. 

[2] A. Archer, R. Rajagopalan and D. Shmoys. Lagrangian 
relaxation for the k-median problem: new insights and 
continuity properties. In Proceedings of 11th ESA, 
pages 31–42, 2003 

[3] I. Baev and R. Rajaraman. Approximation algorithms 
for data placement in arbitrary networks. In Proceedings 
of the 12th Annual ACM-SIAM Symposium on 
Discrete Algorithms, pages 661–670, 2001. 

[4] Y. Bartal, M. Charikar and D. Raz. Approximating 
min-sum k-clustering in metric spaces. In Proceedings 
of the 33rd Annual ACM Symposium on Theory of 
Computing, pages 11–20, 2001. 

[5] F. A. Chudak. Improved approximation algorithms for 
uncapacitated facility location. In Proceedings of 5th 
IPCO, pages 180–194, 1998. 

[6] F. Chudak and D. Shmoys. Improved approximation 
algorithms for the uncapacitated facility location problem. 
SIAM Journal on Computing.To appear. 

[7] U. Feige. A threshold of ln n for approximating set 
cover. Journal of the ACM, 45:634–652, 1998. 

[8] K. Jain, M. Mahdian, E. Markakis, A. Saberi, and 

V. Vazirani. Greedy facility location algorithms analyzed 
using dual-fitting with factor-revealing LP. Journal 
of the ACM.To appear. 

[9] K. Jain and V.V. Vazirani. Approximation algorithms 
for metric facility location and k-median problems using 
the primal-dual schema and Lagrangian relaxation. 
Journal of the ACM, 48:274–296, 2001. 

[10] J. H. Lin and J. S. Vitter. -approximations with 
minimum packing constraint violation. In Proceedings 
of the 24th Annual ACM Symposium on Theory of 
Computing, pages 771–782, 1992. 

[11] M. Mahdian, Y. Ye, and J. Zhang. Improved approximation 
algorithms for metric facility location. In Proceedings 
of 5th APPROX, pages 229–242, 2002. 

[12] P. Mirchandani and R. Francis, eds. Discrete Location 
Theory. John Wiley and Sons, Inc., New York, 1990. 

[13] D. B. Shmoys. Approximation algorithms for facility 
location problems. In Proceedings of 3rd APPROX, 
pages 27–33, 2000. 

´ 

[14] D. B. Shmoys, E. Tardos, and K. I. Aardal. Approximation 
algorithms for facility location problems. In 
Proceedings of the 29th Annual ACM Symposium on 
Theory of Computing, pages 265–274, 1997. 

[15] M. Sviridenko. An improved approximation algorithm 
for the metric uncapacitated facility location problem. 
In Proceedings of 9th IPCO, pages 240–257, 2002. 

[16] C. Swamy. A note on the data placement problem. 
Unpublished manuscript, 2003.