A constant approximation algorithm for the a priori traveling salesman problem

David Shmoys¹* and Kunal Talwar²

¹ Cornell University, Ithaca NY 14853. shmoys@cs.cornell.edu
² Microsoft Research, Silicon Valley Campus, 1065 L'Avenida, Mountain View, CA 94043. kunal@microsoft.com

Abstract. One of the interesting recent developments in the design and analysis of approximation algorithms has been in the area of algorithms for discrete stochastic optimization problems. In this domain, one is given not just one input, but rather a probability distribution over inputs, and yet the aim is to design an algorithm that has provably good worst-case performance; that is, for any probability distribution over inputs, the objective function value of the solution found by the algorithm must be within a specified factor of the optimal value. The a priori traveling salesman problem is a prime example of such a stochastic optimization problem. One starts with the standard traveling salesman problem (in which one wishes to find the shortest tour through a given set of points $N$), and then considers the possibility that only a subset $A$ of the entire set of points is active. The active set is given probabilistically; that is, there is a probability distribution over the subsets of $N$, which is given as part of the input. The aim is still to compute a tour through all points in $N$, but in order to evaluate its cost, we instead compute the expectation of the length of this tour after shortcutting it to include only those points in the active set $A$ (where the expectation is computed with respect to the given probability distribution). The goal is to compute a "master tour" for which this expectation is minimized. This problem was introduced in the doctoral theses of Jaillet and Bertsimas, who gave asymptotic analyses when the distances between points in the input set are also given probabilistically. In this paper, we restrict attention to the so-called "independent activation" model, in which we assume that each point $j$ is active with a given probability $p_j$, and that these activation events are independent. For this setting, we give an 8-approximation algorithm, a polynomial-time algorithm that computes a tour whose a priori TSP objective function value is guaranteed to be within a factor of 8 of optimal (and a randomized 4-approximation algorithm, which produces a tour of expected cost within a factor of 4 of optimal). This is the first constant approximation algorithm for this model.

* Research supported partially by NSF grants CCR-0635121 & DMI-0500263.

1 Introduction

One of the interesting recent developments in the design and analysis of approximation algorithms has been in the area of algorithms for discrete stochastic optimization problems. In this domain, one is not given just one input, but rather a probability distribution over inputs, and yet the aim is to design an algorithm that has provably good worst-case performance; that is, for any probability distribution over inputs, the objective function value of the solution found by the algorithm is within a specified factor of the optimal value. The a priori traveling salesman problem is a prime example of such a stochastic optimization problem. It is a stochastic generalization of the (standard) traveling salesman problem (TSP).
In the traditional deterministic problem, one is given a set $N$ of $n$ points in a metric space; that is, for each pair of points one is given the distance between them, with the assumption that these distances satisfy the triangle inequality (i.e., for each triple of points $i$, $j$, and $k$, the distance between $i$ and $k$ is at most the sum of the distances between $i$ and $j$ and between $j$ and $k$). The aim is to find the tour (or cyclic permutation) that contains each of these points exactly once, such that the total distance traversed is minimized. In the a priori TSP, one is also given a probability distribution $\pi$ over subsets $A \subseteq N$; this models the fact that, in advance, one is not certain of the "currently active" points $A$ that need to be included in the tour, and has only probabilistic information about this. Given only this probabilistic information, one computes a "master tour" $\tau$, whose objective function value is the expected value (with respect to $\pi$) of the length of $\tau$ after shortcutting it to include only those active points in $A$. If the distribution makes the full set $N$ active with probability 1, then this is just the usual (deterministic) TSP (and hence we expect results no stronger than are known for that problem).

The a priori TSP was first studied in the doctoral theses of Jaillet [1, 2] and Bertsimas [3], and their early work, which mostly focuses on asymptotic analysis when the points themselves are randomly selected (e.g., uniformly in the unit square), is nicely surveyed by Bertsimas, Jaillet, and Odoni [4].

If one is interested in polynomial-time algorithms, then one must be careful about the way in which the distribution is given as part of the input. It would take an exponential amount of information to list the probability that each set $A$ is active; one simple workaround is to insist that only a polynomial (in $n$) number of sets can be active with positive probability, and this is called the polynomial scenario model. Alternatively, we can present the probability distribution by means of an oracle, a "black box" from which we can draw independent samples according to the distribution $\pi$; the restriction that we are interested in polynomial-time algorithms implies that we are limited to a polynomial number of samples in this black box model. In this paper, we restrict attention to the so-called independent activation model for $\pi$: for each $j \in N$, consider the random indicator variable $\mathbb{1}(j \in A)$, which is 1 exactly when $j \in A$; we shall assume that these are independent random variables, and that we are given, as part of the input, the probability $p_j = \Pr[\mathbb{1}(j \in A) = 1]$ for each $j \in N$ (a sampling sketch is given at the end of this section).

For this setting, we first give a randomized polynomial-time algorithm to compute a tour for the a priori TSP whose expected objective function value (with respect to the random choices of the algorithm) is guaranteed to be within a factor of 4 of optimal. We then derandomize the procedure to give a deterministic algorithm that is guaranteed to find a tour of cost within a factor of 8 of optimal. This is the first constant approximation algorithm for this model. The strongest previous result was a randomized $O(\log n)$-approximation algorithm due to Schalekamp and Shmoys [5]. That result is significantly more general, since not only does it hold in the black box model, but in fact it does not require any knowledge of the distribution $\pi$. This stochastic notion of the TSP is closely related to the notion of the universal TSP.
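To make the independent activation model concrete, the following minimal Python sketch draws one active set; the dictionary `p` mapping points to activation probabilities is our own illustrative convention, not notation from the paper.

```python
import random

def sample_active_set(p):
    """Draw one active set A in the independent activation model:
    each point j is included independently with probability p[j]."""
    return {j for j, pj in p.items() if random.random() < pj}

# Example (hypothetical data): point 1 is always active;
# points 2 and 3 are each active with probability 1/2.
p = {1: 1.0, 2: 0.5, 3: 0.5}
A = sample_active_set(p)
```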
In the universal TSP, there is also a notion of a "master tour" which is shortcut according to the current active set $A$. However, for this problem, we say that a tour $\tau$ for $N$ has a performance guarantee of $\rho$ if, for every subset $A \subseteq N$, the tour $\tau$ shortcut to $A$ has length at most $\rho$ times the optimal cost of a tour for the subset $A$. Clearly, a $\rho$-approximation algorithm for the universal TSP is a $\rho$-approximation algorithm for the a priori TSP with respect to any distribution. The universal TSP was introduced by Bartholdi and Platzman [6] in the context of an application for "meals on wheels", and they gave an $O(\log n)$ performance guarantee for points in the Euclidean plane. In contrast, Hajiaghayi, Kleinberg, and Leighton [7] proved that no tour can achieve a bound better than $c(\log n/\log\log n)^{1/6}$ for some constant $c > 0$, even for points in the Euclidean plane. Thus, it is particularly natural to consider weaker models, such as the a priori TSP.

Our algorithm is extremely simple: in essence, we take one sample $S$ from the active set distribution $\pi$, build a minimum spanning tree on that subset, and then add each unselected point as a leaf in this tree by connecting it to its nearest neighbor in $S$. This spanning tree on $N$ is then converted to a tour through all points in $N$ by taking the standard "double tree" walk around the "outside" of the tree. In fact, one must be a bit more careful in handling the case in which the sample $S$ is empty, and this will introduce a number of minor complications. This algorithm is a slight variant of an algorithm that was first developed by Gupta, Kumar, Pál, and Roughgarden [8] in a rather different context, that of the rent-or-buy problem, which is a deterministic network design problem in which one can either rent an edge $e$ at a cost of $c_e$ per use, or buy it for unlimited use at a cost of $Mc_e$ (where $M > 0$ is an input to the problem); this algorithm was later also the basis for an algorithm of Gupta, Pál, Ravi, and Sinha [9] for the 2-stage (with recourse) stochastic Steiner tree problem. Williamson and van Zuylen [10] have derandomized the rent-or-buy algorithm, and their technique translates almost directly to our setting. Finally, we note that Garg, Gupta, Leonardi, and Sankowski [11, 12], as a corollary of their work on the on-line stochastic Steiner tree problem, have independently obtained a similar constant approximation algorithm for the a priori TSP.

2 The approximation algorithm

2.1 Preliminaries

We start by introducing some notation needed to present our results. For the traveling salesman problem, the input consists of a finite set of points $N = \{1, 2, \ldots, n\}$ as well as the distance $c(i, j)$ between each pair of distinct points $i, j \in N$; we wish to find a tour $\tau$, given by a cyclic permutation of $N$, such that the length of the tour,
$$c(\tau) = \sum_{i \in N} c(i, \tau(i)),$$
is minimized. We shall assume that the distances are symmetric (i.e., $c(j, k) = c(k, j)$ for each pair of points $j, k \in N$) and satisfy the triangle inequality (i.e., $c(j, k) + c(k, \ell) \ge c(j, \ell)$ for each triple of points $j, k, \ell \in N$).

For the a priori TSP, only a subset $A \subseteq N$ will be "active". For a tour $\tau$ of $N$, we define $\tau_A$ to be the tour "short-cut" to traverse only $A$; more formally, if we let $\tau^\ell$ denote the $\ell$th power of $\tau$ (i.e., $\tau^1 = \tau$ and, for each $\ell > 1$, $\tau^\ell = \tau(\tau^{\ell-1})$), then for each $j \in A$, $\tau_A(j) = \tau^\ell(j)$, where $\ell$ is the smallest positive integer such that $\tau^\ell(j) \in A$ (unless $A = \{j\}$, in which case $\tau_A(j) = j$).
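To make the short-cut operation and the a priori objective concrete, here is a minimal Python sketch; the function names, and the representation of a tour as a list read cyclically, are our own conventions, and `c` is assumed to be a symmetric metric supplied as a callable.

```python
import random

def shortcut(tour, active):
    """tau_A: visit the points of the active set in the order in which
    the master tour visits them, skipping all inactive points."""
    return [j for j in tour if j in active]

def tour_length(tour, c):
    """Length of a cyclic tour; a tour on fewer than two points costs 0."""
    if len(tour) < 2:
        return 0.0
    return sum(c(tour[k], tour[(k + 1) % len(tour)]) for k in range(len(tour)))

def estimate_objective(tour, c, p, trials=10_000):
    """Monte Carlo estimate of E_A[c(tau_A)] under independent activation."""
    total = 0.0
    for _ in range(trials):
        active = {j for j in tour if random.random() < p[j]}
        total += tour_length(shortcut(tour, active), c)
    return total / trials
```

In this representation, keeping the active points in master-tour order is exactly the short-cut $\tau_A$ defined above; by the triangle inequality, the short-cut tour never costs more than the full tour.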
For the a priori TSP, the input consists of the set of points $N$, the distance function $c$, and a probability distribution $\pi$ specified over the subsets of $N$. Since we are working in the independent activation model, we can compute the probability $\pi(A)$ for each subset $A \subseteq N$; that is,
$$\pi(A) = \Bigl(\prod_{j \in A} p_j\Bigr) \Bigl(\prod_{j \notin A} (1 - p_j)\Bigr), \qquad (1)$$
where $p_j$ is the probability that point $j$ is in the active set $A$. The goal now is to find a tour $\tau$ that minimizes the expected length, with respect to $\pi$, of the tour induced by $\tau$; i.e., to minimize $E_A[c(\tau_A)]$. We shall let $\tau^*$ denote an optimal solution, and let OPT denote the corresponding optimal value. It is important to note that $\tau^*_A$ denotes the optimal tour shortcut to the subset of nodes $A$, which is not necessarily the optimal tour for this specific subset. We will give an 8-approximation algorithm for this problem; that is, we will show how to compute a tour $\tau$ such that $E_A[c(\tau_A)] \le 8 \cdot \mathrm{OPT}$. We first give a randomized algorithm that computes a tour $\tau$ with the property that $E[E_A[c(\tau_A)]] \le 4 \cdot \mathrm{OPT}$, where the outer expectation in this expression is with respect to the random coin tosses used by the algorithm; we will subsequently show how to derandomize this algorithm, while losing an additional factor of 2.

2.2 Special Case: $p_1 = 1$

As a warm-up, we will first consider a more structured case, in which $p_1 = 1$. The randomized algorithm is extremely simple (a code sketch appears below). We first draw a sample $S$ with respect to the underlying distribution $\pi$; in other words, we choose a subset $S$ by independently including each node $j$ in $S$ with probability $p_j$. We compute a minimum spanning tree on the subset $S$. Then we extend this to a spanning tree $T$ on $N$ by adding, for each node $j \notin S$, the edge from $j$ to its nearest neighbor in $S$. (Note that this is well-defined, since we have ensured that $S \neq \emptyset$ by setting $p_1 = 1$.) Let $\mathrm{MST}(S)$ denote the cost of the minimum spanning tree (MST) on $S$ and, for each $j \neq 1$, let $D_j(S)$ denote the distance between $j$ and its nearest neighbor in $S - \{j\}$ (for $j \notin S$, this is the length of the edge just added). For notational convenience, let $D_1(S) = 0$.

From this spanning tree $T$, it is well known (see, e.g., [13]) that we can construct a tour $\tau$ of total length $c(\tau)$ at most twice the total length of the edges in the tree $T$; we simply "walk around the outside of the tree", or, more formally, we construct an Eulerian multigraph by taking each edge twice, and then $\tau$ is obtained from an Eulerian tour by shortcutting any node that is visited more than once. To upper bound the objective function value of this tour $\tau$, we will focus on its tree representation; that is, for any subset $A \subseteq N$, the induced tour $\tau_A$ can also be obtained by first considering the tree induced from $T$ by the node set $A$, and then taking the analogous traversal of its doubled Eulerian tour.

In order to analyze the quality of this solution, we next present two obvious lower bounds on the length of an optimal solution.

Fact 1. For each subset $A \subseteq N$, $c(\tau^*_A) \ge \mathrm{MST}(A)$.

Fact 2. For each subset $A \subseteq N$, $c(\tau^*_A) \ge \sum_{j \in A} D_j(A)$.

If we take expectations of both sides of these inequalities, we thereby obtain lower bounds on the optimal value as well.

Fact 3. $\mathrm{OPT} \ge E_A[\mathrm{MST}(A)]$.

Fact 4. $\mathrm{OPT} \ge E_A\bigl[\sum_{j \in A} D_j(A)\bigr]$.

Now let us analyze the cost of the (random) tree $T$ generated by our algorithm. Focus on a particular active set $A$, and its contribution to the expected cost (with respect to $\pi$) of this solution. Recall that $T$ consists of a spanning tree $T_S$ on our sample $S$, plus additional edges that connect nodes $j \notin S$ to this spanning tree.
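Before continuing the analysis, here is a minimal, self-contained Python sketch of the construction just described, under the warm-up assumption that `p[1] == 1`; Prim's algorithm stands in for any MST subroutine, the preorder walk realizes the shortcut traversal of the doubled tree, and all names are ours.

```python
import random

def prim_mst(points, c):
    """Prim's algorithm; returns an adjacency-list spanning tree of
    `points` under distance function c. O(n^2), fine for a sketch."""
    points = list(points)
    root = points[0]
    adj = {v: [] for v in points}
    # best[v] = (cheapest known edge cost into the tree, its tree endpoint)
    best = {v: (c(root, v), root) for v in points[1:]}
    while best:
        v = min(best, key=lambda u: best[u][0])
        _, u = best.pop(v)
        adj[u].append(v)
        adj[v].append(u)
        for w in best:
            if c(v, w) < best[w][0]:
                best[w] = (c(v, w), v)
    return adj

def master_tour(N, c, p):
    """Warm-up algorithm for p[1] == 1: sample S from pi, build an MST
    on S, attach every j not in S as a leaf at its nearest neighbor in S,
    and turn the doubled tree into a tour via a preorder walk."""
    S = {j for j in N if random.random() < p[j]}   # 1 is always sampled
    adj = prim_mst(S, c)
    for j in set(N) - S:
        nearest = min(S, key=lambda s: c(j, s))    # the D_j(S) edge
        adj[j] = [nearest]
        adj[nearest].append(j)
    # Preorder walk = shortcut Eulerian tour of the doubled tree.
    tour, stack, seen = [], [1], set()
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            tour.append(v)
            stack.extend(adj[v])
    return tour
```

Any MST routine and any consistent preorder would do here; only the tree structure and the edge-doubling argument matter for the analysis that follows.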
For a given set $A$, the induced tree $T_A$ need not include all of the spanning tree $T_S$; however, for computing an upper bound on the cost of $T_A$, we shall always include all of the edges of $T_S$ in the induced solution. Given this, it is straightforward to see that we can upper bound the cost of $T_A$ by
$$\mathrm{MST}(S) + \sum_{j \neq 1} \mathbb{1}(j \in A)\,\mathbb{1}(j \notin S)\,D_j(S), \qquad (2)$$
which is bounded above by
$$\mathrm{MST}(S) + \sum_{j \neq 1} \mathbb{1}(j \in A)\,D_j(S). \qquad (3)$$
Hence the expected cost (with respect to the random choices of our algorithm) of the tour $\tau$ that we compute satisfies
$$E_S[E_A[c(\tau_A)]] \le 2\Bigl(E_S[\mathrm{MST}(S)] + E_S\Bigl[E_A\Bigl[\sum_{j \neq 1} \mathbb{1}(j \in A)\,D_j(S)\Bigr]\Bigr]\Bigr).$$
Since $S$ is a random subset selected according to $\pi$, the first term can be replaced by $E_A[\mathrm{MST}(A)]$, which is, by Fact 3, at most OPT. For the second term, we apply linearity of expectation, and the fact that $A$ and $S$ are independent samples drawn according to $\pi$, and hence
$$E_S\Bigl[E_A\Bigl[\sum_{j \neq 1} \mathbb{1}(j \in A)\,D_j(S)\Bigr]\Bigr] = \sum_{j \neq 1} E_A[\mathbb{1}(j \in A)]\,E_S[D_j(S)] = \sum_{j \neq 1} E_A[\mathbb{1}(j \in A)]\,E_A[D_j(A)], \qquad (4)$$
where again we have relied on the fact that the subsets $S$ and $A$ are both (independently) drawn according to the distribution $\pi$. For each $j \neq 1$, the random variable $D_j(A)$, denoting the distance from $j$ to its nearest neighbor in $A \setminus \{j\}$, is independent of $\mathbb{1}(j \in A)$. Thus we conclude that
$$\sum_{j \neq 1} E_A[\mathbb{1}(j \in A)]\,E_A[D_j(A)] = \sum_{j \neq 1} E_A[\mathbb{1}(j \in A)\,D_j(A)]. \qquad (5)$$
On the other hand, recall that by Fact 4,
$$\mathrm{OPT} \ge E_A\Bigl[\sum_j \mathbb{1}(j \in A)\,D_j(A)\Bigr] = \sum_j E_A[\mathbb{1}(j \in A)\,D_j(A)].$$
Thus we can also bound the second term by OPT. Combining all of these pieces, we have shown that
$$E_S[E_A[c(\tau_A)]] \le 2(\mathrm{OPT} + \mathrm{OPT}) = 4 \cdot \mathrm{OPT}.$$

2.3 General Case

Now we will relax the condition that there must exist a point with activation probability equal to 1. One might ask where the analysis given above takes advantage of this restriction. It is used most importantly in the description of the algorithm, which requires a non-empty set of points on which to build a spanning tree; stated another way, for each point $j$, there must always be a point of the algorithm's sample in $S - \{j\}$.

Let $\pi'$ denote the probability distribution where $\pi'(A)$ is the probability that $A$ is the active set (as drawn from $\pi$), conditioned on the event that $|A| \ge 2$. For a set $A$ drawn according to $\pi$, let $\gamma$ denote the probability that $|A| \ge 2$. The following lemma implies that we can focus on $\pi'$ instead of $\pi$.

Lemma 5. For any tour $\tau$, its a priori TSP objective function value with respect to $\pi$ is exactly equal to $\gamma$ times its objective function value with respect to $\pi'$.

Proof. For each set $A$ with $|A| < 2$, the cost of the short-cut of the tour $\tau$ to the subset $A$, $c(\tau_A)$, is equal to 0. For each subset $A$ with $|A| \ge 2$, the probability that $A$ is active with respect to $\pi$ is exactly $\gamma$ times the probability that $A$ is active with respect to $\pi'$. Now consider the objective function value of $\tau$ with respect to $\pi$, which is an expectation computed over the random choice of $A$. The subsets of size less than two contribute nothing (literally), and the rest contribute in total exactly $\gamma$ times what they contribute to the analogous expected value for $\pi'$. Hence, the claim is proved.

This lemma shows that the objective functions with respect to $\pi$ and $\pi'$ differ only by the same multiplicative factor (for all feasible solutions), and hence we get the following corollary.

Corollary 6. Any (randomized) $\rho$-approximation algorithm for the a priori TSP with respect to $\pi'$ is a (randomized) $\rho$-approximation algorithm with respect to $\pi$.

Now let us consider the extension of our algorithm and its analysis to $\pi'$.
If the algorithm draws a sample $S$ according to $\pi'$, we now have the property that, for each $j \in N$, there must exist some element in $S - \{j\}$, and hence $D_j(S)$ is well-defined. It is straightforward to see that much of the analysis of our algorithm is unaffected by this change (though, of course, the summations in the bounds should now run over all $j \in N$, not just those nodes $j \neq 1$). The only significant change is that the random variables $\mathbb{1}(j \in A)$ and $D_j(A)$ are no longer independent, so that (5) no longer holds. Note that it would suffice to replace (5) by the inequality
$$\sum_j E_A[\mathbb{1}(j \in A)]\,E_A[D_j(A)] \le \sum_j E_A[\mathbb{1}(j \in A)\,D_j(A)], \qquad (6)$$
which, using basic properties of conditional expectation, is equivalent to
$$\sum_j E_A[\mathbb{1}(j \in A)]\,E_A[D_j(A)] \le \sum_j E_A[\mathbb{1}(j \in A)]\,E_A[D_j(A) \mid j \in A]. \qquad (7)$$
If we prove that, for every $j \in N$, $E_A[D_j(A)] \le E_A[D_j(A) \mid j \in A]$, then the remainder of the analysis carries over from the more specialized setting, and the desired result follows: the modified algorithm yields a randomized 4-approximation algorithm.

Lemma 7. For a random set $A$ drawn according to the distribution $\pi'$, $E_A[D_j(A)] \le E_A[D_j(A) \mid j \in A]$.

Proof. The proof is based on the following two propositions.

Proposition 8. For a random set $A$ drawn according to the distribution $\pi'$, $E_A[D_j(A) \mid j \notin A] = E_A[D_j(A) \mid (j \in A) \wedge (|A| \ge 3)]$.

Proof. Let $\mathcal{S}$ be the family of sets of cardinality at least 2 that contain $j$, and let $\bar{\mathcal{S}}$ be those that do not contain $j$. Finally, let $\mathcal{S}_2$ denote the 2-element sets of $\mathcal{S}$. As above, we know that there is a constant $0 < \gamma \le 1$ such that, for any subset $A$ of cardinality at least 2, $\pi(A) = \gamma \cdot \pi'(A)$. Furthermore, we know how to compute $\pi$ (and $\gamma$) from (1). There is a natural 1-1 correspondence $f$ between the sets in $\mathcal{S} - \mathcal{S}_2$ and $\bar{\mathcal{S}}$, obtained by simply deleting $j$; that is, $f$ maps $S$ to $S - \{j\}$. But then $\pi(S) = \frac{p_j}{1 - p_j}\,\pi(f(S))$, and hence the same relation holds for $\pi'$ (provided $|S| \ge 3$). Further, $D_j(S) = D_j(f(S))$, since each is the distance from $j$ to its nearest neighbor in $S - \{j\}$. In other words, since this likelihood ratio is the same constant for every such pair of sets, the conditional expectation of $D_j(S)$ for sets of size at least 3 containing $j$ is exactly equal to the conditional expectation of $D_j(S)$ for sets not containing $j$ (where both expectations are with respect to the distribution $\pi'$).

Proposition 9. For a random set $A$ drawn according to the distribution $\pi'$, $E_A[D_j(A) \mid (j \in A) \wedge (|A| \ge 3)] \le E_A[D_j(A) \mid j \in A]$.

Proof. We prove instead that $E_A[D_j(A) \mid j \in A] \le E_A[D_j(A) \mid (j \in A) \wedge (|A| = 2)]$, which implies the proposition by the basic properties of conditional expectations, since $E_A[D_j(A) \mid j \in A]$ is a convex combination of the expectations conditioned on $(j \in A) \wedge (|A| = 2)$ and on $(j \in A) \wedge (|A| \ge 3)$. By definition, $D_j(\{j, k\}) = c(j, k)$. Let $\sigma$ be a permutation on $N$ such that
$$0 = c(j, j) = c(j, \sigma(1)) \le c(j, \sigma(2)) \le \cdots \le c(j, \sigma(n)).$$
To compute either of these expectations, we need only consider the probability with which each conditional distribution takes on each of these $n - 1$ non-trivial values. We know that $D_j(A) = c(j, \sigma(k))$ exactly when $\sigma(\ell) \notin A$ for $\ell = 2, \ldots, k-1$, and $\sigma(k) \in A$. Furthermore, for each distribution, the probability that any set is selected is proportional to the probability that it was selected in the original distribution $\pi$, for which we know how to compute these probabilities exactly. Thus, for the first expectation, the probability that $D_j(A) = c(j, \sigma(k))$ is proportional to $p_{\sigma(k)} \cdot \prod_{\ell = 2, \ldots, k-1} (1 - p_{\sigma(\ell)})$, whereas for the second, it is proportional to $p_{\sigma(k)} \cdot \prod_{\ell \neq 1, k} (1 - p_{\sigma(\ell)})$ (in each case with a constant of proportionality that does not depend on $k$). It is clear from these two expressions that the first probability dominates the second; indeed, the ratio of the second weight to the first is $\prod_{\ell = k+1}^{n} (1 - p_{\sigma(\ell)})$, which is increasing in $k$, so for any $k$ it is more likely in the second case that $D_j(A) \ge c(j, \sigma(k))$. Consequently, the second expectation is at least the first, and the proposition is proved.
It follows that $E_A[D_j(A) \mid j \notin A] \le E_A[D_j(A) \mid j \in A]$. The lemma follows immediately, since the unconditioned expectation is sandwiched in between these two values.

There is one final detail that should be mentioned. When we wish to select a sample from $\pi'$, the most natural way is to repeatedly choose a sample from $\pi$ until the resulting set has cardinality at least 2. If, for example, each of the values $p_i = 2^{-n}$, then this could take exponential time. However, one can easily compute the conditional probability that the two smallest indexed elements in $S$ are $i$
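To illustrate the issue, here is a hedged Python sketch of the naive rejection sampler for $\pi'$; it is only suitable when $\gamma$ is not too small, which is precisely why the direct computation of conditional probabilities is needed in general. The function name and the failure-handling convention are ours.

```python
import random

def sample_conditioned(p, max_tries=1_000_000):
    """Rejection sampling from pi': redraw from pi until |A| >= 2.
    The expected number of draws is 1/gamma with gamma = Pr[|A| >= 2],
    so this is only a sketch; when every p_j is tiny (e.g., 2**-n),
    gamma is exponentially small and one must instead compute the
    conditional probabilities directly, as described in the text."""
    for _ in range(max_tries):
        A = {j for j, pj in p.items() if random.random() < pj}
        if len(A) >= 2:
            return A
    raise RuntimeError("gamma appears too small for rejection sampling")
```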