A constant approximation algorithm for the a priori traveling salesman problem

David Shmoys¹* and Kunal Talwar²

¹ Cornell University, Ithaca NY 14853. shmoys@cs.cornell.edu
² Microsoft Research, Silicon Valley Campus, 1065 L'Avenida, Mountain View, CA 94043. kunal@microsoft.com

Abstract. One of the interesting recent developments in the design and analysis of approximation algorithms has been in the area of algorithms for discrete stochastic optimization problems. In this domain, one is given not just one input, but rather a probability distribution over inputs, and yet the aim is to design an algorithm that has provably good worst-case performance; that is, for any probability distribution over inputs, the objective function value of the solution found by the algorithm must be within a specified factor of the optimal value. The a priori traveling salesman problem is a prime example of such a stochastic optimization problem. One starts with the standard traveling salesman problem (in which one wishes to find the shortest tour through a given set of points $N$), and then considers the possibility that only a subset $A$ of the entire set of points is active. The active set is given probabilistically; that is, there is a probability distribution over the subsets of $N$, which is given as part of the input. The aim is still to compute a tour through all points in $N$, but in order to evaluate its cost, we instead compute the expectation of the length of this tour after shortcutting it to include only those points in the active set $A$ (where the expectation is computed with respect to the given probability distribution). The goal is to compute a "master tour" for which this expectation is minimized. This problem was introduced in the doctoral theses of Jaillet and Bertsimas, who gave asymptotic analyses when the distances between points in the input set are also given probabilistically. In this paper, we restrict attention to the so-called "independent activation" model, in which we assume that each point $j$ is active with a given probability $p_j$, and that these activation events are independent. For this setting, we give an 8-approximation algorithm, a polynomial-time algorithm that computes a tour whose a priori TSP objective function value is guaranteed to be within a factor of 8 of optimal (and a randomized 4-approximation algorithm, which produces a tour of expected cost within a factor of 4 of optimal). This is the first constant approximation algorithm for this model.

* Research supported partially by NSF grants CCR-0635121 & DMI-0500263.

1 Introduction

One of the interesting recent developments in the design and analysis of approximation algorithms has been in the area of algorithms for discrete stochastic optimization problems. In this domain, one is not given just one input, but rather a probability distribution over inputs, and yet the aim is to design an algorithm that has provably good worst-case performance; that is, for any probability distribution over inputs, the objective function value of the solution found by the algorithm is within a specified factor of the optimal value. The a priori traveling salesman problem is a prime example of such a stochastic optimization problem. It is a stochastic generalization of the (standard) traveling salesman problem (TSP).
In the traditional deterministic problem, one is given a set $N$ of $n$ points in a metric space; that is, for each pair of points one is given the distance between them, with the assumption that these distances satisfy the triangle inequality (i.e., for each triple of points $i$, $j$, and $k$, the distance between $i$ and $k$ is at most the sum of the distances between $i$ and $j$ and between $j$ and $k$). The aim is to find the tour (or cyclic permutation) that contains each of these points exactly once, such that the total distance traversed is minimized. In the a priori TSP, one is also given a probability distribution $\pi$ over subsets $A \subseteq N$; this models the fact that, in advance, one is not certain of the "currently active" points $A$ that need to be included in the tour, and has only probabilistic information about this. Given only this probabilistic information, one computes a "master tour" $\tau$, whose objective function value is the expected value (with respect to $\pi$) of the length of $\tau$ after shortcutting it to include only those active points in $A$. If the distribution makes the full set $N$ active with probability 1, then this is just the usual (deterministic) TSP (and hence we expect results no stronger than are known for that problem).

The a priori TSP was first studied in the doctoral theses of Jaillet [1, 2] and Bertsimas [3], and their early work, which mostly focuses on asymptotic analysis when the points themselves are randomly selected (e.g., uniformly in the unit square), is nicely surveyed by Bertsimas, Jaillet, and Odoni [4].

If one is interested in polynomial-time algorithms, then one must be careful about the way in which the distribution is given as part of the input. It would take an exponential amount of information to list the probability that each set $A$ is active; one simple workaround is to insist that only a polynomial (in $n$) number of sets can be active with positive probability, and this is called the polynomial scenario model. Alternatively, we can present the probability distribution by means of an oracle, a "black box" from which we can draw independent samples according to the distribution $\pi$; the restriction that we are interested in polynomial-time algorithms implies that we are limited to a polynomial number of samples in this black box model. In this paper, we restrict attention to the so-called independent activation model for $\pi$: for each $j \in N$, consider the random indicator variable $\mathbb{1}(j \in A)$, which is 1 exactly when $j \in A$; we shall assume that these are independent random variables, and that we are given, as part of the input, the probability $p_j = \Pr[\mathbb{1}(j \in A) = 1]$ for each $j \in N$ (a sampling sketch is given at the end of this section).

For this setting, we first give a randomized polynomial-time algorithm to compute a tour for the a priori TSP whose expected objective function value (with respect to the random choices of the algorithm) is guaranteed to be within a factor of 4 of optimal. We then derandomize the procedure to give a deterministic algorithm that is guaranteed to find a tour of cost within a factor of 8 of optimal. This is the first constant approximation algorithm for this model. The strongest previous result was a randomized $O(\log n)$-approximation algorithm due to Schalekamp and Shmoys [5]. That result is significantly more general, since not only does it hold in the black box model, but in fact it does not require any knowledge of the distribution $\pi$. This stochastic notion of the TSP is closely related to the notion of the universal TSP.
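To make the independent activation model concrete, the following minimal Python sketch draws one active set; the dictionary `p` mapping points to activation probabilities is our own illustrative convention, not notation from the paper.

```python
import random

def sample_active_set(p):
    """Draw one active set A in the independent activation model:
    each point j is included independently with probability p[j]."""
    return {j for j, pj in p.items() if random.random() < pj}

# Example (hypothetical data): point 1 is always active;
# points 2 and 3 are each active with probability 1/2.
p = {1: 1.0, 2: 0.5, 3: 0.5}
A = sample_active_set(p)
```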
In the universal TSP, there is also a notion of a "master tour" which is shortcut according to the current active set $A$. However, for this problem, we say that a tour $\tau$ for $N$ has a performance guarantee of $\rho$ if, for every subset $A \subseteq N$, the tour $\tau$ shortcut to $A$ has length at most $\rho$ times the optimal cost of a tour for the subset $A$. Clearly, a $\rho$-approximation algorithm for the universal TSP is a $\rho$-approximation algorithm for the a priori TSP with respect to any distribution. The universal TSP was introduced by Bartholdi and Platzman [6] in the context of an application for "meals on wheels", and they gave an $O(\log n)$ performance guarantee for points in the Euclidean plane. In contrast, Hajiaghayi, Kleinberg, and Leighton [7] proved that no tour can achieve a bound better than $c(\log n/\log\log n)^{1/6}$ for some constant $c > 0$, even for points in the Euclidean plane. Thus, it is particularly natural to consider weaker models, such as the a priori TSP.

Our algorithm is extremely simple: in essence, we take one sample $S$ from the active set distribution $\pi$, build a minimum spanning tree on that subset, and then add each unselected point as a leaf in this tree by connecting it to its nearest neighbor in $S$. This spanning tree on $N$ is then converted to a tour through all points in $N$ by taking the standard "double tree" walk around the "outside" of the tree. In fact, one must be a bit more careful in handling the case in which the sample $S$ is empty, and this will introduce a number of minor complications. This algorithm is a slight variant of an algorithm that was first developed by Gupta, Kumar, Pál, and Roughgarden [8] in a rather different context, that of the rent-or-buy problem, which is a deterministic network design problem in which one can either rent an edge $e$ at a cost of $c_e$ per use, or buy it for unlimited use at a cost of $Mc_e$ (where $M > 0$ is an input to the problem); this algorithm was later also the basis for an algorithm of Gupta, Pál, Ravi, and Sinha [9] for the 2-stage (with recourse) stochastic Steiner tree problem. Williamson and van Zuylen [10] have derandomized the rent-or-buy algorithm, and their technique translates almost directly to our setting. Finally, we note that Garg, Gupta, Leonardi, and Sankowski [11, 12], as a corollary of their work on the on-line stochastic Steiner tree problem, have independently obtained a similar constant approximation algorithm for the a priori TSP.

2 The approximation algorithm

2.1 Preliminaries

We start by introducing some notation needed to present our results. For the traveling salesman problem, the input consists of a finite set of points $N = \{1, 2, \ldots, n\}$ as well as the distance $c(i, j)$ between each pair of distinct points $i, j \in N$; we wish to find a tour $\tau$, given by a cyclic permutation of $N$, such that the length of the tour,
$$c(\tau) = \sum_{i \in N} c(i, \tau(i)),$$
is minimized. We shall assume that the distances are symmetric (i.e., $c(j, k) = c(k, j)$ for each pair of points $j, k \in N$) and satisfy the triangle inequality (i.e., $c(j, k) + c(k, \ell) \ge c(j, \ell)$ for each triple of points $j, k, \ell \in N$).

For the a priori TSP, only a subset $A \subseteq N$ will be "active". For a tour $\tau$ of $N$, we define $\tau_A$ to be the tour "short-cut" to traverse only $A$; more formally, if we let $\tau^\ell$ denote the $\ell$th power of $\tau$ (i.e., $\tau^1 = \tau$ and, for each $\ell > 1$, $\tau^\ell = \tau(\tau^{\ell-1})$), then for each $j \in A$, $\tau_A(j) = \tau^\ell(j)$, where $\ell$ is the smallest positive integer such that $\tau^\ell(j) \in A$ (unless $A = \{j\}$, in which case $\tau_A(j) = j$).
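To make the short-cut operation and the a priori objective concrete, here is a minimal Python sketch; the function names, and the representation of a tour as a list read cyclically, are our own conventions, and `c` is assumed to be a symmetric metric supplied as a callable.

```python
import random

def shortcut(tour, active):
    """tau_A: visit the points of the active set in the order in which
    the master tour visits them, skipping all inactive points."""
    return [j for j in tour if j in active]

def tour_length(tour, c):
    """Length of a cyclic tour; a tour on fewer than two points costs 0."""
    if len(tour) < 2:
        return 0.0
    return sum(c(tour[k], tour[(k + 1) % len(tour)]) for k in range(len(tour)))

def estimate_objective(tour, c, p, trials=10_000):
    """Monte Carlo estimate of E_A[c(tau_A)] under independent activation."""
    total = 0.0
    for _ in range(trials):
        active = {j for j in tour if random.random() < p[j]}
        total += tour_length(shortcut(tour, active), c)
    return total / trials
```

In this representation, keeping the active points in master-tour order is exactly the short-cut $\tau_A$ defined above; by the triangle inequality, the short-cut tour never costs more than the full tour.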
For the a priori TSP, the input consists of the set of points $N$, the distance function $c$, and a probability distribution $\pi$ specified over the subsets of $N$. Since we are working in the independent activation model, we can compute the probability $\pi(A)$ for each subset $A \subseteq N$; that is,
$$\pi(A) = \Bigl(\prod_{j \in A} p_j\Bigr) \Bigl(\prod_{j \notin A} (1 - p_j)\Bigr), \qquad (1)$$
where $p_j$ is the probability that point $j$ is in the active set $A$. The goal now is to find a tour $\tau$ that minimizes the expected length, with respect to $\pi$, of the tour induced by $\tau$; i.e., to minimize $E_A[c(\tau_A)]$. We shall let $\tau^*$ denote an optimal solution, and let OPT denote the corresponding optimal value. It is important to note that $\tau^*_A$ denotes the optimal tour shortcut to the subset of nodes $A$, which is not necessarily the optimal tour for this specific subset. We will give an 8-approximation algorithm for this problem; that is, we will show how to compute a tour $\tau$ such that $E_A[c(\tau_A)] \le 8 \cdot \mathrm{OPT}$. We first give a randomized algorithm that computes a tour $\tau$ with the property that $E[E_A[c(\tau_A)]] \le 4 \cdot \mathrm{OPT}$, where the outer expectation in this expression is with respect to the random coin tosses used by the algorithm; we will subsequently show how to derandomize this algorithm, while losing an additional factor of 2.

2.2 Special Case: $p_1 = 1$

As a warm-up, we will first consider a more structured case, in which $p_1 = 1$. The randomized algorithm is extremely simple (a code sketch appears below). We first draw a sample $S$ with respect to the underlying distribution $\pi$; in other words, we choose a subset $S$ by independently including each node $j$ in $S$ with probability $p_j$. We compute a minimum spanning tree on the subset $S$. Then we extend this to a spanning tree $T$ on $N$ by adding, for each node $j \notin S$, the edge from $j$ to its nearest neighbor in $S$. (Note that this is well-defined, since we have ensured that $S \neq \emptyset$ by setting $p_1 = 1$.) Let $\mathrm{MST}(S)$ denote the cost of the minimum spanning tree (MST) on $S$ and, for each $j \neq 1$, let $D_j(S)$ denote the distance between $j$ and its nearest neighbor in $S - \{j\}$ (for $j \notin S$, this is the length of the edge just added). For notational convenience, let $D_1(S) = 0$.

From this spanning tree $T$, it is well known (see, e.g., [13]) that we can construct a tour $\tau$ of total length $c(\tau)$ at most twice the total length of the edges in the tree $T$; we simply "walk around the outside of the tree", or, more formally, we construct an Eulerian multigraph by taking each edge twice, and then $\tau$ is obtained from an Eulerian tour by shortcutting any node that is visited more than once. To upper bound the objective function value of this tour $\tau$, we will focus on its tree representation; that is, for any subset $A \subseteq N$, the induced tour $\tau_A$ can also be obtained by first considering the tree induced from $T$ by the node set $A$, and then taking the analogous traversal of its doubled Eulerian tour.

In order to analyze the quality of this solution, we next present two obvious lower bounds on the length of an optimal solution.

Fact 1. For each subset $A \subseteq N$, $c(\tau^*_A) \ge \mathrm{MST}(A)$.

Fact 2. For each subset $A \subseteq N$, $c(\tau^*_A) \ge \sum_{j \in A} D_j(A)$.

If we take expectations of both sides of these inequalities, we thereby obtain lower bounds on the optimal value as well.

Fact 3. $\mathrm{OPT} \ge E_A[\mathrm{MST}(A)]$.

Fact 4. $\mathrm{OPT} \ge E_A\bigl[\sum_{j \in A} D_j(A)\bigr]$.

Now let us analyze the cost of the (random) tree $T$ generated by our algorithm. Focus on a particular active set $A$, and its contribution to the expected cost (with respect to $\pi$) of this solution. Recall that $T$ consists of a spanning tree $T_S$ on our sample $S$, plus additional edges that connect nodes $j \notin S$ to this spanning tree.
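Before continuing the analysis, here is a minimal, self-contained Python sketch of the construction just described, under the warm-up assumption that `p[1] == 1`; Prim's algorithm stands in for any MST subroutine, the preorder walk realizes the shortcut traversal of the doubled tree, and all names are ours.

```python
import random

def prim_mst(points, c):
    """Prim's algorithm; returns an adjacency-list spanning tree of
    `points` under distance function c. O(n^2), fine for a sketch."""
    points = list(points)
    root = points[0]
    adj = {v: [] for v in points}
    # best[v] = (cheapest known edge cost into the tree, its tree endpoint)
    best = {v: (c(root, v), root) for v in points[1:]}
    while best:
        v = min(best, key=lambda u: best[u][0])
        _, u = best.pop(v)
        adj[u].append(v)
        adj[v].append(u)
        for w in best:
            if c(v, w) < best[w][0]:
                best[w] = (c(v, w), v)
    return adj

def master_tour(N, c, p):
    """Warm-up algorithm for p[1] == 1: sample S from pi, build an MST
    on S, attach every j not in S as a leaf at its nearest neighbor in S,
    and turn the doubled tree into a tour via a preorder walk."""
    S = {j for j in N if random.random() < p[j]}   # 1 is always sampled
    adj = prim_mst(S, c)
    for j in set(N) - S:
        nearest = min(S, key=lambda s: c(j, s))    # the D_j(S) edge
        adj[j] = [nearest]
        adj[nearest].append(j)
    # Preorder walk = shortcut Eulerian tour of the doubled tree.
    tour, stack, seen = [], [1], set()
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            tour.append(v)
            stack.extend(adj[v])
    return tour
```

Any MST routine and any consistent preorder would do here; only the tree structure and the edge-doubling argument matter for the analysis that follows.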
For a given set $A$, the induced tree $T_A$ need not include all of the spanning tree $T_S$; however, for computing an upper bound on the cost of $T_A$, we shall always include all of the edges of $T_S$ in the induced solution. Given this, it is straightforward to see that we can upper bound the cost of $T_A$ by
$$\mathrm{MST}(S) + \sum_{j \neq 1} \mathbb{1}(j \in A)\,\mathbb{1}(j \notin S)\,D_j(S), \qquad (2)$$
which is bounded above by
$$\mathrm{MST}(S) + \sum_{j \neq 1} \mathbb{1}(j \in A)\,D_j(S). \qquad (3)$$
Hence the expected cost (with respect to the random choices of our algorithm) of the tour $\tau$ that we compute satisfies
$$E_S[E_A[c(\tau_A)]] \le 2\Bigl(E_S[\mathrm{MST}(S)] + E_S\Bigl[E_A\Bigl[\sum_{j \neq 1} \mathbb{1}(j \in A)\,D_j(S)\Bigr]\Bigr]\Bigr).$$
Since $S$ is a random subset selected according to $\pi$, the first term can be replaced by $E_A[\mathrm{MST}(A)]$, which is, by Fact 3, at most OPT. For the second term, we apply linearity of expectation, and the fact that $A$ and $S$ are independent samples drawn according to $\pi$, and hence
$$E_S\Bigl[E_A\Bigl[\sum_{j \neq 1} \mathbb{1}(j \in A)\,D_j(S)\Bigr]\Bigr] = \sum_{j \neq 1} E_A[\mathbb{1}(j \in A)]\,E_S[D_j(S)] = \sum_{j \neq 1} E_A[\mathbb{1}(j \in A)]\,E_A[D_j(A)], \qquad (4)$$
where again we have relied on the fact that the subsets $S$ and $A$ are both (independently) drawn according to the distribution $\pi$. For each $j \neq 1$, the random variable $D_j(A)$, denoting the distance from $j$ to its nearest neighbor in $A \setminus \{j\}$, is independent of $\mathbb{1}(j \in A)$. Thus we conclude that
$$\sum_{j \neq 1} E_A[\mathbb{1}(j \in A)]\,E_A[D_j(A)] = \sum_{j \neq 1} E_A[\mathbb{1}(j \in A)\,D_j(A)]. \qquad (5)$$
On the other hand, recall that by Fact 4,
$$\mathrm{OPT} \ge E_A\Bigl[\sum_j \mathbb{1}(j \in A)\,D_j(A)\Bigr] = \sum_j E_A[\mathbb{1}(j \in A)\,D_j(A)].$$
Thus we can also bound the second term by OPT. Combining all of these pieces, we have shown that
$$E_S[E_A[c(\tau_A)]] \le 2(\mathrm{OPT} + \mathrm{OPT}) = 4 \cdot \mathrm{OPT}.$$

2.3 General Case

Now we will relax the condition that there must exist a point with activation probability equal to 1. One might ask where the analysis given above takes advantage of this restriction. It is used most importantly in the description of the algorithm, which requires a non-empty set of points on which to build a spanning tree; stated another way, for each point $j$, there must always be a point of the algorithm's sample in $S - \{j\}$.

Let $\pi'$ denote the probability distribution where $\pi'(A)$ is the probability that $A$ is the active set (as drawn from $\pi$), conditioned on the event that $|A| \ge 2$. For a set $A$ drawn according to $\pi$, let $\gamma$ denote the probability that $|A| \ge 2$. The following lemma implies that we can focus on $\pi'$ instead of $\pi$.

Lemma 5. For any tour $\tau$, its a priori TSP objective function value with respect to $\pi$ is exactly equal to $\gamma$ times its objective function value with respect to $\pi'$.

Proof. For each set $A$ with $|A| < 2$, the cost of the short-cut of the tour $\tau$ to the subset $A$, $c(\tau_A)$, is equal to 0. For each subset $A$ with $|A| \ge 2$, the probability that $A$ is active with respect to $\pi$ is exactly $\gamma$ times the probability that $A$ is active with respect to $\pi'$. Now consider the objective function value of $\tau$ with respect to $\pi$, which is an expectation computed over the random choice of $A$. The subsets of size less than two contribute nothing (literally), and the rest contribute in total exactly $\gamma$ times what they contribute to the analogous expected value for $\pi'$. Hence, the claim is proved.

This lemma shows that the objective functions with respect to $\pi$ and $\pi'$ differ only by the same multiplicative factor (for all feasible solutions), and hence we get the following corollary.

Corollary 6. Any (randomized) $\rho$-approximation algorithm for the a priori TSP with respect to $\pi'$ is a (randomized) $\rho$-approximation algorithm with respect to $\pi$.

Now let us consider the extension of our algorithm and its analysis to $\pi'$.
If the algorithm draws a sample $S$ according to $\pi'$, we now have the property that, for each $j \in N$, there must exist some element in $S - \{j\}$, and hence $D_j(S)$ is well-defined. It is straightforward to see that much of the analysis of our algorithm is unaffected by this change (though, of course, the summations in the bounds should now run over all $j \in N$, not just those nodes $j \neq 1$). The only significant change is that the random variables $\mathbb{1}(j \in A)$ and $D_j(A)$ are no longer independent, so that (5) no longer holds. Note that it would suffice to replace (5) by the inequality
$$\sum_j E_A[\mathbb{1}(j \in A)]\,E_A[D_j(A)] \le \sum_j E_A[\mathbb{1}(j \in A)\,D_j(A)], \qquad (6)$$
which, using basic properties of conditional expectation, is equivalent to
$$\sum_j E_A[\mathbb{1}(j \in A)]\,E_A[D_j(A)] \le \sum_j E_A[\mathbb{1}(j \in A)]\,E_A[D_j(A) \mid j \in A]. \qquad (7)$$
If we prove that, for every $j \in N$, $E_A[D_j(A)] \le E_A[D_j(A) \mid j \in A]$, then the remainder of the analysis carries over from the more specialized setting, and the desired result follows: the modified algorithm yields a randomized 4-approximation algorithm.

Lemma 7. For a random set $A$ drawn according to the distribution $\pi'$, $E_A[D_j(A)] \le E_A[D_j(A) \mid j \in A]$.

Proof. The proof is based on the following two propositions.

Proposition 8. For a random set $A$ drawn according to the distribution $\pi'$, $E_A[D_j(A) \mid j \notin A] = E_A[D_j(A) \mid (j \in A) \wedge (|A| \ge 3)]$.

Proof. Let $\mathcal{S}$ be the family of sets of cardinality at least 2 that contain $j$, and let $\bar{\mathcal{S}}$ be those that do not contain $j$. Finally, let $\mathcal{S}_2$ denote the 2-element sets of $\mathcal{S}$. As above, we know that there is a constant $0 < \gamma \le 1$ such that, for any subset $A$ of cardinality at least 2, $\pi(A) = \gamma \cdot \pi'(A)$. Furthermore, we know how to compute $\pi$ (and $\gamma$) from (1). There is a natural 1-1 correspondence $f$ between the sets in $\mathcal{S} - \mathcal{S}_2$ and $\bar{\mathcal{S}}$, obtained by simply deleting $j$; that is, $f$ maps $S$ to $S - \{j\}$. But then $\pi(S) = \frac{p_j}{1 - p_j}\,\pi(f(S))$, and hence the same relation holds for $\pi'$ (provided $|S| \ge 3$). Further, $D_j(S) = D_j(f(S))$, since each is the distance from $j$ to its nearest neighbor in $S - \{j\}$. In other words, since this likelihood ratio is the same constant for every such pair of sets, the conditional expectation of $D_j(S)$ for sets of size at least 3 containing $j$ is exactly equal to the conditional expectation of $D_j(S)$ for sets not containing $j$ (where both expectations are with respect to the distribution $\pi'$).

Proposition 9. For a random set $A$ drawn according to the distribution $\pi'$, $E_A[D_j(A) \mid (j \in A) \wedge (|A| \ge 3)] \le E_A[D_j(A) \mid j \in A]$.

Proof. We prove instead that $E_A[D_j(A) \mid j \in A] \le E_A[D_j(A) \mid (j \in A) \wedge (|A| = 2)]$, which implies the proposition by the basic properties of conditional expectations, since $E_A[D_j(A) \mid j \in A]$ is a convex combination of the expectations conditioned on $(j \in A) \wedge (|A| = 2)$ and on $(j \in A) \wedge (|A| \ge 3)$. By definition, $D_j(\{j, k\}) = c(j, k)$. Let $\sigma$ be a permutation on $N$ such that
$$0 = c(j, j) = c(j, \sigma(1)) \le c(j, \sigma(2)) \le \cdots \le c(j, \sigma(n)).$$
To compute either of these expectations, we need only consider the probability with which each conditional distribution takes on each of these $n - 1$ non-trivial values. We know that $D_j(A) = c(j, \sigma(k))$ exactly when $\sigma(\ell) \notin A$ for $\ell = 2, \ldots, k-1$, and $\sigma(k) \in A$. Furthermore, for each distribution, the probability that any set is selected is proportional to the probability that it was selected in the original distribution $\pi$, for which we know how to compute these probabilities exactly. Thus, for the first expectation, the probability that $D_j(A) = c(j, \sigma(k))$ is proportional to $p_{\sigma(k)} \cdot \prod_{\ell = 2, \ldots, k-1} (1 - p_{\sigma(\ell)})$, whereas for the second, it is proportional to $p_{\sigma(k)} \cdot \prod_{\ell \neq 1, k} (1 - p_{\sigma(\ell)})$ (in each case with a constant of proportionality that does not depend on $k$). It is clear from these two expressions that the first probability dominates the second; indeed, the ratio of the second weight to the first is $\prod_{\ell = k+1}^{n} (1 - p_{\sigma(\ell)})$, which is increasing in $k$, so for any $k$ it is more likely in the second case that $D_j(A) \ge c(j, \sigma(k))$. Consequently, the second expectation is at least the first, and the proposition is proved.
It follows that $E_A[D_j(A) \mid j \notin A] \le E_A[D_j(A) \mid j \in A]$. The lemma follows immediately, since the unconditioned expectation is sandwiched in between these two values.

There is one final detail that should be mentioned. When we wish to select a sample from $\pi'$, the most natural way is to repeatedly choose a sample from $\pi$ until the resulting set has cardinality at least 2. If, for example, each of the values $p_i = 2^{-n}$, then this could take exponential time. However, one can easily compute the conditional probability that the two smallest indexed elements in $S$ are $i$
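To illustrate the issue, here is a hedged Python sketch of the naive rejection sampler for $\pi'$; it is only suitable when $\gamma$ is not too small, which is precisely why the direct computation of conditional probabilities is needed in general. The function name and the failure-handling convention are ours.

```python
import random

def sample_conditioned(p, max_tries=1_000_000):
    """Rejection sampling from pi': redraw from pi until |A| >= 2.
    The expected number of draws is 1/gamma with gamma = Pr[|A| >= 2],
    so this is only a sketch; when every p_j is tiny (e.g., 2**-n),
    gamma is exponentially small and one must instead compute the
    conditional probabilities directly, as described in the text."""
    for _ in range(max_tries):
        A = {j for j, pj in p.items() if random.random() < pj}
        if len(A) >= 2:
            return A
    raise RuntimeError("gamma appears too small for rejection sampling")
```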