SCHOOL OF OPERATIONS RESEARCH AND INDUSTRIAL ENGINEERING
COLLEGE OF ENGINEERING
CORNELL UNIVERSITY
ITHACA, NY 14853-7501

August 1990

COMPUTATIONAL COMPLEXITY

by David B. Shmoys(1) & Eva Tardos(2)

This paper is a preliminary version of a chapter to appear in The Handbook of Combinatorics, edited by R.L. Graham, M. Grotschel and L. Lovasz (North-Holland, Amsterdam).

(1) Research was supported in part by an NSF Presidential Young Investigator award CCR-89-96272 with matching support from IBM, UPS and Sun, and by Air Force Contract AFOSR-86-0078.
(2) Research was supported in part by Air Force Contract AFOSR-86-0078.

Table of Contents

1. Introduction
2. Complexity of Computational Problems
   • Computational problems
   • Models of computation
   • Computational resources and complexity
   • Complexity classes
   • Other theoretical models of efficiency
3. Shades of Intractability
   • Undecidability
   • Evidence of intractability: NP-completeness
   • The polynomial-time hierarchy
   • Evidence of intractability: #P-completeness
   • Evidence of intractability: PSPACE-completeness
   • Proof of intractability
   • Extensions of NP: short proofs via randomization
4. Living with Intractability
   • The complexity of approximate solutions
   • Probabilistic analysis of algorithms
   • Cryptography
5. Inside P
   • Logarithmic space
   • The hardest problems in P
   • Parallel computation
6. Attacks on the P versus NP Problem
   • Relativized complexity classes
   • Relating circuit complexity to Turing machine complexity
   • Constant-depth circuits
   • Monotone circuit complexity
   • Machine-based complexity classes

Introduction

The interdependence of combinatorics and complexity theory is illustrated well by a celebrated result of Ajtai, Komlos and Szemeredi (1983), who provide an efficient parallel algorithm to sort numbers by using certain bipartite graphs, known as expanders. There is a beautiful probabilistic proof that these graphs exist: a graph is selected according to a certain probability distribution and then it is shown that the graph selected is an expander with non-zero probability. This proof merely shows the existence of such a graph; it does not show how to construct one. In order for this existential theorem to be useful in an algorithmic setting, we must be able to construct an expander, which raises the following question: how do we compute the description of an expander of a specified order n? There is a trivial way to do the computation, which suggests that we have asked the wrong question: for each graph in the sample space, check by brute force whether it is an expander. Unfortunately, the sample space is large, containing (n!)² graphs, and checking each graph for the expander property is itself a nontrivial process, so for even modest values of n, this procedure is wildly impractical. Thus, a more relevant question is one of efficient computation: does there exist an efficient way to compute the description of an expander of a specified order? We shall see that complexity theory provides the necessary machinery to make this statement more precise. As the previous example makes clear, computational issues have given an additional dimension to combinatorics. The probabilistic proof that expanders exist becomes just a first step in the search for the additional structure needed to provide an efficient construction.
Indeed, one need only examine the table of contents of this volume to see the impact of computational issues on the present direction of combinatorics. Furthermore, complexity theory has often raised combinatorial questions that were previously unsolved, and thus has sparked research in combinatorics. Complexity theory has also helped to clarify the significance of certain combinatorial theorems. For example, consider the following two well-known theorems due to Hall and Camion, respectively: a bipartite graph has a perfect matching if and only if each subset of nodes in one part is adjacent to at least as many nodes in the other; a matrix with 0, +1, and -1 entries is totally unimodular if and only if each nonsingular submatrix has a row with an odd number of nonzero components. Each theorem provides an alternate characterization of the property in question, but, as Edmonds observed, there is a fundamental way in which only the first is a "good" characterization. The perfect matching itself is a concise proof that a particular graph has one, and Hall's theorem provides a way to give an efficiently verifiable proof that a graph does not have one. On the other hand, both the definition and Camion's characterization of total unimodularity show only how to give a concise proof that a matrix is not totally unimodular. Just as complexity theory has provided a means of discussing efficient algorithms, it has also given an analogous formal notion of a good characterization.

Complexity theory has been a rapidly expanding field over the past quarter century, and it would be impossible to include all of its advances. Much of complexity theory is directed towards the formulation of computational models that explain differences between the resources required to solve particular problems, and we will focus on those closely tied to combinatorial examples. For example, completeness results suggest that the problem of deciding who has a winning strategy in certain combinatorial games is computationally even more intractable than deciding if a graph is Hamiltonian. Parallel computers are rapidly becoming a reality, and traditional algorithmic techniques often appear ill-suited to take advantage of their power. As a result, there has been a great deal of work recently in studying algorithmic techniques for solving combinatorial problems in theoretical models of parallel computers. Complexity theory has provided evidence that a wide variety of problems are computationally intractable, but evidence is not nearly as satisfying as proof. Although the fundamental problem of providing this proof remains open, significant steps toward settling this question have been made. Many of these results are based on beautiful combinatorial methods, and it is quite possible that combinatorial methods will play a role in the final solution.
Computational complexity is a theoretical discipline, and the results obtained do not always correspond precisely with practical considerations. When used to sort fewer than a million numbers, the algorithm mentioned above is outperformed by algorithms that are theoretically inferior. Nonetheless, it is the practical importance of understanding the power of computation that forms the driving force behind complexity theory.

Complexity of Computational Problems

In this section, we will outline the essential machinery used to give formal meaning to the complexity of computational problems. This involves describing precisely what is meant by a computational problem, setting up a mathematical model of computation, and then formalizing the notion of the computational resources required for a problem with respect to that model. Unfortunately, there is no single standardized framework in which to discuss these questions. In view of this, one might become concerned that it is possible to prove apparently contradictory results under different specifications of these notions. Therefore, if this theory is to produce meaningful results, it is essential that the definitions be robust enough so that theorems proved with respect to them apply equally to all reasonable variants of this framework. Indeed, the particular definitions that we will rely on will be accompanied by evidence that these notions are sufficiently flexible.

Suppose that there is a directed graph G and you wish to know whether it is Hamiltonian, that is, whether there is a directed circuit that traverses each of its nodes exactly once. If you handed G to your favorite graph theorist, he might cite an appropriate theorem to obtain the correct answer, "yes" or "no". The graph G is a single instance of the Hamiltonian circuit problem. Thus, for any decision problem, of which the Hamiltonian circuit problem is an example, we will be interested in the set of all instances for which the answer is "yes", which will be called the associated language. The computational problem is, given any instance, to decide whether or not this instance is in the language. Making this more formal, we encode all of the instances in some arbitrary, but standardized, way as strings of 0's and 1's. Consider the example above, where the instances to be encoded are directed graphs. There are two natural ways to represent the arcs of G, a directed graph of order n. In the adjacency matrix form, we simply give an n × n 0,1 matrix A = (a_ij) where a_ij = 1 if and only if (i, j) is an arc; the matrix can be represented by a string of n² 0's and 1's, by simply listing the elements of each row, one after another. In the adjacency list form, we give a list of the nodes incident from node i, for each i; this can be encoded by using the binary representation for the names of the nodes from 1 to n and some additional symbols, such as '-' and '/', and representing the list for i by giving the encoding for i and the nodes incident from it separated by '-' symbols, and then concatenating the lists for all nodes, separated by '/' symbols. Notice that it is not necessary to expand the alphabet from {0, 1}, since any larger alphabet could be encoded using just those two symbols. Notice also that the language for a given decision problem depends on the particular encoding selected, and when we refer to a decision problem, we will refer to the language L associated with some natural encoding.
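To make the two encodings concrete, here is a small illustrative sketch (in Python, with hypothetical node names); the separator symbols '-' and '/' would themselves have to be re-encoded over {0, 1} as described above.

    # A small directed graph of order 4, given by its arc set.
    arcs = {(1, 2), (2, 3), (3, 1), (3, 4)}
    n = 4

    # Adjacency matrix encoding: n*n bits, listing the rows one after another.
    matrix_string = "".join(
        "1" if (i, j) in arcs else "0"
        for i in range(1, n + 1) for j in range(1, n + 1)
    )

    # Adjacency list encoding: each node's binary name followed by the binary
    # names of the nodes incident from it, separated by '-', lists joined by '/'.
    def name(i):
        return format(i, "b")

    lists = []
    for i in range(1, n + 1):
        out_neighbors = [name(j) for (a, j) in sorted(arcs) if a == i]
        lists.append("-".join([name(i)] + out_neighbors))
    list_string = "/".join(lists)

    print(matrix_string)  # 0100001010010000 (n*n = 16 bits)
    print(list_string)    # 1-10/10-11/11-1-100/100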
Observe that this first way of encoding the graph may be less compact, since this encoding length is n², whereas the second might be significantly shorter if there are few arcs in G. Note, however, that this difference is limited, in that the length of the input in one format is at most the square of the length in the other. Consider now the following directed reachability problem: given a directed graph G, and two specified nodes s and t, does there exist a path from s to t? An instance of this problem consists of an encoding of G, followed by a suitable encoding of s and t. It is not hard to imagine that no algorithm for this problem that is given its input in the adjacency matrix form can "look at" significantly fewer than all n² bits for all graphs of order n, and still answer correctly for all of these instances. If the input were given in the more compact adjacency list format, it is still possible to design a procedure that always takes time proportional to the length of the input and always produces a correct answer. At first glance it seems that the list format can lead to more efficient algorithms, but, as was true for the length of the encodings, this difference can be easily bounded. Nonetheless, unless otherwise stated, this chapter will use the adjacency list encoding.

The previous example raises another interesting question, which is somewhat tangential to the main focus of this chapter. Is there a sense in which all "reasonable" decision problems on graphs encoded in adjacency matrix form require any algorithm to read some constant fraction of the input? This problem has a long history, both for directed and undirected graphs, and many attempts were made at giving sufficiently strong conditions before an accurate conjecture, due to Aanderaa and Rosenberg, was proved by Rivest & Vuillemin (1976), and later strengthened by Kahn, Saks & Sturtevant (1984). Consider a decision problem L where the instances are undirected graphs, and L has two important properties: (1) monotonicity: if G is an instance in L and G' is formed by adding an edge, then G' is also in L; (2) invariance under isomorphism: if G is an instance in L and its nodes are relabeled to form G', then G' is also in L. If we only count the number of questions of the form, "Is ij an edge?", then for any problem satisfying these two properties, any correct procedure uses essentially n²/4 queries for some graph of order n. In fact, if n is a prime power, Kahn, Saks & Sturtevant have shown that all n(n-1)/2 queries must be asked.

Up to this point we have restricted our attention to decision problems, and this appears to be limiting. A more satisfying answer to an input in the language of the Hamiltonian circuit problem would be the Hamiltonian circuit itself. However, this information can be extracted by asking a series of questions about membership in L.
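A minimal sketch of this extraction, written against a hypothetical membership oracle is_hamiltonian (whose efficient implementation is exactly what is in question), might look as follows.

    def extract_hamiltonian_circuit(nodes, arcs, is_hamiltonian):
        # is_hamiltonian(nodes, arcs) is a hypothetical oracle deciding
        # membership in L; the input graph is assumed to be Hamiltonian.
        remaining = set(arcs)
        for arc in sorted(arcs):
            remaining.discard(arc)
            if not is_hamiltonian(nodes, remaining):
                remaining.add(arc)   # this arc is needed; restore it
        return remaining             # exactly the arcs of one Hamiltonian circuit

In prose, the procedure is as follows.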
Repeat the following for each arc in the graph: delete the arc, ask if the resulting graph is still Hamiltonian, and replace the arc only in the case that the answer is "no". At the end of this procedure, the remaining graph is the circuit. Thus, we have shown that the search problem of finding the circuit is not much harder than the decision problem, and this relationship remains true for all well-formulated decision versions of search problems. Another important type of computational problem is an optimization problem; for example, given G and two nodes s and t, we may wish to find the shortest path from s to t (or merely find its length). Problems of this type can be converted into decision problems by adding a bound b to the instance, and, for example, asking whether G has a path from s to t of length at most b. To see that this is reasonable, notice that the optimal value can be computed by binary search, that is, by asking a series of questions of this form, fixing b to be the midpoint of the current range of values for the optimum, and using the answer to cut the range in half for the next question. As above, we see that an optimization problem can be solved with only somewhat more work than the corresponding decision problem.

Throughout this chapter, we will be dealing with computational problems involving a variety of structures, and we will not be specifying the nature of the encodings used. We will operate on the premise that any reasonable encoding produces strings of length that can be bounded by a polynomial of the length produced by any other encoding. For problems that involve numbers we must be more careful in stating precisely what is meant by reasonable. First, we shall restrict attention to integral data. To highlight the remaining issues, consider the following problem, called the subset sum problem: given a set of numbers S and a target number t, does there exist a subset of S that sums to t? Typically, we will assume that the numbers are given in binary representation, although we will occasionally use a unary representation (e.g., representing 5 by 11111). Notice that the latter could be exponentially bigger than the former, and thus its size is a deceptive measure, since it makes instances of the problem larger than they need be. As we survey the complexity of computational problems that involve numbers, we will see that some are sensitive to this choice of encodings, such as the subset sum problem, whereas others are less affected.

Models of computation

We next turn our attention to defining a mathematical model of a computer. In fact, we will present three different models, and although their superficial characteristics make them appear quite different, they will turn out to be formally equivalent. The first of these is the Turing machine, which is an extremely primitive model, and as a result, it is easier to prove results about what cannot be computed within this model. On the other hand, its extreme simplicity makes it ill-suited for algorithm design; one would never want to program a Turing machine. As a result, it will be convenient to have an alternative model, the random access machine (RAM), within which to discuss algorithms.
A Turing machine consists of a finite set of states Q, a read-only input tape, some number of work tapes, each scanned by its own head, and a transition function δ that selects the new state and the new contents of the cells currently scanned on the work tapes, and indicates the direction in which each head moves one cell, as a deterministic function of the current state and the contents of the input and work tape cells currently being read. One may view the transition function as the program hardwired into this primitive machine. The computation is begun in state q_0, with the input head at the left end of the input, and all of the work tapes and the rest of the input tape blank. The machine halts if δ is undefined for the current state and the symbols read. An input is accepted if the machine halts in a state in the accepting state set A ⊆ Q. A Turing machine M solves a decision problem L if L is the set of inputs accepted by M, and M halts on every input; such a language L is said to be decidable. This definition of a Turing machine is similar to the one given by Turing (1937), and the reader should note that many equivalent definitions are possible. We have defined a Turing machine so that it can only solve decision problems, but this definition can be easily extended to model the computation of an arbitrary function by, for example, adding a write-only output tape, on which to print the output before halting. Although this appears to be a very primitive form of a computer, it has become routine to accept the following proposition.

Church's Thesis: Any function computed by an effective procedure can be computed by a Turing machine.

Although this is a thesis, in the sense that any attempt to characterize the informal notion of effective procedure would destroy its intent, it is supported by a host of theorems, since every known characterization of computable functions has been shown to define only Turing-computable functions.

A random access machine (RAM) is a model of computation that is well-suited for specifying algorithms, since it uses an idealized, simplified programming language that closely resembles the assembly language of any modern-day digital computer. There is an infinite number of memory cells indexed by the integers, and there is no bound on the size of an integer that can be stored in any cell. A program can directly specify a cell to be read from or written in, without moving a head into position. Furthermore, there is an indirect addressing option, which uses the contents of a cell as an index to another cell that is then (seemingly randomly) accessed. All basic arithmetic operations can be performed. For further details on RAMs the reader is referred to Aho, Hopcroft & Ullman (1974). Since the RAM is cast more in the mold of a traditional programming language, it is less natural to think of using a RAM to solve decision problems, but this is just a special case with a simple kind of output, 0 or 1. The third model is a family of Boolean circuits, with one circuit, built from AND, OR and NOT gates, for each input length n; the circuit for length n decides membership for all inputs of that length.

In spite of the apparent differences in these three models, any particular choice is one of convenience, and not of substance.
(2.1) Theorem. The following classes of decision problems are all identical:
• the class of decision problems solvable by a Turing machine;
• the class of decision problems solvable by a RAM;
• the class of decision problems solvable by a family of circuits that can be generated by a Turing machine.

This result is complemented by the following theorem, which is a consequence of the fact that there are uncountably many decision problems, but only countably many Turing machines.

(2.2) Theorem. Not all decision problems are solvable by a Turing machine.

This theorem does not provide a particular problem that is not solvable; it merely proves the existence of such a problem. Particular problems from a wide variety of fields of mathematics are known to have this property. In the next section, we will provide a few problems that are not solved by any Turing machine. One famous example of such a problem is Hilbert's tenth problem. Hilbert asked for an effective procedure to decide if a given multivariate polynomial has an integer-valued root. Culminating years of progress in this area, Matijasevic (1970) proved that there does not exist a Turing machine that solves this decision problem, so that by Church's thesis, there is no such effective procedure.

Consider again the Hamiltonian circuit problem. A nondeterministic Turing machine for it (a machine equipped with an additional read-only guess tape, whose contents are arbitrary) might be constructed by letting the guess tape encode a sequence of n nodes of the graph. The Turing machine simply verifies that each node of the graph appears in the guessed sequence and that there is an arc between each consecutive pair of nodes in the sequence, including the pair formed by the last node and the first. If the graph is Hamiltonian, then there is a correct guess, but otherwise, for any sequence of n nodes, either some node is missing or some consecutive pair is not adjacent. The correct guess, in essence the Hamiltonian circuit, is a certificate that the graph is Hamiltonian. Observe that this definition is one-sided, since the requirements for instances in L and not in L are quite different. What could serve as a "guess" to certify that a graph is not Hamiltonian?

Computational resources and complexity

Now that we have mathematical formulations of both problems and machines, we can describe what is meant by the computational resources required to solve a certain problem. In considering the execution of a deterministic Turing machine, it is clear that the number of transitions before halting corresponds to the running time of the machine on a particular input. In discussing the running time of an algorithm, within any of the models, we do not want to simply speak of particular instances, and so we must make some characterization of the running time for all instances. The criterion that we will focus on for most of this chapter is the worst-case running time as a function of the input size. When we say that a Turing machine takes n² steps, this means that for any input of size n, the Turing machine always halts within n² transitions. Unless otherwise specified, the length of the input will be denoted by n. A more practical alternative might have been to use the "average" running time, but in order to do so, we must characterize an "average" instance.
That is, we must specify a probability distribution from which the input is drawn and then analyze the expected running time when the input is drawn accordingly. By focusing on worst-case analysis, the results do not depend on any such restriction. On the other hand, a theory of algorithms that could handle this sort of probabilistic analysis and still be relatively insensitive to the particulars of the probability distribution would significantly enrich the scope of complexity theory.

In stating running times, we will use the standard order notation: a function f(n) is O(g(n)) if there are constants N and c such that for all n ≥ N, f(n) ≤ cg(n). Thus, rather than saying that the running time of an algorithm is, say, n² + 5n - 17, we say simply that it is O(n²). This simplification makes it possible to discuss complicated algorithms without being overwhelmed by details. Furthermore, for any Turing machine with superlinear running time and any constant c, there exists another Turing machine to solve the same problem that runs c times faster than the original, and that can be constructed by using an expanded work tape alphabet. We will also use a notation analogous to O(·) to indicate lower bounds. A function f(n) is Ω(g(n)) if there are constants N and c such that for all n ≥ N, f(n) ≥ cg(n). A function f(n) is Θ(g(n)) if it is both O(g(n)) and Ω(g(n)). It will also be useful to have a notion of one function being asymptotically smaller than another: f(n) is o(g(n)) if lim_{n→∞} f(n)/g(n) = 0.

For a nondeterministic Turing machine, we define the running time to be the number of transitions in any computation path generated by the input. (Recall that we added the restriction that all computation paths must have the same length.) For a circuit, we will want to characterize the number of operations performed as a measure of time, so that the relevant parameter is the size of the circuit, which is the order of the graph representing it. For a RAM, there are two standard ways in which to count the running time on a particular input. In each operation, a RAM can, for example, add two numbers of unbounded length. In the unit-cost model, this action takes one step, independent of the lengths of the numbers being added. In the log-cost model, this operation takes time proportional to the lengths of the numbers added. These measures can have radically different values. Take 17, and repeatedly square it k times. With each squaring, the number of bits in the binary representation essentially doubles. Thus, although we have taken only k steps in the unit-cost model, the time required according to the log-cost model is exponential in k. Technically, we will use the log-cost model, in order to ensure that the RAM is equivalent to the Turing machine in the desired ways. But, when speaking of the running time of an algorithm, it is traditional to state the running time in the unit-cost model, since for all standard algorithms one can prove that the pathological behavior of the above example does not come into play. Time is not the only computational resource in which we will be interested; we will see that the space complexity of certain combinatorial problems gives more insight into the structure of these problems.
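To make the gap between the two cost measures concrete, the following short sketch repeats the squaring example just given: only k unit-cost operations are performed, yet the log-cost charge, which is proportional to the number of bits involved, roughly doubles at every step.

    x = 17
    k = 10
    for step in range(1, k + 1):
        x = x * x                    # one unit-cost operation
        print(step, x.bit_length())  # the log-cost charge grows with the bit length

After only k = 10 steps the numbers already occupy thousands of bits, even though just ten unit-cost operations have been counted.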
In considering the space requirements, we will focus on the Turing machine model, and will only count the number of cells used on the work tapes of the machine. Furthermore, we will again be interested in the asymptotic worst-case analysis of the space used. As was true for time bounds, the space used by a Turing machine can be compressed by any constant factor. The space used by a nondeterministic Turing machine is the maximum space used on any computation path.

The notion of the complexity of a problem is the order of a given computational resource, such as time, that is necessary and sufficient to solve the problem. If we say that the complexity of the directed reachability problem for a graph with m arcs is Θ(m), this means that there is a (unit-cost RAM) algorithm that has worst-case running time O(m) and there is no algorithm with running time of lower order. Tight results of this kind are extremely rare, since the tremendous progress in the design of efficient algorithms has not been matched, or even approached, by the slow progress in techniques for proving lower bounds on the complexity of these problems in general models of computation. For example, consider the 3-colorability problem: given an undirected graph G with m edges, can the nodes be colored with three colors so that no two adjacent nodes are given the same color; i.e., is χ(G) ≤ 3? The best lower bound is only Ω(m), in spite of substantial evidence that it cannot be solved in time bounded by a polynomial.

In order to study the relative power of particular computational resources, we introduce the notion of a complexity class, which is the set of problems that have a specified upper bound on their complexity. It will be convenient to define the complexity class DTIME(T(n)) to be the set of all languages L that can be recognized by a deterministic Turing machine within time O(T(n)). NTIME(T(n)) denotes the analogous class of languages for nondeterministic Turing machines. Throughout this chapter, it will be convenient to make certain assumptions about the sorts of time bounds that define complexity classes. A function T(n) is called fully time-constructible if there exists a Turing machine that halts after exactly T(n) steps on any input of length n. All common time bounds, such as n log n or n², are fully time-constructible. We will implicitly assume that any function T(n) used to define a time-complexity class is fully time-constructible.

The single most important complexity class is P, the class of problems solvable in polynomial time. Two of the most well-known algorithms, the Euclidean algorithm for finding the greatest common divisor of two integers and Gaussian elimination for solving a system of linear equations, are classical examples of polynomial-time algorithms.
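For concreteness, here is the Euclidean algorithm; the number of division steps is bounded by a constant times the number of digits of the smaller input, so the running time is polynomial in the encoding length.

    def gcd(a, b):
        # Euclidean algorithm: each pair of successive iterations at least
        # halves the second argument, so the number of iterations is
        # O(log min(a, b)), i.e., linear in the binary encoding length.
        while b != 0:
            a, b = b, a % b
        return a

    print(gcd(1071, 462))   # 21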
In fact, Lamé observed as early as 1844 that the Euclidean algorithm was a polynomial-time algorithm. In 1953, von Neumann noted that the running time of an algorithm for the assignment problem "turn[ed] out [to be] a moderate power of n, i.e., considerably smaller than the 'obvious' estimate n!" required for a complete enumeration of the solutions. Edmonds (1965) and Cobham (1965) were the first to introduce P as an important complexity class, and it was through the pioneering work of Edmonds that polynomial solvability became recognized as a theoretical model of efficiency. With only a few exceptions, the discovery of a polynomial-time algorithm has proved to be an important first step in the direction of finding truly efficient algorithms. Polynomial time has proved to be very fruitful as a theoretical model of efficiency, both in yielding a deep and interesting theory of algorithms and in designing efficient algorithms. There has been substantial work over the last 25 years in finding polynomial-time algorithms for combinatorial problems. It is a testament to the importance of this development that much of this handbook is devoted to discussing these algorithms. This work includes algorithms for graph connectivity and network flow (see Chapter 2), for graph matchings (see Chapter 3), for matroid problems (see Chapter 11), for point lattice problems (see Chapter 19), for testing isomorphism (see Chapter 27), for finding disjoint paths in graphs (see Chapter 5) and for problems connected with linear programming (see Chapters 28 and 30).

Another, more technical, reason for the acceptance of P as the theoretical notion of efficiency is its mathematical robustness. Recall the discussion of encodings, where we remarked that any reasonable encoding will have length bounded by a polynomial in the length of another. As a result, any polynomial-time algorithm which expects its input in one form can be converted to a polynomial-time algorithm for the other. In particular, note that the previous discussion of the two different encodings of a graph can be swept aside, and we can assume that the size of the input for a graph of order n is n. Notice further that the informal definition of P given above does not rely on any model of (deterministic) computation. This rash statement is justified by the following theorem.

Theorem. The following classes of decision problems are all equivalent:
• the class of decision problems solvable by a Turing machine in polynomial time;
• the class of decision problems solvable by a RAM in polynomial time under the log-cost measure;
• the class of decision problems solvable by a family of circuits of polynomial size, where the circuit for inputs of size n can be generated by a Turing machine with running time bounded by a polynomial in n.

The importance of the class NP = ∪_k NTIME(n^k) is due to the wide range of important problems that are known to be in the class, and yet are not known to lie in P.
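The nondeterministic algorithm for the Hamiltonian circuit problem described earlier is, in essence, a polynomial-time check of a guessed certificate; a sketch of that verification step (with the guess represented simply as a sequence of nodes) might read as follows.

    def verify_hamiltonian_certificate(nodes, arcs, sequence):
        # Accept exactly when the guessed sequence visits every node once and
        # consecutive nodes (including last-to-first) are joined by arcs.
        if sorted(sequence) != sorted(nodes):
            return False
        m = len(sequence)
        return all((sequence[i], sequence[(i + 1) % m]) in arcs for i in range(m))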
For example, the nondeterministic algorithm given for the Hamiltonian circuit problem is clearly polynomial, and so this problem lies in NP. However, it is not known whether this, or any, problem is in NP \ P, and this is, undoubtedly, the central question in complexity theory.

Open Problem Is P = NP?

For each decision problem L, there is a complementary problem, the complement of L, such as the problem of recognizing non-Hamiltonian graphs. For any complexity class S, let co-S denote the class of languages whose complement is in S. The definition of P is symmetric with respect to membership and non-membership in L, so that P = co-P. NP is very different in this respect. In fact, it is unknown whether the Hamiltonian circuit problem is in co-NP.

Open Problem Is NP = co-NP?

Edmonds (1965) brought attention to the class NP ∩ co-NP, and called problems in this class well-characterized, since there is both a short certificate to show that the property holds, as well as a short certificate that it does not. Edmonds was working on algorithms for non-bipartite maximum matching at the time, and this problem serves as a good example of a problem in this class. If the instance consists of a graph G and a bound k and we wish to know if there is a matching of size at least k, the matching itself serves as a certificate for an instance in L, whereas an odd-set cover serves as a certificate for an instance not in L (see Chapter 3). Note that there is a min-max theorem, Tutte's theorem, characterizing the size of the maximum matching that is at the core of the fact that matching is in NP ∩ co-NP, and indeed min-max theorems often serve this role. As mentioned above, matching is known to be in P, and this raises the following question.

Open Problem Is P = NP ∩ co-NP?

We will also be concerned with complexity classes defined by the space complexity of problems. As for time, let DSPACE(S(n)) and NSPACE(S(n)) denote, respectively, the classes of languages accepted by deterministic and nondeterministic Turing machines within space O(S(n)). We will implicitly assume the following condition for all space bounds used to define complexity classes: a function S(n) is fully space-constructible if there is a Turing machine that, on any input of length n, delimits S(n) tape cells and then halts. Three space complexity classes will receive the most prominent attention:
• L = DSPACE(log n);
• NL = NSPACE(log n); and
• PSPACE = ∪_k DSPACE(n^k).
One might be tempted to add a fourth class, NPSPACE, but we shall see that nondeterminism does not add anything in this case. We will see that the chain of inclusions L ⊆ NL ⊆ P ⊆ NP ⊆ PSPACE holds, and the main thrust of complexity theory is to understand which of these inclusions is proper. At the extremes, a straightforward diagonalization argument due to Hartmanis, Lewis & Stearns (1965) shows that L ≠ PSPACE, and a result of Savitch (1970) further implies that NL ≠ PSPACE, but after nearly a quarter century of further work, these are the only sets in this chain known to be distinct.
The lack of lower bounds with respect to a general model of computation for either space or time complexity has led to the search for other evidence that suggests that lower bounds hold. One such type of evidence might be the implication: if the Hamiltonian circuit problem is in P, then P = NP. NP contains a tremendous variety of problems that are not known to be in P, and so by proving such a claim, one shows that the Hamiltonian circuit problem is a hardest problem in NP, in that any polynomial-time algorithm to solve it would in fact solve thousands of other problems. The principal tool in providing evidence of this form is that of reduction. We will say that a problem L1 polynomial-time reduces to L2 if there exists a polynomial-time computable function f that maps instances of L1 into instances of L2 such that x is a "yes" instance of L1 if and only if f(x) is a "yes" instance of L2. We shall denote this by L1 ∝_p L2.

Consider the stable set problem: given a graph G and a bound k, decide whether α(G) ≥ k; that is, do there exist k pairwise non-adjacent nodes in G? It is easy to see that this problem is in NP, and to show that it is complete, we reduce 3-SAT to it and invoke Theorem (2.5). Given a 3-SAT instance φ, we construct a graph G as follows: for each clause in φ, let there be 3 nodes in G, each representing a literal in the clause, and let these 3 nodes induce a clique (i.e., they are pairwise adjacent); complete the construction by making adjacent any pair of nodes that represent a literal and its negation, and set k to be the number of clauses in φ. If there is a satisfying assignment for φ, pick one literal from each clause that is given the assignment true; the corresponding nodes in G form a stable set of size k. If there is a stable set of size k, then it must have exactly one node in each clique corresponding to a clause. Furthermore, the nodes in the stable set cannot correspond to both a literal and its negation, so that we can form an assignment by setting to true all literals selected in the stable set, and extending this assignment to the remaining variables arbitrarily. This is a satisfying assignment.
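A sketch of this construction, with a 3-SAT instance represented (purely for illustration) as a list of 3-element clauses whose literals are nonzero integers, a negative integer standing for a negated variable:

    def stable_set_instance(clauses):
        # One node per literal occurrence, identified by (clause index, position).
        nodes = [(c, p) for c in range(len(clauses)) for p in range(3)]
        edges = set()
        for c in range(len(clauses)):
            for p in range(3):
                for q in range(p + 1, 3):
                    edges.add(((c, p), (c, q)))            # triangle on each clause
        for c1, clause1 in enumerate(clauses):
            for p1, lit1 in enumerate(clause1):
                for c2, clause2 in enumerate(clauses):
                    for p2, lit2 in enumerate(clause2):
                        if lit1 == -lit2:                  # a literal and its negation
                            edges.add(tuple(sorted([(c1, p1), (c2, p2)])))
        return nodes, edges, len(clauses)                  # the bound k = number of clauses

    # Example: (x1 or x2 or not x3) and (not x1 or x3 or x2).
    nodes, edges, k = stable_set_instance([(1, 2, -3), (-1, 3, 2)])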
This is a characteristic reduction, in that we build gadgets to represent the variables and clause structure within the framework of the new problem. The NP-completeness of two other problems follows immediately: the clique problem, given a graph G and a bound k, decide if there is a clique in G of size k; and the node cover problem, given a graph G and a bound k, decide if there exists a set of k nodes such that every edge is incident to a node in the set. If we restrict the stable set problem to a particular constant value of k (e.g., is α(G) ≥ 100?), then this problem can be solved in polynomial time by enumerating all possible sets of size 100. In contrast to this, Stockmeyer (1973) showed that the 3-colorability problem is NP-complete by reducing 3-SAT to it. Let φ be a 3-SAT formula. We construct a graph G from it in the following way. First, construct a "reference" clique on three nodes, called true, false and undefined; these nodes will serve as a way of naming the colors in any 3-coloring of the graph. For each variable in φ, construct a pair of adjacent nodes, one representing the variable, and one representing its negation, and make them both adjacent to the undefined node. For each clause, l1 ∨ l2 ∨ l3, construct the subgraph shown in Figure 1 below, where the nodes labeled with literals, as well as false (F) and undefined (U), are the nodes already described. It is easy to see that if φ has a satisfying assignment, we can get a proper 3-coloring of this graph as follows: color the nodes corresponding to literals that are true in the assignment with the same color as is given node true; color analogously the nodes for false literals; and then extend this coloring to the remaining nodes in a straightforward manner. Furthermore, it involves only a little case-checking to see that if the graph is 3-colorable, then the colors can be interpreted as a satisfying assignment.

Figure 1

Integer programming provides another example: given a system of linear inequalities with integer coefficients, decide whether there is an integral solution satisfying all of them. To reduce 3-SAT to this problem, introduce for each literal an integer variable bounded between 0 and 1, and for each Boolean variable x, constrain the sum of the variables corresponding to x and its negation to be at most 1. The construction is completed by adding a constraint for each clause that forces the variables for the literals in the clause to sum to at least 1. On the other hand, to show that the problem is in NP requires more work, involving a calculation that bounds the length of a "smallest" solution satisfying the constraints (if one exists at all).

The subset sum problem defined previously is also NP-complete. This is the first problem that we have encountered that is a "number problem", in the sense that it is not the combinatorial structure, but rather the numbers, that makes this problem hard. If the numbers are given in unary, there is a polynomial-time algorithm (such an algorithm is called pseudo-polynomial): keep a (large, but polynomially bounded) table of all possible sums obtainable using a subset of the first i numbers; this is trivial to do for i = 1, and it is not hard to efficiently find the table for i + 1 given the table for i; the table for i = n gives us the answer. There are "number problems" that remain NP-complete even if the numbers are encoded in unary; such problems are called strongly NP-complete. One example is the 3-partition problem: given a set S of 3n integers that sum to nB, does there exist a partition of S into n sets, T1, ..., Tn, where each Ti has 3 elements that sum to B?
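A sketch of the pseudo-polynomial algorithm for the subset sum problem described above; the set of attainable sums has at most t + 1 elements, which is polynomial in the magnitude of the numbers (i.e., in the length of a unary encoding) but may be exponential in the length of their binary encoding.

    def subset_sum(numbers, t):
        # attainable holds all sums obtainable from the numbers processed so far.
        attainable = {0}
        for x in numbers:
            attainable |= {s + x for s in attainable if s + x <= t}
        return t in attainable

    print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True, since 4 + 5 = 9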
Consider a weighted generalization of the exact matching problem, where each edge of a bipartite graph is given a weight w_ij and we wish to decide if there is a perfect matching of total weight exactly W. It is easy to see that the randomized algorithm given for the unweighted case extends to the weighted case if the weights are given in unary. (Generalize the previous construction to have a factor of y^w for each edge of weight w.) On the other hand, for weights in binary, there is a straightforward reduction from the subset sum problem: build a disjoint copy of K_{2,2} for each number s in our subset sum instance, where the weights are such that one perfect matching of the subgraph has weight s, whereas the other has weight 0. It is easy to see that the union of these subgraphs has a perfect matching of weight W if and only if some subset of the numbers of the subset sum instance totals exactly W.

The Complexity Class NP \ P?

(3.3) Theorem. If L1 is decidable and L1 ∉ P, then there exists a decidable language L2 such that L2 ∉ P and L2 ∝_p L1, but L1 does not polynomial-time reduce to L2.

Note that if L1 is NP-complete, the language L2 is "in between" the classes P and NP-complete if P ≠ NP, and by repeatedly applying the result we see that there is a whole range of seemingly different complexity classes. Under the assumption that P is different from NP, do we know of any candidate problems that may lie in this purgatory of complexity classes? The answer to this is "maybe". We will give four important problems that have not been shown to be either in P or NP-complete. The problem for which there has been the greatest speculation along these lines is the graph isomorphism problem: given a pair of graphs G1 = (V1, E1) and G2 = (V2, E2), decide if there is a bijection σ : V1 → V2 such that ij ∈ E1 if and only if σ(i)σ(j) ∈ E2. Later in this section we will provide evidence that it is not NP-complete, and efforts to show that it is in P have so far fallen short (see Chapter 27 of this volume). A problem that mathematicians since the ancient Greeks have been trying to solve is that of factoring integers; a decision formulation that is polynomially equivalent to factoring is as follows: given an integer N and a bound k, does there exist a nontrivial factor of N that is at most k? A problem that is no harder than factoring is the discrete logarithm problem: given a prime p, a generator g and a natural number x < p, find l such that g^l ≡ x (mod p). Finally, there is the shortest vector problem, where we are given a collection of integer vectors, and we wish to find the shortest vector (in the Euclidean norm) that can be represented as a non-zero integral combination of these vectors. Here, current evidence makes it seem likely that this problem is really NP-complete; related work is discussed elsewhere in this volume (see Chapter 19).

It is important to mention that there is a significant subclass of NP which may also fall in this presumed gap. Edmonds' class of well-characterized problems, NP ∩ co-NP, certainly contains P and is contained in NP. Furthermore, unless NP = co-NP, it cannot contain any NP-complete problem. On the other hand, the prevailing feeling is that showing a problem to be in this class is a giant step towards showing that the problem is in P. A result of Pratt (1975a) shows that primality is in NP, so it lies in NP ∩ co-NP, though as discussed above, there is additional evidence that it lies in P.
The factoring problem, which appears to be significantly harder, also lies in NP ∩ co-NP, since a prime factorization can be guessed along with a certificate that each of the factors is indeed prime. One interesting open question connected with NP ∩ co-NP concerns the existence of a problem that is complete for this class. One might hope that there is some natural problem that completely characterizes the relationship of NP ∩ co-NP with P and NP (in the same manner that 3-SAT characterizes the P = NP question).

One approach to shed light on the complexity of a problem that is not known to be either in P or NP-complete has been to consider weaker forms of completeness for NP. In fact, Cook's original notion of completeness, though technically a weaker definition of intractability, is no less damning. The reduction L1 ∝_p L2 can be thought of as solving L1 by a restricted kind of subroutine call, where first some polynomial-time preprocessing is done, and then the subroutine for L2 is called once. Cook (1971) proposed a notion of reducibility where L1 is solved by using a polynomial-time Turing machine that can, in one step, get an answer to any query of the form, "is x ∈ L2?" Such a more general Turing machine is called an oracle machine, and in the next subsection, we will discuss Turing machines and complexity classes relative to an oracle A in more detail. Note that any complete language L with respect to this reducibility still has the property that L ∈ P if and only if P = NP. Karp (1972) focused attention on the notion ∝_p, and was able to show that NP-completeness was powerful enough to capture a wide range of combinatorial problems. On the other hand, it remains an open question to show that Cook's notion of reducibility is stronger than Karp's; is there a natural problem in NP that is complete with respect to "Cook" reducibility, but not with respect to "Karp" reducibility? Adleman & Manders (1977, 1979) showed that non-determinism and randomization can play a role in defining notions of reducibility. The first notion, γ-reducibility, used nondeterminism in order to provide a stronger notion of a complete problem that still is not in NP ∩ co-NP unless NP = co-NP. A randomized notion of completeness was introduced in order to prove results of the form that L is in RP if and only if RP = NP. They also proved completeness results for several number-theoretic problems in NP that are not known to be NP-complete.

The polynomial-time hierarchy

It is important to realize that not all natural combinatorial problems lie in NP or co-NP. For example, consider the following generalized coloring problem: given input graphs G and H, can we color the nodes of G with two colors so that the graph induced by each color does not contain H as a subgraph? Given the guess of a coloring, there is an exponential number of possible subgraphs that all must be checked, so at least the most obvious approach does not work. Is there a more sophisticated approach that puts this problem or its complement in NP, or is this difficulty inherent? One can view NP as the class of languages L that can be expressed in the following way: there exists a language L' ∈ P such that x ∈ L if and only if ∃y of length bounded by a polynomial in the length of x such that (x, y) ∈ L'.
(We will denote this polynomially bounded quantification by ∃_p.) By writing NP in this way, we are led to a generalization that does contain the above problem. Merely allow an alternation of quantifiers: (G, H) has such a coloring if ∃_p a bipartition such that ∀_p subgraphs G' of one part of G, G' is not equal to H. (Note that the specification of G' implicitly contains a bijection to the nodes of H.) Of course, one need not stop with one alternation of quantification; this motivates a hierarchy of classes, called the polynomial-time hierarchy:

Σ_k^p = {L : ∃L' ∈ P such that x ∈ L if and only if ∃_p y_1 ∀_p y_2 ··· Q_k y_k (x, y_1, y_2, ..., y_k) ∈ L'},

where Q_k is ∃_p if k is odd, and ∀_p if k is even. Thus, generalized coloring is in Σ_2^p. Furthermore, note that Σ_0^p = P and Σ_1^p = NP. We also define Π_k^p to be co-Σ_k^p. Clearly, Σ_k^p ∪ Π_k^p ⊆ Σ_{k+1}^p. Alternatively, it is possible to define the polynomial-time hierarchy in terms of oracles, as was done in the original formulation by Meyer & Stockmeyer (1972). An oracle Turing machine has a special additional tape, called a query tape, and three special states, q_ask, q_no and q_yes. The oracle Turing machine M^A can ask questions of the form, "is x in A?" by writing x on the query tape, and entering the state q_ask. The machine enters state q_yes or q_no depending on the outcome of the question. Let NP^A denote the class of languages recognized by nondeterministic polynomial-time oracle Turing machines with access to an oracle for the language A, and in general, any complexity class C has its relativized analogue C^A. The notion of the space used by an oracle Turing machine is ambiguous, since it is not clear whether the space used on the query tape should be counted. We shall assume that it is, though this has the disadvantage that C^A might not contain A. Finally, let NP^C denote the union of NP^A over all A ∈ C. Consider again the generalized coloring problem; it is not hard to see that there is a nondeterministic polynomial-time Turing machine to solve it, given an oracle for the following problem A: given G and H, decide if H is a subgraph of G. Since A ∈ NP, we see that this coloring problem is in NP^NP. In fact, Σ_2^p = NP^NP and in general, Σ_{k+1}^p = NP^{Σ_k^p}. Unfortunately, for each new complexity class, there is yet another host of unsettled questions.

Open Problem For each k ≥ 1, is Σ_{k-1}^p = Σ_k^p?

Open Problem For each k ≥ 1, is Σ_k^p = Π_k^p?
Contained in these, for k = 1, are the P = NP and NP = co-NP questions, and as was true for those questions, one might hope to find complete problems on which to focus attention in resolving these open problems. We define these notions of completeness with respect to polynomial-time reducibility, so that L is complete for C if and only if L ∈ C and all problems in C reduce (∝_p) to it. As we shall see, the polynomial-time hierarchy has helped to provide structure to a number of complexity classes. Perhaps the first result along these lines is due to Sipser (1983b), who used a beautiful "hashing function" technique to show that BPP is in the polynomial-time hierarchy, and in fact, can be placed within Σ_2^p ∩ Π_2^p.

Evidence of intractability: #P-completeness

Consider the counting problem of computing the number of perfect matchings in a bipartite graph. If this is cast as a decision problem, it does not appear to be in NP, since it seems that the number of solutions can only be certified by writing down possibly exponentially many matchings. However, consider modifying the definition of NP to focus on the number of good certificates, rather than the existence of one; let #P be the class of problems for which there exists a language L' ∈ P and a polynomial p(n) such that for any input x, the only acceptable output is z = |{y : |y| = p(|x|) and (x, y) ∈ L'}|. Clearly, the problem of counting perfect matchings in a bipartite graph is in #P. Any problem in #P can be computed using polynomial space, since all possible certificates y may be tried in succession.

We can also define a notion of a complete problem for #P. To do this, we use a reducibility analogous to that used by Cook. Let P_i, i = 1, 2, denote counting problems, where for an instance x the number of solutions is P_i(x). The problem P_1 reduces to P_2 if there exists a polynomial-time Turing machine that can compute P_1 if it has access to an oracle that, given x, can compute P_2(x) in one step. We define a counting problem to be #P-complete if it is in #P, and all problems in #P reduce to it. A weaker notion of reducibility, analogous to the notion used by Karp, is called parsimonious: an instance x of P_1 is mapped in polynomial time to f(x) for P_2, such that P_1(x) = P_2(f(x)). For either notion of reducibility, we see that any polynomial-time algorithm for P_2 yields a polynomial-time algorithm for P_1. It is not hard to see that the proof of Cook's theorem shows that the problem of computing the number of satisfying assignments of a Boolean formula in conjunctive normal form is #P-complete. Furthermore, by being only slightly more careful, it is easy to give parsimonious modifications of the reductions to the clique problem, the Hamiltonian circuit problem, and seemingly any NP-complete problem, and so the counting versions of NP-complete problems can be shown to be #P-complete. Surprisingly, not all #P-complete counting problems need be associated with an NP-complete problem. Computing the number of perfect matchings in a bipartite graph (or, equivalently, computing the permanent of a 0,1 matrix) is a counting version of a problem in P, the perfect matching problem, and yet Valiant (1979) showed that this problem is #P-complete. This made it possible to prove a variety of counting problems to be #P-complete that are related to polynomially solvable decision problems.
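To make the connection between the two formulations concrete, here is a brute-force sketch that counts the perfect matchings of a bipartite graph given by its 0,1 biadjacency matrix, i.e., computes the permanent; its running time is exponential, consistent with the completeness result just cited.

    from itertools import permutations

    def permanent(a):
        # a is an n-by-n 0,1 matrix; each permutation contributing 1 corresponds
        # to a perfect matching of the associated bipartite graph.
        n = len(a)
        return sum(
            1
            for sigma in permutations(range(n))
            if all(a[i][sigma[i]] == 1 for i in range(n))
        )

    print(permanent([[1, 1, 0],
                     [0, 1, 1],
                     [1, 0, 1]]))   # 2 perfect matchings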
There are many problems that are essentially #P-complete, although they do not have the appearance of a counting problem. An important practical example is the problem of computing the reliability of a network: given a graph G with a source s and a sink t, and probabilities p(e) that edge e will fail, what is the probability that s becomes disconnected from t? In practice, it may be more important to estimate this value, rather than determining it precisely, and work of Karp & Luby (1985) provides algorithms to do this. Another problem that has been shown to be intimately connected with the estimation of the value of counting problems is that of uniformly generating combinatorial structures. As an example, suppose that a randomized algorithm requires a perfect matching in a bipartite graph G to be chosen uniformly from the set of all perfect matchings in G; how can this be done? Jerrum, Valiant & Vazirani (1986) have shown that a relaxed version of this problem, choosing perfect matchings that are selected with probability within an arbitrarily small constant factor of the uniform value, is equivalent to the problem of estimating the number of perfect matchings (using a randomized algorithm). In fact, their result carries through to most counting/generation problems related to problems in NP, since it requires only a natural self-reducibility property, which says that an instance can be solved by handling some number of smaller instances of the same problem. Stockmeyer (1985) has provided insight into estimating the value of #P problems, by trying to place these problems within the polynomial-time hierarchy. Using Sipser's hashing function technique, Stockmeyer showed that for any problem in #P and any fixed values $\epsilon, d > 0$, there exists a polynomial-time Turing machine with a $\Sigma_2^p$-complete oracle that computes a value that is within a factor of $(1 + \epsilon n^{-d})$ of the correct answer. (A recent result of Toda gives further evidence of the intractability of this class, by showing that $PH \subseteq P^{\#P}$.)

Evidence of intractability: PSPACE-completeness

In this subsection, we turn to the question of the space complexity of problems. When we discuss space complexity, we may assume that the Turing machine has only one work tape (and a separate input tape), since by expanding the tape alphabet, any number of tapes can be simulated by one tape without using more space. It is not hard to see that PSPACE must contain the entire polynomial-time hierarchy. For any $L \in \Sigma_k^p$, there is an $L' \in P$ such that $x \in L$ if and only if $\exists^p y_1\, \forall^p y_2 \cdots Q_k y_k\ (x, y_1, y_2, \ldots, y_k) \in L'$, and so all possible $y_1, \ldots, y_k$ can be tried in succession, to see if the condition specified by the alternating quantifiers is satisfied. Similarly, PSPACE contains #P, since all computation paths of a polynomial-time counting machine may be tried in succession. We remarked when introducing PSPACE that it was not an oversight that NPSPACE was not defined, since PSPACE = NPSPACE. This result is a special case of the following.

(3.4) Theorem. (Savitch) If L is accepted by a nondeterministic Turing machine using space $S(n) \geq \log n$, then it is accepted by a deterministic Turing machine using space $S(n)^2$.

Proof: The proof is based on the idea of bounding the number of configurations of the Turing machine and using a natural divide-and-conquer strategy.
In any given computation, a nondeterministic Turing machine M can be completely described by specifying the input head position, the contents of the work tape, the work tape head position and the current state. If M uses $S(n)$ space, then there are no more than $d^{S(n)}$ configurations, for some constant d, and any one of them can be written down in $O(S(n))$ space. Consequently, we can assume that M halts within $d^{S(n)}$ steps, since we can keep a counter of the number of steps taken and halt if that many are done. To simulate M by a deterministic machine, we build a procedure that recursively calls itself to check, for any configurations I and K and a bound l, whether M, when started in I, can reach K within $2^l$ steps. There is such a computation if there exists a midpoint of the computation, J, such that I can lead to J and J can lead to K, each within $2^{l-1}$ steps. The existential quantifier can be implemented by merely trying all possible J in some specified order. The basis of this recursion is the case $l = 0$, where we merely need to know if $I = K$, or if one transition leads from I to K, and this can be easily checked. In implementing this procedure, we need to keep track of the current midpoint at each level of the recursion, and so we need $l \cdot S(n)$ space to do this. Thus, by running the procedure with the initial configuration and every accepting configuration with $l = S(n) \log d$, we get a deterministic procedure that accepts L and uses space $O(S(n)^2)$. □

The idea of "guessing a midpoint" is also the main idea used to derive a PSPACE-complete problem, where completeness is defined with respect to polynomial-time reductions. The problem of the validity of quantified Boolean formulae is as follows: given a quantified Boolean formula in prenex form, $\exists x_1 \forall x_2 \cdots Q_k x_k\ \phi(x_1, \ldots, x_k)$, decide whether it evaluates to true. There are many other PSPACE-complete problems known, and most of these have a game-like flavor. An example of a more natural PSPACE-complete game is the Shannon switching game: given a graph G with two nodes s and t, two players alternately color the edges of G, where the first player, coloring with red, tries to construct a red path from s to t, whereas the second player, coloring with blue, tries to construct a blue (s,t)-cut; does the first player have a winning strategy for G? Another problem that has a simple PSPACE-completeness proof deals with determining if two nodes are connected in a directed graph G (of order $2^n$) that is given by a circuit with 2n inputs, where the first n inputs specify in binary a node i and the second n specify a node j, and the output of the circuit is 1 if and only if (i,j) is an arc of G. If L is accepted by M in polynomial space, L can be reduced to this circuit-based directed reachability problem by letting the nodes of G correspond to configurations of M, where arcs (i,j) correspond to possible transitions of M. It is easy to construct in polynomial time a (polynomial-size) circuit that decides if one configuration follows another by one transition of M. Since it is easy to massage M so that it has only one accepting configuration, we have reduced the acceptance of the input x by M to whether the initial configuration on input x is connected to the accepting configuration.
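The midpoint recursion at the heart of Savitch's theorem can be illustrated by the following Python sketch, which runs on an explicitly given directed graph of configurations; the toy graph and the function names are illustrative assumptions, and an actual simulation would generate successor configurations of the Turing machine on the fly rather than store the graph.

    def reach(succ, configs, i, k, l):
        """Can configuration i reach configuration k in at most 2**l steps?
        succ(c) gives the configurations reachable from c in one step; the
        recursion stores one midpoint per level, mirroring the l * S(n)
        space bound in the proof (here the graph is explicit for clarity)."""
        if l == 0:
            return i == k or k in succ(i)
        # Guess a midpoint j of a computation of length 2**l by trying them all.
        return any(reach(succ, configs, i, j, l - 1) and
                   reach(succ, configs, j, k, l - 1) for j in configs)

    # Toy configuration graph 0 -> 1 -> 2 -> 3 (assumed for illustration).
    edges = {0: {1}, 1: {2}, 2: {3}, 3: set()}
    print(reach(edges.get, list(edges), 0, 3, 2))   # True: a path of length 3 <= 2**2
    print(reach(edges.get, list(edges), 3, 0, 2))   # False: no path back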
The role of games in PSPACE-completeness suggests a new type of Turing machine, called an alternating Turing machine, which was originally proposed by Chandra, Kozen & Stockmeyer (1981). Consider a computation to be a sequence of moves made by two players, an existential player and a universal player. The current state indicates whose move it is, and in each configuration the specified player has several moves from which to choose. Each computation path either accepts or rejects the input. The input is accepted if the existential player has a winning strategy, that is, if there is a choice of moves for the existential player so that, for any choice of moves by the universal player, the input is accepted. For simplicity, assume again that each computation path has the same number of moves. As before, the time to accept an input x is this number of moves, and the space needed to accept x is the maximum space used on any computation path. Observe that a nondeterministic machine is an alternating machine where only the existential player moves. It is easy to see that the lth level of the polynomial-time hierarchy can be defined in terms of polynomial-time alternating Turing machines where the number of "alternations" between existential and universal quantifiers is less than l. The role that PSPACE-complete games played in motivating the definition of these machines suggests that alternating polynomial time is closely related to PSPACE, and indeed this is just a special case of a general phenomenon. Let ATIME(T(n)) and ASPACE(S(n)) denote the classes accepted by an alternating Turing machine with $O(T(n))$ time and $O(S(n))$ space, respectively. Chandra, Kozen & Stockmeyer proved two fundamental results characterizing the relationship of alternating classes to deterministic and nondeterministic ones. Note that the first result implies Savitch's theorem, and in fact, the proof of the last inclusion of Theorem (3.5) is very similar to the proof of Savitch's theorem.

(3.5) Theorem. If $T(n) \geq n$, then $ATIME(T(n)) \subseteq DSPACE(T(n)) \subseteq NSPACE(T(n)) \subseteq ATIME(T(n)^2)$.

(3.6) Theorem. $ASPACE(S(n)) = \bigcup_{c>0} DTIME(c^{S(n)})$.

Among the consequences of these results, we see that AP = PSPACE. In fact, one can view an alternating Turing machine as an extremely idealized parallel computer, since it can branch off an unbounded number of parallel processes that can be used in determining the final outcome. Therefore, one can consider these results as a proven instance of the parallel computation thesis: parallel time equals sequential space (up to a polynomial function).

Proof of intractability

It is, of course, far preferable to prove that a problem is intractable, rather than merely giving evidence that supports this belief. Perhaps the first natural question is, are there any decidable languages that require more time than $T(n)$? The diagonalization techniques used to show that there are undecidable languages can be used to show that for any Turing computable $T(n)$, there must exist such a language; consider the language L of all i such that if i is run on the Turing machine $M_i$ that i encodes, either it is rejected, or it runs for more than $T(n)$ steps. It is easy to see that L is decidable, and yet no Turing machine that always halts within $T(n)$ steps can accept L. Using our stronger assumption about $T(n)$ (full time-constructibility), we are able to define such a language that is not only decidable, but can be recognized within only slightly more time than $T(n)$. The additional logarithmic factor needed in Theorem (3.7), which combines results of Hartmanis & Stearns (1965) and Hennie & Stearns (1966), is due to the fact that the fastest known simulation of a (multitape) Turing machine by a machine with some constant number of tapes slows down the machine by a logarithmic factor.

(3.7) Theorem. If $\liminf_{n \to \infty} \frac{T_1(n) \log T_1(n)}{T_2(n)} = 0$, then there exists $L \in DTIME(T_2(n)) \setminus DTIME(T_1(n))$.
Since any multitape machine can be simulated by a 1-tape machine without using any additional space, the corresponding space hierarchy theorem does not require the logarithmic factor. Seiferas, Fischer & Meyer (1978) have proved an analogous, but much more difficult, hierarchy theorem for nondeterministic time.

(3.8) Theorem. If $\liminf_{n \to \infty} \frac{T_1(n+1)}{T_2(n)} = 0$, then there exists $L \in NTIME(T_2(n)) \setminus NTIME(T_1(n))$.

Although these theorems are non-constructive, Meyer & Stockmeyer (1972) developed the following strategy that makes use of completeness results in order to prove lower bounds on particular problems. Intuitively, a completeness result says that a problem is a "hardest" problem for some complexity class, and Theorems (3.7) and (3.8) can be used to show that certain complexity classes have provably hard problems. Consequently, these two pieces together imply that the complete problem is provably hard.

As an example, consider the circuit-based large clique problem, $L_{cc}$, which is the analogue of the circuit-based directed reachability problem: it tests whether the (compactly represented) input graph on $N = 2^n$ nodes has a clique of size $N/2$. This problem is complete for the class $NEXPTIME = \bigcup_k NTIME(2^{n^k})$ with respect to polynomial-time reducibility. This can be seen by introducing the circuit-based version of satisfiability; a formula is represented by a polynomial-size circuit that outputs 1 whenever the first set of inputs gives the binary representation of the index of a variable that occurs in the clause specified by the remainder of the inputs. The generic reduction of Cook's theorem translates exactly to show that circuit-based satisfiability is NEXPTIME-complete, and the completeness of $L_{cc}$ follows by using essentially the same reduction used to show the NP-completeness of the ordinary clique problem. Given this completeness result, it is easy to prove the following theorem.

(3.9) Theorem. There exists a constant $c > 0$ such that $L_{cc} \notin NTIME(2^{n^c})$.

Proof: Let L be the language in $NTIME(2^{n^2}) \setminus NTIME(2^n)$ specified by Theorem (3.8). Since $L \in NEXPTIME$, L reduces in polynomial time to $L_{cc}$. Let $n^k$ be the time bound for this reduction. Choose $c = 1/k$, and now suppose that $L_{cc} \in NTIME(2^{n^c})$. We can decide if a string x of length n is in L by first applying the reduction to get a string of length at most $n^k$, and then using the Turing machine that decides $L_{cc}$ within the assumed bound.
This composite nondeterministic procedure runs in time $n^k + 2^{(n^k)^c} = O(2^n)$, and this contradiction proves the theorem. □

One interpretation of this result is that there exist graphs specified by circuits such that proofs that these graphs have a large clique require an exponential number of steps (in terms of the size of the circuit). Observe that we would have been able to prove a stronger result if we had had a better bound on the length of the string produced by the reduction. Lower bounds proved using this strategy are often based on completeness with respect to reducibilities that are further restricted to produce an output of length bounded by a linear function of the input length. This strategy has been applied primarily to problems from logic and formal language theory. A result of Fischer & Rabin (1974) that contrasts nicely with Theorem (3.1) concerns $L_{PA}$, the language of all provable sentences in the theory of arithmetic for the natural numbers without multiplication, which was shown to be decidable by Presburger (1929).

(3.10) Theorem. There is a constant $c > 0$ such that $L_{PA} \notin NTIME(2^{2^{cn}})$.

A representative sample of problems treated in this fashion is surveyed by Stockmeyer (1987).

Extensions of NP: short proofs via randomization

In the same way that randomized algorithms give rise to an extended notion of efficiently computable, randomized proofs give rise to an extension of NP, the class of languages for which membership is efficiently provable. Randomized proofs can be thought of as efficiently proving membership in a language by giving overwhelming statistical evidence, rather than proving with certainty. In contrast to the extensions of NP discussed previously, this notion will give rise to complexity classes much closer to the spirit of NP, and can be thought of as being "just above NP". In creating this branch of complexity theory, Goldwasser, Micali & Rackoff (1985) and Babai (1985) have given definitions to capture related notions of proof based on statistical evidence. We will give a brief overview of this area; the interested reader is directed to the surveys by Goldwasser (1989) and Johnson (1988) or the introduction to the paper by Babai & Moran (1988).

Consider the following example: Arthur has two graphs $G_1$ and $G_2$, and Merlin wants to convince him that the two graphs are not isomorphic. Merlin can do this in the following way: Merlin asks Arthur to choose one of the two graphs, randomly relabel its nodes, and show him this random isomorphic copy of the chosen graph; Merlin will then tell which graph Arthur chose. If the two graphs are isomorphic, then Merlin has only a fifty percent chance of naming the right graph, assuming that he cannot read Arthur's mind. If Merlin can successfully repeat this test several times, then Arthur can fairly safely conclude that Merlin can distinguish isomorphic copies of the two graphs; in particular, the two graphs are not isomorphic.

The above randomized proof is a prime example of the interactive proof defined by Goldwasser, Micali & Rackoff (1985). An interactive protocol consists of two Turing machines: the Prover (Merlin) and the Verifier (Arthur). The Prover is trying to convince the Verifier that a certain word x is in a language L. The Prover has unrestricted power; the Verifier is restricted to be a randomized polynomial-time Turing machine.
The two machines have a common input tape, and each has its own private randomizing tape and work tapes. The two machines operate in turns; after each machine's turn, it writes its output on a special communication tape, and the other machine starts its turn by reading this message. At the end of the protocol, the Verifier announces whether or not it accepts x. A language L has an interactive proof system if there exists an interactive protocol such that
• if $x \in L$, then the Verifier accepts with probability at least 2/3 (i.e., if $x \in L$, then the Prover can "prove this");
• if $x \notin L$, then for every Prover the Verifier accepts with probability at most 1/3 (i.e., no Prover can fool the Verifier too often).
The number of messages exchanged is a natural measure of the resources used by such a protocol; for a polynomially bounded function t(n), let IP(t(n)) denote the class of languages that have an interactive proof system exchanging at most t(n) messages on inputs of length n. Babai (1985) defined the closely related Arthur-Merlin games, interactive protocols in which Arthur (the Verifier) has no private coins: each of Arthur's messages consists simply of the outcomes of his coin tosses, and at the end of the game a deterministic polynomial-time computation of Arthur on the transcript decides acceptance, with the same 2/3 and 1/3 probability requirements. A fixed sequence of moves is also denoted by the corresponding string, such as MA or MAM; more generally, let AM(t(n)) (respectively, MA(t(n))) denote the class of languages accepted by an Arthur-Merlin game of t(n) moves in which Arthur (respectively, Merlin) moves first, and let AM abbreviate AM(2). Finally, let IP denote $\bigcup_k IP(n^k)$; the union $\bigcup_k AM(n^k)$ is defined analogously.

One might expect a natural relationship between the levels of the Arthur-Merlin hierarchy and the polynomial-time hierarchy. The simplified proof that $BPP \subseteq \Sigma_2^p \cap \Pi_2^p$ due to Lautemann (1983) has been extended by Babai (in the case of constant t(n)) and Aiello, Goldwasser & Hastad (1986) (in general) to show that $AM(t(n)) \subseteq \Pi_{t(n)}^p$ for $t(n) \geq 2$. It would be natural to conjecture that the hierarchies for both notions of randomized proofs are strict, and that AM(k) is strictly contained in IP(k) for every k. (For example, in the graph non-isomorphism proof system given above, note how important it is that Arthur's coins be private.) The first surprise in this area was Babai's proof that the finite levels of the Arthur-Merlin hierarchy collapse.

(3.11) Theorem. $AM(t(n) + 1) = AM(t(n))$ for every polynomially bounded $t(n) \geq 2$.

To prove this result, we will decrease the number of alternations by interchanging a move of Arthur with a move of Merlin. Consider a move of Merlin followed by a move of Arthur. If we simply ask Arthur to move first, this gives Merlin an unfair advantage that may be sufficient to turn around the odds of the game. To decrease this advantage, we shall use the standard trick of letting Arthur and Merlin play several games in parallel, and choosing the majority decision. Consider again the consecutive moves of Merlin and Arthur in the original game. At Arthur's move, we split the game into several parallel games by allowing Arthur to make several alternate (random) moves. It can be shown that by interchanging the (unsplit) move of Merlin with the (split) move of Arthur, we obtain a new game that defines the same language. In essence, if Merlin's winning probability was small enough, then even after seeing (ahead of time) the several next moves of Arthur, it is not too likely that he has a single move that is good for the majority of Arthur's choices.

It was an even bigger surprise when Goldwasser & Sipser (1986) proved that interactive proofs and Arthur-Merlin games define the same complexity classes.

(3.12) Theorem. $IP(t(n)) = AM(t(n))$ for every polynomially bounded function $t(n) \geq 2$.
The idea of the proof can be outlined as follows. Instead of playing the game given by the interactive protocol, Merlin can try to convince Arthur that he could win for a majority of Arthur's random choices. This approach requires a technique to prove a lower bound on the size of an exponentially large set. To give an idea of how to provide such a lower bound, consider the following problem. Consider a language $L \in NP$ that consists of pairs (x, y) of strings of equal length where, for example, x can be viewed as the common input, and y as Arthur's random coin tosses. Define $L(x) = \{y : (x,y) \in L\}$, and suppose that Merlin wants to convince Arthur that $|L(x)| \geq f$ for some integer f. This problem might be #P-complete. However, the fraction of random strings for which the Prover can win in an interactive proof is either at least 2/3 or at most 1/3. Therefore, it might be useful to achieve the following, more modest goal: for any fixed $\epsilon$, prove that there exists an AM protocol in which Merlin can almost certainly win if $|L(x)| \geq (1 + \epsilon)f$ and has almost no chance to win if $|L(x)| < f$. Goldwasser & Sipser have given such a protocol by using a hashing function technique similar to the one used to prove that $BPP \subseteq \Sigma_2^p \cap \Pi_2^p$.

To see how this lower bounding technique can be used to turn an interactive proof into an Arthur-Merlin game, we consider the graph non-isomorphism problem. The following Arthur-Merlin protocol is an adaptation of the protocol given by the Goldwasser & Sipser proof when applied to the graph non-isomorphism protocol given at the beginning of this subsection. Let Aut(X) denote the automorphism group of the graph X (see Chapter 27), and let $(G_1, G_2)$ denote the input. We may assume that both graphs are connected. Furthermore, let G denote the graph that consists of node-disjoint copies of $G_1$ and $G_2$. If the two graphs are isomorphic then $|Aut(G)| = 2 \cdot |Aut(G_1)| \cdot |Aut(G_2)|$, and otherwise, $|Aut(G)| = |Aut(G_1)| \cdot |Aut(G_2)|$. The non-isomorphism of the two graphs can be shown by proving fairly tight lower bounds for $|Aut(G_i)|$ for i = 1, 2, and an upper bound for $|Aut(G)|$. The upper bound can be shown by providing a lower bound on the number of different isomorphic copies of G. (In fact, the lower bounds for $|Aut(G_i)|$ can be proved without randomization by guessing generators for the automorphism groups and using a polynomial-time algorithm to compute the cardinality of the generated group.)

Although the result of Goldwasser and Sipser implies that we can restrict attention to one of these two sorts of randomized proofs and forget the other, it is quite useful to have both notions. If one wishes to design a protocol, clearly it is easier to use private coins. On the other hand, the relative simplicity of an Arthur-Merlin game makes it well-suited for proving properties of these hierarchies.

The most important complexity class defined via randomized proofs seems to be AM. All known examples of languages that belong to IP actually belong to one of the finite levels, and as a consequence belong to AM. The class AM is an extension of NP using randomization, but not too much interaction. This seems to suggest that randomization is the more important element of the new kind of proofs introduced. (Recent results imply that this intuition seems to be incorrect: Lund, Fortnow, Karloff & Nisan recently showed that the permanent can be computed using an interactive proof with a polynomial number of rounds, which implies that $PH \subseteq IP$, and Shamir extended their techniques to prove the even more astounding result that testing the validity of a quantified Boolean formula is in IP, which implies that $IP = PSPACE$!) We now give an oracle-based generalization of NP using randomization that turns out to be equivalent to AM.
We define a random language A to be one that contains each word x independently with probability 1/2. For a complexity class C, let us define almost-C to be the class of languages that are in $C^A$ for almost every oracle A. In other words, a language is in almost-C if it is in $C^A$ for a random oracle A with probability 1. Gill (1977) has shown that BPP = almost-P. One can similarly show that AM ⊆ almost-NP, and one might expect that the converse is also true. However, this is not clear. The difficulty is that a nondeterministic Turing machine can use the power of nondeterminism to access an exponential number of random bits, whereas Arthur can generate only a polynomial number. However, a recent result of Nisan & Wigderson (1988) implies that a polynomial-time nondeterministic Turing machine cannot take advantage of the exponential number of random bits available, so that AM = almost-NP.

As is true for most complexity classes, there is little known about the limits of AM. It appears that AM does not contain co-NP, but to prove this would imply that NP ≠ co-NP. Nonetheless, the following theorem, due to Boppana, Hastad & Zachos (1987), provides some evidence.

(3.13) Theorem. If co-NP ⊆ AM, then $\Sigma_2^p = \Pi_2^p = AM$.

Proof: We describe a compact proof due to Babai. Assume that co-NP ⊆ AM. It suffices to prove that this implies that $\Sigma_2^p \subseteq AM \subseteq \Pi_2^p$, since co-C ⊆ C implies that C = co-C for any class C. Note that AM ⊆ $\Pi_2^p$ is true without the assumption, as we have already mentioned. Now let L be a language in $\Sigma_2^p$. There exists a language $L' \in \Pi_1^p$ = co-NP such that $L = \{x : \exists^p y$ such that $(x, y) \in L'\}$. By assumption, $L' \in AM$ and therefore $L \in MAM = MA(3)$. Theorem (3.11) implies that MA(3) = AM. □

We have seen that the graph non-isomorphism problem is in AM. Therefore, Theorem (3.13) provides new evidence that the graph isomorphism problem is not NP-complete.

(3.14) Corollary. If the graph isomorphism problem is NP-complete, then $\Sigma_2^p = \Pi_2^p = AM$.

Goldwasser, Micali & Rackoff introduced the interactive proof system in order to characterize the minimum amount of "knowledge" that needs to be transferred in a proof. Interactive proofs make it possible to "prove", for example, that two graphs are isomorphic, without giving any further clue about the isomorphism between them. These aspects of interactive proofs shall be discussed at the end of the next section.

Living with Intractability

The knowledge that a problem is NP-complete is little consolation for the algorithm designer who needs to solve it. Contrary to their theoretical equivalence, all NP-complete problems are not equally hard from a practical perspective. In this section, we will examine two approaches to these intractable problems that, while not overcoming their inherent difficulty, make them appear more manageable. In this process, finer theoretical distinctions will appear among these problems that help to explain the empirical evidence.
We conclude this section by considering work that uses intractability to an algorithm's advantage; much of modern cryptography relies on the simple idea that a code is secure if any algorithm to efficiently break it could also be used to solve an intractable problem.

The complexity of approximate solutions

Most of the 21 NP-complete problems in Karp's original paper are decision versions of optimization problems; this is also true for a great majority of the problems catalogued by Garey & Johnson (1979). Although the combinatorial nature of these problems makes it natural to focus on optimal solutions, for most practical settings in which these problems arise it is nearly as satisfying to obtain solutions that are guaranteed to be nearly optimal. In this subsection we will first classify the sorts of results that one might obtain in this area, and then sketch several results that exemplify the techniques that have been used.

In order to simplify the discussion of terminology, we will focus on minimization problems where feasible solutions have integral objective function values. For more technical reasons, we will make the simplifying assumption that the value of the objective function depends on the input in a reasonable way; we assume that it is bounded by a polynomial in the length of the input and the value of the largest number represented in the input. Consider an instance I of an optimization problem, and let OPT(I) denote the value of an optimal solution to I. If an approximation algorithm A (or, more simply, a heuristic) delivers a solution of value A(I), then A(I)/OPT(I) is called its performance ratio for I. The absolute performance guarantee of A is the smallest ρ such that the performance ratio is at most ρ for all instances. If the absolute performance guarantee of A is ρ, and A is polynomial-time, then it is called a ρ-approximation algorithm. For some problems, this absolute measure gives a distorted picture, since an algorithm may perform poorly only when OPT(I) is small. As a result, we will often consider the asymptotic performance guarantee of A, which is the infimum of all ρ such that the performance ratio is at most ρ when restricted to instances where OPT(I) ≥ k, for some choice of k. A family of algorithms $\{A_\epsilon\}$, one for each $\epsilon > 0$, is called a polynomial approximation scheme if each $A_\epsilon$ is a polynomial-time $(1+\epsilon)$-approximation algorithm. The definition of a polynomial approximation scheme allows the running time to depend even superexponentially on $1/\epsilon$, since $\epsilon$ is a constant for the sake of the analysis. However, if such a scheme is to be really useful in practice, the dependence on $1/\epsilon$ must be constrained. If the running time of $A_\epsilon$ increases as a polynomial function of $\log(1/\epsilon)$, then the assumption about the value OPT(I) makes it possible to choose an $\epsilon$ of polynomial length so that $A_\epsilon(I) - OPT(I) < 1$. By the integrality of feasible solutions, we see that this gives a polynomial-time algorithm for the problem. A more modest goal is to limit the dependence on $\epsilon$ to be a polynomial in $1/\epsilon$, and such a family of algorithms is called a fully polynomial approximation scheme.
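To spell out the choice of $\epsilon$ in the preceding argument, suppose (as assumed above) that $OPT(I) \leq q(I)$, where $q(I)$ is a polynomially bounded function of the input length and the largest number in the input ($q$ is simply a name for this assumed bound), and take $\epsilon = 1/(2q(I))$. Then
$$A_\epsilon(I) - OPT(I) \;\leq\; \epsilon\, OPT(I) \;\leq\; \frac{OPT(I)}{2q(I)} \;\leq\; \frac{1}{2} \;<\; 1,$$
so by integrality $A_\epsilon(I) = OPT(I)$, while $\log(1/\epsilon) = O(\log q(I))$ is polynomial in the input length.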
The previous argument shows that a fully polynomial approximation scheme can be used to obtain an algorithm that finds an optimal solution, where the running time is polynomial in the size of the unary encoding of the problem. This yields an important result, due to Garey & Johnson (1978): if a problem is strongly NP-complete, then there is no fully polynomial approximation scheme for it unless P = NP. Notice that this argument does not exclude the existence of such a family of algorithms with asymptotic performance guarantee arbitrarily close to 1; such a family will be called a fully polynomial asymptotic approximation scheme.

An example of an algorithm with the strongest sort of performance guarantee predates complexity theory. In the edge coloring problem, we are given an undirected graph G and an integer k, and we wish to color the edges with as few colors as possible so that no two edges incident to the same node receive the same color. In a classical result, Vizing (1964) proved that the chromatic index is at most one more than the maximum degree of G, and his proof immediately yields a polynomial-time algorithm that uses at most OPT(I) + 1 colors. In fact, this strong result was viewed as suggestive evidence that the edge coloring problem was not NP-complete, but Holyer (1981) showed that deciding if a cubic graph requires either 3 or 4 colors is NP-complete. This example also serves as a good illustration of the difference between the asymptotic performance guarantee, which is 1 in this case, and the absolute performance guarantee. Suppose that there exists a polynomial-time edge coloring ρ-approximation algorithm with ρ < 4/3. If this algorithm is run on a cubic graph that can be colored with three colors, then the algorithm must return a coloring that uses fewer than four colors; that is, it returns an optimal coloring. Thus, by Holyer's result no such algorithm can exist unless P = NP.

The idea of studying the performance guarantees of heuristics for optimization problems was first proposed by Graham (1966), in the context of a simple machine scheduling problem: each of n jobs is to be scheduled on one of m machines, where job j requires processing time $p_j$ to be scheduled on any machine; find the minimum deadline d so that there is an assignment of jobs to machines where the jobs assigned to each machine have a total processing requirement at most d. The decision version of this problem is a generalization of the 3-partition problem, so that it is strongly NP-complete. The most natural heuristic is to list the jobs, and then repeatedly assign the next listed job to the machine that currently has the smallest total processing load. Graham showed that this heuristic is a 2-approximation algorithm. To see this, consider the last job j assigned to the machine with the largest total processing time P assigned to it. Since j is assigned to this machine, all other machines must be assigned a total no less than $P - p_j$, and so $OPT(I) \geq P - p_j$. However, OPT(I) is clearly at least $p_j$, and thus $P \leq 2\,OPT(I)$. For this problem, there is essentially no difference between the absolute and asymptotic performance guarantees, because the problem has a simple scaling property; by multiplying all of the processing times by some factor, we get an equivalent problem with a larger optimal value.
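A minimal Python sketch of Graham's list scheduling heuristic follows; the processing times and machine count below are arbitrary illustrative inputs, and a heap is used only as a convenient way to find the currently least loaded machine.

    import heapq

    def list_schedule(processing_times, m):
        """Assign each job, in the given order, to the machine with the
        smallest current load (Graham's heuristic); returns the makespan,
        which is at most twice the optimal deadline d."""
        loads = [(0, i) for i in range(m)]      # (current load, machine index)
        heapq.heapify(loads)
        for p in processing_times:
            load, i = heapq.heappop(loads)
            heapq.heappush(loads, (load + p, i))
        return max(load for load, _ in loads)

    # Optimal makespan for this instance is 6 (4+2, 3+3, 2+2+2);
    # the heuristic returns 7 here, within the guaranteed factor of 2.
    print(list_schedule([4, 3, 3, 2, 2, 2, 2], 3))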
For each of the previous problems, we have either been able to provide an algorithm with a constant performance guarantee, or been able to prove that such an algorithm would imply that P = NP. For the next few examples, we will present algorithms for which there is evidence that they cannot be improved in terms of their performance guarantees. In the center problem, the input is a complete graph with integer edge weights $\{c_e\}$ and an integer k, and the aim is to select k nodes that cover all remaining nodes within as small a radius as possible, where a node v is covered within radius r if there is a selected node u such that $c_{uv} \leq r$. As was done for the traveling salesman problem, an easy reduction (from the dominating set problem) shows that no fixed performance guarantee can be achieved. Unlike the traveling salesman problem, a slight modification of this reduction shows that even with the triangle inequality, no ρ-approximation algorithm exists for ρ < 2 unless P = NP. Hochbaum & Shmoys (1985) give a 2-approximation algorithm that relies on a more general technique, called dual approximation. A dual approximation algorithm finds solutions that have objective function values that are at most OPT(I), at the expense of obtaining solutions that are infeasible. The performance of a dual approximation algorithm is a bound on the infeasibility of the solutions generated.

In a problem that might be viewed as a dual of the center problem, we want to minimize k, the number of selected nodes, in order to achieve a specified radius of coverage r. A ρ-dual approximation algorithm for this problem is a polynomial-time algorithm that, given edge weights and a radius r, finds k nodes that cover all nodes within radius ρr, where k is no more than the optimal number of nodes needed to cover within radius r. It is easy to see that such a ρ-dual approximation algorithm can be used within a binary search procedure to obtain a ρ-approximation algorithm for the center problem. The following algorithm is a 2-dual approximation algorithm (a sketch appears below): choose any node, delete all nodes covered within radius 2r, and repeat until all nodes are covered. To see that the number of nodes selected is not too large, observe that it is impossible to choose two nodes that are covered by the same node in the optimal solution for radius r.
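A small Python sketch of this greedy procedure is given here; the distance matrix dist is an illustrative stand-in for the edge weights $c_e$, and the tiny example instance is arbitrary.

    def dual_center(dist, r):
        """Greedy 2-dual approximation for covering within radius r:
        repeatedly pick an uncovered node and discard everything within
        distance 2r of it.  Uses at most as many centers as an optimal
        cover of radius r, but only guarantees coverage within radius 2r."""
        n = len(dist)
        uncovered = set(range(n))
        centers = []
        while uncovered:
            v = uncovered.pop()            # choose any uncovered node
            centers.append(v)
            uncovered = {u for u in uncovered if dist[v][u] > 2 * r}
        return centers

    # Tiny example: three well-separated pairs of nearby points.
    D = [[0, 1, 9, 9, 9, 9],
         [1, 0, 9, 9, 9, 9],
         [9, 9, 0, 1, 9, 9],
         [9, 9, 1, 0, 9, 9],
         [9, 9, 9, 9, 0, 1],
         [9, 9, 9, 9, 1, 0]]
    print(dual_center(D, 1))   # three centers, one per pair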
The same framework of using a dual approximation algorithm for a dual problem to obtain an ordinary approximation algorithm for the original problem can, of course, be applied to any problem where there is such a tradeoff between two resources.

One important technique that has been used in constructing approximation schemes is that of rounding and scaling. The technique is based on a simple idea: round the data sufficiently so that a pseudo-polynomial algorithm for the rounded problem runs in time polynomial in the original instance size; then interpret the solution in terms of the original problem, and if the rounding was small enough, the solution is still nearly optimal. Ibarra & Kim (1976) developed this technique to give a fully polynomial approximation scheme for the knapsack problem: there are n pieces to be packed in a knapsack of size B, where piece j has a size $s_j \leq B$ and a value $v_j$, and the aim is to pack a subset of total size no more than B, with maximum total value. It is easy to see that the subset sum problem reduces to a decision version of this, and also that an approach similar to the one used for the subset sum problem gives a pseudo-polynomial algorithm for this problem (where only the $v_j$ must be encoded in unary). Algorithm $A_\epsilon$ is as follows: round each $v_j$ down to its closest multiple of $\mu = \epsilon \max_j\{v_j\}/n$, and solve the rounded instance; if this solution is worse than $\max_j\{v_j\}$, return the most valuable piece instead. This solution differs from OPT(I) by at most $n\mu = \epsilon \max_j\{v_j\} \leq \epsilon A_\epsilon(I)$, and so $OPT(I)/A_\epsilon(I) \leq 1 + \epsilon$. Since the rounded problem (when rescaled by μ) has length in its unary encoding at most $n^2/\epsilon$ plus the size of the original instance (encoded in binary), we see that this gives a fully polynomial approximation scheme.
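To make the rounding step concrete, here is a Python sketch of the scheme $A_\epsilon$ just described, with a straightforward dynamic program over rounded values standing in for the pseudo-polynomial algorithm; all names and the example instance are illustrative.

    def knapsack_fptas(sizes, values, B, eps):
        """Round values down to multiples of mu = eps * max(values) / n and
        solve the rounded instance exactly by dynamic programming over the
        total rounded value; fall back to the single most valuable piece."""
        n = len(values)
        mu = eps * max(values) / n
        w = [int(v // mu) for v in values]              # rounded values
        best_size = {0: 0}                              # rounded value -> min size
        best_set = {0: []}
        for j in range(n):
            # Process existing states in decreasing rounded value so that
            # each piece is used at most once.
            for t, sz in sorted(best_size.items(), reverse=True):
                nt, nsz = t + w[j], sz + sizes[j]
                if nsz <= B and nsz < best_size.get(nt, B + 1):
                    best_size[nt] = nsz
                    best_set[nt] = best_set[t] + [j]
        chosen = best_set[max(best_size)]
        value = sum(values[j] for j in chosen)
        return max(value, max(values))                  # fallback piece

    # Sizes, values, capacity: the optimum 90 = 30 + 60 is found here.
    print(knapsack_fptas([3, 4, 5], [30, 50, 60], B=8, eps=0.1))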
The bin packing problem is dual to the machine scheduling problem: given a specified capacity d, pack n pieces with sizes $p_j$ into as few bins of capacity d as possible. Since it is easy to formulate the subset sum problem as a question of whether 2 bins suffice, we see that a ρ-approximation algorithm with ρ < 3/2 would imply that P = NP. On the other hand, the bin packing problem does not have the scaling property, and Johnson (1973) showed that many simple heuristics do quite well, with asymptotic guarantees as small as 11/9. It was a great surprise when Fernandez de la Vega & Lueker (1981) not only substantially improved this 11/9 bound, but gave a polynomial asymptotic approximation scheme, by showing that a rounding-and-scaling-like approach could be used for this problem. Karmarkar & Karp (1982) gave a fully polynomial asymptotic approximation scheme for the bin packing problem, by using similar ideas in conjunction with a more efficient approach to solving a related underlying linear program. This result was also quite surprising, since no such approximation scheme was known for any strongly NP-complete "number problem."

Using techniques similar to those of Fernandez de la Vega & Lueker (1981), one can also obtain a $(1+\epsilon)$-dual approximation algorithm for the bin packing problem. Set aside the small pieces, those of size at most $\epsilon d$; if each remaining piece size is rounded down to the nearest multiple of $\epsilon^2 d$, then there are at most $k = 1 + 1/\epsilon^2$ different piece sizes. For any constant k, there is a polynomial-time algorithm to solve the bin packing problem with only k different sizes (although the degree of the polynomial depends on k). If the solution to the rounded problem is interpreted for the original, then since each bin contains at most $1/\epsilon$ pieces, the total difference between the contents of a bin with and without rounding is at most $(1/\epsilon)\epsilon^2 d$; that is, each bin contains at most $(1 + \epsilon)d$. Clearly, the number of bins used is no more than the optimum with capacity d. To complete the packing, add the small pieces to any bin with no more than d in it already, and start a new bin only when all bins are overfilled. It is easy to see that this gives a $(1 + \epsilon)$-dual approximation algorithm.

For the preceding problems, we have been able to find algorithms whose performance matches the limits implied by assuming that P ≠ NP. Unfortunately, this is much more the exception than the rule. The interest in performance guarantees of algorithms for graph problems is largely due to the work of Johnson (1974a), but little improvement has been made in the intervening years. For the maximum stable set problem, the algorithm with the best known performance guarantee is a simple greedy heuristic. A stable set I is constructed by placing a node v of smallest degree in I, deleting v and its neighbors, and iterating until the entire graph is deleted. If $\alpha(G) \geq n/k$, then any node placed in I has degree at most $n - n/k$, and I has $\log_k n$ nodes. This implies that the algorithm's performance ratio is $O(n/\log n)$. (For maximization problems, we invert all performance ratios.) Garey & Johnson (1976) developed a graph composition technique to show that if there exists a ρ-approximation algorithm for the maximum stable set problem for some constant ρ, then there exists a polynomial approximation scheme. Let $G^2$ be the graph formed by composing a graph G with itself, in the sense that each node of G is replaced by a copy of G, and there is an edge between any pair of nodes in copies of G that correspond to adjacent nodes. Note that $\alpha(G^2) = \alpha(G)^2$, and that given a stable set of size $k^2$ in $G^2$ one can efficiently find a stable set of size k in G. By applying a $\rho^2$-approximation algorithm to $G^2$, we get a set of size $\alpha(G)^2/\rho^2$, which can then be used to find a stable set in G of size $\alpha(G)/\rho$. By repeatedly applying this technique, we get the claimed result. Since this problem has the scaling property (simply take disjoint copies of the graph), an identical result holds for asymptotic performance guarantees.

The state of the art for the node coloring problem is not much better. For the node coloring problem, a corollary of the NP-completeness of 3-colorability is that no ρ-approximation algorithm can exist with ρ < 4/3. The same result holds for asymptotic performance guarantees. To see this, note that for any graph G we can produce a graph G' with $\chi(G') = k\chi(G)$ by taking k disjoint copies of G and adding an edge between any pair of nodes in different copies. This construction maintains the ratio of the chromatic numbers of those instances constructed from graphs G with $\chi(G) = 4$ and those constructed from graphs with $\chi(G) = 3$. Garey & Johnson (1976) use a more intricate composition technique to increase this ratio, and thereby prove that an asymptotic performance guarantee less than 2 would imply that P = NP.
The best guarantee currently known is achieved by an algorithm that colors any graph G with at most $O(n(\log\log n)^2/(\log n)^2)\,\chi(G)$ colors.

Probabilistic analysis of algorithms

One justified criticism of complexity theory is that it focuses on worst-case possibilities, and this may in fact be unrepresentative of the practical reality. In this section, we will examine results that are concerned with the probabilistic analysis of algorithms, where the inputs are selected according to a specified probability distribution, and the average behavior is studied. After a brief discussion of the sorts of results that might be obtained, we will illustrate this approach by discussing results concerning several of the combinatorial problems already mentioned. We shall also sketch the main ideas of an analogue of NP-completeness, recently proposed by Levin, to provide evidence that a problem is hard to solve in even a probabilistic sense. For a more extensive survey of results in this area, the reader is referred to Karp, Lenstra, McDiarmid & Rinnooy Kan (1985), whereas Coffman, Lueker & Rinnooy Kan (1988) give a nice introduction to the techniques used.

In order to discuss the probabilistic performance of algorithms, we first introduce some terminology from probability. Let $\{r_n\}$ be a sequence of random variables, and let r be a random variable. Then $r_n \to r$ in expectation if $\lim_{n\to\infty} E[|r_n - r|] = 0$. A stronger notion of convergence is that $r_n \to r$ in probability, which occurs if for every $\epsilon > 0$, $\lim_{n\to\infty} \Pr[|r_n - r| > \epsilon] = 0$. The sequence $r_n \to r$ almost surely if $\Pr[\limsup_{n\to\infty} r_n = r = \liminf_{n\to\infty} r_n] = 1$. These definitions are used to characterize different notions of an algorithm producing asymptotically optimal solutions. For example, if $r_n$ denotes the performance ratio $A(I_n)/OPT(I_n)$ for a particular algorithm A when run on an input $I_n$ of size n drawn from a specified distribution, then the algorithm is optimal in expectation if $r_n \to 1$ in expectation. Successively stronger results would prove that an algorithm is optimal in probability and optimal almost surely, depending on whether $r_n \to 1$ in probability or almost surely. These notions characterize approximation results, in the sense that the solutions produced by the algorithm may always be suboptimal, since merely the asymptotic error is bounded. It would, of course, be still more preferable to prove that the algorithm generally yields the optimal solution. We will say that an algorithm solves the problem in probability if $\lim_{n\to\infty} \Pr[A(I_n) = OPT(I_n)] = 1$. This definition can also be applied to decision problems in NP, where the solution values are restricted to be 0 or 1, but in this case we will require that the algorithm also produce a certificate proving that the solution value is 1. Finally, an algorithm solves the problem in expected polynomial time if it always finds an optimal solution, and the expected value of the running time is bounded by a polynomial.
In a paper that stimulated the area of probabilistic analysis of algorithms for NP-complete problems, Karp (1976) proposed a natural partitioning scheme for solving the traveling salesman problem over instances where the n nodes are i.i.d. uniformly in the unit square $[0,1]^2$, and the intercity distances $c_{ij}$ are given by the Euclidean distance between each pair of nodes. A classical theorem of Beardwood, Halton & Hammersley (1959) proves that there exists a constant c such that $OPT(I_n)/\sqrt{n} \to c$ almost surely for instances drawn from this distribution. We now give the adaptive partitioning method of Karp. For simplicity, assume that no two nodes have a common vertical or horizontal coordinate. Find a vertical line through a node that splits the unit square into two parts so that each part contains essentially the same number of nodes. (Consider a node on the dividing line to be in both parts.) For each of the two parts, split them analogously by finding a horizontal line that ensures that the number of nodes in each new region is reduced by a factor of two. Repeat this procedure recursively on each region until each small part has log n nodes. A dynamic programming algorithm can find an optimal solution to any of these small instances in $O(n \log^2 n)$ time. The union of these tours is a spanning Eulerian graph, and as in Christofides' algorithm, the Eulerian tour yields an ordering of the nodes.

To analyze the performance of the algorithm, let $R_i$ denote the ith region, and let $T_i$ and $T_i'$ denote, respectively, the length of the part of the heuristically generated tour in $R_i$, and the length of the part of the optimal tour in $R_i$. We first show that $T_i$ is at most $T_i'$ plus 3/2 times the perimeter of $R_i$. Consider $R_i$, and add a node at each point where the optimal tour leaves this region. Construct a tour through all of the nodes in $R_i$ in the following way: take the pieces of the optimal tour, and add the perimeter once, which makes the graph connected; then take alternate pieces of the perimeter, choosing, a la Christofides, the matching of smaller length. This Eulerian graph can be shortcut to yield a tour, proving the claim. This implies that $A(I_n) \leq OPT(I_n) + \text{total perimeter length} \leq OPT(I_n) + O(\sqrt{n/\log n})$. The theorem of Beardwood, Halton & Hammersley (or even a simple neighborhood argument) shows that there exist constants c' and d < 1 such that $\Pr[OPT(I_n) < c'\sqrt{n}] \leq d^n$. These results are sufficient to prove that the algorithm is optimal almost surely; it is not hard to see that for any $\epsilon > 0$,
$$\sum_{n=1}^{\infty} \Pr[A(I_n)/OPT(I_n) \geq 1+\epsilon] \;\leq\; \sum_{n=1}^{\infty} \left( \Pr[OPT(I_n) < c'\sqrt{n}] + \Pr[|OPT(I_n) - A(I_n)| \geq \epsilon c'\sqrt{n}] \right) \;<\; \infty,$$
and by the Borel-Cantelli lemma, this implies that the relative error converges to 0 almost surely.
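The following Python sketch conveys the partitioning phase of Karp's algorithm: points are split recursively by median vertical and horizontal cuts until each cell is small, and each cell is then solved by brute force (for clarity only; the dynamic program and the final Eulerian patching step are omitted). The cell-size threshold and helper names are illustrative assumptions.

    from itertools import permutations
    from math import dist, log2

    def cell_tour(points):
        """Optimal tour of one small cell by brute force, standing in for the
        O(n log^2 n) dynamic program mentioned in the text."""
        first, rest = points[0], points[1:]
        best_len, best = float('inf'), [first]
        for perm in permutations(rest):
            tour = [first, *perm]
            length = sum(dist(tour[i], tour[(i + 1) % len(tour)])
                         for i in range(len(tour)))
            if length < best_len:
                best_len, best = length, tour
        return best

    def partition(points, threshold, vertical=True):
        """Recursively split on the median coordinate, alternating between
        vertical and horizontal cuts, until every cell is small."""
        if len(points) <= threshold:
            return [points]
        axis = 0 if vertical else 1
        points = sorted(points, key=lambda p: p[axis])
        mid = len(points) // 2
        return (partition(points[:mid], threshold, not vertical) +
                partition(points[mid:], threshold, not vertical))

    def karp_subtours(points):
        threshold = max(3, int(log2(len(points))))
        return [cell_tour(cell) for cell in partition(list(points), threshold)]

    # e.g. karp_subtours([(0.1, 0.2), (0.9, 0.8), (0.5, 0.5), (0.2, 0.9), (0.8, 0.1)])
    # returns one optimal sub-tour per cell; patching these into a single tour
    # via the Eulerian-graph argument above is omitted.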
Although worst-case analysis of the traveling salesman problem requires assumptions such as the triangle inequality to obtain reasonable results, the same is not true for probabilistic analysis. In fact, consider instances of the directed traveling salesman problem, where all $n^2$ entries of the distance matrix are i.i.d. uniform random variables over the interval [0,1]. Algorithms to find a minimum weight perfect matching in a bipartite graph yield a polynomial-time procedure to find the minimum length collection of directed circuits that cover all of the nodes. It remains merely to patch these circuits together to build a tour. A fairly natural algorithm is to take the two longest circuits, find the minimum cost way to delete two arcs and to add two arcs to join the two circuits, and repeat until the result is a single tour. Karp showed that the expected value of the performance ratio of this algorithm is $1 + O(n^{-1/2})$, and so it is optimal in expectation. (For a survey containing this and other results on the probabilistic analysis of the traveling salesman problem, the reader is referred to Karp and Steele (1985).)

Since the Hamiltonian circuit problem is a decision problem, we shall focus instead on probabilistic results to solve this problem, and so the analysis has a somewhat different flavor. The study of algorithms for this problem grew out of the theory of random graphs of Erdos & Renyi (1960), which is treated elsewhere in this volume (see Chapter 6). There are two standard distributions of random graphs of order n: the graph $G_{n,m}$ is chosen uniformly from all graphs with m edges, and the graph $G_{n,p}$ is selected by including each possible edge independently with probability p. Most results in one model can be translated to an equivalent result in the other. A major part of the study of random graphs deals with the evolution of the graph as the number of edges grows as a function of the number of nodes, and one of the fundamental results of this area has been to provide a threshold function that characterizes the density where the graph quickly changes from having an overwhelming chance of being non-Hamiltonian to having an overwhelming chance of being Hamiltonian. The ultimate result along these lines is due to Komlos & Szemeredi (1983), who proved that if H(n, m) is the probability that $G_{n,m}$ is Hamiltonian, and $m(n) = (n \log n)/2 + (n \log\log n)/2 + n\omega(n)$, then $\lim_{n\to\infty} H(n, m(n)) = 0$ if $\omega(n) \to -\infty$; $= \exp(-\exp(-2c))$ if $\omega(n) \to c$; and $= 1$ if $\omega(n) \to \infty$. This result is surprising in view of the fact that an identical result was shown by Erdos & Renyi (1960) for the property of having minimum degree two, which is clearly necessary for a graph to be Hamiltonian. Although the previous result is not algorithmic, the progress towards this result has included a number of results that yield polynomial-time algorithms to find a Hamiltonian circuit in the case that it is likely that one exists. The first result proving that a modest number of edges is sufficient for almost all graphs to have a Hamiltonian circuit is due to Perepelica (1970), who gave an algorithm that solves the Hamiltonian circuit problem in probability for graphs with $cn^{3/2}\sqrt{\log n}$ edges. There have been a series of results that have gradually reduced the number of edges that are sufficient to be sure of efficiently finding a Hamiltonian circuit. Based on a nearly algorithmic result of Posa (1976), Korsunov (1976) gave the first result of the correct order by reducing the bound to $3n \log n$ edges.
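As a small illustration of the threshold $m(n)$, the following Python snippet samples $G_{n,m}$ just below and just above the threshold and reports how often the (necessary) minimum-degree-two condition holds; the parameter choices are arbitrary, and checking Hamiltonicity itself would of course be far more expensive.

    import random
    from itertools import combinations
    from math import log

    def sample_gnm_degrees(n, m):
        """Degree sequence of a random graph G_{n,m}: m edges chosen
        uniformly without replacement from all node pairs."""
        edges = random.sample(list(combinations(range(n), 2)), m)
        deg = [0] * n
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        return deg

    def min_degree_two_rate(n, omega, trials=200):
        m = int(n * log(n) / 2 + n * log(log(n)) / 2 + n * omega)
        return sum(min(sample_gnm_degrees(n, m)) >= 2
                   for _ in range(trials)) / trials

    # Just below vs. just above the threshold (omega = -2 vs. omega = +2).
    print(min_degree_two_rate(200, -2), min_degree_two_rate(200, 2))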
In a recent paper, Bollobas, Fenner & Frieze (1985) provide a polynomial-time algorithm HAM that yields an algorithmic proof of the result of Komlos & Szemeredi. By considering the special case $G_{n,p}$ with p = 1/2, they give a slight extension of HAM that solves the Hamiltonian circuit problem in expected polynomial time. For this distribution, the probability that HAM fails to find a Hamiltonian circuit in a graph that is connected and has minimum degree at least two is so small that, if an exponential-time dynamic programming algorithm to solve the Hamiltonian circuit problem is run in this case, the expected running time remains polynomial. Furthermore, they use a variant of this algorithm to solve in probability the bottleneck version of the traveling salesman problem, where the aim is to minimize the maximum weight edge included in the Hamiltonian circuit, and where the edges of the undirected complete graph are given weights i.i.d. uniformly in the interval [0,1]. The algorithm first computes the minimum value t such that the subgraph containing the t cheapest edges has minimum degree 2, and then applies HAM to find a Hamiltonian circuit in this subgraph.

Next consider the node coloring problem for the random graph $G_{n,p}$ with p = 1/2. A natural greedy algorithm repeatedly selects a stable set in the currently uncolored part of the graph, assigns it a new color, and deletes it; in each iteration a stable set of size close to $\log_2 n$ is selected. Since the algorithm does not "look" at the remaining graph, we see that the remaining graph is a random graph of smaller order selected according to the same distribution. This allows us to analyze the process of repeatedly choosing such a set until the entire graph has been colored. It is not hard to show that the size of the largest stable set in $G_{n,p}$ is, in probability, $2\log_2 n$. This implies that the algorithm uses at most $2\chi(G_{n,p})$ colors in probability. Bollobas (1988) has recently shown that the lower bound on $\chi(G_{n,p})$ given by the stable set computation is indeed tight, so that the greedy algorithm does indeed use twice as many colors as needed, and it remains an interesting open question to give an efficient algorithmic proof of Bollobas' result. Although we have focused on the case p = 1/2, these results have been extended to include a broader spectrum of distributions.

We next consider the subset sum problem. Initially, we will restrict attention to the following distribution of instances that are chosen to be "yes" instances, and so the problem is to find an appropriate subset: randomly select a vector x in $\{0,1\}^n$, and choose integers $s_1, \ldots, s_n$ i.i.d. uniformly in [1, U], where $U = 2^{\delta n^2}$ for any fixed $\delta > 1/2$, and let the target $t = \sum_{i=1}^n x_i s_i$. Lagarias & Odlyzko (1983) showed how to use the lattice basis reduction algorithm of Lovasz to find x, given the $s_i$ and t. A lattice is the collection of vectors that can be represented as an integral linear combination of a given set of n linearly independent basis vectors in $\mathbb{Z}^n$.
Given one basis, the basis reduction algorithm of Lovasz produces a basis whose shortest vector has length at most 2^{(n-1)/2} times the length of the shortest non-zero vector in the lattice (see Chapter 19). We shall give a short proof of the result of Lagarias & Odlyzko due to Frieze (1986). For the analysis, consider a fixed choice of x, and a random selection of the s_i. We may assume that ∑_{i=1}^n s_i/2 ≤ t, since otherwise we can solve the problem of finding the complementary subset. Consider the (n+1)-dimensional lattice given by the basis a_0 = (ct, 0, ..., 0), a_i = (-c·s_i, b_{i1}, b_{i2}, ..., b_{in}), i = 1, ..., n, where c is an integer greater than n·2^{n/2} and the matrix B = (b_{ij}) is the identity matrix. Note that x = (0, x_1, ..., x_n) is in this lattice. Let λ = (λ_0, ..., λ_n) be the shortest vector returned by the basis reduction algorithm, and let λ* denote the shortest non-zero vector in the lattice. We will show that, with probability approaching 1, λ is x or -x. Using the above bound for the basis reduction algorithm, we see that ||λ|| ≤ 2^{n/2}·||λ*|| ≤ 2^{n/2}·||x|| ≤ 2^{n/2}·√n = m, where ||·|| denotes the Euclidean norm. Our choice of c implies that λ_0 = 0. Next, consider the set A_0 of lattice points a with a_0 = 0 and ||a|| ≤ m that are not integer multiples of x. We can bound the probability that the algorithm fails by the probability that A_0 is non-empty. It is not hard to see that if a ∈ A_0, then it must satisfy ∑_{i=1}^n a_i·s_i = d·t for some integer d, |d| ≤ 2m. Consider some fixed choice of d and (a_1, ..., a_n) that is not an integer multiple of x. The probability that they satisfy this equation is at most 1/U, and since there are at most (4m+1)(2m+1)^n = o(2^{δn^2}) such choices, we get the claimed result. By choosing t uniformly among the integers from 1 to ⌈cnU⌉, where c is some positive constant, this technique can be extended to instances that may not have solutions.

Although the probabilistic analysis of algorithms has focused mainly on NP-complete problems, it has often served as a useful tool to show that the average-case running time of certain algorithms is much better than the worst-case running time. The most important example of this is the simplex method for linear programming, which is a practically efficient algorithm that was shown to have exponential worst-case running time by Klee & Minty (1972). Borgwardt (1982) showed that if the rows of the constraint matrix are i.i.d. according to a spherically symmetric distribution (where each vector is viewed as a ray from the origin), then a variant of the simplex method requires only a polynomial number of iterations. Independently, Smale (1983) showed similar results under the assumption that the rows of the constraint matrix, the vector of right-hand sides, and the objective function are independently distributed with the symmetry assumption that the distribution is invariant under coordinate permutations. Smale analyzed a practical variant of the simplex algorithm, and unlike Borgwardt's result, his distribution includes both feasible and infeasible problems. There has been a great deal of work done in extending these results to more general probability distributions and in analyzing algorithms closer to the particular variants of the simplex method most commonly used in practice. For a thorough survey of results in this area, the reader is referred to Shamir (1987).
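Returning to the random subset sum instances above, the lattice basis used in the Lagarias-Odlyzko-Frieze argument is easy to write down. The sketch below only builds a random "yes" instance and the corresponding basis; an actual basis reduction (LLL) routine, which is not shown here, would then be run on the basis to recover the short vector (0, x_1, ..., x_n). The parameters are tiny and purely illustrative.

```python
import random

random.seed(0)
n = 10
delta = 0.6                                   # any fixed delta > 1/2
U = 2 ** int(delta * n * n)
x = [random.randint(0, 1) for _ in range(n)]  # the hidden 0-1 vector
s = [random.randint(1, U) for _ in range(n)]  # the subset sum weights
t = sum(xi * si for xi, si in zip(x, s))      # target of a "yes" instance

c = n * 2 ** (n // 2 + 1)                     # any integer c > n * 2^(n/2)
basis = [[c * t] + [0] * n]                   # a_0 = (c*t, 0, ..., 0)
for i in range(n):                            # a_i = (-c*s_i, e_i)
    row = [-c * s[i]] + [0] * n
    row[1 + i] = 1
    basis.append(row)
# The short vector (0, x_1, ..., x_n) lies in the lattice generated by `basis`;
# basis reduction would be applied to `basis` to find it.
```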
As we have seen, not all problems in NP have been solved efficiently even with probabilistic assumptions, and only the simplest sorts of distributions have been analyzed. This raises the specter of intractability: are there distributions for which certain NP-complete problems remain hard to solve? We will sketch the main ideas behind a recent result of Levin (1986), which proposes a notion of completeness in a probabilistic setting. Once again, evidence for hardness is given by showing that if a particular problem in NP with a specified input distribution can be solved in expected polynomial time, then every such problem and distribution pair can also be solved so efficiently. For a more complete discussion, the reader is encouraged to read the column of Johnson (1984).

A bit of care must be given in formulating the precise notion of polynomial expected time, so that it is insensitive to both the particular choice of machine and encoding. If μ(x) denotes the probability that a randomly selected instance of size n is x, and T(x) is the running time on x, then one would typically define expected polynomial time to require that the sum of μ(x)T(x) over all instances of size n is O(n^k) for some constant k. Instead, we consider μ to be the density function over the set of all instances I, and require that ∑_{x∈I} μ(x)·T(x)^{1/k}/|x| < ∞ for some constant k. Levin's notion of random NP requires that the distribution function M(x) = ∑_{i=1}^{x} μ(i) can be computed in polynomial time, where each instance is viewed as a natural number. This notion does not seem to be too restrictive, and includes all of the probability distributions discussed here. It remains only to define the notion of reducibility. As usual, "yes" instances must be mapped by a polynomial-time function f to "yes" instances, and analogously for "no" instances, but one must consider the density functions as well. The pair (L_1, μ_1) reduces to (L_2, μ_2) if in addition we require that μ_2(x) is at least a polynomial fraction of the total probability of elements that are mapped to x by f.

Levin showed that all of random NP reduces to a certain random tiling problem. Instances consist of integers k ≤ N, a set of tile types, each of which is a unit square labeled with letters in its four corners, and a side-by-side sequence of k such tiles where consecutive tiles have matching labels in both adjoining corners. We wish to decide if there is a way of extending the sequence to fill out an N × N square, where adjacent tiles have matching labels in their corresponding corners, and the top row starts with the specified sequence of tiles. The instances are selected by first selecting N = n with probability proportional to a fixed inverse power of n, and choosing k uniformly between 1 and N; after choosing the tile types uniformly, the tiles in the sequence are selected in order uniformly among all possible extensions of the current sequence.
More recently, Venkatesan & Levin (1988) have shown that a generalization of the problem of edge coloring digraphs (where for certain subgraphs, the number of edges given each color is specified) is also random NP-complete.

Cryptography

The basic goal of cryptography is to provide a means for private communication in the presence of adversaries. Until fairly recently, cryptography has been more of an art than a science. Modern cryptography has suggested ways to use complexity theory for designing schemes with provable levels of security. We will not be able to discuss in detail, or even mention, most of the results here. We intend rather to give a flavor of the problems considered and the results proved. The interested reader is referred to the survey by Rivest (1990) or to the recent special issue of SIAM Journal on Computing (April 1988) on cryptography for more results, and for further references.

A traditional solution to the privacy problem uses secret key crypto-systems. Suppose two parties wish to communicate a message M (a string of 0's and 1's) of length n, and they agree in advance on a secret key K, a string of 0's and 1's of the same length. They send the encrypted message C = M ⊕ K instead of the real message M, where ⊕ denotes componentwise addition over GF(2). This scheme is provably secure, in the sense that (as long as the key K is kept secret) an adversary can learn nothing about the message by intercepting C: every string of length n is equally likely to be the message encoded by the string C. This crypto-system is provably secure, but it is very inconvenient to use, since before each transmission the two parties must agree on a new secret key.

In the paper that founded the area of modern cryptography, Diffie & Hellman (1976) suggested the idea of public-key crypto-systems. They proposed that a transmission might be sufficiently secure if the task of decryption is computationally infeasible, rather than impossible in an information-theoretic sense. Such a weaker assumption might make it possible to communicate securely without agreeing on a secret key. In a public-key crypto-system the participants agree on a security parameter n. Messages to be sent are divided into pieces of length n.

• Each participant should randomly choose a public encryption key E and a corresponding secret decryption key D depending on the security parameter n.
• There must be a deterministic polynomial-time algorithm that, given a message M of length n and an encryption key E, produces the encrypted message E(M).
• Similarly, there must be a deterministic polynomial-time algorithm that, given a message M of length n and a decryption key D, produces the decrypted message D(M).
• It is required that for every message M, D(E(M)) = M.
• The users of this system should publish a directory of the public keys.

When a user named Bob wants to send a message M to another user named Alice, he looks up Alice's public key E_A in this directory, and sends her the encrypted message E_A(M). By the assumptions, only Alice can decrypt this message using her private key D_A.

The basic ingredient of this system is a "trapdoor" function. The encryption function is assumed to be a function that is easy to compute, but hard to invert. There are several ways to formalize this notion.
The one used most often is the one-way function, that is, a 1-to-1 function f such that f can be computed in polynomial time, but for n sufficiently large, no polynomial-time algorithm can invert f on even a polynomial fraction of the instances of length n. In order for a one-way function to be useful in the above scheme, it has to be a trapdoor function, which is a one-way function with a key K, such that by knowing the key one can invert the function in polynomial time. Given the present state of complexity theory, there is little hope to prove that any function is one-way, or that a public-key crypto-system satisfies the last requirement above. Ideally, one would like to prove the security of a crypto-system based on the assumption that P ≠ NP. However, this also seems to be quite difficult. Complexity theory is mainly concerned with worst-case analysis. For the purpose of cryptography, average-case analysis, and a corresponding notion of completeness (such as the one suggested by Levin), would be more appropriate. Furthermore, these inversion problems have the special property that the inverse is unique, whereas the nondeterministic algorithms for NP-complete problems have several accepting paths for most instances.

Over the last ten years, several public-key crypto-systems have been suggested and analyzed. Some have been proven secure, based on assumptions about the computational difficulty of a particular problem (e.g., factoring integers), or on the more general assumption that one-way functions exist. Rivest, Shamir & Adleman (1978) were the first to suggest such a scheme. Their scheme is based on the assumed difficulty of factoring integers. Each user of this scheme has to select a number n that is the product of two random primes. Random primes can be selected using a randomized primality testing algorithm (since roughly every (log n)-th number is expected to be prime). The public key consists of a pair of integers (n, e) such that e is relatively prime to φ(n), where φ(n) denotes the number of integers less than n relatively prime to n. A message M is encrypted by C = M^e (mod n). The private key consists of integers (n, d), where d·e ≡ 1 (mod φ(n)), and decryption is done by computing C^d ≡ M (mod n). Given the prime factorization of n, one can find the required private key d in polynomial time, but the task of finding the appropriate d given only n and e is equivalent to factoring.

Rabin (1979) suggested a variation on this scheme in which the task of decryption is equivalent to factoring, rather than just the task of finding the decryption key. The idea is to encrypt a message M by C = M^2 (mod n). (A slight technical problem that has to be overcome before turning this into an encryption scheme is that squaring modulo a product of two primes is not a 1-to-1 function, but is rather 4-to-1.) An algorithm that can extract square roots modulo n can be used to factor: choose a random integer x and let the algorithm find a square root y of the integer (x^2 mod n); with probability 1/2, the greatest common divisor of x − y and n is one of the two primes used to create n.

Merkle & Hellman (1978) suggested a scheme based on the subset sum problem, where the public key is a vector of integers a = (a_1, ..., a_n). A message M, that is a 0-1 vector of length n, is encrypted by the inner product C = a·M. One problem with this scheme is that there is no apparent private key that can make the task of decryption easier for the intended receiver.
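As a concrete illustration of the Rivest, Shamir & Adleman scheme described above, here is a toy instance. The primes are artificially small (real keys use primes of hundreds of digits), so this is a sketch of the arithmetic only, not a usable crypto-system.

```python
from math import gcd

p, q = 61, 53                 # the two secret primes (toy-sized)
n = p * q
phi = (p - 1) * (q - 1)       # number of integers less than n coprime to n
e = 17
assert gcd(e, phi) == 1       # e must be relatively prime to phi(n)
d = pow(e, -1, phi)           # d*e = 1 (mod phi(n)); requires the factorization

M = 1234                      # a message, 0 <= M < n
C = pow(M, e, n)              # encryption: C = M^e (mod n)
assert pow(C, d, n) == M      # decryption: C^d = M (mod n)
```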
Crypto-systems based on this subset sum idea have built some additional trapdoor information into the structure of the vector a, and so the decryption problem is based on a restricted variant of the subset sum problem. Furthermore, as discussed in the previous subsection, randomly chosen subset sum problems can be easy to solve. In an innovative paper, Shamir (1982) used Lenstra's integer programming algorithm to break the Merkle-Hellman scheme, i.e., he gave a polynomial-time decryption algorithm that does not need the secret key. Since then, several other subset sum-based schemes have been broken by clever use of Lovasz's basis reduction algorithm (see Chapter 19).

The schemes mentioned so far have encrypted messages of length n, for some given security parameter n. An apparent problem with this approach is that even if we prove that no polynomial-time algorithm can decrypt the messages, it still might be possible to deduce some relevant information about the message in polynomial time. Goldwasser & Micali (1984) suggested encrypting messages bit by bit to achieve provable levels of security. How does one encrypt a single bit? Deterministic encryption schemes cannot be used repeatedly when there are only two possible messages. A natural solution to encrypt bits more securely is to append a number of random bits to the one bit of information, and encrypt the resulting longer string. Goldwasser & Micali proposed a different randomized bit-encryption scheme for which they can prove security.

The Goldwasser-Micali scheme is based on the assumed difficulty of the quadratic residuosity problem. An integer x is a quadratic residue modulo n if x ≡ y^2 (mod n) for some integer y. The Jacobi symbol (x/n) is a 0, +1, -1 valued function that gives some help in recognizing quadratic residues. It can be computed efficiently, and (x/n) = 1 for all quadratic residues. The quadratic residuosity problem is to decide, for given integers n and x with (x/n) = 1, whether x is a quadratic residue modulo n. The Goldwasser-Micali encryption scheme works as follows: a public key consists of an integer n that is the product of two large primes, and a quadratic non-residue y with (y/n) = 1. The bit 0 is encrypted by a random quadratic residue r^2 (mod n), and the bit 1 is encrypted by a random quadratic non-residue of the form r^2·y (mod n). The task of distinguishing encryptions of 0's from encryptions of 1's is exactly the quadratic residuosity problem. Decryption is easy for the intended receiver, who knows the prime factorization of n. Yao (1982) has extended this result by proving that a secure randomized bit-encryption scheme exists if a one-way function exists.
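A toy version of the Goldwasser-Micali bit encryption described above is easy to write down; the intended receiver, who knows the two primes, decrypts with Euler's criterion. The primes below are illustrative stand-ins and far too small to offer any security.

```python
import random
from math import gcd

p, q = 499, 547                  # the receiver's secret primes (toy-sized)
n = p * q
# find a quadratic non-residue y modulo both p and q (so its Jacobi symbol is 1)
y = next(z for z in range(2, n)
         if pow(z, (p - 1) // 2, p) == p - 1 and pow(z, (q - 1) // 2, q) == q - 1)

def encrypt(bit):
    r = random.randrange(2, n)
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(r, 2, n) * pow(y, bit, n)) % n    # r^2 or r^2 * y (mod n)

def decrypt(c):
    # Euler's criterion modulo each prime: c encrypts 0 iff c is a residue mod n
    residue = pow(c, (p - 1) // 2, p) == 1 and pow(c, (q - 1) // 2, q) == 1
    return 0 if residue else 1

assert all(decrypt(encrypt(b)) == b for b in [0, 1, 1, 0, 1])
```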
For a public-key crypto-system, it does not seem to be very restrictive to further assume that E(D(M)) = M for every message M. Diffie & Hellman noticed that such a system can be used to solve the signature problem, so that the participants can electronically sign their messages. When Bob sends a message M to Alice he can use his private decryption key D_B to append the "signature" D_B(M) after the message M. Given such a signature, Alice can use Bob's public key E_B to convince herself that the message came from Bob, exactly as she received it. The Rivest, Shamir & Adleman scheme has this additional property, and therefore can be used for signatures as well. Rabin's scheme is, in fact, more appropriate for signatures than for encryption, since it is not a 1-1 function.

The first signature scheme that is secure against chosen message attacks was given by Goldwasser, Micali & Rivest (1988). The proof of the security of the scheme is based on the assumed difficulty of certain number-theoretic functions. More recently, Bellare & Micali (1988) have given a signature scheme that is secure against chosen message attacks and is based on the existence of trapdoor functions.

A further related issue is that of generating pseudo-random numbers. When random numbers are used in algorithms and crypto-systems it is essential that the random bits used are unbiased and independent. The speed or the reliability of the algorithm, and the security of the crypto-system, depend on the quality of the random numbers used. Natural sources of randomness (such as coins, or noise diodes) are fairly slow in generating random bits. On the other hand, for both of these applications, truly random bits could be replaced by a sequence of bits that no polynomial-time algorithm can distinguish from truly random bits. A pseudo-random number generator takes a truly random string x (a seed) of length n and expands it to a pseudo-random string y of length n^k for some constant k. The quality of the pseudo-random number generator is measured by certain statistical tests. A pseudo-random number generator passes the next bit test if, after seeing the first i bits of y for some i < n^k, no polynomial-time algorithm can predict the next bit with probability 1/2 + n^{-c} for any constant c. Most computers have built-in pseudo-random number generators, and some of the standard algorithms used have been proven to output inappropriate sequences by clever use of Lovasz's basis reduction algorithm. Such examples include the linear congruential generator (where the seed consists of integers a, b, m and x_0, and the pseudo-random numbers are generated by the recurrence x_{i+1} = a·x_i + b (mod m)), and the binary expansion of algebraic numbers (where the seed is some representation of the algebraic number).

The first provably secure pseudo-random bit generator was developed by Blum & Micali (1984). They proved that pseudo-random bit generators exist, based on an assumption related to the existence of one-way functions. They assume that there exist permutations f of the set of n-bit integers and functions B from the n-bit integers to {0,1}, such that the composite function B(f(x)) can be computed in polynomial time, but the probability that a polynomial-time algorithm computes B(y) for a randomly chosen y of length n (for a sufficiently large n) is less than 1/2 + n^{-c} for any constant c. The Blum-Micali pseudo-random number generator produces the sequence b_0, ..., b_k, defined by b_i = B(f(x_{k-i})) and x_{i+1} = f(x_i), where the random seed is used to select the functions used and the initial vector x_0. They also give particular suggestions for these functions based on the assumed difficulty of the discrete logarithm problem. For a prime p, an integer g is a generator modulo p if every non-zero integer x modulo p can be written as x ≡ g^e (mod p) for an appropriate integer exponent e < p − 1. We shall use the notation e = index_{p,g}(x). The discrete logarithm problem is to compute index_{p,g}(x) given a prime p, a generator g and an integer x.
The seed of the Blum-Micali generator is used to randomly select a large prime p, a generator g and a starting point x_0. The pseudo-random numbers are defined using the above scheme with the functions f(x) = g^x (mod p) and B(x) = 1 if index_{p,g}(x) ≤ (p − 1)/2. The pseudo-random number generator so defined can be proved to pass the next bit test based on the assumption that the discrete logarithm problem is difficult. One might wonder whether certain pseudo-random number generators pass statistical tests other than the next bit test. However, Yao (1982) proved that if a pseudo-random number generator passes the next bit test then it passes any statistical test, i.e., no randomized polynomial-time algorithm can distinguish the generated pseudo-random numbers from truly random numbers.

In some of the above cryptographic applications, one can imagine that it might be useful to be able to convince someone that a number is the product of two primes, without telling the two primes themselves. Interactive proof systems make this possible. In fact, Goldwasser, Micali & Rackoff (1985) invented their notion of interactive proof systems for exactly this application. Informally, an interactive proof of a statement is said to be a zero-knowledge proof if the verifier cannot learn anything from the proof except the validity of the statement. Imagine, for example, a proof that a graph has a Hamiltonian circuit that gives no help in finding the circuit. There are several ways to formalize this notion. The one we shall use is computational zero knowledge. We say that an interactive protocol is a zero-knowledge protocol if the verifier by himself (without the participation of the prover) can generate a sequence of communication that is indistinguishable in polynomial time from the true transcript of the conversation. Consider the example of the interactive proof for the graph non-isomorphism problem. At first sight, this seems to be a zero-knowledge protocol, since (so long as the verifier does not deviate from the protocol) he always knows the prover's next move, and hence could generate the conversation himself. There are zero-knowledge protocols for the graph non-isomorphism problem, but this protocol is not, in fact, zero-knowledge, since the verifier can use it to test if a third graph is isomorphic to one of the two in the input (i.e., the verifier can gain extra information by deviating from the protocol).

Goldreich, Micali & Wigderson (1986) and, subsequently but independently, Brassard & Crepeau (1986) proved that every language in NP has a zero-knowledge interactive proof. The Goldreich, Micali & Wigderson protocol is based on the existence of one-way functions, whereas the Brassard & Crepeau protocol is based on the assumed difficulty of certain number-theoretic functions. To prove that all languages in NP have zero-knowledge proofs, one merely has to provide a zero-knowledge proof for a single NP-complete problem. The following protocol for the Hamiltonian circuit problem is due to Blum. Suppose that the prover, Alice, knows a Hamiltonian circuit C in a graph G, and she wants to convince the verifier, Bob, of this fact without showing him the circuit. The basic cryptographic tool needed for the protocol is a secure bit-encryption scheme. Alice must be able to encrypt a bit, so that Bob has no chance to figure out on his own what the bit is, but later Alice can prove which bit she encrypted, if she chooses to do so.
To convince Bob that a graph has a Hamiltonian circuit, Alice chooses a random permutation P, and uses this permutation to obtain a random isomorphic copy G' of the graph G. She encodes the permutation, and the n(n−1)/2 bits representing the adjacency matrix of the graph G'. Bob can choose either to ask Alice for a proof that the encoded graph G' is isomorphic to the original, or to ask for a proof that G' has a Hamiltonian circuit. In the first case, Alice decrypts every encrypted bit, thereby providing the permutation P and the graph G'. In the second case, she decrypts only the bits corresponding to edges participating in the Hamiltonian circuit C. If Alice does not know a Hamiltonian circuit, then she will be caught cheating with probability at least 1/2. Therefore, this protocol is an interactive proof that the graph is Hamiltonian. It is intuitively clear that this is a zero-knowledge protocol (assuming that the encryption function is secure), since Bob can either ask to see a random isomorphic copy of the graph, or a random permutation of the sequence of nodes in a circuit, neither of which contains any information.

One bit-encryption scheme that can be used here is based on quadratic residues: for an integer n that is the product of two primes and an integer x that Bob has proved to be a square modulo n, a 0 is encrypted by a randomly chosen square r^2 (mod n) and a 1 by a randomly chosen square r^2·x (mod n). Since Bob succeeded in proving that x is a square modulo n, x is indeed a square with very high probability. But if x is a square, then either 0 or 1 is encrypted by a random square, and Bob cannot distinguish them, no matter how much computational power he has. However, Alice can prove which of the two bits she was encrypting, by revealing r. For Alice to cheat (e.g., to claim that an encryption of 1 is an encryption of 0) she has to be able to find a square root of x, which is equivalent to factoring. This protocol is the dual of the one given by Blum, in the sense that here Alice can cheat if she has unlimited computational power, whereas in the previous one, Bob can. The latter protocol has the conceptual advantage that in order for Alice to cheat, she must be able to factor n on-line, whereas in Blum's protocol, Bob can go home and try to use all of his computational power to figure out Alice's secret after the protocol is over.

Having zero-knowledge protocols for all of NP is a quite powerful tool. As an example, consider the following general problem: suppose that n people each have their own input, and they want to jointly compute a function of these inputs, without giving unnecessary information about their private inputs to the others. Interesting special cases of this problem include the majority function (that is used for voting), the sum function (for a situation when the participants wish to compute their average income), or comparison (for the millionaire's problem, when two millionaires want to decide which one of them is richer). Special protocols to handle some of these problems have been suggested earlier. Goldreich, Micali & Wigderson (1987) used zero-knowledge proofs, and other powerful techniques discovered in the last 10 years that we could not cover here, to design protocols to compute any polynomially computable function in zero-knowledge.
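To make the protocol of Blum described above concrete, here is a sketch of a single round, using a hash-based commitment as a stand-in for the secure bit-encryption scheme (that substitution, and the tiny example graph, are assumptions made purely for illustration).

```python
import hashlib, random, secrets

def commit(value: str):
    nonce = secrets.token_hex(16)
    digest = hashlib.sha256((nonce + value).encode()).hexdigest()
    return digest, nonce                      # publish digest, keep nonce private

def check(digest, nonce, value: str) -> bool:
    return hashlib.sha256((nonce + value).encode()).hexdigest() == digest

# Alice's graph G and her secret Hamiltonian circuit (a 5-cycle plus one chord).
n = 5
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]}
circuit = [(i, (i + 1) % n) for i in range(n)]

# Commit phase: Alice picks a random permutation and commits to the permutation
# and to every entry of the adjacency matrix of the permuted copy G'.
perm = list(range(n)); random.shuffle(perm)
adj = [[1 if frozenset((u, v)) in edges else 0 for v in range(n)] for u in range(n)]
adj2 = [[adj[perm.index(i)][perm.index(j)] for j in range(n)] for i in range(n)]
perm_com = commit(str(perm))
coms = {(i, j): commit(str(adj2[i][j])) for i in range(n) for j in range(n)}

# Challenge phase: Bob asks for the isomorphism (0) or for the circuit (1).
challenge = secrets.randbelow(2)

if challenge == 0:
    # Alice opens everything; Bob checks that G' is the permuted copy of G.
    assert check(*perm_com, str(perm))
    assert all(check(*coms[i, j], str(adj2[i][j])) for i in range(n) for j in range(n))
    assert all(adj2[perm[u]][perm[v]] == adj[u][v] for u in range(n) for v in range(n))
else:
    # Alice opens only the entries on the image of her circuit; Bob checks that
    # every opened entry is an edge and the edges touch all n nodes
    # (a full check would also verify that they form a single circuit).
    image = [(perm[u], perm[v]) for u, v in circuit]
    assert all(check(*coms[i, j], "1") for i, j in image)
    assert len({v for e in image for v in e}) == n
```

Repeating the round many times drives the probability that a cheating Alice escapes detection down exponentially.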
Inside P

In this section we shall focus on the complexity of problems in P. After proving that a problem is in P, the most important next step undoubtedly is to find an algorithm that is truly efficient, and not merely efficient in this theoretical sense. However, we will not address that issue, since it is best deferred to the individual chapters that discuss polynomial-time algorithms for particular problems. In this section, we shall address questions that relate to machine-independent complexity classes inside P.

We shall also consider the space complexity of problems in P. Recall that the parallel computation thesis suggests that there is a close relationship between sequential space complexity and parallel time complexity. We will show another proven case of this thesis: every problem in L (and even in NL) can be solved extremely efficiently in parallel. This is one source of interest in the complexity class L. Another source is that this complexity class is the basis for the natural reduction that helps to distinguish among the problems in P in order to provide a notion of a hardest problem in P.

Logarithmic space

The most general restriction on the space complexity of a language L that is known to imply that L ∈ P is logarithmic space. Observe that L ⊆ NL ⊆ P, where the last inclusion follows, for example, from Theorem (3.6). The typical use of logarithmic space is to store a constant number of pointers, e.g., the names of a constant number of nodes in the input graph, and in some sense, this restriction attempts to characterize such algorithms. Although L contains many combinatorial examples, we will postpone their presentation until later in this section, when we consider a narrower complexity class defined by parallel computation. Instead, we will focus on the nondeterministic and randomized analogues, for which there are languages that appear to be best characterized in terms of their space complexity.

To define the notion of a logarithmic-space reduction, we introduce a variant of logarithmic-space computation that can produce output of superlogarithmic size. We say that a function f can be computed in logarithmic space if there exists a Turing machine with a read-only input tape and a write-only output tape that, on input x, halts with f(x) written on its output tape and uses at most logarithmic space on its work tapes. A problem L_1 reduces in logarithmic space to L_2 if there exists a function f computable in logarithmic space that maps instances of L_1 into instances of L_2 such that x is a "yes" instance of L_1 if and only if f(x) is a "yes" instance of L_2. Let L_1 ∝_log L_2 denote such a logarithmic-space reduction (or log-space reduction, for short). In order for this notion of reducibility to be useful, one has to prove that the ∝_log relation is transitive. This is not as apparent as in the case of the ∝_p reduction, but it can be shown fairly easily. As a corollary, we see that if L_1 ∝_log L_2 and there were a log-space algorithm for L_2, then we could obtain a log-space algorithm for L_1. It is worth noting that all of the polynomial-time reductions mentioned in the discussion of NP-completeness are based on local replacements, and therefore are log-space reductions. In fact, log-space reduction is sometimes used in place of polynomial-time reduction in the definition of NP-completeness and PSPACE-completeness. A problem L_1 is NL-complete if L_1 ∈ NL and for all L ∈ NL, L ∝_log L_1.
The composition argument given above yields the following result.

(5.1) Theorem. If a problem is NL-complete, then it belongs to L if and only if L = NL.

The following theorem, due to Savitch (1970), shows that the complexity of the directed reachability problem can best be characterized by its space requirement in a nondeterministic computation.

(5.2) Theorem. The directed graph reachability problem is NL-complete.

Proof. A nondeterministic log-space algorithm for the directed graph reachability problem can guess the nodes in an (s,t)-path one at a time, and at each step verify that the graph contains the arc that connects the nodes in the guessed path. To show that the problem is NL-complete, we must give a log-space reduction from each problem in NL to the directed graph reachability problem. Let L be in NL and let M denote a nondeterministic log-space Turing machine that recognizes L. We can assume without loss of generality that M has a single accepting configuration (e.g., it erases its work tapes and enters a special accepting state before halting). For a given input x, we define a directed graph whose nodes correspond to the configurations of the nondeterministic Turing machine M with input x. Note that the number of these configurations is bounded by a polynomial in the size of x. Two configurations are joined by an arc if one can follow the other via a single transition. The input x is in L if and only if this graph contains a path from the starting configuration to the accepting configuration. □

In view of the above result, it was very surprising when Aleliunas, Karp, Lipton, Lovasz and Rackoff (1979) showed that the undirected graph reachability problem, the analogue of the directed graph reachability problem for undirected graphs, can be solved by a randomized Turing machine using logarithmic space. We define the class RL to be the log-space analogue of RP. A language L is in RL if there exists a randomized Turing machine RM that works in logarithmic space, such that each input x that RM accepts along any computation path is in L, and for every x ∈ L, the probability that RM accepts x is at least 1/2.

(5.3) Theorem. Undirected graph reachability is in RL.

Proof. Let G = (V,E) denote the input graph, s and t the two specified nodes, n the number of nodes and m the number of edges. The algorithm attempts to find the required path by following a random walk in the graph, starting from the node s. At each step, an edge incident to the current node is chosen uniformly at random and the walk is continued along this edge. The algorithm stops either if t is reached, or if 4nm steps have been made. We will show that if t and s are in the same connected component of the graph, then with probability at least 1/2, a random walk will reach t in at most 4nm steps. For the purpose of the proof, we may assume that the graph is connected.

We first claim that the random walk will use each edge of the graph in each direction with the same frequency, 1/(2m). Let p_v denote the frequency that the random walk is at the node v, i.e., the limit as N tends to infinity of the fraction of the first N steps that the random walk is at the node v. (The existence of this limit can be proved using the theory of Markov chains.) Let deg(v) denote the degree of the node v. The frequency that the random walk uses a particular edge to leave v is e_v = p_v/deg(v). Before the walk leaves v it has to enter v, and therefore e_v = (∑_{(w,v)∈E} e_w)/deg(v). This implies that e_v is the same for each node v ∈ V, and therefore e_v = 1/(2m). Consequently, each edge (and, in particular, an edge entering t) is going to be used, on the average, once every 2m steps. However, this does not mean that the expected time to reach t is at most 2m, since this is only true as a limit, and might not be true at the beginning of the walk. A slightly more careful argument (again using the theory of Markov chains) shows that for each edge (u,v), the expected number of steps for the walk to move from u to v is at most 2m; hence, following a path from s to t of fewer than n edges, the expected time to reach t is less than 2mn, and by Markov's inequality the walk reaches t within 4nm steps with probability at least 1/2. □
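A direct simulation of this random-walk algorithm takes only a few lines; the path graph at the end is just an example instance.

```python
import random

def random_walk_reach(adj, s, t):
    """Walk from s for at most 4*n*m steps; report whether t was reached."""
    n = len(adj)
    m = sum(len(a) for a in adj) // 2
    v = s
    for _ in range(4 * n * m):
        if v == t:
            return True
        v = random.choice(adj[v])    # move along a uniformly random incident edge
    return v == t

# Example: a path on 50 nodes; s = 0 and t = 49 are connected, so the walk
# succeeds with probability at least 1/2 (and in practice almost always).
n = 50
adj = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
print(random_walk_reach(adj, 0, n - 1))
```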
Why is this theorem about undirected graphs? One could conceivably run the same algorithm for directed graphs. The first problem with this approach is that the random walk might get stuck in a strongly connected component from which there is no path to the rest of the graph. This problem could be solved by repeatedly restarting the walk. However, there is a more fundamental aspect to this problem. Consider the graph that consists of an (s,t)-path and an arc from each node back to the source s. This graph is strongly connected, and yet the expected time to reach t from s is exponential in the length of the path. A log-space Turing machine cannot count an exponential number of steps, so it is not clear how one could correctly stop the computation if t is not reachable from s. There could be a way out; termination could be based on a random event that has an exponentially small chance of occurring. Since the directed graph reachability problem is NL-complete, this would prove that RL = NL. However, our definition of a randomized Turing machine does not allow such an algorithm. We have assumed that each branch of the computation tree uses the same number of random bits, and in an algorithm based on the previous idea, some branches of the computation will never stop.

One can think of the classes L and NL as lower-level analogues of P and NP. By studying the relationships of L, NL and co-NL, one hopes to better understand the relationship of deterministic and nondeterministic computation. It is this point of view that makes the following theorem of Immerman (1988) and Szelepcsenyi (1987) one of the biggest surprises in recent developments in complexity theory.

(5.4) Theorem. NL = co-NL.

The proof uses a definition of nondeterministically computing a function, similar to the notion of γ-reduction mentioned at the end of the subsection on NP-completeness. We say that a function f(x) can be computed in nondeterministic logarithmic space if there is a nondeterministic log-space Turing machine that, on input x, outputs the value f(x) on at least one branch of the computation, and on every other branch either stops without an output or also outputs f(x). If f(x) is a Boolean function, then we say that the language L defined by f(x) = 1 is decided in nondeterministic logarithmic space, which is equivalent to L being in NL ∩ co-NL. We will prove Theorem (5.4) by showing that the NL-complete directed graph reachability problem can be decided in nondeterministic logarithmic space.
Given a directed graph G = (V,A), a source node s and an integer k, let f(G,s,k) denote the number of nodes reachable from the source s by a path of length at most k.

(5.5) Lemma. The directed graph reachability problem is decidable in nondeterministic logarithmic space if and only if the function f(G,s,k) can be computed in nondeterministic logarithmic space.

Proof. To prove the only if direction, we use the fact that directed graph reachability is NL-complete. If it is decidable in nondeterministic logarithmic space, then so is the problem of recognizing whether there is a path of length at most k. To compute f(G,s,k), we use the assumed nondeterministic machine, for each node v, to decide if v is reachable from s by a path of length at most k. By counting the number of reachable nodes, we obtain f(G,s,k).

To prove the opposite direction, we use the following nondeterministic log-space computation. First compute f(G,s,n). Then for each node v, (nondeterministically) try to guess a path from s to v. Count the number of nodes for which a path has been found. If a path has been found to t, we accept the input. If f(G,s,n) nodes have been reached without finding a path to t, this proves that t is not reachable from s, so we reject. Finally, if the number of nodes that have been reached is less than f(G,s,n), then this is an incorrect branch of the computation, and the computation stops without producing an output. □

(5.6) Lemma. The function f(G,s,k) can be computed in nondeterministic logarithmic space.

Proof. The proof is by induction on k. The base case, computing f(G,s,0), is trivial. Assume that a log-space nondeterministic Turing machine has already computed the value f(G,s,k). The value f(G,s,k+1) can be computed by checking the nodes of G one at a time, and counting those for which a path of length at most k+1 exists. We shall decide if there exists such a path from s to a particular node v by using f(G,s,k) to decide if there is a path of length at most k to any of the predecessors of v; this can be done by using a variant of the algorithm given in the if part of Lemma (5.5). □

The theorem of Immerman and Szelepcsenyi implies that NSPACE(S(n)) = co-NSPACE(S(n)) for every S(n) ≥ log n. This type of upward inheritability result concerning either space or time complexity classes is usually proved by a so-called padding argument. Let L be a language in NSPACE(S(n)). Let the language L' consist of the words formed by appending sufficient $'s to a word of length n from L to make the length equal to 2^{S(n)} (where we assume that $ is not in the alphabet of L). It is easy to see that L' ∈ NL (= co-NL). The nondeterministic log-space Turing machine that proves that L' ∈ co-NL can be used to test non-membership in L rather than L'. (We simulate appending the correct number of $'s to an input by keeping a counter that maintains a record of the input head position, when the simulated head would like to read one of the imaginary $'s.) This shows that L ∈ co-NSPACE(S(n)).
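The recurrence behind Lemma (5.6) is easy to state in code. The sketch below computes the values f(G,s,k) deterministically with explicit sets (the whole point of the lemma, of course, is that the same counts can be produced nondeterministically in logarithmic space); the small digraph is just an example.

```python
def f_values(succ, s):
    """succ[v] lists the out-neighbours of v; return [f(G,s,0), f(G,s,1), ...]."""
    n = len(succ)
    reach = {s}                       # nodes reachable by paths of length <= k
    counts = [len(reach)]             # f(G, s, 0) = 1
    for _ in range(n - 1):
        reach = reach | {w for v in reach for w in succ[v]}
        counts.append(len(reach))     # f(G, s, k+1) from f(G, s, k)
    return counts

succ = [[1], [2], [0, 3], [], [0]]    # a digraph on 5 nodes; node 4 is not reachable
print(f_values(succ, 0))              # [1, 2, 3, 4, 4]
```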
The hardest problems in P

One important application of log-space computation was introduced by Cook (1973), who used log-space reducibility to introduce a notion of a hardest problem in P. A problem L_1 is P-complete if L_1 ∈ P and for all L ∈ P, L ∝_log L_1. The transitivity of the log-space reduction gives the following theorem.

(5.7) Theorem. If a problem is P-complete, then it belongs to L if and only if L = P.

Later in this section we shall see that P-completeness also provides evidence that a problem cannot be efficiently solved in parallel. This fact has greatly increased the interest in P-completeness, and a variety of problems have been shown to be P-complete. Perhaps the most natural example is the circuit value problem, which was proved P-complete by Ladner (1975b). Given a circuit (with and, or and not gates) with truth values assigned to its input gates, the circuit value problem is to compute the output of the circuit.

(5.8) Theorem. The circuit value problem is P-complete.

Proof. A computational model that uses circuits and is equivalent to Turing machines has already been discussed, and this equivalence suggests the theorem. The circuit value problem is clearly in P. Given a polynomial-time Turing machine M and an input x, we shall build a circuit that has value 1 if and only if x is in the language accepted by the Turing machine M. The construction is done in a way analogous to the proof of Cook's theorem. We encode the entire computation as a matrix, but instead of constructing a Boolean formula, we build a circuit that computes one row of this matrix from the preceding one to simulate the computation of the Turing machine. □
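For concreteness, here is a minimal evaluator for the circuit value problem; the gate encoding (a topologically ordered list of tuples) is just one convenient convention, not the one used in the completeness proof.

```python
def circuit_value(gates):
    """Evaluate a circuit given as a list of gates in topological order."""
    val = []
    for g in gates:
        kind = g[0]
        if kind == "in":
            val.append(g[1])                      # an input gate with a fixed value
        elif kind == "not":
            val.append(not val[g[1]])
        elif kind == "and":
            val.append(val[g[1]] and val[g[2]])
        else:                                     # "or"
            val.append(val[g[1]] or val[g[2]])
    return val[-1]                                # value of the output gate

# (x1 AND x2) OR (NOT x3) with x1 = 1, x2 = 0, x3 = 0  ->  True
gates = [("in", True), ("in", False), ("in", False),
         ("and", 0, 1), ("not", 2), ("or", 3, 4)]
print(circuit_value(gates))
```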
When proving other problems in P to be P-complete, we shall use the same strategy as was used for NP-completeness: merely reduce from problems that are already known to be P-complete. To facilitate later reductions, note that restricted versions of the circuit value problem are P-complete. For example, this is true for the restricted monotone circuit value problem, where the circuits use only and and or gates, and each node in the directed acyclic graph that defines the circuit has both indegree (fan-in) and outdegree (fan-out) at most 2. This can be shown by considering the inputs and their negations as separate input variables, and then converting the circuit into an equivalent one that computes the value of every gate and its negation separately.

Dobkin, Lipton and Reiss (1979) proved that each problem in P log-space reduces to the linear programming problem, and the celebrated result of Khachiyan (1979) showed that it is in P. Valiant gave a straightforward reduction from a restricted circuit value problem that uses linear constraints to trace the value computed by the circuit. The maximum flow problem is an important special case of the linear programming problem. In this problem, we are given a directed graph G = (V,A) with two specified nodes, the source s and the sink t, and a non-negative capacity u(a) assigned to each arc a ∈ A. A feasible flow is a vector f ∈ R^A that satisfies the capacity constraints, i.e., 0 ≤ f(a) ≤ u(a) for each arc a ∈ A, and the flow conservation constraints, i.e., the sum of the flow values on the arcs incident to a node v ≠ s, t is the same as the sum of the flow values on the arcs incident from v. The value of a flow is ∑_{a=(v,t)∈A} f(a) − ∑_{a=(t,v)∈A} f(a). The maximum flow problem is to find a feasible flow of maximum value. Goldschlager, Shaw and Staples (1982) showed that computing the parity of the maximum flow value is also P-complete. We give a log-space reduction from the restricted monotone circuit value problem for circuits in which the output gate is an or gate and all gates have fan-in and fan-out at most 2. Number the gates of the circuit so that wires of the circuit go from higher numbered gates to lower numbered gates, and the output gate is numbered 0. The nodes of the graph will correspond to the gates of the circuit, with two additional nodes, a source s and a sink t.

There is a collection of P-complete problems that are related to simple polynomial-time algorithms. The maximal stable set problem, in which the objective is to find a maximal (not maximum) stable set in an undirected graph, can clearly be solved by a simple greedy algorithm. Similarly, the depth-first search procedure greedily finds a depth-first search tree, which is a spanning tree of an undirected graph rooted at a given node r, such that for any edge (u,v) in the graph, the path from r to one of u and v passes through the other (see Aho, Hopcroft & Ullman (1974)). When using one of these procedures, we usually select the first available node in each step, and so we find a specific solution, the lexicographically-first one. Reif (1985) proved that the related problem of finding the lexicographically-first depth-first search tree is P-complete. Cook showed that finding the lexicographically-first maximal stable set is P-complete. These results might be surprising, since both of these problems are easy to solve in polynomial time. However, P-completeness also provides evidence that the problem is not solvable efficiently in parallel. Consequently, these completeness results support the intuition that these particular procedures are inherently sequential.

Parallel Computation

Parallel computation gives us the potential of curbing the effects of the intractability of certain problems by solving them with a large number of processors simultaneously. In studying parallel algorithms, we shall not be concerned with the precise number of parallel processors used, but rather with their order as a function of the input size. We say that a parallel algorithm using O(p(n)) processors achieves optimal speedup if it runs in O(t(n)) time and the best sequential algorithm known for solving the same problem runs in O(t(n)p(n)) time. Efficient algorithms that reach optimal (or near optimal) speedup with a significant number of processors have been found for many of the basic combinatorial problems. Another aspect of parallel computation is concerned with the inherent limitations of using many processors to speed up a computation. To be somewhat realistic, we shall only be interested in algorithms that use a polynomial number of processors. Consequently, we will focus on the possible speedup of polynomial-time sequential computation by parallel processing. The interested reader is referred to the survey of Karp and Ramachandran (1990) for more details and references.

First we define a model of parallel computation. Although many such models have been proposed, one that seems to be the most convenient for designing algorithms is the parallel random access machine (PRAM). The PRAM is the parallel analogue of the random access machine; it consists of a sequence of random access machines called processors, each with its own infinite local random access memory, in addition to an infinite shared random access memory where each memory cell can store any integer, and the input is stored in the shared memory. Each processor knows the input size and its identification number, although otherwise the processors are identical (i.e., they run the same program). Different variants of the basic PRAM model are distinguished by the manner in which they handle read and write conflicts.
In an EREW PRAM (exclusive-read, exclusive-write PRAM), for example, it is assumed that each cell of the shared memory is read from and written into by at most one processor at a time. At the other extreme, in a CRCW PRAM (concurrent-read, concurrent-write PRAM), each cell of the memory can be read from and written into by more than one processor at a time. If different processors attempt to write different things in the same cell, then the lowest numbered processor succeeds.

As a first example, consider the problem of finding a maximal stable set in parallel. Let us first review a sequential algorithm for the problem: select a node v and include it in the stable set, delete v and all of its neighbors from the graph, and repeat this procedure until all nodes have been deleted. Note that this algorithm requires n iterations for a path of length 2n. A similar approach can still be used for a parallel algorithm. To make the algorithm fast, one selects a stable set in each iteration (rather than a single node), where the set is deleted along with its neighborhood. The following simple way to choose a stable set is due to Luby. A processor is assigned to each node and each edge of the graph. For a graph of order n, the processor assigned to node v picks a random integer c(v) from 1 to n^4. Next, each processor assigned to an edge compares the values at the two nodes of the edge. The stable set selected consists of those nodes v for which c(v) is strictly larger than the values assigned to any of its neighbors. This algorithm clearly finds a maximal stable set, but it is less clear that the algorithm runs fast. It can be shown that each iteration is expected to remove a constant fraction of the edges, and consequently, the expected number of iterations is O(log n). The algorithm can be implemented on a randomized CRCW PRAM in O(log n) time (if we assume that a processor can choose a random number of size O(log n) in one step).

Luby gave an elegant version of a technique of Karp and Wigderson that is used to convert certain randomized algorithms into deterministic ones. The technique can be used if the analysis of the randomized algorithm depends only on pairwise, rather than fully, independent random choices. The idea is that pairwise independent random integers between 1 and n^4 can be chosen from a sample space of polynomial size. Each iteration can then be run for each point in the sample space simultaneously, and this ensures that a good sample point is used. This method can be used to convert the above randomized algorithm into a deterministic one.
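A sketch of Luby's selection rule, written sequentially (on a CRCW PRAM, each node and each edge would get its own processor); the random graph at the end is only a test instance.

```python
import random

def luby_mis(adj):
    n = len(adj)
    alive = set(range(n))
    mis = set()
    while alive:
        c = {v: random.randrange(1, n ** 4 + 1) for v in alive}
        # a node joins the stable set iff its value beats all of its live neighbours
        S = {v for v in alive
             if all(c[v] > c[w] for w in adj[v] if w in alive)}
        mis |= S
        alive -= S | {w for v in S for w in adj[v]}   # delete S and its neighbourhood
    return mis

# Example: a random graph of order 100 with edge probability 0.1.
random.seed(0)
n = 100
adj = [set() for _ in range(n)]
for u in range(n):
    for v in range(u + 1, n):
        if random.random() < 0.1:
            adj[u].add(v); adj[v].add(u)
print(len(luby_mis(adj)))
```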
A simple but important example of a deterministic parallel algorithm is for the directed graph reachability problem. One way to solve this problem is by repeatedly squaring the adjacency matrix of the graph. More precisely, after the i-th iteration, the variable a_{v,w} will indicate whether there exists a path of length at most 2^i from v to w. To update this information we assign a processor to every triplet of nodes. In the i-th iteration, the processor assigned to the triplet (v,u,w) checks whether there exists a path from v to w through u that has fewer than 2^{i-1} nodes both before and after u. This algorithm, when implemented on a CRCW PRAM, uses O(n^3) processors and runs in O(log n) time.
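A compact sequential rendering of this repeated-squaring idea; on a PRAM, each of the n^3 checks in a squaring step would be carried out by its own processor.

```python
import math

def reachability(adj):
    """adj[i][j] == 1 iff there is an arc from i to j; returns the reachability matrix."""
    n = len(adj)
    a = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    for _ in range(math.ceil(math.log2(max(n, 2)))):   # O(log n) squarings suffice
        a = [[any(a[v][u] and a[u][w] for u in range(n)) for w in range(n)]
             for v in range(n)]
    return a                    # a[v][w] is True iff w is reachable from v

g = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 0],
     [1, 0, 0, 0]]
r = reachability(g)
print(r[0][2], r[2][0])         # True False
```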
Parallel computation can also be considered in the framework of other computational models. By using many processors, one can evaluate a circuit in time proportional to its depth. Hence, the depth of a circuit is the natural analogue of parallel time. When using families of circuits as a model of computation, one must make certain uniformity assumptions. In this case, the usual uniformity assumption is log-space uniformity; that is, for inputs of size n, the circuit must be generated by a Turing machine using space O(log n). This allows us to define parallel complexity classes. For i ≥ 0, the class AC^i is defined to be the class of languages that are decidable by a family of logspace-uniform circuits with and, or, and not gates of depth O(log^i n) and size polynomial in n. The class NC^i is defined in the same way with the additional restriction that each gate in the circuit has constant fan-in. Clearly, NC^i ⊆ AC^i. Furthermore, an and gate or an or gate with a polynomial number of inputs can be computed by a constant fan-in circuit in depth O(log n). Consequently, AC^i ⊆ NC^{i+1}. Notice that any function that depends on all of its inputs requires depth Ω(log n) to be computed on a constant fan-in circuit, and an analogous bound holds in any reasonable model of parallel computation. Analogous to P, membership in the class NC = ∪_{i≥0} NC^i = ∪_{i≥0} AC^i has become accepted as the theoretical model of efficiently computable in parallel.

(5.9) Theorem. For i ≥ 1, AC^i is the class of languages recognizable by the above variant of the CRCW PRAM with a polynomial number of processors in O(log^i n) time.

Proof. We first show how to simulate a family of circuits by a PRAM. Once we have the appropriate circuit from the family, this is easy to do. The only difficulty lies in constructing the circuit. By our uniformity assumption, the circuit for inputs of size n can be generated by a Turing machine in space O(log n), and so the problem of deciding if (i,j) is an arc of the circuit is in L ⊆ NL. We have seen an O(log n)-time CRCW PRAM algorithm for the directed graph reachability problem. To finish this proof, we combine the proof of the NL-completeness of the directed graph reachability problem with the above algorithm, to prove that any language in NL can be recognized in time O(log n) by a CRCW PRAM. To prove the other direction we have to simulate a CRCW PRAM by a logspace-uniform family of circuits. The only difficulty is in simulating indirect addressing. It is a good exercise to show that indirect addressing can be simulated by unbounded fan-in circuits in constant depth. □

The maximal stable set algorithm of Luby discussed earlier uses a randomized version of the CRCW PRAM. In a randomized CRCW PRAM algorithm, every processor can choose a logarithmic-size random number in one step. We define the class RNC to consist of all languages L for which there exists a randomized CRCW PRAM algorithm that runs in polylogarithmic time using a polynomial number of processors and both accepts each word in L and rejects each word not in L with probability at least 2/3.

The restriction in the definition of NC^i is similar to the exclusive-write restriction in the PRAM model. However, this similarity is somewhat misleading. Hoover, Klawe and Pippenger (1984) proved that a constant fan-in circuit can be converted into an equivalent circuit with both constant fan-in and constant fan-out, while increasing both the size and depth by only a constant factor. Consequently, for i ≥ 2, any language in NC^i can be recognized by an EREW PRAM in O(log^i n) time using a polynomial number of processors.

Ruzzo (1981) has discovered a characterization of the class NC^i using alternating Turing machines. He proved that for i ≥ 2, the class NC^i is the class of languages that can be recognized by an alternating Turing machine simultaneously using O(log n) space and O(log^i n) time. Ruzzo gave a more sophisticated definition for uniformity of circuit families that is more restrictive for circuits of depth O(log n). Under his definition, the above equation also holds true for i = 1.

The proof of Theorem (5.9) also shows that NL ⊆ AC^1. As a consequence, a log-space reduction can be simulated efficiently in parallel, and therefore P-completeness provides evidence that a problem is not efficiently solvable in parallel.

(5.10) Theorem. For any P-complete problem L, L ∈ NC if and only if NC = P.

On the other hand, NC^1 ⊆ L. More generally, any logspace-uniform family of constant fan-in circuits of depth at least log n can be simulated by a Turing machine in space proportional to the depth of the circuit, that is, NC^i ⊆ DSPACE(log^i n). As a corollary, we obtain the following chain of containments:

NC^1 ⊆ L ⊆ NL ⊆ AC^1 ⊆ NC^2 ⊆ AC^2 ⊆ ... ⊆ NC ⊆ P.

We have seen that NC is contained in ∪_{i≥1} DSPACE(log^i n). By the analogue of Theorem (3.7) for space complexity, this implies that NC ≠ PSPACE. It is fairly easy to argue that NC^0 does not contain the or function, and therefore NC^0 ≠ AC^0. The only further separation result known is that AC^0 ≠ NC^1. This is a landmark result of Furst, Saxe and Sipser (1981) and Ajtai (1983), and will be discussed in the next section.
It is fairly easy to argue that NC^0 does not contain the or function, and therefore NC^0 ≠ AC^0. The only further separation result known is that AC^0 ≠ NC^1. This is a landmark result of Furst, Saxe and Sipser (1981) and Ajtai (1983), and will be discussed in the next section.

On a CRCW PRAM one can compute the or of n variables in one step. This suggests that the CRCW PRAM is not a reasonable model of parallel computation. In a more realistic model, computing any function that depends on all of its inputs requires Ω(log n) time. The addition problem takes two n-bit binary numbers as inputs (where each bit is stored in a separate memory location) and computes their (n+1)-bit sum. Surprisingly, even the addition problem can be solved in constant time on a CRCW PRAM. It is a good exercise to figure out how to add two numbers using constant-depth, unbounded fan-in circuits. Cook, Dwork and Reischuk (1986) have considered the limitations of the CREW PRAM (concurrent-read exclusive-write PRAM). They considered the or function and showed that it requires Ω(log n) time. This result is subtler than it seems, since with clever use of the concurrent-read option one can actually do better than log_2 n.

Next we shall give examples of simple parallel algorithms. First we consider the arithmetic operations. We have just mentioned that addition (and similarly subtraction) can be done in AC^0. The standard shift-and-add algorithm reduces multiplication to the addition of n numbers. Consequently, multiplication can be done in AC^1. In fact, multiplication is in NC^1. This fact can be established by noticing that the addition of three numbers can be reduced to the addition of two in NC^0 (by separately maintaining the carries). Beame, Cook and Hoover (1986) showed that integer division can also be computed by depth O(log n) constant fan-in circuits. However, the construction is not log-uniform. The best log-uniform family of circuits for integer division is due to Reif (1986) and has depth O(log n log log n). The evaluation of arithmetic expressions with addition and multiplication operations has been considered by Brent (1974), who gave an O(log n)-time parallel algorithm. Miller and Reif (1985) have developed a general tree contraction method that can manage the binary tree associated with the parenthesis structure of the arithmetic expression. The resulting algorithm runs in O(log n) time and uses O(n) processors.
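The three-to-two reduction mentioned above (often called carry-save addition) is entirely local: each output bit depends on only three input bits, so the step is computable in NC^0. A small sketch of this step follows, using Python integers merely as containers for bit strings; the function names and the driver are illustrative.

def carry_save(x, y, z):
    """Reduce the sum of three binary numbers to the sum of two.  The word
    s keeps the bitwise sum modulo 2 and the word c keeps the carries,
    shifted one position to the left; each output bit depends on just
    three input bits, so the step has constant depth."""
    s = x ^ y ^ z                                 # bitwise sum without carries
    c = ((x & y) | (y & z) | (x & z)) << 1        # carries, shifted left
    return s, c

def add_many(numbers):
    # Here the step is applied to one triple at a time; the circuit applies
    # it to disjoint triples in parallel, so the number of addends shrinks
    # by a factor of 2/3 per constant-depth round, giving O(log n) rounds
    # followed by a single ordinary addition.
    numbers = list(numbers)
    while len(numbers) > 2:
        s, c = carry_save(numbers[0], numbers[1], numbers[2])
        numbers = numbers[3:] + [s, c]
    return sum(numbers)

print(add_many([13, 7, 5, 9]) == 13 + 7 + 5 + 9)   # True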
Another important area of efficient parallel algorithms deals with the sorting problem. One can easily sort in O(log n) time on a CRCW PRAM, and hence sorting is in NC: compare all pairs in parallel and compute the rank of each element by summing the outcomes. However, this trivial algorithm uses O(n^2) processors. The first deterministic EREW PRAM algorithm that runs in O(log n) time using O(n) processors was given by Ajtai, Komlos and Szemeredi (1983). However, the constant in the running time is too large for the method to have any practical use. Cole (1988) later gave a simpler method that achieves the same asymptotic bounds. The Ajtai, Komlos, Szemeredi algorithm is based on their O(log n)-time sorting network.

Consider the following restricted version of the sorting problem. Suppose that n numbers are given in memory locations 1, ..., n. The basic operation for the algorithm is to compare the numbers in two memory locations and rearrange them so that the smaller number is in the location with the smaller index. A comparison network can compare and possibly swap the numbers in several disjoint pairs of memory locations at the same time. The network has to be oblivious, in the sense that the pairs to be compared cannot depend on the outcomes of previous comparisons. A comparison network is a sorting network if it is guaranteed to rearrange the elements into non-decreasing order.

The construction of the Ajtai, Komlos and Szemeredi sorting network is based on constant-degree expander graphs. A bipartite graph with color classes A and B such that |A| = |B| is an ε-expander if for any S ⊆ A satisfying |S| ≤ |A|/2, the number of neighbors of S in B is at least (1 + ε)|S|. For every constant ε < 1 there exists a constant d such that the union of d randomly chosen perfect matchings is an ε-expander with non-zero probability. This establishes that constant-degree ε-expanders with ε > 0 exist. The first deterministic construction of constant-degree expanders was given by Margulis (1973). Since then, several simpler constructions have been given that yield better parameters.

A classical (and much more practical) sorting network, which takes O(log^2 n) time, is the sorting network of Batcher (1968). It repeatedly uses an O(log n)-time merging network, that is, a comparison network guaranteed to rearrange the elements into non-decreasing order if the first n/2 and the last n/2 elements of the input are separately in non-decreasing order. The Batcher merging network can be defined recursively; a sketch in code appears below. Assume for simplicity that n is a power of 2. To merge two lists of length n/2, we first merge, separately, the lists of length n/4 consisting of the elements in the odd-numbered memory cells of the two lists and the lists consisting of the elements in the even-numbered cells. We then compare and possibly swap the pairs in locations 2i and 2i + 1, for i = 1, ..., n/2 − 1, in parallel. It is not difficult to see that this defines an O(log n)-time merging network.
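A sequential rendering of this recursive merge follows (the function names are illustrative); the comparisons performed at one stage of the recursion involve disjoint pairs, so the network executes each stage in a single parallel step, for O(log n) steps in all.

def odd_even_merge(a, b):
    """Batcher's merge of two sorted lists of equal power-of-two length:
    merge the odd-numbered and the even-numbered cells separately, then
    compare-swap adjacent pairs."""
    n = len(a)
    if n == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    c = odd_even_merge(a[0::2], b[0::2])    # odd-numbered cells (1st, 3rd, ...)
    d = odd_even_merge(a[1::2], b[1::2])    # even-numbered cells (2nd, 4th, ...)
    out = [c[0]]
    for i in range(1, n):
        out.extend([min(c[i], d[i - 1]), max(c[i], d[i - 1])])
    out.append(d[n - 1])
    return out

def batcher_sort(x):
    # merge sort built from odd-even merges; unrolling the recursion gives
    # Batcher's sorting network of depth O(log^2 n)
    if len(x) == 1:
        return x
    m = len(x) // 2
    return odd_even_merge(batcher_sort(x[:m]), batcher_sort(x[m:]))

print(batcher_sort([5, 2, 7, 1, 8, 3, 6, 4]))   # [1, 2, 3, 4, 5, 6, 7, 8]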
Interestingly, many of the simple parallel graph algorithms are based on matrix operations. The parallel directed graph reachability algorithm presented earlier was in fact based on Boolean matrix multiplication. We shall discuss matrix operations before the more combinatorial elementary graph problems. General matrix multiplication can easily be done on a CRCW PRAM in O(log n) time using O(n^3) processors. In fact, using the ideas from the more efficient matrix multiplication techniques, one can save somewhat on the number of processors used. Csanky (1976) has given an NC algorithm to compute the determinant of a matrix in O(log^2 n) time. The method actually computes the characteristic polynomial of a matrix A (i.e., the polynomial det(A − xI), whose constant term is det A). In fact, Csanky's method can be used to compute the characteristic polynomial of a matrix over any field of characteristic 0, where the field operations are assumed to take unit time. Berkowitz (1984) and Chistov have independently extended this result to matrices over arbitrary fields. Over fields of characteristic 0, the rank of A can be read off from the last non-zero term of the characteristic polynomial of AA^T. This is not true over arbitrary fields. Recently, deterministic parallel algorithms to compute the rank of a matrix over arbitrary fields have been independently discovered by Chistov and Mulmuley (1987), improving on the randomized algorithm of Borodin, von zur Gathen and Hopcroft (1982). As a corollary, we get a parallel algorithm to solve systems of linear equations.

Consider the problem of testing if a graph is planar. Ja'Ja' and Simon (1982) have observed that the planarity of a graph can be tested efficiently in parallel using Tutte's characterization, which states that a straight-line embedding of a 3-connected planar graph can be obtained by solving a set of linear equations. One drawback of this approach is that Tutte's theorem only works for 3-connected graphs. Ja'Ja' and Simon extended this approach to check whether an arbitrary graph is planar, but the algorithm only constructs the plane embedding for 3-connected graphs. Miller and Reif (1985) have used the tree contraction method mentioned earlier to extend this algorithm to arbitrary graphs. Klein and Reif (1988) have recently given an efficient planarity testing algorithm that does not rely on matrix inversion and is also more efficient in terms of the number of processors used.
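The linear system in Tutte's characterization is easy to state: pin the vertices of one face on a convex polygon and require every other vertex to lie at the average of its neighbors. The sketch below (graph, face, and function names are our own illustrations) uses numpy's sequential solver where the parallel algorithm would use the NC linear-system solver mentioned above.

import numpy as np

def tutte_embedding(n, edges, outer_face):
    """Barycentric (straight-line) embedding of a 3-connected planar graph:
    fix `outer_face` on a regular polygon and place each remaining vertex
    at the average of its neighbors, i.e., solve one linear system per
    coordinate."""
    neighbors = {v: set() for v in range(n)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    pos = np.zeros((n, 2))
    fixed = set(outer_face)
    for k, v in enumerate(outer_face):              # pin the outer face
        angle = 2 * np.pi * k / len(outer_face)
        pos[v] = [np.cos(angle), np.sin(angle)]
    free = [v for v in range(n) if v not in fixed]
    index = {v: i for i, v in enumerate(free)}
    A = np.zeros((len(free), len(free)))
    b = np.zeros((len(free), 2))
    for v in free:
        A[index[v], index[v]] = len(neighbors[v])
        for u in neighbors[v]:
            if u in fixed:
                b[index[v]] += pos[u]
            else:
                A[index[v], index[u]] -= 1.0
    pos[free] = np.linalg.solve(A, b)
    return pos

# the octahedron is 3-connected and planar; take (0, 1, 2) as the outer face
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (0, 3), (0, 4), (1, 4), (1, 5), (2, 5), (2, 3)]
print(tutte_embedding(6, edges, (0, 1, 2)))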
We next turn our attention to elementary graph searching techniques. A parallel algorithm for the directed graph reachability problem based on Boolean matrix multiplication has been discussed earlier. This algorithm can be used to construct a breadth-first search tree of the graph. It is an important open problem to find a polylogarithmic-time parallel algorithm for breadth-first search that uses fewer processors than are needed for matrix multiplication. The undirected graph reachability problem can be solved more efficiently. Shiloach and Vishkin (1982) have given an implementation of an algorithm by Hirschberg, Chandra and Sarwate (1979) that takes O(log n) time on a CRCW PRAM using O(m + n) processors. Although this algorithm is fairly efficient, it is still not optimal, in the sense that the product of the time and the number of processors exceeds the running time of the best sequential algorithm known for the problem. Gazit (1986) gave an O(log n)-time randomized CRCW PRAM algorithm that uses only O((m + n)/log n) processors. No deterministic O(log n)-time optimal algorithm is known. These algorithms can be used to construct a spanning forest of the graph, but not necessarily a breadth-first search tree.

Another elementary search technique is depth-first search. No efficient deterministic parallel algorithm is known for finding a depth-first search tree. Aggarwal and Anderson (1988) have given a randomized parallel algorithm for finding a depth-first search tree (first for undirected graphs; later, with Kao (1989), they extended the result to directed graphs). The algorithm is based on a randomized algorithm to find a perfect matching in parallel that, in turn, is based on matrix inversion. As a consequence, the algorithm uses a large number of processors and is not efficient in practice.

Lovasz has given a parallel algorithm to find an ear-decomposition of a graph, and has used it to find a minimum cost branching of a directed graph. (A branching is a directed tree in which each node except a specified root has indegree 1.) In undirected graphs, an ear-decomposition can be found more efficiently. Lovasz's algorithm for undirected graphs runs in O(log n) time and uses a linear number of processors. The ear-decomposition algorithm does not in general give an open ear-decomposition (even if one exists). Maon, Schieber and Vishkin (1986) modified the algorithm to find an open ear-decomposition with the same complexity. Ear-decomposition techniques can be used to check whether a graph is 2-edge-connected or 2-node-connected, though a parallel algorithm using O(log n) time and a linear number of processors had been given earlier by Tarjan and Vishkin (1985). Miller and Ramachandran (1987) used the ear-decomposition technique to design an NC algorithm for finding 3-connected components that uses only a linear number of processors. The first NC algorithm to decide whether a graph is 3-connected is due to Ja'Ja' and Simon (1982); however, they were unable to find the 3-connected components of a graph. Miller and Reif (1985) used tree contraction methods to find 3-connected components in parallel.

One of the most important open problems in parallel computation is to give an efficient parallel algorithm that finds a maximum matching in a graph. Recall the polynomial-time randomized algorithm of Lovasz (1979) that was discussed in introducing the idea of randomized computation. The algorithm decides whether a graph has a perfect matching by computing the determinant of a single randomly chosen matrix. Determinants can be computed efficiently in parallel. This establishes that deciding whether a graph has a perfect matching is in RNC. The same idea shows that the exact matching problem is in RNC. This is an interesting case of a problem in RNC that is not known to be in P.

It is important to notice that decision and search problems are not equivalent for parallel computation. There does not appear to be a way to turn the RNC algorithm of Lovasz that decides whether a perfect matching exists into an RNC algorithm that finds a perfect matching if one exists. The difficulty seems to lie in identifying a particular matching. The graph might have several matchings, and the different processors working in parallel have to coordinate which perfect matching to choose. An RNC algorithm to find a perfect matching in a graph (if one exists) has been given by Karp, Upfal and Wigderson (1986). Later, Mulmuley, Vazirani and Vazirani (1987) gave a very elegant and more efficient algorithm. Here we briefly outline the latter algorithm in the case of bipartite graphs, and sketch it in code below. For a bipartite graph G = (U ∪ W, E) with n nodes in each of the color classes U and W, independently assign to each edge e an integral cost c(e) selected uniformly at random between 1 and 2|E|. The algorithm is based on a lemma which states that, with probability at least 1/2, the minimum cost perfect matching is unique. The algorithm proceeds as follows: define an n by n matrix A with entries a_ij = 2^{c(e)} if e = (u_i, w_j) ∈ E and 0 otherwise. The 2^c terms in the expansion of the determinant of the matrix A correspond to matchings of cost c. Now assume that the minimum cost perfect matching is unique, and has cost c. The determinant expansion has exactly one term that is not divisible by 2^{c+1}, and consequently det A ≠ 0. Let b_ij denote the (i, j)th cofactor of A. An edge e = (u_i, w_j) belongs to the unique minimum cost matching if and only if the highest power of 2 dividing b_ij is 2^{c−c(e)}.
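A small sequential sketch of one trial of this scheme follows. The function names and the exact-determinant routine are purely illustrative; the parallel algorithm computes the determinant and all cofactors with an NC determinant algorithm and performs every cofactor test in parallel.

import random

def det(M):
    # exact integer determinant by cofactor expansion (adequate only for
    # the tiny demonstration graphs used here)
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def val2(x):
    # exponent of the largest power of 2 dividing the nonzero integer x
    v = 0
    while x % 2 == 0:
        x //= 2
        v += 1
    return v

def mvv_bipartite_matching(n, edges):
    """One trial for a bipartite graph with parts {0..n-1} and {0..n-1}.
    Returns a perfect matching or None; when a perfect matching exists,
    a trial succeeds with probability at least 1/2 (the isolation lemma)."""
    cost = {e: random.randint(1, 2 * len(edges)) for e in edges}
    A = [[0] * n for _ in range(n)]
    for (i, j), c in cost.items():
        A[i][j] = 2 ** c
    d = det(A)
    if d == 0:
        return None
    c_star = val2(d)              # cost of the (assumed unique) minimum matching
    matching = []
    for (i, j) in edges:
        minor = [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]
        cof = (-1) ** (i + j) * det(minor)       # the (i, j)th cofactor
        if cof != 0 and val2(cof) == c_star - cost[(i, j)]:
            matching.append((i, j))
    # accept only if the selected edges really form a perfect matching
    if len(matching) != n:
        return None
    if len({i for i, _ in matching}) != n or len({j for _, j in matching}) != n:
        return None
    return matching

edges = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2)]
matching = None
while matching is None:           # repeat trials until isolation succeeds
    matching = mvv_bipartite_matching(3, edges)
print(matching)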
As a corollary of the RNC perfect matching algorithm, one immediately has an RNC algorithm to find a maximum matching (or to find a matching of a given size). The resulting algorithm will output a matching that is a maximum-size matching with high probability. Karloff (1986) has noticed that the Gallai-Edmonds structure theorem (see Chapter 3) implies that a maximum matching algorithm can be used to produce a minimum odd-set cover. This gives a Las Vegas-type randomized maximum matching algorithm, that is, an algorithm that is likely to find a matching proven to be maximum. Further consequences of the perfect matching algorithm are RNC algorithms for finding a minimum cost perfect matching if the costs are given in unary, and for the maximum flow problem with capacities given in unary. Recall that the maximum flow problem with capacities given in binary is P-complete. It is an interesting open problem whether the minimum cost matching problem (with costs given in binary) is in RNC.

There are several important problems that are known to be in P, but are not known to be in NC (or even RNC), and yet are not known to be P-complete. In addition to the ones already mentioned, perhaps the most prominent is the problem of computing the greatest common divisor of two integers. A fast parallel algorithm for this would have implications for a variety of number-theoretic problems, including primality testing, which have so far resisted attempts to classify their parallel complexity. One major element of Luks' (1982) polynomial-time algorithm to test whether two constant-degree graphs are isomorphic is an algorithm of Furst, Hopcroft and Luks (1980) to compute the order of a permutation group, and a recent result of Babai, Luks and Seress (1988) provides a parallel analogue of that result. Nonetheless, the problem of providing an efficient parallel algorithm for testing constant-degree graph isomorphism remains an important open problem.

Attacks on the P versus NP Problem

In previous sections, we have discussed techniques for providing evidence of the intractability of concrete problems (e.g., NP-completeness). However, this evidence is not a proof, as long as the fundamental problems remain open. In this section, we will be concerned with approaches to settling the P versus NP question.

First we consider the generalization of these unresolved questions to oracle Turing machines. For example, we will consider questions such as whether P^A = NP^A for an oracle A. These problems were introduced in order to gain insight into the techniques that might be used to settle the fundamental open problems.
Recent results, most importantly those of Furst, Saxe and Sipser (1981) and Razborov (1985a), have provided a major breakthrough in the development of techniques for proving lower bounds on the computational complexity of certain problems. These methods have succeeded by considering computational models with serious limitations on some of the resources. The main lines of such research are for polynomial-size, constant-depth circuits and polynomial-size monotone circuits (i.e., circuits without negations that use only and and or gates). The techniques developed for these results are combinatorial in nature.

We will also discuss three combinatorial techniques. We discuss the classical crossing sequence technique, where the lower bound on time is based on the amount of information that can pass from one end of the tape to the other. Next we discuss a result of Hopcroft, Paul and Valiant (1977) that relates the time and space complexity of a problem. We conclude with the landmark result of Paul, Pippenger, Szemeredi and Trotter (1983), which proves that DTIME(n) ≠ NTIME(n).

Relativized complexity classes

In the previous sections we have introduced the notion of an oracle Turing machine and complexity classes defined by such Turing machines. Several of the theorems and proof techniques discussed in the previous sections easily extend to these complexity classes (i.e., they relativize). By generalizing diagonalization and simulation results, we get theorems such as AP^A = PSPACE^A = NPSPACE^A for every oracle A. The main result concerning oracle complexity classes, due to Baker, Gill and Solovay (1975), is that the answer to the relativized versions of the fundamental open problems depends on the oracle.

(6.1) Theorem. There exist languages A and B such that P^A ≠ NP^A ≠ co-NP^A, and P^B = NP^B = co-NP^B.

The existence of an appropriate language A for the first alternative is proved by diagonalization. Any PSPACE-complete language B will provide an example of the second alternative. As a corollary, we can assert that techniques that relativize cannot settle the P versus NP problem.

One might wonder which of the two alternatives provided by the theorem of Baker, Gill, and Solovay is more typical. Recall the definition of a random oracle. We say that a statement holds for a random oracle if the statement holds with probability 1 for a randomly chosen oracle. The Kolmogorov 0-1 law of probability theory states that if an event A is determined by independent events B_1, B_2, ..., and A is independent of any event that is a finite combination of the events B_i, then the probability of A is either 0 or 1. Applying this to the event A that P^A = NP^A and the events B_i that the ith word is in the language A, we see that the probability of P^A = NP^A for a random oracle A is either 0 or 1. Bennett and Gill (1981) have provided the final answer to this question.

(6.2) Theorem. P^A ≠ NP^A and NP^A ≠ co-NP^A for a random oracle A.

Furst, Saxe and Sipser (1981) discovered the following connection between constant-depth circuit lower bounds and separating relativized complexity classes. This result provides further motivation for studying restricted models of computation. The parity function indicates the sum modulo 2 of the bits of the input.
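As a depth-2 warm-up for the lower bounds assumed in the next theorem: any term of a DNF for parity must fix all n variables (otherwise flipping a free variable would change the parity without falsifying the term), so a DNF needs a separate term for each of the 2^{n-1} odd-weight inputs. The small sketch below, which is only an illustration and not part of the proof, constructs and checks this canonical DNF.

from itertools import product

def parity(bits):
    return sum(bits) % 2

def parity_dnf(n):
    """The canonical DNF for parity on n bits: one full term (minterm) for
    each odd-weight assignment.  No DNF can do with fewer than 2^(n-1)
    terms; the theorems below require such bounds for every constant depth."""
    return [x for x in product([0, 1], repeat=n) if parity(x) == 1]

def eval_dnf(terms, x):
    # a full minterm is satisfied exactly by its own assignment
    return any(tuple(x) == t for t in terms)

n = 4
terms = parity_dnf(n)
print(len(terms))                                    # 8 == 2**(n-1)
print(all(eval_dnf(terms, x) == parity(x)
          for x in product([0, 1], repeat=n)))       # True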
(6.3) Theorem. If, for every constant c, any family of constant-depth circuits computing the parity function has Ω(n^{log^c n}) gates, then there exists an oracle A that separates the polynomial-time hierarchy from PSPACE; that is, ∪_{k≥1} Σ_k^{P,A} ≠ PSPACE^A.

The idea of the proof is as follows. For any oracle A, we define the language L(A) = {1^n : A contains an odd number of strings of length n}. L(A) is in PSPACE^A for any oracle A. Using diagonalization and the lower bound assumed in the theorem, one can construct an oracle A such that L(A) is not in ∪_{k≥1} Σ_k^{P,A}. First one shows that it is sufficient to consider alternating Turing machines in which every branch of the computation has a single oracle question at the end of the branch. The computation tree of such an alternating Turing machine with k levels of alternation corresponds to a depth-k circuit where the oracle answers are the inputs.

Another related problem is to separate the finite levels of the polynomial-time hierarchy using oracles. Baker and Selman (1979) proved the existence of an oracle A such that Σ_2^{P,A} ≠ Π_2^{P,A} (and consequently, Σ_2^{P,A} ≠ Σ_3^{P,A}). Sipser (1983a) showed a result similar to Theorem (6.3) for the finite levels of the polynomial-time hierarchy. It replaces parity by a certain function F_k that is, in some sense, a generic function computable in depth k.

(6.4) Theorem. If, for every k and c, any family of circuits of depth k−1 computing F_k has Ω(n^{log^c n}) gates, then for every k there exists an oracle A_k such that Σ_k^{P,A_k} ≠ Σ_{k+1}^{P,A_k}.

The circuit complexity results that are needed in these theorems have been proved in a series of papers by Furst, Saxe and Sipser (1981), Sipser (1983a), Yao (1985) and Hastad (1986). These results will be discussed in the next subsection. Based on stronger circuit complexity results, Babai (private communication) and Cai (1986) separated PSPACE from the finite levels of the hierarchy by random oracles. The question whether random oracles separate the finite levels of the polynomial-time hierarchy remains open. Another open question concerning random oracles is whether P^A = NP^A ∩ co-NP^A for a random oracle A.

Similar oracle separation results have also been proved for classes of languages defined by randomized proof systems. For example, Santha (1989) provided an oracle B for which MA^B ≠ AM^B. Aiello, Goldwasser and Hastad (1986) provided an oracle under which IP is not contained in the polynomial-time hierarchy. In contrast with the speedup theorem of Babai and Moran mentioned earlier, Aiello, Goldwasser and Hastad (1986) proved that for any functions f and g such that g(n) = o(f(n)), there exists an oracle B such that AM^B(f(n)) ≠ AM^B(g(n)). (In considering the results of Lund, Fortnow, Karloff and Nisan, and of Shamir, it is significant to note that these techniques do not relativize, as is evident from the oracle results known for these classes.)

Relating circuit complexity to Turing machine complexity

Among the three models of computation introduced in Section 2, circuits have the simplest, most combinatorial structure, and therefore seem to be most amenable to proving lower bounds. This is especially true if we consider nonuniform families of circuits; that is, we forget about the uniformity requirement. This simple combinatorial structure has made it possible to use combinatorial techniques for proving lower bounds. Boppana and Sipser (1990) give an excellent survey of circuit complexity lower bounds.
To measure the amount of nonuniformity in a family of circuits, we introduce a nonuniform version of the Turing machine. A nonuniform Turing machine, in addition to the usual features of a deterministic Turing machine, has an associated sequence of advice strings a_1, a_2, ... (to be written onto an additional advice tape). The nonuniform Turing machine accepts an input of length n if, with a_n written on its advice tape, the Turing machine accepts the input. Note that there is an important difference between the acceptance rule for a nonuniform and a nondeterministic Turing machine. A nonuniform Turing machine has to use the same advice for every input of length n. On the other hand, the definition does not restrict the behavior of a nonuniform Turing machine with incorrect advice. We can measure the nonuniformity of a Turing machine by the length of the advice. A language L is in P/log if and only if there exists a nonuniform Turing machine that accepts L, runs in polynomial time, and whose family of advice strings {a_n} has length O(log n). Similarly, P/poly is the set of languages that can be accepted by a polynomial-time nonuniform Turing machine with advice strings of polynomial length. It is easy to see that a language L can be recognized by a family of polynomial-size (nonuniform) circuits if and only if it is in P/poly. (The right advice is a description of the circuit.)

Karp and Lipton (1982) used simple self-reducibility properties to prove the following theorem, which shows that uniform and nonuniform complexity classes are not too far from each other.

(6.5) Theorem. The following relationships exist between nonuniform and uniform complexity classes: (1) if NP ⊆ P/log then P = NP; (2) if NP ⊆ P/poly then Σ_2^P = Π_2^P; (3) if PSPACE ⊆ P/poly then PSPACE = Σ_2^P.

The following theorem, which is based on an idea of Adleman (1978), shows a natural way in which nonuniform families of circuits are more powerful than their uniform analogues.

(6.6) Theorem. BPP ⊆ P/poly.

Proof. Consider a randomized Turing machine RM that recognizes a language L. If we run the algorithm k times and take the majority decision, the error probability decreases exponentially in k. Therefore, we can assume without loss of generality that RM accepts every word x ∈ L with probability 1 − 2^{−n−1} and rejects every word x ∉ L with probability 1 − 2^{−n−1}.
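The amplification step can be sketched numerically; in the sketch below a biased coin stands in for one run of the randomized machine on a fixed input, and the error rate 1/3 and trial counts are illustrative choices only.

import random

def majority_error(error, k, trials=100000):
    """Estimate the probability that the k-fold majority vote of a
    randomized test, wrong with probability `error` on each independent
    run, is itself wrong; by the Chernoff bound this decays exponentially
    in k."""
    wrong = 0
    for _ in range(trials):
        mistakes = sum(random.random() < error for _ in range(k))
        if mistakes > k // 2:          # the majority vote is wrong
            wrong += 1
    return wrong / trials

for k in (1, 5, 15, 45):
    print(k, majority_error(1 / 3, k))
# the estimated error drops roughly geometrically in k, which is what lets
# the proof assume error probability 2^(-n-1) at only a polynomial cost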