SCHOOL OF OPERATIONS RESEARCH
AND INDUSTRIAL ENGINEERING
COLLEGE OF ENGINEERING
CORNELL UNIVERSITY
ITHACA, NY 14853-7501
August 1990
COMPUTATIONAL COMPLEXITY
by
David B. Shmoys¹ & Éva Tardos²
This paper is a preliminary version of a chapter to appear in The Handbook of Combinatorics, edited
by R.L. Graham, M. Grötschel and L. Lovász (North-Holland, Amsterdam).
¹Research was supported in part by an NSF Presidential Young Investigator award CCR-89-96272
with matching support from IBM, UPS and Sun, and by Air Force Contract AFOSR-86-0078.
²Research was supported in part by Air Force Contract AFOSR-86-0078.
Table of Contents
1. Introduction
2. Complexity of Computational Problems
• Computational problems
• Models of computation
• Computational resources and complexity
• Complexity classes
• Other theoretical models of efficiency
3. Shades of Intractability
• Undecidability
• Evidence of intractability: NP-completeness
• The polynomial-time hierarchy
• Evidence of intractability: #P-completeness
• Evidence of intractability: PSPACE-completeness
• Proof of intractability
• Extensions of NP: short proofs via randomization
4. Living with Intractability
• The complexity of approximate solutions
• Probabilistic analysis of algorithms
• Cryptography
5. Inside P
• Logarithmic space
• The hardest problems in P
• Parallel computation
6. Attacks on the P versus NP Problem
• Relativized complexity classes
• Relating circuit complexity to Turing machine complexity
• Constant-depth circuits
• Monotone circuit complexity
• Machine-based complexity classes
1. Introduction
The interdependence of combinatorics and complexity theory is illustrated well by a celebrated
result of Ajtai, Komlós and Szemerédi (1983), who provide an efficient parallel algorithm to sort
numbers by using certain bipartite graphs, known as expanders. There is a beautiful probabilistic
proof that these graphs exist: a graph is selected according to a certain probability distribution,
and then it is shown that the graph selected is an expander with non-zero probability. This proof
merely shows the existence of such a graph; it does not show how to construct one.
In order for this existential theorem to be useful in an algorithmic setting, we must be able to
construct an expander, which raises the following question: how do we compute the description of
an expander of a specified order n? There is a trivial way to do the computation, which suggests
that we have asked the wrong question: for each graph in the sample space, check by brute force
whether it is an expander. Unfortunately, the sample space is large, containing (n!)² graphs, and
to check each graph for the expander property is itself a nontrivial process, so for even modest
values of n, this procedure is wildly impractical. Thus, a more relevant question is one of efficient
computation: does there exist an efficient way to compute the description of an expander of a
specified order? We shall see that complexity theory provides the necessary machinery to make
this statement more precise.
As the previous example makes clear, computational issues have given an additional dimension
to combinatorics. The probabilistic proof that expanders exist becomes just a first step in the
search for the additional structure needed to provide an efficient construction. Indeed, one need
only examine the table of contents of this volume to see the impact of computational issues on the
present direction of combinatorics. Furthermore, complexity theory has often raised combinatorial
questions that were previously unsolved, and thus has sparked research in combinatorics.
Complexity theory has also helped us to understand the significance of certain combinatorial theorems.
For example, consider the following two well-known theorems due to Hall and Camion,
respectively: a bipartite graph has a perfect matching if and only if each subset of nodes in one
part is adjacent to at least as many nodes in the other; a matrix with 0, +1, and −1 entries is totally
unimodular if and only if each nonsingular submatrix has a row with an odd number of nonzero
components. Each theorem provides an alternate characterization of the property in question, but,
as Edmonds observed, there is a fundamental way in which only the first is a "good" characterization.
The perfect matching itself is a concise proof that a particular graph has one and Hall's
theorem provides a way to give an efficiently verifiable proof that a graph does not have one. On
the other hand, both the definition and Camion's characterization of total unimodularity show only
how to give a concise proof that a matrix is not totally unimodular. Just as complexity theory has
provided a means of discussing efficient algorithms, it has also given an analogous formal notion of
a good characterization.
Complexity theory has been a rapidly expanding field over the past quarter century, and it
would be impossible to include all of its advances. Much of complexity theory is directed towards
the formulation of computational models that explain differences between the resources required to
solve particular problems, and we will focus on those closely tied to combinatorial examples. For
example, completeness results suggest that the problem of deciding who has a winning strategy in
certain combinatorial games is computationally even more intractable than deciding if a graph is
Hamiltonian.
Parallel computers are rapidly becoming a reality, and traditional algorithmic techniques often
appear ill-suited to take advantage of their power. As a result, there has been a great deal of
work recently in studying algorithmic techniques for solving combinatorial problems in theoretical
models of parallel computers.
Complexity theory has provided evidence that a wide variety of problems are computationally
intractable, but evidence is not nearly as satisfying as proof. Although the fundamental problem of
providing this proof remains open, significant steps toward settling this question have been made.
Many of these results are based on beautiful combinatorial methods, and it is quite possible that
combinatorial methods will play a role in the final solution.
Computational complexity is a theoretical discipline, and the results obtained do not always
correspond precisely with practical considerations. When used to sort fewer than a million numbers,
the algorithm mentioned above is slower in practice than algorithms that are theoretically inferior. Nonetheless,
it is the practical importance of understanding the power of computation that forms the driving
force behind complexity theory.
2. Complexity of Computational Problems
In this section, we will outline the essential machinery used to give formal meaning to the
complexity of computational problems. This involves describing what is precisely meant by a
computational problem, setting up a mathematical model of computation, and then formalizing
the notion of the computational resources required for a problem with respect to that model.
Unfortunately, there is no one standardized specification in which to discuss these questions. In
view of this, one might become concerned that it is possible to prove apparently contradictory results
under different specifications of these notions. Therefore, if this theory is to produce meaningful
results, it is essential that the definitions be robust enough so that theorems proved with respect to
them apply equally to all reasonable variants of this framework. Indeed, the particular definitions
that we will rely on will be accompanied by evidence that these notions are sufficiently flexible.
Suppose that there is a directed graph G and you wish to know whether it is Hamiltonian, that
is, whether there is a directed circuit that traverses each of its nodes exactly once. If you handed
G to your favorite graph theorist, he might cite an appropriate theorem to obtain the correct
answer, "yes" or "no". The graph G is a single instance of the Hamiltonian circuit problem.
Thus, for any decision problem, of which the Hamiltonian circuit problem is an example, we
will be interested in the set of all instances for which the answer is "yes", which will be called the
associated language. The computational problem is, given any instance, to decide whether or not
this instance is in the language.
Making this more formal, we encode all of the instances in some arbitrary, but standardized
way as strings of 0's and 1's. Consider the example above, where the instances to be encoded are
directed graphs. There are two natural ways to represent the arcs of G, a directed graph of order
n. In the adjacency matrix form, we simply give an n × n 0,1 matrix A = (aij) where aij = 1 if
and only if (i, j) is an arc; the matrix can be represented by a string of n² 0's and 1's, by simply
listing the elements of each row, one after another. In the adjacency list form, we give a list of
the nodes incident from node i, for each i; this can be encoded by using the binary representation
for the names of the nodes from 1 to n and some additional symbols, such as '-' and '/', and
representing the list for i by giving the encoding for i and the nodes incident from it separated by
'-' symbols, and then concatenating the lists for all nodes, separated by '/' symbols. Notice that
it is not necessary to expand the alphabet from {0, 1} since any larger alphabet could be encoded
using just those two symbols. Notice that the language for a given decision problem depends on the
particular encoding selected, and when we refer to a decision problem, we will refer to the language
L associated with some natural encoding.
Observe that this first way of encoding the graph may be less compact, since this encoding length
is n2, whereas the second might be significantly less if there are few arcs in G. Note, however, that
this difference is limited in that the length of the input in one format is at most the square of the
length in the other.
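As a concrete illustration of the two encodings, here is a small sketch in Python (ours, not part of the original text; the separator conventions follow the description above):

```python
def adjacency_matrix_encoding(n, arcs):
    """Encode a directed graph of order n as a string of n^2 0's and 1's,
    listing the rows of the adjacency matrix one after another."""
    return "".join(
        "1" if (i, j) in arcs else "0"
        for i in range(1, n + 1)
        for j in range(1, n + 1)
    )

def adjacency_list_encoding(n, arcs):
    """Encode the graph in adjacency-list form: for each node i, the binary
    name of i and the nodes incident from it, separated by '-', with the
    lists for the different nodes separated by '/'."""
    lists = []
    for i in range(1, n + 1):
        items = [format(i, "b")] + [format(j, "b") for (u, j) in sorted(arcs) if u == i]
        lists.append("-".join(items))
    return "/".join(lists)

arcs = {(1, 2), (2, 3), (3, 1)}
print(adjacency_matrix_encoding(3, arcs))  # 010001100 -- n^2 = 9 symbols
print(adjacency_list_encoding(3, arcs))    # 1-10/10-11/11-1
```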
Consider now the following directed reachability problem: given a directed graph G, and
two specified nodes s and t, does there exist a path from s to t? An instance of this problem consists
of an encoding of G, followed by a suitable encoding of s and t. It is not hard to imagine that
no algorithm for this problem that is given its input in the adjacency matrix form can "look at"
significantly fewer than all n² bits for all graphs of order n, and still answer correctly for all of these
instances. If the input were given in the more compact adjacency list format, it is still possible
to design a procedure that always takes time proportional to the length of the input and always
produces a correct answer. At first glance it seems that the list format can lead to more efficient
algorithms, but, as was true for the length of the encodings, this difference can be easily bounded.
Nonetheless, unless otherwise stated, this chapter will use the adjacency list encoding.
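One such procedure is breadth-first search on the adjacency lists; the following sketch (the function and input format are our own illustration, not part of the text) takes time proportional to the number of nodes plus arcs, and hence to the length of the adjacency-list encoding:

```python
from collections import deque

def reachable(adj, s, t):
    """Decide directed reachability by breadth-first search; adj maps each
    node to the list of nodes incident from it."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj.get(u, []):
            if v not in seen:        # each arc is examined at most once
                seen.add(v)
                queue.append(v)
    return False

adj = {1: [2], 2: [3], 3: [1]}
print(reachable(adj, 1, 3), reachable(adj, 3, 2))  # True True
```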
The previous example raises another interesting question, which is somewhat tangential to the
main focus of this chapter. Is there a sense in which all "reasonable" decision problems on graphs
encoded in adjacency matrix form require any algorithm to read some constant fraction of the input?
This problem has a long history, both for directed and undirected graphs, and many attempts were
made at giving sufficiently strong conditions before an accurate conjecture, due to Aanderaa and
Rosenberg, was proved by Rivest & Vuillemin (1976), and later strengthened by Kahn, Saks &
Sturtevant (1984). Consider a decision problem L where the instances are undirected graphs, and
L has two important properties: (1) monotonicity: if G is an instance in L and G' is formed by
adding an edge, then G' is also in L; (2) invariance under isomorphism: if G is an instance
in L and its nodes are relabeled to form G', then G' is also in L. If we only count the number of
questions of the form, "Is ij an edge?", then for any problem satisfying these two properties, any
correct procedure uses essentially n²/4 queries for some graph of order n. In fact, if n is a prime
power, Kahn, Saks & Sturtevant have shown that all n(n−1)/2 queries must be asked.
Up to this point we have restricted our attention to decision problems, and this appears to be
limiting. A more satisfying answer to an input in the language of the Hamiltonian circuit problem
would be the Hamiltonian circuit itself. However, this information can be extracted by asking a
series of questions about membership in L. Repeat the following for each arc in the graph: delete
the arc, ask if the resulting graph is still Hamiltonian and replace the arc only in the case that the
answer is "no". At the end of this procedure, the remaining graph is the circuit. Thus, we have
shown that the search problem of finding the circuit is not much harder than the decision problem,
and this relationship remains true for all well-formulated decision versions of search problems.
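The arc-deletion procedure is easy to state as a program. The sketch below (ours; the membership oracle is a stand-in for a hypothetical decision algorithm, here realized by brute force only for testing) uses one query per arc:

```python
from itertools import permutations

def brute_force_oracle(n):
    """An exponential-time stand-in for the membership oracle, for testing."""
    def is_hamiltonian(arcs):
        return any(
            all((p[i], p[(i + 1) % n]) in arcs for i in range(n))
            for p in permutations(range(n))
        )
    return is_hamiltonian

def extract_hamiltonian_circuit(arcs, is_hamiltonian):
    """Recover a Hamiltonian circuit using one membership query per arc:
    delete each arc in turn, and keep the deletion only if the graph
    remains Hamiltonian; the surviving arcs form the circuit."""
    remaining = set(arcs)
    for arc in sorted(arcs):
        if is_hamiltonian(remaining - {arc}):
            remaining.remove(arc)
    return remaining

arcs = {(0, 1), (1, 2), (2, 0), (0, 2)}
print(extract_hamiltonian_circuit(arcs, brute_force_oracle(3)))
# {(0, 1), (1, 2), (2, 0)}
```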
Another important type of computational problem is an optimization problem; for example,
given G and two nodes s and t, we may wish to find the shortest path from s to t (or merely find
the length). Problems of this type can be converted into decision problems by adding a bound b to
the instance, and, for example, asking whether G has a path from s to t of length at most b. To
see that this is reasonable, notice that the optimal value can be computed by binary search, that
is, by asking a series of questions of this form by fixing b to be the midpoint of the current range
of values for the optimum, and using the answer to cut the range in half for the next question. As
above, we see that an optimization problem can be solved with only somewhat more work than the
corresponding decision problem.
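A sketch of this binary search, written against a hypothetical decision oracle (the names and the given range [lo, hi] are assumptions of ours):

```python
def shortest_path_length(has_path_of_length_at_most, lo, hi):
    """Compute the optimum via binary search on the decision problem.

    has_path_of_length_at_most(b) answers "is there an s-t path of length
    at most b?"; [lo, hi] is a range known to contain the optimum.  Each
    query halves the range, so about log2(hi - lo) questions suffice."""
    if not has_path_of_length_at_most(hi):
        return None  # no s-t path within the range at all
    while lo < hi:
        mid = (lo + hi) // 2
        if has_path_of_length_at_most(mid):
            hi = mid       # the optimum is at most mid
        else:
            lo = mid + 1   # the optimum exceeds mid
    return lo
```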
Throughout this chapter, we will be dealing with computational problems involving a variety
of structures, and we will not be specifying the nature of the encodings used. We will operate
on the premise that any reasonable encoding produces strings of length that can be bounded by
a polynomial of the length produced by any other encoding. For problems that involve numbers
we must be more careful in stating precisely what is meant by reasonable. First, we shall restrict
attention to integral data. To highlight the remaining issues, consider the following problem, called
the subset sum problem: given a set of numbers S and a target number t, does there exist
a subset of S that sums to t? Typically, we will assume that the numbers are given in binary
representation, although we will occasionally use a unary representation (e.g., representing 5 by
11111). Notice that the latter could be exponentially bigger than the former, and thus size is a
deceptive measure, since it makes instances of the problem larger than they need be. As we survey
the complexity of computational problems that involve numbers, we will see that some are sensitive
to this choice of encodings, such as the subset sum problem, whereas others are less affected.
Models of computation
We next turn our attention to defining a mathematical model of a computer. In fact, we will
present three different models, and although their superficial characteristics make them appear
quite different, they will turn out to be formally equivalent. The first of these is the Turing
machine, which is an extremely primitive model, and as a result, it is easier to prove results about
what cannot be computed within this model. On the other hand, its extreme simplicity makes it
ill-suited for algorithm design; one would never want to program a Turing machine. As a result, it
will be convenient to have an alternative model, the random access machine (RAM), within which
to discuss algorithms.
A Turing machine consists of a finite set of states Q, with a specified initial state q0 and a set of
accepting states A ⊆ Q, a read-only input tape, a fixed number of read-write work tapes, and a
transition function δ that selects the new state and the contents of the cells currently scanned on
the work tapes, and indicates the direction in which each head moves one cell, as a deterministic
function of the current state and the contents of the input and work tape cells currently being read. One may view
the transition function as the program hardwired into this primitive machine. The computation is
begun in state q0, with the input head at the left end of the input, and all of the work tapes and
the rest of the input tape blank. The machine halts if δ is undefined for the current state and
the symbols read. An input is accepted if it halts in a state in the accepting state set A ⊆ Q. A
Turing machine M solves a decision problem L if L is the set of inputs accepted by M, and M halts
on every input; such a language L is said to be decidable. This definition of a Turing machine
is similar to the one given by Turing (1937), and the reader should note that many equivalent
definitions are possible.
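To make the transition function concrete, here is a minimal simulator for a simplified single-tape variant of the machine just described (the multi-tape details are elided; everything here is our illustration, not the chapter's formal definition):

```python
def run_tm(delta, accepting, tape, state="q0", max_steps=10_000):
    """Simulate a single-tape deterministic Turing machine.

    delta maps (state, symbol) to (new_state, symbol_to_write, move), with
    move in {-1, +1}; the machine halts when delta is undefined, and the
    input is accepted if the halting state lies in the accepting set."""
    cells = dict(enumerate(tape))  # a sparse tape; blank cells hold ' '
    head = 0
    for _ in range(max_steps):
        symbol = cells.get(head, " ")
        if (state, symbol) not in delta:
            return state in accepting  # halt
        state, write, move = delta[(state, symbol)]
        cells[head] = write
        head += move
    raise RuntimeError("step limit exceeded")

# Accept strings of 0's and 1's that contain at least one 1.
delta = {("q0", "0"): ("q0", "0", +1), ("q0", "1"): ("q1", "1", +1)}
print(run_tm(delta, {"q1"}, "0001"), run_tm(delta, {"q1"}, "000"))  # True False
```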
We have defined a Turing machine so that it can only solve decision problems, but this definition
can be easily extended to model the computation of an arbitrary function by, for example, adding
a write-only output tape, on which to print the output before halting. Although this appears to
be a very primitive form of a computer, it has become routine to accept the following proposition.
Church's Thesis: Any function computed by an effective procedure can be computed by a Turing
machine.
Although this is a thesis, in the sense that any attempt to characterize the inexplicit notion
of effective procedure would destroy its intent, it is supported by a host of theorems, since for
any known characterization of computable functions, it has been shown that these are Turing
computable.
A random access machine (RAM) is a model of computation that is well-suited for specifying
algorithms, since it uses an idealized, simplified programming language that closely resembles the
assembly language of any modern-day digital computer. There is an infinite number of memory
cells indexed by the integers, and there is no bound on the size of an integer that can be stored
in any cell. A program can directly specify a cell to be read from or written in, without moving a
head into position. Furthermore, there is an indirect addressing option, that uses the contents of
a cell as an index to another cell that is then (seemingly randomly) accessed. All basic arithmetic
operations can be performed. For further details on RAMs the reader is referred to Aho, Hopcroft
& Ullman (1974). Since the RAM is cast more in the mold of a traditional programming language,
it is less natural to think of using a RAM to solve decision problems, but this is just a special case
with a simple kind of output, 0 or 1.
The third model of computation is the Boolean circuit: a finite acyclic directed graph in which
each node of indegree zero is an input, and each remaining node is labeled with one of the operations
AND, OR or NOT, and computes that function of its predecessors; designated nodes give the
output. Since any one circuit handles only inputs of a fixed length, a decision problem is solved by
a family of circuits, one for each input length n.
In spite of the apparent differences in these three models, any particular choice is one of convenience,
and not of substance.
(2.1) Theorem. The following classes of decision problems are all identical:
• the class of decision problems solvable by a Turing machine;
• the class of decision problems solvable by a RAM;
• the class of decision problems solvable by a family of circuits that can be generated by a
Turing machine.
This result is complemented by the following theorem, which is a consequence of the fact that
there are an uncountable number of decision problems, but only a countable number of Turing
machines.
(2.2) Theorem. Not all decision problems are solvable by a Turing machine.
This theorem does not provide a particular problem that is not solvable; it merely proves the
existence of such a problem. Particular problems from a wide variety of fields of mathematics are
known to have this property. In the next section, we will provide a few problems that are not
solved by any Turing machine. One famous example of such a problem is Hilbert's tenth problem.
Hilbert asked for an effective procedure to decide if a given multivariate polynomial has an integer
valued root. Culminating years of progress in this area, Matijasevič (1970) proved that there does
not exist a Turing machine that solves this decision problem, so that by Church's thesis, there is
no such effective procedure.
A nondeterministic Turing machine is obtained by providing the machine with an additional
read-only guess tape: an input is accepted if there is some string that, when written on the guess
tape, causes the machine to halt in an accepting state. We also add the restriction that, for any
given input, all computation paths have the same length.
Consider again the Hamiltonian circuit problem. A nondeterministic Turing machine for it
might be constructed by letting the guess tape encode a sequence of n nodes of the graph. The
Turing machine simply verifies that there is an arc between each consecutive pair of nodes in the
guessed sequence. If the graph is Hamiltonian, then there is a correct guess, but otherwise, for any
sequence of n nodes there will be some consecutive pair that is not adjacent. The correct guess, in
essence the Hamiltonian circuit, is a certificate that the graph is Hamiltonian. Observe that this
definition is one-sided, since the requirements for instances in L and not in L are quite different.
What could serve as a "guess" to certify that a graph is not Hamiltonian?
Computational resources and complexity
Now that we have mathematical formulations of both problems and machines, we can describe
what is meant by the computational resources required to solve a certain problem.
In considering the execution of a deterministic Turing machine, it is clear that the number of
transitions before halting corresponds to the running time of the machine on a particular input. In
discussing the running time of an algorithm, within any of the models, we do not want to simply
speak of particular instances, and so we must make some characterization of the running time for
all instances. The criterion that we will focus on for most of this chapter is the worst-case running
time as a function of the input size. When we say that a Turing machine takes n² steps, this
means that for any input of size n, the Turing machine always halts within n² transitions. Unless
otherwise specified, the length of the input will be denoted by n.
A more practical alternative might have been to use the "average" running time, but in order
to do so, we must characterize an "average" instance. That is, we must specify a probability distri-
bution from which the input is drawn and then analyze the expected running time when the input
is drawn accordingly. By focusing on worst-case analysis, the results do not depend on any such
restriction. On the other hand, a theory of algorithms that could handle this sort of probabilistic
analysis and still be relatively insensitive to the particulars of the probability distribution would
significantly enrich the scope of complexity theory.
In discussing running times, we will use the standard O(·) notation: a function f(n) is O(g(n))
if there are constants N and c such that for all n ≥ N, f(n) ≤ cg(n). Rather than stating that the
running time of an algorithm is, for example, n² + 5n − 17, we say simply that it is O(n²). This
simplification makes it possible to discuss complicated algorithms without being overwhelmed by
details. Furthermore, for any Turing machine with superlinear running time and any constant c,
there exists another Turing machine that solves the same problem and runs c times faster than the
original; it can be constructed by using an expanded work tape alphabet.
We will also use a notation analogous to O(·) to indicate lower bounds. A function f(n) is
Ω(g(n)) if there are constants N and c such that for all n ≥ N, f(n) ≥ cg(n). A function f(n) is
Θ(g(n)) if it is both O(g(n)) and Ω(g(n)). It will also be useful to have a notion of one function
being asymptotically smaller than another: f(n) is o(g(n)) if lim_{n→∞} f(n)/g(n) = 0.
For a nondeterministic Turing machine, we define the running time to be the number of transitions
in any computation path generated by the input. (Recall that we added the restriction that
all computation paths must have the same length.) For a circuit, we will want to characterize the
number of operations performed as a measure of time, so that the relevant parameter is the size
of the circuit, which is the order of the graph representing it. For a RAM, there are two standard
ways in which to count the running time on a particular input. In each operation, a RAM can,
for example, add two numbers of unbounded length. In the unit-cost model, this action takes one
step, independent of the lengths of the numbers being added. In the log-cost model, this operation
takes time proportional to the lengths of the numbers added. These measures can have radically
different values. Take 17, and repeatedly square it k times. With each squaring, the number of bits
in the binary representation essentially doubles. Thus, although we have taken only k steps in the
unit-cost model, the time required according to the log-cost model is exponential in k. Technically,
we will use the log-cost model, in order to ensure that the RAM is equivalent to the Turing machine
in the desired ways. But, when speaking of the running time of an algorithm, it is traditional to
state the running time in the unit-cost model, since for all standard algorithms one can prove that
the pathological behavior of the above example does not come into play.
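The repeated-squaring example can be checked directly; the following small Python experiment (ours) tallies both cost measures side by side:

```python
# k = 5 repeated squarings of 17, charged under both cost models.
x = 17
unit_cost = 0
log_cost = 0
for _ in range(5):
    log_cost += x.bit_length()  # log-cost: proportional to the operand length
    x = x * x
    unit_cost += 1              # unit-cost: one step per squaring
print(unit_cost, log_cost, x.bit_length())
# prints "5 130 131": 5 unit-cost steps, but the operand length roughly
# doubles with each squaring (5, 9, 17, 33, 66 bits), so the log-cost
# total grows exponentially in the number of squarings.
```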
Time is not the only computational resource in which we will be interested; we will see that the
space complexity of certain combinatorial problems gives more insight into the structure of these
problems. In considering the space requirements, we will focus on the Turing machine model, and
will only count the number of cells used on the work tapes of the machine. Furthermore, we will
again be interested in the asymptotic worst-case analysis of the space used. As was true for time
bounds, the space used by a Turing machine can be compressed by any constant factor. The space
used by a nondeterministic Turing machine is the maximum space used on any computation path.
The notion of the complexity of a problem is the order of a given computational resource, such
as time, that is necessary and sufficient to solve the problem. If we say that the complexity of the
directed reachability problem for a graph with m arcs is Θ(m), this means that there is a (unit-cost
RAM) algorithm that has worst-case running time O(m) and there is no algorithm with running
time of lower order. Tight results of this kind are extremely rare, since the tremendous progress in
the design of efficient algorithms has not been matched, or even approached, by the slow progress
in techniques for proving lower bounds on the complexity of these problems in general models of
computation. For example, consider the 3-colorability problem: given an undirected graph G
with m edges, can the nodes be colored with three colors so that no two adjacent nodes are given
the same color; i.e., is χ(G) ≤ 3? The best lower bound is only Ω(m), in spite of substantial
evidence that it cannot be solved in time bounded by a polynomial.
In order to study the relative power of particular computational resources, we introduce the
notion of a complexity class, which is the set of problems that have a specified upper bound on
their complexity. It will be convenient to define the complexity class DTIME(T(n)) to be the set
of all languages L that can be recognized by a deterministic Turing machine within time O(T(n)).
NTIME(T(n)) denotes the analogous class of languages for nondeterministic Turing machines.
Throughout this chapter, it will be convenient to make certain assumptions about the sorts of time
bounds that define complexity classes. A function T( n) is called fully time-constructible if there
exists a Turing machine that halts after exactly T( n) steps on any input of length n. All common
time bounds, such as n log n or n², are fully time-constructible. We will implicitly assume that any
function T(n) used to define a time-complexity class is fully time-constructible.
The single most important complexity class is P, the class of problems solvable in polynomial
time. Two of the most well-known algorithms, the Euclidean algorithm for finding the greatest
common divisor of two integers and Gaussian elimination for solving a system of linear equations,
are classical examples of polynomial-time algorithms. In fact, Lamé observed as early as 1844 that
the Euclidean algorithm was a polynomial-time algorithm. In 1953, Von Neumann contrasted the
running time for an algorithm for the assignment problem that "turn[ed] out [to be] a moderate
power of n, i.e., considerably smaller than the 'obvious' estimate n!" for a complete enumeration of
the solutions. Edmonds (1965) and Cobham (1965) were the first to introduce P as an important
complexity class, and it was through the pioneering work of Edmonds that polynomial solvability
became recognized as a theoretical model of efficiency. With only a few exceptions, the discovery
of a polynomial-time algorithm has proved to be an important first step in the direction of finding
truly efficient algorithms. Polynomial time has proved to be very fruitful as a theoretical model
of efficiency both in yielding a deep and interesting theory of algorithms and in designing efficient
algorithms.
There has been substantial work over the last 25 years in finding polynomial-time algorithms
for combinatorial problems. It is a testament to the importance of this development that much of
this handbook is devoted to discussing these algorithms. This work includes algorithms for graph
connectivity and network flow (see Chapter 2), for graph matchings (see Chapter 3), for matroid
problems (see Chapter 11), for point lattice problems (see Chapter 19), for testing isomorphism
(see Chapter 27), for finding disjoint paths in graphs (see Chapter 5) and for problems connected
with linear programming (see Chapters 28 and 30).
Another, more technical reason for the acceptance of P as the theoretical notion of efficiency,
is its mathematical robustness. Recall the discussion of encodings where we remarked that any
reasonable encoding will have length bounded by a polynomial in the length of another. As a
result, any polynomial-time algorithm which expects its input in one form can be converted to a
polynomial-time algorithm for the other. In particular, note that the previous discussion of the
two different encodings of a graph can be swept aside and we can assume that the size of the input
for a graph of order n is n. Notice further that the informal definition of P given above does not
rely on any model of (deterministic) computation. This rash statement is justified by the following
theorem.
Theorem. The following classes of decision problems are all equivalent:
• the class of decision problems solvable by a Turing machine in polynomial time;
• the class of decision problems solvable by a RAM in polynomial time under the log-cost
measure;
• the class of decision problems solvable by a family of circuits of polynomial size, where the
circuit for inputs of size n can be generated by a Turing machine with running time bounded
by a polynomial in n.
The importance of the class NP = ∪k NTIME(n^k) is due to the wide range of important
problems that are known to be in the class, and yet are not known to lie in P. For example, the
nondeterministic algorithm given for the Hamiltonian circuit problem is clearly polynomial, and so
this problem lies in NP. However, it is not known whether this, or any problem, is in NP \ P, and
this is undoubtedly the central question in complexity theory.
Open Problem Is P = NP?
For each decision problem L, there is a complementary problem, L̄, such as the problem of
recognizing non-Hamiltonian graphs. For any complexity class S, let co-S denote the class of
languages whose complement is in S. The definition of P is symmetric with respect to membership
and non-membership in L, so that P = co-P. NP is very different in this respect. In fact, it is
unknown whether the Hamiltonian circuit problem is in co-NP.
Open Problem Is NP = co-NP?
Edmonds (1965) brought attention to the class NP ∩ co-NP, and called problems in this
class well-characterized, since there is both a short certificate to show that the property holds,
as well as a short certificate that it does not. Edmonds was working on algorithms for non-bipartite
maximum matching at the time, and this problem serves as a good example of a problem in this
class. If the instance consists of a graph G and a bound k and we wish to know if there is a matching
of size at least k, the matching itself serves as a certificate for an instance in L, whereas an odd-set
cover serves as a certificate for an instance not in L (see Chapter 3). Note that there is a min-max
theorem, Tutte's theorem, characterizing the size of the maximum matching that is at the core of
the fact that matching is in NP ∩ co-NP, and indeed min-max theorems often serve this role.
As mentioned above, matching is known to be in P, and this raises the following question.
Open Problem Is P = NP ∩ co-NP?
We will also be concerned with complexity classes defined by the space complexity of problems.
As for time, let DSPACE(S(n)) and NSPACE(S(n)) denote, respectively, the class of languages
accepted by deterministic and nondeterministic Turing machines within space O(S(n)). We will
implicitly assume the following condition for all space bounds used to define complexity classes:
a function S(n) is fully space-constructible if there is a Turing machine that, on any input of
length n, delimits S(n) tape cells and then halts. Three space complexity classes will receive the
most prominent attention:
• L = DSPACE(log n);
• NL = NSPACE(log n); and
• PSPACE = ∪k DSPACE(n^k).
One might be tempted to add a fourth class, NPSPACE, but we shall see that nondeterminism
does not add anything in this case. We will see that the chain of inclusions

L ⊆ NL ⊆ P ⊆ NP ⊆ PSPACE

holds, and the main thrust of complexity theory is to understand which of these inclusions is
proper. At the extremes, a straightforward diagonalization argument due to Hartmanis, Lewis
& Stearns (1965) shows that L ≠ PSPACE, and a result of Savitch (1970) further implies that
NL ≠ PSPACE, but after nearly a quarter century more work, these are the only sets in this chain
known to be distinct.
The lack of lower bounds with respect to a general model of computation for either space or
time complexity has led to the search for other evidence that suggests that lower bounds hold. One
such type of evidence might be the implication: if the Hamiltonian circuit problem is in P, then
P = NP. NP contains a tremendous variety of problems that are not known to be in P, and so by
proving such a claim, one shows that the Hamiltonian circuit problem is a hardest problem in NP,
in that any polynomial-time algorithm to solve it would in fact solve thousands of other problems.
The principal tool in providing evidence of this form is that of reduction. We will say that
a problem L1 polynomial-time reduces to L2 if there exists a polynomial-time computable
function f that maps instances of L1 into instances of L2 such that x is a "yes" instance of L1
if and only if f(x) is a "yes" instance of L2. We shall denote this by L1 ∝p L2; note that if
L1 ∝p L2 and L2 is solvable in polynomial time, then L1 is as well. A language L is NP-complete
if L ∈ NP and every language in NP polynomial-time reduces to L; the celebrated theorem of
Cook (1971) states that 3-SAT, the problem of deciding whether a Boolean formula given in
conjunctive normal form with 3 literals per clause has a satisfying assignment, is NP-complete.
Consider the stable set problem: given a graph G and a bound k, decide whether
α(G) ≥ k; that is, do there exist k pairwise non-adjacent nodes in G? It is easy to see that this
problem is in NP, and to show that it is complete, we reduce 3-SAT to it and invoke Theorem (2.5).
Given a 3-SAT instance φ, we construct a graph G as follows: for each clause in φ, let there be 3
nodes in G, each representing a literal in the clause, and let these 3 nodes induce a clique (i.e.,
they are pairwise adjacent); complete the construction by making adjacent any pair of nodes that
represent a literal and its negation, and set k to be the number of clauses in φ. If there is a
satisfying assignment for φ, pick one literal from each clause that is given the assignment true; the
corresponding nodes in G form a stable set of size k. If there is a stable set of size k, then it must
have exactly one node in each clique corresponding to a clause. Furthermore, the nodes in the stable
set cannot correspond to both a literal and its negation, so that we can form an assignment by
setting to true all literals selected in the stable set, and extending this assignment to the remaining
variables arbitrarily. This is a satisfying assignment. This is a characteristic reduction, in that
we build gadgets to represent the variables and clause structure within the framework of the new
problem. The NP-completeness of two other problems follows immediately: the clique problem,
given a graph G and a bound k, decide if there is a clique in G of size k; and the node cover
problem, given a graph G and a bound k, decide if there exists a set of k nodes such that every
edge is incident to a node in the set.
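The reduction is entirely mechanical; the following sketch carries it out in Python, where a 3-SAT instance is given as a list of 3-tuples of nonzero integers, −v denoting the negation of variable v (an input convention we have assumed):

```python
from itertools import combinations

def sat_to_stable_set(clauses):
    """Build the graph of the reduction above.  A node is a pair
    (clause index, position); returns the nodes, the edges and k."""
    nodes = [(c, i) for c in range(len(clauses)) for i in range(3)]
    edges = set()
    for c in range(len(clauses)):
        # the 3 nodes representing the literals of a clause induce a clique
        edges.update(combinations([(c, i) for i in range(3)], 2))
    for a in nodes:
        for b in nodes:
            # a literal and its negation are made adjacent
            if a < b and clauses[a[0]][a[1]] == -clauses[b[0]][b[1]]:
                edges.add((a, b))
    return nodes, edges, len(clauses)

# (x1 or x2 or not x3) and (not x1 or x2 or x3)
nodes, edges, k = sat_to_stable_set([(1, 2, -3), (-1, 2, 3)])
print(len(nodes), len(edges), k)  # 6 8 2
```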
If we restrict the stable set problem to a particular constant value of k (e.g., is α(G) ≥ 100?),
then this problem can be solved in polynomial time by enumerating all possible sets of size 100. In contrast to
this, Stockmeyer (1973) showed that the 3-colorability problem is NP-complete by reducing 3-SAT
to it. Let φ be a 3-SAT formula. We construct a graph G from it in the following way. First,
construct a "reference" clique on three nodes, called true, false and undefined; these nodes will serve
as a way of naming the colors in any 3-coloring of the graph. For each variable in φ, construct a
pair of adjacent nodes, one representing the variable, and one representing its negation, and make
them both adjacent to the undefined node. For each clause, l1 ∨ l2 ∨ l3, construct the subgraph
shown in Figure 1 below, where the nodes labeled with literals, as well as false (F) and undefined
(U), are the nodes already described. It is easy to see that if φ has a satisfying assignment, we
can get a proper 3-coloring of this graph as follows: color the nodes corresponding to literals that
are true in the assignment with the same color as is given node true; color analogously the nodes
for false literals; and then extend this coloring to the remaining nodes in a straightforward manner.
Furthermore, it involves only a little case-checking to see that if the graph is 3-colorable, then the
colors can be interpreted as a satisfying assignment.
Figure 1: the subgraph constructed for a clause l1 ∨ l2 ∨ l3.
Integer programming provides another example; there is a simple reduction from 3-SAT: for
each literal, introduce an integer variable bounded between 0 and 1, and for each Boolean variable x,
constrain the sum of the variables corresponding to x and its negation to be at most 1. The
construction is completed by adding a constraint for each clause that forces the variables for the
literals in the clause to sum to at least 1. On the other hand, to show that the problem is in NP
requires more work, involving a calculation that bounds the length of a "smallest" solution satisfying
the constraints (if one exists at all).
The subset sum problem defined previously is also NP-complete. This is the first problem
that we have encountered that is a "number problem" in the sense that it is not the combinatorial
structure, but rather the numbers that make this problem hard. If the numbers are given in unary,
there is a polynomial-time algorithm (such an algorithm is called pseudo-polynomial): keep a
(large, but polynomially bounded) table of all possible sums obtainable using a subset of the first
i numbers; this is trivial to do for i = 1, and it is not hard to efficiently find the table for i + 1
given the table for i; the table for i = n gives us the answer. There are "number problems" that
remain NP-complete even if the numbers are encoded in unary; such problems are called strongly
NP-complete. One example is the 3-partition problem: given a set S of 3n integers that
sum to nB, does there exist a partition of S into n sets, T1, ..., Tn, where each Ti has 3 elements
that sum to B?
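A sketch of the pseudo-polynomial algorithm for subset sum described above, keeping the table of obtainable sums as a set:

```python
def subset_sum(numbers, t):
    """Decide subset sum by tabulating every sum obtainable from the first
    i numbers.  The table has at most sum(numbers) + 1 entries, which is
    polynomial when the numbers are written in unary but can be exponential
    in the length of their binary encoding."""
    sums = {0}
    for a in numbers:
        sums |= {s + a for s in sums}  # extend the table from i to i + 1
    return t in sums

print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True: 4 + 5
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # False
```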
Consider a weighted generalization of the exact matching problem, where each edge of a bipartite
graph is given a weight w_ij and we wish to decide if there is a perfect matching of total weight
exactly W. It is easy to see that the randomized algorithm given for the unweighted case extends
to the weighted case, if the weights are given in unary. (Generalize the previous construction to
have a factor of y^w for each edge of weight w.) On the other hand, for weights in binary, there is
a straightforward reduction from the subset sum problem: build a disjoint copy of K2,2 for each
number s in our subset sum instance, where the weights are such that one perfect matching of the
subgraph has weight s, whereas the other has weight 0. It is easy to see that the union of these
subgraphs has a perfect matching of weight B if and only if some subset of the numbers of the
instance of the subset sum problem totals exactly B.
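The gadget construction is simple enough to write down; the following sketch (with node names of our own choosing) builds the weighted bipartite graph:

```python
def subset_sum_to_exact_matching(numbers):
    """Build the disjoint K_{2,2} gadgets of the reduction above: the i-th
    number s gets its own K_{2,2} in which one perfect matching has weight s
    and the other has weight 0.  Returns weighted edges (u, v, w)."""
    edges = []
    for i, s in enumerate(numbers):
        u1, u2 = (i, "u1"), (i, "u2")
        v1, v2 = (i, "v1"), (i, "v2")
        # perfect matching {u1v1, u2v2} has weight s; {u1v2, u2v1} has weight 0
        edges += [(u1, v1, s), (u2, v2, 0), (u1, v2, 0), (u2, v1, 0)]
    return edges

edges = subset_sum_to_exact_matching([3, 5, 7])
print(len(edges))  # 12; picking weight s or 0 in each gadget realizes
                   # exactly the subset sums, e.g. weight 8 picks {3, 5}
```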
The Complexity Class NP \ P?
(3.3) Theorem. If L1 is decidable and L1 ∉ P, then there exists a decidable language L2 such
that L2 ∉ P and L2 ∝p L1, but L1 does not reduce to L2.
Note that if L1 is NP-complete, the language L2 is "in between" the classes P and NP-complete,
if P ≠ NP, and by repeatedly applying the result we see that there is a whole range of seemingly
different complexity classes. Under the assumption that P is different from NP, do we know of
any candidate problems that may lie in this purgatory of complexity classes? The answer to this
is "maybe". We will give four important problems that have not been shown to be either in P
or NP-complete. The problem for which there has been the greatest speculation along these lines
is the graph isomorphism problem: given a pair of graphs G1 = (V1, E1) and G2 = (V2, E2),
decide if there is a bijection σ : V1 → V2 such that ij ∈ E1 if and only if σ(i)σ(j) ∈ E2. Later in
this section we will provide evidence that it is not NP-complete, and efforts to show that it is in P
have so far fallen short (see Chapter 27 of this volume). A problem that mathematicians since the
ancient Greeks have been trying to solve is that of factoring integers; a decision formulation that
is polynomially equivalent to factoring is as follows: given an integer N and a bound k, does there
exist a factor of N that is at most k? A problem that is no harder than factoring is the discrete
logarithm problem: given a prime p, a generator g and a natural number x < p, find l such
that g^l ≡ x (mod p). Finally, there is the shortest vector problem, where we are given a collection
of integer vectors, and we wish to find the shortest vector (in the Euclidean norm) that can be
represented as a non-zero integral combination of these vectors. Here, current evidence makes it
seem likely that this problem is really NP-complete; related work is discussed elsewhere in this
volume (see Chapter 19).
It should be mentioned that there is an important subclass of NP which may also fall in this
presumed gap. Edmonds' class of well-characterized problems, NP ∩ co-NP, certainly contains P
and is contained in NP. Furthermore, unless NP = co-NP, it cannot contain any NP-complete
problem. On the other hand, the prevailing feeling is that showing a problem to be in this class
is a giant step towards showing that the problem is in P. A result of Pratt (1975a) shows that
primality is in NP, so it lies in NP ∩ co-NP, though as discussed above, there is additional
evidence that it lies in P. The factoring problem, which appears to be significantly harder, also lies
in NP ∩ co-NP, since a prime factorization can be guessed along with a certificate that each of the
factors is indeed prime. One interesting open question connected with NP ∩ co-NP is concerned
with the existence of a problem that is complete for this class. One might hope that there is some
natural problem that completely characterizes the relationship of NP ∩ co-NP with P and NP
(in the same manner that 3-SAT characterizes the P = NP question).
One approach to shed light on the complexity of a problem that is not known to be either in P or
NP-complete, has been to consider weaker forms of completeness for NP. In fact, Cook's original
notion of completeness, though technically a weaker definition of intractability, is no less damning.
The reduction L1 ∝p L2 can be thought of as solving L1 by a restricted kind of subroutine call, where first some
polynomial-time preprocessing is done, and then the subroutine for L2 is called once. Cook (1971)
proposed a notion of reducibility, where L1 is solved by using a polynomial-time Turing machine
that can, in one step, get an answer to any query of the form, "is x ∈ L2?" Such a more general
Turing machine is called an oracle machine, and in the next subsection, we will discuss Turing
machines and complexity classes relative to an oracle A in more detail. Note that any complete
language L with respect to this reducibility still has the property that L E P if and only if P = NP.
Karp (1972) focused attention on the notion ∝p, and was able to show that NP-completeness was
powerful enough to capture a wide range of combinatorial problems. On the other hand, it remains
an open question to show that Cook's notion of reducibility is stronger than Karp's; is there a
natural problem in NP that is complete with respect to "Cook" reducibility, but not with respect
to "Karp" reducibility?
Adleman & Manders (1977, 1979) showed that non-determinism and randomization can play
a role in defining notions of reducibility. The first notion, γ-reducibility, used nondeterminism in
order to provide a stronger notion of a complete problem that still is not in NP ∩ co-NP unless
NP = co-NP. A randomized notion of completeness was introduced in order to prove results of the
form that L is in RP if and only if RP = NP. They also proved completeness results for several
number theoretic problems in NP that are not known to be NP-complete.
The polynomial-time hierarchy
It is important to realize that not all natural combinatorial problems lie in NP or co-NP.
For example, consider the following generalized coloring problem: given input graphs G and
H, can we color the nodes of G with two colors so that the graph induced by each color does not
contain H as a subgraph? Given the guess of a coloring, there is an exponential number of possible
subgraphs that all must be checked, so at least the most obvious approach does not work. Is there
a more sophisticated approach that puts this problem or its complement in NP, or is this difficulty
inherent?
One can view NP as the class of languages L that can be expressed in the following way: there
exists a language L' ∈ P such that x ∈ L ⇔ ∃y of length bounded by a polynomial in the length
of x such that (x, y) ∈ L'. (We will denote this polynomially bounded quantification by ∃p.) By
writing NP in this way, we are led to a generalization that does contain the above problem. Merely
allow an alternation of quantifiers: (G, H) has a coloring if ∃p a bipartition such that ∀p subgraphs
G' of one part of G, G' is not equal to H. (Note that the specification of G' implicitly contains a
bijection to the nodes of H.) Of course, one need not stop with one alternation of quantification; this
motivates a hierarchy of classes, called the polynomial-time hierarchy: Σ_k^p = {L : ∃L' ∈ P such
that x ∈ L if and only if ∃p y1 ∀p y2 · · · Qk yk, (x, y1, y2, ..., yk) ∈ L'}, where Qk is ∃p if k is odd, and ∀p
if k is even. Thus, generalized coloring is in Σ_2^p. Furthermore, note that Σ_0^p = P and Σ_1^p = NP.
We also define Π_k^p to be co-Σ_k^p. Clearly, Σ_k^p ∪ Π_k^p ⊆ Σ_{k+1}^p.
Alternatively, it is possible to define the polynomial-time hierarchy in terms of oracles, as was
done in the original formulation by Meyer & Stockmeyer (1972). An oracle Turing machine has a
special additional tape, called a query tape, and three special states, qask, qno and qyes. The oracle
Turing machine M^A can ask questions of the form, "is x in A?" by writing x on the query tape,
and entering the state qask. The machine enters state qyes or qno depending on the outcome of the
question. Let NP^A denote the class of languages recognized by nondeterministic polynomial-time
oracle Turing machines with access to an oracle for the language A, and in general, any complexity
class C has its relativized analogue C^A. The notion of the space used by an oracle Turing machine
is ambiguous, since it is not clear whether the space used on the query tape should be counted. We
shall assume that it is, though this has the disadvantage that C^A might not contain A. Finally, let
NP^C denote the union of NP^A over all A ∈ C. Consider again the generalized coloring problem;
it is not hard to see that there is a nondeterministic polynomial-time Turing machine to solve it,
given an oracle for the following problem A: given G and H, decide if H is a subgraph of G. Since
A ∈ NP, we see that this coloring problem is in NP^NP. In fact, Σ_2^p = NP^NP and in general,
Σ_{k+1}^p = NP^{Σ_k^p} (and, analogously, Δ_{k+1}^p = P^{Σ_k^p}). Unfortunately, for each new
complexity class, there is yet another host of unsettled questions.
Open Problem For each k ≥ 1, is Δ_k^p = Σ_k^p?
Open Problem For each k ≥ 1, is Σ_k^p = Π_k^p?
Contained in these, for k = 1, are the P = NP and NP = co-NP questions, and as was true for
those questions, one might hope to find complete problems on which to focus attention in resolving
these open problems. We define these notions of completeness with respect to polynomial-time
reducibility, so that L is complete for C if and only if L ∈ C and all problems in C reduce (∝p) to L.
As we shall see, the polynomial-time hierarchy has helped to provide structure to a number of
complexity classes. Perhaps the first result along these lines is due to Sipser (1983b), who used a
beautiful "hashing function" technique to show that EPP is in the polynomial-time hierarchy, and
in fact, can be placed within :Ef n IIf.
Evidence of intractability: #P-completeness
Consider the counting problem of computing the number of perfect matchings in a bipartite
graph. If this is cast as a decision problem, it does not appear to be in NP, since it seems
that the number of solutions can only be certified by writing down possibly exponentially many
matchings. However, consider modifying the definition of NP to focus on the number of good
certificates, rather than the existence of one; let #P be the class of problems for which there exists
a language L' ∈ P and a polynomial p(n) such that for any input x, the only acceptable output
is z = |{y : |y| = p(|x|) and (x, y) ∈ L'}|. Clearly, the problem of counting perfect matchings in a
bipartite graph is in #P. Any problem in #P can be computed using polynomial space, since all
possible certificates y may be tried in succession.
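To make this brute-force observation concrete, here is a minimal Python sketch of counting certificates in succession; the test in_L_prime is a hypothetical polynomial-time membership test for the underlying language L', and p stands for the certificate length p(|x|).

    from itertools import product

    def count_certificates(x, p, in_L_prime):
        # Enumerate all 2^p certificates y one at a time; only the counter and
        # the current certificate are stored, so the space used is polynomial.
        count = 0
        for bits in product("01", repeat=p):
            y = "".join(bits)
            if in_L_prime(x, y):   # assumed polynomial-time test for L'
                count += 1
        return count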
We can also define a notion of a complete problem for #P. To do this, we use a reducibility
analogous to that used by Cook. Let P_i, i = 1, 2, denote a counting problem, where for an instance
x, the number of solutions is P_i(x). The problem P_1 reduces to P_2 if there exists a polynomial-time
Turing machine that can compute P_1 if it has access to an oracle that, given x, can compute
P_2(x) in one step. We define a counting problem to be #P-complete if it is in #P, and all
problems in #P reduce to it. A weaker notion of reducibility, analogous to the notion used by
Karp, is called parsimonious: an instance x of P_1 is mapped in polynomial time to f(x) for P_2,
such that P_1(x) = P_2(f(x)). For either notion of reducibility, we see that any polynomial-time
algorithm for P_2 yields a polynomial-time algorithm for P_1. It is not hard to see that the proof of
Cook's theorem shows that the problem of computing the number of satisfying assignments of a
Boolean formula in conjunctive normal form is #P-complete. Furthermore, by being only slightly
more careful, it is easy to give parsimonious modifications of the reductions for the clique problem,
the Hamiltonian circuit problem, and seemingly any NP-complete problem, and so the counting
versions of NP-complete problems can be shown to be #P-complete.
Surprisingly, not all #P-complete counting problems need be associated with an NP-complete
problem. Computing the number of perfect matchings in a bipartite graph (or equivalently,
computing the permanent of a 0,1 matrix) is a counting version of a problem in P, the perfect
matching problem, and yet Valiant (1979) showed that this problem is #P-complete. This made it
possible to prove a variety of counting problems to be #P-complete that are related to polynomially
solvable decision problems.
There are many problems that are essentially #P-complete, although they do not have the
appearance of a counting problem. An important practical example is the problem of computing
the reliability of a network: given a graph G with a source s and a sink t, and probabilities p( e)
that edge e will fail, what is the probability that s becomes disconnected from t? In practice, it
may be more important to estimate this value, rather than determining it precisely, and work of
Karp & Luby (1985) provides algorithms to do this.
Another problem that has been shown to be intimately connected with the estimation of the
value of counting problems is that of uniformly generating combinatorial structures. As an example,
suppose that a randomized algorithm requires a perfect matching in a bipartite graph G to be chosen
uniformly from the set of all perfect matchings in G; how can this be done? Jerrum, Valiant &
Vazirani (1986) have shown that a relaxed version of this problem, choosing perfect matchings
that are selected with probability within an arbitrarily small constant factor of the uniform value,
is equivalent to the problem of estimating the number of perfect matchings (using a randomized
algorithm). In fact, their result carries through to most counting/generation problems related
to problems in NP, since it requires only a natural self-reducibility property, that says that an
instance can be solved by handling some number of smaller instances of the same problem.
Stockmeyer (1985) has provided insight into estimating the value of #P problems,¹ by trying
to place these problems within the polynomial-time hierarchy. Using Sipser's hashing function
technique, Stockmeyer showed that for any problem in #P and any fixed values ε, d > 0, there
exists a polynomial-time Turing machine with a Σ_2^p-complete oracle that computes a value that is
within a factor of (1 + εn^{-d}) of the correct answer.

¹A recent result of Toda gives new evidence of the intractability of this class by showing that PH ⊆ P^{#P}.

Evidence of intractability: PSPACE-completeness
In this subsection, we turn to the question of the space complexity of problems. When we
discuss space complexity, we may assume that the Turing machine has only one work tape (and a
separate input tape), since by expanding the tape alphabet, any number of tapes can be simulated
by one tape without using more space.
It is not hard to see that PSPACE must contain the entire polynomial-time hierarchy. For any
L ∈ Σ_k^p, there exists L' ∈ P such that x ∈ L if and only if ∃^p y_1 ∀^p y_2 ⋯ Q_k y_k (x, y_1, y_2, …, y_k) ∈ L', and so all
possible y_1, …, y_k can be tried in succession, to see if the condition specified by the alternating quantifiers
is satisfied. Similarly, PSPACE contains #P, since all computation paths of a polynomial-time
counting machine may be tried in succession. We remarked when introducing PSPACE that
it was not an oversight that NPSPACE was not defined, since PSPACE = NPSPACE. This
result is a special case of the following.
(3.4) Theorem. (Savitch) If L is accepted by a nondeterministic Turing machine using space
S(n) ≥ log n, then it is accepted by a deterministic Turing machine using space S(n)².
Proof: The proof is based on the idea of bounding the number of configurations of the Turing
machine and using a natural divide-and-conquer strategy. In any given computation, a nondeterministic
Turing machine M can be completely described by specifying the input head position, the
contents of the work tape, the work tape head position and the current state. If M uses S(n) space,
then there are no more than d^{S(n)} configurations, for some constant d, and one can be written down
in S(n) space. Consequently, we can assume that M halts within d^{S(n)} steps, since we can keep a
counter of the number of steps taken and halt if that many are done.
To simulate M by a deterministic machine, we build a procedure that recursively calls itself to
check, for any configurations I and K and a bound l, whether M, when started in I, can reach K
within 2^l steps. There is such a computation if and only if there exists a midpoint of the computation J such
that I can lead to J and J can lead to K, each within 2^{l-1} steps. The existential quantifier can be
implemented by merely trying all possible J in some specified order. The basis of this recursion is
the case l = 0, where we merely need to know if I = K, or if one transition leads from I to K, and
this can be easily checked. In implementing this procedure, we need to keep track of the current
midpoint at each level of the recursion, and so we need l·S(n) space to do this. Thus, by running
the procedure with the initial configuration and every accepting configuration with l = S(n) log d,
we get a deterministic procedure that accepts L and uses space O(S(n)²). □
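The recursion in the proof can be summarized by the following Python sketch; configs (an enumeration of all configurations of M) and step (the one-transition relation) are assumed to be supplied, and reach(I, K, l, ...) answers whether K is reachable from I within 2^l steps.

    def reach(I, K, l, configs, step):
        # Base case: zero or one transition suffices.
        if l == 0:
            return I == K or step(I, K)
        # Guess a midpoint J; only one J per recursion level is remembered,
        # so the space used is O(l) configurations, i.e., O(S(n)^2) overall.
        return any(reach(I, J, l - 1, configs, step) and
                   reach(J, K, l - 1, configs, step)
                   for J in configs)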
The idea of "guessing a midpoint" is also the main idea used to derive a PSPACE-complete
problem, where completeness is defined with respect to polynomial-time reductions. The problem
of the validity of quantified Boolean formulae is as follows: given a Boolean formula φ(x_1, …, x_k)
in prenex form, decide whether ∃x_1 ∀x_2 ⋯ Q_k x_k φ(x_1, …, x_k) evaluates to true. There are many other PSPACE-complete problems known, and
most of these have a game-like flavor. An example of a more natural PSPACE-complete game is
the Shannon switching game: given a graph G with two nodes s and t, two players alternately
color the edges of G, where the first player, coloring with red, tries to construct a red path from
s to t, whereas the second player, coloring with blue, tries to construct a blue (s, t)-cut; does the
first player have a winning strategy for G?
Another problem that has a simple PSPACE-completeness proof deals with determining if
two nodes are connected in a directed graph G (of order 2^n) that is given by a circuit with 2n
inputs, where the first n specify in binary a node i and the second n specify a node j, and the
output of the circuit is 1 if and only if (i, j) is an arc of G. If L is accepted by M in polynomial
space, then L can be reduced to this circuit-based directed reachability problem by letting the
nodes of G correspond to configurations of M, where arcs (i, j) correspond to possible transitions
of M. It is easy to construct in polynomial time a (polynomial-size) circuit that decides if one
configuration follows another by one transition of M. Since it is easy to massage M so that it only
has one accepting configuration, we have reduced the acceptance of the input x by M to whether
the initial configuration of x is connected to the accepting configuration.
The role of games in PSPACE-completeness suggests a new type of Turing machine, called an
alternating Turing machine, which was originally proposed by Chandra, Kozen & Stockmeyer
(1981). Consider a computation to be a sequence of moves made by two players, an existential player
and a universal player. The current state indicates whose move it is, and in each configuration the
specified player has several moves from which to choose. Each computation path either accepts
or rejects the input. The input is accepted if the existential player has a winning strategy, that
is, if there is a choice of moves for the existential player, so that for any choice of moves by
the universal player, the input is accepted. For simplicity, assume again that each computation
path has the same number of moves. As before, the time to accept an input x is this number of
moves, and the space needed to accept x is the maximum space used on any computation path.
Observe that a nondeterministic machine is an alternating machine where only the existential
player moves. It is easy to see that the l-th level of the polynomial-time hierarchy can be defined in
terms of polynomial-time alternating Turing machines where the number of "alternations" between
existential and universal quantifiers is less than l.
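Acceptance by an alternating machine is just the evaluation of a game tree, as the following Python sketch indicates; children and kind are assumed functions giving the possible moves from a configuration and whether it is an existential, universal, or halting configuration.

    def accepts(cfg, children, kind):
        kids = children(cfg)
        if not kids:                      # halting configuration
            return kind(cfg) == "accept"
        if kind(cfg) == "existential":    # existential player needs one win
            return any(accepts(k, children, kind) for k in kids)
        return all(accepts(k, children, kind) for k in kids)  # universal player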
The role of PSPACE in the definition of these machines suggests that alternating polynomial
time is closely related to PSPACE, and indeed this is just a special case of a general phenomenon.
Let ATIME(T(n)) and ASPACE(S(n)) denote the classes accepted by an alternating Turing
machine with O(T(n)) time and O(S(n)) space, respectively. Chandra, Kozen & Stockmeyer proved
two fundamental results characterizing the relationship of alternating classes to deterministic and
nondeterministic ones. Note that the first result implies Savitch's theorem, and in fact, the proof
of the last inclusion of Theorem (3.5) is very similar to the proof of Savitch's theorem.
(3.5) Theorem. If T(n) ≥ n, then
ATIME(T(n)) ⊆ DSPACE(T(n)) ⊆ NSPACE(T(n)) ⊆ ATIME(T(n)²).
(3.6) Theorem. ASPACE(S(n)) = ∪_{c>0} DTIME(c^{S(n)}).
Among the consequences of these results, we see that AP = PSPACE. In fact, one can view
an alternating Turing machine as an extremely idealized parallel computer, since it can branch off
an unbounded number of parallel processes that can be used in determining the final outcome.
Therefore, one can consider these results as a proven instance of the parallel computation thesis:
parallel time equals sequential space (up to a polynomial function).
Proof of intractability
It is, of course, far preferable to prove that a problem is intractable, rather than merely giving
evidence that supports this belief. Perhaps the first natural question is, are there any decidable
languages that require more time than T(n)? The diagonalization techniques used to show that
there are undecidable languages can be used to show that for any Turing computable T(n), there
must exist such a language; consider the language L of all i such that when i is run on the Turing
machine M_i that i encodes, either it is rejected, or it runs for more than T(n) steps. It is easy to
see that L is decidable, and yet no Turing machine that always halts within T(n) steps can accept
L. Using our stronger assumption about T(n) (full time-constructibility) we are able to define
such a language that is not only decidable, but can be recognized within only slightly more time
than T(n). The additional logarithmic factor needed in Theorem (3.7), which combines results
of Hartmanis & Stearns (1965) and Hennie & Stearns (1966), is due to the fact that the fastest
known simulation of a (multitape) Turing machine by a machine with some constant number of
tapes slows down the machine by a logarithmic factor.
(3.7) Theorem. If lim inf_{n→∞} T_1(n) log T_1(n) / T_2(n) = 0, then there exists L ∈ DTIME(T_2) \ DTIME(T_1).
Since any multitape machine can be simulated by a 1-tape machine without using any additional
space, the corresponding space hierarchy theorem does not require the logarithmic factor. Seiferas,
Fischer & Meyer (1978) have proved an analogous, but much more difficult, hierarchy theorem for
nondeterministic time.
(3.8) Theorem. If lim inf_{n→∞} T_1(n+1) / T_2(n) = 0, then there exists L ∈ NTIME(T_2) \ NTIME(T_1).
Although these theorems are non-constructive, Meyer & Stockmeyer (1972) developed the following
strategy that makes use of completeness results in order to prove lower bounds on particular
problems. Intuitively, a completeness result says that a problem is a "hardest" problem for some
complexity class, and Theorems (3.7) and (3.8) can be used to show that certain complexity classes
have provably hard problems. Consequently, these two pieces imply that the complete problem is
provably hard.
As an example, consider the circuit-based large clique problem, L_CC, which is the problem
analogous to the circuit-based directed reachability problem, that tests whether the (compactly
represented) input graph on N = 2^n nodes has a clique of size N/2. This problem is complete
for the class NEXPTIME = ∪_k NTIME(2^{n^k}) with respect to polynomial-time reducibility. This
can be seen by introducing the circuit-based version of satisfiability: a formula is represented by a
polynomial-size circuit that outputs 1 whenever the first set of inputs gives the binary representation
of the index of a variable that occurs in the clause specified by the remainder of the inputs.
The generic reduction of Cook's theorem translates exactly to show that circuit-based satisfiability
is NEXPTIME-complete, and the completeness of L_CC follows by using essentially the same reduction
used to show the NP-completeness of the ordinary clique problem. Given this completeness
result, it is easy to prove the following theorem.
(3.9) Theorem. There exists a constant c > 0 such that L_CC ∉ NTIME(2^{n^c}).
Proof: Let L be the language in NTIME(2^{n^2}) \ NTIME(2^n) specified by Theorem (3.8).
Since L ∈ NEXPTIME, L reduces to L_CC in polynomial time; let n^k be the time bound for this
reduction. Choose c = 1/k, and now suppose that L_CC ∈ NTIME(2^{n^c}). We can decide if a
string x of length n is in L by first applying the reduction to get a string of length at most n^k,
and then using the Turing machine that decides L_CC within the assumed bound. This composite
nondeterministic procedure runs in time n^k + 2^{n^{kc}} = O(2^n), and this contradiction proves
the theorem. □
One interpretation of this result is that there exist graphs specified by circuits such that proofs
that these graphs have a large clique require an exponential number of steps (in terms of the size
of the circuit). Observe that we would have been able to prove a stronger result if we had a
better bound on the length of the string produced by the reduction. Lower bounds proved
using this strategy are often based on completeness with respect to reducibilities that are further
restricted to produce an output of length bounded by a linear function of the input length.
This strategy has been applied primarily to problems from logic and formal language theory. A
result of Fischer & Rabin (1974) that contrasts nicely with Theorem (3.1) concerns L_PA, the language
of all provable sentences in the theory of arithmetic for natural numbers without multiplication,
which was shown to be decidable by Presburger (1929).
(3.10) Theorem. There is a constant c > 0 such that L_PA ∉ NTIME(2^{2^{cn}}).
A representative sample of problems treated in this fashion is surveyed by Stockmeyer (1987).
Extensions of NP: short proofs via randomization
In the same way that randomized algorithms give rise to an extended notion of efficiently
computable, randomized proofs give rise to an extension of NP, the class of languages for which
membership is efficiently provable. Randomized proofs can be thought of as efficiently proving
membership in a language by giving overwhelming statistical evidence, rather than proving with
certainty. In contrast to the extensions of NP discussed previously, this notion will give rise to
complexity classes much closer to the spirit of NP, and can be thought of as being "just above
NP". In creating this branch of complexity theory, Goldwasser, Micali & Rackoff (1985) and Babai
(1985) have given definitions to capture related notions of proof based on statistical evidence. We
will give a brief overview of this area, and the interested reader is directed to the surveys by
Goldwasser (1989) and Johnson (1988) or the introduction to the paper by Babai & Moran (1988).
Consider the following situation: Arthur is presented with two graphs G_1 and G_2, and Merlin wants to convince
him that the two graphs are not isomorphic. Merlin can do this in the following way: Merlin asks
Arthur to choose one of the two graphs, randomly relabel the nodes, and show him this random
isomorphic copy of the chosen graph; Merlin will tell which graph Arthur chose. If the two graphs
are isomorphic, then Merlin has only a fifty percent chance of choosing the right graph, assuming
that he cannot read Arthur's mind. If Merlin can successfully repeat this test several times, then
Arthur can fairly safely conclude that Merlin can distinguish isomorphic copies of the two graphs;
in particular, the two graphs are not isomorphic.
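A toy Python simulation of one round of this protocol is sketched below; relabel produces a random isomorphic copy of an adjacency-set dictionary, and prover_guess stands in for the all-powerful Merlin, who sees everything except Arthur's private coin.

    import random

    def relabel(graph):
        # Produce a random isomorphic copy by permuting the node names.
        nodes = list(graph)
        perm = dict(zip(nodes, random.sample(nodes, len(nodes))))
        return {perm[u]: {perm[v] for v in nbrs} for u, nbrs in graph.items()}

    def one_round(G1, G2, prover_guess):
        coin = random.randint(1, 2)                # Arthur's private coin
        copy = relabel(G1 if coin == 1 else G2)
        return prover_guess(G1, G2, copy) == coin  # did Merlin identify it?

If G_1 and G_2 are isomorphic, the random copy gives no information about the coin, so any strategy for Merlin passes k rounds with probability at most 2^{-k}.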
The above randomized proof is a prime example of the interactive proof defined by Goldwasser,
Micali & Rackoff (1985). An interactive protocol consists of two Turing machines: the Prover
(Merlin) and the Verifier (Arthur). The Prover is trying to convince the Verifier that a certain
word x is in a language L. The Prover has unrestricted power; the Verifier is restricted to be a
randomized polynomial-time Turing machine. The two machines have a common input tape, and
each has its own private randomizing tape and work tapes. The two machines operate in turns;
after each machine's turn, it writes its output on a special communication tape, and the other
machine starts its turn by reading this message. At the end of the protocol, the Verifier announces
whether or not it accepts x. A language L has an interactive proof system if there exists an
interactive protocol such that
• if x ∈ L, then the Verifier accepts with probability at least 2/3 (i.e., if x ∈ L, then the Prover
can "prove this");
• if x ∉ L, then the Verifier accepts with probability at most 1/3 (i.e., no Prover can convince
the Verifier of a false claim).
Let IP(t(n)) denote the class of languages that have an interactive proof system in which at most
t(n) messages are exchanged, and let IP = ∪_k IP(n^k). Arthur-Merlin games, introduced by Babai
(1985), are interactive protocols in which the Verifier's (Arthur's) coin tosses are public; AM(t(n))
denotes the class defined by t(n)-move Arthur-Merlin games, and AM = AM(2).
One might expect a natural relationship between the levels of the Arthur-Merlin hierarchy and
the polynomial-time hierarchy. The simplified proof that BPP ⊆ Σ_2^p ∩ Π_2^p due to Lautemann
(1983) has been extended by Babai (in the case of constant t(n)) and Aiello, Goldwasser & Hastad
(1986) (in general) to show that AM(t(n)) ⊆ Π_{t(n)}^p for t(n) ≥ 2.
It would be natural to conjecture that the hierarchies for both notions of randomized proofs are
strict, and that AM(k) is strictly contained in IP(k) for every k. (For example, in the graph non-isomorphism
proof system given above, note how important it is that Arthur's coins be private.)
The first surprise in this area was Babai's proof that the finite levels of the Arthur-Merlin hierarchy
collapse.
(3.11) Theorem. AM(t(n) + 1) = AM(t(n)) for every polynomially bounded t(n) ≥ 2.
To prove this result, we will decrease the number of alternations by interchanging a move of
Arthur with a move of Merlin. Consider a move of Merlin followed by a move of Arthur. If we
simply ask Arthur to move first, this gives Merlin an unfair advantage that may be sufficient to turn
around the odds of the game. To decrease this advantage, we shall use the standard trick of letting
Arthur and Merlin play several games in parallel, and choosing the majority decision. Consider
again the consecutive moves of Merlin and Arthur in the original game. At Arthur's move, we split
the game into several parallel games by allowing Arthur to make several alternate (random) moves.
It can be shown that by interchanging the (unsplit) move of Merlin with the (split) move of Arthur,
we obtain a new game that defines the same language. In essence, if Merlin's winning probability
was small enough, then even after seeing (ahead of time) the several next moves of Arthur, it is
not too likely that he has a single move that is good for the majority of Arthur's choices.
It was an even bigger surprise when Goldwasser & Sipser (1986) proved that interactive proofs
and Arthur-Merlin games define the same complexity classes.
(3.12) Theorem. IP(t(n)) = AM(t(n)) for every polynomially bounded function t(n) ≥ 2.
The idea of the proof can be outlined as follows. Instead of playing the game given by the
interactive protocol, Merlin can try to convince Arthur that he could win for a majority of Arthur's
random choices. This approach requires a technique to prove a lower bound on the size of an
exponentially large set. To give an idea of how to provide such a lower bound consider the following
problem. Consider a language L ∈ NP that consists of pairs (x, y) of strings of equal length where,
for example, x can be viewed as the common input, and y as Arthur's random coin tosses. Define
L(x) = {y : (x, y) ∈ L}, and suppose that Merlin wants to convince Arthur that |L(x)| ≥ f for
some integer f. This problem might be #P-complete. However, the fraction of random strings for
which the Prover can win in an interactive proof is either at least 2/3 or at most 1/3. Therefore, it
might be useful to achieve the following, more modest goal: for any fixed ε, prove that there exists
an AM protocol in which Merlin can almost certainly win if |L(x)| ≥ (1 + ε)f and has almost no
chance to win if |L(x)| < f. Goldwasser & Sipser have given such a protocol by using a hashing
function technique similar to the one used to prove that BPP ⊆ Σ_2^p ∩ Π_2^p.
To see how this lower bounding technique can be used to turn an interactive proof into an
Arthur-Merlin game, we consider the graph non-isomorphism problem. The following protocol is
an adaptation of the one given by the Goldwasser & Sipser proof when applied to the graph
non-isomorphism protocol given at the beginning of this subsection.
Let Aut(X) denote the automorphism group of the graph X (see Chapter 27), and let (G_1, G_2)
denote the input. We may assume that both graphs are connected. Furthermore, let G denote
the graph that consists of node-disjoint copies of G_1 and G_2. If the two graphs are isomorphic,
then |Aut(G)| = 2·|Aut(G_1)|·|Aut(G_2)|, and otherwise, |Aut(G)| = |Aut(G_1)|·|Aut(G_2)|. The
non-isomorphism of the two graphs can be shown by proving fairly tight lower bounds for |Aut(G_i)| for
i = 1, 2, and an upper bound for |Aut(G)|. The upper bound can be shown by providing a lower
bound on the number of different isomorphic copies of G. (In fact, the lower bounds for |Aut(G_i)|
can be proved without randomization by guessing generators for the automorphism groups and
using a polynomial-time algorithm to compute the cardinality of the generated group.)
Although the result of Goldwasser and Sipser implies that we can restrict attention to one of
these two sorts of randomized proofs and forget the other, it is quite useful to have both notions.
If one wishes to design a protocol, clearly it is easier to use private coins. On the other hand, the
relative simplicity of an Arthur-Merlin game makes it well-suited for proving properties of these
hierarchies.
The most important complexity class defined via randomized proofs seems to be AM.² All
known examples of languages that belong to IP actually belong to one of the finite levels, and as
a consequence belong to AM. The class AM is an extension of NP using randomization, but not
too much interaction. This seems to suggest that randomization is the more important element
of the new kind of proofs introduced. We now give an oracle-based generalization of NP using
randomization that turns out to be equivalent to AM.
We define a random language A to be one that contains each word x independently with
probability 1/2. For a complexity class C, let us define almost-C to be the class of languages
that are in C^A for almost every oracle A. In other words, a language is in almost-C if it is in
C^A for a random oracle A with probability 1. Gill (1977) has shown that BPP = almost-P. One
can similarly show that AM ⊆ almost-NP, and one might expect that the converse is also true.
However, this is not clear. The difficulty is that a nondeterministic Turing machine can use the
power of nondeterminism to access an exponential number of random bits, whereas Arthur can
generate only a polynomial number. However, a recent result of Nisan & Wigderson (1988) implies
that a polynomial-time nondeterministic Turing machine cannot take advantage of the exponential
number of random bits available, so that AM = almost-NP.
As is true for most complexity classes, little is known about the limits of AM. It appears
that AM does not contain co-NP, but to prove this would imply that NP ≠ co-NP. Nonetheless,
the following theorem, due to Boppana, Hastad & Zachos (1987), provides some evidence.
(3.13) Theorem. If co-NP ⊆ AM, then Σ_2^p = Π_2^p = AM.
Proof. We describe a compact proof due to Babai. Assume that co-NP ⊆ AM. It suffices
to prove that this implies that Σ_2^p ⊆ AM ⊆ Π_2^p, since co-C ⊆ C implies that C = co-C for any
class C. Note that AM ⊆ Π_2^p is true without the assumption, as we have already mentioned.
Now let L be a language in Σ_2^p. There exists a language L' ∈ Π_1^p = co-NP such that L =
{x : ∃^p y such that (x, y) ∈ L'}. By assumption, L' ∈ AM and therefore L ∈ MAM = AM(3).
Theorem (3.11) implies that AM(3) = AM. □

²Recent results imply that this intuition seems to be incorrect. Lund, Fortnow, Karloff & Nisan recently showed
that the permanent can be computed using an interactive proof with a polynomial number of rounds. This implies
that PH ⊆ IP, and Shamir extended their techniques to prove the even more astounding result that testing the
validity of a quantified Boolean formula is in IP, which implies that IP = PSPACE!
We have seen that the graph non-isomorphism problem is in AM. Therefore, Theorem (3.13)
provides new evidence that the graph isomorphism problem is not NP-complete.
(3.14) Corollary. If the graph isomorphism problem is NP-complete, then Σ_2^p = Π_2^p = AM.
Goldwasser, Micali & Rackoff introduced the interactive proof system in order to characterize
the minimum amount of "knowledge" that needs to be transferred in a proof. Interactive proofs
make it possible to "prove", for example, that two graphs are isomorphic, without giving any further
clue about the isomorphism between them. These aspects of interactive proofs shall be discussed
at the end of the next section.
Living with Intractability
The knowledge that a problem is NP-complete is little consolation for the algorithm designer
who needs to solve it. Despite their theoretical equivalence, NP-complete problems are not
all equally hard from a practical perspective. In this section, we will examine two approaches to
these intractable problems that, while not overcoming their inherent difficulty, make them appear
more manageable. In this process, finer theoretical distinctions will appear among these problems
that help to explain the empirical evidence. We conclude this section by considering work that
uses intractability to an algorithm's advantage; much of modern cryptography relies on the simple
idea that a code is secure if any algorithm to efficiently break it could also be used to solve an
intractable problem.
The complexity of approximate solutions
Most of the 21 NP-complete problems in Karp's original paper are decision versions of optimization
problems; this is also true for a great majority of the problems catalogued by Garey &
Johnson (1979). Although the combinatorial nature of these problems makes it natural to focus on
optimal solutions, for most practical settings in which these problems arise it is nearly as satisfying
to obtain solutions that are guaranteed to be nearly optimal. In this subsection we will first classify
the sorts of results that one might obtain in this area, and then sketch several results that exemplify
the techniques that have been used.
In order to simplify the discussion of terminology, we will focus on minimization problems
where feasible solutions have integral objective function values. For more technical reasons, we will
make the simplifying assumption that the value of the objective function depends on the input in
a reasonable way; we assume that it is bounded by a polynomial in the length of the input and
the value of the largest number represented in the input. Consider an instance I of an optimization
problem, and let OPT(I) denote the value of an optimal solution to I. If an approximation
algorithm A (or more simply, a heuristic) delivers a solution of value A(I), then A(I)/OPT(I) is called
its performance ratio for I. The absolute performance guarantee of A is the smallest p such
that the performance ratio is at most p for all instances. If the absolute performance guarantee
of A is p, and A is polynomial-time, then it is called a p-approximation algorithm. For some
problems, this absolute measure gives a distorted picture, since an algorithm may perform poorly
only when OPT(I) is small. As a result, we will often consider the asymptotic performance
guarantee of A, which is the infimum of all p such that the performance ratio is at most p when
restricted to instances where OPT(I) ≥ k, for some choice of k.
A family of polynomial-time algorithms {A_ε}, where A_ε is a (1 + ε)-approximation algorithm for
each fixed ε > 0, is called a polynomial approximation scheme. This definition allows the running
time to depend even superexponentially on 1/ε, since ε is a constant for the sake of the analysis.
However, if such a scheme is to be really useful in practice, the dependence on 1/ε must be
constrained. If the running time of A_ε increases as a polynomial function of log(1/ε), then the
assumption about the value OPT(I) makes it possible to choose an ε of polynomial encoding
length, so that A_ε(I) − OPT(I) < 1. By the integrality of feasible solutions, we see that this gives
a polynomial-time algorithm for the problem.
A more modest goal is to limit the dependence on ε to be a polynomial in 1/ε, and such a family
of algorithms is called a fully polynomial approximation scheme. The previous argument
shows that a fully polynomial approximation scheme can be used to obtain an algorithm that
finds an optimal solution, where the running time is polynomial in the size of the unary encoding
of the problem. This yields an important result, due to Garey & Johnson (1978): if a problem
is strongly NP-complete, then there is no fully polynomial approximation scheme for it unless
P = NP. Notice that this argument does not exclude the existence of such a family of algorithms
with asymptotic performance guarantee arbitrarily close to 1; such a family will be called a fully
polynomial asymptotic approximation scheme.
An example of an algorithm with the strongest sort of performance guarantee predates complexity
theory. In the edge coloring problem, we are given an undirected graph G and an integer
k, and we wish to color the edges with as few colors as possible so that no two edges incident to
the same node receive the same color. In a classical result, Vizing (1964) proved that the chromatic
index is at most one more than the maximum degree of G, and his proof immediately yields a
polynomial-time algorithm that uses at most OPT(I) + 1 colors. In fact, this strong result was
viewed as suggestive evidence that the edge coloring problem was not NP-complete, but Holyer
(1981) showed that deciding if a cubic graph requires either 3 or 4 colors is NP-complete. This
example also serves as a good illustration of the difference between the asymptotic performance
guarantee, which is 1 in this case, and the absolute performance guarantee. Suppose that there
exists a polynomial-time edge coloring p-approximation algorithm with p < 4/3. If this algorithm
is run on a cubic graph that can be colored with three colors, then the algorithm must return a
coloring that uses fewer than four colors; it returns an optimal coloring. Thus, by Holyer's result
no such algorithm can exist unless P = NP.
The idea of studying the performance guarantees of heuristics for optimization problems was
first proposed by Graham (1966), in the context of a simple machine scheduling problem: each
of n jobs is to be scheduled on one of m machines, where job j requires processing time p_j to be
scheduled on any machine; find the minimum deadline d so that there is an assignment of jobs to
machines where the jobs assigned to each machine have a total processing requirement at most d.
The decision version of this problem is a generalization of the 3-partition problem, so that it is
strongly NP-complete. The most natural heuristic is to list the jobs, and then repeatedly assign the
next listed job to the machine that currently has the smallest total processing load. Graham showed
that this heuristic is a 2-approximation algorithm. To see this, consider the last job j assigned to the
machine with the largest total processing time P assigned to it. Since j is assigned to this machine,
all other machines must be assigned a total no less than P − p_j, and so OPT(I) ≥ P − p_j. However,
OPT(I) is clearly at least p_j, and thus P ≤ 2·OPT(I). For this problem, there is essentially no
difference between the absolute and asymptotic performance guarantees, because the problem has
a simple scaling property; by multiplying all of the processing times by some factor, we get an
equivalent problem with a larger optimal value.
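Graham's heuristic is simple enough to state in a few lines of Python; this is a minimal sketch, with p an assumed list of processing times and m the number of machines.

    import heapq

    def list_schedule(p, m):
        # Assign each job, in list order, to the currently least-loaded machine.
        loads = [0.0] * m
        heapq.heapify(loads)
        for pj in p:
            least = heapq.heappop(loads)
            heapq.heappush(loads, least + pj)
        return max(loads)  # the makespan; at most twice the optimum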
For each of the previous problems, we have either been able to provide an algorithm with
constant performance guarantee, or been able to prove that such an algorithm would imply that
P = NP. For the next few examples, we will present algorithms for which there is evidence that
they cannot be improved, in terms of their performance guarantees.
In the center problem, the input is a complete graph with integer edge weights {c_e} and
an integer k, and the aim is to select k nodes that cover all remaining nodes within as small a
radius as possible, where a node v is covered within radius r if there is a selected node u such that
c_uv ≤ r. As with the traveling salesman problem, an easy reduction (from the dominating set
problem) shows that no fixed performance guarantee can be achieved. Unlike the traveling salesman
problem, a slight modification of this reduction shows that even with the triangle inequality, no
p-approximation algorithm exists for p < 2 unless P = NP. Hochbaum & Shmoys (1985) give a
2-approximation algorithm that relies on a more general technique, called dual approximation.
A dual approximation algorithm finds solutions that have objective function values that are at
most OPT(I), at the expense of obtaining solutions that are infeasible. The performance of a dual
approximation algorithm is a bound on the infeasibility of the solutions generated.
In a problem that might be viewed as a dual of the center problem, we want to minimize k,
in order to achieve a specified radius of coverage r. A p-dual approximation algorithm for this
problem is a polynomial-time algorithm that, given edge weights and a radius r, finds k nodes that
cover all points within radius pr, where k is no more than the optimal number of nodes needed to
cover within radius r. It is easy to see that such a p-dual approximation algorithm can be used
within a binary search procedure to obtain a p-approximation algorithm for the center problem.
The following algorithm is a 2-dual approximation algorithm: choose any node, delete all nodes
covered within radius 2r, and repeat until all nodes are covered. To see that the number of nodes
selected is not too large, observe that it is impossible to choose two nodes covered by the same node
in the optimal solution for radius r. The same framework of using a dual approximation algorithm
for a dual problem to obtain an ordinary approximation algorithm for the original problem can, of
course, be applied to any problem where there is such a tradeoff between two resources.
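A minimal Python sketch of the 2-dual approximation algorithm just described, assuming the weights are given as a dictionary of dictionaries c and satisfy the triangle inequality:

    def greedy_centers(nodes, c, r):
        remaining = set(nodes)
        centers = []
        while remaining:
            v = remaining.pop()          # choose any uncovered node
            centers.append(v)
            # delete everything covered within radius 2r
            remaining = {u for u in remaining if c[v][u] > 2 * r}
        return centers  # no more centers than an optimal radius-r cover uses

The correctness argument is exactly the one in the text: two chosen nodes cannot both lie within radius r of the same optimal center, so the number of chosen centers is at most the optimal number for radius r.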
One important technique that has been used in constructing approximation schemes is that of
rounding and scaling. The technique is based on a simple idea: round the data sufficiently so
that a pseudo-polynomial algorithm for the rounded problem runs in time polynomial in the original
instance size; then interpret the solution in terms of the original problem, and if the rounding was
small enough, the solution is still nearly optimal. Ibarra & Kim (1976) developed this technique
to give a fully polynomial approximation scheme for the knapsack problem: there are n pieces
to be packed in a knapsack of size B, where piece j has a size s_j ≤ B and a value v_j, and the aim
is to pack a subset of total size no more than B, with maximum total value. It is easy to see that
the subset sum problem reduces to a decision version of this, and also that an approach similar
to the one used for the subset sum problem gives a pseudo-polynomial algorithm for this problem
(where only the v_j must be encoded in unary). Algorithm A_ε is as follows: round each v_j down to
its closest multiple of μ = ε·max_j{v_j}/n, and solve the rounded instance; if this solution is worse
than max_j{v_j}, return the most valuable piece instead. This solution differs from OPT(I) by at
most nμ = ε·max_j{v_j} ≤ ε·A_ε(I), and so OPT(I)/A_ε(I) ≤ 1 + ε. Since the rounded problem (when
rescaled by μ) has length in its unary encoding at most n²/ε plus the size of the original instance
(encoded in binary), we see that this gives a fully polynomial approximation scheme.
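The scheme can be sketched in Python as follows; dp_by_value is the standard pseudo-polynomial dynamic program over total (rounded) value, and the function names are illustrative.

    def dp_by_value(sizes, values, B):
        # min_size[t] = smallest total size achieving total value exactly t.
        V = sum(values)
        INF = float("inf")
        min_size = [0] + [INF] * V
        for s, v in zip(sizes, values):
            for t in range(V, v - 1, -1):
                min_size[t] = min(min_size[t], min_size[t - v] + s)
        return max(t for t in range(V + 1) if min_size[t] <= B)

    def knapsack_approx(sizes, values, B, eps):
        vmax = max(values)
        mu = eps * vmax / len(values)              # rounding granularity
        rounded = [int(v // mu) for v in values]   # each v_j rounded down
        best = mu * dp_by_value(sizes, rounded, B)
        return max(best, vmax)  # fall back to the single most valuable piece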
The bin packing problem is dual to the machine scheduling problem: given a specified
capacity d, pack n pieces with sizes p_i into as few bins of capacity d as possible. Since it is easy
to formulate the subset sum problem as a question of whether 2 bins suffice, we see that a
p-approximation algorithm with p < 3/2 would imply that P = NP. On the other hand, the bin
packing problem does not have the scaling property, and Johnson (1973) showed that many simple
heuristics do quite well, with asymptotic guarantees as small as 11/9. It was a great surprise
when Fernandez de la Vega & Lueker (1981) not only substantially improved this 11/9 bound,
but gave a polynomial asymptotic approximation scheme, by showing that a rounding-and-scaling-like
approach could be used for this problem. Karmarkar & Karp (1982) gave a fully polynomial
asymptotic approximation scheme for the bin packing problem, by using similar ideas in conjunction
with a more efficient approach to solving a related underlying linear program. This result was also
quite surprising, since no such approximation scheme was known for any strongly NP-complete
"number problem." Using techniques similar to those of Fernandez de la Vega & Lueker (1981),
a (1 + ε)-dual approximation algorithm for the bin packing problem can be obtained as follows.
Set aside all pieces of size at most εd, and round each remaining piece size down to the nearest
multiple of ε²d; then there are at most k = 1 + 1/ε² different piece sizes. For any constant k, there is a
polynomial-time algorithm to solve the bin packing problem with only k different sizes (although
the degree of the polynomial depends on k). If the solution to the rounded problem is interpreted
for the original, then since each bin contains at most 1/ε pieces, the total difference between the
contents of a bin with and without rounding is at most (1/ε)·ε²d; that is, each bin contains at
most (1 + ε)d. Clearly, the number of bins used is no more than the optimum with capacity d.
To complete the packing, add the small pieces to any bin with no more than d in it already, and
start a new bin only when all bins are overfilled. It is easy to see that this gives a (1 + ε)-dual
approximation algorithm.
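A schematic Python sketch of this dual approximation algorithm; pack_rounded is an assumed routine that packs the rounded big pieces optimally (exploiting the at most 1 + 1/ε² distinct sizes) and returns bins as lists of the original big pieces.

    def complete_packing(bins, small, d):
        # Add pieces of size <= eps*d; open a new bin only when every
        # existing bin already holds more than d.
        for p in small:
            target = next((b for b in bins if sum(b) <= d), None)
            if target is None:
                bins.append([p])
            else:
                target.append(p)
        return bins

    def dual_bin_pack(pieces, d, eps, pack_rounded):
        unit = eps * eps * d                       # rounding granularity
        big = [p for p in pieces if p > eps * d]
        small = [p for p in pieces if p <= eps * d]
        bins = pack_rounded(big, unit, d)          # <= 1/eps big pieces per bin,
                                                   # so un-rounding adds <= eps*d
        return complete_packing(bins, small, d)    # every bin holds <= (1+eps)d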
For the preceding problems, we have been able to find algorithms whose performance matches
the limits implied by assuming that P ≠ NP. Unfortunately, this is much more the exception than
the rule. The interest in performance guarantees of algorithms for graph problems is largely due
to the work of Johnson (1974a), but little improvement has been made in the intervening years.
For the maximum stable set problem, the algorithm with the best known performance guarantee
is a simple greedy heuristic. A stable set I is constructed by placing a node v of smallest degree in I,
deleting v and its neighbors, and iterating until the entire graph is deleted. If α(G) ≥ n/k, then
any node placed in I has degree at most n − (n/k), and I has at least log_k n nodes. This implies
that the algorithm's performance ratio is O(n/log n). (For maximization problems, we invert all
performance ratios.)
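The greedy heuristic, as a minimal Python sketch over an adjacency-set dictionary:

    def greedy_stable_set(adj):
        adj = {u: set(nbrs) for u, nbrs in adj.items()}   # work on a copy
        stable = []
        while adj:
            v = min(adj, key=lambda u: len(adj[u]))       # smallest degree
            stable.append(v)
            dead = adj[v] | {v}                           # v and its neighbors
            for u in dead:
                adj.pop(u, None)
            for u in adj:
                adj[u] -= dead
        return stable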
Garey & Johnson (1976) developed a graph composition technique to show that if there exists
a p-approximation algorithm for the maximum stable set problem for some constant p, then there
exists a polynomial approximation scheme. Let G² be the graph formed by composing a graph G
with itself, in the sense that each node of G is replaced by a copy of G, and there is an edge between
any pair of nodes in copies of G that correspond to adjacent nodes. Note that α(G²) = α(G)²,
and that given a stable set of size k² in G², one can efficiently find a stable set of size k in G.
By applying the p-approximation algorithm to G², we get a set of size at least α(G)²/p, which can then
be used to find a stable set in G of size at least α(G)/√p. By repeatedly applying this technique, we get
the claimed result. Since this problem has the scaling property (simply take disjoint copies of the
graph), an identical result holds for asymptotic performance guarantees.
The state of the art for the node coloring problem is not much better. For the node coloring
problem, a corollary of the NP-completeness of 3-colorability is that no p-approximation algorithm
can exist with p < 4/3. The same result holds for asymptotic performance guarantees. To see this,
note that for any graph G we can produce a graph G' with χ(G') = kχ(G), by taking k disjoint
copies of G and adding an edge between any pair of nodes in different copies. This construction
maintains the ratio of the chromatic number of those instances constructed from graphs G with
χ(G) = 4 and those constructed from graphs with χ(G) = 3. Garey & Johnson (1976) use a
more intricate composition technique to increase this ratio, and thereby prove that an asymptotic
performance guarantee less than 2 would imply that P = NP.
The best guarantee known is achieved by the algorithm of Wigderson (1983), which colors any graph with at most O(n(log log n)²/(log n)²)·χ(G) colors.
Probabilistic analysis of algorithms
One justified criticism of complexity theory is that it focuses on worst-case possibilities, and this
may in fact be unrepresentative of the practical reality. In this section, we will examine results that
are concerned with the probabilistic analysis of algorithms, where the inputs are selected according
to a specified probability distribution, and the average behavior is studied. After a brief discussion
of the sorts of results that might be obtained, we will illustrate this approach by discussing results
concerning several of the combinatorial problems already mentioned. We shall also sketch the main
ideas of an analogue of NP-completeness, recently proposed by Levin, to provide evidence that a
problem is hard to solve in even a probabilistic sense. For a more extensive survey of results in this
area, the reader is referred to Karp, Lenstra, McDiarmid & Rinnooy Kan (1985), whereas Coffman,
Lueker & Rinnooy Kan (1988) give a nice introduction to the techniques used.
In order to discuss the probabilistic performance of algorithms, we first introduce some terminology
from probability. Let {r_n} be a sequence of random variables, and let r be a random
variable. Then r_n → r in expectation if lim_{n→∞} E[|r_n − r|] = 0. Another notion of convergence
is that r_n → r in probability, which occurs if for every ε > 0, lim_{n→∞} Pr[|r_n − r| > ε] = 0. The
sequence r_n → r almost surely if Pr[lim sup_{n→∞} r_n = r = lim inf_{n→∞} r_n] = 1.
These definitions are used to characterize different notions of an algorithm producing asymptotically
optimal solutions. For example, if r_n denotes the performance ratio A(I_n)/OPT(I_n) for
a particular algorithm A when run on an input I_n of size n drawn from a specified distribution,
then it is optimal in expectation if r_n → 1 in expectation. Successively stronger results would
prove that an algorithm is optimal in probability and optimal almost surely, depending on
whether r_n → 1 in probability or almost surely. These notions characterize approximation results,
in the sense that the solutions produced by the algorithm may always be suboptimal, since merely
the asymptotic error is bounded. It would, of course, be still more preferable to prove that the
algorithm generally yields the optimal solution. We will say that an algorithm solves the problem
in probability if lim_{n→∞} Pr[A(I_n) = OPT(I_n)] = 1. This definition can also be applied to decision
problems in NP, where the solution values are restricted to be 0 or 1, but in this case we will
require that the algorithm also produce a certificate proving that the solution value is 1. Finally,
an algorithm solves the problem in expected polynomial time, if it always finds an optimal
solution, and the expected value of the running time is bounded by a polynomial.
In a paper that stimulated the area of probabilistic analysis of algorithms for NP-complete
problems, Karp (1976) proposed a natural partitioning scheme for solving the traveling salesman
problem over instances where the n nodes are i.i.d. uniformly in the unit square [0,1]², and the
intercity distances c_ij are given by the Euclidean distance between each pair of nodes. A classical
theorem of Beardwood, Halton & Hammersley (1959) proves that there exists a constant c such
that OPT(I_n)/√n → c almost surely for instances drawn from this distribution.
We now give the adaptive partitioning method of Karp. For simplicity, assume that no two
nodes have a common vertical coordinate or horizontal coordinate. Find a vertical line through
a node that splits the unit square into two parts so that each part contains essentially the same
number of nodes. (Consider a node on the dividing line to be in both parts.) For each of the two
parts, split them analogously by finding a horizontal line that ensures that the number of nodes in
each new region is reduced by a factor of two. Repeat this procedure recursively on each region
until each small part has log n nodes. A dynamic programming algorithm can find an optimal
solution to any of these small instances in O(n log² n) time. The union of these tours is a spanning
Eulerian graph, and as in Christofides' algorithm, the Eulerian tour yields an ordering of the nodes.
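The splitting step can be sketched in Python as follows; points are assumed to be (x, y) pairs, target is about log n for the original instance, and, as in the text, the dividing node is placed in both parts.

    def regions(points, target, vertical=True):
        if len(points) <= max(target, 2):
            return [points]
        axis = 0 if vertical else 1
        pts = sorted(points, key=lambda q: q[axis])
        mid = len(pts) // 2
        # pts[mid] lies on the dividing line and is kept in both parts
        return (regions(pts[:mid + 1], target, not vertical) +
                regions(pts[mid:], target, not vertical))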
To analyze the performance of the algorithm, let R_i denote the i-th region, and let T_i and T_i'
denote, respectively, the length of the part of the heuristically generated tour in R_i, and the length
of the part of the optimal tour in R_i. We first show that T_i is at most T_i' plus 3/2 times the
perimeter of R_i. Consider R_i, and add a node at each point where the optimal tour leaves this
region. Construct a tour through all of the nodes in R_i in the following way: take the pieces of the
optimal tour, and add the perimeter once, which makes the graph connected; then take alternate
pieces of the perimeter, choosing, a la Christofides, the matching of smaller length. This Eulerian
graph can be shortcut to yield a tour, proving the claim. This implies that A(I_n) ≤ OPT(I_n) + total
perimeter length ≤ OPT(I_n) + O(√(n/log n)). The theorem of Beardwood, Halton & Hammersley
(or even a simple neighborhood argument) shows that there exist constants c' and d < 1 such that
Pr[OPT(I_n) < c'√n] ≤ d^n. These results are sufficient to prove that the algorithm is optimal
almost surely; it is not hard to see that for any ε > 0,
∑_{n=1}^∞ Pr[A(I_n)/OPT(I_n) ≥ 1 + ε] ≤ ∑_{n=1}^∞ (Pr[OPT(I_n) < c'√n] + Pr[|OPT(I_n) − A(I_n)| ≥ εc'√n]) < ∞
and by the Borel-Cantelli lemma, this implies that the relative error converges to 0 almost surely.
Although worst-case analysis of the traveling salesman problem requires assumptions such as
the triangle inequality to obtain reasonable results, the same is not true for probabilistic analysis.
In fact, consider instances of the directed traveling salesman problem, where all n² entries of the
distance matrix are i.i.d. uniform random variables over the interval [0,1). Algorithms to find a
minimum weight perfect matching in a bipartite graph yield a polynomial-time procedure to find
the minimum length collection of directed circuits that cover all of the nodes. It remains merely to
patch these circuits together to build a tour. A fairly natural algorithm is to take the two longest
circuits, find the minimum cost way to delete two arcs and to add two arcs to join the two circuits,
and repeat until the result is a single tour. Karp showed that the expected value of the performance
ratio of this algorithm is 1 + O(n^{-1/2}), and so it is optimal in expectation. (For this and other
results on the probabilistic analysis of the traveling salesman problem, the reader is referred to
the survey of Karp and Steele (1985).)
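One patching step can be sketched as follows in Python; circuits are node lists, c is the distance matrix, and the cheapest pair of arc exchanges is chosen.

    def patch(c1, c2, c):
        best = None
        for i in range(len(c1)):
            for j in range(len(c2)):
                a, b = c1[i], c1[(i + 1) % len(c1)]   # arc (a, b) of circuit 1
                u, v = c2[j], c2[(j + 1) % len(c2)]   # arc (u, v) of circuit 2
                delta = c[a][v] + c[u][b] - c[a][b] - c[u][v]
                if best is None or delta < best[0]:
                    best = (delta, i, j)
        _, i, j = best
        # splice the circuits: ... a -> v ... u -> b ...
        return c1[:i + 1] + c2[j + 1:] + c2[:j + 1] + c1[i + 1:]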
Since the Hamiltonian circuit problem is a decision problem, we shall focus instead on probabilistic
results to solve this problem, and so the analysis has a somewhat different flavor. The study
of algorithms for this problem grew out of the theory of random graphs of Erdos & Renyi (1960),
which is treated elsewhere in this volume (see Chapter 6). There are two standard distributions of
random graphs of order n: the graph G_{n,m} is chosen uniformly from all graphs with m edges, and
the graph G_{n,p} is selected by including each possible edge independently with probability p. Most
results in one model can be translated to an equivalent result in the other. A major part of the study
of random graphs deals with the evolution of the graph as the number of edges grows as a function of
the number of nodes, and one of the fundamental results of this area has been to provide a threshold
function that characterizes the density where the graph quickly changes from having an overwhelming
chance of being non-Hamiltonian to having an overwhelming chance of being Hamiltonian. The
ultimate result along these lines is due to Komlos & Szemeredi (1983), who proved that if H(n, m) is the probability that G_{n,m} is Hamiltonian, and m(n) = (n log n)/2 + (n log log n)/2 + nw(n), then lim_{n→∞} H(n, m(n)) = 0 if w(n) → −∞; = exp(−exp(−2c)) if w(n) → c; and = 1 if w(n) → ∞.
This result is surprising in view of the fact that an identical result was shown by Erdos & Renyi
(1960) for the property of having minimum degree two, which is clearly necessary for a graph to
be Hamiltonian.
Although the previous result is not algorithmic, the progress towards this result has included
a number of results that yield polynomial-time algorithms to find a Hamiltonian circuit in the
case that it is likely that one exists. The first result proving that a modest number of edges is
sufficient for almost all graphs to have a Hamiltonian circuit is due to Perepelica (1970), who gave an algorithm that solves the Hamiltonian circuit problem in probability for graphs with cn^{3/2}√(log n)
edges. There have been a series of results that have gradually reduced the number of edges that are
sufficient to be sure of efficiently finding a Hamiltonian circuit. Based on a nearly algorithmic result
of Posa (1976), Korsunov (1976) gave the first result of the correct order by reducing the bound to
3n log n. In a recent paper, Bollobas, Fenner & Frieze (1985) provide a polynomial-time algorithm
HAM that yields an algorithmic proof of the result of Komlos & Szemeredi. By considering the
special case G_{n,p} with p = 1/2, they give a slight extension of HAM that solves the Hamiltonian
circuit problem in expected polynomial time. For this distribution, the probability that HAM fails
to find a Hamiltonian circuit in a graph that is connected and has minimum degree at least two is so small that, if an exponential-time dynamic programming algorithm to solve the Hamiltonian circuit
problem is run in this case, the expected running time remains polynomial. Furthermore, they use
a variant of this algorithm to solve in probability the bottleneck version of the traveling salesman
problem, where the aim is to minimize the maximum weight edge included in the Hamiltonian
circuit, where the edges of the undirected complete graph are given weights i.i.d. uniformly in the
interval [0,1]. The algorithm first computes the minimum value t such that the subgraph containing
the t cheapest edges has minimum degree 2, and then applies HAM to find a Hamiltonian circuit
in this subgraph.
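A minimal sketch of the threshold step of this last algorithm, assuming the edge weights are given as a dictionary on node pairs; HAM itself is not reproduced here, so the final step is only indicated by a comment.

    import random

    def bottleneck_subgraph(n, weight):
        # return the t cheapest edges for the minimum t giving minimum degree 2
        deg, chosen = [0] * n, []
        below = n                               # nodes still of degree < 2
        for (u, v) in sorted(weight, key=weight.get):
            chosen.append((u, v))
            for x in (u, v):
                deg[x] += 1
                if deg[x] == 2:
                    below -= 1
            if below == 0:
                return chosen
        return chosen

    n = 100
    w = {(u, v): random.random() for u in range(n) for v in range(u + 1, n)}
    sub = bottleneck_subgraph(n, w)
    # HAM would now be applied to the graph with edge set `sub`.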
We next consider coloring the random graph G_{n,1/2}, that is, G_{n,p} with p = 1/2. A natural greedy algorithm repeatedly selects a maximal stable set in the remaining graph and assigns it a new color; in each iteration, a stable set of size roughly log_2 n is selected. Since the algorithm does not "look" at the remaining graph, we see that the
remaining graph is a random graph of smaller order selected according to the same distribution.
This allows us to analyze the process of repeatedly choosing such a set until the entire graph has
been colored. It is not hard to show that the size of the largest stable set in G_{n,p} is, in probability, 2 log_2 n. This implies that the algorithm uses at most 2χ(G_{n,p}) colors, in probability. Bollobas (1988) has recently shown that the lower bound on χ(G_{n,p}) given by the stable set computation is indeed tight, so that the greedy algorithm does indeed use twice as many colors as needed, and it remains an interesting open question to give an efficient algorithmic proof of Bollobas' result.
Although we have focused on the case p = 1/2, these results have been extended to include a
broader spectrum of distributions.
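As an illustration, the following sketch runs the greedy coloring algorithm on G_{n,1/2}; growing each stable set by a simple scan is one of several ways to instantiate the selection step, and is enough to exhibit the behavior analyzed above.

    import random

    def random_graph(n, p=0.5):                 # sample G_{n,p}
        adj = [set() for _ in range(n)]
        for u in range(n):
            for v in range(u + 1, n):
                if random.random() < p:
                    adj[u].add(v); adj[v].add(u)
        return adj

    def greedy_color(adj):
        n = len(adj)
        color, uncolored, c = [None] * n, list(range(n)), 0
        while uncolored:
            stable = []                         # a maximal stable set of the
            for v in uncolored:                 # remaining graph, grown greedily
                if all(u not in adj[v] for u in stable):
                    stable.append(v)
            for v in stable:
                color[v] = c
            uncolored = [v for v in uncolored if color[v] is None]
            c += 1
        return color, c

    _, colors_used = greedy_color(random_graph(512))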
We next consider the subset sum problem. Initially, we will restrict attention to the following
distribution of instances that are chosen to be "yes" instances, and so the problem is to find
an appropriate subset: randomly select a vector x in {0,1}^n, choose integers s_1, ..., s_n i.i.d. uniformly in [1, U], where U = 2^{δn²} for any fixed δ > 1/2, and let the target t = Σ_{i=1}^n x_i·s_i. Lagarias
& Odlyzko (1983) showed how to use the lattice basis reduction algorithm of Lovasz to find x, given
the s_i and t. A lattice is the collection of vectors that can be represented as an integral linear combination of a given set of n linearly independent basis vectors in Z^n. Given one basis, the basis reduction algorithm of Lovasz produces a basis whose shortest vector has length at most 2^{(n−1)/2} times the length of the shortest non-zero vector in the lattice (see Chapter 19).
We shall give a short proof of the result of Lagarias & Odlyzko due to Frieze (1986). For the
analysis, consider a fixed choice of x, and a random selection of the s_i. We may assume that Σ_{i=1}^n s_i/2 ≤ t, since otherwise we can solve the problem of finding the complementary subset. Consider the (n+1)-dimensional lattice given by the basis a_0 = (ct, 0, ..., 0), a_i = (−cs_i, b_{i1}, b_{i2}, ..., b_{in}), i = 1, ..., n, where c is an integer greater than n·2^{n/2} and the matrix B = (b_{ij}) is the identity matrix. Note that x = (0, x_1, ..., x_n) is in this lattice. Let λ = (λ_0, ..., λ_n) be the shortest vector returned by the basis reduction algorithm, and let λ* denote the shortest non-zero vector in the lattice. We will show that, with probability approaching 1, λ is x or −x. Using the above bound for the basis reduction algorithm, we see that ‖λ‖ ≤ 2^{n/2}‖λ*‖ ≤ 2^{n/2}‖x‖ ≤ 2^{n/2}√n = m, where ‖·‖ denotes the Euclidean norm. Our choice of c implies that λ_0 = 0. Next, consider the set Λ_0 of lattice points a with a_0 = 0 and ‖a‖ ≤ m that are not integer multiples of x. We can bound the probability that the algorithm fails by the probability that Λ_0 is non-empty. It is not hard to see
that if a ∈ Λ_0, then it must satisfy Σ_{i=1}^n a_i·s_i = dt for some integer d with |d| ≤ 2m. Consider some fixed choice of d and (a_1, ..., a_n) that is not an integer multiple of x. The probability that they satisfy this equation is at most 1/U, and since there are at most (4m+1)(2m+1)^n = o(2^{δn²}) such choices, we get the claimed result. By choosing t uniformly among the integers from 1 to ⌈cnU⌉, where c ≤ 1 is some positive constant, this technique can be extended to instances that may not
have solutions.
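The sketch below constructs the basis used in the proof, taking δ = 3/4 (an arbitrary choice above 1/2) and c just above n·2^{n/2}; the reduction itself would be carried out by an implementation of Lovasz's basis reduction algorithm, which is not reproduced here.

    import random

    def lagarias_odlyzko_basis(s, t):
        # rows a_0 = (ct, 0, ..., 0) and a_i = (-c*s_i, e_i), as in the proof
        n = len(s)
        c = n * 2 ** ((n + 1) // 2) + 1         # an integer exceeding n * 2^(n/2)
        basis = [[c * t] + [0] * n]
        for i in range(n):
            row = [-c * s[i]] + [0] * n
            row[i + 1] = 1                      # the identity part B = (b_ij)
            basis.append(row)
        return basis

    n = 20
    x = [random.randint(0, 1) for _ in range(n)]
    s = [random.randint(1, 2 ** (3 * n * n // 4)) for _ in range(n)]  # U = 2^(3n²/4)
    t = sum(xi * si for xi, si in zip(x, s))
    B = lagarias_odlyzko_basis(s, t)            # hand B to a basis reduction routine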
Although the probabilistic analysis of algorithms has focused mainly on NP-complete problems,
it has often served as a useful tool to show that the average-case running time of certain algorithms
is much better than the worst-case running time. The most important example of this is the
simplex method for linear programming, which is a practically efficient algorithm that was shown
to have exponential worst-case running time by Klee & Minty (1972). Borgwardt (1982) showed
that if the rows of the constraint matrix are i.i.d. according to a spherically symmetric distribution
(where each vector is viewed as a ray from the origin), then a variant of the simplex method
requires only a polynomial number of iterations. Independently, Smale (1983) showed similar
results under the assumption that the rows of the constraint matrix, the vector of right-hand sides,
and the objective function are independently distributed with the symmetry assumption that the
distribution is invariant under coordinate permutations. Smale analyzed a practical variant of
the simplex algorithm, and unlike Borgwardt's result, his distribution includes both feasible and
infeasible problems. There has been a great deal of work done in extending these results to more
general probability distributions and in analyzing algorithms closer to the particular variants of the
simplex method most commonly used in practice. For a thorough survey of results in this area, the
reader is referred to Shamir (1987).
As we have seen, not all problems in NP have been solved efficiently even with probabilistic
assumptions, and only the simplest sorts of distributions have been analyzed. This raises the
specter of intractability: are there distributions for which certain NP-complete problems remain
hard to solve? We will sketch the main ideas behind a recent result of Levin (1986), which proposes
a notion of completeness in a probabilistic setting. Once again, evidence for hardness is given by
showing that if a particular problem in NP with a specified input distribution can be solved in
expected polynomial time, then every such problem and distribution pair can also be solved so
efficiently. For a more complete discussion, the reader is encouraged to read the column of Johnson
(1984).
A bit of care must be given in formulating the precise notion of polynomial expected time, so that it is insensitive to both the particular choice of machine and encoding. If μ(x) denotes the probability that a randomly selected instance of size n is x, and T(x) is the running time on x, then one would typically define expected polynomial time to require that the sum of μ(x)T(x) over all instances of size n is O(n^k) for some constant k. Instead, we consider μ to be the density function over the set of all instances I, and require that Σ_{x∈I} μ(x)T(x)^{1/k}/|x| < ∞ for some constant k. Levin's notion of random NP requires that the distribution function M(x) = Σ_{i=1}^x μ(i) can be computed in polynomial time, where each instance is viewed as a natural number. This notion does not seem to be too restrictive, and includes all of the probability distributions discussed here. It remains only
to define the notion of reducibility. As usual, "yes" instances must be mapped by a polynomial-time function f to "yes" instances, and analogously for "no" instances, but one must consider the density functions as well. The pair (L_1, μ_1) reduces to (L_2, μ_2) if, in addition, we require that μ_2(x) is at least a polynomial fraction of the total probability of the elements that are mapped to x by f.
Levin showed that all of random NP reduces to a certain random tiling problem. Instances
consist of integers k ≤ N, a set of tile types, each of which is a unit square labeled with letters in its four corners, and a side-by-side sequence of k such tiles where consecutive tiles have matching labels in both adjoining corners. We wish to decide if there is a way of extending the sequence to fill out an N × N square, where adjacent tiles have matching labels in their corresponding corners, and
the top row starts with the specified sequence of tiles. The instances are selected by first selecting
N = n with probability proportional to a fixed negative power of n, choosing k uniformly between 1 and N; after choosing the tile types uniformly, the tiles in the sequence are selected in order uniformly among all possible extensions of
the current sequence. More recently, Venkatesan & Levin (1988) have shown that a generalization of the problem of edge coloring digraphs (where, for certain subgraphs, the number of edges given each color is specified) is also random NP-complete.
Cryptography
The basic goal of cryptography is to provide a means for private communication in the presence
of adversaries. Until fairly recently, cryptography has been more of an art than a science. Modern
cryptography has suggested ways to use complexity theory for designing schemes with provable
levels of security. We will not be able to discuss in detail, or even mention, most of the results here.
We intend rather to give a flavor of the problems considered and the results proved. The interested
reader is referred to the survey by Rivest (1990) or to the recent special issue of SIAM Journal on
Computing (April 1988) on cryptography for more results, and for further references.
A traditional solution to the privacy problem uses secret key crypto-systems. Suppose two parties wish to communicate a message M (a string of 0's and 1's) of length n, and they agree in advance on a secret key K, a string of 0's and 1's of the same length. They send the encrypted message C = M ⊕ K instead of the real message M, where ⊕ denotes componentwise addition over GF(2). This scheme is provably secure, in the sense that (as long as the key K is kept secret) an adversary can learn nothing about the message by intercepting it: every string of length n is equally likely to be the message encoded by the string C. Although provably secure, this crypto-system is very inconvenient to use, since before each transmission the two parties must agree on a new secret key.
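In code, the scheme is a single exclusive-or in each direction; the sketch below represents messages and keys as bit lists.

    import secrets

    def xor_bits(a, b):                          # componentwise addition over GF(2)
        return [x ^ y for x, y in zip(a, b)]

    M = [1, 0, 1, 1, 0, 0, 1, 0]                 # the message
    K = [secrets.randbits(1) for _ in M]         # the shared secret key
    C = xor_bits(M, K)                           # transmitted: C = M ⊕ K
    assert xor_bits(C, K) == M                   # the receiver recovers M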
In the paper that founded the area of modern cryptography, Diffie & Hellman (1976) suggested
the idea of public-key crypto-systems. They proposed that a transmission might be sufficiently
secure if the task of decryption is computationally infeasible, rather than impossible in an information-theoretic sense. Such a weaker assumption might make it possible to communicate securely without agreeing on a secret key. In a public key crypto-system the participants agree on a
security parameter n. Messages to be sent are divided into pieces of length n.
• Each participant should randomly choose a public encryption key E and a corresponding
secret decryption key D depending on the security parameter n.
• There must be a deterministic polynomial-time algorithm that, given a message M of length
n and an encryption key E, produces the encrypted message E(M).
• Similarly, there must be a deterministic polynomial-time algorithm that, given a message M of length n and a decryption key D, produces the decrypted message D(M).
• It is required that for every message M, D(E(M)) = M.
The users of this system should publish a directory of the public keys. When a user named
Bob wants to send a message M to another user named Alice, he looks up Alice's public key E_A in this directory, and sends her the encrypted message E_A(M). By the assumptions, only Alice can decrypt this message, using her private key D_A.
The basic ingredient of this system is a "trapdoor" function. The encryption function is assumed to be a function that is easy to compute, but hard to invert. There are several ways to formalize this notion. The one used most often is the one-way function, that is, a one-to-one function f such that f can be computed in polynomial time, but, for n sufficiently large, no polynomial-time algorithm can invert f on even a polynomial fraction of the instances of length n. In order for a one-way
function to be useful in the above scheme, it has to be a trapdoor function, which is a one-way
function with a key K, such that by knowing the key one can invert the function in polynomial
time.
Given the present state of complexity theory, there is little hope of proving that any function is one-way, or that a public key crypto-system satisfies the last requirement above. Ideally, one would like to prove the security of a crypto-system based on the assumption that P ≠ NP. However, this also
seems to be quite difficult. Complexity theory is mainly concerned with worst-case analysis. For the
purpose of cryptography, average-case analysis, and a corresponding notion of completeness (such
as the one suggested by Levin) would be more appropriate. Furthermore, these inversion problems
have the special property that the inverse is unique, whereas the nondeterministic algorithms for
NP-complete problems have several accepting paths for most instances.
Over the last ten years, several public-key crypto-systems have been suggested and analyzed.
Some have been proven secure, based on assumptions about the computational difficulty of a
particular problem (e.g., factoring integers), or on the more general assumption that one-way
functions exist.
Rivest, Shamir & Adleman (1978) were the first to suggest such a scheme. Their scheme is based
on the assumed difficulty of factoring integers. Each user of this scheme has to select a number
n that is the product of two random primes. Random primes can be selected using a randomized primality testing algorithm (since roughly every (log n)th number is expected to be prime). The public key consists of a pair of integers (n, e) such that e is relatively prime to φ(n), where φ(n) denotes the number of integers less than n relatively prime to n. A message M is encrypted by C = M^e (mod n). The private key consists of integers (n, d), where d·e ≡ 1 (mod φ(n)), and decryption is done by computing C^d ≡ M (mod n). Given the prime factorization of n, one can find the required private key d in polynomial time, but the task of finding the appropriate d given only n and e is equivalent to factoring.
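A toy sketch of the scheme follows, with absurdly small primes for the sake of illustration; sympy's randprime and mod_inverse stand in for the randomized primality test and the computation of d.

    from math import gcd
    from sympy import randprime, mod_inverse

    p = randprime(2 ** 15, 2 ** 16)
    q = randprime(2 ** 15, 2 ** 16)
    while q == p:
        q = randprime(2 ** 15, 2 ** 16)
    n, phi = p * q, (p - 1) * (q - 1)       # phi(n) for n = p*q

    e = 3
    while gcd(e, phi) != 1:                 # e relatively prime to phi(n)
        e += 2
    d = mod_inverse(e, phi)                 # d*e = 1 (mod phi(n))

    M = 123456789 % n
    C = pow(M, e, n)                        # encryption: C = M^e (mod n)
    assert pow(C, d, n) == M                # decryption: C^d = M (mod n)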
Rabin (1979) suggested a variation on this scheme in which the task of decryption is equivalent
to factoring, rather than just the task of finding the decryption key. The idea is to encrypt a
message M by C = M² (mod n). (A slight technical problem that has to be overcome before turning this into an encryption scheme is that squaring modulo a product of two primes is not a one-to-one function, but rather a four-to-one function.) An algorithm that can extract square roots modulo n can be used to factor: choose a random integer x and let the algorithm find a square root y of the integer (x²
mod n); with probability 1/2, the greatest common divisor of x − y and n is one of the two primes used to create n.
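The reduction can be sketched directly. Here the square-root "oracle" cheats by using the hidden primes (both chosen to be 3 (mod 4), so that roots have a closed form); that is exactly the point: anything able to play its role can be made to factor n.

    import random
    from math import gcd

    def sqrt_oracle(a, p, q):               # a root of a modulo n = p*q, via
        rp = pow(a, (p + 1) // 4, p)        # roots modulo each prime and the
        rq = pow(a, (q + 1) // 4, q)        # Chinese remainder theorem
        return (rp * q * pow(q, -1, p) + rq * p * pow(p, -1, q)) % (p * q)

    p, q = 1019, 1031                       # both primes are 3 (mod 4)
    n = p * q
    while True:
        x = random.randrange(2, n)
        y = sqrt_oracle(x * x % n, p, q)
        g = gcd((x - y) % n, n)
        if 1 < g < n:                       # succeeds with probability 1/2 per trial
            print("factor found:", g)
            break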
Merkle & Hellman (1978) suggested a scheme based on the subset sum problem, in which the public key is a vector of integers a = (a_1, ..., a_n). A message M, that is, a 0-1 vector of length n, is encrypted by the inner product C = (a · M). One problem with this scheme is that there is no apparent private key that can make the task of decryption easier for the intended receiver. Crypto-systems based on this idea have built some additional trapdoor information into the structure of the vector a, and so the decryption problem is based on a restricted variant of the subset sum problem. Furthermore, as
discussed in the previous subsection, randomly chosen subset sum problems can be easy to solve.
In an innovative paper, Shamir (1982) used Lenstra's integer programming algorithm to break the
Merkle-Hellman scheme, i.e., he gave a polynomial-time decryption algorithm that does not need
the secret key. Since then, several other subset sum-based schemes have been broken by clever use
of Lovasz's basis reduction algorithm (see Chapter 19).
The schemes mentioned so far have encrypted messages of length n, for some given security parameter
n. An apparent problem with this approach is that even if we prove that no polynomial-time
algorithm can decrypt the messages, it still might be possible to deduce some relevant information
about the message in polynomial time. Goldwasser & Micali (1984) suggested encrypting messages
bit by bit to achieve provable levels of security. How does one encrypt a single bit? Deterministic
encryption schemes cannot be used repeatedly when there are only two possible messages. A natural
solution to encrypt bits more securely is to append a number of random bits to the one bit
of information, and encrypt the resulting longer string. Goldwasser & Micali proposed a different
randomized bit-encryption scheme for which they can prove security.
The Goldwasser-Micali scheme is based on the assumed difficulty of the quadratic residuosity problem. An integer x is a quadratic residue modulo n if x ≡ y² (mod n) for some integer y. The Jacobi symbol (x/n) is a 0, +1, −1 valued function that gives some help in recognizing quadratic residues. It can be computed efficiently, and (x/n) = 1 for all quadratic residues. The quadratic residuosity problem is to decide, for given integers n and x with (x/n) = 1, whether x is a quadratic residue modulo n. The Goldwasser-Micali encryption scheme works as follows: a public key consists of an integer n that is the product of two large primes, and a quadratic non-residue y with (y/n) = 1. The bit 0 is encrypted by a random quadratic residue r² (mod n), and the bit 1 is encrypted by a random quadratic non-residue of the form r²y (mod n). The task of distinguishing encryptions of 0's from encryptions of 1's is exactly the quadratic residuosity problem. Decryption is easy for
the intended receiver, who knows the prime factorization of n. Yao (1982) has extended this result
by proving that a secure randomized bit-encryption scheme exists if a one-way function exists.
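A toy sketch of the bit-encryption scheme, again with tiny primes; Euler's criterion, available to anyone who knows p, stands in for the intended receiver's residuosity test.

    import random
    from math import gcd

    p, q = 1019, 1031                        # the receiver's secret primes
    n = p * q                                # the public modulus

    def is_qr(a, prime):                     # Euler's criterion modulo a prime
        return pow(a % prime, (prime - 1) // 2, prime) == 1

    # a public non-residue y with Jacobi symbol (y/n) = 1:
    y = next(a for a in range(2, n) if not is_qr(a, p) and not is_qr(a, q))

    def encrypt(bit):
        r = random.randrange(1, n)
        while gcd(r, n) != 1:
            r = random.randrange(1, n)
        c = r * r % n                        # bit 0: a random square r^2
        return c if bit == 0 else c * y % n  # bit 1: a random non-square r^2*y

    def decrypt(c):                          # knowing p, test residuosity directly
        return 0 if is_qr(c, p) else 1

    assert [decrypt(encrypt(b)) for b in (0, 1, 1, 0)] == [0, 1, 1, 0]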
For a public key crypto-system, it does not seem to be very restrictive to further assume that
E(D(M)) = M for every message M. Diffie & Hellman noticed that such a system can be used
to solve the signature problem, so that the participants can electronically sign their messages.
When Bob sends a message M to Alice, he can use his private decryption key D_B to append the "signature" D_B(M) after the message M. Given such a signature, Alice can use Bob's public key E_B to convince herself that the message came from Bob, exactly as she received it. The Rivest, Shamir & Adleman scheme has this additional property, and therefore can be used for signatures as well. Rabin's scheme is, in fact, more appropriate for signatures than for encryption, since it is
not a one-to-one function. To sign a message M, the signer extracts a square root S of M modulo n, which requires knowing the factorization of n; anyone can then verify the signature by checking that M = (S² mod n).
The first signature scheme that is secure against chosen message attacks was given by Goldwasser,
Micali & Rivest (1988). The proof of the security of the scheme is based on the assumed
difficulty of certain number-theoretic functions. More recently, Bellare & Micali (1988) have given
a signature scheme that is secure against chosen message attacks and is based on the existence of
trapdoor functions.
A further related issue is that of generating pseudo-random numbers. When random numbers are used in algorithms and crypto-systems, it is essential that the random bits used be unbiased and independent. The speed or reliability of the algorithm, and the security of the crypto-system, depend on the quality of the random numbers used. Natural sources of randomness (such as coins or noise diodes) are fairly slow in generating random bits. On the other hand, for both of these
applications, truly random bits could be replaced by a sequence of bits that no polynomial-time
algorithm can distinguish from truly random bits. A pseudo-random number generator takes
a truly random string x (a seed) of length n and expands it to a pseudo-random string y of length n^k for some constant k. The quality of the pseudo-random number generator is measured by certain statistical tests. A pseudo-random number generator passes the next bit test if, after seeing the first i bits of y for some i < n^k, no polynomial-time algorithm can predict the next bit with probability 1/2 + n^{−c} for any constant c. Most computers have built-in pseudo-random number generators, and some of the standard algorithms used have been proven to output inappropriate sequences by clever use of Lovasz's basis reduction algorithm. Such examples include the linear congruential generator (where the seed consists of integers a, b, m and x_0, and the pseudo-random numbers are generated by the recurrence x_{i+1} = ax_i + b (mod m)), and the binary expansion of algebraic numbers (where the seed is some representation of the algebraic number).
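For concreteness, the linear congruential recurrence looks as follows; the constants here are merely illustrative.

    def lcg(a, b, m, x0):
        x = x0
        while True:
            x = (a * x + b) % m              # x_{i+1} = a*x_i + b (mod m)
            yield x

    gen = lcg(a=1103515245, b=12345, m=2 ** 31, x0=42)
    sample = [next(gen) for _ in range(5)]   # predictable from a few outputs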
The first provably secure pseudo-random bit generator was developed by Blum & Micali (1984).
They proved that pseudo-random bit generators exist, based on an assumption related to the
existence of one-way functions. They assume that there exist permutations f of the set of n-bit integers and functions B from the n-bit integers to {0, 1}, such that the composite function B(f(x)) can be computed in polynomial time, but the probability that a polynomial-time algorithm computes B(y) for a randomly chosen y of length n (for a sufficiently large n) is less than 1/2 + n^{−c} for any constant c. The Blum-Micali pseudo-random number generator produces the sequence b_0, ..., b_k, defined by b_i = B(f(x_{k−i})) and x_{i+1} = f(x_i), where the random seed is used to select the functions used and the initial vector x_0. They also give particular suggestions for these functions based on the assumed difficulty of the discrete logarithm problem. For a prime p, an integer g is a generator modulo p if every non-zero integer x modulo p can be written as x = g^e (mod p) for an appropriate integer exponent e < p − 1. We shall use the notation e = index_{p,g}(x). The discrete logarithm problem is to compute index_{p,g}(x) given a prime p, a generator g and an integer x.
The seed of the Blum-Micali generator is used to randomly select a large prime p, a generator g and a starting point x_0. The pseudo-random numbers are defined using the above scheme with the functions f(x) = g^x (mod p) and B(x) = 1 if index_{p,g}(x) ≤ (p − 1)/2. The pseudo-random number generator defined in this way can be proved to pass the next bit test, based on the assumption that the discrete logarithm problem is difficult.
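A toy sketch of this generator with a small prime (one can check that 2 generates the non-zero residues modulo 1019); at this size, index_{p,g} can be computed by brute force, which is of course precisely what must be infeasible for the real generator.

    p, g = 1019, 2                          # toy parameters only

    def index(x):                           # brute-force discrete logarithm,
        e, v = 0, 1                         # feasible only because p is tiny
        while v != x:
            v = v * g % p
            e += 1
        return e

    def B(x):
        return 1 if index(x) <= (p - 1) // 2 else 0

    def blum_micali(x0, k):                 # output the bits b_0, ..., b_k
        xs = [x0]
        for _ in range(k + 1):
            xs.append(pow(g, xs[-1], p))    # x_{i+1} = f(x_i) = g^{x_i} (mod p)
        return [B(xs[k + 1 - i]) for i in range(k + 1)]  # b_i = B(f(x_{k-i}))

    bits = blum_micali(7, 20)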
One might wonder whether certain pseudo-random number generators pass statistical tests other
than the next bit test. However, Yao (1982) proved that if a pseudo-random number generator
passes the next bit test then it passes any statistical test, i.e., no randomized polynomial-time
algorithm can distinguish the generated pseudo-random numbers from truly random numbers.
In some of the above cryptographic applications, one can imagine that it might be useful to
be able to convince someone that a number is the product of two primes, without telling the
two primes themselves. Interactive proof systems make this possible. In fact, Goldwasser, Micali
& Rackoff (1985) invented their notion of interactive proof systems for exactly this application.
Informally, an interactive proof of a statement is said to be a zero-knowledge proof if the verifier
cannot learn anything from the proof except the validity of the statement. Imagine, for example,
a proof that a graph has a Hamiltonian circuit that gives no help in finding the circuit. There are
several ways to formalize this notion. The one we shall use is computational zero knowledge. We
say that an interactive protocol is a zero-knowledge protocol if the verifier by himself (without
the participation of the prover) can generate a sequence of communication that is indistinguishable
in polynomial time from the true transcript of the conversation. Consider the example of the interactive proof for the graph non-isomorphism problem. At first sight, this seems to be a zero-knowledge protocol, since (so long as the verifier does not deviate from the protocol) he always knows the prover's next move, and hence could generate the conversation himself. There are zero-knowledge protocols for the graph non-isomorphism problem, but this protocol is not, in fact, zero-knowledge, since the verifier can use it to test whether a third graph is isomorphic to one of the two in the input (i.e., the verifier can gain extra information by deviating from the protocol).
Goldreich, Micali & Wigderson (1986) and, subsequently but independently, Brassard & Crepeau
(1986) proved that every language in NP has a zero-knowledge interactive proof. The Goldreich,
Micali, & Wigderson protocol is based on the existence of one-way functions, whereas the Brassard
& Crepeau protocol is based on the assumed difficulty of certain number-theoretic functions.
To prove that all languages in NP have zero-knowledge proofs, one merely has to provide a zero-knowledge proof for a single NP-complete problem. The following protocol for the Hamiltonian
circuit problem is due to Blum. Suppose that the prover, Alice, knows a Hamiltonian circuit C
in a graph G, and she wants to convince the verifier, Bob, of this fact without showing him the
circuit. The basic cryptographic tool needed for the protocol is a secure bit-encryption scheme.
Alice must be able to encrypt a bit, so that Bob has no chance to figure out on his own what
the bit is, but later Alice can prove which bit she encrypted, if she chooses to do so. To convince
Bob that a graph has a Hamiltonian circuit, Alice chooses a random permutation P, and uses this
permutation to obtain a random isomorphic copy G' of the graph G. She encodes the permutation, and the n(n−1)/2 bits representing the adjacency matrix of the graph G'. Bob can choose either
to ask Alice for a proof that the encoded graph G' is isomorphic to the original, or to ask for
a proof that G' has a Hamiltonian circuit. In the first case, Alice decrypts every encrypted bit,
thereby providing the permutation P and the graph G'. In the second case, she decrypts only the
bits corresponding to edges participating in the Hamiltonian circuit C. If Alice does not know a
Hamiltonian circuit, then she will be caught cheating with probability at least 1/2. Therefore, this
protocol is an interactive proof that the graph is Hamiltonian. It is intuitively clear that this is a
zero-knowledge protocol (assuming that the encryption function is secure) since Bob can either ask
to see a random isomorphic copy of the graph, or a random permutation of the sequence of nodes
in a circuit, neither of which contains any information.
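The sketch below plays one round of the protocol from the prover's side; a salted hash stands in for the secure bit-commitment scheme, which is an assumption of this sketch rather than a tool specified by the protocol.

    import hashlib, os, random

    def commit(bit):                         # salted-hash stand-in commitment
        salt = os.urandom(16)
        return hashlib.sha256(salt + bytes([bit])).hexdigest(), (salt, bit)

    def prover_round(adj, circuit):          # adj: 0/1 matrix; circuit: node list
        n = len(adj)
        perm = list(range(n)); random.shuffle(perm)
        bits = {(u, v): adj[perm[u]][perm[v]]            # adjacency bits of G'
                for u in range(n) for v in range(u + 1, n)}
        sealed = {e: commit(b) for e, b in bits.items()}
        digests = {e: s[0] for e, s in sealed.items()}   # sent to Bob first
        if random.random() < 0.5:            # Bob asks for the isomorphism:
            return digests, ("isomorphism", perm,
                             {e: sealed[e][1] for e in sealed})
        inv = {perm[i]: i for i in range(n)} # or for the circuit, renamed in G'
        edges = [tuple(sorted((inv[circuit[i]], inv[circuit[(i + 1) % n]])))
                 for i in range(n)]
        return digests, ("circuit", {e: sealed[e][1] for e in edges})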
A dual approach uses a bit-encryption scheme based on quadratic residues: the verifier Bob selects an integer n together with an integer x, and proves interactively to Alice that x is a square modulo n. Alice then encrypts a 0 by a randomly chosen square r² (mod n) and a 1 by a randomly chosen square multiplied by x, r²x (mod n). Since Bob succeeded in proving that x is a square
modulo n, then x is indeed a square with very high probability. But, if x is a square, then either
0 or 1 is encrypted by a random square, and Bob cannot distinguish them, no matter how much
computational power he has. However, Alice can prove which of the two bits she was encrypting,
by revealing r. For Alice to cheat (e.g., to claim that an encryption of 1 is an encryption of 0), she has to be able to find a square root of x, which is equivalent to factoring. This protocol is dual to the one given by Blum, in the sense that here Alice can cheat if she has unlimited computational
power, whereas in the previous one, Bob can. The latter protocol has the conceptual advantage
that in order for Alice to cheat, she must be able to factor n on-line, whereas in Blum's protocol,
Bob can go home and try to use all of his computational power to figure out Alice's secret after
the protocol is over.
Having zero-knowledge protocols for all of NP is quite a powerful tool. As an example,
consider the following general problem: suppose that n people each have their own input, and they
want to jointly compute a function of these inputs, without giving unnecessary information about
their private inputs to the others. Interesting special cases of this problem include the majority
function (that is used for voting), the sum function (for a situation when the participants wish to
compute their average income), or comparison (for the millionaire's problem, when two millionaires
want to decide which one of them is richer). Special protocols to handle some of these problems
have been suggested earlier. Goldreich, Micali & Wigderson (1987) used zero-knowledge proofs and other powerful techniques discovered in the last ten years, which we could not cover here, to design protocols that compute any polynomially computable function in zero-knowledge.
Inside P
In this section we shall focus on the complexity of problems in P. After proving that a problem is
in P, the most important next step undoubtedly is to find an algorithm that is truly efficient, and
not merely efficient in this theoretical sense. However, we will not address that issue, since that
is best deferred to the individual chapters that discuss polynomial-time algorithms for particular
problems. In this section, we shall address questions that relate to machine-independent complexity classes inside P.
We shall also consider the space complexity of problems in P. Recall that the parallel computation thesis suggests that there is a close relationship between sequential space complexity and parallel time complexity. We will show another proven case of this thesis: every problem in L (and even in NL) can be solved extremely efficiently in parallel. This is one source of interest in the complexity class L. Another source is that this complexity class is the basis for the natural reduction that helps to distinguish among the problems in P in order to provide a notion of a hardest problem in P.
Logarithmic space
The most general restriction on the space complexity of a language L that is known to imply that L ∈ P is logarithmic space. Observe that L ⊆ NL ⊆ P, where the last inclusion follows, for example, from Theorem (3.6). The typical use of logarithmic space is to store a constant number of pointers, e.g., the names of a constant number of nodes in the input graph, and in some sense, this restriction attempts to characterize such algorithms. Although L contains many combinatorial examples, we will postpone their presentation until later in this section, when we consider a narrower complexity class defined by parallel computation. Instead, we will focus on the nondeterministic and randomized analogues, for which there are languages that appear to be best characterized in terms of their space complexity.
To define the notion of a logarithmic-space reduction, we introduce a variant of logarithmic-space computation that can produce output of superlogarithmic size. We say that a function f can be computed in logarithmic space if there exists a Turing machine with a read-only input tape and a write-only output tape that, on input x, halts with f(x) written on its output tape and uses at most logarithmic space on its work tapes. A problem L_1 reduces in logarithmic space to L_2 if there exists a function f computable in logarithmic space that maps instances of L_1 into instances of L_2 such that x is a "yes" instance of L_1 if and only if f(x) is a "yes" instance of L_2. Let L_1 ∝_log L_2 denote such a logarithmic-space reduction (or log-space reduction, for short).
In order for this notion of reducibility to be useful, one has to prove that the ∝_log relation is transitive. This is not as apparent as in the case of the ∝_p reduction, but it can be shown fairly easily. As a corollary, we see that if L_1 ∝_log L_2 and there were a log-space algorithm for L_2, then we could obtain a log-space algorithm for L_1.
It is worth noting that all of the polynomial-time reductions mentioned in the discussion of NP-completeness are based on local replacements, and therefore are log-space reductions. In fact, log-space reduction is sometimes used in place of polynomial-time reduction in the definitions of NP-completeness and PSPACE-completeness. A problem L_1 is NL-complete if L_1 ∈ NL and for all L ∈ NL, L ∝_log L_1. The composition argument given above yields the following result.
(5.1) Theorem. For any NL-complete problem L, L ∈ L if and only if L = NL.
The following theorem, due to Savitch (1970), shows that the complexity of the directed reachability problem can best be characterized by its space requirement in a nondeterministic computation.
(5.2) Theorem. The directed graph reachability problem is NL-complete.
Proof. A nondeterministic log-space algorithm for the directed graph reachability problem can guess the nodes of an (s, t)-path one at a time, and at each step verify that the graph contains the arc that connects the consecutive nodes in the guessed path. To show that the problem is NL-complete, we must give a log-space reduction from each problem in NL to the directed graph reachability problem. Let L be in NL and let M denote a nondeterministic log-space Turing machine that recognizes L. We can assume without loss of generality that M has a single accepting configuration (e.g., it erases its work tapes and enters a special accepting state before halting). For a given input x, we define a directed graph whose nodes correspond to the configurations of the nondeterministic Turing machine M with input x. Note that the number of these configurations is bounded by a polynomial in the size of x. Two configurations are joined by an arc if one can follow the other via a single transition. The input x is in L if and only if this graph contains a path from the starting configuration to the accepting configuration. □
In view of the above result, it was very surprising when Aleliunas, Karp, Lipton, Lovasz and Rackoff (1979) showed that the undirected graph reachability problem, the analogue of the directed graph reachability problem for undirected graphs, can be solved by a randomized Turing machine using logarithmic space. We define the class RL to be the log-space analogue of RP. A language L is in RL if there exists a randomized Turing machine RM that works in logarithmic space, such that each input x that RM accepts along any computation path is in L, and for every x ∈ L, the probability that RM accepts x is at least 1/2.
(5.3) Theorem. Undirected graph reachability is in RL.
Proof. Let G = (V, E) denote the input graph, s and t the two specified nodes, n the number of
nodes and m the number of edges.
The algorithm attempts to find the required path by following a random walk in the graph,
starting from the node s. At each step, an edge incident to the current node is chosen uniformly
at random and the walk is continued along this edge. The algorithm stops either if t is reached, or
if 4nm steps have been made. We will show that if t and s are in the same connected component of the graph, then with probability at least 1/2, a random walk will reach t in at most 4nm steps.
For the purpose of the proof, we may assume that the graph is connected. We first claim that the random walk will use each edge of the graph in each direction with the same frequency, 1/(2m). Let p_v denote the frequency with which the random walk is at the node v, i.e., the limit, as the number of steps tends to infinity, of the fraction of steps at which the random walk is at the node v. (The existence of this limit can be proved using the theory of Markov chains.) Let deg(v) denote the degree of the node v. The frequency with which the random walk uses a particular edge to leave v is e_v = p_v/deg(v). Before the walk leaves v it has to enter v, and therefore e_v = (Σ_{(w,v)∈E} e_w)/deg(v). This implies that e_v is the same for each node v ∈ V, and therefore e_v = 1/(2m). Consequently, each edge
(and, in particular, an edge entering t) is going to be used, on the average, once every 2m steps. However, this does not mean that the expected time to reach t is at most 2m, since this is only true as a limit, and might not be true at the beginning of the walk. One can instead argue via commute times: for any edge (u, v), the expected time for the walk to go from u to v and back to u is at most 2m, and summing these commute times over the n − 1 edges of a spanning tree shows that the expected time to visit every node (and, in particular, to reach t) is at most 2m(n − 1) < 2nm. By Markov's inequality, the walk reaches t within 4nm steps with probability at least 1/2. □
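The walk itself takes only a few lines; the sketch below uses ordinary data structures rather than logarithmic space, and gives up after the 4nm step bound from the proof.

    import random

    def reaches(adj, s, t):                  # adj: adjacency lists
        n = len(adj)
        m = sum(len(a) for a in adj) // 2
        v = s
        for _ in range(4 * n * m):           # the 4nm step bound
            if v == t:
                return True
            v = random.choice(adj[v])        # a uniformly random neighbor
        return v == t                        # otherwise: probably unreachable

    path = [[1], [0, 2], [1, 3], [2]]        # a path on four nodes
    print(reaches(path, 0, 3))               # True with probability >= 1/2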
Why is this theorem about undirected graphs? One could conceivably run the same algorithm
for directed graphs. The first problem with this approach is that the random walk might get stuck
in a strongly connected component from which there is no path to the rest of the graph. This
problem could be solved by repeatedly restarting the walk. However, there is a more fundamental
aspect to this problem. Consider the graph that consists of an (s, t)-path and an arc from each
node back to the source s. This graph is strongly connected, and yet the expected time to reach
t from s is exponential in the length of the path. A log-space Turing machine cannot count an
exponential number of steps, so it is not clear how one could correctly stop the computation if t is
not reachable from s. There could be a way out; termination could be based on a random event that
has an exponentially small chance of occurring. Since the directed path problem is NL-complete, this would prove that RL = NL. However, our definition of a randomized Turing machine does
not allow such an algorithm. We have assumed that each branch of the computation tree uses the
same number of random bits, and in an algorithm based on the previous idea, some branches of
the computation will never stop.
One can think of the classes L and NL as lower-level analogues of P and NP. By studying the relationships of L, NL and co-NL, one hopes to better understand the relationship of deterministic and nondeterministic computation. It is this point of view that makes the following theorem of Immerman (1988) and Szelepcsenyi (1987) one of the biggest surprises in recent developments in complexity theory.
(5.4) Theorem. NL = co-NL.
The proof uses a definition of nondeterministically computing a function, similar to the notion of γ-reduction that was mentioned at the end of the subsection on NP-completeness. We say that a function f(x) can be computed in nondeterministic logarithmic space if there is a nondeterministic log-space Turing machine that, on input x, outputs the value f(x) on at least one branch of the computation, and on every other branch either stops without an output or also outputs f(x). If f(x) is a Boolean function, then we say that the language L defined by f(x) = 1 is decided in nondeterministic logarithmic space, which is equivalent to L being in NL ∩ co-NL.
We will prove Theorem (5.4) by showing that the NL-complete directed graph reachability problem can be decided in nondeterministic logarithmic space. Given a directed graph G = (V, A), a source node s and an integer k, let f(G, s, k) denote the number of nodes reachable from s by a path of length at most k.
(5.5) Lemma. The directed graph reachability problem is decidable in nondeterministic logarithmic space if and only if the function f(G, s, k) can be computed in nondeterministic logarithmic space.
Proof. To prove the only if direction, we use the fact that directed graph reachability is NL-complete. If it is decidable in nondeterministic logarithmic space, then so is the problem of recognizing whether there is a path of length at most k. To compute f(G, s, k), we use the assumed nondeterministic machine, for each node v, to decide whether v is reachable from s by a path of length at most k. By counting the number of reachable nodes, we obtain f(G, s, k).
To prove the opposite direction, we use the following nondeterministic log-space computation. First compute f(G, s, n). Then for each node v, (nondeterministically) try to guess a path from s to v. Count the number of nodes for which a path has been found. If a path has been found to t, we accept the input. If f(G, s, n) nodes have been reached without finding a path to t, this proves that t is not reachable from s, so we reject. Finally, if the number of nodes that have been reached is less than f(G, s, n), then this is an incorrect branch of the computation, and the computation stops without producing an output. □
(5.6) Lemma. The function f(G, s, k) can be computed in nondeterministic logarithmic space.
Proof. The proof is by induction on k. The base case, computing f(G, s, 0), is trivial. Assume that a log-space nondeterministic Turing machine has already computed the value f(G, s, k). The value f(G, s, k+1) can be computed by checking the nodes of G one at a time, and counting those for which a path of length at most k+1 exists. We shall decide if there exists such a path from s to a particular node v by using f(G, s, k) to decide if there is a path of length at most k to any of the predecessors of v; this can be done by using a variant of the algorithm given in the if part of Lemma (5.5). □
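The counting recurrence at the heart of this lemma can be illustrated deterministically; the sketch below ignores the space bound entirely, which is of course where the real content of the lemma lies.

    def f_values(succ, s):                   # succ: lists of out-neighbors
        n = len(succ)
        reach = {s}                          # nodes within distance 0
        counts = [1]                         # f(G, s, 0) = 1
        for _ in range(n - 1):
            reach = reach | {w for v in reach for w in succ[v]}
            counts.append(len(reach))        # f(G, s, k+1) from distance <= k
        return counts

    g = [[1], [2], [0, 3], []]               # arcs 0->1, 1->2, 2->0, 2->3
    print(f_values(g, 0))                    # [1, 2, 3, 4]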
The theorem of Immerman and Szelepcsenyi implies that NSPACE(S(n)) = co-NSPACE(S(n)) for every S(n) ≥ log n. This type of upward inheritability result concerning either space or time complexity classes is usually proved by a so-called padding argument. Let L be a language in NSPACE(S(n)). Let the language L' consist of the words formed by appending sufficient $'s to a word of length n from L to make the length equal to 2^{S(n)} (where we assume that $ is not in the alphabet of L). It is easy to see that L' ∈ NL (= co-NL). The nondeterministic log-space Turing machine that proves that L' ∈ co-NL can be used to test non-membership in L rather than L'. (We simulate appending the correct number of $'s to an input by keeping a counter that maintains a record of the input head position, for when the simulated head would like to read one of the imaginary $'s.) This shows that L ∈ co-NSPACE(S(n)).
The hardest problems in P
One important application of log-space computation was introduced by Cook (1973), who used log-space reducibility to introduce a notion of a hardest problem in P. A problem L_1 is P-complete if L_1 ∈ P and for all L ∈ P, L ∝_log L_1. The transitivity of the log-space reduction gives the following theorem.
(5.7) Theorem. For any P-complete problem L, L ∈ L if and only if L = P.
Later in this section we shall see that P-completeness also provides evidence that a problem
cannot be efficiently solved in parallel. This fact has greatly increased the interest in P-completeness
and a variety of problems have been shown to be P-complete. Perhaps the most natural example is
the circuit value problem, which was proved P-complete by Ladner (1975b). Given a circuit (with
and, or and not gates) with truth values assigned to its input gates, the circuit value problem is
to compute the output of the circuit.
(5.8) Theorem. The circuit value problem is P-complete.
Proof. A computational model that uses circuits and is equivalent to Turing machines has already
been discussed, and this equivalence suggests the theorem. The circuit value problem is clearly in
P. Given a polynomial-time Turing machine M and an input x, we shall build a circuit that has
value 1 if and only if x is in the language accepted by the Turing machine M. The construction is
done in a way analogous to the proof of Cook's theorem. We encode the entire computation as a
matrix, but instead of constructing a Boolean formula, we build a circuit that computes one row of this matrix from the preceding one to simulate the computation of the Turing machine. □
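A sequential evaluator for the circuit value problem is a single pass over the gates in topological order; it is this apparently unavoidable gate-by-gate pass that the completeness result addresses.

    def circuit_value(gates):
        # gates[i] is ('input', b), ('not', j), ('and', j, k) or ('or', j, k),
        # with j, k < i; the value of the last gate is the circuit's output
        val = []
        for g in gates:
            if g[0] == 'input': val.append(g[1])
            elif g[0] == 'not': val.append(not val[g[1]])
            elif g[0] == 'and': val.append(val[g[1]] and val[g[2]])
            else:               val.append(val[g[1]] or val[g[2]])
        return val[-1]

    # (x1 and not x2) or x2, with x1 = True and x2 = False
    print(circuit_value([('input', True), ('input', False), ('not', 1),
                         ('and', 0, 2), ('or', 3, 1)]))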
When proving other problems in P to be P-complete we shall use the same strategy as was used
for NP-completeness: merely reduce from problems that are already known to be P-complete. To
facilitate later reductions, note that restricted versions of the circuit value problem are P-complete.
For example, this is true for the restricted monotone circuit value problem, where the circuits
use only and and or gates, and each node in the directed acyclic graph that defines the circuit has
both indegree (fan-in) and outdegree (fan-out) at most 2. This can be shown by considering
the inputs and their negations as separate input variables, and then converting the circuit into an
equivalent one that computes the value of every gate and its negation separately.
Dobkin, Lipton and Reiss (1979) proved that each problem in P log-space reduces to the linear
programming problem, and the celebrated result of Khachiyan (1979) showed that it is in P. Valiant
gave a straightforward reduction from a restricted circuit value problem that uses linear constraints
to trace the value computed by the circuit.
The maximum flow problem is an important special case of the linear programming problem.
In this problem, we are given a directed graph G = (V, A) with two specified nodes, the source s
and the sink t, and a non-negative capacity u( a) assigned to each arc a E A. A feasible flow is a
vector f ∈ R^A that satisfies the capacity constraints, i.e., 0 ≤ f(a) ≤ u(a) for each arc a ∈ A, and the flow conservation constraints, i.e., the sum of the flow values on the arcs entering a node v ≠ s, t is the same as the sum of the flow values on the arcs leaving v. The value of a flow is Σ_{a=(v,t)∈A} f(a) − Σ_{a=(t,v)∈A} f(a). The maximum flow problem is to find a feasible flow
of maximum value. Goldschlager, Shaw and Staples (1982) showed that computing the parity of
the maximum flow is also P-complete. We give a log-space reduction from the restricted monotone
circuit value problem for circuits in which the output gate is an or gate and all gates have fan-in
and fan-out at most 2. Number the gates of the circuit so that wires of the circuit go from higher
numbered gates to lower numbered gates, and the output gate is numbered 0. The nodes of the
graph will correspond to the gates of the circuit with two additional nodes, a source s and a sink
t.
There is a collection of P-complete problems that are related to simple polynomial-time algorithms.
The maximal stable set problem, in which the objective is to find a maximal (not
maximum) stable set in an undirected graph, can clearly be solved by a simple greedy algorithm.
Similarly, the depth-first search procedure greedily finds a depth-first search tree, which is a
spanning tree of an undirected graph rooted at a given node r, such that for any edge ( u, v) in the
graph, the path from r to one of u and v passes through the other (see Aho, Hopcroft & Ullman (1974)). When using one of these procedures, we usually select the first available node in each step, and so we find a specific solution, the lexicographically-first one. Reif (1985) proved that the related problem of finding the lexicographically-first depth-first search tree is P-complete. Cook showed that finding the lexicographically-first maximal stable set is P-complete. These results
might be surprising since both of these problems are easy to solve in polynomial time. However,
P-completeness also provides evidence that the problem is not solvable efficiently in parallel. Consequently,
these completeness results support the intuition that these particular procedures are
inherently sequential.
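For reference, the lexicographically-first maximal stable set is computed by the obvious scan, which examines the nodes strictly in index order.

    def lex_first_mis(adj):                  # adj: list of neighbor sets
        stable = []
        for v in range(len(adj)):            # scan nodes in index order
            if all(u not in adj[v] for u in stable):
                stable.append(v)
        return stable

    # on a 5-cycle the lexicographically-first maximal stable set is {0, 2}
    c5 = [{1, 4}, {0, 2}, {1, 3}, {2, 4}, {0, 3}]
    print(lex_first_mis(c5))                 # [0, 2]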
Parallel Computation
Parallel computation gives us the potential of curbing the effects of the intractability of certain
problems by solving them with a large number of processors simultaneously. In studying parallel
algorithms, we shall not be concerned with the precise number of parallel processors used, but
rather their order as a function of the input size. We say that a parallel algorithm using O(p(n)) processors achieves optimal speedup if it runs in O(t(n)) time and the best sequential algorithm known for solving the same problem runs in O(t(n)p(n)) time. Efficient algorithms that reach optimal (or near optimal) speedup with a significant number of processors have been found for
many of the basic combinatorial problems. Another aspect of parallel computation is concerned
with the inherent limitations of using many processors to speed up a computation. To be somewhat
realistic, we shall only be interested in algorithms that use a polynomial number of processors.
Consequently, we will focus on the possible speedup of polynomial-time sequential computation by
parallel processing. The interested reader is referred to the survey of Karp and Ramachandran
(1990) for more details and references.
First we define a model of parallel computation. Although many such models have been proposed,
one that seems to be the most convenient for designing algorithms is the parallel random
access machine (PRAM). The PRAM is the parallel analogue of the random access machine;
it consists of a sequence of random access machines called processors, each with its own infinite
local random access memory, in addition to an infinite shared random access memory where each
memory cell can store any integer, and the input is stored in the shared memory. Each processor
knows the input size and its identification number, although otherwise the processors are identical
(i.e., they run the same program). Different variants of the basic PRAM model are distinguished by the manner in which they handle read and write conflicts. In an EREW PRAM (exclusive-read, exclusive-write PRAM), for example, it is assumed that each cell of the shared memory is read from and written into by at most one processor at a time. At the other extreme, in a CRCW PRAM (concurrent-read, concurrent-write PRAM), each cell of the memory can be read from and written into by more than one processor at a time. If different processors attempt to write different things into the same cell, then the lowest numbered processor succeeds.
As a first example, consider the maximal stable set problem introduced in the previous subsection. Let us first review a sequential algorithm for the problem: select a node v and include it in
the stable set, delete v and all of its neighbors from the graph; repeat this procedure until all
nodes have been deleted. Note that this algorithm requires n iterations for a path of length 2n. A
similar approach can still be used for a parallel algorithm. To make the algorithm fast, one selects
a stable set in each iteration (rather than a single node), where the set is deleted along with its
neighborhood. The following simple way to choose a stable set is due to Luby. A processor is
assigned to each node and each edge of the graph. For a graph of order n, the processor assigned to node v picks a random integer c(v) from 1 to n^4. Next, each processor assigned to an edge
compares the values at the two nodes of the edge. The stable set selected consists of those nodes
v for which c( v) is strictly larger than the values assigned to any of its neighbors.
This algorithm clearly finds a maximal stable set, but it is less clear that the algorithm runs
fast. It can be shown that each iteration is expected to remove a constant fraction of the edges, and
consequently, the expected number of iterations is O(log n ). The algorithm can be implemented on
a randomized CRCW PRAM in O(log n) time (if we assume that a processor can choose a random number of size O(log n) in one step).
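A sequential simulation of the phases of Luby's algorithm follows; on a CRCW PRAM, with a processor per node and per edge, each phase takes O(1) steps.

    import random

    def luby_mis(adj):                       # adj: list of neighbor sets
        n, alive, mis = len(adj), set(range(len(adj))), []
        while alive:
            c = {v: random.randint(1, n ** 4) for v in alive}
            winners = {v for v in alive      # strict local maxima join the set
                       if all(c[v] > c[u] for u in adj[v] if u in alive)}
            mis.extend(winners)
            alive -= winners | {u for v in winners for u in adj[v]}
        return mis

    g = [{1, 2}, {0, 2}, {0, 1, 3}, {2}]
    print(sorted(luby_mis(g)))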
Luby gave an elegant version of a technique of Karp and Wigderson that is used to convert
certain randomized algorithms into deterministic ones. The technique can be used if the analysis
of the randomized algorithm depends only on pairwise, rather than fully, independent random
choices. The idea is that pairwise independent random integers between 1 and n4 can be chosen
from a sample space of polynomial size. Each iteration can then be run for each point in the sample
space simultaneously, and this ensures that a good sample point is used. This method can be used
to convert the above randomized algorithm into a deterministic one.
A simple, but important example of a deterministic parallel algorithm is for the directed graph
reachability problem. One way to solve this problem is by repeatedly squaring the adjacency matrix
of the graph. More precisely, after the ith iteration, the variable a_{v,w} will indicate whether there exists a path of length at most 2^i from v to w. To update this information, we assign a processor to every triple of nodes. In the ith iteration, the processor assigned to the triple (v, u, w) checks whether there exists a path from v to w through u that has fewer than 2^{i−1} nodes both before and after u. This algorithm, when implemented on a CRCW PRAM, uses O(n³) processors and runs in O(log n) time.
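Written sequentially, the repeated-squaring algorithm looks as follows; the triple loop over (v, u, w) is exactly the work that the O(n³) processors perform simultaneously.

    import math

    def reachability(adj_matrix):
        n = len(adj_matrix)
        a = [[bool(adj_matrix[v][w]) or v == w for w in range(n)]
             for v in range(n)]
        for _ in range(max(1, math.ceil(math.log2(n)))):     # log n squarings
            a = [[any(a[v][u] and a[u][w] for u in range(n)) for w in range(n)]
                 for v in range(n)]
        return a

    g = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
    print(reachability(g)[0][3])             # True: 0 -> 1 -> 2 -> 3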
Parallel computation can also be considered in the framework of other computational models.
By using many processors, one can evaluate a circuit in time proportional to its depth. Hence,
the depth of a circuit is the natural analogue of parallel time. When using families of circuits as
a model of computation, one must make certain uniformity assumptions. In this case, the usual
uniformity assumption is logspace-uniformity; that is, for inputs of size n, the circuit must be
generated by a Turing machine using space O(log n). This allows us to define parallel complexity
classes. For i ≥ 0, the class AC^i is defined to be the class of languages that are decidable by a family
of logspace-uniform circuits with and, or, and not gates of depth O(log^i n) and size polynomial in n.
The class NC^i is defined in the same way with the additional restriction that each gate in the circuit
has constant fan-in. Clearly, NC^i ⊆ AC^i. Furthermore, an and gate or an or gate with a polynomial
number of inputs can be computed by a constant fan-in circuit of depth O(log n). Consequently,
AC^i ⊆ NC^(i+1). Notice that any function that depends on all of its inputs requires depth Ω(log n) to
be computed by a constant fan-in circuit, and an analogous bound holds in any reasonable model
of parallel computation. Analogous to P, membership in the class NC = ∪_{i≥0} NC^i = ∪_{i≥0} AC^i has
become accepted as the theoretical model of "efficiently computable in parallel."
(5.9) Theorem. For i ≥ 1, AC^i is the class of languages recognizable by the above variant of the
CRCW PRAM with a polynomial number of processors in O(log^i n) time.
Proof. We first show how to simulate a family of circuits by a PRAM. Once we have the appropriate
circuit from the family, this is easy to do. The only difficulty lies in constructing the circuit. By
our uniformity assumption, the circuit for inputs of size n can be generated by a Turing machine
in space O(log n), and so the problem of deciding if (i, j) is an arc of the circuit is in L ⊆ NL. We
have seen an O(log n)-time CRCW PRAM algorithm for the directed graph reachability problem.
To finish this proof, we combine the proof of the NL-completeness of the directed graph reachability
problem with the above algorithm, to prove that any language in NL can be recognized in time
O(log n) by a CRCW PRAM.
To prove the other direction we have to simulate a CRCW PRAM by a logspace-uniform family
of circuits. The only difficulty is in simulating indirect addressing. It is a good exercise to show
that indirect addressing can be simulated by unbounded fan-in circuits in constant depth. □
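One way to see the exercise (a sketch of our own, not the chapter's proof) is to view an indirect read as a multiplexer: each output bit is an OR, over all memory cells, of "this cell is addressed AND its bit is 1," which is a constant-depth unbounded fan-in circuit.

```python
def mux_read(address_bits, memory):
    """Indirect addressing as a constant-depth, unbounded fan-in circuit:
    bit b of the fetched word is the OR over all cells j of
    (address == j) AND memory[j][b].  The address comparison is itself
    an AND of address bits and their negations, so the whole read has
    constant depth.  address_bits are little-endian 0/1 values."""
    def addr_equals(j):
        # AND gate over address bits selecting cell j
        return all(((j >> k) & 1) == address_bits[k]
                   for k in range(len(address_bits)))
    width = len(memory[0])
    return [int(any(addr_equals(j) and memory[j][b]
                    for j in range(len(memory))))
            for b in range(width)]

# fetch cell 2 from a 4-cell memory of 2-bit words
print(mux_read([0, 1], [[0, 0], [0, 1], [1, 0], [1, 1]]))  # [1, 0]
```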
The maximal stable set algorithm of Luby discussed earlier uses a randomized version of
the CRCW PRAM. In a randomized CRCW PRAM algorithm, every processor can choose a
logarithmic-size random number in one step. We define the class RNC to consist of all languages
L for which there exists a randomized CRCW PRAM algorithm that runs in polylogarithmic time
using a polynomial number of processors and both accepts each word in L and rejects each word
not in L with probability at least 2/3.
The restriction in the definition of NC^i is similar to the exclusive-write restriction in the PRAM
model. However, this similarity is somewhat misleading. Hoover, Klawe and Pippenger (1984)
proved that a constant fan-in circuit can be converted into an equivalent circuit with both constant
fan-in and constant fan-out, while increasing both the size and depth by only a constant factor.
Consequently, for i ≥ 2, any language in NC^i can be recognized by an EREW PRAM in O(log^i n)
time.
Ruzzo (1981) has discovered a characterization of the class NC^i using alternating Turing machines.
He proved that for i ≥ 2, the class NC^i is the class of languages that can be recognized
by an alternating Turing machine simultaneously using O(log n) space and O(log^i n) time. Ruzzo
gave a more sophisticated definition of uniformity for circuit families that is more restrictive for
circuits of depth O(log n). Under his definition, the above characterization also holds for i = 1.
The proof of Theorem (5.9) also shows that NL ⊆ AC^1. As a consequence, a log-space reduction
can be simulated efficiently in parallel, and therefore P-completeness provides evidence that a
problem is not efficiently solvable in parallel.
(5.10) Theorem. For any P-complete problem L, L ∈ NC if and only if NC = P.
On the other hand, NC^1 ⊆ L. More generally, any logspace-uniform family of constant fan-in
circuits of depth at least log n can be simulated by a Turing machine in space proportional to the
depth of the circuit; that is, NC^i ⊆ DSPACE(log^i n). As a corollary, we obtain the following chain
of containments:

NC^1 ⊆ L ⊆ NL ⊆ AC^1 ⊆ NC^2 ⊆ AC^2 ⊆ ··· ⊆ NC ⊆ P.
We have seen that NC is contained in ∪_{i≥1} DSPACE(log^i n). By the analogue of Theorem (3.7)
for space complexity, this implies that NC ≠ PSPACE. It is fairly easy to argue that NC^0 does
not contain the or function, and therefore NC^0 ≠ AC^0. The only further separation result known
is that AC^0 ≠ NC^1. This is a landmark result of Furst, Saxe and Sipser (1981) and Ajtai (1983),
and will be discussed in the next section.
On a CRCW PRAM one can compute the or of n variables in one step. This suggests that
the CRCW PRAM is not a reasonable model of parallel computation. In a more realistic model,
computing any function that depends on all of its inputs requires n(log n) time. The addition
problem takes two n-bit binary numbers as inputs (where each bit is stored in a separate memory
location) and computes their (n+l)-bit sum. Surprisingly, even the addition problem can be solved
in constant time on a CRCW PRAM. It is a good exercise to figure out how to add two numbers
using constant-depth, unbounded fan-in circuits.
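One solution to the exercise rests on the carry-lookahead identity: the carry into position i is an OR, over positions j < i, of ANDs of input bits, hence computable by a depth-2 unbounded fan-in circuit. The following sketch (our own formulation, using the usual generate/propagate terms) simply evaluates that identity.

```python
def carry_lookahead_add(x_bits, y_bits):
    """Add two n-bit numbers via carry lookahead: the carry into
    position i is 1 iff some position j < i generates a carry
    (x_j AND y_j) and every position strictly between j and i
    propagates it (x_k OR y_k).  Each carry is thus an OR of ANDs of
    the inputs, i.e. a depth-2 unbounded fan-in circuit, evaluated
    here directly.  Bit 0 is the least significant bit."""
    n = len(x_bits)
    g = [x_bits[j] & y_bits[j] for j in range(n)]   # generate
    p = [x_bits[j] | y_bits[j] for j in range(n)]   # propagate
    carry = [0] * (n + 1)
    for i in range(1, n + 1):
        carry[i] = int(any(g[j] and all(p[k] for k in range(j + 1, i))
                           for j in range(i)))
    s = [x_bits[i] ^ y_bits[i] ^ carry[i] for i in range(n)]
    return s + [carry[n]]                           # (n+1)-bit sum

# 6 + 7 = 13, bits little-endian: [1, 0, 1, 1]
print(carry_lookahead_add([0, 1, 1], [1, 1, 1]))
```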
Cook, Dwork and Reischuk (1986) have considered the limitations of the CREW PRAM (concurrent-read,
exclusive-write PRAM). They considered the or function, and showed that it requires Ω(log n)
time. This result is subtler than it seems, since with clever use of the concurrent-read option one
can actually do better than log_2 n.
Next we shall give examples of simple parallel algorithms. First we consider the arithmetic
operations. We have just mentioned that addition (and similarly subtraction) can be done in
AC^0. The standard shift-and-add algorithm reduces multiplication to the addition of n numbers.
Consequently, multiplication can be done in AC^1. In fact, multiplication is in NC^1. This fact can
be established by noticing that the addition of three numbers can be reduced to the addition of
two in NC^0 (by separately maintaining the carries). Beame, Cook and Hoover (1986) showed that
integer division can also be computed by depth O(log n) constant fan-in circuits. However, the
construction is not log-uniform. The best log-uniform family of circuits for integer division is due
to Reif (1986) and has depth O(log n log log n). Evaluating arithmetic expressions with addition
and multiplication operations has been considered by Brent (1974), who gave an O(log n)-time
parallel algorithm. Miller and Reif (1985) have developed a general tree contraction method that
can manage the binary tree associated with the parenthesis structure of the arithmetic expression.
The resulting algorithm runs in O(log n) time and uses O(n) processors.
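The three-to-two reduction mentioned above is the classical carry-save trick; every output bit depends on only three input bits, which is what places the reduction in NC^0. A minimal sketch:

```python
def three_to_two(a_bits, b_bits, c_bits):
    """Carry-save addition: reduce the sum of three numbers to the sum
    of two.  Position i of the first output is the XOR of the three
    input bits; the carries go, shifted one place up, into the second
    output.  Each output bit depends on only three input bits, so the
    reduction has constant depth and constant fan-in (NC^0)."""
    n = len(a_bits)
    s = [a_bits[i] ^ b_bits[i] ^ c_bits[i] for i in range(n)]
    carries = [int(a_bits[i] + b_bits[i] + c_bits[i] >= 2)
               for i in range(n)]
    return s + [0], [0] + carries       # second number is shifted left

# 3 + 5 + 6 = 14: check that the two outputs sum to 14
s, c = three_to_two([1, 1, 0], [1, 0, 1], [0, 1, 1])
val = lambda bits: sum(b << i for i, b in enumerate(bits))
print(val(s) + val(c))  # 14
```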
Another important area of efficient parallel algorithms deals with the sorting problem. One can
easily sort in AC^0, and consequently in NC^1. However, the trivial algorithm uses O(n^2) processors.
The first deterministic EREW PRAM algorithm that runs in O(log n) time using O(n) processors
was given by Ajtai, Komlos and Szemeredi (1983). However, the constant in the running time is
too large for the method to have any practical use. Cole (1988) later gave a simpler method that
achieved the same asymptotic bounds.
The Ajtai, Komlos, Szemeredi algorithm is based on their O(log n)-time sorting network. Consider
the following restricted version of the sorting problem. Suppose that n numbers are given in
memory locations 1, ..., n. The basic operation for the algorithm is to compare the numbers in
two memory locations and rearrange them so that the smaller number is in the location with the
smaller index. A comparison network can compare and possibly swap the numbers in several
disjoint pairs of memory locations at the same time. The network has to be oblivious, in the sense
that the pairs to be compared cannot depend on the outcomes of previous comparisons. A comparison
network is a sorting network if it is guaranteed to rearrange the elements into non-decreasing
order. The construction of the Ajtai, Komlos and Szemeredi sorting network is based on constant-degree
expander graphs. A bipartite graph with color classes A and B such that |A| = |B| is an
ε-expander if for any S ⊆ A satisfying |S| ≤ |A|/2 it follows that the number of neighbors of S in
B is at least (1 + ε)|S|. For every constant ε < 1 there exists a constant d such that the union of
d randomly chosen perfect matchings is an ε-expander with non-zero probability. This establishes
that constant-degree ε-expanders with ε > 0 exist. The first deterministic construction of constant-degree
expanders was given by Margulis (1973). Since then, several simpler constructions have
been given that yield better parameters.
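To make the definition concrete, the following brute-force checker tests the ε-expansion property directly from the definition; it takes exponential time, so it is a sanity check for tiny examples rather than part of any efficient construction (the representation and function name are ours).

```python
from itertools import combinations

def is_expander(neighbors, eps):
    """Brute-force test of the eps-expander property for a bipartite
    graph given as a dict mapping each node of A to its set of
    neighbors in B.  Checks every S with |S| <= |A|/2, so the running
    time is exponential in |A|; usable only on very small graphs."""
    a_nodes = list(neighbors)
    for size in range(1, len(a_nodes) // 2 + 1):
        for s in combinations(a_nodes, size):
            nbhd = set().union(*(neighbors[v] for v in s))
            if len(nbhd) < (1 + eps) * size:
                return False
    return True
```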
A classical (and much more practical) sorting network, which takes O(log^2 n) time, is the sorting
network of Batcher (1968). It repeatedly uses an O(log n)-time merging network, that is, a
comparison network guaranteed to rearrange the elements into non-decreasing order if the first n/2
and the second n/2 elements of the input are separately in non-decreasing order. The Batcher
merging network can be defined recursively. Assume for simplicity that n is a power of 2. To merge
two lists of length n/2, we first merge the lists of length n/4 consisting of the elements in the odd
and even numbered memory cells of the two lists separately. We then compare and possibly swap
the pairs in locations 2i and 2i + 1 for i = 1, ..., n/2 - 1 in parallel. It is not difficult to see that
this defines an O(log n)-time merging network.
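The recursion is short enough to state as code. The sketch below (our own sequential rendering; on the network, each level of compare-exchanges runs in parallel) merges an array whose two halves are sorted, following the odd/even recursion just described.

```python
def odd_even_merge(a):
    """Batcher's odd-even merging network for n a power of 2; the
    input's first and second halves must each be in non-decreasing
    order.  Recursively merge the subsequences in the even and in the
    odd positions, then compare-exchange the adjacent pairs -- the
    text's 1-indexed pairs (2i, 2i+1) become (2i-1, 2i) with
    0-indexing.  Returns the merged list."""
    n = len(a)
    if n == 2:
        return [min(a), max(a)]
    even = odd_even_merge(a[0::2])      # elements in even positions
    odd = odd_even_merge(a[1::2])       # elements in odd positions
    merged = [None] * n
    merged[0::2], merged[1::2] = even, odd
    for i in range(1, n // 2):          # one parallel compare-exchange level
        lo, hi = 2 * i - 1, 2 * i
        if merged[lo] > merged[hi]:
            merged[lo], merged[hi] = merged[hi], merged[lo]
    return merged

print(odd_even_merge([1, 4, 6, 7, 2, 3, 5, 8]))  # [1, 2, ..., 8]
```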
Interestingly, many of the simple parallel graph algorithms are based on matrix operations.
The parallel directed graph reachability algorithm presented earlier was in fact based on Boolean
matrix multiplication. We shall discuss matrix operations before the more combinatorial elementary
graph problems. General matrix multiplication can easily be done on a CRCW PRAM in O(log n)
time using O(n^3) processors. In fact, using the ideas from the more efficient matrix multiplication
techniques, one can save somewhat on the number of processors used.
Csanky (1976) has given an NC algorithm to compute the determinant of a matrix in O(log^2 n)
time. The method actually computes the characteristic polynomial of a matrix A (i.e., the polynomial
det(A - xI), whose constant term is det A). In fact, Csanky's method can be used to compute
the characteristic polynomial of a matrix over any field of characteristic 0, where the field operations
are assumed to take unit time. Berkowitz (1984) and Chistov have independently extended
this result to matrices over arbitrary fields. Over fields of characteristic 0, the last non-zero term
of the characteristic polynomial gives the rank of the matrix. This is not true over arbitrary fields.
Recently, deterministic parallel algorithms to compute the rank of a matrix over arbitrary fields
have been independently discovered by Chistov and Mulmuley (1987), improving on the randomized
algorithm of Borodin, von zur Gathen and Hopcroft (1982). As a corollary, we get a parallel
algorithm to solve systems of linear equations.
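Csanky's algorithm parallelizes Leverrier's classical method, which recovers the coefficients of the characteristic polynomial from traces via Newton's identities. A sequential sketch of the recurrence in its Faddeev-LeVerrier form (our own rendering; the division by k in each step is exactly where characteristic 0 is needed):

```python
from fractions import Fraction

def char_poly(A):
    """Coefficients [c_1, ..., c_n] of det(xI - A) = x^n + c_1 x^(n-1)
    + ... + c_n via the Faddeev-LeVerrier recurrence underlying
    Csanky's parallel algorithm:
        M_1 = A,  c_k = -tr(M_k)/k,  M_{k+1} = A(M_k + c_k I)."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    M = [row[:] for row in A]
    coeffs = []
    for k in range(1, n + 1):
        c = -sum(M[i][i] for i in range(n)) / k     # needs char. 0
        coeffs.append(c)
        if k < n:
            Mc = [[M[i][j] + (c if i == j else 0) for j in range(n)]
                  for i in range(n)]                # M + c I
            M = [[sum(A[i][l] * Mc[l][j] for l in range(n))
                  for j in range(n)] for i in range(n)]
    return coeffs       # det(A) = (-1)^n * c_n

print(char_poly([[2, 1], [1, 2]]))  # [-4, 3], i.e. x^2 - 4x + 3
```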
Consider the problem of testing if a graph is planar. Ja'Ja' and Simon (1982) have observed
that the planarity of a graph can be tested efficiently in parallel using Tutte's characterization,
which states that a straight-line embedding of a 3-connected planar graph can be obtained by solving a set of
linear equations. One drawback of this approach is that Tutte's theorem only works for 3-connected
graphs. Ja'Ja' and Simon extended this approach to check whether an arbitrary graph is planar, but
the algorithm only constructs the plane embedding for 3-connected graphs. Miller and Reif (1985)
have used the tree contraction method mentioned earlier to extend this algorithm to arbitrary
graphs. Klein and Reif (1988) have recently given an efficient planarity testing algorithm that does
not rely on matrix inversion, and is also more efficient in terms of the number of processors used.
We next turn our attention to elementary graph searching techniques. A parallel algorithm for
the directed graph reachability problem based on Boolean matrix multiplication has been discussed
earlier. This algorithm can be used to construct a breadth-first search tree of the graph. It
is an important open problem to find a polylogarithmic-time parallel algorithm for breadth-first
search that uses fewer processors than are needed for matrix multiplication. The undirected graph
reachability problem can be solved more efficiently. Shiloach and Vishkin (1982) have given an
implementation of an algorithm by Hirschberg, Chandra and Sarwate (1979) that takes O(log n)
time on a CRCW PRAM using O(m + n) processors. Although this algorithm is fairly efficient, it
is still not optimal, in the sense that the product of the time and the number of processors exceeds
the running time of the best sequential algorithm known for the problem. Gazit (1986) gave an
O(log n)-time randomized CRCW PRAM algorithm that uses only O((m + n)/log n) processors. No
deterministic O(log n)-time optimal algorithm is known. These algorithms can be used to construct
a spanning forest of the graph, but not necessarily a breadth-first search tree.
Another elementary search technique is depth-first search. No efficient deterministic parallel
algorithm is known for finding a depth-first search tree. Aggarwal and Anderson (1988) have given
a randomized parallel algorithm for finding a depth-first search tree (first for undirected graphs;
later, with Kao (1989), they extended the result to directed graphs). The algorithm is based
on a randomized algorithm to find a perfect matching in parallel that, in turn, is based on matrix
inversion. As a consequence, the algorithm uses a large number of processors, and is not efficient
in practice.
Lovasz has used the ear-decomposition algorithm to
find a minimum cost branching of a directed graph. (A branching is a directed tree where each
node except a specified root has indegree 1.) In undirected graphs, an ear-decomposition can be
found more efficiently. Lovasz's algorithm for undirected graphs runs in O(log n) time and uses a
linear number of processors. The ear-decomposition algorithm does not in general give an open
ear-decomposition (even if one exists). Maon, Schieber and Vishkin (1986) modified the algorithm
to find an open ear-decomposition with the same complexity. Ear-decomposition techniques can
be used to check whether a graph is 2-edge- or 2-node-connected, though a parallel algorithm using
O(log n) time and a linear number of processors had been given earlier by Tarjan and Vishkin
(1985).
Miller and Ramachandran (1987) used the ear-decomposition technique to design an NC algorithm
for finding 3-connected components that uses only a linear number of processors. The first
NC algorithm to decide whether a graph is 3-connected is due to Ja'Ja' and Simon (1982); however,
they were unable to find the 3-connected components of a graph. Miller and Reif (1985) used tree
contraction methods to find 3-connected components in parallel.
One of the most important open problems in parallel computation is to give an efficient parallel
algorithm that finds a maximum matching in a graph. Recall the polynomial-time randomized
algorithm of Lovasz (1979) that was discussed when introducing the idea of randomized computation.
The algorithm decides whether a graph has a perfect matching by computing the determinant
of a single randomly chosen matrix. Determinants can be computed efficiently in parallel. This
establishes that deciding whether a graph has a perfect matching is in RNC. The same idea shows
that the exact matching problem is in RNC. This is an interesting case of a problem in RNC that
is not known to be in P.
It is important to notice that decision and search problems are not equivalent for parallel
computation. There does not appear to be a way to turn the RNC algorithm of Lovasz, which
decides whether a perfect matching exists, into an RNC algorithm that finds a perfect matching if
one exists. The difficulty seems to lie in identifying a particular matching. The graph might have
several matchings, and the different processors working in parallel have to coordinate which perfect
matching to choose. An RNC algorithm to find a perfect matching in a graph (if one exists) has
been given by Karp, Upfal and Wigderson (1986). Later, Mulmuley, Vazirani and Vazirani (1987)
gave a very elegant and more efficient algorithm. Here we briefly outline the latter algorithm in the
case of bipartite graphs. For a bipartite graph G = (U ∪ W, E) with n nodes in each of the color
classes U and W, independently assign to each edge e an integral cost c(e) selected uniformly at
random between 1 and 2|E|. The algorithm is based on a lemma which states that with probability
at least 1/2, the minimum cost perfect matching is unique. The algorithm proceeds as follows:
define an n by n matrix A with entries a_{ij} = 2^{c(e)} if e = (u_i, w_j) ∈ E and 0 otherwise. The terms
in the expansion of the determinant of A correspond to perfect matchings, and a matching of cost c
contributes a term of absolute value 2^c. Now assume that the minimum cost perfect matching is
unique and has cost c. The determinant expansion then has exactly one term that is not divisible
by 2^{c+1}, and consequently det A ≠ 0 and the highest power of 2 dividing det A is 2^c. Let b_{ij} denote the
(i,j)th cofactor of A (up to sign, the determinant of the matrix obtained from A by deleting row i
and column j). An edge e = (u_i, w_j) belongs to the unique minimum cost matching if and
only if the highest power of 2 dividing b_{ij} is 2^{c-c(e)}.
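Assuming the lemma, the whole algorithm reduces to a handful of exact determinant computations. The following sequential sketch (function names are ours; the cofactor-expansion determinant is exponential and used only to keep the example self-contained) mirrors the outline above; on a PRAM, the determinants are computed in parallel.

```python
import random

def det(M):
    """Exact integer determinant by cofactor expansion (tiny matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def pow2(x):
    """Exponent of the highest power of 2 dividing a non-zero integer."""
    e = 0
    while x % 2 == 0:
        x //= 2
        e += 1
    return e

def mvv_bipartite_matching(n, edges):
    """Sketch of the Mulmuley-Vazirani-Vazirani algorithm for bipartite
    perfect matching; edges is a list of pairs (i, j), u_i in U, w_j in W."""
    cost = {e: random.randint(1, 2 * len(edges)) for e in edges}
    A = [[0] * n for _ in range(n)]
    for (i, j), ce in cost.items():
        A[i][j] = 2 ** ce
    d = det(A)
    if d == 0:          # no perfect matching, or an unlucky non-unique
        return None     # minimum: rerun with fresh random costs
    c = pow2(d)         # cost of the (probably unique) minimum matching
    matching = []
    for (i, j), ce in cost.items():
        minor = det([row[:j] + row[j + 1:]
                     for k, row in enumerate(A) if k != i])
        if minor != 0 and pow2(minor) == c - ce:
            matching.append((i, j))
    return matching

print(mvv_bipartite_matching(3, [(0, 0), (0, 1), (1, 1),
                                 (1, 2), (2, 0), (2, 2)]))
```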
As a corollary of the RNC perfect matching algorithm, one immediately has an RNC algorithm
to find a maximum matching (or to find a matching of a given size). The resulting algorithm
will output a matching that is a maximum-size matching with high probability. Karloff (1986)
has noticed that the Gallai-Edmonds structure theorem (see Chapter 3) implies that a maximum
matching algorithm can be used to produce a minimum odd-set cover. This gives a Las Vegas-type
randomized maximum matching algorithm, that is, an algorithm that is likely to find a matching
proven to be maximum. Further consequences of the perfect matching algorithm are RNC algorithms
for finding a minimum cost perfect matching if the costs are given in unary, and for the
maximum flow problem with capacities given in unary. Recall that the maximum flow problem with
capacities given in binary is P-complete. It is an interesting open problem whether the minimum
cost matching problem (with costs given in binary) is in RNC.
There are several important problems that are known to be in P, but are not known to be in
NC (or even RNC), and yet are not known to be P-complete. In addition to the ones already
mentioned, perhaps the most prominent is the problem of computing the greatest common divisor
of two integers. A fast parallel algorithm for this would have implications for a variety of number-theoretic
problems, including primality testing, which have so far resisted attempts to classify their
parallel complexity. One major element of Luks' (1982) polynomial-time algorithm to test if two
constant-degree graphs are isomorphic is an algorithm of Furst, Hopcroft and Luks (1980) to
compute the order of a permutation group, and a recent result of Babai, Luks and Seress (1988)
provides a parallel analogue of that result. Nonetheless, the problem of providing an efficient parallel
algorithm for testing constant-degree graph isomorphism remains an important open problem.
6. Attacks on the P versus NP Problem
In previous sections, we have discussed techniques for providing evidence of the intractability of
concrete problems (e.g., NP-completeness). However, this evidence is not a proof as long as
the fundamental problems remain open. In this section, we will be concerned with approaches to
settling the P versus NP question.
First we consider the generalization of these unresolved questions to oracle Turing machines.
For example, we will consider questions such as whether P^A = NP^A for an oracle A. These
problems were introduced in order to gain insight into the techniques that might be used to settle
the fundamental open problems.
Recent results, most importantly those of Furst, Saxe and Sipser (1981) and Razborov (1985a),
have provided a major breakthrough in the development of techniques for proving lower bounds on
the computational complexity of certain problems. These methods have succeeded by considering
computational models with serious limitations on some of the resources. The main lines of such
research are for polynomial-size, constant-depth circuits and polynomial-size monotone circuits
(i.e., circuits without negations that use only and and or gates). The techniques developed for
these results are combinatorial in nature.
We will discuss three combinatorial techniques. We discuss the classical crossing-sequence
technique, where the lower bound on time is based on the amount of information that can pass
from one end of the tape to the other. Next we discuss a result of Hopcroft, Paul and Valiant (1977)
that relates the time and space complexity of a problem. We conclude with the landmark result of
Paul, Pippenger, Szemeredi and Trotter (1983), which proves that DTIME(n) ≠ NTIME(n).
Relativized complexity classes
In the previous sections we have introduced the notion of an oracle Turing machine and
complexity classes defined by such Turing machines. Several of the theorems and proof techniques
discussed in the previous sections easily extend to these complexity classes (i.e., they
relativize). By generalizing diagonalization and simulation results, we get theorems such as
AP^A = PSPACE^A = NPSPACE^A for every oracle A.
The main result concerning oracle complexity classes, due to Baker, Gill and Solovay (1975),
is that the answer to the relativized versions of the fundamental open problems depends on the
oracle.
(6.1) Theorem. There exist languages A and B such that P^A ≠ NP^A ≠ co-NP^A, and P^B =
NP^B = co-NP^B.
The existence of an appropriate language A for the first alternative is proved by diagonalization.
Any PSPACE-complete language B will provide an example of the second alternative. As a
corollary, we can assert that techniques that relativize cannot settle the P versus NP problem.
One might wonder which one of the two alternatives provided by the theorem of Baker, Gill
and Solovay is more typical. Recall the definition of a random oracle: each word is placed in the
oracle independently with probability 1/2. We say that a statement holds for a random oracle if
the probability that the statement holds, when the oracle is chosen in this way, is 1. The
Kolmogorov 0-1 law of probability theory states that if an event A is determined by the
independent events B_1, B_2, ..., and A is independent of any event that is a finite combination of the
events B_i, then the probability of A is either 0 or 1. Applying this to the event A that P^A = NP^A
and the events B_i that the ith word is in the language A, we see that the probability of P^A = NP^A
for a random oracle A is either 0 or 1. Bennett and Gill (1981) have provided the final answer to
this question.
(6.2) Theorem. P^A ≠ NP^A and NP^A ≠ co-NP^A for a random oracle A.
Furst, Saxe and Sipser (1981) discovered the following connection between constant-depth circuit
lower bounds and the separation of relativized complexity classes. This result provides further motivation
for studying restricted models of computation. The parity function indicates the sum modulo 2
of the bits of its input.
(6.3) Theorem. If, for every c, any family of constant-depth circuits computing the parity function
has Ω(n^{log^c n}) gates, then there exists an oracle A that separates the polynomial-time
hierarchy from PSPACE; that is, ∪_{k≥1} Σ_k^{P,A} ≠ PSPACE^A.
The idea of the proof is as follows. For any oracle A, we define the language L(A) = {1^n : A
contains an odd number of strings of length n}. L(A) is in PSPACE^A for any oracle A. Using
diagonalization and the lower bound assumed in the theorem, one can construct an oracle A such
that L(A) is not in ∪_{k≥1} Σ_k^{P,A}. First one shows that it is sufficient to consider alternating Turing
machines in which every branch of the computation has a single oracle question at the end of the
branch. The computation tree of such an alternating Turing machine with k levels of alternation
corresponds to a depth-k circuit where the oracle answers are the inputs.
Another related problem is to separate the finite levels of the polynomial-time hierarchy using
oracles. Baker and Selman (1979) proved the existence of an oracle A such that Σ_2^{P,A} ≠ Π_2^{P,A} (and
consequently, Σ_2^{P,A} ≠ Σ_3^{P,A}). Sipser (1983a) showed a result similar to Theorem (6.3) for the finite
levels of the polynomial-time hierarchy. It replaces parity by a certain function F_k that is, in some
sense, a generic function computable in depth k.
(6.4) Theorem. If, for every k and c, any family of circuits of depth k - 1 computing F_k has
Ω(n^{log^c n}) gates, then for every k there exists an oracle A_k such that Σ_k^{P,A_k} ≠ Σ_{k+1}^{P,A_k}.
The circuit complexity results that are needed in these theorems have been proved in a series
of papers by Furst, Saxe and Sipser (1981), Sipser (1983a), Yao (1985) and Hastad (1986). These
results will be discussed in the next subsection. Based on stronger circuit complexity results,
Babai (private communication) and Cai (1986) separated PSPACE from the finite levels of the
hierarchy by random oracles. The question of whether random oracles separate the finite levels of
the polynomial-time hierarchy remains open. Another open question concerning random oracles is
whether P^A = NP^A ∩ co-NP^A for a random oracle A.
Similar oracle separation results have also been proved for classes of languages defined by
randomized proof systems. For example, Santha (1989) provided an oracle B for which MA^B ≠
AM^B. Aiello, Goldwasser and Hastad (1986) provided an oracle under which IP is not contained
in the polynomial-time hierarchy. In contrast with the speedup theorem of Babai and Moran
mentioned earlier, Aiello, Goldwasser and Hastad (1986) proved that for any functions f and g
such that g(n) = o(f(n)), there exists an oracle B such that AM^B(f(n)) ≠ AM^B(g(n)).^3
Relating circuit complexity to Turing machine complexity
Among the three models of computation introduced in Section 2, circuits have the simplest,
most combinatorial structure, and therefore seem to be the most amenable to proving lower bounds.
This is especially true if we consider nonuniform families of circuits; that is, we forget about the
uniformity requirement. This simple combinatorial structure has made it possible to use combinatorial
techniques for proving lower bounds. Boppana and Sipser (1990) give an excellent survey of
circuit complexity lower bounds.
^3 In considering the results of Lund, Fortnow, Karloff & Nisan and Shamir, it is significant to note that these
techniques do not relativize, as is evident from the oracle results known for these classes.
To measure the amount of nonuniformity in a family of circuits, we introduce a nonuniform
version of the Turing machine. A nonuniform Turing machine, in addition to the usual features
of a deterministic Turing machine, has an associated sequence of advice strings a_1, a_2, ... (to be
written onto an additional advice tape). The nonuniform Turing machine accepts an input of
length n if, with a_n written on its advice tape, the Turing machine accepts the input. Note that there
is an important difference between the acceptance rule for a nonuniform and a nondeterministic
Turing machine. A nonuniform Turing machine has to use the same advice for every input of length
n. On the other hand, the definition does not restrict the behavior of a nonuniform Turing machine
with incorrect advice.
We can measure the nonuniformity of a Turing machine by the length of the advice. A language
L is in P/log if and only if there exists a nonuniform Turing machine that accepts L, runs in
polynomial time, and whose family of advice strings {a_n} has length O(log n). Similarly, P/poly
is the set of languages that can be accepted by a polynomial-time nonuniform Turing machine. It
is easy to see that a language L can be recognized by a family of polynomial-size (nonuniform)
circuits if and only if it is in P/poly. (The right advice is a description of the circuit.) Karp and
Lipton (1982) used simple self-reducibility properties to prove the following theorem, which shows that
uniform and nonuniform complexity classes are not too far from each other.
(6.5) Theorem. The following relationships exist between nonuniform and uniform complexity
classes: (1) if NP ⊆ P/log, then P = NP; (2) if NP ⊆ P/poly, then Σ_2^P = Π_2^P; (3) if PSPACE ⊆
P/poly, then PSPACE = Σ_2^P.
The following theorem, which is based on an idea of Adleman (1978), shows a natural way in
which nonuniform families of circuits are more powerful than their uniform analogues.
(6.6) Theorem. BPP ⊆ P/poly.
Proof. Consider a randomized Turing machine RM that recognizes a language L. If we run the
algorithm k times and take the majority decision, the error probability decreases exponentially in
k. Therefore, we can assume without loss of generality that RM accepts every word x ∈ L with
probability 1 - 2^{-n-1} and rejects every word x