# Divide and conquer algorithms

Some problems have special structure that allows one to break them into several smaller subproblems and combine the results. This is the approach behind divide and conquer algorithms. They are typically easy to analyze and implement, and the ability to split a large problem into many smaller, independent ones makes this scheme ripe for parallelization.

Typically, the mathematical tool for analyzing divide and conquer algorithms is a recurrence relation. Given an input size \(n\), one can usually find an expression for run time or memory usage \(T(n)\) in terms of smaller instances of the problem. For example, consider using merge sort to order a list. One can express the running time as \(T(n) = 2T(n/2) + n\), with the first term capturing the time required to sort the two halves, and the second term representing the number of operations required to merge them. The base case for this recursion could be a singleton list, meaning that \(T(1) = 1\). There are also more sophisticated choices of base case: for instance, for lists of fewer than \(k\) elements one could switch to any preferred sorting algorithm, and that would provide a value of \(T(j)\) for each \(j \le k\).
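Unrolling this recurrence shows where the familiar \(O(n \log n)\) bound comes from: each level of the recursion does a total of \(n\) work, and there are \(\log_2 n\) levels.

$$ T(n) = 2T(n/2) + n = 4T(n/4) + 2n = \cdots = 2^k T(n/2^k) + kn. $$

Taking \(k = \log_2 n\) and using the base case \(T(1) = 1\) gives \(T(n) = n + n \log_2 n = O(n \log n)\).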

## Master theorem

The master theorem is a mathematical result that can find the asymptotic behavior of \(T(n)\) if it is recursively defined. Suppose \(T(n)\) is defined as

$$ T(n) = aT(n/b) + f(n), $$

where \(a \ge 1\) and \(b > 1\). Then we have the following three cases.

If \(f(n) = O(n^c)\) where \(c < \log_b a\), then

$$ T(n) = O(n^{\log_b a}).$$

If \(f(n) = O(n^c \log^k n)\) for some \(k \ge 0\) and \(c = \log_b a\), then

$$ T(n) = O(n^c \log^{k+1} n). $$

If \(f(n) = \Omega(n^c)\) where \(c > \log_b a\), and additionally if there exists \(p < 1\) such that for sufficiently large \(n\) we have \(af(n/b) \le p f(n)\), then

$$T(n) = O(f(n)).$$

From our earlier merge sort example, \(a=2\), \(b=2\), \(c=1\), and \(k=0\). Since \(c = \log_2 2 = 1\), the second case applies, implying that \(T(n) = O(n \log n)\), as expected for an efficient sorting algorithm.
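As an example where the first case applies, Karatsuba's algorithm multiplies two \(n\)-digit numbers by reducing the problem to three multiplications of \(n/2\)-digit numbers plus \(O(n)\) work for additions and shifts, giving

$$ T(n) = 3T(n/2) + O(n). $$

Here \(a = 3\), \(b = 2\), and \(c = 1 < \log_2 3 \approx 1.585\), so the first case yields \(T(n) = O(n^{\log_2 3}) \approx O(n^{1.585})\), asymptotically faster than the schoolbook \(O(n^2)\) method.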

## Recursive calls

For divide and conquer algorithms, it is natural to write them using recursion explicitly. However, this is not always the best choice. Consider a naive implementation of a function that calculates terms of the Fibonacci sequence.

```
def fib_bad(n):
    if n <= 1:
        return 1
    return fib_bad(n-1) + fib_bad(n-2)
```

The program is correct and outputs the correct result, but running the program takes a long time even for moderate values. Using a profiler to investigate provides insight into why this is the case.

```
>>> import cProfile
>>> cProfile.run('fib_bad(30)')
2692540 function calls (4 primitive calls) in 0.577 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
2692537/1 0.576 0.000 0.576 0.576 <stdin>:1(fib_bad)
1 0.000 0.000 0.576 0.576 <string>:1(<module>)
1 0.000 0.000 0.577 0.577 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
```

Running `fib_bad` to find the 30th term requires recursively running `fib_bad` 2,692,537 times. This is because the program recalculates previous terms in the sequence many, many times. To be precise, when we call `fib_bad(30)`, the interpreter sees that it needs to find `fib_bad(29)` and `fib_bad(28)`. It starts computing `fib_bad(29)` and pushes the pending `fib_bad(28)` onto the call stack, which keeps track of the routines remaining to be executed. When the interpreter has finished `fib_bad(29)`, it pops the stack and executes `fib_bad(28)`. The problem is that the interpreter does not keep track of the work it already did to find the 29th term, and therefore performs all of the same operations again. To make matters worse, this happens for every term in the sequence (think about how many times `fib_bad(5)` is recalculated). Worse still, there is memory overhead associated with pushing a function call onto the stack. If the call stack grows too large, the program may run out of memory, causing it to crash. This phenomenon is called a stack overflow. For this reason, many languages put a cap on the maximum size of the call stack, and if this threshold is exceeded, the program is terminated.

To analyze this, we use the same approach as before and recursively define the run time \(T(n) = T(n-1) + T(n-2) + 1\) (the last term accounts for the addition). Ironically, it takes more operations to find the \(n\)th term of the sequence this way than the magnitude of the \(n\)th term itself, and the Fibonacci sequence already grows exponentially.
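This recurrence can be checked directly against the profiler output above. The number of calls satisfies \(C(n) = C(n-1) + C(n-2) + 1\) with \(C(0) = C(1) = 1\); a small sketch (the name `call_count` is ours):

```
def call_count(n):
    # C(n) = C(n-1) + C(n-2) + 1 with C(0) = C(1) = 1:
    # one call for each of the two recursive invocations,
    # plus the call to fib_bad(n) itself.
    counts = [1, 1]
    for i in range(2, n + 1):
        counts.append(counts[i - 1] + counts[i - 2] + 1)
    return counts[n]
```

Evaluating `call_count(30)` gives 2,692,537, matching the call count reported by `cProfile` above.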

The way around this is to write the program to reuse the same work done previously. This usually involves using a looping construct, or higher order functions in functional programming languages. Doing this in Python is relatively straightforward.

```
def fib_loop(n):
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

It is straightforward to verify its correctness, and it consists only of a loop iterating \(n\) times. Using `fib_loop`, we can calculate the 250,000th term of the sequence in the time it took `fib_bad` to calculate the 30th. This does not mean that explicit recursion is always a poor choice, but rather that there are better implementations in some instances.

## Merge sort

Merge sort is a divide and conquer algorithm for sorting a list. The following is a Haskell implementation of merge sort.

```
mergeSort [] = []
mergeSort [x] = [x]
mergeSort xs = merge (mergeSort ls) (mergeSort rs)
  where
    n = length xs
    ls = take (n `div` 2) xs
    rs = drop (n `div` 2) xs
    merge [] rs = rs
    merge ls [] = ls
    merge (l:ls) (r:rs)
      | l < r     = l : merge ls (r:rs)
      | otherwise = r : merge (l:ls) rs
```

Like any good comparison-based sorting algorithm, merge sort takes \(O(n \log n)\) time to sort \(n\) elements. However, what makes merge sort especially useful is its ability to sort enormous lists that don’t fit in RAM.
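For comparison with the Haskell version, here is the same algorithm sketched in Python (the names `merge_sort` and `merge` are ours):

```
def merge_sort(xs):
    # Base case: lists of length 0 or 1 are already sorted.
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    # Recursively sort each half, then merge the sorted halves.
    return merge(merge_sort(xs[:mid]), merge_sort(xs[mid:]))

def merge(ls, rs):
    out = []
    i = j = 0
    while i < len(ls) and j < len(rs):
        if ls[i] < rs[j]:
            out.append(ls[i])
            i += 1
        else:
            out.append(rs[j])
            j += 1
    # One of the two halves may still have elements left over.
    out.extend(ls[i:])
    out.extend(rs[j:])
    return out
```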

### Sorting large lists

Suppose we want to sort a list that cannot fit inside of RAM — for example, we might want to sort a 100 gigabyte file on our hard drive. Merge sort can handle this situation quite naturally. We choose a block size \(B\) of memory that can comfortably fit inside RAM, and process the file in chunks of size \(B\). We sort each \(B\)-sized chunk and write it to another file. Once we have sorted all \(B\)-sized chunks, we then begin merging the files into a sorted file. Pseudo-code is below.

```
LARGE-MERGE-SORT(infile, outfile)
    // Sorting phase
    let fcount = 0
    while infile still has contents
        let x = the next B items from infile
        let y = SORT(x)
        write y to file fcount
        let fcount = fcount + 1
    // Merging phase
    for i = 0 to fcount - 1
        let f(i) be a file handle to file i
    K-WAY-MERGE([f(0), f(1), ..., f(fcount - 1)], outfile)

K-WAY-MERGE(files, outfile)
    while length files > 0
        let i = arg min { peek next element of f | for all f in files }
        let x = read next element of files[i]
        if files[i] is exhausted
            remove files[i] from files
        write x to outfile
```

Here we’ve relied on some in-memory sorting algorithm `SORT`. This can be any sorting algorithm you like, such as merge sort or quicksort.
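As a rough illustration, the pseudo-code can be sketched in Python for newline-terminated text records; `heapq.merge` performs the lazy k-way merge, holding only one record per run in memory. The name `large_merge_sort`, the use of temporary files, and the line-oriented record format are all our assumptions:

```
import heapq
import itertools
import tempfile

def large_merge_sort(infile, outfile, block_size):
    # Sorting phase: read block_size records at a time, sort each
    # block in memory, and spill it to its own temporary run file.
    runs = []
    while True:
        block = list(itertools.islice(infile, block_size))
        if not block:
            break
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(sorted(block))
        run.seek(0)
        runs.append(run)
    # Merging phase: heapq.merge lazily k-way merges the sorted
    # runs, so memory use stays proportional to the number of runs.
    outfile.writelines(heapq.merge(*runs))
    for run in runs:
        run.close()
```

Records here compare lexicographically as raw lines; a real implementation would parse out a sort key instead.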

The correctness of this algorithm is readily apparent, since its structure exactly follows merge sort. Note, however, that we never use more than \(O(B)\) memory at any given point, since the merge phase only needs to peek at one element per file.

### Parallelizing the sort

The `LARGE-MERGE-SORT` algorithm described above has natural variants which can easily be parallelized. First, we can simply rely on an implementation of `SORT` which is itself parallel. However, we can do better. Disk I/O can be quite slow, so if we choose \(B\) such that two \(B\)-sized blocks fit in main memory, we can call `SORT` on one block while reading the second block into memory. Frameworks such as Hadoop take this a step further, distributing large chunks of the file to different machines to be sorted in parallel, and then handling the merge back into a single file.
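An in-memory sketch of the first idea, sorting the chunks in separate processes and then merging the results (the name `parallel_sort` is ours; on platforms where Python spawns worker processes, the call must be guarded by `if __name__ == "__main__"`):

```
from concurrent.futures import ProcessPoolExecutor
import heapq

def parallel_sort(chunks):
    # Parallel SORT phase: each chunk is sorted in its own process.
    with ProcessPoolExecutor() as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # Merge phase: lazily k-way merge the sorted chunks.
    return list(heapq.merge(*sorted_chunks))
```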