# Week 3: Asymptotic analysis & data structures

## Part I: Asymptotic analysis

When writing computer programs, a standard way to measure their computational complexity is to count the number of operations they require as a function of the size of the input. The term 'asymptotic' means that we only care about their behavior as the size of the input grows to infinity (i.e. "for large enough" inputs) and disregard absolute constants.

Note: unless otherwise stated, we are interested in the worst-case complexity of an algorithm. In particular, there might be problem instances where our algorithm will terminate really quickly and some other instances where our algorithm will be really slow; we are interested in its performance in the "hard" instances.

### Elementary operations

In asymptotic analysis, elementary arithmetic or logical operations (e.g. adding / subtracting two numbers or comparing them to find the bigger one) are often considered "atomic", in the sense that each takes a constant amount of time to complete. This is not always true (e.g. for arbitrary-precision arithmetic), but we will take it for granted here.

### Example

A program that finds the maximum of an unsorted list of $$n$$ numbers maintains a running maximum and traverses the list, one element at a time, checking if the current element exceeds the running maximum. For each element, we perform a logical test (check if it is larger than the running maximum) and an assignment if the test succeeds. Therefore, the running time of this program is something like $$T(n) = cn$$, where $$c$$ is a small constant reflecting the constant amount of work we do for each item.
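This procedure can be sketched in Python as follows (the built-in max performs essentially the same scan):

```python
def find_max(items):
    """Return the maximum of a nonempty list with a single O(n) pass."""
    running_max = items[0]
    for x in items[1:]:         # one pass over the list
        if x > running_max:     # logical test for each element
            running_max = x     # assignment only if the test succeeds
    return running_max

find_max([3, 1, 4, 1, 5, 9, 2, 6])   # returns 9
```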

### Bachmann-Landau notation

Bachmann-Landau notation, also known as "Big-Oh" notation, abstracts away all constants in asymptotic analysis. Letting $$f(n)$$ and $$g(n)$$ denote functions of an integer $$n$$, we use the following notation:

• $$f(n) \in O(g(n))$$ if $$\limsup_{n \to \infty} \frac{f(n)}{g(n)} \leq c_1$$ for some constant $$c_1$$ which is independent of $$n$$ (we use $$\limsup$$ since the limit need not exist).
• $$f(n) \in \Omega(g(n))$$ if $$\liminf_{n \to \infty} \frac{f(n)}{g(n)} \geq c_2$$ for some constant $$c_2 > 0$$ which is independent of $$n$$.
• $$f(n) \in \Theta(g(n))$$ if both $$f(n) \in O(g(n))$$ and $$f(n) \in \Omega(g(n))$$ hold.

#### Examples: Upper bounds

We already saw that finding the maximum / minimum of an unsorted list takes time $$O(n)$$. Some other examples include:

• sorting an array with a recursive algorithm like mergesort requires time $$O(n \log n)$$
• multiplying an $$n \times n$$ matrix with a vector is an $$O(n^2)$$ operation

#### Examples: Lower bounds

It is also easy to argue that finding the maximum of a list must take time $$\Omega(n)$$. Indeed, any algorithm that looks at fewer than $$n$$ elements can "miss" the maximum. Combined with the $$O(n)$$ upper bound above, this shows that finding the maximum / minimum takes $$\Theta(n)$$ time. Other examples include:

• sorting (using any algorithm) must take $$\Omega(n)$$ time. Indeed, even if we are given an already sorted array, we still need to iterate over all the elements to verify that they are sorted.
• sorting using a comparison-based algorithm takes time $$\Omega(n \log n)$$ (this result is nontrivial to show).

With the help of limits, we can also define "strong" versions of $$O(\cdot)$$ and $$\Omega(\cdot)$$ as follows:

• $$f(n) \in o(g(n))$$ if $$\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0$$, and
• $$f(n) \in \omega(g(n))$$ if $$\lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty$$.

These statements imply their "weak" variants, but not vice-versa, as the next exercise shows.

#### Exercise

1. If $$f(n) = 2n^2 + 5n$$ and $$g(n) = 3n^2$$, which of the following are true?

• $$f(n) \in O(g(n))$$
• $$f(n) \in \Omega(g(n))$$
• $$f(n) \in \omega(g(n))$$
2. True or False: $$f(n) \in \omega(g(n)) \implies f(n) \in \Omega(g(n))$$

3. True or False: $$f(n) \in O(g(n)) \implies f(n) \in o(g(n))$$

4. Is $$2^n \in \Theta(e^n)$$? Why or why not?

#### Solution

1. The first two statements are true, and the third is false. The limit of the ratio of two polynomials depends only on their leading terms; here, $$\lim_{n \to \infty} \frac{f(n)}{g(n)} = \frac{2}{3}$$, which is finite and positive.

2. True: if $$\lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty$$, then it also satisfies $$\lim_{n \to \infty} \frac{f(n)}{g(n)} \geq c$$ for any $$c > 0$$.

3. False: the $$f(n)$$ and $$g(n)$$ from the first question are counterexamples to this.

4. False: Taking the limit gives $$\lim_{n\to\infty} \frac{f(n)}{g(n)} = \lim_{n \to \infty} \left( \frac{2}{e} \right)^n = 0$$, which means that $$f(n) \in o(g(n))$$. This rules out the claimed inclusion.

#### Tips and tricks

You may find the following facts useful in asymptotic analysis:

1. For any $$\varepsilon > 0$$, we have $$\log n = o(n^{\varepsilon})$$.

2. The Stirling formula is incredibly useful when working with factorials:

$n! = \sqrt{2 \pi n} \left( \frac{n}{e} \right)^n (1 + \Theta(1 / n)).$

For example, using this formula you can try to prove that $$\log (n!) \in \Theta(n \log n)$$.

3. Addition: in general, we have

$f_1(n) \in O(g_1(n)) \;\text{and}\; f_2(n) \in O(g_2(n)) \implies f_1(n) + f_2(n) \in O(\max\{ g_1(n), g_2(n) \}).$

On the other hand, we have

$f_1(n) \in \Omega(g_1(n)) \;\text{and}\; f_2(n) \in \Omega(g_2(n)) \implies f_1(n) + f_2(n) \in \Omega(\min\{ g_1(n), g_2(n) \}).$
4. Composition: if $$f(n) \in O(g(n))$$ and $$h(n) \in O(\kappa(n))$$, then for well-behaved $$g$$ (increasing, and growing at most polynomially, as is the case for polynomials and logarithms) we have

$(f \circ h)(n) \in O((g \circ \kappa)(n)), \quad (f \cdot h)(n) \in O((g \cdot \kappa)(n)).$

Here, the $$\circ$$ symbol denotes composition.
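As an illustration of the second tip, taking logarithms in Stirling's formula gives

$\log (n!) = \tfrac{1}{2} \log(2 \pi n) + n \log n - n \log e + \log\left(1 + \Theta(1/n)\right).$

The dominant term is $$n \log n$$, and the remaining terms are all $$O(n)$$, so $$\log (n!) \in \Theta(n \log n)$$.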

#### Why Landau notation?

Abstracting away constants is not always desirable, but asymptotic notation gives you a good idea of how well you should expect your algorithm to scale when working with big data. For example, the difference between $$n$$ and $$5n$$ operations when $$n = 10^6$$ is a few milliseconds on any modern laptop. But $$5n$$ vs. $$n^2$$ operations makes a huge difference: if the $$5n$$ algorithm takes one second, the $$n^2$$ algorithm at the same speed would take $$2 \times 10^5$$ seconds, i.e. more than two days!

## Part II: Data structures

Before we start analyzing some classical algorithmic paradigms, such as divide-and-conquer algorithms, it will be useful to build an index of data structures, the time complexity of operations on them, as well as their availability in Python.

Disclaimer: The implementations of some of these data structures may deviate from their theoretical specifications.

### Lists and arrays

A linked list can be viewed as a collection of tuples of the form (value, next) living in your computer's memory:

• value contains the actual content of each element, such as a number or a character
• next is a pointer (memory address) to the next element of the list

We always maintain a reference to the first node (the head) of the list, and traverse it by following the memory addresses indicated by the next fields. For that reason, accessing the k-th element of a list requires k operations. The complexity of some standard operations is given below:

• access: $$O(n)$$ (worst case when we are trying to access the last element in the list)
• insert element (at beginning of list): $$O(1)$$
• append element (at the end of a list): $$O(n)$$
• delete element: $$O(n)$$
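A minimal sketch of such a linked list in Python (the class and method names here are illustrative):

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next            # pointer to the next node (None at the tail)

class LinkedList:
    def __init__(self):
        self.head = None            # reference to the first node

    def insert_front(self, value):  # O(1): only the head pointer changes
        self.head = Node(value, self.head)

    def access(self, k):            # O(k): follow k `next` pointers from the head
        node = self.head
        for _ in range(k):
            node = node.next
        return node.value

lst = LinkedList()
for v in [3, 2, 1]:
    lst.insert_front(v)
lst.access(0)   # returns 1 (the most recently inserted element)
```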

#### Arrays

In contrast to lists, arrays are contiguous parts of your computer's working memory, whose start and end addresses are known. For that reason, accessing an element at a given index takes constant time. In contrast, simple operations like inserting or removing an element can take $$O(n)$$, because resizing an array might require you to re-allocate it starting at a different location in memory, copying the previous elements in the process.

The time complexity of standard operations is given below:

• access: $$O(1)$$
• insert / append / delete element: $$O(n)$$ (might require re-allocation)

Note: the Python list is implemented as a resizeable array behind the scenes:

```python
items = list(range(1000))
items[500]                  # O(1) access
items.append(10)            # O(n) in the worst case (re-allocation), amortized O(1)
items.insert(5, -1)         # O(n): shifts all subsequent elements
```


### Stacks and queues

Stacks and queues are abstract data types implementing two different access principles.

#### Stacks

A stack implements the LIFO (last in, first out) access principle. This means that the element that was most recently "pushed" to the stack will be the first one to be retrieved when "popping" the top of the stack:

• push(): $$O(1)$$ (put element on top)
• pop(): $$O(1)$$ (remove element from top)

In Python, you can implement a stack using the built-in list. This gives you amortized $$O(1)$$ complexity for the push and pop operations. An alternative is using the interface of collections.deque. Some examples follow below.

```python
stack = []          # empty stack
stack.append(10)    # use append() to push elements - end of list is "top" of stack
stack.append(20)
stack.append(30)
stack.pop()         # returns 30

# alternative way
from collections import deque

stack = deque([5, 10, 15, 20, 25])
stack.pop()         # returns 25
stack.append(5)     # use append() to push
stack.pop()         # returns 5
```


#### Queues

Queues implement the FIFO (first in, first out) access principle. The element first inserted (or "enqueued") becomes the first element to be retrieved (or "dequeued"):

• enqueue(): $$O(1)$$ (add element to end of queue)
• dequeue(): $$O(1)$$ (retrieve element from beginning of queue)

In Python, you can also implement queues using a list, but it will be fairly slow since inserting an element at the beginning of a list requires you to shift the positions of all the other elements. Here, collections.deque comes to the rescue, with guaranteed $$O(1)$$ operations:

```python
from collections import deque

dq = deque([3, 2, 4, 5, 6])
dq.popleft()        # returns 3 (dequeue from the front)
dq.append(7)        # enqueue 7 at the back
```


### Key-value data structures

Key-value data structures are more involved than anything we've seen so far. These structures power Python constructs like dict and set.

Idea: in general, we use key-value data structures when:

• we have a collection of objects of the form $$(k, v)$$, where $$v$$ is a value and $$k$$ is a "key" that indexes the object
• keys $$k$$ can be totally ordered
• we want a data structure that allows at least one of the following:
• efficient access to max/min (ideally in $$O(1)$$)
• efficient insertions / deletions / updates to elements
• efficient membership queries

For example, we may want to write a task scheduler for our favorite operating system. This requires a data structure that allows us to quickly obtain the highest priority task to execute next, and also quickly add new tasks to the scheduler.
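As a sketch of this use case, Python's heapq module (introduced below) can serve as a tiny priority scheduler; the task names and priorities here are made up:

```python
import heapq

tasks = []                                      # min-heap keyed by priority
heapq.heappush(tasks, (2, "flush logs"))        # (priority, task); lower = more urgent
heapq.heappush(tasks, (0, "handle interrupt"))
heapq.heappush(tasks, (1, "run user job"))

next_task = heapq.heappop(tasks)    # (0, "handle interrupt"): the most urgent task
```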

Key-value data structure implementations come in different flavors, some of which we present below.

#### Heaps

A heap is a binary-tree like data structure. Conceptually, every pair $$(k, v)$$ becomes a node in a tree that satisfies the following invariant:

> the key $$k$$ of a node is less than or equal to (min-heap) the keys of its left and right children

Standard heap implementations result in complete binary trees, where the depths of any two "leaf" nodes differ by at most $$1$$. This way, the height of the tree is on the order of $$O(\log n)$$ (since $$n \approx 2^h$$, where $$h$$ is the tree height). Somewhat surprisingly, we can implement a binary heap using an array, in-place!

The time complexity of standard heap operations follows below:

• peek() (look at minimal element without removing it): $$O(1)$$
• pop() (remove minimal element): $$O(\log n)$$
• push() (add element to heap): $$O(\log n)$$

In Python, there is the heapq module (that implements a min-heap by default). It offers a functional interface that operates on a raw Python list:

```python
import heapq

q = []
heapq.heappush(q, (2, "sleep"))
heapq.heappush(q, (1, "eat"))
heapq.heappush(q, (3, "study"))
heapq.heappop(q)                    # returns (1, "eat")

items = [(2, "sleep"), (1, "eat"), (4, "study"), (3, "drink")]
heapq.heapify(items)                # heapify() turns a list into a heap in place, in O(n) time!
```


#### Hash tables

Focus: efficient membership lookups and updates

Hash tables are another very useful data structure, which typically leads to very fast operations in practice. Given inputs from some universe $$\mathcal{X}$$, the idea is to "map" that universe to another, much smaller universe, which is typically a range of integers $$\{0, \dots, K - 1 \}$$. Typically, we choose $$K$$ to be some power of $$2$$.

This map between universes is what we call a "hash function", denoted $$\texttt{hash}: \mathcal{X} \to \{0, \dots, K - 1\}$$. It should satisfy the following:

• if $$x = y$$, we should have $$\texttt{hash}(x) = \texttt{hash}(y)$$
• if $$x \neq y$$, the probability that $$\texttt{hash}(x) = \texttt{hash}(y)$$ should be small

Note that it is impossible to guarantee that different inputs will produce different hashes. Indeed, if that were true, we would not be mapping our inputs to a smaller universe.
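A toy illustration with Python's built-in hash() (the table size $$K = 8$$ is an arbitrary choice; for small integers, CPython's hash(x) equals x, which makes the example deterministic):

```python
K = 8                       # table size: a power of two, as is typical

def bucket(x):
    return hash(x) % K      # map the universe into {0, ..., K - 1}

bucket(3)     # 3
bucket(11)    # 3 -> collision with 3: different inputs, same bucket
bucket(4)     # 4
```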

##### How do hash tables work?

A basic hash table initially allocates an array with $$K$$ cells, say $$A[0, \dots, K - 1]$$, where each cell is initially an empty linked list. Then, depending on the operation, we do the following:

1. lookup(x): to check if $$x$$ is already in the hash table, we compute $$i := \texttt{hash}(x)$$. Then:

• if $$A[i]$$ is empty, we know that $$x$$ cannot be a member of the table.
• if $$A[i]$$ is nonempty, we search for $$x$$ exhaustively over that list.
2. insert(x): to insert $$x$$, we compute $$i := \texttt{hash}(x)$$ and check if it is already a member of the hash table. If it is, we do nothing. If it is not already a member, we add it to the list indexed by $$A[i]$$.

3. delete(x): similar to the above case, but we delete from the list at $$A[i]$$ instead, if needed.
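The three operations above can be sketched with a toy chained hash table (illustrative only; real implementations, such as CPython's dict, use more sophisticated schemes like open addressing and dynamic resizing):

```python
class ChainedHashTable:
    def __init__(self, K=8):
        self.K = K
        self.A = [[] for _ in range(K)]     # K cells, each an (initially empty) list

    def lookup(self, x):
        i = hash(x) % self.K                # i := hash(x)
        return x in self.A[i]               # exhaustive search over the list at A[i]

    def insert(self, x):
        i = hash(x) % self.K
        if x not in self.A[i]:              # do nothing if already a member
            self.A[i].append(x)

    def delete(self, x):
        i = hash(x) % self.K
        if x in self.A[i]:                  # delete from the list at A[i], if needed
            self.A[i].remove(x)

t = ChainedHashTable()
t.insert(42)
t.lookup(42)    # True
t.delete(42)
t.lookup(42)    # False
```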

When implementing a hash table, there is an inherent trade-off in the size of the table: if we make the table big, the probability of collision drops and the lists at each cell contain few elements on average, so we can even afford a mediocre hash function; however, we may end up with many unused cells. A smaller table wastes less memory, but requires a good hash function to keep the lists short.

The following table summarizes the worst and average case complexity of operations on hash tables for standard implementations using a good hash function (i.e. with small collision probability):

| Operation | Worst case | Average case |
| --- | --- | --- |
| Lookup | $$O(n)$$ | $$O(1)$$ |
| Insert | $$O(n)$$ | $$O(1)$$ |
| Delete | $$O(n)$$ | $$O(1)$$ |

In practice: much closer to $$O(1)$$ than $$O(n)$$.

Python: Both dict and set are backed by a hash table!

```python
# dict
student_num = {
    "ORIE-5270": 55,
    "ORIE-6125": 10,
    "ORIE-3300": 150,
}
student_num["ORIE-5270"] = 60           # assignment
student_num["ORIE-3310"] = 125          # add new element to table
student_num.get("ORIE-3500", 0)         # use .get() to specify a default return value of 0

# set
people = {"vasilis", "mateo", "qiaomin"}
if "mateo" in people:
    pass                                # do something
elif "vasilis" in people:
    pass                                # do something else

# use set() on an iterable to build a set
set(student_num)                        # {"ORIE-5270", ..., "ORIE-3310"}
set(("vasilis", "mateo", "qiaomin"))    # {"vasilis", "mateo", "qiaomin"}
```


A typical use case is having built a custom class and wanting to store its instances in a hash table. In that case, you need to implement the `__hash__()` built-in method (and, for consistent lookups, `__eq__()`) for that class. If you are interested in this, this StackOverflow thread may be of help.
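A minimal sketch (the Course class and its fields are made up for illustration): hash on the same fields that `__eq__()` compares, so that equal objects land in the same bucket:

```python
class Course:
    def __init__(self, code, title):
        self.code = code
        self.title = title

    def __eq__(self, other):
        return isinstance(other, Course) and self.code == other.code

    def __hash__(self):
        return hash(self.code)      # must agree with __eq__: equal objects, equal hashes

enrollment = {Course("ORIE-5270", "Big Data"): 55}
enrollment[Course("ORIE-5270", "Big Data")]     # returns 55
```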

## Asymptotic analysis of a few standard algorithms

We now present a few examples of runtime analysis.

### Example 1: Partial sums

Consider the piece of Python code below. Here, arr is assumed to be an $$n$$-element array.

```python
run_sum = [0] * len(arr)
for i in range(len(arr)):
    for j in range(i, len(arr)):
        run_sum[i] += arr[j]
```


The above code is a (suboptimal) way of computing the partial sums of the form $$A[i, \dots, n]$$ for each index $$i$$. It consists of one "outer" loop ranging from $$0$$ to $$n - 1$$; inside each outer loop, an "inner" loop executes, ranging from $$i$$ to $$n - 1$$, where $$i$$ is the index of the outer loop. Finally, each step of the inner loop takes constant time, since it only executes an arithmetic operation. Therefore, the total runtime is given by

$T(n) = c n + c (n - 1) + \dots + c = \Theta(n^2).$

This is a straightforward example where the runtime analysis reduces to counting the number of iterations for each loop. However, several algorithms are more involved, as we will see below.

#### Exercise

The above algorithm for computing partial sums is highly suboptimal, since it does not reuse any information about previous partial sums. Write an $$O(n)$$ algorithm for computing these partial sums.

#### Solution
```python
p_sum = [0] * len(arr)
p_sum[-1] = arr[-1]
for i in range(len(arr) - 2, -1, -1):   # sweep right to left
    p_sum[i] = p_sum[i + 1] + arr[i]    # reuse the previously computed partial sum
```

Allocating p_sum takes $$O(n)$$ time. Moreover, we only have a single loop, each iteration of which does constant work, for a total of $$O(n)$$ time.

### Example 2: Binary search

Binary search is a search algorithm operating on an already sorted array $$A[1, \dots, n]$$. When we look for an element $$x$$ in $$A$$ using binary search, the desired behavior is:

• if $$x \in A$$, we seek an index $$i$$ such that $$A[i] = x$$.
• if $$x \notin A$$, we seek the largest index $$i$$ such that $$A[i] \leq x$$.

Given the above desiderata, the idea is simple but powerful: look at the middle index of $$A$$, denoted $$\texttt{mid}$$. Then:

• if $$A[\texttt{mid}] < x$$, since $$A$$ is sorted, we know that the index we are looking for must be bigger than $$\texttt{mid}$$. Thus we can restrict our search to $$A[\texttt{mid}+1, \dots, n]$$
• if $$A[\texttt{mid}] > x$$, by the same argument we can restrict our search to $$A[1, \dots, \texttt{mid}-1]$$.
• if $$A[\texttt{mid}] = x$$, we can simply return the index $$\texttt{mid}$$.

When we restrict our search to the upper or lower halves of $$A$$, we recursively apply the same procedure. We terminate whenever we arrive at a sub-array with at most $$2$$ elements, say $$A[k, k+1]$$. In this case, we know that $$A[k-1] < x$$ and can check manually if $$A[k]$$ or $$A[k+1]$$ are equal to $$x$$.
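An iterative sketch of this procedure (using 0-based indexing; when $$x$$ is smaller than every element, it returns $$-1$$):

```python
def binary_search(A, x):
    """Return an index i with A[i] == x if x is in A;
    otherwise the largest i with A[i] <= x (or -1 if no such i exists)."""
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if A[mid] == x:
            return mid
        elif A[mid] < x:        # answer lies in A[mid + 1 .. hi]
            lo = mid + 1
        else:                   # answer lies in A[lo .. mid - 1]
            hi = mid - 1
    return hi                   # largest index with A[hi] <= x (-1 if none)

A = [1, 4, 9, 15]
binary_search(A, 9)     # returns 2
binary_search(A, 5)     # returns 1 (A[1] = 4 is the largest element <= 5)
```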

The runtime analysis of binary search is more involved, but not that hard. Note that at each stage, we perform one logical operation (compare $$A[\texttt{mid}]$$ to $$x$$). Then we solve the same problem on a sub-array of at most $$\frac{n}{2}$$ elements. This leads to the following recurrence ($$T(n)$$ is the runtime of binary search over $$n$$ elements):

$T(n) = T(n / 2) + \Theta(1).$

In particular, the constant "hidden" by $$\Theta(1)$$ does not change across various stages. Therefore, we can "unfold" this recurrence to obtain

$T(n) = T(n / 4) + 2 \Theta(1) = \dots = T(n / 2^h) + h \Theta(1).$

Since we stop whenever $$\frac{n}{2^h} \leq 2$$, solving for $$h$$ we conclude that we will solve at most $$h := \log_2 (n)$$ subproblems, leading to a total runtime of $$T(n) = O(\log_2(n))$$.

#### Note: Divide and conquer algorithms

Algorithms like binary search follow the paradigm of divide-and-conquer, a powerful technique for algorithm design. Most of these algorithms are recursive, breaking a big problem into subproblems until the subproblems are simple enough to solve, and (if necessary) combining their solutions efficiently. We will examine the technique in more detail in future lectures.

Analyzing the runtime of divide-and-conquer algorithms often reduces to analyzing a recurrence, such as the one shown above. Some recurrences are simple enough to analyze by hand, but others may be trickier. In particular, most recurrences look like the following:

$T(n) = \alpha T(n / \beta) + f(n),$

where $$\alpha$$ indicates the number of subproblems, $$\beta$$ is a factor by which the size of each subproblem is reduced, and $$f(n)$$ indicates the complexity of the current stage. For example, $$f(n) = \Theta(1)$$ for the case of binary search, since we only do constant work before solving the subproblem on the lower or upper half of the array.

Solving recurrences such as the above can be done with the help of the Master Theorem, which defines a set of cases for generic $$\alpha$$ and $$\beta$$ that lead to different solutions. You can find a worksheet with solved recurrences here.
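For reference, here is a commonly used simplified form of the Master Theorem, for the case $$f(n) = \Theta(n^d)$$:

$T(n) = \alpha T(n / \beta) + \Theta(n^d) \implies T(n) = \begin{cases} \Theta(n^d), & d > \log_{\beta} \alpha, \\ \Theta(n^d \log n), & d = \log_{\beta} \alpha, \\ \Theta(n^{\log_{\beta} \alpha}), & d < \log_{\beta} \alpha. \end{cases}$

For binary search, $$\alpha = 1, \beta = 2, d = 0$$, so $$d = \log_{\beta} \alpha$$ and we recover $$\Theta(\log n)$$; for mergesort (below), $$\alpha = \beta = 2, d = 1$$, giving $$\Theta(n \log n)$$.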

### Example 3: Merging two sorted lists

Our third example is a task used as a subroutine in the mergesort algorithm. The setup is simple:

• we are given two sorted arrays $$A$$ and $$B$$ with $$n$$ and $$m$$ elements, respectively
• we wish to produce an array $$C$$ of size $$n + m$$ that contains all elements of $$A$$ and $$B$$, and is itself sorted.

Here is an example input/output pair:

$A = [1, 4, 9, 15], \; B = [2, 3, 8, 13] \Rightarrow C = [1, 2, 3, 4, 8, 9, 13, 15].$

Can we do this efficiently? The best runtime we can hope for is obviously $$O(n + m)$$, as even allocating $$C$$ will take time proportional to that. The answer is yes! The standard algorithm for merging $$A$$ and $$B$$ utilizes the fact that they are both sorted, and works as follows:

• maintain two indices $$i$$ and $$j$$, pointing to elements of $$A$$ and $$B$$, both initialized at $$0$$
• as long as their sum is less than $$m + n$$, do the following:
• if $$B$$ is exhausted, or if $$A$$ is not exhausted and $$A[i] < B[j]$$, then $$A[i]$$ should be the next element of $$C$$, so we set $$C[i + j] = A[i]$$ and increment $$i$$.
• otherwise, we set $$C[i + j] = B[j]$$ and increment $$j$$.

The time complexity of this algorithm is $$\Theta(n + m)$$, since we increment one of $$i$$ and $$j$$ at each step and stop as soon as $$i + j = n + m$$. The algorithm in Python follows below:

```python
def merge(A, B):
    n, m = len(A), len(B)
    C = [0] * (n + m)
    i = j = 0
    while i + j < n + m:
        # take from A if B is exhausted, or if both have elements left and A's is smaller
        if j == m or (i < n and A[i] < B[j]):
            C[i + j] = A[i]
            i += 1
        else:
            C[i + j] = B[j]
            j += 1
    return C
```


### Example 4: Mergesort

With the merge() function at hand, we can now describe one of the most important sorting algorithms: mergesort. Given an array $$A$$ that we want to sort, the idea is simple but powerful: suppose that we had already sorted the subarray consisting of the first $$n/2$$ elements of $$A$$, as well as the subarray containing the last $$n/2$$ elements. Then, we can merge two sorted subarrays in time $$\Theta(n/2 + n/2) = \Theta(n)$$ using the merge() function. Indeed, this is the whole algorithm:

```python
def mergesort(A):
    if len(A) <= 1:
        return A
    mid = len(A) // 2           # integer division: len(A) / 2 is a float in Python 3
    lower_A = mergesort(A[:mid])
    upper_A = mergesort(A[mid:])
    return merge(lower_A, upper_A)
```


Note that the base case, which happens when the subarray contains $$0$$ or $$1$$ elements, trivially returns the subarray itself. All the work essentially happens in merge().

Complexity: to analyze the running time, let us assume that $$n$$ is a power of $$2$$ for simplicity. This means that our subproblems at each stage have size exactly $$n/2$$. Thus we have the following recurrence:

$T(n) = 2 T(n / 2) + c n,$

for a small constant $$c > 0$$. This constant $$c$$ does not change in smaller subproblems, since we call the same merge function in every subproblem, only with arrays of different size. Unfolding this recursion (or solving using the Master Theorem) yields a runtime of $$O(n \log n)$$, which is in fact tight for any comparison-based algorithm. Therefore, the running time of mergesort is $$\Theta(n \log n)$$.