Binary trees

A binary tree is a tree structure where each node has at most two children. Some data is stored at each node. Nodes with children are called interior nodes while nodes without children are called leaf nodes.

Conceptually we will represent a node as a data structure with the following fields.

class Node:
    def __init__(self, value, left, right, parent):
        self.value = value
        self.left = left
        self.right = right
        self.parent = parent

The value field stores the node’s payload, while the left and right fields store the left- and right-child nodes, respectively. The parent pointer references the parent of this node, if any. We store None to represent the absence of a child or parent.

../_images/unbalanced_tree.png

An unbalanced binary tree.

We refer to the depth of a node as the depth of its parent plus one; the root node (which has no parents) has depth 0.

../_images/bst.png

An balanced binary search tree.

Binary trees have several flavors. A balanced binary tree is one in which no leaf nodes are ‘too far’ from the root. For example, one definition of balanced could require that all leaf nodes have a depth that differ by at most 1. An unbalanced binary tree is one that is not balanced. A complete binary tree has all levels completely filled, except possibly the last. If a complete tree has maximum depth \(n\), then it has at least \(2^n\) and at most \(2^{n + 1} - 1\) nodes. A complete tree with exactly \(2^{n + 1} - 1\) nodes is called perfect.

Binary search trees

Binary search trees maintain the following invariant:

  1. If a node stores a value \(x\), then \(x\) compares greater than all nodes in its left sub-tree and compares less than all nodes in its right sub-tree.

Binary search trees are useful in representing, for example, sets.

A balanced binary search tree is additionally balanced.

The definition of balanced is implementation-dependent. In red black trees, the depth of any leaf node is no more than twice the depth of any other leaf node. In AVL trees, the depth of leaf nodes differ by at most one.

There isn’t much of a reason to use an unbalanced binary search tree over a balanced binary search tree except for ease of implementation.

Implementing binary trees

There is a fairly obvious implementation of binary trees using data structures and pointers. Namely, there is a root node which contains pointers to the left- and right sub-trees, which recursively maintain pointers to their children. However, there is another implementation which uses arrays. Specifically, we store the root at array index 0. Recursively, the left child (if any) and the right child (if any) of the node at index \(i\) are stored at \(2 i + 1\) and \(2 i + 2\), respectively. The parent of a node at index \(i\) (if any) is stored at \(\lfloor (i - 1) / 2 \rfloor\).

The advantage of the array implementation is speed; specifically, it is very cache efficient to iterate over the array. However, growing the tree can be expensive since adding a node at depth \(n\) requires that the array be of length at least \(2^{n + 1}\).

Operations

We can insert nodes, delete nodes, and find the node containing a value. The complexity of these operations depends on the type of binary tree we are using.

  insert find delete Space
Binary tree (pointers) \(O(1)\) \(O(n)\) \(O(1)\) \(O(n)\)
Binary tree (array) \(O(1)\) \(O(n)\) \(O(\log n)\) \(O(n)\)
Binary search tree (pointers) \(O(1)\) \(O(\text{max depth})\) \(O(1)\) \(O(n)\)
Binary search tree (array) \(O(2^{\text{max depth}})\) \(O(\text{max depth})\) \(O(\log n)\) \(O(2^{\text{max depth}})\)
Balanced binary search tree \(O(\log n)\) \(O(\log n)\) \(O(\log n)\) \(O(n)\)