Binary trees¶
A binary tree is a tree structure where each node has at most two children. Some data is stored at each node. Nodes with children are called interior nodes while nodes without children are called leaf nodes.
Conceptually we will represent a node as a data structure with the following fields.
class Node:
    def __init__(self, value, left, right, parent):
        self.value = value
        self.left = left
        self.right = right
        self.parent = parent
The value field stores the node's payload, while the left and right fields store the left- and right-child nodes, respectively. The parent pointer references the parent of this node, if any. We store None to represent the absence of a child or parent.
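As a minimal sketch, here is how a three-node tree could be wired up with this class (the class is repeated so the snippet is self-contained):

```python
class Node:
    def __init__(self, value, left, right, parent):
        self.value = value
        self.left = left
        self.right = right
        self.parent = parent

# Build the tree
#       2
#      / \
#     1   3
# Children are created first, then parent pointers are filled in.
left = Node(1, None, None, None)
right = Node(3, None, None, None)
root = Node(2, left, right, None)
left.parent = root
right.parent = root
```

Note that `None` marks both missing children (in the leaves) and the missing parent of the root.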
We define the depth of a node as the depth of its parent plus one; the root node (which has no parent) has depth 0.
Binary trees have several flavors. A balanced binary tree is one in which no leaf node is 'too far' from the root. For example, one definition of balanced could require that all leaf nodes have depths that differ by at most 1. An unbalanced binary tree is one that is not balanced. A complete binary tree has all levels completely filled, except possibly the last. If a complete tree has maximum depth \(n\), then it has at least \(2^n\) and at most \(2^{n + 1} - 1\) nodes. A complete tree with exactly \(2^{n + 1} - 1\) nodes is called perfect.
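The definitions above can be checked directly with short recursive helpers. This is an illustrative sketch; `TreeNode` is a hypothetical stand-in for the Node class from earlier, with parent pointers omitted for brevity:

```python
class TreeNode:
    # Minimal stand-in node; None marks a missing child.
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def max_depth(node):
    # Depth of the deepest node; the root has depth 0,
    # so an empty tree is assigned depth -1.
    if node is None:
        return -1
    return 1 + max(max_depth(node.left), max_depth(node.right))

def count_nodes(node):
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)

def is_perfect(root):
    # A tree of max depth n is perfect iff it has exactly 2**(n + 1) - 1 nodes.
    return count_nodes(root) == 2 ** (max_depth(root) + 1) - 1
```

For example, the three-node tree with root 2 and leaves 1 and 3 has max depth 1 and \(2^2 - 1 = 3\) nodes, so it is perfect.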
Binary search trees¶
Binary search trees maintain the following invariant:
- If a node stores a value \(x\), then \(x\) compares greater than all values stored in its left sub-tree and compares less than all values stored in its right sub-tree.
Binary search trees are useful in representing, for example, sets.
A balanced binary search tree is additionally balanced. The exact definition of balanced is implementation-dependent. In red-black trees, no leaf node is more than twice as deep as any other leaf node. In AVL trees, the heights of the left and right sub-trees of every node differ by at most one.
There isn’t much of a reason to use an unbalanced binary search tree over a balanced binary search tree except for ease of implementation.
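The invariant is what makes searching efficient: at each node we can discard one entire sub-tree. A minimal sketch of find and insert under that invariant (using a hypothetical `TreeNode` with `value`, `left`, and `right` fields; parent pointers and rebalancing are omitted):

```python
class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def find(node, target):
    # Walk down the tree, discarding one sub-tree per step;
    # the cost is O(depth of the node found, or of the tree).
    while node is not None:
        if target == node.value:
            return node
        node = node.left if target < node.value else node.right
    return None

def insert(node, value):
    # Recursive insert preserving the invariant; returns the
    # (possibly new) root of this sub-tree.
    if node is None:
        return TreeNode(value)
    if value < node.value:
        node.left = insert(node.left, value)
    elif value > node.value:
        node.right = insert(node.right, value)
    # Duplicates are ignored, matching the set interpretation above.
    return node
```

Without rebalancing, inserting already-sorted values degenerates the tree into a linked list, which is why balanced variants exist.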
Implementing binary trees¶
There is a fairly obvious implementation of binary trees using data structures and pointers. Namely, there is a root node which contains pointers to the left- and right sub-trees, which recursively maintain pointers to their children. However, there is another implementation which uses arrays. Specifically, we store the root at array index 0. Recursively, the left child (if any) and the right child (if any) of the node at index \(i\) are stored at \(2 i + 1\) and \(2 i + 2\), respectively. The parent of a node at index \(i\) (if any) is stored at \(\lfloor (i - 1) / 2 \rfloor\).
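The index arithmetic above is short enough to state directly. A sketch, with the root at index 0:

```python
def left_child(i):
    # Index of the left child of the node at index i.
    return 2 * i + 1

def right_child(i):
    # Index of the right child of the node at index i.
    return 2 * i + 2

def parent(i):
    # Index of the parent; not meaningful for the root (i == 0).
    return (i - 1) // 2
```

For example, the perfect tree with root 2 and leaves 1 and 3 is stored as the array `[2, 1, 3]`: the root sits at index 0, its left child at index 1, and its right child at index 2.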
The advantage of the array implementation is speed; specifically, it is very cache-efficient to iterate over the array. However, growing the tree can be expensive, since adding a node at depth \(n\) requires that the array have length at least \(2^{n + 1} - 1\) (the right-most possible node at depth \(n\) sits at index \(2^{n + 1} - 2\)).
Operations¶
We can insert nodes, delete nodes, and find the node containing a value. The complexity of these operations depends on the type of binary tree we are using.
| | insert | find | delete | Space |
| --- | --- | --- | --- | --- |
| Binary tree (pointers) | \(O(1)\) | \(O(n)\) | \(O(1)\) | \(O(n)\) |
| Binary tree (array) | \(O(1)\) | \(O(n)\) | \(O(\log n)\) | \(O(n)\) |
| Binary search tree (pointers) | \(O(1)\) | \(O(\text{max depth})\) | \(O(1)\) | \(O(n)\) |
| Binary search tree (array) | \(O(2^{\text{max depth}})\) | \(O(\text{max depth})\) | \(O(\log n)\) | \(O(2^{\text{max depth}})\) |
| Balanced binary search tree | \(O(\log n)\) | \(O(\log n)\) | \(O(\log n)\) | \(O(n)\) |