# Sets and Associative Arrays

Data structures can act as containers, keeping track of many elements at once. Performing actions efficiently on any element within the data structure requires a careful implementation. Below, we discuss two prevalent examples of such data structures: sets and associative arrays.

## Sets

Sets are data structures that implement operations from their mathematical definition: they contain unique elements in no particular order. The following are routines common to set implementations.

- `insert(a)`: Inserts element `a` into the set.
- `has(a)`: Returns `True` if the set contains `a`, and `False` otherwise.
- `remove(a)`: Removes element `a` from the set.
- `union(A,B)`: Returns a set that contains the elements in either set `A` or set `B`.
- `intersection(A,B)`: Returns a set that contains the elements in both set `A` and set `B`.
- `set_minus(A,B)`: Returns a set that contains the elements in set `A` that are not in set `B`.
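For concreteness, these routines map onto Python's built-in `set` type, though the names differ slightly: membership testing uses the `in` operator, and `|`, `&`, and `-` stand in for `union`, `intersection`, and `set_minus`.

```python
# The set routines above, expressed with Python's built-in set type.
A = set()
A.add(1)           # insert(1)
A.add(2)           # insert(2)
print(1 in A)      # has(1): prints True
A.discard(2)       # remove(2); discard does not raise if 2 is absent

B = {2, 3}
print(A | B)       # union(A, B): {1, 2, 3}
print(A & B)       # intersection(A, B): empty set
print(A - B)       # set_minus(A, B): {1}
```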

### Binary search tree implementation

If there is an ordering among elements, then one of the best implementations for a set is a self-balancing binary search tree. This means that `insert`, `has`, and `remove` would all take \(O(\log n)\) time, which is very efficient (typically, it is difficult for one of these operations to run in constant time without another requiring linear time).

The last three functions are operations performed on sets themselves. Because binary trees can be merged in an efficient manner, these operations can be done very efficiently. For example, it is relatively simple to generate an ordered array of elements in the union by performing a merge operation on the in-order tree traversals of both trees. From there, initializing a new binary search tree from a sorted array takes linear time. All three operations can be done in \(O(m+n)\) time using a similar technique, where \(m\) and \(n\) indicate the sizes of the two sets.
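The merge step described above can be sketched as follows, with sorted lists standing in for the in-order traversals of the two trees (`merge_sorted_unique` is a hypothetical helper name, not from the text):

```python
def merge_sorted_unique(a, b):
    """Merge two sorted lists of unique elements into one sorted list
    of unique elements, in O(m + n) time. Equal elements (those in
    both sets) are kept only once, as union requires."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            out.append(a[i]); i += 1
        elif b[j] < a[i]:
            out.append(b[j]); j += 1
        else:                     # element appears in both sets
            out.append(a[i]); i += 1; j += 1
    out.extend(a[i:])             # append whichever list remains
    out.extend(b[j:])
    return out

# In-order traversals of two BSTs are already sorted:
print(merge_sorted_unique([1, 3, 5], [2, 3, 6]))  # [1, 2, 3, 5, 6]
```

From the merged array, building the new balanced tree takes linear time, giving the \(O(m+n)\) total stated above.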

## Associative arrays

Sometimes, elements are expressed as key-value pairs. A key is a unique identifier for the element. If the elements being added are complex objects, the key is used to search for the element inside a container data structure. This is the premise behind associative arrays. The following are the operations that an associative array should perform efficiently.

- `insert(key,val)`: Inserts an element using `key` as the key and `val` as the associated value.
- `get(key)`: Returns the value associated with `key`. Throws an exception if there is no such key-value pair in the associative array.
- `remove(key)`: Removes the key-value pair associated with `key` from the associative array. Throws an exception if no such pair exists.
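Python's built-in `dict` provides exactly these operations, with `[]` indexing and `del` in place of the names above, and `KeyError` as the exception:

```python
# The associative array operations, expressed with Python's dict.
colors = {}
colors['Steele'] = 'red'     # insert('Steele', 'red')
print(colors['Steele'])      # get('Steele'): prints red
del colors['Steele']         # remove('Steele')

# Both get and remove raise KeyError when no such pair exists:
try:
    colors['Steele']
except KeyError:
    print('no such key')
```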

### Hash table implementation

One of the most common ways to implement an associative array is with a hash table. A hash function maps keys to integers called table indices. Usually, the domain of possible keys is larger than the range of possible indices, so entire sets of keys map to the same index; these are called hash collisions. Each index corresponds to some location in memory holding a smaller dynamic array.

Suppose we want to store the key-value pair `('Steele','red')`. We find the correct index by evaluating `hash_function('Steele')`, and then append the key-value pair to the end of the smaller array at that index. The functions `get('Steele')` and `remove('Steele')` would work in a similar way.
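A minimal sketch of this scheme, assuming separate chaining and Python's built-in `hash` in the role of `hash_function` (class and method names are illustrative, not from the text):

```python
class HashTable:
    """Hash table with separate chaining: each index holds a smaller
    array (a bucket) of key-value pairs."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # hash() plays the role of hash_function in the text
        return hash(key) % len(self.buckets)

    def insert(self, key, val):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, val)
                return
        bucket.append((key, val))        # append to the end of the bucket

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def remove(self, key):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return
        raise KeyError(key)

table = HashTable()
table.insert('Steele', 'red')
print(table.get('Steele'))  # prints red
```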

Analyzing the complexity of this implementation requires some nuance. At the end of the day, we are performing search and remove operations on an unordered array, which take a linear amount of time with respect to its size. However, the goal is to choose a hash function so that hash collisions are minimal, keeping the smaller arrays sufficiently small. In the worst-case scenario, the added key-value pairs have keys that all map to the same index. Therefore, in the worst case, `get(key)` and `remove(key)` take \(O(n)\) time.

But this is a pessimistic analysis. The hash functions are strategically chosen to minimize the likelihood of this occurrence. In the average-case scenario, the lengths of the smaller arrays are independent of the total number of key-value pairs in the hash table. Therefore, in an average-case setting, all of the desired operations take \(O(1)\) time.

### Binary search tree implementation

If the keys can be ordered, another approach is to use a self-balancing binary search tree. There, each key-value pair would be a node in the tree. The worst-case scenario analysis indicates that all three desired operations would take \(O(\log n)\) time (this is true for the average-case scenario as well).
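A minimal sketch of a BST-backed associative array, with rebalancing and `remove` omitted for brevity (so the \(O(\log n)\) bound holds only while the tree stays balanced; class and method names are illustrative):

```python
class Node:
    def __init__(self, key, val):
        self.key, self.val = key, val
        self.left = self.right = None

class TreeMap:
    """Associative array as a binary search tree: each key-value pair
    is a node, ordered by key. A production version would rebalance
    (e.g. AVL or red-black rotations) to guarantee O(log n)."""

    def __init__(self):
        self.root = None

    def insert(self, key, val):
        def _ins(node):
            if node is None:
                return Node(key, val)
            if key < node.key:
                node.left = _ins(node.left)
            elif key > node.key:
                node.right = _ins(node.right)
            else:
                node.val = val        # key already present: overwrite
            return node
        self.root = _ins(self.root)

    def get(self, key):
        node = self.root
        while node is not None:
            if key < node.key:
                node = node.left
            elif key > node.key:
                node = node.right
            else:
                return node.val
        raise KeyError(key)

tm = TreeMap()
tm.insert('Steele', 'red')
print(tm.get('Steele'))  # prints red
```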