Dictionary data structures, also known as associative arrays
binary search trees - common in interviews
hash tables - widely used in practice
Dictionaries maintain a set of key-value pairs
e.g.
words and definitions
symbol table of a compiler
Support the following operations:
insert (key)
delete (key)
lookup (key)
Usually the number of keys in your set is much smaller than the number of possible keys.
e.g. keys are IP addresses.
Ideally, to maintain at most n keys, we will use space “close” to n. (e.g., m = 1.1n)
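As a quick illustration of the three operations, here is Python's built-in dict used as the dictionary ADT (the keys and values are made-up examples):

```python
# A minimal sketch of the dictionary ADT using Python's built-in dict.
d = {}
d["apple"] = "a fruit"        # insert(key) with an associated value
d["gcc"] = "a compiler"
print(d.get("apple"))         # lookup(key) -> "a fruit"
print(d.get("missing"))       # lookup of an absent key -> None
del d["gcc"]                  # delete(key)
```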
# Binary Search Tree
A BST is a binary tree (i.e., every node has at most 2 children) in which each node is associated with a key.
Furthermore, the following holds:
for every node y in the left subtree of x, key(y) <= key(x)
for every node y in the right subtree of x, key(y) >= key(x)
For BSTs, the runtime of operations is at most the tree depth. But the tree can have depth n (e.g., if every node has a child on only one side, the tree degenerates into a path).
Self-balancing BSTs, e.g., red-black trees, keep the depth O(log n). *no need to know the details
Note: a B-tree is not a BST.
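A minimal unbalanced-BST sketch in Python (class and function names are illustrative, not from the notes); both operations walk down from the root, so their cost is bounded by the tree depth:

```python
# Minimal unbalanced BST: insert and lookup only.
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key into the subtree rooted at root; return the new subtree root."""
    if root is None:
        return Node(key)
    if key <= root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def lookup(root, key):
    """Return True iff key is stored in the subtree rooted at root."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

root = None
for k in [5, 2, 8, 1, 3]:
    root = insert(root, k)
print(lookup(root, 3), lookup(root, 7))  # True False
```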
# Hash Table
A hash table maintains a table with m > n “slots” and a hash function h: keys -> slots.
When we want to insert a key x, we try to place it in slot h(x).
If there are keys x,x’ in our set and h(x)=h(x’), this is called a collision.
There are many ways to deal with them
option 1 - Chaining: each slot contains a linked list storing all keys that hash to that slot (a minimal sketch follows after the runtime notes below).
Pro:
insertions always take constant time
can store more keys than there are slots
Con:
linked lists require storing pointers which takes space
poor memory locality
frequent memory allocation/deallocation
Runtime:
inserts are constant time
lookup, delete on key x take time up to the size of slot h(x)
for each slot, the expected size of the slot is n/m = O(1)
however, with high probability, some slot will store Θ(log n / log log n) keys -> still better than the O(log n) depth of a balanced BST
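A minimal chaining sketch in Python, assuming Python lists stand in for the per-slot linked lists and hash(x) % m stands in for the hash function h (all names here are illustrative):

```python
# Chaining: each slot holds a chain of all keys that hash to it.
class ChainedHashTable:
    def __init__(self, m=16):
        self.slots = [[] for _ in range(m)]   # m slots, each a chain of keys
        self.m = m

    def insert(self, x):
        self.slots[hash(x) % self.m].append(x)    # constant time

    def lookup(self, x):
        return x in self.slots[hash(x) % self.m]  # scans a single chain

    def delete(self, x):
        chain = self.slots[hash(x) % self.m]
        if x in chain:
            chain.remove(x)

t = ChainedHashTable()
t.insert("alice"); t.insert("bob")
print(t.lookup("alice"), t.lookup("carol"))  # True False
```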
option 2 - Two-Choice Hashing + Chaining. Same as above, except each key is mapped to two slots and placed in the less loaded of the two (sketch below).
Runtime:
the Θ(log n / log log n) maximum load becomes O(log log n) *nearly constant, < 6 in practice
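A minimal two-choice sketch in Python; the two hash functions are simulated by salting Python's built-in hash, which is an assumption made purely for illustration:

```python
# Two-choice hashing + chaining: each key has two candidate slots and
# is appended to whichever chain is currently shorter.
class TwoChoiceHashTable:
    def __init__(self, m=16):
        self.slots = [[] for _ in range(m)]
        self.m = m

    def _candidates(self, x):
        # Two "independent" hash functions, simulated by salting hash().
        return hash((0, x)) % self.m, hash((1, x)) % self.m

    def insert(self, x):
        i, j = self._candidates(x)
        target = i if len(self.slots[i]) <= len(self.slots[j]) else j
        self.slots[target].append(x)

    def lookup(self, x):
        # Must check both candidate chains.
        i, j = self._candidates(x)
        return x in self.slots[i] or x in self.slots[j]

t = TwoChoiceHashTable()
for k in range(20):
    t.insert(k)
print(t.lookup(5), t.lookup(99))  # True False
```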
option 3 - Linear Probing *widely used
If slot h(x) is occupied, try slot h(x)+1, then h(x)+2, and so forth (wrapping around mod m) until an empty slot is found (sketch after the runtime notes below).
Pro:
Great memory locality
simple to implement
Con:
suffers from “clustering”
Runtime:
All operations run in expected constant time
With high probability no operation will take more than O(log n) time -> much better than a BST
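A minimal linear-probing sketch in Python (names are illustrative; it assumes the table never fills, and deletion is omitted because it needs tombstones or re-insertion, which these notes do not cover):

```python
# Linear probing: on a collision, scan forward (mod m) for the next empty slot.
class LinearProbingTable:
    def __init__(self, m=16):
        self.slots = [None] * m   # keep m comfortably larger than n
        self.m = m

    def insert(self, x):
        i = hash(x) % self.m
        while self.slots[i] is not None and self.slots[i] != x:
            i = (i + 1) % self.m   # try h(x)+1, h(x)+2, ...
        self.slots[i] = x

    def lookup(self, x):
        i = hash(x) % self.m
        while self.slots[i] is not None:
            if self.slots[i] == x:
                return True
            i = (i + 1) % self.m   # keep probing until an empty slot is hit
        return False

t = LinearProbingTable()
t.insert("alice"); t.insert("bob")
print(t.lookup("alice"), t.lookup("carol"))  # True False
```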
option 4 - Cuckoo Hashing: each key is stored in one of two candidate slots; an insert may evict the current occupant and relocate it to its other slot (sketch below).
Pro:
All lookups take at most 2 time steps.
A worry:
inserts may take a long time
Runtime:
with high probability all inserts succeed and none take more than O(log n) time
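A minimal cuckoo-hashing sketch in Python; the two hash functions are again simulated by salting Python's hash, and the eviction cap is an arbitrary placeholder where a real implementation would rehash:

```python
# Cuckoo hashing: two tables, one per hash function. A lookup probes at most
# two slots. An insert into an occupied slot evicts the resident key, which is
# then re-inserted into its other table, possibly causing a chain of evictions.
class CuckooHashTable:
    def __init__(self, m=16):
        self.tables = [[None] * m, [None] * m]  # one array per hash function
        self.m = m

    def _slot(self, which, x):
        return hash((which, x)) % self.m

    def lookup(self, x):
        # At most 2 probes: one candidate slot per table.
        return (self.tables[0][self._slot(0, x)] == x or
                self.tables[1][self._slot(1, x)] == x)

    def insert(self, x, max_evictions=32):
        if self.lookup(x):
            return
        which = 0
        for _ in range(max_evictions):
            i = self._slot(which, x)
            # Place x in its slot, evicting whatever was there.
            x, self.tables[which][i] = self.tables[which][i], x
            if x is None:
                return               # slot was empty, insert finished
            which = 1 - which        # re-insert the evicted key elsewhere
        raise RuntimeError("too many evictions; a real table would rehash")

t = CuckooHashTable()
for k in range(8):
    t.insert(k)
print(t.lookup(3), t.lookup(100))  # True False
```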