Dictionary data structures, also known as associative arrays
binary search trees - common in interviews
hash tables - widely used in practice
Dictionaries maintain a set of key-value pairs
e.g.
words and definitions
symbol table of a compiler
Support the following operations:
insert (key)
delete (key)
lookup (key)
Usually the number of keys in your set is much smaller than the number of possible keys.
e.g. keys are IP addresses.
Ideally, to maintain at most n keys, we will use space “close” to n. (e.g., m = 1.1n)
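As a quick illustration of the three operations, here is Python's built-in dict used as the dictionary ADT (the keys and values are made-up examples):

```python
# A minimal sketch of the dictionary ADT using Python's built-in dict.
d = {}
d["apple"] = "a fruit"        # insert(key) with an associated value
d["gcc"] = "a compiler"
print(d.get("apple"))         # lookup(key) -> "a fruit"
print(d.get("missing"))       # lookup of an absent key -> None
del d["gcc"]                  # delete(key)
```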
# Binary Search Tree
A BST is a binary tree (i.e., every node has at most 2 children) in which each node is associated with a key.
Furthermore, the following holds:
for every node y in the left subtree of x, key(y) <= key(x)
for every node y in the right subtree of x, key(y) >= key(x)
For BSTs, the runtime of operations is at most the tree depth. But the tree can have depth n (e.g., if every node has a child on only one side, the tree degenerates into a path).
Self-balancing BSTs, e.g., red-black trees, keep the depth O(log n). *no need to know the details
Note: a B-tree is not a BST.
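A minimal unbalanced-BST sketch in Python (class and function names are illustrative, not from the notes); both operations walk down from the root, so their cost is bounded by the tree depth:

```python
# Minimal unbalanced BST: insert and lookup only.
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key into the subtree rooted at root; return the new subtree root."""
    if root is None:
        return Node(key)
    if key <= root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def lookup(root, key):
    """Return True iff key is stored in the subtree rooted at root."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

root = None
for k in [5, 2, 8, 1, 3]:
    root = insert(root, k)
print(lookup(root, 3), lookup(root, 7))  # True False
```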
# Hash Table
A hash table maintains a table with m > n “slots” and a hash function h: keys -> slots.
When we want to insert a key x, we try to place it in slot h(x).
If there are keys x,x’ in our set and h(x)=h(x’), this is called a collision.
There are many ways to deal with them
option 1 - Chaining: each slot contains a linked list storing all keys that hash to that slot (a minimal sketch follows after the runtime notes below).
Pro:
insertions always take constant time
can store more keys than there are slots
Con:
linked lists require storing pointers which takes space
poor memory locality
frequent memory allocation/deallocation
Runtime:
inserts are constant time
lookup, delete on key x take time up to the size of slot h(x)
for each slot, the expected size of the slot is n/m = O(1)
however, with high probability, some slot will store Θ(log n / log log n) keys -> still better than the O(log n) depth of a balanced BST
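A minimal chaining sketch in Python, assuming Python lists stand in for the per-slot linked lists and hash(x) % m stands in for the hash function h (all names here are illustrative):

```python
# Chaining: each slot holds a chain of all keys that hash to it.
class ChainedHashTable:
    def __init__(self, m=16):
        self.slots = [[] for _ in range(m)]   # m slots, each a chain of keys
        self.m = m

    def insert(self, x):
        self.slots[hash(x) % self.m].append(x)    # constant time

    def lookup(self, x):
        return x in self.slots[hash(x) % self.m]  # scans a single chain

    def delete(self, x):
        chain = self.slots[hash(x) % self.m]
        if x in chain:
            chain.remove(x)

t = ChainedHashTable()
t.insert("alice"); t.insert("bob")
print(t.lookup("alice"), t.lookup("carol"))  # True False
```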
option 2 - Two-Choice Hashing + Chaining. Same as above, except each key is mapped to two slots and placed in the less loaded of the two (sketch below).
Runtime:
the Θ(log n / log log n) maximum load becomes O(log log n) *nearly constant, < 6 in practice
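A minimal two-choice sketch in Python; the two hash functions are simulated by salting Python's built-in hash, which is an assumption made purely for illustration:

```python
# Two-choice hashing + chaining: each key has two candidate slots and
# is appended to whichever chain is currently shorter.
class TwoChoiceHashTable:
    def __init__(self, m=16):
        self.slots = [[] for _ in range(m)]
        self.m = m

    def _candidates(self, x):
        # Two "independent" hash functions, simulated by salting hash().
        return hash((0, x)) % self.m, hash((1, x)) % self.m

    def insert(self, x):
        i, j = self._candidates(x)
        target = i if len(self.slots[i]) <= len(self.slots[j]) else j
        self.slots[target].append(x)

    def lookup(self, x):
        # Must check both candidate chains.
        i, j = self._candidates(x)
        return x in self.slots[i] or x in self.slots[j]

t = TwoChoiceHashTable()
for k in range(20):
    t.insert(k)
print(t.lookup(5), t.lookup(99))  # True False
```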
option 3 - Linear Probing *widely used
If slot h(x) is occupied, try slot h(x)+1, then h(x)+2, and so forth (wrapping around mod m) until an empty slot is found (sketch after the runtime notes below).
Pro:
Great memory locality
simple to implement
Con:
suffers from “clustering”
Runtime:
All operations run in expected constant time
With high probability no operation will take more than O(log n) time -> much better than a BST
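A minimal linear-probing sketch in Python (names are illustrative; it assumes the table never fills, and deletion is omitted because it needs tombstones or re-insertion, which these notes do not cover):

```python
# Linear probing: on a collision, scan forward (mod m) for the next empty slot.
class LinearProbingTable:
    def __init__(self, m=16):
        self.slots = [None] * m   # keep m comfortably larger than n
        self.m = m

    def insert(self, x):
        i = hash(x) % self.m
        while self.slots[i] is not None and self.slots[i] != x:
            i = (i + 1) % self.m   # try h(x)+1, h(x)+2, ...
        self.slots[i] = x

    def lookup(self, x):
        i = hash(x) % self.m
        while self.slots[i] is not None:
            if self.slots[i] == x:
                return True
            i = (i + 1) % self.m   # keep probing until an empty slot is hit
        return False

t = LinearProbingTable()
t.insert("alice"); t.insert("bob")
print(t.lookup("alice"), t.lookup("carol"))  # True False
```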
option 4 - Cuckoo Hashing: each key is stored in one of two candidate slots; an insert may evict the current occupant and relocate it to its other slot (sketch below).
Pro:
All lookups take at most 2 time steps.
A worry:
inserts may take a long time
Runtime:
with high probability all inserts succeed and none take more than O(log n) time
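A minimal cuckoo-hashing sketch in Python; the two hash functions are again simulated by salting Python's hash, and the eviction cap is an arbitrary placeholder where a real implementation would rehash:

```python
# Cuckoo hashing: two tables, one per hash function. A lookup probes at most
# two slots. An insert into an occupied slot evicts the resident key, which is
# then re-inserted into its other table, possibly causing a chain of evictions.
class CuckooHashTable:
    def __init__(self, m=16):
        self.tables = [[None] * m, [None] * m]  # one array per hash function
        self.m = m

    def _slot(self, which, x):
        return hash((which, x)) % self.m

    def lookup(self, x):
        # At most 2 probes: one candidate slot per table.
        return (self.tables[0][self._slot(0, x)] == x or
                self.tables[1][self._slot(1, x)] == x)

    def insert(self, x, max_evictions=32):
        if self.lookup(x):
            return
        which = 0
        for _ in range(max_evictions):
            i = self._slot(which, x)
            # Place x in its slot, evicting whatever was there.
            x, self.tables[which][i] = self.tables[which][i], x
            if x is None:
                return               # slot was empty, insert finished
            which = 1 - which        # re-insert the evicted key elsewhere
        raise RuntimeError("too many evictions; a real table would rehash")

t = CuckooHashTable()
for k in range(8):
    t.insert(k)
print(t.lookup(3), t.lookup(100))  # True False
```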