12. Hash Tables 2

Hash functions

A good hash function satisfies the assumption of simple uniform hashing:
Each key is equally likely to hash to any of the m slots in the hash table, independently of where any other key has hashed to.

Most hash functions assume that the universe of keys is the set N = {0, 1, 2, . . .}. We here consider the following schemes of hash function creation.

Hashing by Division
Hashing by Multiplication
Universal Hashing

Hashing by Division

Map a key k into one of the m slots of a hash table by taking the remainder of k divided by m.

h(k) = k mod m

E.g., m = 23 and the key k = 107, then h(k) = 15.

Multiplication method

Step1. Multiply the key k by a constant A in the range 0<A<1, and extract the fractional part of k*A.
Step 2. Multiply this value by m and take the floor of the result.

h(k) = ⌊m · ((k*A) mod 1)⌋

The advantage of this method is that the value choice of m is not critical.

Universal hashing

Universal hashing is to select the hash function at random and at run time from a carefully designed collection of hash functions.

Let H = {h1,h2,...,hl} be a finite collection of hash functions that map a given universe U of keys into a range {0,1,...,m−1}.

以上都是建立在非Open Address上。（相同的key存储在一个linked list里）

Open Addressing

All elements in open addressing are stored in the hash table itself.

The sequence of positions probed depends on the key being inserted.
To determine which slots to probe, we extend the hash function to include the probe number as a second input.

So that every hash table position is eventually considered as a slot for a new key as the table fills up.

Probing

Linear Probing
Given an ordinary hash function h′:U → {0,1,...,m−1}, linear probing uses the hash function

h(k,i) = (h′(k)+i) mod m, for i = 0,1...,m−1.

描述：Given key k, the 1st slot is T [h′(k)], the 2nd slot is T [h′(k) + 1], and so on up to slot T [m − 1]. Then, we wrap around to slots T [0], T [1], . . . until we finally probe slot T [h′(k) − 1].

Disadvantage: the primary clustering problem.

Quadratic probing
uses a hash function of the form

h(k, i) = (h′(k) + c1i + c2i^2) mod m,
where h′ is an auxiliary hash function, c1 and c2 are constants.

Also, if two keys have the same initial probe position, then their probe sequences are the same since h(k1,0) = h(k2,0) implies h(k1,i) = h(k2,i).

Disadvantage: the secondary clustering problem.

在做题的时候，默认c1 = 0，具体的增量应该是 -1，1，-4，4，-9，9，……
具体步骤

Double hashing
One of the best methods available for open addressing because the permutations produced have many of the characteristics of randomly chosen permutations.

h(k,i) = [h1(k)+i·h2(k)] mod m,
where h1 and h2 are auxiliary hash functions.

The initial position is T [h1(k)]
A successive probe position is offset from its previous position by the amount h2(k) modulo m.
The value h2(k) must be relatively prime to the hash table size m for the entire hash table to be searched.
As we vary the value of key k, the initial probe position and the offset may vary independently.

Double hashing represents an improvement over linear or quadratic probings in that Θ(m^2) probe sequences, rather than Θ(m), are used, since each possible (h1(k), h2(k)) pair yields a distinct probe sequence with 0 ≤ h1(k), h2(k) ≤ m − 1