12. Hash Tables 2

Hash functions

A good hash function satisfies the assumption of simple uniform hashing:
Each key is equally likely to hash to any of the m slots in the hash table, independently of where any other key has hashed to.

Most hash functions assume that the universe of keys is the set N = {0, 1, 2, . . .}. We here consider the following schemes of hash function creation.

  1. Hashing by Division
  2. Hashing by Multiplication
  3. Universal Hashing

Hashing by Division

Map a key k into one of the m slots of a hash table by taking the remainder of k divided by m.

h(k) = k mod m

E.g., m = 23 and the key k = 107, then h(k) = 15.

Multiplication method

Step1. Multiply the key k by a constant A in the range 0<A<1, and extract the fractional part of k*A.
Step 2. Multiply this value by m and take the floor of the result.

h(k) = ⌊m · ((k*A) mod 1)⌋

The advantage of this method is that the value choice of m is not critical.

Universal hashing

Universal hashing is to select the hash function at random and at run time from a carefully designed collection of hash functions.

Let H = {h1,h2,...,hl} be a finite collection of hash functions that map a given universe U of keys into a range {0,1,...,m−1}.


以上都是建立在非Open Address上。(相同的key存储在一个linked list里)

Open Addressing

All elements in open addressing are stored in the hash table itself.

The sequence of positions probed depends on the key being inserted.
To determine which slots to probe, we extend the hash function to include the probe number as a second input.

So that every hash table position is eventually considered as a slot for a new key as the table fills up.

Probing

Linear Probing
Given an ordinary hash function h′:U → {0,1,...,m−1}, linear probing uses the hash function

h(k,i) = (h′(k)+i) mod m, for i = 0,1...,m−1.

描述:Given key k, the 1st slot is T [h′(k)], the 2nd slot is T [h′(k) + 1], and so on up to slot T [m − 1]. Then, we wrap around to slots T [0], T [1], . . . until we finally probe slot T [h′(k) − 1].

Disadvantage: the primary clustering problem.

Quadratic probing
uses a hash function of the form

h(k, i) = (h′(k) + c1i + c2i^2) mod m,
where h′ is an auxiliary hash function, c1 and c2 are constants.

Also, if two keys have the same initial probe position, then their probe sequences are the same since h(k1,0) = h(k2,0) implies h(k1,i) = h(k2,i).

Disadvantage: the secondary clustering problem.

在做题的时候,默认c1 = 0,具体的增量应该是 -1,1,-4,4,-9,9,……
具体步骤

Double hashing
One of the best methods available for open addressing because the permutations produced have many of the characteristics of randomly chosen permutations.

h(k,i) = [h1(k)+i·h2(k)] mod m,
where h1 and h2 are auxiliary hash functions.

  1. The initial position is T [h1(k)]
  2. A successive probe position is offset from its previous position by the amount h2(k) modulo m.
  3. The value h2(k) must be relatively prime to the hash table size m for the entire hash table to be searched.
  4. As we vary the value of key k, the initial probe position and the offset may vary independently.

Double hashing represents an improvement over linear or quadratic probings in that Θ(m^2) probe sequences, rather than Θ(m), are used, since each possible (h1(k), h2(k)) pair yields a distinct probe sequence with 0 ≤ h1(k), h2(k) ≤ m − 1

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容

  • **2014真题Directions:Read the following text. Choose the be...
    又是夜半惊坐起阅读 9,959评论 0 23
  • 弟弟比姐姐矮了许多,我们带他去上海看内分泌专家门诊,专家建议住院系统检查。一听说要住院,金豆子就挂在眼里,垂垂欲坠...
    恬隆阅读 478评论 7 2
  • 9.19--9.23 第7章 正则表达式 正则表达式是一个拆分字符串并查询相关信息的过程。 推荐练习网站: js ...
    如201608阅读 1,072评论 0 4
  • 材料:白萝卜1个、五花肉100克、尖红椒4个、青尖椒3个、青蒜5根 调料:植物油、盐、鸡精、生抽、蚝油、料酒、干红...
    蔚蓝色的天空c阅读 232评论 0 0