Hash Tables
When we talk about hash tables, we're actually talking about dictionary. While an array can be used to construct hash tables, array indexes its elements using integers. However, if we want to store data and use keys other than integer, such as 'string', we may want to use dictionary.
Dictionaries in Python are implemented using hash tables. It is an array whose indexes are obtained using a hash function on the keys.
We declare an empty dictionary like this:
We declare an empty dictionary like this:
>>> D = {}
Then, we can add its elements:
>>> D['a'] = 1
>>> D['b'] = 2
>>> D['c'] = 3
>>> D
{'a': 1, 'c': 3, 'b': 2}
It's a structure with (key, value) pair:
D[key] = value
The string used to "index" the hash table D is called the "key". To access the data stored in the table, we need to know the key:
>>> D['b']
2
How we loop through the hash table?
>>> for k in D.keys():
... print D[k]
...
1
3
2
If we want to print the (key, value) pair:
>>> for k,v in D.items():
... print k,':',v
...
a : 1
c : 3
b : 2
Hashing from two arrays
Using two Arrays of equal length, create a Hash object where the elements from one array (the keys) are associated with the elements of the other (the values):
>>> keys = ['a', 'b', 'c']
>>> values = [1, 2, 3]
>>> hash = {k:v for k, v in zip(keys, values)}
>>> hash
{'a': 1, 'c': 3, 'b': 2}
Hashing
Here are some hashing samples using built-in hash function:
>>> map(hash, [0, 1, 2, 3])
[0, 1, 2, 3]
>>> map(hash, ['0','1','2','3'])
[6144018481, 6272018864, 6400019251, 6528019634]
>>> hash('0')
6144018481
As we can see from the example, Python is using different hash() function depending on the type of data.
Python provides hashlib for secure hashes and message digests:
md5(), sha*():
>>> import hashlib>>> hashlib.md5('a')>>> hashlib.md5('a').digest()'\x0c\xc1u\xb9\xc0\xf1\xb6\xa81\xc3\x99\xe2iw&a;'>>> hashlib.md5('a').hexdigest()'0cc175b9c0f1b6a831c399e269772661'>>> hashlib.sha512('a')>>> hashlib.sha512('a').digest()
'\x1f@\xfc\x92\xda$\x16\x94u\ty\xeel\xf5\x82\xf2\xd5\xd7\xd2\x8e\x183]\xe0Z\xbcT\xd0V\x0e\x0fS\x02\x86\x0ce+\xf0\x8dV\x02R\xaa^t!\x05F\xf3i\xfb\xbb\xce\x8c\x12\xcf\xc7\x95{&R;\xfe\x9au'
>>> hashlib.sha512('a').hexdigest()
'1f40fc92da241694750979ee6cf582f2d5d7d28e18335de05abc54d0560e0f5302860c652bf08d560252aa5e74210546f369fbbbce8c12cfc7957b2652fe9a75'
>>>
Hashing example code
Hashing example code
The following code is a revision from Sets (union/intersection) and itertools - Jaccard coefficient & shingling to check plagiarism. In this section, we used 64 bit integer (hash value from hash()) for the comparison of shingles instead of directly working on the string.
Output is exactly the same as the one we got using string comparison:
combinations=[(0, 1), (0, 2), (1, 2)]
(0, 1) : jaccard=0.0196078431373
(0, 2) : jaccard=0.0
(1, 2) : jaccard=0.0