无序容器

1. 无序容器

2. 自定义类型

2.1. 问题

#include <iostream>
#include <algorithm>
#include <unordered_map>


class Node
{
public:
 Node() {}

 Node(std::vector<int> vec)
 {
 if (vec.size() < 3)
 {
 return;
 }

 std::sort(vec.begin(), vec.end());
 numA = vec[0];
 numB = vec[1];
 numC = vec[2];
 }

 bool operator ==(const Node& rhs)
 {
 return rhs.numA == numA && rhs.numB == numB && rhs.numC == numC;
 }

public:
 int numA;
 int numB;
 int numC;
};


int main()
{

 std::vector<int> vec{ 3, 8, 9 };
 Node currNode(vec);

 std::unordered_map<Node, int> mapInstance;
 mapInstance[currNode] = 0;

 for (auto pos = mapInstance.begin(); pos != mapInstance.end(); ++pos)
 {
 std::cout << "\tsecond: " << pos->second << std::endl;
 }

 getchar();

 return 0;
}

提示

C2338  The C++ Standard doesn't provide a hash for this type.

2.2. 解决方案

To be able to use std::unordered_map (or one of the other unordered associative containers) with a user-defined key-type, you need to define two things:

A hash function; this must be a class that overrides operator() and calculates the hash value given an object of the key-type. One particularly straight-forward way of doing this is to specialize the std::hash template for your key-type.

A comparison function for equality; this is required because the hash cannot rely on the fact that the hash function will always provide a unique hash value for every distinct key (i.e., it needs to be able to deal with collisions), so it needs a way to compare two given keys for an exact match. You can implement this either as a class that overrides operator(), or as a specialization of std::equal, or – easiest of all – by overloading operator==() for your key type (as you did already).

为使用户自定义类型可以使用 std::unordered_map，需要定义两个方法：

hash 函数。这是一个重载了 operator() 操作符，并计算给定对象 key 的 hash 值的类。默认使用 std::hash。
相等比较函数。因为 hash 函数可能会产生碰撞，需要一种方法比较两个 key 是否严格匹配。可以通过重载 operator()、使用 std::equal 的特化版本、或重载 operator== 来实现。

The difficulty with the hash function is that if your key type consists of several members, you will usually have the hash function calculate hash values for the individual members, and then somehow combine them into one hash value for the entire object. For good performance (i.e., few collisions) you should think carefully about how to combine the individual hash values to ensure you avoid getting the same output for different objects too often.

hash 函数的难点在于若 key 值包含多个成员，如何计算单个成员的 hash 值，以及如何计算组合后的 hash 值，以避免碰撞。

2.2.1. 修改 std::hash 方法

A fairly good starting point for a hash function is one that uses bit shifting and bitwise XOR to combine the individual hash values. For example, assuming a key-type like this:

struct Key
{
 std::string first;
 std::string second;
 int         third;

 bool operator==(const Key &other) const
 {
 return (first == other.first
 && second == other.second
 && third == other.third);
 }
};

namespace std
{
 template <>
 struct hash<Key>
 {
 std::size_t operator()(const Key& k) const
 {
 using std::size_t;
 using std::hash;
 using std::string;

 // Compute individual hash values for first,
 // second and third and combine them using XOR
 // and bit shifting:

 return ((hash<string>()(k.first)
 ^ (hash<string>()(k.second) << 1)) >> 1)
 ^ (hash<int>()(k.third) << 1);
 }
 };
}


int main()
{
 std::unordered_map<Key, std::string> m6 = {
 { { "John", "Doe", 12 }, "example" },
 { { "Mary", "Sue", 21 }, "another" }
 };
}

It will automatically use std::hash<Key> as defined above for the hash value calculations, and the operator== defined as member function of Key for equality checks.

一个好的实践是使用位的移动和 XOR 位操作，组合单个的 hash 值。这会自动使用 std::hash<Key> 计算 hash 值。

2.2.2. 定义 hash 函数类

If you don't want to specialize template inside the std namespace (although it's perfectly legal in this case), you can define the hash function as a separate class and add it to the template argument list for the map:
struct KeyHasher
{
 std::size_t operator()(const Key& k) const
 {
 using std::size_t;
 using std::hash;
 using std::string;

 return ((hash<string>()(k.first)
 ^ (hash<string>()(k.second) << 1)) >> 1)
 ^ (hash<int>()(k.third) << 1);
 }
};

int main()
{
 std::unordered_map<Key, std::string, KeyHasher> m6 = {
 { { "John", "Doe", 12 }, "example" },
 { { "Mary", "Sue", 21 }, "another" }
 };

 getchar();
}

如果不想在 std 命名空间里特化 std::hash 模板，可以定义 hash 函数类，并将它作为 unordered_map 的模板类型参数。

2.3. 优良的 hash 函数

How to define a better hash function? As said above, defining a good hash function is important to avoid collisions and get good performance. For a real good one you need to take into account the distribution of possible values of all fields and define a hash function that projects that distribution to a space of possible results as wide and evenly distributed as possible.

定义一个好的 hash 函数。

4. find() 返回值

4.1. iterator 不可修改

iterator find ( const key_type& k );
const_iterator find ( const key_type& k ) const;
Get iterator to element

Searches the container for an element with k as value and returns an iterator to it if found, otherwise it returns an iterator to unordered_set::end (the element past the end of the container).

Another member function, unordered_set::count, can be used to just check whether a particular element exists.

All iterators in an unordered_set have const access to the elements (even those whose type is not prefixed with const_): Elements can be inserted or removed, but not modified while in the container.

Return value

An iterator to the element, if the specified value is found, or unordered_set::end if it is not found in the container.

Member types iterator and const_iterator are forward iterator types. Both may be aliases of the same iterator type.

4.2. 原因及解决方案

Both set and unordered_set have read-only keys. It's easy to see why this is the case - if the key value were to change, the data structure would have it filed in the wrong spot and you wouldn't be able to find it anymore.

It could be possible to change some part of the object that is not used in making the hash key, but that would lead to possible hard to track down bugs. The standards committee decided to eliminate that possibility by making the entire key const.

There are two ways around this restriction.

The first is to split the key from the value and use a mapor unordered_map instead.

The second is to remove the item from the set and reinsert it after it's modified.

4.3. 接口说明

iterator is the same as const_iterator, why there is non-const version of find()?

Because iterator is not mandatory the same as const_iterator, as stated in documentation:

The member types iterator and const_iterator may be aliases to the same type. Since iterator is convertible to const_iterator, const_iterator should be used in function parameter lists to avoid violations of the One Definition Rule.

emphasis is mine. Since they are not mandatory the same some generic code can depend on particular type of iterator returned by find() and it should be consistent with other containers.

5. Range-based for loop

Each element of the container is a map<K, V>::value_type, which is a typedef for std::pair<const K, V>. Consequently, you'd write this as
for (auto& kv : myMap) {
std::cout << kv.first << " has value " << kv.second << std::endl;
}
For efficiency, it is a good idea to make the parameter in the loop a reference. You could also consider making it const if you want a read-only view of the values.

Starting with C++17, you can also write
for (auto& [key, value]: myMap) {
std::cout << key << " has value " << value << std::endl;
}
which is a lot cleaner.