(Boolan) C++ STL与泛型编程——容器2

对于标准库来说，容器是非常大的一块内容，那么之前已经谈过关于list、vector、array、forward_list（slist）的内容，还有很多的容器是没有谈到的，今天就把剩下的容器一网打尽，全部过一遍，看看他们背后到底藏着些什么秘密吧。

容器结构分类

序列式容器（Sequence Container）的衍生关系
- array （C++2.0）连续空间
- vector 连续空间
- heap 以算法形式呈现（xxx_heap()）
  - priority_queue
- list 双向链表
- slist C++2.0中为forward_list,单向链表
- deque 分段连续空间
  - stack Container Adapter
  - queue Container Adapter

关联式容器（Associative Containers）的衍生关系(复合)
- rb_tree 红黑树，非公开
  - set
  - map
  - multiset
  - multimap
- hashtable非公开
  - hash_set非标准，C++2.0为unordered_set
  - hash_map非标准，C++2.0为unordered_map
  - hash_multiset非标准，C++2.0为unordered_multiset
  - hash_mulitmap非标准，C++2.0为unordered_multimap

容器 deque —— 双向开口的一段“表面连续”空间

表面看deque的图形是如下图所hi的样子，可以认为他是一段连续的内存空间，同时，deque可以双向扩充。不但可以向前方增加内存空间，也可以向后方增加内存空间。

表面表现出来的内存结构

但是对于实际内存来说，双向扩充是比较复杂的事情。因为，在内存的角度来看，内存是分配给所有的运行程序的，所以在我们的程序之前或之后的内存空间，说不定操作系统早就帮我们分配给别的程序了，如果，那些空间都是为我们程序预留起来的话，那么实在是浪费空间，并且，给你预留多少合适呢，你总还是需要向两方面扩充的。所以，如果内存的设计方式和我们所感觉的一样，其实，在实际的操作中是不好实现的。

那么就需要设计一套行之有效的方法来解决在这样的内存空间中的双向扩充问题了。

那么stl的实际的解决方案，内存示意图，是如下图所示的这样的。

实际的管理内存方式

对于deque来说，实际是有一块连续的内存空间的，他的实现底层是采用的vector来实现的，而这块连续空间中，只维护这一些*** 指针 ***，这些指针，每个都指向一块连续的空间，也就是buffer。在每个buffer中，保存的都是实际的数据。

这样做可以解决之前所提出的问题，如果扩充的时候，只需要考虑改变vector这块连续空间中的指针数组就可以了，增加的空间只需要放在对应的buffer，而不需要多大程度来影响连续空间的问题。

那么既然实际的内存空间不连续，如何让使用者感受不到实际设计就变得十分重要了。接下来，我们会仔细研究一下，为什么deque可以通过这样的设计，来让使用者觉得他是连续的。

deque的iterator

这里面最关键的就在于上图所示的部分了，也就是这个iterator的作用，让这样“连续是假象，分段是实事”的情况得以实现。在迭代器运行到边界的时候，都需要检测是否到边界，然后通过回到控制buffer的那个vector来管理边界的buffer了。在iterator中，cur、first、last和node分别指向了用户使用时的当前的数据，first指向了buffer的第一块空间，last指向了buffer之后那个不在buffer中的空间，而node指向了控制buffer的指针序列中的实际位置。

原理其实说的差不多了，但是实际代码长啥样呢，我们来看看吧。

//如果BufSiz不为0，则返回对应值，表示buffer size由开发者自己确定
//如果BufSiz为0，表示buffer size 由预设值决定。
template<class T, class Alloc = alloc, size_t  BufSiz = 0>
//T:存储的数据类型
//Alloc:分配器
//BufSiz:buffer的大小
class deque{
public:
      typedef T value_type;
      typedef __deque_iterator<T, T&, T*, BufSize> iterator;  //buffer size 是指每个buffer容纳的元素个数
//在接下来会给出__deque_iterator的源代码
protected:
      typedef pointer* map_pointer; //T**
protected:
      iterator start;
      iterator finish;
      map_pointer map;
      size_type map_size;
public:
      iterator begin(){return start;}
      iterator end() {return finish;}
      size_type size() const {return finish - start; }
...................

//确定BufSiz的大小，如果sz（sizeof(value_type)） < 512， 返回 512/sz
//如果sz>= 512，返回1
    inline size_t __deque_buf_size(size_t n, size_t sz){
        return n!=0? (sz<512?size_t(512/sz):size_t(1));
    }
}

//__deque_iterator的源代码
template<class T, class Ref, class Ptr, size_t BufSiz>
struct __deque_iterator{
    typedef random_access_iterator_tag  iterator_datagory
    typedef T value_type;
    typedef Ptr Pointer;
    typedef Ref reference;
    typedef size_t size_type;
    typedef ptrdiff_t difference_type;
    typedef T** map_pointer;
    typedef __deque_iterator self;

    //迭代器的数据部分，也就是之前的cur、first、last和node
    T* cur;
    T* first;
    T* last;
    map_pointer node;
............
}

//deque的插入问题
//元素插入的时候，因为是按顺序排列，如果插入在中间的位置，应该会改变其他元素的位置
//就相当于在在书架中插入一本书，肯定需要移动前后的书
//如果插入点，距离前端比较近，那么移动前端比较合适，效率较高
//如果插入点距离后端比较近，那么将插入点之后的元素向后移动比较快

//在postion处插入一个元素x
iterator insert(iterator postion, const value_type& x){
    if(postion.cur == start.cur)  //如果安插点是deque的最前端
    {
        push_front(x);  //直接使用push_front
        return start;
    }
    else if(postion.cur == finish.cur)  //如果安插点是deque的最末位
    {
        push_back(x);  //直接交给push_back
        iterator tmp = finish;
        --tmp;
        return tmp;
    }
    else
    {
        return insert_aux(postion, x);
     }
}

template <class T, class Alloc, size_t BufSize>
typename deque<T, Alloc, BufSize>::iterator_deque<T, Alloc, BufSIze>:: itert_aux(iterator pos, const value_type& x){
    difference_type index = pos - start;    //安插点之前的元素个数
    value_type x_copy = x;
    if(index < size() / 2){  //如果安插点之前的元素较少
        push_front(front());  //在最前端加入第一个元素同值的元素
        .......
        copy(front2, pos1, front1);  //元素搬移
    }
    else {    //安插点之后的元素较少
        push_back(back());//在尾端加入最末元素同值的元素
        ......
        copy_backward(pos, back2, back1);//元素搬移
    }
    *pos = x_copy;//在安插点上设定新值
    return pos;
}

deque如何模拟连续空间

主要功劳都是iterator的协调

reference operator[](size_type n)
{
      return start[difference_type(n)];
}
reference front()
{
    return *start;
}
reference back()
{
    iterator tmp = finish;
    --tmp;
    return *tmp;
}
size_type size() const
{
    return finish - start; 
    // 此处内存不连续，说明操作符- 进行了重载
}
bool empty() const
{
    return finish == start;
}
reference operator* () const
{
    return *cur;
}
pointer operator->() const
{
    return &(operator*());
}
//两个iterator之间的距离相当于
//1.两个iterator之间的buffer的总长度
//2.加上itr至buffer末尾的长度
//3.加上x至buffer开头的长度
difference_type
operator- (const self& x) const
{
    return difference_type(buffer_size()) * (node - x.node - 1) + (cur - first) + (x.last - x.cur);
    //buffer size * 首尾buffer之间的buffer之间的数量 + 末尾（当前）buffer的元素量 + 起始buffer的元素量
}

//-- 和++ 的操作符重载
self& operator++()
{
    ++cur;  //切换至下一个元素
    if(cur == last){  //如果抵达缓冲区的末尾
        set_node(node + 1);  //就跳至下一个节点（缓冲区）的起点
        cur = first;  
    }
    return *this;
}
self operator++(int)
{
    self tmp = *this;
    ++*this;
     return tmp;
}

self& operator--()
{
    if(cur == first){
        set_node(node - 1);
        cur = last;
    }
    --cur;
    return *this;
}
self operator--(int)
{
    self tmp = *this;
    --*this;
    return tmp;
}

void set_node(map_pointer new_node)
{
    node = new_node;
    first = *new_node;
    last = first + difference_type(buffer_size));
}

self& operator+=(difference_type n ){
    difference_type offset = n + (cur - first);
    if(offset >= 0 && offset < difference_type(buffer_size())
    //目标位置在同一级缓存区
         cur += n;
     else{
       //目标位置不在同一级缓存区内
         difference_type node_offset = offset > 0? offset / difference_type(buffer_size()): -difference_type((-offset - 1) / buffer_size;
          //切换至正确的的缓存区
          set_node(node + node_offset);
          cur = first + (offset - node_offset * difference_type(buffser_size());
      }
      return *this;
}

operator+(difference_type n) const 
{
     self tmp = *this;
     return tmp += n;
}

self& operator-=(difference_type n)
{
    return *this += - n;
}
self operator-(difference_type n)
{
    self tmp = *this;
    return tmp -= n;
}
reference operator[] (difference_type n)const
{
    return *(*this + n);
}

GNU4.9版本中的版本

UML

容器 queue

内部维护一个deque，开放部分功能实现先进先出。

template <class T, class Sequence = deque<T>>
class queue
{
............
public:
    typedef typename Sequence::value_type value_type
    typedef typename Sequence::size_type size_type
    typedef typename Sequence::reference reference;
    typedef typename Sequence::const_reference const_reference;
protected:
    Sequence c;  //底层容器
 public:
    bool empty() const{return c.empty();}
    size_type size() const{return c.size();}
    reference front() const {return c.front();}
    const_reference front() const{ return c.front();}
    reference back(){return c.back(); }
    const_reference back() const {return c.back();}
    void push (const value_type& x){ c.push_back(); }
    void pop(){c.pop.front();}
}

容器 stack

内部维护一个deque，开放部分功能实现先进先出。

template <class T, class Sequence = deque<T>>
class stack
{
............
public:
    typedef typename Sequence::value_type value_type
    typedef typename Sequence::size_type size_type
    typedef typename Sequence::reference reference;
    typedef typename Sequence::const_reference const_reference;
protected:
    Sequence c;  //底层容器
 public:
    bool empty() const{return c.empty();}
    size_type size() const{return c.size();}
    reference top() const {return c.back();}
    const_reference top() const{ return c.back();}
    void push (const value_type& x){ c.push_back(); }
    void pop(){c.pop.back();}
}

stack和queue都可以选择list或deque作为底层结构
queue不可以选择vector作为底层结构（vector中没有pop_front()函数，提供给pop()函数来调用）
stack和queue都不允许遍历，也不提供iterator

rb-tree（红黑树）

红黑树

Red-Black tree（红黑树）是平衡二叉查找树（balanced Binary search tree）的一种，平衡二叉搜索树的特点：排列规律，方便查找和插入，并能够保持适度的平衡，不会产生深度很深的子树

rb_tree提供“遍历”的操作和iterator，按正常规则（++ite）遍历，便能够获得排序后的状态。

我们不应该使用rb_tree的iterator来改变元素的值（因为元素尤其严谨的排列规则）。编程层面并没有禁止这样做，但是如果设计是正确的，因为rb_tree即将为set和map服务（作为其底部支持），而map允许元素的data改变，只有元素的key啊此时不可以改变的。

rb_tree提供两种插入方式：insert_unique()和insert_equal()前者表示节点的key一定在整个tree中独一无二，否则安插失败；第二个表示节点key可以重复。

标准库对rb_tree的实现

template <class Key,
                class Value,
                class KeyOfValue,
                class Compare,
                class Alloc = alloc>
class rb_tree{
protected:
    typedef __rb_tree_node<Value> rb_tree<node;
    .....
public:
    typedef rb_tree_node* link_type;
......
protected:
    //rb_tree只以三种数据表现自己
    size_type node_count;  //rb_tree的大小
    link_type header;  //一个rb_tree_node的指针
    Compare key_compare;  //key的大小比较，应该是function object
     ..........
};

GNU4.9之后的rb_tree的结构

UML

容器set、multiset

set和multiset以rb_tree为底层结构，因此有“元素自动排序的功能”。特性：排序的依据是key，而set和multiset元素的key和value合一，value就是key；

set和multiset提供“遍历”的操作以及iterator，按正常规则（++ite）遍历，便能够获得排序后的状态

我们无法使用set和multiset的iterator改变元素的值（因为，key尤其严谨的排列规则），set和multiset的iterator是底部的红黑树的const_iterator，就是为了禁止开发者对元素进行赋值操作的。

set的元素的key必须独一无二，因此insert()使用的是红黑树的insert_unique()
multiset元素的key可以重复，因此使用insert()使用的是红黑树的insert_equal()

template <class Key, class Compare = less<Key>, class Alloc = alloc>
class set{
public:
      //typedefs:
      typedef Key key_type;
      typedef Key value_type;
      typedef Compare key_compare;
      typedef Compare value_compare;
private:
    typedef rb_tree<key_type, value_type, identity<value_type<. key_compare, Alloc> rep_type;
     rep_type t;
 public:
      typedef typename rep_type::const_iterator iterator;  //此处为rep_type::const_iterator，所以不能够修改
..........
//set的所有操作，都调用底层rb_tree的函数，从这点看来，set实际应该为container adapter
}

set到rb_tree的模版

multiset和set的原理基本一样，调用插入部分有些区别

容器map和multimap

map和multimap以rb_tree为底层机构，因此，有“元素自动排列的特性”，排序的依据是key。

map和multimap提供遍历的操作和iterator。按正常的++ite，就可以得到排序后的结果

我们无法使用map和multimap的iterator改变元素的key（因为key尤其严谨的排序规则），但可以用它来改变元素的data。因此map和multimap内部自动将开发者指定的key type设置为const，以此便能够禁止开发者对元素的key进行赋值。

map元素的key必须为独一无二的，因此insert()用的是红黑树的insert_unique()
multimap元素的key可以重复，因此insert()用的是红黑树的insert_equal()

template <class Key,class T,  class Compare = less<Key>, class Alloc = alloc>
class map{
public:
      //typedefs:
      typedef Key key_type;
      typedef T data_type;
      typedef T mapped_type;
      typedef pair<const Key, T> value_type;
      typedef Compare value_compare;
private:
    typedef rb_tree<key_type, value_type, select1st<value_type>, key_compare, Alloc> rep_type;
     rep_type t;
 public:
      typedef typename rep_type::const_iterator iterator;  //此处为rep_type::const_iterator，所以不能够修改
..........
//set的所有操作，都调用底层rb_tree的函数，从这点看来，set实际应该为container adapter
}

map通过重载了操作符[]来实现通过key来查找元素，multimap的key可以重复，那么没有这个方法
如果对于在空间中不存在的key使用[] 赋值，会在map中自动添加该元素

hashtable（哈希表）

为了方便管理元素，对于多个元素来说么可以除以可放元素的空间的大小，得到的余数来作为存放元素的编号。对于不同元素放在同一个位置，会产生碰撞的情况，为了避免这个问题，采用了链表来组织重复的元素，这样就可以避免了一个空间存放多个元素的问题（separate chaining）。

hash表的内存示意图

如此管理原始，解决了空间上的问题，但是如果出现了链表过长的情况，遍历链表的过程反而消耗了大量的时间，效率会变得低下，那么应该如何解决呢？

对于存放链表的空间，把他们成为篮子，那么，凭借经验值得出来，如果当元素的个数多过篮子的个数时，就将篮子的数量进行调整，重新分配元素的位置（rehashing）

篮子的个数，一般都是采用质数来做篮子的数量，当扩充时，会选择原有篮子数量的两倍以上的质数，作为篮子的个数

template<class Value, class Key, class HashFcn, class ExtractKey, class EqualKey, class Alloc = alloc>
class hashtable{
public:
      typedef HashFcn hasherl
      typedef EqualKey key_equal;
      typedef size_t size_type;
private:
      hasher hash;
      key_equal equals;
      ExtractKey get_key;
      typedef __hashtable_node<Value> node;
      vector<node*, Alloc> buckets;
      size_type num_elements;
public:
      size_type bucket_count() const{return buckets.size();}
...........
};

template<class Value, class Key, class HashFcn, class ExtractKey, class EqualKey, class Alloc = alloc>
struct __hashtable_iterator{
    .....
    node* cur;
    hashtable* ht;
};

template<class Value>
struct __hashtable_node{
  __hashtable_node* next;
  Value val;
};

hash Function的问题（HashFcn）
目的：希望根据元素值酸楚一个hash code（一个可进行modulus运算的值），使得元素经hash code映射之后能够“够乱够随机”的被置于hashtable中，越混乱，越不容易发生碰撞。

//泛化
template<class Key> struct hash{};
//特化
#define __STL_TEMPLATE_NULL template<>

__STL_TEMPLATE_NULL struct hash<char> {
    size_t operator()(char x) const {return x;}
};
__STL_TEMPLATE_NULL struct hash<short> {
    size_t operator()(short x) const {return x;}
};
__STL_TEMPLATE_NULL struct hash<unsigned short> {
    size_t operator()(unsigned short x) const {return x;}
};
.......

//char* 的hash function的特化版本
//标准库没有提供hash<std:string>的版本
__STL_TEMPLATE_NULL struct hash<char *> {
    size_t operator()(char* x) const {return __stl_hash_string(x);}
};
inline size_t __stl_hash_string(const char* s)
{
    unsigned long h = 0;
    for(; *s; ++s){
        h = 5 * h + *s;
    }
    return size_t(h);
}

c++字符串（string）使用hash table为底层的容器，需要重写hash function

unordered_set(hash_set)和unordered_multiset(hash_multiset)、unordered_map(hash_map)和unordered_multimap(hash_multimap)

这些容器的底层都是hashtable，因此这些容器的篮子一定大于元素的个数

(Boolan) C++ STL与泛型编程——容器2