蓄水池抽样系列问题

1.链表以等概率的方式随机返回节点

/**
 * Definition for singly-linked list.
 * struct ListNode {
 *     int val;
 *     ListNode *next;
 *     ListNode() : val(0), next(nullptr) {}
 *     ListNode(int x) : val(x), next(nullptr) {}
 *     ListNode(int x, ListNode *next) : val(x), next(next) {}
 * };
 */
class Solution {
public:
    Solution(ListNode* head) {
        this->node = head;

    }
    
    int getRandom() {
        int count = 1;
        int ans = 0;
        while(node) {
            if (random()%count == 0) {
                ans = node->val;
            }
            ++count;
            node = node->next;
        }
        return ans;
    }
private:
    ListNode* node;
};

/**
 * Your Solution object will be instantiated and called as such:
 * Solution* obj = new Solution(head);
 * int param_1 = obj->getRandom();
 */

2.等概率选择K个节点

for(int i = 0; i < k; ++i)
  R[i] = L[i];
for(int i = k; i < n; ++i) {
  j = rand.nextInt(i+1);
  if(j < k)
    R[j] = L[i];
}
return R;

3.分布式 https://ballsandbins.wordpress.com/2014/04/13/distributedparallel-reservoir-sampling/

假设存在A和B两条数据流，A的长度为m，B的长度为n，且m和n都远大于k。
可以使用问题2中的采样算法分布得到R和S，接下来要考虑如何融合R和S。
假设选择R的概率p=m/(m+n)，反之不选择的概率为1-p。
可以得出选择R的概率为 1/k*k/m * m/(m+n) = 1/(m+n)

for(sub-stream s: sub-streams) do in parallel {
  simple sequential reservoir sampling and count length of s;
}
double p = (double) m / (m+n);
for(int i = 0; i < k; ++i){
  j = rand.nextDouble();
  if(j <= p)
    move a random item from R to T;
  else
    move a random item from S to T;
}
return T;

蓄水池抽样系列问题

推荐阅读更多精彩内容