1.链表以等概率的方式随机返回节点
/**
* Definition for singly-linked list.
* struct ListNode {
* int val;
* ListNode *next;
* ListNode() : val(0), next(nullptr) {}
* ListNode(int x) : val(x), next(nullptr) {}
* ListNode(int x, ListNode *next) : val(x), next(next) {}
* };
*/
class Solution {
public:
Solution(ListNode* head) {
this->node = head;
}
int getRandom() {
int count = 1;
int ans = 0;
while(node) {
if (random()%count == 0) {
ans = node->val;
}
++count;
node = node->next;
}
return ans;
}
private:
ListNode* node;
};
/**
* Your Solution object will be instantiated and called as such:
* Solution* obj = new Solution(head);
* int param_1 = obj->getRandom();
*/
2.等概率选择K个节点
for(int i = 0; i < k; ++i)
R[i] = L[i];
for(int i = k; i < n; ++i) {
j = rand.nextInt(i+1);
if(j < k)
R[j] = L[i];
}
return R;
3.分布式 https://ballsandbins.wordpress.com/2014/04/13/distributedparallel-reservoir-sampling/
假设存在A和B两条数据流,A的长度为m,B的长度为n,且m和n都远大于k。
可以使用问题2中的采样算法分布得到R和S,接下来要考虑如何融合R和S。
假设选择R的概率p=m/(m+n),反之不选择的概率为1-p。
可以得出选择R的概率为 1/k*k/m * m/(m+n) = 1/(m+n)
for(sub-stream s: sub-streams) do in parallel {
simple sequential reservoir sampling and count length of s;
}
double p = (double) m / (m+n);
for(int i = 0; i < k; ++i){
j = rand.nextDouble();
if(j <= p)
move a random item from R to T;
else
move a random item from S to T;
}
return T;