It starts from an empty seed set S = ∅,and then iteratively adds into S the node u that leads to the largest increase in E[Ⅰ(S)],until |S| = k.That is,
u = arg max (E[Ⅰ(S∪{v})] - E[Ⅰ(S)]), v∈V.
To address this issue,Kempe et al. propose to estimate E[Ⅰ(S)] to a reasonable accuracy using a Monte Carlo method.To explain,remove the edge with 1 - p(e) probability. Let g be the resulting graph,and R(S) be the set of nodes in g that are reachable from S.Kempe et al. prove that the expected size of R(S) equals E[Ⅰ(S)]. Therefore, to estimate E[Ⅰ(S)],we can first generate multiple instances of g, then measure R(S) on each instance, and finally take the average measurement as an estimation of E[Ⅰ(S)].
Assume that we take a large number r of measurements in the estimation of each E[Ⅰ(S)]. Then, with a high probability, Greedy yields a (1 - 1/e - ε)-approximate solution under the IC model,where ε is a constant that depents on both G and r.In general, Greedy achieves the same approximation ratio under any cascade model where E[Ⅰ(S)] is a submodular function of S.
Although Greedy is general and effective, it incurs significant computation overheads due to its O(kmnr) time complexity.Specifically, it runs in k iterations, each of which requires estimating the expected spread of O(n) node sets. In addition, each estimation of expected spread takes measurements on r graphs, and each measurement needs O(m) time.There lead to an O(kmnr) total running time.