1、Code
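// Build an RDD of 10 strings (sc is an existing SparkContext, e.g. in spark-shell)
val samplerdd = sc.makeRDD(Array(
  "spark1", "spark2", "spark3", "spark4", "spark5",
  "hadoop1", "hadoop2", "hadoop3", "java4", "java5"
))
// Sample without replacement; each element is kept with probability 0.3
samplerdd.sample(false, 0.3).foreach(println)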
2、Result (varies from run to run, since no seed is given)
spark4
hadoop2
java5
3、sample
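sample(withReplacement: Boolean, fraction: Double, seed: Long)
withReplacement: whether to sample with replacement
true means that after element A is drawn, A can be drawn again
false means that after element A is drawn, A cannot be drawn again
fraction: the sampling ratio. Without replacement it is the probability that each element is selected (0 to 1); with replacement it is the expected number of times each element is selected (0 or greater). The size of the result is not guaranteed to be exactly fraction times the element count.
seed: the seed for the random number generator used by the sampling algorithm. It is optional; passing a fixed seed makes the sample reproducible for the same RDD and partitioning.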
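
To make the three parameters concrete, here is a minimal plain-Scala sketch of the same semantics on a local Seq. It is an illustration only, not Spark's actual implementation: the names SampleSketch, sampleLocal and poisson are hypothetical helpers made up for this note, and no Spark dependency is needed to run it. Without replacement, each element is kept independently with probability fraction; with replacement, each element is emitted a Poisson-distributed number of times with mean fraction.

```scala
import scala.util.Random

// Hypothetical local stand-in for RDD.sample; NOT Spark's own code.
object SampleSketch {

  def sampleLocal[T](data: Seq[T],
                     withReplacement: Boolean,
                     fraction: Double,
                     seed: Long): Seq[T] = {
    // The same seed always produces the same sequence of random numbers.
    val rng = new Random(seed)
    if (withReplacement) {
      // An element may appear 0, 1, 2, ... times; the expected count is `fraction`.
      data.flatMap(x => Seq.fill(poisson(fraction, rng))(x))
    } else {
      // An element appears at most once; it is kept with probability `fraction`.
      data.filter(_ => rng.nextDouble() < fraction)
    }
  }

  // Draw from a Poisson distribution with the given mean (Knuth's method,
  // adequate for the small means used in this sketch).
  private def poisson(mean: Double, rng: Random): Int = {
    val limit = math.exp(-mean)
    var k = 0
    var p = rng.nextDouble()
    while (p > limit) {
      k += 1
      p *= rng.nextDouble()
    }
    k
  }
}
```

Calling SampleSketch.sampleLocal(data, false, 0.3, 42L) twice on the same data returns the same elements both times, which is the point of seed. Changing the seed, or leaving the choice of seed to chance as in the code in section 1, gives a different subset, and the number of elements returned is only about 30% of the input on average.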