1 Transformations介绍
Transformations(转换)
从之前的RDD构建一个新的RDD,像map()和filter()
map()
map()接收函数,把函数应用到RDD的每一个元素,返回新RDD
val lines=sc.parallelize(Array("hello","spark","hello","world","!")
lines.foreach(println)
val lines2 = lines.map(word=>(word,1))
lines2.foreach(println)
filter()
filter()接收函数,返回只包含满足filter()函数的元素的新RDD
val lines3=lines.filter(word=>word.contains("hello"))
lines3.foreach(println)
flatMap()
对每个输入元素,输出多个输出元素
flat压扁的意思,将RDD中元素压扁后返回一个新的RDD
val inputs=sc.textFile("/home/helloSpark.txt")
inputs.foreach(println)
val lines=inputs.flatMapt(line=>line.split(" "))
lines.foreach(println)
lines.foreach(print)
集合运算
val rdd1 = sc.parallelize(Array("coffe","coffe","panda","monkey","tea"))
rdd1.foreach(println)
val rdd2 =sc.parallelize(Array("coffe","monkey","kitty"))
rdd2.foreach(println)
val rdd_distinct=rdd1.distinct() #去重
rdd_distinct.foreach(println)
val rdd_union = rdd1.union(rdd2) #并集
rdd_union.foreach(println)
val rdd_inter=rdd1.intersection(rdd2) #交集
rdd_inter.foreach(println)
val rdd_sub=rdd1.subtract(rdd2) #包含
rd_sub.foreach(println)