Installing Spark 2.1.0 with MongoDB 4.0.9
The installation itself is straightforward; see https://www.cnblogs.com/hanson1/p/7105288.html and https://www.jianshu.com/p/080324f87b3f for reference.
Following those articles, run:
/u01/spark-2.1.0-bin-hadoop2.7/bin/spark-shell \
--conf "spark.mongodb.input.uri=mongodb://127.0.0.1/jinrong.chuxu?readPreference=primaryPreferred" \
--conf "spark.mongodb.output.uri=mongodb://127.0.0.1/jinrong.chuxu" \
--packages org.mongodb.spark:mongo-spark-connector_2.11:2.1.5
This fails with: You probably access the destination server through a proxy server that is not well configured.
The --packages option resolves the connector from a remote repository at startup, so the command needs network access. Could the jars be downloaded separately and supplied directly instead?
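If the machine can only reach the outside world through a proxy, another option is to point the dependency resolution behind --packages at that proxy via standard JVM proxy properties. A hedged sketch; the proxy host and port are placeholders, not values from this setup:

```shell
# Placeholders: replace proxy.example.com:3128 with the real proxy address.
# SPARK_SUBMIT_OPTS passes extra JVM options to the spark-shell launcher JVM,
# which is where the --packages dependencies are resolved.
export SPARK_SUBMIT_OPTS="-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=3128 \
-Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=3128"
/u01/spark-2.1.0-bin-hadoop2.7/bin/spark-shell \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.1.5
```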
With the jars downloaded by hand, --jars takes a single comma-separated list (no spaces around the commas), and --packages is dropped so that nothing is fetched over the network:
/u01/spark-2.1.0-bin-hadoop2.7/bin/spark-shell \
--jars /u01/spark-2.1.0-bin-hadoop2.7/bin/ojdbc6.jar,\
/u01/spark-2.1.0-bin-hadoop2.7/bin/mongo-spark-connector_2.11-2.1.5.jar,\
/u01/spark-2.1.0-bin-hadoop2.7/bin/mongo-scala-driver_2.11-2.1.0.jar,\
/u01/spark-2.1.0-bin-hadoop2.7/bin/mongo-java-driver-3.4.2.jar \
--conf "spark.mongodb.input.uri=mongodb://127.0.0.1/jinrong.chuxu?readPreference=primaryPreferred" \
--conf "spark.mongodb.output.uri=mongodb://127.0.0.1/jinrong.chuxu"
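Before loading any data, it is worth checking that the jars actually made it onto the driver classpath. A small sanity check inside the REPL (class names are from the connector and the MongoDB Java driver shipped above):

```scala
// Inside spark-shell: Class.forName throws ClassNotFoundException
// if the corresponding jar was not picked up by --jars.
Class.forName("com.mongodb.spark.MongoSpark")   // mongo-spark-connector
Class.forName("com.mongodb.MongoClient")        // mongo-java-driver
```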
The remaining driver jars can also be loaded from inside the running shell with :require:
:require /u01/spark-2.1.0-bin-hadoop2.7/bin/bson-3.4.2.jar
:require /u01/spark-2.1.0-bin-hadoop2.7/bin/mongo-scala-driver_2.11-2.1.0.jar
:require /u01/spark-2.1.0-bin-hadoop2.7/bin/mongo-java-driver-3.4.2.jar
:require /u01/spark-2.1.0-bin-hadoop2.7/bin/mongodb-driver-3.6.4.jar
The shell now starts normally. Next, try loading from MongoDB:
import com.mongodb.spark._
import org.bson.Document
MongoSpark.load(sc).take(10).foreach(println)
This raises:
Mongo Spark error: overloaded method value load with alternatives:
required: org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext
(The repeated org.apache.spark. prefix suggests the connector classes loaded via :require see a SparkContext type from a different classloader than the REPL's sc, so none of the load overloads match.)
The problem finally went away after downloading the following packages and placing them under the ~/.ivy directory.
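Once the connector and its dependencies resolve correctly, the load can also be made explicit with a ReadConfig instead of relying only on the global spark.mongodb.input.uri. A minimal sketch, assuming the spark-shell session configured above (jinrong and chuxu are the database and collection names used earlier):

```scala
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig

// Per-load configuration; falls back to the session-wide settings
// (ReadConfig(sc)) for anything not overridden here.
val readConfig = ReadConfig(
  Map("database" -> "jinrong",
      "collection" -> "chuxu",
      "readPreference.name" -> "primaryPreferred"),
  Some(ReadConfig(sc)))

val rdd = MongoSpark.load(sc, readConfig)
rdd.take(10).foreach(println)
```

This keeps the collection choice in code, which is handy when one shell session reads from several collections.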