Setting up a Spark + MongoDB big-data processing environment on an enterprise intranet

Versions installed: Spark 2.1.0, MongoDB 4.0.9.

The installation itself is straightforward; see https://www.cnblogs.com/hanson1/p/7105288.html and https://www.jianshu.com/p/080324f87b3f for reference.

Following those articles, start the shell with the connector fetched via --packages:

/u01/spark-2.1.0-bin-hadoop2.7/bin/spark-shell \
--conf "spark.mongodb.input.uri=mongodb://127.0.0.1/jinrong.chuxu?readPreference=primaryPreferred" \
--conf "spark.mongodb.output.uri=mongodb://127.0.0.1/jinrong.chuxu" \
--packages org.mongodb.spark:mongo-spark-connector_2.11:2.1.5
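The two spark.mongodb.*.uri settings are all the connector needs to locate the source and sink collections. For a standalone application rather than spark-shell, the same configuration can be set when building the SparkSession; a minimal sketch (the app name is made up for illustration):

import org.apache.spark.sql.SparkSession

// Programmatic equivalent of the --conf flags above; spark-shell already
// provides `spark` and `sc`, so this is only needed outside the shell.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("MongoSparkIntranetTest")  // hypothetical app name
  .config("spark.mongodb.input.uri",
    "mongodb://127.0.0.1/jinrong.chuxu?readPreference=primaryPreferred")
  .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/jinrong.chuxu")
  .getOrCreate()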


Running the spark-shell command above on the intranet, however, fails with: You probably access the destination server through a proxy server that is not well configured.

The failure is because --packages needs network access to pull the artifact from the central repository. Could we instead point the shell at local jar files?

/u01/spark-2.1.0-bin-hadoop2.7/bin/spark-shell \
--jars /u01/spark-2.1.0-bin-hadoop2.7/bin/ojdbc6.jar,\
/u01/spark-2.1.0-bin-hadoop2.7/bin/mongo-spark-connector_2.11-2.1.5.jar,\
/u01/spark-2.1.0-bin-hadoop2.7/bin/mongo-scala-driver_2.11-2.1.0.jar,\
/u01/spark-2.1.0-bin-hadoop2.7/bin/mongo-java-driver-3.4.2.jar \
--driver-class-path "/u01/spark-2.1.0-bin-hadoop2.7/bin/*" \
--conf "spark.mongodb.input.uri=mongodb://127.0.0.1/jinrong.chuxu?readPreference=primaryPreferred" \
--conf "spark.mongodb.output.uri=mongodb://127.0.0.1/jinrong.chuxu"

(Three fixes over the first attempt at this command: the --jars list must be comma-separated with no surrounding spaces; a bare directory on --driver-class-path only picks up .class files, so a /* wildcard is needed to include jars; and the output collection is jinrong.chuxu, matching the input, not jinrong.chuxun. --packages is dropped here, since avoiding network access is the whole point.)


Any driver jars that are still missing can be loaded from inside the REPL with :require:

:require /u01/spark-2.1.0-bin-hadoop2.7/bin/bson-3.4.2.jar
:require /u01/spark-2.1.0-bin-hadoop2.7/bin/mongo-scala-driver_2.11-2.1.0.jar
:require /u01/spark-2.1.0-bin-hadoop2.7/bin/mongo-java-driver-3.4.2.jar
:require /u01/spark-2.1.0-bin-hadoop2.7/bin/mongodb-driver-3.6.4.jar


The shell now starts normally. Next, try reading from the collection:

import com.mongodb.spark._
import org.bson.Document

MongoSpark.load(sc).take(10).foreach(println)

This fails with:

Mongo Spark error: overloaded method value load with alternatives:
required: org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext

The strange repeated package prefix in the required type hints at the cause: classes pulled in with :require live in a different REPL classloader than the already-created sc, so the SparkContext type the connector sees no longer matches the one in scope, and overload resolution on MongoSpark.load fails.


In the end the fix was to place the required packages into the local Ivy cache under ~/.ivy2 (the directory --packages resolves against by default; it can be changed with spark.jars.ivy). With the artifacts cached there, the original --packages command starts cleanly without network access.
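With the environment working, a minimal end-to-end session looks like this (a sketch against the jinrong.chuxu collection configured above; the sample write document is made up for illustration):

import com.mongodb.spark._
import org.bson.Document

// Read: MongoSpark.load(sc) builds its ReadConfig from
// spark.mongodb.input.uri supplied via --conf.
val rdd = MongoSpark.load(sc)
println(s"document count = ${rdd.count()}")
rdd.take(10).foreach(println)

// Write: MongoSpark.save targets spark.mongodb.output.uri.
val sample = sc.parallelize(Seq(Document.parse("{ test: 1 }")))  // hypothetical test document
MongoSpark.save(sample)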
