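This error is an old nemesis. It is almost always caused by the Spark and Scala versions used at compile time not matching the ones in the runtime environment, yet I still step into it every so often. Here is a record of this occurrence.
Error log
While developing and debugging ccp locally, one deployment went wrong: both application submission and interactive operation of the application failed with this error: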
WARN ] 2020-07-24 16:13:09,468(252275) --> [SchedulerFactory4] com.cgws.ccp.interactive.socket.InteractiveServer.onStatusChange(InteractiveServer.java:379): Job section_1595578364911_390576819 is finished, status: ERROR, exception: null, result: %text java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.kafka010.KafkaSourceProvider could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at com.cgws.ccp.spark.util.ExternalTools$.loadJDBCReader(ExternalTools.scala:189)
at com.cgws.ccp.spark.util.ExternalTools$.readJDBCMysql(ExternalTools.scala:205)
at com.cgws.ccp.spark.util.ExternalTools$.readJDBC(ExternalTools.scala:122)
at com.cgws.ccp.spark.util.SparkUtils$.loadJdbcSource(SparkUtils.scala:341)
at com.cgws.ccp.spark.util.SparkUtils$.loadSource(SparkUtils.scala:297)
at com.cgws.ccp.spark.job.SparkScript$$anonfun$createTempViewForExternalSource$1.apply(SparkScript.scala:89)
at com.cgws.ccp.spark.job.SparkScript$$anonfun$createTempViewForExternalSource$1.apply(SparkScript.scala:80)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at com.cgws.ccp.spark.job.SparkScript.createTempViewForExternalSource(SparkScript.scala:79)
at com.cgws.ccp.spark.job.SparkScript$$anonfun$post$2.apply(SparkScript.scala:62)
at com.cgws.ccp.spark.job.SparkScript$$anonfun$post$2.apply(SparkScript.scala:53)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at com.cgws.ccp.spark.job.SparkScript.post(SparkScript.scala:53)
at com.cgws.ccp.spark.interpreter.CCPSparkSqlInterpreter.internalInterpret(CCPSparkSqlInterpreter.scala:72)
at com.cgws.ccp.interpreter.interpreter.AbstractInterpreter.interpret(AbstractInterpreter.java:47)
at com.cgws.ccp.interpreter.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:110)
at com.cgws.ccp.interpreter.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:685)
at com.cgws.ccp.interpreter.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:578)
at com.cgws.ccp.interpreter.scheduler.Job.run(Job.java:172)
at com.cgws.ccp.interpreter.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:130)
at com.cgws.ccp.interpreter.scheduler.ParallelScheduler.lambda$runJobInScheduler$0(ParallelScheduler.java:39)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.apache.spark.internal.Logging.$init$(Lorg/apache/spark/internal/Logging;)V
at org.apache.spark.sql.kafka010.KafkaSourceProvider.<init>(KafkaSourceProvider.scala:44)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
... 43 more
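The Caused by line is the real clue. Scala 2.12 compiles a trait such as org.apache.spark.internal.Logging into an interface carrying a static $init$ method, whereas Scala 2.11 keeps that method in a separate Logging$class. Which encoding a given spark-core jar uses can be checked with javap; the sketch below, assuming the JDK's javap is on the PATH, classifies its output (the function name is hypothetical).

```shell
# Classify the trait encoding of org.apache.spark.internal.Logging from javap output on stdin.
# Scala 2.12: the interface itself declares "static void $init$(Logging)".
# Scala 2.11: $init$ lives in Logging$class, so the interface does not declare it.
trait_encoding() {
  if grep -q 'static void \$init\$(org\.apache\.spark\.internal\.Logging)'; then
    echo "scala-2.12"
  else
    echo "scala-2.11-or-older"
  fi
}
```

For example, javap -cp spark-core_2.11-2.4.3.jar org.apache.spark.internal.Logging | trait_encoding, where the jar file name only stands in for whichever spark-core jar is on the runtime classpath. A scala-2.11-or-older result on the runtime side, combined with a NoSuchMethodError on Logging.$init$, means some class was compiled for 2.12.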
Analysis
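- 1 Immediately checked the Spark and Scala versions in the code: no problem, 2.4.3 + 2.11, the same as the deployed Spark;
- 2 There had been some code changes in between; after reverting the code and re-packaging and redeploying the project, everything ran fine;
- 3 Packaged the Spark uber jar by hand; after swapping it into the deployment, the problem came back;
At this point the uber jar was highly suspect: it differed from the uber jar produced by the full build.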
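- 4 Compare the contents of the two jars with jar -tf:
their Scala versions differ.
Opening the uber jar showed both spark-*_2.11 and spark-*_2.12 dependencies side by side, which is what produces the error in this post. The trace itself confirms it: the outer frames such as scala.collection.Iterator$class.foreach use Scala 2.11's trait encoding, while KafkaSourceProvider.<init> calls the static Logging.$init$ that only Scala 2.12 generates, so a 2.12 build of the Kafka source was running against a 2.11 Spark runtime.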
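Step 4 can be partly automated. The sketch below assumes the uber jar keeps the META-INF/maven/<groupId>/<artifactId>/ entries that Maven-built jars normally carry (a shade or assembly filter may strip them) and prints the distinct Scala suffixes of the bundled Spark artifacts; the function name is hypothetical.

```shell
# Read a jar listing (e.g. from "jar -tf") on stdin and print each distinct
# Scala binary suffix found among bundled org.apache.spark artifacts.
spark_scala_suffixes() {
  grep -oE '^META-INF/maven/org\.apache\.spark/[^/]+_2\.1[0-3]/' \
    | sed -E 's#.*_(2\.1[0-3])/$#\1#' \
    | sort -u
}
```

Piping jar -tf ccp-spark-uber.jar into spark_scala_suffixes (jar name hypothetical) should print a single line; two lines, as in this incident, mean 2.11 and 2.12 artifacts were mixed. To compare two jars entry by entry, bash process substitution works: diff <(jar -tf old.jar | sort) <(jar -tf new.jar | sort).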
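A classic Maven dependency problem. The remedy is twofold: 1) keep the Maven dependencies in the code under control, and 2) keep the local repository under control.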
- 5 Analyze the Maven dependencies, checking where the 2.12 and 2.11 variants of spark-tags come from:
mvn dependency:tree -Dverbose -Dincludes=org.apache.spark:spark-tags_2.12
mvn dependency:tree -Dverbose -Dincludes=org.apache.spark:spark-tags_2.11
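When the unfiltered tree is long, it helps to know which direct dependency drags in the 2.12 artifacts. The following is a sketch that reads dependency:tree text output and prints the direct dependencies whose subtree contains an artifactId with a given suffix. It assumes the default tree layout, where direct dependencies start with +- or \- right after the [INFO] prefix; the function name is hypothetical.

```shell
# Read "mvn dependency:tree" output on stdin and print each direct dependency
# whose subtree (or which itself) contains an artifactId ending in $1, e.g. _2.12.
direct_deps_pulling() {
  awk -v suffix="$1" '
    { sub(/^\[INFO\] /, "") }                  # drop the Maven log prefix
    match($0, /[+\\]- /) {                      # tree node: "+- g:a:..." or "\- g:a:..."
      depth1 = (RSTART == 1)                    # no indentation means a direct dependency
      coord = substr($0, RSTART + 3)
      sub(/^\(/, "", coord)                     # -Dverbose wraps omitted nodes in "(...)"
      split(coord, c, ":")                      # groupId:artifactId:type:version:scope
      if (depth1) top = c[1] ":" c[2]
      a = c[2]
      if (length(a) >= length(suffix) &&
          substr(a, length(a) - length(suffix) + 1) == suffix && !(top in seen)) {
        seen[top] = 1
        print top
      }
    }
  '
}
```

In the ccp-spark module this would be used as mvn -B dependency:tree -Dverbose | direct_deps_pulling _2.12; batch mode keeps the output plain.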
- 6 Check the Spark dependencies of the ccp-spark module, looking for spark-*_2.12:
<!-- Spark dependency start -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.binary.version}</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.binary.version}</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.binary.version}</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_${scala.binary.version}</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<!-- Spark dependency end -->
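This relies on the scala.binary.version inherited from the parent pom, which we had already checked at the very start: it is 2.11. So the next place to look is the local Maven repository.
- 7 Check the Maven local repo: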
vim /Users/apple/.m2/repository/com/cgws/ccp/ccp/1.1.0/ccp-1.1.0.pom
It turned out to differ from the pom in our code, an old Maven build pitfall:
<!-- scala -->
<scala.binary.version>2.12</scala.binary.version>
<scala.version>2.12.8</scala.version>
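Step 7 boils down to comparing one property between two files. A minimal sketch, assuming scala.binary.version is declared on a single line as in the excerpt above; the function names are hypothetical.

```shell
# Print the first <scala.binary.version> value declared in the given pom file.
pom_scala_binary_version() {
  sed -n 's#.*<scala\.binary\.version>\([^<]*\)</scala\.binary\.version>.*#\1#p' "$1" | head -n 1
}

# Compare the parent pom in the source tree ($1) with the copy installed in the
# local repository ($2); returns non-zero on a mismatch.
compare_parent_pom() {
  local src dst
  src=$(pom_scala_binary_version "$1")
  dst=$(pom_scala_binary_version "$2")
  if [ "$src" = "$dst" ]; then
    echo "OK: scala.binary.version=$src"
  else
    echo "MISMATCH: source=$src installed=$dst"
    return 1
  fi
}
```

Run from the project root, compare_parent_pom pom.xml /Users/apple/.m2/repository/com/cgws/ccp/ccp/1.1.0/ccp-1.1.0.pom would report the 2.11 versus 2.12 mismatch directly.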
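Root cause
Earlier, while debugging, I had switched Spark to 3.0.0 with Scala 2.12 and had probably run an install at that point, which put that version of the parent pom into the local repository. Any build that resolved the parent from the local repository therefore got scala.binary.version 2.12 instead of the 2.11 defined in the code.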
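Once the stale parent pom is identified, the usual remedy is to remove it from the local repository or reinstall it from the current code. A sketch, assuming the default local repository location (a custom localRepository in settings.xml would change it); local_repo_dir is a hypothetical helper, and the recovery commands are left as comments so nothing is deleted by accident.

```shell
# Map Maven coordinates to their directory in a local repository with the default
# layout: dots in the groupId become directories, followed by artifactId/version.
# The optional 4th argument overrides the default ~/.m2/repository location.
local_repo_dir() {
  local group_path
  group_path=$(printf '%s' "$1" | tr '.' '/')
  printf '%s\n' "${4:-$HOME/.m2/repository}/$group_path/$2/$3"
}

# Suggested recovery (run manually, not executed here):
#   rm -rf "$(local_repo_dir com.cgws.ccp ccp 1.1.0)"   # drop the stale parent pom
#   mvn -N install                                       # from the project root: reinstall only the parent pom
#   then rebuild the uber jar as usual
```

mvn -N (--non-recursive) installs just the root pom without building the modules, which is enough to replace the 2.12 parent left over from the Spark 3.0.0 experiment.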