I exported a Java program as a jar with Eclipse's Maven plugin, uploaded it to a Spark cluster, and ran it with a shell script. The job failed with this error:
19/05/10 16:25:18 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.AbstractMethodError:cn.spark.study.core.WordCountCluster$1.call(Ljava/lang/Object;)Ljava/util/Iterator;
Cause:
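An AbstractMethodError like this one, or a variant such as java.lang.AbstractMethodError:sparkCore.JavaWordCount$2.call(Ljava/lang/Object;)Ljava/lang/Iterable;, points to conflicting method definitions: the class compiled into the jar implements call with one signature, while the Spark runtime invokes call with a different one, so the JVM finds no implementation for the method it is asked to run.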
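Concretely, the Spark version the program was built against differs from the Spark version on the cluster, for example a 1.6.* cluster running a program built against 2.1.*. Across such a large version gap some method definitions have changed. For instance, in Spark 2.0 the Java FlatMapFunction.call method was changed to return an Iterator instead of an Iterable, which matches the difference between the two signatures above. The Spark version in the pom therefore has to match the Spark version installed on the cluster.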
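Tracking down the cause is actually the easy part. The real pain is that a Spark release that updates or adds features may well break compatibility with earlier versions, clashing with their method and class definitions, and there is no obvious way to tell which versions of Hadoop, Spark, Scala and so on actually work together. All you can do is try combinations one by one. Below is what I found after several hours of testing; so far this combination definitely works: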
Scala SDK 2.11.11
spark-2.3.3-bin-hadoop2.7
With this combination, the program runs.
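A minimal sketch, assuming you want Maven itself to help keep this combination consistent: declare the versions once as properties in pom.xml and reference them from every Spark artifact, so a module added later is less likely to pick up a mismatched release. The property names spark.version, scala.binary.version and hadoop.version are my own choice, not something Maven or Spark requires. The _2.11 suffix has to match the Scala line the cluster's Spark was built for (the prebuilt spark-2.3.3-bin-hadoop2.7 package targets Scala 2.11).

```xml
<!-- Fragment of pom.xml: <properties> sits directly under <project> -->
<properties>
  <!-- Hypothetical property names; any names work if used consistently -->
  <scala.binary.version>2.11</scala.binary.version>
  <spark.version>2.3.3</spark.version>
  <hadoop.version>2.7.7</hadoop.version>
</properties>

<!-- Inside <dependencies>: each artifact reuses the same properties -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```

Moving to another cluster version then means changing spark.version in one place and rebuilding, instead of editing every dependency by hand.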
These are the compatible Maven dependencies:
<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.3</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.3</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.3.3</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.3.3</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.7</version>
  </dependency>
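  <!-- Kafka connector built for the same Spark release as spark-core (2.3.3).
       The 0-8 artifact keeps the org.apache.spark.streaming.kafka package of the
       old spark-streaming-kafka artifact, whose 1.5.2 release targets Spark 1.5 -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
    <version>2.3.3</version>
  </dependency>
</dependencies>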
<build>
  <sourceDirectory>src/main/java</sourceDirectory>
  <testSourceDirectory>src/main/test</testSourceDirectory>
  <plugins>
    <!-- Builds target/<artifactId>-<version>-jar-with-dependencies.jar during the package phase -->
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
          <manifest>
            <mainClass></mainClass>
          </manifest>
        </archive>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>exec-maven-plugin</artifactId>
      <version>1.4.0</version>
      <executions>
        <execution>
          <goals>
            <goal>exec</goal>
          </goals>
        </execution>
      </executions>
      <configuration>
        <executable>java</executable>
        <includeProjectDependencies>true</includeProjectDependencies>
        <includePluginDependencies>false</includePluginDependencies>
        <classpathScope>compile</classpathScope>
        <mainClass>cn.spark.sparktest.App</mainClass>
      </configuration>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.1</version>
      <configuration>
        <source>1.8</source>
        <target>1.8</target>
        <encoding>UTF-8</encoding>
      </configuration>
    </plugin>
  </plugins>
</build>
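Because the jar-with-dependencies assembly bundles every compile- and runtime-scope dependency, spark-core and spark-sql as listed above end up inside the jar next to the cluster's own Spark installation. Spark's "Submitting Applications" documentation recommends listing Spark and Hadoop as provided dependencies when building an assembly jar, since the cluster supplies them at runtime. The fragment below sketches that change for spark-core and spark-sql; it is a suggestion, not part of the setup I tested above. It does not replace matching versions: the code is still compiled against 2.3.3 and must run on a cluster with the matching Spark version.

```xml
<!-- Compiled against, but left out of the jar-with-dependencies jar:
     at runtime spark-submit puts the cluster's own Spark jars on the classpath -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.3.3</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.3.3</version>
  <scope>provided</scope>
</dependency>
```

Provided dependencies stay on Maven's compile classpath, so the code builds exactly as before; they are only omitted from the packaged jar, which also makes it considerably smaller to upload.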
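Build the project with Maven (the assembly is bound to the package phase, so for example mvn clean package), refresh the project in Eclipse, and take the jar from the target directory; with the configuration above it is the file ending in -jar-with-dependencies.jar. Upload it to the Linux server with WinSCP.
Finally, once the upload has finished, start the job with the shell script.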