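1. Preface
Learning Spark happens on three levels:
1. Understand how it works in theory
2. Understand the execution process at the source-code level
3. Customize the source code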
2. Prerequisites
1. Windows 10
2. JDK 1.8
3. Apache Maven 3.6.3
4. Scala 2.11
5. Spark 2.2.0 source package
6. IntelliJ IDEA Community Edition 2020.2.4 x64
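Version mismatches are a common cause of confusing build errors, so it can help to confirm that the JVM and Scala library you run with match the list above. The following is a minimal sketch, not part of Spark. The object name `EnvCheck` is hypothetical, and it assumes only the standard `java.specification.version` system property and `scala.util.Properties`:

```scala
// Hypothetical helper: checks that the running JVM and Scala library
// match the versions this walkthrough assumes (JDK 1.8, Scala 2.11).
object EnvCheck {
  // "1.8" on JDK 8; later JDKs report "9", "11", and so on
  def javaSpecVersion: String = System.getProperty("java.specification.version")

  // Version of the Scala library on the classpath, e.g. "2.11.8"
  def scalaVersion: String = scala.util.Properties.versionNumberString

  // True only for a JDK 8 runtime combined with a Scala 2.11.x library
  def isExpected(javaSpec: String, scalaVer: String): Boolean =
    javaSpec == "1.8" && scalaVer.startsWith("2.11.")

  def main(args: Array[String]): Unit = {
    println(s"java.specification.version = $javaSpecVersion")
    println(s"Scala library version      = $scalaVersion")
    if (isExpected(javaSpecVersion, scalaVersion))
      println("OK: matches JDK 1.8 / Scala 2.11")
    else
      println("WARNING: versions differ from JDK 1.8 / Scala 2.11")
  }
}
```

Running it from the same IDEA run configuration you later use for the examples shows what that configuration actually picks up, which may differ from what is installed system-wide.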
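3. Steps
1. Extract the Spark source package, open it in IDEA, and import it as a Maven project
2. Configure Maven
Under File -> Settings -> Build, Execution, Deployment -> Build Tools -> Maven, point IDEA at the local Maven 3.6.3 installation and its local repository.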
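3. Change the download repository in the pom and download the dependencies
Replace the download repository with the Aliyun mirror: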
<repositories>
    <!-- Aliyun public Maven mirror, used in place of the default download source -->
    <repository>
        <id>alimaven</id>
        <name>aliyun maven</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
4. Troubleshooting
Run the BroadcastTest file under examples to check that the source setup works.
The run fails with these compile errors:
not found: type SparkFlumeProtocol
val transactionTimeout: Int, val backOffInterval: Int) extends SparkFlumeProtocol with Logging {
not found: type EventBatch
override def getEventBatch(n: Int): EventBatch = {
not found: type EventBatch
new EventBatch("Spark sink has been stopped!", "", java.util.Collections.emptyList())
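Solution:
SparkFlumeProtocol and EventBatch are not hand-written sources. They are Java classes generated from an Avro IDL file during the Maven build and written under target/scala-2.11 of the flume-sink module. IDEA marks target as Excluded, so it never sees the generated classes.
Open File -> Project Structure -> Modules.
Select spark-streaming-flume-sink_2.11.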
Mark target as Sources, and clear the Excluded mark on scala-2.11 inside target.
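If the errors persist after changing the module settings, the generated files may simply not exist yet. The sketch below checks for them. It is an illustration, not part of Spark. The object name `FlumeSinkGenCheck` is hypothetical, and the output directory (`external/flume-sink/target/scala-2.11/src_managed/main/compiled_avro/...`) is an assumption. Compare it with the avro-maven-plugin `outputDirectory` in `external/flume-sink/pom.xml` before relying on it:

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper: reports which Avro-generated classes are missing
// from the flume-sink module's generated-sources directory.
object FlumeSinkGenCheck {
  // ASSUMED location of the generated Java files, relative to the Spark source root
  val DefaultGenDir: Path = Paths.get(
    "external", "flume-sink", "target", "scala-2.11", "src_managed", "main",
    "compiled_avro", "org", "apache", "spark", "streaming", "flume", "sink")

  // The types the compiler reported as "not found"
  val Expected: Seq[String] = Seq("SparkFlumeProtocol", "EventBatch")

  // Returns the expected class names that have no .java file in genDir
  def missing(genDir: Path): Seq[String] =
    Expected.filterNot(name => Files.isRegularFile(genDir.resolve(name + ".java")))

  def main(args: Array[String]): Unit = {
    // Optional first argument: path to the Spark source root (defaults to ".")
    val sparkHome = Paths.get(if (args.nonEmpty) args(0) else ".")
    val dir = sparkHome.resolve(DefaultGenDir)
    missing(dir) match {
      case Seq() => println(s"All generated classes found in $dir")
      case names => println(s"Missing in $dir: ${names.mkString(", ")}; run the Maven build first")
    }
  }
}
```

An empty result means the sources are on disk. Any remaining "not found" errors then come from IDEA's folder marking, not from the build.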
5. Successful run
Run the BroadcastTest file under examples again. It now runs successfully.