/spark/bin/spark-submit \
--master yarn \ // cluster manager to submit to
--deploy-mode cluster \ // where the driver runs (cluster vs. client)
--class org.apache.spark.demo \ // main class of the application
--name "demo" \ // application name
--queue root.default \ // YARN queue to submit to
--driver-memory 1g \ // heap memory requested for the driver
--num-executors 20 \ // number of executors, each running in its own YARN container
--executor-memory 3g \ // heap memory requested per executor
--executor-cores 3 \ // cores requested per executor
--conf spark.yarn.driver.memoryOverhead=1g \ // off-heap memory for the driver, covering JVM overhead, interned strings, and other native allocations
--conf spark.yarn.executor.memoryOverhead=2g // off-heap memory per executor, for the same kinds of native overhead
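With these numbers, the resource footprint on YARN works out as follows (before any rounding up to yarn.scheduler.minimum-allocation-mb): each executor container requests 3g heap + 2g overhead = 5g, so 20 executors claim 20 × 5g = 100g and 20 × 3 = 60 cores; in cluster mode the driver gets its own container of 1g + 1g = 2g. The job therefore asks the root.default queue for roughly 102g of memory in total.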
The command above roughly executes in the following order:

spark-submit.sh:
    exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
In the SparkSubmit object:

main():
    val appArgs = new SparkSubmitArguments(args)
    submit(appArgs, uninitLog):
        val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)
        doRunMain():
            runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose):
                mainClass = Utils.classForName(childMainClass)
                val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
                    mainClass.newInstance().asInstanceOf[SparkApplication]
                } else {
                    // SPARK-4170
                    if (classOf[scala.App].isAssignableFrom(mainClass)) {
                        printWarning("Subclasses of scala.App may not work correctly. Use a main() method instead.")
                    }
                    new JavaMainApplication(mainClass)
                }
                app.start(childArgs.toArray, sparkConf)
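SparkApplication is a Spark-internal trait, which is why an ordinary user class that only exposes a main() method falls into the else branch and gets wrapped in JavaMainApplication. A simplified sketch of what that wrapper does, condensed from the Spark 2.3-era source (the real class additionally copies the SparkConf entries into system properties before invoking main()):

    import java.lang.reflect.Modifier
    import org.apache.spark.SparkConf

    class JavaMainApplication(klass: Class[_]) extends SparkApplication {
      override def start(args: Array[String], conf: SparkConf): Unit = {
        // Look up the user's static main(String[]) entry point via reflection.
        val mainMethod = klass.getMethod("main", classOf[Array[String]])
        if (!Modifier.isStatic(mainMethod.getModifiers)) {
          throw new IllegalStateException("The main method in the given main class must be static")
        }
        // The method is static, so the receiver is null.
        mainMethod.invoke(null, args)
      }
    }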
Description of the flow:
- spark-submit.sh submits the job; control then enters the code of the SparkSubmit object.
- main() is executed:
  2.1 The action matches SparkSubmitAction.SUBMIT, so the submit() function runs;
    2.1.1 Prepare the submission environment (child arguments, classpath, conf, and main class);
    2.1.2 Execute doRunMain():
      2.1.2.1 Execute runMain():
        2.1.2.1.1 Load the YARN application class via reflection: mainClass = Utils.classForName(childMainClass)
        2.1.2.1.2 Instantiate it as a SparkApplication object:
                  app = mainClass.newInstance().asInstanceOf[SparkApplication]
        2.1.2.1.3 Start submitting the job to YARN: app.start(childArgs.toArray, sparkConf) (see the sketch below)
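In yarn-cluster mode the childMainClass resolved in step 2.1.2.1.1 is org.apache.spark.deploy.yarn.YarnClusterApplication, so app.start() boils down to handing the parsed arguments to a YARN Client, which requests an ApplicationMaster container from the ResourceManager. A simplified sketch, condensed from the Spark 2.3-era source (the real start() also removes spark.jars/spark.files from the conf, since YARN distributes those through its own cache):

    class YarnClusterApplication extends SparkApplication {
      override def start(args: Array[String], conf: SparkConf): Unit = {
        // Hand off to the YARN client, which submits the ApplicationMaster.
        new Client(new ClientArguments(args), conf).run()
      }
    }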