1. YARN Cluster Mode
In Spark on YARN mode, Spark applications run on top of a YARN cluster: YARN's resource scheduler launches the executors inside containers, and those executors run the tasks the driver distributes to them. To run a Spark job on YARN, first start the YARN cluster, then submit the job with spark-shell or spark-submit.
Before submitting a job, HADOOP_CONF_DIR or YARN_CONF_DIR must be set in spark-env.sh.
- Cluster plan
Server | IP Address | Software | Services | Role |
---|---|---|---|---|
master | 192.168.247.131 | JDK, Scala, Spark | ResourceManager, NameNode, DataNode | master |
slave1 | 192.168.247.132 | JDK, Scala, Spark | NodeManager, SecondaryNameNode, DataNode | worker |
slave2 | 192.168.247.130 | JDK, Scala, Spark | NodeManager, DataNode | worker |
- Host mapping (/etc/hosts)
192.168.247.131 master
192.168.247.132 slave1
192.168.247.130 slave2
- Configure passwordless SSH (a sketch follows)
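A minimal sketch of configuring passwordless SSH from master to all nodes, assuming root logins and that the hostnames already resolve via /etc/hosts:
# Generate a key pair on master (accept the defaults)
root@master:~# ssh-keygen -t rsa
# Copy the public key to every node, including master itself
root@master:~# ssh-copy-id root@master
root@master:~# ssh-copy-id root@slave1
root@master:~# ssh-copy-id root@slave2
# Verify: this should print the hostname without a password prompt
root@master:~# ssh slave1 hostname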
2. Prerequisites
1) Install Java 8
/usr/lib/jvm/java-8-openjdk-amd64/
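If Java 8 is not yet installed, a minimal sketch for a Debian/Ubuntu host (the package name assumes that distribution):
# Install OpenJDK 8 and verify it
root@master:~# apt-get install -y openjdk-8-jdk
root@master:~# java -version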
2) Install Scala 2.12.10
root@master:~# scala
Welcome to Scala 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_231).
Type in expressions for evaluation. Or try :help.
scala> :quit
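If Scala still needs installing, a sketch (the URL follows the official Lightbend download layout; verify it before use):
# Download and unpack Scala 2.12.10
root@master:~# wget https://downloads.lightbend.com/scala/2.12.10/scala-2.12.10.tgz
root@master:~# tar -zxvf scala-2.12.10.tgz -C /usr/local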
3) Install Hadoop
Both the HDFS and YARN modules are needed; HDFS is mandatory because Spark uploads its jars to HDFS at runtime.
# Download Hadoop
root@master:~# wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/core/hadoop-2.9.2/hadoop-2.9.2.tar.gz
# Unpack
root@master:~# tar -zxvf hadoop-2.9.2.tar.gz -C /usr/local
# Configure environment variables
root@master:~# vi /etc/profile
export SCALA_HOME=/usr/local/scala-2.12.10
export HADOOP_HOME=/usr/local/hadoop-2.9.2
export PATH=$PATH:$SCALA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
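After reloading the profile, a quick sanity check that both tools are on the PATH:
# Apply the changes and verify
root@master:~# source /etc/profile
root@master:~# hadoop version
root@master:~# scala -version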
3. Download and Install Spark
The Spark program acts as a YARN client for submitting jobs.
- Download and install
Download page: http://spark.apache.org/downloads.html
# Download
root@master:~# wget https://www-us.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
# Unpack
root@master:~# tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz -C /usr/local
- Configure environment variables
# Configure environment variables
root@master:~# vi /etc/profile
# Contents:
export SCALA_HOME=/usr/local/scala-2.12.10
export HADOOP_HOME=/usr/local/hadoop-2.9.2
export SPARK_HOME=/usr/local/spark-2.4.4-bin-hadoop2.7
export PATH=$PATH:$SCALA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin:$SPARK_HOME/sbin
# Apply the environment variables immediately
root@master:~# source /etc/profile
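A quick check that Spark resolves correctly (it should report version 2.4.4):
root@master:~# spark-submit --version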
- Configure Hadoop
# Add the Java environment variable
root@master:~# vi /usr/local/hadoop-2.9.2/etc/hadoop/hadoop-env.sh
# Contents:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/hadoop-env.sh root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
hadoop-env.sh 100% 4991 4.8MB/s 00:00
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/hadoop-env.sh root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
hadoop-env.sh
# core-site.xml
root@master:~# vi /usr/local/hadoop-2.9.2/etc/hadoop/core-site.xml
# Contents:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/core-site.xml root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
core-site.xml 100% 1258 330.8KB/s 00:00
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/core-site.xml root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
core-site.xml
# hdfs-site.xml
root@master:~# vi /usr/local/hadoop-2.9.2/etc/hadoop/hdfs-site.xml
# Contents:
<configuration>
<!-- NameNode HTTP address -->
<property>
<name>dfs.namenode.http-address</name>
<value>master:50070</value>
</property>
<!-- SecondaryNameNode HTTP address -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave1:50090</value>
</property>
<!-- NameNode metadata directory -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoop/tmp/name</value>
</property>
<!-- DataNode data directory -->
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop/tmp/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/hdfs-site.xml root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
hdfs-site.xml 100% 1576 440.6KB/s 00:00
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/hdfs-site.xml root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
hdfs-site.xml
# Edit the slaves file
root@master:~# vi /usr/local/hadoop-2.9.2/etc/hadoop/slaves
master
slave1
slave2
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/slaves root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
slaves 100% 21 1.4KB/s 00:00
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/slaves root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
slaves
# Enable passwordless SSH to localhost (needed by the start scripts)
root@master:~/.ssh# cp id_rsa.pub authorized_keys
# Format HDFS
root@master:~# hdfs namenode -format
- Configure spark-env.sh
# Hadoop classpath for Spark (per the Spark docs, this should be the output of `hadoop classpath`)
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop-2.9.2/bin/hadoop classpath)
# Hostname the master binds to
SPARK_MASTER_HOST=master
# Master RPC port (workers connect to the master here)
SPARK_MASTER_PORT=7077
# Port for the master's web UI
SPARK_MASTER_WEBUI_PORT=8080
# Memory available to each worker
SPARK_WORKER_MEMORY=1g
- Configure slaves (a sketch for creating and distributing these files follows the list)
slave1
slave2
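Both files are created from the templates shipped with Spark and must reach every node; a minimal sketch:
# Create the config files from the bundled templates, then edit them as above
root@master:~# cd /usr/local/spark-2.4.4-bin-hadoop2.7/conf
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7/conf# cp spark-env.sh.template spark-env.sh
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7/conf# cp slaves.template slaves
# Distribute the edited files to the workers
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7/conf# scp spark-env.sh slaves root@slave1:/usr/local/spark-2.4.4-bin-hadoop2.7/conf/
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7/conf# scp spark-env.sh slaves root@slave2:/usr/local/spark-2.4.4-bin-hadoop2.7/conf/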
- History server configuration
root@master:~# cp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-defaults.conf.template /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-defaults.conf
root@master:~# vi /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-defaults.conf
# Contents:
# History settings
spark.master=spark://master:7077
# Enable event logging
spark.eventLog.enabled=true
# Event log directory
spark.eventLog.dir=hdfs://master:9000/spark/log/historyEventLog
spark.serializer=org.apache.spark.serializer.KryoSerializer
# Driver memory
spark.driver.memory=1g
# Directory the history server reads event logs from
spark.history.fs.logDirectory=hdfs://master:9000/spark/log/historyEventLog
spark.history.ui.port=18080
spark.history.fs.update.interval=10s
# Number of application UIs to retain; beyond this limit the oldest are removed
spark.history.retainedApplications=50
spark.history.fs.cleaner.enabled=false
# Cleaner schedule and retention age (only effective when spark.history.fs.cleaner.enabled is true)
spark.history.fs.cleaner.interval=1d
spark.history.fs.cleaner.maxAge=7d
spark.history.ui.acls.enable=false
root@master:~# scp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-defaults.conf root@slave1:/usr/local/spark-2.4.4-bin-hadoop2.7/conf/
spark-defaults.conf 100% 2091 2.5MB/s 00:00
root@master:~# scp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-defaults.conf root@slave2:/usr/local/spark-2.4.4-bin-hadoop2.7/conf/
spark-defaults.conf
Note: spark.eventLog.dir and spark.history.fs.logDirectory must point to the same path.
4. Start the Cluster
- Start the master
root@master:~# start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out
root@master:~# jps
111570 Master
111661 Jps
- Start the workers
root@slave1:/usr/local/spark-2.4.4-bin-hadoop2.7# ./sbin/start-slave.sh spark://master:7077
starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out
root@slave1:/usr/local/spark-2.4.4-bin-hadoop2.7# jps
126165 Jps
125909 Worker
root@slave2:/usr/local/spark-2.4.4-bin-hadoop2.7# ./sbin/start-slave.sh spark://master:7077
starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.out
root@slave2:/usr/local/spark-2.4.4-bin-hadoop2.7# jps
7572 Worker
7656 Jps
- Web UI
http://192.168.247.131:8080/
- Start the history service
# Start HDFS
root@master:~# start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-namenode-master.out
slave2: starting datanode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-datanode-slave1.out
master: starting datanode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-datanode-master.out
Starting secondary namenodes [slave1]
slave1: starting secondarynamenode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-secondarynamenode-slave1.out
root@master:~# jps
11874 NameNode
111570 Master
12583 Jps
12157 DataNode
# Create the event log directory on HDFS
root@master:~# hadoop fs -mkdir -p /spark/log/historyEventLog
# Start the Spark history server
root@master:~# start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-master.out
root@master:~# jps
11874 NameNode
111570 Master
13975 HistoryServer
14057 Jps
12157 DataNode
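To bring the cluster down later, the matching stop scripts can be used; a quick sketch:
# Stop Spark services
root@master:~# stop-history-server.sh
root@slave1:/usr/local/spark-2.4.4-bin-hadoop2.7# ./sbin/stop-slave.sh
root@master:~# stop-master.sh
# Stop HDFS
root@master:~# stop-dfs.sh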
5. Verification
6. YARN Mode
In YARN mode, Spark hands resource scheduling directly to YARN. Under Standalone, scheduling is done by Spark's own Master and Worker; under YARN, the ResourceManager, NodeManagers, and containers do the scheduling instead, and Spark's ApplicationMaster and Executors run inside containers.
The Spark ApplicationMaster negotiates resources with YARN; the Spark Driver schedules tasks on the Executors.
Spark's own Master and Worker are gone, replaced by YARN's RM and NM; the Executors and the Driver remain.
(1) The client submits the application (SparkSubmit).
(2) The RM launches Spark's ApplicationMaster, which handles resource negotiation between Spark and YARN.
(3) The AM requests resources from the RM to start Executors.
(4) The RM gathers resource information from the cluster (the NMs).
(5) The RM returns the resource information to the AM, and the Driver in the AM decides where tasks will run.
(6) The Driver splits the job into tasks and dispatches them to the Executors.
(7) The Executors run the tasks and notify the Driver when done.
(8) The Driver, through the AM, tells the RM to reclaim the resources.
(9) The Executors, containers, Driver, and ApplicationMaster all release their resources and exit.
(10) Only YARN's RM and NM remain, and the result is printed on the client.
7. Deployment Modes
(1) client mode (development and testing)
In client mode, the Driver process starts on the submitting client, and the client process stays alive until the application finishes.
- Configure spark-env.sh
# Configure spark-env.sh
root@master:~# cp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh.template /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh
# Add the following
HADOOP_CONF_DIR=/usr/local/hadoop-2.9.2/etc/hadoop
YARN_CONF_DIR=/usr/local/hadoop-2.9.2/etc/hadoop
# Distribute Spark
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# scp -r /usr/local/spark-2.4.4-bin-hadoop2.7/ root@slave1:/usr/local/
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# scp -r /usr/local/spark-2.4.4-bin-hadoop2.7/ root@slave2:/usr/local/
- Distribute Scala
# Distribute Scala
root@master:~# scp -r /usr/local/scala-2.12.10/ root@slave1:/usr/local/
root@master:~# scp -r /usr/local/scala-2.12.10/ root@slave2:/usr/local/
# Distribute the environment variables
root@master:~# scp -r /etc/profile root@slave1:/etc/
profile
root@master:~# scp -r /etc/profile root@slave2:/etc/
profile
- Configure Hadoop
# Edit yarn-site.xml
# Add the following
<!-- Whether to enforce the physical memory limit on each task; tasks exceeding their allocation are killed. Default: true -->
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<!-- Whether to enforce the virtual memory limit on each task; tasks exceeding their allocation are killed. Default: true -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
# Distribute the Hadoop configuration
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/yarn-site.xml root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
yarn-site.xml 100% 3128 294.2KB/s 00:00
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/yarn-site.xml root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
yarn-site.xml
- Start Hadoop
# Start Hadoop
# on master
root@master:~# start-dfs.sh
root@master:~# start-yarn.sh
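Before submitting, it is worth confirming that all NodeManagers have registered (each node should show as RUNNING):
# List the registered NodeManagers
root@master:~# yarn node -list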
- Submit a job
# Submit the job
spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--driver-memory 1g \
--executor-memory 512m \
--executor-cores 1 \
/usr/local/spark-2.4.4-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.4.jar \
10
# Result
Pi is roughly 3.1428957144785725
Notes:
--master yarn: schedule resources through YARN
--deploy-mode client: run in client mode
- View the result:
http://master:8088
- spark-shell
spark-shell must be run in client mode, since the interactive Driver has to live on the submitting client.
# Create a data file
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# cat /root/wordcount.txt
hello tom
hello jerry
hello kitty
hello world
hello tom
hello marquis
hello jone
# Upload the data file to HDFS
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# hadoop fs -put /root/wordcount.txt /wordcount
# Line count with spark-shell
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# ./bin/spark-shell --master yarn --deploy-mode client
19/12/02 11:11:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/12/02 11:12:24 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://master:4040
Spark context available as 'sc' (master = yarn, app id = application_1575253969569_0004).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.4
/_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_231)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val lines = sc.textFile("/wordcount")
lines: org.apache.spark.rdd.RDD[String] = /wordcount MapPartitionsRDD[1] at textFile at <console>:24
scala> lines.count()
res0: Long = 7
scala> lines.first()
res1: String = hello tom
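The same session can run the classic word count on this file; a minimal sketch, continuing in the same spark-shell:
scala> // split lines into words, pair each word with 1, and sum the counts per word
scala> val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.collect().foreach(println)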
(2) cluster mode (production)
yarn-cluster does not support spark-shell or spark-sql.
In cluster mode, the Driver process starts on a node inside the cluster (in the ApplicationMaster's container), and the client process can exit as soon as it has submitted the application, without waiting for it to finish.
- Log configuration
Because in cluster mode the result can only be seen in the logs, history logging must be configured.
# Non-HA setup
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>2592000</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://master:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://master:9000/user/hadoop/yarn-logs/</value>
</property>
# yarn.nodemanager.remote-app-log-dir is where aggregated logs are stored; it can be a local or an HDFS path, but HDFS is recommended. When stored on HDFS, the directory must already exist.
# HA setup
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>106800</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://master:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://hadoopha/user/hadoop/yarn-logs/</value>
</property>
# Create the log directory
root@master:~# hadoop fs -mkdir -p /user/hadoop/yarn-logs
# hadoopha is the dfs.nameservices value configured in hdfs-site.xml. Note: in HA mode do not append a port number.
# Distribute the file
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/yarn-site.xml root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
yarn-site.xml 100% 3365 4.9MB/s 00:00
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/yarn-site.xml root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
yarn-site.xml 100% 3365 7.0MB/s 00:00
- Start the history services
# Start the Hadoop JobHistory server
root@master:~# mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop-2.9.2/logs/mapred-root-historyserver-master.out
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# jps
14144 JobHistoryServer
11298 NodeManager
11063 ResourceManager
21272 QuorumPeerMain
14475 Jps
22508 NameNode
23293 DFSZKFailoverController
22767 DataNode
# Start the Spark history server
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# ./sbin/start-history-server.sh
- Submit the job
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# ./bin/spark-submit --master yarn --deploy-mode cluster --executor-memory 1G --executor-cores 1 --class org.apache.spark.examples.SparkPi /usr/local/spark-2.4.4-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.4.jar 10
19/12/02 11:23:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/02 11:24:07 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
19/12/02 11:24:07 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
19/12/02 11:24:07 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
19/12/02 11:24:07 INFO yarn.Client: Setting up container launch context for our AM
19/12/02 11:24:07 INFO yarn.Client: Setting up the launch environment for our AM container
19/12/02 11:24:07 INFO yarn.Client: Preparing resources for our AM container
19/12/02 11:24:07 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
19/12/02 11:24:09 INFO yarn.Client: Uploading resource file:/tmp/spark-253add86-aa04-4d50-8034-87963da2a896/__spark_libs__1592300848359771379.zip -> hdfs://hadoopha/user/root/.sparkStaging/application_1575253969569_0006/__spark_libs__1592300848359771379.zip
19/12/02 11:24:13 INFO yarn.Client: Uploading resource file:/usr/local/spark-2.4.4-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.4.jar -> hdfs://hadoopha/user/root/.sparkStaging/application_1575253969569_0006/spark-examples_2.11-2.4.4.jar
19/12/02 11:24:13 INFO yarn.Client: Uploading resource file:/tmp/spark-253add86-aa04-4d50-8034-87963da2a896/__spark_conf__836577871397246590.zip -> hdfs://hadoopha/user/root/.sparkStaging/application_1575253969569_0006/__spark_conf__.zip
19/12/02 11:24:13 INFO spark.SecurityManager: Changing view acls to: root
19/12/02 11:24:13 INFO spark.SecurityManager: Changing modify acls to: root
19/12/02 11:24:13 INFO spark.SecurityManager: Changing view acls groups to:
19/12/02 11:24:13 INFO spark.SecurityManager: Changing modify acls groups to:
19/12/02 11:24:13 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
19/12/02 11:24:14 INFO yarn.Client: Submitting application application_1575253969569_0006 to ResourceManager
19/12/02 11:24:14 INFO impl.YarnClientImpl: Submitted application application_1575253969569_0006
19/12/02 11:24:15 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:15 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1575257054794
final status: UNDEFINED
tracking URL: http://master:8088/proxy/application_1575253969569_0006/
user: root
19/12/02 11:24:16 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
... (the identical ACCEPTED report repeats once per second until 11:24:48) ...
19/12/02 11:24:49 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:24:49 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: master
ApplicationMaster RPC port: 36121
queue: default
start time: 1575257054794
final status: UNDEFINED
tracking URL: http://master:8088/proxy/application_1575253969569_0006/
user: root
19/12/02 11:24:50 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
... (the identical RUNNING report repeats once per second until 11:25:07) ...
19/12/02 11:25:08 INFO yarn.Client: Application report for application_1575253969569_0006 (state: FINISHED)
19/12/02 11:25:08 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: master
ApplicationMaster RPC port: 36121
queue: default
start time: 1575257054794
final status: SUCCEEDED
tracking URL: http://master:8088/proxy/application_1575253969569_0006/
user: root
19/12/02 11:25:09 INFO util.ShutdownHookManager: Shutdown hook called
19/12/02 11:25:09 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-488a412e-9786-4242-9768-3df83c89078c
19/12/02 11:25:09 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-253add86-aa04-4d50-8034-87963da2a896
History logging must be configured: the result is viewed in the logs.
Note: the Spark history server must also be running.
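Because the Driver ran inside YARN, the Pi output lands in the application's aggregated logs; a sketch of fetching it with the YARN CLI (substitute the application id printed by spark-submit):
# Pull the aggregated logs and search for the result
root@master:~# yarn logs -applicationId application_1575253969569_0006 | grep "Pi is roughly"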
(3) Differences between the two modes
client mode: the Driver runs on the client, and the application's results are displayed there, so it suits applications whose results go to the console (such as spark-shell).
cluster mode: the Driver runs inside YARN, and results are not shown on the client, so it is best suited to applications that persist their final results to external storage (HDFS, Redis, MySQL, etc.) rather than stdout; the client terminal shows only the YARN job's coarse status.
8. Common Issues
- WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Fix:
# Create a directory on HDFS
root@master:~# hadoop fs -mkdir -p /home/hadoop/spark_jars
# Upload Spark's jars
root@master:~# hadoop fs -put /usr/local/spark-2.4.4-bin-hadoop2.7/jars/* /home/hadoop/spark_jars
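Uploading the jars alone does not silence the warning; spark.yarn.jars must also point at that HDFS directory. A sketch of the setting, using the path created above, added to spark-defaults.conf:
# Tell Spark where its jars live on HDFS (globs are allowed)
spark.yarn.jars=hdfs://master:9000/home/hadoop/spark_jars/*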
- Out of memory
# Disable YARN's virtual-memory check (prevents jobs from failing on memory-tight VMs)
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>