Spark Development -- YARN Cluster Mode (Part 5)

I. YARN Cluster Mode

  In Spark on YARN mode, Spark applications run on top of a YARN cluster: YARN's scheduler launches the executors inside containers, and those executors carry out the tasks the driver hands them. To run a Spark job on YARN, first start the YARN cluster, then submit the job with spark-shell or spark-submit.
  Before submitting a job, HADOOP_CONF_DIR or YARN_CONF_DIR must be set in spark-env.sh.
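For example, matching the layout used later in this post (the same two lines reappear in section VII):
# in $SPARK_HOME/conf/spark-env.sh
HADOOP_CONF_DIR=/usr/local/hadoop-2.9.2/etc/hadoop
YARN_CONF_DIR=/usr/local/hadoop-2.9.2/etc/hadoop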

  1. Cluster layout
Server   IP address        Software            Services                                   Role
master   192.168.247.131   JDK, Scala, Spark   ResourceManager, NameNode, DataNode        master
slave1   192.168.247.132   JDK, Scala, Spark   NodeManager, SecondaryNameNode, DataNode   worker
slave2   192.168.247.130   JDK, Scala, Spark   NodeManager, DataNode                      worker
  2. Host mapping (/etc/hosts)
192.168.247.131  master
192.168.247.132  slave1
192.168.247.130  slave2

  3. Passwordless SSH (a minimal sketch follows)
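A sketch, assuming RSA keys and the hostnames above (run on master; repeat ssh-copy-id for each node):
root@master:~# ssh-keygen -t rsa                # accept the defaults
root@master:~# ssh-copy-id root@slave1
root@master:~# ssh-copy-id root@slave2
root@master:~# ssh slave1 hostname              # should connect without a password prompt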

II. Prerequisites

1. Install Java 8

/usr/lib/jvm/java-8-openjdk-amd64/
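The path above matches the Debian/Ubuntu OpenJDK 8 package. If Java is not installed yet, something like the following would produce it (assumption: an apt-based system):
root@master:~# apt-get install -y openjdk-8-jdk
root@master:~# java -version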

2. Install Scala 2.12.10

root@master:~# scala
Welcome to Scala 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_231).
Type in expressions for evaluation. Or try :help.

scala> :quit
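The Scala install itself is not shown; a sketch consistent with the /usr/local layout used elsewhere in this post (the download URL is an assumption):
root@master:~# wget https://downloads.lightbend.com/scala/2.12.10/scala-2.12.10.tgz
root@master:~# tar -zxvf scala-2.12.10.tgz -C /usr/local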

3. Install Hadoop

Both the HDFS and YARN modules are needed. HDFS is mandatory because Spark stores its jars on HDFS when jobs run.

# Download Hadoop
root@master:~# wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/core/hadoop-2.9.2/hadoop-2.9.2.tar.gz

# Extract
root@master:~# tar -zxvf hadoop-2.9.2.tar.gz -C /usr/local
# Configure environment variables
root@master:~# vi /etc/profile
export SCALA_HOME=/usr/local/scala-2.12.10
export HADOOP_HOME=/usr/local/hadoop-2.9.2
export PATH=$PATH:$SCALA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
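After reloading the profile, a quick check that the PATH entries work:
root@master:~# source /etc/profile
root@master:~# hadoop version        # should report Hadoop 2.9.2
root@master:~# scala -version        # should report Scala 2.12.10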

III. Download and Install Spark

The Spark program acts as a YARN client, used to submit jobs.

  1. Download and extract
    Download page: http://spark.apache.org/downloads.html
# Download
root@master:~# wget https://www-us.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz

# Extract
root@master:~# tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz -C /usr/local

  2. Configure environment variables
# Edit /etc/profile
root@master:~# vi /etc/profile
# contents
export SCALA_HOME=/usr/local/scala-2.12.10
export HADOOP_HOME=/usr/local/hadoop-2.9.2
export SPARK_HOME=/usr/local/spark-2.4.4-bin-hadoop2.7
export PATH=$PATH:$SCALA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin:$SPARK_HOME/sbin

# Apply the changes immediately
root@master:~# source /etc/profile

  3. Configure Hadoop
# Set JAVA_HOME
root@master:~# vi /usr/local/hadoop-2.9.2/etc/hadoop/hadoop-env.sh 
# contents
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/

root@master:~# scp  /usr/local/hadoop-2.9.2/etc/hadoop/hadoop-env.sh root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
hadoop-env.sh                                                                                                                                                    100% 4991     4.8MB/s   00:00    
root@master:~# scp  /usr/local/hadoop-2.9.2/etc/hadoop/hadoop-env.sh root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
hadoop-env.sh   

# core-site.xml 
root@master:~# vi /usr/local/hadoop-2.9.2/etc/hadoop/core-site.xml 
# contents
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://master:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/hadoop/tmp</value>
        </property>
        <property>
                 <name>hadoop.proxyuser.root.hosts</name>
                 <value>*</value>
        </property>
        <property>   
                 <name>hadoop.proxyuser.root.groups</name>
                 <value>*</value>
        </property>
</configuration>

root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/core-site.xml root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
core-site.xml                                                                                                                                                    100% 1258   330.8KB/s   00:00    
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/core-site.xml root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
core-site.xml  

# hdfs-site.xml 
root@master:~# vi /usr/local/hadoop-2.9.2/etc/hadoop/hdfs-site.xml 
# contents
<configuration>
        <!-- NameNode HTTP address -->
        <property>
                <name>dfs.namenode.http-address</name>
                <value>master:50070</value>
        </property>
        <!-- SecondaryNameNode HTTP address -->
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>slave1:50090</value>
        </property>
        <!-- NameNode metadata directory -->
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/hadoop/tmp/name</value>
        </property>
        <!-- DataNode data directory -->
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/hadoop/tmp/data</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <property>
                <name>dfs.permissions</name>
                <value>false</value>
        </property>
</configuration>

root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/hdfs-site.xml root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
hdfs-site.xml                                                                                                                                                    100% 1576   440.6KB/s   00:00    
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/hdfs-site.xml root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
hdfs-site.xml   

# Edit the slaves file
root@master:~# vi /usr/local/hadoop-2.9.2/etc/hadoop/slaves 
master
slave1
slave2

root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/slaves root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
slaves                                                                                                                                                           100%   21     1.4KB/s   00:00    
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/slaves root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
slaves 
# Passwordless SSH to localhost (needed by the start scripts)
root@master:~/.ssh# cp id_rsa.pub authorized_keys

# Format HDFS
root@master:~# hdfs namenode -format

  4. Configure spark-env.sh
# Hadoop classpath for Spark (the Spark docs recommend deriving it from `hadoop classpath`)
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
# Hostname the master binds to
SPARK_MASTER_HOST=master
# Port the workers use to talk to the master
SPARK_MASTER_PORT=7077
# Port for the master's web UI
SPARK_MASTER_WEBUI_PORT=8080
# Memory available to each worker
SPARK_WORKER_MEMORY=1g

  5. Configure slaves
slave1
slave2

  6. Configure the history server
root@master:~# cp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-defaults.conf.template /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-defaults.conf
root@master:~# vi /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-defaults.conf
# contents
# History server settings
spark.master=spark://master:7077
# Enable event logging
spark.eventLog.enabled=true
# Event log directory
spark.eventLog.dir=hdfs://master:9000/spark/log/historyEventLog
spark.serializer=org.apache.spark.serializer.KryoSerializer
# Driver memory
spark.driver.memory=1g
# Directory the history server reads event logs from
spark.history.fs.logDirectory=hdfs://master:9000/spark/log/historyEventLog

spark.history.ui.port=18080
spark.history.fs.update.interval=10s
# Number of application UIs to retain; beyond this limit the oldest are removed
spark.history.retainedApplications=50
spark.history.fs.cleaner.enabled=false
# Cleaner interval and retention
spark.history.fs.cleaner.interval=1d
spark.history.fs.cleaner.maxAge=7d
spark.history.ui.acls.enable=false

root@master:~# scp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-defaults.conf root@slave1:/usr/local/spark-2.4.4-bin-hadoop2.7/conf/
spark-defaults.conf                                                                                                                                              100% 2091     2.5MB/s   00:00    
root@master:~# scp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-defaults.conf root@slave2:/usr/local/spark-2.4.4-bin-hadoop2.7/conf/
spark-defaults.conf  

Note: spark.eventLog.dir and spark.history.fs.logDirectory must point to the same path.

IV. Start the Cluster

  1. Start the master
root@master:~# start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out
root@master:~# jps
111570 Master
111661 Jps

  2. Start the workers
root@slave1:/usr/local/spark-2.4.4-bin-hadoop2.7# ./sbin/start-slave.sh spark://master:7077
starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out
root@slave1:/usr/local/spark-2.4.4-bin-hadoop2.7# jps
126165 Jps
125909 Worker

root@slave2:/usr/local/spark-2.4.4-bin-hadoop2.7#  ./sbin/start-slave.sh spark://master:7077
starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.out
root@slave2:/usr/local/spark-2.4.4-bin-hadoop2.7# jps
7572 Worker
7656 Jps

  3. Web UI
    http://192.168.247.131:8080/
    (Figure: master console)
  4. Start the history server

# Start HDFS
root@master:~# start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-namenode-master.out
slave2: starting datanode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-datanode-slave1.out
master: starting datanode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-datanode-master.out
Starting secondary namenodes [slave1]
slave1: starting secondarynamenode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-secondarynamenode-slave1.out
root@master:~# jps
11874 NameNode
111570 Master
12583 Jps
12157 DataNode
# Create the event log directory on HDFS
root@master:~# hadoop fs -mkdir -p /spark/log/historyEventLog

# Start the Spark history server
root@master:~# start-history-server.sh 
starting org.apache.spark.deploy.history.HistoryServer, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-master.out
root@master:~# jps
11874 NameNode
111570 Master
13975 HistoryServer
14057 Jps
12157 DataNode

View the history logs at http://master:18080 (Figure: history server UI)

V. Verification
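A minimal smoke test against the standalone master configured above (assuming the services from section IV are running):
root@master:~# spark-submit --class org.apache.spark.examples.SparkPi \
--master spark://master:7077 \
/usr/local/spark-2.4.4-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.4.jar 10
# "Pi is roughly 3.14..." should appear in the driver output, and the finished
# application should show up in the master UI and the history server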


VI. YARN Mode

YARN mode means Spark delegates resource scheduling directly to YARN. In standalone mode, scheduling is handled by Spark's own Master and Worker processes;
in YARN mode, the ResourceManager, NodeManagers, and containers handle resource scheduling instead, and Spark's ApplicationMaster and executors run inside containers.
The Spark ApplicationMaster negotiates resources with YARN,
while the Spark driver handles task scheduling with the executors.
Spark's Master and Worker are gone, replaced by YARN's ResourceManager (RM) and NodeManagers (NM); the executors and the driver remain.

Execution flow in YARN mode

(1) The client submits the application via SparkSubmit.
(2) The ResourceManager (RM) launches Spark's ApplicationMaster (AM), which handles resource negotiation between Spark and YARN.
(3) The AM requests resources from the RM for launching executors.
(4) The RM collects resource information from the cluster's NodeManagers (NM).
(5) The RM sends the resource information to the AM; the driver inside the AM decides where tasks will run.
(6) The driver splits the job into tasks and sends them to the executors.
(7) The executors run the tasks and notify the driver when they finish.
(8) The driver, through the AM, tells the RM to reclaim the resources.
(9) The executors, containers, driver, and ApplicationMaster all release their resources and exit.
(10) Only YARN's RM and NMs remain, and the result is printed on the client.

VII. Deploy Modes

(1) Client mode (development and testing)

  In client mode, the driver process starts on the client machine, and the client process stays alive until the application finishes.


(Figure: client mode)
  1. Configure spark-env.sh
# Create spark-env.sh from the template
root@master:~# cp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh.template /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh
# Add
HADOOP_CONF_DIR=/usr/local/hadoop-2.9.2/etc/hadoop
YARN_CONF_DIR=/usr/local/hadoop-2.9.2/etc/hadoop

# Distribute Spark
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# scp -r /usr/local/spark-2.4.4-bin-hadoop2.7/ root@slave1:/usr/local/
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# scp -r /usr/local/spark-2.4.4-bin-hadoop2.7/ root@slave2:/usr/local/

  2. Distribute Scala
# Distribute Scala
root@master:~# scp -r /usr/local/scala-2.12.10/ root@slave1:/usr/local/
root@master:~# scp -r /usr/local/scala-2.12.10/ root@slave2:/usr/local/

# Distribute the environment variables
root@master:~# scp -r /etc/profile root@slave1:/etc/
profile 
root@master:~# scp -r /etc/profile root@slave2:/etc/
profile   

  3. Configure Hadoop
# Edit yarn-site.xml
# add
<!-- Whether a thread checks each task's physical memory use and kills tasks that exceed their allocation; default true -->
<property>
     <name>yarn.nodemanager.pmem-check-enabled</name>
     <value>false</value> 
</property>

<!-- Whether a thread checks each task's virtual memory use and kills tasks that exceed their allocation; default true -->
<property>
     <name>yarn.nodemanager.vmem-check-enabled</name>
     <value>false</value>
</property>

# Distribute the Hadoop config
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/yarn-site.xml root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
yarn-site.xml                                                                                                                                                          100% 3128   294.2KB/s   00:00    
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/yarn-site.xml root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
yarn-site.xml   
  4. Start Hadoop
# On master
root@master:~# start-dfs.sh
root@master:~# start-yarn.sh
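A quick check that the NodeManagers registered with the ResourceManager:
root@master:~# yarn node -list
# the NodeManager hosts configured above should be listed as RUNNING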

  5. Submit a job
# Submit the job
spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--driver-memory 1g \
--executor-memory 512m \
--executor-cores 1 \
 /usr/local/spark-2.4.4-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.4.jar \
10
# result
Pi is roughly 3.1428957144785725

Notes:
--master yarn: use YARN for resource scheduling
--deploy-mode client: run in client mode

  6. Check the result:
    http://master:8088
    (Figure: YARN ResourceManager UI)

  7. spark-shell
    spark-shell can only be used in client mode.

# Sample data file
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# cat /root/wordcount.txt 
hello tom
hello jerry
hello kitty
hello world
hello tom
hello marquis
hello jone
# Upload the data file to HDFS
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# hadoop fs -put /root/wordcount.txt /wordcount
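To confirm the upload:
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# hadoop fs -cat /wordcount
# should print the seven lines above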

# Count lines in spark-shell on YARN
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# ./bin/spark-shell --master yarn --deploy-mode client
19/12/02 11:11:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/12/02 11:12:24 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://master:4040
Spark context available as 'sc' (master = yarn, app id = application_1575253969569_0004).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/
         
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_231)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val lines  = sc.textFile("/wordcount")
lines: org.apache.spark.rdd.RDD[String] = /wordcount MapPartitionsRDD[1] at textFile at <console>:24

scala> lines.count()
res0: Long = 7                                                                  

scala> lines.first()
res1: String = hello tom
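The file is named wordcount.txt, but the session above only counts lines; the word count itself can run in the same shell (element order in the collected array may vary):

scala> lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect()
res2: Array[(String, Int)] = Array((hello,7), (tom,2), (jerry,1), (kitty,1), (world,1), (marquis,1), (jone,1))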

(Figure: run result)

(2) Cluster mode (production)

yarn-cluster mode does not support spark-shell or spark-sql.
  In cluster mode, the driver process starts on one of the cluster's nodes (inside the ApplicationMaster container), and the client process can exit as soon as it has submitted the application; it does not have to wait for the application to finish.

(Figure: cluster mode)

  1. Log configuration
    In cluster mode the result can only be seen in the logs, so log aggregation and the history servers must be configured.
# Non-HA setup
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
 <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>2592000</value>
</property>
<property>
    <name>yarn.log.server.url</name>
    <value>http://master:19888/jobhistory/logs</value>
</property>
<property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>hdfs://master:9000/user/hadoop/yarn-logs/</value>
</property>
# yarn.nodemanager.remote-app-log-dir is where the aggregated logs are stored. It can be a local path or an HDFS path; HDFS is recommended. When it is on HDFS, the directory must already exist.

# HA setup
<property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
</property>
<property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>106800</value>
</property>
<property>
        <name>yarn.log.server.url</name>
        <value>http://master:19888/jobhistory/logs</value>
</property>
<property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>hdfs://hadoopha/user/hadoop/yarn-logs/</value>
</property>

# Create the directory
root@master~ # hadoop fs -mkdir -p /user/hadoop/yarn-logs

# hadoopha is the dfs.nameservices value configured in hdfs-site.xml. Note: in HA mode do not append a port number.

# Distribute the file
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/yarn-site.xml root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
yarn-site.xml                                                                                                                                                          100% 3365     4.9MB/s   00:00    
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/yarn-site.xml root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
yarn-site.xml                                                                                                                                                          100% 3365     7.0MB/s   00:00  
  2. Start the history servers
# Start the Hadoop (MapReduce) job history server
root@master~ # mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop-2.9.2/logs/mapred-root-historyserver-master.out
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# jps
14144 JobHistoryServer
11298 NodeManager
11063 ResourceManager
21272 QuorumPeerMain
14475 Jps
22508 NameNode
23293 DFSZKFailoverController
22767 DataNode
# Start the Spark history server
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# ./sbin/start-history-server.sh 

  3. Submit a job
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# ./bin/spark-submit --master yarn --deploy-mode cluster --executor-memory 1G --executor-cores 1 --class org.apache.spark.examples.SparkPi  /usr/local/spark-2.4.4-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.4.jar 10

19/12/02 11:23:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/02 11:24:07 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
19/12/02 11:24:07 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
19/12/02 11:24:07 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
19/12/02 11:24:07 INFO yarn.Client: Setting up container launch context for our AM
19/12/02 11:24:07 INFO yarn.Client: Setting up the launch environment for our AM container
19/12/02 11:24:07 INFO yarn.Client: Preparing resources for our AM container
19/12/02 11:24:07 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
19/12/02 11:24:09 INFO yarn.Client: Uploading resource file:/tmp/spark-253add86-aa04-4d50-8034-87963da2a896/__spark_libs__1592300848359771379.zip -> hdfs://hadoopha/user/root/.sparkStaging/application_1575253969569_0006/__spark_libs__1592300848359771379.zip
19/12/02 11:24:13 INFO yarn.Client: Uploading resource file:/usr/local/spark-2.4.4-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.4.jar -> hdfs://hadoopha/user/root/.sparkStaging/application_1575253969569_0006/spark-examples_2.11-2.4.4.jar
19/12/02 11:24:13 INFO yarn.Client: Uploading resource file:/tmp/spark-253add86-aa04-4d50-8034-87963da2a896/__spark_conf__836577871397246590.zip -> hdfs://hadoopha/user/root/.sparkStaging/application_1575253969569_0006/__spark_conf__.zip
19/12/02 11:24:13 INFO spark.SecurityManager: Changing view acls to: root
19/12/02 11:24:13 INFO spark.SecurityManager: Changing modify acls to: root
19/12/02 11:24:13 INFO spark.SecurityManager: Changing view acls groups to: 
19/12/02 11:24:13 INFO spark.SecurityManager: Changing modify acls groups to: 
19/12/02 11:24:13 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
19/12/02 11:24:14 INFO yarn.Client: Submitting application application_1575253969569_0006 to ResourceManager
19/12/02 11:24:14 INFO impl.YarnClientImpl: Submitted application application_1575253969569_0006
19/12/02 11:24:15 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:15 INFO yarn.Client: 
         client token: N/A
         diagnostics: AM container is launched, waiting for AM container to Register with RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1575257054794
         final status: UNDEFINED
         tracking URL: http://master:8088/proxy/application_1575253969569_0006/
         user: root
19/12/02 11:24:16 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
... (the ACCEPTED report repeats once per second until 11:24:48) ...
19/12/02 11:24:49 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:24:49 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: master
         ApplicationMaster RPC port: 36121
         queue: default
         start time: 1575257054794
         final status: UNDEFINED
         tracking URL: http://master:8088/proxy/application_1575253969569_0006/
         user: root
19/12/02 11:24:50 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
... (the RUNNING report repeats once per second until 11:25:07) ...
19/12/02 11:25:08 INFO yarn.Client: Application report for application_1575253969569_0006 (state: FINISHED)
19/12/02 11:25:08 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: master
         ApplicationMaster RPC port: 36121
         queue: default
         start time: 1575257054794
         final status: SUCCEEDED
         tracking URL: http://master:8088/proxy/application_1575253969569_0006/
         user: root
19/12/02 11:25:09 INFO util.ShutdownHookManager: Shutdown hook called
19/12/02 11:25:09 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-488a412e-9786-4242-9768-3df83c89078c
19/12/02 11:25:09 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-253add86-aa04-4d50-8034-87963da2a896

Result

In cluster mode the job's output (the "Pi is roughly ..." line) is not printed in the client terminal; the terminal only shows the YARN application status, so the result has to be read from the aggregated logs. That is why the log configuration above is mandatory.

Note: the Spark history server must also be running.
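With log aggregation enabled, one way to pull the driver's output from the command line is yarn logs (using the application id from the run above):
root@master:~# yarn logs -applicationId application_1575253969569_0006 | grep "Pi is roughly"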

(3) Differences between the two modes

Client mode: the driver runs on the client, and the application's output is shown there, so it suits applications that print their results (such as spark-shell).
Cluster mode: the driver runs inside YARN and the output is not shown on the client, so it is best suited to applications that write their results to external storage (HDFS, Redis, MySQL) rather than to stdout; the client terminal only shows the YARN job's basic status.

VIII. Common Issues

  1. WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
    Fix:
# Create a directory on HDFS
root@master:~# hadoop fs -mkdir -p /home/hadoop/spark_jars
# Upload Spark's jars
root@master:~# hadoop fs -put /usr/local/spark-2.4.4-bin-hadoop2.7/jars/* /home/hadoop/spark_jars
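Uploading the jars alone does not silence the warning; spark.yarn.jars also has to point at them. A sketch, assuming the HDFS address configured above:
# add to spark-defaults.conf
spark.yarn.jars=hdfs://master:9000/home/hadoop/spark_jars/*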

  2. Out of memory
# Disable the virtual memory check (avoids containers being killed when the VM is short on memory)
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>