To view run logs through the web UI, two services must be running:
Hadoop's JobHistory Server and Spark's History Server.
etc/hadoop/mapred-site.xml
<!-- JobHistory server address and web UI address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>spark-master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>spark-master:19888</value>
</property>
yarn-site.xml
<!-- Whether to enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Log server URL, used by the worker nodes -->
<property>
<name>yarn.log.server.url</name>
<value>http://spark-master:19888/jobhistory/logs/</value>
</property>
<!-- Retention time for aggregated logs, in seconds -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>86400</value>
</property>
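The retention value is given in seconds, so 86400 keeps aggregated logs for one day. A minimal sketch of the conversion, assuming a helper named days_to_seconds (a hypothetical name, not part of Hadoop):

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): convert a retention period in
# days into the seconds value used by yarn.log-aggregation.retain-seconds.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 1   # prints 86400
days_to_seconds 7   # prints 604800
```

To keep logs for a week instead of a day, the value in yarn-site.xml would be 604800.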
spark-defaults.conf
(under conf/ in the Spark installation directory)
spark.eventLog.enabled=true
spark.eventLog.compress=true
# Store locally (a local path needs three slashes: file:///)
#spark.eventLog.dir=file:///usr/local/hadoop-2.7.6/logs/userlogs
#spark.history.fs.logDirectory=file:///usr/local/hadoop-2.7.6/logs/userlogs
# Store on HDFS
spark.eventLog.dir=hdfs://spark-master:9000/tmp/logs/root/logs
spark.history.fs.logDirectory=hdfs://spark-master:9000/tmp/logs/root/logs
spark.yarn.historyServer.address=spark-master:18080
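The directory settings above are URIs of the form scheme://authority/path. A local path therefore needs three slashes (file:///usr/...), otherwise the first path component is read as the host. The sketch below illustrates this with a hypothetical helper, uri_path, that strips the scheme and authority; it is only an illustration of how the URI splits, not how Spark itself parses the value.

```shell
#!/bin/sh
# Hypothetical helper: drop "scheme://authority" from a URI and print the path.
uri_path() {
    printf '%s\n' "$1" | sed -E 's#^[A-Za-z]+://[^/]*##'
}

uri_path hdfs://spark-master:9000/tmp/logs/root/logs    # /tmp/logs/root/logs
uri_path file:///usr/local/hadoop-2.7.6/logs/userlogs   # /usr/local/hadoop-2.7.6/logs/userlogs
uri_path file://usr/local/hadoop-2.7.6/logs/userlogs    # /local/hadoop-2.7.6/logs/userlogs ("usr" is read as the host)

# The event log directory must exist before applications write to it, e.g.:
#   hdfs dfs -mkdir -p /tmp/logs/root/logs
```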
Startup
1. First start Hadoop's JobHistory Server
[root@spark-master hadoop-2.7.6]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop-2.7.6/logs/mapred-root-historyserver-spark-master.out
2. Start Spark's History Server
[root@spark-master spark-2.3.0]# sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /usr/local/spark-2.3.0/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-spark-master.out
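If the configuration is correct, once startup completes you can reach the web UIs on ports 18080 (Spark History Server) and 19888 (Hadoop JobHistory Server).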
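A quick way to confirm both UIs answer is to request them with curl. This is a minimal sketch: the helper names ui_url and check_ui are hypothetical, curl is assumed to be installed, and the host name is the one used in the configuration above.

```shell
#!/bin/sh
# Hypothetical helper: build the web UI URL for a host and port.
ui_url() {
    printf 'http://%s:%s/\n' "$1" "$2"
}

# Hypothetical helper: report whether the web UI answers within 5 seconds.
check_ui() {
    if curl -s -o /dev/null --max-time 5 "$(ui_url "$1" "$2")"; then
        echo "$1:$2 is reachable"
    else
        echo "$1:$2 is NOT reachable"
    fi
}

# Usage:
#   check_ui spark-master 18080   # Spark History Server
#   check_ui spark-master 19888   # Hadoop JobHistory Server
```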
Submitting a job through YARN produces: Failed while trying to construct the redirect url to the log server. Log Server url may
1. When a job is submitted in yarn-client mode, opening http://master:8088/ shows the following error:
Error:
Aggregation is not enabled. Try the nodemanager at server-3:44981
Or see application
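2. The job is nevertheless reported as successful, and its results are produced.
3. The cause is that the historyserver service is not running. It is a standalone service and is off by default. First, add the following to yarn-site.xml: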
<property>
<name>yarn.log.server.url</name>
<value>http://master:19888/jobhistory/logs</value>
</property>
4. Then add the following to mapred-site.xml. The web UI port must match the one used in yarn-site.xml, 19888:
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
5. Distribute the changed configuration files to the other nodes, for example with the following commands:
scp /usr/local/src/hadoop-2.6.5/etc/hadoop/yarn-site.xml root@slave1:/usr/local/src/hadoop-2.6.5/etc/hadoop/
scp /usr/local/src/hadoop-2.6.5/etc/hadoop/yarn-site.xml root@slave2:/usr/local/src/hadoop-2.6.5/etc/hadoop/
scp /usr/local/src/hadoop-2.6.5/etc/hadoop/mapred-site.xml root@slave1:/usr/local/src/hadoop-2.6.5/etc/hadoop/
scp /usr/local/src/hadoop-2.6.5/etc/hadoop/mapred-site.xml root@slave2:/usr/local/src/hadoop-2.6.5/etc/hadoop/
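With more than a couple of nodes, the same four copies can be written as a loop. The sketch below reuses the path and node names from the commands above; DRY_RUN is a hypothetical switch introduced for this example, and by default the commands are only printed.

```shell
#!/bin/sh
# Sketch: print (or run) one scp command per node/file pair.
# CONF_DIR and the node names come from the commands above.
# DRY_RUN is a hypothetical switch: 1 prints the commands, 0 runs them.
CONF_DIR=/usr/local/src/hadoop-2.6.5/etc/hadoop
DRY_RUN=${DRY_RUN:-1}

distribute_conf() {
    for node in slave1 slave2; do
        for file in yarn-site.xml mapred-site.xml; do
            if [ "$DRY_RUN" = "1" ]; then
                echo "scp $CONF_DIR/$file root@$node:$CONF_DIR/"
            else
                scp "$CONF_DIR/$file" "root@$node:$CONF_DIR/"
            fi
        done
    done
}

distribute_conf
```

Running the script with DRY_RUN=0 performs the copies instead of printing them.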
6. Start the historyserver on master with the following command:
/usr/local/src/hadoop-2.6.5/sbin/mr-jobhistory-daemon.sh start historyserver
7. You can now open http://master:19888 to view the page.
However, clicking a log link there produces the error "Aggregation function is not enabled". To see the log of every Map and Reduce task, log aggregation must also be set to true in yarn-site.xml:
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<description>Configuration to enable or disable log aggregation</description>
</property>
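Then sync yarn-site.xml to all nodes and restart the cluster. Clicking the logs link again now shows the log of each task, including everything written by the loggers.
One question remains: where are these log files stored? The answer is in yarn-site.xml, which gives the location of the MapReduce task logs: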
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/logs</value>
<description>HDFS directory where the application logs are moved on application completion. Need to set appropriate permissions. Only applicable if log-aggregation is enabled. The default value is "/logs" or "/tmp/logs" </description>
</property>
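This makes clear that the logs are stored in HDFS, not on the Linux file system. Under hdfs://namenode/logs/hadoop/logs there is one log folder per application. Each application folder here contains two files; YARN writes one aggregated file per NodeManager that ran containers for the application, and in this run they correspond to the Map task and the Reduce task.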
[hadoop@SXV2V999 ~]$ hdfs dfs -ls hdfs://namenode/logs/hadoop/logs/application_1430285399789_0001
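The path listed above follows the Hadoop 2.x layout: the remote log directory, then the submitting user, then a suffix (yarn.nodemanager.remote-app-log-dir-suffix, "logs" by default), then the application ID. A minimal sketch that assembles it, using a hypothetical helper name aggregated_log_dir:

```shell
#!/bin/sh
# Hypothetical helper: build the HDFS directory that holds an application's
# aggregated logs from: remote log dir, user, suffix, application ID.
aggregated_log_dir() {
    printf '%s/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

aggregated_log_dir /logs hadoop logs application_1430285399789_0001
# prints /logs/hadoop/logs/application_1430285399789_0001

# The yarn CLI can print the same aggregated logs without browsing HDFS:
#   yarn logs -applicationId application_1430285399789_0001
```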