[2019-06-28] YARN service NodeManager failure

Problem description

The YARN service reported a fault. The service management page showed one NodeManager in an abnormal state.

Analysis

1. First, check the start/stop log. The NodeManager was stopped with stop type HEATH_CHECK_STOP (spelled this way in the log).

2019-06-19 15:13:29 | INFO  | PID-16052  | start to stop nodemanager | yarn-start-stop.sh
2019-06-19 15:13:29 | INFO  | PID-16052  | stop type: HEATH_CHECK_STOP. | yarn-start-stop.sh

2. Next, check the NodeManager runtime log. It consists entirely of "delete app log dir" messages until RECEIVED SIGNAL 15 arrives and the process is killed.

2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11333211_DEL_1559995833078 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_10625928_DEL_1559995968558 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_12077960_DEL_1560384533291 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11315373_DEL_1559996652333 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11035836_DEL_1559996652333 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11127905_DEL_1559996105413 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11274943_DEL_1559996241710 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1547246203054_1961416_DEL_1550657777851 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1547246203054_2114204_DEL_1550657777851 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_10699626_DEL_1559996379568 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11842934_DEL_1560261203480 | ResourceLocalizationService.java:1474
2019-06-19 15:13:29,650 | ERROR | SIGTERM handler | RECEIVED SIGNAL 15: SIGTERM | LogAdapter.java:69
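
To gauge how much cleanup work preceded the kill, the log can be summarized with a short script. This is only a minimal sketch: it assumes the pipe-delimited layout shown in the excerpt above, and the function name summarize_nm_log is made up for illustration.

```python
from datetime import datetime


def summarize_nm_log(lines):
    """Count cleanup messages and locate the SIGTERM entry in NodeManager log lines.

    Assumes the layout seen in the excerpt: timestamp | level | thread | message | source
    """
    delete_count = 0
    sigterm_at = None
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 4:
            continue  # not a log record in the expected format
        timestamp, _level, _thread, message = fields[:4]
        if message.startswith("delete app log dir"):
            delete_count += 1
        elif "RECEIVED SIGNAL 15" in message:
            sigterm_at = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S,%f")
    return delete_count, sigterm_at


# Two lines copied from the excerpt above.
sample = [
    "2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,"
    "application_1550654406365_11333211_DEL_1559995833078 | ResourceLocalizationService.java:1474",
    "2019-06-19 15:13:29,650 | ERROR | SIGTERM handler | RECEIVED SIGNAL 15: SIGTERM | LogAdapter.java:69",
]
count, stopped_at = summarize_nm_log(sample)
print(count, stopped_at)
```

Run against the full log file (for example by passing an open file object), a large count that is still growing when the SIGTERM entry appears would support the conclusion that startup cleanup was still in progress when the process was stopped.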

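3. Based on the above, the NodeManager was starting normally, but during startup it had to clean up the information of a large number of applications. Because the cleanup had not finished in time, the health check failed and the process was restarted.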

Solution

1. First, manually clean up the NodeManager logs: rm -rf /srv/BigData/hadoop/data*/nm
2. Restart the NodeManager.
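
Since rm -rf on a glob is irreversible, it may be worth checking what the pattern matches before running it. The sketch below only lists the matched directories and how many top-level entries each holds. The function name list_nm_dirs is hypothetical, and the default pattern is the path from step 1, which may differ on other clusters.

```python
import glob
import os


def list_nm_dirs(pattern="/srv/BigData/hadoop/data*/nm"):
    """Return (path, entry_count) for each directory matched by the pattern.

    Read-only: nothing is deleted or modified.
    """
    results = []
    for path in sorted(glob.glob(pattern)):
        if os.path.isdir(path):
            results.append((path, len(os.listdir(path))))
    return results


if __name__ == "__main__":
    for path, entries in list_nm_dirs():
        print(f"{path}: {entries} top-level entries")
```

If the listing shows only the expected NodeManager directories, the rm -rf command from step 1 can then be run on the affected node.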
