Troubleshooting missing and corrupt HDFS blocks

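Ambari showed an HDFS alert, [alert]: Total Blocks:[*], Missing Blocks:[*], which meant some file blocks were missing or corrupt. HDFS also failed to start, and the log kept repeating the following message: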

Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://test01.bigdata.hbh:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.

The NameNode stays in safe mode:

[root@test01 ~]# sudo -u hdfs hdfs dfsadmin -fs hdfs://test01.bigdata.hbh:8020 -safemode get
Safe mode is ON
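
The check that Ambari keeps retrying can be reproduced by hand. Below is a minimal sketch, assuming the status text looks like the output above; `safemode_state` and `wait_for_safemode_off` are hypothetical helper names, not part of Hadoop.

```shell
# Classify the text printed by `hdfs dfsadmin -safemode get`.
# Reads the command output on stdin and prints ON, OFF, or UNKNOWN.
safemode_state() {
    out=$(cat)
    case "$out" in
        *"Safe mode is ON"*)  echo ON ;;
        *"Safe mode is OFF"*) echo OFF ;;
        *)                    echo UNKNOWN ;;
    esac
}

# Hypothetical helper: poll until safe mode is OFF or the attempts run out.
# $1 = command that prints the status text
# $2 = maximum number of attempts
# $3 = seconds to sleep between attempts
wait_for_safemode_off() {
    cmd=$1; tries=$2; pause=$3
    i=0
    while [ "$i" -lt "$tries" ]; do
        state=$(sh -c "$cmd" 2>/dev/null | safemode_state)
        [ "$state" = "OFF" ] && return 0
        i=$((i + 1))
        sleep "$pause"
    done
    return 1
}
```

On this cluster the call would be something like `wait_for_safemode_off "sudo -u hdfs hdfs dfsadmin -fs hdfs://test01.bigdata.hbh:8020 -safemode get" 30 10`. It only reports the state; it does nothing about the missing blocks that keep the NameNode in safe mode.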

The NameNode UI shows the following message:

Safe mode is ON. The reported blocks 4156 needs additional 2 blocks to reach the threshold 1.0000 of total blocks 4157. The number of live datanodes 4 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.

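This means the share of blocks reported by the DataNodes is below the safe mode threshold configured in HDFS (dfs.namenode.safemode.threshold-pct). The screenshot below is from Ambari's configuration page: the threshold is set to 100%, so not a single block may be missing. Lowering it to 99% should stop a handful of missing blocks from holding the NameNode in safe mode.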


[Screenshot: Ambari HDFS configuration page showing the safe mode threshold set to 100%]
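
The 99% claim can be checked with a little arithmetic on the numbers from the NameNode UI. This is a rough sketch that assumes the NameNode requires floor(total blocks × threshold) reported blocks before leaving safe mode; `safemode_verdict` is a hypothetical helper name.

```shell
# Would the NameNode stay in safe mode for a given threshold fraction?
# $1 = total blocks, $2 = reported blocks, $3 = threshold (e.g. 1.0 or 0.99)
# Assumption: required = floor(total * threshold), and safe mode holds
# while reported < required.
safemode_verdict() {
    awk -v total="$1" -v reported="$2" -v pct="$3" 'BEGIN {
        required = int(total * pct)
        verdict = (reported < required) ? "stay" : "exit"
        printf "required=%d reported=%d verdict=%s\n", required, reported, verdict
    }'
}

safemode_verdict 4157 4156 1.0    # required=4157 reported=4156 verdict=stay
safemode_verdict 4157 4156 0.99   # required=4115 reported=4156 verdict=exit
```

Under that assumption, a threshold of 1.0 leaves no slack at all, while 0.99 would tolerate about 42 unreported blocks on a cluster of this size.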

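Problem description: the disks on the test cluster are small, only a few tens of GB. An earlier benchmark run filled them up, which caused blocks to be lost. Since then the system has had trouble starting, and HDFS keeps reporting that it is in safe mode.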

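Background

What is safe mode

Safe mode is a read-only state of the NameNode: the namespace can be read, but changes are rejected, and no blocks are replicated or deleted.

What triggers safe mode

The NameNode enters safe mode at startup and leaves it only once the share of blocks reported by the DataNodes reaches the configured threshold.

Files that have lost some replicas

Check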

[hdfs@test01 ~]$ hadoop fsck /user/root/.staging/job_1515575016190_0003/job.jar -files -blocks -locations -racks
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://test01.bigdata.hbh:50070/fsck?ugi=hdfs&files=1&blocks=1&locations=1&racks=1&path=%2Fuser%2Froot%2F.staging%2Fjob_1515575016190_0003%2Fjob.jar
FSCK started by hdfs (auth:SIMPLE) from /172.16.201.200 for path /user/root/.staging/job_1515575016190_0003/job.jar at Fri Jan 26 16:11:15 CST 2018
/user/root/.staging/job_1515575016190_0003/job.jar 272019 bytes, 1 block(s):  Under replicated BP-1912246748-192.168.89.173-1513143837848:blk_1073751971_11222. Target Replicas is 10 but found 4 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
0. BP-1912246748-192.168.89.173-1513143837848:blk_1073751971_11222 len=272019 repl=4 [/default-rack/172.16.201.200:50010, /default-rack/172.16.201.201:50010, /default-rack/172.16.201.202:50010, /default-rack/172.16.201.204:50010]

Status: HEALTHY
 Total size:    272019 B
 Total dirs:    0
 Total files:   1
 Total symlinks:        0
 Total blocks (validated):  1 (avg. block size 272019 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   1 (100.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication: 4.0
 Corrupt blocks:        0
 Missing replicas:      6 (60.0 %)
 Number of data-nodes:      4
 Number of racks:       1
FSCK ended at Fri Jan 26 16:11:15 CST 2018 in 0 milliseconds


The filesystem under path '/user/root/.staging/job_1515575016190_0003/job.jar' is HEALTHY
[hdfs@test01 ~]$
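
This file is reported as HEALTHY even though it is under-replicated: it asks for 10 replicas, but the cluster has only 4 DataNodes, so the target cannot be met. The value 10 probably comes from the MapReduce setting for job submission files (mapreduce.client.submit.file.replication), though that is an assumption about this cluster. The sketch below pulls the numbers out of the fsck output; `replica_gap` is a hypothetical helper name.

```shell
# Read fsck output on stdin. For each under-replicated block, print
# "<target replicas> <live replicas> <missing replicas>".
replica_gap() {
    sed -n 's/.*Target Replicas is \([0-9][0-9]*\) but found \([0-9][0-9]*\) live replica.*/\1 \2/p' |
        awk '{ print $1, $2, $1 - $2 }'
}
```

Fed the output above, for example with `hdfs fsck /user/root/.staging -files -blocks | replica_gap`, it would print `10 4 6` for this block, which matches the "Missing replicas: 6" line in the summary.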

Corrupt data

[hdfs@test01 ~]$ hadoop fsck /apps/hbase/data/oldWALs/test01.bigdata.hbh%2C16020%2C1515637923065.default.1515745933793 -files -blocks -locations -racks
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://test01.bigdata.hbh:50070/fsck?ugi=hdfs&files=1&blocks=1&locations=1&racks=1&path=%2Fapps%2Fhbase%2Fdata%2FoldWALs%2Ftest01.bigdata.hbh%252C16020%252C1515637923065.default.1515745933793
FSCK started by hdfs (auth:SIMPLE) from /172.16.201.200 for path /apps/hbase/data/oldWALs/test01.bigdata.hbh%2C16020%2C1515637923065.default.1515745933793 at Fri Jan 26 16:12:22 CST 2018
/apps/hbase/data/oldWALs/test01.bigdata.hbh%2C16020%2C1515637923065.default.1515745933793 91 bytes, 1 block(s):
/apps/hbase/data/oldWALs/test01.bigdata.hbh%2C16020%2C1515637923065.default.1515745933793: CORRUPT blockpool BP-1912246748-192.168.89.173-1513143837848 block blk_1073753448
 MISSING 1 blocks of total size 91 B
0. BP-1912246748-192.168.89.173-1513143837848:blk_1073753448_12711 len=91 MISSING!

Status: CORRUPT
 Total size:    91 B
 Total dirs:    0
 Total files:   1
 Total symlinks:        0
 Total blocks (validated):  1 (avg. block size 91 B)
  ********************************
  UNDER MIN REPL'D BLOCKS:  1 (100.0 %)
  dfs.namenode.replication.min: 1
  CORRUPT FILES:    1
  MISSING BLOCKS:   1
  MISSING SIZE:     91 B
  CORRUPT BLOCKS:   1
  ********************************
 Minimally replicated blocks:   0 (0.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication: 0.0
 Corrupt blocks:        1
 Missing replicas:      0
 Number of data-nodes:      4
 Number of racks:       1
FSCK ended at Fri Jan 26 16:12:22 CST 2018 in 1 milliseconds


The filesystem under path '/apps/hbase/data/oldWALs/test01.bigdata.hbh%2C16020%2C1515637923065.default.1515745933793' is CORRUPT
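
The two reports differ in the lines that matter for safe mode: the overall status and the count of missing blocks. A small sketch for pulling just those two values out of an fsck report, assuming the layout shown above; `fsck_summary` is a hypothetical helper name.

```shell
# Read fsck output on stdin and print the overall status and the number of
# missing blocks. A healthy report has no "MISSING BLOCKS" line, so the
# count defaults to 0.
fsck_summary() {
    awk '
        /Status:/         { sub(/.*Status:[ \t]*/, ""); status = $0 }
        /MISSING BLOCKS:/ { missing = $NF }
        END               { printf "status=%s missing=%d\n", status, missing }
    '
}
```

The pattern matches `Status:` anywhere on the line, because fsck's progress dots can leave the status fused onto the end of the previous message when the output is filtered.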

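Fixing the problem

List the files with missing or corrupt blocks, delete one of them, then run fsck again to confirm that the count drops: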

[root@test01 ~]# sudo -u hdfs hdfs fsck /apps/hbase/data/oldWALs/ | egrep -v '^\.+$' | egrep -v '^$'
Connecting to namenode via http://test01.bigdata.hbh:50070/fsck?ugi=hdfs&path=%2Fapps%2Fhbase%2Fdata%2FoldWALs
FSCK started by hdfs (auth:SIMPLE) from /172.16.201.200 for path /apps/hbase/data/oldWALs at Thu Feb 08 09:57:58 CST 2018
/apps/hbase/data/oldWALs/test02.bigdata.hbh%2C16020%2C1515637922143..meta.1515745955950.meta: CORRUPT blockpool BP-1912246748-192.168.89.173-1513143837848 block blk_1073753450
/apps/hbase/data/oldWALs/test02.bigdata.hbh%2C16020%2C1515637922143..meta.1515745955950.meta: MISSING 1 blocks of total size 91 B..
/apps/hbase/data/oldWALs/test05.bigdata.hbh%2C16020%2C1515637921765.default.1515745929606: CORRUPT blockpool BP-1912246748-192.168.89.173-1513143837848 block blk_1073753446
/apps/hbase/data/oldWALs/test05.bigdata.hbh%2C16020%2C1515637921765.default.1515745929606: MISSING 1 blocks of total size 91 B.Status: CORRUPT
 Total size:    182 B
 Total dirs:    1
 Total files:   2
 Total symlinks:        0
 Total blocks (validated):  2 (avg. block size 91 B)
  ********************************
  UNDER MIN REPL'D BLOCKS:  2 (100.0 %)
  dfs.namenode.replication.min: 1
  CORRUPT FILES:    2
  MISSING BLOCKS:   2
  MISSING SIZE:     182 B
  CORRUPT BLOCKS:   2
  ********************************
 Minimally replicated blocks:   0 (0.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication: 0.0
 Corrupt blocks:        2
 Missing replicas:      0
 Number of data-nodes:      4
 Number of racks:       1
FSCK ended at Thu Feb 08 09:57:58 CST 2018 in 1 milliseconds
The filesystem under path '/apps/hbase/data/oldWALs' is CORRUPT
[root@test01 ~]# sudo -u hdfs hadoop fs -rm /apps/hbase/data/oldWALs/test05.bigdata.hbh%2C16020%2C1515637921765.default.1515745929606
18/02/08 09:58:17 INFO fs.TrashPolicyDefault: Moved: 'hdfs://test01.bigdata.hbh:8020/apps/hbase/data/oldWALs/test05.bigdata.hbh%2C16020%2C1515637921765.default.1515745929606' to trash at: hdfs://test01.bigdata.hbh:8020/user/hdfs/.Trash/Current/apps/hbase/data/oldWALs/test05.bigdata.hbh%2C16020%2C1515637921765.default.1515745929606
[root@test01 ~]# sudo -u hdfs hdfs fsck /apps/hbase/data/oldWALs/ | egrep -v '^\.+$' | egrep -v '^$'
Connecting to namenode via http://test01.bigdata.hbh:50070/fsck?ugi=hdfs&path=%2Fapps%2Fhbase%2Fdata%2FoldWALs
FSCK started by hdfs (auth:SIMPLE) from /172.16.201.200 for path /apps/hbase/data/oldWALs at Thu Feb 08 09:58:24 CST 2018
/apps/hbase/data/oldWALs/test02.bigdata.hbh%2C16020%2C1515637922143..meta.1515745955950.meta: CORRUPT blockpool BP-1912246748-192.168.89.173-1513143837848 block blk_1073753450
/apps/hbase/data/oldWALs/test02.bigdata.hbh%2C16020%2C1515637922143..meta.1515745955950.meta: MISSING 1 blocks of total size 91 B.Status: CORRUPT
 Total size:    91 B
 Total dirs:    1
 Total files:   1
 Total symlinks:        0
 Total blocks (validated):  1 (avg. block size 91 B)
  ********************************
  UNDER MIN REPL'D BLOCKS:  1 (100.0 %)
  dfs.namenode.replication.min: 1
  CORRUPT FILES:    1
  MISSING BLOCKS:   1
  MISSING SIZE:     91 B
  CORRUPT BLOCKS:   1
  ********************************
 Minimally replicated blocks:   0 (0.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication: 0.0
 Corrupt blocks:        1
 Missing replicas:      0
 Number of data-nodes:      4
 Number of racks:       1
FSCK ended at Thu Feb 08 09:58:24 CST 2018 in 1 milliseconds
The filesystem under path '/apps/hbase/data/oldWALs' is CORRUPT
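
With more than a couple of corrupt files, picking the paths out of the fsck output by eye gets tedious. A minimal sketch that extracts them, assuming the "CORRUPT blockpool" line format shown above; `corrupt_paths` is a hypothetical helper name.

```shell
# Read `hdfs fsck <path>` output on stdin and print each corrupt file path once.
corrupt_paths() {
    sed -n 's/: CORRUPT blockpool .*$//p' | sort -u
}
```

A possible use is `sudo -u hdfs hdfs fsck /apps/hbase/data/oldWALs | corrupt_paths`, reviewing the list before removing anything. Note that the plain `hadoop fs -rm` above moved the file into `.Trash` rather than deleting it, so its missing block may still be counted cluster-wide until the trash is emptied; `-rm -skipTrash` avoids that. Deleting was acceptable here because these are old HBase WAL files on a test cluster.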