HDFS DataNode fails to start: FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid
# vi /var/log/hadoop-hdfs/hadoop-hdfs-datanode-chefserver.log
2020-06-14 01:41:15,857 INFO org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
2020-06-14 01:41:15,873 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /home/hadoop/tmp/dfs/data/in_use.lock acquired by nodename 15373@node2.hadoop.com
2020-06-14 01:41:15,877 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/home/hadoop/tmp/dfs/data/
java.io.IOException: Incompatible clusterIDs in /home/hadoop/tmp/dfs/data: namenode clusterID = CID-c301ae20-232d-4115-a475-bd70fcec69f4; datanode clusterID = CID-d9d3ee37-5414-4f39-89ff-14ba02f7b7ec
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:779)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:302)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:418)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:397)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:575)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1570)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1530)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:354)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:219)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:674)
at java.lang.Thread.run(Thread.java:748)
2020-06-14 01:41:15,889 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN, trace:
java.lang.Exception
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:190)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.hasBlockPoolId(BPOfferService.java:200)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shouldRetryInit(BPOfferService.java:799)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.shouldRetryInit(BPServiceActor.java:713)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:679)
at java.lang.Thread.run(Thread.java:748)
2020-06-14 01:41:15,890 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid 61c3b7eb-d387-4d7b-93ef-927043960018) service to node1.hadoop.com/172.26.37.245:8020. Exiting.
java.io.IOException: All specified directories are failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:576)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1570)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1530)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:354)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:219)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:674)
at java.lang.Thread.run(Thread.java:748)
2020-06-14 01:41:15,890 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid 61c3b7eb-d387-4d7b-93ef-927043960018) service to node1.hadoop.com/172.26.37.245:8020
2020-06-14 01:41:15,994 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN, trace:
java.lang.Exception
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:190)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.hasBlockPoolId(BPOfferService.java:200)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.remove(BlockPoolManager.java:91)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:1485)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:437)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:457)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:708)
at java.lang.Thread.run(Thread.java:748)
2020-06-14 01:41:15,995 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid 61c3b7eb-d387-4d7b-93ef-927043960018)
2020-06-14 01:41:15,997 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN, trace:
java.lang.Exception
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:190)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.hasBlockPoolId(BPOfferService.java:200)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:1486)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:437)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:457)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:708)
at java.lang.Thread.run(Thread.java:748)
2020-06-14 01:41:17,998 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2020-06-14 01:41:18,004 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2020-06-14 01:41:18,009 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at node2.hadoop.com/172.26.37.246
************************************************************/
1. System environment:
OS: CentOS Linux release 7.5.1804 (Core)
CPU: 2 cores
Memory: 1 GB
Running user: root
JDK version: 1.8.0_252
Hadoop version: CDH 5.16.2
2. Root cause
The NameNode was formatted more than once. Every `hdfs namenode -format` generates a fresh clusterID (and namespaceID) on the NameNode, while each DataNode keeps the clusterID it received at first startup in its data directory (under `current/VERSION`). After another format the two no longer match: on startup the DataNode finds an existing `current` directory whose clusterID differs from the NameNode's, refuses to load the storage directory ("Incompatible clusterIDs", as in the log above), and the DataNode process exits, so the cluster never starts completely. Deleting the DataNode's data directory (its `current` folder) resolves the problem, because the DataNode then re-registers and adopts the NameNode's current clusterID.
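The mismatch can be confirmed directly by comparing the clusterID recorded in each side's VERSION file. A minimal sketch; the paths in the usage comment are assumptions based on this article's data directory and should be adjusted to your `dfs.namenode.name.dir` / `dfs.datanode.data.dir` settings:

```shell
# get_cluster_id: print the clusterID recorded in a Hadoop VERSION file.
get_cluster_id() {
    grep '^clusterID=' "$1" | cut -d= -f2
}

# Usage (paths are assumptions -- adjust to your configured directories):
#   get_cluster_id /home/hadoop/tmp/dfs/name/current/VERSION   # on the NameNode
#   get_cluster_id /home/hadoop/tmp/dfs/data/current/VERSION   # on a DataNode
```

If the two printed IDs differ, you are hitting exactly the `Incompatible clusterIDs` error shown in the log.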
3. Resolution steps
Delete the data directory on each DataNode:
# cd /home/hadoop/tmp
# rm -rf *
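If there is any chance the old blocks are still needed, a slightly safer variant of the wipe above moves the directory aside instead of deleting it outright. The helper name is hypothetical:

```shell
# backup_and_reset: move a DataNode data directory aside and recreate it empty,
# so the old blocks can still be recovered (or removed later, once the cluster
# is healthy again).
backup_and_reset() {
    dir=$1
    mv "$dir" "${dir}.bak.$(date +%Y%m%d%H%M%S)"
    mkdir -p "$dir"
}

# e.g. backup_and_reset /home/hadoop/tmp
```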
Start the DataNode service on each DataNode (it will keep retrying registration until the NameNode comes back up):
# systemctl start hadoop-hdfs-datanode.service
Format HDFS on the NameNode:
# sudo -u hdfs hdfs namenode -format
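Wiping the data directory discards every block replica on that DataNode. When the NameNode has already been re-formatted and the DataNode's blocks should be preserved, an alternative (not the approach this article takes) is to copy the NameNode's new clusterID into the DataNode's VERSION file instead. A sketch with an invented helper name:

```shell
# sync_cluster_id: overwrite the clusterID in a DataNode's current/VERSION file
# with the clusterID the (re-formatted) NameNode now reports, then restart the
# DataNode service.
sync_cluster_id() {
    nn_cid=$1        # e.g. CID-c301ae20-232d-4115-a475-bd70fcec69f4 (from the log above)
    dn_version=$2    # e.g. /home/hadoop/tmp/dfs/data/current/VERSION (assumed path)
    sed -i "s/^clusterID=.*/clusterID=${nn_cid}/" "$dn_version"
}
```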
Start the NameNode service and check its status:
# systemctl start hadoop-hdfs-namenode
# systemctl status hadoop-hdfs-namenode
Test HDFS on the NameNode:
# sudo -u hdfs hadoop fs -mkdir /tmp        #### create the /tmp directory
# sudo -u hdfs hadoop fs -chmod -R 1777 /tmp #### set sticky-bit permissions
# sudo -u hdfs hadoop fs -ls /              #### list the root directory
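Finally, it is worth confirming that the DataNodes actually registered. `hdfs dfsadmin -report` prints one `Name: host:port` entry per reported DataNode; this hypothetical helper counts them from a saved report:

```shell
# count_datanodes: count DataNode entries in saved `hdfs dfsadmin -report`
# output (each reported DataNode section starts with a "Name: <ip>:<port>" line).
count_datanodes() {
    grep -c '^Name: ' "$1"
}

# On the NameNode:
#   sudo -u hdfs hdfs dfsadmin -report > /tmp/dfs-report.txt
#   count_datanodes /tmp/dfs-report.txt
```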