Solution: DataNode fails to start when bringing up a hadoop-2.6.3 cluster - 推酷 http://www.tuicool.com/articles/2y6B3uA
The Hadoop log reports the following errors:
2016-01-22 12:09:22,467 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/hadoop/tmp/dfs/data/in_use.lock acquired by nodename 13286@slave01
2016-01-22 12:09:22,469 WARN org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /data/hadoop/tmp/dfs/data: namenode clusterID = CID-ee797f67-7aa7-4ba5-8a13-ae174b6d0cf1; datanode clusterID = CID-9bd341cd-b887-49cb-b519-19772a3818c9
2016-01-22 12:09:22,470 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to master/192.168.1.120:9000. Exiting.
java.io.IOException: All specified directories are failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:478)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1338)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1304)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:226)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:867)
at java.lang.Thread.run(Thread.java:745)
2016-01-22 12:09:22,471 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to master/192.168.1.120:9000
2016-01-22 12:09:22,473 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
2016-01-22 12:09:24,474 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2016-01-22 12:09:24,476 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2016-01-22 12:09:24,477 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
The log shows that the datanode's clusterID does not match the namenode's clusterID.
Solution:
Following the path /data/hadoop/tmp/dfs from the log, on master you will find a name directory, and on each slave a data directory.
Copy the clusterID from name/current/VERSION (on master) into data/current/VERSION (on the slave), overwriting the old clusterID so the two values match.
Then restart the cluster; the DataNode process on the slave should come up.
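The steps above can be sketched as a small shell script. So the sketch runs anywhere, it builds mock VERSION files in a temp directory; aside from the two clusterIDs taken from the log, all other field values below are placeholders. On a real cluster the files live under /data/hadoop/tmp/dfs/name/current (master) and /data/hadoop/tmp/dfs/data/current (slave), and your paths may differ depending on dfs.namenode.name.dir / dfs.datanode.data.dir.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
mkdir -p "$work/name/current" "$work/data/current"

# Mock namenode VERSION (clusterID taken from the log; other fields are placeholders).
cat > "$work/name/current/VERSION" <<'EOF'
clusterID=CID-ee797f67-7aa7-4ba5-8a13-ae174b6d0cf1
storageType=NAME_NODE
EOF

# Mock datanode VERSION with the stale, mismatched clusterID from the log.
cat > "$work/data/current/VERSION" <<'EOF'
clusterID=CID-9bd341cd-b887-49cb-b519-19772a3818c9
storageType=DATA_NODE
EOF

# Read the namenode's clusterID and rewrite the datanode's VERSION to match.
nn_cid=$(grep '^clusterID=' "$work/name/current/VERSION" | cut -d= -f2)
sed "s/^clusterID=.*/clusterID=$nn_cid/" "$work/data/current/VERSION" \
    > "$work/data/current/VERSION.tmp"
mv "$work/data/current/VERSION.tmp" "$work/data/current/VERSION"

# The two files now agree on the clusterID.
grep '^clusterID=' "$work/data/current/VERSION"
```

On a real cluster, replace the mock paths with the actual name/current and data/current directories and run it on each affected slave (the namenode's VERSION can be copied over with scp first), then restart the DataNode.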
Cause of the problem: after the first format of dfs, the cluster was started and used; later the format command (hdfs namenode -format) was run again. Reformatting generates a new clusterID for the namenode, while the datanodes keep their old clusterID, so the two no longer match.
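If you ever have to reformat again, the mismatch can be avoided up front: hdfs namenode -format accepts a -clusterid option, so passing the existing clusterID (the one from the log above, in this cluster's case) lets the datanodes keep working. This is a command fragment for a real cluster, not something runnable here:

```shell
# Reformat the namenode but reuse the existing clusterID, so the
# datanodes' stored clusterID still matches (run on master only).
hdfs namenode -format -clusterid CID-ee797f67-7aa7-4ba5-8a13-ae174b6d0cf1
```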