1. The Hadoop data storage path
The core-site.xml file, located under $HADOOP_HOME/etc/hadoop, configures the Hadoop data storage path, which holds the data of the NameNode, DataNode, and JournalNode.
[root@hadoop001 hadoop]# ll
total 156
-rw-r--r-- 1 root root 4436 Jan 31 14:53 capacity-scheduler.xml
-rw-r--r-- 1 root root 1335 Jan 31 14:53 configuration.xsl
-rw-r--r-- 1 root root 318 Jan 31 14:53 container-executor.cfg
-rw-r--r-- 1 root root 1886 Feb 5 09:36 core-site.xml
-rw-r--r-- 1 root root 3589 Jan 31 14:53 hadoop-env.cmd
-rw-r--r-- 1 root root 4238 Jan 31 14:53 hadoop-env.sh
-rw-r--r-- 1 root root 2598 Jan 31 14:53 hadoop-metrics2.properties
-rw-r--r-- 1 root root 2490 Jan 31 14:53 hadoop-metrics.properties
-rw-r--r-- 1 root root 9683 Jan 31 14:53 hadoop-policy.xml
-rw-r--r-- 1 root root 3547 Feb 5 11:19 hdfs-site.xml
-rw-r--r-- 1 root root 1449 Jan 31 14:53 httpfs-env.sh
-rw-r--r-- 1 root root 1657 Jan 31 14:53 httpfs-log4j.properties
-rw-r--r-- 1 root root 21 Jan 31 14:53 httpfs-signature.secret
-rw-r--r-- 1 root root 620 Jan 31 14:53 httpfs-site.xml
-rw-r--r-- 1 root root 3518 Jan 31 14:53 kms-acls.xml
-rw-r--r-- 1 root root 1527 Jan 31 14:53 kms-env.sh
-rw-r--r-- 1 root root 1631 Jan 31 14:53 kms-log4j.properties
-rw-r--r-- 1 root root 5511 Jan 31 14:53 kms-site.xml
-rw-r--r-- 1 root root 11237 Jan 31 14:53 log4j.properties
-rw-r--r-- 1 root root 931 Jan 31 14:53 mapred-env.cmd
-rw-r--r-- 1 root root 1383 Jan 31 14:53 mapred-env.sh
-rw-r--r-- 1 root root 4113 Jan 31 14:53 mapred-queues.xml.template
-rw-r--r-- 1 root root 1479 Feb 2 10:46 mapred-site.xml
-rw------- 1 root root 3297 Feb 2 13:07 nohup.out
-rw-r--r-- 1 root root 30 Jan 31 14:53 slaves
-rw-r--r-- 1 root root 2316 Jan 31 14:53 ssl-client.xml.example
-rw-r--r-- 1 root root 2268 Jan 31 14:53 ssl-server.xml.example
-rw-r--r-- 1 root root 2191 Jan 31 14:53 yarn-env.cmd
-rw-r--r-- 1 root root 4567 Jan 31 14:53 yarn-env.sh
-rw-r--r-- 1 root root 2276 Feb 2 10:52 yarn-site.xml
2. Viewing the core-site.xml configuration
[root@hadoop001 hadoop]# cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- The nameservice name registered with ZooKeeper -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns</value>
</property>
<!-- The Hadoop data storage directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-2.7.3/data</value>
</property>
<!-- The ZooKeeper quorum connection string -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<!-- Hosts from which the root proxy user may act -->
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
<description>A value of * means any host. Specific hosts may also be listed, separated by commas. If hosts are listed, superuser proxying works only from those hosts; requests from any other host will fail.</description>
</property>
<!-- Groups whose users the root proxy user may impersonate -->
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
<description>A value of * means any group. Specific groups may also be listed, separated by commas. If groups are listed, only users in those groups can be impersonated by the superuser proxy.</description>
</property>
</configuration>
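Rather than eyeballing the XML, the `name`/`value` pairs of a Hadoop `*-site.xml` file can be read programmatically. A minimal sketch with Python's standard library, using a trimmed copy of the configuration shown above (the helper name `parse_hadoop_conf` is illustrative, not a Hadoop API):

```python
import xml.etree.ElementTree as ET

def parse_hadoop_conf(xml_text):
    """Parse Hadoop *-site.xml content into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
        if prop.findtext("name") is not None
    }

# A trimmed copy of the core-site.xml listed above.
SAMPLE = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://ns</value></property>
  <property><name>hadoop.tmp.dir</name><value>/opt/module/hadoop-2.7.3/data</value></property>
  <property><name>ha.zookeeper.quorum</name><value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value></property>
</configuration>"""

conf = parse_hadoop_conf(SAMPLE)
print(conf["fs.defaultFS"])      # hdfs://ns
print(conf["hadoop.tmp.dir"])    # /opt/module/hadoop-2.7.3/data
```

To read the file on a real node, pass the contents of `$HADOOP_HOME/etc/hadoop/core-site.xml` to the same function.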
3. Focus on the Hadoop data storage directory
<!-- The Hadoop data storage directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-2.7.3/data</value>
</property>
4. The layout of the Hadoop data directory
Next, let's look at what the data directory contains on each node of the cluster.
- hadoop001
[root@hadoop001 data]# tree -d -L 2 -C
.
├── hdfs
│ ├── data
│ └── name
├── journal
│ └── ns
└── nm-local-dir
├── filecache
├── nmPrivate
└── usercache
- hadoop002
[root@hadoop002 data]# tree -d -L 2 -C
.
├── hdfs
│ ├── data
│ └── name
├── journal
│ └── ns
└── nm-local-dir
├── filecache
├── nmPrivate
└── usercache
- hadoop003
[root@hadoop003 data]# tree -d -L 2 -C
.
├── hdfs
│ └── data
├── journal
│ └── ns
└── nm-local-dir
├── filecache
├── nmPrivate
└── usercache
8 directories
Clearly, hadoop001 and hadoop002 have identical layouts under data, while hadoop003 is missing the hdfs/name subtree. This is because hadoop001 and hadoop002 are the active and standby NameNodes, whereas hadoop003 is only a DataNode.
5. Exploring the NameNode's data files
Enter /opt/module/hadoop-2.7.3/data/hdfs/name:
- hadoop001
[root@hadoop001 name]# tree
.
├── current
│ ├── edits_0000000000000000001-0000000000000000002
│ ├── edits_0000000000000000003-0000000000000000004
│ ├── edits_0000000000000000005-0000000000000000006
│ ├── edits_0000000000000000007-0000000000000000008
│ ├── edits_0000000000000000009-0000000000000000010
│ ├── edits_0000000000000000011-0000000000000000012
│ ├── edits_0000000000000000013-0000000000000000014
│ ├── edits_0000000000000000015-0000000000000000016
│ ├── edits_0000000000000000017-0000000000000000018
│ ├── edits_0000000000000000019-0000000000000000020
│ ├── edits_0000000000000000021-0000000000000000022
│ ├── edits_0000000000000000023-0000000000000000024
│ ├── edits_0000000000000000025-0000000000000000026
│ ├── edits_0000000000000000027-0000000000000000028
│ ├── edits_0000000000000000029-0000000000000000030
│ ├── edits_0000000000000000031-0000000000000000032
│ ├── edits_0000000000000000033-0000000000000000034
│ ├── edits_0000000000000000035-0000000000000000036
│ ├── edits_0000000000000000037-0000000000000000038
│ ├── edits_0000000000000000039-0000000000000000040
│ ├── edits_0000000000000000041-0000000000000000042
│ ├── edits_0000000000000000043-0000000000000000044
│ ├── edits_0000000000000000045-0000000000000000046
│ ├── edits_0000000000000000047-0000000000000000048
│ ├── edits_0000000000000000049-0000000000000000050
│ ├── edits_0000000000000000051-0000000000000000052
│ ├── edits_0000000000000000053-0000000000000000054
│ ├── edits_0000000000000000055-0000000000000000056
│ ├── edits_0000000000000000057-0000000000000000058
│ ├── edits_0000000000000000059-0000000000000000060
│ ├── edits_0000000000000000061-0000000000000000062
│ ├── edits_0000000000000000063-0000000000000000064
│ ├── edits_0000000000000000065-0000000000000000066
│ ├── edits_0000000000000000067-0000000000000000068
│ ├── edits_0000000000000000069-0000000000000000070
│ ├── edits_0000000000000000071-0000000000000000072
│ ├── edits_0000000000000000073-0000000000000000074
│ ├── edits_0000000000000000075-0000000000000000076
│ ├── edits_0000000000000000077-0000000000000000078
│ ├── edits_0000000000000000079-0000000000000000080
│ ├── edits_0000000000000000081-0000000000000000082
│ ├── edits_0000000000000000083-0000000000000000084
│ ├── edits_0000000000000000085-0000000000000000086
│ ├── edits_0000000000000000087-0000000000000000088
│ ├── edits_0000000000000000089-0000000000000000090
│ ├── edits_0000000000000000091-0000000000000000092
│ ├── edits_0000000000000000093-0000000000000000094
│ ├── edits_0000000000000000095-0000000000000000096
│ ├── edits_0000000000000000097-0000000000000000098
│ ├── edits_0000000000000000099-0000000000000000100
│ ├── edits_0000000000000000101-0000000000000000102
│ ├── edits_0000000000000000103-0000000000000000104
│ ├── edits_0000000000000000105-0000000000000000106
│ ├── edits_0000000000000000109-0000000000000000110
│ ├── edits_0000000000000000111-0000000000000000112
│ ├── edits_0000000000000000113-0000000000000000114
│ ├── edits_0000000000000000115-0000000000000000116
│ ├── edits_0000000000000000117-0000000000000000118
│ ├── edits_0000000000000000119-0000000000000000120
│ ├── edits_0000000000000000121-0000000000000000122
│ ├── edits_0000000000000000123-0000000000000000124
│ ├── edits_0000000000000000125-0000000000000000126
│ ├── edits_0000000000000000127-0000000000000000128
│ ├── edits_0000000000000000129-0000000000000000130
│ ├── edits_0000000000000000131-0000000000000000132
│ ├── edits_0000000000000000133-0000000000000000134
│ ├── edits_0000000000000000135-0000000000000000136
│ ├── edits_0000000000000000137-0000000000000000138
│ ├── edits_0000000000000000139-0000000000000000140
│ ├── edits_0000000000000000141-0000000000000000142
│ ├── edits_0000000000000000143-0000000000000000144
│ ├── edits_0000000000000000145-0000000000000000146
│ ├── edits_0000000000000000147-0000000000000000148
│ ├── edits_0000000000000000149-0000000000000000150
│ ├── edits_0000000000000000151-0000000000000000152
│ ├── edits_0000000000000000153-0000000000000000154
│ ├── edits_0000000000000000155-0000000000000000156
│ ├── edits_0000000000000000157-0000000000000000158
│ ├── edits_0000000000000000159-0000000000000000160
│ ├── edits_0000000000000000161-0000000000000000162
│ ├── edits_inprogress_0000000000000000163
│ ├── fsimage_0000000000000000000
│ ├── fsimage_0000000000000000000.md5
│ ├── fsimage_0000000000000000052
│ ├── fsimage_0000000000000000052.md5
│ ├── seen_txid
│ └── VERSION
└── in_use.lock
1 directory, 88 files
- hadoop002
[root@hadoop002 name]# tree -C
.
├── current
│ ├── edits_0000000000000000107-0000000000000000108
│ ├── fsimage_0000000000000000000
│ ├── fsimage_0000000000000000000.md5
│ ├── fsimage_0000000000000000052
│ ├── fsimage_0000000000000000052.md5
│ ├── seen_txid
│ └── VERSION
└── in_use.lock
1 directory, 8 files
hadoop001 and hadoop002 form an active/standby pair. The /opt/module/hadoop-2.7.3/data/hdfs/name directory (hdfs/name under hadoop.tmp.dir) holds the three kinds of files needed for active/standby synchronization: edits, fsimage, and seen_txid.
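Each edit-log segment's file name encodes its start and end transaction IDs, so continuity can be checked mechanically. A sketch that parses the names and reports gaps (the helper names are illustrative); note that in the hadoop001 listing above, the segment for transactions 107-108 is absent and appears on hadoop002 instead:

```python
import re

# Matches names like edits_0000000000000000105-0000000000000000106
EDITS_RE = re.compile(r"edits_(\d+)-(\d+)$")

def edits_ranges(filenames):
    """Extract sorted (start_txid, end_txid) pairs from edit-log file names.
    Non-matching names (fsimage_*, edits_inprogress_*, VERSION) are skipped."""
    ranges = []
    for name in filenames:
        m = EDITS_RE.search(name)
        if m:
            ranges.append((int(m.group(1)), int(m.group(2))))
    return sorted(ranges)

def find_gaps(ranges):
    """Return (expected_start, actual_start) for each break in txid continuity."""
    gaps = []
    for (_, prev_end), (start, _) in zip(ranges, ranges[1:]):
        if start != prev_end + 1:
            gaps.append((prev_end + 1, start))
    return gaps

# Two consecutive segment names from the hadoop001 listing above.
names = [
    "edits_0000000000000000105-0000000000000000106",
    "edits_0000000000000000109-0000000000000000110",
]
print(find_gaps(edits_ranges(names)))   # [(107, 109)]: segment 107-108 is elsewhere
```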
6. Interpreting the NameNode's VERSION file
Besides the synchronization files above, there is also a VERSION file, which records key metadata written when the NameNode starts, restarts, or is reformatted.
- hadoop001
[root@hadoop001 current]# cat VERSION
#Sun Apr 17 09:48:48 CST 2022
namespaceID=1729410556
clusterID=CID-bdefd1cb-a53b-4300-bda2-39aaeae2abf6
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1679095799-192.168.5.101-1650160128452
layoutVersion=-63
- hadoop002
[root@hadoop002 current]# cat VERSION
#Sun Apr 17 10:51:09 CST 2022
namespaceID=1729410556
clusterID=CID-bdefd1cb-a53b-4300-bda2-39aaeae2abf6
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1679095799-192.168.5.101-1650160128452
layoutVersion=-63
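VERSION is a plain key=value file with a leading `#` timestamp comment, so it is easy to parse and compare between nodes. A minimal sketch (the function name is illustrative), fed the hadoop001 contents shown above:

```python
def parse_version(text):
    """Parse a Hadoop VERSION file (key=value lines, '#' comments) into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key] = value
    return props

# The hadoop001 NameNode VERSION contents from above.
NN1 = """#Sun Apr 17 09:48:48 CST 2022
namespaceID=1729410556
clusterID=CID-bdefd1cb-a53b-4300-bda2-39aaeae2abf6
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1679095799-192.168.5.101-1650160128452
layoutVersion=-63
"""

props = parse_version(NN1)
print(props["clusterID"])   # CID-bdefd1cb-a53b-4300-bda2-39aaeae2abf6
```

Comparing the dicts of hadoop001 and hadoop002 confirms they share namespaceID, clusterID, and blockpoolID, as an HA pair must.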
7. Viewing the DataNode's VERSION
Taking hadoop003 as an example:
[root@hadoop003 data]# tree -L 2
.
├── current
│ ├── BP-1601307948-192.168.5.101-1643618166217
│ ├── BP-1679095799-192.168.5.101-1650160128452
│ ├── BP-779737500-192.168.5.101-1648285297406
│ └── VERSION
└── in_use.lock
[root@hadoop003 current]# cat VERSION
#Sun Apr 17 11:32:18 CST 2022
storageID=DS-b2ab27c1-1402-4aaa-a869-f0dd49285a73
clusterID=CID-bdefd1cb-a53b-4300-bda2-39aaeae2abf6
cTime=0
datanodeUuid=43180dbd-f996-40cb-94e2-2d43d7da489a
storageType=DATA_NODE
layoutVersion=-56
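Notice that hadoop003 holds three BP-* directories, but only one matches the blockpoolID recorded in the NameNode's VERSION file; the other two are leftovers from earlier reformats. A sketch of spotting such stale block pools (the helper name is illustrative):

```python
def stale_blockpools(bp_dirs, active_blockpool_id):
    """Return block-pool directory names that do not match the NameNode's
    current blockpoolID, i.e. leftovers from earlier formats."""
    return [d for d in bp_dirs if d.startswith("BP-") and d != active_blockpool_id]

# Directory names from the hadoop003 listing above; the active ID comes
# from blockpoolID in the NameNode's VERSION file.
bp_dirs = [
    "BP-1601307948-192.168.5.101-1643618166217",
    "BP-1679095799-192.168.5.101-1650160128452",
    "BP-779737500-192.168.5.101-1648285297406",
]
active = "BP-1679095799-192.168.5.101-1650160128452"
print(stale_blockpools(bp_dirs, active))   # the two leftover block pools
```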
8. Viewing the JournalNode's VERSION
Every node in the cluster runs a JournalNode; taking hadoop001 as an example:
[root@hadoop001 current]# cat VERSION
#Sun Apr 17 09:48:48 CST 2022
namespaceID=1729410556
clusterID=CID-bdefd1cb-a53b-4300-bda2-39aaeae2abf6
cTime=0
storageType=JOURNAL_NODE
layoutVersion=-63
9. Key conclusions
- A high-availability (HA) cluster has two NameNodes, one in the active state and one in standby.
- After configuring the first NameNode, start it for the first time with: hadoop-daemon.sh start namenode
- Formatting the NameNode for the first time generates the cluster ID (clusterID): hdfs namenode -format
- On the other NameNode, run hdfs namenode -bootstrapStandby to synchronize the clusterID to the second NameNode.
- Reformatting the NameNode generates a new clusterID, while the daemons on the other nodes still hold the clusterID synchronized after the first format, so the clusterIDs no longer match.
The clusterID recorded in the VERSION files of the NameNode, DataNode, and JournalNode must be identical.
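The consistency rule above can be expressed as a small check over the clusterID read from each daemon's VERSION file (the function name and role labels are illustrative):

```python
def check_cluster_id(versions):
    """Given {daemon_role: clusterID}, return the shared clusterID, or raise
    ValueError if any daemon disagrees -- the symptom of reformatting the
    NameNode without cleaning the other nodes' data directories."""
    ids = set(versions.values())
    if len(ids) != 1:
        raise ValueError(f"clusterID mismatch: {versions}")
    return ids.pop()

# clusterIDs copied from the VERSION files shown above.
cid = "CID-bdefd1cb-a53b-4300-bda2-39aaeae2abf6"
print(check_cluster_id({"namenode": cid, "datanode": cid, "journalnode": cid}))
```

If this check raises, either resynchronize the standby with hdfs namenode -bootstrapStandby or clean the stale data directories before restarting, so all daemons agree on one clusterID.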