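Background
The cluster has grown to more than a thousand nodes and stores roughly 40 PB of data, so a single namespace has become a performance bottleneck. The cluster therefore needs multiple namespaces, that is, a Federation setup, to relieve the single-namespace performance problem.
Cluster node planning
Three namespaces are configured in total: testCluster01, testCluster02, and testCluster03.
Server-side hdfs-site.xml configuration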
<property>
<name>dfs.nameservices</name>
<value>testCluster01,testCluster02,testCluster03,testClusterFed</value>
</property>
<property>
<name>dfs.ha.namenodes.testCluster01</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.ha.namenodes.testCluster02</name>
<value>nn3,nn4</value>
</property>
<property>
<name>dfs.ha.namenodes.testCluster03</name>
<value>nn5,nn6</value>
</property>
<!-- set the full address and IPC port of each NameNode process -->
<property>
<name>dfs.namenode.rpc-address.testCluster01.nn1</name>
<value>10-11-09-222-test:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.testCluster01.nn1</name>
<value>10-11-09-222-test:8040</value>
</property>
<property>
<name>dfs.namenode.rpc-address.testCluster01.nn2</name>
<value>10-11-09-226-test:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.testCluster01.nn2</name>
<value>10-11-09-226-test:8040</value>
</property>
<property>
<name>dfs.namenode.rpc-address.testCluster02.nn3</name>
<value>10-11-09-223-test:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.testCluster02.nn3</name>
<value>10-11-09-223-test:8040</value>
</property>
<property>
<name>dfs.namenode.rpc-address.testCluster02.nn4</name>
<value>10-11-09-224-test:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.testCluster02.nn4</name>
<value>10-11-09-224-test:8040</value>
</property>
<property>
<name>dfs.namenode.rpc-address.testCluster03.nn5</name>
<value>10-11-09-225-test:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.testCluster03.nn5</name>
<value>10-11-09-225-test:8040</value>
</property>
<property>
<name>dfs.namenode.rpc-address.testCluster03.nn6</name>
<value>10-11-09-227-test:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.testCluster03.nn6</name>
<value>10-11-09-227-test:8040</value>
</property>
<!-- set the addresses for both NameNodes’ HTTP servers to listen on -->
<property>
<name>dfs.namenode.http-address.testCluster01.nn1</name>
<value>10-11-09-222-test:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.testCluster01.nn2</name>
<value>10-11-09-226-test:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.testCluster02.nn3</name>
<value>10-11-09-223-test:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.testCluster02.nn4</name>
<value>10-11-09-224-test:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.testCluster03.nn5</name>
<value>10-11-09-225-test:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.testCluster03.nn6</name>
<value>10-11-09-227-test:9870</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://10-11-09-222-test:8485;10-11-09-223-test:8485;10-11-09-224-test:8485;10-11-09-225-test:8485;10-11-09-226-test:8485/testCluster01</value>
</property>
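Note the value of dfs.namenode.shared.edits.dir: on the NameNode hosts of the other two namespaces, the journal ID at the end of the URI is testCluster02 and testCluster03 respectively. In core-site.xml, fs.defaultFS is set to hdfs://testCluster01, which makes testCluster01 the namespace accessed by default.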
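The per-namespace difference in dfs.namenode.shared.edits.dir can be sketched as a small shell helper. This is only an illustration: journal_uri is a hypothetical function name, and the JournalNode list is copied from the configuration above.

```shell
#!/bin/sh
# JournalNode quorum copied from the hdfs-site.xml shown above.
JOURNAL_NODES="10-11-09-222-test:8485;10-11-09-223-test:8485;10-11-09-224-test:8485;10-11-09-225-test:8485;10-11-09-226-test:8485"

# journal_uri (hypothetical helper): print the shared edits URI for one
# nameservice. Only the trailing journal ID changes between nameservices.
journal_uri() {
  printf 'qjournal://%s/%s\n' "$JOURNAL_NODES" "$1"
}

# Print the value each pair of NameNodes should carry in its own hdfs-site.xml.
for ns in testCluster01 testCluster02 testCluster03; do
  echo "$ns -> $(journal_uri "$ns")"
done
```

All three namespaces share the same JournalNode quorum; the journal ID keeps their edit logs separate.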
Server-side hdfs-rbf-site.xml configuration
<configuration>
<property>
<name>dfs.federation.router.default.nameserviceId</name>
<value>testCluster01</value>
</property>
<property>
<name>dfs.federation.router.default.nameservice.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.federation.router.rpc.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.federation.router.rpc-address</name>
<value>0.0.0.0:8888</value>
</property>
<property>
<name>dfs.federation.router.rpc-bind-host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>dfs.federation.router.handler.count</name>
<value>20</value>
</property>
<property>
<name>dfs.federation.router.handler.queue.size</name>
<value>200</value>
</property>
<property>
<name>dfs.federation.router.reader.count</name>
<value>5</value>
</property>
<property>
<name>dfs.federation.router.reader.queue.size</name>
<value>100</value>
</property>
<property>
<name>dfs.federation.router.connection.pool-size</name>
<value>6</value>
</property>
<property>
<name>dfs.federation.router.metrics.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.random.order</name>
<value>true</value>
</property>
<property>
<name>dfs.federation.router.file.resolver.client.class</name>
<value>org.apache.hadoop.hdfs.server.federation.resolver.MultipleDestinationMountTableResolver</value>
</property>
</configuration>
Starting the cluster services
- On one of the NameNode hosts of testCluster01, run hdfs zkfc -formatZK
- On both NameNode hosts of testCluster01, run hdfs --daemon start zkfc
- Repeat the two ZKFC steps above for testCluster02 and testCluster03
- On the hosts assigned as JournalNodes, run hdfs --daemon start journalnode
- On one of the NameNode hosts of testCluster01, run hdfs namenode -format, then start the NameNode process with hdfs --daemon start namenode
- On the other NameNode host of testCluster01, run hdfs namenode -bootstrapStandby, then start the NameNode process with hdfs --daemon start namenode
- On one of the NameNode hosts of testCluster02, run hdfs namenode -format -clusterId CID-71d6a554-2919-45aa-b315-d05282e2ad15, where CID-71d6a554-2919-45aa-b315-d05282e2ad15 is the clusterId generated when the first namespace was formatted, then start the NameNode process with hdfs --daemon start namenode
- On the other NameNode host of testCluster02, run hdfs namenode -bootstrapStandby, then start the NameNode process with hdfs --daemon start namenode
- Repeat the previous two steps for testCluster03
- Start the DataNode processes
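The NameNode formatting steps above can be sketched as a dry-run script. This is a minimal illustration, not a full deployment tool: run, read_cluster_id, format_first_namenode and bootstrap_standby_namenode are hypothetical helper names, and the script assumes the clusterId of the first namespace can be read from the VERSION file under that NameNode's dfs.namenode.name.dir/current directory. By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 on the actual NameNode host to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# read_cluster_id (hypothetical helper): extract the clusterID line from a
# NameNode VERSION file passed as the first argument.
read_cluster_id() {
  sed -n 's/^clusterID=//p' "$1"
}

# First NameNode of a nameservice. Pass the clusterId for every nameservice
# after the first, so that all of them join the same federated cluster.
format_first_namenode() {
  if [ -n "${1:-}" ]; then
    run hdfs namenode -format -clusterId "$1"
  else
    run hdfs namenode -format
  fi
  run hdfs --daemon start namenode
}

# Second (standby) NameNode of the same nameservice.
bootstrap_standby_namenode() {
  run hdfs namenode -bootstrapStandby
  run hdfs --daemon start namenode
}

# Example: what would run on the first NameNode of testCluster02.
format_first_namenode "CID-71d6a554-2919-45aa-b315-d05282e2ad15"
```

Reusing the clusterId is what ties the three namespaces into one federation; formatting a later namespace without it would create an unrelated cluster that the existing DataNodes cannot register with.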
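Starting the Router processes
On each planned Router node, run hdfs --daemon start dfsrouter to start the federation service. The Router RPC port is 8888 by default. Once the service is up, its web UI can be opened to confirm that the Router is running.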
Configuring mount points
hdfs dfsrouteradmin -add /testCluster01Root testCluster01 /
hdfs dfsrouteradmin -add /testCluster02Root testCluster02 /
hdfs dfsrouteradmin -add /testCluster03Root testCluster03 /
For the full command syntax, see the official Apache Hadoop documentation.
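A minimal sketch of the mount setup as a loop, assuming each nameservice root is mounted at /<nameservice>Root. The run and mount_point helpers are hypothetical names introduced here; the script prints the commands by default and finishes by listing the mount table with dfsrouteradmin -ls so the entries can be checked.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 on a host with Router admin access to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# mount_point (hypothetical helper): derive the mount path from the
# nameservice ID, following the /<nameservice>Root naming assumed here.
mount_point() {
  printf '/%sRoot\n' "$1"
}

# Mount the root of every nameservice under its own top-level directory.
for ns in testCluster01 testCluster02 testCluster03; do
  run hdfs dfsrouteradmin -add "$(mount_point "$ns")" "$ns" /
done

# List the mount table to verify the three entries.
run hdfs dfsrouteradmin -ls /
```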
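Client access
For a client to access the federated cluster through the Routers, adjust hdfs-site.xml as follows: add a new nameservice (testClusterFed) and configure it in HA mode. Note that dfs.namenode.rpc-address.testClusterFed.r1 (and likewise r2 through r6) points at a Router service rather than a NameNode, on port 8888.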
<property>
<name>dfs.nameservices</name>
<value>testCluster01,testCluster02,testCluster03,testClusterFed</value>
</property>
<property>
<name>dfs.ha.namenodes.testClusterFed</name>
<value>r1,r2,r3,r4,r5,r6</value>
</property>
<property>
<name>dfs.namenode.rpc-address.testClusterFed.r1</name>
<value>10-11-09-222-test:8888</value>
</property>
<property>
<name>dfs.namenode.rpc-address.testClusterFed.r2</name>
<value>10-11-09-223-test:8888</value>
</property>
<property>
<name>dfs.namenode.rpc-address.testClusterFed.r3</name>
<value>10-11-09-224-test:8888</value>
</property>
<property>
<name>dfs.namenode.rpc-address.testClusterFed.r4</name>
<value>10-11-09-225-test:8888</value>
</property>
<property>
<name>dfs.namenode.rpc-address.testClusterFed.r5</name>
<value>10-11-09-226-test:8888</value>
</property>
<property>
<name>dfs.namenode.rpc-address.testClusterFed.r6</name>
<value>10-11-09-227-test:8888</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.testClusterFed</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.client.failover.random.order</name>
<value>true</value>
</property>
The federated cluster can then be accessed with, for example, hadoop fs -ls hdfs://testClusterFed/testCluster01Root
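The six rpc-address entries for testClusterFed follow a single pattern, so they can be generated rather than typed by hand. The sketch below is only an illustration: emit_router_properties is a hypothetical helper, and the host list and port are copied from the client configuration above.

```shell
#!/bin/sh
# Router hosts and RPC port copied from the client-side hdfs-site.xml above.
ROUTER_HOSTS="10-11-09-222-test 10-11-09-223-test 10-11-09-224-test 10-11-09-225-test 10-11-09-226-test 10-11-09-227-test"
ROUTER_PORT=8888
FED_NS="testClusterFed"

# emit_router_properties (hypothetical helper): print one
# dfs.namenode.rpc-address.<nameservice>.rN property per Router host.
emit_router_properties() {
  i=1
  for host in $ROUTER_HOSTS; do
    printf '<property>\n<name>dfs.namenode.rpc-address.%s.r%d</name>\n<value>%s:%s</value>\n</property>\n' \
      "$FED_NS" "$i" "$host" "$ROUTER_PORT"
    i=$((i + 1))
  done
}

emit_router_properties
```

The generated IDs (r1 to r6) must match the list in dfs.ha.namenodes.testClusterFed, since the failover proxy provider looks up one rpc-address per listed ID.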