Hadoop Installation Tutorial: HA (High Availability) Mode


System Preparation

  • Three machines in total: hadoop-01, hadoop-02, hadoop-03
  • hadoop-01 runs the NameNode (active); hadoop-02 runs the NameNode (standby)
  • hadoop-02 runs the ResourceManager (active); hadoop-03 runs the ResourceManager (standby)
  • All three machines act as DataNode and NodeManager
  • All three machines act as JournalNode
  • Run yum install -y snappy on all three machines
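For the hostnames above to work, every machine must be able to resolve hadoop-01/02/03. A typical /etc/hosts fragment (the IP addresses below are placeholders for your own network):

```text
192.168.1.101  hadoop-01
192.168.1.102  hadoop-02
192.168.1.103  hadoop-03
```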

Hadoop Installation and Configuration

  • Installing Hadoop

    Upload the software package and extract it into the ~/app directory:
    tar -zxvf hadoop-{versionid}.tar.gz
    
  • Configuring Hadoop

    After installation, the following key configuration files need to be edited:
    ${HADOOP_HOME}/etc/hadoop/core-site.xml
    ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml
    ${HADOOP_HOME}/etc/hadoop/yarn-site.xml
    ${HADOOP_HOME}/etc/hadoop/slaves
    [The data directory paths referenced in these configs must be created manually in advance.]
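The bracketed warning deserves emphasis: the data directories referenced in the configs below must exist before the daemons start. A sketch that pre-creates them, rooted in a temporary directory so it runs anywhere (on a real host you would use your actual data root in place of $base):

```shell
set -e
base=$(mktemp -d)   # stands in for the real data root on each host

# Directories referenced by core-site.xml / hdfs-site.xml / yarn-site.xml below
mkdir -p "$base/hadoop/tmp" \
         "$base/hadoop/data/namenode" \
         "$base/hadoop/data/datanode" \
         "$base/data/ha/journal" \
         "$base/yarn/yarnlocal" \
         "$base/yarn/yarnlog" \
         "$base/yarn/remote-app-logs"
```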
    

For the core-site.xml file:

      <configuration>
    
      <!-- #################### need modify #################### -->
    
      <property>
      <name>hadoop.tmp.dir</name>
      <value>/path/to/hadoop/tmp</value>
      <description>A base for other temporary directories.</description>
      </property>
    
      <!-- #################### need modify #################### -->
    
      <property>
      <name>fs.defaultFS</name>
      <value>hdfs://hyz</value>  <!-- used in HA mode -->
      </property>
    
      <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
      </property>
    
      <property>
      <name>ipc.client.connect.timeout</name>
      <value>120000</value>
      </property>
    
      <property>
      <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
      <value>120000</value>
      </property>
    
      </configuration>
    

For the hdfs-site.xml file:
Hadoop officially provides two HDFS HA configurations: one based on QJM, and one based on NFS.

QJM: the Quorum Journal Manager. This approach shares the EditLog through a set of JournalNode processes, using a Paxos-like quorum protocol (ZooKeeper's ZAB protocol belongs to the same family of algorithms) to keep the EditLog consistent between the active NameNode and the standby NameNode.
NFS: Network File System, i.e. conventional shared storage. A network storage device (such as a NAS) is mounted on the servers; the active NameNode writes EditLog changes to the NFS mount, and the standby NameNode reads them in as soon as it detects a change, keeping the two NameNodes' data consistent.

Deployment: QJM only requires starting a few JournalNode processes, while NFS requires mounting a shared storage volume.
Configuration: the only difference in hdfs-site.xml is the value of the dfs.namenode.shared.edits.dir property.
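For illustration, the two variants of that single property look like this (the NFS mount point is a placeholder of my own; the QJM value matches the config used later in this article):

```xml
<!-- QJM: list every JournalNode -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://hadoop-01:8485;hadoop-02:8485;hadoop-03:8485/hyz</value>
</property>

<!-- NFS: point at a directory on the shared mount instead -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>file:///mnt/nfs/hadoop-ha/hyz</value>
</property>
```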

This article uses the QJM approach to configure HDFS HA; the hdfs-site.xml content is as follows:

 <configuration>

     <!-- #################### need modify #################### -->
     <property>
       <name>dfs.datanode.data.dir</name>
       <value>file:///path/to/hadoop/data/datanode</value>
     </property>
     <!-- #################### need modify #################### -->

     <!-- #################### namenode ha #################### -->
     <property>
       <name>dfs.nameservices</name>
       <value>hyz</value>
     </property>

     <property>
       <name>dfs.ha.namenodes.hyz</name>
       <value>nn1,nn2</value>
     </property>

     <property>
       <name>dfs.namenode.rpc-address.hyz.nn1</name>
       <value>hadoop-01:8020</value>
     </property>

     <property>
       <name>dfs.namenode.rpc-address.hyz.nn2</name>
       <value>hadoop-02:8020</value>
     </property>

     <property>
       <name>dfs.namenode.http-address.hyz.nn1</name>
       <value>hadoop-01:50070</value>
     </property>

     <property>
       <name>dfs.namenode.http-address.hyz.nn2</name>
       <value>hadoop-02:50070</value>
     </property>

     <property>
       <name>dfs.namenode.servicerpc-address.hyz.nn1</name>
       <value>hadoop-01:53310</value>
     </property>

     <property>
       <name>dfs.namenode.servicerpc-address.hyz.nn2</name>
       <value>hadoop-02:53310</value>
     </property>

     <property>
       <name>dfs.namenode.shared.edits.dir</name>
       <value>qjournal://hadoop-01:8485;hadoop-02:8485;hadoop-03:8485/hyz</value>
     </property>

     <property>
       <name>dfs.journalnode.edits.dir</name>
       <value>/path/to/data/ha/journal</value>
     </property>

     <property>
       <name>dfs.journalnode.rpc-address</name>
       <value>0.0.0.0:8485</value>
     </property>

     <property>
       <name>dfs.journalnode.http-address</name>
       <value>0.0.0.0:8482</value>
     </property>

     <property>
       <name>dfs.client.failover.proxy.provider.hyz</name>
       <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
     </property>

     <property>
       <name>dfs.ha.fencing.methods</name>
       <value>sshfence</value>
     </property>

     <property>
       <name>dfs.ha.fencing.ssh.private-key-files</name>
       <value>/path/to/.ssh/id_rsa</value>
     </property>

     <property>
       <name>dfs.ha.fencing.ssh.connect-timeout</name>
       <value>10000</value>
     </property>

     <property>
       <name>dfs.ha.automatic-failover.enabled</name>
       <value>true</value>
     </property>

     <property>
       <name>ha.zookeeper.quorum</name>
       <value>zkF01:2181,zkF02:2181,zkF03:2181</value>
     </property>
     <!-- #################### namenode ha #################### -->

     <!-- #################### namenode #################### -->
     <property>
       <name>dfs.namenode.name.dir</name>
       <value>file:///path/to/hadoop/data/namenode</value>
     </property>

     <property>
       <name>dfs.namenode.acls.enabled</name>
       <value>true</value>
     </property>

     <property>
       <name>dfs.namenode.handler.count</name>
       <value>50</value>
       <description>The number of server threads for the namenode.</description>
     </property>
     <!-- #################### namenode #################### -->

     <!-- #################### datanode #################### -->
     <property>
       <name>dfs.datanode.handler.count</name>
       <value>20</value>
       <description>The number of server threads for the datanode.</description>
     </property>

     <property>
       <name>dfs.datanode.max.xcievers</name>
       <value>8192</value>
     </property>

     <property>
       <name>dfs.datanode.socket.write.timeout</name>
       <value>480000</value>
     </property>

     <property>
       <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
       <value>true</value>
     </property>

     <property>
       <name>dfs.datanode.failed.volumes.tolerated</name>
       <value>0</value>
     </property>

     <!-- #################### datanode #################### -->
     <property>
       <name>dfs.permissions.enabled</name>
       <value>true</value>
       <description>If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.</description>
     </property>

     <property>
       <name>dfs.permissions</name>
       <value>false</value>
     </property>

     <property>
       <name>dfs.replication.min</name>
       <value>3</value>
       <description>Minimal block replication.</description>
     </property>

     <property>
       <name>dfs.replication</name>
       <value>3</value>
     </property>

     <property>
       <name>dfs.support.append</name>
       <value>true</value>
       <description>Whether HDFS allows appending to existing files.</description>
     </property>

     <property>
       <name>fs.checkpoint.period</name>
       <value>60</value>
       <description>The number of seconds between two periodic checkpoints.</description>
     </property>

     <property> 
       <name>dfs.balance.bandwidthPerSec</name> 
       <value>10485760</value> 
       <description>Specifies the maximum bandwidth that each datanode can utilize for the balancing purpose in term of the number of bytes per second.</description> 
     </property>                     

     <property>
       <name>fs.hdfs.impl.disable.cache</name>
       <value>false</value>
     </property>

     <property>
       <name>dfs.socket.timeout</name>
       <value>1800000</value>
     </property>

     <property>
       <name>fs.trash.interval</name>
       <value>1440</value>
       <description>Number of minutes between trash checkpoints. If zero, the trash feature is disabled.</description>
     </property>

     <property>
       <name>dfs.blocksize</name>
       <value>134217728</value>
     </property>

 </configuration>
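Several of the numeric values above are easier to audit when written as expressions; a quick check:

```shell
# dfs.blocksize: 128 MiB expressed in bytes
echo $((128 * 1024 * 1024))    # 134217728
# dfs.balance.bandwidthPerSec: 10 MiB/s expressed in bytes
echo $((10 * 1024 * 1024))     # 10485760
# fs.trash.interval: one day, expressed in minutes
echo $((24 * 60))              # 1440
```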

For the yarn-site.xml file:

 <configuration>
 
 <!-- #################### need modify #################### -->
 <property>
 <name>yarn.nodemanager.resource.cpu-vcores</name>
 <value>number of CPU cores on this node</value>
 </property>

 <property>
 <name>yarn.nodemanager.resource.memory-mb</name>
 <value>available memory in MB on this node</value>
 </property>

 <property>
 <name>yarn.nodemanager.local-dirs</name>
 <value>/path/to/yarn/yarnlocal</value>
 </property>

 <property>
 <name>yarn.nodemanager.log-dirs</name>
 <value>/path/to/yarn/yarnlog</value>
 </property>
 
 <property>
 <name>yarn.nodemanager.remote-app-log-dir</name>
 <value>/path/to/yarn/remote-app-logs</value>
 </property>

 <!-- #################### need modify #################### -->

 <!-- #################### resource manager ha #################### -->

 <property>
 <name>yarn.resourcemanager.ha.enabled</name>
 <value>true</value>
 </property>

 <property>
 <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
 <value>true</value>
 </property>

 <property>
 <name>yarn.resourcemanager.cluster-id</name>
 <value>yarn-cluster</value>
 </property>

 <property>
 <name>yarn.resourcemanager.ha.rm-ids</name>
 <value>rm1,rm2</value>
 </property>

 <property>
 <name>yarn.resourcemanager.ha.id</name>
 <value>rm1</value> <!-- change this to rm2 on the rm2 machine -->
 </property>

 <property>
 <name>yarn.resourcemanager.store.class</name>
 <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
 </property>

 <property>
 <name>yarn.resourcemanager.zk-address</name>
 <value>zkF01:2181,zkF02:2181,zkF03:2181</value>
 </property>

 <property>
 <name>yarn.resourcemanager.zk.state-store.address</name>
 <value>zkF01:2181,zkF02:2181,zkF03:2181</value>
 </property>

 <property>
 <name>ha.zookeeper.quorum</name>
 <value>zkF01:2181,zkF02:2181,zkF03:2181</value>
 </property>

 <property>
 <name>yarn.resourcemanager.recovery.enabled</name>
 <value>true</value>
 </property>

 <property>
 <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
 <value>5000</value>
 </property>

 <property>
 <name>mapreduce.jobhistory.address</name>
 <value>hadoop-02:10020</value>
 </property>

 <property>
 <name>mapreduce.jobhistory.webapp.address</name>
 <value>hadoop-02:19888</value>
 </property>

 <property>
 <name>yarn.log-aggregation-enable</name>
 <value>true</value>
 </property>

 <!-- RM1 configs -->

 <property>
 <name>yarn.resourcemanager.address.rm1</name>
 <value>hadoop-02:8032</value>
 </property>

 <property>
 <name>yarn.resourcemanager.scheduler.address.rm1</name>
 <value>hadoop-02:8030</value>
 </property>

 <property>
 <name>yarn.resourcemanager.webapp.address.rm1</name>
 <value>hadoop-02:8088</value>
 </property>

 <property>
 <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
 <value>hadoop-02:8031</value>
 </property>

 <property>
 <name>yarn.resourcemanager.admin.address.rm1</name>
 <value>hadoop-02:8033</value>
 </property>

 <property>
 <name>yarn.resourcemanager.ha.admin.address.rm1</name>
 <value>hadoop-02:23142</value>
 </property>

 <!-- RM2 configs -->

 <property>
 <name>yarn.resourcemanager.address.rm2</name>
 <value>hadoop-03:8032</value>
 </property>

 <property>
 <name>yarn.resourcemanager.scheduler.address.rm2</name>
 <value>hadoop-03:8030</value>
 </property>

 <property>
 <name>yarn.resourcemanager.webapp.address.rm2</name>
 <value>hadoop-03:8088</value>
 </property>

 <property>
 <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
 <value>hadoop-03:8031</value>
 </property>

 <property>
 <name>yarn.resourcemanager.admin.address.rm2</name>
 <value>hadoop-03:8033</value>
 </property>

 <property>
 <name>yarn.resourcemanager.ha.admin.address.rm2</name>
 <value>hadoop-03:23142</value>
 </property>

 <!-- #################### resource manager ha #################### -->

 <!-- #################### node manager #################### -->

 <!-- Node Manager Configs -->

 <property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
 </property>

 <property>
 <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
 <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>

 <property>
 <description>Address where the localizer IPC is.</description>
 <name>yarn.nodemanager.localizer.address</name>
 <value>0.0.0.0:8040</value>
 </property>

 <property>
 <description>NM Webapp address.</description>
 <name>yarn.nodemanager.webapp.address</name>
 <value>0.0.0.0:8042</value>
 </property>

 <property>
 <name>yarn.nodemanager.log.retain-seconds</name>
 <value>10800</value>
 </property>

 <property>
 <name>yarn.log-aggregation.retain-seconds</name>
 <value>43200</value>
 </property>

 <property>
 <name>yarn.log-aggregation.retain-check-interval-seconds</name>
 <value>7200</value>
 </property>

 <!-- #################### node manager #################### -->

 </configuration>
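One per-host detail in the file above: yarn.resourcemanager.ha.id must be rm1 on hadoop-02 and rm2 on hadoop-03. A sketch of flipping the value after copying the file to hadoop-03, shown against a stand-in file so it runs anywhere (the sed expression assumes the rm1 value appears exactly once):

```shell
set -e
f=$(mktemp)
cat > "$f" <<'EOF'
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value>
</property>
EOF

# On hadoop-03, flip rm1 -> rm2 in place
sed -i 's#<value>rm1</value>#<value>rm2</value>#' "$f"
grep '<value>' "$f"
```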

For the slaves file (one hostname per line):
hadoop-01
hadoop-02
hadoop-03
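Writing the slaves file can be scripted; this sketch writes to a temporary file so it runs anywhere (on the real cluster the target is ${HADOOP_HOME}/etc/hadoop/slaves):

```shell
set -e
slaves=$(mktemp)   # stand-in for ${HADOOP_HOME}/etc/hadoop/slaves
cat > "$slaves" <<'EOF'
hadoop-01
hadoop-02
hadoop-03
EOF
wc -l < "$slaves"   # 3
```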


Starting Hadoop

  • In ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh, manually set export JAVA_HOME=/opt/jdk (must be an absolute path)
  • On hadoop-01 (nn1)
    • ${HADOOP_HOME}/bin/hdfs zkfc -formatZK
  • On hadoop-01/hadoop-02/hadoop-03 (all JournalNode hosts)
    • ${HADOOP_HOME}/sbin/hadoop-daemon.sh start journalnode
  • On hadoop-01 (nn1)
    • ${HADOOP_HOME}/bin/hdfs namenode -format
    • ${HADOOP_HOME}/sbin/hadoop-daemon.sh start namenode
  • On hadoop-02 (nn2)
    • ${HADOOP_HOME}/bin/hdfs namenode -bootstrapStandby
    • ${HADOOP_HOME}/sbin/hadoop-daemon.sh start namenode
  • On hadoop-01
    • ${HADOOP_HOME}/sbin/hadoop-daemons.sh start datanode
  • On hadoop-01/hadoop-02
    • ${HADOOP_HOME}/sbin/hadoop-daemon.sh start zkfc
  • On hadoop-02/hadoop-03
    • ${HADOOP_HOME}/sbin/start-yarn.sh
  • On hadoop-02
    • ${HADOOP_HOME}/sbin/mr-jobhistory-daemon.sh start historyserver
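The JAVA_HOME edit from the first step above is a single line in hadoop-env.sh; the JDK location below is an example path, substitute your own:

```shell
# Line to set in ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/opt/jdk   # absolute path; no spaces around '='
```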

Checking Hadoop

  • Use the jps command to check the running processes
    [image: jps output on hadoop-01]

    [image: jps output on hadoop-02]

    [image: jps output on hadoop-03]
  • Open http://hadoop-01:50070 in a browser to check the NameNode status
    [image: NameNode status page]