013 Hadoop High Availability – NameNode Automatic Failover

Before Hadoop 2.0, the NameNode was a single point of failure (SPOF) in a Hadoop cluster. If the NameNode failed, the entire system stopped functioning, and an administrator had to intervene manually to bring the cluster back up with the help of the Secondary NameNode, which meant downtime for the whole cluster. Hadoop 2.0 introduced a single standby NameNode to make automatic failover possible, and Hadoop 3.0, which supports multiple standby NameNodes, makes the system more available still. In this tutorial, we will talk about Hadoop high availability. We will look at the types of failover and discuss in detail how ZooKeeper and its related components provide automatic failover.

Hadoop High Availability – Automatic Failover

1. What is Hadoop High Availability?

Hadoop 2.0 added support for running two NameNodes, one active and one passive standby, and Hadoop 3.0 allows more than one standby NameNode. The extra NameNode removes the SPOF (single point of failure): if the active NameNode fails, a standby takes over, and with automatic failover it does so without operator action. This is high availability in Hadoop.

i. What is Failover?

Failover is the process by which a system transfers control to a secondary system in the event of a failure.

There are two types of failover:

  • Graceful Failover – The administrator initiates this type of failover manually, typically for routine system maintenance. Control passes to the standby NameNode only when the administrator asks for it; it does not happen automatically.
  • Automatic Failover – The system transfers control to the standby NameNode without manual intervention. Without automatic failover, a NameNode crash leaves the cluster unavailable until an administrator steps in. Automatic failover is therefore what makes Hadoop high availability effective in practice; it acts as your insurance policy against a single point of failure.
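As a rough illustration of a graceful failover, the sketch below only builds and prints the `hdfs haadmin -failover` command an administrator would run; it does not execute anything. The NameNode IDs `nn1` and `nn2` and the helper name `graceful_failover_command` are assumptions made for this example, not values from the article.

```python
import shlex


def graceful_failover_command(current_active: str, target_standby: str) -> list[str]:
    """Build the argument list for an administrator-initiated failover."""
    # `hdfs haadmin -failover <from> <to>` asks HDFS to hand the active role
    # from the first NameNode ID to the second.
    return ["hdfs", "haadmin", "-failover", current_active, target_standby]


# "nn1" and "nn2" are hypothetical NameNode IDs; substitute the IDs that
# your own cluster configuration defines.
cmd = graceful_failover_command("nn1", "nn2")
print(shlex.join(cmd))
```

On a real cluster the printed command would be run from a shell by a user with HDFS administrator rights.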

2. NameNode High Availability in Hadoop

Automatic failover adds the following components to a Hadoop HDFS deployment:

  • ZooKeeper quorum.

  • ZKFailoverController process (ZKFC).

i. ZooKeeper Quorum

ZooKeeper is a centralized service for maintaining small amounts of data for coordination, configuration, and naming; the quorum is the group of ZooKeeper servers that runs this service. It provides group services and synchronization, notifies clients of changes to the data, and monitors clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for:

  • Failure detection – Each NameNode machine maintains a session in ZooKeeper. If the machine fails, its session expires and ZooKeeper notifies the other NameNodes so that they can start the failover process.
  • Active NameNode election – ZooKeeper provides a simple way to elect exactly one node as active. Whenever the current active NameNode fails, another NameNode can take an exclusive lock in ZooKeeper, indicating that it should become the next active NameNode.
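The two ZooKeeper roles above can be mimicked with a small in-memory model. This is a toy sketch, not the ZooKeeper client API: the class `ToyZooKeeper`, its method names, and the session names `nn1` and `nn2` are all invented for illustration. It assumes the lock behaves like an ephemeral znode that disappears when its owner's session expires.

```python
class ToyZooKeeper:
    """In-memory stand-in for the ZooKeeper features HDFS failover relies on."""

    def __init__(self):
        self.live_sessions = set()
        self.lock_owner = None   # session currently holding the lock
        self.watchers = []       # callbacks fired when the lock disappears

    def open_session(self, name):
        self.live_sessions.add(name)

    def try_lock(self, name):
        """Take the lock if it is free; only live sessions may hold it."""
        if name in self.live_sessions and self.lock_owner is None:
            self.lock_owner = name
            return True
        return False

    def watch_lock(self, callback):
        # Simplification: real ZooKeeper watches fire once; these persist.
        self.watchers.append(callback)

    def expire_session(self, name):
        """Simulate a crash: the session ends and its lock vanishes with it."""
        self.live_sessions.discard(name)
        if self.lock_owner == name:
            self.lock_owner = None
            for notify in list(self.watchers):
                notify()


zk = ToyZooKeeper()
zk.open_session("nn1")
zk.open_session("nn2")
first = zk.try_lock("nn1")    # nn1 takes the lock and is active
second = zk.try_lock("nn2")   # the lock is held, so nn2 stays standby

# When the lock disappears, the standby tries to take it.
zk.watch_lock(lambda: zk.try_lock("nn2"))
zk.expire_session("nn1")      # nn1 crashes and its session expires
print(zk.lock_owner)          # prints "nn2"
```

The point of the model is that failure detection and election share one mechanism: losing the session releases the lock, and releasing the lock is what triggers the next election.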

ii. ZKFailoverController (ZKFC)

ZKFC is a ZooKeeper client that also monitors and manages the state of the NameNode. Each machine that runs a NameNode also runs a ZKFC.

The ZKFC is responsible for:

Health Monitoring – The ZKFC periodically pings its local NameNode with a health-check command. If the NameNode does not respond in time, the ZKFC marks it as unhealthy. This can happen because the NameNode has crashed or frozen.
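A minimal sketch of a health check with a deadline, assuming the check is an ordinary Python callable. The function `is_healthy` and the three stand-in checks are hypothetical; the real ZKFC talks to the NameNode over RPC, which this example does not attempt to reproduce.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def is_healthy(health_check, timeout_s):
    """Return True only if health_check() returns True within timeout_s seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(health_check)
    try:
        return bool(future.result(timeout=timeout_s))
    except FutureTimeout:
        return False   # frozen or too slow: treat as unhealthy
    except Exception:
        return False   # the check itself failed, e.g. connection refused
    finally:
        pool.shutdown(wait=False)   # do not block on a hung check


def responsive():
    return True


def frozen():
    time.sleep(1.0)    # longer than the timeout used below
    return True


def crashed():
    raise ConnectionError("NameNode unreachable")


for check in (responsive, frozen, crashed):
    print(check.__name__, is_healthy(check, 0.2))
```

Treating a timeout the same as an error is a deliberate choice here: from the controller's point of view, a NameNode that cannot answer in time is no more useful than one that is down.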

ZooKeeper Session Management – While the local NameNode is healthy, the ZKFC keeps a session open in ZooKeeper. If the local NameNode is active, the ZKFC also holds a special lock znode. If the session expires, this lock znode is deleted automatically.

ZooKeeper-based Election – If the local NameNode is healthy and the ZKFC sees that no other node currently holds the lock znode, the ZKFC tries to acquire the lock itself. If it succeeds, it has won the election and becomes responsible for running a failover. The failover is similar to a manual failover: first the previously active node is fenced, if necessary, and then the local node becomes the active node.
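The ordering described above (win the lock, fence the previous active node, then activate the local node) can be sketched as follows. Everything here is a toy model: `ToyNameNode`, `ToyZKFC`, and the shared dictionary standing in for ZooKeeper are assumptions for illustration, and the sketch always fences the previous active node, whereas the text says fencing happens only if required.

```python
class ToyNameNode:
    def __init__(self, name):
        self.name = name
        self.state = "standby"
        self.healthy = True


class ToyZKFC:
    """One controller per NameNode, sharing a dict that stands in for ZooKeeper."""

    def __init__(self, local, shared, log):
        self.local = local
        self.shared = shared   # {"lock": owner name or None, "last_active": node or None}
        self.log = log

    def run_election(self):
        # Only a healthy NameNode may stand, and only if the lock is free.
        if not self.local.healthy or self.shared["lock"] is not None:
            return False
        self.shared["lock"] = self.local.name      # won the election
        previous = self.shared["last_active"]
        if previous is not None and previous is not self.local:
            previous.state = "fenced"              # fence before taking over
            self.log.append(f"fence {previous.name}")
        self.local.state = "active"
        self.shared["last_active"] = self.local
        self.log.append(f"activate {self.local.name}")
        return True


log = []
shared = {"lock": None, "last_active": None}
nn1, nn2 = ToyNameNode("nn1"), ToyNameNode("nn2")
zkfc1, zkfc2 = ToyZKFC(nn1, shared, log), ToyZKFC(nn2, shared, log)

zkfc1.run_election()          # nn1 wins the first election
lost = zkfc2.run_election()   # the lock is held, so nn2 stays standby

nn1.healthy = False           # nn1 freezes and its session expires ...
shared["lock"] = None         # ... so the lock znode is deleted
zkfc2.run_election()          # nn2 wins, fences nn1, then becomes active
print(log)                    # ['activate nn1', 'fence nn1', 'activate nn2']
```

Fencing comes before activation so that two NameNodes never act as active at the same time.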

3. Summary

In a typical deployment, ZooKeeper daemons are configured to run on three or five nodes. Because ZooKeeper has light resource requirements, it can run on the same nodes as the active and standby HDFS NameNodes. Many operators choose to deploy the third ZooKeeper process on the same node as the YARN ResourceManager. It is advisable to keep ZooKeeper data separate from HDFS metadata, that is, on different disks, for the best performance and isolation.
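One way to see why three or five nodes are the usual choices: a ZooKeeper ensemble stays available only while a strict majority of its servers is up. The short calculation below shows how many servers that majority requires and how many failures each ensemble size can absorb; the function names are made up for this example.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Servers that can fail while a majority still remains."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

A four-node ensemble tolerates no more failures than a three-node one, which is why odd sizes are preferred.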


https://data-flair.training/blogs/hadoop-high-availability
