022 HBase Compaction and Data Locality in Hadoop

1. HBase Compaction and Data Locality With Hadoop

In this Hadoop HBase tutorial on HBase Compaction and Data Locality with Hadoop, we will learn in detail the concept of Minor and Major Compaction in HBase, the process by which HBase cleans itself. We will also look at Data Locality with Hadoop, since data locality is the solution when data is not available to the Mapper on the same node.
So, let’s start HBase Compaction and Data Locality in Hadoop.

HBase Compaction and Data Locality in Hadoop

2. What is HBase Compaction?

As we know, HBase is a distributed data store optimized for read performance. Optimal read performance comes from having one file per column family. However, during heavy writes it is not always possible to maintain one file per column family. Hence, to reduce the maximum number of disk seeks needed for a read, HBase tries to combine all HFiles into one large HFile. This process is what we call compaction.
Do you know about HBase Architecture?
In other words, compaction in HBase is the process by which HBase cleans itself, and this process is of two types: Minor HBase Compaction and Major HBase Compaction.

a. HBase Minor Compaction

The process of combining a configurable number of smaller HFiles into one large HFile is what we call minor compaction. It is quite important because, without it, reading a particular row can require many disk reads and reduce overall read performance.
Here are the several steps involved in HBase Minor Compaction:

  1. It combines a number of smaller HFiles into one bigger HFile.
  2. Deleted entries (tombstones) are carried along into the new HFile rather than purged.
  3. Merging files frees up space to store more data.
  4. It uses merge sort, since the HFiles are already sorted.
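The merge-sort step above can be sketched in Python. This is a toy model, not HBase code: each "HFile" is assumed to be a sorted list of (row_key, value) pairs, and `heapq.merge` stands in for HBase's multi-way merge:

```python
import heapq

# Sketch: minor compaction as a multi-way merge of already-sorted HFiles.
# Each "HFile" is modeled as a sorted list of (row_key, value) pairs.

def minor_compact(hfiles):
    """Merge several sorted HFiles into one larger sorted HFile.

    Deleted entries (tombstones) are kept, as in a real minor compaction:
    only a major compaction purges them.
    """
    return list(heapq.merge(*hfiles, key=lambda kv: kv[0]))

hfile1 = [("row1", "a"), ("row3", "c")]
hfile2 = [("row2", "b"), ("row4", "TOMBSTONE")]  # tombstone survives the merge
merged = minor_compact([hfile1, hfile2])
print(merged)  # one HFile, sorted by row key, tombstone included
```

Because every input is already sorted, the merge is a single sequential pass, which is why compaction can rewrite large amounts of data without random I/O.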

HBase Compaction

b. HBase Major Compaction

The process of combining the StoreFiles of a region into a single StoreFile is what we call HBase Major Compaction. It also permanently removes deleted and expired versions. As a process, it merges all StoreFiles into a single StoreFile and, by default, runs every 24 hours. However, the region will split into new regions after compaction if the new, larger StoreFile is greater than a certain size (defined by the hbase.hregion.max.filesize property).
Have a look at HBase Commands
The HBase Major Compaction process, on the other hand, works as follows:

  1. The data present per column family in one region is accumulated into one HFile.
  2. All deleted or expired cells are removed permanently during this process.
  3. It increases the read performance of the newly created HFile.
  4. It involves a lot of disk I/O.
  5. It can cause traffic congestion on the network.
  6. The major compaction process is also known as the write amplification process.
  7. Hence, it should be scheduled for when network I/O demand is at its minimum.
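By contrast with minor compaction, the purge step of a major compaction can be sketched as follows. This is again a toy model (cells as (row_key, timestamp, value) tuples, TTL expiry reduced to a single cutoff), not HBase's actual implementation:

```python
import heapq

TOMBSTONE = object()  # marker object for a deleted cell

def major_compact(storefiles, now, ttl):
    """Merge all sorted StoreFiles into one, dropping deleted and expired cells.

    Each StoreFile is a sorted list of (row_key, timestamp, value) tuples;
    value is TOMBSTONE for a delete marker. Illustrative model only.
    """
    merged = heapq.merge(*storefiles, key=lambda cell: cell[0])
    return [
        (key, ts, val)
        for key, ts, val in merged
        if val is not TOMBSTONE and now - ts <= ttl  # purge deletes and expired cells
    ]

sf1 = [("row1", 100, "a"), ("row2", 90, TOMBSTONE)]
sf2 = [("row3", 10, "old"), ("row4", 95, "d")]
print(major_compact([sf1, sf2], now=100, ttl=50))
# row2 (deleted) and row3 (expired) are gone from the output
```

The rewrite of every cell into the new single StoreFile is what the "write amplification" name above refers to: one logical write may be rewritten many times over the life of the data.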

HBase Major Compaction

3. HBase Compaction Tuning

a. Short Description of HBase Compaction:

Now, to enhance the performance and stability of the HBase cluster, we can use some lesser-known HBase compaction configuration options like the ones below.

b. Disabling Automatic Major Compactions in HBase

Generally, HBase users ask for full control of major compaction events. The way to get it is to set hbase.hregion.majorcompaction to 0, which disables periodic automatic major compactions in HBase.
However, this does not offer 100% control over major compactions: HBase can still automatically promote minor compactions to major ones. Luckily, there is another configuration option that helps in this case.
Let's take a tour of HBase Operations.
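As a sketch, the setting goes into hbase-site.xml like this (the property name is the standard one; a restart or rolling restart of the RegionServers is assumed for it to take effect):

```xml
<!-- hbase-site.xml: disable time-based automatic major compactions -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
```

With periodic major compactions disabled, they can still be triggered manually, for example by running `major_compact 'table_name'` in the HBase shell from a scheduled job during off-peak hours.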

c. Maximum HBase Compaction Selection Size

Another option to control the compaction process in HBase is:
hbase.hstore.compaction.max.size (by default set to Long.MAX_VALUE)
A StoreFile larger than this size is excluded from compaction selection. In HBase 1.2+ we also have:
hbase.hstore.compaction.max.size.offpeak
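As an illustrative sketch of excluding very large StoreFiles from compaction selection in hbase-site.xml (the 8 GB value is an assumption for the example, not a recommendation):

```xml
<!-- hbase-site.xml: StoreFiles above this size are never selected for compaction -->
<property>
  <name>hbase.hstore.compaction.max.size</name>
  <value>8589934592</value> <!-- 8 GB, in bytes -->
</property>
```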

d. Off-peak Compactions in HBase

Further, if our deployment has off-peak hours, we can use the off-peak configuration settings.
Here are the HBase compaction configuration options that must be set to enable off-peak compaction:
hbase.offpeak.start.hour = 0..23
hbase.offpeak.end.hour = 0..23
The compaction file ratio is 5.0 by default for off-peak hours, versus 1.2 for peak hours. Both can be changed:
hbase.hstore.compaction.ratio
hbase.hstore.compaction.ratio.offpeak
The higher the file ratio value, the more aggressive (frequent) the compaction. So, for the majority of deployments, the default values are fine.
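Putting the options above together, an off-peak window might be configured like this in hbase-site.xml (the 11 pm to 5 am window and the ratio are illustrative assumptions):

```xml
<!-- hbase-site.xml: off-peak window with a more aggressive compaction ratio -->
<property>
  <name>hbase.offpeak.start.hour</name>
  <value>23</value>
</property>
<property>
  <name>hbase.offpeak.end.hour</name>
  <value>5</value>
</property>
<property>
  <name>hbase.hstore.compaction.ratio.offpeak</name>
  <value>5.0</value>
</property>
```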

4. Data Locality in Hadoop

As we know, in Hadoop, datasets are stored in HDFS. A dataset is divided into blocks and stored across the data nodes of a Hadoop cluster. When a MapReduce job is executed against the dataset, the individual Mappers process the blocks (input splits). When the data is not available to a Mapper on the same node, it has to be copied over the network from the data node that holds it to the data node executing the Mapper task. How close the data is to the Mapper processing it is what we call data locality in Hadoop.
You can learn more about Data Locality in Hadoop
In Hadoop, there are 3 categories of data locality:

Data Locality in Hadoop

1. Data Local Data Locality

Data-local data locality is when the data is located on the same node as the Mapper working on it. In this case, the data is in close proximity to the computation, which makes it the most preferable option.

2. Intra-Rack Data Locality

However, because of resource constraints, it is not always possible to execute the Mapper on the same node as the data. In that case, the Mapper executes on another node within the same rack as the node that holds the data. This is what we call intra-rack data locality.

3. Inter-Rack Data Locality

There are also cases where, because of resource constraints, neither data locality nor intra-rack locality can be achieved. The Mapper then executes on a node in a different rack, and the data is copied between racks from the node that holds it to the node executing the Mapper. This is what we call inter-rack data locality, and it is the least preferable option.
Let's learn the features and principles of Hadoop
So, this was all about HBase Compaction and Data Locality in Hadoop. Hope you like our explanation.
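The three locality levels above can be summarized as a small classifier. The node and rack names are hypothetical; a real scheduler gets this information from the cluster's rack-awareness topology:

```python
# Sketch: classify the data locality of a Mapper relative to the block it reads.
# Node/rack names are illustrative, not real Hadoop identifiers.

def classify_locality(data_node, data_rack, mapper_node, mapper_rack):
    """Return the Hadoop data-locality category for a Mapper/block pair."""
    if data_node == mapper_node:
        return "data-local"   # best: no network copy needed
    if data_rack == mapper_rack:
        return "intra-rack"   # copy stays within one rack's switch
    return "inter-rack"       # worst: copy crosses racks

print(classify_locality("node1", "rack1", "node1", "rack1"))  # data-local
print(classify_locality("node1", "rack1", "node2", "rack1"))  # intra-rack
print(classify_locality("node1", "rack1", "node3", "rack2"))  # inter-rack
```

The ordering of the checks mirrors the scheduler's preference: try the node with the data first, then any node in the same rack, and only then a node in another rack.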

5. Conclusion: HBase Compaction

Hence, in this Hadoop HBase tutorial on HBase Compaction and Data Locality, we have seen HBase's cleaning process, HBase Compaction. We have also seen, in detail, Apache Hadoop Data Locality, the solution to data not being available to the Mapper. Hope it helps! Please share your experience through comments on our HBase Compaction explanation.
See also –
HBase Performance Tuning
For reference

https://data-flair.training/blogs/hbase-compaction
