This is my first attempt at a blog series. I recently used HBase in a project, so this post explores HBase from a few angles as a quick introduction. Official site: https://hbase.apache.org/
What is HBase?
- Let's start with the official description:
Apache HBase is the Hadoop database, a distributed, scalable, big data store.
From this we can pull out a few keywords: Hadoop, database, distributed, scalable, big data.
Where did HBase come from?
Use Apache HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.
Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al.
Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
The passage above sums HBase up well:
- Random, real-time read/write access to big data
- Hosts very large tables: billions of rows by millions of columns
- An open-source, distributed, versioned, non-relational database
- Modeled after Google's Bigtable, built on top of Hadoop's HDFS file system
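The Bigtable model that the summary above refers to is, at its core, a sparse, sorted, versioned map keyed by (row key, column, timestamp). A toy Python sketch of that data model (purely illustrative; these class and method names are my own, not HBase's API):

```python
class MiniTable:
    """Toy illustration of the Bigtable/HBase data model:
    a sparse, sorted map of (row key, column, timestamp) -> value."""

    def __init__(self):
        # (row, column) -> list of (timestamp, value), newest first
        self._cells = {}

    def put(self, row, column, value, timestamp):
        versions = self._cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        # keep the newest version first, like HBase's versioned cells
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column):
        """Return the newest version of a cell (a default HBase get
        also returns the latest version)."""
        versions = self._cells.get((row, column))
        return versions[0][1] if versions else None

    def scan(self, start_row, stop_row):
        """Row keys are kept in sorted order, so range scans over
        [start_row, stop_row) are a natural, cheap operation."""
        for (row, column) in sorted(self._cells):
            if start_row <= row < stop_row:
                yield row, column, self._cells[(row, column)][0][1]
```

The sorted-by-row-key property is also what makes HBase's sharding work: a table is split into regions by contiguous row-key ranges.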
HBase features
1. Linear and modular scalability.
2. Strictly consistent reads and writes.
3. Automatic and configurable sharding of tables
4. Automatic failover support between RegionServers.
5. Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
6. Easy to use Java API for client access.
7. Block cache and Bloom Filters for real-time queries.
8. Query predicate push down via server side Filters
9. Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options
10. Extensible jruby-based (JIRB) shell
11. Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX
In other words:
- Linear and modular scalability
- Strictly consistent reads and writes
- Automatic, configurable sharding of tables
- Automatic failover between RegionServers
- Convenient base classes for backing Hadoop MapReduce jobs with HBase tables (not entirely clear to me yet)
- Easy-to-use Java API for client access
- Block cache and Bloom filters for real-time queries
- Query predicate push-down via server-side filters
- A Thrift gateway and a RESTful web service supporting XML, Protobuf, and binary data encodings
- An extensible JRuby-based (JIRB) shell
- Metrics export via the Hadoop metrics subsystem to files or Ganglia, or via JMX
I haven't fully understood the last four features yet.
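To make the "Bloom filters for real-time queries" feature more concrete: HBase can keep a Bloom filter per store file, so a read can skip files that definitely do not contain the requested row key. Below is a minimal Bloom filter sketch in Python; it illustrates the idea only and is nothing like HBase's actual, heavily optimized implementation:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: answers "definitely absent" or "maybe present".
    A lookup can never miss a key that was added (no false negatives),
    but may occasionally report a key that was not (false positives)."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, key):
        # derive several bit positions from one key via salted hashes
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # if any derived bit is unset, the key was definitely never added
        return all(self.bits[pos] for pos in self._positions(key))
```

The payoff is that a definite "absent" answer costs only a few bit lookups in memory, so a store file that cannot hold the row key is never read from disk at all.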