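Overview
This article walks through installing HBase in standalone mode. Standalone mode stores its data directly on the local file system, so there is no need to set up an HDFS service. HBase ships hbase-client for working with the database, but here we use Apache Phoenix, which lets us read and write HBase through SQL. Once HBase is running and the Phoenix plugin is installed, we build a small CRUD example against HBase using Spring JDBC and the Phoenix client.
Phoenix has a client part and a server part. It adds a SQL layer on top of HBase and supports JDBC: the client sends SQL through the Phoenix driver, the server part runs inside HBase as a plugin, and between them the SQL is turned into HBase operations that HBase executes.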
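Because Phoenix speaks JDBC, the client side can be tried with nothing but java.sql. Below is a minimal sketch, assuming the Phoenix client jar is on the classpath and HBase's ZooKeeper is reachable at the given host and port. The class PhoenixJdbcUrl and its two methods are names made up for this illustration, not part of Phoenix.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Phoenix or HBase. */
final class PhoenixJdbcUrl {

    private PhoenixJdbcUrl() {
    }

    /** Builds a URL of the form jdbc:phoenix:<zookeeper host>:<zookeeper port>. */
    static String build(String zkHost, int zkPort) {
        return "jdbc:phoenix:" + zkHost + ":" + zkPort;
    }

    /**
     * Lists table names through standard JDBC metadata.
     * Needs a running HBase with Phoenix installed; the driver is assumed
     * to be registered automatically once its jar is on the classpath.
     */
    static List<String> listTableNames(String url) throws SQLException {
        List<String> names = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url);
             ResultSet rs = conn.getMetaData().getTables(null, null, "%", null)) {
            while (rs.next()) {
                names.add(rs.getString("TABLE_NAME"));
            }
        }
        return names;
    }
}
```

Calling PhoenixJdbcUrl.listTableNames(PhoenixJdbcUrl.build("localhost", 2181)) against a working installation should list the Phoenix system tables plus any tables you have created.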
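About HBase
HBase is a go-to storage engine in the big data world. It is suited to very large data sets, for example user behavior data, the underlying storage of other big data platforms, and the data behind reports.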
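Installing and setting up the environment
A quick rant first: setting up a beginner HBase environment is a real ordeal, and it nearly made me give up.
hbase-2.3.7 + phoenix-hbase-2.3-5.1.2 simply would not work. HBase itself was fine, I could log in with the shell and run commands, but Phoenix was not: it hung at the sqlline.py connection step, after which HBase reported either Region in transition or ConnectionLoss for /hbase/hbaseid, and the only way out was to delete the data directory and restart.
In the end I fell back on a combination that other people online had reported as working, hbase-2.2.4 + phoenix-hbase-2.0-5.0.0, and that one succeeded.
In conf/hbase-env.sh, set the JAVA_HOME environment variable: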
export JAVA_HOME=/usr/java/jdk1.8.0_131/
Edit hbase-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- HBase data directory -->
<property>
<name>hbase.rootdir</name>
<value>file:///home/hbase-2.2.4/hbase</value>
</property>
<!-- ZooKeeper data directory -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hbase-2.2.4/zookeeper</value>
</property>
<property>
<name>hbase.master.ipc.address</name>
<value>0.0.0.0</value>
</property>
<property>
<name>hbase.regionserver.ipc.address</name>
<value>0.0.0.0</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
</configuration>
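At this point you can try starting it with ./bin/start-hbase.sh
Run hbase shell to enter the command line, then try list to view the tables, create 'test', 'cf' and describe 'test', followed by a few put, scan and get calls: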
hbase(main):001:0> list
TABLE
0 row(s)
Took 1.1635 seconds
=> []
hbase(main):013:0* create 'test', 'cf'
Created table test
Took 0.7584 seconds
=> Hbase::Table - test
hbase(main):014:0> list
TABLE
test
1 row(s)
Took 0.0299 seconds
=> ["test"]
hbase(main):019:0* describe 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION =>
'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s)
Quota is disabled
Took 0.3620 seconds
hbase(main):020:0> put 'test', 'row1', 'cf:a', 'value1'
Took 0.1434 seconds
hbase(main):021:0> put 'test', 'row2', 'cf:b', 'value2'
Took 0.0289 seconds
hbase(main):022:0> put 'test', 'row3', 'cf:c', 'value3'
Took 0.0142 seconds
hbase(main):023:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=2022-01-18T14:16:36.606, value=value1
row2 column=cf:b, timestamp=2022-01-18T14:16:49.123, value=value2
row3 column=cf:c, timestamp=2022-01-18T14:16:59.043, value=value3
3 row(s)
Took 0.0911 seconds
hbase(main):025:0* get 'test', 'row1'
COLUMN CELL
cf:a timestamp=2022-01-18T14:16:36.606, value=value1
1 row(s)
Took 0.0510 seconds
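One way to read the shell session above is as operations on a sorted, nested map: row key first, then column (family:qualifier), then value. The following is a toy in-memory sketch of that idea in plain Java. It is not the hbase-client API, the class ToyHBaseTable is invented for illustration, and it ignores timestamps, cell versions and HBase's byte-wise key ordering.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy in-memory model of an HBase table; hypothetical, for illustration only. */
final class ToyHBaseTable {

    // row key -> ("family:qualifier" -> value); both levels are kept sorted by key.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    /** Roughly: put 'test', 'row1', 'cf:a', 'value1' */
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** Roughly: get 'test', 'row1' (returns an empty map for a missing row). */
    Map<String, String> get(String rowKey) {
        NavigableMap<String, String> cells = rows.get(rowKey);
        if (cells == null) {
            return Collections.<String, String>emptyMap();
        }
        return Collections.unmodifiableMap(cells);
    }

    /** Roughly: scan 'test' (rows come back ordered by row key). */
    NavigableMap<String, NavigableMap<String, String>> scan() {
        return Collections.unmodifiableNavigableMap(rows);
    }
}
```

The point of the sketch is only the shape of the data: different rows may carry different columns (row1 has cf:a, row2 has cf:b), and a scan walks rows in key order.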
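The commands for disabling a table, enabling it, and dropping it (a table has to be disabled before it can be dropped):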
disable 'test'
enable 'test'
drop 'test'
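Then install Phoenix:
1. Copy the jars from the Phoenix package into HBase's lib directory, then restart HBase so they are loaded.
2. Copy hbase-site.xml into Phoenix's bin directory; the local Phoenix client used later needs it.
3. Add environment variables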
vim /etc/profile
# For Phoenix
export PHOENIX_HOME=/usr/phoenix-hbase-2.3-5.1.2-bin
export PHOENIX_CLASSPATH=$PHOENIX_HOME
export PATH=$PHOENIX_HOME/bin:$PATH
Run source /etc/profile to apply the changes.
Verify the installation with the sqlline.py script bundled with Phoenix, sqlline.py localhost:2181:
[root@VM_0_11_centos bin]# ./sqlline.py localhost:2181
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:localhost:2181 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:localhost:2181
22/01/18 17:14:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connected to: Phoenix (version 5.0)
Driver: PhoenixEmbeddedDriver (version 5.0)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
133/133 (100%) Done
Done
sqlline version 1.2.0
0: jdbc:phoenix:localhost:2181> !table
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION | INDEX_STATE | IMMU |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+
| | SYSTEM | CATALOG | SYSTEM TABLE | | | | | | fals |
| | SYSTEM | FUNCTION | SYSTEM TABLE | | | | | | fals |
| | SYSTEM | LOG | SYSTEM TABLE | | | | | | true |
| | SYSTEM | SEQUENCE | SYSTEM TABLE | | | | | | fals |
| | SYSTEM | STATS | SYSTEM TABLE | | | | | | fals |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+
Try a table operation:
0: jdbc:phoenix:localhost:2181> create table if not exists "staff"(
. . . . . . . . . . . . . . . > id varchar primary key,
. . . . . . . . . . . . . . . > name varchar,
. . . . . . . . . . . . . . . > age varchar);
No rows affected (1.28 seconds)
0: jdbc:phoenix:localhost:2181> !table
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION | INDEX_STATE | IMMU |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+
| | SYSTEM | CATALOG | SYSTEM TABLE | | | | | | fals |
| | SYSTEM | FUNCTION | SYSTEM TABLE | | | | | | fals |
| | SYSTEM | LOG | SYSTEM TABLE | | | | | | true |
| | SYSTEM | SEQUENCE | SYSTEM TABLE | | | | | | fals |
| | SYSTEM | STATS | SYSTEM TABLE | | | | | | fals |
| | | staff | TABLE | | | | | | fals |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+
Spring Boot integration
The dependency used is org.apache.phoenix:phoenix-core:5.0.0-HBase-2.0. Its slf4j binding conflicts with the one Spring Boot brings in, so exclude it:
plugins {
id 'org.springframework.boot' version '2.1.13.RELEASE'
id 'io.spring.dependency-management' version '1.0.9.RELEASE'
id 'java'
}
version = '0.0.1-SNAPSHOT'
sourceCompatibility = '1.8'
repositories {
mavenLocal()
maven { url 'http://maven.aliyun.com/nexus/content/groups/public/' }
//mavenCentral()
}
dependencies {
implementation 'org.springframework.boot:spring-boot-starter-web'
implementation 'org.springframework.boot:spring-boot-starter-jdbc'
testImplementation 'org.springframework.boot:spring-boot-starter-test'
implementation 'org.projectlombok:lombok:1.18.22'
annotationProcessor('org.projectlombok:lombok')
implementation group: 'com.alibaba', name: 'fastjson', version: '1.2.73'
implementation('org.apache.phoenix:phoenix-core:5.0.0-HBase-2.0') {
exclude group: 'org.slf4j'
}
}
Phoenix supports JDBC, so here we use Spring JDBC, that is JdbcTemplate, to do CRUD on HBase through Phoenix.
Data source configuration:
server.port=8080
spring.application.name=hbase-test
spring.datasource.driver-class-name=org.apache.phoenix.jdbc.PhoenixDriver
spring.datasource.name=phoenixDataSource
spring.datasource.url=jdbc:phoenix:122.xx.xxx.187:2181
Demo code:
Create the custem_user table when the application starts:
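import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.dao.DataAccessException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

@Slf4j
@Component
public class SystemInitRunner implements ApplicationRunner {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @Override
    public void run(ApplicationArguments args) throws Exception {
        log.info("Application starting...");
        initHBaseTables();
    }

    // Create the table through Phoenix; name and mobile go into the "basic" column family.
    public void initHBaseTables() {
        String sql = "CREATE TABLE IF NOT EXISTS \"custem_user\" ("
                + "\"uid\" VARCHAR primary key,"
                + "\"basic\".\"name\" VARCHAR,"
                + "\"basic\".\"mobile\" VARCHAR)";
        log.info("Executing HBase CREATE TABLE statement {}", sql);
        try {
            jdbcTemplate.execute(sql);
            log.info("HBase table custem_user created");
        } catch (DataAccessException e) {
            log.error("Failed to create HBase table custem_user: {}", e.getMessage());
            // Keep the original exception as the cause so the full stack trace survives.
            throw new RuntimeException(e);
        }
    }
}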
Insert and query endpoints for the custem_user table:
import com.alibaba.fastjson.JSON;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

@Slf4j
@RestController
@RequestMapping("/hbase")
public class HBaseTestController {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @RequestMapping(value = "/addUser", method = RequestMethod.POST)
    public void addUser(@RequestBody CustemUser user) {
        // Phoenix uses UPSERT for both inserting and updating a row.
        String sql = "upsert into \"custem_user\" values(?,?,?)";
        int ret = jdbcTemplate.update(sql, ps -> {
            ps.setString(1, user.getUid());
            ps.setString(2, user.getName());
            ps.setString(3, user.getMobile());
        });
        log.info("Upsert into HBase table custem_user finished, database returned {}", ret);
    }

    @RequestMapping(value = "/getUserByMobile", method = RequestMethod.GET)
    public CustemUser getUserByMobile(String mobile) {
        String sql = "select * from \"custem_user\" where \"basic\".\"mobile\" = ?";
        // queryForObject expects exactly one matching row.
        CustemUser user = jdbcTemplate.queryForObject(sql, new Object[] {mobile}, (rs, rowNum) -> {
            CustemUser u = new CustemUser();
            u.setUid(rs.getString(1));
            u.setName(rs.getString(2));
            u.setMobile(rs.getString(3));
            return u;
        });
        log.info("HBase user query result {}", JSON.toJSONString(user));
        return user;
    }
}
DTO object:
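import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;
import lombok.ToString;

@Setter
@Getter
@NoArgsConstructor
@ToString
public class CustemUser {
    private String uid;
    private String name;
    private String mobile;
}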
Testing with Postman:
POST http://localhost:8080/hbase/addUser
requestBody:
{
"uid":"1001",
"name":"肥兔子爱豆畜子",
"mobile":"137xxxx8612"
}
GET http://localhost:8080/hbase/getUserByMobile?mobile=137xxxx8612
Response:
{
"uid": "1001",
"name": "肥兔子爱豆畜子",
"mobile": "137xxxx8612"
}
Phoenix SQL syntax
Let's use hbase shell to look directly at the custem_user records in the database:
hbase(main):013:0> scan "custem_user"
ROW COLUMN+CELL
1001 column=0:\x00\x00\x00\x00, timestamp=1642576351561, value=x
1001 column=0:\x80\x0B, timestamp=1642576351561, value=\xE8\x82\xA5\xE5\x85\x94\xE5\xAD\x90\xE7\x88\xB1\xE8\xB1\x86\x
E7\x95\x9C\xE5\xAD\x90
1001 column=0:\x80\x0C, timestamp=1642576351561, value=137xxxx8612
1 row(s)
Took 0.0814 seconds
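The row key is the primary key (uid) of the table we created, and the remaining columns all ended up together in a column family named 0. That happens when the CREATE TABLE statement does not specify a column family, which was the case for the table scanned here. Adjusting the CREATE TABLE statement fixes it: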
CREATE TABLE IF NOT EXISTS "custem_user" (
"uid" VARCHAR primary key,
"basic"."name" VARCHAR,
"basic"."mobile" VARCHAR)
This puts name and mobile into the basic column family.
During development it is usually worth testing SQL in DBeaver before writing it into Java code:
CREATE TABLE IF NOT EXISTS "test" (
"uid" VARCHAR primary key,
"basic"."name" VARCHAR,
"basic"."mobile" VARCHAR
);
UPSERT INTO "test" values('123','liny','13789388372');
UPSERT INTO "test" values('456','douchuzi','13429586338');
SELECT * FROM "test" WHERE "basic"."mobile" = '13789388372';
SELECT * FROM "test" WHERE "mobile" = '13429586338';
Both of the queries above work.
The following one does not:
SELECT * FROM "test" WHERE mobile = '13429586338';
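Error: SQL Error [504] [42703]: ERROR 504 (42703): Undefined column. columnName=test.MOBILE
In Phoenix SQL, table and column names are case sensitive only when they are wrapped in double quotes; unquoted names are converted to upper case. The mobile column in the basic column family of "test" was created with quotes, so its name is lower case. The mobile in the WHERE clause is unquoted, and as the error message shows, Phoenix went looking for a column named test.MOBILE.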
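The lookup rule behind that error can be written down in a few lines. This is a simplified model of my understanding of it (unquoted identifiers are folded to upper case, double-quoted ones are taken literally), not Phoenix's own code; the class PhoenixIdentifiers is hypothetical.

```java
import java.util.Locale;

/** Simplified, hypothetical model of how a SQL identifier is resolved; not Phoenix's implementation. */
final class PhoenixIdentifiers {

    private PhoenixIdentifiers() {
    }

    /** Returns the column or table name that would actually be looked up. */
    static String normalize(String identifier) {
        boolean quoted = identifier.length() >= 2
                && identifier.startsWith("\"")
                && identifier.endsWith("\"");
        if (quoted) {
            // Quoted: strip the quotes and keep the case exactly as written.
            return identifier.substring(1, identifier.length() - 1);
        }
        // Unquoted: fold to upper case.
        return identifier.toUpperCase(Locale.ROOT);
    }
}
```

Under this model the unquoted mobile resolves to MOBILE, which does not exist, while "mobile" resolves to the lower-case column that the CREATE TABLE statement defined.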
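Problems encountered in practice:
Pitfall 1:
The application fails at startup with: HADOOP_HOME AND HADOOP.HOME.DIR ARE UNSET. The fix is to download the winutils build for your Hadoop version from steveloughran/winutils: Windows binaries for Hadoop versions (github.com) and set the environment variable accordingly. The project dependencies show Hadoop 3.0, so point HADOOP_HOME at the 3.0 directory from that download and restart the IDE.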
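To check the setting from inside the JVM before Phoenix is touched, something like the following can help. It is a sketch under assumptions: my understanding is that Hadoop looks at the hadoop.home.dir system property first and the HADOOP_HOME environment variable second, and that on Windows it expects bin\winutils.exe under that directory. The class HadoopHomeCheck is a hypothetical name.

```java
import java.io.File;

/** Hypothetical startup check for the Hadoop home setting; not part of Hadoop. */
final class HadoopHomeCheck {

    private HadoopHomeCheck() {
    }

    /** Assumed lookup order: system property hadoop.home.dir, then env var HADOOP_HOME. */
    static String resolve() {
        String home = System.getProperty("hadoop.home.dir");
        if (home == null || home.isEmpty()) {
            home = System.getenv("HADOOP_HOME");
        }
        return (home == null || home.isEmpty()) ? null : home;
    }

    /** True when <hadoopHome>/bin/winutils.exe exists as a regular file. */
    static boolean hasWinutils(String hadoopHome) {
        if (hadoopHome == null) {
            return false;
        }
        return new File(new File(hadoopHome, "bin"), "winutils.exe").isFile();
    }
}
```

If that lookup order holds, calling System.setProperty("hadoop.home.dir", ...) with your local winutils directory at the very top of main would be an alternative to the environment variable, which avoids restarting the IDE.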
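Pitfall 2:
Once the application is running, creating a table through Phoenix fails with: Can not resolve VM_0_11_centos, please check your network java.net.UnknownHostException: VM_0_11_centos
The log shows that it is hbase-client that fails to connect. VM_0_11_centos is the hostname of the remote server my HBase runs on. After reading some documentation I found that the HBase Region Server registers its hostname, not its IP, in ZooKeeper at startup. So add the line 122.xx.xxx.187 VM_0_11_centos to the hosts file on the client machine, then refresh the local Windows DNS cache: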
ipconfig /displaydns
ipconfig /flushdns
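To confirm from Java that the hosts entry is picked up, a small resolution check is enough. A minimal sketch using only java.net; the class HostResolutionCheck is a made-up name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper: checks whether a hostname resolves from this machine. */
final class HostResolutionCheck {

    private HostResolutionCheck() {
    }

    /** Returns the resolved IP address, or null when the name cannot be resolved. */
    static String resolve(String hostname) {
        try {
            return InetAddress.getByName(hostname).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }
}
```

With the hosts entry in place, HostResolutionCheck.resolve("VM_0_11_centos") should return the server's IP instead of null. Note that the JVM caches lookups, so restart the application after editing the hosts file.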
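Next steps
How HBase works, including its architecture and cluster setup.
The underlying LSM-Tree storage structure and the data read and write paths.
The performance characteristics that follow from the storage structure and architecture, and the use cases they suit: massive data storage, high-performance random writes, and fairly fast random reads.
How cluster service failures are handled, cluster tooling, the surrounding ecosystem, performance tuning and best practices.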