0x0 Background
We recently needed to build a data-statistics feature, and the company chose ClickHouse as the database. The notes below cover the database's characteristics and how to use it.
0x1 Introduction
ClickHouse is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).
In short, ClickHouse is a column-oriented database aimed mainly at data analysis.
From our usage so far, its main characteristics are:
- columnar storage makes analytical queries efficient;
- no transaction support;
- best suited to write-once, read-many workloads;
- updates and deletes go through special ALTER statements; the standard SQL UPDATE/DELETE statements are not supported (a quick sketch follows; the full demo is in 0x3).
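A minimal sketch of that last point, assuming a table tb_test like the one created later in 0x3:
-- standard DML is rejected by ClickHouse:
-- UPDATE tb_test SET content = 'x' WHERE id = 1;
-- DELETE FROM tb_test WHERE id = 1;
-- the equivalent ClickHouse "mutations", applied asynchronously in the background:
ALTER TABLE tb_test UPDATE content = 'x' WHERE id = 1;
ALTER TABLE tb_test DELETE WHERE id = 1;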
0x2 Installation
1. Download the latest tgz packages from the official site, extract them, and run the install scripts (doinst.sh); this is fairly straightforward:
export LATEST_VERSION=`curl https://api.github.com/repos/ClickHouse/ClickHouse/tags 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -n 1`
curl -O https://repo.clickhouse.tech/tgz/clickhouse-common-static-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-server-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-client-$LATEST_VERSION.tgz
tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz
sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh
tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz
sudo clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh
tar -xzvf clickhouse-server-$LATEST_VERSION.tgz
sudo clickhouse-server-$LATEST_VERSION/install/doinst.sh
sudo /etc/init.d/clickhouse-server start
tar -xzvf clickhouse-client-$LATEST_VERSION.tgz
sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh
2. Start the ClickHouse server:
root@ubuntu:/etc/clickhouse-server# clickhouse-server
Include not found: clickhouse_remote_servers
Include not found: clickhouse_compression
Logging trace to /var/log/clickhouse-server/clickhouse-server.log
Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
Logging trace to console
2020.04.29 14:26:41.942431 [ 1 ] {} <Trace> Pipe: Pipe capacity is 1.00 MiB
2020.04.29 14:26:41.945559 [ 1 ] {} <Information> : Starting ClickHouse 20.2.1.2183 with revision 54432
2020.04.29 14:26:41.945635 [ 1 ] {} <Information> Application: starting up
2020.04.29 14:26:41.952544 [ 1 ] {} <Debug> Application: Set max number of file descriptors to 1048576 (was 1024).
2020.04.29 14:26:41.952591 [ 1 ] {} <Debug> Application: Initializing DateLUT.
2020.04.29 14:26:41.952600 [ 1 ] {} <Trace> Application: Initialized DateLUT with time zone 'PRC'.
2020.04.29 14:26:41.953286 [ 1 ] {} <Debug> Application: Configuration parameter 'interserver_http_host' doesn't exist or exists and empty. Will use 'localhost' as replica host.
2020.04.29 14:26:41.956136 [ 1 ] {} <Debug> ConfigReloader: Loading config 'users.xml'
Include not found: networks
2020.04.29 14:26:41.957092 [ 1 ] {} <Information> Application: Uncompressed cache size was lowered to 991.13 MiB because the system has low amount of memory
2020.04.29 14:26:41.957443 [ 1 ] {} <Information> Application: Mark cache size was lowered to 991.13 MiB because the system has low amount of memory
......
3. Allow remote connections
Open the ClickHouse config file
/etc/clickhouse-server/config.xml
and uncomment the line
<listen_host>::</listen_host>
so the server listens on all interfaces, then restart the service: service clickhouse-server restart
4. Start clickhouse-client
root@ubuntu:~# clickhouse-client
ClickHouse client version 20.2.1.2183 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.2.1 revision 54432.
localhost :) show databases;
SHOW DATABASES
┌─name────┐
│ default │
│ system │
│ test │
└─────────┘
3 rows in set. Elapsed: 0.008 sec.
localhost :)
0x3 A simple demo
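The demo assumes a database named test (it appears in the SHOW DATABASES output above); if your installation does not have it, a minimal sketch to create it and switch to it first:
create database if not exists test;
use test;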
create table if not exists test.tb_test
(
id Int64,
datetime DateTime,
content Nullable(String),
value Nullable(Float64),
date Date
)
engine = MergeTree -- use the MergeTree engine, ClickHouse's main engine family
partition by toYYYYMM(datetime) -- partition by the month of the datetime column
order by id -- sort by id
TTL datetime + INTERVAL 3 DAY; -- rows expire three days after datetime
-- change the table's TTL; expired rows are removed during merges
ALTER TABLE test.tb_test
MODIFY TTL datetime + INTERVAL 1 DAY;
-- query
select * from tb_test order by id;
-- drop a partition; handy in a scheduled job for removing old data
alter table tb_test drop partition '202005';
-- insert data
insert into tb_test values (5, '2020-02-29 12:38:37', 'abcde', 12.553, '2020-04-25');
-- update data (an asynchronous mutation; not recommended for routine use)
alter table tb_test update content = 'hello click' where id=52;
-- delete data (an asynchronous mutation; not recommended for routine use)
alter table tb_test delete WHERE id=56;
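The two ALTER statements above are mutations: they return immediately and are applied in the background. A small sketch for checking on them via the system.mutations and system.parts system tables (assuming the table lives in the test database as above):
-- see whether pending mutations on the table have finished
select mutation_id, command, is_done
from system.mutations
where database = 'test' and table = 'tb_test';
-- list the table's active partitions, e.g. before dropping one
select partition, name, rows
from system.parts
where database = 'test' and table = 'tb_test' and active;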
0x4 Advanced usage
1. The summing engine SummingMergeTree
This engine automatically sums numeric columns that are not part of the sorting key when data parts are merged, which makes it useful for event statistics.
-- table with automatic sum aggregation
CREATE TABLE IF NOT EXISTS tb_stat
(
regionId String, -- store id
groupId String, -- statistics group id
in int, -- incoming visitor count
out int, -- outgoing visitor count
statDate DateTime -- statistics timestamp
)
ENGINE = SummingMergeTree
partition by (toYYYYMM(statDate), regionId)
ORDER BY (toStartOfHour(statDate), regionId, groupId);
insert into tb_stat values ('1232364', '111', 32, 2, '2020-03-25 12:56:00');
insert into tb_stat values ('1232364', '111', 34, 44, '2020-03-25 12:21:00');
insert into tb_stat values ('1232364', '111', 54, 12, '2020-03-25 12:20:00');
insert into tb_stat values ('1232364', '222', 45, 11, '2020-03-25 12:13:00');
insert into tb_stat values ('1232364', '222', 32, 33, '2020-03-25 12:44:00');
insert into tb_stat values ('1232364', '222', 12, 23, '2020-03-25 12:22:00');
insert into tb_stat values ('1232364', '333', 54, 54, '2020-03-25 12:11:00');
insert into tb_stat values ('1232364', '333', 22, 74, '2020-03-25 12:55:00');
insert into tb_stat values ('1232364', '333', 12, 15, '2020-03-25 12:34:00');
select toStartOfHour(statDate), regionId, groupId, sum(in), sum(out)
from tb_stat group by toStartOfHour(statDate), regionId, groupId;
About a minute after the data is inserted (once a background merge has run), querying the table again shows that only 3 rows remain:
select * from tb_stat;
1232364 111 480 232 2020-03-25 04:56:00
1232364 222 356 268 2020-03-25 04:13:00
1232364 333 352 572 2020-03-25 04:11:00
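The background merge that collapses the rows runs at an unpredictable time, so queries against a SummingMergeTree table should either aggregate explicitly with sum()/GROUP BY as above, or force/emulate the merge. A sketch of both options:
-- force an unscheduled merge so the rows are summed right away
OPTIMIZE TABLE tb_stat FINAL;
-- or merge the rows on the fly at query time (slower, but always fully summed)
select * from tb_stat final;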