ClickHouse Series 7: A Simple Comparison of Doris, Hive, Hive on Spark, and ClickHouse

1. Test Environment

Test environment:
Doris: 4 VMs, each with 4 cores, 8 GB RAM, and a 150 GB ordinary disk.
Hive: 4 VMs, each with 4 cores, 8 GB RAM, and a 150 GB ordinary disk.
Hive on Spark: same cluster as Hive.
ClickHouse: 1 VM with 4 cores, 8 GB RAM, and a 150 GB ordinary disk.

Test data:
Roughly 700 million rows.

hive> desc ods_fact_sale_orc;
OK
id                      bigint                                      
sale_date               string                                      
prod_name               string                                      
sale_nums               int                                         
Time taken: 0.202 seconds, Fetched: 4 row(s)
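
The original post does not show the DDL used to build the test table. As a rough sketch only (column types follow the DESCRIBE output above; the staging table name is hypothetical), the ORC table could have been created and populated in Hive like this:

-- Sketch only: one possible way to create the ORC test table in Hive.
CREATE TABLE ods_fact_sale_orc (
    id         BIGINT,
    sale_date  STRING,
    prod_name  STRING,
    sale_nums  INT
)
STORED AS ORC;

-- Hypothetical load from a staging table (ods_fact_sale is an assumption, not from the post):
-- INSERT OVERWRITE TABLE ods_fact_sale_orc SELECT * FROM ods_fact_sale;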

Test queries:

select * from ods_fact_sale_orc where id = 100;
select count(*) from ods_fact_sale_orc;

Note:
These are low-spec VMs running entirely on default configurations, so the results should be treated only as a rough reference.

2. Test Results

2.1 Hive

hive> select * from ods_fact_sale_orc where id = 100;
Query ID = root_20211208104736_0c2ab392-2f47-4b42-94fc-c2daf14a2f7d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
21/12/08 10:47:38 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1638236643110_0032, Tracking URL = http://hp3:8088/proxy/application_1638236643110_0032/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1638236643110_0032
Hadoop job information for Stage-1: number of mappers: 9; number of reducers: 0
2021-12-08 10:47:45,948 Stage-1 map = 0%,  reduce = 0%
2021-12-08 10:47:55,366 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 5.48 sec
2021-12-08 10:47:58,483 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 44.13 sec
2021-12-08 10:48:05,773 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 54.31 sec
2021-12-08 10:48:07,831 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 63.71 sec
2021-12-08 10:48:08,867 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 73.08 sec
2021-12-08 10:48:09,896 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 82.58 sec
MapReduce Total cumulative CPU time: 1 minutes 22 seconds 580 msec
Ended Job = job_1638236643110_0032
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 9   Cumulative CPU: 82.58 sec   HDFS Read: 2150344344 HDFS Write: 830 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 22 seconds 580 msec
OK
100     2012-07-09 00:00:00.0   PROD9   38
Time taken: 34.411 seconds, Fetched: 1 row(s)
hive> 
    > select count(*) from ods_fact_sale_orc;
Query ID = root_20211208104925_0b4ece78-2735-4ebb-bab9-431aae074e11
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
21/12/08 10:49:25 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1638236643110_0033, Tracking URL = http://hp3:8088/proxy/application_1638236643110_0033/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1638236643110_0033
Hadoop job information for Stage-1: number of mappers: 9; number of reducers: 1
2021-12-08 10:49:32,158 Stage-1 map = 0%,  reduce = 0%
2021-12-08 10:49:39,433 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 3.87 sec
2021-12-08 10:49:40,465 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 9.17 sec
2021-12-08 10:49:41,497 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 25.21 sec
2021-12-08 10:49:45,624 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 30.72 sec
2021-12-08 10:49:46,653 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 35.9 sec
2021-12-08 10:49:47,685 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 46.3 sec
2021-12-08 10:49:53,847 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 48.87 sec
MapReduce Total cumulative CPU time: 48 seconds 870 msec
Ended Job = job_1638236643110_0033
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 9  Reduce: 1   Cumulative CPU: 48.87 sec   HDFS Read: 1992485 HDFS Write: 109 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 48 seconds 870 msec
OK
767830000
Time taken: 29.842 seconds, Fetched: 1 row(s)
hive> 

2.2 Hive on Spark

hive> 
    > set hive.execution.engine=spark;
hive> select count(*) from ods_fact_sale_orc;
Query ID = root_20211208105129_f4675518-560e-43e7-b4d7-c4db0a5d52c4
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Running with YARN Application = application_1638236643110_0034
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/yarn application -kill application_1638236643110_0034
Hive on Spark Session Web UI URL: http://hp4:10421

Query Hive on Spark job[0] stages: [0, 1]
Spark job[0] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-0 ........         0      FINISHED      9          9        0        0       0  
Stage-1 ........         0      FINISHED      1          1        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 02/02    [==========================>>] 100%  ELAPSED TIME: 11.12 s    
--------------------------------------------------------------------------------------
Spark job[0] finished successfully in 11.12 second(s)
Spark Job[0] Metrics: TaskDurationTime: 25265, ExecutorCpuTime: 15802, JvmGCTime: 469, BytesRead / RecordsRead: 1979283 / 749902, BytesReadEC: 0, ShuffleTotalBytesRead / ShuffleRecordsRead: 522 / 9, ShuffleBytesWritten / ShuffleRecordsWritten: 522 / 9
OK
767830000
Time taken: 30.556 seconds, Fetched: 1 row(s)
hive> 
    > select * from ods_fact_sale_orc where id = 100;
Query ID = root_20211208105212_f5968f86-89ab-4e44-b712-4a687a7bec7f
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Running with YARN Application = application_1638236643110_0034
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/yarn application -kill application_1638236643110_0034
Hive on Spark Session Web UI URL: http://hp4:10421

Query Hive on Spark job[1] stages: [2]
Spark job[1] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-2 ........         0      FINISHED      9          9        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 01/01    [==========================>>] 100%  ELAPSED TIME: 15.08 s    
--------------------------------------------------------------------------------------
Spark job[1] finished successfully in 15.08 second(s)
Spark Job[1] Metrics: TaskDurationTime: 49170, ExecutorCpuTime: 42275, JvmGCTime: 2493, BytesRead / RecordsRead: 2150340258 / 749902, BytesReadEC: 0, ShuffleTotalBytesRead / ShuffleRecordsRead: 0 / 0, ShuffleBytesWritten / ShuffleRecordsWritten: 0 / 0
OK
100     2012-07-09 00:00:00.0   PROD9   38
Time taken: 15.237 seconds, Fetched: 1 row(s)
hive> 

2.3 Doris

mysql> select * from table3 where id = 100;
+------+-----------------------+-----------+-----------+
| id   | sale_date             | prod_name | sale_nums |
+------+-----------------------+-----------+-----------+
|  100 | 2012-07-09 00:00:00.0 | PROD9     |        38 |
+------+-----------------------+-----------+-----------+
1 row in set (0.03 sec)

mysql> select count(*) from table3;
+-----------+
| count(*)  |
+-----------+
| 767830000 |
+-----------+
1 row in set (17.92 sec)
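
The Doris DDL for table3 is not given in the post. If the table was built on the Aggregate model, as the note in section 2.5 suggests, a minimal sketch might look like the following (key columns, bucket count, and replication_num are assumptions):

-- Sketch only: a possible Aggregate-model definition for table3 in Doris.
-- id leads the key, which would be consistent with the fast point lookup on id above.
CREATE TABLE table3 (
    id         BIGINT,
    sale_date  VARCHAR(32),
    prod_name  VARCHAR(64),
    sale_nums  INT SUM
)
AGGREGATE KEY(id, sale_date, prod_name)
DISTRIBUTED BY HASH(id) BUCKETS 32
PROPERTIES ("replication_num" = "3");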

2.4 ClickHouse

hp5 :) select * from fact_sale where id = 100;

SELECT *
FROM fact_sale
WHERE id = 100

Query id: d4100ca4-f0a1-412d-a10d-2e052980409c

┌──id─┬───────────sale_date─┬─prod_name─┬─sale_nums─┐
│ 100 │ 2012-07-09 00:00:00 │ PROD9     │        38 │
└─────┴─────────────────────┴───────────┴───────────┘

1 rows in set. Elapsed: 4.805 sec. Processed 767.83 million rows, 3.07 GB (159.81 million rows/s., 639.24 MB/s.)

hp5 :) select count(*) from fact_sale;

SELECT count(*)
FROM fact_sale

Query id: 227625c0-ef2f-4eda-9b16-c6f71f6c3b04

┌───count()─┐
│ 767830000 │
└───────────┘

1 rows in set. Elapsed: 0.017 sec. 

hp5 :) 

ClickHouse pre-aggregates row counts in its storage metadata, so a bare count(*) does not scan any data. To force an actual scan, we run count(*) again with a WHERE condition:

hp5 :) select count(*) from fact_sale where id <=1000000;

SELECT count(*)
FROM fact_sale
WHERE id <= 1000000

Query id: bce489e4-d6ff-40c3-b81d-cd020c2d07a1

┌─count()─┐
│ 1000000 │
└─────────┘

1 rows in set. Elapsed: 4.578 sec. Processed 767.83 million rows, 3.07 GB (167.71 million rows/s., 670.85 MB/s.)

hp5 :) 
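
Both filtered queries scanned all 767.83 million rows, which suggests the table's ORDER BY (primary) key does not start with id. The actual fact_sale DDL is not shown in the post; a hedged sketch of a MergeTree layout sorted by id, which would let the sparse primary index skip most granules for WHERE id = 100 or id <= 1000000, is:

-- Sketch only: the real fact_sale definition is not shown in the post.
-- Sorting by id lets the primary index prune data parts for filters on id.
CREATE TABLE fact_sale_ordered (
    id         UInt64,
    sale_date  DateTime,
    prod_name  String,
    sale_nums  Int32
)
ENGINE = MergeTree
ORDER BY id;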

2.5 Summary of Results

Product          Point query (id = 100)    count(*)
Hive             34 s                       30 s
Hive on Spark    15 s                       11 s *
Doris            0.03 s                     18 s
ClickHouse       4.81 s                     4.58 s **

*  Spark job time; the client-side "Time taken" was 30.6 s.
** count(*) with the WHERE filter; the unfiltered count(*) was answered from metadata in 0.017 s.

A few simple conclusions can be drawn. Keep in mind that ClickHouse ran on a single node while the other systems ran on 4-node clusters, so the comparison is not entirely fair to ClickHouse and should be read as a rough reference only.

  1. For the point query, Doris is dramatically faster than Hive and Hive on Spark, and also beats ClickHouse.
  2. For the full-table count, Doris is slightly slower than Hive on Spark, while ClickHouse performs best.

As the official Doris documentation (screenshot below) explains, count(*) on an Aggregate-model table has to read and merge the key columns to produce an exact row count, so this particular test is a bit unfair to Doris.


[Figure: screenshot of the Doris documentation on count(*) under the Aggregate model]
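
For this case, the Doris documentation suggests a workaround (paraphrased here, not re-tested): add a dedicated count column with SUM aggregation that always receives 1 at load time, so the row count can be read from a single pre-aggregated column instead of merging all key columns. A hedged sketch:

-- Sketch only: workaround described in the Doris docs for Aggregate-model tables.
-- Column names and the overall layout are illustrative, not taken from the post.
CREATE TABLE table3_with_cnt (
    id         BIGINT,
    sale_date  VARCHAR(32),
    prod_name  VARCHAR(64),
    sale_nums  INT SUM,
    cnt        BIGINT SUM DEFAULT "1"
)
AGGREGATE KEY(id, sale_date, prod_name)
DISTRIBUTED BY HASH(id) BUCKETS 32
PROPERTIES ("replication_num" = "3");

-- SELECT SUM(cnt) FROM table3_with_cnt;   -- returns the row count by scanning one pre-aggregated column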

Further tests to follow in later posts...
