订单列表查询优化方案(探索)
为了探索一个可以支撑订单列表查询快速响应的可行方案,我们分析了业务模型,发现业务将存在以下特点:
- 一张有效订单需要在72小时之内完成。
- 有效的订单会持续不断的进行处理。
- 业务人员更加关心未完成订单。
- 双11会在10月份之后就开始进行预售。
我们创建1000万条订单记录和1000多万条(1 : 1.3)订单明细记录来模拟数据库的压力。
示例项目地址:https://github.com/dengbin19910916/pg-test.git
欢迎大家提供更加优秀的解决方案。
名词解释:
trade_order:订单表
order_item:订单明细表
code_dict:代码值字典
1.1. 订单总量较大的问题
我们可以通过分区表来降低数据索引大小来加快数据的查询速度,先根据订单渠道分区,再根据订单的下单时间进行分区,
这样每一个分区表的数据将被控制在合理的范围内,以下分区仅供测试使用,分区请根据实际情况进行合理的调整,
-- 订单表
create table trade_order
(
id bigserial not null,
channel varchar(4) not null,
out_id varchar(100) not null,
shop_code varchar(50) not null,
shop_name varchar(200) not null,
paid_time timestamp null,
buyer_name varchar(200) not null,
buyer_memo varchar(2000) null,
receiver_name varchar(200) not null,
receiver_mobile varchar(20) not null,
receiver_country_code varchar(50) null,
receiver_city_code varchar(50) null,
receiver_province_code varchar(50) null,
receiver_area_code varchar(50) null,
receiver_address varchar(50) null,
receiver_zip varchar(50) null,
status varchar(4) not null,
created_time timestamp not null,
updated_time timestamp not null,
constraint trade_order_pk primary key (id, channel, created_time) -- 分区表的分区键必须包含在主键中
) partition by list (channel);
create table trade_order_tm partition of trade_order for values in (1) partition by range (created_time);
create table trade_order_tm_201901 partition of trade_order_tm for values from ('2019-01-01') to ('2019-02-01');
create table trade_order_tm_201902 partition of trade_order_tm for values from ('2019-02-01') to ('2019-03-01');
create table trade_order_tm_201903 partition of trade_order_tm for values from ('2019-03-01') to ('2019-04-01');
create table trade_order_tm_201904 partition of trade_order_tm for values from ('2019-04-01') to ('2019-05-01');
create table trade_order_tm_201905 partition of trade_order_tm for values from ('2019-05-01') to ('2019-06-01');
create table trade_order_tm_201906 partition of trade_order_tm for values from ('2019-06-01') to ('2019-07-01');
create table trade_order_tm_201907 partition of trade_order_tm for values from ('2019-07-01') to ('2019-08-01');
create table trade_order_tm_201908 partition of trade_order_tm for values from ('2019-08-01') to ('2019-09-01');
create table trade_order_tm_201909 partition of trade_order_tm for values from ('2019-09-01') to ('2019-10-01');
create table trade_order_tm_201910 partition of trade_order_tm for values from ('2019-10-01') to ('2019-11-01');
create table trade_order_tm_201911 partition of trade_order_tm for values from ('2019-11-01') to ('2019-12-01');
create table trade_order_tm_201912 partition of trade_order_tm for values from ('2019-12-01') to ('2020-01-01');
create table trade_order_jd_vip partition of trade_order for values in (2, 3) partition by range (created_time);
create table trade_order_jd_vip_201901 partition of trade_order_jd_vip for values from ('2019-01-01') to ('2019-02-01');
create table trade_order_jd_vip_201902 partition of trade_order_jd_vip for values from ('2019-02-01') to ('2019-03-01');
create table trade_order_jd_vip_201903 partition of trade_order_jd_vip for values from ('2019-03-01') to ('2019-04-01');
create table trade_order_jd_vip_201904 partition of trade_order_jd_vip for values from ('2019-04-01') to ('2019-05-01');
create table trade_order_jd_vip_201905 partition of trade_order_jd_vip for values from ('2019-05-01') to ('2019-06-01');
create table trade_order_jd_vip_201906 partition of trade_order_jd_vip for values from ('2019-06-01') to ('2019-07-01');
create table trade_order_jd_vip_201907 partition of trade_order_jd_vip for values from ('2019-07-01') to ('2019-08-01');
create table trade_order_jd_vip_201908 partition of trade_order_jd_vip for values from ('2019-08-01') to ('2019-09-01');
create table trade_order_jd_vip_201909 partition of trade_order_jd_vip for values from ('2019-09-01') to ('2019-10-01');
create table trade_order_jd_vip_201910 partition of trade_order_jd_vip for values from ('2019-10-01') to ('2019-11-01');
create table trade_order_jd_vip_201911 partition of trade_order_jd_vip for values from ('2019-11-01') to ('2019-12-01');
create table trade_order_jd_vip_201912 partition of trade_order_jd_vip for values from ('2019-12-01') to ('2020-01-01');
create table trade_order_gw partition of trade_order for values in (4) partition by range (created_time);
create table trade_order_gw_201901 partition of trade_order_gw for values from ('2019-01-01') to ('2019-02-01');
create table trade_order_gw_201902 partition of trade_order_gw for values from ('2019-02-01') to ('2019-03-01');
create table trade_order_gw_201903 partition of trade_order_gw for values from ('2019-03-01') to ('2019-04-01');
create table trade_order_gw_201904 partition of trade_order_gw for values from ('2019-04-01') to ('2019-05-01');
create table trade_order_gw_201905 partition of trade_order_gw for values from ('2019-05-01') to ('2019-06-01');
create table trade_order_gw_201906 partition of trade_order_gw for values from ('2019-06-01') to ('2019-07-01');
create table trade_order_gw_201907 partition of trade_order_gw for values from ('2019-07-01') to ('2019-08-01');
create table trade_order_gw_201908 partition of trade_order_gw for values from ('2019-08-01') to ('2019-09-01');
create table trade_order_gw_201909 partition of trade_order_gw for values from ('2019-09-01') to ('2019-10-01');
create table trade_order_gw_201910 partition of trade_order_gw for values from ('2019-10-01') to ('2019-11-01');
create table trade_order_gw_201911 partition of trade_order_gw for values from ('2019-11-01') to ('2019-12-01');
create table trade_order_gw_201912 partition of trade_order_gw for values from ('2019-12-01') to ('2020-01-01');
comment on table trade_order is '订单表';
comment on column trade_order.id is '订单ID';
comment on column trade_order.channel is '渠道';
comment on column trade_order.out_id is '外部订单ID';
comment on column trade_order.shop_code is '店铺编码';
comment on column trade_order.shop_name is '店铺名称';
comment on column trade_order.paid_time is '支付时间';
comment on column trade_order.buyer_name is '买家昵称';
comment on column trade_order.buyer_memo is '买家备注';
comment on column trade_order.receiver_name is '收货人昵称';
comment on column trade_order.receiver_mobile is '收货人手机号';
comment on column trade_order.receiver_country_code is '收货人国家编码';
comment on column trade_order.receiver_province_code is '收货人省份编码';
comment on column trade_order.receiver_city_code is '收货人城市编码';
comment on column trade_order.receiver_area_code is '收货人地区编码';
comment on column trade_order.receiver_address is '收货人详细地址';
comment on column trade_order.receiver_zip is '收货人邮政编码';
comment on column trade_order.status is '订单状态';
comment on column trade_order.created_time is '创建时间';
comment on column trade_order.updated_time is '最后时间';
-- 订单明细表
create table order_item
(
id bigserial not null,
order_id bigint not null,
order_channel varchar(4) not null,
serial int not null,
sku_id varchar(50) not null,
name varchar(200) not null,
quantity int not null,
tag_price decimal(16, 2) not null,
purchase_price decimal(16, 2) not null,
given boolean not null default false,
order_created_time timestamp not null,
created_time timestamp not null,
updated_time timestamp not null,
constraint order_item_pk
primary key (id, order_channel, order_created_time)
) partition by list (order_channel);
create table order_item_tm partition of order_item for values in (1) partition by range (order_created_time);
create table order_item_tm_201901 partition of order_item_tm for values from ('2019-01-01') to ('2019-02-01');
create table order_item_tm_201902 partition of order_item_tm for values from ('2019-02-01') to ('2019-03-01');
create table order_item_tm_201903 partition of order_item_tm for values from ('2019-03-01') to ('2019-04-01');
create table order_item_tm_201904 partition of order_item_tm for values from ('2019-04-01') to ('2019-05-01');
create table order_item_tm_201905 partition of order_item_tm for values from ('2019-05-01') to ('2019-06-01');
create table order_item_tm_201906 partition of order_item_tm for values from ('2019-06-01') to ('2019-07-01');
create table order_item_tm_201907 partition of order_item_tm for values from ('2019-07-01') to ('2019-08-01');
create table order_item_tm_201908 partition of order_item_tm for values from ('2019-08-01') to ('2019-09-01');
create table order_item_tm_201909 partition of order_item_tm for values from ('2019-09-01') to ('2019-10-01');
create table order_item_tm_201910 partition of order_item_tm for values from ('2019-10-01') to ('2019-11-01');
create table order_item_tm_201911 partition of order_item_tm for values from ('2019-11-01') to ('2019-12-01');
create table order_item_tm_201912 partition of order_item_tm for values from ('2019-12-01') to ('2020-01-01');
create table order_item_jd_vip partition of order_item for values in (2, 3) partition by range (order_created_time);
create table order_item_jd_vip_201901 partition of order_item_jd_vip for values from ('2019-01-01') to ('2019-02-01');
create table order_item_jd_vip_201902 partition of order_item_jd_vip for values from ('2019-02-01') to ('2019-03-01');
create table order_item_jd_vip_201903 partition of order_item_jd_vip for values from ('2019-03-01') to ('2019-04-01');
create table order_item_jd_vip_201904 partition of order_item_jd_vip for values from ('2019-04-01') to ('2019-05-01');
create table order_item_jd_vip_201905 partition of order_item_jd_vip for values from ('2019-05-01') to ('2019-06-01');
create table order_item_jd_vip_201906 partition of order_item_jd_vip for values from ('2019-06-01') to ('2019-07-01');
create table order_item_jd_vip_201907 partition of order_item_jd_vip for values from ('2019-07-01') to ('2019-08-01');
create table order_item_jd_vip_201908 partition of order_item_jd_vip for values from ('2019-08-01') to ('2019-09-01');
create table order_item_jd_vip_201909 partition of order_item_jd_vip for values from ('2019-09-01') to ('2019-10-01');
create table order_item_jd_vip_201910 partition of order_item_jd_vip for values from ('2019-10-01') to ('2019-11-01');
create table order_item_jd_vip_201911 partition of order_item_jd_vip for values from ('2019-11-01') to ('2019-12-01');
create table order_item_jd_vip_201912 partition of order_item_jd_vip for values from ('2019-12-01') to ('2020-01-01');
create table order_item_gw partition of order_item for values in (4) partition by range (order_created_time);
create table order_item_gw_201901 partition of order_item_gw for values from ('2019-01-01') to ('2019-02-01');
create table order_item_gw_201902 partition of order_item_gw for values from ('2019-02-01') to ('2019-03-01');
create table order_item_gw_201903 partition of order_item_gw for values from ('2019-03-01') to ('2019-04-01');
create table order_item_gw_201904 partition of order_item_gw for values from ('2019-04-01') to ('2019-05-01');
create table order_item_gw_201905 partition of order_item_gw for values from ('2019-05-01') to ('2019-06-01');
create table order_item_gw_201906 partition of order_item_gw for values from ('2019-06-01') to ('2019-07-01');
create table order_item_gw_201907 partition of order_item_gw for values from ('2019-07-01') to ('2019-08-01');
create table order_item_gw_201908 partition of order_item_gw for values from ('2019-08-01') to ('2019-09-01');
create table order_item_gw_201909 partition of order_item_gw for values from ('2019-09-01') to ('2019-10-01');
create table order_item_gw_201910 partition of order_item_gw for values from ('2019-10-01') to ('2019-11-01');
create table order_item_gw_201911 partition of order_item_gw for values from ('2019-11-01') to ('2019-12-01');
create table order_item_gw_201912 partition of order_item_gw for values from ('2019-12-01') to ('2020-01-01');
comment on table order_item is '订单明细表';
comment on column order_item.id is '订单明细ID';
comment on column order_item.order_id is '订单ID';
comment on column order_item.order_channel is '订单渠道';
comment on column order_item.sku_id is 'SKU ID';
comment on column order_item.name is '商品名称';
comment on column order_item.quantity is 'SKU数量';
comment on column order_item.tag_price is '商品吊牌价';
comment on column order_item.purchase_price is '商品实际购买价';
comment on column order_item.given is '是否赠送';
comment on column order_item.order_created_time is '订单创建时间';
comment on column order_item.created_time is '创建时间';
comment on column order_item.updated_time is '最后更新时间';
通过以上方式,我们创建了一个分区订单表和分区订单明细表,先按照订单渠道进行第一次分区,在根据下单时间按照每月的方式再次进行分区,
但是对于双11的特殊跨月访问情况,这种分区会导致通过查询条件 created_time between '2019-08-15' and '2019-09-11' 进行查询
的时候
1.2. 有效的未完成的订单数据量问题
应为处理订单的处理期限为72小时,并且订单会源源不断的被处理,故系统的非已完成的订单量不会过于庞大,但是订单中心的订单总量会较大。
以下是一个查询所有未完成的订单。
-- 记录数 258421
explain ( analyze true , costs , buffers , timing , verbose false )
select count(1)
from trade_order a
where a.channel = '1' -- 天猫渠道
and a.created_time between '2019-08-15' and '2019-09-11'
and exists(select 1
from order_item b
where b.order_id = a.id
and b.order_channel = '1'
and b.order_created_time between '2019-08-15' and '2019-09-11'
and b.given = false
)
and a.status <> '4'; -- 非已完成
Aggregate (cost=55928.48..55928.49 rows=1 width=8) (actual time=4229.916..4229.916 rows=1 loops=1)
Buffers: shared hit=3733186
-> Nested Loop (cost=55370.28..55569.78 rows=143480 width=0) (actual time=918.037..4201.476 rows=258421 loops=1)
Buffers: shared hit=3733186
-> HashAggregate (cost=55369.85..55371.85 rows=200 width=8) (actual time=917.986..1064.494 rows=458923 loops=1)
Group Key: b.order_id
Buffers: shared hit=513961
-> Append (cost=0.42..53531.74 rows=735244 width=8) (actual time=0.029..676.242 rows=737540 loops=1)
Buffers: shared hit=513961
-> Index Scan using order_item_tm_201908_order_created_time_idx on order_item_tm_201908 b (cost=0.42..27599.23 rows=459737 width=8) (actual time=0.028..393.246 rows=459178 loops=1)
Index Cond: ((order_created_time >= '2019-08-15 00:00:00'::timestamp without time zone) AND (order_created_time <= '2019-09-11 00:00:00'::timestamp without time zone))
Filter: ((NOT given) AND ((order_channel)::text = '1'::text))
Rows Removed by Filter: 68553
Buffers: shared hit=319756
-> Index Scan using order_item_tm_201909_order_created_time_idx on order_item_tm_201909 b_1 (cost=0.42..22256.30 rows=275507 width=8) (actual time=0.031..223.477 rows=278362 loops=1)
Index Cond: ((order_created_time >= '2019-08-15 00:00:00'::timestamp without time zone) AND (order_created_time <= '2019-09-11 00:00:00'::timestamp without time zone))
Filter: ((NOT given) AND ((order_channel)::text = '1'::text))
Rows Removed by Filter: 42161
Buffers: shared hit=194205
-> Append (cost=0.42..0.97 rows=2 width=8) (actual time=0.006..0.007 rows=1 loops=458923)
Buffers: shared hit=3219225
-> Index Scan using trade_order_tm_201908_pkey on trade_order_tm_201908 a (cost=0.42..0.48 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=458923)
Index Cond: ((id = b.order_id) AND ((channel)::text = '1'::text) AND (created_time >= '2019-08-15 00:00:00'::timestamp without time zone) AND (created_time <= '2019-09-11 00:00:00'::timestamp without time zone))
Filter: ((status)::text <> '4'::text)
Rows Removed by Filter: 0
Buffers: shared hit=1666012
-> Index Scan using trade_order_tm_201909_pkey on trade_order_tm_201909 a_1 (cost=0.42..0.48 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=458923)
Index Cond: ((id = b.order_id) AND ((channel)::text = '1'::text) AND (created_time >= '2019-08-15 00:00:00'::timestamp without time zone) AND (created_time <= '2019-09-11 00:00:00'::timestamp without time zone))
Filter: ((status)::text <> '4'::text)
Rows Removed by Filter: 0
Buffers: shared hit=1553213
Planning Time: 4.085 ms
Execution Time: 4233.171 ms
因为我们使用的下单时间为 ['2019-08-15', '2019-09-11'],它跨越了2个时间区间,我们发现查询已经正确的使用了分区表 trade_order_tm_201908,trade_order_tm_201909 和 order_item_tm_201908,order_item_tm_201909 ,但是对于25万的数据量进行嵌套循环扫描,数据库还是消耗了4秒钟的时间,虽然我们可以通过修改数据库的参数(只在会话中生效)强制使用
并行查询进行加速,但是效果也不是很明显。
-- 表最小大小启用并行
set min_parallel_table_scan_size = 0;
-- 索引最小大小启用并行
set min_parallel_index_scan_size = 0;
-- 并行扫描记录的花费
set parallel_tuple_cost = 0;
-- 启动一个并行进程的花费
set parallel_setup_cost = 0;
-- 指定表的并行度(并行度不要超过物理核心数)
alter table trade_order_tm_201908 set (parallel_workers = 4);
alter table trade_order_tm_201909 set (parallel_workers = 4);
alter table order_item_tm_201908 set (parallel_workers = 4);
alter table order_item_tm_201909 set (parallel_workers = 4);
对于跨越时间区间的问题,我们做一些分区时间上的小调整,如下:
create table trade_order_tm_201901 partition of trade_order_tm for values from ('2019-01-01') to ('2019-04-01');
create table trade_order_tm_201904 partition of trade_order_tm for values from ('2019-04-01') to ('2019-07-01');
create table trade_order_tm_201907 partition of trade_order_tm for values from ('2019-07-01') to ('2019-10-01');
create table trade_order_tm_201910 partition of trade_order_tm for values from ('2019-10-01') to ('2020-01-01');
-- 。。。
实际可以根据数据情况进行更为合理的时间分区,对于以上情况并没有进行实际测试,大家可以进行补充。对于跨越时间区间的问题,我们发现有一个有意思的现象,
我们可以通过以下SQL来干扰查询优化器的成本计算,让优化器启动并行加速。
explain ( analyze true , costs , buffers , timing , verbose false )
select count(1)
from trade_order a
where a.channel = '1' -- 天猫渠道
and a.created_time between '2019-09-01' and '2019-09-11'
and exists(select 1
from order_item b
where b.order_id = a.id
and b.order_channel = '1'
and b.order_created_time between '2019-08-15' and '2019-09-11'
and b.given = false
)
and a.status <> '4'; -- 非已完成
Finalize Aggregate (cost=50944.40..50944.41 rows=1 width=8) (actual time=209.440..209.441 rows=1 loops=1)
Buffers: shared hit=771773
-> Gather (cost=50944.09..50944.40 rows=3 width=8) (actual time=209.310..236.256 rows=4 loops=1)
Workers Planned: 3
Workers Launched: 3
Buffers: shared hit=771773
-> Partial Aggregate (cost=49944.09..49944.10 rows=1 width=8) (actual time=204.051..204.051 rows=1 loops=4)
Buffers: shared hit=771773
-> Nested Loop Semi Join (cost=0.42..49900.92 rows=17266 width=0) (actual time=0.125..201.552 rows=24395 loops=4)
Buffers: shared hit=771773
-> Parallel Append (cost=0.00..19269.75 rows=34533 width=8) (actual time=0.022..56.565 rows=26988 loops=4)
Buffers: shared hit=15394
-> Parallel Seq Scan on trade_order_tm_201909 a (cost=0.00..19097.08 rows=34533 width=8) (actual time=0.021..54.215 rows=26988 loops=4)
Filter: ((created_time >= '2019-09-01 00:00:00'::timestamp without time zone) AND (created_time <= '2019-09-11 00:00:00'::timestamp without time zone) AND ((status)::text <> '4'::text) AND ((channel)::text = '1'::text))
Rows Removed by Filter: 116507
Buffers: shared hit=15394
-> Append (cost=0.42..1.33 rows=2 width=8) (actual time=0.005..0.005 rows=1 loops=107950)
Buffers: shared hit=756379
-> Index Scan using order_item_tm_201908_order_id_idx on order_item_tm_201908 b (cost=0.42..0.66 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=107950)
Index Cond: (order_id = a.id)
Filter: ((NOT given) AND (order_created_time >= '2019-08-15 00:00:00'::timestamp without time zone) AND (order_created_time <= '2019-09-11 00:00:00'::timestamp without time zone) AND ((order_channel)::text = '1'::text))
Buffers: shared hit=324171
-> Index Scan using order_item_tm_201909_order_id_idx on order_item_tm_201909 b_1 (cost=0.42..0.66 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=107950)
Index Cond: (order_id = a.id)
Filter: ((NOT given) AND (order_created_time >= '2019-08-15 00:00:00'::timestamp without time zone) AND (order_created_time <= '2019-09-11 00:00:00'::timestamp without time zone) AND ((order_channel)::text = '1'::text))
Rows Removed by Filter: 0
Buffers: shared hit=432208
Planning Time: 3.050 ms
Execution Time: 236.355 ms
trade_order的下单时间区间为 ['2019-09-01', '2019-09-11'] 但明细表的订单下单时间区间为 ['2019-08-15', '2019-09-11'],这时候我们发现
查询的代价为56920.58,它选择了使用3个并行来加速嵌套循环半连接(Nested Loop Semi Join),这时候的查询好水降为250毫秒左右,对于时间区间完全匹配
的情况并不会发生这种情况,所以我们应该慎用这种方式,它可能存在一定的不确定性。
现在我们还只是将订单按平台和下单时间进行分拆来降低每一个分区表的数据量大小,通过这种方式来加强各个分区的单独的处理能。
但这种优化真的合理吗?我们真的需要这种复杂的分区方案吗?
我们通过更深层次的思考后,发现数据库中的大量订单是属于已完成的订单,这种数据不会再进行任何的修改操作,也就是说这种数据从未完成的热数据变成了已完成的冷数据,
对于冷数据我们在业务操作中并不关心,还有我们的订单会源源不断的进入订单中心并处理完成,也就是说处于未完成的订单不会太多,所以我们自然而然的想到了是否可以在
建立索引的时候指定条件来过滤掉我们不需要的数据,故我们进行了如下修改:
-- 订单表
create table trade_order
(
id bigserial not null,
channel varchar(4) not null,
out_id varchar(100) not null,
shop_code varchar(50) not null,
shop_name varchar(200) not null,
paid_time timestamp null,
buyer_name varchar(200) not null,
buyer_memo varchar(2000) null,
receiver_name varchar(200) not null,
receiver_mobile varchar(20) not null,
receiver_country_code varchar(50) null,
receiver_city_code varchar(50) null,
receiver_province_code varchar(50) null,
receiver_area_code varchar(50) null,
receiver_address varchar(50) null,
receiver_zip varchar(50) null,
status varchar(4) not null,
created_time timestamp not null,
updated_time timestamp not null,
constraint trade_order_pk primary key (id, channel) -- 分区表的分区键必须包含在主键中
) partition by list (channel);
create table trade_order_tm partition of trade_order for values in ('1');
create table trade_order_jd_vip partition of trade_order for values in ('2', '3', '4');
-- 订单明细表
create table order_item
(
id bigserial not null,
order_id bigint not null,
order_channel varchar(4) not null,
serial int not null,
sku_id varchar(50) not null,
name varchar(200) not null,
quantity int not null,
tag_price decimal(16, 2) not null,
purchase_price decimal(16, 2) not null,
given boolean not null default false,
order_created_time timestamp not null,
order_status varchar(4) not null,
created_time timestamp not null,
updated_time timestamp not null,
constraint order_item_pk
primary key (id, order_channel)
) partition by list (order_channel);
create table order_item_tm partition of order_item for values in ('1');
create table order_item_jd_vip partition of order_item for values in ('2', '3', '4');
-- 创建条件索引
create index trade_order_created_time_index on trade_order (created_time) where status <> '4';
create index trade_order_status_index on trade_order using btree (status) where status <> '4';
create index trade_order_receiver_name_index on trade_order using btree (receiver_name) where status <> '4';
create index trade_order_buyer_memo_index on trade_order using gin (buyer_memo gin_trgm_ops) where status <> '4';
create index order_item_order_id on order_item using btree (order_id) where order_status <> '4';
create index order_item_order_created_time_index on order_item (order_created_time) where order_status <> '4';
create index order_item_order_status_index on order_item (order_status) where order_status <> '4';
create index order_item_given_index on order_item (given) where order_status <> '4';
这一次我们建立了时间更加集中的测试数据来模拟双11将有大量订单被同步至订单中心的情景。
-- 记录数 1200175
explain ( analyze true , costs , buffers , timing , verbose false )
select count(1)
from trade_order a
where a.channel = '1'
and a.status = '1'
and a.created_time >= '2019-09-21 00:00:00'
Finalize Aggregate (cost=56711.26..56711.27 rows=1 width=8) (actual time=278.878..278.879 rows=1 loops=1)
Buffers: shared hit=1133803
-> Gather (cost=56710.83..56711.25 rows=4 width=8) (actual time=278.586..314.005 rows=5 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=1133803
-> Parallel Append (cost=55710.83..55710.85 rows=1 width=8) (actual time=273.406..273.406 rows=1 loops=5)
Buffers: shared hit=1133803
-> Partial Aggregate (cost=55710.83..55710.84 rows=1 width=8) (actual time=273.405..273.405 rows=1 loops=5)
Buffers: shared hit=1133803
-> Parallel Index Scan using trade_order_tm_created_time_idx on trade_order_tm a (cost=0.43..55335.03 rows=150322 width=0) (actual time=0.048..261.075 rows=119865 loops=5)
Index Cond: (created_time >= '2019-09-21 00:00:00'::timestamp without time zone)
Filter: (((channel)::text = '1'::text) AND ((status)::text = '1'::text))
Rows Removed by Filter: 150056
Buffers: shared hit=1133803
Planning Time: 0.333 ms
Execution Time: 314.079 ms
通过以上SQL我们查询了所有未确认的订单总记录数,共599325条记录,耗时314毫秒,已经可以很好的满足业务的需求了。
接下来我们对有赠品的订单进行查询,SQL如下:
explain ( analyze true , costs , buffers , timing , verbose false )
select count(1)
from trade_order a
where a.channel = '1'
and a.status = '1'
and a.created_time >= '2019-09-21 00:00:00'
and exists(select 1
from order_item b
where b.order_id = a.id
and b.order_channel = '1'
and b.order_status = '1'
and b.order_created_time >= '2019-09-21 00:00:00'
and b.given = true
);
Aggregate (cost=43894.67..43894.68 rows=1 width=8) (actual time=678.725..678.725 rows=1 loops=1)
Buffers: shared hit=445323
-> Nested Loop (cost=42853.74..43143.06 rows=300644 width=0) (actual time=192.187..670.102 rows=92873 loops=1)
Buffers: shared hit=445323
-> HashAggregate (cost=42853.31..42855.31 rows=200 width=8) (actual time=192.154..219.396 rows=92873 loops=1)
Group Key: b.order_id
Buffers: shared hit=73137
-> Append (cost=0.43..42610.78 rows=97013 width=8) (actual time=0.032..161.060 rows=97692 loops=1)
Buffers: shared hit=73137
-> Index Scan using order_item_tm_given_idx on order_item_tm b (cost=0.43..42125.71 rows=97013 width=8) (actual time=0.031..150.090 rows=97692 loops=1)
Index Cond: (given = true)
Filter: (given AND (order_created_time >= '2019-09-21 00:00:00'::timestamp without time zone) AND ((order_channel)::text = '1'::text) AND ((order_status)::text = '1'::text))
Rows Removed by Filter: 122399
Buffers: shared hit=73137
-> Append (cost=0.43..1.43 rows=1 width=8) (actual time=0.004..0.005 rows=1 loops=92873)
Buffers: shared hit=372186
-> Index Scan using trade_order_tm_pkey on trade_order_tm a (cost=0.43..1.42 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=92873)
Index Cond: ((id = b.order_id) AND ((channel)::text = '1'::text))
Filter: ((created_time >= '2019-09-21 00:00:00'::timestamp without time zone) AND ((status)::text = '1'::text))
Buffers: shared hit=372186
Planning Time: 0.644 ms
Execution Time: 678.803 ms
发现此次查询并没有触发数据库的并行查询,并且查询中包含了一个嵌套循环,消耗了400+毫秒的执行时间,对于这种情况,我们通过在订单表中加入includes_gift(订单中是否包含赠品)字段来避免嵌套循环扫描。
-- 记录数 92695
explain ( analyze true , costs , buffers , timing , verbose false )
select count(1)
from trade_order a
where a.channel = '1'
and a.status = '1'
and a.created_time >= '2019-09-21 00:00:00'
and a.includes_gift = true
;
Finalize Aggregate (cost=56113.92..56113.93 rows=1 width=8) (actual time=267.326..267.327 rows=1 loops=1)
Buffers: shared hit=1127219
-> Gather (cost=56113.49..56113.91 rows=4 width=8) (actual time=267.154..303.035 rows=5 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=1127219
-> Parallel Append (cost=55113.49..55113.51 rows=1 width=8) (actual time=262.160..262.161 rows=1 loops=5)
Buffers: shared hit=1127219
-> Partial Aggregate (cost=55113.49..55113.50 rows=1 width=8) (actual time=262.156..262.157 rows=1 loops=5)
Buffers: shared hit=1127219
-> Parallel Index Scan using trade_order_tm_created_time_idx on trade_order_tm a (cost=0.43..55057.47 rows=22408 width=0) (actual time=0.089..259.940 rows=18539 loops=5)
Index Cond: (created_time >= '2019-09-21 00:00:00'::timestamp without time zone)
Filter: (includes_gift AND ((channel)::text = '1'::text) AND ((status)::text = '1'::text))
Rows Removed by Filter: 251265
Buffers: shared hit=1127219
Planning Time: 0.339 ms
Execution Time: 303.110 ms
同时我们有些有些订单是有买家留言的,对于买家留言我们希望进行中缀模糊匹配(前后都有%),这时候我们需要使用一种特殊的索引-Gin索引。
create index trade_order_buyer_memo_index on trade_order using gin (buyer_memo gin_trgm_ops) where status <> '4';
explain ( analyze true , costs , buffers , timing , verbose false )
select count(1)
from trade_order a
where a.channel = '1'
and a.status = '1'
and a.created_time >= '2019-09-21 00:00:00'
and a.includes_gift = true
and buyer_memo like '%广东省%';
-- 无索引
Finalize Aggregate (cost=56895.98..56895.99 rows=1 width=8) (actual time=248.714..248.715 rows=1 loops=1)
Buffers: shared hit=1127219
-> Gather (cost=56895.55..56895.97 rows=4 width=8) (actual time=248.700..282.707 rows=5 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=1127219
-> Parallel Append (cost=55895.55..55895.57 rows=1 width=8) (actual time=243.180..243.181 rows=1 loops=5)
Buffers: shared hit=1127219
-> Partial Aggregate (cost=55895.55..55895.56 rows=1 width=8) (actual time=243.178..243.179 rows=1 loops=5)
Buffers: shared hit=1127219
-> Parallel Index Scan using trade_order_tm_created_time_idx on trade_order_tm a (cost=0.43..55895.25 rows=121 width=0) (actual time=1.101..243.130 rows=87 loops=5)
Index Cond: (created_time >= '2019-09-21 00:00:00'::timestamp without time zone)
Filter: (includes_gift AND ((buyer_memo)::text ~~ '%广东省%'::text) AND ((channel)::text = '1'::text) AND ((status)::text = '1'::text))
Rows Removed by Filter: 269717
Buffers: shared hit=1127219
Planning Time: 0.534 ms
Execution Time: 282.775 ms
-- 有索引
Finalize Aggregate (cost=8217.88..8217.89 rows=1 width=8) (actual time=14.852..14.852 rows=1 loops=1)
Buffers: shared hit=6979
-> Append (cost=8217.87..8217.88 rows=1 width=8) (actual time=14.846..14.847 rows=1 loops=1)
Buffers: shared hit=6979
-> Partial Aggregate (cost=8217.87..8217.88 rows=1 width=8) (actual time=14.846..14.846 rows=1 loops=1)
Buffers: shared hit=6979
-> Bitmap Heap Scan on trade_order_tm a (cost=96.30..8216.59 rows=509 width=0) (actual time=2.880..14.792 rows=493 loops=1)
Recheck Cond: (((buyer_memo)::text ~~ '%广东省%'::text) AND ((status)::text <> '4'::text))
Filter: (includes_gift AND (created_time >= '2019-09-21 00:00:00'::timestamp without time zone) AND ((channel)::text = '1'::text) AND ((status)::text = '1'::text))
Rows Removed by Filter: 6632
Heap Blocks: exact=6918
Buffers: shared hit=6979
-> Bitmap Index Scan on trade_order_tm_buyer_memo_idx (cost=0.00..96.17 rows=7620 width=0) (actual time=2.048..2.048 rows=7125 loops=1)
Index Cond: ((buyer_memo)::text ~~ '%广东省%'::text)
Buffers: shared hit=61
Planning Time: 0.738 ms
Execution Time: 14.917 ms
查询速度有了明显提升,但是Gin索引是基于相似度匹配的一种索引,它要求待搜索的内容长度必须 >= 3才能进行索引搜索。
Finalize Aggregate (cost=9669.81..9669.82 rows=1 width=8) (actual time=201.025..201.026 rows=1 loops=1)
Buffers: shared hit=64248
-> Append (cost=9669.79..9669.80 rows=1 width=8) (actual time=201.017..201.018 rows=1 loops=1)
Buffers: shared hit=64248
-> Partial Aggregate (cost=9669.79..9669.80 rows=1 width=8) (actual time=201.016..201.017 rows=1 loops=1)
Buffers: shared hit=64248
-> Bitmap Heap Scan on trade_order_tm a (cost=1271.94..9668.42 rows=548 width=0) (actual time=75.892..200.800 rows=576 loops=1)
Recheck Cond: (((buyer_memo)::text ~~ '%广东%'::text) AND ((status)::text <> '4'::text))
Rows Removed by Index Recheck: 74130
Filter: (includes_gift AND (created_time >= '2019-09-21 00:00:00'::timestamp without time zone) AND ((channel)::text = '1'::text) AND ((status)::text = '1'::text))
Rows Removed by Filter: 7587
Heap Blocks: exact=63777
Buffers: shared hit=64248
-> Bitmap Index Scan on trade_order_tm_buyer_memo_idx (cost=0.00..1271.80 rows=7849 width=0) (actual time=63.515..63.516 rows=82293 loops=1)
Index Cond: ((buyer_memo)::text ~~ '%广东%'::text)
Buffers: shared hit=471
Planning Time: 0.323 ms
Execution Time: 201.084 ms
对于留言这种搜索型的数据,我们更加建议使用全文检索索引。
总结:
我们经过分析与试验得到了一个较优的解决方案:使用分区表 + 条件索引。
通过按渠道分区将天猫的订单放入独立的分区表,然后根据订单的状态流转特性建立条件索引来获取较优的性能。