tpch用法

一、初识tpch

1.tpch简介

TPCH是国际事务处理性能委员会(TPC,Transaction Processing Performance Council)于 1994 年指定的标准,是一款面向商品零售业务的决策支持系统测试基准,TPCH包含8张表,22个查询,遵循SQL92。
tpc事务处理性能委员会( Transaction ProcessingPerformance Council )
tpc-h 模拟决策支持系统中的数据库操作,测试数据库系统复杂查询的响应时间,以每小时执行的查询数(TPC-H QphH@Siz)作为度量指标.

tpc-h规范:
Query: 22条sql
Modify: RF1, RF2
3个测试:数据装载、顺序执行一次query+modify、模拟生产

模拟生产:在22个查询执行的同时,还有一对更新操作RF1 和RF2 并发地执行,RF1向Order 表和Lineitem 表中插入原行数的0.1%的新行,模拟新销售业务的数据加入到数据库中;RF2 从Order 表和Lineitem表中删除等量与RF1 增加的数据,模拟旧的销售数据被淘汰。

2.tpch关系图

TPC-H包括8张表(表上有些约束等需要满足,参见TPC-H规范),如下:
PART:表示零件的信息,如表4-1所示。
SUPPLIER:表示供货商的信息,如表4-2所示。
PARTSUPP:表示供货商的零件的信息,如表4-3所示。
CUSTOMER:表示消费者的信息,如表4-4所示。
ORDERS:表示订单的信息,如表4-5所示。
LINEITEM:表示在线商品的信息,如表4-6所示。
NATION:表示国家的信息,如表4-7所示。
REGION:表示地区的信息,如表4-8所示。
这8张表之间的关系,如图4-1所示或图4-2所示。


image.png

二、安装TPCH工具

1.安装工具下载:

1.TPC-H工具下载网站:http://www.tpc.org/tpch/
2.链接:https://pan.baidu.com/s/1VuB-reZ_1EiWT8cj_wrGhw 
提取码:rqgg

2.解压安装包

先在windows下解压得到zip压缩包,再上传服务器。
unzip tpc-h-tool-3.0.0.zip -d  /root

3.编辑makefile.suite文件

cd /root/TPC-H_Tools_v3.0.0/dbgen
vi makefile.suite 
CC      = gcc
DATABASE= POSTGRESQL
MACHINE = LINUX
WORKLOAD = TPCH

4.编辑tpcd.h,追加如下配置

vi tpcd.h
#ifdef POSTGRESQL
#define GEN_QUERY_PLAN  "EXPLAIN PLAN"
#define START_TRAN      "SET TRANSACTION"
#define END_TRAN        "COMMIT;"
#define SET_OUTPUT      ""
#define SET_ROWCOUNT    "LIMIT %d\n"
#define SET_DBASE      ""
#endif

5.执行编译安装

make -f makefile.suite

6.生成裸数据

dbgen参数详解
-v 详细信息
-s 表示生成G数据
-1 SF,Scale Factor ,数据库的比例因子。TPC-H标准规定,测试数据库的比例因子必须从下列固定值中选择:1,10,30,100,1000,3000,10000 (相当于1GB,10GB,30GB,100GB,1000GB,3000GB,10000GB)。数据库的大小缺省定义为1(例如:SF=1;近似于1GB)。
-S 切分数据
-f 覆盖之前的文件
更多参数可使用 ./dbgen -help 查看

./dbgen -s xxx ---这里的xxx表示想要生成的数据量(单位GB)
注:这里的-s 参数指定生成测试数据的仓库数,建议设置100以上,这里为了方便,选取较小的值,真实业务场景测试可设置1000以上。
生成的数据文件存放在/root/TPC-H_Tools_v3.0.0/dbgen目录下,以tbl结尾,一共8个文件,对应的是8个表。

三、列存测试

初始化数据库:

create database tpch;
--create tablespace tpch location '/data/tpch';
--alter database tpch set tablespace tpch;

1.创建测试表

CREATE TABLE NATION  ( N_NATIONKEY  INTEGER NOT NULL,
                            N_NAME       CHAR(25) NOT NULL,
                            N_REGIONKEY  INTEGER NOT NULL,
                            N_COMMENT    VARCHAR(152),
                            N_NULL    VARCHAR(10)
                            )with (ORIENTATION = COLUMN);
CREATE TABLE REGION  ( R_REGIONKEY  INTEGER NOT NULL,
                            R_NAME       CHAR(25) NOT NULL,
                            R_COMMENT    VARCHAR(152),
                            R_NULL VARCHAR(10)  
                            )with (ORIENTATION = COLUMN);
CREATE TABLE PART  ( P_PARTKEY     INTEGER NOT NULL,
                          P_NAME        VARCHAR(55) NOT NULL,
                          P_MFGR        CHAR(25) NOT NULL,
                          P_BRAND       CHAR(10) NOT NULL,
                          P_TYPE        VARCHAR(25) NOT NULL,
                          P_SIZE        INTEGER NOT NULL,
                          P_CONTAINER   CHAR(10) NOT NULL,
                          P_RETAILPRICE DECIMAL(15,2) NOT NULL,
                          P_COMMENT     VARCHAR(23) NOT NULL,
                          P_NULL VARCHAR(10) 
                          )with (ORIENTATION = COLUMN);
                        
CREATE TABLE SUPPLIER ( S_SUPPKEY     INTEGER NOT NULL,
                             S_NAME        CHAR(25) NOT NULL,
                             S_ADDRESS     VARCHAR(40) NOT NULL,
                             S_NATIONKEY   INTEGER NOT NULL,
                             S_PHONE       CHAR(15) NOT NULL,
                             S_ACCTBAL     DECIMAL(15,2) NOT NULL,
                             S_COMMENT     VARCHAR(101) NOT NULL,
                             S_NULL VARCHAR(10)         
                             )with (ORIENTATION = COLUMN);
CREATE TABLE PARTSUPP ( PS_PARTKEY     INTEGER NOT NULL,
                             PS_SUPPKEY     INTEGER NOT NULL,
                             PS_AVAILQTY    INTEGER NOT NULL,
                             PS_SUPPLYCOST  DECIMAL(15,2)  NOT NULL,
                             PS_COMMENT     VARCHAR(199) NOT NULL,
                             PS_NULL  VARCHAR(10)   
                             )with (ORIENTATION = COLUMN);
CREATE TABLE CUSTOMER ( C_CUSTKEY     INTEGER NOT NULL,
                             C_NAME        VARCHAR(25) NOT NULL,
                             C_ADDRESS     VARCHAR(40) NOT NULL,
                             C_NATIONKEY   INTEGER NOT NULL,
                             C_PHONE       CHAR(15) NOT NULL,
                             C_ACCTBAL     DECIMAL(15,2)   NOT NULL,
                             C_MKTSEGMENT  CHAR(10) NOT NULL,
                             C_COMMENT     VARCHAR(117) NOT NULL,
                             C_NULL  VARCHAR(10)        
                             )with (ORIENTATION = COLUMN);
CREATE TABLE ORDERS  ( O_ORDERKEY       INTEGER NOT NULL,
                           O_CUSTKEY        INTEGER NOT NULL,
                           O_ORDERSTATUS    CHAR(1) NOT NULL,
                           O_TOTALPRICE     DECIMAL(15,2) NOT NULL,
                           O_ORDERDATE      DATE NOT NULL,
                           O_ORDERPRIORITY  CHAR(15) NOT NULL,  
                           O_CLERK          CHAR(15) NOT NULL, 
                           O_SHIPPRIORITY   INTEGER NOT NULL,
                           O_COMMENT        VARCHAR(79) NOT NULL,
                           O_NULL   VARCHAR(10)       
                           )with (ORIENTATION = COLUMN);
CREATE TABLE LINEITEM ( L_ORDERKEY    INTEGER NOT NULL,
                             L_PARTKEY     INTEGER NOT NULL,
                             L_SUPPKEY     INTEGER NOT NULL,
                             L_LINENUMBER  INTEGER NOT NULL,
                             L_QUANTITY    DECIMAL(15,2) NOT NULL,
                             L_EXTENDEDPRICE  DECIMAL(15,2) NOT NULL,
                             L_DISCOUNT    DECIMAL(15,2) NOT NULL,
                             L_TAX         DECIMAL(15,2) NOT NULL,
                             L_RETURNFLAG  CHAR(1) NOT NULL,
                             L_LINESTATUS  CHAR(1) NOT NULL,
                             L_SHIPDATE    DATE NOT NULL,
                             L_COMMITDATE  DATE NOT NULL,
                             L_RECEIPTDATE DATE NOT NULL,
                             L_SHIPINSTRUCT CHAR(25) NOT NULL,
                             L_SHIPMODE     CHAR(10) NOT NULL,
                             L_COMMENT      VARCHAR(44) NOT NULL,
                             L_NULL  VARCHAR(10)  
                             )with (ORIENTATION = COLUMN);

2.创建约束

ALTER TABLE LINEITEM ADD PARTIAL CLUSTER KEY (L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE);
ALTER TABLE ORDERS ADD PARTIAL CLUSTER KEY (O_ORDERDATE);

3.导入数据前,上面生成文件传至数据库服务器上

从tpch安装服务器把文件传至数据库服务器上
scp *.tbl root@10.0.0.151:/tmp    
给文件授权
chmod -R 777 /tmp/*.tbl

4.导入数据

copy customer from '/tmp/customer.tbl' DELIMITERS '|';
copy lineitem from '/tmp/lineitem.tbl' DELIMITERS '|';
copy nation from '/tmp/nation.tbl' DELIMITERS '|';
copy orders from '/tmp/orders.tbl' DELIMITERS '|';
copy partsupp from '/tmp/partsupp.tbl' DELIMITERS '|';
copy part from '/tmp/part.tbl' DELIMITERS '|';
copy region from '/tmp/region.tbl' DELIMITERS '|';
copy supplier from '/tmp/supplier.tbl' DELIMITERS '|';
vacuum ANALYZE ;

四、执行22条测试sql

Q1:定价汇总报表查询
定价汇总报表查询是分组、排序、聚集操作并存的单表查询操作,报告已开票、发货和退回的业务量; 为给定日期发货的所有订单项提供汇总定价报告,按l_returnflag和l_linestatus分组并排序,列出了扩展价格、折扣扩展价格、折扣扩展价格加税、平均数量、平均扩展价格、平均折扣、各个分组的总行数。

Q1语句的特点是:带有分组、排序、聚集操作并存的单表查询操作。这个查询会导致表上的数据有95%到97%行被读取到。

--语句详解

select 
    l_returnflag, //返回标志
    l_linestatus, 
    sum(l_quantity) as sum_qty, //总的数量
    sum(l_extendedprice) as sum_base_price, //聚集函数操作
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, 
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, 
    avg(l_quantity) as avg_qty, 
    avg(l_extendedprice) as avg_price, 
    avg(l_discount) as avg_disc, 
    count(*) as count_order //每个分组所包含的行数
from 
    lineitem
where 
    l_shipdate <= date'1998-12-01' - interval '90' day //时间段是随机生成的
group by //分组操作
    l_returnflag, 
    l_linestatus
order by //排序操作
    l_returnflag, 
    l_linestatus;

示例sql:

select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= date '1998-12-01' - interval '103' day
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus
LIMIT 1;

Q2:最低成本供应商查询
最低成本供应商查询是带有排序、聚集操作、子查询并存的多表查询操作,查找应该选择那个供应商来为给定地区的给定零件下订单;在给定的区域中,针对特定类型和尺寸的每个零件,找到能够以最低成本供应的供应商。如果该地区的多个供应商以相同的(最低)成本供货。查询列出了供应商的账户余额、名称和国家,零部件编号、制造商,供应商的地址、电话号码和评论信息。

Q2语句的特点是:带有排序、聚集操作、子查询并存的多表查询操作。查询语句没有从语法上限制返回多少条元组,但是TPC-H标准规定,查询结果只返回前100行(通常依赖于应用程序实现)。

--语句详解

select
    s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment 
  /*查询供应者的帐户余额、名字、国家、零件的号码、生产者、供应者的地址、电话号码、备注信息 */
from
    part, supplier, partsupp, nation, region //五表连接
where
    p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = [SIZE] //指定大小,在区间[1, 50]内随机选择
    and p_type like '%[TYPE]' //指定类型,在TPC-H标准指定的范围内随机选择
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = '[REGION]' //指定地区,在TPC-H标准指定的范围内随机选择
    and ps_supplycost = ( //子查询
        select
            min(ps_supplycost) //聚集函数
        from
            partsupp, supplier, nation, region //与父查询的表有重叠
        where
            p_partkey = ps_partkey
            and s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = '[REGION]'
    )
order by //排序
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey;

示例sql:

select
    s_acctbal,
    s_name,
    n_name,
    p_partkey,
    p_mfgr,
    s_address,
    s_phone,
    s_comment
from
    part,
    supplier,
    partsupp,
    nation,
    region,
    (
        select
            ps_partkey,
            min(ps_supplycost) as min
        from
            partsupp,
            supplier,
            nation,
            region
        where
            s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = 'EUROPE'
        group by
            ps_partkey
    ) temp
where
    p_partkey = partsupp.ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = 47
    and p_type like '%BRASS'
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'EUROPE'
    and ps_supplycost = temp.min
    and p_partkey = temp.ps_partkey
order by
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey
LIMIT 100;

Q3:运输优先级查询
运输优先级查询是带有分组、排序、聚集操作并存的三表查询操作,检索价值最高的未发货订单;在给定日期尚未发货的订单中收入最大的订单的运输优先级和潜在收入(l_extendedprice * (1-l_discount)的总和),订单按照收入的降序排序。

Q3语句的特点是:带有分组、排序、聚集操作并存的三表查询操作。查询语句没有从语法上限制返回多少条元组,但是TPC-H标准规定,查询结果只返回前10行(通常依赖于应用程序实现)。

--语句详解

select
    l_orderkey,
    sum(l_extendedprice*(1-l_discount)) as revenue, //潜在的收入,聚集操作
    o_orderdate,
    o_shippriority
from
    customer, orders, lineitem //三表连接
where
    c_mktsegment = '[SEGMENT]' //在TPC-H标准指定的范围内随机选择
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '[DATE]' //指定日期段,在在[1995-03-01, 1995-03-31]中随机选择
    and l_shipdate > date '[DATE]'
group by //分组操作
    l_orderkey, //订单标识
    o_orderdate, //订单日期
    o_shippriority //运输优先级
order by //排序操作
    revenue desc, //降序排序,把潜在最大收入列在前面
    o_orderdate;

示例sql:

select
    l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    o_orderdate,
    o_shippriority
from
    customer,
    orders,
    lineitem
where
    c_mktsegment = 'BUILDING'
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '1995-03-20'
    and l_shipdate > date '1995-03-20'
group by
    l_orderkey,
    o_orderdate,
    o_shippriority
order by
    revenue desc,
    o_orderdate
LIMIT 10;

Q4:订单优先级查询
订单优先级查询是带有分组、排序、聚集操作、子查询并存的单表查询操作,确定订单优先系统的工作情况并评估客户满意度;在给定季度中的订购数量,其中客户在承诺日志之后收到至少一个订单项,列出了按优先级升序排序的每个订单优先级的此类订单的计数。

Q4语句的特点是:带有分组、排序、聚集操作、子查询并存的单表查询操作。子查询是相关子查询。

--语句详解

select
    o_orderpriority, //订单优先级
    count(*) as order_count //订单优先级计数
from orders //单表查询
where
    o_orderdate >= date '[DATE]'
    and o_orderdate < date '[DATE]' + interval '3' month //指定订单的时间段+三个月,DATE是在1993年1月和1997年10月之间随机选择的一个月的第一天
    and exists ( //子查询
        select
            *
        from
            lineitem
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
group by //按订单优先级分组
    o_orderpriority
order by //按订单优先级排序
    o_orderpriority;

示例sql:

select
    o_orderpriority,
    count(*) as order_count
from
    orders
where
    o_orderdate >= date '1995-12-01'
    and o_orderdate < date '1995-12-01' + interval '3' month
    and exists 
    (
        select
            *
        from
            lineitem
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate 
        limit 1
    )
group by
    o_orderpriority
order by
    o_orderpriority
LIMIT 1;

Q5:本地供应商收入量查询
本地供应商收入量查询是带有分组、排序、聚集操作、子查询并存的多表连接查询操作,列出了通过本地供应商完成的收入量;为一个地区的每个国家列出了由订单项交易产生的收入量,以确定是否在给定区域建立本地配送中心。

Q5语句的特点是:带有分组、排序、聚集操作、子查询并存的多表连接查询操作。

--语句详解

select
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue //聚集操作
from
    customer,orders,lineitem,supplier,nation,region //六表连接
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = '[REGION]' //指定地区,在TPC-H标准指定的范围内随机选择
    and o_orderdate >= date '[DATE]' //DATE是从1993年到1997年中随机选择的一年的1月1日
    and o_orderdate < date '[DATE]' + interval '1' year
group by //按名字分组
    n_name
order by //按收入降序排序,注意分组和排序子句不同
    revenue desc;

示例sql:

select
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue
from
    customer,
    orders,
    lineitem,
    supplier,
    nation,
    region
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'AMERICA'
    and o_orderdate >= date '1997-01-01'
    and o_orderdate < date '1997-01-01' + interval '1' year
group by
    n_name
order by
    revenue desc
LIMIT 1;

Q6:预测收入变化查询
预测收入变化查询是带有聚集操作的单表查询操作,量化了在给定年份中给定百分比范围内消除某些公司范围内的折扣所导致的收入增长量,用于寻找增加收入的方法;在给定的年份发货的所有订单项,折扣介于“DISCOUNT-0.01”和“DISCOUNT+0.01”之间,如果l_quantity小于quantity的订单的折扣被消除后总收入增加的数量。

Q6语句的特点是:带有聚集操作的单表查询操作。查询语句使用了BETWEEN-AND操作符,有的数据库可以对BETWEEN-AND进行优化。

--语句详解

select
    sum(l_extendedprice*l_discount) as revenue //潜在的收入增加量
from
    lineitem //单表查询
where
    l_shipdate >= date '[DATE]' //DATE是从[1993, 1997]中随机选择的一年的1月1日
    and l_shipdate < date '[DATE]' + interval '1' year //一年内
    and l_discount between [DISCOUNT] - 0.01 and [DISCOUNT] + 0.01 //between
    and l_quantity < [QUANTITY]; // QUANTITY在区间[24, 25]中随机选择

示例sql:

select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
    l_shipdate >= date '1997-01-01'
    and l_shipdate < date '1997-01-01' + interval '1' year
    and l_discount between 0.07 - 0.01 and 0.07 + 0.01
    and l_quantity < 24
LIMIT 1;

Q7:批量出货查询
批量出货查询是带有分组、排序、聚集、子查询操作并存的多表查询操作,确定在某些国家直接运输的货物的价值以帮助重新谈判运输合同;对于两个给定的国家,在1995年到1996年期间,从其中一个国家的供应商想另一个国家的客户运送零件的订单项获得的总折扣收入。

Q7语句的特点是:带有分组、排序、聚集、子查询操作并存的多表查询操作。子查询的父层查询不存在其他查询对象,是格式相对简单的子查询。

--语句详解

select
    supp_nation, //供货商国家
    cust_nation, //顾客国家
    l_year, sum(volume) as revenue //年度、年度的货运收入
from ( //子查询
    select
        n1.n_name as supp_nation,
        n2.n_name as cust_nation,
        extract(year from l_shipdate) as l_year,
        l_extendedprice * (1 - l_discount) as volume
    from
        supplier,lineitem,orders,customer,nation n1,nation n2 //六表连接
    where
        s_suppkey = l_suppkey
        and o_orderkey = l_orderkey
        and c_custkey = o_custkey
        and s_nationkey = n1.n_nationkey
        and c_nationkey = n2.n_nationkey
        and ( // NATION2和NATION1的值不同,表示查询的是跨国的货运情况
            (n1.n_name = '[NATION1]' and n2.n_name = '[NATION2]')
            or (n1.n_name = '[NATION2]' and n2.n_name = '[NATION1]')
        )
        and l_shipdate between date '1995-01-01' and date '1996-12-31'
    ) as shipping
group by
    supp_nation,
    cust_nation,
    l_year
order by
    supp_nation,
    cust_nation,
    l_year;

示例sql:

select
    supp_nation,
    cust_nation,
    l_year,
    sum(volume) as revenue
from
(
    select
        n1.n_name as supp_nation,
        n2.n_name as cust_nation,
        extract(year from l_shipdate) as l_year,
        l_extendedprice * (1 - l_discount) as volume
    from
        supplier,
        lineitem,
        orders,
        customer,
        nation n1,
        nation n2
    where
        s_suppkey = l_suppkey
        and o_orderkey = l_orderkey
        and c_custkey = o_custkey
        and s_nationkey = n1.n_nationkey
        and c_nationkey = n2.n_nationkey
        and 
        (
            (n1.n_name = 'CHINA' and n2.n_name = 'INDONESIA')
            or (n1.n_name = 'INDONESIA' and n2.n_name = 'CHINA')
        )
        and l_shipdate between date '1995-01-01' and date '1996-12-31'
) as shipping
group by
    supp_nation,
    cust_nation,
    l_year
order by
    supp_nation,
    cust_nation,
    l_year
LIMIT 1;

Q8:全国市场份额查询
全国市场份额查询带有分组、排序、聚集、子查询操作并存的查询操作,确定给定区域内给定国家的市场份额在两年内对于给定零件类型的变化情况; 给定区域内给定国家的市场份额被定义为收入的一部分,即 l_extendedprice * (1-l_discount)的总和。

Q8语句的特点是:带有分组、排序、聚集、子查询操作并存的查询操作。子查询的父层查询不存在其他查询对象,是格式相对简单的子查询,但子查询自身是多表连接的查询。

--语句详解:

select
    o_year, //年份
    sum(case
    when nation = '[NATION]'//指定国家,在TPC-H标准指定的范围内随机选择
    then volume
    else 0
    end) / sum(volume) as mkt_share //市场份额:特定种类的产品收入的百分比;聚集操作
from ( //子查询
    select
        extract(year from o_orderdate) as o_year, //分解出年份
        l_extendedprice * (1-l_discount) as volume, //特定种类的产品收入
        n2.n_name as nation
    from
        part,supplier,lineitem,orders,customer,nation n1,nation n2,region //八表连接
    where
        p_partkey = l_partkey
        and s_suppkey = l_suppkey
        and l_orderkey = o_orderkey
        and o_custkey = c_custkey
        and c_nationkey = n1.n_nationkey
        and n1.n_regionkey = r_regionkey
        and r_name = '[REGION]' //指定地区,在TPC-H标准指定的范围内随机选择
        and s_nationkey = n2.n_nationkey
        and o_orderdate between date '1995-01-01' and date '1996-12-31' //只查95、96年的情况
        and p_type = '[TYPE]' //指定零件类型,在TPC-H标准指定的范围内随机选择
) as all_nations
group by //按年分组
    o_year
order by //按年排序
    o_year;

示例sql:

select
    o_year,
    sum(case
    when nation = 'INDONESIA' then volume
    else 0
    end)   / sum(volume) as mkt_share
from
(
    select
        extract(year from o_orderdate) as o_year,
        l_extendedprice * (1 - l_discount) as volume,
        n2.n_name as nation
        from
        part,
        supplier,
        lineitem,
        orders,
        customer,
        nation n1,
        nation n2,
        region
    where
        p_partkey = l_partkey
        and s_suppkey = l_suppkey
        and l_orderkey = o_orderkey
        and o_custkey = c_custkey
        and c_nationkey = n1.n_nationkey
        and n1.n_regionkey = r_regionkey
        and r_name = 'ASIA'
        and s_nationkey = n2.n_nationkey
        and o_orderdate between date '1995-01-01' and date '1996-12-31'
        and p_type = 'PROMO POLISHED TIN'
) as all_nations
group by
    o_year
order by
    o_year
LIMIT 1;

Q9:产品类型利润度量查询
产品类型利润度量查询是带有分组、排序、聚集、子查询操作并存的查询操作,确定按供应商国家和年份细分的给定零件系列的利润。

Q9语句的特点是:带有分组、排序、聚集、子查询操作并存的查询操作。子查询的父层查询不存在其他查询对象,是格式相对简单的子查询,但子查询自身是多表连接的查询。子查询中使用了LIKE操作符,有的查询优化器不支持对LIKE操作符进行优化。

--语句详解

select
    nation,
    o_year,
    sum(amount) as sum_profit //每个国家每一年所有被定购的零件在一年中的总利润
from(
  select
        n_name as nation, //国家
        extract(year from o_orderdate) as o_year, //取出年份
        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount //利润
    from
        part,supplier,lineitem,partsupp,orders,nation //六表连接
    where
        s_suppkey = l_suppkey
        and ps_suppkey = l_suppkey
        and ps_partkey = l_partkey
        and p_partkey = l_partkey
        and o_orderkey = l_orderkey
        and s_nationkey = n_nationkey
        and p_name like '%[COLOR]%' //LIKE操作,查询优化器可能进行优化
) as profit
group by //按国家和年份分组
    nation,
    o_year
order by //按国家和年份排序,年份大者靠前
    nation,
    o_year desc;

示例sql:

select
    nation,
    o_year,
    sum(amount) as sum_profit
from
(
    select
        n_name as nation,
        extract(year from o_orderdate) as o_year,
        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
    from
        part,
        supplier,
        lineitem,
        partsupp,
        orders,
        nation
    where
        s_suppkey = l_suppkey
        and ps_suppkey = l_suppkey
        and ps_partkey = l_partkey
        and p_partkey = l_partkey
        and o_orderkey = l_orderkey
        and s_nationkey = n_nationkey
        and p_name like '%thistle%'
) as profit
group by
    nation,
    o_year
order by
    nation,
    o_year desc
LIMIT 1;

Q10:退货报告查询
退货报告查询是带有分组、排序、聚集操作并存的多表连接查询操作,识别可能对运送给他们的零件有问题的客户;根据对给定季度收入损失的影响,找出排名靠前的已退回零件的客户。

Q10语句的特点是:带有分组、排序、聚集操作并存的多表连接查询操作。查询语句没有从语法上限制返回多少条元组,但是TPC-H标准规定,查询结果只返回前10行(通常依赖于应用程序实现)。

--语句详解:

select
    c_custkey, c_name, //客户信息
    sum(l_extendedprice * (1 - l_discount)) as revenue, //收入损失
    c_acctbal,
    n_name, c_address, c_phone, c_comment //国家、地址、电话、意见信息等
from
    customer, orders, lineitem, nation
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate >= date '[DATE]' // DATE是位于1993年一月到1994年十二月中任一月的一号
    and o_orderdate < date '[DATE]' + interval '3' month //3个月内
    and l_returnflag = 'R' //货物被回退
    and c_nationkey = n_nationkey
group by
    c_custkey,
    c_name,
    c_acctbal,
    c_phone,
    n_name,
    c_address,
    c_comment
order by
    revenue desc;

示例sql:

select
    c_custkey,
    c_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    c_acctbal,
    n_name,
    c_address,
    c_phone,
    c_comment
from
    customer,
    orders,
    lineitem,
    nation
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate >= date '1994-06-01'
    and o_orderdate < date '1994-06-01' + interval '3' month
    and l_returnflag = 'R'
    and c_nationkey = n_nationkey
group by
    c_custkey,
    c_name,
    c_acctbal,
    c_phone,
    n_name,
    c_address,
    c_comment
order by
    revenue desc
LIMIT 20;

Q11:库存价值查询
库存价值查询是带有分组、排序、聚集、子查询操作并存的多表连接查询操作,查找给定国家/地区供应商的库存价值; 扫描给定国家的供应商的可用库存来查找占有所有可用零件总价值的重要百分比的所有零件。

Q11语句的特点是:带有分组、排序、聚集、子查询操作并存的多表连接查询操作。子查询位于分组操作的HAVING条件中。

--语句详解

select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as value //聚集操作,商品的总价值
from
    partsupp, supplier, nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = '[NATION]'
group by
    ps_partkey 
having //带有HAVING子句的分组操作
    sum(ps_supplycost * ps_availqty) > ( //HAVING子句中包括有子查询
        select
            sum(ps_supplycost * ps_availqty) * [FRACTION] //子查询中存在聚集操作;FRACTION为0.0001/SF1
        from
            partsupp, supplier, nation //与父查询的表连接一致
        where //与父查询的WHEWR条件一致
            ps_suppkey = s_suppkey
            and s_nationkey = n_nationkey
            and n_name = '[NATION]' //指定国家
    )
order by //按商品的价值降序排序
    value desc;

示例sql:

select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as value
from
    partsupp,
    supplier,
    nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = 'ALGERIA'
group by
    ps_partkey having
    sum(ps_supplycost * ps_availqty) > 
    (
        select
            sum(ps_supplycost * ps_availqty) * 0.0001000000
        from
            partsupp,
            supplier,
            nation
        where
            ps_suppkey = s_suppkey
            and s_nationkey = n_nationkey
            and n_name = 'ALGERIA'
    )
order by
    value desc
LIMIT 1;

Q12:运送方式和订单优先级查询
运送方式和订单优先级查询是带有分组、排序、聚集操作并存的两表连接查询操作,确定选择较便宜的运送方式是否会对关键优先级订单产生负面影响,导致消费者更多的在合同日期之后收到货物。

Q12语句的特点是:带有分组、排序、聚集操作并存的两表连接查询操作。

--语句详解

select
    l_shipmode,
    sum(//聚集操作
        case when 
      o_orderpriority ='1-URGENT' //OR运算,二者满足其一即可,选出URGENT或HIGH的
            or o_orderpriority ='2-HIGH'
        then 1
    else 0
        end) as high_line_count,
    sum(
    case when 
        o_orderpriority <> '1-URGENT' //AND运算,二者都不满足,非URGENT非HIGH的
            and o_orderpriority <> '2-HIGH'
        then 1
        else 0
        end) as low_line_count
from
    orders,lineitem
where
    o_orderkey = l_orderkey
    and l_shipmode in ('[SHIPMODE1]', '[SHIPMODE2]') 
  /* 指定货运模式的类型,在TPC-H标准指定的范围内随机选择,SHIPMODE2必须有别于SHIPMODE1 */
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= date '[DATE]' //从1993年到1997年中任一年的一月一号
    and l_receiptdate < date '[DATE]' + interval '1' year //1年之内
group by //分组操作
    l_shipmode
order by //排序操作
    l_shipmode;

示例sql:

select
    l_shipmode,
    sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count,
    sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count
from
    orders,
    lineitem
where
    o_orderkey = l_orderkey
    and l_shipmode in ('RAIL', 'SHIP')
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= date '1994-01-01'
    and l_receiptdate < date '1994-01-01' + interval '1' year
group by
    l_shipmode
order by
    l_shipmode
LIMIT 1;

Q13:客户分布查询
客户分布查询是带有分组、排序、聚集、子查询、左外连接操作并存的查询操作;根究客户的订单数量确定客户的分布,计算并报告有多少客户没有订单,有多个客户有1/2/3 等个订单。

Q13语句的特点是:带有分组、排序、聚集、子查询、左外连接操作并存的查询操作。

--语句详解

select
    c_count, count(*) as custdist //聚集操作,统计每个组的个数
from (//子查询
    select
        c_custkey,
        count(o_orderkey)
    from
        customer left outer join orders 
  on //子查询中包括左外连接操作
        c_custkey = o_custkey
        and o_comment not like ‘%[WORD1]%[WORD2]%’ //LIKE操作
            //WORD1 为以下四个可能值中任意一个:special、pending、unusual、express
            //WORD2 为以下四个可能值中任意一个:packages、requests、accounts、deposits
    group by //子查询中的分组操作
        c_custkey
) as c_orders (c_custkey, c_count)
group by //分组操作
    c_count
order by //排序操作
    custdist desc, //从大到小降序排序
    c_count desc;
TPC-H标准定义了Q13语句等价的变形SQL,与上述查询语句格式上不相同,上述语句使用子查询作为查询的对象,变形的SQL把子查询部分变为视图,然后基于视图做查询,这种做法的意义在于有些数据库不支持如上语法,但存在等价的其他语法,如MySQL就不支持如上语法,需要使用如下等价形式。

create view orders_per_cust:s (custkey, ordercount) as //创建视图,相当与标准Q13的子查询内容
select
c_custkey,
count(o_orderkey)
from
customer left outer join orders on
c_custkey = o_custkey
and o_comment not like '%:1%:2%'
group by
c_custkey;

select
ordercount,
count(*) as custdist
from
orders_per_cust:s //对视图进行查询
group by
ordercount
order by
custdist desc,
ordercount desc;
drop view orders_per_cust:s;

示例sql:

select
    c_count,
    count(*) as custdist
from
(
    select
        c_custkey,
        count(o_orderkey)
    from
        customer left outer join orders on
        c_custkey = o_custkey
        and o_comment not like '%express%packages%'
    group by
        c_custkey
) as c_orders (c_custkey, c_count)
group by
    c_count
order by
    custdist desc,
    c_count desc
LIMIT 1;

Q14:促销效果查询
促销效果查询是带有分组、排序、聚集、子查询、左外连接操作并存的查询操作,监控市场对促销的反应;确定在给定年份和月份的收入中有多少百分比来之与促销部分。

Q14语句的特点是:带有分组、排序、聚集、子查询、左外连接操作并存的查询操作。

语句详解:

select
    100.00 * sum(
    case when p_type like 'PROMO%' //促销零件
        then l_extendedprice*(1-l_discount) //某一特定时间的收入
        else 0
        end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
    lineitem, part
where
    l_partkey = p_partkey
    and l_shipdate >= date '[DATE]' // DATE是从1993年到1997年中任一年的任一月的一号
    and l_shipdate < date '[DATE]' + interval '1' month;

示例sql:

select
    100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount) else 0 end)  / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
    lineitem,
    part
where
    l_partkey = p_partkey
    and l_shipdate >= date '1994-02-01'
    and l_shipdate < date '1994-02-01' + interval '1' month
LIMIT 1;

Q15:顶级供应商查询
顶级供应商查询是带有分排序、聚集、聚集子查询操作并存的普通表与视图的连接操作,确定顶级供应商以对其进行奖励、给予更多业务或获得特殊认可;在给定的年费的给定季度对总出货量贡献最大的供应商。

Q15语句的特点是:带有分排序、聚集、聚集子查询操作并存的普通表与视图的连接操作。

--语句详解:

create view revenue[STREAM_ID](supplier_no, total_revenue) as //创建复杂视图(带有分组操作)
select
    l_suppkey,
    sum(l_extendedprice * (1 - l_discount)) //获取供货商为公司带来的总利润
from
    lineitem
where
    l_shipdate >= date '[DATE]' //DATE 是从1993年一月到1997年十月中任一月的一号
    and l_shipdate < date '[DATE]' + interval '3' month //3个月内
group by //分组键与查询对象之一相同
    l_suppkey;


//查询语句
select
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
from
    supplier,revenue[STREAM_ID] //普通表与复杂视图进行连接操作
where
    s_suppkey = supplier_no
    and total_revenue = (//聚集子查询
        select
            max(total_revenue)
        from
            revenue[STREAM_ID] //聚集子查询从视图获得数据
    )
order by
    s_suppkey;
  
//删除视图
drop view revenue[STREAM_ID];

TPC-H标准定义了Q15语句等价的变形SQL,使用了WITH子句,然后用WITH的对象与表进行连接。变形SQL的语句如下:

WITH revenue (supplier_no, total_revenue) as (
    SELECT
        l_suppkey,
        SUM(l_extendedprice * (1-l_discount))
    FROM
        lineitem
    WHERE
        l_shipdate >= date ':1'
        AND l_shipdate < date ':1' + interval '3' month
    GROUP BY
        l_suppkey
)

SELECT
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
FROM
    supplier,
    revenue
WHERE
    s_suppkey = supplier_no
    AND total_revenue = (
        SELECT
            MAX(total_revenue)
        FROM
            revenue
    )
ORDER BY
    s_suppkey;

示例sql:

create or replace view revenue0 (supplier_no, total_revenue) as
select
    l_suppkey,
    sum(l_extendedprice * (1 - l_discount))
from
    lineitem
where
    l_shipdate >= date '1994-02-01'
    and l_shipdate < date '1994-02-01' + interval '3' month
group by
    l_suppkey;
select
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
from
    supplier,
    revenue0
where
    s_suppkey = supplier_no
    and total_revenue = 
    (
        select
            max(total_revenue)
        from
            revenue0
    )
order by
    s_suppkey
LIMIT 1;

Q16:零部件/供货商关系查询
零部件/供货商关系查询是带有分组、排序、聚集、去重、NOT IN子查询操作并存的两表连接操作;找出有多少供应商可以提供具有给定属性的零件。

Q16语句的特点是:带有分组、排序、聚集、去重、NOT IN子查询操作并存的两表连接操作。

--语句详解

select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt //聚集、去重操作
from
    partsupp,
    part
where
    p_partkey = ps_partkey
    and p_brand <> '[BRAND]'
        // BRAND=Brand#MN ,M和N是两个字母,代表两个数值,相互独立,取值在1到5之间
    and p_type not like '[TYPE]%' //消费者不感兴趣的类型和尺寸
    and p_size in ([SIZE1], [SIZE2], [SIZE3], [SIZE4], [SIZE5], [SIZE6], [SIZE7], [SIZE8])
        //TYPEX是在1到50之间任意选择的一组八个不同的值
    and ps_suppkey not in ( //NOT IN子查询,消费者排除某些供货商
        select
            s_suppkey
        from
            supplier
        where
            s_comment like '%Customer%Complaints%'
    )
group by //分组操作
    p_brand,
    p_type,
    p_size
order by //排序操作
    supplier_cnt desc, //按数量降序排列,按品牌、种类、尺寸升序排列
    p_brand,
    p_type,
    p_size;

示例sql:

select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt
from
    partsupp,
    part
where
     p_partkey = ps_partkey
     and p_brand <> 'Brand#55'
     and p_type not like 'MEDIUM PLATED%'
     and p_size in (35, 17, 43, 49, 19, 41, 30, 47)
     and ps_suppkey not in 
     (
         select
             s_suppkey
         from
             supplier
         where
             s_comment like '%Customer%Complaints%'
    )
group by
    p_brand,
    p_type,
    p_size
order by
    supplier_cnt desc,
    p_brand,
    p_type,
    p_size
LIMIT 1;

Q17:小额订单收入查询
小额订单收入查询是带有聚集、聚集子查询操作并存的两表连接操作;确定如果不在为少量特定零件完成订单,平均每年将损失多少收入,通过集中在更大的出货量上来减少管理费用;给定品牌和给定容器类型的零部件,确定在7年内数据库中所有订单订购的此类零件的平均订单数量,如果不再接收低于20%平均水平的订单,平均每年的总收入损失多少。

Q17语句的特点是:带有聚集、聚集子查询操作并存的两表连接操作。

--语句详解

select
    sum(l_extendedprice) / 7.0 as avg_yearly //聚集操作
from
    lineitem, part
where
    p_partkey = l_partkey
    and p_brand = '[BRAND]' /*指定品牌。 BRAND=’Brand#MN’ ,M和N是两个字母,代表两个数值,相互独立,取值在1到5之间 */
    and p_container = '[CONTAINER]' //指定包装类型。在TPC-H标准指定的范围内随机选择
    and l_quantity < ( //聚集子查询
        select
            0.2 * avg(l_quantity)
        from
            lineitem
        where
            l_partkey = p_partkey
);

示例sql:

select
    sum(l_extendedprice) / 7.0 as avg_yearly
from
   lineitem,
   part,
   (SELECT l_partkey AS agg_partkey, 0.2 * avg(l_quantity) AS avg_quantity FROM lineitem GROUP BY l_partkey) part_agg
where
    p_partkey = l_partkey
    and agg_partkey = l_partkey
    and p_brand = 'Brand#53'
    and p_container = 'JUMBO JAR'
    and l_quantity < avg_quantity    
LIMIT 1;

Q18:大批量客户查询
大批量客户查询是带有分组、排序、聚集、IN子查询操作并存的三表连接操作,根据客户的大批量订单对客户进行排名。

Q18语句的特点是:带有分组、排序、聚集、IN子查询操作并存的三表连接操作。查询语句没有从语法上限制返回多少条元组,但是TPC-H标准规定,查询结果只返回前100行(通常依赖于应用程序实现)。

--语句详解

select
    c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, //基本信息
    sum(l_quantity) //订货总数
from
    customer, orders, lineitem
where
    o_orderkey in ( //带有分组操作的IN子查询
        select
            l_orderkey
        from
            lineitem
        group by 
            l_orderkey 
    having
            sum(l_quantity) > [QUANTITY] // QUANTITY是位于312到315之间的任意值
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey 
group by
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
order by
    o_totalprice desc,
    o_orderdate;

示例sql:

select
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    sum(l_quantity)
from
    customer,
    orders,
    lineitem
where
    o_orderkey in 
    (
        select
            l_orderkey
        from
            lineitem
        group by
            l_orderkey having
            sum(l_quantity) > 313
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey
group by
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
order by
    o_totalprice desc,
    o_orderdate
LIMIT 100;

Q19:折扣收入查询
折扣收入查询是带有分组、排序、聚集、IN子查询操作并存的三表连接操作,归因于以特定方式处理的选定零部件的总折扣收入。本查询是用数据挖掘工具产生格式化代码的一个例子;对一些空运或人工运输零件三个不同种类的所有订单的总折扣收入。零件的选择考虑特定品牌、包装和尺寸范围。

Q19语句的特点是:带有分组、排序、聚集、IN子查询操作并存的三表连接操作。

--语句详解

select
    sum(l_extendedprice * (1 - l_discount) ) as revenue
from
    lineitem, part
where (
        p_partkey = l_partkey
        and p_brand = ‘[BRAND1]’ /*特定品牌。BRAND1、BRAND2、BRAND3=‘Brand#MN’,M和N是两个字母,代表两个数值,相互独立,取值在1到5之间 */
        and p_container in ( ‘SM CASE’, ‘SM BOX’, ‘SM PACK’, ‘SM PKG’) //包装范围
        and l_quantity >= [QUANTITY1] and l_quantity <= [QUANTITY1] + 10 /* QUANTITY1 是1到10之间的任意取值 */
        and p_size between 1 and 5 //尺寸范围
        and l_shipmode in (‘AIR’, ‘AIR REG’) //运输模式,如下带有阴影的粗体表示的条件是相同的,存在条件化简的可能
        and l_shipinstruct = ‘DELIVER IN PERSON’
    ) or (
        p_partkey = l_partkey
        and p_brand = ‘[BRAND2]’
        and p_container in (‘MED BAG’, ‘MED BOX’, ‘MED PKG’, ‘MED PACK’)
        and l_quantity >= [QUANTITY2] and l_quantity <= [QUANTITY2] + 10 /* QUANTITY2 是10到20之间的任意取值 */
        and p_size between 1 and 10
    and l_shipmode in (‘AIR’, ‘AIR REG’)
        and l_shipinstruct = ‘DELIVER IN PERSON’
    ) or (
    p_partkey = l_partkey
    and p_brand = ‘[BRAND3]’
        and p_container in ( ‘LG CASE’, ‘LG BOX’, ‘LG PACK’, ‘LG PKG’)
        and l_quantity >= [QUANTITY3] and l_quantity <= [QUANTITY3] + 10 /* QUANTITY3 是20到30之间的任意取值 */
        and p_size between 1 and 15
        and l_shipmode in (‘AIR’, ‘AIR REG’)
        and l_shipinstruct = ‘DELIVER IN PERSON’
    );

示例sql:

select
    sum(l_extendedprice* (1 - l_discount)) as revenue
from
    lineitem,
    part
where
    (
         p_partkey = l_partkey
         and p_brand = 'Brand#42'
         and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
         and l_quantity >= 6 and l_quantity <= 6 + 10
         and p_size between 1 and 5
         and l_shipmode in ('AIR', 'AIR REG')
         and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
         p_partkey = l_partkey
         and p_brand = 'Brand#55'
         and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
         and l_quantity >= 20 and l_quantity <= 20 + 10
         and p_size between 1 and 10
         and l_shipmode in ('AIR', 'AIR REG')
         and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
         p_partkey = l_partkey
         and p_brand = 'Brand#24'
         and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
         and l_quantity >= 23 and l_quantity <= 23 + 10
         and p_size between 1 and 15
         and l_shipmode in ('AIR', 'AIR REG')
         and l_shipinstruct = 'DELIVER IN PERSON'
    )
LIMIT 1;

Q20:潜在零部件促销查询
Q20语句查询确定在某一年内,找出指定国家的能对某一零件商品提供更有竞争力价格的供货货。所谓更有竞争力的供货商,是指那些零件有过剩的供货商,超过供或商在某一年中货运给定国的某一零件的50%则为过剩。
潜在零部件促销查询是带有排序、聚集、IN子查询、普通子查询操作并存的两表连接操作,识别特定国家的供应商,他们选择的零部件可能是促销报价的候选者;可识别提供给定零件过多的供应商,过多被定义为超过50%的零部件。

Q20语句的特点是:带有排序、聚集、IN子查询、普通子查询操作并存的两表连接操作。

--语句详解

select
    s_name, s_address
from
    supplier, nation
where
    s_suppkey in ( //第一层的IN子查询
        select
            ps_suppkey
        from
            partsupp
        where
            ps_partkey in ( //第二层嵌套的IN子查询
                select
                    p_partkey
                from
                    part
                where
                    p_name like '[COLOR]%' //COLOR为产生P_NAME的值的列表中的任意值
            )
            and ps_availqty > (//第二层嵌套的子查询
                select
                    0.5 * sum(l_quantity) //聚集子查询
                from
                    lineitem
                where
                    l_partkey = ps_partkey
                    and l_suppkey = ps_suppkey
                    and l_shipdate >= date('[DATE]') // DATE为在1993年至1997年的任一年的一月一号
                    and l_shipdate < date('[DATE]') + interval ‘1’ year //1年内
            )
    )
  and s_nationkey = n_nationkey
  and n_name = '[NATION]' //TPC-H标准定义的任意值
order by
    s_name;

示例sql:

select s_name,s_address
from supplier,nation
where
    s_suppkey in 
    (select ps_suppkey
        from partsupp,
            (
                select
                    l_partkey agg_partkey,
                    l_suppkey agg_suppkey,
                    0.5 * sum(l_quantity) AS agg_quantity
                from
                    lineitem
                where
                    l_shipdate >= date '1997-01-01'
                    and l_shipdate < date '1997-01-01' + interval '1' year
                group by
                    l_partkey,
                    l_suppkey
            ) agg_lineitem
        where
            agg_partkey = ps_partkey
            and agg_suppkey = ps_suppkey
            and ps_partkey in 
            (
                select
                    p_partkey
                from
                    part
                where
                    p_name like 'frosted%'
            )
            and ps_availqty > agg_quantity
    )
    and s_nationkey = n_nationkey
    and n_name = 'UNITED STATES'
order by
    s_name
LIMIT 1;

Q21:供应商留单等待查询
供应商留单等待查询是带有分组、排序、聚集、EXISTS子查询、NOT EXISTS子查询操作并存的四表连接操作,标识无法及时发货的供应商。

Q21语句的特点是:带有分组、排序、聚集、EXISTS子查询、NOT EXISTS子查询操作并存的四表连接操作。查询语句没有从语法上限制返回多少条元组,但是TPC-H标准规定,查询结果只返回前100行(通常依赖于应用程序实现)。

--语句详解

select
    s_name, count(*) as numwait
from
    supplier, lineitem l1, orders, nation
where
    s_suppkey = l1.l_suppkey
    and o_orderkey = l1.l_orderkey
    and o_orderstatus = 'F'
    and l1.l_receiptdate > l1.l_commitdate
    and exists ( //EXISTS子查询
        select
            *
        from
            lineitem l2
        where
            l2.l_orderkey = l1.l_orderkey
            and l2.l_suppkey <> l1.l_suppkey
    )
    and not exists ( //NOT EXISTS子查询
        select
            *
        from
            lineitem l3
        where
            l3.l_orderkey = l1.l_orderkey
            and l3.l_suppkey <> l1.l_suppkey
            and l3.l_receiptdate > l3.l_commitdate
    )
    and s_nationkey = n_nationkey
    and n_name = '[NATION]' //TPC-H标准定义的任意值
group by
    s_name
order by
    numwait desc,
    s_name;

示例sql:

select
    s_name,
    count(*) as numwait
from
    supplier,
    lineitem l1,
    orders,
    nation
where
    s_suppkey = l1.l_suppkey
    and o_orderkey = l1.l_orderkey
    and o_orderstatus = 'F'
    and l1.l_receiptdate > l1.l_commitdate
    and exists 
    (
        select
            *
        from
            lineitem l2
        where
            l2.l_orderkey = l1.l_orderkey
            and l2.l_suppkey <> l1.l_suppkey
    )
    and not exists 
    (
        select
            *
        from
            lineitem l3
        where
            l3.l_orderkey = l1.l_orderkey
            and l3.l_suppkey <> l1.l_suppkey
            and l3.l_receiptdate > l3.l_commitdate
    )
    and s_nationkey = n_nationkey
    and n_name = 'VIETNAM'
group by
    s_name
order by
    numwait desc,
    s_name
LIMIT 100;

Q22:全球销售机会查询
全球销售机会查询是带有分组、排序、聚集、EXISTS子查询、NOT EXISTS子查询操作并存的四表连接操作,确定有可能进行购买的客户所在的地理位置,计算特定国家代码范围内有多少客户7年未下订单,但他们的“正”账户余额高于平均水平。

Q22语句的特点是:带有分组、排序、聚集、EXISTS子查询、NOT EXISTS子查询操作并存的四表连接操作。

--语句详解

select
    cntrycode,
    count(*) as numcust,
    sum(c_acctbal) as totacctbal
from ( //第一层子查询
    select
        substring(c_phone from 1 for 2) as cntrycode,
        c_acctbal
    from
        customer
    where
        // I1…I7是在TPC-H中定义国家代码的可能值中不重复的任意值
        substring(c_phone from 1 for 2) in ('[I1]','[I2]','[I3]','[I4]','[I5]','[I6]','[I7]')
        and c_acctbal > (//第二层聚集子查询
            select
                avg(c_acctbal)
            from
                customer
            where
                c_acctbal > 0.00
                and substr (c_phone from 1 for 2) in ('[I1]','[I2]','[I3]','[I4]','[I5]','[I6]','[I7]')
        )
        and not exists (//第二层NOT EXISTS子查询
            select
                *
            from
                orders
            where
                o_custkey = c_custkey
        )
    ) as custsale
group by
    cntrycode
order by
    cntrycode;

示例sql:

select
    cntrycode,
    count(*) as numcust,
    sum(c_acctbal) as totacctbal
from
(
    select
        substring(c_phone from 1 for 2) as cntrycode,
        c_acctbal
    from
        customer
    where
        substring(c_phone from 1 for 2) in
        ('34', '18', '30', '24', '33', '22', '29')
        and c_acctbal > 
        (
            select
                avg(c_acctbal)
            from
                customer
            where
                c_acctbal > 0.00
                and substring(c_phone from 1 for 2) in
                ('34', '18', '30', '24', '33', '22', '29')
        )
        and not exists 
        (
            select
                *
            from
                orders
            where
                o_custkey = c_custkey
        )
) as custsale
group by
    cntrycode
order by
    cntrycode limit 1;

五、测试结果举例


image.png

六、测试总结

1.行存模式,执行计划均不走索引,获取超过80%以上的记录,走的是全表扫描
2.列存模式tpch查询性能优于行存。
3.列存不支持主外键约束。delta表是系统表,不支持变更,即不支持建立主外键
4.列存cluster key约束需要在导入数据前创建,否则只会影响新数据,不会影响存量数据
5.cluster key带排序功能,会稍微提升22条sql的查询性能
6.列存可以创建btree索引,适用于局部很小范围查询或等值查询,不适用于这些sql
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 215,463评论 6 497
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 91,868评论 3 391
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 161,213评论 0 351
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 57,666评论 1 290
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 66,759评论 6 388
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 50,725评论 1 294
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,716评论 3 415
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,484评论 0 270
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,928评论 1 307
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,233评论 2 331
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,393评论 1 345
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,073评论 5 340
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,718评论 3 324
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,308评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,538评论 1 268
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,338评论 2 368
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,260评论 2 352

推荐阅读更多精彩内容