PostgreSQL 源码解读(47)- 查询语句#32(query_planner函数#8)

先前的章节已介绍了函数query_planner中子函数query_planner中qp_callback(回调函数)和fix_placeholder_input_needed_levels的主要实现逻辑,本节继续介绍remove_useless_joins、reduce_unique_semijoins和add_placeholders_to_base_rels的实现逻辑。

query_planner代码片段:

     //...
 
     /*
      * Remove any useless outer joins.  Ideally this would be done during
      * jointree preprocessing, but the necessary information isn't available
      * until we've built baserel data structures and classified qual clauses.
      */
     joinlist = remove_useless_joins(root, joinlist);//清除无用的外连接
 
     /*
      * Also, reduce any semijoins with unique inner rels to plain inner joins.
      * Likewise, this can't be done until now for lack of needed info.
      */
     reduce_unique_semijoins(root);//消除半连接
 
     /*
      * Now distribute "placeholders" to base rels as needed.  This has to be
      * done after join removal because removal could change whether a
      * placeholder is evaluable at a base rel.
      */
     add_placeholders_to_base_rels(root);//在"base rels"中添加PH
 
      //...

一、数据结构

PlaceHolderVar
上一小节已介绍过PHInfo

 /*
  * Placeholder node for an expression to be evaluated below the top level
  * of a plan tree.  This is used during planning to represent the contained
  * expression.  At the end of the planning process it is replaced by either
  * the contained expression or a Var referring to a lower-level evaluation of
  * the contained expression.  Typically the evaluation occurs below an outer
  * join, and Var references above the outer join might thereby yield NULL
  * instead of the expression value.
  *
  * Although the planner treats this as an expression node type, it is not
  * recognized by the parser or executor, so we declare it here rather than
  * in primnodes.h.
  */
 
 typedef struct PlaceHolderVar
 {
     Expr        xpr;
     Expr       *phexpr;         /* the represented expression */
     Relids      phrels;         /* base relids syntactically within expr src */
     Index       phid;           /* ID for PHV (unique within planner run) */
     Index       phlevelsup;     /* > 0 if PHV belongs to outer query */
 } PlaceHolderVar;

SpecialJoinInfo

 /*
  * "Special join" info.
  *
  * One-sided outer joins constrain the order of joining partially but not
  * completely.  We flatten such joins into the planner's top-level list of
  * relations to join, but record information about each outer join in a
  * SpecialJoinInfo struct.  These structs are kept in the PlannerInfo node's
  * join_info_list.
  *
  * Similarly, semijoins and antijoins created by flattening IN (subselect)
  * and EXISTS(subselect) clauses create partial constraints on join order.
  * These are likewise recorded in SpecialJoinInfo structs.
  *
  * We make SpecialJoinInfos for FULL JOINs even though there is no flexibility
  * of planning for them, because this simplifies make_join_rel()'s API.
  *
  * min_lefthand and min_righthand are the sets of base relids that must be
  * available on each side when performing the special join.  lhs_strict is
  * true if the special join's condition cannot succeed when the LHS variables
  * are all NULL (this means that an outer join can commute with upper-level
  * outer joins even if it appears in their RHS).  We don't bother to set
  * lhs_strict for FULL JOINs, however.
  *
  * It is not valid for either min_lefthand or min_righthand to be empty sets;
  * if they were, this would break the logic that enforces join order.
  *
  * syn_lefthand and syn_righthand are the sets of base relids that are
  * syntactically below this special join.  (These are needed to help compute
  * min_lefthand and min_righthand for higher joins.)
  *
  * delay_upper_joins is set true if we detect a pushed-down clause that has
  * to be evaluated after this join is formed (because it references the RHS).
  * Any outer joins that have such a clause and this join in their RHS cannot
  * commute with this join, because that would leave noplace to check the
  * pushed-down clause.  (We don't track this for FULL JOINs, either.)
  *
  * For a semijoin, we also extract the join operators and their RHS arguments
  * and set semi_operators, semi_rhs_exprs, semi_can_btree, and semi_can_hash.
  * This is done in support of possibly unique-ifying the RHS, so we don't
  * bother unless at least one of semi_can_btree and semi_can_hash can be set
  * true.  (You might expect that this information would be computed during
  * join planning; but it's helpful to have it available during planning of
  * parameterized table scans, so we store it in the SpecialJoinInfo structs.)
  *
  * jointype is never JOIN_RIGHT; a RIGHT JOIN is handled by switching
  * the inputs to make it a LEFT JOIN.  So the allowed values of jointype
  * in a join_info_list member are only LEFT, FULL, SEMI, or ANTI.
  *
  * For purposes of join selectivity estimation, we create transient
  * SpecialJoinInfo structures for regular inner joins; so it is possible
  * to have jointype == JOIN_INNER in such a structure, even though this is
  * not allowed within join_info_list.  We also create transient
  * SpecialJoinInfos with jointype == JOIN_INNER for outer joins, since for
  * cost estimation purposes it is sometimes useful to know the join size under
  * plain innerjoin semantics.  Note that lhs_strict, delay_upper_joins, and
  * of course the semi_xxx fields are not set meaningfully within such structs.
  */
 
 typedef struct SpecialJoinInfo
 {
     NodeTag     type;
     Relids      min_lefthand;   /* base relids in minimum LHS for join */
     Relids      min_righthand;  /* base relids in minimum RHS for join */
     Relids      syn_lefthand;   /* base relids syntactically within LHS */
     Relids      syn_righthand;  /* base relids syntactically within RHS */
     JoinType    jointype;       /* always INNER, LEFT, FULL, SEMI, or ANTI */
     bool        lhs_strict;     /* joinclause is strict for some LHS rel */
     bool        delay_upper_joins;  /* can't commute with upper RHS */
     /* Remaining fields are set only for JOIN_SEMI jointype: */
     bool        semi_can_btree; /* true if semi_operators are all btree */
     bool        semi_can_hash;  /* true if semi_operators are all hash */
     List       *semi_operators; /* OIDs of equality join operators */
     List       *semi_rhs_exprs; /* righthand-side expressions of these ops */
 } SpecialJoinInfo;

二、源码解读

remove_useless_joins
清除无用的连接,比如以下的SQL语句:

select t1.dwbh 
from t_grxx t1 left join t_dwxx t2 on t1.dwbh = t2.dwbh;

左连接,而且t_dwxx.dwbh唯一,这样的连接是不需要的连接,直接查询t_grxx即可.
从执行计划来看,PG只对t_grxx进行扫描:

testdb=# explain verbose select t1.dwbh from t_grxx t1 left join t_dwxx t2 on t1.dwbh = t2.dwbh;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Seq Scan on public.t_grxx t1  (cost=0.00..14.00 rows=400 width=38)
   Output: t1.dwbh
(2 rows)

源代码如下:

 /*
  * remove_useless_joins
  *      Check for relations that don't actually need to be joined at all,
  *      and remove them from the query.
  *
  * We are passed the current joinlist and return the updated list.  Other
  * data structures that have to be updated are accessible via "root".
  */
 List *
 remove_useless_joins(PlannerInfo *root, List *joinlist)
 {
     ListCell   *lc;
 
     /*
      * We are only interested in relations that are left-joined to, so we can
      * scan the join_info_list to find them easily.
      */
 restart:
     foreach(lc, root->join_info_list)//遍历连接信息链表
     {
         SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
         int         innerrelid;
         int         nremoved;
 
         /* Skip if not removable */
         if (!join_is_removable(root, sjinfo))//判断是否可以清除连接
             continue;
 
         /*
          * Currently, join_is_removable can only succeed when the sjinfo's
          * righthand is a single baserel.  Remove that rel from the query and
          * joinlist.
          */
         innerrelid = bms_singleton_member(sjinfo->min_righthand);
 
         remove_rel_from_query(root, innerrelid,
                               bms_union(sjinfo->min_lefthand,
                                         sjinfo->min_righthand));//从查询中删除相应的Rel
 
         /* We verify that exactly one reference gets removed from joinlist */
         nremoved = 0;
         joinlist = remove_rel_from_joinlist(joinlist, innerrelid, &nremoved);
         if (nremoved != 1)
             elog(ERROR, "failed to find relation %d in joinlist", innerrelid);
 
         /*
          * We can delete this SpecialJoinInfo from the list too, since it's no
          * longer of interest.
          */
         //更新连接链表信息 
         root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo);
 
         /*
          * Restart the scan.  This is necessary to ensure we find all
          * removable joins independently of ordering of the join_info_list
          * (note that removal of attr_needed bits may make a join appear
          * removable that did not before).  Also, since we just deleted the
          * current list cell, we'd have to have some kluge to continue the
          * list scan anyway.
          */
         goto restart;
     }
 
     return joinlist;
 }

reduce_unique_semijoins
把可以简化的半连接转化为内连接.
比如以下的SQL语句:

select t1.*
from t_grxx t1 
where dwbh IN (select t2.dwbh from t_dwxx t2);

由于子查询"select t2.dwbh from t_dwxx t2"的dwbh是PK,子查询提升后,t_grxx的dwbh只对应t_dwxx唯一的一条记录,因此可以把半连接转换为内连接,执行计划如下:

testdb=# explain verbose select t1.*
from t_grxx t1 
where dwbh IN (select t2.dwbh from t_dwxx t2);
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Hash Join  (cost=1.07..20.10 rows=6 width=176)
   Output: t1.dwbh, t1.grbh, t1.xm, t1.xb, t1.nl
   Inner Unique: true
   Hash Cond: ((t1.dwbh)::text = (t2.dwbh)::text)
   ->  Seq Scan on public.t_grxx t1  (cost=0.00..14.00 rows=400 width=176)
         Output: t1.dwbh, t1.grbh, t1.xm, t1.xb, t1.nl
   ->  Hash  (cost=1.03..1.03 rows=3 width=38)
         Output: t2.dwbh
         ->  Seq Scan on public.t_dwxx t2  (cost=0.00..1.03 rows=3 width=38)
               Output: t2.dwbh
(10 rows)

跟踪分析:

(gdb) n
199   reduce_unique_semijoins(root);
(gdb) step
reduce_unique_semijoins (root=0x1702968) at analyzejoins.c:520
520   for (lc = list_head(root->join_info_list); lc != NULL; lc = next)
(gdb) n
522     SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
(gdb) 

查看SpecialJoinInfo内存结构:

528     next = lnext(lc);
(gdb) p *sjinfo
$1 = {type = T_SpecialJoinInfo, min_lefthand = 0x1749818, min_righthand = 0x1749830, syn_lefthand = 0x1749570, 
  syn_righthand = 0x17495d0, jointype = JOIN_SEMI, lhs_strict = true, delay_upper_joins = false, semi_can_btree = true, 
  semi_can_hash = true, semi_operators = 0x17496c8, semi_rhs_exprs = 0x17497b8}

内表(innerrel,即t_dwxx)如支持唯一性,则可以考虑把半连接转换为内连接

550     if (!rel_supports_distinctness(root, innerrel))
...
575     root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo);
...

源代码如下:

 /*
  * reduce_unique_semijoins
  *      Check for semijoins that can be simplified to plain inner joins
  *      because the inner relation is provably unique for the join clauses.
  *
  * Ideally this would happen during reduce_outer_joins, but we don't have
  * enough information at that point.
  *
  * To perform the strength reduction when applicable, we need only delete
  * the semijoin's SpecialJoinInfo from root->join_info_list.  (We don't
  * bother fixing the join type attributed to it in the query jointree,
  * since that won't be consulted again.)
  */
 void
 reduce_unique_semijoins(PlannerInfo *root)
 {
     ListCell   *lc;
     ListCell   *next;
 
     /*
      * Scan the join_info_list to find semijoins.  We can't use foreach
      * because we may delete the current cell.
      */
     for (lc = list_head(root->join_info_list); lc != NULL; lc = next)
     {
         SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);//特殊连接信息,先前通过deconstruct函数生成
         int         innerrelid;
         RelOptInfo *innerrel;
         Relids      joinrelids;
         List       *restrictlist;
 
         next = lnext(lc);
 
         /*
          * Must be a non-delaying semijoin to a single baserel, else we aren't
          * going to be able to do anything with it.  (It's probably not
          * possible for delay_upper_joins to be set on a semijoin, but we
          * might as well check.)
          */
         if (sjinfo->jointype != JOIN_SEMI ||
             sjinfo->delay_upper_joins)
             continue;
 
         if (!bms_get_singleton_member(sjinfo->min_righthand, &innerrelid))
             continue;
 
         innerrel = find_base_rel(root, innerrelid);
 
         /*
          * Before we trouble to run generate_join_implied_equalities, make a
          * quick check to eliminate cases in which we will surely be unable to
          * prove uniqueness of the innerrel.
          */
         if (!rel_supports_distinctness(root, innerrel))
             continue;
 
         /* Compute the relid set for the join we are considering */
         joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
 
         /*
          * Since we're only considering a single-rel RHS, any join clauses it
          * has must be clauses linking it to the semijoin's min_lefthand.  We
          * can also consider EC-derived join clauses.
          */
         restrictlist =
             list_concat(generate_join_implied_equalities(root,
                                                          joinrelids,
                                                          sjinfo->min_lefthand,
                                                          innerrel),
                         innerrel->joininfo);
 
         /* Test whether the innerrel is unique for those clauses. */
         if (!innerrel_is_unique(root,
                                 joinrelids, sjinfo->min_lefthand, innerrel,
                                 JOIN_SEMI, restrictlist, true))
             continue;
 
         /* OK, remove the SpecialJoinInfo from the list. */
         root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo);//删除特殊连接信息
     }
 }

add_placeholders_to_base_rels
把PHV分发到base rels中,代码较为简单

 /*
  * add_placeholders_to_base_rels
  *      Add any required PlaceHolderVars to base rels' targetlists.
  *
  * If any placeholder can be computed at a base rel and is needed above it,
  * add it to that rel's targetlist.  This might look like it could be merged
  * with fix_placeholder_input_needed_levels, but it must be separate because
  * join removal happens in between, and can change the ph_eval_at sets.  There
  * is essentially the same logic in add_placeholders_to_joinrel, but we can't
  * do that part until joinrels are formed.
  */
 void
 add_placeholders_to_base_rels(PlannerInfo *root)
 {
     ListCell   *lc;
 
     foreach(lc, root->placeholder_list)//遍历PH链表
     {
         PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
         Relids      eval_at = phinfo->ph_eval_at;
         int         varno;
 
         if (bms_get_singleton_member(eval_at, &varno) &&
             bms_nonempty_difference(phinfo->ph_needed, eval_at))//添加到需要的RelOptInfo中
         {
             RelOptInfo *rel = find_base_rel(root, varno);
 
             rel->reltarget->exprs = lappend(rel->reltarget->exprs,
                                             copyObject(phinfo->ph_var));
             /* reltarget's cost and width fields will be updated later */
         }
     }
 }

三、参考资料

planmain.c
relation.h

©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 218,451评论 6 506
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 93,172评论 3 394
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 164,782评论 0 354
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 58,709评论 1 294
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 67,733评论 6 392
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 51,578评论 1 305
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 40,320评论 3 418
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 39,241评论 0 276
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 45,686评论 1 314
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,878评论 3 336
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,992评论 1 348
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,715评论 5 346
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 41,336评论 3 330
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,912评论 0 22
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 33,040评论 1 270
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 48,173评论 3 370
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,947评论 2 355

推荐阅读更多精彩内容