PostgreSQL Page页结构解析(3)- 行数据

本文介绍了PG数据页Page中存储的原始内容以及如何阅读它们,这一节主要介绍行数据(Items)。

一、测试数据

详见上一节,数据文件中的内容如下:

[xdb@localhost utf8db]$ hexdump -C $PGDATA/base/16477/24801
00000000  01 00 00 00 88 20 2a 12  00 00 00 00 28 00 60 1f  |..... *.....(.`.|
00000010  00 20 04 20 00 00 00 00  d8 9f 4e 00 b0 9f 4e 00  |. . ......N...N.|
00000020  88 9f 4e 00 60 9f 4e 00  00 00 00 00 00 00 00 00  |..N.`.N.........|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001f60  e5 1b 18 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00001f70  04 00 03 00 02 08 18 00  04 00 00 00 13 34 20 20  |.............4  |
00001f80  20 20 20 20 20 05 64 00  e4 1b 18 00 00 00 00 00  |    .d.........|
00001f90  00 00 00 00 00 00 00 00  03 00 03 00 02 08 18 00  |................|
00001fa0  03 00 00 00 13 33 20 20  20 20 20 20 20 05 63 00  |.....3      .c.|
00001fb0  e3 1b 18 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00001fc0  02 00 03 00 02 08 18 00  02 00 00 00 13 32 20 20  |.............2  |
00001fd0  20 20 20 20 20 05 62 00  e2 1b 18 00 00 00 00 00  |    .b.........|
00001fe0  00 00 00 00 00 00 00 00  01 00 03 00 02 08 18 00  |................|
00001ff0  01 00 00 00 13 31 20 20  20 20 20 20 20 05 61 00  |.....1      .a.|
00002000

二、Items(Tuples)

每个Tuple包括两部分,第一部分是Tuple头部信息,第二部分是实际的数据。

1、HeapTupleHeader

相关数据结构如下:

//--------------------- src/include/storage/off.h
/*
* OffsetNumber:
*
* this is a 1-based index into the linp (ItemIdData) array in the
* header of each disk page.
*/
typedef uint16 OffsetNumber;

//--------------------- src/include/storage/block.h
/*
* BlockId:
*
* this is a storage type for BlockNumber.  in other words, this type
* is used for on-disk structures (e.g., in HeapTupleData) whereas
* BlockNumber is the type on which calculations are performed (e.g.,
* in access method code).
*
* there doesn't appear to be any reason to have separate types except
* for the fact that BlockIds can be SHORTALIGN'd (and therefore any
* structures that contains them, such as ItemPointerData, can also be
* SHORTALIGN'd).  this is an important consideration for reducing the
* space requirements of the line pointer (ItemIdData) array on each
* page and the header of each heap or index tuple, so it doesn't seem
* wise to change this without good reason.
*/
typedef struct BlockIdData
{
    uint16      bi_hi;
    uint16      bi_lo;
} BlockIdData;

typedef BlockIdData *BlockId; /* block identifier */

//--------------------- src/include/storage/itemptr.h
/*
  * ItemPointer:
  *
  * This is a pointer to an item within a disk page of a known file
  * (for example, a cross-link from an index to its parent table).
  * blkid tells us which block, posid tells us which entry in the linp
  * (ItemIdData) array we want.
  *
  * Note: because there is an item pointer in each tuple header and index
  * tuple header on disk, it's very important not to waste space with
  * structure padding bytes.  The struct is designed to be six bytes long
  * (it contains three int16 fields) but a few compilers will pad it to
  * eight bytes unless coerced.  We apply appropriate persuasion where
  * possible.  If your compiler can't be made to play along, you'll waste
  * lots of space.
  */
 typedef struct ItemPointerData
 {
     BlockIdData ip_blkid;
     OffsetNumber ip_posid;
 }

//--------------------- src/include/access/htup_details.h
typedef struct HeapTupleFields
{
    TransactionId t_xmin;       /* inserting xact ID */
    TransactionId t_xmax;       /* deleting or locking xact ID */
    union
    {
        CommandId   t_cid;      /* inserting or deleting command ID, or both */
        TransactionId t_xvac;   /* old-style VACUUM FULL xact ID */
    }           t_field3;
} HeapTupleFields;

typedef struct DatumTupleFields
{
    int32       datum_len_;     /* varlena header (do not touch directly!) */

    int32       datum_typmod;   /* -1, or identifier of a record type */

    Oid         datum_typeid;   /* composite type OID, or RECORDOID */

    /*
     * datum_typeid cannot be a domain over composite, only plain composite,
     * even if the datum is meant as a value of a domain-over-composite type.
     * This is in line with the general principle that CoerceToDomain does not
     * change the physical representation of the base type value.
     *
     * Note: field ordering is chosen with thought that Oid might someday
     * widen to 64 bits.
     */
} DatumTupleFields;

struct HeapTupleHeaderData
{
    union
    {
        HeapTupleFields t_heap;
        DatumTupleFields t_datum;
    }           t_choice;

    ItemPointerData t_ctid;     /* current TID of this or newer tuple (or a
                                 * speculative insertion token) */

    /* Fields below here must match MinimalTupleData! */

#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK2 2
    uint16      t_infomask2;    /* number of attributes + various flags */

#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK 3
    uint16      t_infomask;     /* various flag bits, see below */

#define FIELDNO_HEAPTUPLEHEADERDATA_HOFF 4
    uint8       t_hoff;         /* sizeof header incl. bitmap, padding */

    /* ^ - 23 bytes - ^ */

#define FIELDNO_HEAPTUPLEHEADERDATA_BITS 5
    bits8       t_bits[FLEXIBLE_ARRAY_MEMBER];  /* bitmap of NULLs */

    /* MORE DATA FOLLOWS AT END OF STRUCT */
};

结构体展开,详见下表:

Field           Type            Length  Offset  Description
t_xmin          TransactionId   4 bytes 0       insert XID stamp
t_xmax          TransactionId   4 bytes 4       delete XID stamp
t_cid           CommandId       4 bytes 8       insert and/or delete CID stamp (overlays with t_xvac)
t_xvac          TransactionId   4 bytes 8       XID for VACUUM operation moving a row version
t_ctid          ItemPointerData 6 bytes 12      current TID of this or newer row version
t_infomask2     uint16          2 bytes 18      number of attributes, plus various flag bits
t_infomask      uint16          2 bytes 20      various flag bits
t_hoff          uint8           1 byte  22      offset to user data
//注意:t_cid和t_xvac为联合体,共用存储空间

从上一节我们已经得出第1个Tuple的偏移为8152,下面使用hexdump对其中的数据逐个解析:
t_xmin

[xdb@localhost ~]$ hexdump -C $PGDATA/base/16477/24801 -s 8152 -n 4
00001fd8  e2 1b 18 00                                       |....|
00001fdc
[xdb@localhost ~]$ echo $((0x00181be2))
1580002

t_xmax

[xdb@localhost ~]$ hexdump -C $PGDATA/base/16477/24801 -s 8156 -n 4
00001fdc  00 00 00 00                                       |....|
00001fe0

t_cid/t_xvac

[xdb@localhost ~]$ hexdump -C $PGDATA/base/16477/24801 -s 8160 -n 4
00001fe0  00 00 00 00                                       |....|
00001fe4

t_ctid

[xdb@localhost ~]$ hexdump -C $PGDATA/base/16477/24801 -s 8164 -n 6
00001fe4  00 00 00 00 01 00                                 |......|
00001fea

//ip_blkid=\x0000,即blockid=0
//ip_posid=\x0001,即posid=1,第1个tuple

t_infomask2

[xdb@localhost ~]$ hexdump -C $PGDATA/base/16477/24801 -s 8170 -n 2
00001fea  03 00                                             |..|
00001fec

//t_infomask2=\x0003,3代表什么意思?我们看看t_infomask2的说明
 /*

  * information stored in t_infomask2:

  */

 #define HEAP_NATTS_MASK 0x07FF /* 11 bits for number of attributes */

 /* bits 0x1800 are available */

 #define HEAP_KEYS_UPDATED 0x2000 /* tuple was updated and key cols

  * modified, or tuple deleted */

 #define HEAP_HOT_UPDATED 0x4000 /* tuple was HOT-updated */

 #define HEAP_ONLY_TUPLE 0x8000 /* this is heap-only tuple */

 #define HEAP2_XACT_MASK 0xE000 /* visibility-related bits */
//根把十六进制值转换为二进制显示
     11111111111 #define HEAP_NATTS_MASK         0x07FF 
  10000000000000 #define HEAP_KEYS_UPDATED       0x2000  
 100000000000000 #define HEAP_HOT_UPDATED        0x4000  
1000000000000000 #define HEAP_ONLY_TUPLE         0x8000  
1110000000000000 #define HEAP2_XACT_MASK         0xE000 
1111111111111110 #define SpecTokenOffsetNumber       0xfffe
//前(低)11位为属性的个数,3意味着有3个属性(字段)

t_infomask

[xdb@localhost ~]$ hexdump -C $PGDATA/base/16477/24801 -s 8172 -n 2
00001fec  02 08                                             |..|
00001fee
[xdb@localhost ~]$ echo $((0x0802))
2050
[xdb@localhost ~]$ echo "obase=2;2050"|bc
100000000010

//t_infomask=\x0802,十进制值为2050,二进制值为100000000010
//t_infomask说明
               1 #define HEAP_HASNULL            0x0001  /* has null attribute(s) */
              10 #define HEAP_HASVARWIDTH        0x0002  /* has variable-width attribute(s) */
             100 #define HEAP_HASEXTERNAL        0x0004  /* has external stored attribute(s) */
            1000 #define HEAP_HASOID             0x0008  /* has an object-id field */
           10000 #define HEAP_XMAX_KEYSHR_LOCK   0x0010  /* xmax is a key-shared locker */
          100000 #define HEAP_COMBOCID           0x0020  /* t_cid is a combo cid */
         1000000 #define HEAP_XMAX_EXCL_LOCK     0x0040  /* xmax is exclusive locker */
        10000000 #define HEAP_XMAX_LOCK_ONLY     0x0080  /* xmax, if valid, is only a locker */
                    /* xmax is a shared locker */
                 #define HEAP_XMAX_SHR_LOCK  (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
                 #define HEAP_LOCK_MASK  (HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | \
                          HEAP_XMAX_KEYSHR_LOCK)
       100000000 #define HEAP_XMIN_COMMITTED     0x0100  /* t_xmin committed */
      1000000000 #define HEAP_XMIN_INVALID       0x0200  /* t_xmin invalid/aborted */
                 #define HEAP_XMIN_FROZEN        (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
     10000000000 #define HEAP_XMAX_COMMITTED     0x0400  /* t_xmax committed */
    100000000000 #define HEAP_XMAX_INVALID       0x0800  /* t_xmax invalid/aborted */
   1000000000000 #define HEAP_XMAX_IS_MULTI      0x1000  /* t_xmax is a MultiXactId */
  10000000000000 #define HEAP_UPDATED            0x2000  /* this is UPDATEd version of row */
 100000000000000 #define HEAP_MOVED_OFF          0x4000  /* moved to another place by pre-9.0
                                         * VACUUM FULL; kept for binary
                                         * upgrade support */
1000000000000000 #define HEAP_MOVED_IN           0x8000  /* moved from another place by pre-9.0
                                         * VACUUM FULL; kept for binary
                                         * upgrade support */
                 #define HEAP_MOVED (HEAP_MOVED_OFF | HEAP_MOVED_IN)
1111111111110000 #define HEAP_XACT_MASK          0xFFF0  /* visibility-related bits */
//\x0802,二进制100000000010表示第2位和第12位为1,
//意味着存在可变长属性(HEAP_HASVARWIDTH),XMAX无效(HEAP_XMAX_INVALID)

t_hoff

[xdb@localhost ~]$ hexdump -C $PGDATA/base/16477/24801 -s 8174 -n 1
00001fee  18                                                |.|
00001fef
[xdb@localhost ~]$ echo $((0x18))
24
//用户数据开始偏移为24,即8152+24
2、Tuple

说完了Tuple的头部数据,接下来我们看看实际的数据存储。上一节我们得到Tuple总的长度是39,计算得到数据大小为39-24=15。

[xdb@localhost ~]$ hexdump -C $PGDATA/base/16477/24801 -s 8176 -n 15
00001ff0  01 00 00 00 13 31 20 20  20 20 20 20 20 05 61     |.....1       .a|
00001fff

回顾我们的表结构:
create table t_page (id int,c1 char(8),c2 varchar(16));
第1个字段为int,第2个字段为定长字符,第3个字段为变长字符。
相应的数据:
id=\x00000001,数字1
c1=\x133120202020202020,字符串,无需高低位变换,第1个字节\x13为标志位,后面是字符'1'+7个空格
c2=\x0561,字符串,第1个字节\x05为标志位,后面是字符'a'

三、小结

以上简单介绍了如何阅读Raw Datafile中的数据行信息,包括Tuple头部信息和用户数据。在空间使用上面,可以看PG到为了进行实际的数据查询和MVCC等机制,数据库添加了很多的额外信息,实际占用的空间大小比实际的数据要大很多,如果使用列式存储应可有效的压缩空间。

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 216,001评论 6 498
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 92,210评论 3 392
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 161,874评论 0 351
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 58,001评论 1 291
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 67,022评论 6 388
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 51,005评论 1 295
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,929评论 3 416
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,742评论 0 271
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 45,193评论 1 309
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,427评论 2 331
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,583评论 1 346
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,305评论 5 342
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,911评论 3 325
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,564评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,731评论 1 268
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,581评论 2 368
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,478评论 2 352

推荐阅读更多精彩内容