Citus multi-replica mode: enabling it, functional verification, and usage constraints

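Citus is a PostgreSQL extension that combines standalone PostgreSQL instances into a distributed PostgreSQL database. Citus supports two strategies for data high availability:

Streaming replication mode: built on PostgreSQL's native primary/standby replication. Every PostgreSQL instance in the Citus cluster consists of one primary and one or more standbys, and PostgreSQL's own high-availability features protect against data loss.
Multi-replica mode: the same data is stored on several different PostgreSQL nodes to keep it available.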

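This article takes a brief look at how to use Citus multi-replica mode and at some issues that come up when using it. It covers the following steps:

Deploy a verification environment
Enable multi-replica mode and run a simple verification
Test the compatibility of multi-replica mode with streaming replication mode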

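Deploying the verification environment

Set up a three-node Citus cluster as shown below: one coordinator that also stores shards, plus two data nodes (the setup procedure is not the focus of this article and is skipped).

Node	Role
192.168.0.1	Coordinator + data node
192.168.0.2	Data node
192.168.0.3	Data node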

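Enabling multi-replica mode

Configuration and table creation

Connect to the Citus coordinator and run the following command to enable multi-replica mode.

-- Set the number of replicas (effective for the current session only); to persist it, use PostgreSQL's ALTER SYSTEM command
-- The default is 1, meaning a single copy of the data and no high availability
set citus.shard_replication_factor = 3;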
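The comment above mentions ALTER SYSTEM as the way to persist the setting. A minimal sketch of that variant follows; it assumes a superuser connection on the coordinator and that a configuration reload is enough for this parameter to take effect in new sessions.

```sql
-- Persist the replication factor for the whole node (stored in postgresql.auto.conf)
ALTER SYSTEM SET citus.shard_replication_factor = 3;

-- Reload the configuration so that new sessions pick up the value
SELECT pg_reload_conf();

-- Check the value seen by the current session
SHOW citus.shard_replication_factor;
```

A session-level SET still takes precedence over the persisted value within that session, so it is worth running SHOW before creating a distributed table.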

Create a distributed table

create table repl3(a int primary key , b int , c int );
select create_distributed_table('repl3', 'a');

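Verifying that the table has 3 replicas

We verify from two angles that the new table really has 3 replicas.

Metadata

The metadata shows that every shardid has been placed in three locations.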

repl1=# select t1.logicalrelid,t1.shardid,count(1) from pg_dist_shard t1, pg_dist_placement t2 where t1.shardid = t2.shardid group by t1.logicalrelid,t1.shardid;
 logicalrelid | shardid | count
--------------+---------+-------
 repl3        |  102247 |     3
 repl3        |  102248 |     3
 repl3        |  102234 |     3
...
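With many shards, it can be easier to list only the shards that do not have the expected number of placements. The query below is a sketch of that check, using the same catalog tables as above; the expected count of 3 matches the replication factor set earlier.

```sql
-- List shards of repl3 whose placement count differs from the expected 3
SELECT s.logicalrelid, s.shardid, count(*) AS placements
FROM pg_dist_shard s
JOIN pg_dist_placement p ON p.shardid = s.shardid
WHERE s.logicalrelid = 'repl3'::regclass
GROUP BY s.logicalrelid, s.shardid
HAVING count(*) <> 3;
```

If every shard has three placements, the query returns no rows.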

Physical data

First insert one row and find out which shard it lands on.

-- Insert one row; the query plan below shows that it lands on shardid 102245
repl1=# insert into repl3(a, b, c) values (0,0,0) ;
INSERT 0 1
repl1=# explain select * from repl3 where a = 0;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=0 width=0)
   Task Count: 1
   Tasks Shown: All
   ->  Task
         Node: host=192.168.0.1 port=15432 dbname=repl1
         ->  Index Scan using repl3_pkey_102245 on repl3_102245 repl3  (cost=0.15..3.17 rows=1 width=12)
               Index Cond: (a = 0)
(7 rows)

Look up the physical nodes for this shardid: there is one replica on each of the nodes with groupid 1, 2 and 3.

repl1=# select t1.*, t2.* from pg_dist_shard t1, pg_dist_placement t2 where t1.shardid = t2.shardid and t1.shardid = 102245;
 logicalrelid | shardid | shardstorage | shardminvalue | shardmaxvalue | placementid | shardid | shardstate | shardlength | groupid
--------------+---------+--------------+---------------+---------------+-------------+---------+------------+-------------+---------
 repl3        |  102245 | t            | -402653184    | -268435457    |         456 |  102245 |          1 |           0 |       2
 repl3        |  102245 | t            | -402653184    | -268435457    |         457 |  102245 |          1 |           0 |       3
 repl3        |  102245 | t            | -402653184    | -268435457    |         458 |  102245 |          1 |           0 |       1
(3 rows)
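The groupid column alone does not say which host holds each replica. A possible way to resolve it is to join pg_dist_node, whose groupid, nodename and nodeport columns appear later in this article; the shardid 102245 is the one from this walkthrough.

```sql
-- Resolve each placement of shard 102245 to the host and port that stores it
SELECT p.shardid, p.placementid, p.shardstate, n.nodename, n.nodeport
FROM pg_dist_placement p
JOIN pg_dist_node n ON n.groupid = p.groupid
WHERE p.shardid = 102245
ORDER BY p.placementid;
```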

Query the table for this shardid on all three data nodes: the row is indeed stored three times.

repl1=# select run_command_on_workers('select a from repl3_102245');
 run_command_on_workers
-------------------------
 (192.168.0.1,15432,t,0)
 (192.168.0.2,15432,t,0)
 (192.168.0.3,15432,t,0)
(3 rows)

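This completes the functional verification.

Testing compatibility with Citus streaming replication mode

In production, a single coordinator is very likely to become a performance bottleneck. In streaming replication mode, open-source Citus can sync the distributed metadata from the coordinator to the data nodes.

Enabling streaming replication

-- As shown, Citus is using the statement model, which does not support syncing metadata to the data nodes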
repl1=# show citus.replication_model;
 citus.replication_model
-------------------------
 statement

-- This setting applies to the whole node
alter system set citus.replication_model='streaming';
SELECT pg_reload_conf();

-- Sync metadata to a data node
SELECT * from start_metadata_sync_to_node('192.168.0.2', 15432);

-- Check the sync status
repl1=# select * from pg_dist_node;
nodeid | groupid |  nodename   | nodeport | noderack | hasmetadata | isactive | noderole | nodecluster | metadatasynced | shouldhaveshards
--------+---------+-------------+----------+----------+-------------+----------+----------+-------------+----------------+------------------
      1 |       1 | 192.168.0.1 |    15432 | default  | f           | t        | primary  | default     | f              | t
      3 |       3 | 192.168.0.3 |    15432 | default  | f           | t        | primary  | default     | f              | t
      2 |       2 | 192.168.0.2 |    15432 | default  | t           | t        | primary  | default     | t              | t
(3 rows)
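In the output above only 192.168.0.2 has hasmetadata and metadatasynced set to t. As a sketch, the same call could be repeated for the other data node and the result checked with a narrower query; whether 192.168.0.3 should also receive metadata is an assumption here and depends on how the cluster is meant to be used.

```sql
-- Sync metadata to the other data node (assumed to be wanted)
SELECT start_metadata_sync_to_node('192.168.0.3', 15432);

-- Show only the columns relevant to metadata sync
SELECT nodename, nodeport, hasmetadata, metadatasynced
FROM pg_dist_node
ORDER BY nodeid;
```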

Table creation test

repl1=# create table repl_model_streaming(a int primary key, b int , c int);
repl1=# select create_distributed_table('repl_model_streaming', 'a');
ERROR:  replication factors above one are incompatible with the streaming replication model
HINT:  Try again after reducing "citus.shard_replication_factor" to one or setting "citus.replication_model" to "statement".

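As the error shows, with citus.shard_replication_factor still above 1, Citus streaming replication mode cannot be used together with multi-replica mode.

Creating the table works when the replication factor is 1, but a factor of 1 is equivalent to having no replicas.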
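Following the HINT in the error message, a sketch of the variant that is expected to succeed looks like this. It assumes the same session as above, where the plain table repl_model_streaming already exists and citus.replication_model is still 'streaming'.

```sql
-- Reduce the replication factor to one for this session, as the HINT suggests
SET citus.shard_replication_factor = 1;

-- Retry distributing the table created earlier
SELECT create_distributed_table('repl_model_streaming', 'a');
```

Each shard then has a single placement, so availability has to come from PostgreSQL primary/standby replication on each node rather than from Citus shard replicas.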