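Master repmgr, a PostgreSQL High-Availability Solution, in Half an Hour

I. Deployment Scenario

OS: Debian 11; PostgreSQL version: 14.2; repmgr version: 5.3.1

Primary (node1): 10.211.55.9; standby 1 (node2): 10.211.55.4; standby 2 (node3): 10.211.55.6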

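II. Installing repmgr

1. On every node (primary and standbys)

apt-get install -y postgresql-14-repmgr   # PostgreSQL is version 14, so install the repmgr package built for it

apt-get install -y rsync

Append the following to /etc/hosts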

10.211.55.9 node1

10.211.55.4 node2

10.211.55.6 node3
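
Because this step is repeated on every node, it may be convenient to make it idempotent. The following is a minimal sketch, not part of the original procedure: the function name add_host_entries and the HOSTS_FILE override are assumptions introduced here so the logic can be tried on a copy before touching /etc/hosts.

```shell
#!/usr/bin/env bash
# Append the three "IP name" entries to a hosts file, skipping names already present.
# HOSTS_FILE is a hypothetical override; it defaults to /etc/hosts (root required).
add_host_entries() {
    local hosts_file="${HOSTS_FILE:-/etc/hosts}"
    local entry name
    for entry in "10.211.55.9 node1" "10.211.55.4 node2" "10.211.55.6 node3"; do
        name="${entry#* }"   # text after the first space, i.e. the host name
        # -w matches the name as a whole word, so node1 does not match node10
        if ! grep -qw -- "$name" "$hosts_file"; then
            printf '%s\n' "$entry" >> "$hosts_file"
        fi
    done
}
```

Running add_host_entries a second time leaves the file unchanged.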

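Run visudo and append the following (sudoers only accepts absolute command paths; /usr/bin/systemctl is the usual location on Debian 11)

postgres ALL = NOPASSWD: /usr/bin/systemctl stop postgresql, /usr/bin/systemctl start postgresql, /usr/bin/systemctl restart postgresql, /usr/bin/systemctl status postgresql, /usr/bin/systemctl reload postgresql

As root, set the password of the postgres user to postgres

As the postgres user, set up passwordless SSH. Every node must be able to log in to all three nodes without a password, including itself, so run the following on every node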

ssh-keygen -t rsa

ssh-copy-id postgres@10.211.55.9

ssh-copy-id postgres@10.211.55.4

ssh-copy-id postgres@10.211.55.6

ssh-copy-id postgres@node1

ssh-copy-id postgres@node2

ssh-copy-id postgres@node3
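
The six ssh-copy-id calls above can be wrapped in a loop. This is only a sketch under the assumptions of this article (user postgres, the three IPs and host names listed earlier); the function names list_ssh_targets and distribute_ssh_key are made up for illustration. ssh-copy-id still prompts for the postgres password on each target.

```shell
#!/usr/bin/env bash
# Print every login target used above: each node by IP address and by host name.
list_ssh_targets() {
    local host
    for host in 10.211.55.9 10.211.55.4 10.211.55.6 node1 node2 node3; do
        printf 'postgres@%s\n' "$host"
    done
}

# Create an RSA key pair if none exists yet, then copy the public key to every target.
distribute_ssh_key() {
    local key="$HOME/.ssh/id_rsa" target
    [ -f "$key" ] || ssh-keygen -t rsa -N '' -f "$key"
    for target in $(list_ssh_targets); do
        ssh-copy-id -i "$key.pub" "$target"
    done
}
```

Run distribute_ssh_key as the postgres user on each of the three nodes.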

2. On the primary only

Make sure /etc/postgresql/14/main/postgresql.conf contains the following

listen_addresses = '*'

max_wal_senders = 10

max_replication_slots = 10

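wal_level = replica    # hot_standby is a deprecated alias that PostgreSQL maps to replica

hot_standby = on

archive_mode = always    # always is what lets a standby archive; setting it on every node avoids reconfiguring after a failover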

archive_command = '/bin/true'

shared_preload_libraries = 'repmgr'
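
Before restarting PostgreSQL it can help to confirm that none of the settings above was left commented out. The sketch below is an illustration only: check_pg_settings is a hypothetical helper, it recognises only the "name = value" form, and it does not validate the values themselves.

```shell
#!/usr/bin/env bash
# Report which of the settings listed above are not active in a postgresql.conf.
# Prints one "missing: <name>" line per absent setting; returns 1 if any is absent.
check_pg_settings() {
    local conf="$1" name missing=0
    for name in listen_addresses max_wal_senders max_replication_slots \
                wal_level hot_standby archive_mode archive_command \
                shared_preload_libraries; do
        # An active setting starts the line (after optional blanks) and is followed by "="
        if ! grep -Eq "^[[:space:]]*${name}[[:space:]]*=" "$conf"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

Example: check_pg_settings /etc/postgresql/14/main/postgresql.conf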

Make sure /etc/postgresql/14/main/pg_hba.conf contains the following

local all postgres peer

local replication repmgr trust

host replication repmgr 127.0.0.1/32 trust

host replication repmgr 10.211.55.0/24 trust

local repmgr repmgr trust

host repmgr repmgr 127.0.0.1/32 trust

host repmgr repmgr 10.211.55.0/24 trust

Restart the PostgreSQL service

Make sure /etc/repmgr.conf contains the following

node_id=1

node_name='node1'

conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/var/lib/postgresql/14/main'

failover=automatic

promote_command='/usr/lib/postgresql/14/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'

follow_command='/usr/lib/postgresql/14/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'

service_start_command = 'sudo systemctl start postgresql'

service_stop_command = 'sudo systemctl stop postgresql'

service_restart_command = 'sudo systemctl restart postgresql'

service_reload_command = 'sudo systemctl reload postgresql'

repmgrd_pid_file='/tmp/repmgrd.pid'

log_file='/tmp/repmgrd.log'

priority=100

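Switch to the postgres user and create the repmgr user and database

createuser -s repmgr

createdb repmgr -O repmgr

Still as the postgres user, register the primary node with the cluster

repmgr -f /etc/repmgr.conf primary register

3. On the standbys only (node2 is used as the example; node3 is handled the same way)

The data directory, for example /var/lib/postgresql/14/main, must be empty (stop PostgreSQL first if it is running). If the directory already exists it can be kept, but the files inside it must be removed

Check that the primary can be reached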

psql 'host=node1 user=repmgr dbname=repmgr connect_timeout=2'

Make sure /etc/repmgr.conf contains the following

node_id=2

node_name='node2'

conninfo='host=node2 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/var/lib/postgresql/14/main'

failover=automatic

promote_command='/usr/lib/postgresql/14/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'

follow_command='/usr/lib/postgresql/14/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'

service_start_command = 'sudo systemctl start postgresql'

service_stop_command = 'sudo systemctl stop postgresql'

service_restart_command = 'sudo systemctl restart postgresql'

service_reload_command = 'sudo systemctl reload postgresql'

repmgrd_pid_file='/tmp/repmgrd.pid'

log_file='/tmp/repmgrd.log'

priority=100
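
The node1 and node2 files differ only in node_id, node_name and the host in conninfo, so the file can be produced from a template. The following sketch assumes exactly the settings shown above; render_repmgr_conf is a name invented for this example.

```shell
#!/usr/bin/env bash
# Print a repmgr.conf for the given node id and node name, using the settings from this article.
render_repmgr_conf() {
    local node_id="$1" node_name="$2"
    cat <<EOF
node_id=${node_id}
node_name='${node_name}'
conninfo='host=${node_name} user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/14/main'
failover=automatic
promote_command='/usr/lib/postgresql/14/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/lib/postgresql/14/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
service_start_command = 'sudo systemctl start postgresql'
service_stop_command = 'sudo systemctl stop postgresql'
service_restart_command = 'sudo systemctl restart postgresql'
service_reload_command = 'sudo systemctl reload postgresql'
repmgrd_pid_file='/tmp/repmgrd.pid'
log_file='/tmp/repmgrd.log'
priority=100
EOF
}
```

For node3, for example: render_repmgr_conf 3 node3 > /etc/repmgr.conf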

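Run the clone with --dry-run first to confirm that the standby is able to clone the primary's data

repmgr -h node1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --dry-run

If the check reports no problems, run the clone for real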

repmgr -h node1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone
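
The two commands can be chained so that the real clone only runs when the dry run succeeds. This is a sketch, not something repmgr provides: clone_standby and the REPMGR variable (which lets a different binary or a stub be substituted) are assumptions made for this example.

```shell
#!/usr/bin/env bash
REPMGR="${REPMGR:-repmgr}"   # hypothetical override of the repmgr executable

# Clone from the given upstream host, but only after a successful --dry-run.
clone_standby() {
    local upstream="$1"
    "$REPMGR" -h "$upstream" -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --dry-run || {
        echo "dry run failed, not cloning" >&2
        return 1
    }
    "$REPMGR" -h "$upstream" -U repmgr -d repmgr -f /etc/repmgr.conf standby clone
}
```

On node2 or node3 this would be called as: clone_standby node1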

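Copy postgresql.conf and pg_hba.conf from /etc/postgresql/14/main on the primary to the standby, then start PostgreSQL

systemctl start postgresql

Register the node with the cluster as a standby

repmgr -f /etc/repmgr.conf standby register

Check the cluster status; one standby should now have joined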

repmgr -f /etc/repmgr.conf cluster show

\c repmgr

SELECT * FROM repmgr.nodes;

-- View the related events

\c repmgr

SELECT * from repmgr.events;

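Repeat the same steps for node3

4. On all nodes: start repmgrd

repmgrd -f /etc/repmgr.conf

To stop it, simply run killall repmgrd

To reload the configuration file: kill -HUP `cat /tmp/repmgrd.pid`

To check repmgrd on the local node: repmgr node check --repmgrd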
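
A reload through the PID file fails confusingly when the file is missing or left over from a dead process. The sketch below guards against both cases; repmgrd_pid, repmgrd_reload and the PID_FILE override are names assumed for this example, with the default taken from repmgrd_pid_file above.

```shell
#!/usr/bin/env bash
PID_FILE="${PID_FILE:-/tmp/repmgrd.pid}"   # same path as repmgrd_pid_file; the override is hypothetical

# Print the PID of the running repmgrd, or fail if the PID file is missing or stale.
repmgrd_pid() {
    local pid
    [ -r "$PID_FILE" ] || { echo "no PID file at $PID_FILE" >&2; return 1; }
    pid="$(cat "$PID_FILE")"
    # kill -0 delivers no signal; it only tests that the process exists
    kill -0 "$pid" 2>/dev/null || { echo "stale PID $pid" >&2; return 1; }
    echo "$pid"
}

# Reload the configuration by sending SIGHUP to repmgrd.
repmgrd_reload() {
    local pid
    pid="$(repmgrd_pid)" || return 1
    kill -HUP "$pid"
}
```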

III. Administration

-- Checking cluster status

Node commands

repmgr -f /etc/repmgr.conf node status

repmgr -f /etc/repmgr.conf node check

Cluster commands

repmgr -f /etc/repmgr.conf cluster show

repmgr -f /etc/repmgr.conf cluster crosscheck

Service (repmgrd) commands

repmgr -f /etc/repmgr.conf service status

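The status is also stored in tables: connect to the repmgr database and query it

SELECT * FROM repmgr.nodes;

Pause repmgrd. This can be run on any node and is useful for routine maintenance, because it prevents a failover from being triggered when the primary is shut down on purpose

repmgr -f /etc/repmgr.conf service pause

To resume: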

repmgr -f /etc/repmgr.conf service unpause
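
To avoid forgetting the unpause after maintenance, the pair can be wrapped around the maintenance command. This is a sketch only; with_repmgrd_paused and the REPMGR override are hypothetical names, and the wrapper simply calls the two commands shown above.

```shell
#!/usr/bin/env bash
REPMGR="${REPMGR:-repmgr}"   # hypothetical override of the repmgr executable

# Pause repmgrd, run the given command, then unpause even if the command failed.
# Returns the exit status of the wrapped command.
with_repmgrd_paused() {
    "$REPMGR" -f /etc/repmgr.conf service pause || return 1
    local rc=0
    "$@" || rc=$?
    "$REPMGR" -f /etc/repmgr.conf service unpause || return 1
    return "$rc"
}
```

For example: with_repmgrd_paused sudo systemctl restart postgresql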

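-- Adding a standby node to the cluster

After the new node has been configured, run on that node: repmgr -f /etc/repmgr.conf standby register

-- Removing a standby node from the cluster

Run: repmgr standby unregister -f /etc/repmgr.conf --node-id=3

-- Removing a primary node from the cluster

This applies to a primary that is no longer the active one. After a new primary has taken over, run on one of the remaining nodes: repmgr -f /etc/repmgr.conf primary unregister --node-id=1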

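IV. Failure Handling

When the primary fails, the cluster elects a new primary. If the failed primary is then started again without having been repaired, it shows up with the status ! running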

postgres@node2:~$ repmgr -f /etc/repmgr.conf service status

ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen

----+------------+---------+-----------+------------+---------+-------+---------+--------------------

1 | node1 | primary | ! running | | running | 48488 | no | n/a

2 | node2 | primary | * running | | running | 18529 | no | n/a

Node 1 has to be removed. Run the following command on node 2 or node 3; if the unregistration fails, append --force to force it

repmgr primary unregister --node-id 1
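
The status output above can be scanned automatically for nodes whose status is flagged with "!". The sketch below assumes the pipe-separated layout shown in this section (name in the second column, status in the fourth); find_flagged_nodes is a name invented for this example.

```shell
#!/usr/bin/env bash
# Read "repmgr service status" output on stdin and print the name of every node
# whose Status column starts with "!".
find_flagged_nodes() {
    awk -F'|' '
        NF >= 4 {
            name = $2; status = $4
            gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (status ~ /^!/) print name
        }'
}
```

Usage: repmgr -f /etc/repmgr.conf service status | find_flagged_nodes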

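A failed standby, once repaired, can be added back to the cluster

V. Related Parameters

log_status_interval -- reloadable; how often repmgrd writes a status line to its log, which confirms that it is still working. Default is 300 seconds. The line looks like: [2018-07-12 00:47:32] [INFO] monitoring connection to upstream node "node1" (ID: 1)

monitor_interval_secs -- reloadable; interval between checks of the primary node's status. Default is 2 seconds

connection_check_type -- how availability of the primary is probed: 1. ping (default) 2. connection, which only opens a connection 3. query, which runs a SELECT

reconnect_attempts -- number of reconnection attempts when the primary cannot be reached. Default is 6

reconnect_interval -- interval between reconnection attempts when the primary cannot be reached. Default is 10 seconds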
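
The last two parameters together determine roughly how long repmgrd keeps retrying before it treats the primary as failed: reconnect_attempts multiplied by reconnect_interval, which is 6 x 10 = 60 seconds with the defaults. The helper below only does this arithmetic; it is an approximation that ignores connection timeouts and the monitoring interval, and reconnect_window_secs is a name made up for this example.

```shell
#!/usr/bin/env bash
# Approximate number of seconds spent retrying before the primary is considered failed.
# Arguments: reconnect_attempts (default 6) and reconnect_interval in seconds (default 10).
reconnect_window_secs() {
    local attempts="${1:-6}" interval="${2:-10}"
    echo $(( attempts * interval ))
}

reconnect_window_secs        # prints 60 (the defaults)
reconnect_window_secs 3 5    # prints 15
```

Lower values shorten the time to failover but make the cluster more sensitive to brief network interruptions.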

Last edited: 2022.03.11 00:00:02
