PostgreSQL-Patroni高可用启动报错

环境说明

一主两从:

角色 主机 组件
pa-pg-1 172.22.138.220 etcd、patroni、postgresql
pa-pg-2 172.22.138.219 etcd、patroni、postgresql
pa-pg-3 172.22.138.218 etcd、patroni、postgresql

安装部署好Patroni高可用后,主库pa-pg-1和从库pa-pg-3均启动正常,而pa-pg-2的patroni日志一直输出以下报错:

...
Aug 15 10:09:56 DB-test2 patroni: 2024-08-15 11:09:56,419 INFO: Lock owner: pa-pg-1; I am pa-pg-2
Aug 15 10:09:56 DB-test2 patroni: 2024-08-15 11:09:56,421 INFO: Local timeline=3 lsn=0/DB000110
Aug 15 10:09:56 DB-test2 patroni: 2024-08-15 11:09:56,422 ERROR: Exception when working with leader
Aug 15 10:09:56 DB-test2 patroni: Traceback (most recent call last):
Aug 15 10:09:56 DB-test2 patroni: File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/rewind.py", line 80, in check_leader_is_not_in_recovery
Aug 15 10:09:56 DB-test2 patroni: with get_connection_cursor(connect_timeout=3, options='-c statement_timeout=2000', **conn_kwargs) as cur:
Aug 15 10:09:56 DB-test2 patroni: File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
Aug 15 10:09:56 DB-test2 patroni: return next(self.gen)
Aug 15 10:09:56 DB-test2 patroni: File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/connection.py", line 157, in get_connection_cursor
Aug 15 10:09:56 DB-test2 patroni: conn = psycopg.connect(**kwargs)
Aug 15 10:09:56 DB-test2 patroni: File "/usr/local/lib/python3.6/site-packages/patroni/psycopg.py", line 104, in connect
Aug 15 10:09:56 DB-test2 patroni: ret = _connect(*args, **kwargs)
Aug 15 10:09:56 DB-test2 patroni: File "/usr/local/lib64/python3.6/site-packages/psycopg2/__init__.py", line 126, in connect
Aug 15 10:09:56 DB-test2 patroni: File "/usr/local/lib64/python3.6/site-packages/psycopg2/extensions.py", line 175, in make_dsn
Aug 15 10:09:56 DB-test2 patroni: psycopg2.ProgrammingError: invalid dsn: invalid connection option "target_session_attrs"
Aug 15 10:09:56 DB-test2 patroni: 2024-08-15 11:09:56,425 INFO: no action. I am (pa-pg-2), a secondary, and following a leader (pa-pg-1)
...

排查思路

1.首先一直关注的报错是"ERROR: Exception when working with leader"所以按照这个报错,首先去排查了patroni各节点的状态,但是通过节点状态并没有发现什么明显异常。

postgres # patronictl -c /etc/patroni.yml list
+ Cluster: pgsql (7400977424401590005) ----------+----+-----------+-----------------+-----------------------------+
| Member  | Host           | Role    | State     | TL | Lag in MB | Pending restart | Pending restart reason      |
+---------+----------------+---------+-----------+----+-----------+-----------------+-----------------------------+
| pa-pg-1 | 172.22.138.220 | Leader  | running   |  3 |           |                 |                             |
| pa-pg-2 | 172.22.138.219 | Replica | streaming |  3 |         0 |                 |                             |
| pa-pg-3 | 172.22.138.218 | Replica | streaming |  3 |         0 | *               | max_connections: 10000->100 |
+---------+----------------+---------+-----------+----+-----------+-----------------+-----------------------------+

2.排查主库的pg_stat_replication和从库的pg_wal_stat_reciver,检查后未见明显异常

postgres=# select * from pg_stat_replication;
+-------+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------+
|  pid  | usesysid | usename | application_name |  client_addr   | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time           |
+-------+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------+
| 12368 |    16384 | replica | pa-pg-2          | 172.22.138.219 |                 |       47170 | 2024-08-15 10:23:36.860733+08 |              | streaming | 0/DC000000 | 0/DC000000 | 0/DC000000 | 0/DC000000 |           |           |            |             0 | async      | 2024-08-15 11:10:18.855529+08 |
| 11101 |    16384 | replica | pa-pg-3          | 172.22.138.218 |                 |       55800 | 2024-08-15 09:57:31.073827+08 |              | streaming | 0/DC000000 | 0/DC000000 | 0/DC000000 | 0/DC000000 |           |           |            |             0 | async      | 2024-08-15 11:10:19.992588+08 |
+-------+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------+
...
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid                   | 18720
status                | streaming
receive_start_lsn     | 0/DB000000
receive_start_tli     | 3
written_lsn           | 0/DC000000
flushed_lsn           | 0/DC000000
received_tli          | 3
last_msg_send_time    | 2024-08-15 11:11:08.623733+08
last_msg_receipt_time | 2024-08-15 11:11:08.623328+08
latest_end_lsn        | 0/DC000000
latest_end_time       | 2024-08-15 11:09:52.14472+08
slot_name             | pa_pg_2
sender_host           | 172.22.138.220
sender_port           | 5432
conninfo              | user=replica passfile=/home/postgres/pgpass channel_binding=prefer dbname=replication host=172.22.138.220 port=5432 application_name=pa-pg-2 fallback_application_name=pgsql sslmode=prefer sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres target_session_attrs=any

3.查看postgresql的server日志,更是一点信息都没有。

2024-08-15 11:07:18 CST--- :LOG:  redo starts at 0/DA0025B8
2024-08-15 11:07:18 CST--- :LOG:  consistent recovery state reached at 0/DB000110
2024-08-15 11:07:18 CST--- :LOG:  invalid record length at 0/DB000110: wanted 24, got 0
2024-08-15 11:07:18 CST--- :LOG:  database system is ready to accept read-only connections
2024-08-15 11:07:18 CST--- :LOG:  started streaming WAL from primary at 0/DB000000 on timeline 3
2024-08-15 11:10:06 CST--- :LOG:  received fast shutdown request
2024-08-15 11:10:06 CST--- :LOG:  aborting any active transactions
2024-08-15 11:10:06 CST-postgres-127.0.0.1-Patroni heartbeat :FATAL:  terminating connection due to administrator command
2024-08-15 11:10:06 CST--- :FATAL:  terminating walreceiver process due to administrator command
2024-08-15 11:10:06 CST--- :LOG:  shutting down
2024-08-15 11:10:06 CST--- :LOG:  database system is shut down
2024-08-15 11:10:10 CST--- :LOG:  starting PostgreSQL 14.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2024-08-15 11:10:10 CST--- :LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-08-15 11:10:10 CST--- :LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-08-15 11:10:10 CST--- :LOG:  database system was shut down in recovery at 2024-08-15 11:10:06 CST
2024-08-15 11:10:10 CST--- :LOG:  entering standby mode
2024-08-15 11:10:10 CST--- :LOG:  redo starts at 0/DA0025B8
2024-08-15 11:10:10 CST--- :LOG:  consistent recovery state reached at 0/DB000110
2024-08-15 11:10:10 CST--- :LOG:  invalid record length at 0/DB000110: wanted 24, got 0
2024-08-15 11:10:10 CST--- :LOG:  database system is ready to accept read-only connections
2024-08-15 11:10:10 CST--- :LOG:  started streaming WAL from primary at 0/DB000000 on timeline 3

4.随后寻求外援,通过官方github的issues,有类似报错,建议为复制用户replica添加postgres库的connect权限,司马当活马医了。去添加了相关的权限。

https://github.com/patroni/patroni/issues/2256

postgres=# grant CONNECT on database postgres to replica;
GRANT

5.重启pa-pg-2服务,查看patroni的日志发现,报错是一点没变。此时把报错的重点放在了这条上面"psycopg2.ProgrammingError: invalid dsn: invalid connection option "target_session_attrs""

Aug 15 11:09:56 DB-test2 patroni: 2024-08-15 11:09:56,422 ERROR: Exception when working with leader
Aug 15 11:09:56 DB-test2 patroni: Traceback (most recent call last):
Aug 15 11:09:56 DB-test2 patroni: File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/rewind.py", line 80, in check_leader_is_not_in_recovery
Aug 15 11:09:56 DB-test2 patroni: with get_connection_cursor(connect_timeout=3, options='-c statement_timeout=2000', **conn_kwargs) as cur:
Aug 15 11:09:56 DB-test2 patroni: File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
Aug 15 11:09:56 DB-test2 patroni: return next(self.gen)
Aug 15 11:09:56 DB-test2 patroni: File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/connection.py", line 157, in get_connection_cursor
Aug 15 11:09:56 DB-test2 patroni: conn = psycopg.connect(**kwargs)
Aug 15 11:09:56 DB-test2 patroni: File "/usr/local/lib/python3.6/site-packages/patroni/psycopg.py", line 104, in connect
Aug 15 11:09:56 DB-test2 patroni: ret = _connect(*args, **kwargs)
Aug 15 11:09:56 DB-test2 patroni: File "/usr/local/lib64/python3.6/site-packages/psycopg2/__init__.py", line 126, in connect
Aug 15 11:09:56 DB-test2 patroni: File "/usr/local/lib64/python3.6/site-packages/psycopg2/extensions.py", line 175, in make_dsn
Aug 15 11:09:56 DB-test2 patroni: psycopg2.ProgrammingError: invalid dsn: invalid connection option "target_session_attrs"
Aug 15 11:09:56 DB-test2 patroni: 2024-08-15 11:09:56,425 INFO: no action. I am (pa-pg-2), a secondary, and following a leader (pa-pg-1)

6.通过官方github的issues,建议将psycopg2模块的版本修改为psycopg2==2.8.6,但是我的机服务器上,没有这个模块,正常启动的两台也没有这个模块。但是当我检查这个相关模块时发现了问题。

https://github.com/patroni/patroni/issues/1969

#pa-pg-1和pa-pg-3
root # pip3 list|grep psycopg2
psycopg2-binary    2.9.8

#pa-pg-2
psycopg2-binary    2.9.5

7.通过上方发现patroni的所需要的模块psycopg2-binary,三台节点唯有启动报错的pa-pg-2版本与其他两台不一致。所以决定在pa-pg-2节点重装这个模块。重装后pa-pg-2的patroni日志不再输出报错。

root # pip3 uninstall psycopg2-binary
root # pip3 install psycopg2-binary==2.9.8
...
Aug 15 11:23:14 DB-test2 systemd: Started patroni - a high-availability PostgreSQL.
Aug 15 11:23:14 DB-test2 systemd-logind: New session 886 of user root.
Aug 15 11:23:14 DB-test2 systemd: Started Session 886 of user root.
Aug 15 11:23:14 DB-test2 systemd-logind: Removed session 886.
Aug 15 11:23:15 DB-test2 patroni: postgresql parameter listen_addresses=0.0.0.0 failed validation, defaulting to None
Aug 15 11:23:15 DB-test2 patroni: postgresql parameter port=5432 failed validation, defaulting to None
Aug 15 11:23:15 DB-test2 patroni: 2024-08-15 11:23:15,225 INFO: No PostgreSQL configuration items changed, nothing to reload.
Aug 15 11:23:15 DB-test2 patroni: localhost:5432 - accepting connections
Aug 15 11:23:15 DB-test2 patroni: 2024-08-15 11:23:15,233 INFO: establishing a new patroni heartbeat connection to postgres
Aug 15 11:23:15 DB-test2 patroni: 2024-08-15 11:23:15,254 INFO: no action. I am (pa-pg-2), a secondary, and following a leader (pa-pg-1)
Aug 15 11:23:16 DB-test2 systemd-logind: New session 887 of user root.
Aug 15 11:23:16 DB-test2 systemd: Started Session 887 of user root.
Aug 15 11:23:16 DB-test2 systemd-logind: Removed session 887.
Aug 15 11:23:16 DB-test2 patroni: 2024-08-15 11:23:16,423 INFO: no action. I am (pa-pg-2), a secondary, and following a leader (pa-pg-1)

总结

在进行集群相关的应用部署时,只是检查好相关模块安装成功与否还不够,还需要更细致的检查相关的版本号是否一致。

©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 204,921评论 6 478
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 87,635评论 2 381
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 151,393评论 0 338
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 54,836评论 1 277
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 63,833评论 5 368
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 48,685评论 1 281
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 38,043评论 3 399
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 36,694评论 0 258
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 42,671评论 1 300
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 35,670评论 2 321
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 37,779评论 1 332
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 33,424评论 4 321
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 39,027评论 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 29,984评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 31,214评论 1 260
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 45,108评论 2 351
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 42,517评论 2 343

推荐阅读更多精彩内容