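Overview

A high-availability primary/replica replication setup for PostgreSQL 12, built on Patroni and etcd.
etcd and pg+patroni run in Docker; keepalived runs directly on the virtual machines.
I packaged pg and Patroni into a single image. The pg version is 12.
The required scripts and configuration files are all at https://e.coding.net/luangeng/icmp/hapg.git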

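Deployment plan:

Three virtual machines are used for the highly available pg cluster, with IPs ip1, ip2, and ip3.

ip1: etcd1, keepalived master, pg+patroni1
ip2: etcd2, keepalived backup, pg+patroni2
ip3: etcd3
Virtual IP: the address clients connect to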
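
The plan places a third etcd member on ip3 even though no database runs there. A likely reason is quorum: etcd needs a majority of its members to agree, so two members tolerate no failure while three tolerate one. The arithmetic can be sketched as follows; the function names are illustrative only and are not part of the repository.

```shell
#!/bin/sh
# Majority (quorum) size for an etcd cluster of n members.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of member failures the cluster can survive while keeping quorum.
fault_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Compare a two-member and a three-member cluster.
for n in 2 3; do
    echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

With three members the cluster keeps working when any one machine is down, which is what lets Patroni keep electing a leader during a single-node outage.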

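Installation steps:

Install Docker (on all three machines)
Start the etcd instances (etcd1 to etcd3, one per machine)
Build the pg+patroni image (on ip1 and ip2)
Start the pg+patroni containers (on ip1 and ip2)
Install keepalived (on ip1 and ip2)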

Detailed steps

1. Install docker-ce

Download the Docker package from https://download.docker.com/linux/static/stable/x86_64/
Create the docker.service file:

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd
ExecReload=/bin/kill -s HUP $MAINPID
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
#TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
# restart the docker process if it exits prematurely
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s
 
[Install]
WantedBy=multi-user.target

Run the following script to install:

tar -zxvf docker-18.06.3-ce.tgz 
cp docker/* /usr/bin/

cp docker.service /etc/systemd/system/docker.service

chmod 644 /etc/systemd/system/docker.service
systemctl daemon-reload
systemctl start docker
systemctl enable docker.service  

docker version
docker ps
docker images

2. Start the etcd cluster

Import the etcd image offline (or let Docker pull it automatically when the machine is online):

docker load  -i  etcd.image

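The etcd startup script is below. Change the IPs, then run it once on each of the three machines with NODE set to 1, 2, and 3 respectively.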

ETCD_VERSION=v3.4.14
TOKEN=my-etcd-token
CLUSTER_STATE=new
NAME_1=etcd-node-0
NAME_2=etcd-node-1
NAME_3=etcd-node-2
HOST_1=172.21.53.169
HOST_2=172.21.53.164
HOST_3=172.21.53.170
CLUSTER=${NAME_1}=http://${HOST_1}:2380,${NAME_2}=http://${HOST_2}:2380,${NAME_3}=http://${HOST_3}:2380

NODE=1  # set to the number of the node being installed (1, 2, or 3)

# Select this node's name and IP from NODE.
case "$NODE" in
    1) THIS_NAME=${NAME_1}; THIS_IP=${HOST_1} ;;
    2) THIS_NAME=${NAME_2}; THIS_IP=${HOST_2} ;;
    3) THIS_NAME=${NAME_3}; THIS_IP=${HOST_3} ;;
    *) echo "NODE must be 1, 2, or 3" >&2; exit 1 ;;
esac

sudo docker run --net=host -d --restart always --name etcd quay.io/coreos/etcd:${ETCD_VERSION} \
    /usr/local/bin/etcd \
    --data-dir=data.etcd --name ${THIS_NAME} \
    --initial-advertise-peer-urls http://${THIS_IP}:2380 --listen-peer-urls http://${THIS_IP}:2380 \
    --advertise-client-urls http://${THIS_IP}:2379 --listen-client-urls http://${THIS_IP}:2379 \
    --initial-cluster ${CLUSTER} \
    --initial-cluster-state ${CLUSTER_STATE} --initial-cluster-token ${TOKEN} \
    --enable-v2=true
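
Editing NODE by hand on every machine is easy to get wrong. One possible alternative, sketched here as an assumption rather than something the repository provides, is to derive NODE from the machine's own IP by looking it up in the host list. The helper name node_index is hypothetical, and the example passes a fixed address instead of detecting the local one.

```shell
#!/bin/sh
# Print the 1-based position of $1 among the remaining arguments.
# Returns non-zero if the address is not in the list.
node_index() {
    local_ip=$1; shift
    i=1
    for host in "$@"; do
        if [ "$host" = "$local_ip" ]; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

HOST_1=172.21.53.169
HOST_2=172.21.53.164
HOST_3=172.21.53.170

# Example with a fixed address; on a real node, pass that machine's own IP.
NODE=$(node_index 172.21.53.164 "$HOST_1" "$HOST_2" "$HOST_3")
echo "NODE=$NODE"
```

For the address 172.21.53.164 this selects the second entry, so the script would start etcd-node-1.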

Check that the etcd cluster started successfully:

docker exec etcd etcdctl --endpoints=ip1:2379,ip2:2379,ip3:2379 endpoint status --write-out=table

3. Build the pg+patroni image

Download the Dockerfile and the other configuration files.
Build the hapg image; the first build is slow.

docker build -t hapg:v1 .

Check that the image was built:

docker images

4. Start the pg+patroni container

Run the startup script:

docker run -d --name pg \
--privileged=true \
--net=host \
--restart always \
-e MY_IP=ip1 -e IP_2=ip2 -e ETCDS=ip1:2379,ip2:2379,ip3:2379 \
-v /var/lib/postgresql/data:/var/lib/postgresql/data \
hapg:v1

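Run docker logs -f pg to check for startup errors.
Open http://ip1:8008/cluster to check that the pg primary and replica are healthy. A healthy cluster returns output like this: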

{
  "members": [
    {
      "name": "192.168.56.113",
      "role": "replica",
      "state": "running",
      "api_url": "http://192.168.56.113:8008/patroni",
      "host": "192.168.56.113",
      "port": 5432,
      "timeline": 20,
      "lag": 0
    },
    {
      "name": "192.168.56.114",
      "role": "leader",
      "state": "running",
      "api_url": "http://192.168.56.114:8008/patroni",
      "host": "192.168.56.114",
      "port": 5432,
      "timeline": 20
    }
  ]
}
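
Besides the /cluster view, Patroni's REST API answers health-check style requests: GET / returns 200 only on the leader, and GET /replica returns 200 only on a running replica. The sketch below maps those two status codes to a role name. The helper names role_from_codes and patroni_role are hypothetical, and the demonstration at the end runs offline on sample codes instead of contacting a node.

```shell
#!/bin/sh
# Map the HTTP status codes of GET / and GET /replica to a role name.
role_from_codes() {
    if [ "$1" = 200 ]; then
        echo leader
    elif [ "$2" = 200 ]; then
        echo replica
    else
        echo unknown
    fi
}

# Query one node's Patroni REST API (port 8008, as in the URL above).
patroni_role() {
    leader_code=$(curl -s -m 5 -o /dev/null -w '%{http_code}' "http://$1:8008/")
    replica_code=$(curl -s -m 5 -o /dev/null -w '%{http_code}' "http://$1:8008/replica")
    role_from_codes "$leader_code" "$replica_code"
}

# Offline demonstration of the mapping.
role_from_codes 200 503   # prints: leader
role_from_codes 503 200   # prints: replica
role_from_codes 000 000   # prints: unknown (curl reports 000 when it cannot connect)
```

On a live cluster, patroni_role ip1 and patroni_role ip2 should report one leader and one replica.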

Verify failover by stopping pg on the master node:

docker stop pg
or
docker kill pg
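
Failover is not instantaneous, so checking once right after stopping the container can be misleading. A minimal polling sketch follows, assuming the surviving node's Patroni REST API answers GET / with 200 once it has been promoted. The helper names wait_until and is_leader are hypothetical, and the attempt count and delay are arbitrary illustration values.

```shell
#!/bin/sh
# Poll a probe command until it succeeds or the attempts run out.
# $1 = max attempts, $2 = seconds between attempts, rest = probe command.
wait_until() {
    attempts=$1; delay=$2; shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            echo "succeeded after $n attempt(s)"
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    echo "gave up after $attempts attempt(s)"
    return 1
}

# Probe: succeeds only if the node's Patroni REST API answers GET / with 200.
is_leader() {
    [ "$(curl -s -m 5 -o /dev/null -w '%{http_code}' "http://$1:8008/")" = 200 ]
}

# On a live cluster, after stopping pg on ip1: wait_until 30 2 is_leader ip2
# Offline demonstration with a probe that always succeeds:
wait_until 3 0 true
```

Keeping the probe as a separate command makes the same loop reusable for other checks, such as waiting for the stopped node to rejoin as a replica.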

5. Install and start Keepalived

Install it with apt install keepalived or by any other method.
Configuration file for the master keepalived:

# Global configuration
global_defs {
   router_id LVS_DEVEL
   vrrp_skip_check_adv_addr

   #vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}

# Script that checks whether the local pg is the primary
vrrp_script chk_pg {
    script "sh /etc/keepalived/check.sh"
    interval 2
    weight -20
}
# VRRP instance that manages the virtual IP
vrrp_instance VI_1 {
    state MASTER
    # Network interface to bind to
    interface enp0s8
    # Virtual router id; must be the same on both machines
    virtual_router_id 151
    # Priority; the MASTER value must be higher than the BACKUP value
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    # Virtual IP address; must be the same on both keepalived instances
    virtual_ipaddress {
        192.168.56.115
    }
    # Check script; the name of the vrrp_script block above
    track_script {
        chk_pg
    }
}

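Configuration file for the backup keepalived:

# Global configuration
global_defs {
   router_id LVS_DEVEL
   vrrp_skip_check_adv_addr

   #vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}

# Script that checks whether the local pg is the primary
vrrp_script chk_pg {
    script "sh /etc/keepalived/check.sh"
    interval 2
    weight -20
}
# VRRP instance that manages the virtual IP
vrrp_instance VI_1 {
    state BACKUP
    # Network interface to bind to; adjust to match the output of ip addr
    interface enp0s8
    # Virtual router id; must be the same on both machines
    virtual_router_id 151
    # Priority; the MASTER value must be higher than the BACKUP value
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    # Virtual IP address; must be the same on both keepalived instances
    virtual_ipaddress {
        192.168.56.115
    }
    # Check script; the name of the vrrp_script block above
    track_script {
        chk_pg
    }
}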

The check script /etc/keepalived/check.sh is as follows:

code=$(curl -s -m 10 -o /dev/null -w '%{http_code}' http://localhost:8008/)
if [ "$code" = 200 ] ; then
    exit 0
else
    exit 1
fi

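Restart keepalived with systemctl restart keepalived.
Check for startup errors with journalctl -f -u keepalived.
Check that the virtual IP has been created with ip addr.
Stop pg1 and check that the virtual IP moves to the other node.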
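
Why the virtual IP follows the pg leader comes down to priority arithmetic: a failing check script adds the negative weight (-20) to the node's base priority, and the node with the higher resulting priority holds the address. The sketch below works through the three cases using the values from the two configurations above. It is a simplified model that ignores VRRP timing and preemption settings, and the function names are illustrative only.

```shell
#!/bin/sh
# Effective VRRP priority: the base priority, plus the (negative) weight when the check fails.
# $1 = base priority, $2 = weight, $3 = check exit status (0 = local pg is the leader).
effective_priority() {
    if [ "$3" -eq 0 ]; then
        echo "$1"
    else
        echo $(( $1 + $2 ))
    fi
}

# Which node holds the virtual IP, given each node's check exit status.
# $1 = check status on ip1 (priority 100), $2 = check status on ip2 (priority 90).
vip_holder() {
    p1=$(effective_priority 100 -20 "$1")
    p2=$(effective_priority 90 -20 "$2")
    if [ "$p1" -ge "$p2" ]; then
        echo "ip1 ($p1 vs $p2)"
    else
        echo "ip2 ($p1 vs $p2)"
    fi
}

vip_holder 0 1   # pg leader on ip1
vip_holder 1 0   # pg leader on ip2 after a failover
vip_holder 1 1   # no leader on either node
```

This prints ip1 (100 vs 70), ip2 (80 vs 90), and ip1 (80 vs 70). The weight has to exceed the gap between the two base priorities (10 here), otherwise a failed check on ip1 would not hand the address to ip2.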

6. Enable HAProxy (optional)

HAProxy is already built into the hapg image. Its configuration file is as follows:

global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen lg
    bind *:5000
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server postgresql_${MY_IP} ${MY_IP}:5432 maxconn 100 check port 8008
    server postgresql_${IP_2} ${IP_2}:5432 maxconn 100 check port 8008
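
The two server lines contain ${MY_IP} and ${IP_2} placeholders, which presumably get filled in from the container's environment variables at startup. How the image actually does this is not shown here, so the following is only one possible approach using sed; the function name render_haproxy_servers and the sample addresses are assumptions for illustration.

```shell
#!/bin/sh
# Replace the ${MY_IP} and ${IP_2} placeholders read from stdin with the given addresses.
# $1 = value for MY_IP, $2 = value for IP_2.
render_haproxy_servers() {
    sed -e 's/\${MY_IP}/'"$1"'/g' -e 's/\${IP_2}/'"$2"'/g'
}

# The quoted heredoc keeps the placeholders literal so that sed, not the shell, expands them.
render_haproxy_servers 192.168.56.113 192.168.56.114 <<'EOF'
    server postgresql_${MY_IP} ${MY_IP}:5432 maxconn 100 check port 8008
    server postgresql_${IP_2} ${IP_2}:5432 maxconn 100 check port 8008
EOF
```

Because the health check targets port 8008, where Patroni answers 200 only on the leader, HAProxy routes connections on port 5000 to whichever of the two rendered servers is currently the primary.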

7. patronictl operations

./patronictl -c postgres.yml list cluster_name  # show the cluster
./patronictl -c postgres.yml remove cluster_name  # remove the cluster information from etcd
