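Overview
A high-availability PostgreSQL 12 primary/standby replication setup built on Patroni and etcd.
etcd and pg+patroni run in Docker; keepalived is installed directly on the virtual machines.
PostgreSQL and Patroni are packaged into a single image. The PostgreSQL version is 12.
The required scripts and configuration files are at https://e.coding.net/luangeng/icmp/hapg.git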
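Deployment plan:
The HA PostgreSQL cluster is installed on three virtual machines, referred to below by their IPs ip1, ip2 and ip3.
ip1: etcd1, keepalived master, pg+patroni1
ip2: etcd2, keepalived backup, pg+patroni2
ip3: etcd3
Virtual IP: the address clients connect to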
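Installation steps:
- Install Docker (on all three machines)
- Start the etcd instances (etcd1 to etcd3, one per machine)
- Build the pg+patroni image (on ip1 and ip2)
- Start the pg+patroni container (on ip1 and ip2)
- Install keepalived (on ip1 and ip2)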
Detailed steps
1. Install docker-ce
- Download the Docker package from https://download.docker.com/linux/static/stable/x86_64/
- Create the docker.service file:
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd
ExecReload=/bin/kill -s HUP $MAINPID
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
#TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
# restart the docker process if it exits prematurely
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s
[Install]
WantedBy=multi-user.target
- Run the following script to install:
tar -zxvf docker-18.06.3-ce.tgz
cp docker/* /usr/bin/
cp docker.service /etc/systemd/system/docker.service
chmod 644 /etc/systemd/system/docker.service
systemctl daemon-reload
systemctl start docker
systemctl enable docker.service
docker version
docker ps
docker images
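2. Start the etcd cluster
- Load the etcd image offline (or let Docker pull it automatically when online)
docker load -i etcd.image
- The etcd startup script is below. Change the IPs, then set NODE to 1, 2 or 3 to deploy each of the three nodes in turn.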
ETCD_VERSION=v3.4.14
TOKEN=my-etcd-token
CLUSTER_STATE=new
NAME_1=etcd-node-0
NAME_2=etcd-node-1
NAME_3=etcd-node-2
HOST_1=172.21.53.169
HOST_2=172.21.53.164
HOST_3=172.21.53.170
CLUSTER=${NAME_1}=http://${HOST_1}:2380,${NAME_2}=http://${HOST_2}:2380,${NAME_3}=http://${HOST_3}:2380
NODE=1 # set to the number of the node being installed (1, 2 or 3)
# Select this node's name and IP from NODE
case "$NODE" in
    1) THIS_NAME=${NAME_1}; THIS_IP=${HOST_1} ;;
    2) THIS_NAME=${NAME_2}; THIS_IP=${HOST_2} ;;
    3) THIS_NAME=${NAME_3}; THIS_IP=${HOST_3} ;;
    *) echo "NODE must be 1, 2 or 3" >&2; exit 1 ;;
esac
# The docker run command is identical on all three nodes apart from THIS_NAME and THIS_IP
sudo docker run --net=host -d --restart always --name etcd quay.io/coreos/etcd:${ETCD_VERSION} \
    /usr/local/bin/etcd \
    --data-dir=data.etcd --name ${THIS_NAME} \
    --initial-advertise-peer-urls http://${THIS_IP}:2380 --listen-peer-urls http://${THIS_IP}:2380 \
    --advertise-client-urls http://${THIS_IP}:2379 --listen-client-urls http://${THIS_IP}:2379 \
    --initial-cluster ${CLUSTER} \
    --initial-cluster-state ${CLUSTER_STATE} --initial-cluster-token ${TOKEN} \
    --enable-v2=true
- Check that the etcd cluster started successfully
docker exec etcd etcdctl --endpoints=ip1:2379,ip2:2379,ip3:2379 endpoint status --write-out=table
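A three-member etcd cluster stays writable only while a majority of its members (two of three) are healthy. The sketch below turns that rule into a small check. It assumes that `etcdctl endpoint health` prints one line per endpoint containing either `is healthy` or `is unhealthy`; the helper names `count_healthy` and `has_quorum` are made up for this example.

```shell
#!/bin/sh
# count_healthy: read `etcdctl endpoint health` output on stdin and print
# the number of lines that report a healthy endpoint.
# (Assumed line format: "<endpoint> is healthy: ..." / "<endpoint> is unhealthy: ...")
count_healthy() {
    # grep -c exits non-zero when it counts 0 matches; the count is still printed
    grep -c 'is healthy' || true
}

# has_quorum HEALTHY TOTAL: print "yes" when HEALTHY members form a strict
# majority of TOTAL members, otherwise "no".
has_quorum() {
    healthy=$1
    total=$2
    needed=$(( total / 2 + 1 ))
    if [ "$healthy" -ge "$needed" ]; then
        echo yes
    else
        echo no
    fi
}

# Example use against the cluster from this section (needs the running etcd container):
#   n=$(docker exec etcd etcdctl --endpoints=ip1:2379,ip2:2379,ip3:2379 endpoint health 2>&1 | count_healthy)
#   has_quorum "$n" 3
```

With three members the cluster tolerates the loss of one node, which is why ip3 runs etcd even though it hosts no PostgreSQL instance.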
3. Build the pg+patroni image
- Download the Dockerfile and the other configuration files
- Build the hapg image (the first build is slow)
docker build -t hapg:v1 .
- Check that the image was built
docker images
4. Start the pg+patroni container
- Run the startup script
docker run -d --name pg \
--privileged=true \
--net=host \
--restart always \
-e MY_IP=ip1 -e IP_2=ip2 -e ETCDS=ip1:2379,ip2:2379,ip3:2379 \
-v /var/lib/postgresql/data:/var/lib/postgresql/data \
hapg:v1
- Run docker logs -f pg to check for startup errors
- Open http://ip1:8008/cluster to check the state of the primary and the replica. A healthy cluster returns something like:
{
  "members": [
    {
      "name": "192.168.56.113",
      "role": "replica",
      "state": "running",
      "api_url": "http://192.168.56.113:8008/patroni",
      "host": "192.168.56.113",
      "port": 5432,
      "timeline": 20,
      "lag": 0
    },
    {
      "name": "192.168.56.114",
      "role": "leader",
      "state": "running",
      "api_url": "http://192.168.56.114:8008/patroni",
      "host": "192.168.56.114",
      "port": 5432,
      "timeline": 20
    }
  ]
}
- Stop the pg container on the leader node to verify that failover works:
docker stop pg
or
docker kill pg
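To watch the failover from the command line, each node's role can be read from the Patroni REST API on port 8008. The sketch below relies on two documented Patroni endpoints: `GET /` returns HTTP 200 only on the leader, and `GET /replica` returns 200 only on a running replica. The helper names (`http_code`, `role_from_codes`, `node_role`) are illustrative and not part of the hapg image.

```shell
#!/bin/sh
# http_code URL: print the HTTP status code for URL (curl prints 000 when
# the request itself fails, for example while the container is stopped).
http_code() {
    curl -s -m 5 -o /dev/null -w '%{http_code}' "$1" || true
}

# role_from_codes LEADER_CODE REPLICA_CODE: map the two status codes to a role name.
role_from_codes() {
    if [ "$1" = 200 ]; then
        echo leader
    elif [ "$2" = 200 ]; then
        echo replica
    else
        echo unknown
    fi
}

# node_role HOST: query the Patroni REST API on HOST and print its role.
node_role() {
    role_from_codes \
        "$(http_code "http://$1:8008/")" \
        "$(http_code "http://$1:8008/replica")"
}

# Example: run before and after stopping the leader's container
# (replace ip1 and ip2 with the real addresses):
#   for h in ip1 ip2; do echo "$h: $(node_role "$h")"; done
```

After the old leader is stopped, the surviving node should switch from replica to leader once the leader key in etcd expires.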
5. Install and start Keepalived
- Install it with apt install keepalived or by other means
- Configuration file for the master keepalived:
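# Global settings
global_defs {
    router_id LVS_DEVEL
    vrrp_skip_check_adv_addr
    #vrrp_strict
    vrrp_garp_interval 0
    vrrp_gna_interval 0
}
# Script that checks whether the local pg is the primary
vrrp_script chk_pg {
    script "sh /etc/keepalived/check.sh"
    interval 2
    weight -20
}
# VRRP instance that holds the virtual IP
vrrp_instance VI_1 {
    state MASTER
    # Network interface to bind to; adjust to match the output of ip addr
    interface enp0s8
    # Virtual router id; must be the same on both machines
    virtual_router_id 151
    # Priority; the MASTER value must be higher than the BACKUP value
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    # Virtual IP address; must be the same on both keepalived nodes
    virtual_ipaddress {
        192.168.56.115
    }
    # Check script; refers to the vrrp_script name above
    track_script {
        chk_pg
    }
}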
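- Configuration file for the backup keepalived:
# Global settings
global_defs {
    router_id LVS_DEVEL
    vrrp_skip_check_adv_addr
    #vrrp_strict
    vrrp_garp_interval 0
    vrrp_gna_interval 0
}
# Script that checks whether the local pg is the primary
vrrp_script chk_pg {
    script "sh /etc/keepalived/check.sh"
    interval 2
    weight -20
}
# VRRP instance that holds the virtual IP
vrrp_instance VI_1 {
    state BACKUP
    # Network interface to bind to; adjust to match the output of ip addr
    interface enp0s8
    # Virtual router id; must be the same on both machines
    virtual_router_id 151
    # Priority; the MASTER value must be higher than the BACKUP value
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    # Virtual IP address; must be the same on both keepalived nodes
    virtual_ipaddress {
        192.168.56.115
    }
    # Check script; refers to the vrrp_script name above
    track_script {
        chk_pg
    }
}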
- /etc/keepalived/check.sh is as follows:
#!/bin/sh
# Patroni answers GET / on port 8008 with HTTP 200 only on the leader node
code=$(curl -s -m 10 -o /dev/null -w '%{http_code}' http://localhost:8008/)
if [ "$code" = 200 ] ; then
    exit 0
else
    exit 1
fi
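- Run systemctl restart keepalived to restart keepalived
- Run journalctl -f -u keepalived to check for startup errors
- Run ip addr to check that the virtual IP has been created
- Stop pg1 and check that the virtual IP moves to the other node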
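The virtual IP follows the PostgreSQL leader because of how keepalived combines priority with the weight of a tracked script: a negative weight is added to the base priority while the script fails, and the node with the higher effective priority holds the address. The sketch below reproduces that arithmetic for the values used in this section (priorities 100 and 90, weight -20). The function name `effective_priority` is made up for this example.

```shell
#!/bin/sh
# effective_priority BASE WEIGHT CHECK_EXIT
# Print the VRRP priority keepalived uses for a node, given its base priority,
# the weight of its tracked script and the script's last exit status.
#   negative weight: added to BASE while the script fails
#   positive weight: added to BASE while the script succeeds
effective_priority() {
    base=$1
    weight=$2
    check_exit=$3
    if [ "$weight" -lt 0 ] && [ "$check_exit" -ne 0 ]; then
        echo $(( base + weight ))
    elif [ "$weight" -gt 0 ] && [ "$check_exit" -eq 0 ]; then
        echo $(( base + weight ))
    else
        echo "$base"
    fi
}

# Leader on ip1: check.sh succeeds on ip1 and fails on ip2
echo "ip1=$(effective_priority 100 -20 0) ip2=$(effective_priority 90 -20 1)"
# Leader moved to ip2: check.sh fails on ip1 and succeeds on ip2
echo "ip1=$(effective_priority 100 -20 1) ip2=$(effective_priority 90 -20 0)"
```

In the second case ip1 drops to 80, below ip2's 90, so ip2 takes over the virtual IP. The weight must exceed the gap between the two base priorities (here 20 > 10) for this to work.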
6. Enable HAProxy (optional)
HAProxy is already built into the hapg image. Its configuration file is as follows:
global
    maxconn 100
defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s
listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /
listen lg
    bind *:5000
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server postgresql_${MY_IP} ${MY_IP}:5432 maxconn 100 check port 8008
    server postgresql_${IP_2} ${IP_2}:5432 maxconn 100 check port 8008
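The `${MY_IP}` and `${IP_2}` placeholders in the two server lines match the environment variables passed to the container with `docker run -e`. How the hapg image substitutes them is not shown here, so the following is only a sketch of what the rendered lines look like for concrete addresses; `server_line` is a hypothetical helper, not something shipped in the image.

```shell
#!/bin/sh
# server_line IP: print one HAProxy server entry in the same shape as the
# config above. Traffic goes to PostgreSQL on 5432, while the health check
# goes to the Patroni REST API on 8008, which returns 200 only on the leader.
server_line() {
    printf 'server postgresql_%s %s:5432 maxconn 100 check port 8008\n' "$1" "$1"
}

# Example with the addresses from the sample /cluster output
server_line 192.168.56.113
server_line 192.168.56.114
```

Because only the leader passes the HTTP check, clients connecting to port 5000 always reach the current primary.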
7. patronictl operations
./patronictl -c postgres.yml list cluster_name # show the cluster members
./patronictl -c postgres.yml remove cluster_name # remove the cluster information from etcd
8. References