Reference: https://kubernetes.io/zh/docs/tutorials/stateful-application/zookeeper/#tolerating-node-failure
Background
Deploy a ZooKeeper cluster with the 移动云 container service.
Notes
When creating the StatefulSet, each Pod needs its own persistent storage. In this example each Pod is dynamically allocated a 10Gi disk, so a PersistentVolume must be provisioned for every Pod.
Because 移动云 storage uses the storageClass csi-ecloud-ebs-ceph-cache-ext4, take care when configuring volumeClaimTemplates.
The volumeClaimTemplates section of the StatefulSet spec looks like this:
volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.beta.kubernetes.io/storage-class: "csi-ecloud-ebs-ceph-cache-ext4"
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
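Before applying the manifests you can confirm the storage class exists on the cluster, and once the Pods come up, verify that a PVC named datadir-zk-<ordinal> was dynamically bound for each replica (a quick sanity check, assuming kubectl is pointed at the target cluster):
kubectl get storageclass csi-ecloud-ebs-ceph-cache-ext4
kubectl get pvc -n zookeeper    # expect datadir-zk-0, datadir-zk-1, datadir-zk-2 once the Pods exist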
Application Deployment
- Create the zk StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zk
  namespace: zookeeper
spec:
  selector:
    matchLabels:
      app: zk
  serviceName: zk-hs
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: OrderedReady
  template:
    metadata:
      labels:
        app: zk
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                      - zk
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: kubernetes-zookeeper
          imagePullPolicy: Always
          image: "leolee32/kubernetes-library:kubernetes-zookeeper1.0-3.4.10"
          resources:
            requests:
              memory: "1Gi"
              cpu: "0.5"
          ports:
            - containerPort: 2181
              name: client
            - containerPort: 2888
              name: server
            - containerPort: 3888
              name: leader-election
          command:
            - sh
            - -c
            - "start-zookeeper \
              --servers=3 \
              --data_dir=/var/lib/zookeeper/data \
              --data_log_dir=/var/lib/zookeeper/data/log \
              --conf_dir=/opt/zookeeper/conf \
              --client_port=2181 \
              --election_port=3888 \
              --server_port=2888 \
              --tick_time=2000 \
              --init_limit=10 \
              --sync_limit=5 \
              --heap=512M \
              --max_client_cnxns=60 \
              --snap_retain_count=3 \
              --purge_interval=12 \
              --max_session_timeout=40000 \
              --min_session_timeout=4000 \
              --log_level=INFO"
          readinessProbe:
            exec:
              command:
                - sh
                - -c
                - "zookeeper-ready 2181"
            initialDelaySeconds: 10
            timeoutSeconds: 5
          livenessProbe:
            exec:
              command:
                - sh
                - -c
                - "zookeeper-ready 2181"
            initialDelaySeconds: 10
            timeoutSeconds: 5
          volumeMounts:
            - name: datadir
              mountPath: /var/lib/zookeeper
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
  volumeClaimTemplates:
    - metadata:
        name: datadir
        annotations:
          volume.beta.kubernetes.io/storage-class: "csi-ecloud-ebs-ceph-cache-ext4"
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi
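If the zookeeper namespace does not exist yet, create it first, then apply the manifest (zk-sts.yaml is an illustrative filename for the StatefulSet above). Note that the zk-hs headless Service defined below must exist for the replicas to resolve each other, so create the Services before or right after the StatefulSet:
kubectl create namespace zookeeper
kubectl apply -f zk-sts.yaml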
- Watch the deployment status
kubectl get pods -w -l app=zk -n zookeeper
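Once all three Pods are Running and Ready, you can check which replica was elected leader. A minimal sketch using ZooKeeper's srvr four-letter command (nc is present in this image, since the readiness script relies on it):
for i in 0 1 2; do
  echo "zk-$i: $(kubectl exec zk-$i -n zookeeper -- sh -c 'echo srvr | nc 127.0.0.1 2181' | grep Mode)"
done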
- Create the PodDisruptionBudget
# cat zk-pdb.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
  namespace: zookeeper
spec:
  selector:
    matchLabels:
      app: zk
  maxUnavailable: 1
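With maxUnavailable: 1, at most one zk Pod may be voluntarily evicted at a time, so the three-member ensemble never loses quorum during node maintenance. A sketch of applying and verifying it (the node name is a placeholder):
kubectl apply -f zk-pdb.yaml
kubectl get pdb zk-pdb -n zookeeper    # ALLOWED DISRUPTIONS shows 1 while all replicas are healthy
kubectl drain <node-name> --ignore-daemonsets    # evictions beyond the budget are blocked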
- Create the Services
# cat zookeeper-hs-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: zk-hs
  namespace: zookeeper
  labels:
    app: zk
spec:
  ports:
    - port: 2888
      name: server
    - port: 3888
      name: leader-election
  clusterIP: None
  selector:
    app: zk
# cat zookeeper-cs-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: zk-cs
  namespace: zookeeper
  labels:
    app: zk
spec:
  ports:
    - port: 2181
      name: client
  selector:
    app: zk
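zk-hs is headless (clusterIP: None): it gives each replica a stable DNS name of the form zk-<ordinal>.zk-hs.zookeeper.svc.cluster.local that the servers use for quorum and leader election, while zk-cs load-balances client connections on port 2181. A quick write/read smoke test through two different members (assuming zkCli.sh is on the image's PATH, as in the upstream tutorial):
kubectl apply -f zookeeper-hs-svc.yaml -f zookeeper-cs-svc.yaml
kubectl exec zk-0 -n zookeeper -- zkCli.sh create /hello world
kubectl exec zk-1 -n zookeeper -- zkCli.sh get /hello    # the value replicates to every member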
Key StatefulSet Configuration Explained
- Pod anti-affinity, to ensure replicas are spread across hosts
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: "app"
              operator: In
              values:
                - zk
        topologyKey: "kubernetes.io/hostname"
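You can confirm the scheduler actually spread the replicas by printing the node each Pod landed on; with the required anti-affinity above, all three names should differ:
for i in 0 1 2; do
  kubectl get pod zk-$i -n zookeeper -o jsonpath='{.spec.nodeName}{"\n"}'
done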
- Run as a non-privileged user
Ensures the application process runs as the zookeeper user (UID 1000) rather than root.
securityContext:
  runAsUser: 1000
  fsGroup: 1000
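Because fsGroup is 1000, kubelet chowns the mounted volume to that group, so the non-root user (UID 1000, the zookeeper user in this image) can write its data directory. To verify the effective identity and the ownership of the data dir:
kubectl exec zk-0 -n zookeeper -- id
kubectl exec zk-0 -n zookeeper -- ls -ld /var/lib/zookeeper/data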
Managing the ZooKeeper Process
1. Update the CPU allocation
kubectl patch sts zk -n zookeeper --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value":"0.3"}]'
- Watch the rollout status
kubectl rollout status sts/zk -n zookeeper
- View the rollout history and previous configurations
kubectl rollout history sts/zk -n zookeeper
- Undo the change
kubectl rollout undo sts/zk -n zookeeper
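After the patch you can confirm the new request took effect on the recreated Pods (Kubernetes may normalize "0.3" to 300m):
kubectl get pod zk-0 -n zookeeper -o jsonpath='{.spec.containers[0].resources.requests.cpu}{"\n"}'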
Handling Process Failure
The restart policy controls how Kubernetes handles process failures for the entry point of a Pod's container. For Pods in a StatefulSet, Always is the only appropriate RestartPolicy, and it is the default. You should never override the default policy for a stateful application.
kubectl exec zk-0 -n zookeeper -- pkill java
kubectl get pod -w -l app=zk -n zookeeper
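Killing the JVM makes the container's entry-point process exit, so kubelet restarts the container in place under the Always policy; the Pod itself is not rescheduled. You can confirm by watching the restart count tick up:
kubectl get pod zk-0 -n zookeeper -o jsonpath='{.status.containerStatuses[0].restartCount}{"\n"}'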
Liveness Testing
livenessProbe:
  exec:
    command:
      - sh
      - -c
      - "zookeeper-ready 2181"
  initialDelaySeconds: 10
  timeoutSeconds: 5
This probe actually invokes a bash script that uses the ZooKeeper four-letter word ruok to test the server's health:
OK=$(echo ruok | nc 127.0.0.1 $1)
if [ "$OK" == "imok" ]; then
  exit 0
else
  exit 1
fi
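You can run the same four-letter check by hand against a member and expect imok back:
kubectl exec zk-0 -n zookeeper -- sh -c 'echo ruok | nc 127.0.0.1 2181'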
Readiness Testing
Readiness indicates that the application is able to serve traffic.
readinessProbe:
  exec:
    command:
      - sh
      - -c
      - "zookeeper-ready 2181"
  initialDelaySeconds: 10
  timeoutSeconds: 5
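Although liveness and readiness use the same script here, they act differently: a liveness failure restarts the container, while a readiness failure only removes the Pod from Service endpoints until the check passes again. You can observe that effect on the client Service:
kubectl get endpoints zk-cs -n zookeeper    # only Ready Pods appear in the endpoints list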