Preparation
Environment Planning
We plan to use Kubernetes to bring up and manage the entire Elasticsearch (ES) cluster.
Reference: https://github.com/pires/kubernetes-elasticsearch-cluster
Node Type | Count | Node MEM | JVM Heap |
---|---|---|---|
Master | 3 | 16GB | 512MB |
Client | 2 | 16GB | 6GB |
Data | 3 | 16GB | 8GB |
Master Nodes
The master nodes get no special preparation; they share Minions with the business workloads (to save resources).
Client Nodes
In principle, dedicated Minion nodes are also recommended for the client role; do that if resources allow.
We again take the resource-saving route and co-locate them with the business workloads.
Kubernetes can occasionally make rather poor scheduling decisions for these Pods, so when bringing them up it is worth checking the state of the nodes they land on, for example with the commands sketched below.
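A quick way to do this, once the pods exist, is to list them together with their node assignments and then inspect a node's conditions (the kube-system namespace matches the manifests used later; minion18 is just an example):
[centos@master1 ~]$ kubectl get pods -n kube-system -o wide | grep es-
[centos@master1 ~]$ kubectl describe node minion18 | grep -A 8 Conditions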
Data Node Preparation
Since the ES data nodes are stateful and must persist their data to disk, we prepare three machines for data storage.
Minion Node | CPU | MEM | DISK |
---|---|---|---|
minion18 | 4 Core | 16GB | 500GB |
minion19 | 4 Core | 16GB | 500GB |
minion20 | 4 Core | 16GB | 500GB |
Deployment Architecture
Kubernetes Cluster Preparation
An existing Kubernetes cluster is assumed. Setting one up is not repeated here; I have a separate article dedicated to it, and the official Kubernetes documentation also covers installation.
Adding the Data Nodes to the Cluster
Add minion18, minion19, and minion20 to the cluster.
[centos@master1 ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready master 54d v1.10.3
master2 Ready master 54d v1.10.3
master3 Ready master 54d v1.10.3
minion1 Ready <none> 35d v1.10.3
minion10 Ready <none> 35d v1.10.3
minion11 Ready <none> 5h v1.10.3
minion12 Ready <none> 35d v1.10.3
minion13 Ready <none> 1d v1.10.3
minion14 Ready <none> 35d v1.10.3
minion15 Ready <none> 35d v1.10.3
minion16 Ready <none> 35d v1.10.3
minion17 Ready <none> 7d v1.10.3
minion18 Ready <none> 2d v1.10.3
minion19 Ready <none> 2d v1.10.3
minion2 Ready <none> 7d v1.10.3
minion20 Ready <none> 2d v1.10.3
minion21 Ready <none> 7d v1.10.3
minion22 Ready <none> 7d v1.10.3
minion3 Ready <none> 35d v1.10.3
minion4 Ready <none> 35d v1.10.3
minion5 Ready <none> 35d v1.10.3
minion6 Ready <none> 35d v1.10.3
minion7 Ready <none> 35d v1.10.3
minion8 Ready <none> 35d v1.10.3
minion9 Ready <none> 35d v1.10.3
[centos@master1 ~]$ kubectl label node minion18 efknode=efk
[centos@master1 ~]$ kubectl label node minion19 efknode=efk
[centos@master1 ~]$ kubectl label node minion20 efknode=efk
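To confirm that the label took effect, list the nodes by selector; only minion18, minion19, and minion20 should come back:
[centos@master1 ~]$ kubectl get nodes -l efknode=efk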
Data Node Filesystem Preparation
Option 1: GlusterFS
According to the Elasticsearch documentation, distributed filesystems (NFS, GlusterFS, and the like) are not recommended for data storage, as they can hurt ES performance badly.
Not quite ready to give up (Red Hat's involvement gives me a soft spot for GlusterFS), I tried it anyway, only to hit a thoroughly maddening exception when running ES on GlusterFS. GlusterFS (or perhaps ES) supposedly fixed it after version 3.10, yet even on the latest release the problem remained:
Elasticsearch get CorruptIndexException errors when running with GlusterFS persistent storage
After nearly a week of wrestling with it, I gave up on GlusterFS as the storage backend for ES.
Option 2: Local Volumes
Tear down the GlusterFS deployment that Kubernetes had brought up and put the reclaimed disk space to use.
When GlusterFS was deployed, the disks only had to be attached to the hosts without being mounted; now the disks need to be mounted into the hosts' filesystems.
We mount the three disks to the /data directory on their respective hosts.
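A rough sketch of the partitioning and mounting steps on each data node follows; the device name /dev/xvdf is inferred from the df output below and may differ in your environment:
[centos@minion18 ~]$ sudo parted --script /dev/xvdf mklabel gpt mkpart primary xfs 0% 100%
[centos@minion18 ~]$ sudo mkfs.xfs /dev/xvdf1
[centos@minion18 ~]$ sudo mkdir -p /data
[centos@minion18 ~]$ sudo mount /dev/xvdf1 /data
[centos@minion18 ~]$ echo '/dev/xvdf1 /data xfs defaults 0 0' | sudo tee -a /etc/fstab
After mounting, df -h on minion18 shows the new filesystem on /data: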
[centos@minion18 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 50G 14G 37G 28% /
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 2.6M 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/mapper/docker--vg-docker--lv 200G 8.4G 192G 5% /var/lib/docker
/dev/xvdf1 493G 49M 493G 1% /data
tmpfs 1.6G 0 1.6G 0% /run/user/1000
Create a corresponding PV for each of the three hosts' local volumes.
All of them use local-storage as the storageClassName, so that the ES data nodes can bind to these PVs when they come up later.
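The PV definitions below only reference local-storage by name. If no StorageClass of that name exists in the cluster yet, a minimal definition along these lines can be created first (volumeBindingMode: WaitForFirstConsumer is the usual recommendation for local volumes, so that binding is delayed until the consuming pod is scheduled):
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer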
Minion18: es-pv-local0.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: es-local-pv0
spec:
capacity:
storage: 450Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: local-storage
local:
path: /data
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- minion18
Minion19: es-pv-local1.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: es-local-pv1
spec:
capacity:
storage: 450Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: local-storage
local:
path: /data
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- minion19
Minion20: es-pv-local2.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: es-local-pv2
spec:
capacity:
storage: 450Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: local-storage
local:
path: /data
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- minion20
Create the PVs from the files above on the master1 node.
[centos@master1 ~]$ kubectl create -f es-pv-local0.yaml
[centos@master1 ~]$ kubectl create -f es-pv-local1.yaml
[centos@master1 ~]$ kubectl create -f es-pv-local2.yaml
[centos@master1 ~]$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
es-local-pv0 450Gi RWO Retain Available local-storage 1d
es-local-pv1 450Gi RWO Retain Available local-storage 1d
es-local-pv2 450Gi RWO Retain Available local-storage 1d
[centos@master1 ~]$
ES Installation
Preparing the ES YAML Files
[centos@master1 ~]$
[centos@master1 ~]$ cd kuberepo/
[centos@master1 kuberepo]$ cd sys-kube/
[centos@master1 sys-kube]$ cd kube-log
[centos@master1 kube-log]$ git clone https://github.com/pires/kubernetes-elasticsearch-cluster.git
Preparing the ES Docker Images
Open es-master.yaml and kibana.yaml; the images used are:
image: quay.io/pires/docker-elasticsearch-kubernetes:6.2.4
image: docker.elastic.co/kibana/kibana-oss:6.2.4
To cut down the wait when the pods come up, pull these two images first, tag and push them to the private registry, and change the image references in the YAML to the private registry addresses.
[centos@master1 kube-log]$ docker pull quay.io/pires/docker-elasticsearch-kubernetes:6.2.4
[centos@master1 kube-log]$ docker tag quay.io/pires/docker-elasticsearch-kubernetes:6.2.4 hub.***.***/google_containers/docker-elasticsearch-kubernetes:6.2.4
[centos@master1 kube-log]$ docker push hub.***.***/google_containers/docker-elasticsearch-kubernetes:6.2.4
[centos@master1 kube-log]$ docker pull docker.elastic.co/kibana/kibana-oss:6.2.4
[centos@master1 kube-log]$ docker tag docker.elastic.co/kibana/kibana-oss:6.2.4 hub.***.***/google_containers/kibana-oss:6.2.4
[centos@master1 kube-log]$ docker push hub.***.***/google_containers/kibana-oss:6.2.4
Adjusting the ES YAML Files
- Adjust all of the YAML files so that every Pod, Service, Deployment, StatefulSet, Role, ServiceAccount, etc. belongs to the kube-system namespace.
es-master.yaml
- Change the elasticsearch image source to the private registry.
- Set the JVM heap to 512MB.
The complete YAML file is as follows:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: es-master
namespace: kube-system
labels:
component: elasticsearch
role: master
spec:
replicas: 3
template:
metadata:
labels:
component: elasticsearch
role: master
spec:
initContainers:
- name: init-sysctl
image: busybox:1.27.2
command:
- sysctl
- -w
- vm.max_map_count=262144
securityContext:
privileged: true
containers:
- name: es-master
image: hub.***.***/google_containers/docker-elasticsearch-kubernetes:6.2.4
imagePullPolicy: Always
env:
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: CLUSTER_NAME
value: myesdb
- name: NUMBER_OF_MASTERS
value: "2"
- name: NODE_MASTER
value: "true"
- name: NODE_INGEST
value: "false"
- name: NODE_DATA
value: "false"
- name: HTTP_ENABLE
value: "false"
- name: ES_JAVA_OPTS
value: -Xms512m -Xmx512m
- name: PROCESSORS
valueFrom:
resourceFieldRef:
resource: limits.cpu
resources:
limits:
cpu: 1
ports:
- containerPort: 9300
name: transport
livenessProbe:
initialDelaySeconds: 30
timeoutSeconds: 30
tcpSocket:
port: transport
volumeMounts:
- name: storage
mountPath: /data
imagePullSecrets:
- name: kube-sec
volumes:
- emptyDir:
medium: ""
name: "storage"
es-client.yaml
- Change the elasticsearch image source to the private registry.
- Set the JVM heap to 6GB.
The complete YAML file is as follows:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: es-client
namespace: kube-system
labels:
component: elasticsearch
role: client
spec:
replicas: 2
template:
metadata:
labels:
component: elasticsearch
role: client
spec:
initContainers:
- name: init-sysctl
image: busybox:1.27.2
command:
- sysctl
- -w
- vm.max_map_count=262144
securityContext:
privileged: true
containers:
- name: es-client
#image: quay.io/pires/docker-elasticsearch-kubernetes:6.2.4
image: hub.***.***/google_containers/docker-elasticsearch-kubernetes:6.2.4
imagePullPolicy: Always
env:
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: CLUSTER_NAME
value: myesdb
- name: NODE_MASTER
value: "false"
- name: NODE_DATA
value: "false"
- name: HTTP_ENABLE
value: "true"
- name: ES_JAVA_OPTS
value: -Xms6144m -Xmx6144m
- name: NETWORK_HOST
value: _site_,_lo_
- name: PROCESSORS
valueFrom:
resourceFieldRef:
resource: limits.cpu
resources:
limits:
cpu: 2
ports:
- containerPort: 9200
name: http
- containerPort: 9300
name: transport
livenessProbe:
tcpSocket:
port: transport
readinessProbe:
httpGet:
path: /_cluster/health
port: http
initialDelaySeconds: 20
timeoutSeconds: 5
volumeMounts:
- name: storage
mountPath: /data
volumes:
- emptyDir:
medium: ""
name: storage
imagePullSecrets:
- name: kube-sec
stateful/es-data-stateful.yaml
Because production log data is fairly large, and data must not be lost when some ES nodes go down and are brought back up, we recommend stateful data nodes, i.e. stateful/es-data-stateful.yaml.
If you are only testing on your own, you can bring up the data nodes with es-data.yaml instead; nodes started that way lose their data on restart.
- Change the elasticsearch image source to the private registry.
- Set the JVM heap to 8GB.
- Add a node selector targeting the efk-labeled nodes.
The complete YAML file is as follows; note the volumeClaimTemplates section, which uses the local-storage class we prepared earlier:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: es-data
namespace: kube-system
labels:
component: elasticsearch
role: data
spec:
serviceName: elasticsearch-data
replicas: 3
template:
metadata:
labels:
component: elasticsearch
role: data
spec:
nodeSelector:
efknode: efk
initContainers:
- name: init-sysctl
image: busybox:1.27.2
command:
- sysctl
- -w
- vm.max_map_count=262144
securityContext:
privileged: true
containers:
- name: es-data
#image: quay.io/pires/docker-elasticsearch-kubernetes:6.2.4
image: hub.***.***/google_containers/docker-elasticsearch-kubernetes:6.2.4
imagePullPolicy: Always
env:
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: CLUSTER_NAME
value: myesdb
- name: NODE_MASTER
value: "false"
- name: NODE_INGEST
value: "false"
- name: HTTP_ENABLE
value: "false"
- name: ES_JAVA_OPTS
value: -Xms8192m -Xmx8192m
- name: PROCESSORS
valueFrom:
resourceFieldRef:
resource: limits.cpu
resources:
limits:
cpu: 1
ports:
- containerPort: 9300
name: transport
livenessProbe:
tcpSocket:
port: transport
initialDelaySeconds: 20
periodSeconds: 10
volumeMounts:
- name: storage
mountPath: /data
volumeClaimTemplates:
- metadata:
name: storage
spec:
storageClassName: local-storage
accessModes: [ ReadWriteOnce ]
resources:
requests:
storage: 450Gi
Adjusting the Kibana YAML Files
kibana.yaml
- Change the kibana image source to the private registry.
- Comment out the SERVER_BASEPATH environment variable (our Kibana portal is not integrated with the Kubernetes Dashboard, so this base path is no longer needed).
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: kibana
namespace: kube-system
labels:
component: kibana
spec:
replicas: 1
selector:
matchLabels:
component: kibana
template:
metadata:
labels:
component: kibana
spec:
containers:
- name: kibana
image: hub.***.***/google_containers/kibana-oss:6.2.4
#image: docker.elastic.co/kibana/kibana-oss:6.2.4
imagePullPolicy: Always
env:
- name: CLUSTER_NAME
value: myesdb
#- name: SERVER_BASEPATH
# value: /api/v1/namespaces/default/services/kibana:http/proxy
resources:
limits:
cpu: 1000m
requests:
cpu: 100m
ports:
- containerPort: 5601
name: http
imagePullSecrets:
- name: kube-sec
kibana-svc.yaml
- Change the kibana Service type from the default ClusterIP to NodePort so that it can be reached from outside the cluster.
- Note that targetPort in this file is referenced by name: http is the name given to port 5601 in the Kibana Deployment.
apiVersion: v1
kind: Service
metadata:
name: kibana
namespace: kube-system
labels:
component: kibana
spec:
type: NodePort
selector:
component: kibana
ports:
- name: http
port: 80
nodePort: 30072
targetPort: http
Bringing Up the Cluster
Bringing Up the ES Cluster
Following the recommendations at https://github.com/pires/kubernetes-elasticsearch-cluster/tree/master/stateful, we bring up the components in order:
[centos@master1 kube-log]$ cd kubernetes-elasticsearch-cluster/
[centos@master1 kubernetes-elasticsearch-cluster]$ kubectl create -f es-discovery-svc.yaml
[centos@master1 kubernetes-elasticsearch-cluster]$ kubectl create -f es-svc.yaml
[centos@master1 kubernetes-elasticsearch-cluster]$ kubectl create -f es-master.yaml
Wait for all es-master pods to become Ready:
[centos@master1 kubernetes-elasticsearch-cluster]$ kubectl get pods -n kube-system -o wide | grep es-master
es-master-7b74f78667-2n2q5 1/1 Running 0 1m 10.244.12.44 minion18
es-master-7b74f78667-gtvl5 1/1 Running 0 1m 10.244.10.74 minion21
es-master-7b74f78667-hztq9 1/1 Running 0 1m 10.244.11.89 minion22
[centos@master1 kubernetes-elasticsearch-cluster]$
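Before moving on, it can be worth tailing one of the master pods' logs to confirm that a master has been elected and the cluster has formed (the pod name is taken from the listing above):
[centos@master1 kubernetes-elasticsearch-cluster]$ kubectl logs -n kube-system es-master-7b74f78667-2n2q5 --tail=20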
Bring up the client nodes:
[centos@master1 kubernetes-elasticsearch-cluster]$ kubectl create -f es-client.yaml
Bring up the data nodes; there is no need to wait for the client nodes to finish here:
[centos@master1 kubernetes-elasticsearch-cluster]$ kubectl create -f stateful/es-data-svc.yaml
[centos@master1 kubernetes-elasticsearch-cluster]$ kubectl create -f stateful/es-data-stateful.yaml
Wait roughly two minutes, then check the data and client nodes:
[centos@master1 kubernetes-elasticsearch-cluster]$ kubectl get pods -n kube-system -o wide | grep es
es-client-56cdd8f49f-mjfb4 1/1 Running 0 1m 10.244.16.2 minion11
es-client-56cdd8f49f-pl46b 1/1 Running 0 1m 10.244.15.13 minion13
es-data-0 1/1 Running 0 1m 10.244.12.45 minion18
es-data-1 1/1 Running 0 1m 10.244.14.46 minion20
es-data-2 1/1 Running 0 1m 10.244.13.34 minion19
es-master-7b74f78667-2n2q5 1/1 Running 0 5m 10.244.12.44 minion18
es-master-7b74f78667-gtvl5 1/1 Running 0 5m 10.244.10.74 minion21
es-master-7b74f78667-hztq9 1/1 Running 0 5m 10.244.11.89 minion22
kubernetes-dashboard-7d5dcdb6d9-wm885 1/1 Running 0 10d 10.244.2.13 master2
[centos@master1 kubernetes-elasticsearch-cluster]$
Check the PVs and PVCs:
[centos@master1 kubernetes-elasticsearch-cluster]$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
es-local-pv0 450Gi RWO Retain Bound kube-system/storage-es-data-0 local-storage 10m
es-local-pv1 450Gi RWO Retain Bound kube-system/storage-es-data-2 local-storage 10m
es-local-pv2 450Gi RWO Retain Bound kube-system/storage-es-data-1 local-storage 10m
[centos@master1 kubernetes-elasticsearch-cluster]$
[centos@master1 kubernetes-elasticsearch-cluster]$ kubectl get pvc -n kube-system
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
storage-es-data-0 Bound es-local-pv0 450Gi RWO local-storage 1m
storage-es-data-1 Bound es-local-pv2 450Gi RWO local-storage 1m
storage-es-data-2 Bound es-local-pv1 450Gi RWO local-storage 1m
[centos@master1 kubernetes-elasticsearch-cluster]$
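The cluster health can also be checked at this point through one of the client pods, which have HTTP enabled on port 9200; one simple way is to port-forward to it (the pod name is taken from the listing above):
[centos@master1 kubernetes-elasticsearch-cluster]$ kubectl port-forward -n kube-system es-client-56cdd8f49f-mjfb4 9200:9200 &
[centos@master1 kubernetes-elasticsearch-cluster]$ curl 'http://localhost:9200/_cluster/health?pretty'
A healthy cluster should eventually report status green with all eight nodes (3 masters, 2 clients, 3 data) present.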
Bringing Up Kibana
[centos@master1 kubernetes-elasticsearch-cluster]$ kubectl create -f kibana.yaml
[centos@master1 kubernetes-elasticsearch-cluster]$ kubectl create -f kibana-svc.yaml
Checking the Connection Between Kibana and ES
Kibana 6.2.4 is a bit friendlier here than the earlier 5.x versions.
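With the NodePort Service above in place, Kibana should be reachable on port 30072 of any Minion, and its status page at /status shows the state of its plugins, including the Elasticsearch connection. A quick reachability check from the command line (replace the placeholder with any node's IP) might look like this:
[centos@master1 kubernetes-elasticsearch-cluster]$ curl -I http://<any-minion-ip>:30072/status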