环境
环境 | 版本 |
---|---|
Kubernetes | 1.11.6 |
Rancher | 2.1.6 |
rook | 0.9.2 |
kubectl | 1.11.6 |
使用背景
在使用ceph前使用heketi搭建的glusterfs cluster用于kubernetes storageclass存储,但发现了一些问题无法解决:
-
glusterfs 各个节点之间会建立大量的TCP连接用于支持节点间的数据通讯
netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
统计查看到系统会建立数万个连接,几乎会耗尽服务器资源,目前仅仅可以通过修改
net.ipv4.ip_local_port_range
来限制各个节点本地创建的连接数量。245 ESTABLISHED 29 LISTEN 37772 TIME_WAIT
使用过程中pv卷删除后 gluster Self Heal会继续检查这些删除的卷,错误日志导致系统glustershd.log打满
支持的Heketi创建的cluster环境,heketi存在单点故障导致集群无法创建新的数据卷问题,暂无集群解决方案(除Openshift Origin中方案,在kubernetes中使用容器方式创建glusterfs cluster,但是会存在系统升级时候的存储问题,有坑)。
搭建步骤
-
下载rook安装文件
git clone https://github.com/rook/rook.git && git checkout v0.9.2 && cd cluster/examples/kubernetes/ceph
-
修改配置文件 operator.yaml,如下可供参考
apiVersion: v1 kind: Namespace metadata: name: rook-ceph-system --- apiVersion: apiextensions.k8s.io/v1beta1 kind: CustomResourceDefinition metadata: name: cephclusters.ceph.rook.io spec: group: ceph.rook.io names: kind: CephCluster listKind: CephClusterList plural: cephclusters singular: cephcluster scope: Namespaced version: v1 validation: openAPIV3Schema: properties: spec: properties: cephVersion: properties: allowUnsupported: type: boolean image: type: string name: pattern: ^(luminous|mimic|nautilus)$ type: string dashboard: properties: enabled: type: boolean urlPrefix: type: string port: type: integer dataDirHostPath: pattern: ^/(\S+) type: string mon: properties: allowMultiplePerNode: type: boolean count: maximum: 9 minimum: 1 type: integer required: - count network: properties: hostNetwork: type: boolean storage: properties: nodes: items: {} type: array useAllDevices: {} useAllNodes: type: boolean required: - mon additionalPrinterColumns: - name: DataDirHostPath type: string description: Directory used on the K8s nodes JSONPath: .spec.dataDirHostPath - name: MonCount type: string description: Number of MONs JSONPath: .spec.mon.count - name: Age type: date JSONPath: .metadata.creationTimestamp - name: State type: string description: Current State JSONPath: .status.state --- apiVersion: apiextensions.k8s.io/v1beta1 kind: CustomResourceDefinition metadata: name: cephfilesystems.ceph.rook.io spec: group: ceph.rook.io names: kind: CephFilesystem listKind: CephFilesystemList plural: cephfilesystems singular: cephfilesystem scope: Namespaced version: v1 additionalPrinterColumns: - name: MdsCount type: string description: Number of MDSs JSONPath: .spec.metadataServer.activeCount - name: Age type: date JSONPath: .metadata.creationTimestamp --- apiVersion: apiextensions.k8s.io/v1beta1 kind: CustomResourceDefinition metadata: name: cephobjectstores.ceph.rook.io spec: group: ceph.rook.io names: kind: CephObjectStore listKind: CephObjectStoreList plural: cephobjectstores singular: cephobjectstore scope: Namespaced version: v1 --- apiVersion: apiextensions.k8s.io/v1beta1 kind: CustomResourceDefinition metadata: name: cephobjectstoreusers.ceph.rook.io spec: group: ceph.rook.io names: kind: CephObjectStoreUser listKind: CephObjectStoreUserList plural: cephobjectstoreusers singular: cephobjectstoreuser scope: Namespaced version: v1 --- apiVersion: apiextensions.k8s.io/v1beta1 kind: CustomResourceDefinition metadata: name: cephblockpools.ceph.rook.io spec: group: ceph.rook.io names: kind: CephBlockPool listKind: CephBlockPoolList plural: cephblockpools singular: cephblockpool scope: Namespaced version: v1 --- apiVersion: apiextensions.k8s.io/v1beta1 kind: CustomResourceDefinition metadata: name: volumes.rook.io spec: group: rook.io names: kind: Volume listKind: VolumeList plural: volumes singular: volume shortNames: - rv scope: Namespaced version: v1alpha2 --- # The cluster role for managing all the cluster-specific resources in a namespace apiVersion: rbac.authorization.k8s.io/v1beta1 kind: ClusterRole metadata: name: rook-ceph-cluster-mgmt labels: operator: rook storage-backend: ceph rules: - apiGroups: - "" resources: - secrets - pods - pods/log - services - configmaps verbs: - get - list - watch - patch - create - update - delete - apiGroups: - extensions resources: - deployments - daemonsets - replicasets verbs: - get - list - watch - create - update - delete --- # The role for the operator to manage resources in the system namespace apiVersion: rbac.authorization.k8s.io/v1beta1 kind: Role metadata: name: rook-ceph-system namespace: rook-ceph-system labels: operator: rook storage-backend: ceph rules: - apiGroups: - "" resources: - pods - configmaps verbs: - get - list - watch - patch - create - update - delete - apiGroups: - extensions resources: - daemonsets verbs: - get - list - watch - create - update - delete --- # The cluster role for managing the Rook CRDs apiVersion: rbac.authorization.k8s.io/v1beta1 kind: ClusterRole metadata: name: rook-ceph-global labels: operator: rook storage-backend: ceph rules: - apiGroups: - "" resources: # Pod access is needed for fencing - pods # Node access is needed for determining nodes where mons should run - nodes - nodes/proxy verbs: - get - list - watch - apiGroups: - "" resources: - events # PVs and PVCs are managed by the Rook provisioner - persistentvolumes - persistentvolumeclaims verbs: - get - list - watch - patch - create - update - delete - apiGroups: - storage.k8s.io resources: - storageclasses verbs: - get - list - watch - apiGroups: - batch resources: - jobs verbs: - get - list - watch - create - update - delete - apiGroups: - ceph.rook.io resources: - "*" verbs: - "*" - apiGroups: - rook.io resources: - "*" verbs: - "*" --- # Aspects of ceph-mgr that require cluster-wide access kind: ClusterRole apiVersion: rbac.authorization.k8s.io/v1beta1 metadata: name: rook-ceph-mgr-cluster labels: operator: rook storage-backend: ceph rules: - apiGroups: - "" resources: - configmaps - nodes - nodes/proxy verbs: - get - list - watch --- # The rook system service account used by the operator, agent, and discovery pods apiVersion: v1 kind: ServiceAccount metadata: name: rook-ceph-system namespace: rook-ceph-system labels: operator: rook storage-backend: ceph --- # Grant the operator, agent, and discovery agents access to resources in the rook-ceph-system namespace kind: RoleBinding apiVersion: rbac.authorization.k8s.io/v1beta1 metadata: name: rook-ceph-system namespace: rook-ceph-system labels: operator: rook storage-backend: ceph roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: rook-ceph-system subjects: - kind: ServiceAccount name: rook-ceph-system namespace: rook-ceph-system --- # Grant the rook system daemons cluster-wide access to manage the Rook CRDs, PVCs, and storage classes kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1beta1 metadata: name: rook-ceph-global namespace: rook-ceph-system labels: operator: rook storage-backend: ceph roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: rook-ceph-global subjects: - kind: ServiceAccount name: rook-ceph-system namespace: rook-ceph-system --- # The deployment for the rook operator apiVersion: apps/v1beta1 kind: Deployment metadata: name: rook-ceph-operator namespace: rook-ceph-system labels: operator: rook storage-backend: ceph spec: replicas: 1 template: metadata: labels: app: rook-ceph-operator spec: serviceAccountName: rook-ceph-system containers: - name: rook-ceph-operator image: rook/ceph:v0.9.2 args: ["ceph", "operator"] volumeMounts: - mountPath: /var/lib/rook name: rook-config - mountPath: /etc/ceph name: default-config-dir env: # To disable RBAC, uncomment the following: # - name: RBAC_ENABLED # value: "false" # Rook Agent toleration. Will tolerate all taints with all keys. # Choose between NoSchedule, PreferNoSchedule and NoExecute: # - name: AGENT_TOLERATION # value: "NoSchedule" # (Optional) Rook Agent toleration key. Set this to the key of the taint you want to tolerate # - name: AGENT_TOLERATION_KEY # value: "" # (Optional) Rook Agent mount security mode. Can by `Any` or `Restricted`. # `Any` uses Ceph admin credentials by default/fallback. # For using `Restricted` you must have a Ceph secret in each namespace storage should be consumed from and # set `mountUser` to the Ceph user, `mountSecret` to the Kubernetes secret name. # to the namespace in which the `mountSecret` Kubernetes secret namespace. # - name: AGENT_MOUNT_SECURITY_MODE # value: "Any" # Set the path where the Rook agent can find the flex volumes # 仅增加了此处配置,见文档 https://rook.io/docs/rook/v0.9/flexvolume.html说明 - name: FLEXVOLUME_DIR_PATH value: "/var/lib/kubelet/volumeplugins" # value: "" # Set the path where kernel modules can be found # - name: LIB_MODULES_DIR_PATH # value: "" # Mount any extra directories into the agent container # - name: AGENT_MOUNTS # value: "somemount=/host/path:/container/path,someothermount=/host/path2:/container/path2" # Rook Discover toleration. Will tolerate all taints with all keys. # Choose between NoSchedule, PreferNoSchedule and NoExecute: # - name: DISCOVER_TOLERATION # value: "NoSchedule" # (Optional) Rook Discover toleration key. Set this to the key of the taint you want to tolerate # - name: DISCOVER_TOLERATION_KEY # value: "" # Allow rook to create multiple file systems. Note: This is considered # an experimental feature in Ceph as described at # http://docs.ceph.com/docs/master/cephfs/experimental-features/#multiple-filesystems-within-a-ceph-cluster # which might cause mons to crash as seen in https://github.com/rook/rook/issues/1027 - name: ROOK_ALLOW_MULTIPLE_FILESYSTEMS value: "false" # The logging level for the operator: INFO | DEBUG - name: ROOK_LOG_LEVEL value: "INFO" # The interval to check if every mon is in the quorum. - name: ROOK_MON_HEALTHCHECK_INTERVAL value: "45s" # The duration to wait before trying to failover or remove/replace the # current mon with a new mon (useful for compensating flapping network). - name: ROOK_MON_OUT_TIMEOUT value: "300s" # The duration between discovering devices in the rook-discover daemonset. - name: ROOK_DISCOVER_DEVICES_INTERVAL value: "60m" # Whether to start pods as privileged that mount a host path, which includes the Ceph mon and osd pods. # This is necessary to workaround the anyuid issues when running on OpenShift. # For more details see https://github.com/rook/rook/issues/1314#issuecomment-355799641 - name: ROOK_HOSTPATH_REQUIRES_PRIVILEGED value: "false" # In some situations SELinux relabelling breaks (times out) on large filesystems, and doesn't work with cephfs ReadWriteMany volumes (last relabel wins). # Disable it here if you have similiar issues. # For more details see https://github.com/rook/rook/issues/2417 - name: ROOK_ENABLE_SELINUX_RELABELING value: "true" # In large volumes it will take some time to chown all the files. Disable it here if you have performance issues. # For more details see https://github.com/rook/rook/issues/2254 - name: ROOK_ENABLE_FSGROUP value: "true" # The name of the node to pass with the downward API - name: NODE_NAME valueFrom: fieldRef: fieldPath: spec.nodeName # The pod name to pass with the downward API - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name # The pod namespace to pass with the downward API - name: POD_NAMESPACE valueFrom: fieldRef: fieldPath: metadata.namespace volumes: - name: rook-config emptyDir: {} - name: default-config-dir emptyDir: {}
-
仅增加了如下配置,如果不配置会导致kubernetes在默认的flexvolume目录
/var/lib/kubelet/volumeplugins
下找不到flexvolume驱动,导致创建的pv无法挂在到容器中# 仅增加了此处配置,见文档 https://rook.io/docs/rook/v0.9/flexvolume.html说明 - name: FLEXVOLUME_DIR_PATH value: "/var/lib/kubelet/volumeplugins"
也可以通过官方文档中的建议修改rke配置文件
kubelet: image: "" extra_args: volume-plugin-dir: /usr/libexec/kubernetes/kubelet-plugins/volume/exec # 此处 extra_binds 应该是kubelet下参数而非extra_args下参数,rook文档似乎有错误,此方式未尝试, 我希望尽量减少对ranhcer本身的配置 extra_binds: - /usr/libexec/kubernetes/kubelet-plugins/volume/exec:/usr/libexec/kubernetes/kubelet-plugins/volume/exec
-
-
创建 rook-agent
kubectl apply -f operator.yaml
-
修改cluster配置,如下可供参考
apiVersion: v1 kind: Namespace metadata: name: rook-ceph --- apiVersion: v1 kind: ServiceAccount metadata: name: rook-ceph-osd namespace: rook-ceph --- apiVersion: v1 kind: ServiceAccount metadata: name: rook-ceph-mgr namespace: rook-ceph --- kind: Role apiVersion: rbac.authorization.k8s.io/v1beta1 metadata: name: rook-ceph-osd namespace: rook-ceph rules: - apiGroups: [""] resources: ["configmaps"] verbs: [ "get", "list", "watch", "create", "update", "delete" ] --- # Aspects of ceph-mgr that require access to the system namespace kind: Role apiVersion: rbac.authorization.k8s.io/v1beta1 metadata: name: rook-ceph-mgr-system namespace: rook-ceph rules: - apiGroups: - "" resources: - configmaps verbs: - get - list - watch --- # Aspects of ceph-mgr that operate within the cluster's namespace kind: Role apiVersion: rbac.authorization.k8s.io/v1beta1 metadata: name: rook-ceph-mgr namespace: rook-ceph rules: - apiGroups: - "" resources: - pods - services verbs: - get - list - watch - apiGroups: - batch resources: - jobs verbs: - get - list - watch - create - update - delete - apiGroups: - ceph.rook.io resources: - "*" verbs: - "*" --- # Allow the operator to create resources in this cluster's namespace kind: RoleBinding apiVersion: rbac.authorization.k8s.io/v1beta1 metadata: name: rook-ceph-cluster-mgmt namespace: rook-ceph roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: rook-ceph-cluster-mgmt subjects: - kind: ServiceAccount name: rook-ceph-system namespace: rook-ceph-system --- # Allow the osd pods in this namespace to work with configmaps kind: RoleBinding apiVersion: rbac.authorization.k8s.io/v1beta1 metadata: name: rook-ceph-osd namespace: rook-ceph roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: rook-ceph-osd subjects: - kind: ServiceAccount name: rook-ceph-osd namespace: rook-ceph --- # Allow the ceph mgr to access the cluster-specific resources necessary for the mgr modules kind: RoleBinding apiVersion: rbac.authorization.k8s.io/v1beta1 metadata: name: rook-ceph-mgr namespace: rook-ceph roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: rook-ceph-mgr subjects: - kind: ServiceAccount name: rook-ceph-mgr namespace: rook-ceph --- # Allow the ceph mgr to access the rook system resources necessary for the mgr modules kind: RoleBinding apiVersion: rbac.authorization.k8s.io/v1beta1 metadata: name: rook-ceph-mgr-system namespace: rook-ceph-system roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: rook-ceph-mgr-system subjects: - kind: ServiceAccount name: rook-ceph-mgr namespace: rook-ceph --- # Allow the ceph mgr to access cluster-wide resources necessary for the mgr modules kind: RoleBinding apiVersion: rbac.authorization.k8s.io/v1beta1 metadata: name: rook-ceph-mgr-cluster namespace: rook-ceph roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: rook-ceph-mgr-cluster subjects: - kind: ServiceAccount name: rook-ceph-mgr namespace: rook-ceph --- apiVersion: ceph.rook.io/v1 kind: CephCluster metadata: name: rook-ceph namespace: rook-ceph spec: cephVersion: # The container image used to launch the Ceph daemon pods (mon, mgr, osd, mds, rgw). # v12 is luminous, v13 is mimic, and v14 is nautilus. # RECOMMENDATION: In production, use a specific version tag instead of the general v13 flag, which pulls the latest release and could result in different # versions running within the cluster. See tags available at https://hub.docker.com/r/ceph/ceph/tags/. image: ceph/ceph:v13.2.4-20190109 # Whether to allow unsupported versions of Ceph. Currently only luminous and mimic are supported. # After nautilus is released, Rook will be updated to support nautilus. # Do not set to true in production. allowUnsupported: false # The path on the host where configuration files will be persisted. If not specified, a kubernetes emptyDir will be created (not recommended). # Important: if you reinstall the cluster, make sure you delete this directory from each host or else the mons will fail to start on the new cluster. # In Minikube, the '/data' directory is configured to persist across reboots. Use "/data/rook" in Minikube environment. dataDirHostPath: /var/lib/rook # set the amount of mons to be started mon: count: 3 allowMultiplePerNode: true # enable the ceph dashboard for viewing cluster status dashboard: # 是否开启dashboard enabled: true # serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy) # urlPrefix: /ceph-dashboard # serve the dashboard at the given port. # port: 8443 # serve the dashboard using SSL # 可以关闭ssl,在创建ingress提供dashboard服务 # 部署过程中有遇到为创建dashboard服务情况,如果遇到可以修改ssl配置开关 # 再次 kubectl apply -f cluster.yaml 更新cluster会创建出来 ssl: false network: # toggle to use hostNetwork hostNetwork: false rbdMirroring: # The number of daemons that will perform the rbd mirroring. # rbd mirroring must be configured with "rbd mirror" from the rook toolbox. workers: 1 # To control where various services will be scheduled by kubernetes, use the placement configuration sections below. # The example under 'all' would have all services scheduled on kubernetes nodes labeled with 'role=storage-node' and # tolerate taints with a key of 'storage-node'. # placement: # all: # nodeAffinity: # requiredDuringSchedulingIgnoredDuringExecution: # nodeSelectorTerms: # - matchExpressions: # - key: role # operator: In # values: # - storage-node # podAffinity: # podAntiAffinity: # tolerations: # - key: storage-node # operator: Exists # The above placement information can also be specified for mon, osd, and mgr components # mon: # osd: # mgr: resources: # The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory # mgr: # limits: # cpu: "500m" # memory: "1024Mi" # requests: # cpu: "500m" # memory: "1024Mi" # The above example requests/limits can also be added to the mon and osd components # mon: # osd: storage: # cluster level storage configuration and selection # 关闭使用所有节点和所有可用磁盘设置,否则下面具体nodes配置不会生效 useAllNodes: false useAllDevices: false deviceFilter: location: config: # The default and recommended storeType is dynamically set to bluestore for devices and filestore for directories. # Set the storeType explicitly only if it is required not to use the default. # storeType: bluestore metadataDevice: # databaseSizeMB: "1024" # this value can be removed for environments with normal sized disks (100 GB or larger) # journalSizeMB: "1024" # this value can be removed for environments with normal sized disks (20 GB or larger) osdsPerDevice: "1" # this value can be overridden at the node or device level # 使用了单独的裸盘作为ceph存储,指定kubernetes节点名称,要使用的裸盘名称 nodes: - name: "wx-xx-10" devices: - name: "sdb" - name: "wx-xx-09" devices: - name: "sdb" - name: "wx-xx-08" devices: - name: "sdb" - name: "wx-xx-07" devices: - name: "sdb" - name: "wx-xx-06" devices: - name: "sdb"
-
创建 cluster:
kubectl apply -f cluster.yaml
-
创建 storageclass
kubectl apply -f storageclass.yaml # 具体storageclass 副本数量修改配置中 # replicated: # size: 1 # 默认为一个副本
-
至此安装完成
遇到的问题
-
安装中出现错误需要重新安装时,使用kubectl delete -f cluster.yaml删除后还需要清理宿主机伤的配置信息和创建的lvm信息
ansible all --become-user root -m shell -a "rm -rf /var/lib/rook/*"
-
还需要清除磁盘信息,否则再次安装该磁盘不会被ceph使用
ansible all --become-user root -m shell -a "wipefs --all --force /dev/sdb" # 在参考如下文章删除lvm残留信息,否则一样在再此安装中遇到问题 # 参考文章:http://www.strugglesquirrel.com/2018/03/28/%E8%A7%A3%E5%86%B3%E6%97%A0%E6%B3%95%E6%AD%A3%E5%B8%B8%E5%88%A0%E9%99%A4lvm%E7%9A%84%E9%97%AE%E9%A2%98/ lsblk | grep 'ceph-' dmsetup remove ceph--xxx
之后可以再创建ingress到service
rook-ceph-mgr-dashboard
,admin密码可以到rook-ceph-mgr-a
服务日志中找到
文章记录稍简洁,如有问题可留言共同交流学习