Controllers
The controller-manager includes four controllers in total.
NodeMetric controller
Watches Node events and creates a NodeMetric for each Node, recording the configuration for metric collection (a reconcile sketch follows the snippet):
&slov1alpha1.NodeMetricSpec{
    CollectPolicy: &slov1alpha1.NodeMetricCollectPolicy{
        AggregateDurationSeconds: defaultColocationCfg.MetricAggregateDurationSeconds,
        ReportIntervalSeconds:    defaultColocationCfg.MetricReportIntervalSeconds,
        NodeAggregatePolicy:      defaultColocationCfg.MetricAggregatePolicy,
        NodeMemoryCollectPolicy:  defaultColocationCfg.MetricMemoryCollectPolicy,
    },
}
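A minimal sketch of that reconcile loop, assuming controller-runtime; the NodeMetricReconciler name and the simplified error handling are illustrative rather than the actual Koordinator implementation, and defaultColocationCfg refers to the snippet above:

package nodemetric

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	slov1alpha1 "github.com/koordinator-sh/koordinator/apis/slo/v1alpha1"
)

type NodeMetricReconciler struct {
	client.Client
}

// Reconcile keeps a NodeMetric with the same name as each Node, carrying the
// collection settings shown above; once the Node is gone, the stale
// NodeMetric is removed as well.
func (r *NodeMetricReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	node := &corev1.Node{}
	if err := r.Get(ctx, req.NamespacedName, node); err != nil {
		if apierrors.IsNotFound(err) {
			stale := &slov1alpha1.NodeMetric{ObjectMeta: metav1.ObjectMeta{Name: req.Name}}
			return ctrl.Result{}, client.IgnoreNotFound(r.Delete(ctx, stale))
		}
		return ctrl.Result{}, err
	}

	metric := &slov1alpha1.NodeMetric{ObjectMeta: metav1.ObjectMeta{Name: node.Name}}
	_, err := controllerutil.CreateOrUpdate(ctx, r.Client, metric, func() error {
		metric.Spec.CollectPolicy = &slov1alpha1.NodeMetricCollectPolicy{
			AggregateDurationSeconds: defaultColocationCfg.MetricAggregateDurationSeconds,
			ReportIntervalSeconds:    defaultColocationCfg.MetricReportIntervalSeconds,
		}
		return nil
	})
	return ctrl.Result{}, err
}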
The configuration can be customized through a ConfigMap: top-level fields are cluster-level settings, while the entries under nodeConfigs are node-level settings. Node-level settings take precedence over cluster-level ones (see the merge sketch after the example).
colocation-config: |
  {
    "enable": false,
    "metricAggregateDurationSeconds": 300,
    "metricReportIntervalSeconds": 60,
    "metricAggregatePolicy": {
      "durations": [
        "5m",
        "10m",
        "15m"
      ]
    },
    "nodeConfigs": [
      {
        "name": "anolis",
        "nodeSelector": {
          "matchLabels": {
            "kubernetes.io/kernel": "anolis"
          }
        },
        "metricAggregateDurationSeconds": 400,
        "metricReportIntervalSeconds": 80
      }
    ]
  }
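The precedence rule amounts to a shallow merge: start from the cluster-level values and overwrite whatever fields the first matching nodeConfigs entry sets. A hedged sketch of that merge; the ColocationCfg/NodeColocationCfg structs simply mirror the JSON above and are not the exact Koordinator types:

package config

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// Simplified mirrors of the colocation-config JSON above. Pointer fields let
// us distinguish "not set at node level" from an explicit zero.
type NodeColocationCfg struct {
	Name                           string
	NodeSelector                   *metav1.LabelSelector
	MetricAggregateDurationSeconds *int64
	MetricReportIntervalSeconds    *int64
}

type ColocationCfg struct {
	Enable                         *bool
	MetricAggregateDurationSeconds *int64
	MetricReportIntervalSeconds    *int64
	NodeConfigs                    []NodeColocationCfg
}

// cfgForNode returns the effective config for one node: cluster-level values,
// overridden by the first node-level entry whose selector matches the node.
func cfgForNode(cfg ColocationCfg, node *corev1.Node) ColocationCfg {
	out := cfg
	for _, nc := range cfg.NodeConfigs {
		sel, err := metav1.LabelSelectorAsSelector(nc.NodeSelector)
		if err != nil || !sel.Matches(labels.Set(node.Labels)) {
			continue
		}
		if nc.MetricAggregateDurationSeconds != nil {
			out.MetricAggregateDurationSeconds = nc.MetricAggregateDurationSeconds
		}
		if nc.MetricReportIntervalSeconds != nil {
			out.MetricReportIntervalSeconds = nc.MetricReportIntervalSeconds
		}
		break
	}
	return out
}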
NodeSLO controller
Watches Node events and creates a NodeSLO for each Node, generating the SLO-related configuration consumed by the QoSManager in Koordlet.
It can likewise be configured through a ConfigMap, again with cluster-level and node-level settings (a worked merge result follows the example):
resource-threshold-config: |
  {
    "clusterStrategy": {
      "enable": false,
      "cpuSuppressThresholdPercent": 65,
      "cpuSuppressPolicy": "cpuset",
      "memoryEvictThresholdPercent": 70,
      "memoryEvictLowerPercent": 65,
      "cpuEvictBESatisfactionUpperPercent": 90,
      "cpuEvictBESatisfactionLowerPercent": 60,
      "cpuEvictBEUsageThresholdPercent": 90
    },
    "nodeStrategies": [
      {
        "name": "anolis",
        "nodeSelector": {
          "matchLabels": {
            "kubernetes.io/kernel": "anolis"
          }
        },
        "cpuEvictBEUsageThresholdPercent": 80
      }
    ]
  }
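Given this ConfigMap, a node labeled kubernetes.io/kernel=anolis ends up with the cluster strategy plus a single overridden field. The runnable illustration below demonstrates that outcome with plain maps, not the real strategy types:

package main

import "fmt"

func main() {
	cluster := map[string]int{
		"cpuSuppressThresholdPercent":     65,
		"memoryEvictThresholdPercent":     70,
		"cpuEvictBEUsageThresholdPercent": 90,
	}
	// The "anolis" nodeStrategy only sets this one field.
	nodeOverride := map[string]int{
		"cpuEvictBEUsageThresholdPercent": 80,
	}

	effective := map[string]int{}
	for k, v := range cluster {
		effective[k] = v
	}
	for k, v := range nodeOverride {
		effective[k] = v // node-level wins
	}
	fmt.Println(effective["cpuEvictBEUsageThresholdPercent"]) // 80
}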
ElasticQuota controller
Watches for the creation of ElasticQuotaProfile objects and creates an ElasticQuota for each ElasticQuotaProfile.
The ElasticQuota records the total allocatable CPU and memory of the nodes selected by the profile's label selector, and separately records the resources of unschedulable nodes.
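A hedged sketch of that aggregation step, assuming the nodes have already been listed and the profile's nodeSelector converted to a labels.Selector; totalAllocatable is an illustrative helper, not the controller's real function:

package elasticquota

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// totalAllocatable sums the allocatable cpu/memory of the nodes matched by an
// ElasticQuotaProfile's node selector, keeping unschedulable nodes in a
// separate bucket, roughly what gets recorded into the ElasticQuota.
func totalAllocatable(nodes []corev1.Node, sel labels.Selector) (schedulable, unschedulable corev1.ResourceList) {
	schedulable = corev1.ResourceList{}
	unschedulable = corev1.ResourceList{}
	for _, n := range nodes {
		if !sel.Matches(labels.Set(n.Labels)) {
			continue
		}
		target := schedulable
		if n.Spec.Unschedulable {
			target = unschedulable
		}
		for _, name := range []corev1.ResourceName{corev1.ResourceCPU, corev1.ResourceMemory} {
			sum := target[name]
			sum.Add(n.Status.Allocatable[name])
			target[name] = sum
		}
	}
	return schedulable, unschedulable
}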
NodeResource controller
Dynamically updates the status and metadata sections of each Node, such as the extended resources reported for colocation.
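A hedged sketch of what such a status update looks like, assuming the batch-cpu/batch-memory extended resources have been computed elsewhere from colocation metrics; updateBatchResources is an illustrative helper, not the controller's actual code:

package noderesource

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// updateBatchResources writes the batch extended resources into the Node
// status with a merge patch against the node's prior state.
func updateBatchResources(ctx context.Context, c client.Client, node *corev1.Node, batchCPU, batchMemory resource.Quantity) error {
	patch := client.MergeFrom(node.DeepCopy())
	if node.Status.Allocatable == nil {
		node.Status.Allocatable = corev1.ResourceList{}
	}
	node.Status.Allocatable["kubernetes.io/batch-cpu"] = batchCPU
	node.Status.Allocatable["kubernetes.io/batch-memory"] = batchMemory
	return c.Status().Patch(ctx, node, patch)
}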
Webhook
The controller-manager also runs a webhook server, listening on 0.0.0.0:9876 by default.
Its main purpose is to dynamically mutate Node and Pod resources. The webhook server is invoked by the apiserver before an object is persisted to etcd, and it is registered via a MutatingWebhookConfiguration resource (a server-side registration sketch follows the manifest):
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: mutating-webhook-configuration
webhooks:
- admissionReviewVersions:
  - v1
  - v1beta1
  clientConfig:
    caBundle: <BASE 64 CA>
    service:
      name: <controller-manager-svc-name>
      namespace: <controller-manager-svc-namespace>
      path: /mutate-pod
      port: 443
  failurePolicy: Ignore
  matchPolicy: Equivalent
  name: somename
  namespaceSelector: {}
  objectSelector: {}
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    - UPDATE
    resources:
    - pods
    scope: '*'
  sideEffects: None
  timeoutSeconds: 10
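On the server side, each path in this configuration corresponds to a handler registered with the manager's webhook server. A minimal sketch with controller-runtime; the no-op handler is a placeholder for the profile-based mutation described below:

package webhook

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/webhook"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

type podMutatingHandler struct{}

// Handle would decode the Pod, look up a matching ClusterColocationProfile,
// and return JSON patches; this placeholder admits the request unchanged.
func (h *podMutatingHandler) Handle(ctx context.Context, req admission.Request) admission.Response {
	return admission.Allowed("no-op sketch")
}

// setupWebhooks registers the mutating endpoint; the real server also serves
// /validate-pod, /validate-node and /mutate-node-status on port 9876.
func setupWebhooks(mgr ctrl.Manager) {
	mgr.GetWebhookServer().Register("/mutate-pod", &webhook.Admission{Handler: &podMutatingHandler{}})
}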
Pod resources
/validate-pod: validates the resource
/mutate-pod: mutates Pod information
Before the webhook will mutate Pods, a ClusterColocationProfile resource must be created manually in the cluster:
apiVersion: config.koordinator.sh/v1alpha1
kind: ClusterColocationProfile
metadata:
  name: colocation-profile-example
spec:
  namespaceSelector:
    matchLabels:
      koordinator.sh/enable-colocation: "true"
  selector:
    matchLabels:
      koordinator.sh/enable-colocation: "true"
  qosClass: BE
  priorityClassName: koord-batch
  koordinatorPriority: 1000
  schedulerName: koord-scheduler
  labels:
    koordinator.sh/mutated: "true"
  annotations:
    koordinator.sh/intercepted: "true"
  patch:
    spec:
      terminationGracePeriodSeconds: 30
You can think of this as a mutation policy: the webhook modifies Pods according to it. This profile targets Pods labeled koordinator.sh/enable-colocation=true in namespaces that also carry the label koordinator.sh/enable-colocation=true.
The webhook mutates the Pod's labels, annotations, scheduler name, and priority, and writes the qosClass and koordinatorPriority values into the Pod's labels.
For Pods whose priority class is PriorityMid, PriorityBatch, or PriorityFree, it additionally mutates the container resources: the cpu and memory entries in the original requests and limits are replaced with batch-cpu and batch-memory resources, and the replaced resources are then written into the Pod's annotations (sketched below).
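A hedged sketch of that replacement step; the helper names and the exact annotation layout are simplified from the example that follows, and cpu is converted to milli-units, which is why "1" becomes "1000":

package webhook

import (
	"encoding/json"
	"strconv"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// replaceWithBatch moves cpu/memory to batch-cpu/batch-memory in one
// ResourceList; cpu is expressed in milli-units ("1" -> "1000").
func replaceWithBatch(rl corev1.ResourceList) corev1.ResourceList {
	out := corev1.ResourceList{}
	for name, q := range rl {
		switch name {
		case corev1.ResourceCPU:
			out["kubernetes.io/batch-cpu"] = resource.MustParse(strconv.FormatInt(q.MilliValue(), 10))
		case corev1.ResourceMemory:
			out["kubernetes.io/batch-memory"] = q
		default:
			out[name] = q
		}
	}
	return out
}

// mutatePodResources rewrites every container's requests/limits and records
// the replaced resources in the extended-resource-spec annotation.
func mutatePodResources(pod *corev1.Pod) error {
	spec := map[string]corev1.ResourceRequirements{}
	for i := range pod.Spec.Containers {
		c := &pod.Spec.Containers[i]
		c.Resources.Requests = replaceWithBatch(c.Resources.Requests)
		c.Resources.Limits = replaceWithBatch(c.Resources.Limits)
		spec[c.Name] = c.Resources
	}
	raw, err := json.Marshal(spec)
	if err != nil {
		return err
	}
	if pod.Annotations == nil {
		pod.Annotations = map[string]string{}
	}
	pod.Annotations["node.koordinator.sh/extended-resource-spec"] = string(raw)
	return nil
}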
For example, a Pod created like this:
apiVersion: v1
kind: Pod
metadata:
  labels:
    koordinator.sh/enable-colocation: "true"
  name: test-pod
spec:
  containers:
  - name: app
    image: nginx:1.15.1
    resources:
      limits:
        cpu: "1"
        memory: "3456Mi"
      requests:
        cpu: "1"
        memory: "3456Mi"
will be mutated by the webhook into the following:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    koordinator.sh/intercepted: "true"
    node.koordinator.sh/extended-resource-spec: '{"app": {"limits": {"kubernetes.io/batch-cpu": "1000", "kubernetes.io/batch-memory": "3456Mi"}, "requests": {"kubernetes.io/batch-cpu": "1000", "kubernetes.io/batch-memory": "3456Mi"}}}'
  labels:
    koordinator.sh/qosClass: BE
    koordinator.sh/priority: "1000"
    koordinator.sh/mutated: "true"
  ...
spec:
  terminationGracePeriodSeconds: 30
  priority: 5000
  priorityClassName: koord-batch
  schedulerName: koord-scheduler
  containers:
  - name: app
    image: nginx:1.15.1
    resources:
      limits:
        kubernetes.io/batch-cpu: "1000"
        kubernetes.io/batch-memory: 3456Mi
      requests:
        kubernetes.io/batch-cpu: "1000"
        kubernetes.io/batch-memory: 3456Mi
Node resources
/validate-node: validates the resource
/mutate-node-status: mutates Node status