本章重点: 如何基于 custom-metrics-apiserver 项目,打造自己的 metric server
1. custom-metrics-apiserver简介
项目地址: https://github.com/kubernetes-sigs/custom-metrics-apiserver/tree/master
自定义metric server,具体来说需要做以下几个事情:
(1)实现 custom-metrics-apiserver 的 三个接口,如下:
type CustomMetricsProvider interface {
// 定义metric。 例如 pod_cpu_used_1m
ListAllMetrics() []CustomMetricInfo
// 如何根据 metric的信息,得到具体的值
GetMetricByName(name types.NamespacedName, info CustomMetricInfo) (*custom_metrics.MetricValue, error)
// 如何根据 metric selector的信息,得到具体的值
GetMetricBySelector(namespace string, selector labels.Selector, info CustomMetricInfo) (*custom_metrics.MetricValueList, error)
}
GetMetricBySelectorm, GetMetricByName 在reststorage.go被使用。
restful接口在installer.go中被定义。
总的来说,可以认为
(1)基于custom-metrics-apiserver这个项目,你只要实现上述三个接口就行。其他的事情这个包在你new provider的时候都自动实现了。
(2)ListAllMetrics 注册了所有的Metric,让api-server 知道有哪些自定义metric。
(3)GetMetricByName, GetMetricBySelector 都是返回具体的Metric数据。
(4)一般api server都是 调用GetMetricBySelector,因为hpa的对象基本都是deploy, GetMetricBySelector会循环调用GetMetricByName取得deploy所有pod的metric信息。
2. 定制自己的metric server
2.1 代码部署和编译
这里我做了如下的修改。对于metric server而言,无论访问什么metric,都返回10。
func (p *monitorProvider) GetMetricByName(
name types.NamespacedName,
info provider.CustomMetricInfo,
metricSelector labels.Selector,
) (*custom_metrics.MetricValue, error) {
ref, err := helpers.ReferenceFor(p.mapper, name, info)
if err != nil {
return nil, err
}
return &custom_metrics.MetricValue{
DescribedObject: ref,
// MetricName: info.Metric,
Metric: custom_metrics.MetricIdentifier{
Name: info.Metric,
},
Timestamp: metav1.Time{time.Unix(int64(10), 0)},
Value: *resource.NewMilliQuantity(int64(10*1000.0), resource.DecimalSI),
}, nil
}
更详细的可以参考: https://github.com/kubernetes-sigs/custom-metrics-apiserver/tree/master/test-adapter
编译生成自己的镜像:zoux/hpa:v1。然后生成一下的deployment。
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: kube-hpa
name: kube-hpa
namespace: kube-system
spec:
replicas: 1
selector:
matchLabels:
app: kube-hpa
template:
metadata:
labels:
app: kube-hpa
name: kube-hpa
spec:
hostNetwork: true
containers:
- name: kube-hpa
image: zoux/hpa:v1
imagePullPolicy: IfNotPresent
command:
- /metric-server
args:
- --master-url=XXX
- --kube-config=/pkc/config
- --tls-private-key-file=/pkc/server-key.pem
- --secure-port=9997
- --v=10
ports:
- containerPort: 9997
resources:
limits:
cpu: 2
memory: 2048Mi
requests:
cpu: 0.5
memory: 500Mi
volumeMounts:
- name: pkc
mountPath: /pkc
readOnly: true
volumes:
- name: pkc
hostPath:
path: /opt/kubernetes/ssl
验证部署成功
root@k8s-master:~/testyaml/hpa# kubectl get pod -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-hpa-84c884f994-gd5fl 1/1 Running 0 3d13h 192.168.0.5 192.168.0.5 <none> <none>
2.2 创建 Sv and APIService
上面虽然部署成功了,但是apiserver还是访问不到。
k8s-master:~/testyaml/hpa# kubectl get --raw "/apis/custom.metrics.k8s.io/v1
Error from server (NotFound): the server could not find the requested resource
原因在于,apiserver不知道如何找到kube-hpa-84c884f994-gd5fl这个pod进行访问。所以需要创建下面的svc和apiserver。
root@k8s-master:~/testyaml/hpa# cat tls.yaml
apiVersion: v1
kind: Service
metadata:
name: kube-hpa
namespace: kube-system
spec:
clusterIP: None
ports:
- name: https-hpa-dont-edit-it
port: 9997
targetPort: 9997
selector:
app: kube-hpa
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
name: v1beta1.custom.metrics.k8s.io
spec:
service:
name: kube-hpa
namespace: kube-system
port: 9997
group: custom.metrics.k8s.io
version: v1beta1
insecureSkipTLSVerify: true
groupPriorityMinimum: 100
versionPriority: 100
创建完成,验证是否成功:
root@k8s-master:~/testyaml/hpa# kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"
{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"custom.metrics.k8s.io/v1beta1","resources":[{"name":"pods/pod_cpu_used_1m","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]},{"name":"pods/pod_cpu_used_5m","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]},{"name":"pods/container_cpu_used_1m","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":...}
如果报错。查看该apiserver哪里报错了
root@k8s-master:~/testyaml/hpa# kubectl get APIService v1beta1.custom.metrics.k8s.io -oyaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
creationTimestamp: "2021-06-13T13:22:01Z"
name: v1beta1.custom.metrics.k8s.io
resourceVersion: "1590641"
selfLink: /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.custom.metrics.k8s.io
uid: d488d6a8-7e79-4311-a1e9-0b12e4591375
spec:
group: custom.metrics.k8s.io
groupPriorityMinimum: 100
insecureSkipTLSVerify: true
service:
name: kube-hpa
namespace: kube-system
port: 9997
version: v1beta1
versionPriority: 100
status:
conditions:
- lastTransitionTime: "2021-06-13T13:42:17Z"
message: all checks passed
reason: Passed
status: "True"
type: Available
或者直接curl访问:
curl -k https://nodeip:9997/apis/custom.metrics.k8s.io/v1beta1
2.3 system:anonymous授权
如果没有出现类似问题,这一步直接跳过
又是会出现如下的错误 或者 上述的APIService没有运行成功。都是因为system:anonymous权限不够
annotations:
autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2021-06-13T13:33:12Z","reason":"SucceededGetScale","message":"the
HPA controller was able to get the target''s current scale"},{"type":"ScalingActive","status":"False","lastTransitionTime":"2021-06-13T13:33:12Z","reason":"FailedGetPodsMetric","message":"the
HPA was unable to compute the replica count: unable to get metric pod_cpu_usage_for_limit_1m:
unable to fetch metrics from custom metrics API: pods.custom.metrics.k8s.io
\"*\" is forbidden: User \"system:anonymous\" cannot get resource \"pods/pod_cpu_usage_for_limit_1m\"
in API group \"custom.metrics.k8s.io\" in the namespace \"default\""}]'
autoscaling.alpha.kubernetes.io/metrics: '[{"type":"Pods","pods":{"metricName":"pod_cpu_usage_for_limit_1m","targetAverageValue":"60"}}]'
metric-containerName: zx-hpa
creationTimestamp: "2021-06-13T13:32:56Z"
name: nginx-hpa-zx-1
namespace: default
resourceVersion: "1589301"
selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/
这是可以直接绑定clusterrole https://github.com/kubernetes-sigs/metrics-server/issues/81
我这里是直接给了 cluster-admin 权限,实际情况可以按照需求赋权。
kubectl create clusterrolebinding anonymous-role-binding --clusterrole=cluster-admin --user=system:anonymous
3. 创建hpa验证是否成功
可以看出来都是10
root@k8s-master:~/testyaml/hpa# kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx-hpa-zx-1 Deployment/zx-hpa 10/60 1 3 3 9m55s
root@k8s-master:~/testyaml/hpa# kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx-hpa-zx-1 Deployment/zx-hpa 10/60 1 3 3 9m57s
4. 追踪整个过程
第一步 Kcm(hpa controller)发送的请求。
I0613 23:12:36.498740 9879 httplog.go:90] GET /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/pod_aa_100m?labelSelector=app%3Dzx-hpa-test: (35.302304ms) 200 [kube-controller-manager/v1.17.4 (linux/amd64) kubernetes/8d8aa39/horizontal-pod-autoscaler 192.168.0.4:42750]
要在url里使用不安全字符,就需要使用转义。
%2A = *
%3D = =(等号)
第二步 apiserver进行了 url转换。
kcm访问的是: /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/pod_aa_100m?labelSelector=app%3Dzx-hpa-test
但是由于第二步创建了 Sv and APIService。所以访问这个url会被转换为:
192.168.0.5是pod kube-hpa-84c884f994-gd5fl 所在的节点ip也是podia(hostNetwork模式)。 9997是定义的端口。
直接在master节点上(masterip=192.168.0.4)通过curl模拟
root@k8s-master:~/testyaml/hpa# curl -k https://192.168.0.5:9997/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/pod_aa_100m?labelSelector=app%3Dzx-hpa-test
{
"kind": "MetricValueList",
"apiVersion": "custom.metrics.k8s.io/v1beta1",
"metadata": {
"selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/pod_aa_100m"
},
"items": [
{
"describedObject": {
"kind": "Pod",
"namespace": "default",
"name": "zx-hpa-7b56cddd95-5j6r4|",
"apiVersion": "/v1"
},
"metricName": "pod_aa_100m",
"timestamp": "1970-01-01T00:00:10Z",
"value": "10",
"selector": null
},
{
"describedObject": {
"kind": "Pod",
"namespace": "default",
"name": "zx-hpa-7b56cddd95-lthbz|",
"apiVersion": "/v1"
},
"metricName": "pod_aa_100m",
"timestamp": "1970-01-01T00:00:10Z",
"value": "10",
"selector": null
},
{
"describedObject": {
"kind": "Pod",
"namespace": "default",
"name": "zx-hpa-7b56cddd95-n9ft9|",
"apiVersion": "/v1"
},
"metricName": "pod_aa_100m",
"timestamp": "1970-01-01T00:00:10Z",
"value": "10",
"selector": null
}
]
}
5. 总结
(1)如何定制自己的metric-server,包括代码编写和环境搭建
(2)Kubernetes 里的 Custom Metrics 机制,也是借助 Aggregator APIServer 扩展机制来实现的。这里的具体原理是,当你把 Custom Metrics APIServer 启动之后,Kubernetes 里就会出现一个叫作custom.metrics.k8s.io的 API。而当你访问这个 URL 时,Aggregator 就会把你的请求转发给 Custom Metrics APIServer 。
这里一定要注意: kube-apiserver启动参数一定要包含: -enable-swagger-ui=true
(3)ListAllMetrics()并没有将metric注册到apiserver。 apiserver并没有对metric进行验证。上文中,我metric server的ListAllMetrics()并没有注册 pod_aa_100m这个metric,但是可以正常使用。
原因:apiserver并没有进行验证,apiserver只进行url转发,如果有返回数据,apiserver就认为这个metric是正确的。所以这一点可以用来自定义metric。