Design overview
Monitoring is split into two parts:
Core metrics pipeline: the components involved are the kubelet, the resource estimator, metrics-server, and the API server. These metrics are consumed by Kubernetes' core components: kubectl, the scheduler, and the HPA. The data flows along the black path in the architecture diagram above.
General monitoring pipeline: collects a wide variety of metrics and exposes them to users; some of them are fed, through an adapter, to the HPA (Horizontal Pod Autoscaler). The metrics can also be converted by an API adapter and written into a backing storage component, for use by components that need historical data. A general monitoring solution usually installs an agent on every node plus a cluster-level collector. The data flows along the blue path in the architecture diagram; the detailed flow is as follows:
An agent running on each node collects that node's metrics; depending on how the monitoring solution is designed, this can be a subset or the full set of the available metric types.
A cluster-level collector gathers the metrics from all of the node agents.
An adapter converts the data into the format required by the storage component and persists it there.
For user-defined HPA metrics, an adapter converts them into a form the HPA understands so they can drive autoscaling (a sketch of the HPA side follows this list).
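To make the last step concrete, below is a minimal sketch of what the HPA side looks like once an adapter (for example the Prometheus adapter) exposes a custom metric through the custom metrics API. The Deployment name demo-app, the metric name http_requests_per_second, and the target value are all placeholders, not part of this setup:

kubectl apply -f - <<EOF
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app                        # placeholder workload
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods                            # served through the custom metrics adapter
    pods:
      metric:
        name: http_requests_per_second    # placeholder custom metric
      target:
        type: AverageValue
        averageValue: "10"
EOF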
Candidate technical solutions
These were only ideas at the initial design stage; what Prometheus actually does today is run a node exporter (node_exporter) on every node to collect node-level information, which is then gathered by the Prometheus server.
Aggregation Layer
Kubernetes introduced the Aggregation Layer mechanism, which lets users conveniently extend a cluster beyond the core APIs. Once the cluster is installed and configured, the aggregation layer runs inside the kube-apiserver process. The user creates an APIService object in the cluster and sets the corresponding URL in it (path /apis/{group}/{version}/…); from then on, requests to that path are forwarded by the API server to the corresponding backend service, which is normally implemented by an extension-apiserver running as a pod inside the cluster.
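For reference, this is roughly what such an APIService object looks like. The example is modeled on the v1beta1.metrics.k8s.io APIService that the metrics-server manifests used later in this post create; exact field values may differ between versions:

kubectl apply -f - <<EOF
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:                     # backend extension-apiserver service
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io        # requests to /apis/metrics.k8s.io/v1beta1/...
  version: v1beta1             # ...are proxied to the service above
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
EOF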
The flow of a user request through to the extension-apiserver is as follows.
The API server provides the extension-apiserver with its access certificate and the requesting user's information via the configuration items listed below (the --requestheader-* and --proxy-client-* flags).
When the API server is started with those flags, it generates a ConfigMap named extension-apiserver-authentication in the kube-system namespace containing the --requestheader-* related settings. For the extension-apiserver to authenticate requests coming from the API server, it first needs the information in this ConfigMap; this can be achieved by binding the serviceAccount used by the extension-apiserver to the extension-apiserver-authentication-reader role in the kube-system namespace.
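A minimal sketch of such a binding, assuming the extension-apiserver runs with a serviceAccount named my-extension-apiserver in the default namespace (both names are placeholders):

kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-extension-apiserver-auth-reader   # placeholder name
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: my-extension-apiserver               # placeholder serviceAccount
  namespace: default
EOF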
At present the API server generates this ConfigMap even when those flags are not set; in that case its content looks like this:
root@master:/opt/k8s/work# kubectl get configmaps -n kube-system extension-apiserver-authentication -o yaml
apiVersion: v1
data:
  client-ca-file: |
    -----BEGIN CERTIFICATE-----
    MIIDmjCCAoKgAwIBAgIUMgmbH118p4mkwRHqgFl3bltHX1MwDQYJKoZIhvcNAQEL
    BQAwZTELMAkGA1UEBhMCQ04xEDAOBgNVBAgTB05hbkppbmcxEDAOBgNVBAcTB05h
    bkppbmcxDDAKBgNVBAoTA2s4czEPMA0GA1UECxMGc3lzdGVtMRMwEQYDVQQDEwpr
    NQIDAQABo0IwQDAOBgNVHQ8BAf8EBAMCAQYwDwYDVR0TAQH/BAUwAwEB/zAdBgNV
    HQ4EFgQUcyfeeyf0LulhElMz7x4YXC7FBXIwDQYJKoZIhvcNAQELBQADggEBAHvN
    18jceQ9BthnxFNoCZ5yjiQGQViVcaw76gEm/OrmxKGFUXJyDmZghP+gjJ8ZOADZ9
    Brw+F66ULWMBfFQrESUf3nnnaScFdrZ9TcoKDPPhzibOfEqGMf6RNFTjlWk11ZUl
    qPTPmkJlGqMGvRgPMPm2xwucE5+o762C94iLFBfmqaS/FHGsoR7hfGSEAn0q9by5
    SotQpHpAt5tzE8N7KEXFIDOr8LlbXOd/lLn1+G84NY8lWWcARFgvAuOFgKQqfenm
    ezrX/nv45OvuKBYVf7o+8CXfoTK7vc7RTtqWHA+zNbjly7IaYeaPyDxQqWSY6cBZ
    Fzh51DLVlbmTyeagMXo=
    -----END CERTIFICATE-----
kind: ConfigMap
metadata:
  creationTimestamp: "2020-02-09T12:04:05Z"
  name: extension-apiserver-authentication
  namespace: kube-system
  resourceVersion: "21"
  selfLink: /api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication
  uid: 6cb1a94e-78e5-49c9-8f5a-ae8183f8de96
For the extension-apiserver to authorize the requesting user, it needs to send a SubjectAccessReview request to the API server; to be allowed to do that, the serviceAccount used by the extension-apiserver must also be bound to the system:auth-delegator ClusterRole.
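Again as a sketch, using the same placeholder serviceAccount as above:

kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-extension-apiserver:auth-delegator   # placeholder name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: my-extension-apiserver                  # placeholder serviceAccount
  namespace: default
EOF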
Enabling the aggregation feature requires appending the following items to the kube-apiserver configuration:
--requestheader-client-ca-file=<path to aggregator CA cert>
--requestheader-allowed-names=front-proxy-client
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--proxy-client-cert-file=<path to aggregator proxy cert>
--proxy-client-key-file=<path to aggregator proxy key>
If kube-proxy is not running on the machine that runs the API server, the following item also has to be added:
--enable-aggregator-routing=true
CA conflict issue
After aggregation is enabled, the API server has two CA-related settings: --client-ca-file and --requestheader-client-ca-file.
If they are configured carelessly, a CA conflict can occur.
When both are configured, the API server first checks whether a client certificate was signed by the CA in requestheader-client-ca-file, and only if it was not does it fall back to client-ca-file. So the two CAs should generally be different. If the same CA is configured for both, a certificate that could previously access the API server may no longer work after the aggregator is enabled, because the CN in that certificate may not be in the requestheader-allowed-names list. (That is what the official documentation says; when I actually tested with the same CA there was no error, and I am not sure why.)
Generate the client CA (this requires the cfssl toolset to be installed).
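The original commands are not reproduced here; a minimal sketch with cfssl could look like the following. The CN and expiry in the CSR are illustrative; only the resulting file names client-ca.pem / client-ca-key.pem matter for the flags below:

cd /etc/kubernetes/cert
cat > client-ca-csr.json <<EOF
{
  "CN": "client-ca",
  "key": { "algo": "rsa", "size": 2048 },
  "ca": { "expiry": "87600h" }
}
EOF
cfssl gencert -initca client-ca-csr.json | cfssljson -bare client-ca
# produces client-ca.pem, client-ca-key.pem and client-ca.csr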
Generate the certificate used by the proxy.
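Similarly, a sketch of signing the proxy client certificate with the CA generated above. The CN must be front-proxy-client so that it matches --requestheader-allowed-names; cfssl's built-in default signing profile should already include client auth, pass an explicit -config/-profile if you need tighter usages:

cat > proxy-client-csr.json <<EOF
{
  "CN": "front-proxy-client",
  "key": { "algo": "rsa", "size": 2048 }
}
EOF
cfssl gencert -ca=client-ca.pem -ca-key=client-ca-key.pem \
  proxy-client-csr.json | cfssljson -bare proxy-client
# produces proxy-client.pem and proxy-client-key.pem, referenced by
# --proxy-client-cert-file / --proxy-client-key-file below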
Append the following items to the API server configuration:
--requestheader-client-ca-file=/etc/kubernetes/cert/client-ca.pem
--requestheader-allowed-names=front-proxy-client
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--proxy-client-cert-file=/etc/kubernetes/cert/proxy-client.pem
--proxy-client-key-file=/etc/kubernetes/cert/proxy-client-key.pem
Restart the API server:
systemctl daemon-reload
systemctl restart kube-apiserver
The corresponding content of extension-apiserver-authentication then becomes:
root@master:/opt/k8s/work# kubectl get configmaps -n kube-system extension-apiserver-authentication -o yaml
apiVersion: v1
data:
  client-ca-file: |
    -----BEGIN CERTIFICATE-----
    MIIDmjCCAoKgAwIBAgIUMgmbH118p4mkwRHqgFl3bltHX1MwDQYJKoZIhvcNAQEL
    BQAwZTELMAkGA1UEBhMCQ04xEDAOBgNVBAgTB05hbkppbmcxEDAOBgNVBAcTB05h
    ...
    Fzh51DLVlbmTyeagMXo=
    -----END CERTIFICATE-----
  requestheader-allowed-names: '["front-proxy-client"]'
  requestheader-client-ca-file: |
    -----BEGIN CERTIFICATE-----
    MIIDmjCCAoKgAwIBAgIULBdSC4QJy1MBYwGDb0b9g7YMDH0wDQYJKoZIhvcNAQEL
    ...
    QKHtdMypc3mPUO6sBcY=
    -----END CERTIFICATE-----
  requestheader-extra-headers-prefix: '["X-Remote-Extra-"]'
  requestheader-group-headers: '["X-Remote-Group"]'
  requestheader-username-headers: '["X-Remote-User"]'
kind: ConfigMap
metadata:
  creationTimestamp: "2020-02-09T12:04:05Z"
  name: extension-apiserver-authentication
  namespace: kube-system
  resourceVersion: "2930987"
  selfLink: /api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication
  uid: 6cb1a94e-78e5-49c9-8f5a-ae8183f8de96
Installing metrics-server
Before metrics-server is installed, the kubelet already collects system metrics, but they are only accessible through the kubelet's own endpoints; calling kubectl top nodes reports the following error:
$ kubectl top nodes
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)
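The raw data does exist on the kubelet, though; it can be queried directly through the API server's node proxy, for example (the node name master is from this cluster, and the nodes/proxy subresource must be permitted by RBAC):

$ kubectl get --raw "/api/v1/nodes/master/proxy/stats/summary" | head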
Download the metrics-server image and push it to your own private registry:
docker pull gcr.azk8s.cn/google_containers/metrics-server-amd64:v0.3.6
docker tag gcr.azk8s.cn/google_containers/metrics-server-amd64:v0.3.6 192.168.0.107/k8s/metrics-server-amd64:v0.3.6
docker push 192.168.0.107/k8s/metrics-server-amd64:v0.3.6
Download the metrics-server deployment files:
$ cd /opt/k8s/work/
$ wget https://github.com/kubernetes-sigs/metrics-server/archive/master.zip
$ unzip master.zip
$ cd metrics-server-master/deploy/kubernetes
Modify metrics-server-deployment.yaml: add two command-line arguments for metrics-server and change the image to point at the private registry:
$ diff metrics-server-deployment.yaml metrics-server-deployment.yaml.bak
32c32
< image: 192.168.0.107/k8s/metrics-server-amd64:v0.3.6
---
> image: k8s.gcr.io/metrics-server-amd64:v0.3.6
36,37d35
< - --metric-resolution=30s
< - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
--kubelet-preferred-address-types: prefer reaching the kubelet by IP; otherwise the node's hostname is used, and the default CoreDNS installation cannot resolve node hostnames. (--metric-resolution=30s simply sets how often metrics-server scrapes metrics from the kubelets.) Alternatively, hostname resolution can be kept by adding a hosts block to the CoreDNS Corefile:
...
  Corefile: |
    .:53 {
        errors
        health
        hosts {
            192.168.0.107 master
            192.168.0.114 slave
            fallthrough
        }
        ...
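If you go the CoreDNS route, the edit can be made with the commands below and the CoreDNS pods restarted afterwards; the ConfigMap and Deployment are usually both named coredns, but that depends on how CoreDNS was installed:

kubectl -n kube-system edit configmap coredns
kubectl -n kube-system rollout restart deployment coredns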
Start metrics-server:
$ cd /opt/k8s/work/metrics-server-master/deploy/kubernetes
$ kubectl create -f .
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
Check that it is running:
$ kubectl -n kube-system get all -l k8s-app=metrics-server
NAME READY STATUS RESTARTS AGE
pod/metrics-server-857d7c4878-swpvk 1/1 Running 0 72s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/metrics-server 1/1 1 1 72s
NAME DESIRED CURRENT READY AGE
replicaset.apps/metrics-server-857d7c4878 1 1 1 72s
Look at the metrics that metrics-server exposes:
$ kubectl get --raw https://192.168.0.107:6443/apis/metrics.k8s.io/v1beta1/nodes | jq .
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
  },
  "items": [
    {
      "metadata": {
        "name": "master",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/master",
        "creationTimestamp": "2020-02-27T09:30:12Z"
      },
      "timestamp": "2020-02-27T09:29:35Z",
      "window": "30s",
      "usage": {
        "cpu": "414650216n",
        "memory": "6069004Ki"
      }
    },
    {
      "metadata": {
        "name": "slave",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/slave",
        "creationTimestamp": "2020-02-27T09:30:12Z"
      },
      "timestamp": "2020-02-27T09:29:35Z",
      "window": "30s",
      "usage": {
        "cpu": "80942639n",
        "memory": "2393408Ki"
      }
    }
  ]
}
$ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master 427m 10% 5928Mi 76%
slave 92m 2% 2335Mi 62%
A problem encountered: after startup all the services looked healthy, yet no metrics could be retrieved:
$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
and the kube-apiserver log kept printing the following error:
E0227 16:52:50.472445 19192 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.0.0.96:443/apis/metrics.k8s.io/v1beta1: bad status from https://10.0.0.96:443/apis/metrics.k8s.io/v1beta1: 403
According to this error, kube-apiserver lacked permission when calling metrics-server, yet we had clearly supplied the proxy-* related flags. Checking the kube-apiserver startup file one more time showed that the --proxy-client-cert-file line was missing its trailing line-continuation backslash:
Wrong:
--proxy-client-cert-file=/etc/kubernetes/cert/proxy-client.pem
--proxy-client-key-file=/etc/kubernetes/cert/proxy-client-key.pem \
Correct:
--proxy-client-cert-file=/etc/kubernetes/cert/proxy-client.pem \
--proxy-client-key-file=/etc/kubernetes/cert/proxy-client-key.pem \