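Installing the Monitoring Add-ons

In a Kubernetes cluster, the Heapster add-on aggregates resource usage statistics for the whole cluster. Paired with InfluxDB to store the monitoring data and Grafana to provide analysis and visualization, the three together make it possible to monitor the cluster's resource usage.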

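⚠️ Note that these three add-ons do not collect the data themselves. Collection is done by cAdvisor, an agent built into the kubelet on each node. Heapster actively polls every cAdvisor for the data it needs, as shown in the figure below: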

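Heapster runs as a Pod on one of the Nodes and is exposed through a Service, so that other components can reach it at a stable IP address. It gathers data from every cAdvisor in the cluster and serves the aggregate from a single address. When InfluxDB is configured, Heapster stores all of the collected monitoring data there, and Grafana uses that database as its data source to visualize the data.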
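
To get a feel for the per-node data that Heapster aggregates, the sketch below reads one node's statistics from the kubelet through the API server proxy. It assumes the kubelet Summary API path /stats/summary, which this article does not mention and which is not necessarily the exact endpoint Heapster calls. The function names and the node name node-1 are hypothetical.

```shell
#!/bin/sh
# Build the API-server proxy path for one node's kubelet Summary API.
# Assumption: the kubelet serves its statistics under /stats/summary.
kubelet_summary_path() {
  printf '/api/v1/nodes/%s/proxy/stats/summary\n' "$1"
}

# Fetch the raw JSON statistics that the kubelet reports for one node.
kubelet_summary() {
  kubectl get --raw "$(kubelet_summary_path "$1")"
}

# Usage (node-1 is a hypothetical node name, see `kubectl get nodes`):
#   kubelet_summary node-1
```

The response is a JSON document with node-level and pod-level CPU, memory, and filesystem figures, which is the kind of raw input that Heapster rolls up for the whole cluster.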

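Installing the Three Add-ons

First, download the source code from GitHub. It contains the YAML manifests for the add-ons, which need some changes before they will work: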

🌈 ➜ git clone https://github.com/kubernetes-retired/heapster.git
🌈 ➜ cd heapster/deploy/kube-config/influxdb
🌈 ➜ ls -l
total 24
-rw-r--r--@ 1 yangsijie  staff  2338  4  5 19:44 grafana.yaml
-rw-r--r--@ 1 yangsijie  staff  1490  4  5 19:34 heapster.yaml
-rw-r--r--@ 1 yangsijie  staff  1025  4  5 18:22 influxdb.yaml

The three YAML files above correspond to the three add-ons.

Installing Heapster

Before installing from the YAML manifest, it needs the following four changes:

Change the Deployment's apiVersion to apps/v1;
Set the Deployment's spec.selector explicitly (it is required in apps/v1), otherwise creation fails validation;
Grant permissions to the ServiceAccount that Heapster uses, otherwise a permission error like the following appears: 403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:heapster, verb=create, resource=nodes, subresource=stats)";
Change Heapster's startup arguments so that it scrapes the kubelet over HTTPS on port 10250 instead of port 10255, otherwise an error like the following appears: Error in scraping containers from kubelet:192.168.10.77:10255: failed to get all container stats from Kubelet URL "http://192.168.10.77:10255/stats/container/": Post http://192.168.10.77:10255/stats/container/: dial tcp 192.168.10.77:10255: getsockopt: connection refused;

The modified YAML manifest is as follows:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: heapster
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: heapster-cluster-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: heapster
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: heapster
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: heapster
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: heapster
    spec:
      serviceAccountName: heapster
      containers:
      - name: heapster
        image: k8s.gcr.io/heapster-amd64:v1.5.4
        imagePullPolicy: IfNotPresent
        command:
        - /heapster
        - --source=kubernetes:https://kubernetes.default?kubeletHttps=true&kubeletPort=10250&insecure=true
        - --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086
---
apiVersion: v1
kind: Service
metadata:
  labels:
    task: monitoring
    # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
    # If you are NOT using this as an addon, you should comment out this line.
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: Heapster
  name: heapster
  namespace: kube-system
spec:
  ports:
  - port: 80
    targetPort: 8082
  selector:
    k8s-app: heapster

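Compared with the original file:
Lines 7-18 were added (the ClusterRoleBinding);
Line 20 was changed (the Deployment's apiVersion);
Lines 27-30 were added (spec.selector);
Line 44 was changed (the --source argument);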
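
The manifest can then be applied and checked along the following lines. This is a minimal sketch that assumes the modified file is saved as heapster.yaml in the current directory. The function names are hypothetical; the kubectl subcommands are standard ones.

```shell
#!/bin/sh
# Apply the modified manifest.
# Assumption: it is saved as heapster.yaml in the current directory.
deploy_heapster() {
  kubectl apply -f heapster.yaml
}

# Ask the API server whether the heapster ServiceAccount may create
# nodes/stats, the permission named in the 403 error. Prints "yes" or "no".
check_heapster_rbac() {
  kubectl auth can-i create nodes --subresource=stats \
    --as=system:serviceaccount:kube-system:heapster
}

# Show the most recent Heapster log lines to spot scraping errors.
heapster_logs() {
  kubectl -n kube-system logs deployment/heapster --tail=20
}

# Usage:
#   deploy_heapster && check_heapster_rbac && heapster_logs
```

If check_heapster_rbac prints "no", the ClusterRoleBinding was probably not created. If the logs still show connection refused on port 10255, the --source argument is the first thing to check.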

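Installing InfluxDB

Before installing from the corresponding YAML manifest, the file was likewise modified, in two places:

Change the Deployment's apiVersion to apps/v1;
Set the Deployment's spec.selector explicitly, otherwise creation fails validation;

The modified YAML manifest is as follows: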

apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-influxdb
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: influxdb
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: influxdb
    spec:
      containers:
      - name: influxdb
        image: k8s.gcr.io/heapster-influxdb-amd64:v1.5.2
        volumeMounts:
        - mountPath: /data
          name: influxdb-storage
      volumes:
      - name: influxdb-storage
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    task: monitoring
    # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
    # If you are NOT using this as an addon, you should comment out this line.
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-influxdb
  name: monitoring-influxdb
  namespace: kube-system
spec:
  ports:
  - port: 8086
    targetPort: 8086
  selector:
    k8s-app: influxdb

Compared with the original file:
Line 1 was changed;
Lines 8-11 were added;
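
Once InfluxDB and Heapster are both running, one way to confirm that data is arriving is to query InfluxDB over its HTTP API. The sketch below assumes an InfluxDB 1.x /query endpoint, a local port-forward on 8086, and a database named k8s (believed to be the default name used by Heapster's InfluxDB sink, but not confirmed by this article). The function names are hypothetical.

```shell
#!/bin/sh
# Forward local port 8086 to the monitoring-influxdb Service.
# Runs in the foreground until interrupted, so use a second terminal.
influx_forward() {
  kubectl -n kube-system port-forward svc/monitoring-influxdb 8086:8086
}

# Send one InfluxQL statement to the InfluxDB 1.x HTTP /query endpoint.
# $1: the statement; $2: database name (assumed default: k8s).
influx_query() {
  curl -sG 'http://localhost:8086/query' \
    --data-urlencode "db=${2:-k8s}" \
    --data-urlencode "q=$1"
}

# Usage, with influx_forward running in another terminal:
#   influx_query 'SHOW MEASUREMENTS'
```

An empty result usually means that Heapster has not written anything yet, in which case its logs are the next place to look.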

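Installing Grafana

Before installing from the corresponding YAML manifest, the file was likewise modified, in three places:

Change the Deployment's apiVersion to apps/v1;
Set the Deployment's spec.selector explicitly, otherwise creation fails validation;
Change the type of the Service that exposes Grafana to NodePort, so that users can reach it from outside the cluster;

The modified YAML manifest is as follows: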

apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-grafana
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: k8s.gcr.io/heapster-grafana-amd64:v5.0.4
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ca-certificates
          readOnly: true
        - mountPath: /var
          name: grafana-storage
        env:
        - name: INFLUXDB_HOST
          value: monitoring-influxdb
        - name: GF_SERVER_HTTP_PORT
          value: "3000"
          # The following env variables are required to make Grafana accessible via
          # the kubernetes api-server proxy. On production clusters, we recommend
          # removing these env variables, setup auth for grafana, and expose the grafana
          # service using a LoadBalancer or a public IP.
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
        - name: GF_SERVER_ROOT_URL
          # If you're only using the API Server proxy, set this value instead:
          # value: /api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
          value: /
      volumes:
      - name: ca-certificates
        hostPath:
          path: /etc/ssl/certs
      - name: grafana-storage
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
    # If you are NOT using this as an addon, you should comment out this line.
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: monitoring-grafana
  namespace: kube-system
spec:
  # In a production setup, we recommend accessing Grafana through an external Loadbalancer
  # or through a public IP.
  # type: LoadBalancer
  # You could also use NodePort to expose the service at a randomly-generated port
  type: NodePort
  ports:
  - port: 80
    targetPort: 3000
  selector:
    k8s-app: grafana

Compared with the original file:
Line 1 was changed;
Lines 8-11 were added;
Line 71 was changed;
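
After all three manifests have been applied (for example with kubectl apply -f on each file), a short loop can wait for the Deployments to become ready. This is a sketch: the function name and the 120-second timeout are arbitrary choices, and the Deployment names are taken from the manifests above.

```shell
#!/bin/sh
# Wait until each monitoring Deployment in kube-system finishes rolling out.
# Returns non-zero as soon as one of them fails or times out.
wait_for_monitoring() {
  for name in heapster monitoring-influxdb monitoring-grafana; do
    kubectl -n kube-system rollout status "deployment/$name" --timeout=120s || return 1
  done
}

# Usage:
#   wait_for_monitoring && echo "monitoring stack is ready"
```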

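Opening the Grafana UI

Look up the port that the Grafana Service is exposed on. In the output below, Service port 80 is mapped to NodePort 31906, so the web UI can be opened on port 31906 of any node's IP address: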

🌈 ➜  kubectl get services -n kube-system
NAME                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
monitoring-grafana    NodePort    10.105.118.191   <none>        80:31906/TCP             3h14m
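
Instead of reading the port off the table by hand, the URL can be assembled with jsonpath queries. The sketch below assumes that the first node's InternalIP is reachable from the machine running the browser. The function names are hypothetical.

```shell
#!/bin/sh
# Read the NodePort assigned to the monitoring-grafana Service.
grafana_node_port() {
  kubectl -n kube-system get service monitoring-grafana \
    -o jsonpath='{.spec.ports[0].nodePort}'
}

# Read the InternalIP of the first node in the cluster.
# Assumption: that address is reachable from where the browser runs.
first_node_ip() {
  kubectl get nodes \
    -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}'
}

# Print the URL at which Grafana should be reachable from outside the cluster.
grafana_url() {
  printf 'http://%s:%s/\n' "$(first_node_ip)" "$(grafana_node_port)"
}

# Usage:
#   grafana_url
```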