Prometheus Operator: A More Elegant Monitoring Tool for Kubernetes

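1. Introduction to Kubernetes Operators

With Kubernetes, managing and scaling web applications, mobile application backends, and API services has become fairly simple. These applications are generally stateless, so basic Kubernetes API objects such as Deployment can scale them and recover them from failures without any additional operations.

Managing stateful applications such as databases, caches, or monitoring systems is more of a challenge. These systems need application-domain knowledge to scale and upgrade correctly, and to reconfigure themselves effectively when data is lost or unavailable. Ideally, this application-specific operational expertise is encoded in software, so that complex applications can be run and managed correctly on top of Kubernetes.

An Operator is software that extends the Kubernetes API through the TPR mechanism (Third Party Resource, since superseded by CRD, Custom Resource Definition) and builds application-specific knowledge into it, so that users can create, configure, and manage applications. Like Kubernetes's built-in resources, an Operator manages not a single application instance but multiple instances across the cluster.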
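
The idea of encoding operational knowledge in software can be illustrated with a toy reconcile step: compare the desired state declared in a custom resource with the observed state, and derive the actions that close the gap. This is only a minimal sketch; `DesiredState`, `reconcile`, and the instance names are hypothetical and are not part of any Operator framework.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DesiredState:
    """Hypothetical custom resource spec: how many instances the user asked for."""
    replicas: int


def reconcile(desired: DesiredState, running: list[str]) -> list[str]:
    """Return the actions needed to move the observed state toward the desired state."""
    actions = []
    # Scale up: create instances until the count matches the spec.
    for i in range(len(running), desired.replicas):
        actions.append(f"create instance-{i}")
    # Scale down: remove surplus instances, newest first.
    for name in reversed(running[desired.replicas:]):
        actions.append(f"delete {name}")
    return actions


# One instance is running, three are wanted: two must be created.
print(reconcile(DesiredState(replicas=3), ["instance-0"]))
```

A real Operator runs a step like this continuously, triggered by change events on its custom resources, and applies domain-specific logic (backups, ordered upgrades, and so on) rather than simple counting.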

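2. Introduction to Prometheus Operator

The Prometheus Operator for Kubernetes provides simple monitoring definitions for Kubernetes services, and simple deployment and management of Prometheus instances.

Once installed, the Prometheus Operator provides the following features:

Create/Destroy: Easily launch a Prometheus instance in a Kubernetes namespace, for a specific application or team, using the Operator.
Simple Configuration: Configure the fundamentals of Prometheus, such as versions, persistence, retention policies, and replicas, from a native Kubernetes resource.
Target Services via Labels: Automatically generate monitoring target configurations based on familiar Kubernetes label queries; there is no need to learn a Prometheus-specific configuration language.

The Prometheus Operator architecture is shown below: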

[Figure: Prometheus Operator architecture diagram]

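The components in the architecture above run in the Kubernetes cluster as different kinds of resources, each with its own role:

Operator: The Operator deploys and manages Prometheus Server according to custom resources (defined through Custom Resource Definitions, CRDs), and watches for change events on those custom resources in order to act on them. It is the control center of the whole system.
Prometheus: The Prometheus resource declaratively describes the desired state of a Prometheus deployment.
Prometheus Server: The Prometheus Server cluster that the Operator deploys according to what is defined in the Prometheus custom resource. The custom resource can be regarded as a way of managing the StatefulSets that run the Prometheus Server cluster.
ServiceMonitor: ServiceMonitor is also a custom resource. It describes the list of targets that Prometheus monitors. It selects the corresponding Service endpoints by label, so that Prometheus Server obtains metrics through the selected Services.
Service: The Service resource mainly corresponds to the Pods in the cluster that serve metrics. It is what a ServiceMonitor selects so that Prometheus Server can scrape it. Put simply, Services are the objects that Prometheus monitors, for example the Node Exporter Service, the MySQL Exporter Service, and so on.
Alertmanager: Alertmanager is also a custom resource type. The Operator deploys an Alertmanager cluster according to the resource description.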
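
The label-based selection that ties a ServiceMonitor to its Services can be sketched in a few lines. This is a simplified model under stated assumptions: the `Service` class, the `select_services` helper, and the sample Services are all made up for illustration, and only `matchLabels` plus a list of namespace names are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Minimal stand-in for a Kubernetes Service (illustrative, not the real API object)."""
    name: str
    namespace: str
    labels: dict = field(default_factory=dict)


def select_services(services, match_labels, namespaces):
    """Return names of Services in the given namespaces that carry every matchLabels pair."""
    return [
        s.name
        for s in services
        if s.namespace in namespaces
        and all(s.labels.get(k) == v for k, v in match_labels.items())
    ]


# Hypothetical Services; names, namespaces, and labels are made up for the example.
services = [
    Service("node-exporter", "monitoring", {"app": "node-exporter"}),
    Service("mysql-exporter", "monitoring", {"app": "mysql-exporter"}),
    Service("node-exporter", "staging", {"app": "node-exporter"}),
]

# Only the Service in the "monitoring" namespace with app=node-exporter is selected.
print(select_services(services, {"app": "node-exporter"}, ["monitoring"]))
```

The point of this design is that adding a new scrape target means labeling a Service, not editing the Prometheus configuration by hand.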

3. Deploying Prometheus Operator

Environment:

Kubernetes version: 1.12 (installed with kubeadm)
helm version: v2.11.0

We install with Helm, using the prometheus-operator chart, modified to suit our actual usage.

The chart bundles Grafana and the exporters that monitor Kubernetes. Note that I configured Grafana to store its data in MySQL; this is explained in another article, "Deploying Prometheus and Grafana with Helm to Monitor Kubernetes".

cd helm/prometheus-operator/
helm install --name prometheus-operator --namespace monitoring -f values.yaml ./

To use Prometheus Operator more flexibly, adding custom monitoring is essential. Here we use ceph-exporter as an example.

The following section of values.yaml is what adds monitoring through a ServiceMonitor:

serviceMonitor:
  enabled: true  # enable monitoring
  # on what port are the metrics exposed by the ceph exporter
  exporterPort: 9128
  # for apps that have been deployed outside of the cluster, list their addresses here
  endpoints: []
  # Are we talking http or https?
  scheme: http
  # service selector label key to target ceph exporter pods
  serviceSelectorLabelKey: app
  # default rules are in templates/ceph-exporter.rules.yaml
  prometheusRules: {}
  # Custom Labels to be added to ServiceMonitor
  # In testing, adding the Prometheus Operator release label to the ServiceMonitor was enough for monitoring to work
  additionalServiceMonitorLabels: 
    release: prometheus-operator
  #Custom Labels to be added to Prometheus Rules CRD
  additionalRulesLabels: {}

The most important parameter is additionalServiceMonitorLabels. In testing, the ServiceMonitor had to carry a label that the Prometheus Operator release already has before monitoring was added successfully.

[root@lab1 templates]# kubectl get servicemonitor -n monitoring ceph-exporter -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: 2018-10-30T06:51:12Z
  generation: 1
  labels:
    app: ceph-exporter
    chart: ceph-exporter-0.1.0
    heritage: Tiller
    prometheus: ceph-exporter
    release: prometheus-operator
  name: ceph-exporter
  namespace: monitoring
  resourceVersion: "13937459"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/servicemonitors/ceph-exporter
  uid: 30569173-dc10-11e8-bcf3-000c293d66a5
spec:
  endpoints:
  - interval: 30s
    port: http
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      app: ceph-exporter
      release: ceph-exporter

[root@lab1 prometheus-operator]# kubectl get pod -n monitoring  prometheus-operator-operator-7459848949-8dddt -o yaml|more
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: 2018-10-30T00:39:37Z
  generateName: prometheus-operator-operator-7459848949-
  labels:
    app: prometheus-operator-operator
    chart: prometheus-operator-0.1.6
    heritage: Tiller
    pod-template-hash: "745984894
    release: prometheus-operator

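Key points:

The ServiceMonitor's labels must include at least one that matches the labels on the prometheus-operator Pod (here release: prometheus-operator). This is presumably because the Prometheus resource created by the chart selects ServiceMonitors by that label through its serviceMonitorSelector;
Check the ServiceMonitor's spec parameters;
The Service must be reachable by Prometheus, and every endpoint must be healthy;
If you run into problems, enable debug logging for the Prometheus Operator and for Prometheus. The logs contain little other information, but the Prometheus Operator debug log shows which ServiceMonitors are currently being watched, so you can confirm whether the ServiceMonitor you installed was matched.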
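
When a ServiceMonitor is not picked up, it helps to compare its labels against the labels that are expected on it. The sketch below does that for the ceph-exporter ServiceMonitor shown above. It assumes that release: prometheus-operator is the label being selected on, which matches the test result reported here but is not confirmed from the chart; `missing_labels` is a hypothetical helper.

```python
def missing_labels(required: dict, actual: dict) -> dict:
    """Return the required label pairs that are absent from, or different in, actual."""
    return {k: v for k, v in required.items() if actual.get(k) != v}


# Assumption: ServiceMonitors are selected on the release label of the operator release.
required = {"release": "prometheus-operator"}

# Labels of the ceph-exporter ServiceMonitor, copied from the kubectl output.
with_label = {
    "app": "ceph-exporter",
    "chart": "ceph-exporter-0.1.0",
    "heritage": "Tiller",
    "prometheus": "ceph-exporter",
    "release": "prometheus-operator",
}

# Hypothetical variant: the same ServiceMonitor without additionalServiceMonitorLabels.
without_label = {k: v for k, v in with_label.items() if k != "release"}

print(missing_labels(required, with_label))     # nothing missing
print(missing_labels(required, without_label))  # the release label is missing
```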

After a successful installation, check the related resources:

[root@lab1 prometheus-operator]# kubectl get service,servicemonitor,ep -n monitoring
NAME                                                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/alertmanager-operated                          ClusterIP   None             <none>        9093/TCP,6783/TCP   12d
service/ceph-exporter                                  ClusterIP   10.100.57.62     <none>        9128/TCP            46h
service/monitoring-mysql-mysql                         ClusterIP   10.108.93.155    <none>        3306/TCP            42d
service/prometheus-operated                            ClusterIP   None             <none>        9090/TCP            12d
service/prometheus-operator-alertmanager               ClusterIP   10.98.42.209     <none>        9093/TCP            6d19h
service/prometheus-operator-grafana                    ClusterIP   10.103.100.150   <none>        80/TCP              6d19h
service/prometheus-operator-kube-state-metrics         ClusterIP   10.110.76.250    <none>        8080/TCP            6d19h
service/prometheus-operator-operator                   ClusterIP   None             <none>        8080/TCP            6d19h
service/prometheus-operator-prometheus                 ClusterIP   10.111.24.83     <none>        9090/TCP            6d19h
service/prometheus-operator-prometheus-node-exporter   ClusterIP   10.97.126.74     <none>        9100/TCP            6d19h

NAME                                                                               AGE
servicemonitor.monitoring.coreos.com/ceph-exporter                                 1d
servicemonitor.monitoring.coreos.com/prometheus-operator                           8d
servicemonitor.monitoring.coreos.com/prometheus-operator-alertmanager              6d
servicemonitor.monitoring.coreos.com/prometheus-operator-apiserver                 6d
servicemonitor.monitoring.coreos.com/prometheus-operator-coreDNS                   6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kube-controller-manager   6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kube-etcd                 6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kube-scheduler            6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kube-state-metrics        6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kubelet                   6d
servicemonitor.monitoring.coreos.com/prometheus-operator-node-exporter             6d
servicemonitor.monitoring.coreos.com/prometheus-operator-operator                  6d
servicemonitor.monitoring.coreos.com/prometheus-operator-prometheus                6d

NAME                                                     ENDPOINTS                                                                 AGE
endpoints/alertmanager-operated                          10.244.6.174:9093,10.244.6.174:6783                                       12d
endpoints/ceph-exporter                                  10.244.2.59:9128                                                          46h
endpoints/monitoring-mysql-mysql                         10.244.6.171:3306                                                         42d
endpoints/prometheus-operated                            10.244.2.60:9090,10.244.6.175:9090                                        12d
endpoints/prometheus-operator-alertmanager               10.244.6.174:9093                                                         6d19h
endpoints/prometheus-operator-grafana                    10.244.6.106:3000                                                         6d19h
endpoints/prometheus-operator-kube-state-metrics         10.244.2.163:8080                                                         6d19h
endpoints/prometheus-operator-operator                   10.244.6.113:8080                                                         6d19h
endpoints/prometheus-operator-prometheus                 10.244.2.60:9090,10.244.6.175:9090                                        6d19h
endpoints/prometheus-operator-prometheus-node-exporter   192.168.105.92:9100,192.168.105.93:9100,192.168.105.94:9100 + 4 more...   6d19h

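4. Adding dashboards to Grafana

The _dashboards directory in the prometheus-operator chart above contains dashboards that I have modified, and they are fairly comprehensive. Import them manually through the Grafana UI; they can then be edited freely, which is very convenient in practice. If the dashboard JSON files are instead placed in the dashboards directory and installed by Helm, the installed dashboards cannot be edited directly in Grafana, which is rather inconvenient.

5. Adding alerts to Alertmanager

Add a PrometheusRule. The following is an example: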

[root@lab1 ceph-exporter]# kubectl get prometheusrule -n monitoring ceph-exporter -o yaml 
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: 2018-10-30T06:51:12Z
  generation: 1
  labels:
    app: prometheus
    chart: ceph-exporter-0.1.0
    heritage: Tiller
    prometheus: ceph-exporter
    release: ceph-exporter
  name: ceph-exporter
  namespace: monitoring
  resourceVersion: "13965150"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheusrules/ceph-exporter
  uid: 30543ec9-dc10-11e8-bcf3-000c293d66a5
spec:
  groups:
  - name: ceph-exporter.rules
    rules:
    - alert: Ceph
      annotations:
        description: There is no running ceph exporter.
        summary: Ceph exporter is down
      expr: absent(up{job="ceph-exporter"} == 1)
      for: 5m
      labels:
        severity: critical
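
To make the rule's behavior concrete, the sketch below models `absent(up{job="ceph-exporter"} == 1)` together with `for: 5m`. It is a simplified model, not the Prometheus rule engine: `absent_up`, `alert_states`, and the sample evaluations are made up for illustration. The condition holds when no series of the job has the value 1 (the exporter is down or has no targets at all), and the alert only fires once the condition has held for the whole `for` duration.

```python
def absent_up(up_values):
    """Mimic absent(up{job=...} == 1): true when no series has the value 1."""
    return not any(v == 1 for v in up_values)


def alert_states(evaluations, hold_seconds=300):
    """Yield (timestamp, state) per evaluation, applying the `for` duration."""
    active_since = None
    for ts, up_values in evaluations:
        if absent_up(up_values):
            if active_since is None:
                active_since = ts  # condition just became true
            state = "firing" if ts - active_since >= hold_seconds else "pending"
        else:
            active_since = None  # condition cleared, reset the timer
            state = "inactive"
        yield ts, state


# Hypothetical samples: (evaluation time in seconds, up values of the job's targets).
evaluations = [(0, [1]), (60, [0]), (120, []), (360, [0]), (420, [1])]

for ts, state in alert_states(evaluations):
    print(ts, state)
```

In this run the alert is pending at 60s and 120s, fires at 360s (five minutes after the condition first held), and returns to inactive at 420s when the exporter is up again.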

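The default rules for monitoring Kubernetes are already numerous and comprehensive. You can adjust prometheus-operator/templates/all-prometheus-rules.yaml as needed.

Alert routing and notification can be configured by editing the following section under alertmanager: in values.yaml: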

  config:
    global:
      resolve_timeout: 5m
      # The smarthost and SMTP sender used for mail notifications.
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: 'xxxxxx@163.com'
      smtp_auth_username: 'xxxxxx@163.com'
      smtp_auth_password: 'xxxxxx'
      # The API URL to use for Slack notifications.
      slack_api_url: 'https://hooks.slack.com/services/some/api/token'
    route:
      group_by: ["job", "alertname"]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'noemail'
      routes:
      - match:
          severity: critical
        receiver: critical_email_alert
      - match_re:
          alertname: "^KubeJob.*"
        receiver: default_email

    receivers:
      - name: 'default_email'
        email_configs:
        - to : 'xxxxxx@163.com'
          send_resolved: true

      - name: 'critical_email_alert'
        email_configs:
        - to : 'xxxxxx@163.com'
          send_resolved: true

      - name: 'noemail'
        email_configs:
        - to : 'null@null.cn'
          send_resolved: false

  ## Alertmanager template files to format alerts
  ## ref: https://prometheus.io/docs/alerting/notifications/
  ##      https://prometheus.io/docs/alerting/notification_examples/
  ##
  templateFiles:
    template_1.tmpl: |-
      {{ define "cluster" }}{{ .ExternalURL | reReplaceAll ".*alertmanager\\.(.*)" "$1" }}{{ end }}

      {{ define "slack.k8s.text" }}
      {{- $root := . -}}
      {{ range .Alerts }}
       *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
       *Cluster:*  {{ template "cluster" $root }}
       *Description:* {{ .Annotations.description }}
       *Graph:* <{{ .GeneratorURL }}|:chart_with_upwards_trend:>
       *Runbook:* <{{ .Annotations.runbook }}|:spiral_note_pad:>
       *Details:*
         {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
         {{ end }}
      {{ end }}
      {{ end }}
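
The routing tree in the config section above can be read as "first matching child route wins, otherwise the default receiver". The sketch below models that reading in Python. It is an approximation under stated assumptions: `ROUTES`, `DEFAULT_RECEIVER`, and `route` are illustrative names, the `continue` option and nested routes are not modeled, and the sample alert names are made up. Alertmanager anchors `match_re` patterns, so the sketch uses `re.fullmatch`, and the pattern is written with a trailing `.*` on the assumption that every alert name beginning with KubeJob should match.

```python
import re

# Child routes in the order they appear under route.routes.
ROUTES = [
    {"match": {"severity": "critical"}, "receiver": "critical_email_alert"},
    {"match_re": {"alertname": "^KubeJob.*"}, "receiver": "default_email"},
]
DEFAULT_RECEIVER = "noemail"  # receiver of the top-level route


def route(labels):
    """Return the receiver for an alert: first matching child route, else the default."""
    for r in ROUTES:
        # `match` requires exact label equality.
        eq_ok = all(labels.get(k) == v for k, v in r.get("match", {}).items())
        # `match_re` requires the whole label value to match the pattern.
        re_ok = all(
            re.fullmatch(p, labels.get(k, "")) for k, p in r.get("match_re", {}).items()
        )
        if eq_ok and re_ok:
            return r["receiver"]
    return DEFAULT_RECEIVER


print(route({"alertname": "Ceph", "severity": "critical"}))
print(route({"alertname": "KubeJobFailed", "severity": "warning"}))
print(route({"alertname": "SomethingElse", "severity": "warning"}))
```

Because the critical route comes first, a critical alert whose name starts with KubeJob goes to critical_email_alert, and anything that matches neither child route falls through to the noemail receiver.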

6. Summary

By defining ServiceMonitor and PrometheusRule resources, Prometheus Operator adjusts the Prometheus and Alertmanager configuration dynamically. This fits the way Kubernetes is normally operated and makes Kubernetes monitoring more elegant.
