With Kubernetes, managing and scaling web applications, mobile backends, and API services has become fairly simple. These applications are generally stateless, so basic Kubernetes API objects such as Deployment can scale them and recover them from failures without any additional work.
Managing stateful applications such as databases, caches, or monitoring systems is harder. These systems need application-domain knowledge to scale and upgrade correctly, and to reconfigure effectively when data is lost or unavailable. We want this application-specific operational knowledge to be encoded in software, so that it can use Kubernetes to run and manage complex applications correctly.
An Operator is software that extends the Kubernetes API through the TPR (Third Party Resource, since superseded by CRD) mechanism and embeds application-specific knowledge, so that users can create, configure, and manage applications. Like the built-in Kubernetes resources, an Operator manages not a single application instance but multiple instances across the cluster.
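The Prometheus Operator for Kubernetes provides simple monitoring definitions for Kubernetes services, and for the deployment and management of Prometheus instances.
Once installed, the Prometheus Operator provides the following features:
The Prometheus Operator architecture diagram is as follows:
The components in this architecture run in the Kubernetes cluster as different kinds of resources, each with its own role: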
Operator: The Operator deploys and manages Prometheus Server according to custom resources (Custom Resource Definitions, CRDs). It also watches those custom resources for change events and reacts to them. It is the control center of the whole system.
Prometheus: The Prometheus resource declaratively describes the desired state of a Prometheus deployment.
Prometheus Server: The Prometheus Server cluster that the Operator deploys according to the content of the Prometheus custom resource. The custom resource can be thought of as a way to manage the StatefulSets that run the Prometheus Server cluster.
ServiceMonitor: Also a custom resource. It describes a list of targets for Prometheus to monitor. It selects the matching Service endpoints by label, and Prometheus Server scrapes metrics through the selected Services.
Service: The Service resource corresponds to the Pods in the Kubernetes cluster that serve metrics. It is what a ServiceMonitor selects so that Prometheus Server can fetch the data. Put simply, it is the object that Prometheus monitors, for example the Node Exporter Service or the MySQL Exporter Service.
Alertmanager: Also a custom resource type. The Operator deploys an Alertmanager cluster according to the resource's description.
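To make the relationship between these resources more concrete, here is a minimal sketch of a Prometheus custom resource. It assumes the `monitoring.coreos.com/v1` API group that appears in the kubectl output later in this article. The resource name, the replica count, and the selector label are illustrative assumptions and are not taken from the chart.

```yaml
# Sketch only: name, replicas and the selector label are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
  namespace: monitoring
spec:
  # Desired number of Prometheus Server pods; the Operator creates a
  # StatefulSet to run them.
  replicas: 2
  # ServiceMonitors whose labels match this selector are turned into
  # scrape configuration for these servers.
  serviceMonitorSelector:
    matchLabels:
      release: prometheus-operator
```

The Operator reconciles this declared state: changing `replicas` or the selector changes the running servers and their scrape configuration without editing Prometheus configuration files by hand.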
Environment:
Kubernetes 1.12, installed with kubeadm
Helm v2.11.0
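We install with Helm, using a chart adapted to our actual needs. The prometheus-operator chart bundles Grafana and the exporters that monitor Kubernetes. Note that I configured Grafana to store its data in MySQL; the details are in a separate article, "Deploying Prometheus and Grafana with Helm to Monitor Kubernetes".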
cd helm/prometheus-operator/
helm install --name prometheus-operator --namespace monitoring -f values.yaml ./
To use the Prometheus Operator more flexibly, adding custom monitoring is essential. Here we use ceph-exporter as the example.
The following section of values.yaml is what adds the monitoring through a ServiceMonitor:
serviceMonitor:
  enabled: true  # enable monitoring
# on what port are the metrics exposed by the ceph exporter
exporterPort: 9128
# for apps that have been deployed outside of the cluster, list their addresses here
endpoints: []
# Are we talking http or https?
scheme: http
# service selector label key to target ceph exporter pods
serviceSelectorLabelKey: app
# default rules are in templates/ceph-exporter.rules.yaml
prometheusRules: {}
# Custom Labels to be added to ServiceMonitor
# Tested: adding the prometheus operator's release label to the ServiceMonitor labels is enough for monitoring to work
additionalServiceMonitorLabels:
  release: prometheus-operator
# Custom Labels to be added to Prometheus Rules CRD
additionalRulesLabels: {}
The most important parameter is additionalServiceMonitorLabels. In testing, the ServiceMonitor had to carry a label that the prometheus operator already has before the monitoring was added successfully.
[root@lab1 templates]# kubectl get servicemonitor -n monitoring ceph-exporter -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: 2018-10-30T06:51:12Z
  generation: 1
  labels:
    app: ceph-exporter
    chart: ceph-exporter-0.1.0
    heritage: Tiller
    prometheus: ceph-exporter
    release: prometheus-operator
  name: ceph-exporter
  namespace: monitoring
  resourceVersion: "13937459"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/servicemonitors/ceph-exporter
  uid: 30569173-dc10-11e8-bcf3-000c293d66a5
spec:
  endpoints:
  - interval: 30s
    port: http
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      app: ceph-exporter
      release: ceph-exporter
[root@lab1 prometheus-operator]# kubectl get pod -n monitoring prometheus-operator-operator-7459848949-8dddt -o yaml|more
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: 2018-10-30T00:39:37Z
  generateName: prometheus-operator-operator-7459848949-
  labels:
    app: prometheus-operator-operator
    chart: prometheus-operator-0.1.6
    heritage: Tiller
    pod-template-hash: "745984894
    release: prometheus-operator
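Key points:
The ServiceMonitor's labels must include at least one label that matches a label on the prometheus-operator Pod (here, release: prometheus-operator). Strictly speaking, the match is evaluated by the serviceMonitorSelector of the Prometheus resource; the Pod simply carries the same release label.
The Service selected by the ServiceMonitor's spec must be reachable by Prometheus, and its endpoints must be healthy.
After a successful installation, check the related resources: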
[root@lab1 prometheus-operator]# kubectl get service,servicemonitor,ep -n monitoring
NAME                                                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/alertmanager-operated                          ClusterIP   None             <none>        9093/TCP,6783/TCP   12d
service/ceph-exporter                                  ClusterIP   10.100.57.62     <none>        9128/TCP            46h
service/monitoring-mysql-mysql                         ClusterIP   10.108.93.155    <none>        3306/TCP            42d
service/prometheus-operated                            ClusterIP   None             <none>        9090/TCP            12d
service/prometheus-operator-alertmanager               ClusterIP   10.98.42.209     <none>        9093/TCP            6d19h
service/prometheus-operator-grafana                    ClusterIP   10.103.100.150   <none>        80/TCP              6d19h
service/prometheus-operator-kube-state-metrics         ClusterIP   10.110.76.250    <none>        8080/TCP            6d19h
service/prometheus-operator-operator                   ClusterIP   None             <none>        8080/TCP            6d19h
service/prometheus-operator-prometheus                 ClusterIP   10.111.24.83     <none>        9090/TCP            6d19h
service/prometheus-operator-prometheus-node-exporter   ClusterIP   10.97.126.74     <none>        9100/TCP            6d19h

NAME                                                                               AGE
servicemonitor.monitoring.coreos.com/ceph-exporter                                 1d
servicemonitor.monitoring.coreos.com/prometheus-operator                           8d
servicemonitor.monitoring.coreos.com/prometheus-operator-alertmanager              6d
servicemonitor.monitoring.coreos.com/prometheus-operator-apiserver                 6d
servicemonitor.monitoring.coreos.com/prometheus-operator-coreDNS                   6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kube-controller-manager   6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kube-etcd                 6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kube-scheduler            6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kube-state-metrics        6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kubelet                   6d
servicemonitor.monitoring.coreos.com/prometheus-operator-node-exporter             6d
servicemonitor.monitoring.coreos.com/prometheus-operator-operator                  6d
servicemonitor.monitoring.coreos.com/prometheus-operator-prometheus                6d

NAME                                                     ENDPOINTS                                                                 AGE
endpoints/alertmanager-operated                          10.244.6.174:9093,10.244.6.174:6783                                       12d
endpoints/ceph-exporter                                  10.244.2.59:9128                                                          46h
endpoints/monitoring-mysql-mysql                         10.244.6.171:3306                                                         42d
endpoints/prometheus-operated                            10.244.2.60:9090,10.244.6.175:9090                                        12d
endpoints/prometheus-operator-alertmanager               10.244.6.174:9093                                                         6d19h
endpoints/prometheus-operator-grafana                    10.244.6.106:3000                                                         6d19h
endpoints/prometheus-operator-kube-state-metrics         10.244.2.163:8080                                                         6d19h
endpoints/prometheus-operator-operator                   10.244.6.113:8080                                                         6d19h
endpoints/prometheus-operator-prometheus                 10.244.2.60:9090,10.244.6.175:9090                                        6d19h
endpoints/prometheus-operator-prometheus-node-exporter   192.168.105.92:9100,192.168.105.93:9100,192.168.105.94:9100 + 4 more...   6d19h
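The listing shows service/ceph-exporter on port 9128, and the ServiceMonitor printed earlier selects Services labelled `app: ceph-exporter` and `release: ceph-exporter` and scrapes the port named `http`. A Service that satisfies that selector might look roughly like the sketch below. The pod selector and `targetPort` are assumptions, since the actual Service manifest is not shown in this article.

```yaml
# Sketch of a Service the ceph-exporter ServiceMonitor would select.
apiVersion: v1
kind: Service
metadata:
  name: ceph-exporter
  namespace: monitoring          # listed in the ServiceMonitor's namespaceSelector
  labels:
    app: ceph-exporter           # both labels must match spec.selector.matchLabels
    release: ceph-exporter       # of the ServiceMonitor
spec:
  selector:
    app: ceph-exporter           # assumed pod label
  ports:
  - name: http                   # the ServiceMonitor endpoint refers to this port name
    port: 9128
    targetPort: 9128             # assumed container port
```

Note that a ServiceMonitor endpoint's `port` field refers to the Service port by name, not by number, so an unnamed port would not be scraped.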
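The _dashboards directory in the prometheus-operator chart above contains dashboards that I have modified; they are fairly comprehensive. Import them manually through the Grafana UI, after which they can be edited freely, which is very convenient in practice. If you instead put the dashboard JSON files into the dashboards directory and let Helm install them, the installed dashboards cannot be edited directly in Grafana, which is cumbersome.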
Next, add a PrometheusRule. Here is an example:
[root@lab1 ceph-exporter]# kubectl get prometheusrule -n monitoring ceph-exporter -o yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: 2018-10-30T06:51:12Z
  generation: 1
  labels:
    app: prometheus
    chart: ceph-exporter-0.1.0
    heritage: Tiller
    prometheus: ceph-exporter
    release: ceph-exporter
  name: ceph-exporter
  namespace: monitoring
  resourceVersion: "13965150"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheusrules/ceph-exporter
  uid: 30543ec9-dc10-11e8-bcf3-000c293d66a5
spec:
  groups:
  - name: ceph-exporter.rules
    rules:
    - alert: Ceph
      annotations:
        description: There is no running ceph exporter.
        summary: Ceph exporter is down
      expr: absent(up{job="ceph-exporter"} == 1)
      for: 5m
      labels:
        severity: critical
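The expression `absent(up{job="ceph-exporter"} == 1)` returns a value only when no `up` series for that job equals 1, so it fires both when the target is down and when the target has disappeared from service discovery altogether. For contrast, a rule that fires only when a registered target fails its scrape could look like the sketch below. The group name, alert name, severity, and annotation texts are hypothetical and not part of the chart.

```yaml
# Hypothetical additional group for a PrometheusRule `spec`; names are illustrative.
groups:
- name: ceph-exporter.sketch.rules
  rules:
  - alert: CephExporterScrapeFailing
    # Matches only targets that are present in service discovery
    # but whose most recent scrape failed.
    expr: up{job="ceph-exporter"} == 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Ceph exporter scrape is failing
      description: The ceph exporter target is registered but cannot be scraped.
```

The `absent(...)` form is the safer default for a "nothing is running" alert, because `up == 0` produces no series at all once the target is gone.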
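The default rules for monitoring Kubernetes are already numerous and comprehensive; adjust prometheus-operator/templates/all-prometheus-rules.yaml as needed.
Alert notification settings can be changed in the following section under alertmanager: in values.yaml: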
config:
  global:
    resolve_timeout: 5m
    # The smarthost and SMTP sender used for mail notifications.
    smtp_smarthost: 'smtp.163.com:25'
    smtp_from: 'xxxxxx@163.com'
    smtp_auth_username: 'xxxxxx@163.com'
    smtp_auth_password: 'xxxxxx'
    # The API URL to use for Slack notifications.
    slack_api_url: 'https://hooks.slack.com/services/some/api/token'
  route:
    group_by: ["job", "alertname"]
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'noemail'
    routes:
    - match:
        severity: critical
      receiver: critical_email_alert
    - match_re:
        # Alertmanager anchors the regex, so ".*" is needed to match
        # every alert name that starts with KubeJob.
        alertname: "^KubeJob.*"
      receiver: default_email
  receivers:
  - name: 'default_email'
    email_configs:
    - to: 'xxxxxx@163.com'
      send_resolved: true
  - name: 'critical_email_alert'
    email_configs:
    - to: 'xxxxxx@163.com'
      send_resolved: true
  - name: 'noemail'
    email_configs:
    - to: 'null@null.cn'
      send_resolved: false
## Alertmanager template files to format alerts
## ref: https://prometheus.io/docs/alerting/notifications/
##      https://prometheus.io/docs/alerting/notification_examples/
##
templateFiles:
  template_1.tmpl: |-
    {{ define "cluster" }}{{ .ExternalURL | reReplaceAll ".*alertmanager\\.(.*)" "$1" }}{{ end }}
    {{ define "slack.k8s.text" }}
    {{- $root := . -}}
    {{ range .Alerts }}
      *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
      *Cluster:* {{ template "cluster" $root }}
      *Description:* {{ .Annotations.description }}
      *Graph:* <{{ .GeneratorURL }}|:chart_with_upwards_trend:>
      *Runbook:* <{{ .Annotations.runbook }}|:spiral_note_pad:>
      *Details:*
        {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
        {{ end }}
    {{ end }}
    {{ end }}
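To trace how an alert travels through the routing tree above, the sketch below repeats only the `route` part with comments. The receiver names come from the configuration above; the comments describe standard Alertmanager behaviour. One point to watch: Alertmanager anchors `match_re` expressions at both ends, so a pattern meant to catch every alert whose name starts with KubeJob needs a trailing `.*`. A pattern ending in `Job*` would match only `KubeJo` followed by zero or more `b` characters.

```yaml
# Annotated sketch of the routing tree; not a complete Alertmanager config.
route:
  receiver: noemail              # fallback when no child route matches
  routes:
  # Child routes are tried in order; the first match wins unless
  # `continue: true` is set on the route.
  - match:
      severity: critical         # exact label match; the Ceph alert carries this label
    receiver: critical_email_alert
  - match_re:
      alertname: "KubeJob.*"     # evaluated as ^(?:KubeJob.*)$
    receiver: default_email
```

With this tree, the Ceph alert defined earlier (`severity: critical`) is mailed through critical_email_alert, while alerts that match neither child route fall through to noemail and are effectively discarded.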
By defining ServiceMonitor and PrometheusRule resources, the Prometheus Operator adjusts the Prometheus and Alertmanager configuration dynamically. This fits the way Kubernetes is normally operated and makes Kubernetes monitoring more elegant.
Title: Prometheus Operator, a More Elegant Monitoring Tool for Kubernetes