Prometheus Monitoring System Deployment Guide
Introduction to Prometheus
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud in 2012. It has since been widely adopted by companies and organizations and has a vibrant developer and user community, with steadily growing participation in its development and usage. It is now an independent open-source project not tied to any single company. To highlight this independence and formalize its governance, Prometheus joined the Cloud Native Computing Foundation (CNCF) in 2016 as its second hosted project, after Kubernetes. Its key features include:
- A multi-dimensional data model: time series are identified by a metric name and a set of key-value (k/v) labels.
- A flexible query language, PromQL (a short example follows this list).
- Storage with no external dependencies: single server nodes are autonomous, and both local and remote storage are supported.
- An HTTP-based pull model: metrics are retrieved over HTTP, which keeps scraping simple and transparent.
- Monitoring targets can be configured through service discovery or static configuration.
- Multiple modes of graphing and dashboarding support for user-friendly visualization.
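To illustrate the data model and PromQL, here are two example queries. They are only a sketch: the metrics (container_memory_working_set_bytes, up) and the job label value refer to the scrape configuration created later in this guide.

# Total working-set memory per pod, aggregated over the pod_name label
sum(container_memory_working_set_bytes) by (pod_name)

# Select only the series whose labels match; "kubernetes-nodes" is one of the
# scrape jobs defined later in prometheus.yml
up{job="kubernetes-nodes"} == 0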
Pre-deployment preparation
To deploy Prometheus in a Kubernetes cluster provided by the CCE service, complete the following prerequisites first:
- You have created and initialized a Kubernetes cluster on CCE.
- You can access the cluster normally via kubectl by following the [guide document](CCE/Operation guide/Operation process.md); a quick connectivity check is shown below.
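A minimal connectivity check, using only standard kubectl commands:

# Show the API server endpoint kubectl is configured to talk to
$ kubectl cluster-info
# List the cluster nodes; they should all be in the Ready state
$ kubectl get nodes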
Create a Prometheus user
To separate roles and permissions cleanly, we create a dedicated ServiceAccount for the monitoring system and bind it to a ClusterRole that grants the required read access. First, create a configuration file named rbac-setup.yml with the following content:
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
After editing the file, execute the following commands:
$ kubectl create -f rbac-setup.yml
$ kubectl get sa
A response similar to the following will be returned:
NAME         SECRETS   AGE
default      1         1d
prometheus   1         8h
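Optionally, verify that the binding grants the intended permissions to the new ServiceAccount. This check uses kubectl impersonation with the resource names from rbac-setup.yml above:

# Both commands should print "yes"
$ kubectl auth can-i list pods --as=system:serviceaccount:default:prometheus
$ kubectl auth can-i get nodes --as=system:serviceaccount:default:prometheus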
Create a configuration object (ConfigMap)
After creating the roles, we need to create a configuration object (ConfigMap) for Prometheus. The alerting.rules key contains user-defined alerting rules; here we use example alerts for "container memory usage exceeding 90%" and "node unavailability". For the alerting rule syntax, refer to the Prometheus documentation. The service, pod, and ingress scrape jobs in prometheus.yml are driven by prometheus.io/* annotations on the corresponding objects; an annotated Service example is shown after the ConfigMap. Edit the file prometheus-kubernetes-configmap.yml with the following content:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus
data:
  alerting.rules: |-
    # ALERT when container memory usage exceeds 90%
    ALERT container_mem_over_90
      IF (sum(container_memory_working_set_bytes{image!="",name=~"^k8s_.*", pod_name!=""}) by (pod_name)) / (sum (container_spec_memory_limit_bytes{image!="",name=~"^k8s_.*", pod_name!=""}) by (pod_name)) > 0.9 and (sum(container_memory_working_set_bytes{image!="",name=~"^k8s_.*", pod_name!=""}) by (pod_name)) / (sum (container_spec_memory_limit_bytes{image!="",name=~"^k8s_.*", pod_name!=""}) by (pod_name)) < 2
      FOR 30s
      ANNOTATIONS {
        description = "Memory Usage of Pod {{ $labels.pod_name }} on {{ $labels.kubernetes_io_hostname }} has exceeded 90%",
      }

    # ALERT when node is down
    ALERT node_down
      IF up == 0
      FOR 30s
      ANNOTATIONS {
        description = "Node {{ $labels.kubernetes_io_hostname }} is down",
      }
  prometheus.yml: |-
    rule_files:
      # alerting rules (must match the alerting.rules key mounted above)
      - /etc/prometheus/alerting.rules

    alerting:
      alertmanagers:
      - scheme: http
        static_configs:
        - targets:
          - "localhost:9093"

    # A scrape configuration for running Prometheus on a Kubernetes cluster.
    # This uses separate scrape configs for cluster components (i.e. API server, node)
    # and services to allow each to use different authentication configs.
    #
    # Kubernetes labels will be added as Prometheus labels on metrics via the
    # `labelmap` relabeling action.
    #
    # If you are using Kubernetes 1.7.2 or earlier, please take note of the comments
    # for the kubernetes-cadvisor job; you will need to edit or remove this job.

    # Scrape config for API servers.
    #
    # Kubernetes exposes API servers as endpoints to the default/kubernetes
    # service so this uses `endpoints` role and uses relabelling to only keep
    # the endpoints associated with the default/kubernetes service using the
    # default named port `https`. This works for single API server deployments as
    # well as HA API server deployments.
    scrape_configs:
    - job_name: 'kubernetes-apiservers'

      kubernetes_sd_configs:
      - role: endpoints

      # Default to scraping over https. If required, just disable this or change to
      # `http`.
      scheme: https

      # This TLS & bearer token file config is used to connect to the actual scrape
      # endpoints for cluster components. This is separate to discovery auth
      # configuration because discovery & scraping are two separate concerns in
      # Prometheus. The discovery auth config is automatic if Prometheus runs inside
      # the cluster. Otherwise, more config options have to be provided within the
      # <kubernetes_sd_config>.
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        # If your node certificates are self-signed or use a different CA to the
        # master CA, then disable certificate verification below. Note that
        # certificate verification is an integral part of a secure infrastructure
        # so this should only be disabled in a controlled environment. You can
        # disable certificate verification by uncommenting the line below.
        #
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

      # Keep only the default/kubernetes service endpoints for the https port. This
      # will add targets for each API server which Kubernetes adds an endpoint to
      # the default/kubernetes service.
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

    # Scrape config for nodes (kubelet).
    #
    # Rather than connecting directly to the node, the scrape is proxied though the
    # Kubernetes apiserver. This means it will work if Prometheus is running out of
    # cluster, or can't connect to nodes for some other reason (e.g. because of
    # firewalling).
    - job_name: 'kubernetes-nodes'

      # Default to scraping over https. If required, just disable this or change to
      # `http`.
      scheme: https

      # This TLS & bearer token file config is used to connect to the actual scrape
      # endpoints for cluster components. This is separate to discovery auth
      # configuration because discovery & scraping are two separate concerns in
      # Prometheus. The discovery auth config is automatic if Prometheus runs inside
      # the cluster. Otherwise, more config options have to be provided within the
      # <kubernetes_sd_config>.
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

      kubernetes_sd_configs:
      - role: node

      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics

    # Scrape config for Kubelet cAdvisor.
    #
    # This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
    # (those whose names begin with 'container_') have been removed from the
    # Kubelet metrics endpoint. This job scrapes the cAdvisor endpoint to
    # retrieve those metrics.
    #
    # In Kubernetes 1.7.0-1.7.2, these metrics are only exposed on the cAdvisor
    # HTTP endpoint; use "replacement: /api/v1/nodes/${1}:4194/proxy/metrics"
    # in that case (and ensure cAdvisor's HTTP server hasn't been disabled with
    # the --cadvisor-port=0 Kubelet flag).
    #
    # This job is not necessary and should be removed in Kubernetes 1.6 and
    # earlier versions, or it will cause the metrics to be scraped twice.
    - job_name: 'kubernetes-cadvisor'

      # Default to scraping over https. If required, just disable this or change to
      # `http`.
      scheme: https

      # This TLS & bearer token file config is used to connect to the actual scrape
      # endpoints for cluster components. This is separate to discovery auth
      # configuration because discovery & scraping are two separate concerns in
      # Prometheus. The discovery auth config is automatic if Prometheus runs inside
      # the cluster. Otherwise, more config options have to be provided within the
      # <kubernetes_sd_config>.
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

      kubernetes_sd_configs:
      - role: node

      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

    # Scrape config for service endpoints.
    #
    # The relabeling allows the actual service scrape endpoint to be configured
    # via the following annotations:
    #
    # * `prometheus.io/scrape`: Only scrape services that have a value of `true`
    # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
    #   to set this to `https` & most likely set the `tls_config` of the scrape config.
    # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
    # * `prometheus.io/port`: If the metrics are exposed on a different port to the
    #   service then set this appropriately.
    - job_name: 'kubernetes-service-endpoints'

      kubernetes_sd_configs:
      - role: endpoints

      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name

    # Example scrape config for probing services via the Blackbox Exporter.
    #
    # The relabeling allows the actual service scrape endpoint to be configured
    # via the following annotations:
    #
    # * `prometheus.io/probe`: Only probe services that have a value of `true`
    - job_name: 'kubernetes-services'

      metrics_path: /probe
      params:
        module: [http_2xx]

      kubernetes_sd_configs:
      - role: service

      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name

    # Example scrape config for probing ingresses via the Blackbox Exporter.
    #
    # The relabeling allows the actual ingress scrape endpoint to be configured
    # via the following annotations:
    #
    # * `prometheus.io/probe`: Only probe services that have a value of `true`
    - job_name: 'kubernetes-ingresses'

      metrics_path: /probe
      params:
        module: [http_2xx]

      kubernetes_sd_configs:
      - role: ingress

      relabel_configs:
      - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
        regex: (.+);(.+);(.+)
        replacement: ${1}://${2}${3}
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_ingress_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_ingress_name]
        target_label: kubernetes_name

    # Example scrape config for pods
    #
    # The relabeling allows the actual pod scrape endpoint to be configured via the
    # following annotations:
    #
    # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
    # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
    # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
    #   pod's declared ports (default is a port-free target if none are declared).
    - job_name: 'kubernetes-pods'

      kubernetes_sd_configs:
      - role: pod

      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
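For reference, the following is a minimal sketch of a Service that opts in to the kubernetes-service-endpoints job via the annotations described in the scrape configuration above. The Service name, port, and labels are illustrative placeholders, not part of this deployment:

apiVersion: v1
kind: Service
metadata:
  name: my-app                     # hypothetical application service
  annotations:
    prometheus.io/scrape: 'true'   # required: opt in to scraping
    prometheus.io/path: '/metrics' # optional: defaults to /metrics
    prometheus.io/port: '8080'     # optional: scrape this port
spec:
  selector:
    app: my-app
  ports:
  - port: 8080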
Next, create a configuration file object (ConfigMap) for the monitoring alert component, Alertmanager. Edit the file alertmanager-kubernetes-configmap.yml with the following content, replacing the SMTP settings and email recipients with valid values. A sketch of routing specific alerts to an additional receiver follows the file.
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager
data:
  alertmanager.yml: |-
    global:
      # Set your own SMTP server and authentication parameters
      smtp_smarthost: 'localhost:25'
      smtp_from: 'addr@domain.com'
      smtp_auth_username: 'username@domain.com'
      smtp_auth_password: 'password'
    # The directory from which notification templates are read.
    templates:
    - '/etc/alertmanager/template/*.tmpl'
    # The root route on which each incoming alert enters.
    route:
      # The labels by which incoming alerts are grouped together. For example,
      # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
      # be batched into a single group.
      group_by: ['alertname', 'pod_name']
      # When a new group of alerts is created by an incoming alert, wait at
      # least 'group_wait' to send the initial notification.
      # This way ensures that you get multiple alerts for the same group that start
      # firing shortly after another are batched together on the first
      # notification.
      group_wait: 30s
      # When the first notification was sent, wait 'group_interval' to send a batch
      # of new alerts that started firing for that group.
      group_interval: 5m
      # If an alert has successfully been sent, wait 'repeat_interval' to
      # resend them.
      repeat_interval: 3h
      # A default receiver
      receiver: AlertMail
    receivers:
    - name: 'AlertMail'
      email_configs:
      - to: 'receiver@domain.com' # Replace with the alert recipient's email address
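If you later want different alerts to go to different recipients, Alertmanager supports child routes under the root route. The following is only a sketch; "OpsMail" is a hypothetical second receiver you would also have to add to the receivers list:

route:
  receiver: AlertMail          # default receiver, as above
  routes:
  - match:
      alertname: node_down     # alert name defined in alerting.rules
    receiver: OpsMail          # hypothetical additional receiver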
After editing, use kubectl to create the corresponding ConfigMap objects:
$ kubectl create -f prometheus-kubernetes-configmap.yml
$ kubectl create -f alertmanager-kubernetes-configmap.yml
$ kubectl get configmaps
A response similar to the following will be returned:
NAME           DATA      AGE
alertmanager   1         29s
prometheus     2         36s
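You can also inspect the stored configuration, for example to confirm that both the alerting.rules and prometheus.yml keys were created:

# Print the data keys and their contents
$ kubectl describe configmap prometheus
$ kubectl get configmap alertmanager -o yaml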
Create Node Exporter
The default monitoring setup gathers limited resource information about cluster nodes. To access more detailed node resource data, we need to deploy the Node Exporter service on every node in the Kubernetes cluster. This can be achieved by using Kubernetes’ DaemonSet object to deploy Node Exporter on all nodes. The node-exporter.yaml file for creating the DaemonSet and service objects is as follows:
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app: node-exporter
    name: node-exporter
  name: node-exporter
spec:
  clusterIP: None
  ports:
  - name: scrape
    port: 9100
    protocol: TCP
  selector:
    app: node-exporter
  type: ClusterIP

---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  template:
    metadata:
      labels:
        app: node-exporter
        name: node-exporter
    spec:
      containers:
      - image: hub.baidubce.com/public/node-exporter:latest
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: scrape
      hostNetwork: true
      hostPID: true
Then create the relevant objects:
$ kubectl create -f node-exporter.yaml
$ kubectl get daemonsets
NAME            DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR   AGE
node-exporter   2         2         2         2            2           <none>          8h
$ kubectl get services
NAME            CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
kubernetes      172.18.0.1   <none>        443/TCP    1d
node-exporter   None         <none>        9100/TCP   8h
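Because the DaemonSet uses hostNetwork and hostPort 9100, each node now serves metrics directly on port 9100. As a quick sanity check (replace <node-ip> with the address of any cluster node):

# The output should contain node_* metrics exported by Node Exporter
$ curl -s http://<node-ip>:9100/metrics | head -n 20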
Create Prometheus and its associated service
Finally, we will create Prometheus and its service for data aggregation and display. Create the file prometheus-deployment.yaml for deployment, with the following content:
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    name: prometheus
  name: prometheus
spec:
  selector:
    app: prometheus
  type: LoadBalancer
  ports:
  - name: prometheus
    protocol: TCP
    port: 9090
    nodePort: 30900

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      name: prometheus
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - name: prometheus
        image: hub.baidubce.com/public/prometheus:latest
        args:
        - '-storage.local.retention=6h'
        - '-storage.local.memory-chunks=500000'
        - '-config.file=/etc/prometheus/prometheus.yml'
        ports:
        - name: web
          containerPort: 9090
        volumeMounts:
        - name: prometheus-config-volume
          mountPath: /etc/prometheus
      - name: alertmanager
        image: hub.baidubce.com/public/alertmanager:latest
        args:
        - '-config.file=/etc/alertmanager/alertmanager.yml'
        ports:
        - name: web
          containerPort: 9093
        volumeMounts:
        - name: alertmanager-config-volume
          mountPath: /etc/alertmanager
      #imagePullSecrets:
      #- name: myregistrykey
      volumes:
      - name: prometheus-config-volume
        configMap:
          name: prometheus
      - name: alertmanager-config-volume
        configMap:
          name: alertmanager
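Note that the args above use Prometheus 1.x-style single-dash flags. If the image tag you pull resolves to Prometheus 2.x, the flags are double-dashed and the local-storage options differ; a hedged sketch of equivalent args in that case (retention kept at 6h) would be the following, and the ALERT-style rules in alerting.rules would also need converting to the 2.x YAML rule format:

args:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.retention=6h'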
Execute the following commands:
$ kubectl create -f prometheus-deployment.yaml
$ kubectl get deployments
A response similar to the following will be returned:
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
prometheus   1         1         1            1           8h
Execute the following command:
$ kubectl get services
A response similar to the following will be returned:
NAME            CLUSTER-IP       EXTERNAL-IP      PORT(S)          AGE
kubernetes      172.18.0.1       <none>           443/TCP          1d
node-exporter   None             <none>           9100/TCP         8h
prometheus      172.18.164.101   180.72.136.254   9090:30900/TCP   8h
As shown above, the Prometheus web UI can now be reached at 180.72.136.254:9090 (the service's external IP and port 9090) and used to monitor the cluster.
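Once the UI is reachable, here are two queries to try in the Graph tab, both based on metrics and jobs configured earlier in this guide:

# One sample per scrape target; a value of 0 indicates a target that is down
up

# Per-pod memory working set, the same expression used in the container_mem_over_90 alert
sum(container_memory_working_set_bytes{image!="",name=~"^k8s_.*", pod_name!=""}) by (pod_name)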

