Monitoring Clusters by Prometheus

Last Updated: 2020-10-27

Monitoring Overview

CCE container monitoring is currently built from a set of open-source components. Users can select and deploy each component from the console page and configure options such as public network access and persistent storage. Once all components are deployed, the following monitoring capabilities are available:

A collection, storage, and display stack based on the open-source components prometheus, grafana, node-exporter, and kube-state-metrics.
Node monitoring: Provides the status, CPU, memory, disk, and other metrics of each node.
Application monitoring: Provides container metrics such as CPU, memory, and network traffic, with multi-level filtering by namespace, app, pod, and container.
Resource monitoring: Provides metrics such as Pod count, Pod start time, deployments, jobs, and API server requests.
Custom monitoring: Supports user-defined exporters.

Users can deploy, uninstall, and update components on the page and enable or disable public network access. Incremental updates of the grafana templates are also supported.

Use Mode

CCE container monitoring is based on open-source components. Each component works as follows:

Prometheus: The core monitoring service, responsible for data aggregation and storage.
Node-exporter: A collection component that gathers host-level metrics, such as the CPU utilization of a node.
Kube-state-metrics: A collection component that gathers resource-level metrics, such as Pod count and start time.
Grafana: The display component, which provides visual configuration and display of monitoring data.
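
To illustrate the kind of data the two collection components feed into prometheus, the following Python sketch builds instant queries against the Prometheus HTTP API. It is only a sketch under stated assumptions: the address prometheus.example:9090 is a placeholder, and the metric names node_cpu_seconds_total (node-exporter 0.16 or later) and kube_pod_info (kube-state-metrics) are the upstream names, which may differ from the versions bundled with CCE.

```python
import json
import urllib.parse
import urllib.request

# Assumption: node-exporter 0.16 or later, which names the counter node_cpu_seconds_total.
# CPU utilization per node, in percent, averaged over the last 5 minutes.
NODE_CPU_QUERY = (
    "100 - avg by (instance) "
    '(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100'
)

# Assumption: kube-state-metrics exposes one kube_pod_info series per Pod.
# Number of Pods per namespace.
POD_COUNT_QUERY = "count by (namespace) (kube_pod_info)"


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL of the Prometheus HTTP API for a PromQL expression."""
    query_string = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def run_query(base_url: str, promql: str, timeout: float = 10.0) -> list:
    """Send the query and return the 'result' list from the JSON response."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]
```

With a reachable prometheus address, run_query("http://prometheus.example:9090", NODE_CPU_QUERY) would return one entry per node. Nothing is sent over the network until run_query is called.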

The container monitoring page of the new cluster is as follows:

After you select a cluster, all components of a new cluster are shown in the "not installed" state. Click the three deployment buttons at the top of the page to start the initial deployment.

Deploy Core Services

Deployment process:

1. Prometheus is selected by default.
2. Enable public network access: The default prometheus web service has no authentication, so public network access is disabled by default. You can enable it here, and you can also enable or disable it later on the list page.
3. Persistent storage: Users can persist prometheus data to a CDS cloud disk, either by adding a new disk or by choosing an existing one.
4. Confirm the submission: The prometheus Pod, Service, and other resources are deployed with one click. The Pod takes some time to start. You can check the status of the Service and the Pod on the list page.

Deploy Collection Components

Deployment process:

1. Select the components to deploy.
2. The default grafana templates use the data collected by the two collection components, node-exporter and kube-state-metrics. Users can deploy both during initialization or add them separately later.

Deploy Presentation Components

Deployment process:

1. Grafana is selected by default and comes with three monitoring templates: node monitoring, application monitoring, and resource monitoring.
2. Enable public network access: Grafana usually needs to be reachable by users, so enabling public network access is recommended. The default password is the cluster ID. Change the password as soon as possible after the first login.
3. Persistent storage: Users can persist grafana's panel configuration data to a CDS cloud disk to prevent data loss when the Pod is rescheduled or the templates are upgraded.
4. Confirm the submission: The grafana Pod, Service, and other resources are deployed with one click, and the CDS disk selected by the user is mounted automatically.

View Component List

After deployment, users can view the status of each component on the list page. The columns have the following meanings:

Component name: Click to open the deployment or daemonset page of the component.
Type: Display component, collection component, or core service.
Installation status: Installed or not installed.
Operation status: Shows the actual number of Pods against the expected number. The mark is green when the two are equal and yellow when they differ.
Access address: For prometheus and grafana, the public network address is displayed. You can also click the button to enable or disable public network access.
Version number: Each component is an open-source project, and the upstream version number in use is shown here so that users can consult the matching documentation. The grafana version number carries a suffix such as -v1.0, which identifies the current template version and is used when upgrading the monitoring templates.
Create time: The time the component was created.

You can select a component to update or uninstall it, or click a deployment button at the top to redeploy it.

Uninstall: Completely removes the selected components.
Update: Uninstalls and reinstalls the existing components. If a CDS disk was previously mounted for persistence, it is reused.
Redeployment: Clicking a deployment button at the top completely deletes the existing components and deploys them again. If a CDS disk was previously mounted, the new components do not load the original data.

Display Monitoring Template

Grafana comes with three default templates: node monitoring, application monitoring, and resource monitoring.

Node monitoring:

Application monitoring:

Resource monitoring:

Important Notes

Tips

1. The default account of the grafana service is admin, and the password is the cluster ID. Change the password promptly after the first login.
2. If you mount a CDS disk, do not delete the disk from other pages. Otherwise, data errors will occur.
3. The default prometheus web service has no authentication. If you enable public network access for temporary debugging, disable it promptly afterwards.
4. If a component's running status is abnormal, open the component details page to view the events and logs of the corresponding Pod.
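
The password change in tip 1 can also be scripted. The Python sketch below only builds the request and does not send it. It assumes the upstream Grafana HTTP API endpoint PUT /api/user/password with basic authentication, as described in the Grafana documentation. The address grafana.example:3000 and the passwords are placeholders, and the confirmNew field is included on the assumption that older Grafana versions expect it.

```python
import base64
import json
import urllib.request


def build_password_change(grafana_url: str, user: str,
                          old_password: str, new_password: str) -> urllib.request.Request:
    """Build (but do not send) a request that changes the logged-in user's password."""
    body = json.dumps({
        "oldPassword": old_password,
        "newPassword": new_password,
        "confirmNew": new_password,  # assumption: expected by older Grafana versions
    }).encode("utf-8")
    # Basic authentication with the current credentials (admin and the cluster ID by default).
    token = base64.b64encode(f"{user}:{old_password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/user/password",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
```

To apply the change, pass the returned object to urllib.request.urlopen. If grafana is exposed over plain HTTP on the public network, the credentials travel unencrypted, so changing the password in the grafana web page may be preferable.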

How to customize monitoring

To deploy a user-defined exporter in the cluster and expose custom metrics, there are two options:

1. Use one of the exporters recommended by prometheus: https://prometheus.io/docs/instrumenting/exporters/

2. Develop your own exporter: https://prometheus.io/docs/instrumenting/writing_exporters/
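
For the second option, a self-developed exporter is at minimum an HTTP endpoint that returns metrics in the Prometheus text exposition format. The Python sketch below uses only the standard library. The metric name demo_queue_length, the read_queue_length helper, and port 8000 are hypothetical and exist only for illustration. A production exporter would more likely use an official Prometheus client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(queue_length) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_queue_length Hypothetical gauge used for illustration.",
        "# TYPE demo_queue_length gauge",
        f"demo_queue_length {queue_length}",
    ]
    return "\n".join(lines) + "\n"


def read_queue_length() -> int:
    """Hypothetical data source; replace with a real measurement."""
    return 3


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; every other path returns 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(read_queue_length()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the exporter; blocks until the process is stopped."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint, and the metrics are then available at /metrics on the chosen port.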

After the exporter is ready, deploy it in the cluster following its official documentation, create the corresponding Service resource, and add the following annotation to the Service:

prometheus.io/scrape: "true"
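
As a sketch of where this annotation sits, the following Python snippet prints a Service manifest in JSON form, which kubectl accepts in the same way as YAML. The Service name demo-exporter, the default namespace, the app label selector, and port 8000 are hypothetical. Only the prometheus.io/scrape annotation comes from this document, and whether further annotations (for example for a custom port or path) are honored depends on the scrape configuration of the deployed prometheus.

```python
import json


def exporter_service(name: str, namespace: str, port: int) -> dict:
    """Return a Service manifest that marks the exporter for scraping."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Annotation values must be strings, so "true" is quoted.
            "annotations": {"prometheus.io/scrape": "true"},
        },
        "spec": {
            # Assumption: the exporter Pods carry the label app=<name>.
            "selector": {"app": name},
            "ports": [{"name": "metrics", "port": port, "targetPort": port}],
        },
    }


print(json.dumps(exporter_service("demo-exporter", "default", 8000), indent=2))
```

The printed manifest can be saved to a file and applied with kubectl apply -f.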

Prometheus will then collect the new metrics automatically. After confirming that the metrics are correct, users can build their own monitoring charts in grafana.

Resource Occupation

Limits and requests are set for all of the components above. If all components are installed, the initial resource usage is:

CPU: 1 core
Memory: 1G
BLB/EIP: If public network access is enabled, it is billed by traffic.
