Baidu AI Cloud


          Cloud-Native Application Platform (CNAP)

          Alarm Center

          Prerequisites

          The CNAP alarm feature depends on the Prometheus component. Before configuring alarms, deploy the Prometheus component for the cluster in the Component Center.

          Alarm Rules

          Basic rules

          Basic alarm rules cover the most common alarm requirements, such as CPU and memory threshold alarms. Basic alarms are provided at several resource granularities; choose the one that fits your needs.

          [Figure]

          • Rule name: Must be unique and cannot be changed after the rule is created;
          • Rule mode: The rule type. Three types are supported: basic, aggregation, and advanced;
          • Alarm resources: The type of resource the rule applies to. Currently supported: region, cluster, node, application, environment, deployment group, instance, micro-service deployment group, and micro-service container group;
          • Filter resources: By default, a rule applies to all resources of the selected type. Click to edit the filter conditions and, in the pop-up window, narrow the rule down to the resources you want to alarm on;
          • Filter conditions: Each condition consists of three parts: tag, operator, and value:

            • Tag: Corresponds to a column name in the resource table of the pop-up window; each tag can have only one filter condition;
            • Operator: How the tag value is matched. Supported: equal to (=), not equal to (!=), regex match (=~), and regex mismatch (!~);
            • Value: Value of the filter condition;

          The resource table lists only resources for which metric data has already been collected. If you filter with a regular expression, however, resources added later that match the expression are also covered by the alarm.
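The filter semantics can be sketched in a few lines of Python. The four operators come from the list above; everything else is an assumption, not a CNAP API: regex operators are taken to match the whole tag value and a missing tag to read as an empty string (as Prometheus label matchers behave), multiple conditions are assumed to combine with logical AND, and Python's `re` stands in for the RE2 syntax Prometheus uses. `FilterCondition`, `matches`, and `select` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterCondition:
    tag: str       # a column name in the resource table, e.g. "app"
    operator: str  # one of "=", "!=", "=~", "!~"
    value: str


def matches(resource: dict[str, str], cond: FilterCondition) -> bool:
    """Return True if the resource's tag value satisfies one condition."""
    actual = resource.get(cond.tag, "")  # assumption: a missing tag reads as ""
    if cond.operator == "=":
        return actual == cond.value
    if cond.operator == "!=":
        return actual != cond.value
    if cond.operator == "=~":
        # Anchored match of the whole value, as in Prometheus (assumed for CNAP).
        return re.fullmatch(cond.value, actual) is not None
    if cond.operator == "!~":
        return re.fullmatch(cond.value, actual) is None
    raise ValueError(f"unsupported operator: {cond.operator}")


def select(resources: list[dict[str, str]],
           conditions: list[FilterCondition]) -> list[dict[str, str]]:
    """Keep the resources that satisfy every condition (assumed logical AND)."""
    tags = [c.tag for c in conditions]
    if len(tags) != len(set(tags)):
        # Mirrors the documented limit: one filter condition per tag.
        raise ValueError("each tag may have only one filter condition")
    return [r for r in resources if all(matches(r, c) for c in conditions)]
```

Because the filter is evaluated against whatever resources currently exist, a resource created later (for example one whose `app` is `order-worker`) is picked up by an `app =~ order-.*` condition without editing the rule.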

          • Alarm rules: From left to right: alarm metric, duration, operator, and threshold:

            • Alarm metric: The metric that triggers the alarm;
            • Duration: How long the condition must hold before the alarm fires. Immediate effect is also supported, which suits log keyword alarms;
            • Operator: <, <=, >, >=, ==, and != are supported;
            • Threshold: Any floating-point number, positive or negative;
          • Data overview: A preview of the metric data over the last 30 minutes; the red dotted line in the chart is the threshold line;
          • Alarm level: Four levels are supported: reminder, minor, important, and urgent. Each level sends a different alarm email;
          • Additional tags: Sent along with the alarm email; the specific limits are shown in the figure above;
          • Effective time: The period of the day during which the rule is active. Periods that span midnight into the early morning can be configured;
          • Take effect immediately: Whether the new rule takes effect immediately;
          • Send interval: How often alarm emails are sent while the alarm continues. When the alarm ends, an alarm release email is sent;
          • Notification object: Five types are supported: email address, mobile phone number, sub-user, message recipient, and message recipient group;

          Click [Add Notification Object] to add new sub-users, message recipients, and message recipient groups. A sub-user's mobile phone number and email address can be configured as shown below:

          [Figure]
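How the operator, threshold, duration, and effective time of a basic rule interact can be illustrated with a simplified evaluator. This is a sketch under stated assumptions, not CNAP's implementation: samples are hypothetical (timestamp in seconds, value) pairs, the alarm fires once the condition has held without interruption for the duration up to the latest sample (a duration of 0 models immediate effect), and an effective window whose end is earlier than its start is treated as spanning midnight. Prometheus itself evaluates rules at fixed intervals with a pending state, so real firing times can differ slightly. `in_effective_window` and `should_fire` are hypothetical names.

```python
import operator
from datetime import time

# The comparison operators offered for basic alarm rules.
OPS = {
    "<": operator.lt, "<=": operator.le, ">": operator.gt,
    ">=": operator.ge, "==": operator.eq, "!=": operator.ne,
}


def in_effective_window(now: time, start: time, end: time) -> bool:
    """True if `now` lies in the half-open window [start, end).

    If end is earlier than start (e.g. 22:00-06:00), the window is treated as
    spanning midnight. start == end gives an empty window in this sketch.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_fire(samples: list[tuple[float, float]], op: str,
                threshold: float, duration_s: float) -> bool:
    """Decide whether the alarm condition has held for at least duration_s.

    samples: (timestamp_seconds, value) pairs in ascending time order.
    duration_s == 0 models "take effect immediately".
    """
    compare = OPS[op]
    breach_start = None  # start of the current unbroken run of breaching samples
    for ts, value in samples:
        if compare(value, threshold):
            if breach_start is None:
                breach_start = ts
        else:
            breach_start = None  # condition broken: the duration restarts
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= duration_s
```

For example, with samples `[(0, 70.0), (60, 85.0), (120, 90.0), (180, 95.0)]`, the rule `> 80` has been breached from t=60 to t=180, so it fires for a 120-second duration but not yet for a 180-second one.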

          Aggregation rules

          Aggregation rules provide alarm resource templates at the finest granularity, and you choose the dimensions to aggregate on yourself.


          • Alarm resources: Aggregation rules currently support nodes, instances, micro-service deployment groups, and micro-service container groups;
          • Alarm rules: Your custom monitoring metrics appear among the alarm metrics of instance resources;
          • Aggregation method: The available aggregation methods depend on the alarm metric;
          • Aggregation tag: The dimension to aggregate by; multiple tags can be specified;
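Aggregating over tags works much like a PromQL `by (...)` clause: per-resource values are grouped by the selected aggregation tags, and each group is reduced with the aggregation method. The sketch below assumes four methods (sum, avg, max, min); which methods CNAP actually offers depends on the metric. `aggregate` is a hypothetical helper, and the sample data in the note is made up.

```python
from collections import defaultdict
from statistics import mean

# Assumed aggregation methods; which ones CNAP offers depends on the metric.
AGGREGATORS = {"sum": sum, "avg": mean, "max": max, "min": min}


def aggregate(series: list[tuple[dict[str, str], float]],
              by_tags: list[str], method: str) -> dict[tuple[str, ...], float]:
    """Group per-resource values by the aggregation tags and reduce each group.

    Tags not listed in by_tags are dropped, like PromQL's `by (...)` clause.
    A resource missing a tag is grouped under "" for that tag.
    """
    groups: dict[tuple[str, ...], list[float]] = defaultdict(list)
    for tags, value in series:
        key = tuple(tags.get(t, "") for t in by_tags)
        groups[key].append(value)
    reduce_fn = AGGREGATORS[method]
    return {key: reduce_fn(values) for key, values in groups.items()}
```

With three pods reporting CPU usage of 500, 700, and 200 millicores, where the first two belong to group `order`, aggregating by `group` with `sum` yields one value per group (1200 and 200). A threshold would then be checked against each aggregated value rather than each pod.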

          Advanced rules

          Advanced rules let you write PromQL directly. For the syntax, refer to the official Prometheus documentation.


          • Alarm rules: For advanced rules, write the PromQL expression here;
          • Effective policy: Corresponds to the duration in basic and aggregation rules;
          • Data overview: A preview of the metric data over the last 30 minutes, without a threshold line;
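When moving from a basic or aggregation rule to an advanced rule, it can help to see how the parts map onto PromQL syntax: filter conditions become label matchers, the aggregation method and tags become an aggregation operator with `by (...)`, and the operator and threshold become a comparison. The hypothetical `to_promql` helper below only assembles that string; it does not claim to reproduce how CNAP translates rules internally. The metric `container_memory_working_set_bytes` (a cAdvisor gauge) is just an example; check which metrics your cluster's Prometheus actually collects. The duration is not part of the expression itself; in a Prometheus alerting rule it corresponds to the `for` field.

```python
from typing import Optional

COMPARISONS = {"<", "<=", ">", ">=", "==", "!="}  # same set as basic rules
MATCH_OPS = {"=", "!=", "=~", "!~"}               # same set as filter conditions


def to_promql(metric: str, matchers: list[tuple[str, str, str]], op: str,
              threshold: float, agg: Optional[str] = None,
              by: tuple[str, ...] = ()) -> str:
    """Assemble an expression such as: max by (pod) (metric{ns="prod"}) > 512"""
    if op not in COMPARISONS:
        raise ValueError(f"unsupported comparison: {op}")
    parts = []
    for label, match_op, value in matchers:
        if match_op not in MATCH_OPS:
            raise ValueError(f"unsupported matcher: {match_op}")
        # PromQL double-quoted strings use Go-style escapes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{label}{match_op}"{escaped}"')
    expr = metric + ("{" + ",".join(parts) + "}" if parts else "")
    if agg:
        grouping = f" by ({', '.join(by)})" if by else ""
        expr = f"{agg}{grouping} ({expr})"
    return f"{expr} {op} {threshold}"
```

For instance, a memory alarm on `prod` pods whose names start with `order-` could be written as `max by (pod) (container_memory_working_set_bytes{namespace="prod",pod=~"order-.*"}) > 536870912` (512 MiB), with the duration set separately under the effective policy.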

          Alarm Record

          When any resource covered by an alarm rule meets the threshold condition, an alarm record is generated. Each rule has at most one record in the [Alarming] state at any time; there is no limit on the number of records in the [Alarm End] state.


          • Rule name: Fuzzy search for alarm records by the name of their rule;
          • Time scope: Search for alarm records whose start time falls within this range;
          • Notification object: Search for alarm records that include this notification object;
          • Record ID: Look up a single alarm record by its exact ID;
          • Filter conditions: Search by key-value pairs, including your additional tags;
          • Alarm details: Click to expand a sidebar showing the details of the alarm record:

            • Data overview: A preview of the data from [Start Time] to [End Time]; this metric data is retained for only one month;
            • Rule name: Name of the corresponding alarm rule;
            • Alarm status: [Alarming] or [Alarm End];
            • Alarm level: Corresponds to the alarm level configured in the alarm rule;
            • Alarm rules: Basic and aggregation rules are displayed in Chinese; advanced rules are displayed as PromQL;
            • Notification object: The notification object configured in the rule;
            • Start time: The moment when any resource first triggers the alarm rule;
            • End time: The moment when all resources no longer trigger the alarm rule;
            • Alarm resources: A summary of the resources that triggered the alarm, matching the curves in the chart;
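The record lifecycle described in this section (at most one [Alarming] record per rule, a start time set when a resource first triggers, an end time set once no resource triggers any longer, and an unlimited number of ended records) can be modelled as a small state tracker. `RecordTracker` and `AlarmRecord` are hypothetical, and the `alt-` plus eight digits ID format only imitates the example link in the alarm SMS; how CNAP actually assigns record IDs is not documented here.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class AlarmRecord:
    record_id: str
    rule_name: str
    start_time: float
    end_time: Optional[float] = None
    # Every resource that triggered while this record was open.
    resources: set[str] = field(default_factory=set)

    @property
    def status(self) -> str:
        return "Alarming" if self.end_time is None else "Alarm End"


class RecordTracker:
    """At most one [Alarming] record per rule; ended records accumulate."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.open: dict[str, AlarmRecord] = {}
        self.history: list[AlarmRecord] = []

    def observe(self, rule_name: str, now: float,
                firing: set[str]) -> Optional[AlarmRecord]:
        """Update state from the resources currently meeting the rule's condition."""
        record = self.open.get(rule_name)
        if firing:
            if record is None:
                # The first triggering resource opens a record; start time is now.
                record = AlarmRecord(f"alt-{next(self._ids):08d}", rule_name, now)
                self.open[rule_name] = record
                self.history.append(record)
            record.resources.update(firing)
        elif record is not None:
            # No resource triggers any more: the record ends now.
            record.end_time = now
            del self.open[rule_name]
        return record
```

A second resource that starts firing while the record is open is added to the existing record instead of creating a new one, which is what keeps the [Alarming] count per rule at one.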

          Alarm Notification

          Two alarm notification forms are supported: email and SMS.

          Alarm SMS example:

          Dear user, your alarm rule zc-410-2 was triggered on 2020-04-10. Please visit the following link for details: 
          https://console.bce.baidu.com/cnap/#/alertrecord/alt-12345678 
          「CNAP」 
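The send interval and the release notification can be combined into a per-record schedule, and the SMS above can be rendered from a template. Both functions are hypothetical sketches: `notification_schedule` assumes the first alarm notification goes out at the start time and repeats every interval until the alarm ends, and `render_sms` only copies the wording and link format of the sample message.

```python
from datetime import date


def notification_schedule(start: float, end: float,
                          interval: float) -> list[tuple[str, float]]:
    """Alarm notifications from start, repeated every interval until end,
    followed by one release notification at end (assumed behavior)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    events = []
    t = start
    while t < end:
        events.append(("alarm", t))
        t += interval
    events.append(("release", end))
    return events


def render_sms(rule_name: str, triggered_on: date, record_id: str) -> str:
    """Fill in the wording and link format of the sample alarm SMS."""
    return (
        f"Dear user, your alarm rule {rule_name} was triggered on "
        f"{triggered_on.isoformat()}. Please visit the following link for details: "
        f"https://console.bce.baidu.com/cnap/#/alertrecord/{record_id} 「CNAP」"
    )
```

With a 600-second send interval, an alarm lasting from t=0 to t=1500 would produce alarm notifications at 0, 600, and 1200, followed by a release notification at 1500.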