Baidu AI Cloud

Cloud Database MONGODB

Best Practices for MongoDB Monitoring and Alarm

Baidu AI Cloud DocDB for MongoDB provides service status monitoring and alarm functions. This article introduces the common alarm parameters (disk space usage rate, number of instance connections, inbound and outbound traffic, and so on) and how to set the alarm mode for DocDB for MongoDB instances.

Background

  • As data volume and business grow, the resource usage of a MongoDB instance may gradually increase until the resources are exhausted.
  • In some scenarios, the resources of a MongoDB instance may be consumed abnormally and in large amounts. For example, CPU usage rises because of a large number of slow queries, or disk space is consumed sharply because of a large volume of data writes.
  • By setting monitoring and alarm rules for the key performance indicators of an instance, you are notified of abnormal indicator data as soon as it occurs, so that you can locate and handle faults quickly.

Check the monitor

Steps to check the monitor

The steps to check instance monitoring are as follows:

  1. Log in to the MongoDB Management Console.
  2. Select the Region where the instance is located in the upper left corner of the console page.
  3. Click Replica Set Instance List or Shard Set Instance List in the left navigation bar.
  4. Click the name of the target instance to enter the Instance Details page.
  5. Click Monitor in the left navigation bar, then select a node or component to view its monitoring information in the replica set or shard cluster:

    • Replica Set Instance. Select a node and a time period in the replica set to view its monitoring information.

      image.png

    • Shard Cluster Instance. Select a component and a time period in the shard cluster to view its monitoring information.

Monitoring item description

To help you monitor and operate MongoDB instances, Baidu AI Cloud provides the following monitoring parameters:

| Monitor parameter | Description |
|---|---|
| CPU usage rate | The CPU usage rate of the instance; can be used as an indicator for capacity expansion |
| Memory usage rate | The percentage of the instance's available memory in use; can be used as an indicator for capacity expansion |
| Disk space usage rate | The percentage of disk capacity used by the instance; can be used as an indicator for capacity expansion |
| Disk space usage amount | The disk capacity used by the instance; can be used as an indicator for capacity expansion |
| Operation volume | The number of operations per second on the instance, by type (insert, query, update, getmore, delete, command); an indicator of business characteristics |
| Linking number | The number of client sessions connected to the instance |
| Network traffic | The throughput of the instance's network connections; an indicator of business characteristics |
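
As an illustration of how the rate-style parameters above relate to raw numbers, here is a minimal Python sketch. It assumes that operation volume is derived from cumulative per-type counters sampled at two points in time, and that a usage rate is simply used divided by total. How DocDB for MongoDB actually computes these values is not stated in this article, and the function names `ops_per_second` and `usage_rate` are hypothetical.

```python
from typing import Dict

# Operation types listed for the "Operation volume" parameter.
OP_TYPES = ("insert", "query", "update", "getmore", "delete", "command")


def ops_per_second(prev: Dict[str, int], curr: Dict[str, int],
                   interval_s: float) -> Dict[str, float]:
    """Convert two cumulative counter snapshots into per-second rates.

    Assumption: `prev` and `curr` hold cumulative operation counts taken
    `interval_s` seconds apart. This is an illustration, not the service's
    documented method.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = {}
    for op in OP_TYPES:
        delta = curr.get(op, 0) - prev.get(op, 0)
        # A negative delta suggests the counters were reset (for example
        # after a restart); report 0 instead of a negative rate.
        rates[op] = max(delta, 0) / interval_s
    return rates


def usage_rate(used: float, total: float) -> float:
    """Usage rate as a percentage, e.g. disk space used / disk capacity."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


# Hypothetical snapshots taken 60 seconds apart.
before = {"insert": 100, "query": 2000}
after = {"insert": 700, "query": 2600}
print(ops_per_second(before, after, 60))
print(usage_rate(40, 50))
```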

Alarm setting

Steps to add alarm strategy

To make instance operation and maintenance more automated and to notify you of any abnormal monitoring item, you can create an alarm strategy for designated nodes and monitoring items of a MongoDB instance. The alarm strategy sets the threshold and notification mode for each monitoring item; when a monitoring item reaches its threshold, alarm information is sent to users automatically.

Steps to add the alarm strategy:

  1. On the instance monitoring page, select the node for which you want to set an alarm strategy, and click Alarm Details to enter the alarm strategy page. The alarm strategy entry for a shard cluster is shown in the following figure:

    image.png

  2. Click Add Strategy to create an alarm strategy for the instance node. The configuration of an alarm strategy includes Strategy Information and Alarm Action: Strategy Information selects the type of monitoring item and the alarm threshold, and Alarm Action is the action to be executed when an alarm occurs.

    • Create An Alarm Strategy

      image.png

    • Create An Alarm Action

      image.png

  3. Fill in Strategy Information and Alarm Action completely, then click "Submit" to create the alarm strategy.

Alarm strategy parameter description

Parameter descriptions and setting suggestions for creating an alarm strategy:

| Alarm strategy creation parameter | Description | Setting suggestion |
|---|---|---|
| Name | The name of the alarm strategy | Use a readable name |
| Monitoring item | The type of monitoring item: operation monitoring (insert, query, delete, update, getmore, command), instance linking number, ingress traffic, egress traffic, disk usage rate, disk usage amount | It is recommended to set the disk usage rate alarm strategy first, so that capacity can be expanded in time and a full disk does not affect writes |
| Statistical cycle | The monitoring item is aggregated once per cycle, and the value for each cycle is calculated with the selected statistical method. Options: 1 minute, 5 minutes, 10 minutes, 15 minutes | Generally select 5 minutes; if higher sensitivity to the data is required, select 1 minute |
| Statistical method | How the statistical value in each cycle is calculated: average value, sum value, maximum value, minimum value, or number of samples | Average value and sum value are the commonly used methods |
| Threshold | The preset threshold for the statistical item; the alarm is triggered when the statistical value satisfies the selected comparison against the threshold (for example > or >=) | For threshold setting, refer to "Suggestion on setting alarm threshold" below |
| Alarm after repeating several times | The alarm is sent after the statistical value has exceeded the threshold this many times in a row | Generally set to 3 times; if higher sensitivity is required, set to 1 time |
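
To make the interaction between statistical cycle, statistical method, threshold, and repeat count concrete, the following Python sketch models one plausible evaluation flow. It is an illustration under stated assumptions, not the service's actual implementation: samples arrive at a fixed rate, only the ">" comparison is modeled, and the names `cycle_values` and `should_alarm` are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Statistical methods named in the parameter description.
STAT_METHODS: Dict[str, Callable[[Sequence[float]], float]] = {
    "average": mean,
    "sum": sum,
    "maximum": max,
    "minimum": min,
    "samples": len,
}


def cycle_values(samples: Sequence[float], samples_per_cycle: int,
                 method: str) -> List[float]:
    """Aggregate raw samples into one statistical value per complete cycle."""
    stat = STAT_METHODS[method]
    # Ignore a trailing, incomplete cycle.
    full = len(samples) - len(samples) % samples_per_cycle
    return [float(stat(samples[i:i + samples_per_cycle]))
            for i in range(0, full, samples_per_cycle)]


def should_alarm(values: Sequence[float], threshold: float,
                 repeat: int) -> bool:
    """True once the value exceeded the threshold in `repeat` consecutive cycles."""
    streak = 0
    for v in values:
        streak = streak + 1 if v > threshold else 0
        if streak >= repeat:
            return True
    return False


# Hypothetical disk usage rate samples (%), one per minute, 5-minute cycle.
samples = [70] * 5 + [82] * 5 + [85] * 5 + [90] * 5
values = cycle_values(samples, samples_per_cycle=5, method="average")
print(should_alarm(values, threshold=80, repeat=3))
```

With a repeat count of 3, a single cycle above the threshold does not raise an alarm, which is why a repeat count of 1 is the more sensitive setting.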

Suggestion on setting alarm threshold

  • Disk usage alarm setting
    • The disk space usage rate is a relative value, so the alarm strategy does not need to be modified after each capacity expansion; it is the usual choice of alarm item. The absolute disk space usage amount can also be set as an alarm item according to business needs.
    • Set the alarm threshold for the disk usage rate to 80% of the available physical disk space. When this threshold is exceeded, check the disk consumption to determine whether the disk is occupied by normal business data, then locate faults, delete data, or expand capacity as the business requires. Common alarm parameters for disk usage rate are as follows:
  • Linking number alarm setting. For a DocDB for MongoDB instance, it is recommended to set the linking number alarm threshold to 80% of the maximum linking number of your instance. If the instance frequently exceeds this value, check the workload or upgrade the instance configuration.
  • Other parameters. Set minimum and maximum thresholds based on your historical data. When you receive alarm information, handle the alarm promptly and take the corresponding countermeasure: fix the fault or upgrade the instance.
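
The 80% suggestions above reduce to simple arithmetic. The sketch below, in Python, shows why a relative disk threshold survives a capacity expansion unchanged while an absolute one would not. The capacity and connection limit used here are made-up example numbers, and the function names are hypothetical.

```python
def connection_alarm_threshold(max_connections: int, percent: int = 80) -> int:
    """Suggested linking number threshold: a percentage of the instance maximum."""
    return max_connections * percent // 100


def disk_rate_exceeded(used_gb: int, capacity_gb: int, percent: int = 80) -> bool:
    """True when the disk usage rate is above `percent` of capacity.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return used_gb * 100 > capacity_gb * percent


# Hypothetical instance: 1000 connections maximum, 100 GB disk, 85 GB used.
print(connection_alarm_threshold(1000))
print(disk_rate_exceeded(85, 100))
# After expanding the disk to 200 GB the same 80% rule still applies,
# so the alarm strategy itself does not need to be edited.
print(disk_rate_exceeded(85, 200))
```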

Suggestion on setting alarm action

| Parameter | Description | Setting suggestion |
|---|---|---|
| Name of action | The name of the alarm action | Use a readable name |
| Available regions | Alarm actions in a region apply only to alarm strategies in that region | Set according to the geographical distribution of your instances |
| Notification method | Baidu AI Cloud currently supports "Email Notification", "SMS Notification", "Telephone Notification", and any combination of the three | SMS and telephone notifications are the most timely |
| Notified Object | Supports "user group" and "user" | If you have not created a notification object yet, click "Add User Group" or "Add User" to create one |
| Timed close | Alarm actions can be closed for a set time | To prevent false alarms during reconfiguration, restart, and other operation and maintenance work, set timed close to mask alarms according to business needs |
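
The effect of timed close can be pictured as a mask window during which notifications are suppressed. The Python sketch below assumes a daily window defined by a start and end time. This article does not specify how the console defines the window, so treat both the window model and the name `is_masked` as hypothetical.

```python
from datetime import datetime, time


def is_masked(now: datetime, start: time, end: time) -> bool:
    """True if `now` falls inside the daily mask window [start, end).

    Assumption: the window repeats every day and may cross midnight.
    """
    t = now.time()
    if start <= end:
        return start <= t < end
    # Window crosses midnight, e.g. 23:00 to 01:00.
    return t >= start or t < end


# Hypothetical maintenance window from 02:00 to 03:00.
window_start, window_end = time(2, 0), time(3, 0)
alarm_time = datetime(2024, 1, 1, 2, 30)
if is_masked(alarm_time, window_start, window_end):
    print("alarm suppressed during maintenance window")
else:
    print("alarm notification sent")
```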