Best Practices for MongoDB Monitoring and Alarm

Last Updated: 2021-05-12

Baidu smart cloud database DocDB for MongoDB provides service status monitoring and alarm functions. This article introduces the common alarm parameters (disk space usage rate, number of connections to the instance, inbound and outbound traffic, etc.) and how to configure alarms for a cloud database DocDB for MongoDB instance.

Background

As data volume and business grow, the resource usage of a MongoDB instance may gradually increase until the resources are exhausted.
In some scenarios, the resources of a MongoDB instance may be consumed abnormally fast. For example, CPU usage rises because of a large number of slow queries, or disk space shrinks sharply because of a large number of data writes.
By setting monitoring and alarm rules for the key performance indicators of an instance, you are informed of abnormal indicator data promptly, so that you can locate and handle faults quickly.

Check the monitor

Steps to check the monitor

The steps to check instance monitoring are as follows:

Log in to the MongoDB Management Console.
In the upper-left corner of the console page, select the Region where the instance is located.
Click Replica Set Instance List or Shard Set Instance List in the left navigation bar.
Click the name of the target instance to enter the Instance Details page.
Click Monitor in the left navigation bar, then select a node or component to view its monitoring information:
- Replica Set Instance. Select a node in the replica set and a time period to view the monitoring information.
- Shard Cluster Instance. Select a component in the shard cluster and a time period to view the monitoring information.

Monitoring item description

In order to better monitor and operate MongoDB instances, Baidu smart cloud provides the following monitoring parameters:

Monitoring parameter	Description
CPU usage rate	CPU usage of the instance; can be used as an indicator for capacity expansion
Memory usage rate	Memory usage of the instance; can be used as an indicator for capacity expansion
Disk space usage rate	Percentage of disk capacity used by the instance; can be used as an indicator for capacity expansion
Disk space usage amount	Disk capacity used by the instance; can be used as an indicator for capacity expansion
Operation volume	Number of operations executed on the instance per second, a business characteristic indicator; covers insert, query, update, getmore, delete and command
Linking number	Number of client sessions connected to the instance
Network traffic	Throughput of the network connection, a business characteristic indicator
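
To make the operation volume and linking number items more concrete, the sketch below derives comparable values from two snapshots of MongoDB's `serverStatus` output. It is only an illustration: how the console actually collects these metrics is not described here, and the helper names `ops_per_second` and `connection_usage` are hypothetical. It assumes the `opcounters` and `connections` sub-documents that `db.serverStatus()` returns.

```python
# Operation types reported in the `opcounters` section of serverStatus.
OP_TYPES = ("insert", "query", "update", "delete", "getmore", "command")


def ops_per_second(prev, curr, elapsed_seconds):
    """Per-second rate of each operation type between two opcounters snapshots.

    prev and curr are the `opcounters` sub-documents of two serverStatus
    results taken elapsed_seconds apart (hypothetical helper).
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rates = {}
    for op in OP_TYPES:
        delta = curr[op] - prev[op]
        # The counters restart from zero when the server restarts, so a
        # negative delta is treated as "counted since the restart".
        if delta < 0:
            delta = curr[op]
        rates[op] = delta / elapsed_seconds
    return rates


def connection_usage(connections):
    """Fraction of connections in use, from the `connections` sub-document.

    Assumes current + available equals the maximum the node accepts.
    """
    total = connections["current"] + connections["available"]
    return connections["current"] / total if total else 0.0


# Example with made-up snapshots taken 60 seconds apart.
before = {"insert": 100, "query": 200, "update": 50,
          "delete": 10, "getmore": 0, "command": 1000}
after = {"insert": 700, "query": 500, "update": 50,
         "delete": 40, "getmore": 60, "command": 1600}
rates = ops_per_second(before, after, 60)
usage = connection_usage({"current": 80, "available": 720})
```

With these made-up snapshots, `rates["insert"]` is 10 operations per second and `usage` is 0.1, that is, 10% of the available connections are in use.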

Alarm setting

Steps to add alarm strategy

To make instance operation more convenient and to be notified automatically when a monitoring item becomes abnormal, you can create an alarm strategy for designated nodes and monitoring items of a MongoDB instance. The alarm strategy defines the threshold and notification mode of the monitoring items; when a monitoring item reaches its threshold, alarm information is sent to users automatically.

Steps to add the alarm strategy:

On the instance monitoring page, select the node for which the alarm strategy is to be set, and click Alarm Details to enter the alarm strategy page. The alarm strategy entry for a shard cluster is shown in the following figure:
Click Add Strategy to create an alarm strategy for the instance node. The configuration of an alarm strategy consists of Strategy Information and Alarm Action: Strategy Information selects the type of monitoring item and the alarm threshold, and Alarm Action is the action to be executed when an alarm occurs.
- Create An Alarm Strategy
- Create An Alarm Action
Fill in Strategy Information and Alarm Action completely, then click "Submit" to create the alarm strategy.

Alarm strategy parameter description

Parameter descriptions and setting suggestions for creating an alarm strategy:

Alarm strategy creation parameter	Description	Setting suggestion
Name	Name of the alarm strategy	The name should be readable
Monitoring item	Type of monitoring item: operation monitoring (insert, query, delete, update, getmore, command), instance linking number, ingress traffic, egress traffic, disk usage rate, disk usage amount	It is recommended to set the disk usage rate alarm strategy first, so that capacity can be expanded in time and writes are not affected by a full disk.
Statistical cycle	Statistics are computed once per cycle; the monitoring item values within one cycle are aggregated with the statistical method below. The options are 1 minute, 5 minutes, 10 minutes and 15 minutes	Generally select 5 minutes; if higher data sensitivity is required, select 1 minute.
Statistical method	How the statistical value of each cycle is calculated: average value, sum value, maximum value, minimum value or number of samples	Average value and sum value are the commonly used statistical methods
Threshold	The preset threshold for the statistical value; the alarm is triggered when the statistical value satisfies the selected comparison with the threshold (for example >, >= or =).	For threshold setting, refer to "Suggestion on setting alarm threshold" below
Alarm after repeating several times	The alarm is sent after the statistical value has exceeded the threshold this many times in a row	Generally set to 3 times; if higher sensitivity is required, set to 1 time
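
The interaction between statistical cycle, statistical method, threshold and repeat count can be sketched as follows. This is a minimal illustration of the evaluation logic as described in the table, not the service's implementation; `evaluate_alarm` and its arguments are hypothetical names, only the `>` and `>=` comparisons are modelled, and resetting the counter on a cycle without samples is an assumption.

```python
from statistics import mean

# Statistical methods listed in the table, mapped to Python functions.
STATISTICS = {
    "average": mean,
    "sum": sum,
    "maximum": max,
    "minimum": min,
    "count": len,
}

# Only two comparisons are modelled in this sketch.
COMPARATORS = {
    ">": lambda value, threshold: value > threshold,
    ">=": lambda value, threshold: value >= threshold,
}


def evaluate_alarm(cycles, method, comparator, threshold, repeat):
    """Return the index of the cycle in which the alarm fires, or None.

    cycles is a list of sample lists, one inner list per statistical cycle.
    The alarm fires once `repeat` consecutive cycles breach the threshold.
    """
    statistic = STATISTICS[method]
    breaches = COMPARATORS[comparator]
    consecutive = 0
    for index, samples in enumerate(cycles):
        # Assumption: a cycle without samples or without a breach resets the count.
        if samples and breaches(statistic(samples), threshold):
            consecutive += 1
            if consecutive >= repeat:
                return index
        else:
            consecutive = 0
    return None


# Disk usage rate samples (percent), two samples per cycle (made-up data).
disk_usage = [[70, 72], [81, 83], [85, 86], [79, 78], [82, 84], [88, 90], [91, 93]]
fires_at = evaluate_alarm(disk_usage, "average", ">", 80, repeat=3)
```

In this made-up series the average exceeds 80 in cycles 1 and 2, drops below it in cycle 3, and then exceeds it again in cycles 4 to 6, so with a repeat count of 3 the alarm fires in cycle 6 (`fires_at == 6`). With a repeat count of 1 it would already fire in cycle 1, which shows why the lower setting is more sensitive.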

Suggestion on setting alarm threshold

Disk usage alarm setting

The disk space usage rate is a relative value, so the alarm strategy does not need to be modified after each capacity expansion; for this reason it is usually chosen as the alarm item. The absolute disk space usage amount can also be set as an alarm item according to business needs.

The alarm threshold for the disk usage rate should be set to 80% of the available physical disk space. When this red line is exceeded, check how the disk is being consumed to determine whether the usage comes from normal business data, then resolve the situation by locating faults, deleting data or expanding capacity. Common alarm parameters for disk usage rate are as follows:

Linking number alarm setting. For a cloud database DocDB for MongoDB instance, it is recommended to set the linking number threshold to 80% of the maximum linking number of your instance. If the instance frequently exceeds this value, check the workload or upgrade the instance configuration.
Other parameters. Use your historical data to establish the normal level of each item and set a maximum threshold above it. When you receive alarm information, handle the alarm promptly and take the corresponding countermeasure: fix the fault or upgrade the instance.
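
The threshold suggestions above amount to simple arithmetic, sketched below. The helper names are hypothetical, and the `headroom` factor applied to the historical peak is an illustrative assumption; the article only says that the threshold should be based on historical data.

```python
def connection_threshold(max_connections, percent=80):
    """Linking number threshold as a percentage of the instance maximum."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return max_connections * percent // 100


def disk_usage_rate(used_gb, total_gb):
    """Disk usage rate in percent."""
    return used_gb * 100 / total_gb


def threshold_from_history(samples, headroom=1.5):
    """Upper threshold derived from the historical peak (assumed rule)."""
    return max(samples) * headroom


# 80% of a hypothetical maximum of 1000 connections.
linking_limit = connection_threshold(1000)

# The same 85 GB of data before and after expanding the disk from 100 GB
# to 200 GB: the rate changes, the 80% rule itself does not need editing.
rate_before = disk_usage_rate(85, 100)
rate_after = disk_usage_rate(85, 200)

# Made-up hourly peaks of insert operations per second.
insert_limit = threshold_from_history([1200, 1500, 900])
```

Here `linking_limit` is 800, `rate_before` is 85.0 (above the 80% red line) and `rate_after` is 42.5 (below it), which illustrates why a relative disk threshold survives capacity expansion while an absolute one would have to be adjusted.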

Suggestion on setting alarm action

Parameter	Description	Setting suggestion
Name of action	Name of the alarm action	The name should be readable
Available regions	Alarm actions in a region apply only to alarm strategies in that region	Set according to the geographical distribution of your instances
Notification method	Baidu smart cloud currently supports "Email Notification", "SMS Notification", "Telephone Notification" and any combination of the three	SMS and telephone notifications are more timely
Notified Object	The notified object can be a "user group" or a "user"	If you have not created a notification object yet, click "Add User Group" or "Add User" to create one
Timed close	A timed close can be selected for the alarm action	To prevent false alarms while the instance is being reconfigured, restarted or otherwise maintained, set a timed close to mask the alarm according to business needs
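
The effect of a timed close can be pictured as a mute window that suppresses notifications during planned maintenance. The sketch below assumes a daily window given by a start and end time; this is an illustrative model with hypothetical function names, not the console's actual behaviour or configuration format.

```python
from datetime import datetime, time


def is_muted(now, start, end):
    """True if `now` falls inside the daily mute window [start, end)."""
    current = now.time()
    if start <= end:
        return start <= current < end
    # The window crosses midnight, for example 23:00 to 01:00.
    return current >= start or current < end


def should_notify(alarm_triggered, now, mute_start, mute_end):
    """Send a notification only for alarms raised outside the mute window."""
    return alarm_triggered and not is_muted(now, mute_start, mute_end)


# A restart scheduled between 02:00 and 04:00 (made-up maintenance window).
during_restart = should_notify(True, datetime(2021, 5, 12, 2, 30), time(2, 0), time(4, 0))
after_restart = should_notify(True, datetime(2021, 5, 12, 5, 0), time(2, 0), time(4, 0))
```

An alarm raised at 02:30 is masked (`during_restart` is `False`), while the same alarm at 05:00 is delivered (`after_restart` is `True`).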
