Baidu AI Cloud


          Relational Database Service

          RDS Monitoring and Alarm Configuration

          Background

          After an RDS instance is created, two alarm policies (disk usage and CPU usage) are configured automatically by default. To track the database's running status more promptly and accurately, we recommend that you configure more comprehensive monitoring policies yourself in BCM. BCM provides monitoring data collection for RDS, and you can select and configure the metrics that fit your needs.

          Configuring RDS Monitoring in BCM

          For the configuration procedure, see the Monitoring and Alarm Operations Guide. The recommended alarm settings are as follows:

          Monitoring item | Statistical period | Statistical method | Recommended threshold | Alarm after N consecutive periods (user-defined)
          CPU usage | 1 min | Average | > 80% | 3
          Data space disk usage | 1 min | Average | > 80% | 3
          System space disk usage | 1 min | Average | > 80% | 3
          Memory usage | 1 min | Average | > 90% | 3
          Slow logs | 1 min | Average | > 2 × the instance's CPU core count | 3
          Master-slave delay | 1 min | Average | 300 | 3
          Total connections | 1 min | Average | > 80% of the instance's "max_connections" parameter | 3
          Active connections | 1 min | Average | > 2 × the instance's CPU core count | 3
          Maximum transaction execution time | 1 min | Average | 60 | 3
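The table can be read as a set of alarm rules. The following Python sketch shows one way to express them, assuming that an alarm fires when the 1-minute mean exceeds the threshold in each of the configured number of consecutive periods; BCM's internal evaluation may differ. The names `AlarmRule`, `recommended_rules`, and `should_alarm` and the metric keys are hypothetical and are not BCM API identifiers. Only the rows with a stated `>` threshold are included; the master-slave delay and transaction-time rows are left out because the table gives no unit for them.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass
class AlarmRule:
    """Hypothetical representation of one row of the table."""
    metric: str           # hypothetical metric key, not a BCM identifier
    threshold: float      # alarm when the period mean is greater than this
    consecutive: int = 3  # number of consecutive 1-minute periods required


def recommended_rules(cpu_cores: int, max_connections: int) -> list[AlarmRule]:
    """Build the recommended rules; instance-dependent thresholds are derived
    from the CPU core count and the max_connections parameter."""
    return [
        AlarmRule("cpu_usage_percent", 80),
        AlarmRule("data_disk_usage_percent", 80),
        AlarmRule("system_disk_usage_percent", 80),
        AlarmRule("memory_usage_percent", 90),
        AlarmRule("slow_logs", 2 * cpu_cores),
        AlarmRule("total_connections", 0.8 * max_connections),
        AlarmRule("active_connections", 2 * cpu_cores),
    ]


def should_alarm(rule: AlarmRule, period_means: Sequence[float]) -> bool:
    """Return True if each of the last `rule.consecutive` 1-minute means
    (ordered oldest first in `period_means`) exceeds the rule's threshold."""
    if len(period_means) < rule.consecutive:
        return False
    recent = period_means[-rule.consecutive:]
    return all(value > rule.threshold for value in recent)


# Hypothetical instance: 4 CPU cores, max_connections = 1000
rules = {r.metric: r for r in recommended_rules(cpu_cores=4, max_connections=1000)}
print(rules["total_connections"].threshold)                        # 800.0
print(should_alarm(rules["cpu_usage_percent"], [75, 85, 90, 88]))  # True
print(should_alarm(rules["cpu_usage_percent"], [85, 90, 70]))      # False
```

With this hypothetical 4-core instance, the slow log and active connection thresholds become 8 and the total connection threshold becomes 800. Requiring several consecutive breaching periods filters out short spikes, so a single high reading does not raise an alarm.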

          Best Practices for RDS Disk Monitoring

          Disk monitoring curve

          • Data space disk usage rate

            Description: The proportion of the purchased disk space occupied by user data. Formula: data space usage rate = disk space used by data / purchased disk space, where user data includes table files, the shared tablespace, and temporary files. It is shown as the blue curve in the disk monitoring chart. Impact: If the data space usage rate exceeds 100%, the RDS instance is set to read-only mode and no more data can be written.

          • System space disk usage rate

            Description: The proportion of the purchased disk space occupied by data and logs together. Formula: system space usage rate = (disk space used by data + disk space used by logs) / purchased disk space, that is, (user data + logs such as mysql.log, slow.log, mysql.err, binlog, and system collection logs) / purchased disk space. It is shown as the red curve in the disk monitoring chart. Impact: If the system space usage rate reaches 100%, the disk is full and data can no longer be written.
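The two formulas can be sketched in Python as follows. BCM computes these metrics itself; the sketch only illustrates the arithmetic. The function names are hypothetical, and the sizes in the example (in GB, on a 100 GB disk) are made-up illustration values, not data from BCM.

```python
from __future__ import annotations

from typing import Mapping


def data_space_gb(table_files: float, shared_tablespace: float, temp_files: float) -> float:
    """User data = table files + shared tablespace + temporary files (GB)."""
    return table_files + shared_tablespace + temp_files


def system_space_gb(data_gb: float, log_sizes_gb: Mapping[str, float]) -> float:
    """User data plus all logs (mysql.log, slow.log, mysql.err, binlog, system collection logs)."""
    return data_gb + sum(log_sizes_gb.values())


def usage_rate(used_gb: float, purchased_gb: float) -> float:
    """Percentage of the purchased disk space that is in use."""
    return 100.0 * used_gb / purchased_gb


def impact(data_rate: float, system_rate: float) -> list[str]:
    """Effects described in the text for each usage rate."""
    effects = []
    if data_rate > 100:
        effects.append("instance set to read-only")
    if system_rate >= 100:
        effects.append("disk full: writes fail")
    return effects


# Hypothetical example sizes in GB on a 100 GB disk
data = data_space_gb(table_files=10, shared_tablespace=2, temp_files=0.5)
logs = {"mysql.log": 1, "slow.log": 0.5, "mysql.err": 0.25, "binlog": 2, "system collection log": 0.25}
data_rate = usage_rate(data, purchased_gb=100)
system_rate = usage_rate(system_space_gb(data, logs), purchased_gb=100)
print(data_rate, system_rate, impact(data_rate, system_rate))  # 12.5 16.5 []
```

Because the system space formula adds log space on top of data space, system space usage is never lower than data space usage, so oversized logs can fill the disk even while data space usage looks moderate.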

          Case

          A customer purchased a dual-node high-availability instance, initialized the data, and then checked the following disk monitoring information:

          Data space disk usage rate

          System space disk usage rate: 14.42%

          For data security and auditing, the customer enabled full logging and configured a relatively long binlog retention period. After the instance had been running for some time, the customer received a call from the RDS team: disk usage had risen sharply, reaching 87% within one hour, which put the instance at risk of a full disk. See the following figure:

          With the customer's authorization, a DBA investigated the cause of the sharp rise in disk usage: non-compliant SQL statements had caused log files such as mysql.log, slow.log, and mysql.err to grow rapidly.

          The problem was resolved by upgrading to an appropriately sized disk package, optimizing the SQL, and purging the abnormal log files. Afterwards, the system space disk usage rate dropped back to normal. See the following figure:
