Health Check Exception Troubleshooting
Scenario introduction
A load balancer's health check periodically probes the status of real servers or services so that the load balancer routes traffic only to servers that are operating normally. Through health checks, the load balancer can monitor backend server availability in real time, quickly detect and isolate faulty nodes, and maintain system stability and performance. Occasionally, a health check reports an anomaly with no obvious cause. This document describes the possible causes of such anomalies and how to troubleshoot them.
Troubleshooting steps
Check RS security group configuration
Typically, real servers are configured to allow health check traffic from BLB instances by default. For certain models without this capability or where default pass-through is disabled, health check traffic may be blocked by the security group settings on the real servers. To resolve this, configure the security group to allow inbound traffic from: 100.64.0.0/10 (IPv4) or 2403:ed40:f200::/40 (IPv6), with the protocol matching the health check protocol and the port set to the configured health check port. If no health check port is configured in BLB, the default port is the service port of the real server. Once configured, the real server will properly allow health check traffic within this range.
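When reading real server access logs or packet captures, it can help to confirm that a source address really belongs to the health check ranges listed above. The following is a minimal sketch using Python's standard ipaddress module. The function name is_health_check_source is illustrative and not part of any BLB tooling.

```python
import ipaddress

# Source ranges named in this document for BLB health check traffic.
HEALTH_CHECK_RANGES = [
    ipaddress.ip_network("100.64.0.0/10"),        # IPv4
    ipaddress.ip_network("2403:ed40:f200::/40"),  # IPv6
]


def is_health_check_source(address: str) -> bool:
    """Return True if `address` falls inside one of the health check ranges.

    Hypothetical helper for log analysis; it does not change any
    security group rule.
    """
    ip = ipaddress.ip_address(address)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so checking against both ranges is safe.
    return any(ip in network for network in HEALTH_CHECK_RANGES)
```

For example, is_health_check_source("100.64.0.1") is true, while an address such as 10.0.0.1 is outside both ranges and is not health check traffic.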
Check Host configuration (for HTTP/HTTPS listeners)
Some real servers reject HTTP requests whose Host header is empty in order to block abnormal requests, which causes health checks to fail. If the backend protocol is HTTP or HTTPS and health checks fail for this reason, set the Host header parameter of the health check to the intended domain name.
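To check how a real server reacts to a particular Host value, you can send a request with an explicit Host header from a machine that can reach the real server. The sketch below uses Python's standard http.client module. The function name probe_with_host and its parameters are assumptions made for illustration, and it only approximates the request that a BLB health check sends.

```python
import http.client


def probe_with_host(ip: str, port: int, host_header: str,
                    path: str = "/", timeout: float = 5.0) -> int:
    """Send GET `path` to ip:port with an explicit Host header.

    Returns the HTTP status code. Illustrative helper only.
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # Passing "Host" in headers stops http.client from adding its own
        # Host header derived from the connection address.
        conn.request("GET", path, headers={"Host": host_header})
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        return response.status
    finally:
        conn.close()
```

If the status code differs between the intended domain name and some other Host value, the real server is filtering on the Host header, and the health check Host parameter should be set accordingly.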
Check the real server port service configuration
When a separate health check port is specified, it may differ from the actual service port provided by the real server. In this situation, you need to start the corresponding service on the health check port of the real server to properly respond to the health check requests from the BLB instance.
- The service needs to be exposed on 0.0.0.0, which can be confirmed using the netstat -antup (Linux) or netstat -ano (Windows) command.
- The service must correctly respond to the specified health check request protocol or method.
- If expected response conditions are configured, the service's returned response must meet these conditions to be considered successful.
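The requirements above can be illustrated with a minimal HTTP responder for a dedicated health check port, built on Python's standard http.server module. This is a sketch under stated assumptions: the /health path, the "ok" body, and the 200 status are hypothetical and must be adapted to the path, method, and expected response actually configured in the listener.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

HEALTH_PATH = "/health"  # hypothetical path; use the one configured in the listener


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET and HEAD on HEALTH_PATH with 200, anything else with 404."""

    def _respond(self, include_body: bool) -> None:
        healthy = self.path == HEALTH_PATH
        body = b"ok" if healthy else b"not found"
        self.send_response(200 if healthy else 404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if include_body:
            self.wfile.write(body)

    def do_GET(self):
        self._respond(include_body=True)

    def do_HEAD(self):
        # HEAD returns the same status and headers as GET, without a body.
        self._respond(include_body=False)

    def log_message(self, format, *args):
        pass  # keep probe traffic out of stderr


def make_health_server(port: int) -> HTTPServer:
    # Bind to 0.0.0.0 so probes arriving on any interface are answered,
    # not only those sent to the loopback address.
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

To run it, call make_health_server(port).serve_forever() with the health check port configured for the real server, then confirm with netstat that the port is listening on 0.0.0.0.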
Other possible scenarios
If the health check anomaly is not caused by the above issues, follow these steps to troubleshoot:
- Ensure that the real server has the required service port enabled.
- Verify whether protective software within the real server, such as a firewall, is blocking health check messages.
- Ensure that no iptables rules on the real server restrict health check traffic.
- Confirm that the client can directly access the real server's application service without issues.
- Check whether the Load Balancer's health check parameters are configured properly.
- For Layer 7 (HTTP/HTTPS) listeners, it is generally recommended to use static pages for health checks.
- Verify whether high load on the real server is causing slow external response times.
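Several of the checks above (whether the service port is open, whether a firewall drops connections, whether the server responds slowly under load) can be approximated with a simple TCP connect probe. The sketch below uses only Python's standard socket module. The name tcp_probe and the 3-second default timeout are assumptions for illustration and are unrelated to the timeout configured in the load balancer.

```python
import socket
import time
from typing import Optional


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Try to open a TCP connection to host:port.

    Returns the connect time in seconds on success, or None if the
    connection is refused, unreachable, or times out.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None
```

A result of None from a machine that should be able to reach the real server points to a closed port or a filtering rule, while a connect time close to the health check timeout suggests the server is overloaded.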
