Self-Service Diagnostic Tool Overview
Overview
Baidu AI Cloud's self-service diagnostic tool provides a comprehensive diagnostic service for Baidu Cloud Compute, covering system status, hardware health, application issues, and more. Acting as a dedicated doctor for your Baidu Cloud Compute instances, it helps you monitor instance health and quickly resolve common problems.
Application scenarios
- The application running on the instance is disrupted.
- The performance of the instance's application does not meet expectations.
- The operating system of the instance is unresponsive.
- Remote connections to the instance cannot be established.
- Routine health checks for the instance.
Product advantages
- User-friendly: No professional O&M expertise is required. Troubleshoot instances with a single click via the console, and the system will automatically diagnose and suggest solutions for detected issues.
- Enhanced efficiency: While manual troubleshooting of instances often takes days, the self-service diagnostic tool can identify issues within minutes.
- Completely free: The self-service diagnostic tool is a value-added feature of Baidu Cloud Compute and comes with no additional cost.
Terminology
Before using the self-service diagnostic tool, you should understand the following core concepts:
| Term | Meaning |
|---|---|
| Diagnosis | The self-service diagnostic tool systematically performs diagnostic tasks on instances to pinpoint root causes of specific issues and offer solutions. |
| Diagnostic item | The smallest task executed in a diagnostic session. For example, a GPU card-drop check performed during GPU hardware analysis counts as one diagnostic item. |
| Classification of diagnostic items | A grouping of multiple diagnostic items, used to facilitate statistical analysis. |
Diagnostic results explanation
Each diagnostic item yields a diagnostic result, defined as follows:
- Normal: No anomalies found.
- Interrupted: The diagnostic item was not fully completed, possibly because the instance was shut down during its execution.
- Low risk: Some abnormal metrics were detected in the diagnostic items, but they generally do not affect instance usage.
- Medium risk: Certain abnormal metrics were found in diagnostic items, which may affect instance performance.
- High risk: Abnormal metrics were detected in the diagnostic items, which can often make some instance hardware resources unavailable or result in serious performance issues.
Diagnostic status description
A diagnostic run typically takes 1-2 minutes, depending on the selected diagnostic scenario. Diagnostic status details are as follows:
| Status | Description |
|---|---|
| Diagnosing | Diagnostics are currently underway. |
| Diagnosis completed | All diagnostic items included in this diagnosis have been completed. |
| Diagnostics failed | Not all diagnostic items in this diagnosis session were completed (for example, because the instance was shut down during execution). Uncompleted diagnostic items are marked as interrupted. |
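The relationship between per-item results and the overall diagnostic status can be sketched in a few lines of Python. This is only an illustrative model of the rules described above, not the tool's actual API: the names `ItemResult`, `SessionStatus`, and `session_status` are hypothetical and do not come from any Baidu AI Cloud SDK.

```python
from enum import Enum


class ItemResult(Enum):
    # Per-item results described in this document (identifiers are hypothetical).
    NORMAL = "normal"
    INTERRUPTED = "interrupted"  # the item did not run to completion
    LOW_RISK = "low risk"
    MEDIUM_RISK = "medium risk"
    HIGH_RISK = "high risk"


class SessionStatus(Enum):
    # Session-level statuses from the status table.
    DIAGNOSING = "Diagnosing"
    COMPLETED = "Diagnosis completed"
    FAILED = "Diagnostics failed"


def session_status(item_results, still_running=False):
    """Derive the session status from per-item results (illustrative only)."""
    if still_running:
        return SessionStatus.DIAGNOSING
    # Any interrupted item means not all items completed.
    if any(r is ItemResult.INTERRUPTED for r in item_results):
        return SessionStatus.FAILED
    return SessionStatus.COMPLETED


results = [ItemResult.NORMAL, ItemResult.HIGH_RISK, ItemResult.INTERRUPTED]
print(session_status(results).value)  # Diagnostics failed
```

Note that in this model the status reflects only whether every item ran to completion. A session whose items all finished is "Diagnosis completed" even if some items report a high-risk result.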
Dependencies of the self-service diagnostic tool
Using the self-service diagnostic tool requires Cloud Assistant to be installed on your instance. For installation guidance, refer to the Cloud Assistant Documentation.
