GPU workload resources
Updated at: 2025-10-27
The GPU workload resources dashboard shows task attributes, the number of GPU cards, GPU resource usage, and a summary of GPU node usage for each workload.
Prerequisites
- The CCE AI Job Scheduler component is installed, at version 1.7.9 or later.
- The CCE GPU Manager component is installed.
- The cluster has been connected to a monitoring instance.
- Collection tasks are enabled. For details, see Access Monitoring Instance and Enable Collection Tasks.
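
The version requirement above can be checked with a simple comparison. The following is a minimal sketch, assuming the installed version is available as a purely numeric dotted string such as `1.7.10` (optionally prefixed with `v`); the function names are hypothetical and not part of any CCE tool.

```python
# Hypothetical helper: compares a dotted version string against the
# minimum CCE AI Job Scheduler version stated in the prerequisites.
MIN_SCHEDULER_VERSION = (1, 7, 9)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as 'v1.7.10' into a tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def meets_minimum(installed: str,
                  minimum: tuple[int, ...] = MIN_SCHEDULER_VERSION) -> bool:
    """Return True if the installed version is at or above the minimum."""
    # Tuples compare element by element, so (1, 7, 10) > (1, 7, 9),
    # which a plain string comparison would get wrong.
    return parse_version(installed) >= minimum
```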
Procedure
- Sign in to the Cloud Container Engine (CCE) console.
- In the left sidebar, click Cluster Management. In the Cluster List, find the cluster you need, then under Actions > More on the right, click Prometheus Monitoring to open the Prometheus Monitoring Service.

- At the bottom of the Prometheus Monitoring page, select Cloud-Native AI Monitoring, then select GPU Workload Resources.
GPU workload resources are shown as follows:

Use the controls in the upper right corner to set the monitoring time range, refresh the data manually, or configure automatic refresh.
Detailed description of GPU workload resources
Task attributes
| Monitoring item | Description |
|---|---|
| Name of workload | Name of the current workload |
| Type | Type of the current workload |
| Namespace | Namespace of the current workload |
| Start time | Time at which the current workload started |
| Runtime | How long the current workload has been running |

Card count & GPU resource utilization
| Monitoring item | Description |
|---|---|
| GPU card count | Number of GPU cards used by the current workload |
| GPU utilization rate | Real-time average utilization across all GPUs of the current workload |
| Memory utilization rate | Real-time average memory utilization of the current workload |
| Memory usage | Real-time memory usage of the current workload |
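
To make the aggregation in this table concrete, the sketch below derives the four values from per-card readings. It rests on assumptions not confirmed by this document: that "memory" refers to GPU memory, that the averages are simple means across cards, and that memory usage is summed. The `GpuSample` structure and field names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GpuSample:
    """Latest reading for one GPU card (hypothetical structure)."""
    gpu_util_pct: float   # GPU utilization, 0-100
    mem_used_mib: float   # memory in use on the card
    mem_total_mib: float  # total memory on the card


def summarize_workload(samples: list[GpuSample]) -> dict[str, float]:
    """Aggregate per-card readings into workload-level values."""
    if not samples:
        raise ValueError("workload has no GPU samples")
    return {
        "gpu_card_count": len(samples),
        # Assumed: a simple mean over all cards of the workload.
        "gpu_util_pct": mean(s.gpu_util_pct for s in samples),
        "mem_util_pct": mean(
            100 * s.mem_used_mib / s.mem_total_mib for s in samples
        ),
        # Assumed: usage is the sum across cards.
        "mem_used_mib": sum(s.mem_used_mib for s in samples),
    }
```

For example, two cards at 50% and 100% GPU utilization would yield a workload-level GPU utilization of 75% under these assumptions.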

GPU node usage summary
| Monitoring item | Description |
|---|---|
| Namespace | Namespace of the GPU nodes in the current workload |
| IP of the node | IP address of the GPU nodes in the current workload |
| Pod name | Name of the Pod running on the GPU nodes in the current workload |
| Allocated GPU cards | Number of GPU cards allocated on the GPU nodes in the current workload |
| Average GPU utilization rate | Average GPU utilization of the GPU nodes in the current workload |
| Memory usage | Memory usage of the GPU nodes in the current workload |
| Average memory utilization rate | Average memory utilization of the GPU nodes in the current workload |
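
The summary rows above can be thought of as a group-by over per-card readings. The sketch below assumes one row per (node IP, Pod name) pair and a simple mean for the average. The input tuple layout is hypothetical, and the dashboard's actual grouping may differ.

```python
from collections import defaultdict


def summarize_nodes(rows):
    """Group per-card readings by (node_ip, pod_name).

    rows: iterable of (node_ip, pod_name, gpu_util_pct) tuples,
    one tuple per allocated GPU card (hypothetical input layout).
    """
    utilization = defaultdict(list)
    for node_ip, pod_name, gpu_util_pct in rows:
        utilization[(node_ip, pod_name)].append(gpu_util_pct)

    return {
        key: {
            "allocated_gpu_cards": len(values),
            "avg_gpu_util_pct": sum(values) / len(values),
        }
        for key, values in utilization.items()
    }
```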

