GPU node resources
Updated at: 2025-10-27
GPU node resources cover the following for the current node: the count of GPU-occupied Pods, GPU card allocation, average GPU card utilization and average GPU card memory utilization, total CPU and memory with their utilization rates, GPU card utilization and GPU card memory utilization rates, CPU and memory utilization rates, and a detailed list of running GPU-occupied Pods.
Prerequisites
- The CCE AI Job Scheduler component is installed, at version 1.7.9 or later.
- The CCE GPU Manager component is installed.
- The cluster is connected to a monitoring instance.
- Collection tasks are enabled. For details, see Access Monitoring Instance and Enable Collection Tasks.
Procedure
- Sign in to Cloud Container Engine Console (CCE).
- Click Cluster Management on the left sidebar. In the Cluster List, find the target cluster. Under Actions > More on the right, click Prometheus Monitoring to open the Prometheus Monitoring Service.

- In the options at the bottom of the Prometheus Monitoring Page, select Cloud-Native AI Monitoring, then select GPU Node Resources.
GPU node resources are shown as follows:

Click the button in the upper right corner to set the monitoring time range, refresh manually, or enable automatic refresh.
Detailed description of GPU node resources
Count of GPU-occupied Pods
| Monitoring items | Description |
|---|---|
| Count of GPU-occupied Pods | Count of GPU-occupied Pods on the current node |
GPU card allocation
| Monitoring items | Description |
|---|---|
| Total GPU cards | Total GPU cards in the current node |
| Allocation count | Allocated GPU cards in the current node |
| Card allocation rate | Allocation rate = allocated GPU cards / total GPU cards |
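The allocation rate formula above can be illustrated with a minimal sketch. The function name `card_allocation_rate` and the sample values are hypothetical and chosen only for illustration; they are not part of CCE or its monitoring API.

```python
def card_allocation_rate(allocated_cards: int, total_cards: int) -> float:
    """Return allocated GPU cards / total GPU cards as a fraction in [0, 1].

    Hypothetical helper for illustration; the console computes this value itself.
    """
    if total_cards <= 0:
        raise ValueError("total_cards must be a positive integer")
    if not 0 <= allocated_cards <= total_cards:
        raise ValueError("allocated_cards must be between 0 and total_cards")
    return allocated_cards / total_cards


# Example: a node with 8 GPU cards, 6 of which are allocated to Pods.
rate = card_allocation_rate(allocated_cards=6, total_cards=8)
print(f"Card allocation rate: {rate:.1%}")  # Card allocation rate: 75.0%
```

Allocation and utilization are separate measures: a card counts as allocated once it is assigned to a Pod, regardless of how busy it is.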
GPU card usage
| Monitoring items | Description |
|---|---|
| Average GPU card utilization rate | Real-time average utilization rate of all GPU cards in the current node. Average GPU card utilization rate = sum(utilization rate of all GPU cards) / total GPU cards |
| Average memory utilization rate of GPU card | Real-time average memory utilization rate of all GPU cards in the current node. Average memory utilization rate = sum(memory utilization rate of all GPU cards) / total GPU cards |
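As a rough illustration of the two averages defined above, the sketch below divides the sum of per-card rates by the total number of cards. The function name `average_utilization` and the sample readings are assumptions made for this example, not values or interfaces taken from CCE.

```python
from typing import Sequence


def average_utilization(per_card_rates: Sequence[float]) -> float:
    """Return sum(per-card rates) / total GPU cards.

    per_card_rates holds one reading (in percent) for every GPU card on the
    node, including idle cards, so the list length is the total card count.
    Hypothetical helper for illustration only.
    """
    if not per_card_rates:
        raise ValueError("per_card_rates must contain one entry per GPU card")
    return sum(per_card_rates) / len(per_card_rates)


# Hypothetical readings for a node with 4 GPU cards (percent).
gpu_utilization = [80.0, 60.0, 0.0, 20.0]
gpu_memory_utilization = [50.0, 25.0, 0.0, 25.0]

print(average_utilization(gpu_utilization))         # 40.0
print(average_utilization(gpu_memory_utilization))  # 25.0
```

Because the denominator is the total number of GPU cards, an idle card lowers the node average even when the other cards are heavily used.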
Total CPU & Memory / utilization rate
| Monitoring items | Description |
|---|---|
| CPU core count | Total CPU core count in the current node |
| Average CPU utilization rate | Real-time average utilization rate of all CPUs in the current node |
| Total memory | Total memory of the current node |
| Memory utilization rate | Real-time average utilization rate of all memory in the current node |
Utilization rate
| Monitoring items | Description |
|---|---|
| Average GPU card utilization rate | Real-time average utilization rate of all GPU cards in the current node. Average GPU card utilization rate = sum(utilization rate of all GPU cards) / total GPU cards |
| GPU card utilization rate | Real-time utilization rate of all GPU cards in the current node |
| Average memory utilization rate of GPU card | Real-time average memory utilization rate of all GPU cards in the current node. Average memory utilization rate = sum(memory utilization rate of all GPU cards) / total GPU cards |
| GPU card memory utilization rate | Real-time memory utilization rate of all GPU cards in the current node |
| CPU utilization rate | Real-time utilization rate of all CPUs in the current node |
| Memory utilization rate | Real-time utilization rate of all memory in the current node |

List of running GPU-occupied Pods
| Monitoring items | Description |
|---|---|
| Name of workload | Name of the workload that each GPU-occupied Pod running on the current node belongs to |
| Type | Task type of each GPU-occupied Pod running on the current node |
| Namespace | Namespace of each GPU-occupied Pod running on the current node |
| Pod name | Name of each GPU-occupied Pod running on the current node |
| Allocated GPU cards | GPU cards allocated to each GPU-occupied Pod running on the current node |
| Average GPU utilization rate | Real-time average utilization rate of the GPU cards used by each GPU-occupied Pod running on the current node |
| Average GPU memory utilization rate | Real-time average memory utilization rate of the GPU cards used by each GPU-occupied Pod running on the current node |
| Memory usage | Memory usage of each GPU-occupied Pod running on the current node |
| CPU core count | CPU core count of each GPU-occupied Pod running on the current node |

