Cloud-Native AI Overview
Overview
Cloud-native AI is built on Baidu AI Cloud Container Engine (CCE) and provides sharing and isolation of GPU memory and compute resources. It integrates popular deep learning frameworks such as PaddlePaddle, TensorFlow, and PyTorch and, through efficient task orchestration and management, gives enterprises a low-barrier deep learning training service. The result is higher GPU utilization, faster AI training, and lower cost with better performance.
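As a minimal sketch of what the shared-GPU model looks like from the workload side, the snippet below creates a Pod that requests a slice of a shared GPU through the Kubernetes Python client. The extended-resource names `baidu.com/cgpu` and `baidu.com/cgpu_memory`, the image, and the quantities are placeholders rather than confirmed CCE identifiers; the actual resource names are defined by the GPU-sharing component installed in step 2 below.

```python
# Sketch: a Pod requesting a slice of a shared GPU (placeholder resource names).
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="shared-gpu-demo"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="trainer",
                image="registry.example.com/trainer:latest",  # hypothetical image
                resources=client.V1ResourceRequirements(
                    limits={
                        # Placeholder extended resources; substitute the names
                        # exposed by the CCE GPU-sharing device plugin.
                        "baidu.com/cgpu": "1",
                        "baidu.com/cgpu_memory": "8",  # GiB of GPU memory
                    }
                ),
            )
        ]
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```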
Usage process
Step 1 (mandatory): Create a cluster of v1.18 or later and add nodes with GPU devices;
Step 2 (mandatory): Install the cloud-native AI components. For details, see [Component Overview](CCE/Operation guide/Component Management/Component Overview.md);
Step 3 (optional): Enable GPU memory sharing on GPU nodes;
Step 4 (mandatory): Create a queue, specify its resource quota, and associate users. For details, see [Create a New Queue](CCE/Operation guide/Cloud-native AI/Queue Management/Create Queue.md);
Step 5 (mandatory): Create and submit an AI training task. For details, see [Create a New Task](CCE/Operation guide/Cloud-native AI/Task Management/Create TensorFlow Task.md). A programmatic sketch of steps 4 and 5 follows this list.
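Steps 4 and 5 are performed in the console, but the underlying objects can also be created programmatically. The sketch below assumes a Volcano-style cluster-scoped `Queue` custom resource and a Kubeflow `TFJob` for the training task; CCE's actual CRD groups, schemas, and queue-binding fields may differ, so treat the names here as assumptions rather than the documented API.

```python
# Sketch: create a resource queue, then submit a TensorFlow training task to it.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

# Step 4 sketch: a queue with a resource quota (assumes a Volcano-style,
# cluster-scoped Queue CRD).
queue = {
    "apiVersion": "scheduling.volcano.sh/v1beta1",
    "kind": "Queue",
    "metadata": {"name": "team-a"},
    "spec": {
        "weight": 1,
        "capability": {"cpu": "32", "memory": "128Gi", "nvidia.com/gpu": "4"},
    },
}
api.create_cluster_custom_object(
    group="scheduling.volcano.sh", version="v1beta1", plural="queues", body=queue
)

# Step 5 sketch: a TensorFlow training task (assumes the Kubeflow TFJob CRD;
# how a job is bound to a queue depends on the installed scheduler integration).
tfjob = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "TFJob",
    "metadata": {"name": "mnist-demo", "namespace": "default"},
    "spec": {
        "tfReplicaSpecs": {
            "Worker": {
                "replicas": 2,
                "template": {
                    "spec": {
                        "containers": [
                            {
                                "name": "tensorflow",  # TFJob requires this name
                                "image": "tensorflow/tensorflow:2.11.0-gpu",
                                "command": ["python", "/train.py"],  # hypothetical script
                                "resources": {"limits": {"nvidia.com/gpu": "1"}},
                            }
                        ]
                    }
                },
            }
        }
    },
}
api.create_namespaced_custom_object(
    group="kubeflow.org", version="v1", namespace="default",
    plural="tfjobs", body=tfjob,
)
```

Whether a submitted task is admitted depends on the queue's remaining quota; the two-worker job above would consume two of the four GPUs granted to `team-a` in this sketch.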
GPU/NPU support list
Sharing and isolation of GPU memory and compute are currently supported on the following GPU/NPU models (the list is not exhaustive). You can submit a ticket to ask about other models:
| GPU/NPU card model |
|---|
| NVIDIA V100 16GB SXM2 |
| NVIDIA V100 32GB SXM2 |
| NVIDIA T4 |
| NVIDIA A100 80GB SXM |
| NVIDIA A100 40GB SXM |
| NVIDIA A800 80GB |
| NVIDIA A30 |
| NVIDIA A10 |
| Kunlun Chip R200 |
