Overview of Cloud Native AI

Cloud Native AI is built on Baidu AI Cloud Container Engine (CCE) and supports sharing and isolation of GPU memory and computing power. It also integrates mainstream deep learning frameworks such as PaddlePaddle, TensorFlow, and PyTorch. By orchestrating and managing AI tasks, Cloud Native AI provides low-threshold, high-efficiency training services that help enterprise customers improve GPU resource utilization, speed up AI training, and reduce costs.

This feature is currently in open beta. You need to apply for the beta before using it.

Operating Process

Step 1 (required): Create a v1.18 cluster and add a node with a GPU device;
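As a quick sanity check for this step, you can list the nodes that expose the standard `nvidia.com/gpu` extended resource with the official Kubernetes Python client. This is a minimal sketch that assumes your kubeconfig already points at the CCE cluster.

```python
# List cluster nodes that expose the nvidia.com/gpu extended resource.
# Assumes kubeconfig is already configured for the CCE cluster.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    allocatable = node.status.allocatable or {}
    gpus = allocatable.get("nvidia.com/gpu")
    if gpus:
        print(f"{node.metadata.name}: {gpus} x nvidia.com/gpu, "
              f"kubelet {node.status.node_info.kubelet_version}")
```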

Step 2 (required): Install the Cloud Native AI component. For details, see Component Overview;

Step 3 (optional): Enable GPU memory sharing on the GPU nodes;
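Enabling sharing is done through the CCE console. If your environment instead toggles it per node, a label patch like the sketch below could apply; note that the label key `cce.baidubce.com/gpu-share` is a hypothetical placeholder, not a documented CCE API, so substitute whatever key your component version specifies.

```python
# Hypothetical sketch: enable GPU memory sharing by labeling a node.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# The label key below is an assumption for illustration only; check the
# Cloud Native AI component documentation for the actual switch.
body = {"metadata": {"labels": {"cce.baidubce.com/gpu-share": "true"}}}
v1.patch_node("gpu-node-01", body)  # node name is illustrative
```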

Step 4 (required): Create a queue, specify resource quotas, and associate users. For details, see Create Queue;
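Queues are created from the console as described in Create Queue. For readers scripting against the API, many cloud-native AI stacks expose a Volcano-style `Queue` custom resource; the sketch below assumes such a CRD (`scheduling.volcano.sh/v1beta1`) is installed by the component, which may differ from the resource CCE actually uses.

```python
# Hedged sketch: create a resource queue via a Volcano-style Queue CRD.
# Assumes the scheduling.volcano.sh/v1beta1 CRDs are installed; the
# actual group/version on CCE may differ.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

queue = {
    "apiVersion": "scheduling.volcano.sh/v1beta1",
    "kind": "Queue",
    "metadata": {"name": "train-queue"},  # illustrative name
    "spec": {
        "weight": 1,
        "capability": {  # resource quota for the queue
            "cpu": "16",
            "memory": "64Gi",
            "nvidia.com/gpu": "2",
        },
    },
}

# Queue objects are cluster-scoped in Volcano.
api.create_cluster_custom_object(
    group="scheduling.volcano.sh",
    version="v1beta1",
    plural="queues",
    body=queue,
)
```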

Step 5 (required): Create a task and submit an AI training task. For details, see Create Task.
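Tasks are normally created and submitted from the console as described in Create Task. As a hedged illustration of what a scripted submission could look like, the sketch below posts a one-replica training job to the queue from Step 4, assuming a Volcano-style Job CRD (`batch.volcano.sh/v1alpha1`); the image, command, and names are placeholders, and the CRD group may differ on CCE.

```python
# Hedged sketch: submit a one-replica GPU training task to a queue,
# assuming a Volcano-style Job CRD (batch.volcano.sh/v1alpha1).
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

job = {
    "apiVersion": "batch.volcano.sh/v1alpha1",
    "kind": "Job",
    "metadata": {"name": "paddle-train-demo"},  # illustrative name
    "spec": {
        "minAvailable": 1,
        "schedulerName": "volcano",
        "queue": "train-queue",  # queue created in Step 4
        "tasks": [{
            "replicas": 1,
            "name": "trainer",
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "trainer",
                        "image": "paddlepaddle/paddle:latest",  # placeholder image
                        "command": ["python", "train.py"],      # placeholder command
                        "resources": {"limits": {"nvidia.com/gpu": "1"}},
                    }],
                },
            },
        }],
    },
}

api.create_namespaced_custom_object(
    group="batch.volcano.sh",
    version="v1alpha1",
    namespace="default",
    plural="jobs",
    body=job,
)
```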

GPU Support List

At present, the following GPU models support sharing and isolation of GPU memory and computing power:

| Tesla series |
| --- |
| Tesla V100-SXM2-16GB |
| Tesla V100-SXM2-32GB |
| Tesla T4 |
