Overview of Cloud Native AI

Cloud Native AI is built on Baidu AI Cloud Container Engine (CCE) and supports sharing and isolation of GPU memory and computing power. It also integrates mainstream deep learning frameworks such as PaddlePaddle, TensorFlow, and PyTorch. By orchestrating and managing AI tasks, Cloud Native AI provides low-threshold, high-efficiency training services that help enterprise customers improve GPU resource utilization, speed up AI training, and reduce costs.

This feature is currently in open beta. You need to apply for the beta before using it.

Operating Process

Step 1 (required): Create a v1.18 cluster and add a node with a GPU device;
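As a quick sanity check for this step, you can list the nodes that expose the standard `nvidia.com/gpu` extended resource with the official Kubernetes Python client. This is a minimal sketch that assumes your kubeconfig already points at the CCE cluster.

```python
# List cluster nodes that expose the nvidia.com/gpu extended resource.
# Assumes kubeconfig is already configured for the CCE cluster.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    allocatable = node.status.allocatable or {}
    gpus = allocatable.get("nvidia.com/gpu")
    if gpus:
        print(f"{node.metadata.name}: {gpus} x nvidia.com/gpu, "
              f"kubelet {node.status.node_info.kubelet_version}")
```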

Step 2 (required): Install the Cloud Native AI component. For details, see Component Overview;

Step 3 (optional): Enable GPU memory sharing on the GPU nodes;
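Enabling sharing is done through the CCE console. If your environment instead toggles it per node, a label patch like the sketch below could apply; note that the label key `cce.baidubce.com/gpu-share` is a hypothetical placeholder, not a documented CCE API, so substitute whatever key your component version specifies.

```python
# Hypothetical sketch: enable GPU memory sharing by labeling a node.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# The label key below is an assumption for illustration only; check the
# Cloud Native AI component documentation for the actual switch.
body = {"metadata": {"labels": {"cce.baidubce.com/gpu-share": "true"}}}
v1.patch_node("gpu-node-01", body)  # node name is illustrative
```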

Step 4 (required): Create a queue, specify resource quotas, and associate users. For details, see Create Queue;
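Queues are created from the console as described in Create Queue. For readers scripting against the API, many cloud-native AI stacks expose a Volcano-style `Queue` custom resource; the sketch below assumes such a CRD (`scheduling.volcano.sh/v1beta1`) is installed by the component, which may differ from the resource CCE actually uses.

```python
# Hedged sketch: create a resource queue via a Volcano-style Queue CRD.
# Assumes the scheduling.volcano.sh/v1beta1 CRDs are installed; the
# actual group/version on CCE may differ.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

queue = {
    "apiVersion": "scheduling.volcano.sh/v1beta1",
    "kind": "Queue",
    "metadata": {"name": "train-queue"},  # illustrative name
    "spec": {
        "weight": 1,
        "capability": {  # resource quota for the queue
            "cpu": "16",
            "memory": "64Gi",
            "nvidia.com/gpu": "2",
        },
    },
}

# Queue objects are cluster-scoped in Volcano.
api.create_cluster_custom_object(
    group="scheduling.volcano.sh",
    version="v1beta1",
    plural="queues",
    body=queue,
)
```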

Step 5 (required): Create a task and submit an AI training task. For details, see Create Task.
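Tasks are normally created and submitted from the console as described in Create Task. As a hedged illustration of what a scripted submission could look like, the sketch below posts a one-replica training job to the queue from Step 4, assuming a Volcano-style Job CRD (`batch.volcano.sh/v1alpha1`); the image, command, and names are placeholders, and the CRD group may differ on CCE.

```python
# Hedged sketch: submit a one-replica GPU training task to a queue,
# assuming a Volcano-style Job CRD (batch.volcano.sh/v1alpha1).
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

job = {
    "apiVersion": "batch.volcano.sh/v1alpha1",
    "kind": "Job",
    "metadata": {"name": "paddle-train-demo"},  # illustrative name
    "spec": {
        "minAvailable": 1,
        "schedulerName": "volcano",
        "queue": "train-queue",  # queue created in Step 4
        "tasks": [{
            "replicas": 1,
            "name": "trainer",
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "trainer",
                        "image": "paddlepaddle/paddle:latest",  # placeholder image
                        "command": ["python", "train.py"],      # placeholder command
                        "resources": {"limits": {"nvidia.com/gpu": "1"}},
                    }],
                },
            },
        }],
    },
}

api.create_namespaced_custom_object(
    group="batch.volcano.sh",
    version="v1alpha1",
    namespace="default",
    plural="jobs",
    body=job,
)
```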

GPU Support List

At present, the following GPU models support sharing and isolation of GPU memory and computing power:

| Tesla series |
| --- |
| Tesla V100-SXM2-16GB |
| Tesla V100-SXM2-32GB |
| Tesla T4 |
