          Description of CCE GPU Manager

          Component Introduction

          CCE GPU Manager is a set of GPU device plugins that, together with the supporting scheduler, provides GPU resource scheduling for complex scenarios.

          Component Features

          • Topology allocation: allocates GPUs according to the GPU topology. When a Pod requests more than one GPU card, the system automatically selects the GPU devices with the fastest topological connection between them.
          • GPU sharing: enables GPU memory sharing for the GPU devices on a node, so that a single GPU card can be allocated to multiple Pods according to the GPU memory size each Pod requests.
          • Isolation of GPU memory and computing power: when multiple Pods share a single GPU card, each Pod's GPU memory and computing power are isolated from those of the other Pods.
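
The two allocation ideas above can be illustrated with a small Python sketch. It is not the component's actual algorithm: the link scores, the function names `pick_gpus` and `place_pods`, and the first-fit strategy are all assumptions chosen for illustration. `pick_gpus` selects the group of free GPUs whose slowest pairwise link is the fastest, and `place_pods` packs Pods onto shared cards by requested memory.

```python
from itertools import combinations

# Hypothetical link scores between GPU pairs (higher = faster connection).
# These values are illustrative assumptions, not measurements.
LINK_SCORE = {
    frozenset({0, 1}): 3,  # e.g. a direct high-speed link
    frozenset({2, 3}): 3,
    frozenset({0, 2}): 2,  # e.g. behind the same PCIe switch
    frozenset({1, 3}): 2,
    frozenset({0, 3}): 1,  # e.g. across CPU sockets
    frozenset({1, 2}): 1,
}


def pick_gpus(free_gpus, count):
    """Return the group of `count` free GPUs whose slowest pairwise link is fastest."""
    def slowest_link(group):
        # A single-GPU group has no pairs, so fall back to a score of 0.
        return min(
            (LINK_SCORE[frozenset(pair)] for pair in combinations(group, 2)),
            default=0,
        )

    candidates = list(combinations(sorted(free_gpus), count))
    if not candidates:
        raise ValueError("not enough free GPUs")
    return max(candidates, key=slowest_link)


def place_pods(gpu_memory_gib, requests_gib):
    """First-fit placement of Pod memory requests (GiB) onto shared GPU cards.

    Returns a dict mapping Pod name to GPU index; Pods that fit nowhere are omitted.
    """
    free = list(gpu_memory_gib)  # remaining memory per card
    placement = {}
    for pod, need in requests_gib.items():
        for index, available in enumerate(free):
            if need <= available:
                free[index] -= need
                placement[pod] = index
                break
    return placement
```

With these assumed scores, `pick_gpus({0, 2, 3}, 2)` returns `(2, 3)`, the pair with the fastest link among the free cards. `place_pods([16, 16], {"a": 10, "b": 6, "c": 8, "d": 12})` places Pods `a` and `b` on card 0 and `c` on card 1, and leaves `d` unplaced because no card has 12 GiB free.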

          Application Scenarios

          When you run GPU applications in a CCE cluster, this component avoids the resource waste caused by workloads occupying a whole GPU card exclusively in AI training scenarios, which improves resource utilization and reduces cost.

          Restrictions

          • Only Kubernetes clusters of version v1.18 are supported.
          • At present, this component relies on the CCE AI Job Scheduler. Install the component together with the CCE AI Job Scheduler; otherwise, the component's features may be unavailable.
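
Before installing, the version restriction can be checked with a short script. The following is a minimal sketch under two assumptions: the cluster version is available as a string in the form Kubernetes reports it (for example `v1.18.9`), and "v1.18" covers every patch release in the 1.18 series. The helper name `is_supported` is hypothetical.

```python
import re

# Supported (major, minor) version, taken from the restriction above.
SUPPORTED_MINOR = (1, 18)


def is_supported(git_version):
    """Return True if a version string such as 'v1.18.9' is in the v1.18 series."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {git_version!r}")
    return (int(match.group(1)), int(match.group(2))) == SUPPORTED_MINOR
```

For example, `is_supported("v1.18.9")` is true and `is_supported("v1.20.8")` is false.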

          Install Component

          1. Log in to the Baidu AI Cloud official website and enter the management console.
          2. Select “Product Service > Cloud Native > CCE”, and click CCE to enter the Cloud Container Engine management console.
          3. In the navigation pane on the left, click Cluster Management > Cluster List.
          4. On the cluster list page, click the target cluster name to enter the cluster management page.
          5. On the cluster management page, click Component Management.
          6. Select the CCE GPU Manager component in the component management list and click “Install”.
          7. In the confirmation dialog box, click “OK” to complete the installation.