Cloud Container Engine

Description of CCE AI Job Scheduler

Component Introduction

The task scheduling component schedules and manages various AI tasks. Combined with the CCE Deep Learning Frameworks Operator, it lets you run deep learning model training directly on CCE.

Component Features

  • This component supports multiple scheduling policies and enhanced Job management capabilities.
  • The component supports two GPU scheduling policies: binpack and spread. Under the binpack policy, multiple Pods share the same GPU card, which suits scenarios where you need to improve GPU resource utilization. Under the spread policy, Pods are distributed across different GPU cards, which suits GPU high-availability scenarios.
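
The difference between the two policies can be illustrated with a small simulation. This is only a toy sketch, not the scheduler's actual algorithm: the function name place_pods, the fractional per-Pod GPU requests, and the tie-breaking rule are all assumptions made for illustration.

```python
from typing import Dict, List


def place_pods(requests: List[float], num_gpus: int, policy: str) -> Dict[int, List[int]]:
    """Assign each Pod (by index) to a GPU card index under a toy policy.

    Hypothetical model: each card has capacity 1.0 and each Pod requests
    a fraction of one card. This does not reflect CCE internals.
    """
    if policy not in ("binpack", "spread"):
        raise ValueError(f"unknown policy: {policy}")
    used = [0.0] * num_gpus  # fraction of each card already allocated
    placement: Dict[int, List[int]] = {g: [] for g in range(num_gpus)}
    for pod, need in enumerate(requests):
        # Cards that still have room for this Pod (small tolerance for floats).
        candidates = [g for g in range(num_gpus) if used[g] + need <= 1.0 + 1e-9]
        if not candidates:
            raise RuntimeError(f"pod {pod} does not fit on any card")
        if policy == "binpack":
            # Prefer the fullest card that still fits; ties go to the lowest index.
            target = max(candidates, key=lambda g: (used[g], -g))
        else:
            # Prefer the emptiest card; ties go to the lowest index.
            target = min(candidates, key=lambda g: (used[g], g))
        used[target] += need
        placement[target].append(pod)
    return placement


requests = [0.25, 0.25, 0.25, 0.25]  # four Pods, each asking for a quarter of a card
binpack = place_pods(requests, num_gpus=4, policy="binpack")
spread = place_pods(requests, num_gpus=4, policy="spread")
print("binpack:", binpack)
print("spread: ", spread)
```

In this sketch, binpack places all four Pods on card 0 and leaves the other three cards free for larger jobs, while spread places one Pod on each card so that the loss of a single card affects only one Pod.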

Application Scenarios

You can directly run a deep learning task on a CCE cluster, improving AI engineering efficiency.

Restriction Description

  • The component is available only on Kubernetes v1.18 clusters.

Install Component

  1. Log in to the Baidu AI Cloud official website and enter the management console.
  2. Select “Product Service > Cloud Native CCE”, and click CCE to enter the Cloud Container Engine console.
  3. Click Cluster Management > Cluster List in the navigation bar on the left.
  4. On the cluster list page, click the target cluster name to enter the cluster management page.
  5. On the cluster management page, click Component Management.
  6. Select the CCE AI Job Scheduler component in the component management list and click Install.
  7. Complete the deep learning framework configuration on the component configuration page.

  • Resource scheduling: two resource scheduling policies are supported, binpack and spread. Under the binpack policy, multiple Pods share the same GPU card. Under the spread policy, Pods are distributed across different GPU cards.
  8. Click the “Install” button to complete the component installation.