Description of CCE AI Job Scheduler
Last Updated: 2022-01-14
Component Introduction
The task scheduling component supports the scheduling and management of various AI tasks. Combined with the CCE Deep Learning Frameworks Operator, it lets you train deep learning models directly on CCE.
Component Features
- This component provides multiple scheduling policies and enhanced Job management capabilities.
- The component supports two scheduling policies: spread and binpack. With the binpack policy, multiple Pods share the same GPU card, which suits scenarios where you need to improve GPU resource utilization. With the spread policy, Pods are distributed across different GPU cards, which suits GPU high-availability scenarios.
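The difference between the two policies can be illustrated with a small simulation. The following is a minimal sketch and not the component's actual scheduling algorithm: the function name `place_pods`, the per-GPU capacity list, and the tie-breaking rule are assumptions made purely for illustration.

```python
from typing import Dict, List


def place_pods(num_pods: int, gpu_capacity: List[int], policy: str) -> Dict[int, List[int]]:
    """Assign Pod indices to GPU indices under a toy binpack or spread policy.

    gpu_capacity[i] is a hypothetical limit on how many Pods GPU i can host.
    """
    if policy not in ("binpack", "spread"):
        raise ValueError(f"unknown policy: {policy}")

    placement: Dict[int, List[int]] = {gpu: [] for gpu in range(len(gpu_capacity))}
    for pod in range(num_pods):
        # Only GPUs that still have room are candidates.
        candidates = [g for g in placement if len(placement[g]) < gpu_capacity[g]]
        if not candidates:
            raise RuntimeError("no GPU has free capacity")
        if policy == "binpack":
            # Prefer the most loaded GPU with room left (ties: lowest index),
            # so Pods are packed onto as few cards as possible.
            chosen = max(candidates, key=lambda g: (len(placement[g]), -g))
        else:
            # Prefer the least loaded GPU (ties: lowest index),
            # so Pods are distributed across different cards.
            chosen = min(candidates, key=lambda g: (len(placement[g]), g))
        placement[chosen].append(pod)
    return placement


if __name__ == "__main__":
    print(place_pods(4, [2, 2, 2, 2], "binpack"))
    print(place_pods(4, [2, 2, 2, 2], "spread"))
```

In this toy model, four Pods on four GPUs end up on two cards under binpack and on four cards under spread, which mirrors the utilization versus availability trade-off described above.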
Application Scenarios
You can run deep learning tasks directly on a CCE cluster, which improves AI engineering efficiency.
Restriction Description
- Only Kubernetes clusters running version v1.18 are supported.
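Before installing, you may want to confirm that the cluster version satisfies this restriction. The following is a minimal sketch, assuming the version is available as a string such as `v1.18.9`; the helper name `is_supported_version` is hypothetical and is not part of any CCE tooling.

```python
import re

# The only Kubernetes minor version this document lists as supported.
SUPPORTED_MINOR = (1, 18)


def is_supported_version(version: str) -> bool:
    """Return True if a version string such as 'v1.18.9' belongs to the v1.18 series."""
    match = re.match(r"^v?(\d+)\.(\d+)", version.strip())
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) == SUPPORTED_MINOR


if __name__ == "__main__":
    for candidate in ("v1.18.9", "v1.20.8"):
        print(candidate, is_supported_version(candidate))
```

Comparing the parsed major and minor numbers, rather than checking a string prefix, avoids treating a version such as `v1.180.0` as a match.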
Install Component
- Log in to the Baidu AI Cloud official website, and then enter the management console.
- Select “Product Service > Cloud Native CCE”, and click CCE to enter the Cloud Container Engine console.
- Click Cluster Management > Cluster List in the left navigation bar.
- On the cluster list page, click the target cluster name to enter the cluster management page.
- On the cluster management page, click Component Management.
- Select the CCE AI Job Scheduler component in the component management list and click Install.
- Complete the deep learning framework configuration on the component configuration page.
- Resource scheduling: two resource scheduling policies are supported, binpack and spread. With the binpack policy, multiple Pods share the same GPU card. With the spread policy, Pods are distributed across different GPU cards.
- Click the “Install” button to complete the component installation.