Create PaddlePaddle Task
Updated at: 2025-10-27
You can create a new task specifically of the PaddlePaddle type.
Prerequisites
- You have installed the CCE AI Job Scheduler and CCE Deep Learning Frameworks Operator components; without them, the cloud-native AI features are unavailable.
- As an IAM user, you can create tasks in a queue only if you are among the users associated with that queue.
- Installing the CCE Deep Learning Frameworks Operator component will also install the PaddlePaddle deep learning framework.
Limitations
- PaddlePaddle tasks do not currently support GPU memory sharing.
Operation steps
- Sign in to the Baidu AI Cloud official website and enter the management console.
- Go to Product Services - Cloud Native - Cloud Container Engine (CCE) to access the CCE management console.
- Click Cluster Management - Cluster List in the left navigation pane.
- Click on the target cluster name in the Cluster List page to navigate to the cluster management page.
- On the Cluster Management page, click Cloud-Native AI - Task Management.
- Click Create Task on the Task Management page.
- On the Create Task page, configure basic task information:

- Task name: Specify a custom name using lowercase letters, numbers, “-”, or “.”. The name must start and end with a lowercase letter or number and be 1-65 characters long.
- Namespace: Choose the namespace for the new task.
- Queue: Choose the corresponding queue for the new task.
- Task priority: Set the priority level for the task.
- Allow overcommitment: Enable this option to allow overcommitment through task preemption. This requires the CCE AI Job Scheduler component at version 1.4.0 or later.
- Tolerance for delay: When enabled, the system preferentially schedules the task or workload to fragmented cluster resources to improve resource utilization, which may affect business latency.
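The task-name rule above can be expressed as a regular expression; a minimal sketch (the helper function name is illustrative, not part of the product):

```python
import re

# Sketch: validate a task name against the console's stated rule:
# lowercase letters, digits, "-" or ".", must start and end with a
# lowercase letter or digit, 1-65 characters total.
NAME_RE = re.compile(r"^[a-z0-9]([a-z0-9.-]{0,63}[a-z0-9])?$")

def is_valid_task_name(name: str) -> bool:
    return bool(NAME_RE.match(name))
```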
- Configure basic code information:

- Code configuration type: Specify the code configuration method. Current options include “BOS File,” “Local File Upload,” and “Not Configured Temporarily.”
- Execution command: Define the command to execute the code.
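The execution command is ultimately rendered as the container's `command` and `args`; a minimal sketch of a distributed-training command (`train.py` is a placeholder for your own entry script):

```yaml
# Sketch: how an execution command maps onto the container spec.
# "train.py" is a placeholder script name.
command:
  - python
args:
  - "-m"
  - "paddle.distributed.launch"
  - "train.py"
```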
- Configure data-related information:

- Set Data Source: Both datasets and persistent volume claims (PVCs) are supported. For datasets, all available datasets are listed, and selecting one automatically selects the PVC of the same name. For PVCs, select the desired PVC directly.
- Click "Next" to proceed to container-related configurations.
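When a task is defined in YAML instead of the console, the selected PVC appears as an ordinary Kubernetes volume in the pod template; a minimal sketch, assuming a PVC named `my-dataset` mounted at `/mnt/data` (both placeholders):

```yaml
# Sketch: mounting a dataset PVC into the training container.
# "my-dataset" and "/mnt/data" are placeholder names.
spec:
  containers:
    - name: trainer
      volumeMounts:
        - name: training-data
          mountPath: /mnt/data
  volumes:
    - name: training-data
      persistentVolumeClaim:
        claimName: my-dataset
```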
- Configure task type information:

- Select Framework: Choose PaddleJob.
- Training method: Select either Single-Machine or Distributed training.
- Select Role: For "Single-machine" training, only "Worker" can be chosen. For "Distributed" training, the "PS" role can also be selected.
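In YAML terms, the training method and roles correspond to the role blocks of the PaddleJob spec; a minimal sketch of a distributed job with both roles (each role block also needs a pod template, omitted here):

```yaml
# Sketch: a distributed PaddleJob with both PS and Worker roles.
# Single-machine training would define only the worker block.
spec:
  ps:
    replicas: 2
  worker:
    replicas: 2
```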
- Configure pod information (advanced settings are optional).

- Specify the desired number of pods for the role.
- Define the restart policy for the pod. Options: “Restart on Failure” or “Never Restart”.
- Provide the address for pulling the container image. Alternatively, click Select Image to choose the desired image.
- Enter the image version. If left unspecified, the latest version will be used by default.
- Set the CPU, memory, and GPU resource requirements for the container.
- Environment Variables: Enter the variable names and their corresponding values.
- Lifecycle: Includes start commands, parameters, actions after startup, and actions before stopping, all of which can be customized as needed.
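The pod settings above map onto standard fields of the pod template; a minimal sketch (the image address, environment variable, and GPU resource name are placeholders to verify against your cluster):

```yaml
# Sketch: container-level settings from the steps above.
spec:
  restartPolicy: OnFailure            # "Restart on Failure"; use Never for "Never Restart"
  containers:
    - name: trainer
      image: registry.example.com/paddle/train:latest   # placeholder image address
      env:
        - name: TRAIN_EPOCHS          # hypothetical environment variable
          value: "10"
      resources:
        requests:
          cpu: 2
          memory: 4Gi
        limits:
          nvidia.com/gpu: "1"         # GPU resource name depends on your cluster
```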
- Configure the advanced task settings.

- Set the maximum allowable training duration (leave blank for unlimited duration).
- Add credentials to access the private image registry if using a private image.
- Tensorboard: If task visualization is required, enable the Tensorboard function. After enabling it, specify the “Service Type” and “Training Log Reading Path”.
- Assign K8s labels to the task.
- Provide annotations for the task.
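Several of these advanced settings correspond to standard Kubernetes fields, some on the task's metadata and some in the pod template's spec; a minimal sketch (the label, annotation, and secret names are placeholders):

```yaml
# Sketch: advanced settings expressed with standard Kubernetes fields.
metadata:                             # labels/annotations on the task itself
  labels:
    app: paddle-train                 # placeholder K8s label
  annotations:
    owner: ml-team                    # placeholder annotation
spec:                                 # fields of the pod template's spec
  activeDeadlineSeconds: 86400        # cap training at 24 h; omit for no limit
  imagePullSecrets:
    - name: my-registry-secret        # placeholder credential for a private registry
```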
- Click the Finish button to finalize task creation.
Example of creating a task with YAML
```yaml
apiVersion: batch.paddlepaddle.org/v1
kind: PaddleJob
metadata:
  name: resnet
spec:
  cleanPodPolicy: Never
  worker:
    replicas: 2
    template:
      spec:
        schedulerName: volcano
        containers:
          - name: resnet
            image: registry.baidubce.com/cce-public/kubeflow/paddle-operator/demo-resnet:v1
            env:
              # for gpu memory over request, set 0 to disable
              - name: CGPU_MEM_ALLOCATOR_TYPE
                value: "1"
            command:
              - python
            args:
              - "-m"
              - "paddle.distributed.launch"
              - "train_fleet.py"
            volumeMounts:
              - mountPath: /dev/shm
                name: dshm
            resources:
              requests:
                cpu: 1
                memory: 2Gi
              limits:
                baidu.com/v100_16g_cgpu: "1"
        volumes:
          - name: dshm
            emptyDir:
              medium: Memory
```
