NodeGroup Management

Updated at: 2025-10-27

Overview

This document describes how to create, view, manage, and delete node groups in a cluster via the Cloud Container Engine (CCE) console. For related concepts and usage limitations, refer to NodeGroup Introduction.

Create node group

  1. Sign in to the Baidu AI Cloud Cloud Container Engine (CCE) Console, click Cluster Management > Cluster List in the left navigation bar to enter the Cluster List page, and then click the cluster name to enter the Cluster Management page.
  2. Click Node Group in the left-hand navigation bar to access the Node Group List page.
  3. Click the Create Node Group button or the Create Now link to navigate to the Node Group Creation page.
  4. Fill in the basic node group configuration. The configuration items and descriptions are as follows:
  • Node group information
    • Name of node group: A custom name that can contain uppercase and lowercase letters, numbers, Chinese characters, and the special characters -_/. It must start with a letter and be 1 to 65 characters long.
    • VPC network: Defaults to the VPC network of the cluster and cannot be changed.
    • Worker security group: Supports "Use Default Security Group" and "Use Custom Security Group". With the default option, the cluster's current security group is bound; if it is unsuitable, you can specify a custom security group. After the instance is created, add or modify access rules based on your actual access requirements. Refer to CCE Default/Additional Security Group Description.

  • Node configuration: Subsequent node group scaling will use this configuration as the template for creating nodes.
    • Node type: Choose as needed. Currently supported options include Baidu Cloud Compute (BCC), Elastic Bare Metal Compute (EBC), and Baidu Bare Metal Compute (BBC).
    • Bill type: Choose as needed. Currently supported payment methods are postpay, prepay (subscription), and spot instances.
      • Subscription (prepay) requires upfront payment for the selected duration and offers lower costs than the postpaid mode.
      • The postpaid mode charges based on actual instance usage, with no upfront payment required but an account balance of at least RMB 100. It is usually more expensive than the prepaid mode.
      • Spot instance is a bidding mode: you bid within a defined range, and if the market price for the chosen specification drops below your bid and resource inventory is sufficient, the spot instance is created and billed at the current market rate.
    • Availability zone: An availability zone (AZ) is a physical area within a region with independent power and network infrastructure, so faults are isolated to a single AZ. It is used to filter the subnets available within a specific zone.
    • Node subnet: Choose the subnet that assigns IP addresses to nodes. The available subnets differ across availability zones.
    • Instance configuration: Based on different CPU-to-memory ratios, Baidu Cloud Compute offers various instance families. For specifications and applicable scenarios, refer to Specification.
    • Image type and OS: Choose the image type and operating system according to your actual requirements.
      • Public image: Officially provided by Baidu AI Cloud; includes only the basic operating system environment.
      • Custom image: Generated via the Custom Image function; includes the base OS, applications, and personalized configuration of the system disk. Custom images help you quickly create Baidu Cloud Compute instances with personalized configuration.
      • Shared image: Allows custom images to be shared between users. Shared users can find the shared images through the management console or API and use them to create new instances or reinstall the operating system.
    • System disk: Used for OS installation. Non-heterogeneous computing instances default to 20 GB for Linux and 40 GB for Windows; heterogeneous computing instances default to 40 GB regardless of the operating system. Available cloud disk types depend on the region and specifications and are displayed on the interface.
    • Data disk: A mounted data disk used to increase the storage capacity of Baidu Cloud Compute; unselected by default. There is an upper limit on the number of cloud disks that can be mounted; to mount more than this limit, submit a ticket to contact us. Currently, both the system disk and data disks of Baidu Cloud Compute are cloud disks (CDS). For details on CDS disk types and usage limitations, refer to Disk Type and Usage Limitations.
    • Bind snapshot policy: Disabled by default. Snapshots let you back up and restore disk data and create disk images. For snapshot usage and limitations, see Snapshot Usage Instructions. Snapshots are currently a paid service; refer to Snapshot Charge Instructions.
    • Public IP address: To enable public network access, purchase an EIP or bind an existing EIP after the instance is created. Public network bandwidth can be purchased in the following ways:
      • For subscription billing, the bandwidth fee for the selected period must be paid upfront and is included in the instance payment when purchasing subscription-based Baidu Cloud Compute.
      • With postpay traffic billing, charges are based on actual data transfer volume without a usage cap, but a maximum peak bandwidth can be set.
      • Pay-by-bandwidth billing charges based on the fixed bandwidth value you choose, with a maximum purchasable bandwidth of 200 Mbps.
    • Instance name: Either customize the instance name or let the system generate it randomly.
    • Domain switch: If enabled, the hostname will include a domain suffix to support DNS resolution.
    • Administrator user name: For Windows systems the administrator account is "Administrator"; for Linux systems it is "root".
    • Administrator password: The available methods for setting passwords depend on the instance's operating system.
      • Custom: Create a personalized password for logging in to the instance.
      • Randomly generated: After purchase, log in to the console to reset the password. Refer to Reset Password.
      • Key pairs: For Linux, you can use key pairs to connect to Baidu Cloud Compute. SSH key pairs offer a more secure login method than passwords. See Key Pair Settings for details.
    • Count: The number you enter is the initially desired node count. The limits are as follows:
      • A single node group can accommodate up to 1,000 nodes.
      • You can operate on a maximum of 200 nodes at a time.
      • The scaling limit depends on the total number of remaining IPs across the subnets in the node group and on the model inventory.
    • Deployment group: Baidu Cloud Compute instances created in a designated deployment group are distributed across different physical servers from the other instances in the same deployment group, ensuring service availability during hardware failures. For specific settings, refer to Deployment Group. An instance can be added to a maximum of 2 deployment groups.
    • Auto scaling: When enabled, the system automatically adds capacity based on the scaling conditions, node configuration, and auto scaling settings, calculates costs, and generates orders automatically. After scaling, you can review node and order details manually.
    • Failure detection and self-healing: Supports node failure detection with customizable self-healing rules.

  • Advanced configuration
    • Scaling strategy:
      • Model configuration order: The node group scales according to the active/standby model sequence you set. If the active model cannot scale, a standby model is selected for scaling.
      • Uniform distribution across multiple subnets: Distribute node instances evenly across the specified availability zones (or subnets) in the scaling group. This strategy takes effect only when multiple subnets are configured.
    • Node GPU memory sharing: Unchecked by default. When checked, the GPU sharing function is enabled by default for newly added nodes. Memory sharing applies only to nodes with GPU devices; nodes without GPUs are ignored. For details, refer to GPU Exclusive and Shared Usage Instructions.
      • Note: Enabling node GPU memory sharing requires the GPU Manager and AI Job Scheduler components to be installed.
    • Kubelet data directory: Storage directory for volume files, plugin files, and so on, such as /var/lib/kubelet. If a data disk is mounted, storing this data on the data disk is recommended.
    • Container data directory: Storage directory for containers and images, such as /home/cce/containerd. If a data disk is mounted, storing this data on the data disk is recommended.
    • Pre-deployment execution script: Runs automatically before node deployment. Make sure the script can be safely re-run and includes retry logic. The script content and generated logs are saved in the node's /usr/local/cce/scripts/ directory.
    • Post-deployment execution script: Runs automatically after node deployment. You must verify the script's execution status manually. The script content and generated logs are saved in the node's /usr/local/cce/scripts/ directory.
    • Custom kubelet parameters: Supports custom configuration of kubelet parameters. For details, refer to Custom Kubelet Parameters.
    • Block a node: Node blocking (cordoning) is disabled by default. When enabled, the node enters a non-schedulable state and new Pods will not be assigned to it. To uncordon a node, run the kubectl uncordon command (see the sketch after this list). Blocking nodes reduces the cluster's remaining schedulable resources and, if the remaining resources are insufficient, may affect the scheduling of future services and the performance of current ones.
    • Resource labels: Resource labels let you categorize cloud resources by various criteria (such as purpose, owner, or project). Each label consists of two parts: a key and a value. For specific settings, refer to Label Function. By default, labels are also added to resources associated with the instances, such as CDS and EIP, but this can be turned off.
    • IAM role: Set an IAM role for Baidu Cloud Compute instances. For details, refer to Set IAM Role.
    • Labels: K8S labels are identifiers for managing and selecting K8S objects and are automatically bound to the nodes created in a node group. Each label consists of two parts: a key and a value. For more information, refer to K8S Label Description.
    • Taints: Node taints work together with Pod tolerations. After taints are set on a node, Pods whose tolerations do not match the taints are not scheduled onto the node and may be evicted from it. For details, refer to Taints and Tolerance Description.
    • Annotations: Annotations attach non-identifying metadata to objects. Each annotation consists of two parts: a key and a value. For details, refer to Annotation Description.
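The node blocking option above corresponds to the Kubernetes cordon operation. The following is a minimal sketch of what cordon and uncordon do at the API level, using the official Kubernetes Python client; it assumes a kubeconfig set up as described in Connect to Cluster via kubectl, and the node name is hypothetical.

```python
# A minimal sketch (not the CCE console flow): cordon/uncordon a node by toggling
# spec.unschedulable, equivalent to `kubectl cordon/uncordon <node>`.
from kubernetes import client, config

config.load_kube_config()        # kubeconfig from "Connect to Cluster via kubectl"
v1 = client.CoreV1Api()

node_name = "192.168.0.10"       # hypothetical node name

# Cordon: mark the node unschedulable so that no new Pods are assigned to it.
v1.patch_node(node_name, {"spec": {"unschedulable": True}})

# Uncordon: make the node schedulable again.
v1.patch_node(node_name, {"spec": {"unschedulable": False}})
```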


  5. Click the Finish button to complete node group creation.

View node group

  1. After the node group is created, return to the node group list to view it.
  2. The node group list displays the following columns:
    • Name of node group/ID: The node group ID is the unique identifier of the node group and can be used to locate the group's nodes in the cluster node list.
    • Bill type: The billing method of the node group; postpay by default.
    • Instance configuration: The node configuration selected during node group creation, including instance type and specification.
    • Actual node count: The actual number of ready nodes. You can check node statuses and scaling progress in the node list (see the sketch after this list).
    • Desired node count: The node count specified during node group creation; the group maintains this desired number of available nodes.
    • Auto scaling range: Displayed when auto scaling is enabled; the desired node count is adjusted automatically within this range.
    • Failure detection and self-healing: If failure detection and self-healing are enabled, you can view the self-healing rules.
    • Creation time: The time when the node group was created.
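To cross-check the actual node count shown in the console, you can count Ready nodes through the Kubernetes API. The following is a minimal sketch using the official Kubernetes Python client; it assumes kubectl access is already configured. Filtering to a single node group would additionally require the node-group label that CCE applies to its nodes, which is not documented here.

```python
# A minimal sketch: count Ready nodes in the cluster via the Kubernetes API.
from kubernetes import client, config

config.load_kube_config()        # kubeconfig from "Connect to Cluster via kubectl"
v1 = client.CoreV1Api()

ready = 0
for node in v1.list_node().items:
    conditions = {c.type: c.status for c in node.status.conditions}
    is_ready = conditions.get("Ready") == "True"
    ready += int(is_ready)
    print(f"{node.metadata.name}: {'Ready' if is_ready else 'NotReady'}")
print(f"Ready nodes: {ready}")
```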

Configure auto scaling

Step 1: Enable auto scaling

When using the node group function for the first time, you must obtain the Authorization to Enable Auto Scaling before activating the auto scaling function. You can enable auto scaling by clicking Authorization to Enable Auto Scaling in the Global Configuration module of the node group list, or enable it when creating a node group for the first time.

Step 2: Global configuration

After obtaining the authorization to enable auto scaling, click Edit Configuration in the Global Configuration module of the node group list to enable auto scale-down and configure the scaling algorithm in the pop-up window. This configuration applies to all node groups in the cluster that have auto scaling enabled. The configuration items and descriptions are as follows:

  • Auto scale-down
    • Scale-down threshold: The cluster may automatically scale down when the resource utilization (CPU, GPU, memory) of the nodes in the scaling group falls below the defined threshold. Default range: 20–80 (see the sketch after this list).
    • Scale-down trigger latency: If node resource usage stays below the scale-down threshold for the specified trigger delay, the cluster may begin automatic scale-down. Default range: 1–60.
    • Maximum concurrent scale-down count: The number of nodes that can be scaled down simultaneously when node utilization drops to 0. Default range: 1–20.
    • Scale-down start interval after scale-up: After a scale-up, newly created nodes are assessed for scale-down only after this interval has passed. Default range: 1–60.
    • Do not scale down the following nodes: Nodes running Pods that use local storage, or Pods in the kube-system namespace that are not managed by a DaemonSet.
  • Scale-up algorithm: random, least-waste, most-pods, or priority. For details, see Scale-up Algorithm Introduction.
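The exact scale-down algorithm is managed by CCE. As a rough illustration of how a utilization threshold can be evaluated, the sketch below computes a node's CPU utilization as the ratio of Pod CPU requests to node allocatable CPU (the approach used by the open-source Kubernetes cluster autoscaler) and compares it with a threshold; the node name and threshold are hypothetical.

```python
# A minimal sketch, assuming utilization = sum of Pod CPU requests / node allocatable CPU.
from kubernetes import client, config

def cpu_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as '500m' or '2' to millicores."""
    return int(quantity[:-1]) if quantity.endswith("m") else int(float(quantity) * 1000)

def node_cpu_utilization(v1: client.CoreV1Api, node_name: str) -> float:
    node = v1.read_node(node_name)
    allocatable = cpu_millicores(node.status.allocatable["cpu"])
    pods = v1.list_pod_for_all_namespaces(
        field_selector=f"spec.nodeName={node_name},status.phase=Running")
    requested = 0
    for pod in pods.items:
        for c in pod.spec.containers:
            cpu_request = ((c.resources and c.resources.requests) or {}).get("cpu")
            if cpu_request:
                requested += cpu_millicores(cpu_request)
    return requested / allocatable

if __name__ == "__main__":
    config.load_kube_config()                # kubeconfig from "Connect to Cluster via kubectl"
    v1 = client.CoreV1Api()
    node, threshold = "192.168.0.10", 0.5    # hypothetical node name and 50% threshold
    utilization = node_cpu_utilization(v1, node)
    print(f"CPU requested/allocatable = {utilization:.0%}; "
          f"below scale-down threshold: {utilization < threshold}")
```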

Step 3: Auto scaling configuration

Once the authorization to enable auto scaling has been granted, you can enable or disable auto scaling for a node group when creating it or by editing it from the list page, and then set the relevant auto scaling policies. The following steps describe how to configure auto scaling for an existing node group.

  1. On the node group list page, locate the target node group and click More > Auto Scaling Configuration in the operation column.
  2. In the auto scaling settings pop-up window, enable auto scaling and define the scaling range and priority.
    • Scaling range: When auto scaling is enabled, the desired node count is adjusted automatically within this range. Enter the minimum and maximum desired node counts.
    • Scale-up priority: Node groups with auto scaling enabled scale up according to their priority. A higher priority value means greater precedence.


Adjust node count

Adjusting the node count manually means changing the desired node count of the node group to scale it up or down.

  1. On the node group list page, locate the target node group and click Adjust Node Count in the operation column.
  2. Enter the desired node count in the pop-up window, then click OK to adjust the expected node count of the node group.

Description:

  • If auto scaling is enabled, manual adjustments to the desired node count are not allowed while scaling conditions are met; CCE automatically manages the count within the scaling range.
  • To stop automatic adjustment of the desired node count, first disable auto scaling in the configuration, then modify the node count manually.
  • The desired node count is not the number of new nodes. For example, if a node group with 3 nodes is scaled up with a desired node count of 5, the system adds 2 additional nodes, not 5.

  • A single node group can accommodate up to 1,000 nodes.
  • A maximum of 500 nodes can be adjusted per operation.
  • The scaling limit depends on the total number of remaining IPs across the subnets in the node group and on the model inventory.


Edit node group advanced configuration

After a node group is created, CCE allows certain adjustments to the node group configuration through the console.

Description:

  • Editing the advanced configuration of a node group does not affect existing nodes or the services running in the node group.
  • After the node group configuration is updated, the modifications apply only to new nodes and do not change the configuration of existing nodes, except in explicitly stated scenarios (e.g., synchronized updates of labels, taints, or annotations for existing nodes).
  • After the node group configuration is updated, new nodes in the node group use the new configuration by default.
  • Update node group configurations using the steps below. If you have modified nodes through other means, those modifications will be overwritten during a node group upgrade.
  • If the option to synchronize updates of labels, taints, and annotations for existing nodes is checked, adding or modifying labels, taints, or annotations in the node group automatically applies to both new and existing nodes, and labels and taints modified on existing nodes are refreshed along with the node group configuration (a verification sketch follows the steps below).
  • If the option to synchronize updates of labels, taints, and annotations for existing nodes is disabled, additions or modifications to those parameters in the node group apply only to new nodes; changes made directly on existing nodes take precedence and are not affected by subsequent updates to the node group configuration.
  1. Go to the node group list page, locate the target node group, and click its Node Group Name/ID to open the node group details page.
  2. On the node group details page, click Modify under Advanced Configuration to edit the advanced configuration items of the node group, and follow the prompts to complete the configuration.
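To check whether label, taint, or annotation changes have propagated to an existing node, you can read the node object directly. The following is a minimal sketch using the official Kubernetes Python client; the node name is hypothetical.

```python
# A minimal sketch: inspect a node's labels, annotations, and taints.
from kubernetes import client, config

config.load_kube_config()        # kubeconfig from "Connect to Cluster via kubectl"
v1 = client.CoreV1Api()

node = v1.read_node("192.168.0.10")   # hypothetical node name
print("labels:", node.metadata.labels)
print("annotations:", node.metadata.annotations)
print("taints:", [f"{t.key}={t.value or ''}:{t.effect}" for t in (node.spec.taints or [])])
```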

Duplicate node group

The CCE console offers a simple way to replicate the configuration of an existing node group and create new node groups based on it.

  1. On the node group list page, locate the node group to be replicated and click More > Replicate in the operation column.

  2. On the replicate node group page, view the configuration of the replicated node group and modify it as needed. After confirming the configuration, click Complete to finish replicating the node group.

Delete node group

When deleting a node group, you can decide whether to keep the nodes in the cluster, remove them while retaining the virtual machines, or remove them and release the virtual machines, and whether to release the pay-as-you-go public IP or cloud disk (CDS) attached to each instance.

Note: Deleted node groups cannot be recovered. Back up your data and proceed with caution.

  1. On the node group list page, locate the node group to be deleted and click Delete in the operation column.
  2. In the "Delete Node Group" pop-up, decide how to handle the nodes in this group and whether to release the pay-as-you-go public IP and cloud disk (CDS) bound to each instance. When deleting a node group, you can perform the following actions:
    • Keep the nodes of this node group in the cluster
    • Remove the nodes of this node group from the cluster, but retain the virtual machine resources
    • Remove the nodes of this node group from the cluster and release the virtual machine resources (prepaid resources will not be automatically released)
    • Release the postpay public IP and cloud disk (CDS) bound to the instance


Remove node

Note:

  • Nodes that existed in the cluster before the node group feature was used, or that were added through scale-up outside the node group mechanism, do not belong to any node group and are managed separately under "Node Management > Node List".
  • Removing nodes from the cluster's Node Management > Node List operates on individual nodes and follows the regular node operation logic; it does not change the desired node count of the node group, so the node group will automatically adjust its node count back toward the current desired count.
  • Removing nodes from Node Management > Node Group > Node List reduces the desired node count of the corresponding node group.
  1. On the node group list page, locate the node group from which nodes need to be removed, and click its Node Group Name/ID to enter the node group details page.
  2. In the left navigation bar, select Node List to view all nodes in the current node group.
  3. Locate the node to be removed and click Remove Node in the operation column. To remove multiple nodes at once, check them on the current page and click More Operations > Remove Nodes at the top. Before removing, you can check which Pods are running on the node (see the sketch after these steps).
  4. In the "Remove Node" pop-up, decide whether to keep the node in the cluster or remove it, and whether to retain or release the corresponding instance:
    • Keep the nodes of this node group in the cluster
    • Remove the nodes of this node group from the cluster, but retain the virtual machine resources
    • Remove the nodes of this node group from the cluster and release the virtual machine resources (prepaid resources will not be automatically released)
    • Release the postpay public IP and cloud disk (CDS) bound to the instance
  5. Click OK. The selected nodes are removed from the node group.
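Before removing a node, it can be useful to see which Pods would be disrupted. The following is a minimal sketch using the official Kubernetes Python client; it only lists the Pods on the node and does not drain it, and the node name is hypothetical.

```python
# A minimal sketch: list the Pods currently running on a node before removing it.
from kubernetes import client, config

config.load_kube_config()        # kubeconfig from "Connect to Cluster via kubectl"
v1 = client.CoreV1Api()

node_name = "192.168.0.10"       # hypothetical node name
pods = v1.list_pod_for_all_namespaces(field_selector=f"spec.nodeName={node_name}")
for pod in pods.items:
    print(f"{pod.metadata.namespace}/{pod.metadata.name}  phase={pod.status.phase}")
```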
