How to Continue Dilatation When Container Network Segment Space Is Exhausted (VPC Network Mode)
Note: The following only applies to clusters using "VPC Route" mode
Overview
The maximum number of nodes in a cluster is determined by the size of the container network segment and the maximum pod count per node. For example:
- With a container network segment of 172.16.0.0/16 and a maximum of 256 pods per node, the cluster can have at most 256 nodes.
- With a container network segment of 192.168.0.0/22 and a maximum of 128 pods per node, the cluster can have at most 8 nodes.
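The arithmetic behind these examples can be checked directly; the following is a sketch using plain shell arithmetic only (it does not query the cluster). The per-node mask is 32 minus log2(max pods per node), and the maximum node count is 2 raised to (per-node mask minus cluster mask).
# 172.16.0.0/16 with 256 pods per node -> per-node /24 -> 2^(24-16) nodes
echo $(( 2 ** (24 - 16) ))   # 256
# 192.168.0.0/22 with 128 pods per node -> per-node /25 -> 2^(25-22) nodes
echo $(( 2 ** (25 - 22) ))   # 8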
In some situations, if the container network segment chosen during cluster creation is too small, or the maximum pod count per node is set too high, expanding the cluster may exceed the maximum allowable node count. When the kube-controller-manager cannot allocate a pod network segment for a new node, that node may remain in the NotReady state.
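To confirm that a node is stuck for this reason rather than some other problem, the checks below can help (a sketch using standard kubectl; the exact event wording may vary across Kubernetes versions):
# Pod CIDR assigned to the node; empty output means no segment was allocated
kubectl get node <nodeName> -o jsonpath='{.spec.podCIDR}'
# Node description and events; look for CIDR-related entries such as CIDRNotAvailable
kubectl describe node <nodeName> | grep -i cidr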
Note: Before March 11, 2021, the controller manager was deployed in binary mode, adhering to the older solution. For clusters created after March 11, 2021, the controller manager is deployed as a static pod, following the updated solution.
Solution for clusters whose master is deployed in binary mode
For clusters created before March 11, 2021, the controller manager is deployed as a binary managed by systemd.
Currently, only the V1 architecture of the container network is supported in this mode, though it is possible to manually upgrade to the V2 architecture. To check the current container network version, verify whether the cluster contains the cce-cni-node-agent object; if it is present, the version in use is V1.
$ kubectl -n kube-system get cm cce-cni-node-agent
NAME                 DATA   AGE
cce-cni-node-agent   1      125d
Step 1
Modify the kube-controller-manager configuration on the cluster's master node; the field to adjust is --node-cidr-mask-size. Because the goal is to let the cluster accommodate more nodes by reducing the maximum pod count per node, the value of --node-cidr-mask-size must be increased.
For clusters with multiple master replicas, each master node’s configuration must be updated individually.
Note: Do not change --node-cidr-mask-size to a smaller value than the current one, as this will cause network failures due to network segment conflicts.
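Before editing, it is worth confirming the current value. A minimal check, assuming the systemd service file path used in the practical case below:
grep -- --node-cidr-mask-size /etc/systemd/system/kube-controller.service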
Step 2
There are two methods to remove a node from the cluster and rejoin it:
- From the CCE product console interface, select "Remove Node" or "Delete Node," and then "Move into Node" or "Add a Node."
- Execute kubectl delete node <nodeName> to remove the node from the Kubernetes cluster. Execute kubectl get pods --all-namespaces=true -o wide | grep <nodeName> to confirm that no pods remain on the node. Then restart the kubelet on the node to rejoin it to the cluster.
Note: Regardless of the method used to remove a node from the cluster, all pods on the removed node will be rescheduled. For nodes hosting online services, proceed with caution.
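For nodes hosting online services, one optional precaution (standard kubectl, not a CCE-specific step) is to drain the node before deleting it so that workloads are evicted gracefully; a minimal sketch:
# Evict workloads gracefully; DaemonSet pods are left in place
# (on older kubectl versions, --delete-emptydir-data is spelled --delete-local-data)
kubectl drain <nodeName> --ignore-daemonsets --delete-emptydir-data
kubectl delete node <nodeName>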
Practical case
Problem scenario
The current cluster has a container network segment of 172.26.0.0/22, with the kube-controller-manager configured as --node-cidr-mask-size=24. This means the cluster can support up to 4 nodes, each with a maximum pod capacity of 256.
The cluster already has 4 nodes. Further expansion will render additional nodes unusable.
[root@instance-rhkiutp6-3 ~]# kubectl get node
NAME       STATUS   ROLES    AGE    VERSION
10.0.5.3   Ready    <none>   119m   v1.13.10
10.0.5.4   Ready    <none>   117m   v1.13.10
10.0.5.5   Ready    <none>   20m    v1.13.10
10.0.5.6   Ready    <none>   118m   v1.13.10
[root@instance-rhkiutp6-3 ~]# kubectl describe node | grep -i podcidr
PodCIDR:    172.26.2.0/24
PodCIDR:    172.26.1.0/24
PodCIDR:    172.26.3.0/24
PodCIDR:    172.26.0.0/24
Modification steps
Step 1
On the master, execute vim /etc/systemd/system/kube-controller.service to view the kube-controller-manager configuration:
[Unit]
Description=Kubernetes Controller Manager
After=network.target
After=kube-apiserver.service

[Service]
ExecStart=/opt/kube/bin/kube-controller-manager \
  --allocate-node-cidrs=true \
  --cloud-config=/etc/kubernetes/cloud.config \
  --cluster-cidr=172.26.0.0/22 \
  --node-cidr-mask-size=24 \          # Modify here
  .......
  --kubeconfig=/etc/kubernetes/controller-manager.conf \
  --leader-elect=true \
  --logtostderr=true \
  --master=https://100.64.230.195:6443 \
  --v=6
Restart=always
Type=simple
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Change the --node-cidr-mask-size value from 24 to 26. After this adjustment, the cluster will be able to accommodate up to 16 nodes, though the maximum pod count per node will decrease to 64.
After modifying the configuration on each master node, execute the following commands to restart kube-controller-manager:
systemctl daemon-reload
systemctl restart kube-controller.service
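After the restart, the new flag can be confirmed to be in effect; a quick check (a sketch, output format depends on the environment):
systemctl status kube-controller.service --no-pager
ps -ef | grep kube-controller-manager | grep -o -- '--node-cidr-mask-size=[0-9]*'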
Step 2
Execute kubectl delete node 10.0.5.4, and the 10.0.5.4 node will no longer appear in the cluster status:
[root@instance-rhkiutp6-3 ~]# kubectl get node
NAME       STATUS   ROLES    AGE    VERSION
10.0.5.3   Ready    <none>   132m   v1.13.10
10.0.5.5   Ready    <none>   33m    v1.13.10
10.0.5.6   Ready    <none>   132m   v1.13.10
[root@instance-rhkiutp6-3 ~]# kubectl describe node | grep -i podcidr
PodCIDR:    172.26.2.0/24
PodCIDR:    172.26.3.0/24
PodCIDR:    172.26.0.0/24
Execute kubectl get pods --all-namespaces=true -o wide | grep <nodeName> to confirm that no pods remain on 10.0.5.4.
On node 10.0.5.4, execute systemctl restart kubelet.service to restart kubelet. The cluster status then shows that node 10.0.5.4 has rejoined, and its container network segment has changed to 172.26.1.0/26:
[root@instance-rhkiutp6-3 ~]# kubectl get node
NAME       STATUS   ROLES    AGE     VERSION
10.0.5.3   Ready    <none>   138m    v1.13.10
10.0.5.4   Ready    <none>   3m55s   v1.13.10
10.0.5.5   Ready    <none>   40m     v1.13.10
10.0.5.6   Ready    <none>   138m    v1.13.10
[root@instance-rhkiutp6-3 ~]# kubectl describe node | grep -i podcidr
PodCIDR:    172.26.2.0/24
PodCIDR:    172.26.1.0/26
PodCIDR:    172.26.3.0/24
PodCIDR:    172.26.0.0/24
Each time an existing node is removed and moved back in this way, its former /24 segment is freed, creating room for 3 additional nodes. In this example, 3 more nodes can now be added to the cluster, and their PodCIDRs will be allocated as 172.26.1.64/26, 172.26.1.128/26, and 172.26.1.192/26.
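To keep track of which segments have been handed out as nodes are cycled, the allocated PodCIDR of every node can be listed in one command (standard kubectl, shown here as a sketch):
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR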
Users should continue following the outlined steps to add or remove nodes, creating additional space for expansion as needed.
Note: While nodes with differing PodCIDR masks can theoretically coexist in the same cluster, it is recommended to add or remove all nodes at once to ensure they share the same PodCIDR mask.
Solution for clusters whose master uses static pod deployment
For clusters created after March 11, 2021, the controller manager is deployed as a static pod, and its configuration can be updated by modifying the corresponding manifest file.
The container network architecture is classified into V1 and V2 versions. Confirm the version in use by checking whether the cluster contains the cce-cni-node-agent object; if it is present, the version in use is V1.
$ kubectl -n kube-system get cm cce-cni-node-agent
NAME                 DATA   AGE
cce-cni-node-agent   1      125d
Step 1: Modify configuration
Modify the configuration of kube-controller-manager on the master nodes of the cluster; the field to be modified is --node-cidr-mask-size.
Since the goal of this modification is to allow the cluster to accommodate more nodes by reducing the maximum pod count per node, the --node-cidr-mask-size value must be increased.
For clusters with multiple master replicas, each master node’s configuration must be updated individually.
Note: Do not change node-cidr-mask-size to a smaller value than the current one, as this will cause network failures due to network segment conflicts.
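Before editing, the current value can be confirmed with a quick check, assuming the static pod manifest path and the pod label shown in the practical case below:
grep -- --node-cidr-mask-size /etc/kubernetes/manifests/kube-controller-manager.yaml
kubectl -n kube-system get pod -l component=kube-controller-manager -o yaml | grep node-cidr-mask-size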
Step 2: Modify the network component
The container network supports both V1 and V2 architectures, which require different modification procedures. Confirm the version in use by checking if the cluster contains the cce-cni-node-agent object. If it is present, the version in use is V1.
$ kubectl -n kube-system get cm cce-cni-node-agent
NAME                 DATA   AGE
cce-cni-node-agent   1      125d
The V1 container network architecture directly uses the podCIDR in the node spec, so no further modification is needed.
For the V2 architecture, modify cce-network-v2-config; the corresponding field is cluster-pool-ipv4-mask-size. Then restart the cce-network-operator and cce-network-agent components so that the configuration takes effect.
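In outline, the V2 commands are the same as those used in the practical case later in this article (a sketch, with the resource names as shown there):
kubectl -n kube-system edit cm cce-network-v2-config              # set cluster-pool-ipv4-mask-size
kubectl -n kube-system rollout restart deployment cce-network-operator
kubectl -n kube-system rollout restart daemonset cce-network-agent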
Step 3: Remove and move in the node again
There are two methods to remove a node from the cluster and rejoin it:
- From the CCE product console interface, select "Remove Node" or "Delete Node," and then "Move into Node" or "Add a Node."
- Execute kubectl delete node <nodeName> to remove the node from the Kubernetes cluster. Execute kubectl get pods --all-namespaces=true -o wide | grep <nodeName> to confirm that no pods remain on the node. Then restart the kubelet on the node to rejoin it to the cluster.
Note: Regardless of which method is used, all pods on the removed node will be rescheduled. For nodes hosting online services, proceed with caution.
Practical case
Problem scenario
Currently, a cluster has a container network segment of 172.16.0.0/16, with the kube-controller-manager configured with --node-cidr-mask-size=24. That is, the cluster can accommodate up to 256 nodes, with a maximum pod count of 256 per node. The node capacity now needs to be increased to 1,024.
[root@root ~]# kubectl get no
NAME          STATUS   ROLES    AGE   VERSION
192.168.1.4   Ready    <none>   42m   v1.24.4
192.168.1.5   Ready    <none>   35m   v1.24.4
root@root:~# kubectl describe node | grep -i podcidr
PodCIDR:     10.0.1.0/24
PodCIDRs:    10.0.1.0/24
PodCIDR:     10.0.0.0/24
PodCIDRs:    10.0.0.0/24
Modification steps
Step 1: Modify configuration
Check the node-cidr-mask-size configuration:
[root@root manifests]# kubectl get po kube-controller-manager-192.168.1.4 -n kube-system -o yaml | grep node-cidr-mask-size
    - --node-cidr-mask-size=24
Change the node-cidr-mask-size value from 24 to 26. After the update, the cluster can support up to 1,024 nodes, but the maximum number of pods per node will be reduced to 64.
Modify the kube-controller-manager parameter node-cidr-mask-size=26:
vim /etc/kubernetes/manifests/kube-controller-manager.yaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    - --cluster-cidr=172.16.0.0/16
    - --feature-gates=MixedProtocolLBService=true
    - --master=https://192.168.1.4:6443
    - --node-cidr-mask-size=26
    ……
Here, kube-controller-manager is deployed statically. For static pods, kubelet will monitor changes to the definition files. After saving and closing the editor, kubelet will detect the file changes and automatically delete old pods and start new pods based on the updated definition file.
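Once kubelet has recreated the pod, the new value can be verified the same way it was checked earlier (a sketch; substitute the actual pod name for your cluster):
kubectl -n kube-system get pod kube-controller-manager-192.168.1.4 -o yaml | grep node-cidr-mask-size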
Step 2: Modify the network component
The V2 network architecture used here requires modifying the cluster-pool-ipv4-mask-size field in cce-network-v2-config.
kubectl edit cm cce-network-v2-config -n kube-system

apiVersion: v1
data:
  cced: |
    annotate-k8s-node: true
    api-rate-limit:
      bcecloud/apis/v1/AttachENI: rate-limit:5/1s,rate-burst:5,max-wait-duration:30s,parallel-requests:5,log:true
      bcecloud/apis/v1/BatchAddPrivateIP: rate-limit:5/1s,rate-burst:10,max-wait-duration:15s,parallel-requests:5,log:true
      bcecloud/apis/v1/BatchDeletePrivateIP: rate-limit:5/1s,rate-burst:10,max-wait-duration:15s,parallel-requests:5,log:true
      bcecloud/apis/v1/CreateENI: rate-limit:5/1s,rate-burst:5,max-wait-duration:30s,parallel-requests:5,log:true
      bcecloud/apis/v1/DescribeSubnet: rate-limit:5/1s,rate-burst:5,max-wait-duration:30s,parallel-requests:5
      bcecloud/apis/v1/StatENI: rate-limit:10/1s,rate-burst:15,max-wait-duration:30s,parallel-requests:10
    auto-create-network-resource-set-resource: true
    bbcEndpoint: bbc.gz.baidubce.com
    bccEndpoint: bcc.gz.baidubce.com
    bce-cloud-access-key: ""
    bce-cloud-country: cn
    bce-cloud-host: cce-gateway.gz.baidubce.com
    bce-cloud-region: gz
    bce-cloud-secure-key: ""
    bce-cloud-vpc-id: vpc-2f5wibbx4js7
    cce-cluster-id: cce-clboj6fa
    cce-endpoint-gc-interval: 30s
    cluster-pool-ipv4-cidr:
    - 172.16.0.0/16
    cluster-pool-ipv4-mask-size: 26
Restart the corresponding components, cce-network-operator and cce-network-agent, to ensure the configuration takes effect.
kubectl rollout restart deployment cce-network-operator -n kube-system
kubectl rollout restart daemonset cce-network-agent -n kube-system
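To confirm that both components have finished restarting before moving on to the next step, the rollout status can be checked (standard kubectl, shown as a sketch):
kubectl -n kube-system rollout status deployment cce-network-operator
kubectl -n kube-system rollout status daemonset cce-network-agent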
Step 3: Remove and move in the node again
Execute the kubectl delete command to delete the corresponding node.
kubectl delete node 192.168.1.4
Check the cluster status to confirm the 192.168.1.4 node is no longer present:
[root@root ~]# kubectl get no
NAME          STATUS   ROLES    AGE   VERSION
192.168.1.5   Ready    <none>   47m   v1.24.4
root@root:~# kubectl describe node | grep -i podcidr
PodCIDR:     10.0.1.0/24
PodCIDRs:    10.0.1.0/24
Make sure there are no pods still running on 192.168.1.4; if any remain, rejoining the node will not change its pod CIDR. Use the following command to check:
kubectl get pods --all-namespaces=true -o wide | grep <nodeName>
On node 192.168.1.4, execute systemctl restart kubelet.service to restart kubelet. The cluster status then shows that node 192.168.1.4 has rejoined, and its container network segment has changed to 10.0.0.0/26:
[root@root ~]# kubectl get node
NAME          STATUS   ROLES    AGE    VERSION
192.168.1.4   Ready    <none>   11m    v1.24.4
192.168.1.5   Ready    <none>   102m   v1.24.4
[root@root-3 ~]# kubectl describe node | grep -i podcidr
PodCIDR:     10.0.0.0/26
PodCIDRs:    10.0.0.0/26
PodCIDR:     10.0.1.0/24
PodCIDRs:    10.0.1.0/24
Users should continue following the steps outlined above to add or remove nodes as needed to create additional expansion capacity.
Note: While nodes with differing PodCIDR masks can theoretically coexist in the same cluster, it is recommended to add or remove all nodes at once to ensure they share the same PodCIDR mask.
