How to Continue Expanding Capacity When the Container Network Segment Space Is Exhausted
Overview
The maximum number of nodes in a cluster is determined by the size of the container network segment and the maximum number of Pods on each node. For example:
- When 172.16.0.0/16 is selected for the container network segment and the maximum number of Pods on each node is 256, the cluster can have at most 256 nodes;
- When 192.168.0.0/22 is selected for the container network segment and the maximum number of Pods on each node is 128, the cluster can have at most 8 nodes.
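In general, the maximum number of Pods per node is 2^(32 - node-cidr-mask-size), and the maximum number of nodes is 2^(node-cidr-mask-size - cluster CIDR prefix length). The arithmetic for the first example above can be checked in a shell, for instance:
# /16 container network segment with a /24 mask per node
echo $((2 ** (32 - 24)))   # 256 Pods per node at most
echo $((2 ** (24 - 16)))   # 256 nodes at most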
In some cases, because the container network segment chosen at cluster creation is too small or the maximum number of containers per node is set too large, a later scale-up pushes the node count beyond the cluster's maximum. kube-controller-manager then cannot assign Pod network segments to the newly added nodes, and those nodes may stay in the NotReady state.
Solution
Step 1
Modify the kube-controller-manager configuration on the master nodes of the cluster; the field to change is --node-cidr-mask-size. Since the goal is to let the cluster accommodate more nodes, that is, to reduce the maximum number of Pods per node, increase the value of --node-cidr-mask-size.
If the master has multiple replicas, modify the configuration on each master node one by one.
Note: Do not change --node-cidr-mask-size to a value smaller than the current one; otherwise the resulting network segment conflicts will break the network.
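On a master where kube-controller-manager runs as a systemd service (as in the practical case below), the change can be sketched roughly as follows; the unit file path and the values 24 and 26 are taken from that case and must be adjusted to the actual deployment:
# increase the per-node mask, e.g. from 24 to 26
sed -i 's/--node-cidr-mask-size=24/--node-cidr-mask-size=26/' /etc/systemd/system/kube-controller.service
systemctl daemon-reload
systemctl restart kube-controller.service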
Step 2
To remove a node from the cluster and then re-add it, there are two methods:
- Select "Remove Node" or "Delete Node" on the CCE console, and then select "Immigrate Node" or "Add Node".
- Execute
kubectl delete node <nodeName>
to delete the node from the Kubernetes cluster. Then execute
kubectl get pods --all-namespaces=true -o wide | grep <nodeName>
to ensure there is no Pod left on the node, and restart the kubelet on the node to re-add it to the Kubernetes cluster.
Note: No matter which method is used to delete a node from the cluster, all Pods on that node will drift (be rescheduled) to other nodes. Operate with caution on nodes that carry online services.
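For nodes that carry online services, one possible precaution (a sketch, not part of the steps above; flag names as of Kubernetes v1.13) is to drain the node first so its Pods are evicted gracefully before the node is deleted:
kubectl drain <nodeName> --ignore-daemonsets --delete-local-data
kubectl delete node <nodeName>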
Practical Case
Problem scenarios
Currently, the container network segment of the cluster is 172.26.0.0/22, and --node-cidr-mask-size=24 is set in the kube-controller-manager configuration. In other words, the cluster can accommodate at most 4 nodes, and the maximum number of Pods on each node is 256.
The cluster already has 4 nodes. If it is scaled up again, the newly added nodes will be unavailable because no Pod network segment can be allocated to them.
[root@instance-rhkiutp6-3 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
10.0.5.3 Ready <none> 119m v1.13.10
10.0.5.4 Ready <none> 117m v1.13.10
10.0.5.5 Ready <none> 20m v1.13.10
10.0.5.6 Ready <none> 118m v1.13.10
[root@instance-rhkiutp6-3 ~]# kubectl describe node| grep -i podcidr
PodCIDR: 172.26.2.0/24
PodCIDR: 172.26.1.0/24
PodCIDR: 172.26.3.0/24
PodCIDR: 172.26.0.0/24
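In this situation a newly added fifth node would receive no PodCIDR. This can be confirmed directly (a sketch; <newNodeName> is a placeholder for the new node's name):
kubectl get node <newNodeName> -o jsonpath='{.spec.podCIDR}'
An empty result means kube-controller-manager was unable to allocate a Pod network segment to the node.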
Modification steps
Step 1
Execute vim /etc/systemd/system/kube-controller.service on the master to view the kube-controller-manager configuration:
[Unit]
Description=Kubernetes Controller Manager
After=network.target
After=kube-apiserver.service
[Service]
ExecStart=/opt/kube/bin/kube-controller-manager \
--allocate-node-cidrs=true \
--cloud-config=/etc/kubernetes/cloud.config \
--cluster-cidr=172.26.0.0/22 \
--node-cidr-mask-size=24 \ # Modify here
.......
--kubeconfig=/etc/kubernetes/controller-manager.conf \
--leader-elect=true \
--logtostderr=true \
--master=https://100.64.230.195:6443 \
--v=6
Restart=always
Type=simple
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Modify the value of --node-cidr-mask-size from 24 to 26. After this modification, the cluster can accommodate at most 16 nodes, and the maximum number of Pods on each node is reduced to 64.
After modifying the configuration on each master node in turn, execute the following commands to restart kube-controller-manager:
systemctl daemon-reload
systemctl restart kube-controller.service
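Optionally, confirm that the running process has picked up the new value (a quick check, not required by the procedure):
ps aux | grep kube-controller-manager | grep node-cidr-mask-size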
Step 2
Execute kubectl delete node 10.0.5.4. The node 10.0.5.4 then no longer appears in the cluster:
[root@instance-rhkiutp6-3 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
10.0.5.3 Ready <none> 132m v1.13.10
10.0.5.5 Ready <none> 33m v1.13.10
10.0.5.6 Ready <none> 132m v1.13.10
[root@instance-rhkiutp6-3 ~]# kubectl describe node| grep -i podcidr
PodCIDR: 172.26.2.0/24
PodCIDR: 172.26.3.0/24
PodCIDR: 172.26.0.0/24
Execute kubectl get pods --all-namespaces=true -o wide | grep 10.0.5.4 to ensure there is no Pod left on 10.0.5.4.
Then execute systemctl restart kubelet.service on the 10.0.5.4 node to restart the kubelet. The 10.0.5.4 node is then added back to the cluster, and its container network segment becomes 172.26.1.0/26:
[root@instance-rhkiutp6-3 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
10.0.5.3 Ready <none> 138m v1.13.10
10.0.5.4 Ready <none> 3m55s v1.13.10
10.0.5.5 Ready <none> 40m v1.13.10
10.0.5.6 Ready <none> 138m v1.13.10
[root@instance-rhkiutp6-3 ~]# kubectl describe node| grep -i podcidr
PodCIDR: 172.26.2.0/24
PodCIDR: 172.26.1.0/26
PodCIDR: 172.26.3.0/24
PodCIDR: 172.26.0.0/24
Once one existing node has been removed and re-added in this way, room for 3 more nodes is created. For example, the cluster can now be scaled up by 3 nodes, and the allocated PodCIDRs will be 172.26.1.64/26, 172.26.1.128/26 and 172.26.1.192/26.
The user can continue to perform the steps above to remove and re-add the remaining nodes and create more scale-up space.
Note: Nodes with different PodCIDR mask sizes can theoretically coexist in the same cluster, but it is recommended to remove and re-add all nodes so that they end up with the same PodCIDR mask.
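A convenient way to see which nodes still carry the old mask is to list each node together with its PodCIDR (a sketch using kubectl's jsonpath output):
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'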