CCE Usage Checklist
CCE provides container management services based on native Kubernetes. To help you use CCE well, we have summarized a checklist of best practices covering three aspects: cluster, application, and troubleshooting. We strongly recommend reviewing this checklist before you start using CCE or before your services go online. It can help you migrate services to CCE and reduce the risk of application failures or cluster re-creation caused by improper use.
Cluster
Type | Item | Recommendation | Reference file |
---|---|---|---|
Cluster | Number of nodes | Regardless of the service scale, we strongly recommend that an online cluster have more than one node, and that a certain resource buffer be reserved, so that a single-point failure cannot damage the business. | |
 | Node password | Be sure to set a strong root password for each node. | |
 | Node network | If a node needs external access, we do not recommend binding an EIP directly to it, because exposing nodes to the public network is a security risk. The node subnet of the cluster can be created as a NAT subnet instead. | Best Practices of NAT Gateway |
 | VPC route | The CCE container network relies on VPC routing. Routing rules created by CCE are described as "auto generated by cce". Routes you create must not conflict with the CCE routing rules; if a conflict cannot be avoided, you can submit a work order for consultation. | |
 | Security group | If you customize the security group, it must allow traffic from the node network, the container network and the network segment 100.64.230.0/24, as well as ports 22, 6443 and 30000-32768; otherwise the container network may be blocked. | |
 | Disk capacity | When creating a cluster, it is strongly recommended to mount at least 100 GB of CDS on each node (selected by default in CCE). | |
 | Node resizing | Because changing a node's specification requires restarting the machine, which may leave the cluster short of capacity, CCE does not directly support resizing node virtual machines. You can do this on the BCC page, but it is recommended to scale up first and then scale down, to reduce the business impact. | |
 | Virtual machine monitoring | Excessively high CPU, memory or disk usage on a node affects cluster stability. CCE has an eviction mechanism: when a node's load is too high, some instances are migrated. It is therefore strongly recommended to add monitoring alarms for nodes in BCM. | Add Alarm in BCM |
 | Related Baidu AI Cloud resources | It is strongly recommended that CCE users do not directly modify the configuration (including names) of resources created by CCE on the BCC, DCC, VPC, BLB, EIP and other product pages, as this may cause unexpected results. | |
Application
Type | Item | Recommendation | Reference file |
---|---|---|---|
Application | Image | When building the Docker image, it is recommended to install common debugging tools such as ping, telnet, curl and vim; the exact set can be customized. | |
 | Private image | An image pull secret must be configured when a container uses a private image (see the sketch after this table). | |
 | Number of replicas | For stateless services without conflicts, it is recommended that each service version run more than 2 instances, to avoid the instance migration and temporary unavailability caused by a single-point failure. | |
 | Resource limit | It is strongly recommended to set resources.limits for all online services (see the sketch after this table). | kubernetes Resource Limit |
 | Health check | It is recommended to configure liveness and readiness probes for all online services to ensure automatic failover (see the sketch after this table). | kubernetes Health Check |
 | Service exposure mode | Intra-cluster access: ClusterIP Service. Extra-cluster access: LoadBalancer Service. Extra-cluster HTTP/HTTPS access: Ingress. (See the sketch after this table.) | LoadBalancer Access Network Traffic <br> Ingress Access Network Traffic |
 | Service data persistence | For services that need data persistence, it is recommended to mount storage through PV and PVC. Currently, CCE supports file storage (CFS), block storage (CDS) and object storage (BOS) in PV/PVC mode (see the sketch after this table). | Use CFS by PV/PVC Mode <br> Use CDS by PV/PVC Mode <br> Use BOS by PV/PVC Mode |
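To make the private image, resource limit and health check items above concrete, here is a minimal Deployment sketch showing where imagePullSecrets, resources.limits and the liveness/readiness probes are configured. The workload name, secret name, image address, port and probe path are placeholders for illustration and must be replaced with your own values.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo                          # placeholder workload name
spec:
  replicas: 2                             # more than one replica for online services
  selector:
    matchLabels:
      app: web-demo
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      imagePullSecrets:
      - name: my-registry-secret          # secret for the private image registry (placeholder)
      containers:
      - name: web
        image: hub.baidubce.com/your-namespace/your-image:latest   # placeholder private image
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
          limits:                         # resources.limits recommended for all online services
            cpu: 500m
            memory: 512Mi
        livenessProbe:                    # restarts the container if it stops responding
          httpGet:
            path: /healthz                # placeholder health-check path
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10
        readinessProbe:                   # keeps the Pod out of Service endpoints until it is ready
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
```

The pull secret referenced above can be created beforehand with kubectl create secret docker-registry my-registry-secret --docker-server=... --docker-username=... --docker-password=... (the secret name here is a placeholder).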
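For the service exposure modes listed above, the following is a minimal sketch of a LoadBalancer Service, which exposes a workload outside the cluster through a load balancer (see the LoadBalancer reference above). The name, selector and ports are placeholders matching the Deployment sketch.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-demo-lb            # placeholder Service name
spec:
  type: LoadBalancer           # extra-cluster access; use ClusterIP (the default) for intra-cluster access
  selector:
    app: web-demo              # must match the labels of the backend Pods
  ports:
  - port: 80                   # port exposed by the load balancer
    targetPort: 8080           # containerPort of the backend Pods
```

For HTTP/HTTPS access from outside the cluster, an Ingress that routes to a ClusterIP Service is used instead, as described in the Ingress reference document.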
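For the data persistence item, a minimal PersistentVolumeClaim sketch is shown below. The claim name, size and storageClassName are only illustrative placeholders; use the storage classes and procedures described in the CFS/CDS/BOS PV/PVC reference documents above.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc                          # placeholder claim name
spec:
  accessModes:
  - ReadWriteOnce                         # a block volume is typically mounted by a single node
  resources:
    requests:
      storage: 100Gi
  storageClassName: your-storage-class    # placeholder; see the PV/PVC reference documents
```

The claim is then mounted into the workload through a spec.volumes entry of type persistentVolumeClaim and a corresponding volumeMounts entry in the container.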
Troubleshooting
What should I do if the container fails to start?
Generally, you can view the error messages in the following two ways:

1. kubectl describe pod podName
2. kubectl logs podName

If neither of the above reveals an obvious error, you can modify the container's start command in the YAML, for example setting it to sleep 3600:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: hub.baidubce.com/cce/nginx-alpine-go:latest
        command: ["/bin/sh", "-c", "sleep 3600"]
```
After the service starts, enter the container with kubectl exec -it podName -- /bin/sh, and manually execute the original start command to view the service's error output.
What should I do if a LoadBalancer Service fails to be created?
You can view the Events with kubectl describe service serviceName to find the cause of the failure. Usually the reason is that the EIP or BLB quota has been exceeded; you can submit a work order to apply for a quota increase.

Note: The number of EIP instances a user can purchase is at most the current number of existing BCC instances + the current number of existing BLB instances + 2.
What should I do if container network access fails?
Container network access failures appear in conditions such as the following:

- The Service EIP cannot be accessed;
- The Service name cannot be accessed from inside a container;
- The Service ClusterIP cannot be accessed from inside the cluster;
- A Pod IP cannot be accessed from inside the cluster;
- ...
These container network problems are all Service access problems of different kinds, and they are generally caused by Pod IPs being unreachable. First, check whether the Pod IP can be pinged from a node and from another Pod (a debug Pod sketch is given below). If it cannot, check two places:

- Check the VPC routing table and confirm whether any routing rules conflict with those created by CCE;
- Check the VPC security group and confirm whether any rules block the requests.

If neither is the cause, you can submit a work order to contact the administrator for troubleshooting.

Note: The Service ClusterIP cannot be pinged directly; it should be accessed as ip:port. A Pod IP can be pinged.
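To run the checks above from inside the cluster, you can start a temporary debug Pod and then exec into it to ping a Pod IP or fetch a ClusterIP:port. This is a minimal sketch under the assumption that a small busybox image is reachable from your nodes; the Pod name and image are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: network-debug            # temporary Pod used only for troubleshooting
spec:
  restartPolicy: Never
  containers:
  - name: debug
    image: busybox:latest        # placeholder; any small image with ping/wget will do
    command: ["/bin/sh", "-c", "sleep 3600"]
```

After the Pod is running, use kubectl exec -it network-debug -- /bin/sh, then run ping <PodIP> or wget -qO- <ClusterIP>:<port> to verify connectivity, and delete the Pod when you are done.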