CCE Usage Checklist
Overview
CCE provides a container management service based on native Kubernetes. To help users better leverage CCE, we have compiled a checklist of best practices covering cluster management, application deployment, and troubleshooting. We strongly encourage CCE users to review this checklist before launching services, to ensure a smooth transition to CCE and to reduce risks such as application failures or cluster reconfiguration caused by improper usage.
Cluster check items
| Type | Item | Suggestions | Reference documentation |
|---|---|---|---|
| Cluster | Count of nodes | Regardless of service scale, an online service cluster should keep more than one node (at least 2) and reserve a resource buffer, so that a single-point failure cannot bring the service down. | |
| Cluster | Node password | The root password of each node must be a strong password. | |
| Cluster | Node network | For clusters requiring external access, do not bind EIPs directly to nodes, as this exposes them to security risks. Instead, enable public network access by attaching a NAT gateway to the cluster's VPC. | Container network accesses the public network via NAT gateway |
| Cluster | VPC route | The CCE container network relies on VPC routing. Routing rules created by CCE are marked as "auto generated by cce". When creating new routes, avoid conflicts with these rules; if a conflict is unavoidable, submit a support ticket for assistance. | |
| Cluster | Security group | When configuring a security group, allow access to the node network, the container network, and the 100.64.230.0/24 segment, as well as ports 22, 6443, and 30000-32768. Otherwise the cloud container engine may experience network issues. | |
| Cluster | Disk capacity | When creating a cluster, it is highly recommended to allocate at least 100 GB of CDS storage to each node (this is the default in CCE). | |
| Cluster | Node scaling | Scaling down removes machines from the cluster and may leave it with insufficient capacity. CCE does not directly support resizing virtual machines; perform resizing on the BCC page, and scale down before scaling up to minimize service disruptions. | |
| Cluster | Virtual machine monitoring | Excessive CPU, memory, or disk usage on a VM may affect cluster stability. CCE includes an eviction mechanism that migrates some instances when node load becomes too high, so it is highly recommended to set up node monitoring alarms in BCM. | Add alarm on BCM |
| Cluster | Baidu AI Cloud third-party resources | Do not directly modify resources (including names and other configurations) created by CCE on the BCC, DCC, VPC, BLB, or EIP product pages, as this can lead to unintended issues. | |
Application check items
| Type | Item | Suggestions | Reference documentation |
|---|---|---|---|
| Application | Image | When building Docker images, include common debugging tools such as ping, telnet, curl, and vim; customize the set as needed. | |
| Application | Private image | If a container uses a private image, a secret must be configured (see the sketch after this table). | Practice of using private images in CCE K8S clusters |
| Application | Count of instance replicas | For stateless services without conflicts, run more than 2 instance replicas to prevent interruptions from single-point failures and to keep the service available during instance migrations. | |
| Application | Limit range | It is highly recommended that every deployed service configure resources.limits (see the sketch after this table). | Kubernetes limit range |
| Application | Health check | Configure liveness and readiness probes for every launched service to enable automatic failover and ensure reliability (see the sketch after this table). | Kubernetes health check |
| Application | Service exposure mode | Intra-cluster access: ClusterIP Service; extra-cluster access: LoadBalancer Service; extra-cluster access (HTTP/HTTPS): Ingress (see the sketch after this table). | LoadBalancer ingress network traffic; Ingress network traffic |
| Application | Service data persistence | For services requiring data persistence, use the PV and PVC modes. CCE currently supports Cloud File System (CFS), Cloud Disk Service (CDS), and Baidu AI Cloud Object Storage (BOS) via PV/PVC (see the sketch after this table). | Using CFS via PV/PVC mode; Using CDS via PV/PVC mode; Using BOS via PV/PVC mode |
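For the private image item, the sketch below shows how an image pull secret is commonly referenced from a Deployment, assuming the secret has already been created in the cluster (for example with kubectl create secret docker-registry). The secret name my-registry-secret, the workload name, and the image path are placeholders for illustration, not values provided by CCE; see the reference document above for the full procedure.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: private-image-demo          # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: private-image-demo
  template:
    metadata:
      labels:
        app: private-image-demo
    spec:
      imagePullSecrets:
      - name: my-registry-secret    # assumed secret, created beforehand for the private registry
      containers:
      - name: app
        image: hub.baidubce.com/your-namespace/your-app:latest   # placeholder private image
```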
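For the limit range and health check items, a minimal sketch of resources.limits together with liveness and readiness probes on a Deployment is shown below. The CPU/memory values, probe path, and port are illustrative assumptions and should be tuned to the actual service.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-checked               # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-checked
  template:
    metadata:
      labels:
        app: nginx-checked
    spec:
      containers:
      - name: nginx
        image: hub.baidubce.com/cce/nginx-alpine-go:latest
        resources:
          requests:                 # scheduling baseline
            cpu: 250m
            memory: 256Mi
          limits:                   # hard ceiling; prevents one pod from exhausting the node
            cpu: 500m
            memory: 512Mi
        livenessProbe:              # restarts the container if it stops responding
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:             # removes the pod from Service endpoints until it is ready
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
```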
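For the service exposure modes, the sketch below pairs each access pattern with the corresponding Kubernetes object: a ClusterIP Service for intra-cluster access, a LoadBalancer Service for extra-cluster access (CCE provisions BLB/EIP resources for it), and an Ingress for HTTP/HTTPS access from outside the cluster. The names, host, and ports are placeholders, and the Ingress apiVersion depends on the cluster's Kubernetes version.

```yaml
# Intra-cluster access: ClusterIP Service.
apiVersion: v1
kind: Service
metadata:
  name: nginx-clusterip             # hypothetical name
spec:
  type: ClusterIP
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
---
# Extra-cluster access: LoadBalancer Service; BLB/EIP are bound to it.
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb                    # hypothetical name
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
---
# Extra-cluster HTTP/HTTPS access: Ingress routing to the ClusterIP Service.
# apiVersion varies with the cluster's Kubernetes version.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-ingress               # hypothetical name
spec:
  rules:
  - host: example.yourdomain.com    # placeholder host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx-clusterip
            port:
              number: 80
```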
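For data persistence, the sketch below shows the generic PV/PVC pattern: a PersistentVolumeClaim plus a pod that mounts it. The claim name, storage class, capacity, and mount path are placeholders; take the actual storage classes and parameters for CFS, CDS, or BOS from the reference documents listed in the table.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc                    # hypothetical claim name
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: your-storage-class   # placeholder; use the class for CFS/CDS/BOS from the reference docs
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: data-consumer               # hypothetical pod name
spec:
  containers:
  - name: app
    image: hub.baidubce.com/cce/nginx-alpine-go:latest
    volumeMounts:
    - name: data
      mountPath: /data              # placeholder mount path
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-pvc
```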
Common troubleshooting
1. What should I do if the container fails to start?
Generally, you can view the error messages by the following two methods:
- kubectl describe pod podName
- kubectl logs podName
If neither method reveals an obvious error, you can modify the container's start command in the YAML, for example setting it to sleep 3600:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: hub.baidubce.com/cce/nginx-alpine-go:latest
        command: ["/bin/sh", "-c", "sleep 3600"]
```
Once the pod is running, use the command kubectl exec -it podName -- /bin/sh to enter the container and manually run the original start command to check for service error messages.
2. What should I do if creating a LoadBalancer Service fails?
Use the command kubectl describe service serviceName to view events and troubleshoot issues. Typically, the cause is quota limits for EIP or BLB. Submit a ticket to request quota increases if needed.
Note: The number of EIP instances a user can purchase is limited to (current number of existing BCC instances) + (current number of existing BLB instances) + 2.
3. What should I do if container network access fails?
Container network access failures typically take the following forms:
- Service EIP is inaccessible;
- ServiceName is inaccessible within the container;
- Service ClusterIP is inaccessible within the cluster;
- PodIP is inaccessible within the cluster;
- ...
Container network issues often stem from an unreachable PodIP. First, verify whether the PodIP can be reached by pinging it from other nodes and pods. If it cannot, investigate the following:
- View the VPC route table, and confirm whether any routing rules conflict with CCE;
- Check the VPC security group policies to ensure no rules are blocking the requests.
If the issue persists, submit a ticket to contact an administrator for further troubleshooting.
Note: It is normal for a Service ClusterIP not to respond to ping; verify access using ip:port instead. A PodIP, however, should be reachable by ping.
