Container Horizontal Scaling (HPA)
The Horizontal Pod Autoscaler (HPA) provided by Baidu AI Cloud Container Engine (CCE) automatically adjusts the number of Pods for a workload based on CPU and memory metrics, helping services handle scenarios such as traffic surges. This document describes how to configure Pod auto-scaling through the CCE console or with YAML via kubectl.
Implementation principle
- The HPA component collects Pod monitoring metrics every 15 seconds, calculates the desired replica count from the current metric values, the current replica count, and the target metric values, and adjusts the replica count when necessary to achieve auto-scaling.
- For example, consider a workload with 2 replicas, a current average CPU utilization of 90%, and a target CPU utilization of 60%. The desired replica count is calculated as ceil(90% × 2 / 60%) = 3, so the workload automatically scales from 2 to 3 replicas.
Description:
If a workload has multiple scaling metrics configured, HPA will calculate the target number of replicas for each metric separately and scale based on the maximum value.
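The desired-replica calculation above can be sketched in a few lines of shell. This is only an illustration of the ceiling formula, not CCE's actual implementation:

```shell
# Desired replicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)
current_replicas=2
current_cpu=90    # current average CPU utilization, percent
target_cpu=60     # target CPU utilization, percent

# Integer ceiling division: ceil(a / b) = (a + b - 1) / b
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"   # prints 3: the workload scales from 2 to 3 replicas
```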
Prerequisites
- A CCE Kubernetes cluster has been created. For specific operations, refer to Create a K8S Cluster CCE.
Operation steps
Use HPA through the console
Method I: Enable HPA for an existing workload
- Sign in to the Baidu AI Cloud management console. Navigate to Product Services > Cloud Native > Cloud Container Engine (CCE). Click Cluster Management > Cluster List, then select the target cluster to enter the Cluster Details page. In the sidebar, click Autoscaler > Horizontal Pod Autoscaler (HPA).
- On the Horizontal Pod Autoscaler list page, click Create Rule.
- Configure the scaling settings on the Create Auto Scaling page.
| Configuration item | Required/Optional | Description |
|---|---|---|
| Namespace | Required | Select the namespace where the scaling rules are applied. |
| Rule name | Required | Specify a name for the auto-scaling rule. It must be up to 63 characters long, contain only lowercase letters, numbers, and hyphens (-), and begin with a lowercase letter while ending with a number or lowercase letter. |
| Workload type | Required | Supports both stateless deployments (Deployment) and stateful deployments (StatefulSet). |
| Workload name | Required | Choose between stateless and stateful deployments within the selected namespace; multiple selections are allowed. |
| Scaling metrics | Required | Provide CPU and memory metrics for scaling; multiple metrics can be configured. |
| Minimum number of Pods | Required | The number of Pods will not be scaled below this minimum. |
| Maximum number of Pods | Required | The number of Pods will not be scaled above this maximum. |
Description:
- Multiple workloads can be selected. If multiple workloads are selected, one scaling rule is generated per workload; the name of each generated rule is "rule name + workload name".
- The number of Pods will be automatically adjusted within the specified range, but will not go beyond those limits.
- Click OK to complete creation.
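A rule created through the console corresponds to a standard Kubernetes HorizontalPodAutoscaler object in the cluster. A minimal sketch of what such an object might look like, assuming a target CPU utilization of 60% (the rule name, namespace, and workload name below are placeholders):

```yaml
apiVersion: autoscaling/v2alpha1   # matches the API version used later in this document
kind: HorizontalPodAutoscaler
metadata:
  name: my-rule-my-deployment      # placeholder: "rule name + workload name"
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: my-deployment            # placeholder: the selected workload
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 60
```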
Method II: Enable HPA during the creation of a workload
Taking the creation of a stateless deployment as an example:
- Sign in to the Baidu AI Cloud management console. Navigate to Product Services > Cloud Native > Cloud Container Engine (CCE). Click Cluster Management > Cluster List, then select the target cluster to enter the Cluster Details page. In the sidebar, click Workloads > Stateless Deployment.
- On the Stateless Deployment page, click Create Stateless Deployment.
- On the Create Stateless Deployment - Advanced Settings page, check Horizontal Pod Autoscaler to enable Horizontal Pod Autoscaler (HPA) for the workload.
| Configuration item | Required/Optional | Description |
|---|---|---|
| Rule name | Required | Specify a name for the auto-scaling rule. It must be up to 63 characters long, contain only lowercase letters, numbers, and hyphens (-), and begin with a lowercase letter while ending with a number or lowercase letter. |
| Scaling metrics | Required | Provide CPU and memory metrics; multiple metrics can be selected |
| Minimum number of Pods | Required | The number of Pods will not be scaled below this minimum. |
| Maximum number of Pods | Required | The number of Pods will not be scaled above this maximum. |
- Click Finish to complete the configuration of a stateless deployment with HPA enabled.
Use HPA through kubectl commands
To demonstrate HPA, a custom Docker image based on the php-apache image will be used. The image includes an index.php page with code for running CPU-intensive computing tasks. An example Dockerfile is as follows:
```dockerfile
FROM php:5-apache
ADD index.php /var/www/html/index.php
RUN chmod a+rx index.php
```
An example index.php file is as follows:
```php
<?php
$x = 0.0001;
for ($i = 0; $i <= 1000000; $i++) {
  $x += sqrt($x);
}
echo "OK!";
?>
```
Prerequisites
- kubectl has been configured to access the Kubernetes cluster from the local machine. For details, refer to Connect to a Cluster via kubectl.
Step I: Deploy the workload and service
- First, deploy a Deployment running the above Docker image and expose it as a Kubernetes Service:
```shell
kubectl run php-apache --image=hpa-example --requests=cpu=200m --expose --port=80
service "php-apache" created
deployment "php-apache" created
```
Step II: Create an HPA
```shell
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
deployment "php-apache" autoscaled

# Right after creation the current metric value shows "unknown".
# After performing Step III, wait 1-2 minutes and it changes to a normal percentage.
[root@instance-2tpjy37t ~]# kubectl get hpa
NAME         REFERENCE               TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   <unknown>/50%   1         10        1          5s
```
The corresponding YAML is as follows:
```yaml
apiVersion: autoscaling/v2alpha1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50
```
Field explanations:
- scaleTargetRef: Target object of HPA auto scaling
- minReplicas: Minimum count of Pods
- maxReplicas: Maximum allowed count of Pods
- metrics: The list of metrics used for scaling decisions
- targetAverageUtilization: The target resource utilization rate; scale-up is triggered when the actual average utilization exceeds this value
HPA currently supports 3 types of metrics. For details, refer to Kubernetes Horizontal Pod Autoscaler:
- Predefined metrics: CPU and memory usage of Pods (built-in support)
- Custom Pod metrics: monitoring metrics exposed by the application (requires deploying a monitoring system and a custom metrics server)
- Custom object metrics: monitoring metrics of other resources in the same namespace as the Pod (requires deploying a monitoring system and a custom metrics server)
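For example, a memory metric could be configured alongside CPU in the same metrics list. The following is a hedged sketch following the autoscaling/v2alpha1 schema used above; the 200Mi threshold is an arbitrary placeholder. With both metrics configured, HPA computes a desired replica count per metric and scales to the maximum:

```yaml
metrics:
- type: Resource
  resource:
    name: cpu
    targetAverageUtilization: 50
- type: Resource
  resource:
    name: memory
    targetAverageValue: 200Mi   # placeholder: target average memory per Pod
```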
Step III: Add load to the service and verify auto scale-up
Start a container that sends an infinite loop of requests to the php-apache service (run the following in another terminal):
```shell
kubectl run -i --tty load-generator --image=busybox /bin/sh

# Hit enter for the command prompt, then run:
$ while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done

# Output:
OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!
```
Step IV: Observe HPA changes
After the load increases, the replica count of the deployment starts to increase:
```shell
[root@instance-2tpjy37t ~]# kubectl get hpa
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   332%/50%   1         10        7          19m
[root@instance-2tpjy37t ~]# kubectl get deployment php-apache
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
php-apache   7         7         7            7           19m
```
Step V: Stop the service load and verify automatic scale-down
In the terminal running the load-generator container, press Ctrl+C to terminate the load generation.
Then check the status again (wait a few minutes):
```shell
[root@instance-2tpjy37t ~]# kubectl get hpa
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          11m

# After the load decreases, the count of Pods decreases accordingly
[root@instance-2tpjy37t ~]# kubectl get deployment php-apache
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
php-apache   1         1         1            1           27m
```
