Container Horizontal Scaling (HPA)
The Horizontal Pod Autoscaler (HPA) provided by Baidu AI Cloud Container Engine (CCE) automatically adjusts the number of Pods for a workload based on CPU and memory metrics, helping services handle scenarios such as traffic surges. This document describes how to configure Pod auto-scaling through the CCE console or with YAML via kubectl.
Implementation principle
- The HPA component collects Pod monitoring metrics every 15 seconds, calculates the desired replica count from the current metric values, the current replica count, and the target metric values, and adjusts the replica count when necessary to achieve auto-scaling.
- For example, consider a workload with 2 replicas, a current average CPU utilization of 90%, and a target CPU utilization of 60%. The desired replica count is calculated as ceil(90% × 2 / 60%) = 3, so the workload automatically scales from 2 to 3 replicas.
Description:
If a workload has multiple scaling metrics configured, HPA will calculate the target number of replicas for each metric separately and scale based on the maximum value.
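The desired-replica calculation above can be sketched in a few lines of shell. This is only an illustration of the ceiling formula, not CCE's actual implementation:

```shell
# Desired replicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)
current_replicas=2
current_cpu=90    # current average CPU utilization, percent
target_cpu=60     # target CPU utilization, percent

# Integer ceiling division: ceil(a / b) = (a + b - 1) / b
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"   # prints 3: the workload scales from 2 to 3 replicas
```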
Prerequisites
- A CCE Kubernetes cluster has been created. For specific operations, refer to Create a K8S Cluster CCE.
Operation steps
Use HPA through the console
Method I: Enable HPA for an existing workload
- Sign in to the Baidu AI Cloud management console. Navigate to Product Services > Cloud Native > Cloud Container Engine (CCE). Click Cluster Management > Cluster List, then select the target cluster to enter the Cluster Details page. In the sidebar, click Autoscaler > Horizontal Pod Autoscaler (HPA).
- On the Horizontal Pod Autoscaler list page, click Create Rule.
- Configure the scaling settings on the Create Auto Scaling page.
| Configuration item | Required/Optional | Description |
|---|---|---|
| Namespace | Required | Select the namespace where the scaling rules are applied. |
| Rule name | Required | Specify a name for the auto-scaling rule. It must be up to 63 characters long, contain only lowercase letters, numbers, and hyphens (-), and begin with a lowercase letter while ending with a number or lowercase letter. |
| Workload type | Required | Supports both stateless deployments (Deployment) and stateful deployments (StatefulSet). |
| Workload name | Required | Choose between stateless and stateful deployments within the selected namespace; multiple selections are allowed. |
| Scaling metrics | Required | Provide CPU and memory metrics for scaling; multiple metrics can be configured. |
| Minimum number of Pods | Required | The number of Pods will not be scaled below this minimum. |
| Maximum number of Pods | Required | The number of Pods will not be scaled above this maximum. |
Description:
- Multiple workloads can be selected. If multiple workloads are selected, one scaling rule is generated per workload; the name of each generated rule is "rule name + workload name".
- The number of Pods will be automatically adjusted within the specified range, but will not go beyond those limits.
- Click OK to complete creation.
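A rule created through the console corresponds to a standard Kubernetes HorizontalPodAutoscaler object in the cluster. A minimal sketch of what such an object might look like, assuming a target CPU utilization of 60% (the rule name, namespace, and workload name below are placeholders):

```yaml
apiVersion: autoscaling/v2alpha1   # matches the API version used later in this document
kind: HorizontalPodAutoscaler
metadata:
  name: my-rule-my-deployment      # placeholder: "rule name + workload name"
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: my-deployment            # placeholder: the selected workload
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 60
```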
Method II: Enable HPA during the creation of a workload
Taking the creation of a stateless deployment as an example:
- Sign in to the Baidu AI Cloud management console. Navigate to Product Services > Cloud Native > Cloud Container Engine (CCE). Click Cluster Management > Cluster List, then select the target cluster to enter the Cluster Details page. In the sidebar, click Workloads > Stateless Deployment.
- On the Stateless Deployment page, click Create Stateless Deployment.
- On the Create Stateless Deployment - Advanced Settings page, check Horizontal Pod Autoscaler to enable Horizontal Pod Autoscaler (HPA) for the workload.
| Configuration item | Required/Optional | Description |
|---|---|---|
| Rule name | Required | Specify a name for the auto-scaling rule. It must be up to 63 characters long, contain only lowercase letters, numbers, and hyphens (-), and begin with a lowercase letter while ending with a number or lowercase letter. |
| Scaling metrics | Required | Provide CPU and memory metrics; multiple metrics can be selected |
| Minimum number of Pods | Required | The number of Pods will not be scaled below this minimum. |
| Maximum number of Pods | Required | The number of Pods will not be scaled above this maximum. |
- Click Finish to complete the configuration of a stateless deployment with HPA enabled.
Use HPA through kubectl commands
To demonstrate HPA, a custom Docker image based on the php-apache image will be used. The image includes an index.php page with code for running CPU-intensive computing tasks. An example Dockerfile is as follows:
```dockerfile
FROM php:5-apache
ADD index.php /var/www/html/index.php
RUN chmod a+rx index.php
```
An example index.php file is as follows:
```php
<?php
$x = 0.0001;
for ($i = 0; $i <= 1000000; $i++) {
  $x += sqrt($x);
}
echo "OK!";
?>
```
Prerequisites
- kubectl has been configured to access the Kubernetes cluster from the local machine. For details, refer to Connect to a Cluster via kubectl.
Step I: Deploy the workload and service
- First, deploy a Deployment running the above Docker image and expose it as a Kubernetes Service:
```shell
kubectl run php-apache --image=hpa-example --requests=cpu=200m --expose --port=80
service "php-apache" created
deployment "php-apache" created
```
Step II: Create an HPA
```shell
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
deployment "php-apache" autoscaled

# Right after creation the current metric value shows "unknown".
# After performing Step III, wait 1-2 minutes and it changes to a normal percentage.
[root@instance-2tpjy37t ~]# kubectl get hpa
NAME         REFERENCE               TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   <unknown>/50%   1         10        1          5s
```
The corresponding YAML is as follows:
```yaml
apiVersion: autoscaling/v2alpha1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50
```
Field explanations:
- scaleTargetRef: Target object of HPA auto scaling
- minReplicas: Minimum count of Pods
- maxReplicas: Maximum allowed count of Pods
- metrics: The list of metrics used for scaling decisions
- targetAverageUtilization: The target resource utilization rate; scale-up is triggered when the actual average utilization exceeds this value
HPA currently supports 3 types of metrics. For details, refer to Kubernetes Horizontal Pod Autoscaler:
- Predefined metrics: CPU and memory usage of Pods (built-in support)
- Custom Pod metrics: monitoring metrics exposed by the application (requires deploying a monitoring system and a custom metrics server)
- Custom object metrics: monitoring metrics of other resources in the same namespace as the Pod (requires deploying a monitoring system and a custom metrics server)
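For example, a memory metric could be configured alongside CPU in the same metrics list. The following is a hedged sketch following the autoscaling/v2alpha1 schema used above; the 200Mi threshold is an arbitrary placeholder. With both metrics configured, HPA computes a desired replica count per metric and scales to the maximum:

```yaml
metrics:
- type: Resource
  resource:
    name: cpu
    targetAverageUtilization: 50
- type: Resource
  resource:
    name: memory
    targetAverageValue: 200Mi   # placeholder: target average memory per Pod
```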
Step III: Add load to the service and verify auto scale-up
Start a container that sends an infinite loop of requests to the php-apache service (run the following in another terminal):
```shell
kubectl run -i --tty load-generator --image=busybox /bin/sh

# Hit enter for the command prompt, then run:
$ while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done

# Output:
OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!
```
Step IV: Observe HPA changes
After the load increases, the replica count of the deployment starts to increase:
```shell
[root@instance-2tpjy37t ~]# kubectl get hpa
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   332%/50%   1         10        7          19m
[root@instance-2tpjy37t ~]# kubectl get deployment php-apache
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
php-apache   7         7         7            7           19m
```
Step V: Stop the service load and verify automatic scale-down
In the terminal running the load-generator container, press Ctrl+C to terminate the load generation.
Then check the status again (wait a few minutes):
```shell
[root@instance-2tpjy37t ~]# kubectl get hpa
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          11m

# After the load decreases, the count of Pods decreases accordingly
[root@instance-2tpjy37t ~]# kubectl get deployment php-apache
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
php-apache   1         1         1            1           27m
```
