A HorizontalPodAutoscaler (HPA for short) automatically updates a workload resource (such as a Deployment or StatefulSet), with the aim of automatically scaling the workload to match demand.
Horizontal scaling means that the response to increased load is to deploy more Pods. This is different from vertical scaling, which for Kubernetes would mean assigning more resources (for example: memory or CPU) to the Pods that are already running for the workload.
If the load decreases, and the number of Pods is above the configured minimum, the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet, or other similar resource) to scale back down.
This document walks you through an example of enabling HorizontalPodAutoscaler to automatically manage scale for an example web app. This example workload is Apache httpd running some PHP code.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube, or you can use one of the Kubernetes playgrounds.
Your Kubernetes server must be at or later than version 1.23. To check the version, enter kubectl version. If you're running an older release of Kubernetes, refer to the version of the documentation for that release (see available documentation versions).
To follow this walkthrough, you also need to use a cluster that has a Metrics Server deployed and configured. The Kubernetes Metrics Server collects resource metrics from the kubelets in your cluster, and exposes those metrics through the Kubernetes API, using an APIService to add new kinds of resource that represent metric readings.
To learn how to deploy the Metrics Server, see the metrics-server documentation.
If you are running Minikube, run the following command to enable metrics-server:
minikube addons enable metrics-server
Run and expose php-apache server
To demonstrate a HorizontalPodAutoscaler, you will first start a Deployment that runs a container using the hpa-example image, and expose it as a Service using the following manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
To do so, run the following command:
kubectl apply -f https://k8s.io/examples/application/php-apache.yaml
deployment.apps/php-apache created
service/php-apache created
Create the HorizontalPodAutoscaler
Now that the server is running, create the autoscaler using kubectl. The kubectl autoscale subcommand, part of kubectl, helps you do this.
You will shortly run a command that creates a HorizontalPodAutoscaler that maintains between 1 and 10 replicas of the Pods controlled by the php-apache Deployment that you created in the first step of these instructions.
Roughly speaking, the HPA controller will increase and decrease the number of replicas (by updating the Deployment) to maintain an average CPU utilization across all Pods of 50%. The Deployment then updates the ReplicaSet - this is part of how all Deployments work in Kubernetes - and then the ReplicaSet either adds or removes Pods based on the change to its .spec.
Since each Pod requests 200 milli-cores (as set in the Deployment manifest), this means an average CPU usage of 100 milli-cores. See Algorithm details for more details on the algorithm.
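The arithmetic behind the 100 milli-core figure can be sketched in shell. This is only an illustration: the function name target_millicores is hypothetical, and the HPA controller performs this computation internally rather than through any such command.

```shell
# Hypothetical helper: convert a per-Pod CPU request (in milli-cores) and a
# target utilization percentage into the average per-Pod usage the HPA aims for.
target_millicores() {
  request_m=$1       # CPU request per Pod, in milli-cores (200 in this walkthrough)
  target_percent=$2  # target average utilization (50 in this walkthrough)
  echo $(( request_m * target_percent / 100 ))
}

target_millicores 200 50   # prints 100
```

With a different request, for example 500 milli-cores, the same 50% target would correspond to 250 milli-cores per Pod.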
Create the HorizontalPodAutoscaler:
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
horizontalpodautoscaler.autoscaling/php-apache autoscaled
You can check the current status of the newly-made HorizontalPodAutoscaler, by running:
# You can use "hpa" or "horizontalpodautoscaler"; either name works OK.
kubectl get hpa
The output is similar to:
NAME         REFERENCE                     TARGET     MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%   1         10        1          18s
(if you see other HorizontalPodAutoscalers with different names, that means they already existed, and isn't usually a problem).
Please note that the current CPU consumption is 0% as there are no clients sending requests to the server (the TARGET column shows the average across all the Pods controlled by the corresponding deployment).
Increase the load
Next, see how the autoscaler reacts to increased load. To do this, you'll start a different Pod to act as a client. The container within the client Pod runs in an infinite loop, sending queries to the php-apache service.
# Run this in a separate terminal
# so that the load generation continues and you can carry on with the rest of the steps
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
Now run:
# type Ctrl+C to end the watch when you're ready
kubectl get hpa php-apache --watch
Within a minute or so, you should see the higher CPU load; for example:
NAME         REFERENCE                     TARGET       MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%   1         10        1          3m
and then, more replicas. For example:
NAME         REFERENCE                     TARGET       MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%   1         10        7          3m
Here, CPU consumption has increased to 305% of the request. As a result, the Deployment was resized to 7 replicas:
kubectl get deployment php-apache
You should see the replica count matching the figure from the HorizontalPodAutoscaler:
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   7/7     7            7           19m
Note:
It may take a few minutes to stabilize the number of replicas. Since the amount of load is not controlled in any way, it may happen that the final number of replicas will differ from this example.
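The jump from 1 to 7 replicas is consistent with the scaling rule given in the Kubernetes algorithm details: desired replicas = ceil(current replicas x current metric / target metric), clamped to the configured minimum and maximum. The shell sketch below reproduces that arithmetic for the figures in this example. The function name desired_replicas is hypothetical, and the sketch deliberately ignores the controller's tolerance, stabilization behavior, and handling of Pods that are not ready.

```shell
# Hypothetical helper: integer-only version of
#   desired = ceil(current_replicas * current_metric / target_metric)
# clamped to [min, max]. Metric values are whole percentages here.
desired_replicas() {
  current_replicas=$1
  current_metric=$2
  target_metric=$3
  min=$4
  max=$5
  product=$(( current_replicas * current_metric ))
  # Ceiling division using integer arithmetic.
  desired=$(( (product + target_metric - 1) / target_metric ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

desired_replicas 1 305 50 1 10   # prints 7
```

The same function also shows why the replica count can never exceed the --max value: a large enough load simply clamps to 10 in this walkthrough.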
Stop generating load
To finish the example, stop sending the load.
In the terminal where you created the Pod that runs a busybox image, terminate the load generation by typing <Ctrl> + C.
Then verify the result state (after a minute or so):
# type Ctrl+C to end the watch when you're ready
kubectl get hpa php-apache --watch
The output is similar to:
NAME         REFERENCE                     TARGET     MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%   1         10        1          11m
and the Deployment also shows that it has scaled down:
kubectl get deployment php-apache
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   1/1     1            1           27m
Once CPU utilization dropped to 0, the HPA automatically scaled the number of replicas back down to 1.
Autoscaling the replicas may take a few minutes.
Autoscaling on multiple metrics and custom metrics
You can introduce additional metrics to use when autoscaling the php-apache Deployment by making use of the autoscaling/v2 API version.
First, get the YAML of your HorizontalPodAutoscaler in the autoscaling/v2 form:
kubectl get hpa php-apache -o yaml > /tmp/hpa-v2.yaml
Open the /tmp/hpa-v2.yaml file in an editor, and you should see YAML which looks like this:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
status:
  observedGeneration: 1
  lastScaleTime: <some-time>
  currentReplicas: 1
  desiredReplicas: 1
  currentMetrics:
  - type: Resource
    resource:
      name: cpu
      current:
        averageUtilization: 0
        averageValue: 0
Notice that the targetCPUUtilizationPercentage field has been replaced with an array called metrics. The CPU utilization metric is a resource metric, since it is represented as a percentage of a resource specified on pod containers. Notice that you can specify other resource metrics besides CPU. By default, the only other supported resource metric is memory. These resources do not change names from cluster to cluster, and should always be available, as long as the metrics.k8s.io API is available.
You can also specify resource metrics in terms of direct values, instead of as percentages of the requested value, by using a target.type of AverageValue instead of Utilization, and setting the corresponding target.averageValue field instead of the target.averageUtilization.
metrics:
- type: Resource
  resource:
    name: memory
    target:
      type: AverageValue
      averageValue: 500Mi
There are two other types of metrics, both of which are considered custom metrics: pod metrics and object metrics. These metrics may have names which are cluster specific, and require a more advanced cluster monitoring setup.
The first of these alternative metric types is pod metrics. These metrics describe Pods, and are averaged together across Pods and compared with a target value to determine the replica count. They work much like resource metrics, except that they only support a target type of AverageValue.
Pod metrics are specified using a metric block like this:
type: Pods
pods:
  metric:
    name: packets-per-second
  target:
    type: AverageValue
    averageValue: 1k
The second alternative metric type is object metrics. These metrics describe a different object in the same namespace, instead of describing Pods. The metrics are not necessarily fetched from the object; they only describe it. Object metrics support target types of both Value and AverageValue. With Value, the target is compared directly to the returned metric from the API. With AverageValue, the value returned from the custom metrics API is divided by the number of Pods before being compared to the target. The following example is the YAML representation of the requests-per-second metric.
type: Object
object:
  metric:
    name: requests-per-second
  describedObject:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    name: main-route
  target:
    type: Value
    value: 2k
If you provide multiple such metric blocks, the HorizontalPodAutoscaler will consider each metric in turn. The HorizontalPodAutoscaler will calculate proposed replica counts for each metric, and then choose the one with the highest replica count.
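The "choose the highest proposal" rule can be illustrated with a few lines of shell. The function name highest_proposal and the three proposed counts are hypothetical values chosen for illustration; they are not output from a real cluster.

```shell
# Hypothetical helper: given the replica counts proposed by each metric,
# print the largest one, which is the count the HPA would choose.
highest_proposal() {
  highest=0
  for proposal in "$@"; do
    if [ "$proposal" -gt "$highest" ]; then
      highest=$proposal
    fi
  done
  echo "$highest"
}

# Illustrative inputs: one proposal each from a CPU, a Pods, and an Object metric.
highest_proposal 3 7 5   # prints 7
```

Taking the maximum means that the most demanding metric always wins, so adding a metric can raise the replica count but never lower it.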
For example, if you had your monitoring system collecting metrics about network traffic, you could update the definition above using kubectl edit to look like this:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: packets-per-second
      target:
        type: AverageValue
        averageValue: 1k
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k
status:
  observedGeneration: 1
  lastScaleTime: <some-time>
  currentReplicas: 1
  desiredReplicas: 1
  currentMetrics:
  - type: Resource
    resource:
      name: cpu
      current:
        averageUtilization: 0
        averageValue: 0
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-route
      current:
        value: 10k
Then, your HorizontalPodAutoscaler would attempt to ensure that each pod was consuming roughly 50% of its requested CPU, serving 1000 packets per second, and that all pods behind the main-route Ingress were serving a total of 10000 requests per second.
Autoscaling on more specific metrics
Many metrics pipelines allow you to describe metrics either by name or by a set of additional descriptors called labels. For all non-resource metric types (pod, object, and external, described below), you can specify an additional label selector which is passed to your metric pipeline. For instance, if you collect a metric http_requests with the verb label, you can specify the following metric block to scale only on GET requests:
type: Object
object:
  metric:
    name: http_requests
    selector: {matchLabels: {verb: GET}}
This selector uses the same syntax as the full Kubernetes label selectors. The monitoring pipeline determines how to collapse multiple series into a single value, if the name and selector match multiple series. The selector is additive, and cannot select metrics that describe objects that are not the target object (the target pods in the case of the Pods type, and the described object in the case of the Object type).
Autoscaling on metrics not related to Kubernetes objects
Applications running on Kubernetes may need to autoscale based on metrics that don't have an obvious relationship to any object in the Kubernetes cluster, such as metrics describing a hosted service with no direct correlation to Kubernetes namespaces. In Kubernetes 1.10 and later, you can address this use case with external metrics.
Using external metrics requires knowledge of your monitoring system; the setup is similar to that required when using custom metrics. External metrics allow you to autoscale your cluster based on any metric available in your monitoring system. Provide a metric block with a name and selector, as above, and use the External metric type instead of Object. If multiple time series are matched by the metricSelector, the sum of their values is used by the HorizontalPodAutoscaler. External metrics support both the Value and AverageValue target types, which function exactly the same as when you use the Object type.
For example, if your application processes tasks from a hosted queue service, you could add the following section to your HorizontalPodAutoscaler manifest to specify that you need one worker per 30 outstanding tasks.
- type: External
  external:
    metric:
      name: queue_messages_ready
      selector:
        matchLabels:
          queue: "worker_tasks"
    target:
      type: AverageValue
      averageValue: 30
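To make the "one worker per 30 outstanding tasks" rule concrete, the following shell sketch computes how many workers a given queue length would call for. It assumes that, with an AverageValue target, the total metric value is divided across Pods, so the count needed is ceil(total / averageValue). The function name workers_for_queue and the queue length of 95 are hypothetical, and the result would still be bounded by minReplicas and maxReplicas.

```shell
# Hypothetical helper: with an AverageValue target, the total metric value is
# shared across Pods, so the replica count needed is ceil(total / per_worker).
workers_for_queue() {
  queue_length=$1  # total outstanding tasks reported by the external metric
  per_worker=$2    # target averageValue (30 in the manifest above)
  # Ceiling division using integer arithmetic.
  echo $(( (queue_length + per_worker - 1) / per_worker ))
}

workers_for_queue 95 30   # prints 4
```

A queue of exactly 90 tasks would give 3 workers, while 91 tasks would already call for a fourth.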
When possible, it's preferable to use the custom metric target types instead of external metrics, since it's easier for cluster administrators to secure the custom metrics API. The external metrics API potentially allows access to any metric, so cluster administrators should take care when exposing it.
Appendix: Horizontal Pod Autoscaler Status Conditions
When using the autoscaling/v2 form of the HorizontalPodAutoscaler, you will be able to see status conditions set by Kubernetes on the HorizontalPodAutoscaler. These status conditions indicate whether or not the HorizontalPodAutoscaler is able to scale, and whether or not it is currently restricted in any way.
The conditions appear in the status.conditions field. To see the conditions affecting a HorizontalPodAutoscaler, we can use kubectl describe hpa:
kubectl describe hpa cm-test
Name:                           cm-test
Namespace:                      prom
Labels:                         <none>
Annotations:                    <none>
CreationTimestamp:              Fri, 16 Jun 2017 18:09:22 +0000
Reference:                      ReplicationController/cm-test
Metrics:                        ( current / target )
  "http_requests" on pods:      66m / 500m
Min replicas:                   1
Max replicas:                   4
ReplicationController pods:     1 current / 1 desired
Conditions:
  Type                  Status  Reason                  Message
  ----                  ------  ------                  -------
  AbleToScale           True    ReadyForNewScale        the last scale time was sufficiently old as to warrant a new scale
  ScalingActive         True    ValidMetricFound        the HPA was able to successfully calculate a replica count from pods metric http_requests
  ScalingLimited        False   DesiredWithinRange      the desired replica count is within the acceptable range
Events:
For this HorizontalPodAutoscaler, you can see several conditions in a healthy state. The first, AbleToScale, indicates whether or not the HPA is able to fetch and update scales, as well as whether or not any backoff-related conditions would prevent scaling. The second, ScalingActive, indicates whether or not the HPA is enabled (i.e. the replica count of the target is not zero) and is able to calculate desired scales. When it is False, it generally indicates problems with fetching metrics. Finally, the last condition, ScalingLimited, indicates that the desired scale was capped by the maximum or minimum of the HorizontalPodAutoscaler. This is an indication that you may wish to raise or lower the minimum or maximum replica count constraints on your HorizontalPodAutoscaler.
Quantities
All metrics in the HorizontalPodAutoscaler and metrics APIs are specified using a special whole-number notation known in Kubernetes as a quantity. For example, the quantity 10500m would be written as 10.5 in decimal notation. The metrics APIs will return whole numbers without a suffix when possible, and will generally return quantities in milli-units otherwise. This means you might see your metric value fluctuate between 1 and 1500m, or 1 and 1.5 when written in decimal notation.
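As a rough aid for reading such values, the shell sketch below converts a quantity with the milli suffix into decimal notation. The function name milli_to_decimal is hypothetical, and the sketch only handles plain whole numbers and the m suffix; other quantity suffixes (such as k or Mi) are passed through unchanged.

```shell
# Hypothetical helper: convert a quantity with an optional "m" (milli) suffix
# to decimal notation. Anything without a trailing "m" is printed unchanged.
milli_to_decimal() {
  quantity=$1
  case "$quantity" in
    *m)
      milli=${quantity%m}          # strip the trailing "m"
      whole=$(( milli / 1000 ))
      frac=$(( milli % 1000 ))
      if [ "$frac" -eq 0 ]; then
        echo "$whole"
      else
        # Zero-pad the fraction to three digits, then drop trailing zeros.
        frac=$(printf '%03d' "$frac" | sed 's/0*$//')
        echo "$whole.$frac"
      fi
      ;;
    *)
      echo "$quantity"
      ;;
  esac
}

milli_to_decimal 10500m   # prints 10.5
milli_to_decimal 1500m    # prints 1.5
milli_to_decimal 1        # prints 1
```

Applied to the 66m / 500m pair from the kubectl describe output earlier on this page, the same function gives 0.066 and 0.5.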
Other possible scenarios
Creating the autoscaler declaratively
Instead of using the kubectl autoscale command to create a HorizontalPodAutoscaler imperatively, you can use the following manifest to create it declaratively:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Then, create the autoscaler by executing the following command:
kubectl create -f https://k8s.io/examples/application/hpa/php-apache.yaml
horizontalpodautoscaler.autoscaling/php-apache created