Autoscaling Workloads (2025)

With autoscaling, your workloads can be updated automatically in response to changes in resource demand. This allows your cluster to react more elastically and efficiently.

In Kubernetes, you can scale a workload depending on the current demand for resources.

When you scale a workload, you can either increase or decrease the number of replicas managed by the workload, or adjust the resources available to the replicas in-place.

The first approach is referred to as horizontal scaling, while the second is referred to as vertical scaling.

There are manual and automatic ways to scale your workloads, depending on your use case.

Scaling workloads manually

Kubernetes supports manual scaling of workloads. Horizontal scaling can be done using the kubectl CLI. For vertical scaling, you need to patch the resource definition of your workload.

See below for examples of both strategies.

  • Horizontal scaling: Running multiple instances of your app
  • Vertical scaling: Resizing CPU and memory resources assigned to containers
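As a rough illustration of what "patching the resource definition" can look like, the sketch below builds two patch documents for a Deployment: one that changes the replica count (horizontal) and one that changes a container's resource requests (vertical). The helper names, the container name "app", and the resource values are hypothetical; the nesting assumes the usual Deployment layout, where containers sit under spec.template.spec.containers.

```python
import json


def replicas_patch(replicas: int) -> dict:
    """Patch body that sets the desired replica count (horizontal scaling)."""
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    return {"spec": {"replicas": replicas}}


def resources_patch(container: str, cpu: str, memory: str) -> dict:
    """Patch body that sets resource requests for one container (vertical scaling).

    The container is identified by name; "container" is a hypothetical value
    that must match a container in the workload's Pod template.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container,
                            "resources": {"requests": {"cpu": cpu, "memory": memory}},
                        }
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    # Print both patch documents as JSON so they can be inspected or saved.
    print(json.dumps(replicas_patch(3), indent=2))
    print(json.dumps(resources_patch("app", "500m", "256Mi"), indent=2))
```

The JSON output is only the patch payload; how it is submitted (kubectl or an API client) is left out on purpose.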

Scaling workloads automatically

Kubernetes also supports automatic scaling of workloads, which is the focus of this page.

The concept of Autoscaling in Kubernetes refers to the ability to automatically update an object that manages a set of Pods (for example, a Deployment).

Scaling workloads horizontally

In Kubernetes, you can automatically scale a workload horizontally using a HorizontalPodAutoscaler (HPA).

It is implemented as a Kubernetes API resource and a controller, and periodically adjusts the number of replicas in a workload to match observed resource utilization such as CPU or memory usage.
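The adjustment described above can be sketched as a proportional rule: the replica count is multiplied by the ratio of the observed metric to the target metric and rounded up. This is a simplified sketch, not the controller's code; the function name is hypothetical, and the tolerance of 0.1 is an assumed default that suppresses scaling when the ratio is already close to 1.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,  # assumed default; configurable in a real cluster
) -> int:
    """Return the replica count suggested by the proportional rule.

    desired = ceil(current_replicas * current_metric / target_metric)
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Leave the workload alone when the observed value is close to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 4 replicas at 90% average CPU against a 60% target.
    print(desired_replicas(4, 90.0, 60.0))
```

The sketch ignores details such as Pods that are not yet ready, missing metrics, and stabilization windows, all of which influence the real decision.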

There is a walkthrough tutorial of configuring a HorizontalPodAutoscaler for a Deployment.

Scaling workloads vertically

FEATURE STATE: Kubernetes v1.25 [stable]

You can automatically scale a workload vertically using a VerticalPodAutoscaler (VPA). Unlike the HPA, the VPA doesn't come with Kubernetes by default, but is a separate project that can be found on GitHub.

Once installed, it allows you to create CustomResourceDefinitions (CRDs) for your workloads which define how and when to scale the resources of the managed replicas.

Note:

You will need to have the Metrics Server installed in your cluster for the VPA to work.

At the moment, the VPA can operate in four different modes:

  • Auto: Currently equivalent to Recreate. This might change to in-place updates in the future.
  • Recreate: The VPA assigns resource requests on pod creation and also updates them on existing pods by evicting them when the requested resources differ significantly from the new recommendation.
  • Initial: The VPA only assigns resource requests on pod creation and never changes them later.
  • Off: The VPA does not automatically change the resource requirements of the pods. The recommendations are calculated and can be inspected in the VPA object.
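The four modes can be summarized as a small decision function. This is only an illustration of the descriptions above, not the VPA's implementation: the function name, the returned action labels, and the 10% threshold standing in for "differ significantly" are all hypothetical.

```python
from enum import Enum


class UpdateMode(Enum):
    AUTO = "Auto"
    RECREATE = "Recreate"
    INITIAL = "Initial"
    OFF = "Off"


def vpa_action(
    mode: UpdateMode,
    pod_exists: bool,
    current_request: float,
    recommended: float,
    threshold: float = 0.1,  # hypothetical stand-in for "differs significantly"
) -> str:
    """Return an illustrative action label for one Pod and one recommendation."""
    if mode is UpdateMode.OFF:
        # Recommendations are computed but never applied.
        return "recommend-only"
    if not pod_exists:
        # Auto, Recreate and Initial all set requests when a Pod is created.
        return "set-on-create"
    if mode is UpdateMode.INITIAL:
        # Existing Pods are never touched in Initial mode.
        return "none"
    # Auto and Recreate: evict only when the request has drifted far enough.
    drift = abs(recommended - current_request) / current_request
    return "evict" if drift > threshold else "none"


if __name__ == "__main__":
    # An existing Pod requesting 100 units against a recommendation of 500.
    print(vpa_action(UpdateMode.RECREATE, True, 100.0, 500.0))
```

Auto and Recreate share a branch here because the table describes Auto as currently behaving like Recreate.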

Requirements for in-place resizing

FEATURE STATE: Kubernetes v1.27 [alpha]

Resizing a workload in-place without restarting the Pods or their Containers requires Kubernetes version 1.27 or later. Additionally, the InPlacePodVerticalScaling feature gate needs to be enabled.

InPlacePodVerticalScaling: Enables in-place Pod vertical scaling.

Autoscaling based on cluster size

For workloads that need to be scaled based on the size of the cluster (for example cluster-dns or other system components), you can use the Cluster Proportional Autoscaler. Just like the VPA, it is not part of the Kubernetes core, but hosted as its own project on GitHub.

The Cluster Proportional Autoscaler watches the number of schedulable nodes and cores and scales the number of replicas of the target workload accordingly.
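One way to picture this proportional behavior is a linear rule: compute a replica count from the node count and another from the core count, take the larger, and clamp it to configured bounds. The sketch below is modelled on the linear mode described in the project's README; the function and parameter names are illustrative and not the project's configuration keys.

```python
import math
from typing import Optional


def proportional_replicas(
    nodes: int,
    cores: int,
    nodes_per_replica: float,
    cores_per_replica: float,
    min_replicas: int = 1,
    max_replicas: Optional[int] = None,
) -> int:
    """Scale replicas linearly with cluster size, then clamp to the bounds."""
    if nodes_per_replica <= 0 or cores_per_replica <= 0:
        raise ValueError("per-replica ratios must be positive")
    by_nodes = math.ceil(nodes / nodes_per_replica)
    by_cores = math.ceil(cores / cores_per_replica)
    # Whichever dimension demands more replicas wins.
    replicas = max(by_nodes, by_cores)
    if max_replicas is not None:
        replicas = min(replicas, max_replicas)
    return max(replicas, min_replicas)


if __name__ == "__main__":
    # 10 schedulable nodes with 80 cores, one replica per 4 nodes or 32 cores.
    print(proportional_replicas(10, 80, nodes_per_replica=4, cores_per_replica=32))
```

Taking the maximum of both dimensions keeps a cluster with a few very large nodes from being under-provisioned.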

If the number of replicas should stay the same, you can scale your workloads vertically according to the cluster size using the Cluster Proportional Vertical Autoscaler. The project is currently in beta and can be found on GitHub.

While the Cluster Proportional Autoscaler scales the number of replicas of a workload, the Cluster Proportional Vertical Autoscaler adjusts the resource requests for a workload (for example a Deployment or DaemonSet) based on the number of nodes and/or cores in the cluster.

Event driven Autoscaling

It is also possible to scale workloads based on events, for example using the Kubernetes Event Driven Autoscaler (KEDA).

KEDA is a CNCF graduated project that enables you to scale your workloads based on the number of events to be processed, for example the number of messages in a queue. A wide range of adapters for different event sources is available to choose from.
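For the queue example, the underlying idea can be sketched as dividing the backlog by the number of messages one replica is expected to handle. This is a simplified illustration and not KEDA's code: the function name, the parameters, and the default bounds are hypothetical. A minimum of zero is used to show that an idle workload can be scaled away entirely.

```python
import math


def replicas_for_backlog(
    pending_messages: int,
    messages_per_replica: int,
    min_replicas: int = 0,   # hypothetical default: scale to zero when idle
    max_replicas: int = 10,  # hypothetical upper bound
) -> int:
    """Return a replica count proportional to the number of pending events."""
    if messages_per_replica <= 0:
        raise ValueError("messages_per_replica must be positive")
    if pending_messages < 0:
        raise ValueError("pending_messages must not be negative")
    wanted = math.ceil(pending_messages / messages_per_replica)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(wanted, max_replicas))


if __name__ == "__main__":
    # 12 queued messages, each replica expected to work through 5 of them.
    print(replicas_for_backlog(12, 5))
```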

Autoscaling based on schedules

Another strategy for scaling your workloads is to schedule the scaling operations, for example in order to reduce resource consumption during off-peak hours.

Similar to event driven autoscaling, such behavior can be achieved using KEDA in conjunction with its Cron scaler. The Cron scaler allows you to define schedules (and time zones) for scaling your workloads in or out.
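The effect of a schedule with a time zone can be sketched as follows. This is a deliberately reduced model, not the Cron scaler itself: it uses a daily time-of-day window instead of cron expressions, and the function and parameter names are hypothetical.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def scheduled_replicas(
    now: datetime,
    window_start: time,
    window_end: time,
    tz: tzinfo,
    in_window_replicas: int,
    default_replicas: int,
) -> int:
    """Return the replica count for "now", evaluated in the schedule's time zone.

    "now" must be timezone-aware; the window is interpreted in "tz".
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    local = now.astimezone(tz).time()
    if window_start <= window_end:
        active = window_start <= local < window_end
    else:
        # The window wraps past midnight, for example 22:00 to 06:00.
        active = local >= window_start or local < window_end
    return in_window_replicas if active else default_replicas


if __name__ == "__main__":
    # Business hours 08:00 to 18:00 in a zone one hour ahead of UTC.
    zone = timezone(timedelta(hours=1))
    moment = datetime(2025, 1, 6, 9, 30, tzinfo=timezone.utc)
    print(scheduled_replicas(moment, time(8), time(18), zone, 10, 1))
```

A fixed UTC offset is used in the usage example to keep the sketch free of time zone database dependencies; a named zone would also handle daylight saving changes.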

Scaling cluster infrastructure

If scaling workloads isn't enough to meet your needs, you can also scale your cluster infrastructure itself.

Scaling the cluster infrastructure normally means adding or removing nodes. Read about cluster autoscaling for more information.

What's next

  • Learn more about scaling horizontally
    • Scale a StatefulSet
    • HorizontalPodAutoscaler Walkthrough
  • Resize Container Resources In-Place
  • Autoscale the DNS Service in a Cluster
  • Learn about cluster autoscaling

Author: Fredrick Kertzmann
