Kubernetes Autoscaling

Dilip Kumar
Dec 19, 2024

Kubernetes offers the following ways to automatically scale your applications, each addressing a different aspect of scaling.

1. Horizontal Pod Autoscaler (HPA): The most common and fundamental autoscaling mechanism in Kubernetes. It automatically adjusts the number of pod replicas in a Deployment, ReplicaSet, StatefulSet, or ReplicationController.

2. Vertical Pod Autoscaler (VPA): VPA automatically adjusts the resource requests and limits (CPU and memory) for containers within pods.

3. Cluster Autoscaler (CA): CA automatically adjusts the size of your Kubernetes cluster by adding or removing nodes.

4. Event-Driven Autoscaling (KEDA): KEDA (Kubernetes Event-driven Autoscaling) allows you to scale applications based on events from various sources, such as message queues, databases, and cloud services.

Horizontal Pod Autoscaler (HPA)

The HPA’s functionality relies on several interacting components within the Kubernetes ecosystem:

  1. Horizontal Pod Autoscaler Controller:
  • This is the core of the HPA. It’s a control loop that runs within the Kubernetes controller manager (kube-controller-manager).
  • It continuously monitors metrics and makes scaling decisions.
  • It’s responsible for fetching metrics, calculating the desired number of replicas, and updating the target resource (Deployment, ReplicaSet, etc.).
  • The scaling formula: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue); a worked example follows this list.

2. Metrics Server:

  • An in-cluster component that collects resource usage metrics (CPU and memory) from kubelets on each node.
  • It aggregates these metrics and exposes them through the Kubernetes Metrics API (metrics.k8s.io).
  • The HPA controller queries the Metrics Server to get resource utilization data for pods.
  • Note: The Metrics Server is essential for CPU and memory-based autoscaling.

3. Custom Metrics Adapter:

  • If you’re using custom metrics or external metrics, you’ll need a custom metrics adapter.
  • This adapter implements the Kubernetes Custom Metrics API (custom.metrics.k8s.io) or External Metrics API (external.metrics.k8s.io).
  • Popular options include:
  • Prometheus Adapter: Exposes metrics from Prometheus to the HPA.
  • Adapter for Google Cloud Monitoring (formerly Stackdriver): Exposes metrics from Google Cloud Monitoring.
  • KEDA (as mentioned before) can also serve as a custom metrics adapter, especially for event-driven metrics.

4. API Server:

  • The central hub of Kubernetes.
  • The HPA controller interacts with the API server to fetch HPA resource definitions, fetch the current state of the target resource (e.g., Deployment), and update the target resource’s replica count.
  • The Metrics Server and custom metrics adapters also register with the API server to make their metrics discoverable.

5. Kubelet:

  • An agent that runs on each node in the cluster.
  • It collects resource usage data from the containers running on its node.
  • The Metrics Server retrieves this data from the kubelet’s Summary API.

6. Target Resource (Deployment, ReplicaSet, etc.):

  • The resource that the HPA controls.
  • The HPA controller modifies the replicas field of the target resource to scale the number of pods up or down.
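
As a worked example with illustrative numbers: suppose the HPA targets 50% average CPU utilization, the pods currently average 75%, and the Deployment runs 4 replicas. The controller computes

desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)
                = ceil(4 × 75 / 50)
                = 6

and scales the Deployment from 4 to 6 replicas.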

The following is a sample YAML file to configure an HPA.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 500Mi
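
The autoscaling/v2 API also supports an optional spec.behavior field for tuning how aggressively the HPA scales in each direction. A minimal sketch (the windows and policy values are illustrative, not recommendations):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0 # React to load spikes immediately
      policies:
      - type: Pods
        value: 4 # Add at most 4 pods per 60-second period
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300 # Wait 5 minutes before scaling down
      policies:
      - type: Percent
        value: 50 # Remove at most 50% of replicas per 60-second period
        periodSeconds: 60
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50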

Note: If you deploy the chart with Helm via GitOps, set replicaCount to null so that the rendered manifest does not fight with the HPA over the replica count; a sketch follows.
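
One common pattern, sketched below (the file names and chart layout are a hypothetical example): guard the replicas field in the chart’s Deployment template so that it is omitted entirely when replicaCount is null, leaving the field under the HPA’s control.

# values.yaml
replicaCount: null # Let the HPA own the replica count

# templates/deployment.yaml (fragment)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  {{- if .Values.replicaCount }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}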

Vertical Pod Autoscaler (VPA)

It works similarly to the HPA, except that it adjusts resource requests and limits rather than the replica count. The following is a sample YAML file for reference.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app-deployment
  updatePolicy:
    updateMode: "Auto"

Note: The VPA is not installed in a Kubernetes cluster by default; you will need to install it first.
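
Once the VPA is installed, you can also bound what it is allowed to recommend via the resourcePolicy field; a sketch with illustrative limits:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: '*' # Apply to all containers in the pod
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi
      controlledResources: ["cpu", "memory"]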

Cluster Autoscaler (CA)

The Cluster Autoscaler runs as a deployment within your Kubernetes cluster. It primarily interacts with your cloud provider’s infrastructure services (e.g., Google Cloud Managed Instance Groups) and the Kubernetes API server.

The Cluster Autoscaler operates in a continuous loop, performing the following steps:

  1. Scan for Unschedulable Pods: The CA periodically checks for pods that are in a Pending state and have conditions indicating that they cannot be scheduled due to insufficient resources (e.g., Insufficient cpu, Insufficient memory).
  2. Check Scale-Up Conditions: If unschedulable pods are found, the CA determines if a scale-up is necessary and possible:
  • Node Group Limits: It checks if the target node group has reached its maximum size limit.
  • Cluster-Wide Limits: It considers any configured cluster-wide resource limits.
  • Backoff: It respects any backoff mechanisms that might be in place to prevent rapid scaling.

3. Simulate Scheduling: The CA simulates scheduling the unschedulable pods, together with any DaemonSet pods a new node would have to run, onto potential new nodes. The simulation determines whether the pending pods can fit on existing nodes and, if not, the most efficient way to accommodate them.

4. Trigger Scale-Up (if needed): If the simulation indicates that new nodes are required, the CA interacts with the cloud provider’s API (via the cloud provider interface) to increase the desired size of the appropriate node group or pool. New nodes are then provisioned by the cloud provider.

5. Scan for Underutilized Nodes: The CA identifies nodes that have been underutilized for a certain period (configurable, default is 10 minutes). A node is typically considered underutilized if its CPU and memory utilization are below a configured threshold (default is 50%).

6. Check Scale-Down Conditions: If underutilized nodes are found, the CA checks if a scale-down is safe:

  • Pod Disruption Budgets (PDBs): It verifies that removing the node won’t violate any PDBs.
  • System Pods: It checks for system pods (e.g., kube-proxy, DNS) that might be disrupted. It usually will not remove nodes running such pods unless they can be moved elsewhere; you can also exempt a node from scale-down explicitly with the "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true" node annotation (see the annotation example after this list).
  • Unmovable Pods: It will not scale down a node if the pods on it cannot be moved to other nodes, for example due to local storage or specific node selectors.

7. Cordon and Drain Node: Before terminating a node, the CA cordons it (marks it as unschedulable) and then gracefully drains it by evicting its pods, which allows them to be rescheduled on other nodes.

8. Trigger Scale-Down (if safe): If it’s safe to remove the node, the CA interacts with the cloud provider’s API to decrease the desired size of the node group or pool. The cloud provider then terminates the corresponding node(s).

9. Repeat: The CA repeats this loop periodically (default is every 10 seconds, but it can be adjusted using the --scan-interval flag).
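
Two upstream Cluster Autoscaler annotations are commonly used to influence these decisions; a sketch (the node, pod, and image names are illustrative):

# Exempt a node from scale-down:
#   kubectl annotate node <node-name> cluster-autoscaler.kubernetes.io/scale-down-disabled=true

# Tell the CA whether a specific pod may be evicted during scale-down:
apiVersion: v1
kind: Pod
metadata:
  name: my-critical-pod
  annotations:
    # "false" blocks eviction (and thus scale-down of the hosting node); "true" explicitly permits it
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
  - name: app
    image: my-app:latest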

Event-Driven Autoscaling (KEDA)

KEDA is an open-source project that brings event-driven autoscaling to Kubernetes workloads. It allows you to scale your applications in or out based on events from a wide variety of sources (like message queues, databases, and cloud services) instead of only relying on standard resource metrics like CPU and memory utilization.

How KEDA Works:

KEDA works in conjunction with the Kubernetes Horizontal Pod Autoscaler (HPA). It acts as an advanced metrics provider, feeding custom and external metrics to the HPA based on events. Here’s a breakdown of the key components:

  1. KEDA Operator:
  • A Kubernetes operator that you deploy to your cluster.
  • It watches for ScaledObject resources (more on this below).
  • It creates and manages the HPA objects based on the ScaledObject definitions.

2. Scaler:

  • A component that connects to a specific event source (e.g., Kafka, RabbitMQ, AWS SQS).
  • Each scaler knows how to authenticate with and retrieve metrics from its corresponding event source.
  • KEDA has a rich ecosystem of built-in scalers, and you can also create your own.
  • Each scaler implements a specific interface to expose metrics in a standardized way.

3. Metrics Adapter:

  • KEDA includes a custom metrics adapter that implements the Kubernetes Custom Metrics API and External Metrics API.
  • It receives metrics from the scalers and makes them available to the HPA.

4. Horizontal Pod Autoscaler (HPA):

  • The standard Kubernetes HPA, but instead of using resource metrics (CPU/memory) or basic custom metrics, it’s configured to use the custom or external metrics provided by KEDA.
  • The HPA uses these metrics to calculate the desired number of replicas and scales the target resource accordingly.

5. ScaledObject:

  • A custom resource definition (CRD) provided by KEDA.
  • You create ScaledObject resources to define how a specific workload should be scaled based on events.
  • It specifies:
  • The target workload (Deployment, StatefulSet, or custom resource).
  • The event source (scaler type and configuration).
  • Authentication details for the event source (if needed).
  • Scaling triggers (metric thresholds).
  • Advanced HPA settings (optional).

6. ScaledJob:

  • Similar to ScaledObject, it's a custom resource definition (CRD) provided by KEDA.
  • You create ScaledJob resources to define how a specific Job should be scaled based on events.
  • It specifies:
  • The target workload (Job).
  • The event source (scaler type and configuration).
  • Authentication details for the event source (if needed).
  • Scaling triggers (metric thresholds).

ScaledObject for Kafka (Deployment Example):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaledobject
  namespace: your-namespace # Replace with your namespace
spec:
  scaleTargetRef:
    name: my-kafka-consumer-deployment # Replace with your Deployment name
  pollingInterval: 20 # Optional. Check Kafka every 20 seconds (default is 30 seconds)
  cooldownPeriod: 300 # Optional. Default is 300 seconds
  minReplicaCount: 0 # Optional. Default is 0, which allows scaling to zero
  maxReplicaCount: 10 # Optional. Default is 100
  advanced: # Optional. HPA behavior
    restoreToOriginalReplicaCount: true
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
          - type: Percent
            value: 50
            periodSeconds: 60
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: my-kafka-broker-1:9092,my-kafka-broker-2:9092 # Replace with your Kafka brokers
      consumerGroup: my-consumer-group # Replace with your consumer group
      topic: my-topic # Replace with your Kafka topic
      lagThreshold: '100' # Scale up when the lag exceeds 100 messages per partition
      offsetResetPolicy: latest # Optional. latest, earliest, none. Default: latest
      # For SASL/PLAIN or SASL/SCRAM authentication
      username: your_username # Optional. Remove if not needed
      password: your_password # Optional. Remove if not needed
      sasl: plain # Optional. plain, scram-sha-256, scram-sha-512. Remove if not needed
      tls: enable # Optional. enable, disable. Default: disable. For self-signed certs 'enable' is enough; for CA-signed certs also add ca: /path/to/ca within the trigger.
      # For TLS client authentication (mTLS)
      # cert: /path/to/cert
      # key: /path/to/key
      # ca: /path/to/ca
    authenticationRef:
      name: keda-trigger-auth-kafka # Optional. References a TriggerAuthentication

ScaledJob for Kafka (Job Example):

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: kafka-scaledjob
  namespace: your-namespace # Replace with your namespace
spec:
  jobTargetRef:
    parallelism: 1 # Number of pods the job runs in parallel
    completions: 1 # Number of successful completions required
    activeDeadlineSeconds: 600 # How long (in seconds, relative to startTime) the job may be active before the system tries to terminate it
    backoffLimit: 6 # Number of retries before marking the job as failed
    template: # The template of the job to be created
      metadata:
        labels:
          app: kafka-job
      spec:
        containers:
        - name: kafka-job-container
          image: my-kafka-job-image:latest
          imagePullPolicy: Always
          command: ["/bin/sh", "-c"]
          args: ["run-kafka-job.sh"]
          env:
          - name: KAFKA_BROKERS
            value: "my-kafka-broker-1:9092,my-kafka-broker-2:9092" # Pass whatever configuration your job needs to connect to Kafka
          # ... other container specs
        restartPolicy: Never
  pollingInterval: 20
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  maxReplicaCount: 10
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: my-kafka-broker-1:9092,my-kafka-broker-2:9092 # Replace with your Kafka brokers
      consumerGroup: my-consumer-group # Replace with your consumer group; it is created if it does not exist
      topic: my-topic # Replace with your Kafka topic
      lagThreshold: '100' # Scale up when the lag exceeds 100 messages per partition
      offsetResetPolicy: latest # latest, earliest, none
      # For SASL/PLAIN or SASL/SCRAM authentication
      username: your_username
      password: your_password
      sasl: plain # plain, scram-sha-256, scram-sha-512
      tls: enable
      # For TLS client authentication (mTLS)
      # cert: /path/to/cert
      # key: /path/to/key
      # ca: /path/to/ca
    authenticationRef:
      name: keda-trigger-auth-kafka
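
Both examples reference a TriggerAuthentication named keda-trigger-auth-kafka without defining it. A sketch of what it could look like, assuming the SASL credentials are stored in a Secret (the Secret name and keys are illustrative):

apiVersion: v1
kind: Secret
metadata:
  name: kafka-credentials
  namespace: your-namespace
type: Opaque
stringData:
  sasl: "plain"
  username: "your_username"
  password: "your_password"
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-kafka
  namespace: your-namespace
spec:
  secretTargetRef: # Map Secret keys to trigger metadata parameters
  - parameter: sasl
    name: kafka-credentials
    key: sasl
  - parameter: username
    name: kafka-credentials
    key: username
  - parameter: password
    name: kafka-credentials
    key: password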

Happy learning :-)
