DoiT Cloud Intelligence™

Kubernetes Fine-Grained Horizontal Pod Autoscaling with Container Resource Metrics

By Chimbu Chinnadurai · May 2, 2024 · 4 min read

The Kubernetes Horizontal Pod Autoscaler (HPA) has revolutionized how we manage workloads by automatically scaling the pods of Deployments and StatefulSets up or down to match demand, based on average CPU utilization, average memory utilization, or any other custom metric you specify.

Current Implementation

When the HPA calculates the resource usage of a pod, it sums the usage of every container within the pod. This method may not be appropriate for workloads whose containers' resource usage is not closely related or does not change at the same rate.

For instance, a sidecar container handling logs might not consume significant resources, while the main application container handles most of the workload. The HPA wouldn't scale based on the critical container's usage because the pod-level average might not reflect the true picture.

HPA scaling based on the average resource utilization of all pods in a deployment

New implementation

Introduced as alpha in Kubernetes v1.20 and now graduated to stable in v1.30, the container resource metrics feature allows the HPA to target an individual container's metrics within a pod. You can configure the HPA to scale based on the resource utilization (CPU, memory, etc.) of a specific container in the pod.

This feature helps to allocate resources efficiently and avoid unnecessary scaling due to high pod utilization triggered by non-critical containers. By monitoring the resource consumption of the container responsible for the core functionality, you can focus on the real workload. This leads to better decision-making regarding scaling and prevents performance bottlenecks.

HPA scaling based on the average resource utilization of the target container across all pods in a deployment
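To make the difference concrete, here is a minimal Python sketch of the two calculations. The numbers are hypothetical (not taken from a real cluster) but mirror the sidecar scenario above: an over-provisioned log sidecar dilutes the pod-level average while the main container is running hot.

```python
# Hypothetical per-container CPU figures (millicores) for one pod.
usage = {"cpu-stressor": 90, "log-generator": 20}       # actual usage
requests = {"cpu-stressor": 100, "log-generator": 300}  # resource requests

# Pod-level view: total usage divided by total requests across containers.
pod_utilization = 100 * sum(usage.values()) / sum(requests.values())

# Container-level view: only the target container is considered.
container_utilization = 100 * usage["cpu-stressor"] / requests["cpu-stressor"]

print(f"pod-level utilization:       {pod_utilization:.1f}%")   # 27.5%
print(f"container-level utilization: {container_utilization:.1f}%")  # 90.0%
```

With a 50% utilization target, the pod-level figure (27.5%) would never trigger a scale-up even though the cpu-stressor container is at 90%; targeting the container directly avoids that dilution.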

In this blog, I will show you how to use the container resource metrics feature to scale your deployments in a multi-container pod setup.

Prerequisites

  • A Kubernetes cluster with version 1.27 or above.
  • Metrics Server is deployed on the Kubernetes cluster.
  • Kubectl is installed on your workstation.

Container resource metrics scaling in action

  • Deploy a sample multi-container Deployment with the manifest below. cpu-stressor is the main container, designed to simulate CPU stress on Kubernetes pods; refer to the GitHub repo for more details about the cpu-stressor tool. log-generator is a sample secondary container in the same pod.

cat <<EOF | kubectl apply -f -
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crm-scaling-demo
  labels:
    app: crm-scaling-demo
spec:
  selector:
    matchLabels:
      app: crm-scaling-demo
  template:
    metadata:
      labels:
        app: crm-scaling-demo
    spec:
      containers:
        - name: cpu-stressor
          image: narmidm/k8s-pod-cpu-stressor:1.0.0
          args:
            - "-cpu=0.5"
            - "-duration=3600s"
          resources:
            limits:
              cpu: "200m"
            requests:
              cpu: "100m"
        - name: log-generator
          image: busybox:1.28
          args:
            - /bin/sh
            - -c
            - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done'
          resources:
            requests:
              cpu: "100m"
EOF

Sample pod and resource usage across all containers in the pod

  • Create a HorizontalPodAutoscaler resource that scales based on the cpu-stressor container's CPU usage instead of pod-level metrics.
cat <<EOF | kubectl apply -f -
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: crm-scaling-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: crm-scaling-demo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: ContainerResource #new-metrics-source
    containerResource:
      name: cpu
      container: cpu-stressor #container-name
      target:
        type: Utilization
        averageUtilization: 50
EOF

Sample HPA configuration based on the cpu-stressor container metrics

In the above example, the HPA controller scales the target so that the average CPU utilization of the cpu-stressor container across all pods is 50%.
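Under the hood, the HPA applies the same replica formula as for any resource metric, just with the ratio computed from the target container's figures. A simplified Python sketch (ignoring the tolerance band and stabilization window):

```python
from math import ceil

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float) -> int:
    """Simplified HPA formula:
    desired = ceil(currentReplicas * currentMetric / targetMetric)."""
    return ceil(current_replicas * current_utilization / target_utilization)

# If 2 replicas run the cpu-stressor container at an average of 150%
# utilization against a 50% target, the HPA scales out to 6 replicas.
print(desired_replicas(2, 150, 50))  # 6
```

Because only the cpu-stressor container feeds `current_utilization`, the idle log-generator sidecar no longer influences the replica count.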

  • Wait for the cpu-stressor container to simulate CPU stress; you can then see the HPA recalculate the number of pods based on the CPU utilization of the cpu-stressor container.

Sample HPA scaling based on the cpu-stressor container metrics

HPA scaling based on container resource metrics demo

The screenshot and demo video demonstrate successful HPA scaling based on the cpu-stressor container in a multi-container pod setup 🚀.

With container resource metrics graduating to stable in Kubernetes v1.30, you can now achieve a new level of precision in your horizontal pod autoscaling, ensuring optimal application performance.

I hope this blog post has been helpful. For more information, please refer to the following resources: