If we don’t tell Kubernetes how much CPU and memory our Pods need, it’s flying blind. Pods could hog all the resources on a node, starve other workloads, or get killed unexpectedly when the node runs low on memory. Resource management is how we keep things predictable.
Requests vs Limits
Every container can have two resource settings:
- Requests — the minimum guaranteed resources. The scheduler uses this to decide which node has enough room.
- Limits — the maximum a container can use. It’s the ceiling.
spec:
  containers:
  - name: app
    image: my-app:latest
    resources:
      requests:
        cpu: "250m"      # 250 millicores = 0.25 CPU
        memory: "128Mi"  # 128 mebibytes
      limits:
        cpu: "500m"      # can burst up to 0.5 CPU
        memory: "256Mi"  # hard ceiling
A quick note on units: 1 CPU = 1 vCPU/core. 250m = 0.25 cores. Memory uses Mi (mebibytes) or Gi (gibibytes).
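As a small illustration of those units, the fragment below writes the same kinds of quantities in different notations. The values are arbitrary and chosen only for comparison. One thing that is easy to miss is the difference between the binary suffix Mi and the decimal suffix M.

```yaml
# Illustrative values only; each comment gives an equivalent spelling.
resources:
  requests:
    cpu: "500m"      # same as cpu: "0.5"
    memory: "1Gi"    # same as memory: "1024Mi"
  limits:
    cpu: "2"         # two full cores, same as "2000m"
    memory: "2Gi"    # binary suffix: 2 * 1024^3 bytes
# Decimal suffixes also exist: "128M" is 128,000,000 bytes,
# while "128Mi" is 134,217,728 bytes.
```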
What Happens When Limits Are Exceeded
This is a common interview question, and the answer is different for CPU vs memory:
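- CPU limit exceeded: the container gets throttled. It won’t crash, but it’ll run slower. The kernel enforces the limit as a CPU-time quota per scheduling period, so once the quota is used up the container waits for the next period.
- Memory limit exceeded: the container gets OOMKilled (Out Of Memory Killed). The Linux kernel’s OOM killer terminates the process, and the kubelet then restarts the container according to the Pod’s restartPolicy. This is harsh but necessary to protect the node.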
QoS Classes
Kubernetes assigns a Quality of Service class to every Pod based on its resource settings. When a node runs out of memory, K8s uses QoS to decide which Pods to evict first.
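- Guaranteed: every container sets both CPU and memory requests and limits, and each request equals its limit. Last to be evicted. Use this for critical workloads.
- Burstable: at least one container sets a CPU or memory request or limit, but the Pod doesn’t meet the Guaranteed criteria (for example, requests are lower than limits, or limits aren’t set). Evicted after BestEffort.
- BestEffort: no requests or limits set on any container. First to be evicted. Avoid this in production.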
# Guaranteed: requests == limits
resources:
  requests:
    cpu: "500m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"

# Burstable: requests < limits
resources:
  requests:
    cpu: "250m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"
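For completeness, here is a sketch of the two remaining cases: a Pod whose containers have no resources block ends up BestEffort, and setting only requests still yields Burstable. The container names are placeholders, and the snippet assumes no LimitRange in the namespace injects defaults (which would change the resulting class).

```yaml
# BestEffort: no requests or limits on any container in the Pod
containers:
- name: scratch-job        # hypothetical name
  image: my-app:latest

---
# Burstable: requests set, limits omitted
containers:
- name: worker             # hypothetical name
  image: my-app:latest
  resources:
    requests:
      cpu: "250m"
      memory: "128Mi"
```

The class Kubernetes actually assigned is recorded in the Pod’s status.qosClass field, which we can read with kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'.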
LimitRanges and ResourceQuotas
Cluster admins use these to enforce guardrails.
LimitRange — sets default and max/min resources per container in a namespace. If a developer forgets to set requests, the LimitRange fills in defaults.
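A minimal LimitRange might look like the sketch below. The object name, the dev namespace, and all of the numbers are assumptions picked for illustration, not recommended values.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-defaults        # hypothetical name
  namespace: dev            # assumed namespace
spec:
  limits:
  - type: Container
    defaultRequest:         # applied when a container sets no requests
      cpu: "250m"
      memory: "128Mi"
    default:                # applied when a container sets no limits
      cpu: "500m"
      memory: "256Mi"
    min:                    # lower bound a container's requests must meet
      cpu: "100m"
      memory: "64Mi"
    max:                    # upper bound a container's limits must not exceed
      cpu: "2"
      memory: "1Gi"
```

Defaults are filled in when a Pod is created, so Pods that were already running before the LimitRange existed are not changed.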
ResourceQuota — sets total resource caps per namespace. For example, “the dev namespace can’t use more than 20 CPU cores and 64Gi memory total.”
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "64Gi"
    limits.cpu: "40"
    limits.memory: "128Gi"
    pods: "50"  # max 50 Pods in this namespace
Horizontal Pod Autoscaler (HPA)
HPA automatically scales the number of Pod replicas based on metrics like CPU or memory usage. This is the most common autoscaling approach.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # target: average CPU at 70% of the Pods' CPU requests
# Quick way to create an HPA
kubectl autoscale deployment my-app --min=2 --max=10 --cpu-percent=70
HPA checks metrics every 15 seconds by default. It scales up quickly but scales down slowly (5-minute stabilization window) to avoid flapping.
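The scale-down delay can also be written out explicitly through the behavior field of the autoscaling/v2 API. The fragment below is a sketch that spells out the 5-minute window mentioned above; the scale-down policy values are illustrative assumptions, not recommendations.

```yaml
# Fragment to place under the HPA's spec, alongside metrics
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # the 5-minute default, written out
    policies:
    - type: Percent
      value: 50                       # illustrative: remove at most 50% of replicas
      periodSeconds: 60               # within any 60-second period
  scaleUp:
    stabilizationWindowSeconds: 0     # react to load increases right away
```

For a sense of the numbers involved, the controller computes desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). With 4 replicas averaging 90% CPU against a 70% target, that gives ceil(4 × 90 / 70) = ceil(5.14) = 6 replicas.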
Vertical Pod Autoscaler (VPA)
Instead of adding more Pods, VPA adjusts the CPU and memory requests of existing Pods. Useful when we don’t know the right resource values upfront — VPA watches actual usage and recommends (or automatically applies) better values.
The catch: in its classic update modes, VPA has to evict and recreate Pods to apply new resource values, so it’s often used in “recommend-only” mode, where it suggests values and we apply them ourselves.
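A recommend-only setup could be sketched as follows. This assumes the VPA add-on and its custom resource definitions are installed in the cluster (VPA is not part of core Kubernetes), and the object name is a placeholder; the target reuses the my-app Deployment from the HPA example.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"         # compute recommendations only, never evict Pods
```

With updateMode set to "Off", the recommendations show up in the object’s status, and kubectl describe vpa my-app-vpa prints them so we can copy the values into the Deployment ourselves.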
Cluster Autoscaler
Operates at the infrastructure level. When Pods can’t be scheduled because there aren’t enough nodes, the Cluster Autoscaler adds more nodes to the cluster. When nodes are underutilized, it removes them.
To recap, each autoscaler works at a different layer:
- HPA scales Pods (horizontal)
- VPA right-sizes Pods (vertical)
- Cluster Autoscaler scales nodes (infrastructure)
They work together: HPA creates more Pods → Pods become unschedulable → Cluster Autoscaler adds nodes.
In short, requests tell the scheduler what we need, limits protect the node from greedy containers, and autoscalers keep everything right-sized based on actual load. Always set requests and limits in production; a Pod without them is a ticking time bomb.