Understanding Kubernetes Resource Management
Kubernetes uses resource requests and limits to manage CPU and memory allocation for containers. Proper resource management ensures applications run reliably while maximizing cluster utilization.
Requests vs Limits
Resource Requests
The amount of resources guaranteed to the container. Kubernetes uses requests for:
- Scheduling: The scheduler only places a pod on a node whose unreserved capacity covers the pod's requests
- QoS Class: Determines pod priority during node-pressure eviction
- Resource reservation: The sum of requests on a node cannot exceed its allocatable capacity
Resource Limits
The maximum resources a container can use. When limits are reached:
- CPU: Container is throttled (slowed down)
- Memory: Container is killed (OOMKilled) and restarted
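As a minimal sketch, requests and limits are set per container in the pod spec (the pod name and image here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app          # illustrative name
spec:
  containers:
  - name: app
    image: nginx:1.25    # example image
    resources:
      requests:
        cpu: 250m        # guaranteed; the scheduler reserves this on the node
        memory: 256Mi
      limits:
        cpu: 500m        # throttled above this
        memory: 512Mi    # OOMKilled above this
```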
CPU Resources
CPU Units
- 1 CPU: One full core (one vCPU or hyperthread on cloud instances)
- 1000m (millicores): Same as 1 CPU
- 100m: 0.1 CPU (10% of one core)
- Fractional cores: Can specify 0.5, 0.25, etc.
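The unit conversions above are mechanical; a simplified sketch of parsing a CPU quantity string into cores (the real Kubernetes quantity parser accepts more suffixes):

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity string to cores.

    Handles plain core counts ("1", "0.5") and millicores ("100m").
    Simplified sketch -- not the full Kubernetes quantity grammar.
    """
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores -> cores
    return float(quantity)
```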
CPU Behavior
When limit is reached: Container is CPU throttled. The kernel's CFS quota caps the container's CPU time each scheduling period, so the application slows down but does not crash.
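The throttling arithmetic can be sketched directly: with the default 100 ms CFS period, a limit of N cores grants N × 100 ms of CPU time per period (a simplified view; the kernel also distributes this quota across threads):

```python
def cfs_quota_us(cpu_limit_cores: float, period_us: int = 100_000) -> int:
    """CPU time (microseconds) a container may consume per CFS period.

    E.g. a 500m limit yields 50ms of CPU time per 100ms period;
    once spent, the container's threads are throttled until the
    next period begins.
    """
    return int(cpu_limit_cores * period_us)
```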
Best practice: Set CPU requests based on average usage, limits based on peak usage. Allows bursting when needed.
Memory Resources
Memory Units
- Ki, Mi, Gi: Binary units (1024-based)
- k, M, G: Decimal units (1000-based); note the lowercase k
- Common: Use Mi (Mebibytes) or Gi (Gibibytes)
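The binary/decimal distinction matters: 1Gi is roughly 7% more memory than 1G. A simplified sketch of converting the common suffixes to bytes:

```python
def parse_memory(quantity: str) -> int:
    """Convert a Kubernetes memory quantity string to bytes.

    Simplified sketch covering the common suffixes only.
    """
    binary = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
    decimal = {"k": 1000, "M": 1000**2, "G": 1000**3}
    for suffix, factor in {**binary, **decimal}.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain bytes
```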
Memory Behavior
When limit is reached: Container is killed with OOMKilled status. Kubernetes restarts it per restart policy.
Best practice: Set memory limits close to requests to avoid OOM kills. Memory is not compressible like CPU.
Quality of Service (QoS) Classes
Guaranteed
Highest priority, last to be evicted:
- Requests = Limits for all containers
- Both CPU and memory must be specified
Burstable
Medium priority, evicted before Guaranteed:
- At least one container sets a request or limit, but the pod does not meet the Guaranteed criteria (typically requests < limits, or only requests specified)
- Can use more than requested when available
BestEffort
Lowest priority, first to be evicted:
- No requests or limits specified
- Can use any available resources
- Most likely to be killed under pressure
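As a sketch, a pod whose every container sets requests equal to limits for both CPU and memory is classed Guaranteed; the assigned class is visible in `status.qosClass`, e.g. via `kubectl get pod <name> -o jsonpath='{.status.qosClass}'` (pod name and image below are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod   # illustrative name
spec:
  containers:
  - name: app
    image: nginx:1.25    # example image
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 500m        # equal to request
        memory: 512Mi    # equal to request -> Guaranteed QoS
```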
Request/Limit Strategies
Conservative (70% request)
- Request: 70% of limit
- Limit: Peak usage
- Use case: Variable workloads, cost optimization
- Pros: Better node utilization, lower costs
- Cons: Risk of resource contention
Balanced (85% request)
- Request: 85% of limit
- Limit: Peak usage + 15% buffer
- Use case: Production workloads
- Pros: Good balance of utilization and reliability
Guaranteed (100% request)
- Request: Same as limit
- Limit: Expected maximum
- Use case: Critical workloads, databases
- Pros: Maximum reliability, highest QoS
- Cons: Lower node utilization, higher costs
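The three strategies above can be sketched as a sizing helper that derives request/limit pairs from observed peak usage (the function name and the exact ratios are illustrative, following the percentages listed above):

```python
def size_resources(peak_mcpu: int, strategy: str = "balanced") -> tuple[int, int]:
    """Return (request, limit) in millicores for a given strategy.

    Ratios follow the conservative/balanced/guaranteed split described
    above; treat them as starting points, not tuned values.
    """
    if strategy == "conservative":
        limit = peak_mcpu                # limit at peak usage
        request = int(limit * 0.70)      # request 70% of limit
    elif strategy == "balanced":
        limit = int(peak_mcpu * 1.15)    # peak + 15% buffer
        request = int(limit * 0.85)      # request 85% of limit
    elif strategy == "guaranteed":
        limit = peak_mcpu
        request = limit                  # request == limit -> Guaranteed QoS
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return request, limit
```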
Node Sizing Considerations
System Overhead
Reserve resources for Kubernetes system components:
- kubelet: ~100m CPU, ~200Mi memory
- OS: ~50m CPU, ~300Mi memory
- Eviction threshold: ~100Mi memory
- Total: ~10-20% of node resources
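The arithmetic is simple subtraction: capacity minus reserved overhead gives the capacity actually schedulable for pods. A sketch using the rough overhead figures listed above (these are estimates, not exact reservations):

```python
def allocatable(node_mcpu: int, node_mem_mi: int) -> tuple[int, int]:
    """Subtract estimated system overhead from node capacity.

    Overhead values (kubelet, OS, eviction threshold) are the rough
    figures from the list above; real clusters set these explicitly
    via kubelet reservation flags.
    """
    cpu_overhead_m = 100 + 50           # kubelet + OS
    mem_overhead_mi = 200 + 300 + 100   # kubelet + OS + eviction threshold
    return node_mcpu - cpu_overhead_m, node_mem_mi - mem_overhead_mi
```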
Pods Per Node
On Amazon EKS with the AWS VPC CNI, the default maximum pods per node is determined by the instance type's ENI and IP address limits:
- t3.small: 11 pods
- t3.medium: 17 pods
- t3.large: 35 pods
- m5.large: 29 pods
- m5.xlarge: 58 pods
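The formula behind these numbers is max pods = ENIs × (IPv4 addresses per ENI − 1) + 2. The per-instance ENI and IP limits used below are the published AWS figures for these types; verify them against current AWS documentation before capacity planning:

```python
def max_pods(enis: int, ips_per_eni: int) -> int:
    """AWS VPC CNI default pod capacity for an instance type.

    One IP per pod; one IP on each ENI is reserved for the ENI itself;
    +2 accounts for host-network pods (e.g. aws-node, kube-proxy)
    that do not consume ENI IPs.
    """
    return enis * (ips_per_eni - 1) + 2
```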
High Availability
For HA deployments:
- Run at least 2-3 replicas
- Size nodes so 2+ pods fit per node
- Enable pod anti-affinity to spread replicas
- Plan for one node failure
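Pulling the points above together, a sketch of an HA Deployment with soft pod anti-affinity (the name, labels, and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app            # illustrative name
spec:
  replicas: 3              # at least 2-3 replicas for HA
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        podAntiAffinity:
          # Prefer spreading replicas across nodes so a single node
          # failure cannot take down all of them at once.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: web-app
              topologyKey: kubernetes.io/hostname
      containers:
      - name: app
        image: nginx:1.25  # example image
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
```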