ReplicaSets and Deployments: Why You Almost Never Create a ReplicaSet Directly
A ReplicaSet keeps N pod replicas running. A Deployment manages ReplicaSets and adds rolling updates, rollback history, and update strategies on top. You almost always create Deployments, not ReplicaSets — but understanding the relationship between them explains what happens during every deployment.
What a ReplicaSet does
A ReplicaSet's sole responsibility is maintaining a desired number of pod replicas. It watches for pods matching its label selector. If a pod is deleted, it creates a replacement. If there are too many pods matching its selector, it deletes the excess. It doesn't know how to update pods — that's not its job.
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
      version: v1.2.3
  template:
    metadata:
      labels:
        app: api
        version: v1.2.3
    spec:
      containers:
      - name: api
        image: myapp:1.2.3
Label selector ownership: how ReplicaSets adopt pods
Gotcha (Kubernetes): A ReplicaSet owns pods by label selector, not by explicit reference. Any pod matching the selector is counted toward the desired replicas. This has a dangerous implication: a manually created pod with matching labels will be adopted by the ReplicaSet, and if that pushes the count over the desired number, the ReplicaSet will delete one of the pods, possibly the manually created one.
Prerequisites
- Kubernetes labels and selectors
- pod lifecycle
Key Points
- Pods are owned by label selector match, not by a parent reference in the pod spec.
- Adding a pod with matching labels raises the matching-pod count above the desired number — the ReplicaSet may delete pods to compensate.
- Removing a label from a pod 'releases' it from the ReplicaSet — the RS creates a new pod to maintain count.
- The selector is immutable once the ReplicaSet is created — changing it requires deleting and recreating.
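To make the adoption hazard concrete, here is a hypothetical standalone pod whose labels match the ReplicaSet example above (the name is illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: api-manual        # hypothetical pod, not managed by any controller
  labels:
    app: api              # matches the ReplicaSet's selector
    version: v1.2.3
spec:
  containers:
  - name: api
    image: myapp:1.2.3

Once created, this pod counts toward the ReplicaSet's 3 desired replicas: the ReplicaSet now sees 4 matching pods and deletes one to get back to 3.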
Why Deployments exist: the update problem
If you need to update a ReplicaSet (new image version), you have to delete the old ReplicaSet and create a new one. During the transition, you either have 0 pods (downtime) or you manage the overlap manually. ReplicaSets have no concept of gradual rollout.
Deployments solve this by owning multiple ReplicaSets and managing transitions between them:
Deployment: api
├── ReplicaSet: api-8a3c2e1f7 (image: myapp:1.2.2) → 0 replicas (after rollout)
└── ReplicaSet: api-7d9f4b6c5 (image: myapp:1.2.3) → 3 replicas (current)
During a rolling update:
- Deployment creates a new ReplicaSet with the updated pod template
- New RS scales up by maxSurge pods
- Old RS scales down by maxUnavailable pods
- Repeat until the new RS has all replicas and the old RS has 0
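The rollout pacing described above is configured on the Deployment itself. A minimal sketch, assuming the same app label and a 3-replica service (name and image are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # at most 1 pod above desired count during rollout
      maxUnavailable: 1   # at most 1 pod below desired count during rollout
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: myapp:1.2.3

Note that the selector omits the version label: the Deployment distinguishes versions via pod-template-hash itself, and pinning a version in the immutable selector would block future rollouts.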
The old ReplicaSet is kept (with 0 replicas) as rollback history. kubectl rollout undo deployment/api scales the old RS back up and scales the new RS down.
# View rollout status
kubectl rollout status deployment/api
# View rollout history
kubectl rollout history deployment/api
# Rollback to previous version
kubectl rollout undo deployment/api
# Rollback to a specific revision
kubectl rollout undo deployment/api --to-revision=2
# View what changed in a specific revision
kubectl rollout history deployment/api --revision=3
How the ReplicaSet selector connects to Deployment
Every Deployment generates ReplicaSet names by hashing the pod template. The hash appears in both the ReplicaSet name and the pod names:
$ kubectl get replicasets
NAME            DESIRED   CURRENT   READY   AGE
api-7d9f4b6c5   3         3         3       2d
api-8a3c2e1f7   0         0         0       5d

$ kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
api-7d9f4b6c5-x2q8p   1/1     Running   0          2d
api-7d9f4b6c5-m9kl7   1/1     Running   0          2d
api-7d9f4b6c5-p4nr2   1/1     Running   0          2d
The Deployment selects ReplicaSets using pod-template-hash. Each ReplicaSet selects pods using both the user-defined labels and the pod-template-hash:
# ReplicaSet's selector (managed by the Deployment, not user-editable)
selector:
  matchLabels:
    app: api
    pod-template-hash: "7d9f4b6c5"  # added automatically by the Deployment
This hash isolation prevents the "dangerous pod adoption" problem — ReplicaSets under a Deployment can't accidentally adopt pods from other ReplicaSet versions.
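On the pod side, the same hash appears as a label on every pod the ReplicaSet creates (the hash and pod name here are taken from the example output above):

# Labels on a pod created by ReplicaSet api-7d9f4b6c5
metadata:
  name: api-7d9f4b6c5-x2q8p
  labels:
    app: api
    pod-template-hash: "7d9f4b6c5"

A pod created by a different ReplicaSet carries a different hash, so no other ReplicaSet's selector can match it.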
💡 Revision history: tuning rollback depth
By default, Deployments keep 10 revision history entries (10 old ReplicaSets at 0 replicas). Adjust with revisionHistoryLimit:
spec:
  revisionHistoryLimit: 3  # keep the last 3 ReplicaSets for rollback
Set to 0 to disable rollback history entirely (old RSes deleted immediately after rollout). Use a low value (2-5) in environments with many frequent deployments to avoid cluttering the namespace with empty ReplicaSets.
# Clean up old ReplicaSets manually (if revisionHistoryLimit was 0 or cleanup failed)
kubectl get rs -l app=api --no-headers | awk '$2 == 0 {print $1}' | xargs kubectl delete rs
The revision limit doesn't affect the ReplicaSet currently serving traffic, only how many empty ReplicaSets are retained for rollback.
When you'd create a ReplicaSet directly
Direct ReplicaSet creation is rare. Two situations where it makes sense:
- Custom controllers: building an operator that manages pod groups with custom update logic. You create ReplicaSets directly and manage transitions yourself.
- Testing scenarios: creating a fixed set of pods that must never be updated in place (no rolling update). A bare ReplicaSet never performs a rolling update; changing its pod template affects only pods created afterward, so existing pods keep running unchanged.
For any production workload where you want updates, rollback, and change tracking, use a Deployment.
You have a Deployment with 3 replicas. During a rolling update (maxUnavailable=1, maxSurge=1), you run `kubectl get pods` and see 4 pods — 3 running the old image and 1 running the new image. All 4 show READY. What is the Deployment doing at this moment?
Difficulty: easy

Answer: The rolling update just started. The new ReplicaSet was just scaled up to 1 pod, the old ReplicaSet still has 3 pods, and the new pod passed its readiness probe.

A. The Deployment is waiting for manual approval before scaling down the old ReplicaSet
Incorrect. Standard rolling updates don't require manual approval; that's a blue/green deployment pattern with an external deployment controller.

B. The Deployment has scaled up one new pod (maxSurge=1) and is now about to scale down one old pod (maxUnavailable=1) — it's at the peak of temporary overcapacity before removing an old pod
Correct! Rolling update sequence: (1) scale the new RS to 1 (now 3 old + 1 new = 4 total, within maxSurge), (2) wait for the new pod to be Ready, (3) scale the old RS down by 1 (now 2 old + 1 new = 3 total, still above the minimum of 2 available), (4) scale the new RS to 2, and so on. The 4-pod state is the momentary overcapacity before the first old pod is terminated. With maxSurge=1 and maxUnavailable=1, the maximum total is 4 and the minimum available is 2 during the rollout.

C. The 4th pod is a spare created by the HPA in response to traffic
Incorrect. HPAs scale based on metrics, not deployment events; HPA scaling is independent of rolling updates.

D. The rolling update failed and a manual recovery pod was created
Incorrect. All pods are Running and READY, indicating no failure.

Hint: maxSurge=1 means one extra pod can exist temporarily. What happens after the extra pod is Ready?