ReplicaSets and Deployments: Why You Almost Never Create a ReplicaSet Directly

2 min read · Cloud Infrastructure

A ReplicaSet keeps N pod replicas running. A Deployment manages ReplicaSets and adds rolling updates, rollback history, and update strategies on top. You almost always create Deployments, not ReplicaSets — but understanding the relationship between them explains what happens during every deployment.

Tags: aws, eks, kubernetes

What a ReplicaSet does

A ReplicaSet's sole responsibility is maintaining a desired number of pod replicas. It watches for pods matching its label selector. If a pod is deleted, it creates a replacement. If there are too many pods matching its selector, it deletes the excess. It doesn't know how to update pods — that's not its job.

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
      version: v1.2.3
  template:
    metadata:
      labels:
        app: api
        version: v1.2.3
    spec:
      containers:
      - name: api
        image: myapp:1.2.3

Label selector ownership: how ReplicaSets adopt pods

⚠️ Gotcha

A ReplicaSet owns pods by label selector, not by explicit reference. Any pod matching the selector is counted toward the desired replicas. This has a dangerous implication: a manually created pod with matching labels will be adopted by the ReplicaSet, and if that puts the count over the desired count, the ReplicaSet will delete one of the pods — possibly the manually created one.
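As a sketch of the adoption trap (the pod name here is hypothetical; the labels match the ReplicaSet above), this bare pod is adopted the moment it is created:

```yaml
# A standalone pod whose labels match the ReplicaSet's selector.
# On creation, the ReplicaSet controller counts it toward replicas: 3 --
# it now sees 4 matching pods and deletes one to converge, possibly this one.
apiVersion: v1
kind: Pod
metadata:
  name: api-debug          # hypothetical name; not created by the ReplicaSet
  labels:
    app: api
    version: v1.2.3        # matches the selector, so the RS adopts this pod
spec:
  containers:
  - name: api
    image: myapp:1.2.3
```

If you need a one-off debug pod alongside a ReplicaSet, give it labels that don't match the selector.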

Prerequisites

  • Kubernetes labels and selectors
  • pod lifecycle

Key Points

  • Pods are owned by label selector match, not by a parent reference in the pod spec.
  • Adding a pod with matching labels pushes the observed count above the desired count — the ReplicaSet may delete pods to compensate, and not necessarily the one you added.
  • Removing a label from a pod 'releases' it from the ReplicaSet — the RS creates a new pod to maintain count.
  • The selector is immutable once the ReplicaSet is created — changing it requires deleting and recreating.

Why Deployments exist: the update problem

A ReplicaSet has no update mechanism. Changing its pod template only affects pods created afterward; existing pods keep running the old spec. To roll out a new image, you have to delete the old ReplicaSet and create a new one, and during the transition you either have 0 pods (downtime) or you manage the overlap manually. ReplicaSets have no concept of gradual rollout.

Deployments solve this by owning multiple ReplicaSets and managing transitions between them:

Deployment: api
├── ReplicaSet: api-8a3c2e1f7 (image: myapp:1.2.2) → 0 replicas (after rollout)
└── ReplicaSet: api-7d9f4b6c5 (image: myapp:1.2.3) → 3 replicas (current)

During a rolling update:

  1. Deployment creates a new ReplicaSet with the updated pod template
  2. New RS scales up by maxSurge pods
  3. Old RS scales down by maxUnavailable pods
  4. Repeat until new RS has all replicas and old RS has 0
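The knobs in steps 2 and 3 live on the Deployment's update strategy. A minimal sketch (the image tag and values are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # at most 1 pod above replicas during the rollout
      maxUnavailable: 1     # at most 1 pod below replicas may be unavailable
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: myapp:1.2.3
```

Both fields also accept percentages (e.g. `maxSurge: 25%`), which is the default when the strategy block is omitted.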

The old ReplicaSet is kept (with 0 replicas) as rollback history. `kubectl rollout undo deployment/api` scales the old RS back up and scales the new RS down.

# View rollout status
kubectl rollout status deployment/api

# View rollout history
kubectl rollout history deployment/api

# Rollback to previous version
kubectl rollout undo deployment/api

# Rollback to a specific revision
kubectl rollout undo deployment/api --to-revision=2

# View what changed in a specific revision
kubectl rollout history deployment/api --revision=3

How the ReplicaSet selector connects to Deployment

Every Deployment generates ReplicaSet names by hashing the pod template. The hash appears in both the ReplicaSet name and the pod names:

$ kubectl get replicasets
NAME               DESIRED   CURRENT   READY   AGE
api-7d9f4b6c5      3         3         3       2d
api-8a3c2e1f7      0         0         0       5d

$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
api-7d9f4b6c5-x2q8p      1/1     Running   0          2d
api-7d9f4b6c5-m9kl7      1/1     Running   0          2d
api-7d9f4b6c5-p4nr2      1/1     Running   0          2d

The Deployment selects ReplicaSets using pod-template-hash. Each ReplicaSet selects pods using both the user-defined labels and the pod-template-hash:

# ReplicaSet's selector (managed by Deployment, not user-editable)
selector:
  matchLabels:
    app: api
    pod-template-hash: "7d9f4b6c5"  # added automatically by Deployment

This hash isolation prevents the "dangerous pod adoption" problem — ReplicaSets under a Deployment can't accidentally adopt pods from other ReplicaSet versions.

💡 Revision history: tuning rollback depth

By default, Deployments keep 10 revision history entries (10 old ReplicaSets at 0 replicas). Adjust with revisionHistoryLimit:

spec:
  revisionHistoryLimit: 3   # keep last 3 ReplicaSets for rollback

Set to 0 to disable rollback history entirely (old RSes deleted immediately after rollout). Use a low value (2-5) in environments with many frequent deployments to avoid cluttering the namespace with empty ReplicaSets.

# Clean up old ReplicaSets manually (if revisionHistoryLimit was 0 or cleanup failed)
kubectl get rs -l app=api --no-headers | awk '$2 == 0 {print $1}' | xargs kubectl delete rs

The revision limit never touches the active ReplicaSet; it only bounds how many empty, superseded ReplicaSets are kept for rollback.
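The zero-replica filter can be sanity-checked offline, without a cluster, by running awk's DESIRED-column test against sample `kubectl get rs --no-headers` output (names and ages here are hypothetical):

```shell
# Sample `kubectl get rs --no-headers` output (hypothetical names and ages)
sample='api-7d9f4b6c5   3   3   3   2d
api-8a3c2e1f7   0   0   0   5d
api-1b2c3d4e5   0   0   0   9d'

# Select names where the DESIRED column (field 2) is exactly 0.
# Safer than grep " 0 ", which can also match READY counts or ages.
printf '%s\n' "$sample" | awk '$2 == 0 {print $1}'
# prints:
# api-8a3c2e1f7
# api-1b2c3d4e5
```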

When you'd create a ReplicaSet directly

Direct ReplicaSet creation is rare. Two situations where it makes sense:

  1. Custom controllers: building an operator that manages pod groups with custom update logic. You create ReplicaSets directly and manage transitions yourself.

  2. Testing scenarios: creating a fixed set of pods that must never be updated in place (no rolling update). A bare ReplicaSet never performs a rolling update; editing its pod template affects only pods created afterward, and existing pods keep running until deleted.

For any production workload where you want updates, rollback, and change tracking, use a Deployment.

You have a Deployment with 3 replicas. During a rolling update (maxUnavailable=1, maxSurge=1), you run `kubectl get pods` and see 4 pods — 3 running the old image and 1 running the new image. All 4 show READY. What is the Deployment doing at this moment?

Difficulty: easy

The rolling update just started. The new ReplicaSet was just scaled up to 1 pod. The old ReplicaSet still has 3 pods. The new pod passed its readiness probe.

  • A: The Deployment is waiting for manual approval before scaling down the old ReplicaSet
    Incorrect. Standard rolling updates don't require manual approval; that's a blue/green pattern driven by an external deployment controller.
  • B: The Deployment has scaled up one new pod (maxSurge=1) and is about to scale down one old pod (maxUnavailable=1); it's at the peak of temporary overcapacity before removing an old pod
    Correct! Rolling update sequence: (1) scale the new RS to 1 (now 3 old + 1 new = 4 total, within maxSurge), (2) wait for the new pod to be Ready, (3) scale the old RS down by 1 (now 2 old + 1 new = 3 total, back at the desired replica count), (4) scale the new RS to 2, and so on. The 4-pod state is the momentary overcapacity before the first old pod is terminated. With maxSurge=1 and maxUnavailable=1, the total never exceeds 4 and the available count never drops below 2 during the rollout.
  • C: The 4th pod is a spare created by the HPA in response to traffic
    Incorrect. HPAs scale based on metrics, not deployment events; HPA scaling is independent of rolling updates.
  • D: The rolling update failed and a manual recovery pod was created
    Incorrect. All pods are Running and READY, indicating no failure.

Hint: maxSurge=1 means one extra pod can exist temporarily. What happens after the extra pod is Ready?