Kubernetes StatefulSets: Stable Identity, Persistent Storage, and When to Use Them
StatefulSets give each pod a stable name, a stable DNS record, and a persistent volume that follows it across reschedules. That stability is what databases and clustered applications need — and what makes StatefulSets harder to operate than Deployments.
What makes pods "stateful"
A Deployment's pods are interchangeable — any pod can handle any request, and Kubernetes can kill and replace them in any order. When a pod is replaced, it starts fresh with no memory of the previous pod.
A StatefulSet's pods are not interchangeable:
- Each has a stable name: redis-0, redis-1, redis-2 (not random hashes).
- Each has a stable DNS record: redis-0.redis-svc.namespace.svc.cluster.local.
- Each has its own PersistentVolume bound to that pod's identity — if redis-0 is rescheduled to a different node, Kubernetes mounts the same PVC to the new pod.
This matters for clustered databases. A Redis Cluster node or a Kafka broker needs a stable address because other nodes store its address in their configuration. If the pod's address changes on reschedule, the cluster breaks.
StatefulSet pod identity and headless services
StatefulSets require a headless Service (clusterIP: None) to give each pod a stable DNS record. The headless Service does not load-balance — it returns an A record for each pod IP directly.
Prerequisites
- Kubernetes Services
- DNS in Kubernetes
- PersistentVolumes
Key Points
- Pods are named <statefulset-name>-<ordinal>: redis-0, redis-1, redis-2.
- Headless Service provides per-pod DNS: <pod-name>.<service-name>.<namespace>.svc.cluster.local.
- PVC templates create a dedicated PVC per pod. Deleting the StatefulSet does not delete PVCs — data is preserved.
- Pods start in order (0 → 1 → 2) and terminate in reverse (2 → 1 → 0) by default.
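These naming rules are deterministic, so a pod's DNS record can be computed ahead of time. A minimal sketch in shell, using the example names from this article (StatefulSet redis, Service redis-svc, namespace default):

```shell
# Build the stable per-pod DNS record from the documented pattern:
# <statefulset-name>-<ordinal>.<service-name>.<namespace>.svc.cluster.local
STATEFULSET=redis
SERVICE=redis-svc
NAMESPACE=default
ORDINAL=2

echo "${STATEFULSET}-${ORDINAL}.${SERVICE}.${NAMESPACE}.svc.cluster.local"
# prints: redis-2.redis-svc.default.svc.cluster.local
```

This is what lets cluster members write each other's addresses into configuration files before the peer pods even exist.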
A StatefulSet for Redis Sentinel
```yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-svc
spec:
  clusterIP: None # headless — returns pod IPs directly, no load balancing
  selector:
    app: redis
  ports:
    - port: 6379
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis-svc # must reference the headless service
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7.0
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates: # creates a PVC per pod
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3
        resources:
          requests:
            storage: 10Gi
```
This creates:

- redis-0 with PVC data-redis-0
- redis-1 with PVC data-redis-1
- redis-2 with PVC data-redis-2
DNS records: redis-0.redis-svc.default.svc.cluster.local, etc.
If redis-1 is evicted and rescheduled on another node, it remounts data-redis-1 and resumes from where it left off.
Ordered startup and why it matters
StatefulSets start pods sequentially: redis-0 must be Running and Ready before redis-1 starts. This is critical for clustered applications that require a specific initialization sequence.
For a database cluster (PostgreSQL with Patroni, MySQL with InnoDB Cluster), the first pod initializes as the primary. Subsequent pods join as replicas. If they all start simultaneously before the primary is ready, each tries to initialize as primary and the cluster is corrupted.
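Because pod names are stable and ordinal-suffixed, entrypoint scripts commonly derive the pod's role from its own hostname: ordinal 0 initializes as primary, everyone else joins as a replica. A minimal sketch (the pod name is hard-coded here for illustration; in a real entrypoint it would come from hostname):

```shell
# Inside the pod, the hostname is the stable pod name (redis-0, redis-1, ...).
# Hard-coded here for illustration; in a real entrypoint use: POD_NAME="$(hostname)"
POD_NAME=redis-0

ORDINAL="${POD_NAME##*-}"   # everything after the last hyphen

if [ "$ORDINAL" = "0" ]; then
  echo "initializing as primary"
else
  echo "joining as replica of pod 0"
fi
```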
```shell
# Watch ordered startup
kubectl get pods -w

NAME      READY   STATUS    RESTARTS
redis-0   0/1     Pending   0          ← starts first
redis-0   1/1     Running   0
redis-1   0/1     Pending   0          ← starts after redis-0 is Ready
redis-1   1/1     Running   0
redis-2   0/1     Pending   0
redis-2   1/1     Running   0
```
For workloads that do not need ordered startup (multiple independent Cassandra nodes), set podManagementPolicy: Parallel to start all pods simultaneously.
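Parallel pod management is a single field on the StatefulSet spec. A fragment against the redis example above (other fields unchanged):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis-svc
  podManagementPolicy: Parallel # launch and terminate all pods at once
  replicas: 3
  # ...selector, template, volumeClaimTemplates as before
```

Note that podManagementPolicy affects initial launch and scaling only; rolling updates still proceed one pod at a time.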
StatefulSet vs Deployment
The choice is determined by whether your application needs stable pod identity and persistent storage. Most applications do not need StatefulSets.

Deployment
- Pods are anonymous and interchangeable
- Any pod can be killed and replaced in any order
- Horizontal scaling is instant — add pods, remove pods
- Works with shared storage (ReadWriteMany) or no persistent storage
- Use for: web servers, API services, background workers, anything stateless

StatefulSet
- Each pod has a stable name, DNS record, and dedicated PVC
- Ordered startup and shutdown by default
- Scaling down does not delete PVCs — you must manually clean up storage
- Rolling updates are sequential and slower
- Use for: databases (Postgres, MySQL, Cassandra), message brokers (Kafka), caches (Redis Cluster)
Default to Deployments. Use StatefulSets only when an application explicitly requires stable pod identity, stable network addresses, or per-pod persistent storage. For most database workloads on AWS, a managed service (RDS, ElastiCache, MSK) is preferable to running a StatefulSet — you get similar guarantees with less operational burden.
⚠ StatefulSet scaling: the PVC cleanup problem
When you scale down a StatefulSet from 3 replicas to 2, Kubernetes terminates redis-2 but keeps its PVC (data-redis-2). This is intentional — data loss from accidental scale-down would be unacceptable.
The side effect: scale-down leaves orphaned PVCs and their underlying volumes (EBS volumes on AWS), which continue to accrue storage charges. You must manually delete PVCs after confirming the data is no longer needed.
```shell
# After scaling down the StatefulSet to 2 replicas:
kubectl get pvc

NAME           STATUS   VOLUME   CAPACITY ...
data-redis-0   Bound    pv-abc   10Gi        ← still in use
data-redis-1   Bound    pv-def   10Gi        ← still in use
data-redis-2   Bound    pv-ghi   10Gi        ← orphaned, costs money

kubectl delete pvc data-redis-2 # manual cleanup required
```
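Newer Kubernetes releases let you opt in to automatic cleanup via the StatefulSet's persistentVolumeClaimRetentionPolicy field (introduced behind the StatefulSetAutoDeletePVC feature gate and enabled by default in recent versions — check that your cluster supports it before relying on it):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Delete   # delete the PVC when its pod is scaled away
    whenDeleted: Retain  # keep PVCs if the whole StatefulSet is deleted
  # ...rest of spec as before
```

The default for both fields is Retain, which preserves the manual-cleanup behavior described above.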
Quick check (medium)

A StatefulSet for a 3-node database cluster is upgraded (new image). After the rolling update, replica pods cannot connect to the primary because the primary's hostname has changed. What is the most likely configuration issue?

Scenario detail: the StatefulSet previously had a different serviceName. It was recreated with a new serviceName to match a renamed headless Service.

Hint: Pod DNS records in a StatefulSet include the serviceName. Think about what happens to those records if the serviceName changes.

A. The StatefulSet needs podManagementPolicy: Parallel for database clusters
Incorrect. Parallel pod management starts pods simultaneously. This would actually make the problem worse for a database cluster that needs sequential initialization, and it does not affect DNS records.

B. Renaming the headless Service changed all pod DNS records, breaking cluster member references
Correct! Pod DNS records follow the pattern <pod>.<serviceName>.<namespace>.svc.cluster.local. If the serviceName changed, all pod DNS records changed. Replica nodes that stored the old primary address (pod-0.old-service.namespace.svc.cluster.local) can no longer resolve it. The fix: never rename a headless Service for a running StatefulSet without updating all stored cluster membership configurations. DNS records are the stable identity — changing them is equivalent to changing the cluster member addresses.

C. The StatefulSet must be deleted and recreated to update serviceName
Incorrect. serviceName is immutable on a StatefulSet — you cannot update it in place. But the issue described is what happens after the service was renamed, not the mechanics of the change.

D. Persistent volumes are tied to the old pod names and cannot be remounted
Incorrect. PVCs are named after pod names (data-redis-0, etc.), which derive from the StatefulSet name, not the service name. PVC binding is not affected by renaming the headless Service.