Running Stateful Workloads on ECS: Patterns and Hard Limits


ECS has no StatefulSet equivalent. This post covers the three patterns that work, when each breaks, and when you should stop fighting the platform.


The problem

ECS was designed for stateless workloads. A task dies, a replacement starts somewhere in the cluster, and nothing breaks. That assumption is built into the scheduler. When your application needs stable storage, stable identity, or ordered startup, ECS gives you no built-in mechanism — you have to construct one from lower-level primitives.

This is different from Kubernetes, where StatefulSets are a first-class scheduler concern: ordered pod creation, stable network identity (pod-0.service.namespace.svc.cluster.local), and per-pod PersistentVolumeClaims.

ECS has none of that. What it does have is enough to cover most real-world stateful cases — as long as you understand what you're actually building.

Why ECS doesn't have StatefulSets


ECS's task model is intentionally simpler than Kubernetes pods: tasks are ephemeral, placement is advisory, and network identity changes on restart. StatefulSet semantics require the scheduler to track individual instance identity, which adds significant complexity to the control plane.

Prerequisites

  • ECS task definitions
  • ECS services and clusters
  • Basic Kubernetes familiarity

Key Points

  • ECS tasks get a new IP on every restart. No stable hostname is assigned.
  • ECS has no per-task persistent volume claim. Storage must be attached externally.
  • Placement strategies influence where tasks land but do not guarantee it across restarts.
  • Fargate tasks have no access to host-level storage at all — EFS is the only persistent option.
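The first point is straightforward to observe: under awsvpc networking, each replacement task gets a fresh ENI with a new private IP. A sketch for checking this (cluster name and task ID are placeholders):

```shell
# Inspect the ENI details attached to a running task (awsvpc mode).
# Run this before and after a task replacement: the privateIPv4Address
# in the attachment details will differ. IDs below are placeholders.
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks 0f1e2d3c4b5a \
  --query 'tasks[0].attachments[0].details'
```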

Pattern 1: EFS for file-based persistence

Amazon Elastic File System (EFS) mounts into a task at a defined path. When the task dies and restarts — on any host in the cluster — the replacement mounts the same filesystem. From the application's perspective, the files are still there.

{
  "volumes": [
    {
      "name": "app-data",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-0abc1234",
        "rootDirectory": "/data",
        "transitEncryption": "ENABLED"
      }
    }
  ],
  "containerDefinitions": [
    {
      "name": "app",
      "mountPoints": [
        {
          "sourceVolume": "app-data",
          "containerPath": "/var/app/data",
          "readOnly": false
        }
      ]
    }
  ]
}

This works for applications that write to the local filesystem and expect it to survive restarts: upload processors, ML model loaders, legacy apps with file-based state.
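A quick way to see the difference between ephemeral and EFS-backed storage is a start counter written under the mount path. This is a minimal sketch; `DATA_DIR` defaults to `/tmp/app-data` so it runs anywhere, but in the task it would point at the EFS mount path (`/var/app/data` in the task definition above).

```shell
# Increment a start counter under the persistent mount.
# DATA_DIR defaults to /tmp/app-data for local testing; in the task,
# set it to the EFS mount path from the task definition.
DATA_DIR="${DATA_DIR:-/tmp/app-data}"
mkdir -p "$DATA_DIR"
COUNT_FILE="$DATA_DIR/start-count"
count=$(cat "$COUNT_FILE" 2>/dev/null || echo 0)
echo $((count + 1)) > "$COUNT_FILE"
echo "start number: $(cat "$COUNT_FILE")"
```

On EFS the counter keeps climbing across task replacements; on ephemeral task storage it resets to 1 on every restart.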

Where EFS breaks down

EFS uses NFS semantics. Write latency is typically 1–3 ms, versus under 1 ms for EBS. For databases, that matters — most database engines are not written to tolerate NFS behavior under concurrent writes, and you will see correctness problems, not just performance problems. Do not mount EFS for a PostgreSQL or MySQL data directory.

EFS is also significantly more expensive than EBS per GB. For large, write-heavy workloads, the cost difference is material.

Pattern 2: Placement constraints to pin tasks to an instance

If you need block storage (EBS), you need to pin the task to the specific EC2 instance where the volume is attached. ECS placement constraints let you express this using instance attributes.

{
  "placementConstraints": [
    {
      "type": "memberOf",
      "expression": "attribute:ecs.instance-id == i-0abc12345def67890"
    }
  ]
}

You can also use a custom attribute on the instance and let the task match by attribute, which is more portable across replacements:

{
  "placementConstraints": [
    {
      "type": "memberOf",
      "expression": "attribute:role == db-primary"
    }
  ]
}

Then register the attribute on the container instance: aws ecs put-attributes --cluster my-cluster --attributes name=role,value=db-primary,targetType=container-instance,targetId=&lt;container-instance-arn&gt; (the targetId is required and identifies the instance the attribute applies to).

You have just created a single point of failure

Pinning a task to an instance ties the task's availability to that instance's availability. If the host terminates — scheduled maintenance, capacity event, hardware failure — the task does not reschedule elsewhere because no other instance satisfies the constraint. You now have manual failure recovery: attach the EBS volume to a new instance, apply the attribute, wait for ECS to reschedule.
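The manual recovery path looks roughly like this; all IDs are placeholders, and the exact device name depends on the instance type:

```shell
# 1. Attach the surviving EBS volume to a replacement instance.
aws ec2 attach-volume \
  --volume-id vol-0abc1234 \
  --instance-id i-0new56789 \
  --device /dev/xvdf

# 2. Re-apply the attribute the placement constraint matches on.
aws ecs put-attributes \
  --cluster my-cluster \
  --attributes name=role,value=db-primary,targetType=container-instance,targetId=<container-instance-arn>

# 3. Wait: ECS reschedules the pinned task once an instance
#    satisfies the constraint again.
```

You would also still need to mount the filesystem on the new host before the task can use it as a host volume, which is exactly the kind of step a managed service handles for you.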

This is operationally equivalent to running a bare EC2 instance with an ECS wrapper around it. In most cases, you are better served by a managed service.

Pattern 3: Sticky sessions for user session state

Application Load Balancer target groups support duration-based stickiness. Once a client is routed to a specific task, subsequent requests from that client go to the same task for the stickiness duration.
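Enabling it is a target-group attribute change. A sketch with a placeholder ARN and a one-hour stickiness duration:

```shell
# Enable duration-based (lb_cookie) stickiness on the ALB target group.
# The target group ARN is a placeholder; duration is in seconds.
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/app/0abc1234 \
  --attributes \
    Key=stickiness.enabled,Value=true \
    Key=stickiness.type,Value=lb_cookie \
    Key=stickiness.lb_cookie.duration_seconds,Value=3600
```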

This is often described as the ECS answer to session affinity. It is a partial answer at best.

Why sticky sessions are not a state management strategy

Sticky sessions keep a client on one task, but the task still restarts eventually — deploys, crashes, scale-in events. When it does, the session is gone. If your application relies on sticky sessions for correctness (not just performance), you have a latent data-loss bug waiting for your next deployment.

The correct solution is an external session store: Redis via ElastiCache, or DynamoDB for serverless workloads. The task becomes stateless again; session state lives outside it.
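As a sketch of the shape this takes: session writes carry a TTL and go to Redis, so any task can serve any client. The endpoint is a placeholder for an ElastiCache node, and this assumes redis-cli is available in the container.

```shell
# Write session state to Redis with a one-hour TTL, then read it back
# from any task. The hostname is a placeholder ElastiCache endpoint.
REDIS_HOST="my-sessions.abc123.use1.cache.amazonaws.com"
redis-cli -h "$REDIS_HOST" SET "session:4f2a" '{"userId":42}' EX 3600
redis-cli -h "$REDIS_HOST" GET "session:4f2a"
```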

ECS patterns vs Kubernetes StatefulSet

The core difference is where identity and persistence are managed.

ECS stateful patterns
  • Persistence requires external services: EFS or EBS with constraints
  • No stable network identity — tasks get new IPs on restart
  • Ordered startup is not supported; you manage coordination in application code
  • Works without Kubernetes operational overhead
Kubernetes StatefulSet
  • Per-pod PVCs with stable binding across restarts
  • Stable DNS hostname per pod (pod-0.service.namespace.svc.cluster.local)
  • Ordered, graceful pod creation and deletion
  • Requires Kubernetes cluster management
Verdict

If your workload genuinely needs stable identity, ordered startup, or per-instance storage binding, Kubernetes StatefulSets are the right abstraction. ECS patterns can cover file-based persistence and soft session affinity, but they require more manual operation for failure recovery.

When to stop fighting the platform

The honest answer for most stateful workloads on ECS is: don't run them on ECS. Use the managed service.

| Workload | ECS pattern | Better option |
|---|---|---|
| Relational database | EBS + placement constraint | RDS |
| Cache / session store | Sticky sessions | ElastiCache |
| Message queue / event stream | EFS or EBS | SQS, MSK |
| Object storage | EFS | S3 |
| Search index | EFS | OpenSearch |

The ECS patterns are appropriate when you have a migration constraint (code that writes local files, legacy apps not yet refactored for external storage) and need a working deployment while the refactor happens. They are not a long-term architecture.

A legacy application writes uploaded files to /var/app/uploads on local disk. You are migrating it to ECS Fargate. What is the correct storage approach?


The application cannot be rewritten immediately. Files must survive task restarts and be accessible if the task is replaced.

  • A. Use an EBS volume mounted to the Fargate task
    Incorrect. Fargate tasks run on AWS-managed infrastructure with no host access. EBS volumes cannot be mounted to Fargate tasks — only EFS is supported.
  • B. Mount an EFS volume at /var/app/uploads in the task definition
    Correct! EFS is the only shared persistent storage option for Fargate. Files written to the mount path persist across task replacements and are accessible from any task that mounts the same filesystem.
  • C. Enable sticky sessions on the ALB target group
    Incorrect. Sticky sessions route clients to the same task, but do nothing to persist files when the task restarts or is replaced.
  • D. Increase the task ephemeral storage allocation
    Incorrect. Ephemeral storage is local to the task and is deleted when the task stops. It does not survive restarts.

Hint: Fargate has no access to EC2 host volumes. The only durable storage option is a managed file service.