ECS Cluster Autoscaling: How Capacity Providers Scale EC2 Infrastructure


ECS cluster autoscaling uses capacity providers to manage EC2 instance count based on task resource needs. Understanding the CapacityProviderReservation metric, target tracking, and the interaction between service scaling and cluster scaling prevents the scaling lag that causes task placement failures.


Two levels of scaling in ECS

ECS has two independent scaling concerns:

  1. Service scaling: adjusting the number of tasks (Application Auto Scaling, based on CPU/memory/custom metrics).
  2. Cluster scaling: adjusting the number of EC2 instances in the cluster (Auto Scaling Group, based on whether there is capacity to run tasks).

When only service scaling is configured, scaling up tasks can fail with "no container instances with sufficient resources" if the cluster has no capacity headroom. Capacity providers bridge these two layers.
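Service-level scaling is configured separately through Application Auto Scaling. A minimal Terraform sketch of the service-scaling layer (cluster, service, and resource names here are illustrative assumptions, not from this article):

```hcl
# Register the service's desired count as a scalable target.
resource "aws_appautoscaling_target" "service" {
  max_capacity       = 20
  min_capacity       = 2
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

# Target tracking on average CPU: ECS adds/removes tasks,
# not instances -- that is the capacity provider's job.
resource "aws_appautoscaling_policy" "cpu" {
  name               = "cpu-target-tracking"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.service.resource_id
  scalable_dimension = aws_appautoscaling_target.service.scalable_dimension
  service_namespace  = aws_appautoscaling_target.service.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 60
  }
}
```

This policy only changes the task count; without a capacity provider, nothing launches instances to hold those tasks.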

Capacity provider mechanics


A capacity provider wraps an EC2 Auto Scaling Group. ECS tracks how much of the ASG's capacity is used and creates a metric (CapacityProviderReservation) that drives instance scaling. The metric targets a percentage — when tasks demand more than available, new instances launch.

Prerequisites

  • ECS services and tasks
  • EC2 Auto Scaling Groups
  • CloudWatch target tracking

Key Points

  • CapacityProviderReservation = (tasks needing instances / total instance capacity) × 100.
  • Target tracking keeps CPR at the configured targetCapacity (e.g., 100 = use all capacity before scaling out).
  • When a task cannot be placed, CPR exceeds 100 — triggers ASG scale-out.
  • Scale-in is conservative: ECS only terminates instances after draining tasks and waiting for the scale-in cooldown.

How the CapacityProviderReservation metric works

ECS computes this metric per capacity provider:

CPR = (N_scheduled / N_provisioned) × 100

N_scheduled = number of tasks placed or pending on the ASG's instances
N_provisioned = total available capacity (instance count × task slots per instance)

When a service scales up (more tasks), more capacity is needed. If current instances are full, pending tasks raise the N_scheduled count without a corresponding N_provisioned increase — CPR climbs above 100.

targetCapacity: 100
Current:  4 instances, 8 task slots, 8 tasks running → CPR = 100
Scale-up: service adds 4 more tasks → 8 running + 4 pending → CPR = 150
          → CloudWatch alarm triggers ASG scale-out → 2 new instances
          → 4 pending tasks placed
          → CPR returns to 100

Setting targetCapacity: 100 means run instances at full utilization before scaling. Setting targetCapacity: 70 keeps 30% headroom — scale-out happens earlier, reducing task placement latency. The tradeoff is cost vs responsiveness.
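A headroom-oriented variant of the managed scaling block might look like this (a sketch; the warmup value is an illustrative assumption):

```hcl
managed_scaling {
  status                    = "ENABLED"
  # Keep ~30% spare capacity: scale out when CPR exceeds 70,
  # so new tasks usually find a free slot immediately.
  target_capacity           = 70
  minimum_scaling_step_size = 1
  maximum_scaling_step_size = 4
  # Seconds before a newly launched instance's capacity
  # counts toward CPR.
  instance_warmup_period    = 120
}
```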

Managed instance termination protection

When the cluster scales in and instances must be terminated, managed termination protection prevents tasks from being killed mid-execution:

  1. ECS marks instances it wants to remove.
  2. ECS drains tasks from marked instances (deregisters from load balancer, waits for in-flight requests).
  3. Once all tasks are stopped, instance protection is removed.
  4. ASG terminates the instance.

A capacity provider with managed scaling and managed termination protection enabled:

resource "aws_ecs_capacity_provider" "main" {
  name = "main-ec2"

  auto_scaling_group_provider {
    auto_scaling_group_arn = aws_autoscaling_group.ecs.arn

    managed_scaling {
      maximum_scaling_step_size = 4
      minimum_scaling_step_size = 1
      status                    = "ENABLED"
      target_capacity           = 100
    }

    managed_termination_protection = "ENABLED"
  }
}

Without managed termination protection, the ASG might terminate instances before tasks finish — tasks are killed mid-execution instead of draining gracefully.
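Managed termination protection also requires instance scale-in protection on the ASG itself; with Terraform that means setting protect_from_scale_in. A sketch of the matching ASG (subnet and launch template references are assumptions):

```hcl
resource "aws_autoscaling_group" "ecs" {
  name                = "ecs-asg"
  min_size            = 2
  max_size            = 10
  vpc_zone_identifier = var.private_subnet_ids

  # Required when the capacity provider sets
  # managed_termination_protection = "ENABLED".
  protect_from_scale_in = true

  launch_template {
    id      = aws_launch_template.ecs.id
    version = "$Latest"
  }

  # Tag that marks the ASG as managed by an ECS capacity provider.
  tag {
    key                 = "AmazonECSManaged"
    value               = "true"
    propagate_at_launch = true
  }
}
```

With protect_from_scale_in enabled, only ECS removes the protection (step 3 above), so the ASG can never terminate an instance that still runs tasks.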

Mixing On-Demand and Spot with capacity providers

Real clusters often combine On-Demand and Spot instances for cost optimization. Two capacity providers on a single cluster handle this:

resource "aws_ecs_cluster_capacity_providers" "main" {
  cluster_name = aws_ecs_cluster.main.name

  capacity_providers = [
    aws_ecs_capacity_provider.on_demand.name,
    aws_ecs_capacity_provider.spot.name,
  ]

  default_capacity_provider_strategy {
    base              = 2                                         # always 2 On-Demand tasks minimum
    weight            = 1
    capacity_provider = aws_ecs_capacity_provider.on_demand.name
  }

  default_capacity_provider_strategy {
    base              = 0
    weight            = 3
    capacity_provider = aws_ecs_capacity_provider.spot.name
  }
}

base: 2 ensures at least 2 tasks run on On-Demand regardless of scaling. weight: 3 for Spot vs weight: 1 for On-Demand means 3 out of 4 additional tasks go to Spot. During Spot interruptions, ECS reschedules interrupted tasks on available capacity — the On-Demand base keeps the service alive.
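Individual services can override the cluster default with their own strategy. A hedged sketch (task definition and service names are assumptions):

```hcl
resource "aws_ecs_service" "app" {
  name            = "app"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 8

  # First 2 tasks on On-Demand, then 1:3 On-Demand:Spot.
  capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.on_demand.name
    base              = 2
    weight            = 1
  }

  capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.spot.name
    weight            = 3
  }
}
```

Note that a service can specify either launch_type or capacity_provider_strategy, not both.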

💡 Fargate vs EC2 capacity providers: when to use which

Fargate capacity providers (FARGATE and FARGATE_SPOT) are simpler — you do not manage instances. AWS provisions infrastructure based on task resource requests.

Use Fargate when:

  • You want zero infrastructure management (no AMI updates, no instance type selection)
  • Workload is bursty and unpredictable — tasks start in seconds without waiting for instance launch
  • Cost at small scale is less important than operational simplicity

Use EC2 capacity providers when:

  • You need instance types not available in Fargate (GPU instances, high-memory)
  • Cost optimization matters at scale — EC2 Reserved Instances or Savings Plans provide better rates than Fargate
  • You need specific networking (ENI trunking, placement on dedicated hosts)
  • Tasks need access to local instance storage or specific kernel configurations

Mixed fleets (Fargate for burstable capacity, EC2 for baseline) are also common — Fargate handles spikes while On-Demand EC2 handles steady-state traffic at lower cost.
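A cluster can register both Fargate and ASG-backed capacity providers, but a single capacity provider strategy cannot mix the two types, so the split is typically expressed per service. An illustrative sketch (resource names are assumptions):

```hcl
resource "aws_ecs_cluster_capacity_providers" "mixed" {
  cluster_name = aws_ecs_cluster.main.name

  capacity_providers = [
    "FARGATE",
    aws_ecs_capacity_provider.on_demand.name,
  ]

  # Default: steady-state traffic on EC2.
  default_capacity_provider_strategy {
    weight            = 1
    capacity_provider = aws_ecs_capacity_provider.on_demand.name
  }
}

# A bursty service opts into Fargate with its own strategy.
resource "aws_ecs_service" "burst_worker" {
  name            = "burst-worker"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.worker.arn
  desired_count   = 0

  capacity_provider_strategy {
    capacity_provider = "FARGATE"
    weight            = 1
  }
}
```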

A service running on ECS EC2 capacity providers scales from 5 to 15 tasks. New tasks stay pending for 5 minutes before instances launch and tasks are placed. What is the most likely cause?

Difficulty: medium

The capacity provider has targetCapacity=100. The ASG has a scale-out cooldown of 300 seconds. The EC2 instances take 2 minutes to launch and register with ECS.

  • A. ECS is waiting for all 10 new tasks to be requested before initiating scale-out
    Incorrect. ECS triggers scale-out based on CPR exceeding the target capacity. The first unplaceable pending task pushes CPR above 100, triggering the CloudWatch alarm. There is no batching requirement.
  • B. The 300-second scale-out cooldown is preventing immediate scale-out after the metric threshold is crossed
    Correct! The ASG scale-out cooldown (300 seconds = 5 minutes) prevents a new scale-out action from triggering until 5 minutes after the previous one. If the service scaled out recently or the ASG just launched instances, the cooldown blocks further scale-out. With a 2-minute instance launch time, a 5-minute cooldown means tasks can wait up to 5 minutes. Reduce the scale-out cooldown (the scale-in cooldown can stay longer to prevent flapping).
  • C. The CapacityProviderReservation metric has a 5-minute publication delay
    Incorrect. CPR is a custom CloudWatch metric published by ECS at a 1-minute interval, not every 5 minutes.
  • D. The ASG maximum capacity is set too low
    Incorrect. If the ASG max were too low, instances would not launch at all, rather than launching after a 5-minute delay.

Hint: Think about what mechanism could impose a 5-minute wait before more instances are added.