EFS: Throughput Modes, Access Points, and the EKS CSI Driver
EFS provides shared POSIX file storage that multiple instances and containers can mount simultaneously. Throughput mode selection (Bursting vs Elastic vs Provisioned) is the most common production performance decision. The EFS CSI driver enables dynamic persistent volume provisioning in EKS.
What EFS solves that S3 and EBS don't
EBS volumes attach to one EC2 instance at a time (except Multi-Attach in rare configurations). S3 is object storage — no file system semantics, no POSIX, no flock(). EFS fills the gap: a POSIX-compliant file system that multiple instances can mount simultaneously as NFS.
Common production uses: shared configuration files for a fleet of web servers, ML training datasets read by many GPU instances in parallel, container persistent storage in EKS where multiple pods need read-write access to the same files.
EFS architecture: mount targets and availability
EFS file systems span an entire region. You create mount targets — one per availability zone — that give instances in each AZ a local NFS endpoint. Instances in us-east-1a use the mount target in us-east-1a. Data is redundantly stored across AZs for durability.
Prerequisites
- NFS protocol
- VPC and subnets
- availability zones
Key Points
- Create one mount target per AZ. Use private subnets. Security group must allow NFS (TCP 2049) from instances.
- The file system ID (fs-xxxxxxxx) is the same regardless of which mount target is used.
- EFS Standard: data replicated across AZs. EFS One Zone: single AZ, 47% cheaper, lower durability.
- EFS One Zone is appropriate for dev environments and data that can be reconstructed (build caches, scratch space).
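Once mount targets exist, mounting is a standard NFS operation. A sketch using the amazon-efs-utils mount helper — the file system ID `fs-12345678` and mount path are placeholders:

```shell
# Install the EFS mount helper (Amazon Linux; package name varies by distro)
sudo yum install -y amazon-efs-utils

# Mount with TLS; the helper resolves the mount target
# in this instance's AZ automatically via the EFS DNS name
sudo mkdir -p /mnt/efs
sudo mount -t efs -o tls fs-12345678:/ /mnt/efs
```

The same command works on every instance in the fleet regardless of AZ, because the helper always picks the local mount target.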
Throughput modes: which to choose
EFS offers three throughput modes. The choice affects performance characteristics and cost significantly.
Bursting (original default): throughput scales with storage size. 50 MB/s baseline per TB stored, with burst capability using accumulated credits. New file systems have a burst credit balance. If your workload exceeds baseline throughput for extended periods, credits deplete and performance drops to baseline. This is the classic EFS performance cliff: tests work fine, production fails when credits run out.
Elastic (recommended default as of 2023): automatically scales throughput up to 3 GB/s for reads and 1 GB/s for writes based on actual workload. No credit system. You pay for throughput actually used, not provisioned. For most workloads, Elastic eliminates the burst credit management problem.
Provisioned: you specify a throughput value (MB/s) that is always available regardless of storage size. Use when you have a small file system but need consistently high throughput (e.g., 10 GB file system needing 500 MB/s).
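Throughput mode can be changed on an existing file system without downtime; a sketch with the AWS CLI (the file system ID is a placeholder, and AWS rate-limits these changes — generally once per 24 hours):

```shell
# Switch an existing file system from bursting to elastic throughput
aws efs update-file-system \
  --file-system-id fs-12345678 \
  --throughput-mode elastic

# For provisioned mode, also specify the guaranteed throughput in MiB/s
aws efs update-file-system \
  --file-system-id fs-12345678 \
  --throughput-mode provisioned \
  --provisioned-throughput-in-mibps 500
```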
resource "aws_efs_file_system" "shared" {
  creation_token  = "shared-storage"
  encrypted       = true
  kms_key_id      = aws_kms_key.efs.arn
  throughput_mode = "elastic" # recommended for most workloads

  lifecycle_policy {
    transition_to_ia = "AFTER_30_DAYS" # move infrequently accessed files to IA storage class
  }

  lifecycle_policy {
    transition_to_primary_storage_class = "AFTER_1_ACCESS" # move back to Standard on access
  }

  tags = {
    Name = "shared-storage"
  }
}
resource "aws_efs_mount_target" "az1" {
  file_system_id  = aws_efs_file_system.shared.id
  subnet_id       = aws_subnet.private_az1.id
  security_groups = [aws_security_group.efs.id]
}

resource "aws_efs_mount_target" "az2" {
  file_system_id  = aws_efs_file_system.shared.id
  subnet_id       = aws_subnet.private_az2.id
  security_groups = [aws_security_group.efs.id]
}
Access points: per-application file system views
EFS access points provide application-specific entry points into a shared file system. Each access point can enforce a specific root directory path and a specific POSIX user/group identity — even if the calling process runs as a different UID.
This solves the multi-tenant problem for containers: multiple pods can share one EFS file system with each pod isolated to its own subdirectory and UID, without needing to configure each container's user mapping.
resource "aws_efs_access_point" "app_data" {
  file_system_id = aws_efs_file_system.shared.id

  root_directory {
    path = "/app-data"
    creation_info {
      owner_gid   = 1000
      owner_uid   = 1000
      permissions = "755"
    }
  }

  posix_user {
    gid = 1000
    uid = 1000
  }
}
When a container mounts via this access point, it sees /app-data as the root /, and all file operations run as UID/GID 1000 — regardless of what user the container runs as.
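Outside Kubernetes, the same access point can be used directly with the EFS mount helper; a sketch where both IDs are placeholders:

```shell
# Mount through the access point: the client sees /app-data as /,
# and all file operations run as UID/GID 1000
sudo mount -t efs -o tls,accesspoint=fsap-0123456789abcdef0 \
  fs-12345678:/ /mnt/app
```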
EFS CSI driver: dynamic PVC provisioning in EKS
The EFS CSI driver enables Kubernetes pods to use EFS as persistent volumes. Install it as an EKS add-on:
aws eks create-addon \
  --cluster-name my-cluster \
  --addon-name aws-efs-csi-driver \
  --service-account-role-arn arn:aws:iam::123456789012:role/EFSCSIRole
Create a StorageClass referencing your file system:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap # dynamic provisioning via access points
  fileSystemId: fs-xxxxxxxx
  directoryPerms: "700"
  basePath: "/dynamic-provisioning" # root directory for auto-created subdirs
PVCs using this StorageClass automatically create access points and subdirectories:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-storage
spec:
  accessModes:
    - ReadWriteMany # multiple pods can mount this PVC simultaneously
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi # EFS doesn't enforce capacity — this is informational
ReadWriteMany is EFS's key advantage over EBS (which only supports ReadWriteOnce). Multiple pods on different nodes can mount and write to the same PVC simultaneously.
Note: the 5Gi storage request is informational only — EFS scales automatically and doesn't enforce the requested size. Kubernetes requires the field on every PVC, but for EFS it has no effect on actual storage or billing.
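To illustrate ReadWriteMany in practice, a sketch of a Deployment whose replicas all mount the same PVC on different nodes (names and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3 # all three pods share the same EFS-backed volume
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: nginx # illustrative image
          volumeMounts:
            - name: shared
              mountPath: /data
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: app-storage # the ReadWriteMany PVC defined above
```

With an EBS-backed ReadWriteOnce PVC, only one of the three replicas could attach the volume; with EFS, all three read and write `/data` concurrently.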
EFS vs EBS vs S3 decision framework
| Factor | EFS | EBS | S3 |
|---|---|---|---|
| Multiple simultaneous writers | Yes | No (except Multi-Attach) | Object-level only |
| POSIX semantics | Yes | Yes | No |
| Performance | NFS latency (~1 ms) | Sub-ms | Higher (HTTP) |
| Scale | Auto-scales | Fixed provisioned size | Auto-scales |
| Cost (per GB) | Higher | Medium | Lower |
| EKS access mode | ReadWriteMany | ReadWriteOnce | Via SDK |
EFS is the right choice when multiple pods or instances need concurrent read-write access to shared files with POSIX semantics. For single-pod persistent storage, EBS is faster and cheaper. For large objects accessed via API, S3 is far cheaper.
An EKS workload uses EFS with Bursting throughput mode. The file system stores 100 GB. During development, the service performs well. After launch, sustained write-heavy traffic causes intermittent performance degradation that resolves after periods of low activity. What is happening?

There are no errors in the application logs. EFS CloudWatch metrics show the BurstCreditBalance metric dropping to zero during degraded periods and recovering during low-traffic periods.

A. EFS has a maximum throughput limit that is being hit.
Incorrect. EFS does have throughput limits, but the CloudWatch evidence points to burst credit depletion, not a hard limit ceiling.

B. Burst credits are being depleted by sustained I/O. With 100 GB stored, baseline throughput is only 5 MB/s. When credits hit zero, throughput drops to baseline — recovering only as credits accumulate during low-traffic periods.
Correct! EFS Bursting mode provides 50 MB/s per TB stored, so a 100 GB file system has ~5 MB/s baseline. Burst credits enable higher throughput temporarily, but sustained writes deplete them. When BurstCreditBalance hits zero, throughput falls to 5 MB/s. Credits only regenerate during periods below baseline, explaining the pattern. Fix: switch to Elastic throughput mode (automatically scales to actual demand) or Provisioned if you need a specific throughput guarantee.

C. The EFS mount target is in the wrong availability zone, causing cross-AZ latency.
Incorrect. Cross-AZ EFS access adds latency, but the intermittent pattern tied to traffic levels points to burst credit exhaustion, not latency.

D. NFS client caching is causing write conflicts between pods.
Incorrect. NFS client caching can cause stale data issues, but not the bursty performance degradation tied to traffic levels described here.

Hint: The BurstCreditBalance metric is the key clue. What does it measure and what happens when it hits zero?
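To check for this failure mode in your own account, you can query the metric directly; a sketch with the AWS CLI (file system ID and time window are placeholders):

```shell
# BurstCreditBalance is reported in bytes; a sustained drop toward
# zero means the workload is outrunning the baseline throughput
aws cloudwatch get-metric-statistics \
  --namespace AWS/EFS \
  --metric-name BurstCreditBalance \
  --dimensions Name=FileSystemId,Value=fs-12345678 \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-02T00:00:00Z \
  --period 3600 \
  --statistics Minimum
```

Alarming on a low minimum BurstCreditBalance gives advance warning before the throughput cliff is reached.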