CloudFront: How CDN Caching and Edge Logic Actually Work


CloudFront reduces latency by serving content from the edge location nearest to the user. Understanding distributions, cache behaviors, origin access control, and Lambda@Edge makes the difference between a CDN that helps and one that causes stale content bugs.


What CloudFront actually does

CloudFront is a CDN — a global network of edge locations that cache content close to users. When a user in Tokyo requests your site hosted in us-east-1, without a CDN the request travels ~7,000 miles. With a CloudFront edge location in Tokyo, it travels tens of miles to the edge, which serves cached content or fetches from origin once and caches the response.

The performance improvement is primarily about reducing round-trip time. A new HTTPS connection needs several sequential round-trips before the first byte of the response arrives: one for the TCP handshake, one or two for the TLS handshake (TLS 1.3 or TLS 1.2), and one for the HTTP request itself. Moving the server from roughly 150ms away to roughly 10ms away saves about 140ms per round-trip. Across 4 sequential round-trips, that is 560ms saved before you receive a single byte of HTML.
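The round-trip arithmetic can be written out as a small model. This is a sketch under simplifying assumptions. It counts only TCP, TLS, and HTTP round-trips on a fresh connection. It leaves out DNS, because resolution goes to a DNS resolver rather than the server. It also ignores TCP Fast Open, TLS session resumption, and 0-RTT. The 150ms and 10ms figures are the illustrative values from the text, and the function names are hypothetical.

```javascript
// Round-trips before the first response byte on a fresh HTTPS connection.
// Simplified model: no DNS, TCP Fast Open, TLS session resumption, or 0-RTT.
function handshakeRoundTrips(tlsVersion) {
  const tcp = 1;                            // SYN / SYN-ACK
  const tls = tlsVersion === "1.3" ? 1 : 2; // full TLS handshake
  const http = 1;                           // request / first response byte
  return tcp + tls + http;
}

// Milliseconds saved by moving the server from farRttMs away to nearRttMs away.
function latencySavedMs(roundTrips, farRttMs, nearRttMs) {
  return roundTrips * (farRttMs - nearRttMs);
}

const rtts = handshakeRoundTrips("1.2");
console.log(rtts, latencySavedMs(rtts, 150, 10)); // 4 560
```

With TLS 1.3 the same model gives 3 round-trips and 420ms saved, so the benefit of a nearby edge shrinks but does not disappear.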

Distributions, behaviors, and origins


A CloudFront distribution is the top-level unit — it has a domain name (d1234.cloudfront.net or your custom domain) and routes requests to origins. Behaviors define which URL patterns map to which origins, with what caching rules.

Prerequisites

  • HTTP caching
  • DNS and CNAME records
  • S3 and ALB basics

Key Points

  • One distribution can have multiple origins (S3 bucket, ALB, API Gateway, custom HTTP endpoint).
  • Behaviors route URL patterns to origins: /api/* → ALB, /static/* → S3, /* → default origin.
  • Each behavior has its own cache policy (TTL, which headers/cookies/query strings to cache on).
  • CloudFront edge locations serve from cache. On a cache miss, they forward to origin and cache the response.
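The routing described in these key points can be sketched as first-match pattern matching. This sketch assumes CloudFront's documented path-pattern wildcards. In those patterns, `*` matches zero or more characters and `?` matches exactly one. Matching is case-sensitive, and behaviors are checked in precedence order with the default (`*`) behavior last. The behavior table, origin names, and helper functions are hypothetical.

```javascript
// Hypothetical behavior table in precedence order (first match wins).
const behaviors = [
  { pathPattern: "/api/*", origin: "alb" },
  { pathPattern: "/static/*", origin: "s3" },
  { pathPattern: "*", origin: "default" }, // default behavior, evaluated last
];

// Convert a CloudFront-style path pattern to an anchored, case-sensitive RegExp.
function patternToRegExp(pattern) {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
  return new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
}

// Pick the origin for a request URL; the query string plays no part in matching.
function selectOrigin(url) {
  const path = url.split("?")[0];
  for (const b of behaviors) {
    if (patternToRegExp(b.pathPattern).test(path)) return b.origin;
  }
  return null;
}

console.log(selectOrigin("/api/user?id=42")); // alb
```

Because matching is first-match, a broad pattern placed too early (for example `/*` ahead of `/api/*`) would shadow the more specific behaviors.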

Cache behaviors and what determines cache hits

A cache behavior specifies how CloudFront caches a URL pattern. The most important settings:

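Cache key: what makes two requests "the same request" for caching purposes. By default, the cache key is just the distribution's domain name and the URL path. Query strings, headers, and cookies are left out unless the cache policy adds them. If your responses vary by the Accept-Language header, add that header to the cache key. Otherwise every language gets the same cached response.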
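A cache key can be modeled as the domain and path plus whichever request values the cache policy names. This is a simplified sketch. The request shape, the `languagePolicy` object, and the `cacheKey` function are hypothetical, and CloudFront's internal key format is not public. The point is only which inputs make two requests share an entry.

```javascript
// Hypothetical cache policy: request values (beyond domain + path) that join the key.
const languagePolicy = { headers: ["accept-language"], queryStrings: [], cookies: [] };

// req: { host, url, headers: { lowercased-name: value }, cookies: { name: value } }
function cacheKey(req, policy) {
  const url = new URL(req.url, "https://" + req.host);
  const parts = [req.host, url.pathname]; // always part of the key
  for (const name of policy.queryStrings) {
    parts.push("qs:" + name + "=" + (url.searchParams.get(name) ?? ""));
  }
  for (const name of policy.headers) {
    parts.push("h:" + name + "=" + (req.headers[name] ?? ""));
  }
  for (const name of policy.cookies) {
    parts.push("c:" + name + "=" + (req.cookies[name] ?? ""));
  }
  return parts.join("|");
}

const en = { host: "d1234.cloudfront.net", url: "/page?utm=1",
             headers: { "accept-language": "en" }, cookies: {} };
console.log(cacheKey(en, languagePolicy)); // d1234.cloudfront.net|/page|h:accept-language=en
```

Since `utm` is not in the policy, `/page?utm=1` and `/page?utm=2` share one entry, while `en` and `ja` get separate ones.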

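TTL: how long CloudFront holds a cached response before checking origin for a fresh copy. CloudFront honors Cache-Control: max-age=X from your origin, and s-maxage takes precedence when present. The minimum and maximum TTL in the behavior's cache policy clamp whatever the origin requests. The default TTL applies when the origin sends no max-age, s-maxage, or Expires header.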
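The clamping rule can be sketched as a small function. This is a simplified model, not CloudFront's full algorithm. It ignores the Expires header and the no-cache, no-store, and private directives, and it treats any unparseable value as "no max-age". The function name and the settings object are hypothetical.

```javascript
// Simplified model: how minimum/default/maximum TTL bound the origin's Cache-Control.
// Ignores Expires and no-cache/no-store/private handling.
function effectiveTtlSeconds(cacheControl, { minTtl, defaultTtl, maxTtl }) {
  const directives = {};
  for (const part of (cacheControl || "").split(",")) {
    const [name, value] = part.trim().toLowerCase().split("=");
    if (name) directives[name] = value;
  }
  // s-maxage targets shared caches such as CDNs and wins over max-age.
  const raw = directives["s-maxage"] ?? directives["max-age"];
  const requested = Number(raw);
  if (raw === undefined || !Number.isFinite(requested)) return defaultTtl; // no usable max-age
  return Math.min(Math.max(requested, minTtl), maxTtl); // clamp into [minTtl, maxTtl]
}

const settings = { minTtl: 0, defaultTtl: 86400, maxTtl: 31536000 };
console.log(effectiveTtlSeconds("public, max-age=600", settings)); // 600
```

In this model a short max-age cannot push the TTL below the minimum, and a very long one cannot exceed the maximum. That is why the behavior's TTL bounds, not just origin headers, decide how stale content can get.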

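Forwarding: which headers, cookies, and query strings are sent to origin. Forward only what your origin needs to generate the response. With the legacy cache settings, every forwarded value also becomes part of the cache key. Forwarding everything therefore defeats caching, because each unique combination becomes a separate cache entry. With cache policies and origin request policies, values listed only in the origin request policy reach the origin without being added to the cache key.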

URL: /api/user?id=42&lang=en
Cache key: /api/user?id=42&lang=en (the response varies by both id and lang)
→ A separate cache entry for every user and language combination
→ Very low hit ratio; per-user API responses are often better left uncached

Cache key (static assets): /static/logo.png
→ One cache entry per file, shared by all users

Origin Access Control for S3

If your CloudFront distribution serves content from an S3 bucket, you need to prevent users from bypassing CloudFront and accessing S3 directly. The modern way is Origin Access Control (OAC).

With OAC:

  1. CloudFront signs requests to S3 using SigV4.
  2. Your S3 bucket policy allows s3:GetObject only from your CloudFront distribution's ARN.
  3. The bucket is private — direct S3 URLs return 403.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "cloudfront.amazonaws.com"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "StringEquals": {
          "AWS:SourceArn": "arn:aws:cloudfront::123456789012:distribution/DISTRIBUTION_ID"
        }
      }
    }
  ]
}
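If deployment code generates this policy, a small helper keeps the two ARNs consistent across buckets and distributions. This is a minimal sketch. The function name is hypothetical, and the bucket name, account ID, and distribution ID are placeholders you supply. The statement itself mirrors the policy shown above.

```javascript
// Build the OAC bucket policy for one bucket and one CloudFront distribution.
// bucketName, accountId, and distributionId are placeholders supplied by the caller.
function oacBucketPolicy(bucketName, accountId, distributionId) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { Service: "cloudfront.amazonaws.com" },
        Action: "s3:GetObject",
        Resource: `arn:aws:s3:::${bucketName}/*`,
        Condition: {
          // Only requests signed on behalf of this distribution are allowed.
          StringEquals: {
            "AWS:SourceArn": `arn:aws:cloudfront::${accountId}:distribution/${distributionId}`,
          },
        },
      },
    ],
  };
}

console.log(JSON.stringify(oacBucketPolicy("my-bucket", "123456789012", "EDFDVBD6EXAMPLE"), null, 2));
```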

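OAC replaces the older Origin Access Identity (OAI), which granted bucket access to a special CloudFront identity named in the bucket policy rather than signing requests. Unlike OAI, OAC also works with objects encrypted with SSE-KMS.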

Lambda@Edge vs CloudFront Functions

Both allow you to run code at CloudFront edge locations. They differ in when they run and what they can do.

CloudFront Functions: lightweight JavaScript running at the edge in microseconds. Good for: URL rewrites, header manipulation, simple auth checks. Cannot make network requests or access external services. Maximum execution time: 1ms.

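// CloudFront Function (viewer request): rewrite /blog and /blog/ → /blog/index.html
function handler(event) {
    var request = event.request;
    // Exact match, so a path like /archive/blog is not rewritten to /blog/index.html
    if (request.uri === '/blog' || request.uri === '/blog/') {
        request.uri = '/blog/index.html';
    }
    return request;
}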
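The same idea generalizes to every directory-style URL. This sketch follows the common "append index.html" pattern for a viewer-request CloudFront Function. It uses a heuristic: a URI with no "." is treated as a directory, so a path such as /v1.2/docs would not be rewritten. Adjust that rule to your own URL scheme.

```javascript
// CloudFront Function (viewer request): map directory-style URIs to index.html.
// Heuristic: a URI without "." is assumed to name a directory.
function handler(event) {
    var request = event.request;
    var uri = request.uri;
    if (uri.endsWith('/')) {
        request.uri = uri + 'index.html';      // /docs/ -> /docs/index.html
    } else if (!uri.includes('.')) {
        request.uri = uri + '/index.html';     // /docs  -> /docs/index.html
    }
    return request;                            // continue to the cache / origin
}
```

Because the rewrite happens at viewer request, the rewritten URI is what CloudFront uses for the cache lookup. /docs and /docs/ therefore share one cache entry.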

Lambda@Edge: full Lambda (Node.js or Python) running in the nearest regional edge location. Can make HTTP calls, access DynamoDB, read from S3. Maximum execution time: 30 seconds (origin request/response), 5 seconds (viewer request/response). At $0.60 versus $0.10 per million invocations, it costs about 6x more per invocation than CloudFront Functions, before Lambda@Edge's duration charges.
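A minimal Lambda@Edge sketch of the A/B testing use case follows. It assumes a viewer-request trigger on the Node.js runtime and the standard Lambda@Edge event shape: `event.Records[0].cf.request`, with lowercased header names whose values are arrays of `{ key, value }` objects. The `ab-variant` cookie name and the `/experiment-a` and `/experiment-b` paths are hypothetical.

```javascript
// Hypothetical A/B split for a Lambda@Edge viewer-request trigger (Node.js).
// Reuse the visitor's "ab-variant" cookie if present; otherwise pick at random.
function chooseVariant(cookieHeaders, random = Math.random) {
  for (const h of cookieHeaders || []) {
    const match = /(?:^|;\s*)ab-variant=(a|b)(?:;|$)/.exec(h.value);
    if (match) return match[1];
  }
  return random() < 0.5 ? "a" : "b";
}

async function handler(event) {
  const request = event.Records[0].cf.request;
  const variant = chooseVariant(request.headers.cookie);
  // Only the landing page is split; other URIs pass through unchanged.
  if (request.uri === "/index.html") {
    request.uri = `/experiment-${variant}/index.html`;
  }
  return request; // returning the request lets CloudFront continue processing it
}

exports.handler = handler; // Lambda resolves the exported handler
```

On its own this logic needs no network access and could also run as a CloudFront Function. A Lambda@Edge version earns its higher cost when, for example, assignments come from an external experiment service. A real setup would also set the cookie (for instance in a viewer-response trigger) so each visitor keeps the same variant.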


Use CloudFront Functions for simple request/response manipulation. Use Lambda@Edge when you need network access, stateful logic, or external service calls.

CloudFront Functions
  • Microsecond execution at every CloudFront edge location
  • JavaScript only, no external network calls
  • URL rewrites, header normalization, simple auth token validation
  • Free tier: 2M invocations/month; $0.10/1M after
Lambda@Edge
  • Millisecond execution — runs at ~13 regional edge locations
  • Node.js or Python runtime, can call external APIs
  • A/B testing, personalization, auth with external IdP, dynamic origin selection
  • $0.60/1M invocations + duration charges
Verdict

Default to CloudFront Functions for header/URL manipulation. Use Lambda@Edge when you need to make an external service call or implement logic that requires more than basic JavaScript — accept the higher cost and regional deployment scope.
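To put the cost difference in numbers, here is a rough monthly estimate using the per-million prices and free tier quoted in the comparison above. It covers invocation charges only. Lambda@Edge duration (GB-second) charges depend on memory size and run time, so the sketch takes your own estimate as a parameter. Prices change, and the constant and function names are hypothetical.

```javascript
// Invocation charges only, using the prices quoted in the comparison above.
const PRICES = {
  cloudFrontFunctionsPerMillion: 0.10,
  cloudFrontFunctionsFreeInvocations: 2_000_000, // monthly free tier
  lambdaEdgePerMillion: 0.60,
};

function cloudFrontFunctionsCost(invocations) {
  const billable = Math.max(0, invocations - PRICES.cloudFrontFunctionsFreeInvocations);
  return (billable / 1e6) * PRICES.cloudFrontFunctionsPerMillion;
}

// extraDurationUsd: your own estimate of Lambda@Edge duration charges.
function lambdaEdgeCost(invocations, extraDurationUsd = 0) {
  return (invocations / 1e6) * PRICES.lambdaEdgePerMillion + extraDurationUsd;
}

console.log(cloudFrontFunctionsCost(10_000_000).toFixed(2), lambdaEdgeCost(10_000_000).toFixed(2)); // 0.80 6.00
```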

WAF integration

CloudFront integrates with AWS WAF to inspect and block requests at the edge before they reach your origin. WAF rules run on the CloudFront edge, so blocked requests never consume origin resources.

Common patterns:

  • Rate limiting: block IPs sending more than N requests within an evaluation window (5 minutes by default)
  • Geographic blocking: drop requests from specific countries
  • SQL injection / XSS rules: AWS Managed Rules groups (such as the core rule set and the SQL database rule set) block common attack patterns
  • Bot control: distinguish scrapers, crawlers, and automation from legitimate traffic
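provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1" # CLOUDFRONT-scoped web ACLs must be created here
}

resource "aws_wafv2_web_acl" "cloudfront" {
  provider = aws.us_east_1
  name     = "cloudfront-acl"
  scope    = "CLOUDFRONT"
  # default_action, visibility_config, and rules are required; omitted here
  # ...
}

resource "aws_cloudfront_distribution" "main" {
  web_acl_id = aws_wafv2_web_acl.cloudfront.arn # WAFv2 web ACLs are referenced by ARN
  # ...
}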

WAF for CloudFront must be created in us-east-1 regardless of where your origin is — CloudFront's control plane is in us-east-1.
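The rate-limiting pattern in the list above boils down to counting requests per IP over a time window and blocking once a limit is exceeded. This is a deliberately simplified fixed-window model to illustrate the idea. It is not how AWS WAF implements rate-based rules internally. The class name, limit, and window are hypothetical, and entries are never evicted in this sketch.

```javascript
// Simplified fixed-window, per-IP rate limiter (illustration only).
class RateLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;       // max requests allowed per window
    this.windowMs = windowMs; // window length in milliseconds
    this.counts = new Map();  // ip -> { windowStart, count }; never evicted here
  }

  // Returns "ALLOW" or "BLOCK" for a request from `ip` at time `nowMs`.
  check(ip, nowMs) {
    let entry = this.counts.get(ip);
    if (!entry || nowMs - entry.windowStart >= this.windowMs) {
      entry = { windowStart: nowMs, count: 0 }; // start a new window
      this.counts.set(ip, entry);
    }
    entry.count += 1;
    return entry.count > this.limit ? "BLOCK" : "ALLOW";
  }
}

const limiter = new RateLimiter(2, 5 * 60 * 1000); // 2 requests per 5 minutes
console.log(limiter.check("203.0.113.7", 0), limiter.check("203.0.113.7", 1000), limiter.check("203.0.113.7", 2000)); // ALLOW ALLOW BLOCK
```

Running this logic in WAF at the edge, rather than in your application, is what keeps blocked requests from ever reaching the origin.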

📝 The intermediate domain pattern for large organizations

Large organizations sometimes route traffic through an intermediate subdomain before CloudFront:

user → dev.example.com → dev.internal.example.com (CNAME) → CloudFront → origin

The reasons are operational, not technical:

CDN portability: the public-facing DNS record (dev.example.com) points to an internal subdomain they control. If they switch from CloudFront to another CDN, they update one internal CNAME — not every public DNS record.

Multi-CDN routing: Route 53 health checks or latency routing can direct some users to CloudFront and others to a different CDN for redundancy, all behind the same public domain.

Compliance: some organizations require all traffic to pass through a WAF layer they operate before reaching the CDN. The intermediate layer is where that WAF lives.

For most applications, pointing the public domain directly to CloudFront via CNAME (or Route 53 Alias for the apex domain) is simpler and sufficient.
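The CDN-portability argument comes down to how a CNAME chain resolves. This sketch follows CNAME records through an in-memory table. The record map, the resolver function, and the dev.other-cdn.example.net target are hypothetical stand-ins for real DNS. It shows that switching CDNs means changing only the internal record while the public one stays untouched.

```javascript
// Hypothetical DNS records for the intermediate-domain pattern.
const records = new Map([
  ["dev.example.com", { type: "CNAME", target: "dev.internal.example.com" }],
  ["dev.internal.example.com", { type: "CNAME", target: "d1234.cloudfront.net" }],
]);

// Follow CNAMEs until reaching a name with no further CNAME record.
function resolveCname(name, maxHops = 8) {
  for (let hops = 0; hops < maxHops; hops++) {
    const rec = records.get(name);
    if (!rec || rec.type !== "CNAME") return name;
    name = rec.target;
  }
  throw new Error("CNAME chain too long");
}

console.log(resolveCname("dev.example.com")); // d1234.cloudfront.net

// Switching CDNs: update one internal record; the public record is unchanged.
records.set("dev.internal.example.com", { type: "CNAME", target: "dev.other-cdn.example.net" });
console.log(resolveCname("dev.example.com")); // dev.other-cdn.example.net
```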

After deploying a new version of your application, users continue to see the old version for up to 24 hours despite the S3 objects being updated. What is the most likely cause and fix?


The CloudFront distribution serves static assets from S3. The cache behavior caches responses for 86400 seconds (24 hours). Assets are updated by overwriting S3 objects with the same keys.

  • A. CloudFront is not detecting S3 object updates automatically
    Incorrect. CloudFront does not poll S3 for updates. It holds the cached version until the TTL expires. This is expected behavior, not a failure.
  • B. The TTL is too long: cached responses are served until the 24-hour TTL expires, requiring a cache invalidation or versioned asset paths
    Correct! When CloudFront caches a response, it serves that cached version for the full TTL regardless of origin changes. There are two fixes. (1) Issue a CloudFront invalidation after deployment; the first 1,000 paths per month are free, then it costs $0.005 per path. (2) Better: use versioned asset paths by including a content hash in the filename (logo.abc123.png). New deployments create new filenames, and old cache entries remain valid. Users get fresh assets without invalidations, provided the HTML that references them is served with a short TTL or invalidated.
  • C. S3 object replication lag is causing the CDN to serve stale content
    Incorrect. S3 provides strong read-after-write consistency: a GET after a successful PUT returns the new object. The issue is CloudFront's cache, not S3 consistency.
  • D. CloudFront needs to be redeployed after each S3 update
    Incorrect. CloudFront distributions do not need redeployment for content updates. The distribution configuration is separate from cached content.

Hint: Think about what TTL means for the relationship between origin updates and what users receive.
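The versioned-asset-path fix from option B is usually a build step that renames each file after its content. This is a minimal Node.js sketch. It assumes a SHA-256 digest truncated to 8 hex characters; build tools vary in hash choice and length. The function and constant names are hypothetical, and the header values are suggestions rather than required settings.

```javascript
const crypto = require("crypto");
const path = require("path");

// Insert a short content hash before the extension: logo.png -> logo.<hash>.png
function hashedFilename(filename, contents) {
  const hash = crypto.createHash("sha256").update(contents).digest("hex").slice(0, 8);
  const ext = path.extname(filename); // ".png"
  const base = filename.slice(0, filename.length - ext.length);
  return `${base}.${hash}${ext}`;
}

// Suggested Cache-Control values: fingerprinted files can be cached for a long time
// because their URL changes whenever their content does; the HTML that references
// them cannot, so it gets a short TTL.
const CACHE_HEADERS = {
  fingerprinted: "public, max-age=31536000, immutable",
  html: "public, max-age=60",
};

console.log(hashedFilename("logo.png", "example bytes"));
```

Old and new filenames coexist in the cache, so a user holding the previous HTML still loads matching assets during the rollout.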