LFU Caching: Frequency Buckets, the Cold Start Problem, and When to Choose It Over LRU
LFU evicts the least frequently accessed item, not the least recently accessed one. The difference matters for stable, long-lived access patterns where popularity is a better predictor than recency. The challenge is implementing it efficiently and handling new items that haven't had time to build frequency.
The core idea: access frequency as a proxy for future value
LRU evicts the item that was accessed least recently. LFU evicts the item that has been accessed least often. For some access patterns, these decisions diverge significantly.
LRU assumes recency predicts future use — the item you just accessed is likely to be accessed again soon. LFU assumes frequency predicts future use — the item accessed 1,000 times is likely to be accessed again, even if the last access was an hour ago.
For content platforms with stable popularity (evergreen articles, popular product pages, reference documentation), an item accessed 500 times over the last week is more valuable to keep than one accessed twice in the last minute. LRU doesn't capture this. LFU does.
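The divergence is easy to see on a toy trace. The sketch below (item names are illustrative) computes each policy's eviction candidate for the same access sequence: LFU picks the rarely used item, LRU picks the hot but not-recently-touched one.

```python
from collections import Counter, OrderedDict

# Illustrative trace: "a" is historically hot, "b" was touched once, just now.
trace = ["a", "a", "a", "a", "a", "b"]

# LFU's eviction candidate: lowest total frequency.
freq = Counter(trace)
lfu_victim = min(freq, key=freq.get)   # "b", barely used

# LRU's eviction candidate: least recently touched.
recency = OrderedDict()
for item in trace:
    recency.pop(item, None)
    recency[item] = True               # move to most-recent end
lru_victim = next(iter(recency))       # "a", hot but not recent
```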
LFU mechanics and the cold start problem
LFU tracks an access frequency counter per cached item. On eviction, the item with the lowest frequency is removed. Ties are broken by recency (the least recently accessed among minimum-frequency items). New items start at frequency 1 — making them immediately vulnerable to eviction before they can accumulate frequency.
Prerequisites
- hash maps
- heap data structures
- doubly linked lists
Key Points
- Each item has a frequency counter incremented on every access (get or put).
- Eviction targets the item with the minimum frequency. Ties broken by LRU within that frequency.
- Cold start: new items enter at frequency=1. If cache is under pressure, they evict before building any history.
- Naïve heap implementation is O(log n) per operation. Frequency bucket (doubly linked list per frequency) achieves O(1).
The O(log n) implementation: heap with lazy deletion
A min-heap keyed on frequency gives O(log n) get and put. Lazy deletion handles the case where an item's frequency in the heap is stale (the item was accessed and its counter incremented, but the old heap entry was never removed).
```python
import heapq

class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = {}       # key → value
        self.freq = {}       # key → access count
        self.min_heap = []   # (frequency, insertion_order, key)
        self.counter = 0     # tie-breaking for same frequency

    def get(self, key):
        if key not in self.data:
            return -1
        self.freq[key] += 1
        self.counter += 1
        heapq.heappush(self.min_heap, (self.freq[key], self.counter, key))
        return self.data[key]

    def put(self, key, value) -> None:
        if self.capacity == 0:
            return
        if key in self.data:
            self.data[key] = value
            self.get(key)  # reuse get to update frequency
        else:
            if len(self.data) >= self.capacity:
                self._evict()
            self.data[key] = value
            self.freq[key] = 1
            self.counter += 1
            heapq.heappush(self.min_heap, (1, self.counter, key))

    def _evict(self):
        while self.min_heap:
            freq, _, evict_key = heapq.heappop(self.min_heap)
            # Lazy deletion: skip stale entries
            if evict_key in self.data and freq == self.freq[evict_key]:
                del self.data[evict_key]
                del self.freq[evict_key]
                return
```
The insertion order counter breaks ties between items with equal frequency — the oldest insertion among minimum-frequency items evicts first (LRU tiebreak).
The heap accumulates stale entries as items are accessed: every access pushes a new entry without removing the old one, so heap size grows with the total number of accesses since stale entries were last drained, not with the number of cached items. This matters for long-running caches.
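The lazy-deletion step can be seen in isolation with a bare heap. In this sketch, `live` plays the role of `self.freq` above: item "a" was pushed at frequency 1, then accessed (frequency 2) and re-pushed, leaving a stale `(1, "a")` entry at the top of the heap.

```python
import heapq

heap = [(1, 0, "a"), (1, 1, "b"), (2, 2, "a")]  # (freq, order, key)
heapq.heapify(heap)
live = {"a": 2, "b": 1}  # current true frequencies

def evict(heap, live):
    while heap:
        freq, _, key = heapq.heappop(heap)
        if key in live and freq == live[key]:  # stale entries fail this check
            del live[key]
            return key
    return None

victim = evict(heap, live)  # skips stale (1, "a"), evicts "b"
```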
The O(1) implementation: frequency buckets
The key insight: you don't need a sorted heap. You only need to find and remove the item with the minimum frequency. Maintain a doubly linked list per frequency value, and track the current minimum frequency.
```python
class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.freq = 1
        self.prev = None
        self.next = None

class DoublyLinkedList:
    """LRU list within a single frequency bucket."""
    def __init__(self):
        self.head = Node(None, None)  # dummy head
        self.tail = Node(None, None)  # dummy tail
        self.head.next = self.tail
        self.tail.prev = self.head
        self.size = 0

    def append(self, node):
        """Add to tail (most recently used within this frequency)."""
        node.prev = self.tail.prev
        node.next = self.tail
        self.tail.prev.next = node
        self.tail.prev = node
        self.size += 1

    def pop(self, node=None):
        """Remove specific node, or head (LRU) if none specified."""
        if self.size == 0:
            return None
        if not node:
            node = self.head.next
        node.prev.next = node.next
        node.next.prev = node.prev
        self.size -= 1
        return node

class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.size = 0
        self.min_freq = 0
        self.key_to_node = {}   # key → Node
        self.freq_to_list = {}  # frequency → DoublyLinkedList

    def _update(self, node):
        freq = node.freq
        self.freq_to_list[freq].pop(node)
        if self.freq_to_list[freq].size == 0:
            del self.freq_to_list[freq]
            if self.min_freq == freq:
                self.min_freq += 1  # min_freq can only increase by 1 on access
        node.freq += 1
        self.freq_to_list.setdefault(node.freq, DoublyLinkedList()).append(node)

    def get(self, key: int) -> int:
        if key not in self.key_to_node:
            return -1
        node = self.key_to_node[key]
        self._update(node)
        return node.value

    def put(self, key: int, value: int) -> None:
        if self.capacity == 0:
            return
        if key in self.key_to_node:
            self.key_to_node[key].value = value
            self._update(self.key_to_node[key])
        else:
            if self.size >= self.capacity:
                lfu_list = self.freq_to_list[self.min_freq]
                evicted = lfu_list.pop()  # LRU within min_freq bucket
                del self.key_to_node[evicted.key]
                self.size -= 1
            new_node = Node(key, value)
            self.key_to_node[key] = new_node
            self.freq_to_list.setdefault(1, DoublyLinkedList()).append(new_node)
            self.min_freq = 1  # new items always start at freq=1
            self.size += 1
```
Why min_freq only increments by 1 on access: when you access a node with frequency f, it moves to frequency f+1. If the f bucket is now empty and min_freq == f, the new minimum is f+1. New inserts always set min_freq = 1, so it never needs to scan.
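The bucket mechanics can be exercised with a toy trace. This is a minimal sketch of the same idea using plain deques per frequency instead of the linked lists above; the `put` helper and its behavior on this trace are illustrative, not a drop-in replacement for the class.

```python
from collections import deque

capacity, min_freq = 2, 0
freq, buckets = {}, {}

def put(key):
    global min_freq
    if key in freq:                        # existing key: promote one bucket up
        f = freq[key]
        buckets[f].remove(key)
        if not buckets[f]:
            del buckets[f]
            if min_freq == f:
                min_freq = f + 1
        freq[key] = f + 1
        buckets.setdefault(f + 1, deque()).append(key)
        return
    if len(freq) >= capacity:              # evict LRU item in min-freq bucket
        victim = buckets[min_freq].popleft()
        if not buckets[min_freq]:
            del buckets[min_freq]
        del freq[victim]
    freq[key] = 1
    buckets.setdefault(1, deque()).append(key)
    min_freq = 1                           # a new item is always the minimum

put("a"); put("a")   # "a" reaches freq 2
put("b")             # "b" enters at freq 1
put("c")             # cache full: evicts "b" (min freq), keeps hot "a"
```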
The cold start problem and frequency decay
New items enter the cache at frequency=1 and are immediately at risk of eviction under cache pressure. This creates a cold start problem: a genuinely popular item that was just added competes for cache space against stale items that accumulated high frequency long ago.
Example: a news article from 2 years ago has frequency=10,000 from historical traffic. A breaking news article gets frequency=5 in its first 30 seconds but will attract 50,000 views today. LFU evicts the new article.
Frequency decay addresses this by periodically halving all frequency counters (or applying an exponential decay):
```python
import time

class DecayingLFUCache(LFUCache):
    def __init__(self, capacity, decay_interval=3600):
        super().__init__(capacity)
        self.decay_interval = decay_interval
        self.last_decay = time.time()

    def _maybe_decay(self):
        """Call at the top of get/put to age out historical counts."""
        if time.time() - self.last_decay > self.decay_interval:
            # Halve all frequencies — requires rebuilding freq_to_list
            # In practice, this is expensive; use approximate decay
            for node in self.key_to_node.values():
                node.freq = max(1, node.freq // 2)
            self._rebuild_freq_lists()  # re-bucket nodes, reset min_freq (not shown)
            self.last_decay = time.time()
```
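The rebuild step after halving amounts to regrouping every key under its new frequency and recomputing the minimum. A standalone sketch of that regrouping (using plain lists instead of the doubly linked lists, with a hypothetical `rebuild_freq_lists` name) looks like:

```python
def rebuild_freq_lists(key_to_freq):
    """Regroup keys by their (already-halved) frequency; return buckets and new min."""
    buckets = {}
    for key, f in key_to_freq.items():
        buckets.setdefault(f, []).append(key)
    min_freq = min(buckets) if buckets else 0
    return buckets, min_freq

buckets, min_freq = rebuild_freq_lists({"a": 5, "b": 1, "c": 5})
# buckets == {5: ["a", "c"], 1: ["b"]}, min_freq == 1
```

This pass is O(n), which is why the decay interval matters: decaying too often turns every cache operation's amortized cost from O(1) into something worse.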
Redis uses a variant of this in its allkeys-lfu eviction policy: a logarithmic counter with time-based decay. The counter is capped at 255 and decremented by a configurable decay factor when the key hasn't been accessed for a while.
```
# redis.conf
maxmemory-policy allkeys-lfu
lfu-decay-time 1      # minutes of inactivity before the counter is decremented
lfu-log-factor 10     # steepness of the probabilistic counter (higher = slower growth)
```
LFU vs LRU: when the choice matters
LFU outperforms LRU when:
- Stable, long-lived popularity: product pages, reference articles, static assets with consistent traffic. The most popular items stay popular for days.
- Bursty access patterns with a stable hot set: periodic spikes on the same content don't interfere with baseline popular items the way they do in LRU.
LRU outperforms LFU when:
- Scan patterns under decay: plain LFU is actually fairly scan-resistant — a one-time sequential scan's items all enter at frequency 1 and mostly evict each other, leaving the hot set intact. But once frequency decay is enabled (as it usually is in practice), a large scan during a low-traffic window can displace popular items whose decayed counters have drifted down toward the scan items' counts. LRU recovers from a scan quickly either way, since scanned items age out as soon as normal traffic resumes.
- Recency-driven access: session data, user feeds, real-time event streams — the freshest data is the most valuable regardless of historical frequency.
- Cold start sensitivity: systems with frequent new item introductions where waiting for frequency to build isn't acceptable.
In practice, LRU is used far more often because workloads are more often recency-driven than frequency-driven, and LRU's implementation is simpler to reason about.
Quiz (medium): A CDN caches static assets using LFU. A large batch job runs nightly and reads 10,000 rarely-accessed archive files. The next morning, popular landing page assets start getting cache misses. What is happening and how would you fix it?

Scenario: The CDN cache has capacity for 8,000 items. Before the batch job, popular landing page assets had accumulated high frequency (freq=500+). Archive files are accessed once during the batch job.

Hint: Think about what min_freq looks like after 8,000 insertions of items with freq=1, and what gets evicted next.

A) The batch job filled the cache, but LFU correctly retained the high-frequency landing page assets.

Incorrect. LFU should retain high-frequency items. But the batch job reads 10,000 files into a cache with 8,000 capacity. The evictions during ingestion of 10,000 items with freq=1 each will displace some items — but the key insight is that the 10,000 new items all enter at freq=1, potentially evicting the landing page assets if the minimum frequency cluster is large enough.

B) The 10,000 archive files each got freq=1, but with 10,000 new items and 8,000 capacity, LFU evicted items during ingestion — including popular assets if the batch job ran during a low-traffic window when their frequencies hadn't been refreshed recently.

Correct. During the batch job, 10,000 items need to enter the 8,000-item cache. Each insertion evicts the current min-frequency item. If the batch runs during off-peak hours, popular assets haven't been recently accessed, so their counters haven't been refreshed. The min_freq at insertion time might equal some popular items' frequencies. Fix: exclude batch job traffic from cache warming (bypass the cache for the batch), or use frequency decay to prevent historical counts from becoming stale anchors.

C) LFU is not designed to handle more items than cache capacity.

Incorrect. LFU handles eviction of items beyond capacity — that's its core function. The issue is the specific eviction decisions made when 10,000 items with freq=1 are inserted.

D) The batch job should have used a higher-priority cache tier.

Incorrect. This might be a reasonable operational fix, but it doesn't explain what happened. Understanding the eviction mechanics is the first step.