LFU Caching: Frequency Buckets, the Cold Start Problem, and When to Choose It Over LRU
LFU evicts the least frequently accessed item, not the least recently accessed one. The difference matters for stable, long-lived access patterns where popularity is a better predictor than recency. The challenge is implementing it efficiently and handling new items that haven't had time to build frequency.
The core idea: access frequency as a proxy for future value
LRU evicts the item that was accessed least recently. LFU evicts the item that has been accessed least often. For some access patterns, these decisions diverge significantly.
LRU assumes recency predicts future use — the item you just accessed is likely to be accessed again soon. LFU assumes frequency predicts future use — the item accessed 1,000 times is likely to be accessed again, even if the last access was an hour ago.
For content platforms with stable popularity (evergreen articles, popular product pages, reference documentation), an item accessed 500 times over the last week is more valuable to keep than one accessed twice in the last minute. LRU doesn't capture this. LFU does.
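The divergence is easy to see on a toy trace. The sketch below (item names are illustrative) computes each policy's eviction candidate for the same access sequence: LFU picks the rarely used item, LRU picks the hot but not-recently-touched one.

```python
from collections import Counter, OrderedDict

# Illustrative trace: "a" is historically hot, "b" was touched once, just now.
trace = ["a", "a", "a", "a", "a", "b"]

# LFU's eviction candidate: lowest total frequency.
freq = Counter(trace)
lfu_victim = min(freq, key=freq.get)   # "b", barely used

# LRU's eviction candidate: least recently touched.
recency = OrderedDict()
for item in trace:
    recency.pop(item, None)
    recency[item] = True               # move to most-recent end
lru_victim = next(iter(recency))       # "a", hot but not recent
```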
LFU mechanics and the cold start problem
LFU tracks an access frequency counter per cached item. On eviction, the item with the lowest frequency is removed. Ties are broken by recency (the least recently accessed among minimum-frequency items). New items start at frequency 1 — making them immediately vulnerable to eviction before they can accumulate frequency.
Prerequisites
- hash maps
- heap data structures
- doubly linked lists
Key Points
- Each item has a frequency counter incremented on every access (get or put).
- Eviction targets the item with the minimum frequency. Ties broken by LRU within that frequency.
- Cold start: new items enter at frequency=1. If cache is under pressure, they evict before building any history.
- Naïve heap implementation is O(log n) per operation. Frequency bucket (doubly linked list per frequency) achieves O(1).
The O(log n) implementation: heap with lazy deletion
A min-heap keyed on frequency gives O(log n) get and put. Lazy deletion handles the case where an item's frequency in the heap is stale (the item was accessed and its counter incremented, but the old heap entry was never removed).
```python
import heapq

class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = {}       # key → value
        self.freq = {}       # key → access count
        self.min_heap = []   # (frequency, insertion_order, key)
        self.counter = 0     # tie-breaking for same frequency

    def get(self, key):
        if key not in self.data:
            return -1
        self.freq[key] += 1
        self.counter += 1
        heapq.heappush(self.min_heap, (self.freq[key], self.counter, key))
        return self.data[key]

    def put(self, key, value) -> None:
        if self.capacity == 0:
            return
        if key in self.data:
            self.data[key] = value
            self.get(key)  # reuse get to update frequency
        else:
            if len(self.data) >= self.capacity:
                self._evict()
            self.data[key] = value
            self.freq[key] = 1
            self.counter += 1
            heapq.heappush(self.min_heap, (1, self.counter, key))

    def _evict(self):
        while self.min_heap:
            freq, _, evict_key = heapq.heappop(self.min_heap)
            # Lazy deletion: skip stale entries
            if evict_key in self.data and freq == self.freq[evict_key]:
                del self.data[evict_key]
                del self.freq[evict_key]
                return
```
The insertion order counter breaks ties between items with equal frequency — the oldest insertion among minimum-frequency items evicts first (LRU tiebreak).
The heap accumulates stale entries as items are accessed: every access pushes a new entry without removing the old one, so heap size grows with the total number of accesses since stale entries were last drained, not with the number of cached items. This matters for long-running caches.
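The lazy-deletion step can be seen in isolation with a bare heap. In this sketch, `live` plays the role of `self.freq` above: item "a" was pushed at frequency 1, then accessed (frequency 2) and re-pushed, leaving a stale `(1, "a")` entry at the top of the heap.

```python
import heapq

heap = [(1, 0, "a"), (1, 1, "b"), (2, 2, "a")]  # (freq, order, key)
heapq.heapify(heap)
live = {"a": 2, "b": 1}  # current true frequencies

def evict(heap, live):
    while heap:
        freq, _, key = heapq.heappop(heap)
        if key in live and freq == live[key]:  # stale entries fail this check
            del live[key]
            return key
    return None

victim = evict(heap, live)  # skips stale (1, "a"), evicts "b"
```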
The O(1) implementation: frequency buckets
The key insight: you don't need a sorted heap. You only need to find and remove the item with the minimum frequency. Maintain a doubly linked list per frequency value, and track the current minimum frequency.
```python
class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.freq = 1
        self.prev = None
        self.next = None

class DoublyLinkedList:
    """LRU list within a single frequency bucket."""
    def __init__(self):
        self.head = Node(None, None)  # dummy head
        self.tail = Node(None, None)  # dummy tail
        self.head.next = self.tail
        self.tail.prev = self.head
        self.size = 0

    def append(self, node):
        """Add to tail (most recently used within this frequency)."""
        node.prev = self.tail.prev
        node.next = self.tail
        self.tail.prev.next = node
        self.tail.prev = node
        self.size += 1

    def pop(self, node=None):
        """Remove specific node, or head (LRU) if none specified."""
        if self.size == 0:
            return None
        if not node:
            node = self.head.next
        node.prev.next = node.next
        node.next.prev = node.prev
        self.size -= 1
        return node

class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.size = 0
        self.min_freq = 0
        self.key_to_node = {}   # key → Node
        self.freq_to_list = {}  # frequency → DoublyLinkedList

    def _update(self, node):
        freq = node.freq
        self.freq_to_list[freq].pop(node)
        if self.freq_to_list[freq].size == 0:
            del self.freq_to_list[freq]
            if self.min_freq == freq:
                self.min_freq += 1  # min_freq can only increase by 1 on access
        node.freq += 1
        self.freq_to_list.setdefault(node.freq, DoublyLinkedList()).append(node)

    def get(self, key: int) -> int:
        if key not in self.key_to_node:
            return -1
        node = self.key_to_node[key]
        self._update(node)
        return node.value

    def put(self, key: int, value: int) -> None:
        if self.capacity == 0:
            return
        if key in self.key_to_node:
            self.key_to_node[key].value = value
            self._update(self.key_to_node[key])
        else:
            if self.size >= self.capacity:
                lfu_list = self.freq_to_list[self.min_freq]
                evicted = lfu_list.pop()  # LRU within min_freq bucket
                del self.key_to_node[evicted.key]
                self.size -= 1
            new_node = Node(key, value)
            self.key_to_node[key] = new_node
            self.freq_to_list.setdefault(1, DoublyLinkedList()).append(new_node)
            self.min_freq = 1  # new items always start at freq=1
            self.size += 1
```
Why min_freq only increments by 1 on access: when you access a node with frequency f, it moves to frequency f+1. If the f bucket is now empty and min_freq == f, the new minimum is f+1. New inserts always set min_freq = 1, so it never needs to scan.
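The bucket mechanics can be exercised with a toy trace. This is a minimal sketch of the same idea using plain deques per frequency instead of the linked lists above; the `put` helper and its behavior on this trace are illustrative, not a drop-in replacement for the class.

```python
from collections import deque

capacity, min_freq = 2, 0
freq, buckets = {}, {}

def put(key):
    global min_freq
    if key in freq:                        # existing key: promote one bucket up
        f = freq[key]
        buckets[f].remove(key)
        if not buckets[f]:
            del buckets[f]
            if min_freq == f:
                min_freq = f + 1
        freq[key] = f + 1
        buckets.setdefault(f + 1, deque()).append(key)
        return
    if len(freq) >= capacity:              # evict LRU item in min-freq bucket
        victim = buckets[min_freq].popleft()
        if not buckets[min_freq]:
            del buckets[min_freq]
        del freq[victim]
    freq[key] = 1
    buckets.setdefault(1, deque()).append(key)
    min_freq = 1                           # a new item is always the minimum

put("a"); put("a")   # "a" reaches freq 2
put("b")             # "b" enters at freq 1
put("c")             # cache full: evicts "b" (min freq), keeps hot "a"
```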
The cold start problem and frequency decay
New items enter the cache at frequency=1 and are immediately at risk of eviction under cache pressure. This creates a cold start problem: a genuinely popular item that was just added competes for cache space against stale items that accumulated high frequency long ago.
Example: a news article from 2 years ago has frequency=10,000 from historical traffic. A breaking news article gets frequency=5 in its first 30 seconds but will attract 50,000 views today. LFU evicts the new article.
Frequency decay addresses this by periodically halving all frequency counters (or applying an exponential decay):
```python
import time

class DecayingLFUCache(LFUCache):
    def __init__(self, capacity, decay_interval=3600):
        super().__init__(capacity)
        self.decay_interval = decay_interval
        self.last_decay = time.time()

    def _maybe_decay(self):
        """Call at the top of get/put to age out historical counts."""
        if time.time() - self.last_decay > self.decay_interval:
            # Halve all frequencies — requires rebuilding freq_to_list
            # In practice, this is expensive; use approximate decay
            for node in self.key_to_node.values():
                node.freq = max(1, node.freq // 2)
            self._rebuild_freq_lists()  # re-bucket nodes, reset min_freq (not shown)
            self.last_decay = time.time()
```
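The rebuild step after halving amounts to regrouping every key under its new frequency and recomputing the minimum. A standalone sketch of that regrouping (using plain lists instead of the doubly linked lists, with a hypothetical `rebuild_freq_lists` name) looks like:

```python
def rebuild_freq_lists(key_to_freq):
    """Regroup keys by their (already-halved) frequency; return buckets and new min."""
    buckets = {}
    for key, f in key_to_freq.items():
        buckets.setdefault(f, []).append(key)
    min_freq = min(buckets) if buckets else 0
    return buckets, min_freq

buckets, min_freq = rebuild_freq_lists({"a": 5, "b": 1, "c": 5})
# buckets == {5: ["a", "c"], 1: ["b"]}, min_freq == 1
```

This pass is O(n), which is why the decay interval matters: decaying too often turns every cache operation's amortized cost from O(1) into something worse.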
Redis uses a variant of this in its allkeys-lfu eviction policy: a logarithmic counter with time-based decay. The counter is capped at 255 and decremented by a configurable decay factor when the key hasn't been accessed for a while.
```
# redis.conf
maxmemory-policy allkeys-lfu
lfu-decay-time 1      # minutes of inactivity before the counter is decremented
lfu-log-factor 10     # steepness of the probabilistic counter (higher = slower growth)
```
LFU vs LRU: when the choice matters
LFU outperforms LRU when:
- Stable, long-lived popularity: product pages, reference articles, static assets with consistent traffic. The most popular items stay popular for days.
- Bursty access patterns with a stable hot set: periodic spikes on the same content don't interfere with baseline popular items the way they do in LRU.
LRU outperforms LFU when:
- Scan patterns under decay: plain LFU is actually fairly scan-resistant — a one-time sequential scan's items all enter at frequency 1 and mostly evict each other, leaving the hot set intact. But once frequency decay is enabled (as it usually is in practice), a large scan during a low-traffic window can displace popular items whose decayed counters have drifted down toward the scan items' counts. LRU recovers from a scan quickly either way, since scanned items age out as soon as normal traffic resumes.
- Recency-driven access: session data, user feeds, real-time event streams — the freshest data is the most valuable regardless of historical frequency.
- Cold start sensitivity: systems with frequent new item introductions where waiting for frequency to build isn't acceptable.
In practice, LRU is used far more often because workloads are more often recency-driven than frequency-driven, and LRU's implementation is simpler to reason about.
Quiz (medium): A CDN caches static assets using LFU. A large batch job runs nightly and reads 10,000 rarely-accessed archive files. The next morning, popular landing page assets start getting cache misses. What is happening and how would you fix it?

Scenario: The CDN cache has capacity for 8,000 items. Before the batch job, popular landing page assets had accumulated high frequency (freq=500+). Archive files are accessed once during the batch job.

Hint: Think about what min_freq looks like after 8,000 insertions of items with freq=1, and what gets evicted next.

A) The batch job filled the cache, but LFU correctly retained the high-frequency landing page assets.

Incorrect. LFU should retain high-frequency items. But the batch job reads 10,000 files into a cache with 8,000 capacity. The evictions during ingestion of 10,000 items with freq=1 each will displace some items — but the key insight is that the 10,000 new items all enter at freq=1, potentially evicting the landing page assets if the minimum frequency cluster is large enough.

B) The 10,000 archive files each got freq=1, but with 10,000 new items and 8,000 capacity, LFU evicted items during ingestion — including popular assets if the batch job ran during a low-traffic window when their frequencies hadn't been refreshed recently.

Correct. During the batch job, 10,000 items need to enter the 8,000-item cache. Each insertion evicts the current min-frequency item. If the batch runs during off-peak hours, popular assets haven't been recently accessed, so their counters haven't been refreshed. The min_freq at insertion time might equal some popular items' frequencies. Fix: exclude batch job traffic from cache warming (bypass the cache for the batch), or use frequency decay to prevent historical counts from becoming stale anchors.

C) LFU is not designed to handle more items than cache capacity.

Incorrect. LFU handles eviction of items beyond capacity — that's its core function. The issue is the specific eviction decisions made when 10,000 items with freq=1 are inserted.

D) The batch job should have used a higher-priority cache tier.

Incorrect. This might be a reasonable operational fix, but it doesn't explain what happened. Understanding the eviction mechanics is the first step.