LRU-K: Evicting Based on K-th Most Recent Access for Scan Resistance
LRU-K evicts the page whose K-th most recent access is oldest, rather than the page whose single most recent access is oldest. This makes it resistant to sequential-scan poisoning: a page accessed only once has no K-th most recent access (an 'infinite' backward K-distance) and is evicted before any page that has been accessed K times. PostgreSQL's buffer manager achieves a similar scan-resistance effect with its clock-sweep usage counts.
How LRU-K differs from LRU
Standard LRU uses the single most recent access time to rank pages for eviction. LRU-K uses the K-th most recent access time:
LRU (K=1): evict the page whose last access is oldest
LRU-2: evict the page whose 2nd-to-last access is oldest
LRU-3: evict the page whose 3rd-to-last access is oldest
Pages with fewer than K total accesses are assigned an "infinite" backward K-th reference distance — they're evicted first, before any page that has been accessed K or more times. This is the scan resistance property: a sequential scan page accessed once has infinite K-th reference distance and is always evicted before a hot page accessed many times.
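The ranking difference shows up on even a tiny trace. Here is a minimal sketch (hypothetical page names and logical timestamps, not part of the implementation below) comparing the two victim choices:

```python
# Access histories: page -> access times, most recent last (logical timestamps).
history = {
    "hot":  [1, 5, 9],   # reused repeatedly
    "warm": [2, 8],      # reused once
    "scan": [10],        # touched once by a sequential scan, very recently
}

def lru_victim(hist):
    # Plain LRU: evict the page whose most recent access is oldest.
    return min(hist, key=lambda p: hist[p][-1])

def lru_k_victim(hist, k=2):
    # LRU-K: rank by the K-th most recent access; pages with fewer than
    # K accesses rank as infinitely old and are evicted first.
    def kth(p):
        times = hist[p]
        return times[-k] if len(times) >= k else float("-inf")
    return min(hist, key=kth)

print(lru_victim(history))    # warm: LRU lets the fresh scan page displace a reused page
print(lru_k_victim(history))  # scan: LRU-2 evicts the one-touch scan page instead
```

Under plain LRU the scan page looks "hottest" because it was touched last; under LRU-2 it has no 2nd-to-last access and loses to every reused page.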
Pages with fewer than K accesses are always evicted first — they go to a candidate pool with FIFO eviction
LRU-K separates pages into two groups: promoted pages (accessed ≥ K times) and candidate pages (accessed < K times). Pages in the candidate pool are evicted FIFO — first in, first out, regardless of recency. Only after a page has been accessed K times does it graduate to the promoted group and receive LRU-K treatment. This ensures scan pages (accessed once) never displace frequently accessed pages.
Prerequisites
- LRU cache
- Database buffer pool
- Sequential scan access patterns
Key Points
- Pages with <K accesses go to a candidate/history pool with FIFO eviction.
- Pages with ≥K accesses are ranked by their K-th most recent access time.
- PostgreSQL's clock-sweep usage counts approximate this idea: a page must be accessed more than once before it survives multiple sweep passes.
- K=1 degenerates to standard LRU. K=2 (LRU-2) is the most common production value.
Implementation with candidate pool
```python
import heapq
import time

class LRU_K_Cache:
    def __init__(self, capacity, K):
        self.capacity = capacity
        self.K = K
        self.cache = {}           # key -> list of the K most recent access times
        self.heap = []            # min-heap of (K-th access time, key); may hold stale entries
        self.candidate_pool = {}  # pages with fewer than K accesses, insertion order = FIFO

    def _now(self):
        return time.monotonic()

    def access(self, key, value=None):
        # `value` is accepted for API symmetry; a real cache would store it too.
        now = self._now()
        if key in self.cache:
            # Already promoted: update the access history.
            self.cache[key].append(now)
            if len(self.cache[key]) > self.K:
                self.cache[key].pop(0)
            # Push the new K-th access time; stale heap entries are
            # skipped lazily in _evict().
            heapq.heappush(self.heap, (self.cache[key][0], key))
        elif key in self.candidate_pool:
            # In the candidate pool: record another access.
            self.candidate_pool[key].append(now)
            if len(self.candidate_pool[key]) >= self.K:
                # K-th access reached: promote to the main cache.
                # (Total size is unchanged, so no eviction check is needed.)
                self.cache[key] = self.candidate_pool.pop(key)
                heapq.heappush(self.heap, (self.cache[key][0], key))
        else:
            # New page: add to the candidate pool.
            self.candidate_pool[key] = [now]
            if len(self.cache) + len(self.candidate_pool) > self.capacity:
                self._evict()

    def _evict(self):
        # Prefer evicting from the candidate pool (pages with < K accesses).
        if self.candidate_pool:
            # FIFO: dicts preserve insertion order, so the first key is the oldest.
            key = next(iter(self.candidate_pool))
            del self.candidate_pool[key]
            return
        # Fall back to evicting the promoted page with the oldest
        # K-th access time, skipping stale heap entries.
        while self.heap:
            kth_time, key = heapq.heappop(self.heap)
            if key in self.cache and self.cache[key][0] == kth_time:
                del self.cache[key]
                return
```
LRU-K in practice: PostgreSQL clock-sweep
PostgreSQL's buffer manager uses a clock-sweep algorithm. It is not literal LRU-2, but its usage counts deliver scan resistance in the same spirit. Each buffer has a usage count:
- On first access: usage count = 1
- On subsequent accesses: usage count incremented (max 5)
- Clock sweep: scan buffers, decrement usage count by 1; evict when count reaches 0
A page accessed once has count 1 and is evicted after one clock sweep pass. A page accessed 5 times requires 5 passes to evict — it survives scan rounds much longer. Sequential scan pages immediately have count 1 and are the first to be evicted on the next sweep.
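The sweep behavior above can be simulated in a few lines. This is a minimal sketch of a clock with usage counts capped at 5, not PostgreSQL's actual code (which also uses ring buffers for large scans):

```python
MAX_USAGE = 5  # PostgreSQL caps usage counts at 5

class ClockSweep:
    def __init__(self, nbuffers):
        self.pages = [None] * nbuffers   # page id in each buffer slot
        self.usage = [0] * nbuffers      # usage count per slot
        self.hand = 0                    # clock hand position

    def access(self, page):
        # Hit: bump the usage count (capped).
        for i, p in enumerate(self.pages):
            if p == page:
                self.usage[i] = min(self.usage[i] + 1, MAX_USAGE)
                return
        # Miss: sweep, decrementing counts, until a zero-count slot is found.
        while True:
            if self.pages[self.hand] is None or self.usage[self.hand] == 0:
                self.pages[self.hand] = page
                self.usage[self.hand] = 1   # first access
                self.hand = (self.hand + 1) % len(self.pages)
                return
            self.usage[self.hand] -= 1
            self.hand = (self.hand + 1) % len(self.pages)

pool = ClockSweep(3)
for _ in range(5):
    pool.access("hot")     # 'hot' reaches the max usage count of 5
pool.access("warm")
pool.access("warm")        # 'warm' has usage count 2
pool.access("scan1")       # one-touch scan page, usage count 1; pool is now full
pool.access("scan2")       # sweep evicts 'scan1' (count 1), keeping 'hot' and 'warm'
print(pool.pages)          # ['hot', 'warm', 'scan2']
```

The one-touch scan page hits zero first and is replaced, while the count-5 page survives several full sweeps.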
```sql
-- Background-writer statistics for the buffer pool
SELECT buffers_clean, buffers_alloc, maxwritten_clean
FROM pg_stat_bgwriter;

-- See which relations are most cached
-- (requires the pg_buffercache extension: CREATE EXTENSION pg_buffercache)
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 20;
```
LRU-K vs ARC: when to use each
LRU-K and ARC both improve on standard LRU for mixed workloads, but they solve different problems:
| | LRU-K | ARC |
|---|---|---|
| Mechanism | K-th access time eviction | Adaptive T1/T2 partitions |
| Ghost lists | No | Yes (B1, B2) |
| Self-tuning | No (K is fixed) | Yes (adjusts T1/T2 ratio) |
| Scan resistance | Strong (candidate pool) | Moderate |
| Memory overhead | Low | Higher (ghost lists) |
| Implementation complexity | Moderate | High |
Use LRU-2 when you have predictable access patterns and need scan resistance (database buffer pools). Use ARC when access patterns change dynamically and you want the cache to adapt automatically (ZFS uses ARC for this reason).
You configure LRU-K with K=2 and cache capacity 5. A page P is accessed once and enters the candidate pool. Then P is accessed 3 more times (total 4 accesses). Another page Q enters the candidate pool with 1 access and the cache is full. What gets evicted?
Difficulty: medium. Setup: K=2, cache capacity = 5. P has been accessed 4 times total and was promoted to the main cache after its 2nd access. Q has been accessed once and is in the candidate pool.

A. P gets evicted because it has the oldest 2nd-most-recent access time among promoted pages.
Incorrect. P's 2nd-most-recent access time may or may not be the oldest. But Q hasn't been accessed K=2 times yet — it's still in the candidate pool and gets priority for eviction.

B. Q gets evicted because it has fewer than K=2 accesses and sits in the candidate pool — candidate-pool pages are evicted before promoted pages.
Correct! LRU-K's candidate pool (for pages with < K accesses) is drained before evicting from the main cache. Q has only 1 access and hasn't reached K=2, so it stays in the candidate pool with FIFO treatment. When space is needed, Q is evicted first — before any promoted page (including P, which has 4 accesses). This is the core scan-resistance property: pages that haven't proven their worth with K accesses are always evicted before established hot pages.

C. P gets evicted because it has been in the cache longer than Q.
Incorrect. LRU-K doesn't use time-in-cache for eviction. It uses the K-th access time for promoted pages and FIFO for candidate-pool pages.

D. The cache evicts both P and Q since the candidate pool and main cache count separately.
Incorrect. The capacity limit applies to the total of main cache + candidate pool. Only one eviction is needed when capacity is exceeded.

Hint: Which pool (candidate vs main cache) gets priority for eviction in LRU-K?
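The quiz scenario can be replayed mechanically. Below is a minimal sketch of the two-pool bookkeeping with logical timestamps instead of wall-clock time; pages A through D are hypothetical extra hot pages, added here only to fill the cache to capacity as the question stipulates:

```python
K, CAPACITY = 2, 5
clock = 0
promoted = {}    # key -> the K most recent access times
candidates = {}  # key -> access times so far (< K); insertion order = FIFO

def access(key):
    global clock
    clock += 1
    if key in promoted:
        promoted[key] = (promoted[key] + [clock])[-K:]
    elif key in candidates:
        candidates[key].append(clock)
        if len(candidates[key]) >= K:            # K-th access: promote
            promoted[key] = candidates.pop(key)[-K:]
    else:
        candidates[key] = [clock]
        if len(promoted) + len(candidates) > CAPACITY:
            evict()

def evict():
    if candidates:                               # candidate pool drains first (FIFO)
        candidates.pop(next(iter(candidates)))
    else:                                        # else evict the oldest K-th access time
        victim = min(promoted, key=lambda k: promoted[k][0])
        promoted.pop(victim)

for _ in range(4):
    access("P")                # P is promoted after its 2nd access
for page in ["A", "B", "C", "D"]:
    access(page)
    access(page)               # A-D are promoted too; the cache is now full
access("Q")                    # Q's 1st access exceeds capacity: eviction runs
print(sorted(promoted), dict(candidates))
# ['A', 'B', 'C', 'D', 'P'] {}   (Q was evicted; P survives with 4 accesses)
```

Q never reaches K=2 accesses, so it is the eviction victim even though every other page entered the cache earlier.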