# Multi-Level Caching: L1, L2, and Beyond
You added Redis. Latency dropped from 50ms to 5ms. Great.
But now every request still makes a network call to Redis. What if you could skip even that?
Enter multi-level caching: a stack of cache layers, ordered from fastest and smallest to slowest and most complete.
## The Hierarchy
- **L1: Local/in-process cache.** Lives in your application’s memory. No network call. Caffeine in Java, an in-memory dict in Python. Microseconds.
- **L2: Distributed cache.** Shared across instances. Redis, Memcached. Network hop required. Milliseconds.
- **L3: Database.** The source of truth. Slowest, but always has the data.
- **Optional L0: CDN.** For static or semi-static content. Cloudflare, CloudFront. Geographically distributed; users hit edge servers.
## Why Bother?
Numbers from a service I worked on:
| Layer | Latency | Hit Rate |
|---|---|---|
| L1 (local) | 0.1ms | 60% |
| L2 (Redis) | 3ms | 35% |
| Database | 80ms | 5% |
60% of requests never left the process. 95% never hit the database. Weighted by the table, average latency dropped from 80ms to roughly 5ms (0.60 × 0.1ms + 0.35 × 3ms + 0.05 × 80ms ≈ 5.1ms).
The L1 cache did most of the heavy lifting. Redis was the fallback for cache misses across instances. Database was the last resort.
## The Consistency Problem
Here’s where it gets tricky. You have 10 app servers. Each has its own L1 cache.
A user updates their profile on Server A. Server A updates the database and Redis. But Servers B through J still have the old profile in their L1 caches.
Three ways to handle this:
1. Short TTL on L1
Keep the L1 TTL very short: 5-30 seconds. Staleness is bounded. Simple, but you re-fetch hot keys over and over.
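A minimal sketch of a TTL-bounded L1, using a plain `ConcurrentHashMap` with per-entry expiry. The `TtlCache` name is illustrative; in practice Caffeine's `expireAfterWrite` gives you this, plus proper eviction.

```java
import java.util.concurrent.ConcurrentHashMap;

// Tiny TTL-bounded L1 cache: entries older than ttlMillis count as misses,
// so staleness is bounded by the TTL.
class TtlCache<K, V> {
    private record Entry<T>(T value, long expiresAt) {}

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() >= e.expiresAt) {
            map.remove(key);   // expired: treat as a miss
            return null;
        }
        return e.value;
    }
}
```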
2. Pub/Sub Invalidation
When data changes, publish an invalidation message. All servers subscribe and clear their L1.
```java
// On update
redis.publish("cache:invalidate", "user:123");

// All servers listen
redis.subscribe("cache:invalidate", key -> {
    localCache.remove(key);
});
```
More complex, but invalidation propagates almost immediately instead of waiting out a TTL.
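Here's a self-contained sketch of the pattern. A tiny in-process bus stands in for Redis pub/sub so the fan-out is visible; `InvalidationBus` and `AppServer` are made-up names for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Stand-in for Redis pub/sub: fans each published key out to every subscriber.
class InvalidationBus {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> listener) { subscribers.add(listener); }
    void publish(String key) { subscribers.forEach(s -> s.accept(key)); }
}

// Each app server keeps its own L1 and drops a key when an invalidation arrives.
class AppServer {
    final Map<String, String> l1 = new ConcurrentHashMap<>();

    AppServer(InvalidationBus bus) {
        bus.subscribe(l1::remove);   // clear the local copy on invalidation
    }
}
```

In production, each server would hold a real subscription on its Redis connection rather than an in-process listener, but the shape is the same: one publish, N local removals.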
3. Accept Inconsistency
For some data, it’s fine. User sees their own stale avatar for 10 seconds? Probably okay. Bank balance? Not okay.
Match the strategy to the data.
## Write Path
When writing, update in reverse order: database first, then L2, then L1.
```java
void updateUser(User user) {
    database.save(user);                                   // Source of truth first
    redis.set("user:" + user.id, user);                    // L2
    localCache.put("user:" + user.id, user);               // L1
    redis.publish("cache:invalidate", "user:" + user.id);  // Tell other servers
}
```
If you update L1 first and the database write fails, you're serving data that was never persisted. Always update the slowest, most durable store first.
## Read Path
Read in forward order: L1 first, then L2, then database.
```java
User getUser(String id) {
    User user = localCache.get("user:" + id);
    if (user != null) return user;

    user = redis.get("user:" + id);
    if (user != null) {
        localCache.put("user:" + id, user); // Populate L1
        return user;
    }

    user = database.findById(id);
    if (user != null) {                     // Don't cache misses by accident
        redis.set("user:" + id, user);      // Populate L2
        localCache.put("user:" + id, user); // Populate L1
    }
    return user;
}
```
Each layer populates the one above it on miss.
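That back-fill pattern generalizes to any number of layers. A hypothetical sketch — `CacheLayer`, `MapLayer`, and `LayeredCache` are illustrative names, not a real library:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface CacheLayer<K, V> {
    V get(K key);
    void put(K key, V value);
}

// Simple map-backed layer, standing in for a local cache or a Redis client.
class MapLayer<K, V> implements CacheLayer<K, V> {
    private final Map<K, V> map = new HashMap<>();
    public V get(K key) { return map.get(key); }
    public void put(K key, V value) { map.put(key, value); }
}

// Read-through across layers ordered fastest-first.
// A hit at layer i back-fills layers 0..i-1 so the next read is faster.
class LayeredCache<K, V> {
    private final List<CacheLayer<K, V>> layers;

    LayeredCache(List<CacheLayer<K, V>> layers) { this.layers = layers; }

    V get(K key) {
        for (int i = 0; i < layers.size(); i++) {
            V value = layers.get(i).get(key);
            if (value != null) {
                for (int j = 0; j < i; j++) layers.get(j).put(key, value); // back-fill
                return value;
            }
        }
        return null; // total miss: caller loads from the source of truth
    }
}
```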
## When to Skip L1
Not everything belongs in local cache:
- Large objects. L1 uses heap memory. Too much and you get GC pressure.
- High cardinality data. Millions of unique keys? L1 won’t help.
- Frequently updated data. Constant invalidation defeats the purpose.
- Consistency-critical data. When staleness isn’t acceptable.
L1 shines for hot, stable, small data. Config. Feature flags. User sessions.
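To keep L1 hot, stable, and small in the first place, bound its size. Caffeine does this with size- and weight-based eviction; the classic JDK-only sketch is a `LinkedHashMap` in access order:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Size-bounded LRU: once the cache exceeds maxEntries, the
// least-recently-accessed entry is evicted, capping heap usage.
class BoundedL1<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedL1(int maxEntries) {
        super(16, 0.75f, true);   // accessOrder = true → iteration is LRU order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

This isn't thread-safe on its own (wrap it or use a real cache library), but it shows the idea: cap L1 so hot keys stay and cold ones fall out.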
## What I’m Learning
Multi-level caching is a trade-off machine. Each layer you add improves speed but complicates consistency.
The mental model that helps me: treat each cache layer as a progressively weaker guarantee. L1 might be stale. L2 is probably fresh. Database is truth. Design your application to tolerate the staleness each layer introduces.
The biggest win isn’t adding more layers. It’s knowing which data belongs in which layer.
What’s your experience with local caching? Has L1 staleness ever caused a bug?