You added Redis. Latency dropped from 50ms to 5ms. Great.

But now every request still makes a network call to Redis. What if you could skip even that?

Enter multi-level caching: multiple cache layers, each one faster than the layer below it.

The Hierarchy

graph TD
    R[Request] --> L1{L1: Local Cache}
    L1 -->|Hit| R1[~0.1ms]
    L1 -->|Miss| L2{L2: Distributed Cache}
    L2 -->|Hit| R2[~2-5ms]
    L2 -->|Miss| DB[(Database)]
    DB --> R3[~50-200ms]

L1: Local/In-Process Cache Lives in your application’s memory. No network call. Caffeine in Java, in-memory dict in Python. Microseconds.

L2: Distributed Cache Shared across instances. Redis, Memcached. Network hop required. Milliseconds.

L3: Database The source of truth. Slowest, but always has the data.

Optional L0: CDN For static or semi-static content. Cloudflare, CloudFront. Geographically distributed. Users hit edge servers.

Why Bother?

Numbers from a service I worked on:

| Layer | Latency | Hit Rate |
| --- | --- | --- |
| L1 (local) | 0.1ms | 60% |
| L2 (Redis) | 3ms | 35% |
| Database | 80ms | 5% |

60% of requests never left the process. 95% never hit the database. Weighted by hit rate, average latency dropped from 80ms to about 5ms (0.60 × 0.1 + 0.35 × 3 + 0.05 × 80 ≈ 5.1ms), and the median request was served from L1 in ~0.1ms.

The L1 cache did most of the heavy lifting. Redis was the fallback for cache misses across instances. Database was the last resort.

The Consistency Problem

Here’s where it gets tricky. You have 10 app servers. Each has its own L1 cache.

User updates their profile on Server A. Server A updates the database and Redis. But Servers B through J still have the old data in their L1 caches.

graph TD
    U[User Update] --> A[Server A]
    A --> DB[(Database)]
    A --> Redis[Redis L2]
    A --> AL1[A's L1: Updated]
    B[Server B] --> BL1[B's L1: Stale]
    C[Server C] --> CL1[C's L1: Stale]

Three ways to handle this:

1. Short TTL on L1

Keep the L1 TTL very short: 5-30 seconds. Staleness is bounded by the TTL. Simple, but wasteful: entries expire and get refetched even when the underlying data never changed.
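A minimal sketch of this option, with each entry remembering its own expiry (the class and names are illustrative, not from any library; a real L1 like Caffeine also bounds size and handles eviction more cleverly):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal TTL-bounded local cache: an expired entry is treated as a miss,
// so staleness is capped at the TTL. Illustrative sketch only.
class TtlCache<K, V> {
    private record Entry<T>(T value, long expiresAtNanos) {}

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlNanos;

    TtlCache(long ttlMillis) {
        this.ttlNanos = ttlMillis * 1_000_000L;
    }

    void put(K key, V value) {
        map.put(key, new Entry<>(value, System.nanoTime() + ttlNanos));
    }

    V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (System.nanoTime() > e.expiresAtNanos) {
            map.remove(key);  // expired: drop it and report a miss
            return null;
        }
        return e.value;
    }
}
```

With a 10-second TTL, a stale profile survives at most 10 seconds on any server, with zero invalidation traffic.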

2. Pub/Sub Invalidation

When data changes, publish an invalidation message. All servers subscribe and clear their L1.

// On update (any server)
redis.publish("cache:invalidate", "user:123");

// Every server runs a subscriber and drops the key from its L1.
// (Pseudocode: in real clients like Jedis or Lettuce, subscribe blocks
// and typically runs on a dedicated thread.)
redis.subscribe("cache:invalidate", key -> {
    localCache.remove(key);
});

More complex, but consistency is restored almost immediately. One caveat: Redis pub/sub is fire-and-forget, so a server that is briefly disconnected misses the message. Keep a TTL on L1 as a backstop.
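The whole flow can be made runnable with an in-process bus standing in for Redis pub/sub (InvalidationBus and AppServer are illustrative names; in production each subscriber is a separate process on the "cache:invalidate" channel):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Stand-in for Redis pub/sub so the flow runs without a server.
class InvalidationBus {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    void subscribe(Consumer<String> onMessage) { subscribers.add(onMessage); }

    // Deliver the invalidated key to every subscriber, publisher included,
    // just as Redis delivers to all subscribed connections.
    void publish(String key) { subscribers.forEach(s -> s.accept(key)); }
}

class AppServer {
    final Map<String, String> l1 = new ConcurrentHashMap<>();

    AppServer(InvalidationBus bus) {
        bus.subscribe(l1::remove);  // clear the local entry on invalidation
    }
}
```

Note that the publishing server receives its own invalidation too, so an entry it just refreshed gets wiped and re-fetched on the next read. That is harmless, but it is the kind of detail that surprises you in production.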

3. Accept Inconsistency

For some data, it’s fine. User sees their own stale avatar for 10 seconds? Probably okay. Bank balance? Not okay.

Match the strategy to the data.

Write Path

When writing, update in reverse order: database first, then L2, then L1.

void updateUser(User user) {
    database.save(user);                 // Source of truth first
    redis.set("user:" + user.id, user);  // L2
    localCache.put("user:" + user.id, user);  // L1
    redis.publish("cache:invalidate", "user:" + user.id);  // Tell other servers
}

If you update L1 first and database fails, you have inconsistent data. Always update the slowest, most durable store first.
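A toy failure case makes the ordering argument concrete (simulated stores, illustrative names): if L1 is updated before a database write that fails, the cache claims data the database never stored.

```java
import java.util.HashMap;
import java.util.Map;

// Toy demonstration of write ordering. cacheFirstWrite leaves a lying
// cache entry when the DB save fails; dbFirstWrite fails cleanly.
class WriteOrderDemo {
    final Map<String, String> l1 = new HashMap<>();
    final Map<String, String> db = new HashMap<>();
    boolean dbHealthy = true;

    void saveToDb(String k, String v) {
        if (!dbHealthy) throw new IllegalStateException("db down");
        db.put(k, v);
    }

    // Wrong order: L1 already holds v even though the save fails.
    void cacheFirstWrite(String k, String v) {
        l1.put(k, v);
        saveToDb(k, v);
    }

    // Right order: the exception propagates before the cache is touched.
    void dbFirstWrite(String k, String v) {
        saveToDb(k, v);
        l1.put(k, v);
    }
}
```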

Read Path

Read in forward order: L1 first, then L2, then database.

User getUser(String id) {
    User user = localCache.get("user:" + id);
    if (user != null) return user;

    user = redis.get("user:" + id);
    if (user != null) {
        localCache.put("user:" + id, user);  // Populate L1
        return user;
    }

    user = database.findById(id);
    if (user != null) {                     // Don't cache a missing record
        redis.set("user:" + id, user);      // Populate L2
        localCache.put("user:" + id, user); // Populate L1
    }
    return user;
}

Each layer populates the one above it on miss.
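The same read-through flow, runnable, with plain maps standing in for the real layers (l1 = local cache, l2 = Redis, db = database; the class name is made up for the sketch):

```java
import java.util.HashMap;
import java.util.Map;

// Read path sketch: check L1, then L2, then the DB, and populate the
// faster layers on the way back so the next read is a local hit.
class TieredReader {
    final Map<String, String> l1 = new HashMap<>();
    final Map<String, String> l2 = new HashMap<>();
    final Map<String, String> db = new HashMap<>();

    String get(String key) {
        String v = l1.get(key);
        if (v != null) return v;      // L1 hit: no network call

        v = l2.get(key);
        if (v != null) {
            l1.put(key, v);           // populate L1
            return v;
        }

        v = db.get(key);
        if (v != null) {
            l2.put(key, v);           // populate L2
            l1.put(key, v);           // populate L1
        }
        return v;
    }
}
```

After one read of a key that only exists in the database, both cache layers hold it, and the second read never leaves the process.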

When to Skip L1

Not everything belongs in local cache:

  • Large objects. L1 uses heap memory. Too much and you get GC pressure.
  • High cardinality data. Millions of unique keys? L1 won’t help.
  • Frequently updated data. Constant invalidation defeats the purpose.
  • Consistency-critical data. When staleness isn’t acceptable.

L1 shines for hot, stable, small data. Config. Feature flags. User sessions.
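When the data does fit, bound the cache. A size-capped LRU is a few lines with the standard library (a sketch only; real L1 libraries like Caffeine add TTLs and much better concurrency):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Size-bounded LRU: an access-ordered LinkedHashMap evicts the least
// recently used entry once maxEntries is exceeded, capping heap usage.
class BoundedL1<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedL1(int maxEntries) {
        super(16, 0.75f, true);  // true = access order, i.e. LRU
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

LinkedHashMap is not thread-safe, so wrap it (e.g. Collections.synchronizedMap) before sharing it across request threads.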

What I’m Learning

Multi-level caching is a trade-off machine. Each layer you add improves speed but complicates consistency.

The mental model that helps me: treat each cache layer as a progressively weaker guarantee. L1 might be stale. L2 is probably fresh. Database is truth. Design your application to tolerate the staleness each layer introduces.

The biggest win isn’t adding more layers. It’s knowing which data belongs in which layer.

What’s your experience with local caching? Has L1 staleness ever caused a bug?