Your cache is humming along. A popular key expires. 10,000 requests arrive in the next second. All of them miss the cache. All of them hit your database. Simultaneously.

Your database falls over. Requests time out. Users see errors. You just experienced a cache stampede.

Also called the thundering herd problem. And it’s bitten me more than once.

## Why It Happens

```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#000000','primaryTextColor':'#00ff00','primaryBorderColor':'#00ff00','lineColor':'#00ff00','secondaryColor':'#000000','tertiaryColor':'#000000'}}}%%
sequenceDiagram
    autonumber
    participant R1 as Request 1
    participant R2 as Request 2
    participant R3 as Request 1000
    participant C as Cache
    participant DB as Database
    Note over C: Key expires at T=0
    R1->>C: GET popular_key
    C-->>R1: MISS
    R2->>C: GET popular_key
    C-->>R2: MISS
    R3->>C: GET popular_key
    C-->>R3: MISS
    R1->>DB: Query
    R2->>DB: Query
    R3->>DB: Query
    Note over DB: 1000 identical queries<br/>Database overloaded
```

The gap between “cache miss” and “cache repopulated” is the danger zone. Every request in that window becomes a database query.

Hot keys make this worse. A product page viewed 1000 times per second? When that cache expires, all 1000 requests pile onto the database.

## Solution 1: Locking (Mutex)

Only one request fetches from the database. Everyone else waits.

```java
String value = cache.get(key);
if (value == null) {
    if (lock.tryAcquire(key)) {
        try {
            value = database.query(key);
            cache.set(key, value, TTL);
        } finally {
            lock.release(key);       // always release, even if the query throws
        }
    } else {
        // Another request is already fetching. Wait briefly and retry,
        // or return stale data if you have it.
        Thread.sleep(50);            // declare or handle InterruptedException
        value = cache.get(key);      // may still be null: loop, or fall back to the DB
    }
}
```

Pros:

  • Only one database query per key.
  • Simple to understand.

Cons:

  • Adds latency for waiting requests.
  • Lock management complexity.
  • What if the lock holder crashes?

This works but feels heavy. You’re serializing requests that could otherwise proceed independently.
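The `lock` in the snippet above is pseudocode. For a single process, a per-key mutex can be sketched with `ConcurrentHashMap.putIfAbsent` (the class and method names here are mine, not a real library's); across processes you'd reach for something like Redis's `SET NX` with a TTL instead:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-process backing for the per-key lock used above.
public class KeyedMutex {
    private final Map<String, Boolean> held = new ConcurrentHashMap<>();

    // Non-blocking: true only for the first caller to claim the key.
    public boolean tryAcquire(String key) {
        return held.putIfAbsent(key, Boolean.TRUE) == null;
    }

    public void release(String key) {
        held.remove(key);
    }

    public static void main(String[] args) {
        KeyedMutex lock = new KeyedMutex();
        System.out.println(lock.tryAcquire("popular_key")); // true: first caller wins
        System.out.println(lock.tryAcquire("popular_key")); // false: someone holds it
        lock.release("popular_key");
        System.out.println(lock.tryAcquire("popular_key")); // true: free again
    }
}
```

A TTL on a distributed lock is also what answers the crashed-holder question: the lock simply expires instead of being held forever.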

## Solution 2: Probabilistic Early Expiration

Don’t wait for the key to expire. Refresh it slightly before TTL.

Each request rolls the dice. As you approach expiration, the probability of refreshing increases. One request will “win” and refresh early, before the actual expiry.

```java
// Probability of refreshing grows as expiry approaches: 0 far out, 1 at expiry.
double refreshProbability = 1.0 - (double) timeUntilExpiry / REFRESH_WINDOW;
if (Math.random() < refreshProbability) {
    // Refresh cache in background
    refreshAsync(key);
}
return cachedValue; // Still valid, return it
```

Pros:

  • No locks needed.
  • Cache stays warm. No stampede.
  • Requests don’t block each other.

Cons:

  • Slightly more cache refreshes than necessary.
  • Requires tracking TTL in your code.

I like this approach. It’s elegant. The cache refreshes itself before anyone notices it’s stale.
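To make the dice roll concrete, here's a tiny self-contained sketch of the linear version (names and the 10-second window are illustrative; better-known formulations weight the probability by how expensive the recomputation is):

```java
public class EarlyRefresh {
    static final long REFRESH_WINDOW_MS = 10_000; // start rolling the dice 10s before expiry

    // 0.0 when expiry is far away, climbing linearly to 1.0 at expiry.
    static double refreshProbability(long timeUntilExpiryMs) {
        if (timeUntilExpiryMs >= REFRESH_WINDOW_MS) return 0.0;
        if (timeUntilExpiryMs <= 0) return 1.0;
        return 1.0 - (double) timeUntilExpiryMs / REFRESH_WINDOW_MS;
    }

    public static void main(String[] args) {
        System.out.println(refreshProbability(10_000)); // 0.0 — plenty of time left
        System.out.println(refreshProbability(5_000));  // 0.5 — halfway through the window
        System.out.println(refreshProbability(0));      // 1.0 — refresh now
    }
}
```

With 1000 requests per second against a 10-second window, some request wins the roll long before expiry, so the herd never forms.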

## Solution 3: Request Coalescing

Multiple identical requests get merged into one.

If 100 requests ask for user:123 while the first query is running, they all wait for that one result. The database sees 1 query, not 100.

```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#000000','primaryTextColor':'#00ff00','primaryBorderColor':'#00ff00','lineColor':'#00ff00','secondaryColor':'#000000','tertiaryColor':'#000000'}}}%%
graph TD
    A[Request 1] --> C{Coalescer}
    B[Request 2] --> C
    D[Request 3] --> C
    C --> E[Single DB Query]
    E --> F[Result]
    F --> A
    F --> B
    F --> D
    style A fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style B fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style C fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style D fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style E fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style F fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
```

In Java, CompletableFuture or libraries like Caffeine handle this automatically. Go has singleflight. Most languages have something similar.
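The hand-rolled version is small enough to sketch with `ConcurrentHashMap` and `CompletableFuture` (class and method names are mine; the 100ms "database" is a stub standing in for a real query):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class Coalescer {
    private final Map<String, CompletableFuture<String>> inFlight = new ConcurrentHashMap<>();
    private final AtomicInteger dbQueries = new AtomicInteger();

    // Concurrent callers for the same key share one future, and one DB query.
    public CompletableFuture<String> get(String key) {
        return inFlight.computeIfAbsent(key, k ->
                CompletableFuture.supplyAsync(() -> queryDatabase(k))
                        .whenComplete((v, e) -> inFlight.remove(k))); // allow later refreshes
    }

    private String queryDatabase(String key) {
        dbQueries.incrementAndGet();
        try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "value-for-" + key;
    }

    public int dbQueryCount() { return dbQueries.get(); }

    public static void main(String[] args) {
        Coalescer coalescer = new Coalescer();
        CompletableFuture<?>[] requests = new CompletableFuture<?>[100];
        for (int i = 0; i < 100; i++) {
            requests[i] = coalescer.get("user:123"); // 100 callers, same key
        }
        CompletableFuture.allOf(requests).join();
        System.out.println("DB queries: " + coalescer.dbQueryCount()); // 1, not 100
    }
}
```

The `whenComplete` cleanup matters: remove the future once it settles, or the key would be served from the map forever and never refresh.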

Pros:

  • Database load is minimized.
  • Requests still get fresh data.
  • Works well with async patterns.

Cons:

  • Coalesced requests all wait for the single in-flight query to finish.
  • Memory overhead for tracking in-flight requests.

At Oracle, we used request coalescing for config lookups. A service restart triggered hundreds of identical queries. Coalescing turned that into one query with hundreds of waiters.

## Solution 4: Stale-While-Revalidate

Serve stale data immediately. Refresh in the background.

```java
CacheEntry entry = cache.getWithMetadata(key);
if (entry.isExpired()) {
    refreshAsync(key);       // Background refresh; dedupe so only one runs per key
}
return entry.getValue();     // Return the (possibly stale) value immediately
```

Pros:

  • Zero latency impact. Users always get instant response.
  • Background refresh prevents stampede.

Cons:

  • Users might see stale data briefly.
  • Need to track staleness separately from presence.

Perfect when slightly stale data is acceptable. Which, honestly, is most of the time.
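Fleshing the snippet out slightly: the one subtlety is making sure concurrent stale reads don't each trigger their own refresh. A `compareAndSet` guard is enough (class and field names here are illustrative, and the database call is stubbed):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SwrEntry {
    private volatile String value;
    private volatile long expiresAt;
    private final AtomicBoolean refreshing = new AtomicBoolean(false);

    SwrEntry(String value, long ttlMs) {
        this.value = value;
        this.expiresAt = System.currentTimeMillis() + ttlMs;
    }

    boolean isExpired() {
        return System.currentTimeMillis() >= expiresAt;
    }

    // Always returns immediately; at most one background refresh at a time.
    String get() {
        String current = value;
        if (isExpired() && refreshing.compareAndSet(false, true)) {
            new Thread(() -> {
                try {
                    value = "fresh-value";  // stand-in for database.query(key)
                    expiresAt = System.currentTimeMillis() + 60_000;
                } finally {
                    refreshing.set(false);
                }
            }).start();
        }
        return current; // stale or fresh, never a blocking miss
    }

    public static void main(String[] args) throws InterruptedException {
        SwrEntry entry = new SwrEntry("stale-value", 0); // expires immediately
        System.out.println(entry.get()); // "stale-value", served instantly
        Thread.sleep(300);               // give the background refresh time to land
        System.out.println(entry.get()); // "fresh-value"
    }
}
```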

## Combine Them

The best defense uses layers:

  1. Probabilistic early refresh to keep hot keys warm.
  2. Request coalescing for when multiple requests hit simultaneously.
  3. Stale-while-revalidate as a fallback.
  4. TTL as the last resort.

No single technique is perfect. Stack them.
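As a sketch of how the layers order themselves on a single lookup path (everything here is stubbed and the names are mine; cache state is passed in as parameters purely to keep the example self-contained):

```java
public class LayeredLookup {
    // Decision order for one lookup; real code would consult the cache itself.
    static String lookup(String key, String cached, boolean nearExpiry, String stale) {
        if (cached != null) {
            if (nearExpiry && Math.random() < 0.1) {
                // Layer 1: probabilistic early refresh would be scheduled here, async
            }
            return cached; // fresh hit: the database is never touched
        }
        if (stale != null) {
            // Layer 3: kick off background revalidation, then serve stale data now
            return stale;
        }
        // Layer 2: true miss -> one coalesced DB query shared by all waiters.
        // Layer 4: the result is written back with a TTL as the last resort.
        return "db:" + key;
    }

    public static void main(String[] args) {
        System.out.println(lookup("k", "fresh", false, null)); // fresh
        System.out.println(lookup("k", null, false, "old"));   // old
        System.out.println(lookup("k", null, false, null));    // db:k
    }
}
```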

## What I’m Learning

Cache stampede taught me that caching isn’t just about reads being fast. It’s about protecting the database from your own traffic patterns.

A cache isn’t just a performance optimization. It’s a buffer. When that buffer disappears, even for a moment, the system behind it needs to survive. Design for the moment the cache fails, not just for when it works.

Have you experienced a thundering herd in production? What tipped you off?