Thundering Herd
Popular cache key expires. 10,000 requests arrive in the same second. All of them miss the cache. All of them hit the database. Database collapses under the load.
This is the thundering herd. Closely related to the cache stampede I wrote about earlier, but the thundering herd happens at a broader scale. It’s not just one key. It’s thousands of requests making the same bad decision at the same time.
Why It Happens
Caches use TTLs. TTL expires, key disappears. If that key is popular (a product page, a config value, a user session lookup), hundreds or thousands of requests simultaneously discover the cache is empty. They all independently decide to fetch from the database.
The database was sized for 100 QPS. It just got 10,000. It doesn’t recover gracefully.
Request Coalescing
The simplest fix: only let one request through to the database. Everyone else waits for that one result.
```java
private final Map<String, CompletableFuture<Product>> inFlight = new ConcurrentHashMap<>();

public Product getProduct(String productId) {
    // Check cache first
    Product cached = cache.get(productId);
    if (cached != null) return cached;

    // Coalesce: only one request fetches from the DB; everyone else shares its future.
    // Remove the entry when the fetch completes, not in each waiter's finally block --
    // otherwise a waiter that times out removes the entry while the fetch is still
    // in flight, and the next request kicks off a duplicate query.
    CompletableFuture<Product> future = inFlight.computeIfAbsent(productId, id ->
        CompletableFuture.supplyAsync(() -> {
            Product product = db.findById(id);
            cache.put(id, product, Duration.ofMinutes(5));
            return product;
        }).whenComplete((result, error) -> inFlight.remove(id)));

    try {
        return future.get(2, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException("Interrupted waiting for " + productId, e);
    } catch (ExecutionException | TimeoutException e) {
        throw new RuntimeException("Failed to load " + productId, e);
    }
}
```
First request fetches. The other 9,999 share the result. Database sees 1 query instead of 10,000.
Probabilistic Early Expiration
Prevention is better than cure. The simplest version: instead of letting all your keys expire at exactly the same moment, add random jitter to the TTL.
```java
int baseTtl = 300; // 5 minutes
int jitter = ThreadLocalRandom.current().nextInt(0, 60); // 0-59 seconds
cache.put(key, value, Duration.ofSeconds(baseTtl + jitter));
```
Keys expire at slightly different times. The herd spreads out. Database load stays manageable.
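Jitter staggers keys relative to each other. Full probabilistic early expiration goes one step further: each request may volunteer to refresh a key shortly *before* it expires, with a probability that rises as expiry approaches, so one request refreshes and the herd never forms. A minimal sketch of that idea, following the XFetch formula from the cache-stampede literature; the class and field names here are mine, not a real API:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of probabilistic early expiration ("XFetch"). A request that sees
// shouldRecompute() == true refreshes the value even though it hasn't expired.
final class XFetchEntry<V> {
    final V value;
    final long deltaMillis; // how long the last recompute took
    final Instant expiry;

    XFetchEntry(V value, long deltaMillis, Duration ttl) {
        this.value = value;
        this.deltaMillis = deltaMillis;
        this.expiry = Instant.now().plus(ttl);
    }

    // The closer we are to expiry (and the slower the recompute), the more
    // likely a single request volunteers to refresh. beta > 1 favors earlier
    // refreshes; beta = 1 is the paper's default.
    boolean shouldRecompute(double beta) {
        double earlyBy = -deltaMillis * beta
                * Math.log(ThreadLocalRandom.current().nextDouble());
        return Instant.now().plusMillis((long) earlyBy).isAfter(expiry);
    }
}
```

The nice property: no coordination, no locks, no shared in-flight map. Each request rolls its own dice, and in expectation roughly one of them refreshes before the deadline.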
At Oracle, we had a config cache that expired every 5 minutes on the dot. Every 5 minutes, our MySQL read replicas spiked. Adding 30 seconds of random jitter smoothed it completely. Embarrassingly simple fix for a problem we’d been “monitoring” for months.
What I’m Learning
Thundering herd is a coordination problem. Thousands of clients making independent decisions that are rational individually but destructive collectively. The fix is always some form of coordination: coalescing, locking, staggering, or background refresh.
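Background refresh deserves a sketch too, since it removes the herd's trigger entirely: if a hot key is re-fetched on a schedule, readers never see it expire. This is a toy version under my own names (`RefreshingCache`, `keepFresh`), not a real library; in production you'd reach for something like Guava's `refreshAfterWrite`:

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Background-refresh cache: hot keys are reloaded on a schedule by one
// thread, so user requests never find the key missing.
final class RefreshingCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void keepFresh(K key, Function<K, V> loader, Duration interval) {
        cache.put(key, loader.apply(key)); // warm the key immediately
        scheduler.scheduleAtFixedRate(
                () -> cache.put(key, loader.apply(key)),
                interval.toMillis(), interval.toMillis(), TimeUnit.MILLISECONDS);
    }

    V get(K key) {
        return cache.get(key); // readers never trigger a DB fetch
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

The trade-off is staleness bounded by the refresh interval, and wasted reloads for keys nobody is reading, which is why this only makes sense for keys you know are hot.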
If you have a popular cache key with a fixed TTL, you have a thundering herd waiting to happen.
Have you seen thundering herd in production? How did you spot it?