Cache Invalidation: The Hard Problem
“There are only two hard things in Computer Science: cache invalidation and naming things.” — Phil Karlton
I used to think this was a joke. Then I shipped a bug where users saw stale prices for 6 hours.
The cache was working perfectly. That was the problem.
Why Is It Hard?
You have data in two places: database and cache. When the database changes, the cache needs to know. Sounds simple. It’s not.
Problem 1: What to invalidate?
User updates their profile. Easy, invalidate user:123. But what about the “top users” leaderboard that includes them? The “users in this city” list? The search results that contained their name?
One change can invalidate dozens of cached entries you didn’t think about.
Problem 2: When to invalidate?
Invalidate too early, and your cache is useless. Invalidate too late, and users see stale data. The timing matters.
Problem 3: Distributed systems
Your cache is on Server A. The write happened on Server B. How does A know to invalidate? Network delays mean there’s always a window where some servers have stale data.
Strategy 1: TTL (Time-To-Live)
The lazy approach. Every cache entry expires after N seconds.
SET user:123 {data} EX 300   # expires in 5 minutes
Pros:
- Simple. No coordination needed.
- Stale data eventually fixes itself.
- Works even if you miss an invalidation event.
Cons:
- Data can be stale for up to TTL seconds.
- Short TTL = more cache misses = slower.
- Long TTL = more stale data.
TTL is your safety net. Even if everything else fails, the data will eventually refresh. I set TTL on everything, even when using other strategies.
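The mechanics are simple enough to sketch in a few lines. This is a minimal in-memory TTL cache (names and structure are mine, not a real library): each entry remembers when it expires, and an expired read counts as a miss. The clock is passed in explicitly so the behavior is easy to reason about and test.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal TTL cache sketch: each entry stores its expiry time,
// and reads treat expired entries as misses.
public class TtlCache {
    private record Entry(String value, long expiresAtMillis) {}

    private final Map<String, Entry> store = new ConcurrentHashMap<>();

    public void set(String key, String value, long ttlMillis, long nowMillis) {
        store.put(key, new Entry(value, nowMillis + ttlMillis));
    }

    // Returns null on a miss or an expired entry -- the caller
    // falls back to the database and re-populates the cache.
    public String get(String key, long nowMillis) {
        Entry e = store.get(key);
        if (e == null || nowMillis >= e.expiresAtMillis) {
            store.remove(key); // lazy eviction on read
            return null;
        }
        return e.value;
    }
}
```

Note that nothing here ever pushes fresh data into the cache proactively: staleness bounded by the TTL is the whole contract.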
Strategy 2: Explicit Invalidation#
When data changes, explicitly delete the cache key.
// After database update
userRepository.save(user);
cache.delete("user:" + user.getId());
Pros:
- Immediate consistency (mostly — a concurrent read can still repopulate stale data between your write and your delete).
- No unnecessary cache misses.
Cons:
- You must know every key affected by a change.
- Easy to miss edge cases.
- Doesn’t work across services without coordination.
This works for simple cases. User profile changes? Delete user cache. But complex relationships get messy fast.
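Here is what the messiness looks like in practice: a sketch of a write path that deletes every key the change touches (the class, maps, and key names are hypothetical stand-ins, not a real framework). The fragility is visible in the code itself: every derived key you forget to list here stays stale.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of explicit invalidation: write the database first,
// then delete the direct key AND every derived key you know about.
public class ProfileService {
    final Map<String, String> db = new HashMap<>();    // stand-in database
    final Map<String, String> cache = new HashMap<>(); // stand-in cache

    public void updateProfile(String userId, String profileJson, String city) {
        db.put(userId, profileJson);            // 1. durable write first
        cache.remove("user:" + userId);         // 2. the obvious key
        cache.remove("users:city:" + city);     // 3. derived keys -- easy to miss one
        cache.remove("leaderboard:top-users");
    }
}
```

The ordering matters: deleting the cache before the database write widens the window in which a reader can re-cache the old value.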
Strategy 3: Event-Driven Invalidation
Use change data capture (CDC) or domain events to broadcast changes. Listeners invalidate their own caches.
Pros:
- Decoupled. Writer doesn’t need to know who’s caching.
- Works across services.
- Can handle complex relationships.
Cons:
- Event delivery isn’t instant. Brief stale window.
- More infrastructure (Kafka, CDC pipeline).
- Events can be lost or delayed.
At Salesforce, we used event-driven invalidation for cross-service cache coordination. When a config changed, an event fired, and all downstream caches cleared. Usually took under a second.
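The shape of the pattern can be sketched with an in-process bus (in production the bus would be Kafka or a CDC stream, and delivery would be asynchronous; the names here are mine): the writer publishes the changed key, and every subscribed cache drops its own copy without the writer knowing who they are.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of event-driven invalidation: the writer publishes a change
// event; each subscriber clears its own cache. The writer never needs
// to know who is caching what.
public class InvalidationBus {
    private final List<Consumer<String>> listeners = new ArrayList<>();

    public void subscribe(Consumer<String> listener) {
        listeners.add(listener);
    }

    public void publish(String changedKey) {
        for (Consumer<String> l : listeners) {
            l.accept(changedKey); // synchronous here; async + lossy in real life
        }
    }
}
```

Subscribing is just `bus.subscribe(myCache::remove)`. The real-world cons from the list above live in that loop: with a network in the middle, delivery is delayed, reordered, or dropped, which is exactly why a TTL backstop still matters.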
Strategy 4: Version Keys
Instead of invalidating, change the cache key itself.
// Before: user:123
// After: user:123:v2
cache.get("user:" + userId + ":v" + version);
Pros:
- No explicit invalidation needed.
- Atomic switch to new data.
Cons:
- Old versions linger until TTL.
- Need to track current version somewhere.
- Memory waste from old entries.
Useful for config or static assets where you control the version number.
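A sketch of the idea (class and key names are hypothetical): the current version number is folded into every key, so bumping the version makes all old entries unreachable in one atomic step. Nothing is deleted; the orphaned entries just sit there until TTL eviction, which is the memory cost listed above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of version-keyed caching: bump the version and every old key
// becomes unreachable. No deletes needed; old entries wait for TTL.
public class VersionedCache {
    private final Map<String, String> cache = new HashMap<>();
    private final AtomicLong version = new AtomicLong(1); // tracked centrally in practice

    private String key(String id) {
        return "config:" + id + ":v" + version.get();
    }

    public void put(String id, String value) {
        cache.put(key(id), value);
    }

    public String get(String id) {
        return cache.get(key(id));
    }

    // The "invalidation": one atomic counter bump orphans every old entry.
    public void bumpVersion() {
        version.incrementAndGet();
    }
}
```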
The Pragmatic Approach
Use all of them:
- TTL on everything. Safety net. Even 1 hour is better than forever.
- Explicit invalidation for obvious cases. User updates profile, delete user cache.
- Events for cross-service. CDC or domain events to broadcast changes.
- Accept some staleness. Most apps can tolerate seconds of stale data. Design for it.
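Layering the first two strategies can be sketched in one small class (names are mine): explicit deletes keep the common paths fresh, and a TTL stamped on every entry catches whatever the deletes miss.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the layered approach: explicit delete on known writes,
// plus a TTL on every entry as the safety net for missed deletes.
public class LayeredCache {
    private record Entry(String value, long expiresAtMillis) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis = 3_600_000; // 1-hour safety net on everything

    public void put(String key, String value, long nowMillis) {
        cache.put(key, new Entry(value, nowMillis + ttlMillis));
    }

    public String get(String key, long nowMillis) {
        Entry e = cache.get(key);
        return (e == null || nowMillis >= e.expiresAtMillis) ? null : e.value;
    }

    // Explicit path: called from write paths for the keys you know about.
    public void invalidate(String key) {
        cache.remove(key);
    }
}
```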
What I’m Learning
Cache invalidation is hard because it’s a distributed consistency problem wearing a performance hat. You’re trying to keep two data stores in sync without distributed transactions.
The 6-hour stale price bug? A TTL would have limited it to minutes. I learned to never trust “we’ll invalidate explicitly” without a TTL backup.
Perfect consistency with caching is expensive. Usually not worth it. Design for “eventually fresh” and make the window small enough that users don’t notice.
Have you been bitten by stale cache data? How long did it take to notice?