Why setting timeouts is harder than it looks. Cascading failures, timeout budgets, and the art of picking the right number.
Posts for: #System-Design
Distributed Locks: When One Process Must Win
Why distributed locking is harder than it looks. Naive Redis locks, Redlock, fencing tokens, and when to avoid locks entirely.
Connection Pooling: Why Opening Connections Is Expensive
The hidden cost of database connections. How connection pools work, why they matter, and how to size them without guessing.
Multi-Level Caching: L1, L2, and Beyond
Why one cache isn’t enough. How to layer local, distributed, and CDN caches for maximum performance without losing your mind on consistency.
Cache Stampede: When Expiry Causes Chaos
What happens when a popular cache key expires and thousands of requests hit your database at once. Three patterns to prevent the thundering herd.
Cache Invalidation: The Hard Problem
There are only two hard things in computer science: cache invalidation and naming things. Here’s why invalidation is so tricky, and what actually works.
Caching Patterns: Cache-Aside, Write-Through, and Friends
The four fundamental caching patterns every engineer should know. When to use cache-aside vs write-through vs write-behind vs read-through.
CRDTs: Data Structures That Never Conflict
How CRDTs let distributed systems merge updates without coordination. The math that makes ‘conflict-free’ possible.
Gossip Protocols: How Rumors Keep Systems Alive
How distributed systems spread information without a central coordinator. The surprisingly effective technique of random peer-to-peer chatter.
Vector Clocks and Lamport Timestamps
How distributed systems track ‘what happened before what’ without trusting wall clocks. Lamport timestamps for ordering, vector clocks for detecting conflicts.
The In-Memory Trap: Why Objects Are Slow
In-memory doesn’t always mean fast. How shifting from object-based to vector-based storage (Apache Arrow) delivered a 13x performance boost.
Raft: The Understandable Consensus Algorithm
How distributed systems agree on state. A practical look at Raft’s Leader Election and Log Replication, finally making sense of consensus.
The CAP Theorem: The Cliché I Tried to Avoid
Why the CAP Theorem is the most misunderstood rule in system design. Addressing the ‘Pick 2’ lie and how it sets the stage for consensus algorithms.
Distributed Tracing: Finding the Needle in the Haystack
When a request vanishes into a maze of 10 microservices. How Distributed Tracing and OpenTelemetry keep you from going insane during an outage.
Transactional Outbox: Solving the Dual Write Problem
Why your event-driven system is lying to you. Solving the ‘Dual Write’ problem using the Transactional Outbox pattern.
Materialized Views: The Read Optimization Pattern
Why standard views are just aliases and how materialized views act as an ‘in-database cache’ to solve the cross-shard query problem.
Saga Pattern: Managing Distributed Transactions
Why distributed ACID is a trap. Understanding choreography and orchestration sagas for long-running business processes.
Event Sourcing: Events as Source of Truth
Storing events instead of current state. How event sourcing works, rebuilding state from events, and when the complexity is worth it.
CQRS: Separating Reads from Writes
Command Query Responsibility Segregation: why you might want separate models for reading and writing data. When it helps, when it’s overkill, and implementation patterns.
Change Data Capture: Streaming Database Changes
How to capture and stream database changes in real time. CDC patterns, implementation approaches, and when to use it instead of application-level events.
Two Generals Problem: Why Consensus is Impossible
The thought experiment that proves distributed consensus can’t be guaranteed over unreliable networks. Why acknowledgments create infinite regress and what it means for real systems.
Database Sharding: Splitting Data Across Machines
How to partition a database across multiple servers. Hash-based vs range-based sharding, rebalancing strategies, and the complexity that comes with it.
Rate Limiting: Token Bucket vs Leaky Bucket
Protecting services from overload with rate limiting. Token bucket and leaky bucket algorithms explained with Java implementations and real-world trade-offs.
Backpressure: When Consumers Can’t Keep Up
Handling slow consumers in distributed systems. Queue growth, memory exhaustion, and strategies for applying backpressure: rejection, rate limiting, and flow control.
Retry Strategies: Exponential Backoff and Jitter
How to retry failed requests without overwhelming servers. Exponential backoff, jitter, and when to give up. Java implementations and real-world patterns.
Idempotency: Why Retries Need It
How to make operations safe to retry. Idempotency keys, database patterns, and why retrying non-idempotent operations causes data corruption.
Session Guarantees: The Promises Your Database Makes to You
Read-your-writes and monotonic reads aren’t just buzzwords. They’re the difference between a database that feels broken and one that makes sense to users.
Horizontal vs Vertical Scaling: Bigger Machine or More Machines
Comparing vertical scaling (scale up) and horizontal scaling (scale out). When to use each, trade-offs, and the complexity that comes with horizontal scaling.
Load Balancing Strategies: Picking the Right Server
Comparing load balancing algorithms: Round Robin, Least Connections, Weighted Round Robin, and IP Hash. Java implementations and real-world trade-offs.
Bloom Filters: Definitely Not Here
Bloom filters skip unnecessary disk reads in LSM trees by saying ‘definitely not here’ with zero false negatives. Learn how Cassandra and RocksDB use them.