Posts for: #System-Design

Thundering Herd

2026-02-11 sohilladhani

Cache expires. 10,000 requests hit the database simultaneously. Your DB collapses. How request coalescing and probabilistic expiration prevent the stampede.

[Read more]

Structured Logging in Distributed Systems

2026-02-10 sohilladhani

#distributed-systems #observability #logging #system-design #java

Grep through 50 log files to find one request. Or use structured logging with correlation IDs and find it in seconds.

[Read more]

Database Migrations Without Downtime

2026-02-09 sohilladhani

#mysql #database #system-design #deployment #architecture

ALTER TABLE on a 2M-row table locks it for minutes. Your users see errors. Here’s how expand-contract and shadow writes let you migrate without downtime.

[Read more]

Tail Latency: The P99 Problem

2026-02-08 sohilladhani

#distributed-systems #performance #system-design #observability

Your average latency looks great. Your P99 is a disaster. Why tail latency matters more than averages, and what you can actually do about it.

[Read more]

Ordering Guarantees in Event-Driven Systems

2026-02-07 sohilladhani

#distributed-systems #kafka #messaging #event-driven #system-design

Events arrive out of order. User updates overwrite each other. Here’s how partition keys, sequence numbers, and causal ordering keep things straight.

[Read more]

Dead Letter Queues

2026-02-06 sohilladhani

#distributed-systems #kafka #messaging #resilience #system-design

Your consumer retried a bad message 10,000 times. It will never succeed. Dead letter queues catch the messages that can’t be processed so the rest of your system keeps moving.

[Read more]

Making Consumers Idempotent

2026-02-05 sohilladhani

#distributed-systems #kafka #messaging #idempotency #system-design #java

Exactly-once delivery is impossible across boundaries. Here’s the pattern that actually works: at-least-once delivery with idempotent consumers.

[Read more]

Exactly-Once Delivery is a Lie

2026-02-04 sohilladhani

#distributed-systems #kafka #messaging #idempotency #system-design

Kafka says exactly-once. Your consumer processed the message twice anyway. Here’s why exactly-once is impossible across system boundaries.

[Read more]

Graceful Shutdown: Dying Without Dropping Requests

2026-02-03 sohilladhani

#kubernetes #deployment #resilience #system-design #spring-boot

What happens to in-flight requests when you deploy? How to shut down cleanly without dropping connections or corrupting state.

[Read more]

Timeouts: The Hardest Easy Problem

2026-02-02 sohilladhani

#distributed-systems #timeouts #resilience #performance #system-design

Why setting timeouts is harder than it looks. Cascading failures, timeout budgets, and the art of picking the right number.

[Read more]

Distributed Locks: When One Process Must Win

2026-02-01 sohilladhani

#distributed-systems #redis #locking #concurrency #system-design

Why distributed locking is harder than it looks. Naive Redis locks, Redlock, fencing tokens, and when to avoid locks entirely.

[Read more]

Connection Pooling: Why Opening Connections Is Expensive

2026-01-31 sohilladhani

#performance #database #connection-pooling #java #system-design

The hidden cost of database connections. How connection pools work, why they matter, and how to size them without guessing.

[Read more]

Multi-Level Caching: L1, L2, and Beyond

2026-01-30 sohilladhani

#caching #performance #redis #cdn #system-design #architecture

Why one cache isn’t enough. How to layer local, distributed, and CDN caches for maximum performance without losing your mind on consistency.

[Read more]

Cache Stampede: When Expiry Causes Chaos

2026-01-29 sohilladhani

#caching #thundering-herd #cache-stampede #performance #system-design

What happens when a popular cache key expires and thousands of requests hit your database at once. Three patterns to prevent the thundering herd.

[Read more]

Cache Invalidation: The Hard Problem

2026-01-28 sohilladhani

#caching #invalidation #consistency #system-design #architecture

There are only two hard things in computer science: cache invalidation and naming things. Here’s why invalidation is so tricky, and what actually works.

[Read more]

Caching Patterns: Cache-Aside, Write-Through, and Friends

2026-01-27 sohilladhani

#caching #performance #system-design #redis #architecture

The four fundamental caching patterns every engineer should know. When to use cache-aside vs write-through vs write-behind vs read-through.

[Read more]

CRDTs: Data Structures That Never Conflict

2026-01-26 sohilladhani

#distributed-systems #crdt #eventual-consistency #conflict-resolution #system-design

How CRDTs let distributed systems merge updates without coordination. The math that makes ‘conflict-free’ possible.

[Read more]

Gossip Protocols: How Rumors Keep Systems Alive

2026-01-25 sohilladhani

#distributed-systems #gossip-protocol #failure-detection #system-design #architecture

How distributed systems spread information without a central coordinator. The surprisingly effective technique of random peer-to-peer chatter.

[Read more]

Vector Clocks and Lamport Timestamps

2026-01-24 sohilladhani

#distributed-systems #vector-clocks #lamport-timestamps #consistency #system-design

How distributed systems track ‘what happened before what’ without trusting wall clocks. Lamport timestamps for ordering, vector clocks for detecting conflicts.

[Read more]

The In-Memory Trap: Why Objects Are Slow

2026-01-23 sohilladhani

#performance #system-design #java #redis #apache-arrow

In-memory doesn’t always mean fast. How shifting from object-based to vector-based storage (Apache Arrow) delivered a 13x performance boost.

[Read more]

Raft: The Understandable Consensus Algorithm

2026-01-22 sohilladhani

#distributed-systems #raft #consensus #system-design #architecture #fault-tolerance

How distributed systems agree on state. A practical look at Raft’s leader election and log replication that finally makes sense of consensus.

[Read more]

The CAP Theorem: The Cliché I Tried to Avoid

2026-01-21 sohilladhani

#distributed-systems #cap-theorem #database #system-design #architecture

Why the CAP Theorem is the most misunderstood rule in system design. Addressing the ‘Pick 2’ lie and how it sets the stage for consensus algorithms.

[Read more]

Distributed Tracing: Finding the Needle in the Haystack

2026-01-20 sohilladhani

#architecture #microservices #observability #distributed-systems #system-design

When a request vanishes into a maze of 10 microservices. How Distributed Tracing and OpenTelemetry keep you from going insane during an outage.

[Read more]

Transactional Outbox: Solving the Dual Write Problem

2026-01-19 sohilladhani

#architecture #microservices #event-driven #system-design #patterns

Why your event-driven system is lying to you. Solving the ‘Dual Write’ problem using the Transactional Outbox pattern.

[Read more]

Materialized Views: The Read Optimization Pattern

2026-01-18 sohilladhani

#distributed-systems #database #performance #cqrs #system-design

Why standard views are just aliases and how materialized views act as an ‘in-database cache’ to solve the cross-shard query problem.

[Read more]

Saga Pattern: Managing Distributed Transactions

2026-01-17 sohilladhani

#architecture #distributed-transactions #microservices #system-design #patterns #saga

Why distributed ACID is a trap. Understanding choreography and orchestration sagas for long-running business processes.

[Read more]

Event Sourcing: Events as Source of Truth

2026-01-16 sohilladhani

#architecture #event-sourcing #event-driven #system-design #patterns

Storing events instead of current state. How event sourcing works, rebuilding state from events, and when the complexity is worth it.

[Read more]

CQRS: Separating Reads from Writes

2026-01-15 sohilladhani

#architecture #cqrs #event-driven #system-design #patterns

Command Query Responsibility Segregation: why you might want separate models for reading and writing data. When it helps, when it’s overkill, and implementation patterns.

[Read more]

Change Data Capture: Streaming Database Changes

2026-01-14 sohilladhani

#database #cdc #streaming #event-driven #system-design

How to capture and stream database changes in real time. CDC patterns, implementation approaches, and when to use it instead of application-level events.

[Read more]

Two Generals Problem: Why Consensus is Impossible

2026-01-13 sohilladhani

#distributed-systems #consensus #theory #system-design

The thought experiment that proves distributed consensus can’t be guaranteed over unreliable networks. Why acknowledgments create infinite regress and what it means for real systems.

[Read more]
