Posts for: #Distributed-Systems

Leader Election: Picking One Node to Rule

2026-03-08 sohilladhani

Three nodes, one job. Without leader election, all three run it simultaneously. With leader election, exactly one does the work while the others stand by.

MapReduce: Processing Data That Won’t Fit on One Machine

2026-03-07 sohilladhani

#distributed-systems #system-design #architecture #java #performance

Your dataset is 10TB. One machine can’t hold it, let alone process it. MapReduce splits the work across hundreds of machines with a deceptively simple API.

Inverted Indexes: How Search Actually Works

2026-03-05 sohilladhani

#data-structures #system-design #java #database #distributed-systems

A forward index maps each document to the words it contains. An inverted index maps each word to the documents that contain it. That reversal is why search is fast.
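
A minimal sketch of that reversal, assuming integer document IDs and a naive tokenizer that splits on anything non-alphanumeric. The `InvertedIndex` class and its method names are illustrative, not taken from the post:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class InvertedIndex {
    // term -> sorted set of IDs of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Tokenize the text (naively, by assumption) and record docId under each term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("[^a-z0-9]+")) {
            if (term.isEmpty()) continue; // split can yield a leading empty token
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Lookup is one map access; it does not scan the documents.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySortedSet());
    }
}
```

Indexing pays the cost once per document, so a single-term query only touches the posting list for that term.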

Checkpointing: Resuming Long-Running Jobs Without Starting Over

2026-03-04 sohilladhani

#distributed-systems #system-design #architecture #java #patterns

A batch job runs for three hours and crashes at hour two. Without checkpointing, you restart from zero. With it, you lose only the work since the last checkpoint, say ten minutes.

Content Fingerprinting: Detecting Near-Duplicates at Scale

2026-03-03 sohilladhani

#distributed-systems #data-structures #algorithms #java #system-design

Exact duplicates are easy. Near-duplicates are hard. SimHash turns documents into compact fingerprints where similar content produces similar hashes.

Priority Queues in Distributed Systems

2026-03-02 sohilladhani

#distributed-systems #system-design #architecture #java #redis

FIFO queues treat every message equally. But urgent config updates shouldn’t wait behind a thousand bulk sync jobs. Priority queues fix this, if you handle starvation.

Reconciliation: When Your Systems Disagree

2026-03-01 sohilladhani

#distributed-systems #system-design #architecture #java #observability

Your database says one thing. The external system says another. Reconciliation is how you find the drift before your users do.

State Machines: Making Distributed Workflows Predictable

2026-02-28 sohilladhani

#distributed-systems #system-design #architecture #java #patterns

Boolean flags and status strings create impossible states. An explicit state machine tells you exactly where a workflow is, what transitions are valid, and how to recover.
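
As a rough sketch of what "explicit" can mean here, the enum below keeps the allowed transitions in a single table and rejects everything else. The order workflow, its state names, and the transition table are hypothetical examples, not taken from the post:

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum OrderState {
    CREATED, PAID, SHIPPED, DELIVERED, CANCELLED;

    // Allowed transitions; anything not listed here is rejected.
    private static final Map<OrderState, Set<OrderState>> ALLOWED = new EnumMap<>(OrderState.class);
    static {
        ALLOWED.put(CREATED, EnumSet.of(PAID, CANCELLED));
        ALLOWED.put(PAID, EnumSet.of(SHIPPED, CANCELLED));
        ALLOWED.put(SHIPPED, EnumSet.of(DELIVERED));
        ALLOWED.put(DELIVERED, EnumSet.noneOf(OrderState.class)); // terminal
        ALLOWED.put(CANCELLED, EnumSet.noneOf(OrderState.class)); // terminal
    }

    boolean canTransitionTo(OrderState next) {
        return ALLOWED.get(this).contains(next);
    }

    // Returns the new state, or throws if the move is not in the table.
    OrderState transitionTo(OrderState next) {
        if (!canTransitionTo(next)) {
            throw new IllegalStateException(this + " -> " + next + " is not a valid transition");
        }
        return next;
    }
}
```

With the table in one place, a state such as "shipped but unpaid" cannot be reached by accident, and recovery code can ask a stuck workflow which moves remain valid.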

Optimistic vs Pessimistic Concurrency: Locks vs Versions

2026-02-27 sohilladhani

#distributed-systems #database #system-design #java #mysql

Two users update the same row. Pessimistic locking blocks one until the other finishes. Optimistic locking lets both try and fails the loser. Choosing wrong kills either throughput or correctness.
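
The optimistic side can be sketched in memory, assuming a single row that carries a version number. `VersionedRow` is a hypothetical stand-in for a database row; against a real database the same check is usually written as an `UPDATE ... WHERE version = ?` and the affected-row count tells you who lost:

```java
import java.util.concurrent.atomic.AtomicReference;

class VersionedRow {
    // Immutable snapshot: a value together with the version it was written at.
    static final class Snapshot {
        final String value;
        final long version;
        Snapshot(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final AtomicReference<Snapshot> current;

    VersionedRow(String initial) {
        current = new AtomicReference<>(new Snapshot(initial, 0));
    }

    Snapshot read() {
        return current.get();
    }

    // Succeeds only if nobody has written since expectedVersion was read.
    boolean update(String newValue, long expectedVersion) {
        Snapshot seen = current.get();
        if (seen.version != expectedVersion) return false; // someone else won
        return current.compareAndSet(seen, new Snapshot(newValue, expectedVersion + 1));
    }
}
```

Nothing blocks: both writers proceed, and the one holding a stale version gets `false` back and has to re-read and retry.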

Two-Phase Commit: The Original Distributed Transaction

2026-02-26 sohilladhani

#distributed-systems #system-design #architecture #java #database

Two-phase commit guarantees atomicity across multiple databases. It also leaves every prepared participant blocked, locks held, if the coordinator dies before announcing the outcome. Here’s why microservices moved on.

Input Validation and Abuse Prevention in Distributed Systems

2026-02-25 sohilladhani

#security #system-design #architecture #distributed-systems #patterns

Every public write endpoint is an abuse vector. Layered defense with validation, rate limiting, and async scanning keeps your system safe without killing performance.

Approximate Counting: HyperLogLog and Count-Min Sketch

2026-02-24 sohilladhani

#distributed-systems #data-structures #system-design #performance #redis

Counting unique items across billions of events. A HashSet needs gigabytes. HyperLogLog does it in 12KB. The trick is accepting a little error.

SLOs and Error Budgets: When Good Enough is a Number

2026-02-23 sohilladhani

#observability #system-design #architecture #distributed-systems #performance

100% availability is impossible and pursuing it wastes engineering time. SLOs turn reliability into a number you can reason about.

Distributed ID Generation: Snowflake and Friends

2026-02-21 sohilladhani

#distributed-systems #system-design #architecture #java #database

Auto-increment IDs break the moment you have more than one database. Snowflake IDs, UUIDs, and database sequences each solve this differently.

Event Aggregation: When 47 Notifications Become One

2026-02-20 sohilladhani

#distributed-systems #system-design #architecture #patterns #java

Showing every individual event overwhelms users. Grouping related events into summaries is a distributed systems problem disguised as a UX problem.

Presence Systems: Who’s Online and How You Know

2026-02-16 sohilladhani

#distributed-systems #redis #system-design #architecture #patterns

Green dot means online. Simple, right? Behind that dot is a distributed system making heartbeat-based guesses about user liveness.

Fan-Out Strategies: Write-Time vs Read-Time

2026-02-14 sohilladhani

#distributed-systems #system-design #architecture #performance #patterns

User posts an update. Do you push it to all followers immediately, or let them pull it when they check? The trade-off shapes your entire architecture.

WebSockets vs Long Polling: Choosing a Real-Time Transport

2026-02-13 sohilladhani

#distributed-systems #architecture #java #performance #system-design

Your client needs real-time updates from the server. HTTP wasn’t built for this. Here’s how long polling, SSE, and WebSockets solve it differently.

Thundering Herd

2026-02-11 sohilladhani

#distributed-systems #caching #performance #system-design #redis

Cache expires. 10,000 requests hit the database simultaneously. Your DB collapses. How request coalescing and probabilistic expiration prevent the stampede.

Structured Logging in Distributed Systems

2026-02-10 sohilladhani

#distributed-systems #observability #logging #system-design #java

Grep through 50 log files to find one request. Or use structured logging with correlation IDs and find it in seconds.

Tail Latency: The P99 Problem

2026-02-08 sohilladhani

#distributed-systems #performance #system-design #observability

Your average latency looks great. Your P99 is a disaster. Why tail latency matters more than averages, and what you can actually do about it.

Ordering Guarantees in Event-Driven Systems

2026-02-07 sohilladhani

#distributed-systems #kafka #messaging #event-driven #system-design

Events arrive out of order. User updates overwrite each other. Here’s how partition keys, sequence numbers, and causal ordering keep things straight.

Dead Letter Queues

2026-02-06 sohilladhani

#distributed-systems #kafka #messaging #resilience #system-design

Your consumer retried a bad message 10,000 times. It will never succeed. Dead letter queues catch the messages that can’t be processed so the rest of your system keeps moving.

Making Consumers Idempotent

2026-02-05 sohilladhani

#distributed-systems #kafka #messaging #idempotency #system-design #java

Exactly-once delivery is impossible across system boundaries. Here’s the pattern that actually works: at-least-once delivery with idempotent consumers.
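
One way to picture the pattern, as a sketch only: the consumer remembers which message IDs it has already handled and skips redeliveries. `IdempotentConsumer` and its in-memory set are assumptions for illustration; a production version would need a durable store for the IDs, ideally updated in the same transaction as the side effect:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

class IdempotentConsumer {
    // IDs of messages whose handler has already run (in memory only, by assumption).
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final Consumer<String> handler;

    IdempotentConsumer(Consumer<String> handler) {
        this.handler = handler;
    }

    // Returns true if the handler ran, false if the message was a duplicate.
    boolean onMessage(String messageId, String payload) {
        if (!processedIds.add(messageId)) {
            return false; // already seen: a redelivery, so do nothing
        }
        try {
            handler.accept(payload);
            return true;
        } catch (RuntimeException e) {
            processedIds.remove(messageId); // let a later redelivery retry
            throw e;
        }
    }
}
```

The broker is still free to deliver a message twice; the second delivery simply has no effect.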

Exactly-Once Delivery is a Lie

2026-02-04 sohilladhani

#distributed-systems #kafka #messaging #idempotency #system-design

Kafka says exactly-once. Your consumer processed the message twice anyway. Here’s why exactly-once is impossible across system boundaries.

Timeouts: The Hardest Easy Problem

2026-02-02 sohilladhani

#distributed-systems #timeouts #resilience #performance #system-design

Why setting timeouts is harder than it looks. Cascading failures, timeout budgets, and the art of picking the right number.

Distributed Locks: When One Process Must Win

2026-02-01 sohilladhani

#distributed-systems #redis #locking #concurrency #system-design

Why distributed locking is harder than it looks. Naive Redis locks, Redlock, fencing tokens, and when to avoid locks entirely.

CRDTs: Data Structures That Never Conflict

2026-01-26 sohilladhani

#distributed-systems #crdt #eventual-consistency #conflict-resolution #system-design

How CRDTs let distributed systems merge updates without coordination. The math that makes ‘conflict-free’ possible.

Gossip Protocols: How Rumors Keep Systems Alive

2026-01-25 sohilladhani

#distributed-systems #gossip-protocol #failure-detection #system-design #architecture

How distributed systems spread information without a central coordinator. The surprisingly effective technique of random peer-to-peer chatter.

Vector Clocks and Lamport Timestamps

2026-01-24 sohilladhani

#distributed-systems #vector-clocks #lamport-timestamps #consistency #system-design

How distributed systems track ‘what happened before what’ without trusting wall clocks. Lamport timestamps for ordering, vector clocks for detecting conflicts.
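
For the Lamport half, a minimal sketch of the two standard rules: increment on every local event or send, and on receive jump past the larger of the local and incoming timestamps. The `LamportClock` class name and its methods are illustrative:

```java
import java.util.concurrent.atomic.AtomicLong;

class LamportClock {
    private final AtomicLong time = new AtomicLong(0);

    // Local event or message send: advance the clock by one and use the result as the timestamp.
    long tick() {
        return time.incrementAndGet();
    }

    // Message receive: move past both our own time and the sender's timestamp.
    long receive(long remoteTimestamp) {
        return time.updateAndGet(local -> Math.max(local, remoteTimestamp) + 1);
    }

    long now() {
        return time.get();
    }
}
```

If event A happened before event B, A's timestamp is smaller. The converse does not hold, which is the gap vector clocks close when you need to detect concurrent updates.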
