Why setting timeouts is harder than it looks. Cascading failures, timeout budgets, and the art of picking the right number.
Posts for: #Distributed-Systems
Distributed Locks: When One Process Must Win
Why distributed locking is harder than it looks. Naive Redis locks, Redlock, fencing tokens, and when to avoid locks entirely.
CRDTs: Data Structures That Never Conflict
How CRDTs let distributed systems merge updates without coordination. The math that makes ‘conflict-free’ possible.
Gossip Protocols: How Rumors Keep Systems Alive
How distributed systems spread information without a central coordinator. The surprisingly effective technique of random peer-to-peer chatter.
Vector Clocks and Lamport Timestamps
How distributed systems track ‘what happened before what’ without trusting wall clocks. Lamport timestamps for ordering, vector clocks for detecting conflicts.
Raft: The Understandable Consensus Algorithm
How distributed systems agree on state. A practical look at Raft’s Leader Election and Log Replication, finally making sense of consensus.
The CAP Theorem: The Cliché I Tried to Avoid
Why the CAP Theorem is the most misunderstood rule in system design. Addressing the ‘Pick 2’ lie and how it sets the stage for consensus algorithms.
Distributed Tracing: Finding the Needle in the Haystack
When a request vanishes into a maze of 10 microservices. How Distributed Tracing and OpenTelemetry keep you from going insane during an outage.
Materialized Views: The Read Optimization Pattern
Why standard views are just aliases and how materialized views act as an ‘in-database cache’ to solve the cross-shard query problem.
Two Generals Problem: Why Consensus is Impossible
The thought experiment that proves distributed consensus can’t be guaranteed over unreliable networks. Why acknowledgments create infinite regress and what it means for real systems.
Database Sharding: Splitting Data Across Machines
How to partition a database across multiple servers. Hash-based vs range-based sharding, rebalancing strategies, and the complexity that comes with it.
Rate Limiting: Token Bucket vs Leaky Bucket
Protecting services from overload with rate limiting. Token bucket and leaky bucket algorithms explained with Java implementations and real-world trade-offs.
Backpressure: When Consumers Can’t Keep Up
Handling slow consumers in distributed systems. Queue growth, memory exhaustion, and strategies for applying backpressure: rejection, rate limiting, and flow control.
Retry Strategies: Exponential Backoff and Jitter
How to retry failed requests without overwhelming servers. Exponential backoff, jitter, and when to give up. Java implementations and real-world patterns.
Idempotency: Why Retries Need It
How to make operations safe to retry. Idempotency keys, database patterns, and why retrying non-idempotent operations causes data corruption.
Session Guarantees: The Promises Your Database Makes to You
Read-your-writes and monotonic reads aren’t just buzzwords. They’re the difference between a database that feels broken and one that makes sense to users.
Horizontal vs Vertical Scaling: Bigger Machine or More Machines
Comparing vertical scaling (scale up) and horizontal scaling (scale out). When to use each, trade-offs, and the complexity that comes with horizontal scaling.
Circuit Breakers: Failing Fast to Stay Alive
How circuit breakers prevent cascading failures in microservices. State transitions, Java implementation with Resilience4j, and real-world thresholds.
Load Balancing Strategies: Picking the Right Server
Comparing load balancing algorithms: Round Robin, Least Connections, Weighted Round Robin, and IP Hash. Java implementations and real-world trade-offs.
Read Repair and Anti-Entropy: Healing Stale Replicas
How do stale replicas catch up in distributed systems? Compare read repair and Merkle-tree-based anti-entropy as strategies for healing data divergence.
Conflict Resolution: When Two Writes Win
Concurrent writes in distributed databases don’t merge automatically. Learn to detect conflicts with version vectors and resolve them without losing data.
Replication Lag: The Bug That Isn’t a Bug
Users see stale data after writes. It’s not a bug, it’s replication lag. Learn to handle read-after-write problems and causality violations in production.
Consistency Models: What Eventually Means
Eventual consistency doesn’t mean milliseconds. Understand linearizability, causal consistency, and quorum reads to pick the right consistency model.
Secondary Indexes in Distributed Databases
Querying partitioned databases by non-partition keys? Learn the tradeoffs between local and global secondary indexes in distributed systems.
Virtual Nodes: The Three-Layer Pattern of Consistent Hashing
Understand virtual nodes in consistent hashing through a simple three-layer model that decouples data distribution from server topology in distributed systems.