How distributed systems spread information without a central coordinator. The surprisingly effective technique of random peer-to-peer chatter.
Posts for: #System-Design
Vector Clocks and Lamport Timestamps
How distributed systems track ‘what happened before what’ without trusting wall clocks. Lamport timestamps for ordering, vector clocks for detecting conflicts.
The In-Memory Trap: Why Objects Are Slow
In-memory doesn’t always mean fast. How shifting from object-based to vector-based storage (Apache Arrow) delivered a 13x performance boost.
Raft: The Understandable Consensus Algorithm
How distributed systems agree on state. A practical look at Raft’s Leader Election and Log Replication, finally making sense of consensus.
The CAP Theorem: The Cliché I Tried to Avoid
Why the CAP Theorem is the most misunderstood rule in system design. Addressing the ‘Pick 2’ lie and how it sets the stage for consensus algorithms.
Distributed Tracing: Finding the Needle in the Haystack
When a request vanishes into a maze of 10 microservices. How Distributed Tracing and OpenTelemetry keep you from going insane during an outage.
Transactional Outbox: Solving the Dual Write Problem
Why your event-driven system is lying to you. Solving the ‘Dual Write’ problem using the Transactional Outbox pattern.
Materialized Views: The Read Optimization Pattern
Why standard views are just aliases and how materialized views act as an ‘in-database cache’ to solve the cross-shard query problem.
Saga Pattern: Managing Distributed Transactions
Why distributed ACID is a trap. Understanding choreography and orchestration sagas for long-running business processes.
Event Sourcing: Events as Source of Truth
Storing events instead of current state. How event sourcing works, rebuilding state from events, and when the complexity is worth it.
CQRS: Separating Reads from Writes
Command Query Responsibility Segregation: why you might want separate models for reading and writing data. When it helps, when it’s overkill, and implementation patterns.
Change Data Capture: Streaming Database Changes
How to capture and stream database changes in real-time. CDC patterns, implementation approaches, and when to use it instead of application-level events.
Two Generals Problem: Why Consensus is Impossible
The thought experiment that proves distributed consensus can’t be guaranteed over unreliable networks. Why acknowledgments create an infinite regress and what it means for real systems.
Database Sharding: Splitting Data Across Machines
How to partition a database across multiple servers. Hash-based vs range-based sharding, rebalancing strategies, and the complexity that comes with it.
Rate Limiting: Token Bucket vs Leaky Bucket
Protecting services from overload with rate limiting. Token bucket and leaky bucket algorithms explained with Java implementations and real-world trade-offs.
Backpressure: When Consumers Can’t Keep Up
Handling slow consumers in distributed systems. Queue growth, memory exhaustion, and strategies for applying backpressure: rejection, rate limiting, and flow control.
Retry Strategies: Exponential Backoff and Jitter
How to retry failed requests without overwhelming servers. Exponential backoff, jitter, and when to give up. Java implementations and real-world patterns.
Idempotency: Why Retries Need It
How to make operations safe to retry. Idempotency keys, database patterns, and why retrying non-idempotent operations causes data corruption.
Session Guarantees: The Promises Your Database Makes to You
Read-your-writes and monotonic reads aren’t just buzzwords. They’re the difference between a database that feels broken and one that makes sense to users.
Horizontal vs Vertical Scaling: Bigger Machine or More Machines
Comparing vertical scaling (scale up) and horizontal scaling (scale out). When to use each, trade-offs, and the complexity that comes with horizontal scaling.
Load Balancing Strategies: Picking the Right Server
Comparing load balancing algorithms: Round Robin, Least Connections, Weighted Round Robin, and IP Hash. Java implementations and real-world trade-offs.
Bloom Filters: Definitely Not Here
Bloom filters skip unnecessary disk reads in LSM trees by saying ‘definitely not here’ with zero false negatives. Learn how Cassandra and RocksDB use them.
Compaction Strategies: Cleaning Up After LSM Trees
LSM trees create SSTables fast but need compaction. Learn size-tiered vs leveled compaction strategies and the write vs read amplification tradeoff.
LSM Trees vs B-Trees: Write Fast or Read Fast
The write-fast or read-fast tradeoff. Learn when to use B-trees (MySQL) vs LSM trees (Cassandra) based on your database workload.
Write-Ahead Logging: How Databases Survive Crashes
How do databases survive crashes and ensure durability? Learn how Write-Ahead Logging (WAL) uses sequential writes to guarantee data persistence without killing performance.
Read Repair and Anti-Entropy: Healing Stale Replicas
How do stale replicas catch up in distributed systems? Compare read repair and anti-entropy strategies with Merkle trees for healing data divergence.
Conflict Resolution: When Two Writes Win
Concurrent writes in distributed databases don’t merge automatically. Learn to detect conflicts with version vectors and resolve them without losing data.
Replication Lag: The Bug That Isn’t a Bug
Users see stale data after writes. It’s not a bug, it’s replication lag. Learn to handle read-after-write problems and causality violations in production.
Consistency Models: What Eventually Means
Eventual consistency doesn’t mean milliseconds. Understand linearizability, causal consistency, and quorum reads to pick the right consistency model.
Secondary Indexes in Distributed Databases
Querying partitioned databases by non-partition keys? Learn the tradeoffs between local and global secondary indexes in distributed systems.