<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Posts on Sohil Ladhani Blog</title><link>https://sohilladhani.com/blog/post/</link><description>Recent content in Posts on Sohil Ladhani Blog</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://sohilladhani.com/blog/post/index.xml" rel="self" type="application/rss+xml"/><item><title>Column-Family Storage</title><link>https://sohilladhani.com/blog/post/2026-04-29-column-family-storage/</link><pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-29-column-family-storage/</guid><description>Your query is always &amp;ldquo;give me all events for user X, sorted by time.&amp;rdquo; A row-oriented database gives you rows where you pay for every column you didn&amp;rsquo;t ask for. Wide-column stores flip the model: you design the schema around your query, not the other way around.
How It Works In a wide-column store like Cassandra or HBase, the primary key has two parts: the partition key and the clustering key.</description></item><item><title>Blue-Green Deployments</title><link>https://sohilladhani.com/blog/post/2026-04-28-blue-green-deployments/</link><pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-28-blue-green-deployments/</guid><description>Deploy the new version. Test it. Switch traffic. If something breaks, switch back. Instant rollback. Sounds ideal. The database migrations are where it gets complicated.
The Pattern Blue-green runs two identical production environments. Blue is live. Green is idle. You deploy your new version to green. You test it against real infrastructure but with no live traffic. When you&amp;rsquo;re confident, you flip the load balancer to point to green. Green is now live.</description></item><item><title>Canary Releases</title><link>https://sohilladhani.com/blog/post/2026-04-27-canary-releases/</link><pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-27-canary-releases/</guid><description>CI passed. Staging tests passed. You&amp;rsquo;ve reviewed the code three times. Then you ship to production and something you never predicted breaks at scale.
What Canary Means A canary release sends a small fraction of real traffic to the new version before switching everyone over. 1% of users hit v2, 99% hit v1. You watch your metrics. If v2 behaves well, you expand: 5%, then 20%, then 100%. If metrics degrade, you route that 1% back to v1 and investigate without affecting anyone else.</description></item><item><title>Feature Flags</title><link>https://sohilladhani.com/blog/post/2026-04-26-feature-flags/</link><pubDate>Sun, 26 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-26-feature-flags/</guid><description>You ship a feature. Three minutes later, on-call pings you: error rate spiked. You need to roll back. A full redeploy takes 20 minutes. With a feature flag, rollback takes 30 seconds.
What a Flag Is A feature flag is a conditional in your code. If the flag is on, the new code path runs. If it&amp;rsquo;s off, the old behavior runs. The flag is a config value read at runtime, not at deploy time.</description></item><item><title>The Sidecar Pattern and Service Mesh</title><link>https://sohilladhani.com/blog/post/2026-04-25-sidecar-pattern-and-service-mesh/</link><pubDate>Sat, 25 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-25-sidecar-pattern-and-service-mesh/</guid><description>Every team writes the same retry logic. The same circuit breaker boilerplate. The same mTLS handshake setup. The platform team changes the retry policy and now has to update 30 services. There&amp;rsquo;s a better way.
The Sidecar Pattern A sidecar is a separate process running in the same pod as your service. It intercepts all network traffic in and out. Your service code is unchanged. The sidecar handles retries, timeouts, circuit breaking, load balancing, and observability.</description></item><item><title>Service Discovery</title><link>https://sohilladhani.com/blog/post/2026-04-24-service-discovery/</link><pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-24-service-discovery/</guid><description>Your service starts. It gets an IP. Three days later it restarts and gets a different IP. Every service that had the old IP hardcoded is now broken. This is why you need service discovery.
The Problem With Static Config In a small system, hardcoding IPs in config files works. Then you move to containers. Containers restart, scale up, scale down. IPs change constantly. You need a way for services to find each other without knowing addresses in advance.</description></item><item><title>API Gateway Patterns</title><link>https://sohilladhani.com/blog/post/2026-04-23-api-gateway-patterns/</link><pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-23-api-gateway-patterns/</guid><description>You have 12 microservices. Every mobile client talks to all 12. Each service handles its own auth, its own rate limiting, its own CORS. Adding a 13th service means updating every client app. The gateway pattern fixes that.
What a Gateway Does An API gateway sits between clients and your services. Clients make one call. The gateway routes it, authenticates the caller, applies rate limits, then proxies to the right service.</description></item><item><title>Feature Stores</title><link>https://sohilladhani.com/blog/post/2026-04-22-feature-stores/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-22-feature-stores/</guid><description>You train a model using yesterday&amp;rsquo;s data. You serve it using today&amp;rsquo;s data. The feature computation logic is slightly different between the two. The model degrades silently and you spend a week figuring out why.
The Training-Serving Skew Problem ML models are trained on offline batches: historical data, features computed via Spark jobs, labels aggregated over time. At serving time, features are computed online: live data, lower latency budget, different code path.</description></item><item><title>Embedding Vectors and ANN Search</title><link>https://sohilladhani.com/blog/post/2026-04-21-embedding-vectors-and-ann-search/</link><pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-21-embedding-vectors-and-ann-search/</guid><description>&amp;ldquo;Find the 10 most similar items to this one&amp;rdquo; sounds simple. With millions of items represented as 256-dimensional vectors, exact search is too slow to be useful in production.
What Embeddings Are An ML model maps an item (a product, a document, a user&amp;rsquo;s history) to a dense numeric vector. The geometry of that vector space encodes semantic similarity: similar items land close together. You train the model on interaction data and the embeddings learn to represent &amp;ldquo;things that users treat similarly.</description></item><item><title>Collaborative Filtering</title><link>https://sohilladhani.com/blog/post/2026-04-20-collaborative-filtering/</link><pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-20-collaborative-filtering/</guid><description>You don&amp;rsquo;t know what a user wants. But you know what people like them have wanted. That&amp;rsquo;s the intuition behind collaborative filtering.
The Two Approaches User-based CF finds users similar to you, then recommends what they liked. Item-based CF finds items similar to what you&amp;rsquo;ve already liked. Item-based is generally more stable because user behavior shifts rapidly (you might buy a couch once), while item similarity changes slowly (a couch is similar to other furniture regardless of who buys it).</description></item><item><title>Token Revocation and Blacklisting</title><link>https://sohilladhani.com/blog/post/2026-04-19-token-revocation-and-blacklisting/</link><pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-19-token-revocation-and-blacklisting/</guid><description>You log out. Your JWT is still valid. The server has no record it was ever issued. This is the stateless token revocation problem.
Why Revocation Is Hard JWTs are stateless by design. The server validates a token by checking the signature and expiry. It doesn&amp;rsquo;t consult a database. This is what makes them fast and scalable. But it means there&amp;rsquo;s no central list of &amp;ldquo;valid tokens&amp;rdquo; to update when a token should no longer be accepted.</description></item><item><title>OAuth 2.0 Authorization Flows</title><link>https://sohilladhani.com/blog/post/2026-04-18-oauth2-authorization-flows/</link><pubDate>Sat, 18 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-18-oauth2-authorization-flows/</guid><description>OAuth 2.0 is not an authentication protocol. It&amp;rsquo;s an authorization protocol. That confusion is the root of most OAuth misuse.
What OAuth Actually Does OAuth lets a user grant a third-party application limited access to their account without sharing their password. The user sees a consent screen listing what the app wants to access. They approve. The app gets a token with exactly those permissions. Your password is seen only by the authorization server, never by the third-party app.</description></item><item><title>JWT and Token-Based Auth</title><link>https://sohilladhani.com/blog/post/2026-04-17-jwt-token-based-auth/</link><pubDate>Fri, 17 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-17-jwt-token-based-auth/</guid><description>The server doesn&amp;rsquo;t remember you. Every request carries proof of who you are. That&amp;rsquo;s the point of a token.
The Structure A JWT is three base64url-encoded segments joined by dots: header, payload, signature. The header says which algorithm signed it. The payload carries claims: user ID, roles, expiry time. The signature is a cryptographic proof that the header and payload haven&amp;rsquo;t been tampered with.
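Verification is pure local computation. A minimal HS256 sketch using only the JDK, assuming token and secretKey are already in scope (a real service should prefer a vetted JWT library):
import javax.crypto.Mac; import javax.crypto.spec.SecretKeySpec; import java.security.MessageDigest; import java.util.Base64;
String[] parts = token.split(&amp;#34;\\.&amp;#34;); // header.payload.signature
Mac mac = Mac.getInstance(&amp;#34;HmacSHA256&amp;#34;); mac.init(new SecretKeySpec(secretKey, &amp;#34;HmacSHA256&amp;#34;));
byte[] expected = mac.doFinal((parts[0] + &amp;#34;.&amp;#34; + parts[1]).getBytes());
boolean valid = MessageDigest.isEqual(expected, Base64.getUrlDecoder().decode(parts[2])); // constant-time compare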
The server doesn&amp;rsquo;t need a database lookup to verify a JWT.</description></item><item><title>Sequenced Writes</title><link>https://sohilladhani.com/blog/post/2026-04-16-sequenced-writes/</link><pubDate>Thu, 16 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-16-sequenced-writes/</guid><description>Two events arrive out of order. You don&amp;rsquo;t know they&amp;rsquo;re out of order. You process them anyway. The system ends up in a state that never should have existed.
Sequence Numbers as the Foundation A global sequence number assigned to every write event is the most direct solution to ordering problems. Event 1, event 2, event 3. If event 6 arrives when you have only seen up to event 4, you know event 5 is missing. You wait, or request a replay, rather than blindly processing forward.</description></item><item><title>Market Data Distribution</title><link>https://sohilladhani.com/blog/post/2026-04-15-market-data-distribution/</link><pubDate>Wed, 15 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-15-market-data-distribution/</guid><description>Every trade generates a tick: a price, a volume, a timestamp. An active stock might generate thousands of ticks per second. Distributing that data to thousands of subscribers simultaneously is its own problem.
What Tick Data Looks Like A tick is small: instrument ID, price, quantity, timestamp. The volume is the problem. During market open or a news event, tick rates spike dramatically. Subscribers range from high-frequency algorithms (latency-sensitive, need every tick) to dashboards (showing &amp;ldquo;current price,&amp;rdquo; don&amp;rsquo;t care about ticks they missed).</description></item><item><title>Order Matching Engine</title><link>https://sohilladhani.com/blog/post/2026-04-14-order-matching-engine/</link><pubDate>Tue, 14 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-14-order-matching-engine/</guid><description>A stock exchange doesn&amp;rsquo;t just record trades. It runs an algorithm that decides which buyer gets matched with which seller. That algorithm is the matching engine, and its design choices are unusually interesting.
The Limit Order Book The core data structure is the limit order book (LOB): two sorted collections of orders, bids (buy orders) and asks (sell orders). Bids are sorted by price descending (highest buyer first), asks by price ascending (lowest seller first).</description></item><item><title>Delivery Receipts and Read Tracking</title><link>https://sohilladhani.com/blog/post/2026-04-13-delivery-receipts-and-read-tracking/</link><pubDate>Mon, 13 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-13-delivery-receipts-and-read-tracking/</guid><description>&amp;ldquo;Sent&amp;rdquo; is not &amp;ldquo;delivered.&amp;rdquo; &amp;ldquo;Delivered&amp;rdquo; is not &amp;ldquo;opened.&amp;rdquo; These are three different states and conflating them causes subtle bugs in badge counts and notification UIs.
The Delivery Gap APNs and FCM give you delivery confirmation at the gateway level, not the device level. You know the gateway accepted your payload. You don&amp;rsquo;t know if the device received it, displayed it, or was offline when it arrived.
For most notifications this is fine.</description></item><item><title>Notification Deduplication</title><link>https://sohilladhani.com/blog/post/2026-04-12-notification-deduplication/</link><pubDate>Sun, 12 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-12-notification-deduplication/</guid><description>Your retry logic fires. The user gets the same notification twice. They think your app is broken. They&amp;rsquo;re not wrong.
The Problem with Retries Push delivery is at-least-once by design. Your server sends to APNs/FCM, the network hiccups, you don&amp;rsquo;t get a response, so you retry. APNs might have delivered the first one. The user now sees two identical alerts.
The fix lives at two levels: your server and the gateway.</description></item><item><title>Push Notification Delivery</title><link>https://sohilladhani.com/blog/post/2026-04-11-push-notification-delivery/</link><pubDate>Sat, 11 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-11-push-notification-delivery/</guid><description>You don&amp;rsquo;t send a push notification directly to a phone. You send it to Apple or Google, and they deliver it for you. That indirection has consequences most backend engineers don&amp;rsquo;t think about until something breaks.
APNs and FCM Apple Push Notification Service (APNs) handles iOS. Firebase Cloud Messaging (FCM) handles Android (and can handle iOS too). Your server maintains a persistent HTTP/2 connection to these gateways and submits payloads. The gateway handles the actual delivery to the device, retries if the device is offline, and tells you when a token is no longer valid.</description></item><item><title>Storage Tiering</title><link>https://sohilladhani.com/blog/post/2026-04-10-storage-tiering/</link><pubDate>Fri, 10 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-10-storage-tiering/</guid><description>Most of your data is accessed once and then never again. Storing it on fast, expensive storage forever is just burning money.
Hot, Warm, Cold The canonical model is three tiers based on access frequency. Hot storage (SSD-backed, high IOPS) handles recent data that&amp;rsquo;s accessed constantly. Warm storage (standard HDD or S3 Standard-IA) holds data accessed occasionally. Cold storage (archival, like Glacier) holds data that might never be touched again but legally must be retained.</description></item><item><title>Delta Sync</title><link>https://sohilladhani.com/blog/post/2026-04-09-delta-sync/</link><pubDate>Thu, 09 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-09-delta-sync/</guid><description>You save a 200 MB file. One word changed. Re-uploading 200 MB to sync that change is absurd. Delta sync is how you avoid it.
The Core Idea Split the file into blocks. On an update, compare the new version&amp;rsquo;s blocks against the stored version&amp;rsquo;s blocks. Transfer only the blocks that changed.
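A fixed-block sketch of that comparison (illustrative, not rsync itself). Its weakness: insert one byte at the front and every later block shifts, so everything looks changed; the rolling checksum described next is what fixes that.
import java.security.MessageDigest; import java.util.*;
static List&amp;lt;Integer&amp;gt; changedBlocks(byte[] oldV, byte[] newV) throws Exception {
  MessageDigest md = MessageDigest.getInstance(&amp;#34;SHA-256&amp;#34;);
  int block = 4096; List&amp;lt;Integer&amp;gt; changed = new ArrayList&amp;lt;&amp;gt;();
  for (int i = 0; i * block &amp;lt; newV.length; i++) {
    byte[] n = md.digest(Arrays.copyOfRange(newV, i * block, Math.min((i + 1) * block, newV.length)));
    byte[] o = i * block &amp;lt; oldV.length ? md.digest(Arrays.copyOfRange(oldV, i * block, Math.min((i + 1) * block, oldV.length))) : null;
    if (o == null || !MessageDigest.isEqual(n, o)) changed.add(i); // block i must be transferred
  }
  return changed;
}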
Rsync pioneered this. It computes a fast rolling checksum for each block on the remote side, sends those checksums to the client; the client finds which local blocks match and which don&amp;rsquo;t, and transmits only the mismatches.</description></item><item><title>Content-Addressable Storage</title><link>https://sohilladhani.com/blog/post/2026-04-08-content-addressable-storage/</link><pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-08-content-addressable-storage/</guid><description>Two users upload the same 50 MB file. Naive storage keeps two copies. Content-addressable storage keeps one.
What &amp;ldquo;Content-Addressable&amp;rdquo; Means Instead of locating data by where it lives (a path, a filename), you locate it by what it is. Hash the content, use the hash as the key. Same content, same hash, same storage location. SHA-256 a file and store the result as its address.
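Computing the address takes a few lines (fileBytes assumed in scope; HexFormat needs Java 17+):
import java.security.MessageDigest; import java.util.HexFormat;
byte[] digest = MessageDigest.getInstance(&amp;#34;SHA-256&amp;#34;).digest(fileBytes);
String address = HexFormat.of().formatHex(digest); // same bytes, same address, stored once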
The practical consequence: deduplication becomes automatic.</description></item><item><title>Offline-First Sync</title><link>https://sohilladhani.com/blog/post/2026-04-07-offline-first-sync/</link><pubDate>Tue, 07 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-07-offline-first-sync/</guid><description>The field rep drove into a dead zone. The mobile app kept working: they filled out three forms, updated two account records, closed a deal. Forty minutes later, connectivity returned and the sync ran. Two of those records had been updated by a desktop user in the meantime. The mobile changes were silently dropped. No error. No prompt. Just gone.
The Core Problem The client operates against a local snapshot while offline.</description></item><item><title>Revision History and Snapshotting</title><link>https://sohilladhani.com/blog/post/2026-04-06-revision-history-and-snapshotting/</link><pubDate>Mon, 06 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-06-revision-history-and-snapshotting/</guid><description>A user hits Ctrl+Z forty times and expects to land exactly where they were yesterday. That is not just undo. That is a complete audit trail of every edit, stored efficiently, queryable at any point in time. The naive approach: store a full copy of the document after every change. Works for ten users. Collapses at ten thousand.
Deltas, Not Copies Instead of storing full document state after every edit, store only what changed: the operation (insert 3 chars at position 12, delete 5 chars at position 20).</description></item><item><title>Operational Transformation</title><link>https://sohilladhani.com/blog/post/2026-04-05-operational-transformation/</link><pubDate>Sun, 05 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-05-operational-transformation/</guid><description>Two users edit the same document simultaneously. User A inserts &amp;ldquo;X&amp;rdquo; at position 5. User B deletes the character at position 3. Apply both naively and the result is corrupted. The positions shifted when B&amp;rsquo;s deletion ran first, and A&amp;rsquo;s insertion lands in the wrong place.
The Position Problem Operations encode positions at generation time, not application time. When document state changes between generation and application, positions are stale. Operational Transformation (OT) transforms an incoming op relative to already-applied ops before executing it.</description></item><item><title>Lambda and Kappa Architecture</title><link>https://sohilladhani.com/blog/post/2026-04-04-lambda-and-kappa-architecture/</link><pubDate>Sat, 04 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-04-lambda-and-kappa-architecture/</guid><description>Real-time results are fast and approximate. Historical results are slow and accurate. The tension between them is where Lambda and Kappa architecture come from.
Lambda: Two Pipelines Lambda runs two parallel systems. The batch layer processes all historical data on a schedule (Spark on HDFS, every few hours) and produces ground truth. The speed layer processes the live stream (Kafka Streams or Flink) for low-latency results. The serving layer merges both: &amp;ldquo;latest batch result plus stream delta since the last batch.</description></item><item><title>Watermarks and Late-Arriving Data</title><link>https://sohilladhani.com/blog/post/2026-04-03-watermarks-and-late-data/</link><pubDate>Fri, 03 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-03-watermarks-and-late-data/</guid><description>There are two clocks in any stream processing system. Event time: when the click actually happened, recorded in the payload. Processing time: when your system received it. On a healthy network they&amp;rsquo;re close. In reality they&amp;rsquo;re not.
Mobile clients buffer events when offline. Retries add delay. A click at 10:00:05 might reach your processor at 10:00:47. The 10:00 window has long since closed.
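In code, the two clocks are just two longs, and window decisions must use the event-time one. A sketch with illustrative names (click, maxEventTimeSeen, and allowedLateness stand in for however a real processor derives its watermark):
long eventTime = click.timestampMillis;           // 10:00:05, stamped at the source
long processingTime = System.currentTimeMillis(); // 10:00:47, when it reaches us
long watermark = maxEventTimeSeen - allowedLateness; // no events older than this are expected
boolean late = eventTime &amp;lt; watermark; // the 10:00 window may already be closed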
The Problem With Never Waiting If you never close a window, you never produce output.</description></item><item><title>Stream Processing Windows</title><link>https://sohilladhani.com/blog/post/2026-04-02-stream-processing-windows/</link><pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-02-stream-processing-windows/</guid><description>Aggregating over an infinite stream sounds easy until you realize you have no idea when it ends. You need to cut it into chunks. That&amp;rsquo;s what windows are.
Three Window Types Tumbling windows are fixed, non-overlapping buckets. &amp;ldquo;Clicks per minute&amp;rdquo; is a tumbling window: minute 1, minute 2, minute 3, no overlap. Simple to implement, but events that span the boundary get split across buckets.
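Bucket assignment for a tumbling window is one integer division; a minimal counting sketch (event is an assumed record with an epoch-millis timestamp):
import java.util.*;
Map&amp;lt;Long, Long&amp;gt; clicksPerMinute = new HashMap&amp;lt;&amp;gt;();
long bucket = event.timestampMillis / 60_000; // minute number since the epoch
clicksPerMinute.merge(bucket, 1L, Long::sum); // same minute, same bucket, no overlap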
Sliding windows overlap. &amp;ldquo;Average clicks in the last 5 minutes, recomputed every minute&amp;rdquo; means each event can appear in up to 5 windows.</description></item><item><title>ZooKeeper Ephemeral Nodes</title><link>https://sohilladhani.com/blog/post/2026-04-01-zookeeper-ephemeral-nodes/</link><pubDate>Wed, 01 Apr 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-04-01-zookeeper-ephemeral-nodes/</guid><description>Redis locks expire after a TTL. If your process crashes, you wait up to 30 seconds for the lock to become available. ZooKeeper takes a different approach: lock it to the session, not a timer.
Ephemeral Nodes ZooKeeper has two kinds of nodes: persistent (survive until explicitly deleted) and ephemeral (automatically deleted when the client session expires). A session is kept alive by a heartbeat. If the client crashes, heartbeats stop, the session expires after a configurable timeout, and the ephemeral node vanishes.</description></item><item><title>The Redlock Algorithm</title><link>https://sohilladhani.com/blog/post/2026-03-31-redlock-algorithm/</link><pubDate>Tue, 31 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-31-redlock-algorithm/</guid><description>A single Redis instance holds your lock. Redis crashes. The lock entry is gone. But your client already received &amp;ldquo;acquired&amp;rdquo; before the crash and is happily running. Another client acquires the same lock on the recovered instance. Two lock holders. The single-instance Redis lock has a fundamental flaw.
Quorum Locking Redlock is Redis creator Antirez&amp;rsquo;s answer. Instead of one Redis, use N independent instances (typically 5). To acquire the lock:</description></item><item><title>Redis Distributed Locks</title><link>https://sohilladhani.com/blog/post/2026-03-30-redis-distributed-locks/</link><pubDate>Mon, 30 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-30-redis-distributed-locks/</guid><description>Two services start the same batch job at the same time. Both read the same data, both process it, both write conflicting results. Your database row lock didn&amp;rsquo;t help because the services are on different JVMs. This is the distributed lock problem.
Why Database Locks Don&amp;rsquo;t Work Here A SELECT FOR UPDATE on a MySQL row holds a lock only for the lifetime of that transaction. Cross-service, that&amp;rsquo;s useless. You&amp;rsquo;d need a shared coordination point, something every instance can talk to.</description></item><item><title>Cache Write Strategies</title><link>https://sohilladhani.com/blog/post/2026-03-29-cache-write-strategies/</link><pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-29-cache-write-strategies/</guid><description>Reading from cache is easy. Writing is where it gets complicated.
Three strategies, each with a different answer to the question: when does the cache get updated relative to the database?
Write-through updates the cache and the database synchronously on every write. The cache is always consistent with the DB. The downside is that every write pays double the cost: serialize the object, write to cache, write to DB, all in the same request path.</description></item><item><title>Hot Key Detection and Mitigation</title><link>https://sohilladhani.com/blog/post/2026-03-28-hot-key-detection/</link><pubDate>Sat, 28 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-28-hot-key-detection/</guid><description>Redis is single-threaded per instance. One key receiving 50,000 reads per second will pin a single CPU core and nothing else on that shard gets processed fast.
This is the hot key problem. Unlike a database where you might add replicas or indexes, a single Redis key is owned by a single shard. Traffic concentration on that key concentrates CPU on that node.
Detection is straightforward: redis-cli --hotkeys scans the keyspace and reports access frequency.</description></item><item><title>Cache Eviction Policies</title><link>https://sohilladhani.com/blog/post/2026-03-27-cache-eviction-policies/</link><pubDate>Fri, 27 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-27-cache-eviction-policies/</guid><description>Cache fills up. Something has to go. The question is: which thing?
LRU (Least Recently Used) evicts whatever was accessed longest ago. Simple, intuitive, fast to implement with a doubly-linked list and hash map. LFU (Least Frequently Used) evicts whatever was accessed least often. More accurate in theory, more expensive in practice.
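That doubly-linked list plus hash map combination ships in the JDK; a minimal LRU sketch:
import java.util.*;
Map&amp;lt;String, byte[]&amp;gt; lru = new LinkedHashMap&amp;lt;String, byte[]&amp;gt;(16, 0.75f, true) {
  // accessOrder=true moves an entry to the tail on every get(); the head is the eviction victim
  protected boolean removeEldestEntry(Map.Entry&amp;lt;String, byte[]&amp;gt; eldest) { return size() &amp;gt; 10_000; }
};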
The LFU decay problem tripped me up: new items start with zero frequency. A fresh key that&amp;rsquo;s about to become hot looks identical to a stale key nobody cares about.</description></item><item><title>Testing Eventually Consistent Systems: When Assertions Need Patience</title><link>https://sohilladhani.com/blog/post/2026-03-26-testing-eventually-consistent-systems/</link><pubDate>Thu, 26 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-26-testing-eventually-consistent-systems/</guid><description>Write to primary. Read from replica. Assert. Fails intermittently. The classic flaky test in distributed systems. It&amp;rsquo;s not a bug in your code. It&amp;rsquo;s a bug in your test: you&amp;rsquo;re testing an eventually consistent system with strong-consistency assertions.
This confused me for longer than I&amp;rsquo;d like to admit.
The Polling Pattern The simplest fix: poll until the assertion passes or a timeout expires.
@Test void userUpdateEventuallyPropagates() { // Write to primary userService.</description></item><item><title>Contract Testing: Verifying Service Interactions Without E2E Tests</title><link>https://sohilladhani.com/blog/post/2026-03-25-contract-testing-in-microservices/</link><pubDate>Wed, 25 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-25-contract-testing-in-microservices/</guid><description>Service A returns a user object. Service B expects that object to have a name field. Team A renames it to fullName. Their tests pass. Team B&amp;rsquo;s tests pass (they mock Service A&amp;rsquo;s response). In production, Service B crashes with a null pointer because name doesn&amp;rsquo;t exist anymore.
End-to-end tests should catch this, right? Maybe, if they&amp;rsquo;re up to date, if they cover this path, if they run in a shared environment.</description></item><item><title>Chaos Engineering: Breaking Things on Purpose</title><link>https://sohilladhani.com/blog/post/2026-03-24-chaos-engineering-and-fault-injection/</link><pubDate>Tue, 24 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-24-chaos-engineering-and-fault-injection/</guid><description>All tests pass. 100% of health checks green. Monitoring looks beautiful. Then a single Redis node goes down, and your checkout flow returns 500s for 20 minutes. Your circuit breaker was configured but never actually triggered in production. It had a bug. You never knew because you never broke Redis on purpose.
Chaos engineering is the practice of deliberately injecting failures to find these gaps before your users do.
The Steady-State Hypothesis Before breaking anything, define what &amp;ldquo;normal&amp;rdquo; looks like.</description></item><item><title>Consumer Group Rebalancing: The Partition Shuffle</title><link>https://sohilladhani.com/blog/post/2026-03-23-consumer-group-rebalancing/</link><pubDate>Mon, 23 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-23-consumer-group-rebalancing/</guid><description>Three consumers, six partitions. Each consumer handles two partitions. Consumer C crashes. Who takes over C&amp;rsquo;s partitions? Both A and B need to know C is gone, agree on the new assignment, and resume processing. This coordination is called rebalancing, and it&amp;rsquo;s one of the most disruptive events in a Kafka consumer group.
The Stop-the-World Problem In the eager (default) rebalancing protocol, when any consumer joins or leaves, ALL consumers stop processing.</description></item><item><title>Log Compaction: Keeping the Latest Without Keeping Everything</title><link>https://sohilladhani.com/blog/post/2026-03-22-log-compaction/</link><pubDate>Sun, 22 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-22-log-compaction/</guid><description>A Kafka topic stores every event ever published. User-42 changed their email 500 times. All 500 events are in the log. A new consumer starting from the beginning has to replay all 500 to figure out the current email. That&amp;rsquo;s wasteful.
Delete old events? You&amp;rsquo;d break consumers who haven&amp;rsquo;t processed them yet. You need a way to keep the latest value for each key while discarding the history.
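In Kafka that mode is a per-topic policy (the next paragraph covers the mechanics). A sketch with the AdminClient, assuming an org.apache.kafka.clients.admin.Admin instance named admin:
import org.apache.kafka.clients.admin.NewTopic; import java.util.*;
NewTopic userEmails = new NewTopic(&amp;#34;user-emails&amp;#34;, 6, (short) 3) // name, partitions, replication
    .configs(Map.of(&amp;#34;cleanup.policy&amp;#34;, &amp;#34;compact&amp;#34;)); // keep the latest value per key
admin.createTopics(List.of(userEmails)).all().get();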
How Log Compaction Works Instead of deleting records by age (retention period), log compaction deletes records by key.</description></item><item><title>Merkle Trees: Detecting Differences Without Comparing Everything</title><link>https://sohilladhani.com/blog/post/2026-03-21-merkle-trees/</link><pubDate>Sat, 21 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-21-merkle-trees/</guid><description>Replica A and Replica B should hold the same 50 million rows. Are they in sync? Comparing every row pair: 50 million comparisons over the network. That&amp;rsquo;s not a sync check, that&amp;rsquo;s a distributed denial of service on your own infrastructure.
Merkle trees compress this to a handful of hash comparisons.
How It Works Split your data into ranges (by key). Hash each range. Then hash pairs of hashes together, building a tree.</description></item><item><title>Quorum Reads and Writes: Tuning Consistency with Math</title><link>https://sohilladhani.com/blog/post/2026-03-20-quorum-reads-and-writes/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-20-quorum-reads-and-writes/</guid><description>You replicate data across 3 nodes for durability. A write comes in. Do you wait for all 3 to confirm? That&amp;rsquo;s slow and any one node going down blocks all writes. Do you confirm after just 1? That&amp;rsquo;s fast, but if that node dies before replicating, the data is gone.
Quorum systems let you pick the balance.
The Quorum Formula With N replicas, a write quorum W is the number of nodes that must acknowledge a write.</description></item><item><title>Push vs Pull Metrics Collection: Two Ways to Get the Numbers</title><link>https://sohilladhani.com/blog/post/2026-03-19-push-vs-pull-metrics-collection/</link><pubDate>Thu, 19 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-19-push-vs-pull-metrics-collection/</guid><description>You have 200 microservices. Each produces metrics. How do those metrics reach your monitoring system? Two fundamentally different approaches, and the choice affects service discovery, failure modes, and scalability.
Pull Model (Prometheus-Style) Each service exposes a /metrics endpoint. The monitoring system knows about all services (via service discovery) and scrapes each one on a schedule: every 15 seconds, hit /metrics, parse the response, store the data.
// Spring Boot Actuator exposes metrics automatically // GET /actuator/prometheus returns: // http_server_requests_seconds_count{method=&amp;#34;GET&amp;#34;,uri=&amp;#34;/api/users&amp;#34;} 1523 // http_server_requests_seconds_sum{method=&amp;#34;GET&amp;#34;,uri=&amp;#34;/api/users&amp;#34;} 45.</description></item><item><title>Downsampling: Keeping Trends, Not Every Data Point</title><link>https://sohilladhani.com/blog/post/2026-03-18-downsampling-and-data-retention/</link><pubDate>Wed, 18 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-18-downsampling-and-data-retention/</guid><description>Your monitoring system stores CPU usage every second. That&amp;rsquo;s 86,400 data points per day per metric. For 1,000 metrics across 200 services, you&amp;rsquo;re generating 17 billion data points per day. Storage isn&amp;rsquo;t free, and nobody will ever look at per-second data from three months ago.
But you can&amp;rsquo;t just delete it. &amp;ldquo;What was our error rate trend last quarter?&amp;rdquo; is a legitimate question. You need the trend without the granularity.</description></item><item><title>Time-Series Databases: Storage Built for Timestamps</title><link>https://sohilladhani.com/blog/post/2026-03-17-time-series-databases/</link><pubDate>Tue, 17 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-17-time-series-databases/</guid><description>Every second, your system emits: CPU usage, memory, request count, error rate, latency percentiles, queue depth. Multiply by 200 services. That&amp;rsquo;s hundreds of thousands of data points per second, all append-only, all timestamped, and you mostly query them by time range.
Regular databases can handle this, technically. But they weren&amp;rsquo;t built for it.
What Makes Time-Series Different The access pattern is extreme. Writes are almost entirely appends: new data comes in, old data never changes.</description></item><item><title>Transcoding Pipelines: Processing Video at Scale</title><link>https://sohilladhani.com/blog/post/2026-03-16-transcoding-pipelines/</link><pubDate>Mon, 16 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-16-transcoding-pipelines/</guid><description>A user uploads a 4K video. Your system needs to produce: 4 resolution variants, 3 audio codec versions, thumbnails at 10-second intervals, and subtitle extraction. That&amp;rsquo;s not one job. That&amp;rsquo;s a directed acyclic graph of dependent tasks.
The Pipeline as a DAG Transcoding isn&amp;rsquo;t a linear process. Some steps depend on others. Some can run in parallel.
graph TD U["Upload: raw video"] --> V["Validate format"] V --> S["Split into segments"] S --> T1["</description></item><item><title>Adaptive Bitrate Streaming: Adjusting Quality on the Fly</title><link>https://sohilladhani.com/blog/post/2026-03-15-adaptive-bitrate-streaming/</link><pubDate>Sun, 15 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-15-adaptive-bitrate-streaming/</guid><description>You&amp;rsquo;re watching a video on your phone. WiFi is strong, so it&amp;rsquo;s crisp 1080p. You walk to the kitchen. Signal weakens. The video buffers for 10 seconds. Terrible experience.
Adaptive bitrate streaming solves this. Instead of one video file, the server has the same video encoded at multiple quality levels. The client measures its bandwidth and switches quality between segments. Bandwidth drops? Next segment loads in 480p. Bandwidth recovers? Back to 1080p.</description></item><item><title>CDN and Edge Caching: Serving Content from Next Door</title><link>https://sohilladhani.com/blog/post/2026-03-14-cdn-and-edge-caching/</link><pubDate>Sat, 14 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-14-cdn-and-edge-caching/</guid><description>A user in Tokyo requests a video hosted in Virginia. Round trip: 150-200ms. Multiply by every segment, every viewer, every concurrent stream. Your origin server melts.
CDNs solve this by copying content to edge servers worldwide. Tokyo users hit the Tokyo edge. Virginia users hit the Virginia edge. Origin only serves cache misses.
Pull vs Push Two strategies for getting content to the edge.
Pull CDN: edge server gets a request, doesn&amp;rsquo;t have the content, fetches from origin, caches it, serves it.</description></item><item><title>Proximity Search: Finding What's Nearby at Scale</title><link>https://sohilladhani.com/blog/post/2026-03-13-proximity-search/</link><pubDate>Fri, 13 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-13-proximity-search/</guid><description>&amp;ldquo;Show me restaurants within 2 km.&amp;rdquo; Simple sentence, hard problem. You can&amp;rsquo;t compute Haversine distance against every row. You need to narrow the candidate set first, then rank by distance.
This is where geohashing and spatial indexes become the query pattern, not just the storage trick.
The Expanding Ring Pattern Start with the user&amp;rsquo;s geohash cell. Query for locations in that cell. Not enough results? Expand to neighboring cells. Still not enough?</description></item><item><title>Quadtrees: When Fixed Grids Aren't Enough</title><link>https://sohilladhani.com/blog/post/2026-03-12-quadtrees-and-spatial-indexing/</link><pubDate>Thu, 12 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-12-quadtrees-and-spatial-indexing/</guid><description>Geohashing divides the world into equal-sized cells. Works great when your data is evenly distributed. But data is never evenly distributed. A geohash cell in downtown Tokyo contains thousands of points. A cell in the Sahara contains zero. You need a structure that subdivides dense areas and leaves sparse areas alone.
How Quadtrees Work Start with a single rectangle covering your entire region. Set a capacity threshold, say 10 points per node.</description></item><item><title>Geohashing: Turning Coordinates into Searchable Strings</title><link>https://sohilladhani.com/blog/post/2026-03-11-geohashing/</link><pubDate>Wed, 11 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-11-geohashing/</guid><description>Your user is at (37.7749, -122.4194). You need the nearest 20 restaurants. Brute-forcing the distance formula against 10 million rows? That&amp;rsquo;s not a query, that&amp;rsquo;s a punishment.
The problem: coordinates are two-dimensional. Database indexes are one-dimensional. You need a way to collapse 2D into 1D while preserving locality.
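The collapse is bit interleaving: alternate one longitude bit and one latitude bit so nearby points share long prefixes. A raw-bits sketch (real geohashes base32-encode these bits; the next paragraph walks through what each bit means):
static String interleavedBits(double lat, double lon, int bits) {
  double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
  StringBuilder sb = new StringBuilder();
  for (int i = 0; i &amp;lt; bits; i++) {
    if (i % 2 == 0) { // even positions halve the longitude range
      double mid = (lonLo + lonHi) / 2;
      if (lon &amp;gt;= mid) { sb.append(&amp;#39;1&amp;#39;); lonLo = mid; } else { sb.append(&amp;#39;0&amp;#39;); lonHi = mid; }
    } else { // odd positions halve the latitude range
      double mid = (latLo + latHi) / 2;
      if (lat &amp;gt;= mid) { sb.append(&amp;#39;1&amp;#39;); latLo = mid; } else { sb.append(&amp;#39;0&amp;#39;); latHi = mid; }
    }
  }
  return sb.toString(); // nearby points share a long common prefix
}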
How Geohashing Works Geohashing recursively divides the world into a grid. Start with the entire map. Split it in half vertically: left half is 0, right half is 1.</description></item><item><title>Work Stealing: Dynamic Load Balancing Without a Coordinator</title><link>https://sohilladhani.com/blog/post/2026-03-10-work-stealing/</link><pubDate>Tue, 10 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-10-work-stealing/</guid><description>You have 1,000 tasks and 4 worker threads. Split evenly: 250 each. Sounds fair. But task sizes aren&amp;rsquo;t uniform. Thread 1 gets 250 tiny tasks and finishes in a second. Thread 3 gets 250 heavy tasks and takes a minute. Threads 1 and 2 sit idle while 3 and 4 grind.
Static partitioning assumes equal task sizes. Work stealing doesn&amp;rsquo;t.
How It Works Each worker has its own deque (double-ended queue).</description></item><item><title>Delayed Message Delivery: Execute This in 30 Minutes</title><link>https://sohilladhani.com/blog/post/2026-03-09-delayed-message-delivery/</link><pubDate>Mon, 09 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-09-delayed-message-delivery/</guid><description>User signs up, you want to send a welcome email in 30 minutes. The obvious approach: Thread.sleep(30 * 60 * 1000). The obvious problem: your server restarts and the task is gone forever.
Delayed execution needs to survive restarts, scale across instances, and handle failures.
Database Polling The simplest durable approach: write the task to a database with an execute_at timestamp. A poller checks every few seconds for due tasks.</description></item><item><title>Leader Election: Picking One Node to Rule</title><link>https://sohilladhani.com/blog/post/2026-03-08-leader-election/</link><pubDate>Sun, 08 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-08-leader-election/</guid><description>You deploy your scheduled job across three instances for high availability. All three wake up at midnight and start the same batch process. Now you have triple the writes, conflicting updates, and a mess to clean up.
You need exactly one node to run the job. The others should wait and take over if it dies.
Lease-Based Election The simplest production approach: use a shared lock with a time limit (a lease).</description></item><item><title>MapReduce: Processing Data That Won't Fit on One Machine</title><link>https://sohilladhani.com/blog/post/2026-03-07-mapreduce/</link><pubDate>Sat, 07 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-07-mapreduce/</guid><description>You need to count word frequencies across 10TB of text. One machine with 16GB RAM can&amp;rsquo;t even load the data. But 100 machines can each handle 100GB. The problem isn&amp;rsquo;t the computation. It&amp;rsquo;s the coordination.
MapReduce gives you a framework: you write two functions, the framework handles the rest.
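For word count, the two functions are tiny. A framework-agnostic sketch (emit stands in for whatever collector the framework hands you):
import java.util.*; import java.util.function.BiConsumer;
// map: one chunk of text in, (word, 1) pairs out
void map(String chunk, BiConsumer&amp;lt;String, Integer&amp;gt; emit) {
  for (String w : chunk.toLowerCase().split(&amp;#34;\\W+&amp;#34;)) emit.accept(w, 1);
}
// reduce: every count for one word in, a single total out
void reduce(String word, List&amp;lt;Integer&amp;gt; counts, BiConsumer&amp;lt;String, Integer&amp;gt; emit) {
  emit.accept(word, counts.stream().mapToInt(Integer::intValue).sum());
}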
The Three Phases Map: Each worker processes its chunk independently. Input: a slice of data. Output: key-value pairs.
Shuffle: The framework groups all values by key and routes them to the right reducer.</description></item><item><title>Trie Data Structures: Prefix Search in Milliseconds</title><link>https://sohilladhani.com/blog/post/2026-03-06-trie-data-structures/</link><pubDate>Fri, 06 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-06-trie-data-structures/</guid><description>User types &amp;ldquo;dis&amp;rdquo;. You need to suggest &amp;ldquo;distributed&amp;rdquo;, &amp;ldquo;discovery&amp;rdquo;, &amp;ldquo;disconnect&amp;rdquo;. With a hash map, you&amp;rsquo;d have to iterate every key and check if it starts with &amp;ldquo;dis&amp;rdquo;. That&amp;rsquo;s O(n). With a trie, it&amp;rsquo;s O(3): walk three nodes down and collect everything below.
How Tries Work A trie is a tree where each node represents a character. The path from root to any node spells a prefix. Nodes marked as &amp;ldquo;end&amp;rdquo; represent complete words.</description></item><item><title>Inverted Indexes: How Search Actually Works</title><link>https://sohilladhani.com/blog/post/2026-03-05-inverted-indexes/</link><pubDate>Thu, 05 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-05-inverted-indexes/</guid><description>You have 50,000 documents. User searches for &amp;ldquo;connection pooling&amp;rdquo;. You could scan every document for those words. That&amp;rsquo;s O(n) per query. At scale, it&amp;rsquo;s unusable.
An inverted index flips the relationship. Instead of asking &amp;ldquo;what words are in this document?&amp;rdquo;, you pre-compute &amp;ldquo;which documents contain this word?&amp;rdquo;
The Structure For each term, maintain a sorted list of document IDs (a posting list).
public class InvertedIndex { private final Map&amp;lt;String, TreeSet&amp;lt;Long&amp;gt;&amp;gt; index = new HashMap&amp;lt;&amp;gt;(); public void addDocument(long docId, String content) { for (String term : tokenize(content)) { index.computeIfAbsent(term, t -&amp;gt; new TreeSet&amp;lt;&amp;gt;()).add(docId); } } }</description></item><item><title>Checkpointing: Resuming Long-Running Jobs Without Starting Over</title><link>https://sohilladhani.com/blog/post/2026-03-04-checkpointing/</link><pubDate>Wed, 04 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-04-checkpointing/</guid><description>Your batch job processes 100,000 records. At record 87,000 it crashes. OOM, network timeout, pod eviction. Without checkpointing, you restart from record 1. With checkpointing, you restart from record 86,000.
The difference between losing three hours of work and losing ten minutes.
The Pattern Periodically save your position to durable storage. On restart, read the last checkpoint and resume from there.
public class CheckpointedProcessor { private final String jobId; public void run() { long startFrom = checkpointRepo.</description></item><item><title>Content Fingerprinting: Detecting Near-Duplicates at Scale</title><link>https://sohilladhani.com/blog/post/2026-03-03-content-fingerprinting/</link><pubDate>Tue, 03 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-03-content-fingerprinting/</guid><description>Two documents differ by one paragraph. They&amp;rsquo;re not identical, so their SHA-256 hashes are completely different. But they&amp;rsquo;re 95% the same content. How do you detect that without comparing every pair?
At scale, pairwise comparison is impossible. A million documents means 500 billion pairs. You need a shortcut.
SimHash The trick: build a hash where similar inputs produce similar outputs. Regular hashes do the opposite (tiny change, completely different hash). SimHash preserves similarity.</description></item><item><title>Priority Queues in Distributed Systems</title><link>https://sohilladhani.com/blog/post/2026-03-02-priority-queues/</link><pubDate>Mon, 02 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-02-priority-queues/</guid><description>You have a message queue. Urgent alerts and bulk data syncs go into the same queue. The urgent alert sits behind 5,000 bulk messages. By the time it&amp;rsquo;s processed, it&amp;rsquo;s no longer urgent.
FIFO doesn&amp;rsquo;t care about importance. Priority queues do.
Multi-Level Priority The simplest approach: multiple queues, one per priority level. Workers check the high-priority queue first.
public Runnable pollNext() { for (Queue&amp;lt;Runnable&amp;gt; queue : List.of(highQueue, mediumQueue, lowQueue)) { Runnable task = queue.poll(); if (task != null) return task; } return null; }</description></item><item><title>Reconciliation: When Your Systems Disagree</title><link>https://sohilladhani.com/blog/post/2026-03-01-reconciliation/</link><pubDate>Sun, 01 Mar 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-03-01-reconciliation/</guid><description>You send an event to an external system. Your database marks it as sent. The external system never received it. Now your internal state is wrong and nobody knows.
This happens in every system that integrates with something outside its boundary. Network blips, missed CDC events, bugs in serialization. Data drifts apart silently.
The Reconciliation Pattern On a schedule, fetch records from both sides and compare.
@Scheduled(cron = &amp;#34;0 0 * * * *&amp;#34;) // every hour public void reconcile() { Set&amp;lt;String&amp;gt; internal = internalRepo.</description></item><item><title>State Machines: Making Distributed Workflows Predictable</title><link>https://sohilladhani.com/blog/post/2026-02-28-state-machines/</link><pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-28-state-machines/</guid><description>You have a workflow: create, process, complete. You model it with boolean flags: isProcessed, isCompleted, isFailed. Then someone asks: can a record be both processed and failed? Your code says yes. Your business logic says no. Welcome to impossible states.
Explicit States, Explicit Transitions Replace flags with a single state field and a set of valid transitions.
public enum WorkflowState { CREATED, PROCESSING, COMPLETED, FAILED; private static final Map&amp;lt;WorkflowState, Set&amp;lt;WorkflowState&amp;gt;&amp;gt; TRANSITIONS = Map.</description></item><item><title>Optimistic vs Pessimistic Concurrency: Locks vs Versions</title><link>https://sohilladhani.com/blog/post/2026-02-27-optimistic-vs-pessimistic-concurrency/</link><pubDate>Fri, 27 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-27-optimistic-vs-pessimistic-concurrency/</guid><description>Two users update the same row at the same time. One of them is going to lose. The question is when they find out.
Pessimistic: Lock First, Ask Questions Later Grab the lock before you read. Nobody else can touch this row until you&amp;rsquo;re done.
START TRANSACTION; SELECT * FROM configurations WHERE id = 42 FOR UPDATE; -- Row is now locked. Other transactions block here. UPDATE configurations SET value = &amp;#39;new_value&amp;#39;, version = version + 1 WHERE id = 42; COMMIT; Safe.</description></item><item><title>Two-Phase Commit: The Original Distributed Transaction</title><link>https://sohilladhani.com/blog/post/2026-02-26-two-phase-commit/</link><pubDate>Thu, 26 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-26-two-phase-commit/</guid><description>You need to write to two databases atomically. Either both succeed or both roll back. No partial state.
Two-phase commit (2PC) solves this. It&amp;rsquo;s also the reason distributed transactions have a bad reputation.
The Protocol A coordinator manages the transaction across participants (databases, services).
Phase 1: Prepare. Coordinator asks each participant: &amp;ldquo;Can you commit?&amp;rdquo; Each participant writes to its WAL, acquires locks, and votes YES or NO.
Phase 2: Commit/Abort. If all voted YES, coordinator sends COMMIT.</description></item><item><title>Input Validation and Abuse Prevention in Distributed Systems</title><link>https://sohilladhani.com/blog/post/2026-02-25-input-validation-and-abuse-prevention/</link><pubDate>Wed, 25 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-25-input-validation-and-abuse-prevention/</guid><description>You build a public API. Users can submit content. Day one, someone submits malicious payloads. Day two, a bot floods your endpoint with 10,000 requests per second. Day three, spam starts showing up in your system.
Every public-facing write path needs defense. Not just authentication. Actual input validation and abuse prevention.
The Layered Defense Pattern Don&amp;rsquo;t put all your validation in one place. Layer it.
Layer 1: Syntactic validation. Is the input well-formed?</description></item><item><title>Approximate Counting: HyperLogLog and Count-Min Sketch</title><link>https://sohilladhani.com/blog/post/2026-02-24-approximate-counting-hyperloglog/</link><pubDate>Tue, 24 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-24-approximate-counting-hyperloglog/</guid><description>Count the unique users who visited a page today. Simple: put every user ID in a HashSet, check the size.
100 million unique users. Each ID is 8 bytes. That&amp;rsquo;s 800MB of memory. For one counter. Now multiply by thousands of pages.
You can&amp;rsquo;t afford exact counting at this scale. But you can get 99.2% accuracy in 12KB.
HyperLogLog The idea is counterintuitive. Hash each item. Look at the binary representation.</description></item><item><title>SLOs and Error Budgets: When Good Enough is a Number</title><link>https://sohilladhani.com/blog/post/2026-02-23-slos-and-error-budgets/</link><pubDate>Mon, 23 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-23-slos-and-error-budgets/</guid><description>&amp;ldquo;We need 100% uptime.&amp;rdquo; No, you don&amp;rsquo;t. And you can&amp;rsquo;t have it anyway.
What you need is a number. A specific, measurable target that tells you when reliability is good enough and when it&amp;rsquo;s not. That&amp;rsquo;s an SLO.
SLI, SLO, SLA SLI (Service Level Indicator): What you measure. Request latency, error rate, availability. Concrete metrics from your structured logs or monitoring system.
SLO (Service Level Objective): What you promise internally. &amp;ldquo;99.</description></item><item><title>Base62 Encoding: Turning Numbers into Short Strings</title><link>https://sohilladhani.com/blog/post/2026-02-22-base62-encoding/</link><pubDate>Sun, 22 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-22-base62-encoding/</guid><description>You have an ID: 192847561038. Twelve digits. Ugly in a URL, hard to share, impossible to remember.
Same number in base62: 3dJ7kP2. Seven characters. Clean, compact, URL-safe.
Base62 encoding is how systems turn large numeric IDs into short, human-friendly strings.
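The encoding itself is repeated divide-and-remainder by 62. A sketch (alphabet order is a per-system choice, which is why the exact output string varies):
static final String ALPHABET = &amp;#34;0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ&amp;#34;;
static String toBase62(long n) {
  StringBuilder sb = new StringBuilder();
  do { sb.append(ALPHABET.charAt((int) (n % 62))); n /= 62; } while (n &amp;gt; 0);
  return sb.reverse().toString(); // 192847561038 comes out at 7 characters
}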
Why Base62 Base10 uses digits 0-9 (10 characters). Base16 (hex) uses 0-9 and a-f (16 characters). Base62 uses 0-9, a-z, and A-Z (62 characters). More characters per position means fewer positions needed.</description></item><item><title>Distributed ID Generation: Snowflake and Friends</title><link>https://sohilladhani.com/blog/post/2026-02-21-distributed-id-generation/</link><pubDate>Sat, 21 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-21-distributed-id-generation/</guid><description>Single database, auto-increment primary key. ID 1, 2, 3, 4. Simple. Unique. Ordered.
Now shard across 4 databases. Each one auto-increments independently. Shard A generates 1, 2, 3. Shard B generates 1, 2, 3. You now have duplicate IDs across the system.
The Options UUID (v4): 128-bit random value. Collision probability is astronomically low. No coordination needed.
String id = UUID.randomUUID().toString(); // &amp;#34;f47ac10b-58cc-4372-a567-0e02b2c3d479&amp;#34; Problem: 36 characters, not sortable by time, poor index performance in MySQL (random values fragment B-tree indexes).</description></item><item><title>Event Aggregation: When 47 Notifications Become One</title><link>https://sohilladhani.com/blog/post/2026-02-20-event-aggregation/</link><pubDate>Fri, 20 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-20-event-aggregation/</guid><description>47 people performed the same action on the same item. Do you show 47 separate notifications? Or &amp;ldquo;47 people did X on Y&amp;rdquo;?
Obviously the second. But getting there in a distributed system is trickier than it looks.
The Raw Event Problem Your system generates individual events. Each one is stored separately because that&amp;rsquo;s how event sourcing and fan-out work. But displaying them individually doesn&amp;rsquo;t scale. A popular item generates hundreds of events.</description></item><item><title>Social Graphs at Scale: Storing Relationships in MySQL</title><link>https://sohilladhani.com/blog/post/2026-02-19-social-graphs-at-scale/</link><pubDate>Thu, 19 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-19-social-graphs-at-scale/</guid><description>User A follows User B. Store it in a table. Done?
CREATE TABLE follows ( follower_id BIGINT NOT NULL, followee_id BIGINT NOT NULL, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (follower_id, followee_id) ); Simple. Until you need to answer two different questions at scale.
Two Queries, One Table &amp;ldquo;Who does User A follow?&amp;rdquo; Easy. Primary key starts with follower_id. MySQL seeks directly.
SELECT followee_id FROM follows WHERE follower_id = 12345; &amp;ldquo;Who follows User B?</description></item><item><title>Relevance Scoring: Why Chronological Order Breaks Down</title><link>https://sohilladhani.com/blog/post/2026-02-18-relevance-scoring/</link><pubDate>Wed, 18 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-18-relevance-scoring/</guid><description>You follow 500 sources. Each posts multiple times a day. You open the app. There are 3,000 new items since your last visit. Sorted by time.
You scroll past 200 items. Maybe 10 were interesting. The rest? Noise.
Chronological order works when the volume is low. Once it&amp;rsquo;s not, you need scoring.
The Scoring Function Every item gets a score. Higher score means higher in the list. The simplest useful model has three signals:</description></item><item><title>Pre-Signed URLs: Uploading Files Without Touching Your Servers</title><link>https://sohilladhani.com/blog/post/2026-02-17-pre-signed-urls/</link><pubDate>Tue, 17 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-17-pre-signed-urls/</guid><description>User uploads a 10MB image. The request hits your API server. Your server reads the entire file into memory, then forwards it to S3. Meanwhile, that server thread is blocked, your memory spikes, and three other requests are waiting.
I&amp;rsquo;ve seen a single bulk upload take down an API server. Not because of any bug, just because it ran out of memory buffering files it didn&amp;rsquo;t need to touch.
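The fix, described next, is to hand the client a short-lived signed URL instead. A sketch with the AWS SDK v2 for Java (the bucket and key names are made up):
S3Presigner presigner = S3Presigner.create();
PutObjectRequest put = PutObjectRequest.builder()
        .bucket(&amp;#34;user-uploads&amp;#34;)     // hypothetical bucket
        .key(&amp;#34;images/abc123.jpg&amp;#34;)   // hypothetical object key
        .build();
PutObjectPresignRequest presignRequest = PutObjectPresignRequest.builder()
        .signatureDuration(Duration.ofMinutes(10)) // the URL stops working after this window
        .putObjectRequest(put)
        .build();
String url = presigner.presignPutObject(presignRequest).url().toString();
// Hand this URL to the client; it PUTs the file straight to S3 and your server never buffers a byte.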
The Pre-Signed URL Pattern Instead of proxying uploads through your server, let the client upload directly to object storage.</description></item><item><title>Presence Systems: Who's Online and How You Know</title><link>https://sohilladhani.com/blog/post/2026-02-16-presence-systems/</link><pubDate>Mon, 16 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-16-presence-systems/</guid><description>Green dot next to a username. Looks simple. Behind it is a distributed system that&amp;rsquo;s constantly guessing whether a user is still connected.
Presence is deceptively hard. It&amp;rsquo;s an inherently eventually consistent problem, and getting it wrong means showing someone as online when they closed their laptop 10 minutes ago.
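At its core, the heartbeat approach described next fits in a few lines (the names and the 30-second timeout are illustrative):
private final Map&amp;lt;String, Instant&amp;gt; lastSeen = new ConcurrentHashMap&amp;lt;&amp;gt;();
private static final Duration TIMEOUT = Duration.ofSeconds(30);

void heartbeat(String userId) {          // called on every client ping
    lastSeen.put(userId, Instant.now());
}

boolean isOnline(String userId) {        // always a guess, never a fact
    Instant t = lastSeen.get(userId);
    return t != null &amp;amp;&amp;amp; Instant.now().isBefore(t.plus(TIMEOUT));
}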
Heartbeat-Based Presence The standard approach: clients send periodic heartbeats. Server tracks the last heartbeat time. If no heartbeat arrives within a timeout window, the user is considered offline.</description></item><item><title>Cursor-Based Pagination: Why Offset Breaks at Scale</title><link>https://sohilladhani.com/blog/post/2026-02-15-cursor-based-pagination/</link><pubDate>Sun, 15 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-15-cursor-based-pagination/</guid><description>Page 1 loads instantly. Page 10 is fine. Page 500? Your API takes 4 seconds. Users on page 1000 give up entirely.
I spent way too long blaming &amp;ldquo;slow queries&amp;rdquo; before I realized the pagination itself was the problem.
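The fix this post builds toward is keyset (cursor) pagination: remember the last row you returned and filter past it, instead of counting rows from the top. A sketch against the messages table from the query below (the (created_at, id) pair is the cursor; id breaks timestamp ties):
// With an index on (created_at, id), MySQL can seek straight to the cursor
// instead of reading and discarding tens of thousands of rows.
String nextPage =
        &amp;#34;SELECT id, body, created_at FROM messages &amp;#34; +
        &amp;#34;WHERE (created_at, id) &amp;lt; (?, ?) &amp;#34; +
        &amp;#34;ORDER BY created_at DESC, id DESC LIMIT 20&amp;#34;;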
Why Offset Pagination Breaks
SELECT * FROM messages ORDER BY created_at DESC LIMIT 20 OFFSET 50000;
MySQL doesn&amp;rsquo;t skip to row 50,000. It reads 50,020 rows, throws away the first 50,000, and returns 20.</description></item><item><title>Fan-Out Strategies: Write-Time vs Read-Time</title><link>https://sohilladhani.com/blog/post/2026-02-14-fan-out-strategies-write-time-vs-read-time/</link><pubDate>Sat, 14 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-14-fan-out-strategies-write-time-vs-read-time/</guid><description>User sends a message to a group of 500 people. Do you write that message into 500 inboxes right now? Or store it once and let each person fetch it when they open the app?
This is the fan-out problem. And the answer changes everything about your storage, latency, and infrastructure costs.
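The write-time option described next is, at its core, one loop (the membership lookup and inbox store are hypothetical):
// Fan-out on write: one append per recipient, at send time.
for (long recipientId : groupMembers(groupId)) { // hypothetical membership lookup
    inboxStore.append(recipientId, message);     // hypothetical per-user inbox store
}
// Reads become trivial: every user queries only their own inbox.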
Fan-Out on Write (Push) When a message arrives, immediately write it to every recipient&amp;rsquo;s inbox. Reads become trivial: just query your own inbox.</description></item><item><title>WebSockets vs Long Polling: Choosing a Real-Time Transport</title><link>https://sohilladhani.com/blog/post/2026-02-13-websockets-vs-long-polling/</link><pubDate>Fri, 13 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-13-websockets-vs-long-polling/</guid><description>Your client needs live updates. New messages, price changes, status notifications. HTTP is request-response. Client asks, server answers. Server can&amp;rsquo;t just push data whenever it wants.
So we hack around it. And the hack you choose matters more than you&amp;rsquo;d think.
Long Polling Client sends a request. Server holds it open until there&amp;rsquo;s new data or a timeout hits. Client gets the response, immediately sends another request. Repeat forever.
@GetMapping(&amp;#34;/updates&amp;#34;)
public DeferredResult&amp;lt;List&amp;lt;Event&amp;gt;&amp;gt; poll(@RequestParam long lastEventId) {
    DeferredResult&amp;lt;List&amp;lt;Event&amp;gt;&amp;gt; result = new DeferredResult&amp;lt;&amp;gt;(30000L);
    eventBroker.</description></item><item><title>Read Replicas: Hidden Consistency Traps</title><link>https://sohilladhani.com/blog/post/2026-02-12-read-replicas-hidden-consistency-traps/</link><pubDate>Thu, 12 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-12-read-replicas-hidden-consistency-traps/</guid><description>User updates their profile name. Page refreshes. Old name is still showing. They click refresh again. New name appears.
Your code is fine. Your database is fine. The read hit a replica that was 200ms behind the primary. Welcome to replication lag.
The Setup Primary handles writes. Replicas handle reads. Replication is asynchronous. There&amp;rsquo;s always a lag, usually milliseconds, sometimes seconds under load.
// Write goes to primary
@Transactional
public void updateProfile(String userId, String newName) {
    userRepo.</description></item><item><title>Thundering Herd</title><link>https://sohilladhani.com/blog/post/2026-02-11-thundering-herd/</link><pubDate>Wed, 11 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-11-thundering-herd/</guid><description>Popular cache key expires. 10,000 requests arrive in the same second. All of them miss the cache. All of them hit the database. Database collapses under the load.
This is the thundering herd. Closely related to the cache stampede I wrote about earlier, but the thundering herd happens at a broader scale. It&amp;rsquo;s not just one key. It&amp;rsquo;s thousands of requests making the same bad decision at the same time.</description></item><item><title>Structured Logging in Distributed Systems</title><link>https://sohilladhani.com/blog/post/2026-02-10-structured-logging-in-distributed-systems/</link><pubDate>Tue, 10 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-10-structured-logging-in-distributed-systems/</guid><description>Production bug. User says checkout failed. You SSH into the server and grep the logs. Nothing useful. The request hit 6 services. The error is in service 4. You&amp;rsquo;re grepping through service 1.
This is how I spent my first year debugging distributed systems. It doesn&amp;rsquo;t scale.
The Problem With Unstructured Logs
log.info(&amp;#34;Processing order &amp;#34; + orderId + &amp;#34; for user &amp;#34; + userId);
log.error(&amp;#34;Failed to process order: &amp;#34; + e.</description></item><item><title>Database Migrations Without Downtime</title><link>https://sohilladhani.com/blog/post/2026-02-09-database-migrations-without-downtime/</link><pubDate>Mon, 09 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-09-database-migrations-without-downtime/</guid><description>ALTER TABLE on a 2M row table. MySQL locks it. Every query queues up behind the lock. Your API returns 503s for two minutes.
I&amp;rsquo;ve done this. In production. On a Friday. Don&amp;rsquo;t be me.
The Problem With Direct Migrations Most schema changes in MySQL acquire a metadata lock. Small tables, no problem. Large tables? That lock blocks reads and writes for the entire duration.
-- Looks innocent. Blocks everything on a large table.</description></item><item><title>Tail Latency: The P99 Problem</title><link>https://sohilladhani.com/blog/post/2026-02-08-tail-latency-the-p99-problem/</link><pubDate>Sun, 08 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-08-tail-latency-the-p99-problem/</guid><description>Your dashboard says average latency is 50ms. Everything looks healthy. But 1% of your users are waiting 3 seconds. Some are timing out entirely.
Averages lie. P99 tells the truth.
Why Averages Hide Problems 100 requests. 99 complete in 40ms. One takes 5 seconds. Average: 89ms. Looks fine. That one user? Furious.
Now add fan-out. Your API calls 5 backend services in parallel. Each service has a 1% chance of being slow.</description></item><item><title>Ordering Guarantees in Event-Driven Systems</title><link>https://sohilladhani.com/blog/post/2026-02-07-ordering-guarantees-in-event-driven-systems/</link><pubDate>Sat, 07 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-07-ordering-guarantees-in-event-driven-systems/</guid><description>User creates an account, then updates it. Your consumer processes the update first. Account doesn&amp;rsquo;t exist yet. Crash.
Order matters. And distributed systems mess it up constantly.
Kafka&amp;rsquo;s Ordering Promise Kafka guarantees ordering within a partition. Not across partitions.
If you have a topic with 8 partitions, messages land on different partitions based on the key. Same key, same partition, same order. Different keys? No ordering guarantee between them.
// Same user always goes to same partition
kafkaTemplate.</description></item><item><title>Dead Letter Queues</title><link>https://sohilladhani.com/blog/post/2026-02-06-dead-letter-queues/</link><pubDate>Fri, 06 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-06-dead-letter-queues/</guid><description>Your consumer retries a message. Fails. Retries again. Fails. Retries 10,000 more times. Still fails.
The message is malformed. It will never succeed. But your consumer doesn&amp;rsquo;t know that. It just keeps retrying, blocking every message behind it.
This is the poison pill problem. And dead letter queues (DLQs) are the fix.
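The shape of the fix, sketched (the DLQ topic name, the retry threshold, and the deliveryCount helper are assumptions):
try {
    process(record);                        // normal handler
} catch (Exception e) {
    if (deliveryCount(record) &amp;gt;= 5) {       // transient failures get a few chances first
        dlqProducer.send(new ProducerRecord&amp;lt;&amp;gt;(&amp;#34;orders.dlq&amp;#34;, record.key(), record.value()));
        // treated as done: the partition keeps moving, the bad message waits for a human
    } else {
        throw e;                            // let normal retry redeliver it
    }
}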
The Poison Pill Not all failures are transient. A database timeout might resolve on retry. A malformed JSON payload never will.</description></item><item><title>Making Consumers Idempotent</title><link>https://sohilladhani.com/blog/post/2026-02-05-making-consumers-idempotent/</link><pubDate>Thu, 05 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-05-making-consumers-idempotent/</guid><description>Last post: exactly-once delivery is impossible across system boundaries. So what actually works?
At-least-once delivery plus idempotent consumers. Kafka keeps delivering until you commit. Messages might repeat, but they won&amp;rsquo;t get lost. Your consumer handles duplicates gracefully.
Database-Level Deduplication The trick: store the message ID in the same transaction as your business logic.
@Transactional
public void processOrder(OrderEvent event) {
    String messageId = event.getMessageId();
    // Unique constraint on message_id prevents duplicates
    try {
        processedMessageRepo.</description></item><item><title>Exactly-Once Delivery is a Lie</title><link>https://sohilladhani.com/blog/post/2026-02-04-exactly-once-delivery-is-a-lie/</link><pubDate>Wed, 04 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-04-exactly-once-delivery-is-a-lie/</guid><description>Kafka says &amp;ldquo;exactly-once.&amp;rdquo; Your vendor promises &amp;ldquo;exactly-once.&amp;rdquo; The conference talk slides say &amp;ldquo;exactly-once.&amp;rdquo;
Your consumer just processed the same message twice. Customer got two emails. Order shipped twice.
The Three Guarantees
At-most-once: Fire and forget. Message might get lost.
At-least-once: Retry until acknowledged. Might deliver duplicates.
Exactly-once: Delivered exactly one time. The holy grail.
Here&amp;rsquo;s the thing: exactly-once works within a system. It&amp;rsquo;s impossible across systems.
The Boundary Problem When Kafka says &amp;ldquo;exactly-once,&amp;rdquo; they mean within Kafka.</description></item><item><title>Graceful Shutdown: Dying Without Dropping Requests</title><link>https://sohilladhani.com/blog/post/2026-02-03-graceful-shutdown/</link><pubDate>Tue, 03 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-03-graceful-shutdown/</guid><description>You deploy a new version. Kubernetes kills the old pod. A user&amp;rsquo;s request was mid-flight. They see a 502.
Your deploy just caused an error. Not a bug in your code. Just bad timing.
Graceful shutdown is the fix. Stop accepting new work, finish what you started, then die.
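In plain Java the skeleton looks like this (the server and pool objects are stand-ins for whatever framework you run):
Runtime.getRuntime().addShutdownHook(new Thread(() -&amp;gt; {
    server.stopAcceptingNewRequests();                    // hypothetical: close the listener first
    server.awaitInFlightRequests(Duration.ofSeconds(25)); // hypothetical: drain what already started
    connectionPool.close();                               // release resources, then exit
}));
Frameworks increasingly do this for you; Spring Boot, for instance, enables it with server.shutdown=graceful.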
The Kill Sequence When Kubernetes (or any orchestrator) wants to stop your pod:
SIGTERM: &amp;ldquo;Please shut down.&amp;rdquo; Your app should start cleanup.</description></item><item><title>Timeouts: The Hardest Easy Problem</title><link>https://sohilladhani.com/blog/post/2026-02-02-timeouts/</link><pubDate>Mon, 02 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-02-timeouts/</guid><description>Service A calls Service B. Service B is slow. How long should A wait?
Too short: A gives up on requests that would have succeeded. Users see errors. Retries pile on. B gets hammered.
Too long: A&amp;rsquo;s threads block waiting. Connection pool drains. A becomes slow. A&amp;rsquo;s callers timeout. The slowness spreads.
There&amp;rsquo;s no safe default. And yet I&amp;rsquo;ve seen codebases with no timeouts at all.
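Setting them is the easy part. With the JDK&amp;rsquo;s java.net.http client it is two builder calls (the numbers are placeholders, not recommendations):
HttpClient client = HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(2))    // how long to wait for the TCP connection
        .build();
HttpRequest request = HttpRequest.newBuilder(URI.create(&amp;#34;https://service-b/api&amp;#34;))
        .timeout(Duration.ofSeconds(5))           // total time A is willing to wait on B
        .build();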
The No-Timeout Trap No timeout means infinite wait.</description></item><item><title>Distributed Locks: When One Process Must Win</title><link>https://sohilladhani.com/blog/post/2026-02-01-distributed-locks/</link><pubDate>Sun, 01 Feb 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-02-01-distributed-locks/</guid><description>You have a cron job that sends daily reports. You deploy to 5 servers. Now you have 5 cron jobs sending 5 reports.
You need a lock. Only one server should run the job.
On a single machine, this is easy. synchronized in Java. threading.Lock in Python. But your servers don&amp;rsquo;t share memory. They need to agree on who holds the lock.
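The usual answer is shared infrastructure. With Redis (Jedis client here) it is one atomic command: set-if-absent with an expiry, so a crashed holder cannot keep the lock forever (the key name, TTL, and job method are illustrative):
String token = UUID.randomUUID().toString();     // identifies this holder
String reply = jedis.set(&amp;#34;lock:daily-report&amp;#34;, token,
        SetParams.setParams().nx().px(30_000));  // NX: only if absent, PX: 30s expiry
if (&amp;#34;OK&amp;#34;.equals(reply)) {
    try {
        sendDailyReport();                       // hypothetical job
    } finally {
        // release by deleting only if the value is still our token, ideally via a Lua script
    }
}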
Welcome to distributed locking. It&amp;rsquo;s harder than it looks.</description></item><item><title>Connection Pooling: Why Opening Connections Is Expensive</title><link>https://sohilladhani.com/blog/post/2026-01-31-connection-pooling/</link><pubDate>Sat, 31 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-31-connection-pooling/</guid><description>Every database query needs a connection. Open connection, run query, close connection. Simple.
Except opening a connection is expensive. TCP handshake. TLS negotiation. Authentication. Protocol setup. For MySQL, this can take 20-50ms. Your query might only take 2ms.
You&amp;rsquo;re spending 10x more time connecting than querying.
The Pool A connection pool keeps connections open and ready. Instead of open-query-close, you borrow-query-return.
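With HikariCP the whole pattern is a config object plus try-with-resources (the URL and pool size are illustrative); the sequence below shows the borrow/return cycle:
HikariConfig config = new HikariConfig();
config.setJdbcUrl(&amp;#34;jdbc:mysql://localhost:3306/app&amp;#34;);
config.setMaximumPoolSize(10);                 // the &amp;#34;10 connections ready&amp;#34; below
HikariDataSource pool = new HikariDataSource(config);

try (Connection conn = pool.getConnection()) { // borrow
    // run queries on conn
}                                              // close() returns it to the pool, nothing is torn down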
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#000000','primaryTextColor':'#00ff00','primaryBorderColor':'#00ff00','lineColor':'#00ff00','secondaryColor':'#000000','tertiaryColor':'#000000'}}}%% sequenceDiagram autonumber participant App as Application participant Pool as Connection Pool participant DB as Database Note over Pool: 10 connections ready App->>Pool: Borrow connection Pool-->>App: Here's connection #3 App->>DB: SELECT * FROM users DB-->>App: Results App->>Pool: Return connection #3 Note over Pool: Connection #3 available again The pool handles the lifecycle.</description></item><item><title>Multi-Level Caching: L1, L2, and Beyond</title><link>https://sohilladhani.com/blog/post/2026-01-30-multi-level-caching/</link><pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-30-multi-level-caching/</guid><description>You added Redis. Latency dropped from 50ms to 5ms. Great.
But now every request still makes a network call to Redis. What if you could skip even that?
Enter multi-level caching. Multiple cache layers, each faster than the last.
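The read path is a chain of fallbacks, sketched here with Caffeine as L1 and a Jedis-style Redis client as L2 (the database loader is hypothetical); the hierarchy is diagrammed below:
Cache&amp;lt;String, String&amp;gt; l1 = Caffeine.newBuilder()
        .maximumSize(10_000)
        .expireAfterWrite(Duration.ofSeconds(30)) // short TTL: local copies go stale silently
        .build();

String get(String key) {
    String v = l1.getIfPresent(key);
    if (v != null) return v;            // L1 hit: no network at all
    v = redis.get(key);                 // L2 hit: one fast network hop
    if (v == null) {
        v = loadFromDatabase(key);      // hypothetical slow path
        redis.setex(key, 300, v);
    }
    l1.put(key, v);
    return v;
}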
The Hierarchy %%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#000000','primaryTextColor':'#00ff00','primaryBorderColor':'#00ff00','lineColor':'#00ff00','secondaryColor':'#000000','tertiaryColor':'#000000'}}}%% graph TD R[Request] --> L1{L1: Local Cache} L1 -->|Hit| R1[~0.1ms] L1 -->|Miss| L2{L2: Distributed Cache} L2 -->|Hit| R2[~2-5ms] L2 -->|Miss| DB[(Database)] DB --> R3[~50-200ms] style R fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff style L1 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff style L2 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff style DB fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff style R1 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff style R2 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff style R3 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff L1: Local/In-Process Cache Lives in your application&amp;rsquo;s memory.</description></item><item><title>Cache Stampede: When Expiry Causes Chaos</title><link>https://sohilladhani.com/blog/post/2026-01-29-cache-stampede/</link><pubDate>Thu, 29 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-29-cache-stampede/</guid><description>Your cache is humming along. A popular key expires. 10,000 requests arrive in the next second. All of them miss the cache. All of them hit your database. Simultaneously.
Your database falls over. Requests timeout. Users see errors. You just experienced a cache stampede.
Also called the thundering herd problem. And it&amp;rsquo;s bitten me more than once.
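One common in-process guard, before we get to why it happens: let a single caller recompute each key while everyone else waits on the same future (cache and reloadAndCache are stand-ins; this protects one instance, not a fleet):
private final ConcurrentHashMap&amp;lt;String, CompletableFuture&amp;lt;String&amp;gt;&amp;gt; inFlight =
        new ConcurrentHashMap&amp;lt;&amp;gt;();

String get(String key) {
    String cached = cache.getIfPresent(key);
    if (cached != null) return cached;
    return inFlight.computeIfAbsent(key, k -&amp;gt;
            CompletableFuture.supplyAsync(() -&amp;gt; reloadAndCache(k))  // only the first caller runs this
                    .whenComplete((v, e) -&amp;gt; inFlight.remove(k)))    // clean up, success or failure
            .join();
}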
Why It Happens %%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#000000','primaryTextColor':'#00ff00','primaryBorderColor':'#00ff00','lineColor':'#00ff00','secondaryColor':'#000000','tertiaryColor':'#000000'}}}%% sequenceDiagram autonumber participant R1 as Request 1 participant R2 as Request 2 participant R3 as Request 1000 participant C as Cache participant DB as Database Note over C: Key expires at T=0 R1->>C: GET popular_key C-->>R1: MISS R2->>C: GET popular_key C-->>R2: MISS R3->>C: GET popular_key C-->>R3: MISS R1->>DB: Query R2->>DB: Query R3->>DB: Query Note over DB: 1000 identical queriesDatabase overloaded The gap between &amp;ldquo;cache miss&amp;rdquo; and &amp;ldquo;cache repopulated&amp;rdquo; is the danger zone.</description></item><item><title>Cache Invalidation: The Hard Problem</title><link>https://sohilladhani.com/blog/post/2026-01-28-cache-invalidation/</link><pubDate>Wed, 28 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-28-cache-invalidation/</guid><description>&amp;ldquo;There are only two hard things in Computer Science: cache invalidation and naming things.&amp;rdquo;
I used to think this was a joke. Then I shipped a bug where users saw stale prices for 6 hours.
The cache was working perfectly. That was the problem.
Why Is It Hard? You have data in two places: database and cache. When the database changes, the cache needs to know. Sounds simple. It&amp;rsquo;s not.</description></item><item><title>Caching Patterns: Cache-Aside, Write-Through, and Friends</title><link>https://sohilladhani.com/blog/post/2026-01-27-caching-patterns/</link><pubDate>Tue, 27 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-27-caching-patterns/</guid><description>Your database is slow. You add a cache. Problem solved?
Not quite. Now you have two copies of the same data. When do you update the cache? When do you update the database? What if they disagree?
These questions led to four patterns. Each makes different trade-offs.
Cache-Aside (Lazy Loading) The most common pattern. Application talks to cache and database separately.
Read path:
Check cache
If hit, return data
If miss, read from database
Store in cache
Return data
Write path:</description></item><item><title>CRDTs: Data Structures That Never Conflict</title><link>https://sohilladhani.com/blog/post/2026-01-26-crdts-conflict-free-replicated-data-types/</link><pubDate>Mon, 26 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-26-crdts-conflict-free-replicated-data-types/</guid><description>Two users add items to the same shopping cart. On different servers. At the same time. Network is down between them.
Later, the network heals. Now what?
With vector clocks, you can detect that both writes happened concurrently. But detection isn&amp;rsquo;t resolution. Someone still has to decide what the final cart looks like.
What if the data structure itself knew how to merge?
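Here is the smallest version of that idea, a grow-only counter: each node increments only its own slot, and merging takes the per-node maximum, so the result is the same no matter how often or in what order replicas sync:
Map&amp;lt;String, Long&amp;gt; a = new HashMap&amp;lt;&amp;gt;();           // replica on node A
Map&amp;lt;String, Long&amp;gt; b = new HashMap&amp;lt;&amp;gt;();           // replica on node B

a.merge(&amp;#34;node-a&amp;#34;, 1L, Long::sum);                    // increment while partitioned
b.merge(&amp;#34;node-b&amp;#34;, 1L, Long::sum);                    // concurrent increment on the other side

b.forEach((node, n) -&amp;gt; a.merge(node, n, Math::max)); // merge: entrywise max, commutative and idempotent
long total = a.values().stream().mapToLong(Long::longValue).sum();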
The Idea CRDTs (Conflict-free Replicated Data Types) are data structures designed so that any two states can always be merged into a consistent result.</description></item><item><title>Gossip Protocols: How Rumors Keep Systems Alive</title><link>https://sohilladhani.com/blog/post/2026-01-25-gossip-protocols/</link><pubDate>Sun, 25 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-25-gossip-protocols/</guid><description>You have 100 servers. One of them just died. How do the other 99 find out?
Option 1: Every server pings every other server. That&amp;rsquo;s 100 × 99 = 9,900 health checks. Every few seconds. Your network melts.
Option 2: Central coordinator tracks everyone. Single point of failure. We&amp;rsquo;ve seen how that ends.
Option 3: Servers gossip.
The Party Analogy Imagine a party with 100 people. You learn a secret. You don&amp;rsquo;t announce it to the whole room.</description></item><item><title>Vector Clocks and Lamport Timestamps</title><link>https://sohilladhani.com/blog/post/2026-01-24-vector-clocks-and-lamport-timestamps/</link><pubDate>Sat, 24 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-24-vector-clocks-and-lamport-timestamps/</guid><description>What time is it?
On your laptop, easy question. In a distributed system, it&amp;rsquo;s a trap.
I used to assume timestamps would save me. Server A says event happened at 10:00:01, Server B says 10:00:02. A happened first. Done.
Then I learned that clocks drift. Server A&amp;rsquo;s clock might be 3 seconds ahead of Server B. Now your &amp;ldquo;ordering&amp;rdquo; is garbage. The event that actually happened second looks like it happened first.</description></item><item><title>The In-Memory Trap: Why Objects Are Slow</title><link>https://sohilladhani.com/blog/post/2026-01-23-the-in-memory-trap-why-objects-are-slow/</link><pubDate>Fri, 23 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-23-the-in-memory-trap-why-objects-are-slow/</guid><description>I used to think that as soon as data hit the cache, the performance battle was over. In my head, RAM was the ultimate speed limit. But while building a metadata engine here in Hyderabad, I hit a wall that proved me wrong.
We designed the system to support two access patterns. First, fetching a specific record by its ID. Second, filtering across thousands of records by a specific attribute. Initially, we stored everything as standard Objects.</description></item><item><title>Raft: The Understandable Consensus Algorithm</title><link>https://sohilladhani.com/blog/post/2026-01-22-raft-consensus-algorithm/</link><pubDate>Thu, 22 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-22-raft-consensus-algorithm/</guid><description>We just talked about the CAP theorem. Specifically, how sometimes you have to be CP (Consistent, Partition-tolerant). But being CP is hard. It means when things go sideways, you might have to stop and wait.
Waiting for what? Waiting for everyone to agree on what happened. That’s Consensus.
Getting nodes in a distributed system to agree on a single value or state is incredibly tricky. You saw the Two Generals Problem back on Jan 13th – a fundamental impossibility with unreliable networks.</description></item><item><title>The CAP Theorem: The Cliché I Tried to Avoid</title><link>https://sohilladhani.com/blog/post/2026-01-21-cap-theorem-the-cliche-we-get-wrong/</link><pubDate>Wed, 21 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-21-cap-theorem-the-cliche-we-get-wrong/</guid><description>I’ve avoided writing about the CAP Theorem for weeks.
If you’ve spent more than ten minutes on a system design blog, you’ve seen the triangle. Consistency, Availability, Partition Tolerance. Pick two. It’s the ultimate cliché of the industry.
The problem is that the &amp;ldquo;Pick 2&amp;rdquo; rule is a lie. It makes it sound like you have a choice, when in reality, the laws of physics have already made the choice for you.</description></item><item><title>Distributed Tracing: Finding the Needle in the Haystack</title><link>https://sohilladhani.com/blog/post/2026-01-20-distributed-tracing-finding-the-needle/</link><pubDate>Tue, 20 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-20-distributed-tracing-finding-the-needle/</guid><description>A user reports that their checkout is slow.
You check the logs for the Order service. Everything looks green. You check the Payment service. It’s also fine. You check the Inventory service. No errors there either.
Somewhere in that chain of five services, a request is hanging for 10 seconds. But because every service only sees its own little world, you’re blind.
This is why you need Distributed Tracing.
The Trace ID: A Digital Passport In a monolith, you have a stack trace.</description></item><item><title>Transactional Outbox: Solving the Dual Write Problem</title><link>https://sohilladhani.com/blog/post/2026-01-19-transactional-outbox-pattern/</link><pubDate>Mon, 19 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-19-transactional-outbox-pattern/</guid><description>You have a bug in your microservices, but you just haven&amp;rsquo;t found it yet.
It usually looks like this:
@Transactional
public void completeOrder(Order order) {
    orderRepo.save(order);                      // Step 1: Update DB
    kafka.send(&amp;#34;order-completed&amp;#34;, order); // Step 2: Tell the world
}
This works 99.9% of the time. But that 0.1%? That’s where your data dies. This is the Dual Write Problem.
Why Your Events Are &amp;ldquo;Ghosts&amp;rdquo; You are writing to two different things: a Database and a Message Broker.</description></item><item><title>Materialized Views: The Read Optimization Pattern</title><link>https://sohilladhani.com/blog/post/2026-01-18-materialized-views-in-distributed-databases/</link><pubDate>Sun, 18 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-18-materialized-views-in-distributed-databases/</guid><description>Standard database views are often a disappointment.
You write this beautiful, complex SQL with six JOINs and three aggregations. You save it as a VIEW. You think, &amp;ldquo;Great, now it&amp;rsquo;s fast.&amp;rdquo;
It&amp;rsquo;s not. A standard view is just a macro. The database still runs that nightmare query every single time you call it.
If you want speed, you need a Materialized View.
The &amp;ldquo;Cache&amp;rdquo; Inside Your DB A materialized view is basically the result of a query, saved to disk as a physical table.</description></item><item><title>Saga Pattern: Managing Distributed Transactions</title><link>https://sohilladhani.com/blog/post/2026-01-17-saga-pattern-for-distributed-transactions/</link><pubDate>Sat, 17 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-17-saga-pattern-for-distributed-transactions/</guid><description>Distributed transactions are a trap.
In a monolith, it&amp;rsquo;s easy. You wrap everything in a @Transactional block. If the payment fails, the order doesn&amp;rsquo;t get created. Atomicity is free.
In microservices, you don&amp;rsquo;t have that luxury. You can&amp;rsquo;t start a transaction in the Order service and have it magically span across the Payment and Inventory services.
This is where you need a Saga.
What is a Saga? A Saga is just a sequence of local transactions.</description></item><item><title>Event Sourcing: Events as Source of Truth</title><link>https://sohilladhani.com/blog/post/2026-01-16-event-sourcing/</link><pubDate>Fri, 16 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-16-event-sourcing/</guid><description>Traditional databases store current state. Order total: $100. User balance: $500.
You don&amp;rsquo;t know how you got there. History is lost.
Event sourcing stores the events. State is derived.
Current State vs Event Log
Traditional (state-based):
CREATE TABLE accounts (
  id INT,
  balance DECIMAL,
  updated_at TIMESTAMP
);
-- Only current state
SELECT balance FROM accounts WHERE id = 123;
-- Returns: 500
Balance is 500. But you don&amp;rsquo;t know:
Was it 600 yesterday?</description></item><item><title>CQRS: Separating Reads from Writes</title><link>https://sohilladhani.com/blog/post/2026-01-15-cqrs-separating-reads-from-writes/</link><pubDate>Thu, 15 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-15-cqrs-separating-reads-from-writes/</guid><description>Same model for reads and writes. Works fine until it doesn&amp;rsquo;t.
Writes need normalized schema. Reads need denormalized, fast queries. One model can&amp;rsquo;t optimize both.
CQRS splits them.
What is CQRS Command Query Responsibility Segregation. Fancy name for: separate write model from read model.
Traditional approach:
class UserService {
    public void updateUser(User user) {
        userRepository.save(user);                        // Write
    }
    public User getUser(Long id) {
        return userRepository.findById(id).orElseThrow(); // Read (findById returns an Optional)
    }
}
Same User entity, same database schema for both.
Option 1: Application publishes events. Fragile. Bugs skip events.
Option 2: Read database changes directly. Change Data Capture.
What is CDC Change Data Capture streams database changes (inserts, updates, deletes) as events. Other systems subscribe to this stream.
Example: User updates email in database. CDC captures:
{ &amp;#34;operation&amp;#34;: &amp;#34;UPDATE&amp;#34;, &amp;#34;table&amp;#34;: &amp;#34;users&amp;#34;, &amp;#34;before&amp;#34;: {&amp;#34;id&amp;#34;: 123, &amp;#34;email&amp;#34;: &amp;#34;old@example.com&amp;#34;}, &amp;#34;after&amp;#34;: {&amp;#34;id&amp;#34;: 123, &amp;#34;email&amp;#34;: &amp;#34;new@example.</description></item><item><title>Two Generals Problem: Why Consensus is Impossible</title><link>https://sohilladhani.com/blog/post/2026-01-13-two-generals-problem-why-consensus-is-impossible/</link><pubDate>Tue, 13 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-13-two-generals-problem-why-consensus-is-impossible/</guid><description>Two armies need to attack together. They communicate via messengers through enemy territory. Messengers might get captured.
Can they guarantee a coordinated attack? No. Provably impossible.
This is the Two Generals Problem. It explains why distributed systems are fundamentally hard.
The Setup General A and General B are on opposite sides of an enemy. They must attack simultaneously to win. If only one attacks, they lose.
The only way to communicate: send messengers through enemy territory.</description></item><item><title>Database Sharding: Splitting Data Across Machines</title><link>https://sohilladhani.com/blog/post/2026-01-12-database-sharding-splitting-data-across-machines/</link><pubDate>Mon, 12 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-12-database-sharding-splitting-data-across-machines/</guid><description>Single database hits limits. 1 billion users, 10 TB data. Can&amp;rsquo;t fit on one machine.
Split the data. But how you split determines whether queries are fast or slow.
Why Shard Vertical scaling limits: Can&amp;rsquo;t buy infinite RAM. 512GB is expensive, still not enough.
Query throughput: One server handles 10,000 queries/sec. Need 100,000? Need more servers.
Storage limits: Disk full. Need more space.
Solution: partition data across multiple servers (shards). Each shard handles a subset of the data.
Rate limiting prevents this.
Why Rate Limiting Protect your service from:
Abusive clients (intentional or buggy) Traffic spikes you can&amp;rsquo;t handle DDoS attacks Expensive operations draining resources Without rate limiting, one bad client kills service for everyone.
Token Bucket Algorithm Bucket holds tokens. Each request consumes one token. Tokens refill at fixed rate.
Rules:
Bucket capacity: 100 tokens
Refill rate: 10 tokens/second
Request arrives: Check if token available.</description></item><item><title>Backpressure: When Consumers Can't Keep Up</title><link>https://sohilladhani.com/blog/post/2026-01-10-backpressure-when-consumers-cant-keep-up/</link><pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-10-backpressure-when-consumers-cant-keep-up/</guid><description>Producer sends 1000 messages/second. Consumer processes 100 messages/second. Queue grows. Memory fills. System crashes.
This is the backpressure problem.
The Problem Fast producer, slow consumer. Messages pile up in between. Eventually you run out of memory or disk.
Real scenario: Kafka producer sending click events at 10,000/sec. Consumer writing to database at 1,000/sec. Lag grows to millions of messages. Consumer falls hours behind real-time.
You need to slow down the producer or speed up the consumer.</description></item><item><title>Retry Strategies: Exponential Backoff and Jitter</title><link>https://sohilladhani.com/blog/post/2026-01-09-retry-strategies-exponential-backoff-and-jitter/</link><pubDate>Fri, 09 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-09-retry-strategies-exponential-backoff-and-jitter/</guid><description>Request fails. You retry immediately. Fails again. Retry immediately. Server gets hammered. Everything crashes.
Naive retries make outages worse.
The Problem with Immediate Retries Service goes down for 2 seconds. 10,000 clients hit timeout. All retry immediately. Service comes back up, gets hit with 10,000 simultaneous requests. Dies again.
This is a retry storm. Your retries prevent the service from recovering.
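The fix described next, in miniature (attempt is the retry counter; the base, cap, and full-jitter choice are tunable assumptions):
long baseMs = 1_000, capMs = 30_000;
long ceiling = Math.min(capMs, baseMs * (1L &amp;lt;&amp;lt; attempt));       // 1s, 2s, 4s, ... capped at 30s
long sleepMs = ThreadLocalRandom.current().nextLong(ceiling + 1); // full jitter spreads the herd
Thread.sleep(sleepMs);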
Exponential Backoff Wait longer between each retry. First retry after 1 second.</description></item><item><title>Idempotency: Why Retries Need It</title><link>https://sohilladhani.com/blog/post/2026-01-08-idempotency-why-retries-need-it/</link><pubDate>Thu, 08 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-08-idempotency-why-retries-need-it/</guid><description>Network fails mid-request. Did the payment go through? You don&amp;rsquo;t know. So you retry.
Now the user is charged twice.
This is why idempotency matters.
What Idempotency Means An operation is idempotent if doing it multiple times has the same effect as doing it once.
Idempotent:
SET balance = 100 (run 10 times, balance is still 100)
DELETE user WHERE id = 5 (run 10 times, user still deleted once)
GET /user/123 (reads don&amp;rsquo;t change state)
Not idempotent:</description></item><item><title>Session Guarantees: The Promises Your Database Makes to You</title><link>https://sohilladhani.com/blog/post/2026-01-07-session-guarantees-the-promises-your-database-makes-to-you/</link><pubDate>Wed, 07 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-07-session-guarantees-the-promises-your-database-makes-to-you/</guid><description>I spent two days debugging why users were reporting &amp;ldquo;lost data&amp;rdquo; in a distributed KV store I was building. Turns out, nothing was lost. The data was there. Just&amp;hellip; not where the user expected it.
The problem: user writes &amp;ldquo;value2&amp;rdquo;, immediately reads back &amp;ldquo;value1&amp;rdquo;. From their perspective, the database ate their write. From the database&amp;rsquo;s perspective, everything&amp;rsquo;s working perfectly - the write went to the leader, the read hit a follower that hadn&amp;rsquo;t caught up yet.</description></item><item><title>Horizontal vs Vertical Scaling: Bigger Machine or More Machines</title><link>https://sohilladhani.com/blog/post/2026-01-06-horizontal-scaling-vs-vertical-scaling/</link><pubDate>Tue, 06 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-06-horizontal-scaling-vs-vertical-scaling/</guid><description>System is slow. Two options: bigger machine or more machines.
Vertical scaling is simple. Horizontal scaling is complex. Pick wrong and you&amp;rsquo;ll regret it.
Vertical Scaling (Scale Up) Add more resources to a single machine. More CPU, more RAM, bigger disk.
Database running out of memory? Upgrade from 16GB to 128GB RAM. Query execution slow? Add more CPU cores. Disk I/O bottleneck? Switch to NVMe SSDs.
Pros:
Simple. No code changes.</description></item><item><title>Circuit Breakers: Failing Fast to Stay Alive</title><link>https://sohilladhani.com/blog/post/2026-01-05-circuit-breakers-failing-fast-to-stay-alive/</link><pubDate>Mon, 05 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-05-circuit-breakers-failing-fast-to-stay-alive/</guid><description>Service A calls Service B. Service B is down. Service A keeps retrying. Now both services are down.
Circuit breakers prevent this cascade.
The Problem You have a microservice calling an external payment API. API goes down. Your service waits for timeout (say 30 seconds) on each request. Threads pile up waiting. Request queue grows. Memory fills up. Your service crashes.
The retry storm makes it worse. When external API tries to recover, it gets hammered by backed-up retries.</description></item><item><title>Load Balancing Strategies: Picking the Right Server</title><link>https://sohilladhani.com/blog/post/2026-01-04-load-balancing-strategies/</link><pubDate>Sun, 04 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-04-load-balancing-strategies/</guid><description>You have 5 servers. Request comes in. Which one handles it?
Seems simple until you realize the algorithm determines whether your system handles load gracefully or collapses under traffic spikes.
Why Load Balancing Matters Single server hits capacity (CPU, memory, connections). Solution: add more servers, distribute requests.
But naive distribution fails. Send equal traffic to all servers? Server 1 might be processing expensive queries while Server 2 sits idle. You need smarter routing.</description></item><item><title>Bloom Filters: Definitely Not Here</title><link>https://sohilladhani.com/blog/post/2026-01-03-bloom-filters/</link><pubDate>Sat, 03 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-03-bloom-filters/</guid><description>LSM trees have a problem. A read might need to check 10 different SSTables to find a key. That&amp;rsquo;s 10 disk reads.
Bloom filters fix this. They tell you &amp;ldquo;this SSTable definitely doesn&amp;rsquo;t have your key&amp;rdquo; without reading it.
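The API surface is tiny. With Guava&amp;rsquo;s implementation, for instance (the size and error rate are illustrative):
BloomFilter&amp;lt;Long&amp;gt; keys = BloomFilter.create(Funnels.longFunnel(), 1_000_000, 0.01);
keys.put(12345L);                  // done for every key when the SSTable is written

if (!keys.mightContain(99999L)) {
    // definitely not in this SSTable: skip the disk read entirely
}
// a true answer can still be a false positive (~1% here), so you verify on disk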
The Problem You query user_id=12345. LSM tree has 10 SSTables. Without any optimization, you read all 10 files, merge results. 9 of those SSTables don&amp;rsquo;t have that user. You wasted 9 disk reads.</description></item><item><title>Compaction Strategies: Cleaning Up After LSM Trees</title><link>https://sohilladhani.com/blog/post/2026-01-02-compaction-strategies/</link><pubDate>Fri, 02 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-02-compaction-strategies/</guid><description>LSM trees write fast by creating new SSTables. After a while, you have hundreds of them. Reads check every single one.
Compaction merges SSTables into fewer, larger files. But how you merge matters.
The Problem Write 1000 records. That&amp;rsquo;s SSTable 1. Update 500 of them. That&amp;rsquo;s SSTable 2 with the new values. Delete 200. That&amp;rsquo;s SSTable 3 with tombstones.
Now a read has to check all three SSTables, merge the results, apply tombstones.</description></item><item><title>LSM Trees vs B-Trees: Write Fast or Read Fast</title><link>https://sohilladhani.com/blog/post/2026-01-01-lsm-trees-vs-b-trees-read-first-or-write-fast/</link><pubDate>Thu, 01 Jan 2026 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2026-01-01-lsm-trees-vs-b-trees-read-first-or-write-fast/</guid><description>Most databases use B-trees. Some use LSM trees. The choice determines whether writes or reads are fast.
You can&amp;rsquo;t optimize both.
B-Trees: Read-Optimized B-tree keeps data sorted on disk in a tree structure. Each node is a page (typically 4KB). Insert or update? Find the right page, modify it in place, write it back.
Reads are fast. Single lookup finds your data. Range scans are efficient because data is already sorted and contiguous.</description></item><item><title>Write-Ahead Logging: How Databases Survive Crashes</title><link>https://sohilladhani.com/blog/post/2025-12-31-write-ahead-logging-how-databases-survive-crashes/</link><pubDate>Wed, 31 Dec 2025 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2025-12-31-write-ahead-logging-how-databases-survive-crashes/</guid><description>Database crashes mid-write. Power fails. Server dies. When it restarts, how does it know what happened?
Write-Ahead Logging.
The Problem Random writes are slow. Disk seeks take around 10ms. If you write every change directly to disk at random locations, throughput tanks.
In-memory updates are fast but volatile. Crash before flushing to disk? Data gone.
You need durability without killing performance.
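The trick, sketched with plain java.nio (the record bytes and the apply step are stand-ins):
FileChannel log = FileChannel.open(Path.of(&amp;#34;wal.log&amp;#34;),
        StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);

log.write(ByteBuffer.wrap(recordBytes)); // sequential append: no seeks
log.force(true);                         // fsync: once this returns, a crash cannot lose the record
applyInMemory(recordBytes);              // hypothetical: now update the fast in-memory state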
How WAL Works Every change gets appended to a log file first.</description></item><item><title>Read Repair and Anti-Entropy: Healing Stale Replicas</title><link>https://sohilladhani.com/blog/post/2025-12-30-read-repair-anti-entropy/</link><pubDate>Tue, 30 Dec 2025 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2025-12-30-read-repair-anti-entropy/</guid><description>Your quorum read succeeds. Two out of three replicas responded with the latest data. But replica 3 is stale. How does it catch up?
Two ways: fix it when you notice, or fix everything constantly.
Read Repair Client reads from multiple replicas, detects one has stale data (older version number or timestamp), writes the newest value back to the stale replica. Happens during normal reads.
Works great for hot data. Frequently accessed keys stay in sync.</description></item><item><title>Conflict Resolution: When Two Writes Win</title><link>https://sohilladhani.com/blog/post/2025-12-29-conflict-resolution-when-two-writes-win/</link><pubDate>Mon, 29 Dec 2025 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2025-12-29-conflict-resolution-when-two-writes-win/</guid><description>I thought concurrent writes would somehow merge automatically. They don&amp;rsquo;t. When two nodes accept writes to the same record simultaneously, the database doesn&amp;rsquo;t magically resolve it. You have to.
Avoiding Conflicts Easiest solution: don&amp;rsquo;t allow them.
Single leader replication. All writes to a key go through one node. No concurrent writes possible. This is what most systems do because conflict resolution is hard.
If you must use multi-leader, partition carefully. User A&amp;rsquo;s data always writes to datacenter 1, User B to datacenter 2.</description></item><item><title>Replication Lag: The Bug That Isn't a Bug</title><link>https://sohilladhani.com/blog/post/2025-12-28-replication-lag-the-bug-that-isnt-a-bug/</link><pubDate>Sun, 28 Dec 2025 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2025-12-28-replication-lag-the-bug-that-isnt-a-bug/</guid><description>In one of the companies I worked at, we had this intermittent bug. A service would process a request, immediately query its status, and see stale data. &amp;ldquo;System&amp;rsquo;s dropping requests!&amp;rdquo; the team said.
Logs showed the write succeeded. Database confirmed the data was there. Took me embarrassingly long to realize: not a bug. Replication lag.
The write went to the leader. The user&amp;rsquo;s next read hit a follower that hadn&amp;rsquo;t caught up yet.</description></item><item><title>Consistency Models: What Eventually Means</title><link>https://sohilladhani.com/blog/post/2025-12-27-consistency-model-what-eventually-means/</link><pubDate>Sat, 27 Dec 2025 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2025-12-27-consistency-model-what-eventually-means/</guid><description>I used to think &amp;ldquo;eventual consistency&amp;rdquo; meant &amp;ldquo;maybe a few milliseconds.&amp;rdquo; Turns out it can mean seconds, minutes, or even longer.
The lightbulb moment: consistency models aren&amp;rsquo;t about what data you store. They&amp;rsquo;re about what guarantees you make when someone reads it.
The Spectrum Linearizability (strongest guarantee): After a write completes, all reads see that value. It&amp;rsquo;s like everyone&amp;rsquo;s looking at the same whiteboard. Expensive but simple to reason about.</description></item><item><title>Query Execution Plans: Reading EXPLAIN Like a Map</title><link>https://sohilladhani.com/blog/post/2025-12-26-query-execution-plans-reading-explain-explain-like-a-map/</link><pubDate>Fri, 26 Dec 2025 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2025-12-26-query-execution-plans-reading-explain-explain-like-a-map/</guid><description>I used to run EXPLAIN on slow queries, stare at the output, and have no idea what I was looking at. It felt like reading hieroglyphics.
Then I realized: it&amp;rsquo;s just a map. Maps show you the route. EXPLAIN shows you the database&amp;rsquo;s route through your query.
What EXPLAIN Actually Shows When you prefix a query with EXPLAIN, MySQL doesn&amp;rsquo;t execute it. Instead, it shows you its execution plan: the strategy it will use.</description></item><item><title>Secondary Indexes in Distributed Databases</title><link>https://sohilladhani.com/blog/post/2025-12-25-secondary-indexes-in-distributed-databases/</link><pubDate>Thu, 25 Dec 2025 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2025-12-25-secondary-indexes-in-distributed-databases/</guid><description>You partition your database by user_id for scalability. Now someone asks: &amp;ldquo;Find all users in Ahmedabad city.&amp;rdquo; Problem: Ahmedabad users are scattered across all partitions.
This is the secondary index problem in distributed systems.
The Core Problem
Partitioned by user_id:
Server A (0-999): user_100 (amit, Ahmedabad)
Server B (1000-1999): user_1500 (vijay, Morbi)
Server C (2000-2999): user_2500 (narendra, Ahmedabad)
Query by user_id=1500? Route straight to Server B. Fast.
Query by city='Ahmedabad'? Users on Server A and C.</description></item><item><title>The Hidden Cost of JOINs</title><link>https://sohilladhani.com/blog/post/2025-12-24-the-hidden-cost-of-joins/</link><pubDate>Wed, 24 Dec 2025 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2025-12-24-the-hidden-cost-of-joins/</guid><description>Every JOIN you add doesn&amp;rsquo;t just fetch more data, it multiplies your query&amp;rsquo;s complexity.
Early in my career, I wrote a query joining 5 tables to generate a result. Ran fine in the local machine with test data. In the test machine with real data? 12 seconds. The problem wasn&amp;rsquo;t the JOIN itself, it was not understanding the cost.
How JOINs Multiply Cost
-- Example table sizes:
-- orders: 1M rows
-- customers: 100K rows
-- products: 10K rows
SELECT * FROM orders JOIN customers ON orders.</description></item><item><title>Indexing Strategies That Actually Work</title><link>https://sohilladhani.com/blog/post/2025-12-23-indexing-strategies-that-actually-work/</link><pubDate>Tue, 23 Dec 2025 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2025-12-23-indexing-strategies-that-actually-work/</guid><description>When I was a junior developer, I added 5 indexes to a table thinking it would speed things up. Instead, I made queries 3x slower.
If you have experience with indexes, you know that more indexes != faster queries.
Indexes have costs:
Write overhead: Every INSERT/UPDATE maintains all indexes
Storage cost: Indexes consume disk space
Query planner confusion: Too many options can lead to poor choices
The Right Approach
1. Index cardinality matters</description></item><item><title>Virtual Nodes: The Three-Layer Pattern of Consistent Hashing</title><link>https://sohilladhani.com/blog/post/2025-12-22-virtual-nodes-the-three-layer-pattern-of-consistent-hashing/</link><pubDate>Mon, 22 Dec 2025 00:00:00 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2025-12-22-virtual-nodes-the-three-layer-pattern-of-consistent-hashing/</guid><description>Virtual nodes in consistent hashing confused me for years. I understood the benefits but not the mechanics. Here&amp;rsquo;s the mental model that made it click.
The standard explanation goes like this: &amp;ldquo;Virtual nodes minimize data movement when servers change. We place multiple vnodes on the hash ring and assign them to physical servers.&amp;rdquo;
True, but hand-wavy. The breakthrough came when I understood it as three distinct layers:
Layer 1: Application -&amp;gt; Vnode (Fixed) The application hashes keys to virtual nodes.</description></item><item><title>The Query Optimization Framework</title><link>https://sohilladhani.com/blog/post/2025-12-21-the-query-optimization-framework/</link><pubDate>Sun, 21 Dec 2025 00:53:18 +0530</pubDate><guid>https://sohilladhani.com/blog/post/2025-12-21-the-query-optimization-framework/</guid><description>Most engineers guess at performance problems. Here&amp;rsquo;s a better way.
Early in my career, when a query was slow, I&amp;rsquo;d just start changing things. Add an index. Rewrite the JOIN. Change the WHERE clause order. Sometimes it worked. Usually it didn&amp;rsquo;t. I was debugging by intuition, not by data.
After trying few things, I came up with a framework that has worked for me:
1. Measure First: Establish how slow &amp;ldquo;slow&amp;rdquo; is.</description></item><item><title>How to create a Media Server out of a router</title><link>https://sohilladhani.com/blog/post/2016-04-05-how-to-create-a-media-server-out-of-a-router/</link><pubDate>Tue, 05 Apr 2016 15:42:05 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2016-04-05-how-to-create-a-media-server-out-of-a-router/</guid><description>Hello folks. I&amp;rsquo;m here with yet another tutorial. This time, we are going to create a media server out of a router. Sounds cool, doesn&amp;rsquo;t it? Let&amp;rsquo;s do it then.
Before proceeding, I want you to go through the prerequisites for this tutorial. First of all, your router should have OpenWrt installed. You can install it by following links like this. Secondly, your router should have a USB port.</description></item><item><title>How I managed to deploy a 2 node ceph cluster</title><link>https://sohilladhani.com/blog/post/2016-03-29-how-i-managed-to-deploy-a-2-node-ceph-cluster/</link><pubDate>Mon, 28 Mar 2016 18:52:31 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2016-03-29-how-i-managed-to-deploy-a-2-node-ceph-cluster/</guid><description>As a part of a course called Data Storage Technology and Networks in BITS Pilani – Hyderabad Campus, I took up a project to integrate Ceph Storage Cluster with OpenStack. To integrate both of them, we first need to deploy a Ceph Storage Cluster on more than 1 machine (we will use 2 machines for the purpose). This blog post will give you the exact steps on how to do that.
Before starting, let me tell you that deploying a Ceph Cluster on 2 nodes is just for learning purposes.</description></item><item><title>Import GPG key in CentOS 7</title><link>https://sohilladhani.com/blog/post/2016-03-25-import-gpg-key-in-centos-7/</link><pubDate>Fri, 25 Mar 2016 10:02:51 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2016-03-25-import-gpg-key-in-centos-7/</guid><description>I was trying to deploy a ceph cluster on a CentOS 7 machine and while following the steps mentioned on this page, I ran into the following error:
You have enabled checking of packages via GPG keys. This is a good thing. However, you do not have any GPG public keys installed. You need to download the keys for packages you wish to install and install them. You can do that by running the command: rpm --import public.</description></item><item><title>Yet Another Network Controller (Part 2) – Running YANC</title><link>https://sohilladhani.com/blog/post/2016-03-13-yet-another-network-controller-part-2-running-yanc/</link><pubDate>Sun, 13 Mar 2016 05:25:44 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2016-03-13-yet-another-network-controller-part-2-running-yanc/</guid><description>Once you have set up yanc using this link, we now need to run the yanc filesystem, yanc-of-adapter, and mininet.
Go to the yanc folder and run the following command:
$ ./yanc -f /net
This will mount the yanc filesystem under the /net directory.
Open a new terminal. Go to /apps/of-adapter and run the following command:
$ ./yanc-of-adapter /net unix:path=/var/run/dbus/system_bus_socket -vvv
This will start yanc-of-adapter at port 6633 on localhost. It will use unix:path=/var/run/dbus/system_bus_socket as the D-Bus for IPC (read about D-Bus here).</description></item><item><title>Yet Another Network Controller (Part 1) – Getting started</title><link>https://sohilladhani.com/blog/post/2016-02-23-yet-another-network-controller-part-1-getting-started/</link><pubDate>Tue, 23 Feb 2016 10:12:29 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2016-02-23-yet-another-network-controller-part-1-getting-started/</guid><description>So this is a blog post (after a very long time) explaining how to start Yet Another Network Controller (yanc – https://github.com/ngn-colorado/yanc) on your Linux system. Clone the yanc repository to your local machine if you haven&amp;rsquo;t already.
$ git clone https://github.com/ngn-colorado/yanc.git
$ cd yanc/
$ make
$ sudo mkdir /net
$ sudo chown &amp;lt;user&amp;gt;:&amp;lt;group&amp;gt; /net
$ ./yanc -f /net
This has started the yanc filesystem with /net as its mountpoint.</description></item><item><title>How To Setup Your Own Web Server with or without a Network Router</title><link>https://sohilladhani.com/blog/post/2012-12-05-how-to-setup-your-own-web-server-with-or-without-anetwork-router/</link><pubDate>Wed, 05 Dec 2012 12:55:39 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2012-12-05-how-to-setup-your-own-web-server-with-or-without-anetwork-router/</guid><description>All the web developers out there know the thrill of developing a website or a web-service. And their main objective is always to make their end users happy. But they must first ensure that their websites/services are available to the end users.
This tutorial will help you make your website/service available to your end users without purchasing server space on some remote server, as your personal computer will act as the web server.</description></item><item><title>My First 'Self-Made' Swing Application</title><link>https://sohilladhani.com/blog/post/2012-11-28-my-first-self-made-swing-application/</link><pubDate>Wed, 28 Nov 2012 09:11:48 +0000</pubDate><guid>https://sohilladhani.com/blog/post/2012-11-28-my-first-self-made-swing-application/</guid><description>I had always wanted to create a Swing app by myself, ever since I got to know that Swing was included in the subject Advance Java Technology (or AJT). And I never imagined that it’d be part of the first ever blog post of my life! I’m currently in the 7th semester of my Computer Engineering degree at Shankersinh Vaghela Bapu Institute of Technology.
So I started learning the basics of the Swing API and gradually started adding components to it.</description></item></channel></rss>