Storage Tiering

Most of your data is accessed once and then never again. Storing it on fast, expensive storage forever is just burning money.

Hot, Warm, Cold

The canonical model is three tiers based on access frequency. Hot storage (SSD-backed, high IOPS) handles recent data that’s accessed constantly. Warm storage (standard HDD or S3 Standard-IA) holds data accessed occasionally. Cold storage (archival, like Glacier) holds data that might never be touched again but legally must be retained.
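A minimal sketch of how a tiering decision could look, assuming tiers are chosen purely by time since last access. The 30-day and 180-day thresholds and the `pick_tier` name are hypothetical, picked only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; real systems tune these per workload.
HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=180)


def pick_tier(last_access: datetime, now: datetime) -> str:
    """Map an object's last-access time to 'hot', 'warm', or 'cold'."""
    age = now - last_access
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"


now = datetime(2024, 6, 1)
recent = pick_tier(datetime(2024, 5, 25), now)    # accessed a week ago
occasional = pick_tier(datetime(2024, 2, 1), now)  # accessed months ago
archived = pick_tier(datetime(2022, 1, 1), now)    # untouched for years
```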
[Read more]

Delta Sync

You save a 200 MB file. One word changed. Re-uploading 200 MB to sync that change is absurd. Delta sync is how you avoid it.

The Core Idea

Split the file into blocks. On an update, compare the new version’s blocks against the stored version’s blocks. Transfer only the blocks that changed. Rsync pioneered this. The remote side computes a fast rolling checksum (plus a stronger hash) for each block of its copy and sends those checksums to the client. The client slides the same rolling checksum over its local file to find which blocks the remote already has, then transmits only the data that doesn’t match.
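Here is a simplified sketch of the block-comparison idea. It is not rsync: it compares blocks at fixed offsets with SHA-256 and has no rolling checksum, so an insertion that shifts every later byte would defeat it. The tiny block size and the function names are assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, so the example is easy to read


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each fixed-size block of `data`."""
    return [
        hashlib.sha256(data[start:start + block_size]).hexdigest()
        for start in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block index: block bytes} for blocks of `new` that differ from `old`."""
    old_hashes = block_hashes(old, block_size)
    delta = {}
    for index, start in enumerate(range(0, len(new), block_size)):
        block = new[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(old_hashes) or old_hashes[index] != digest:
            delta[index] = block
    return delta


def apply_delta(old: bytes, delta: dict[int, bytes], new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new version from the old bytes plus the transferred blocks."""
    blocks = []
    for index, start in enumerate(range(0, new_length, block_size)):
        blocks.append(delta.get(index, old[start:start + block_size]))
    return b"".join(blocks)


old_version = b"aaaabbbbcccc"
new_version = b"aaaaXbbbcccc"
delta = changed_blocks(old_version, new_version)
rebuilt = apply_delta(old_version, delta, len(new_version))
```

Only one of the three blocks ends up in `delta`, which is the whole point: the transfer cost tracks the size of the change, not the size of the file.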
[Read more]

Content-Addressable Storage

Two users upload the same 50 MB file. Naive storage keeps two copies. Content-addressable storage keeps one.

What “Content-Addressable” Means

Instead of locating data by where it lives (a path, a filename), you locate it by what it is. Hash the content, use the hash as the key. Same content, same hash, same storage location. SHA-256 a file and store the result as its address. The practical consequence: deduplication becomes automatic.
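A minimal in-memory sketch of the idea, assuming a dictionary stands in for the real storage backend. The `ContentStore` class is hypothetical; only the SHA-256-as-key scheme comes from the text.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the SHA-256 digest is the key."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        address = hashlib.sha256(content).hexdigest()
        # Identical content maps to the same address, so it is stored once.
        self._blobs.setdefault(address, content)
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
first_upload = store.put(b"quarterly-report contents")
second_upload = store.put(b"quarterly-report contents")  # a second user, same file
```

Both uploads return the same address and the store holds a single blob, so deduplication falls out of the addressing scheme with no extra bookkeeping.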
[Read more]

Offline-First Sync

The field rep drove into a dead zone. The mobile app kept working: they filled out three forms, updated two account records, closed a deal. Forty minutes later, connectivity returned and the sync ran. Two of those records had been updated by a desktop user in the meantime. The mobile changes were silently dropped. No error. No prompt. Just gone.

The Core Problem

The client operates against a local snapshot while offline.
[Read more]

Revision History and Snapshotting

A user hits Ctrl+Z forty times and expects to land exactly where they were yesterday. That is not just undo. That is a complete audit trail of every edit, stored efficiently, queryable at any point in time. The naive approach: store a full copy of the document after every change. Works for ten users. Collapses at ten thousand.

Deltas, Not Copies

Instead of storing full document state after every edit, store only what changed: the operation (insert 3 chars at position 12, delete 5 chars at position 20).
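One way to picture an operation log, as a sketch: each edit is stored as a small `Insert` or `Delete` record, and any past revision is rebuilt by replaying the log up to that point. The class and function names are hypothetical, and a real system would likely add periodic snapshots so replay does not start from the very beginning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    position: int
    text: str


@dataclass(frozen=True)
class Delete:
    position: int
    length: int


def apply_op(doc: str, op) -> str:
    """Apply a single stored operation to a document string."""
    if isinstance(op, Insert):
        return doc[:op.position] + op.text + doc[op.position:]
    return doc[:op.position] + doc[op.position + op.length:]


def state_at(initial: str, log: list, revision: int) -> str:
    """Rebuild the document as it looked after the first `revision` operations."""
    doc = initial
    for op in log[:revision]:
        doc = apply_op(doc, op)
    return doc


log = [Insert(position=5, text=","), Delete(position=0, length=1)]
original = state_at("hello world", log, 0)
after_first_edit = state_at("hello world", log, 1)
latest = state_at("hello world", log, 2)
```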
[Read more]

Operational Transformation

Two users edit the same document simultaneously. User A inserts “X” at position 5. User B deletes the character at position 3. Apply both naively and the result is corrupted. The positions shifted when B’s deletion ran first, and A’s insertion lands in the wrong place.

The Position Problem

Operations encode positions at generation time, not application time. When document state changes between generation and application, positions are stale. Operational Transformation (OT) transforms an incoming op relative to already-applied ops before executing it.
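The example above can be worked through in a few lines. This sketch covers only one transform case (an insert arriving after a single-character delete has already been applied); a real OT system needs a transform for every pair of operation types. The sample document and function names are assumptions.

```python
def insert_at(doc: str, position: int, text: str) -> str:
    return doc[:position] + text + doc[position:]


def delete_at(doc: str, position: int) -> str:
    return doc[:position] + doc[position + 1:]


def transform_insert_against_delete(insert_pos: int, delete_pos: int) -> int:
    """Adjust an insert position for a one-character delete that already ran."""
    if delete_pos < insert_pos:
        return insert_pos - 1  # everything after the deleted char shifted left
    return insert_pos


doc = "abcdefgh"
after_delete = delete_at(doc, 3)  # user B's op applied first

# Naive: reuse A's stale position 5 on the already-changed document.
naive = insert_at(after_delete, 5, "X")

# Transformed: shift A's position to account for B's delete.
new_pos = transform_insert_against_delete(5, 3)
transformed = insert_at(after_delete, new_pos, "X")

# Applying the ops in the other order (A first, then B) gives the same text.
other_order = delete_at(insert_at(doc, 5, "X"), 3)
```

The naive result puts “X” one character too far right. The transformed result matches what you get when A’s insert is applied first, which is the convergence property OT is after.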
[Read more]

Lambda and Kappa Architecture

Real-time results are fast and approximate. Historical results are slow and accurate. The tension between them is where Lambda and Kappa architecture come from.

Lambda: Two Pipelines

Lambda runs two parallel systems. The batch layer processes all historical data on a schedule (Spark on HDFS, every few hours) and produces ground truth. The speed layer processes the live stream (Kafka Streams or Flink) for low-latency results. The serving layer merges both: “latest batch result plus stream delta since the last batch.
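As a rough sketch of the serving-layer merge, assume both layers produce per-key counts (page views, say). The `serve` function and the sample data are hypothetical; the batch and speed views would really come from the two pipelines.

```python
from collections import Counter


def serve(batch_view: dict[str, int], speed_view: dict[str, int]) -> dict[str, int]:
    """Merge the latest batch result with the stream delta accumulated since."""
    merged = Counter(batch_view)
    merged.update(speed_view)  # Counter.update adds counts key by key
    return dict(merged)


batch_view = {"home": 100, "cart": 40}     # ground truth as of the last batch run
speed_view = {"home": 3, "checkout": 1}    # events seen since that run
current = serve(batch_view, speed_view)
```

Summing works here only because counts are additive. Aggregates such as distinct users or medians need a merge step designed for them.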
[Read more]

Watermarks and Late-Arriving Data

There are two clocks in any stream processing system. Event time: when the click actually happened, recorded in the payload. Processing time: when your system received it. On a healthy network they’re close. In reality they’re not. Mobile clients buffer events when offline. Retries add delay. A click at 10:00:05 might reach your processor at 10:00:47. The 10:00 window has long since closed.

The Problem With Never Waiting

If you never close a window, you never produce output.
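To make the gap between the two clocks concrete, here is a small sketch that buckets the same click by each of them. The 10-second window size and the calendar date are assumptions; the text gives only the two times of day.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=10)  # assumed window size, not stated in the text
EPOCH = datetime(1970, 1, 1)


def window_start(timestamp: datetime, size: timedelta = WINDOW) -> datetime:
    """Return the start of the fixed-size window that contains `timestamp`."""
    return timestamp - ((timestamp - EPOCH) % size)


event_time = datetime(2024, 1, 1, 10, 0, 5)        # when the click happened
processing_time = datetime(2024, 1, 1, 10, 0, 47)  # when the processor saw it

by_event_time = window_start(event_time)            # window starting 10:00:00
by_processing_time = window_start(processing_time)  # window starting 10:00:40
lag = processing_time - event_time                  # 42 seconds
```

Bucketing by processing time files the click four windows late. Bucketing by event time puts it in the right window, but only if that window is still open when the click arrives.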
[Read more]

Stream Processing Windows

Aggregating over an infinite stream sounds easy until you realize you have no idea when it ends. You need to cut it into chunks. That’s what windows are.

Three Window Types

Tumbling windows are fixed, non-overlapping buckets. “Clicks per minute” is a tumbling window: minute 1, minute 2, minute 3, no overlap. Simple to implement, but a burst of activity that straddles a boundary gets split across two buckets. Sliding windows overlap. “Average clicks in the last 5 minutes, recomputed every minute” means each event lands in 5 overlapping windows.
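A sketch of window assignment for the two types described so far, using integer timestamps in seconds. It assumes windows are aligned to multiples of the slide and are half-open, `[start, end)`; the function names are hypothetical.

```python
def tumbling_window(timestamp: int, size: int) -> tuple[int, int]:
    """Return the single [start, end) bucket that contains `timestamp`."""
    start = timestamp - timestamp % size
    return (start, start + size)


def sliding_windows(timestamp: int, size: int, slide: int) -> list[tuple[int, int]]:
    """Return every [start, end) window that contains `timestamp`.

    Window starts are multiples of `slide`; a window contains the event
    when start <= timestamp < start + size.
    """
    latest_start = timestamp - timestamp % slide
    starts = range(latest_start, timestamp - size, -slide)
    return [(start, start + size) for start in sorted(starts)]


click = 430  # seconds since some reference point
per_minute = tumbling_window(click, size=60)
last_five_minutes = sliding_windows(click, size=300, slide=60)
```

With a 300-second window sliding every 60 seconds, the click at second 430 belongs to five windows, starting at 180, 240, 300, 360, and 420.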
[Read more]

ZooKeeper Ephemeral Nodes

Redis locks expire after a TTL. With a 30-second TTL, a crashed process leaves everyone else waiting up to 30 seconds for the lock to become available. ZooKeeper takes a different approach: tie the lock to the session, not a timer.

Ephemeral Nodes

ZooKeeper has two kinds of nodes: persistent (survive until explicitly deleted) and ephemeral (automatically deleted when the client session expires). A session is kept alive by a heartbeat. If the client crashes, heartbeats stop, the session expires after a configurable timeout, and the ephemeral node vanishes.
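The lifecycle can be modeled with a toy in-memory coordinator. To be clear, this is not ZooKeeper’s client API: `ToyCoordinator` and its methods are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class ToyCoordinator:
    """In-memory model of session-bound (ephemeral) nodes. Not a real ZooKeeper client."""

    def __init__(self, session_timeout: float) -> None:
        self.session_timeout = session_timeout
        self._last_heartbeat: dict[str, float] = {}
        self._ephemeral: dict[str, str] = {}  # path -> owning session id

    def heartbeat(self, session_id: str, now: float) -> None:
        self._last_heartbeat[session_id] = now

    def create_ephemeral(self, path: str, session_id: str) -> bool:
        """Create the node if it does not exist; return True on success."""
        if path in self._ephemeral:
            return False
        self._ephemeral[path] = session_id
        return True

    def expire_sessions(self, now: float) -> None:
        """Drop sessions whose heartbeats stopped, along with their nodes."""
        dead = {
            session for session, seen in self._last_heartbeat.items()
            if now - seen > self.session_timeout
        }
        for session in dead:
            del self._last_heartbeat[session]
        self._ephemeral = {
            path: owner for path, owner in self._ephemeral.items()
            if owner not in dead
        }


zk = ToyCoordinator(session_timeout=10)
zk.heartbeat("client-a", now=0)
zk.heartbeat("client-b", now=0)
a_got_lock = zk.create_ephemeral("/lock", "client-a")
b_blocked = zk.create_ephemeral("/lock", "client-b")

zk.heartbeat("client-b", now=8)  # client-a has crashed and sends nothing
zk.expire_sessions(now=12)       # client-a's session times out
b_got_lock = zk.create_ephemeral("/lock", "client-b")
```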
[Read more]

The Redlock Algorithm

A single Redis instance holds your lock. Redis crashes. The lock entry is gone. But your client already received “acquired” before the crash and is happily running. Another client acquires the same lock on the recovered instance. Two lock holders. The single-instance Redis lock has a fundamental flaw.

Quorum Locking

Redlock is the answer proposed by Redis creator Salvatore Sanfilippo (antirez). Instead of one Redis, use N independent instances (typically 5). To acquire the lock:
[Read more]

Redis Distributed Locks

Two services start the same batch job at the same time. Both read the same data, both process it, both write conflicting results. Your database row lock didn’t help because the services are on different JVMs. This is the distributed lock problem.

Why Database Locks Don’t Work Here

A SELECT ... FOR UPDATE on a MySQL row holds a lock only for the lifetime of that transaction. Cross-service, that’s useless. You’d need a shared coordination point, something every instance can talk to.
[Read more]

Cache Write Strategies

Reading from cache is easy. Writing is where it gets complicated. Three strategies, each with a different answer to the question: when does the cache get updated relative to the database? Write-through updates the cache and the database synchronously on every write. The cache is always consistent with the DB. The downside is that every write pays double the cost: serialize the object, write to cache, write to DB, all in the same request path.
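A minimal write-through sketch, with plain dictionaries standing in for the cache and the database. The `WriteThroughCache` class is hypothetical. It writes to the database first so that a failed database write never leaves the cache ahead of the source of truth; that ordering is a design choice of this sketch, not something the text prescribes.

```python
class WriteThroughCache:
    """Every write goes to the database and the cache in the same call."""

    def __init__(self, database: dict) -> None:
        self.database = database
        self.cache: dict = {}

    def write(self, key, value) -> None:
        self.database[key] = value  # durable write first
        self.cache[key] = value     # then keep the cache in step

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.database[key]  # cache miss: fall back to the database
        self.cache[key] = value
        return value


database = {"user:1": "Ada"}
store = WriteThroughCache(database)
store.write("user:2", "Grace")
cached_after_write = "user:2" in store.cache
persisted_after_write = database["user:2"]
first_read = store.read("user:1")  # miss, loaded from the database
```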
[Read more]

Hot Key Detection and Mitigation

Redis executes commands on a single thread per instance. One key receiving 50,000 reads per second can pin a single CPU core, and everything else on that shard slows down behind it. This is the hot key problem. Unlike a database where you might add replicas or indexes, a single Redis key is owned by a single shard. Traffic concentration on that key concentrates CPU on that node. Detection is straightforward: redis-cli --hotkeys scans the keyspace and reports access frequency (it requires maxmemory-policy to be set to one of the LFU policies).
[Read more]

Cache Eviction Policies

Cache fills up. Something has to go. The question is: which thing? LRU (Least Recently Used) evicts whatever was accessed longest ago. Simple, intuitive, fast to implement with a doubly-linked list and hash map. LFU (Least Frequently Used) evicts whatever was accessed least often. More accurate in theory, more expensive in practice. The LFU decay problem tripped me up: new items start with zero frequency. A fresh key that’s about to become hot looks identical to a stale key nobody cares about.
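For LRU, a compact sketch using Python’s `collections.OrderedDict`, which provides the same ordered-map behavior as the linked-list-plus-hash-map design mentioned above. The `LRUCache` class and its capacity are illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used key once capacity is exceeded."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry

    def __contains__(self, key) -> bool:
        return key in self._items


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now more recent than "b"
cache.put("c", 3)  # over capacity: "b" is evicted
```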
[Read more]