A normal index maps documents to words. An inverted index maps words to documents. That reversal is why search is fast.
Posts for: #Database
Optimistic vs Pessimistic Concurrency: Locks vs Versions
Two users update the same row. Pessimistic locking blocks one until the other finishes. Optimistic locking lets both try and fails the loser. Choosing wrong kills either throughput or correctness.
Two-Phase Commit: The Original Distributed Transaction
Two-phase commit guarantees atomicity across multiple databases. It also blocks everything if the coordinator dies. Here’s why microservices moved on.
Distributed ID Generation: Snowflake and Friends
Auto-increment IDs break the moment you have more than one database. Snowflake IDs, UUIDs, and database sequences each solve this differently.
Social Graphs at Scale: Storing Relationships in MySQL
A follows table with two columns seems trivial. Until you need to query it from both directions, across shards, for millions of users.
Cursor-Based Pagination: Why Offset Breaks at Scale
OFFSET 50000 makes MySQL scan 50,000 rows just to skip them. Cursor pagination stays fast no matter how deep you go.
Read Replicas: Hidden Consistency Traps
You added read replicas to scale reads. Now users update their profile and see the old version. Welcome to replica lag.
Database Migrations Without Downtime
ALTER TABLE on a 2M-row table can lock it for minutes. Your users see errors. Here’s how expand-contract and shadow writes let you migrate without downtime.
Connection Pooling: Why Opening Connections Is Expensive
The hidden cost of database connections. How connection pools work, why they matter, and how to size them without guessing.
The CAP Theorem: The Cliché I Tried to Avoid
Why the CAP Theorem is the most misunderstood rule in system design. Addressing the ‘Pick 2’ lie and how it sets the stage for consensus algorithms.
Materialized Views: The Read Optimization Pattern
Why standard views are just aliases and how materialized views act as an ‘in-database cache’ to solve the cross-shard query problem.
Change Data Capture: Streaming Database Changes
How to capture and stream database changes in real time. CDC patterns, implementation approaches, and when to use it instead of application-level events.
Database Sharding: Splitting Data Across Machines
How to partition a database across multiple servers. Hash-based vs range-based sharding, rebalancing strategies, and the complexity that comes with it.
Bloom Filters: Definitely Not Here
Bloom filters let LSM trees skip unnecessary disk reads by answering ‘definitely not here’ with zero false negatives. Learn how Cassandra and RocksDB use them.
Compaction Strategies: Cleaning Up After LSM Trees
LSM trees create SSTables fast but need compaction. Learn size-tiered vs leveled compaction strategies and the write vs read amplification tradeoff.
LSM Trees vs B-Trees: Write Fast or Read Fast
LSM Trees vs B-Trees: the write-fast or read-fast tradeoff. Learn when to use B-trees (MySQL) vs LSM trees (Cassandra) based on your database workload.
Write-Ahead Logging: How Databases Survive Crashes
How do databases survive crashes and ensure durability? Learn how Write-Ahead Logging (WAL) uses sequential writes to guarantee data persistence without killing performance.
Query Execution Plans: Reading EXPLAIN Like a Map
Stop staring at EXPLAIN output confused. Learn to read MySQL execution plans like a map and find the root cause of slow queries in seconds, not hours.
Secondary Indexes in Distributed Databases
Querying partitioned databases by non-partition keys? Learn the tradeoffs between local and global secondary indexes in distributed systems.
The Hidden Cost of JOINs
Every JOIN multiplies query complexity. Learn the three JOIN strategies databases use and when denormalization beats JOIN performance by 30x.
Indexing Strategies That Actually Work
More indexes don’t mean faster queries. Learn when to add, remove, and optimize database indexes. Real examples of 7x performance gains through strategic indexing.
The Query Optimization Framework
Stop guessing at performance problems. Learn the 5-step systematic framework for debugging slow queries that helped reduce query times from 2+ seconds to 30ms.