Bloom Filters: Definitely Not Here
LSM trees have a problem. A read might need to check 10 different SSTables to find a key. That’s 10 disk reads.
Bloom filters fix this. They tell you “this SSTable definitely doesn’t have your key” without reading it.
The Problem
You query user_id=12345. LSM tree has 10 SSTables. Without any optimization, you read all 10 files, merge results. 9 of those SSTables don’t have that user. You wasted 9 disk reads.
Disk reads are expensive. Even SSDs take microseconds. Checking 10 files adds up.
How Bloom Filters Work
A bloom filter is a probabilistic data structure: a bit array paired with a handful of hash functions.
When you insert a key, hash it with multiple hash functions (say 3). Set those bit positions to 1.
When you query a key, hash it the same way. Check those bit positions. If any bit is 0, the key definitely doesn’t exist. If all bits are 1, the key might exist (false positive possible).
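The insert and query steps can be sketched in a few lines of Python. This is a minimal illustration, not production code; the `BloomFilter` class and the seeded-SHA-256 hashing are stand-ins for the independent hash functions a real implementation would use:

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, key: str):
        # Simulate k independent hash functions by hashing the key with k seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key: str):
        # Insert: set every hashed bit position to 1.
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # Query: if any bit is 0, the key was definitely never added.
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter(num_bits=64, num_hashes=3)
bf.add("user_123")
bf.add("user_456")
print(bf.might_contain("user_456"))  # True: all three of its bits are set
print(bf.might_contain("user_789"))  # False means definitely absent; True would be a false positive
```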
Detailed example with a 64-bit array:
Initial state: All bits are 0.
Array: [0,0,0,0,0,0,0,0...0] (64 positions)
Insert user_123:
- hash1(user_123) = 5 -> Set bit 5 to 1
- hash2(user_123) = 17 -> Set bit 17 to 1
- hash3(user_123) = 42 -> Set bit 42 to 1
Array now: [0,0,0,0,0,1,0,0...1...0...1...0]
Insert user_456:
- hash1(user_456) = 5 -> Already 1
- hash2(user_456) = 20 -> Set bit 20 to 1
- hash3(user_456) = 50 -> Set bit 50 to 1
Array now has bits 5, 17, 20, 42, 50 set to 1.
Query user_789:
- hash1(user_789) = 5 -> Bit is 1 (✓)
- hash2(user_789) = 17 -> Bit is 1 (✓)
- hash3(user_789) = 35 -> Bit is 0 (✗)
At least one bit is 0. Key definitely doesn’t exist. Skip reading this SSTable entirely.
Query user_456:
- hash1(user_456) = 5 -> Bit is 1 (✓)
- hash2(user_456) = 20 -> Bit is 1 (✓)
- hash3(user_456) = 50 -> Bit is 1 (✓)
All bits are 1. Key might exist. Have to read the SSTable to confirm.
Query user_999:
- hash1(user_999) = 5 -> Bit is 1 (✓)
- hash2(user_999) = 17 -> Bit is 1 (✓)
- hash3(user_999) = 42 -> Bit is 1 (✓)
All bits are 1, but user_999 was never inserted. This is a false positive. Bits were set by other keys (user_123 and user_456). We read the SSTable unnecessarily, but at least we don’t miss data.
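The false positive above isn't a fluke of the chosen numbers. With a small bit array, unrelated keys collide often. A self-contained demo (hypothetical keys and sizes; Python's built-in `hash` with seeds stands in for real hash functions) that counts how often never-inserted keys slip through a 64-bit filter:

```python
NUM_BITS, NUM_HASHES = 64, 3

def positions(key):
    # Three bit positions per key, via three seeded hashes.
    return [hash((seed, key)) % NUM_BITS for seed in range(NUM_HASHES)]

bits = [0] * NUM_BITS
for i in range(10):                    # insert 10 keys
    for pos in positions(f"user_{i}"):
        bits[pos] = 1

# Probe 1000 keys that were never inserted and count how many "pass".
false_positives = sum(
    all(bits[pos] for pos in positions(f"ghost_{i}"))
    for i in range(1000)
)
print(f"{false_positives} of 1000 absent keys passed the filter")
```

With 10 keys and 3 hashes in 64 bits, roughly a third of the bits end up set, so a few percent of absent keys pass all three checks. The exact count varies run to run.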
(Diagram: a query fans out to each SSTable's bloom filter. SSTable 1: a 0 bit found, definitely NOT here -> skip. SSTable 2: all bits 1, MAYBE here -> read it. SSTable 3: a 0 bit found, definitely NOT here -> skip.)
Bloom filter lets you skip SSTables that definitely don’t have your key.
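That skip logic can be sketched as a toy read path. All names here are illustrative, and a dict lookup stands in for the disk read:

```python
def bloom_bits(key, num_bits=64, num_hashes=3):
    # k bit positions per key; seeded built-in hash simulates k hash functions.
    return {hash((seed, key)) % num_bits for seed in range(num_hashes)}

def build_sstable(records):
    # Build the filter once, at SSTable write time.
    bits = set()
    for key in records:
        bits |= bloom_bits(key)
    return {"filter": bits, "data": records}

def lsm_get(sstables, key, stats):
    for table in sstables:  # newest first in a real LSM tree
        if not bloom_bits(key) <= table["filter"]:
            stats["skipped"] += 1   # definitely not here: no disk read
            continue
        stats["read"] += 1          # maybe here: pay for the read
        if key in table["data"]:
            return table["data"][key]
    return None

sstables = [
    build_sstable({"user_123": "alice"}),
    build_sstable({"user_456": "bob"}),
    build_sstable({"user_789": "carol"}),
]
stats = {"skipped": 0, "read": 0}
value = lsm_get(sstables, "user_456", stats)
print(value, stats)  # value is "bob"; skip counts vary with the hash seed
```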
The Trade-off
Bloom filter uses memory. Typical size: 10 bits per key. For 1 million keys, that’s about 1.2MB. Small, but adds up across many SSTables.
False positive rate depends on size. Larger bloom filter = fewer false positives = less wasted disk I/O. Smaller = more false positives = more unnecessary reads.
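The standard back-of-the-envelope formula makes this concrete: with m/n bits per key and k hash functions, the false positive rate is approximately (1 - e^(-k·n/m))^k.

```python
import math

def false_positive_rate(bits_per_key: float, num_hashes: int) -> float:
    # p ≈ (1 - e^(-k / (m/n)))^k, the standard bloom filter approximation.
    return (1 - math.exp(-num_hashes / bits_per_key)) ** num_hashes

# The common "10 bits per key" default with its optimal 7 hash functions:
print(f"{false_positive_rate(10, 7):.4f}")  # ≈ 0.0082, i.e. under 1%
```

Doubling to 20 bits per key roughly squares the rate down to hundredths of a percent, which is why memory buys I/O here.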
Can’t have false negatives. If bloom filter says “not here,” it’s never wrong. If it says “maybe here,” you have to check.
Where It's Used
Cassandra and RocksDB put a bloom filter in front of each SSTable. Query checks bloom filter first. Skips SSTables that definitely don’t have the key.
Google’s BigTable paper popularized this. Now it’s standard in LSM-based storage engines.
CDNs use bloom filters to short-circuit cache misses. "Is this URL in cache?" Check a small in-memory bloom filter first; a definite no skips the full cache lookup.
What I'm Curious About
Tuning false positive rate. Lower rate means bigger bloom filter, more memory. Higher rate means more disk reads. How do you find the sweet spot for your workload?
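One starting point is the textbook sizing formula: for a target false positive rate p, use about -ln(p)/(ln 2)² bits per key and (bits per key)·ln 2 hash functions. A sketch, assuming the standard bloom filter math:

```python
import math

def size_for(target_fp_rate: float):
    # Optimal bits per key and hash count for a target false positive rate.
    bits_per_key = -math.log(target_fp_rate) / (math.log(2) ** 2)
    num_hashes = round(bits_per_key * math.log(2))
    return bits_per_key, num_hashes

for p in (0.1, 0.01, 0.001):
    bits, k = size_for(p)
    print(f"p={p}: {bits:.1f} bits/key, k={k}")
```

Each 10x reduction in the false positive rate costs roughly 4.8 extra bits per key, so the sweet spot comes down to what a wasted disk read costs versus what RAM costs for your workload.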
Also, a standard bloom filter can't remove keys, and once built for an SSTable it's immutable (SSTables themselves never change). Delete a key? The bloom filter still says it might exist. You handle this with tombstones during the read, not by updating the filter.
Bloom filters save disk I/O by trading a bit of memory. For LSM trees reading across multiple SSTables, that trade-off is worth it.
Does your database use bloom filters?