Geohashing: Turning Coordinates into Searchable Strings
Your user is at (37.7749, -122.4194). You need the nearest 20 restaurants. Brute-forcing the distance formula against 10 million rows? That’s not a query, that’s a punishment.
The problem: coordinates are two-dimensional. Database indexes are one-dimensional. You need a way to collapse 2D into 1D while preserving locality.
How Geohashing Works#
Geohashing recursively divides the world into a grid. Start with the entire map. Split it in half vertically: left half is 0, right half is 1. Split each half horizontally: bottom is 0, top is 1. Keep subdividing. Each step adds a bit to the hash. After enough iterations, you have a string like 9q8yyk that represents a small rectangle on Earth.
The magic: nearby locations share a common prefix. 9q8yyk and 9q8yym are neighbors. You can find nearby points with a prefix query.
-- Store geohash as an indexed column
ALTER TABLE locations ADD COLUMN geohash VARCHAR(12);
CREATE INDEX idx_geohash ON locations(geohash);
-- Find locations near the user (precision 6 = ~1.2km cell)
SELECT * FROM locations
WHERE geohash LIKE '9q8yyk%'
ORDER BY geohash
LIMIT 20;
The Edge Problem#
Two points can be physically close but have completely different geohash prefixes. This happens at cell boundaries. The restaurant across the street might be in a different geohash cell.
The workaround: don’t just search one cell. Query the target cell plus its 8 neighbors. That’s 9 prefix queries instead of 1, but it eliminates the boundary blind spot.
Choosing Precision#
Shorter geohash means larger cells (less precision but fewer cells to search). Longer means smaller cells (more precision but more queries for a given radius). For “restaurants within 1 km,” precision 6 gives you cells of roughly 1.2 km x 0.6 km. Good enough for most use cases.
At Oracle, we had a hierarchical key encoding scheme for NSSAI configuration lookups. Network slices had hierarchical identifiers, and we needed fast prefix-based retrieval. Same concept as geohashing: encode the hierarchy into a string so that prefix matching finds related entries. Adding a B-tree index on the encoded key brought lookups from 2+ seconds to 30ms on 2M rows. One-dimensional encoding turned a multi-column scan into a prefix scan.
What I’m Learning#
Geohashing is a dimensionality reduction trick. Two dimensions become one, and suddenly your regular database indexes work. The trade-off is boundary artifacts, which you solve by querying neighbors. Combined with sharding on geohash prefix, you can distribute spatial data across machines while keeping nearby points on the same shard.
Have you used geohashing in production, or do you rely on your database’s built-in spatial extensions?