Quadtrees: When Fixed Grids Aren’t Enough
Geohashing divides the world into equal-sized cells. Works great when your data is evenly distributed. But data is never evenly distributed. A geohash cell in downtown Tokyo contains thousands of points. A cell in the Sahara contains zero. You need a structure that subdivides dense areas and leaves sparse areas alone.
How Quadtrees Work#
Start with a single rectangle covering your entire region. Set a capacity threshold, say 10 points per node. When a node exceeds 10 points, split it into 4 equal quadrants (NW, NE, SW, SE). Each child is a new node that can hold 10 points. When a child exceeds 10, split again.
Dense areas get deeply subdivided. Sparse areas stay as one big node.
public class QuadTree {
private static final int CAPACITY = 10;
private List<Point> points = new ArrayList<>();
private QuadTree[] children; // NW, NE, SW, SE
private Rectangle boundary;
public void insert(Point p) {
if (!boundary.contains(p)) return;
if (points.size() < CAPACITY && children == null) {
points.add(p);
return;
}
if (children == null) subdivide();
for (QuadTree child : children) child.insert(p);
}
private void subdivide() {
children = new QuadTree[4];
// Split boundary into 4 equal quadrants
children[0] = new QuadTree(boundary.nw());
children[1] = new QuadTree(boundary.ne());
children[2] = new QuadTree(boundary.sw());
children[3] = new QuadTree(boundary.se());
for (Point p : points) {
for (QuadTree child : children) child.insert(p);
}
points.clear();
}
}
Quadtree vs Geohash#
Geohashes are simpler. Store a string, do prefix queries, done. But they can’t adapt to density. Quadtrees adapt beautifully but live in memory, not in a database column. You can’t just WHERE quadtree_node LIKE 'something%'.
In practice, many systems use geohashes for storage and quadtrees for in-memory query acceleration. Not mutually exclusive.
Querying a Quadtree#
To find all points within a search rectangle: start at the root. If the search rectangle doesn’t overlap this node’s boundary, skip it. If it does, check the node’s points and recurse into children. You prune entire subtrees that are outside the search area. For a query covering a small area in a large map, you skip most of the tree.
At Salesforce, we had a similar adaptive partitioning problem, not geographic but structural. Org hierarchies varied wildly in depth and breadth. Some orgs had 3 levels with thousands of nodes per level, others had 20 levels with 2 nodes each. A fixed-depth traversal was either too shallow for deep orgs or wasteful for flat ones. We built an adaptive traversal that split deeper only where the branching factor exceeded a threshold. Same idea as a quadtree: subdivide where the data is dense, leave sparse regions alone. Query time dropped by 63% on the densest org structures.
What I’m Learning#
Quadtrees are a reminder that uniform partitioning is a convenience, not a requirement. Wherever your data has non-uniform density (cities vs countryside, hot products vs cold ones), adaptive subdivision gives you better query performance than fixed grids. The concept connects to B-trees, which also subdivide adaptively based on data volume, just in one dimension.
What spatial index does your system use: geohash, quadtree, R-tree, or something else?