Trie Data Structures: Prefix Search in Milliseconds
User types “dis”. You need to suggest “distributed”, “discovery”, “disconnect”. With a hash map, you’d have to iterate every key and check whether it starts with “dis”: O(n) in the number of keys. With a trie, the prefix walk is three steps, one per typed character: descend three nodes, then collect everything below. The work scales with the prefix length and the number of matches, not with the total number of entries.
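To make the baseline concrete, here is a minimal sketch of the O(n) scan a flat key collection forces on you (the `prefixScan` name and the sample keys are illustrative, not from the original):

```java
import java.util.*;

// The O(n) baseline: with a flat set of keys, every key must be
// examined, even when only a handful start with the prefix.
public class ScanDemo {
    static List<String> prefixScan(Collection<String> keys, String prefix) {
        List<String> out = new ArrayList<>();
        for (String k : keys) {              // touches all n keys
            if (k.startsWith(prefix)) {
                out.add(k);
            }
        }
        Collections.sort(out);               // stable output for display
        return out;
    }

    public static void main(String[] args) {
        Set<String> keys = Set.of("distributed", "discovery", "disconnect",
                                  "cache", "queue");
        System.out.println(prefixScan(keys, "dis"));
        // [disconnect, discovery, distributed] -- five keys scanned for three hits
    }
}
```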
How Tries Work
A trie is a tree where each node represents a character. The path from root to any node spells a prefix. Nodes marked as “end” represent complete words.
import java.util.*;

public class Trie {
    private static class TrieNode {
        final Map<Character, TrieNode> children = new HashMap<>();
        boolean isEnd;
        int frequency;
    }

    private final TrieNode root = new TrieNode();

    public void insert(String word, int frequency) {
        TrieNode node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new TrieNode());
        }
        node.isEnd = true;
        node.frequency = frequency;
    }

    public List<String> prefixSearch(String prefix, int topK) {
        TrieNode node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return List.of(); // no matches
        }
        // Collect all words below this node, then take the topK by frequency
        // (must poll the heap: PriorityQueue's stream isn't in priority order).
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(
                Comparator.comparingInt(e -> -e.getValue()));
        collectWords(node, new StringBuilder(prefix), heap);
        List<String> result = new ArrayList<>();
        while (result.size() < topK && !heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        return result;
    }

    private void collectWords(TrieNode node, StringBuilder prefix,
                              Queue<Map.Entry<String, Integer>> heap) {
        if (node.isEnd) {
            heap.add(Map.entry(prefix.toString(), node.frequency));
        }
        for (Map.Entry<Character, TrieNode> child : node.children.entrySet()) {
            prefix.append(child.getKey());
            collectWords(child.getValue(), prefix, heap);
            prefix.deleteCharAt(prefix.length() - 1);
        }
    }
}
Lookup is O(k) where k is the prefix length. It doesn’t matter if you have 1,000 or 1,000,000 entries. Three characters typed means three nodes traversed.
Compressed Tries (Radix Trees)
A basic trie wastes space on single-child chains. In a trie containing “distributed”, the tail “t-r-i-b-u-t-e-d” is eight nodes with no branching between them. A radix tree compresses that chain into one node holding “tributed”. Same lookup complexity, much less memory.
This matters when your trie holds millions of entries. The compression ratio depends on how many shared prefixes exist. English words share lots of prefixes (re-, un-, dis-), so compression is significant.
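A minimal lookup sketch over a hand-built radix tree (the `RadixNode`/`contains` names are illustrative, and the tree is constructed by hand; a real implementation also splits shared edges on insert):

```java
import java.util.*;

// Radix-tree membership check over a hand-built tree (illustrative sketch;
// a real implementation also splits shared edge labels on insert).
public class RadixDemo {
    static class RadixNode {
        final Map<String, RadixNode> edges = new HashMap<>();
        boolean isEnd;
    }

    static boolean contains(RadixNode root, String word) {
        RadixNode node = root;
        String rest = word;
        outer:
        while (!rest.isEmpty()) {
            for (Map.Entry<String, RadixNode> e : node.edges.entrySet()) {
                if (rest.startsWith(e.getKey())) { // consume a whole edge label
                    rest = rest.substring(e.getKey().length());
                    node = e.getValue();
                    continue outer;
                }
            }
            return false; // no edge matches the remaining characters
        }
        return node.isEnd;
    }

    static RadixNode leaf() {
        RadixNode n = new RadixNode();
        n.isEnd = true;
        return n;
    }

    // dis -> { "tributed", co -> { "very", "nnect" } }:
    // single-child chains are collapsed into one labeled edge, and
    // "discovery"/"disconnect" share the "co" edge.
    static RadixNode sampleTree() {
        RadixNode root = new RadixNode();
        RadixNode dis = new RadixNode();
        RadixNode co = new RadixNode();
        root.edges.put("dis", dis);
        dis.edges.put("tributed", leaf());
        dis.edges.put("co", co);
        co.edges.put("very", leaf());
        co.edges.put("nnect", leaf());
        return root;
    }

    public static void main(String[] args) {
        RadixNode root = sampleTree();
        System.out.println(contains(root, "discovery")); // true
        System.out.println(contains(root, "disco"));     // false: not a stored word
    }
}
```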
Ranking Suggestions
Not all matches are equal. “distributed” might be searched 10,000 times a day, “disambiguate” only 3 times. Store frequency counts at terminal nodes. When collecting suggestions, sort by frequency and return top-K.
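The ranking step in isolation looks like this sketch (the `topK` helper and the frequency counts are made-up illustration values):

```java
import java.util.*;

// Sketch: rank candidate completions by stored usage frequency and
// keep the top K. Counts here are made-up illustration values.
public class RankDemo {
    static List<String> topK(Map<String, Integer> candidates, int k) {
        return candidates.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = Map.of(
                "distributed", 10_000,
                "discovery", 4_200,
                "disambiguate", 3);
        System.out.println(topK(freq, 2)); // [distributed, discovery]
    }
}
```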
For dynamic popularity, update counts periodically. You can even use a multi-level cache strategy: hot suggestions in memory, cold ones on disk.
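One way to sketch the dynamic-count side, assuming a hypothetical `recordUse` hook fired when a user accepts a suggestion (folding the counts back into the trie would happen in a periodic job):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: track dynamic popularity in a concurrent counter map, then
// fold the counts into the trie's terminal nodes periodically.
// recordUse is a hypothetical hook, not part of the Trie class itself.
public class PopularityCounter {
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();

    public void recordUse(String word) {
        counts.merge(word, 1, Integer::sum); // atomic increment
    }

    public Map<String, Integer> snapshot() {
        return Map.copyOf(counts); // read by the periodic re-rank job
    }
}
```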
At Oracle, we had a config parameter lookup tool where engineers typed parameter names. Each keystroke triggered a SQL query: SELECT name FROM parameters WHERE name LIKE 'nssf_reg%' ORDER BY name LIMIT 10. At a few thousand parameters it wasn’t terrible, but the UI felt sluggish. We loaded all parameter names into an in-memory trie on startup, and autocomplete became instant (sub-millisecond). The trie used about 2 MB of memory for 4,000 parameters, a trivial cost for a much better experience.
What I’m Learning
Tries solve a specific problem that hash maps can’t: prefix matching. If you only need exact lookups, a hash map is better. If you need “everything starting with X”, a trie is the right tool. The trick is knowing which problem you actually have. I defaulted to SQL LIKE queries for too long before realizing the data structure was the answer.
What do you use for autocomplete in your applications?