Vector Clocks and Lamport Timestamps
What time is it?
On your laptop, easy question. In a distributed system, it’s a trap.
I used to assume timestamps would save me. Server A says event happened at 10:00:01, Server B says 10:00:02. A happened first. Done.
Then I learned that clocks drift. Server A’s clock might be 3 seconds ahead of Server B. Now your “ordering” is garbage. The event that actually happened second looks like it happened first.
You can’t trust wall clocks across machines. So how do you know what happened before what?
Lamport Timestamps#
Leslie Lamport figured this out in 1978. Forget real time. Track logical time instead.
The rules are dead simple:
- Every process keeps a counter, starting at 0
- Before sending a message, increment your counter
- When receiving a message, set your counter to max(your counter, message counter) + 1
That’s it. If event A causes event B (A sends a message that B receives), then A’s timestamp is guaranteed to be lower than B’s.
(counter: 0) participant B as Server B
(counter: 0) participant C as Server C
(counter: 0) Note over A: Event happens
counter = 1 A->>B: msg (ts=1) Note over B: max(0,1)+1 = 2 Note over B: Event happens
counter = 3 B->>C: msg (ts=3) Note over C: max(0,3)+1 = 4
Now you can sort events by their Lamport timestamp. Lower number happened before higher number.
But there’s a catch.
The Limitation#
If timestamp(A) < timestamp(B), you know A might have happened before B. But two events could have the same timestamp, or different timestamps with no causal relationship. Lamport timestamps don’t tell you if events are truly independent.
Example: Server A and Server B both write to the same key at the exact same logical time. Who wins? Lamport timestamps alone can’t tell you these writes conflict.
Vector Clocks#
Vector clocks solve this. Instead of one counter, each process keeps a vector of counters, one slot per process in the system.
For a 3-server system, every event gets a timestamp like [A:2, B:1, C:0]. This means: “At the time of this event, A had seen 2 of its own events, 1 event from B, and 0 from C.”
The rules:
- Before any event, increment your own slot
- When sending, attach your full vector
- When receiving, merge vectors by taking max of each slot, then increment your own
= [1,2] Note over A: [2,0] Note over A,B: A's [2,0] and B's [1,2]
Neither dominates.
These are CONCURRENT.
Now you can compare any two events:
- If every slot in V1 <= every slot in V2, then V1 happened before V2
- If some slots are greater and some are smaller, the events are concurrent (conflict!)
This is exactly what systems like DynamoDB and Riak use to detect write conflicts. When two writes are concurrent, the system knows it needs conflict resolution (last-write-wins, merge, or ask the user).
The Trade-off#
Vector clocks grow with the number of processes. 1000 servers means 1000 slots per timestamp. This gets expensive. Real systems use tricks like pruning old entries or switching to simpler “version vectors.”
What I’m Learning#
After Raft, I thought consensus was the only way to maintain order. But consensus is expensive. It requires a leader and majority agreement.
Lamport timestamps and vector clocks offer a cheaper alternative: track causality without coordination. You give up the single source of truth, but you gain the ability to detect when things went wrong. Pair this with conflict resolution, and you have a system that stays available even when the network splits.
Sometimes “I know there’s a conflict” is more valuable than “everyone must agree first.”
Have you ever debugged an issue where events were processed out of order? What was your fix?