User creates an account, then updates it. Your consumer processes the update first. Account doesn’t exist yet. Crash.

Order matters. And distributed systems mess it up constantly.

Kafka’s Ordering Promise

Kafka guarantees ordering within a partition. Not across partitions.

If you have a topic with 8 partitions, the producer picks a partition for each message by hashing its key. Same key, same partition, same order, as long as the partition count doesn’t change. Different keys? No ordering guarantee between them.

// Same user always goes to same partition
kafkaTemplate.send("user-events", user.getId(), event);

This is why partition keys matter so much. Pick the wrong key and you lose ordering. Pick the right key and you get ordering for free, within that entity.
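To make the "same key, same partition" rule concrete, here is a minimal sketch of key-based partition selection. It is a simplified stand-in, not Kafka’s code: the Java client’s default partitioner hashes the serialized key with murmur2, while this sketch uses `Arrays.hashCode` to stay dependency-free, so the partition numbers it produces won’t match a real cluster. The class and method names (`KeyPartitionSketch`, `partitionFor`) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified stand-in for a key-based partitioner (hypothetical class).
// Kafka's default partitioner uses murmur2 on the serialized key; Arrays.hashCode
// is used here only to keep the sketch self-contained.
final class KeyPartitionSketch {
    private KeyPartitionSketch() {}

    // Maps a key to a partition index in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // Clear the sign bit so the hash is non-negative, then wrap into range.
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }
}
```

Calling `KeyPartitionSketch.partitionFor("user-123", 8)` twice returns the same index, which is the whole trick: every event for that user lands in one partition and keeps its order. The result depends on `numPartitions`, so adding partitions to a topic can send a key somewhere new.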

graph TD
    P[Producer] --> K{Partition Key}
    K -->|user-123| P0[Partition 0]
    K -->|user-456| P1[Partition 1]
    K -->|user-123| P0
    P0 --> C0[Consumer 0: Ordered]
    P1 --> C1[Consumer 1: Ordered]
    style P fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style K fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style P0 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style P1 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style C0 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style C1 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff

When Partition Keys Aren’t Enough

Sometimes one business action touches multiple entities. Order created, inventory reserved, payment charged. Three different partition keys if each event is keyed by its own entity, so the three events can land on three partitions.

Simplest fix: use the order ID as the key for all related events. Everything for one order flows through one partition. When that’s not possible, add a sequence number and let the consumer reorder.

public class OrderEvent {
    private String orderId;
    private long sequenceNumber;  // monotonically increasing per order
}
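Consumer-side reordering could look roughly like the sketch below. It assumes sequence numbers start at 1 for each order and have no permanent gaps. `SequenceReorderBuffer`, its nested `Event` (a stand-in for `OrderEvent` with a `payload` field added for the demo), and `accept` are all hypothetical names, not part of any Kafka API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory buffer: releases each order's events strictly in
// sequence-number order and holds back anything that arrives early.
final class SequenceReorderBuffer {
    // Minimal stand-in for OrderEvent; payload is added for illustration only.
    static final class Event {
        final String orderId;
        final long sequenceNumber;
        final String payload;

        Event(String orderId, long sequenceNumber, String payload) {
            this.orderId = orderId;
            this.sequenceNumber = sequenceNumber;
            this.payload = payload;
        }
    }

    // Assumption: every order's first event carries sequence number 1.
    private static final long FIRST_SEQUENCE = 1L;

    private final Map<String, Long> nextExpected = new HashMap<>();
    private final Map<String, TreeMap<Long, Event>> pending = new HashMap<>();

    // Returns the events that are now safe to process, in order (possibly none).
    List<Event> accept(Event event) {
        List<Event> ready = new ArrayList<>();
        long expected = nextExpected.getOrDefault(event.orderId, FIRST_SEQUENCE);
        if (event.sequenceNumber < expected) {
            return ready; // already processed: treat as a duplicate and drop it
        }
        TreeMap<Long, Event> held =
                pending.computeIfAbsent(event.orderId, id -> new TreeMap<>());
        held.put(event.sequenceNumber, event);
        // Release the contiguous run that starts at the expected sequence number.
        while (!held.isEmpty() && held.firstKey() == expected) {
            ready.add(held.pollFirstEntry().getValue());
            expected++;
        }
        nextExpected.put(event.orderId, expected);
        if (held.isEmpty()) {
            pending.remove(event.orderId);
        }
        return ready;
    }
}
```

If event 2 for an order arrives before event 1, `accept` returns an empty list for event 2, then returns both events in order once event 1 shows up. This sketch keeps everything in memory and waits forever on a missing sequence number, so a real consumer would also need a gap timeout and state that survives restarts and rebalances.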

At Oracle, we used sequence numbers on NSSF config events. Took us embarrassingly long to realize partition keys alone weren’t enough when configs spanned multiple network slices.

Total vs Causal Ordering

Total ordering across all events is expensive. Putting everything on a single partition kills throughput. Ordering across partitions with logical clocks (vector clocks and friends) kills simplicity.

Most systems need causal ordering, not total: events that depend on each other arrive in order. User-create before user-update. Order-placed before order-shipped. Partition keys give you that within an entity. Good enough for most systems.

What I’m Learning

Partition-level ordering handles 90% of cases. The remaining 10% need sequence numbers or careful key design.

The question to ask: “What breaks if these two events arrive in the wrong order?” If the answer is “nothing,” stop worrying about it.

Have you ever debugged an out-of-order event bug?