Two armies need to attack together. They communicate via messengers through enemy territory. Messengers might get captured.

Can they guarantee coordinated attack? No. Provably impossible.

This is the Two Generals Problem. It explains why distributed systems are fundamentally hard.

The Setup#

General A and General B are on opposite sides of an enemy. They must attack simultaneously to win. If only one attacks, they lose.

The only way to communicate: send messengers through enemy territory. Messengers might get captured. No way to know if message was delivered.

How do they coordinate?

The Impossibility#

Attempt 1: Simple message

General A sends messenger: “Attack at dawn”

Problem: What if messenger is captured? B never gets message. A attacks alone. A loses.

So A won’t attack unless they know B received the message.

Attempt 2: Acknowledgment

General A sends: “Attack at dawn”
Messenger arrives safely.
General B sends back: “Got it, will attack”

Problem: What if B’s acknowledgment is captured? A doesn’t know B got the message. A won’t attack. B attacks alone. B loses.

So B won’t attack unless they know A received the acknowledgment.

Attempt 3: Acknowledgment of acknowledgment

General A sends: “Attack at dawn”
General B sends: “Got it”
General A sends: “Got your ack”

Problem: What if A’s second message is captured? B doesn’t know A got the ack. B won’t attack.

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#000000','primaryTextColor':'#00ff00','primaryBorderColor':'#00ff00','lineColor':'#00ff00','secondaryColor':'#000000','tertiaryColor':'#000000','noteBkgColor':'#000000','noteBorderColor':'#00ff00','noteTextColor':'#00ff00'}}}%% sequenceDiagram autonumber participant A as General A participant E as Enemy Territory participant B as General B A->>E: Attack at dawn E->>B: ✓ Delivered Note over B: Got message, but... B->>E: ACK: Got it E--XA: ✗ Captured Note over A: No ACK received
Did B get message?
Won't attack alone Note over B: Sent ACK, but...
Did A receive it?
Won't attack alone Note over A,B: Both wait
Attack never happens

Even one lost acknowledgment breaks coordination. Neither general attacks.

The Infinite Regress#

You need acknowledgment of acknowledgment, then acknowledgment of acknowledgment of acknowledgment, forever.

The last sender never knows if their message arrived. So they can’t commit to the action. This breaks the entire chain.

Mathematical proof: No finite number of messages can guarantee agreement over an unreliable channel.

Why This Matters for Distributed Systems#

Networks are unreliable. Packets get dropped. Connections timeout. You never know for certain if your message arrived.

This is why:

  • Database transactions can’t guarantee commit across network partition
  • Leader election can’t be 100% safe during network split
  • Distributed consensus requires accepting “good enough” instead of “perfect”

Real Systems Work Around It#

1. Timeouts and retries

Instead of infinite acks, set deadline. “Attack at dawn if you hear from me by midnight.”

Not guaranteed, but good enough. Most messages get through. Retries help.

2. Majority voting

Don’t need unanimous agreement. 3 out of 5 generals agree? Good enough.

This is how Raft and Paxos work. Majority quorum instead of 100% consensus.

3. Accept failure modes

Two-phase commit acknowledges the problem. Coordinator tracks who committed. If coordinator fails, participants might be stuck. Trade-off accepted.

FLP Impossibility#

The Two Generals Problem is informal. Fischer, Lynch, Paterson proved it formally in 1985.

FLP Impossibility: You can’t have all three:

  1. Consensus (everyone agrees)
  2. Termination (algorithm finishes)
  3. Fault tolerance (handle failures)

In asynchronous network with failures, one must be sacrificed.

Most systems sacrifice guaranteed termination. Raft/Paxos might not terminate during network partition, but when network heals, they do.

What I’m Learning#

This thought experiment clicked when I realized: the problem isn’t technical, it’s mathematical. No clever protocol solves it. The unreliable channel makes certainty impossible.

Real systems don’t try for impossibility. They use timeouts (accept “probably”), quorums (accept “most”), and retries (accept “eventually”).

Consistency models exist because perfect consistency is impossible. Circuit breakers exist because you can’t guarantee service availability. Distributed systems design is choosing which guarantees to weaken.

The Two Generals Problem is a reminder: perfect coordination over unreliable networks is fantasy. Design for reality instead.

Have you dealt with distributed consensus in production? What trade-offs did you make?