The CAP Theorem: The Cliché I Tried to Avoid

I’ve avoided writing about the CAP Theorem for weeks.

If you’ve spent more than ten minutes on a system design blog, you’ve seen the triangle. Consistency, Availability, Partition Tolerance. Pick two. It’s the ultimate cliché of the industry.

The problem is that the “Pick 2” rule is a lie. It makes it sound like you get to choose any two, when in reality, the laws of physics have already made one of the picks for you.

The Partition isn’t Optional

In a distributed system, network partitions (the ‘P’) are a fact of life. Cables get cut, routers reboot, and packets disappear.

If you have two servers, and the network between them dies, you are in a partition. You can’t “not pick” P. It’s already happened.

Now, you only have two choices:

Consistency (CP): Stop accepting writes until the network is fixed. You’d rather be “Unavailable” than “Wrong.”
Availability (AP): Keep accepting writes on both sides and reconcile later. You’d rather be “Available” than “Consistent.”

That’s it. That is the whole theorem. It’s not a triangle: it’s a trade-off that only matters when things go wrong.
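Here’s a minimal sketch of that decision, under some loud assumptions: `Mode`, `handle_write`, and the returned strings are hypothetical names made up for illustration, and a real node would detect a partition through timeouts or failed heartbeats rather than a boolean flag. The point it tries to show is that the CP/AP setting changes nothing while the network is healthy.

```python
from enum import Enum


class Mode(Enum):
    CP = "cp"  # prefer consistency: refuse writes during a partition
    AP = "ap"  # prefer availability: accept writes during a partition


def handle_write(mode: Mode, partitioned: bool) -> str:
    """Decide what a single node does with an incoming write (illustrative only)."""
    if not partitioned:
        # Healthy network: the write can be replicated, so both modes accept it.
        return "accepted"
    if mode is Mode.CP:
        # The peer is unreachable, so refuse rather than risk diverging from it.
        return "rejected"
    # AP: accept on this side and reconcile once the split heals.
    return "accepted-locally"


for mode in Mode:
    for partitioned in (False, True):
        state = "partitioned" if partitioned else "healthy"
        print(mode.name, state, "->", handle_write(mode, partitioned))
```

The two modes only disagree in the `partitioned=True` rows. That’s the whole trade-off in four lines of output.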

Why it Matters

Imagine a banking app. The network splits.

If the app is AP (Available), it lets a user withdraw $100 from Server A and another $100 from Server B, even if they only had $100 total. The system stayed “Available,” but now your data is a mess.

If the app is CP (Consistent), it sees the network is split and refuses the withdrawal. The user is frustrated because the app is “Down,” but the bank’s books stay clean.
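The banking scenario can be played out in a few lines. This is a toy model, not how a real bank or database works: `Replica`, `run`, and the `"AP"`/`"CP"` labels are hypothetical, each server simply holds its own copy of the $100 balance, and the CP version refuses on both sides because with only two servers neither side can tell which one should keep going.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    balance: int              # this server's local copy of the account balance
    mode: str                 # "AP" or "CP" (hypothetical labels)
    partitioned: bool = False
    withdrawn: int = 0        # money paid out locally while cut off from the peer

    def withdraw(self, amount: int) -> bool:
        if self.partitioned and self.mode == "CP":
            return False      # can't confirm the balance with the peer: refuse
        if amount > self.balance:
            return False      # ordinary insufficient-funds check
        self.balance -= amount
        if self.partitioned:
            self.withdrawn += amount
        return True


def run(mode: str) -> tuple[bool, bool, int]:
    """Both servers see $100, the network splits, each receives a $100 withdrawal."""
    a = Replica("A", 100, mode, partitioned=True)
    b = Replica("B", 100, mode, partitioned=True)
    ok_a = a.withdraw(100)
    ok_b = b.withdraw(100)
    # Once the split heals, the real balance is the start minus everything paid out.
    reconciled = 100 - (a.withdrawn + b.withdrawn)
    return ok_a, ok_b, reconciled


print("AP:", run("AP"))  # (True, True, -100)
print("CP:", run("CP"))  # (False, False, 100)
```

In the AP run both withdrawals succeed and the account ends up at -$100 once the two sides compare notes. In the CP run nobody gets their money, and the balance is still a correct $100.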

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#000000','primaryTextColor':'#00ff00','primaryBorderColor':'#00ff00','lineColor':'#00ff00','secondaryColor':'#000000','tertiaryColor':'#000000','noteBorderColor':'#00ff00','noteTextColor':'#00ff00'}}}%%
graph TD
    subgraph "Normal State"
        A1[Node A] --- B1[Node B]
    end
    subgraph "Network Partition (The 'P')"
        A2[Node A] -. Broken Link .- B2[Node B]
    end
    A2 -->|CP: Fail| UA[Unavailable but Correct]
    B2 -->|AP: Succeed| AC[Available but Inconsistent]
    style A1 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style B1 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style A2 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style B2 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style UA fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style AC fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff

The Bridge to Consensus

Most of the patterns we’ve talked about—Sagas, Event Sourcing, and CDC—are built for an AP world. They value speed and availability, and they use “eventual consistency” to clean up the mess later.

But what if you can’t afford to be eventually consistent? What if you need a “Source of Truth” that everyone agrees on, even if some servers are down?

This is where things get hard. To build a truly CP system (like the one that manages your Kubernetes cluster or your database’s metadata), you need a way for a group of servers to act as one.

You need a way to reach Consensus.
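One ingredient that consensus protocols commonly lean on is a majority vote: a decision only counts if more than half of the cluster took part in it. The sketch below shows just that arithmetic, not a consensus algorithm; `majority` and `can_commit` are hypothetical helper names, and `reachable` is assumed to count the node doing the asking.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that is more than half of the cluster."""
    return cluster_size // 2 + 1


def can_commit(cluster_size: int, reachable: int) -> bool:
    """True if this side of a partition can still reach a majority (itself included)."""
    return reachable >= majority(cluster_size)


# A 5-node cluster split 3/2: only the larger side may keep accepting writes.
print(can_commit(5, 3))  # True
print(can_commit(5, 2))  # False

# The two-server example from earlier: split 1/1, neither side has a majority.
print(can_commit(2, 1))  # False
```

Because any two majorities of the same cluster overlap in at least one node, two sides of a split can never both believe they are in charge. It also shows why two servers are an awkward number: any split leaves both sides stuck.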

What I’m Thinking

I stayed away from CAP because it felt too basic. But the more I build distributed systems, the more I realize that most of the bugs I run into boil down to someone trying to have their cake and eat it too. They want the availability of an AP system with the guarantees of a CP system.

You can’t have both.

Understanding this trade-off is the prerequisite for everything else. Tomorrow, I want to dive into the most elegant way I know to solve this “CP” problem without losing your mind in the complexity of older, more academic algorithms.

Does the “P is not optional” reality change how you look at your current project? Or are you still trying to pick all three?