Tenant-Aware Data Partitioning

You shard your database to scale. You pick a shard key. If you pick something unrelated to tenant, queries for one tenant’s data scatter across all shards. If you pick tenant ID, all of one tenant’s data lands on one shard, and a large tenant can overwhelm it.

Why Tenant ID Makes Sense as a Shard Key#

Tenant isolation is the priority in a multi-tenant system. If all of Tenant A’s data is on Shard 2, a query for Tenant A’s records goes to Shard 2 only. No cross-shard scatter. No aggregation step. Predictable, bounded query execution.

The alternative, sharding by user ID or record ID, means a query for all records belonging to Tenant A has to hit every shard and aggregate results. That’s cross-shard overhead on every single request.

The Hot Shard Problem#

A large tenant generates more data and more queries than small tenants. If they land on Shard 2 by hash, Shard 2 runs hotter than the others. Other tenants on Shard 2 are noisy neighbors at the shard level.

Solutions: assign tenants to shards based on expected load rather than pure hashing, give high-volume tenants their own dedicated shard, or allow a single tenant to span multiple shards with more complex routing. The dedicated shard option gives the cleanest isolation at the cost of more operational complexity.

graph TD A[Tenant Routing Layer] --> B[Tenant A: Shard 2] A --> C[Tenant B: Shard 1] A --> D[Tenant C: Shard 3 - dedicated, high volume] B --> E[Shard 2 - shared with smaller tenants] C --> F[Shard 1 - shared with smaller tenants] D --> G[Shard 3 - Tenant C only] style A fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff style B fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff style C fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff style D fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff style E fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff style F fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff style G fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff

The Routing Table#

The component that knows which shard a tenant lives on is the tenant routing table. Every request reads it. Cache it aggressively: Redis with a short TTL, or a local in-process cache with a background refresh. A slow routing lookup adds latency to every request.

When you move a tenant from one shard to another, you update the routing table and run a background data migration. During migration, new writes go to the new shard, old data is copied over, and reads still go to the old shard until migration catches up. Then you flip the routing entry and drain the old shard. The dual-read period is the operationally complex part.

At Salesforce#

The routing table that mapped org IDs (their equivalent of tenant IDs) to specific database pods was critical infrastructure. Every API request started with a routing lookup. When the routing table store had an outage, every API call failed. We learned that this routing layer needed the same reliability investment as the databases themselves.

During org migrations (moving a large customer to a new pod), the dual-write phase was the most complex operation. Data written to the new pod had to stay in sync with the old pod until cutover, and the cutover itself was a timed, coordinated switch with a rollback plan ready.

What I’m Learning#

Shard migration is the part teams underestimate. The initial partitioning is straightforward. Moving data between shards later, while the system is live and under load, is where the real complexity lives.

Have you moved tenant data between shards in a live system, and how did you handle the cutover?