A single database hits limits: 1 billion users, 10 TB of data. It won't fit on one machine.

Split the data. But how you split determines whether queries are fast or slow.

Why Shard#

Vertical scaling limits: you can't buy infinite RAM. 512 GB is expensive and still may not be enough.

Query throughput: One server handles 10,000 queries/sec. Need 100,000? Need more servers.

Storage limits: Disk full. Need more space.

Solution: partition data across multiple servers (shards). Each shard handles a subset of the data.

Hash-Based Sharding#

Hash the partition key. Result determines which shard.

```java
int shard = hash(userId) % numberOfShards;
```

Example: 3 shards, user_id as partition key.

  • user_123: hash(123) % 3 = 0 → Shard 0
  • user_456: hash(456) % 3 = 0 → Shard 0
  • user_789: hash(789) % 3 = 1 → Shard 1
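
The lookup above can be sketched in Python (a minimal sketch; MD5 is used because Python's built-in `hash()` is randomized per process for strings, which would break routing across restarts):

```python
import hashlib

NUM_SHARDS = 3

def shard_for(key: str) -> int:
    """Route a partition key to a shard: hash the key, then take modulo shard count."""
    # MD5 is deterministic and well-distributed; cryptographic strength is irrelevant here.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same key always routes to the same shard, so reads find what writes stored.
print(shard_for("user_123") == shard_for("user_123"))  # True
```

Any stable, well-distributed hash works; the important property is that every client computes the same shard for the same key.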

Pros:

  • Even distribution (if hash function is good)
  • No hotspots (celebrity user data spread out)
  • Predictable shard location

Cons:

  • Range queries hit all shards (get users 100-200? Check all shards)
  • Rebalancing is expensive (add shard 4? Rehash everything)
  • Related data scattered (user’s posts on different shards)

```mermaid
graph TD
    Q[Query: user_id=456] --> H[hash user_id % 3]
    H --> S0[Shard 0<br/>users: 123, 456]
    H -.-> S1[Shard 1<br/>users: 234, 567, 789]
    H -.-> S2[Shard 2<br/>users: 345, 678]
    S0 --> R[Return user 456 data]
```

Hash-based sharding: hash determines shard, single-key queries are fast.

Range-Based Sharding#

Partition by key ranges.

Example: 3 shards, user_id ranges.

  • Shard 0: user_id 1-1000000
  • Shard 1: user_id 1000001-2000000
  • Shard 2: user_id 2000001-3000000
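
The range lookup can be sketched with the standard library's `bisect` module, using the illustrative ranges above:

```python
import bisect

# Upper bound (inclusive) of each shard's user_id range, in ascending order.
SHARD_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]

def shard_for(user_id: int) -> int:
    """Binary-search the range table to find the shard owning this user_id."""
    idx = bisect.bisect_left(SHARD_UPPER_BOUNDS, user_id)
    if idx >= len(SHARD_UPPER_BOUNDS):
        raise ValueError(f"user_id {user_id} outside all shard ranges")
    return idx

print(shard_for(150))        # 0 -- falls in 1 to 1,000,000
print(shard_for(1_500_000))  # 1
```

A range query like "users 100-200" only needs the shards whose ranges overlap the query interval, which is why it can hit a single shard.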

Pros:

  • Range queries efficient (get users 100-200? Hit Shard 0 only)
  • Related data together (user 100’s posts likely on same shard)
  • Easy to add shard (split range, no rehashing)

Cons:

  • Hotspots (new users all go to last shard)
  • Sequential IDs cause uneven load (signup spike hits one shard)
  • Need to track range mappings

```mermaid
graph TD
    Q[Query: users 100-200] --> R[Check range mapping]
    R --> S0[Shard 0<br/>Range: 1-1M<br/>Contains 100-200]
    R -.-> S1[Shard 1<br/>Range: 1M-2M]
    R -.-> S2[Shard 2<br/>Range: 2M-3M]
    S0 --> RT[Return users 100-200]
```

Range-based sharding: range queries hit single shard.

Choosing Partition Key#

Good partition keys:

  • High cardinality (many unique values)
  • Evenly distributed
  • Queried frequently

Bad partition keys:

  • Low cardinality (status: active/inactive → 2 shards only)
  • Timestamp (recent data always on one shard)
  • Celebrity user (one shard gets all their traffic)

Example - E-commerce:

❌ Bad: shard by product_category (electronics shard overwhelmed during sales)

✅ Good: shard by order_id (evenly distributed)

❌ Bad: shard by created_at (recent orders all on one shard)
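
The cardinality point can be demonstrated directly: a two-value key can never occupy more than two shards, however many exist (key formats here are made up for illustration):

```python
import hashlib
from collections import Counter

def shard_for(key: str, num_shards: int) -> int:
    # Deterministic hash; Python's built-in hash() is randomized for strings.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_shards

NUM_SHARDS = 8

# Bad: low-cardinality key (status) -- two values can map to at most 2 shards,
# leaving the other 6 idle.
status_shards = {shard_for(s, NUM_SHARDS) for s in ["active", "inactive"]}

# Good: high-cardinality key (order_id) -- many distinct values spread the load.
order_shards = Counter(shard_for(f"order_{i}", NUM_SHARDS) for i in range(10_000))

print(len(status_shards))  # at most 2 shards ever used
print(len(order_shards))   # all shards in use
```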

Rebalancing#

Add new shard? Data needs to move.

Hash-based rebalancing:

```
Before: hash(key) % 3
After:  hash(key) % 4
```

Most data moves: with mod-based assignment, roughly N/(N+1) of keys change shards (about 75% going from 3 to 4). Expensive.

Better: Consistent hashing with virtual nodes (covered earlier). Add shard, only ~1/N data moves.
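
The cost of mod-based rebalancing is easy to measure: count the keys whose shard changes when a fourth shard is added (a sketch; the hash and key names are illustrative):

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_shards

keys = [f"user_{i}" for i in range(10_000)]

# Keys whose shard assignment differs between 3 and 4 shards must be migrated.
moved = sum(1 for k in keys if shard_for(k, 3) != shard_for(k, 4))

print(f"{moved / len(keys):.0%} of keys moved")  # ~75%
# With consistent hashing, adding a fourth node would move only ~1/4 of keys.
```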

Range-based rebalancing: Split ranges when shard gets too large.

```
Before: Shard 0 (1 to 2M)
After:  Shard 0 (1 to 1M), Shard 3 (1M+1 to 2M)
```

Only affected range moves.
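
Splitting a range is mostly a metadata change: update the mapping table and migrate only the upper half (a minimal sketch with a hypothetical range table):

```python
# Range map: list of (low, high_inclusive, shard_id) entries.
ranges = [(1, 2_000_000, 0)]

def split(ranges, shard_id, split_at, new_shard_id):
    """Split one shard's range at split_at; the upper half moves to a new shard."""
    out = []
    for low, high, sid in ranges:
        if sid == shard_id and low < split_at < high:
            out.append((low, split_at, sid))                # lower half stays put
            out.append((split_at + 1, high, new_shard_id))  # upper half migrates
        else:
            out.append((low, high, sid))
    return out

print(split(ranges, shard_id=0, split_at=1_000_000, new_shard_id=3))
# [(1, 1000000, 0), (1000001, 2000000, 3)]
```

Only rows with keys in the upper half need to be copied to the new shard; everything else is untouched.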

Cross-Shard Queries#

Query needs data from multiple shards? Scatter-gather.

```sql
-- Find all orders for user_123
-- The user's orders might be on any shard if sharded by order_id
SELECT * FROM orders WHERE user_id = 123;
```

Application queries all shards, merges results. Slow.

Solutions:

  • Denormalize (duplicate user data on each shard with their orders)
  • Secondary index table (tracks which shards have data for each user)
  • Redesign schema (shard by user_id instead)
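
Scatter-gather can be sketched against in-memory stand-ins for shards (the data layout is illustrative; real shards would be network calls, ideally issued in parallel):

```python
# Orders sharded by order_id, so one user's orders end up scattered across shards.
shards = [
    [{"order_id": 1, "user_id": 123}, {"order_id": 5, "user_id": 456}],
    [{"order_id": 2, "user_id": 123}],
    [{"order_id": 3, "user_id": 456}, {"order_id": 4, "user_id": 123}],
]

def orders_for_user(user_id: int) -> list:
    """Scatter the query to every shard, then gather and merge the results."""
    results = []
    for shard in shards:  # every shard must answer, even those with no matches
        results.extend(o for o in shard if o["user_id"] == user_id)
    return sorted(results, key=lambda o: o["order_id"])

print([o["order_id"] for o in orders_for_user(123)])  # [1, 2, 4]
```

The query is only as fast as the slowest shard, which is why scatter-gather latency degrades as shard count grows.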

What Complexity Looks Like#

Worked with a client considering sharding PostgreSQL. Single instance hitting limits (500GB, queries slowing).

Questions that came up:

  • Shard by user_id or order_id? user_id keeps a user's data together but creates hotspots for power users.
  • Hash or range? Hash for even distribution but range queries become scatter-gather.
  • How to handle JOINs across shards? Denormalize or move to application layer?
  • What about transactions spanning shards? Not supported, need distributed transactions or saga pattern.

Decided: Not ready for sharding complexity yet. Scaled vertically to 1TB, added read replicas, partitioned tables within single database. Bought 2 more years before needing sharding.

Sharding is powerful but adds massive complexity. Don't do it until vertical scaling is exhausted.

How do you decide when to shard?