You have 5 servers. Request comes in. Which one handles it?

Seems simple until you realize the algorithm determines whether your system handles load gracefully or collapses under traffic spikes.

Why Load Balancing Matters#

Single server hits capacity (CPU, memory, connections). Solution: add more servers, distribute requests.

But naive distribution fails. Send equal traffic to all servers? Server 1 might be processing expensive queries while Server 2 sits idle. You need smarter routing.

Round Robin#

Simplest approach. Cycle through servers in order.

public class RoundRobinLoadBalancer {
    private List<Server> servers;
    private AtomicInteger counter = new AtomicInteger(0);
    
    public Server getNextServer() {
        // floorMod keeps the index non-negative even after the counter overflows;
        // a plain % would return a negative index once getAndIncrement() wraps
        int index = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(index);
    }
}

Fair distribution. Easy to implement. Problem: doesn’t account for server capacity or current load. Server 1 might be handling 10 slow requests while Server 2 has 10 fast ones. Both get same traffic.


Round Robin cycles through servers regardless of their actual load.
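A self-contained usage sketch (server names are illustrative; a constructor is added so the snippet compiles on its own):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal demo: Server is reduced to a name for illustration.
record Server(String name) {}

class RoundRobinLoadBalancer {
    private final List<Server> servers;
    private final AtomicInteger counter = new AtomicInteger(0);

    RoundRobinLoadBalancer(List<Server> servers) {
        this.servers = servers;
    }

    Server getNextServer() {
        // floorMod keeps the index non-negative even after integer overflow
        return servers.get(Math.floorMod(counter.getAndIncrement(), servers.size()));
    }
}
```

With three servers, four requests land on s1, s2, s3, s1: the cycle repeats no matter what each server is doing.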

Least Connections#

Send request to server with fewest active connections.

public class LeastConnectionsLoadBalancer {
    private List<Server> servers;
    
    public Server getNextServer() {
        return servers.stream()
            .min(Comparator.comparingInt(Server::getActiveConnections))
            .orElseThrow();
    }
}

class Server {
    private AtomicInteger activeConnections = new AtomicInteger(0);
    
    public int getActiveConnections() {
        return activeConnections.get();
    }
    
    public void incrementConnections() {
        activeConnections.incrementAndGet();
    }
    
    public void decrementConnections() {
        activeConnections.decrementAndGet();
    }
}

Better than round robin. Adapts to actual load. Problem: expensive request vs cheap request treated the same. Server with 5 database queries looks same as server with 5 cache hits.


Least Connections routes to server with fewest active connections.
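A self-contained sketch of the idea (class and field names are illustrative). One detail worth making explicit: the balancer should claim the connection at selection time, otherwise several concurrent picks all see the same "fewest" server:

```java
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: route to the server with the fewest in-flight requests.
class CountedServer {
    final String name;
    final AtomicInteger active = new AtomicInteger(0);

    CountedServer(String name) {
        this.name = name;
    }
}

class LeastConnectionsBalancer {
    private final List<CountedServer> servers;

    LeastConnectionsBalancer(List<CountedServer> servers) {
        this.servers = servers;
    }

    CountedServer getNextServer() {
        CountedServer chosen = servers.stream()
            .min(Comparator.comparingInt(s -> s.active.get()))
            .orElseThrow();
        // Claim the connection immediately so the next pick sees the new count;
        // the caller decrements active when the request completes
        chosen.active.incrementAndGet();
        return chosen;
    }
}
```

With servers at 3, 1, and 5 active connections, the first two requests both go to the second server (1 → 2 → 3), and only then does the first server tie.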

Weighted Round Robin#

Servers have different capacities. Give more traffic to bigger servers.

public class WeightedRoundRobinLoadBalancer {
    private List<WeightedServer> servers;
    private int currentIndex = -1; // start before the first server
    private int currentWeight = 0;
    
    public Server getNextServer() {
        while (true) {
            currentIndex = (currentIndex + 1) % servers.size();
            if (currentIndex == 0) {
                // Completed a pass: lower the threshold by the GCD of all weights
                currentWeight = currentWeight - gcd(servers);
                if (currentWeight <= 0) {
                    currentWeight = maxWeight(servers);
                }
            }
            // Only servers whose weight clears the current threshold are eligible
            if (servers.get(currentIndex).weight >= currentWeight) {
                return servers.get(currentIndex).server;
            }
        }
    }
    
    private int gcd(List<WeightedServer> servers) {
        int g = 0;
        for (WeightedServer s : servers) {
            g = (g == 0) ? s.weight : gcd(g, s.weight);
        }
        return g;
    }
    
    private int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }
    
    private int maxWeight(List<WeightedServer> servers) {
        return servers.stream().mapToInt(s -> s.weight).max().orElseThrow();
    }
}

class WeightedServer {
    Server server;
    int weight; // Higher weight = more traffic
}

Example: Server 1 (weight=3), Server 2 (weight=1). Pattern: S1, S1, S1, S2, S1, S1, S1, S2…

Accounts for different server capacities. Problem: weights are static. Doesn’t adapt if Server 1 suddenly slows down.
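The gcd-based loop can be exercised with a small self-contained sketch (class and server names are illustrative) to confirm the S1, S1, S1, S2 pattern:

```java
import java.util.List;

// Self-contained sketch of interleaved weighted round robin.
class WeightedServer {
    final String name;
    final int weight; // higher weight = more traffic

    WeightedServer(String name, int weight) {
        this.name = name;
        this.weight = weight;
    }
}

class WeightedRoundRobin {
    private final List<WeightedServer> servers;
    private int index = -1;       // start before the first server
    private int currentWeight = 0;

    WeightedRoundRobin(List<WeightedServer> servers) {
        this.servers = servers;
    }

    WeightedServer next() {
        while (true) {
            index = (index + 1) % servers.size();
            if (index == 0) {
                // Completed a pass: lower the threshold by the GCD of the weights
                currentWeight -= gcd();
                if (currentWeight <= 0) currentWeight = maxWeight();
            }
            if (servers.get(index).weight >= currentWeight) return servers.get(index);
        }
    }

    private int gcd() {
        int g = 0;
        for (WeightedServer s : servers) g = (g == 0) ? s.weight : gcd(g, s.weight);
        return g;
    }

    private static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    private int maxWeight() {
        return servers.stream().mapToInt(s -> s.weight).max().orElseThrow();
    }
}
```

With weights 3 and 1, eight calls yield s1, s1, s1, s2, s1, s1, s1, s2.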

IP Hash / Sticky Sessions#

Route same client to same server. Useful for session affinity.

public class IPHashLoadBalancer {
    private List<Server> servers;
    
    public Server getNextServer(String clientIP) {
        int hash = clientIP.hashCode();
        // floorMod avoids a negative index when hashCode() is negative;
        // Math.abs(Integer.MIN_VALUE) is itself negative, so abs() isn't safe here
        int index = Math.floorMod(hash, servers.size());
        return servers.get(index);
    }
}

Session state stays on one server. No need for distributed session storage. Problem: uneven distribution if a few clients make many requests. Removing a server breaks all of its sessions.


IP Hash ensures same client always hits same server (session affinity).
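A self-contained sketch (servers reduced to names for illustration) showing the stickiness property, same IP in, same server out:

```java
import java.util.List;

// Illustrative sketch: hash the client IP to pick a fixed server.
class IPHashBalancer {
    private final List<String> servers; // server names stand in for real servers

    IPHashBalancer(List<String> servers) {
        this.servers = servers;
    }

    String getServer(String clientIP) {
        // floorMod avoids a negative index when hashCode() is negative
        return servers.get(Math.floorMod(clientIP.hashCode(), servers.size()));
    }
}
```

The mapping is deterministic, so no per-client state is needed on the balancer itself.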

Real-World Challenges#

Health checks: Load balancer needs to know which servers are healthy. Passive (detect failures) or active (periodic pings). Failed health check? Stop sending traffic.
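An active health check can be sketched like this (the probe is a stand-in for a real HTTP or TCP ping; in production the check would run on a scheduler, e.g. a ScheduledExecutorService):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Sketch of active health checking: a periodic task probes each server and
// flips its healthy flag; the balancer routes only to healthy servers.
class HealthChecker {
    private final Map<String, Boolean> healthy = new ConcurrentHashMap<>();
    private final Predicate<String> probe; // stand-in for an HTTP/TCP ping

    HealthChecker(Predicate<String> probe) {
        this.probe = probe;
    }

    void check(String server) {
        // Record the latest probe result; unknown servers default to unhealthy
        healthy.put(server, probe.test(server));
    }

    boolean isHealthy(String server) {
        return healthy.getOrDefault(server, false);
    }
}
```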

Connection draining: Server going down for maintenance? Stop new requests, wait for existing connections to finish. Graceful shutdown.
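One way to sketch draining (names illustrative): flip a flag that rejects new work, then wait for the in-flight count to hit zero.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of connection draining: stop admitting new requests,
// then wait until in-flight requests reach zero before shutting down.
class DrainableServer {
    private final AtomicBoolean draining = new AtomicBoolean(false);
    private final AtomicInteger inFlight = new AtomicInteger(0);

    boolean tryAccept() {
        if (draining.get()) return false; // reject new work while draining
        inFlight.incrementAndGet();
        return true;
    }

    void complete() {
        inFlight.decrementAndGet();
    }

    void startDraining() {
        draining.set(true);
    }

    boolean drained() {
        return draining.get() && inFlight.get() == 0;
    }
}
```

A real shutdown would also cap the wait with a timeout so one stuck connection can't block maintenance forever.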

Hot servers: One server gets unlucky with expensive requests. Load balancer should detect high latency and back off. Requires monitoring response times, not just connection counts.
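One sketch of latency-aware routing (names and the smoothing factor are assumptions, not a standard API): track an exponentially weighted moving average of response times per server and route to the fastest.

```java
import java.util.Comparator;
import java.util.List;

// Sketch of latency-aware selection: keep an EWMA of response times
// and route to the server with the lowest average.
class LatencyTrackedServer {
    final String name;
    private volatile double ewmaMillis = 0; // 0 means "untried, try it first"
    private static final double ALPHA = 0.2; // smoothing factor (assumed)

    LatencyTrackedServer(String name) {
        this.name = name;
    }

    void recordLatency(double millis) {
        // Recent samples dominate; old samples decay geometrically
        ewmaMillis = (ewmaMillis == 0) ? millis : ALPHA * millis + (1 - ALPHA) * ewmaMillis;
    }

    double latency() {
        return ewmaMillis;
    }
}

class LatencyAwareBalancer {
    private final List<LatencyTrackedServer> servers;

    LatencyAwareBalancer(List<LatencyTrackedServer> servers) {
        this.servers = servers;
    }

    LatencyTrackedServer getNextServer() {
        return servers.stream()
            .min(Comparator.comparingDouble(LatencyTrackedServer::latency))
            .orElseThrow();
    }
}
```

A hot server's EWMA climbs as its responses slow down, so traffic drifts away from it without any static configuration.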

What I’ve Seen#

At one of the companies I worked at, we used round robin with Spring Cloud. Worked fine until one instance had a slow database connection. Requests kept piling up on that server (because round robin doesn’t care about load) while others sat mostly idle. Switched to least connections, problem solved.

Most production systems use least connections or weighted least connections. Round robin is too naive. IP hash only when you absolutely need session affinity.

What load balancing strategy does your system use?