API gets hit with 10,000 requests/second. Designed for 100/second. Database crashes. Service dies.

Rate limiting prevents this.

Why Rate Limiting#

Protect your service from:

  • Abusive clients (intentional or buggy)
  • Traffic spikes you can’t handle
  • DDoS attacks
  • Expensive operations draining resources

Without rate limiting, one bad client kills service for everyone.

Token Bucket Algorithm#

Bucket holds tokens. Each request consumes one token. Tokens refill at fixed rate.

Rules:

  • Bucket capacity: 100 tokens
  • Refill rate: 10 tokens/second
  • Request arrives: Check if token available. Yes? Consume token, allow request. No? Reject.

Allows bursts (use all 100 tokens instantly) but enforces average rate over time.

public class TokenBucket {
    private final long capacity;
    private final long refillRate;  // tokens per second
    private long availableTokens;
    private long lastRefillTime;
    
    public TokenBucket(long capacity, long refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.availableTokens = capacity;
        this.lastRefillTime = System.nanoTime();
    }
    
    public synchronized boolean allowRequest() {
        refill();
        
        if (availableTokens > 0) {
            availableTokens--;
            return true;
        }
        
        return false;  // Rate limited
    }
    
    private void refill() {
        long now = System.nanoTime();
        long timePassed = now - lastRefillTime;
        
        long tokensToAdd = (timePassed * refillRate) / 1_000_000_000L;
        
        if (tokensToAdd > 0) {
            availableTokens = Math.min(capacity, availableTokens + tokensToAdd);
            // Advance by the time the added tokens represent, not to "now",
            // so the fractional remainder isn't silently discarded and the
            // effective rate stays accurate.
            lastRefillTime += (tokensToAdd * 1_000_000_000L) / refillRate;
        }
    }
}

Use when: You want to allow bursts. API can handle 100 req/sec sustained, but occasional spike to 500 req/sec is fine.
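To see the burst-then-throttle behavior concretely, here's a compact sketch with an injectable clock (class and method names are illustrative, not the production class above) so the behavior is deterministic:

```java
// Sketch: same token-bucket logic, but time is passed in rather than read
// from System.nanoTime(), so burst and refill behavior can be verified.
class TestableTokenBucket {
    private final long capacity;
    private final long refillRatePerSec;
    private long tokens;
    private long lastRefillNanos;

    TestableTokenBucket(long capacity, long refillRatePerSec, long nowNanos) {
        this.capacity = capacity;
        this.refillRatePerSec = refillRatePerSec;
        this.tokens = capacity;          // start full: full burst available
        this.lastRefillNanos = nowNanos;
    }

    boolean allowRequest(long nowNanos) {
        long elapsed = nowNanos - lastRefillNanos;
        long toAdd = elapsed * refillRatePerSec / 1_000_000_000L;
        if (toAdd > 0) {
            tokens = Math.min(capacity, tokens + toAdd);
            lastRefillNanos += toAdd * 1_000_000_000L / refillRatePerSec;
        }
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;  // Rate limited
    }
}
```

With capacity 3 and refill 1/sec: three requests at t=0 all pass (the burst), the fourth is rejected, and one second later exactly one more token is available.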

graph TD
    R1[Request 1] --> B[Token Bucket<br/>Capacity: 100<br/>Refill: 10/sec]
    B --> C1{Token available?}
    C1 -->|Yes| A1[Allow<br/>Tokens: 99]
    C1 -->|No| D1[Deny 429]
    R2[Request 2] --> B
    B --> C2{Token available?}
    C2 -->|Yes| A2[Allow<br/>Tokens: 98]
    C2 -->|No| D2[Deny 429]
    T[Time passes<br/>+1 second] --> B
    B --> RF[Refill +10 tokens<br/>Tokens: 108 → capped at 100]

Token bucket allows bursts up to capacity, refills at steady rate.

Leaky Bucket Algorithm#

Process requests at fixed rate, regardless of input rate. Excess requests queue or get rejected.

Think of it as a bucket with a hole in the bottom. Water (requests) pours in at any rate; it leaks out (gets processed) at a fixed rate.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LeakyBucket {
    private final BlockingQueue<Request> queue;
    private final long processRate;  // requests per second
    private final ScheduledExecutorService scheduler;
    
    public LeakyBucket(int capacity, long processRate) {
        this.queue = new LinkedBlockingQueue<>(capacity);  // bounds the backlog
        this.processRate = processRate;
        this.scheduler = Executors.newSingleThreadScheduledExecutor();
        
        // Process at fixed rate. Guard against a zero period when
        // processRate > 1000: scheduleAtFixedRate rejects period <= 0.
        long delayMs = Math.max(1, 1000 / processRate);
        scheduler.scheduleAtFixedRate(this::processRequest, 0, delayMs, TimeUnit.MILLISECONDS);
    }
    
    public boolean addRequest(Request request) {
        return queue.offer(request);  // Reject immediately if queue is full
    }
    
    private void processRequest() {
        Request request = queue.poll();
        if (request != null) {
            handleRequest(request);  // Request and handleRequest are application-defined
        }
    }
}

Use when: You want smooth, predictable output rate. Prevents bursts entirely.

Token Bucket vs Leaky Bucket#

Token Bucket:

  • Allows bursts (up to bucket capacity)
  • Flexible for bursty traffic
  • Simpler to implement

Leaky Bucket:

  • Smooth output rate (no bursts)
  • Better for downstream systems that can’t handle spikes
  • Queue management needed

Most APIs use token bucket. More user-friendly (burst tolerance).

Per-User vs Global Rate Limiting#

Global: 1000 req/sec total across all users. One user can monopolize.

Per-User: 100 req/sec per user. Fair distribution.

import java.util.concurrent.ConcurrentHashMap;

public class PerUserRateLimiter {
    private final ConcurrentHashMap<String, TokenBucket> buckets;
    private final long capacity;
    private final long refillRate;
    
    public PerUserRateLimiter(long capacity, long refillRate) {
        this.buckets = new ConcurrentHashMap<>();
        this.capacity = capacity;
        this.refillRate = refillRate;
    }
    
    public boolean allowRequest(String userId) {
        // computeIfAbsent is atomic: exactly one bucket per user, even under
        // concurrent first requests.
        TokenBucket bucket = buckets.computeIfAbsent(
            userId, 
            k -> new TokenBucket(capacity, refillRate)
        );
        
        return bucket.allowRequest();
    }
}

Memory concern: One bucket per user. 10,000 active users = 10,000 buckets. Use LRU cache with bounded size.
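One way to bound that memory is an access-order LinkedHashMap evicting least-recently-used entries. A minimal sketch, assuming the map value is your bucket type (the class name and generic value here are illustrative; this is not thread-safe on its own, so in production you'd synchronize it or use a cache library like Caffeine):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: cap the per-user bucket map at maxEntries; the bucket for the
// least-recently-seen user is evicted when the cap is exceeded.
class BoundedBucketMap<V> extends LinkedHashMap<String, V> {
    private final int maxEntries;

    BoundedBucketMap(int maxEntries) {
        super(16, 0.75f, true);  // accessOrder = true → iteration order is LRU
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        return size() > maxEntries;  // evict the LRU entry beyond the cap
    }
}
```

An evicted user simply gets a fresh (full) bucket on their next request, which is usually an acceptable trade-off for inactive users.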

Distributed Rate Limiting#

Multiple servers need shared rate limit. Can’t track locally.

Solution: Redis with atomic operations.

public class RedisRateLimiter {
    private final RedisTemplate<String, Long> redis;
    
    // Fixed-window counter. Caveat: INCR and EXPIRE are two separate round
    // trips; if the process dies between them, the key never expires. For
    // true atomicity, wrap both commands in a Lua script.
    public boolean allowRequest(String key, long limit, long windowSeconds) {
        String redisKey = "ratelimit:" + key;
        
        Long count = redis.opsForValue().increment(redisKey);
        
        if (count == 1) {
            // First request in window, set expiration
            redis.expire(redisKey, windowSeconds, TimeUnit.SECONDS);
        }
        
        return count <= limit;
    }
}

Redis handles concurrency. All servers see same counter.
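The algorithm the Redis code implements is a fixed-window counter. A local, Redis-free sketch of the same logic (class name hypothetical, clock injected for clarity) makes the window behavior explicit; note that fixed windows can admit up to 2× the limit straddling a window boundary, which is the usual argument for sliding windows or token buckets:

```java
// Sketch of the fixed-window algorithm: one counter per window,
// reset whenever the current window has elapsed.
class FixedWindowCounter {
    private final long limit;
    private final long windowSeconds;
    private long windowStart;
    private long count;

    FixedWindowCounter(long limit, long windowSeconds, long nowSeconds) {
        this.limit = limit;
        this.windowSeconds = windowSeconds;
        this.windowStart = nowSeconds;
    }

    boolean allowRequest(long nowSeconds) {
        if (nowSeconds - windowStart >= windowSeconds) {
            windowStart = nowSeconds;  // roll over to a new window
            count = 0;
        }
        return ++count <= limit;
    }
}
```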

Response Headers#

Tell client their rate limit status.

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 73
X-RateLimit-Reset: 1641234567

Client knows when they’ll get more quota. Can back off gracefully.
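Assembling these headers from limiter state is a one-liner per header; a small hypothetical helper (names are illustrative, not from any framework) keeps it in one place so every response path emits the same set:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: build the standard rate-limit headers from limiter state.
class RateLimitHeaders {
    static Map<String, String> of(long limit, long remaining, long resetEpochSeconds) {
        Map<String, String> h = new LinkedHashMap<>();
        h.put("X-RateLimit-Limit", Long.toString(limit));
        h.put("X-RateLimit-Remaining", Long.toString(remaining));
        h.put("X-RateLimit-Reset", Long.toString(resetEpochSeconds));
        return h;
    }
}
```

On a 429 response, also consider a Retry-After header so well-behaved clients know exactly how long to wait.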

What I’ve Implemented#

Public API at one of the companies I worked with. No rate limiting initially. One buggy client script hammered endpoint. Generated 50,000 req/sec. Brought down entire service.

Added token bucket rate limiting. 1000 req/sec per API key. Bursts allowed up to 5000. Rejected excess with 429 status. Buggy client got rate limited, other clients unaffected.

Backpressure handles slow consumers. Rate limiting handles abusive producers.

How do you rate limit your APIs?