Priority Queues in Distributed Systems

You have a message queue. Urgent alerts and bulk data syncs go into the same queue. The urgent alert sits behind 5,000 bulk messages. By the time it’s processed, it’s no longer urgent.

FIFO doesn’t care about importance. Priority queues do.

Multi-Level Priority

The simplest approach: multiple queues, one per priority level. Workers check the high-priority queue first.

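// Check the queues in strict priority order and return the first task found.
public Runnable pollNext() {
    for (Queue<Runnable> queue : List.of(highQueue, mediumQueue, lowQueue)) {
        Runnable task = queue.poll();
        if (task != null) return task;
    }
    return null; // every queue is empty
}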

Problem: if high-priority messages keep arriving, low-priority messages never get processed. That’s starvation.

Starvation Prevention

Weighted fair queuing: give each level a fixed share of processing, for example 70% from high, 20% from medium, 10% from low. Low-priority items are slow but never stuck.
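One way to approximate the 70/20/10 split is to roll a number per poll and pick the level it lands in. The sketch below is only an illustration under assumptions: the `WeightedPoller` class, its constructor, and the fallback rule (scan in priority order when the chosen queue is empty) are hypothetical and not taken from any production system.

```java
import java.util.List;
import java.util.Queue;
import java.util.Random;

// Hypothetical sketch: weighted selection across three priority queues.
class WeightedPoller {
    // Percentages from the text, ordered high, medium, low.
    private static final int[] WEIGHTS = {70, 20, 10};

    private final List<Queue<Runnable>> queues; // ordered high, medium, low
    private final Random random;

    WeightedPoller(Queue<Runnable> high, Queue<Runnable> medium,
                   Queue<Runnable> low, Random random) {
        this.queues = List.of(high, medium, low);
        this.random = random;
    }

    // Maps a roll in [0, 100) to a level index using cumulative weights:
    // 0-69 -> 0 (high), 70-89 -> 1 (medium), 90-99 -> 2 (low).
    static int pickLevel(int roll) {
        int cumulative = 0;
        for (int level = 0; level < WEIGHTS.length; level++) {
            cumulative += WEIGHTS[level];
            if (roll < cumulative) return level;
        }
        return WEIGHTS.length - 1;
    }

    // Tries the weighted pick first. If that queue is empty, falls back to
    // strict priority order so a worker never idles while any queue has work.
    Runnable pollNext() {
        Runnable task = queues.get(pickLevel(random.nextInt(100))).poll();
        if (task != null) return task;
        for (Queue<Runnable> queue : queues) {
            task = queue.poll();
            if (task != null) return task;
        }
        return null; // every queue is empty
    }
}
```

The split is only approximate over many polls, and the fallback means an idle high queue donates its share to the other levels rather than wasting it.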

Aging: increase priority over time. A low-priority item that’s been waiting 30 minutes gets promoted to medium. Wait an hour, it becomes high. Everything eventually gets processed.

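// Redis sorted set: score = priority_weight + age_bonus (higher score = served sooner)
double score = basePriority + (System.currentTimeMillis() - createdAt) * AGING_FACTOR;
redisTemplate.opsForZSet().add("task_queue", taskId, score);
// Recompute the score for waiting tasks periodically so the age bonus keeps growing.
// ZPOPMAX pops the highest-score item, which is the highest-priority one here.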
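The threshold-style promotion described above (30 minutes to medium, an hour to high) can also be expressed without Redis. This is a minimal sketch under assumptions: the `Aging` class, the `Priority` enum, and the rule that a medium item is only promoted at the one-hour mark are hypothetical choices made for the example.

```java
import java.time.Duration;

// Hypothetical sketch: derive a task's effective priority from how long it has waited.
class Aging {
    enum Priority { LOW, MEDIUM, HIGH } // declaration order doubles as urgency order

    static final Duration PROMOTE_TO_MEDIUM = Duration.ofMinutes(30);
    static final Duration PROMOTE_TO_HIGH = Duration.ofMinutes(60);

    static Priority effectivePriority(Priority base, Duration waited) {
        // Priority earned purely from waiting time.
        Priority aged = Priority.LOW;
        if (waited.compareTo(PROMOTE_TO_HIGH) >= 0) {
            aged = Priority.HIGH;
        } else if (waited.compareTo(PROMOTE_TO_MEDIUM) >= 0) {
            aged = Priority.MEDIUM;
        }
        // Aging only ever raises priority; it never demotes a task.
        return aged.compareTo(base) > 0 ? aged : base;
    }
}
```

A scheduler could call `effectivePriority` on each poll, or run a periodic sweep that moves aged tasks into the next queue up.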

graph TD
    I[Incoming Tasks] --> C{Priority?}
    C -->|High| HQ[High Queue: 70%]
    C -->|Medium| MQ[Medium Queue: 20%]
    C -->|Low| LQ[Low Queue: 10%]
    HQ --> W[Workers]
    MQ --> W
    LQ --> W
    style I fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style C fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style HQ fill:#000000,stroke:#ff0000,stroke-width:2px,color:#fff
    style MQ fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style LQ fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style W fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff

Priority Inversion

A subtle bug: a high-priority task depends on a result from a low-priority task. The high-priority task is “next” but can’t proceed because the thing it needs is stuck in the low queue. The fix is the same one operating systems use, known there as priority inheritance: temporarily boost the low-priority dependency to the waiter’s priority so it completes first.
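The boost can be sketched as a small bookkeeping class. Everything here is an assumption made for illustration: the `PriorityBoost` name, the string task ids, and the integer priorities (higher number means more urgent) are hypothetical, and a real system would also need to move the boosted task between queues.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: boost a dependency to its waiter's priority, then undo it.
class PriorityBoost {
    // task id -> current priority (higher number = more urgent)
    private final Map<String, Integer> priorities = new HashMap<>();
    // task id -> priority before the boost, kept so the boost stays temporary
    private final Map<String, Integer> originals = new HashMap<>();

    void submit(String taskId, int priority) {
        priorities.put(taskId, priority);
    }

    int priorityOf(String taskId) {
        return priorities.get(taskId);
    }

    // Called when waiterId cannot proceed until dependencyId finishes.
    void onBlocked(String waiterId, String dependencyId) {
        int waiter = priorities.get(waiterId);
        int dependency = priorities.get(dependencyId);
        if (dependency < waiter) {
            originals.putIfAbsent(dependencyId, dependency); // remember the first original only
            priorities.put(dependencyId, waiter);
        }
    }

    // Called once the dependency has produced its result.
    void clearBoost(String taskId) {
        Integer original = originals.remove(taskId);
        if (original != null) priorities.put(taskId, original);
    }
}
```

For example, if a priority-3 `report` task blocks on a priority-1 `export` task, `onBlocked("report", "export")` raises `export` to 3 until `clearBoost("export")` is called.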

At Oracle, we had a single FIFO queue for NSSF notification processing. Urgent config updates (affecting live traffic) waited behind thousands of bulk sync messages from overnight batch jobs. We split the queue into three priority levels with weighted processing. Urgent updates went from waiting minutes to being processed within seconds. The backpressure mechanism stayed the same; it was just applied per level.

What I’m Learning

Priority queues seem like a small upgrade over FIFO, but they introduce real complexity. Starvation, inversion, and the question of “who decides priority?” are all design decisions that affect system behavior under load. The rule I follow: start with FIFO. Only add priority when you can point to a specific case where important work is being delayed by unimportant work.

How do you handle mixed-priority workloads in your systems?