Delayed Message Delivery: Execute This in 30 Minutes

User signs up, you want to send a welcome email in 30 minutes. The obvious approach: Thread.sleep(30 * 60 * 1000). The obvious problem: your server restarts and the task is gone forever.

Delayed execution needs to survive restarts, scale across instances, and handle failures.

Database Polling

The simplest durable approach: write the task to a database with an execute_at timestamp. A poller checks every few seconds for due tasks.

SELECT * FROM scheduled_tasks
WHERE execute_at <= NOW() AND status = 'PENDING'
ORDER BY execute_at
LIMIT 100
FOR UPDATE SKIP LOCKED;

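It works. But you’re running this query every few seconds whether there are due tasks or not. And as the table grows, the query gets slower even with an index on execute_at: every completed task still satisfies execute_at <= NOW(), so the database has to walk past all of them to find the pending ones. A composite index on (status, execute_at), or regularly archiving finished rows, keeps that in check. Note also that FOR UPDATE SKIP LOCKED only holds its row locks until the transaction ends, so the poller has to update the status of each claimed row before committing.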
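Here is a rough sketch of what one poller step could look like over plain JDBC, mainly to show the claim and the status update happening in the same transaction. The `id` column and the `'PROCESSING'` status value are assumptions for illustration; the schema above only shows `execute_at` and `status`. It also assumes a database that supports `FOR UPDATE SKIP LOCKED` with a parameterized `LIMIT` (PostgreSQL and MySQL 8 do).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Sketch of one poller step: claim due rows and mark them in a single transaction. */
final class DbTaskPoller {
    // Same query as above, with the batch size as a parameter.
    // Assumes an "id" primary key column (not shown in the original schema).
    static final String CLAIM_SQL =
        "SELECT id FROM scheduled_tasks "
        + "WHERE execute_at <= NOW() AND status = 'PENDING' "
        + "ORDER BY execute_at LIMIT ? FOR UPDATE SKIP LOCKED";

    // 'PROCESSING' is a hypothetical status value.
    static final String MARK_SQL =
        "UPDATE scheduled_tasks SET status = 'PROCESSING' WHERE id = ?";

    static List<Long> claimDueTasks(Connection conn, int batchSize) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // row locks live only as long as the transaction
        try (PreparedStatement select = conn.prepareStatement(CLAIM_SQL);
             PreparedStatement mark = conn.prepareStatement(MARK_SQL)) {
            select.setInt(1, batchSize);
            List<Long> claimed = new ArrayList<>();
            try (ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    claimed.add(rs.getLong("id"));
                }
            }
            for (long id : claimed) {
                mark.setLong(1, id);
                mark.executeUpdate();
            }
            conn.commit(); // releases the locks; claimed rows are no longer PENDING
            return claimed;
        } catch (SQLException e) {
            conn.rollback(); // nothing was claimed; rows stay PENDING
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

A worker that crashes after the commit leaves its rows stuck in the intermediate status, so this variant needs a timeout-based recovery step too.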

Redis Sorted Sets

Better: use the execution timestamp as the score in a Redis sorted set.

// Schedule a task for 30 minutes from now
public void schedule(String taskId, Instant executeAt) {
    redisTemplate.opsForZSet().add(
        "delayed_tasks",
        taskId,
        executeAt.toEpochMilli()
    );
}

// Poll for due tasks
public List<String> pollDueTasks() {
    long now = Instant.now().toEpochMilli();
    Set<String> due = redisTemplate.opsForZSet()
        .rangeByScore("delayed_tasks", 0, now);
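    // ZREM is atomic per member: with several pollers running, only the
    // one whose remove() returns 1 has actually claimed the task.
    List<String> claimed = new ArrayList<>();
    for (String id : due) {
        Long removed = redisTemplate.opsForZSet().remove("delayed_tasks", id);
        if (removed != null && removed == 1L) {
            claimed.add(id);
        }
    }
    return claimed;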
}

Insert is O(log n). Range query for due tasks is O(log n + k) where k is the number of due items, so a poll that finds nothing costs one lookup, not a scan over every stored task.
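To make the timestamp-as-score idea concrete without a Redis instance, here is a small in-memory model of the same structure. It is only a sketch: `InMemoryDelayQueue` is a made-up class, it is not thread-safe, and it stands in for `ZADD`, `ZRANGEBYSCORE` and `ZREM` rather than reproducing them exactly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory stand-in for a sorted set: unique members ordered by score. */
final class InMemoryDelayQueue {
    private final Map<String, Long> scores = new HashMap<>();            // member -> score
    private final TreeMap<Long, Set<String>> byScore = new TreeMap<>();  // score -> members

    /** Like ZADD: O(log n). Re-adding a member replaces its old score. */
    void schedule(String taskId, long executeAtMillis) {
        Long old = scores.put(taskId, executeAtMillis);
        if (old != null) {
            Set<String> oldBucket = byScore.get(old);
            oldBucket.remove(taskId);
            if (oldBucket.isEmpty()) {
                byScore.remove(old);
            }
        }
        byScore.computeIfAbsent(executeAtMillis, k -> new TreeSet<>()).add(taskId);
    }

    /** Like ZRANGEBYSCORE 0 now plus ZREM: only due entries are touched. */
    List<String> pollDue(long nowMillis) {
        List<String> due = new ArrayList<>();
        while (!byScore.isEmpty() && byScore.firstKey() <= nowMillis) {
            Set<String> bucket = byScore.pollFirstEntry().getValue();
            for (String id : bucket) {
                scores.remove(id);
                due.add(id);
            }
        }
        return due;
    }

    int size() {
        return scores.size();
    }
}
```

Because the entries are ordered by timestamp, the loop stops at the first task that is not due yet, no matter how many future tasks are stored behind it.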

graph TD
    S["Schedule Task (execute_at = now + 30m)"] --> Z["Redis Sorted Set (score = timestamp)"]
    Z --> P["Poller: ZRANGEBYSCORE 0, now"]
    P --> D{Due tasks?}
    D -->|Yes| E[Remove + Process]
    D -->|No| W[Wait, poll again]
    E --> ACK{Success?}
    ACK -->|Yes| DONE[Complete]
    ACK -->|No| RE["Re-schedule (retry delay)"]
    style S fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style Z fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style P fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style D fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style E fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style W fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style ACK fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style DONE fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style RE fill:#000000,stroke:#ff0000,stroke-width:2px,color:#fff

The Visibility Timeout Problem

Worker picks up a delayed task, starts processing, crashes. The task was already removed from the sorted set. It’s lost.

The fix: don’t remove immediately. Move it to an “in-progress” set with a visibility timeout. If the worker doesn’t acknowledge within the timeout, the task re-appears in the delayed queue. It’s the same idea as message broker acknowledgments and the visibility timeout in Amazon SQS.
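The move-then-acknowledge flow is easier to see in code. Below is an in-memory sketch under a few assumptions: `VisibilityTimeoutQueue` and its method names are invented for this example, the two maps stand in for two sorted sets (delayed and in-progress), and a separate reaper is assumed to call `requeueExpired` periodically. In Redis, the move between the two sets would have to be atomic, for example via a Lua script.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of a delayed queue with a visibility timeout. Not thread-safe. */
final class VisibilityTimeoutQueue {
    private final long visibilityTimeoutMillis;
    private final Map<String, Long> delayed = new TreeMap<>();     // taskId -> execute_at
    private final Map<String, Long> inProgress = new TreeMap<>();  // taskId -> ack deadline

    VisibilityTimeoutQueue(long visibilityTimeoutMillis) {
        this.visibilityTimeoutMillis = visibilityTimeoutMillis;
    }

    void schedule(String taskId, long executeAtMillis) {
        delayed.put(taskId, executeAtMillis);
    }

    /** Claim due tasks by moving them to the in-progress set instead of deleting them. */
    List<String> claimDue(long nowMillis) {
        List<String> claimed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = delayed.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= nowMillis) {
                inProgress.put(entry.getKey(), nowMillis + visibilityTimeoutMillis);
                claimed.add(entry.getKey());
                it.remove();
            }
        }
        return claimed;
    }

    /** Worker finished: drop the task for good. Returns false if it was not in progress. */
    boolean ack(String taskId) {
        return inProgress.remove(taskId) != null;
    }

    /** Reaper step: tasks whose ack deadline has passed become due again. */
    int requeueExpired(long nowMillis) {
        int requeued = 0;
        Iterator<Map.Entry<String, Long>> it = inProgress.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= nowMillis) {
                delayed.put(entry.getKey(), nowMillis);
                it.remove();
                requeued++;
            }
        }
        return requeued;
    }
}
```

The timeout should comfortably exceed the longest expected processing time; otherwise slow but healthy workers get their tasks handed to someone else.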

This means processing is at-least-once. The task might execute twice if the worker finishes but crashes before acknowledging. You need idempotency on the handler side.
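One common way to get idempotency is to remember which task IDs have already been handled and skip repeats. The sketch below assumes that approach; `IdempotentHandler` is a hypothetical wrapper, and the in-memory set is only a placeholder, because a real deployment would need a durable marker (a unique constraint in a database, or a Redis key set with NX) that survives the same restarts the queue does.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Wraps a task handler so that a redelivered task ID runs its side effect only once. */
final class IdempotentHandler {
    // Placeholder for a durable store of processed task IDs.
    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    private final Consumer<String> delegate;

    IdempotentHandler(Consumer<String> delegate) {
        this.delegate = delegate;
    }

    /** Returns true if the side effect ran, false if this task ID was a duplicate. */
    boolean handle(String taskId) {
        if (!processed.add(taskId)) {
            return false; // already handled by an earlier delivery
        }
        try {
            delegate.accept(taskId);
            return true;
        } catch (RuntimeException e) {
            processed.remove(taskId); // failed attempt: allow the retry to run
            throw e;
        }
    }
}
```

Marking before running is a trade-off: with a durable marker, a crash between the mark and the side effect would skip the task on redelivery, so where possible the marker and the side effect should commit together.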

At Oracle, NSSF config changes needed a 5-minute cool-down before applying. If multiple changes arrived within 5 minutes, only the latest should apply. We initially used Thread.sleep in the handler thread, so restarting the service during the window meant changes were lost. We moved to a Redis sorted set: each config change writes to the set with apply_at = now + 5 minutes, and a poller picks up due changes. Because the member identifies the config rather than the individual change, a newer change for the same config overwrites the old entry (same member, updated score) and restarts the cool-down. Simple, durable, and restart-safe.
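The “latest change wins” behavior falls out of keying the entry by config. Here is an in-memory sketch of that idea, not the actual Oracle implementation: `CoolDownBuffer`, the `slice-a` key in the usage below, and keeping the payload in a separate map next to the timestamp are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Cool-down buffer: one pending entry per config key, the latest change wins. */
final class CoolDownBuffer {
    private final long coolDownMillis;
    private final Map<String, Long> applyAt = new TreeMap<>();   // config key -> due time (the score)
    private final Map<String, String> latest = new TreeMap<>();  // config key -> latest payload

    CoolDownBuffer(long coolDownMillis) {
        this.coolDownMillis = coolDownMillis;
    }

    /** Same key again: the due time and the payload are both overwritten. */
    void submit(String configKey, String payload, long nowMillis) {
        applyAt.put(configKey, nowMillis + coolDownMillis);
        latest.put(configKey, payload);
    }

    /** Returns the payloads whose cool-down has elapsed and removes them. */
    List<String> drainDue(long nowMillis) {
        List<String> due = new ArrayList<>();
        // A linear scan is fine for a sketch; a sorted set would do a range query.
        applyAt.entrySet().removeIf(entry -> {
            if (entry.getValue() > nowMillis) {
                return false;
            }
            due.add(latest.remove(entry.getKey()));
            return true;
        });
        return due;
    }
}
```

With a 5-minute cool-down, submitting `v1` for `slice-a` at minute 0 and `v2` at minute 2 yields nothing at minute 5 and only `v2` at minute 7.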

What I’m Learning

Delayed execution shows up everywhere: retry delays, scheduled notifications, reservation expiry, cool-down periods. The pattern is always the same: store the task with a future timestamp, poll for due items, process with at-least-once guarantees. Redis sorted sets hit the sweet spot for most use cases: fast, simple, and the timestamp-as-score trick makes range queries natural.

How do you handle delayed or scheduled tasks in your system?