Green dot next to a username. Looks simple. Behind it is a distributed system that’s constantly guessing whether a user is still connected.

Presence is deceptively hard. It’s an inherently eventually consistent problem, and getting it wrong means showing someone as online when they closed their laptop 10 minutes ago.

Heartbeat-Based Presence

The standard approach: clients send periodic heartbeats. Server tracks the last heartbeat time. If no heartbeat arrives within a timeout window, the user is considered offline.

// Client sends heartbeat every 10 seconds
public void heartbeat(String userId) {
    // Redis key expires in 30 seconds (3 missed heartbeats = offline)
    redisTemplate.opsForValue()
        .set("presence:" + userId, "online", Duration.ofSeconds(30));
}

public boolean isOnline(String userId) {
    // hasKey returns a nullable Boolean; avoid an NPE on unboxing
    return Boolean.TRUE.equals(redisTemplate.hasKey("presence:" + userId));
}

Redis TTL does the heavy lifting. No explicit “set offline” needed. If heartbeats stop, the key expires. Simple and self-healing.

The interval-vs-timeout trade-off matters. Heartbeat every 5 seconds with a 15-second timeout? Accurate but chatty. Every 30 seconds with a 90-second timeout? Less traffic but a user who closed the tab might show as online for over a minute.
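The same TTL logic can be sketched without Redis, which makes the staleness bound easy to see: a user who disconnects right after a heartbeat still reads as online for up to the full timeout. This in-memory version (a hypothetical `LocalPresence` class, not the production code) takes the clock as a parameter so the window is explicit:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory sketch of TTL-based presence. A user is "online" if their
// last heartbeat arrived within the timeout window.
public class LocalPresence {
    private final Map<String, Long> lastBeat = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    public LocalPresence(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record a heartbeat at the given time (clock injected for clarity).
    public void heartbeat(String userId, long nowMillis) {
        lastBeat.put(userId, nowMillis);
    }

    // Online = heartbeat seen within the last timeoutMillis.
    public boolean isOnline(String userId, long nowMillis) {
        Long last = lastBeat.get(userId);
        return last != null && nowMillis - last <= timeoutMillis;
    }
}
```

With a 90-second timeout, `isOnline` keeps answering true for 90 seconds after the last beat, which is exactly the "online for over a minute" staleness described above.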

The Fan-Out Problem

User goes online. They’re in 50 group conversations. Do you notify all members of all 50 groups immediately? That’s potentially thousands of presence updates per status change.

graph TD
    U[User Comes Online] --> H[Heartbeat to Server]
    H --> R[Update Redis TTL]
    R --> N{Notify Contacts?}
    N -->|Eager| F[Fan-out to all groups]
    N -->|Lazy| P[Update on next poll]
    F --> EX[Expensive but instant]
    P --> CH[Cheap but delayed]
    style U fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style H fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style R fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style N fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style F fill:#000000,stroke:#ff0000,stroke-width:2px,color:#fff
    style P fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style EX fill:#000000,stroke:#ff0000,stroke-width:2px,color:#fff
    style CH fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff

Most systems use lazy presence for this reason. Don’t push status changes. Instead, when a user opens a conversation, fetch presence for the visible members. Much cheaper, slightly stale, usually good enough.
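A lazy read might look like this sketch (the class and method names are illustrative): when a conversation opens, resolve presence for just the visible members in one batch, instead of pushing every status change to every group. With Redis this would be a single MGET over the presence keys; here an in-memory map stands in so the shape of the lookup is clear:

```java
import java.util.*;
import java.util.stream.Collectors;

// Lazy presence: no pushes on status change. Presence is resolved
// on demand, only for members currently on screen.
public class LazyPresence {
    private final Map<String, Long> lastBeat;   // userId -> last heartbeat time
    private final long timeoutMillis;

    public LazyPresence(Map<String, Long> lastBeat, long timeoutMillis) {
        this.lastBeat = lastBeat;
        this.timeoutMillis = timeoutMillis;
    }

    // Called when a conversation is opened, not when a status changes.
    // One batch lookup for the visible members only.
    public Map<String, Boolean> presenceFor(List<String> visibleMembers, long nowMillis) {
        return visibleMembers.stream().collect(Collectors.toMap(
            id -> id,
            id -> {
                Long last = lastBeat.get(id);
                return last != null && nowMillis - last <= timeoutMillis;
            }));
    }
}
```

The cost is now proportional to the handful of members on screen, not the thousands of members across all 50 groups.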

This is the same fan-out trade-off that shows up everywhere in distributed systems.

Failure Modes

The tricky part: distinguishing “user closed the app” from “network blip.” A single missed heartbeat shouldn’t flip someone to offline. That’s why the timeout is usually 2-3x the heartbeat interval. But during a network partition, you’ll show everyone as offline even though they’re still there.
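The "2-3x" rule amounts to a suspicion counter: tolerate a couple of consecutive misses before flipping state. A minimal sketch (a hypothetical `SuspicionTracker`, not from any particular library):

```java
// Require several consecutive missed beats before declaring offline,
// so a single dropped heartbeat doesn't cause a flicker.
public class SuspicionTracker {
    private final int threshold; // e.g. 3 misses => timeout = 3x interval
    private int missedBeats = 0;

    public SuspicionTracker(int threshold) {
        this.threshold = threshold;
    }

    public void onHeartbeat()  { missedBeats = 0; }   // any beat clears suspicion
    public void onMissedBeat() { missedBeats++; }     // called once per interval tick

    public boolean isOnline() { return missedBeats < threshold; }
}
```

Note the limitation the paragraph above calls out: during a partition, every tracker on the far side hits its threshold, and everyone flips offline at once.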

At Oracle, we added presence indicators to our internal service dashboard. The initial implementation used a 5-second heartbeat and a 10-second timeout. Every minor network hiccup caused services to flicker between online and offline, and the operations team hated it. We bumped the timeout to 30 seconds, which is exactly the timeout tuning lesson I keep re-learning: tight timeouts cause more problems than they solve.

This is basically the same failure detection pattern that gossip protocols use. Phi accrual failure detectors, suspicion levels, adaptive timeouts. It all applies here.
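A toy phi-accrual sketch shows the idea: instead of a binary online/offline flip, compute a suspicion level that grows the longer you go without a heartbeat, scaled by how regular the heartbeats have been. This version assumes exponentially distributed inter-arrival times for simplicity (Hayashibara et al.'s original detector uses a normal distribution); callers pick a threshold, e.g. phi > 8 means "probably gone":

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy phi-accrual failure detector. phi = -log10(probability that the
// next heartbeat is still coming), under an exponential model of
// inter-arrival times. Higher phi = more suspicion.
public class PhiAccrual {
    private final Deque<Long> intervals = new ArrayDeque<>();
    private final int window;      // how many recent intervals to keep
    private long lastBeat = -1;

    public PhiAccrual(int window) {
        this.window = window;
    }

    public void heartbeat(long nowMillis) {
        if (lastBeat >= 0) {
            intervals.addLast(nowMillis - lastBeat);
            if (intervals.size() > window) intervals.removeFirst();
        }
        lastBeat = nowMillis;
    }

    public double phi(long nowMillis) {
        if (intervals.isEmpty()) return 0.0;
        double mean = intervals.stream().mapToLong(Long::longValue).average().orElse(1.0);
        double elapsed = nowMillis - lastBeat;
        // P(still alive) = exp(-elapsed / mean); phi = -log10 of that,
        // which simplifies to elapsed / (mean * ln 10).
        return elapsed / (mean * Math.log(10));
    }
}
```

The payoff over a fixed timeout: a node that heartbeats like clockwork accrues suspicion quickly when it goes quiet, while a jittery one gets more slack automatically.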

What I’m Learning

Presence is a best-effort system. You’re never truly sure if someone is online. You’re making probabilistic guesses based on recent heartbeats. The trick is choosing the right trade-offs: accuracy vs overhead, eagerness vs cost.

That green dot is a lie. But it’s a useful one.

How do you handle presence in your systems? Fixed timeouts or adaptive?