Event Aggregation: When 47 Notifications Become One
47 people performed the same action on the same item. Do you show 47 separate notifications? Or “47 people did X on Y”?
Obviously the second. But getting there in a distributed system is trickier than it looks.
The Raw Event Problem
Your system generates individual events. Each one is stored separately because that’s how event sourcing and fan-out work. But displaying them individually doesn’t scale. A popular item generates hundreds of events. Users drown in noise.
The Aggregation Key
Group events by what they have in common: action type, target, and time window.
record AggregationKey(String actionType, String targetId, LocalDate timeBucket) {}
public Map<AggregationKey, List<Event>> aggregate(List<Event> events) {
    return events.stream().collect(
        Collectors.groupingBy(e -> new AggregationKey(
            e.getActionType(),
            e.getTargetId(),
            e.getTimestamp().toLocalDate()
        ))
    );
}
“Alice liked Photo-123 at 10am” and “Bob liked Photo-123 at 11am” share the same key: (LIKE, Photo-123, 2026-02-20). They collapse into “Alice, Bob, and 45 others liked Photo-123.”
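Rendering that collapsed summary is a small string-building step on top of the grouped events. A minimal sketch, assuming we keep a short list of actor names per key plus the total count (the `summarize` helper and its signature are illustrative, not from the post):

```java
import java.util.List;

public class SummaryFormatter {
    // Builds "Alice, Bob, and 45 others liked Photo-123" from the
    // first few actor names plus the total event count for the key.
    static String summarize(List<String> actors, long totalCount, String verb, String targetId) {
        long others = totalCount - actors.size();
        if (actors.isEmpty()) {
            return totalCount + " people " + verb + " " + targetId;
        }
        if (others <= 0) {
            return String.join(", ", actors) + " " + verb + " " + targetId;
        }
        return String.join(", ", actors) + ", and " + others + " others " + verb + " " + targetId;
    }

    public static void main(String[] args) {
        System.out.println(summarize(List.of("Alice", "Bob"), 47, "liked", "Photo-123"));
        // Alice, Bob, and 45 others liked Photo-123
    }
}
```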
Write-Time vs Read-Time Aggregation
Read-time: Store individual events. Aggregate when the user requests them. Flexible (you can change grouping logic anytime) but expensive at read time. Every request groups and counts potentially thousands of events.
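In code, read-time aggregation is the same `groupingBy` as above with a counting collector, run on every request. A self-contained sketch with a minimal `Event` record (the record fields here are illustrative):

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ReadTimeAggregation {
    record Event(String actionType, String targetId, LocalDate date) {}
    record AggregationKey(String actionType, String targetId, LocalDate timeBucket) {}

    // Group and count at read time: flexible (change the key anytime),
    // but this runs over the full event list on every request.
    static Map<AggregationKey, Long> counts(List<Event> events) {
        return events.stream().collect(Collectors.groupingBy(
                e -> new AggregationKey(e.actionType(), e.targetId(), e.date()),
                Collectors.counting()));
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2026, 2, 20);
        var counts = counts(List.of(
                new Event("LIKE", "Photo-123", d),
                new Event("LIKE", "Photo-123", d),
                new Event("COMMENT", "Photo-123", d)));
        System.out.println(counts.get(new AggregationKey("LIKE", "Photo-123", d))); // 2
    }
}
```

Record equality makes the key usable directly in the map, which is why `AggregationKey` is a record rather than a class.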
Write-time: Pre-aggregate into counters as events arrive. Increment a counter per aggregation key. Fast reads, but changing the grouping logic requires reprocessing all events.
// Write-time: increment counter on each event
public void onEvent(Event event) {
    String key = event.getActionType() + ":" + event.getTargetId();
    redisTemplate.opsForHash().increment("aggregates:" + key, "count", 1);
    // Store latest actors for display
    redisTemplate.opsForList().leftPush("actors:" + key, event.getActorId());
    redisTemplate.opsForList().trim("actors:" + key, 0, 4); // keep last 5
}
Most systems use write-time aggregation for counts and keep a small window of individual events for display (“Alice, Bob, and 45 others”). This is CQRS in practice: the write model stores individual events, the read model stores aggregated summaries.
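The read-model half of that split can be modeled in plain Java without Redis: a counter plus a bounded window of recent actors per aggregation key. A sketch under that assumption (the window size of 5 mirrors the `trim(0, 4)` above; class and method names are invented):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class AggregateReadModel {
    private static final int WINDOW = 5;          // matches the Redis trim(0, 4)
    private long count;                           // total events for this key
    private final Deque<String> recentActors = new ArrayDeque<>();

    // Write path: bump the counter, keep only the latest few actors.
    void onEvent(String actorId) {
        count++;
        recentActors.addFirst(actorId);
        while (recentActors.size() > WINDOW) {
            recentActors.removeLast();
        }
    }

    long count() { return count; }

    List<String> recentActors() { return List.copyOf(recentActors); }

    public static void main(String[] args) {
        var model = new AggregateReadModel();
        for (int i = 1; i <= 7; i++) model.onEvent("user-" + i);
        System.out.println(model.count());        // 7
        System.out.println(model.recentActors()); // [user-7, user-6, user-5, user-4, user-3]
    }
}
```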
The Time Window Problem
How wide should the time bucket be? Group all events from the last hour? The last day?
Too narrow: “3 people did X” at 10am, “2 more people did X” at 11am. Fragmented.
Too wide: “500 people did X this week.” Not useful for recent activity.
At Oracle, we had service health notifications that aggregated by hour. During an incident, 200 alerts in 10 minutes showed as one per hour. Operators missed the severity spike because the aggregation was too coarse. We switched to adaptive windowing: 5-minute buckets during high activity, hourly during normal times.
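The adaptive scheme boils down to a bucket function that picks the window width from the recent event rate. A sketch of that idea; the 50-events-per-5-minutes threshold is an invented example, not the actual Oracle setting:

```java
import java.time.Duration;
import java.time.Instant;

public class AdaptiveWindow {
    static final Duration FINE = Duration.ofMinutes(5);
    static final Duration COARSE = Duration.ofHours(1);

    // Pick a bucket width from the recent event rate: fine-grained
    // buckets during spikes, coarse buckets in quiet periods.
    static Duration bucketWidth(long eventsInLast5Min) {
        return eventsInLast5Min >= 50 ? FINE : COARSE; // threshold is illustrative
    }

    // Truncate a timestamp to the start of its bucket.
    static Instant bucketStart(Instant ts, Duration width) {
        long millis = width.toMillis();
        return Instant.ofEpochMilli((ts.toEpochMilli() / millis) * millis);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2026-02-20T10:07:30Z");
        System.out.println(bucketStart(t, bucketWidth(200))); // 2026-02-20T10:05:00Z
        System.out.println(bucketStart(t, bucketWidth(3)));   // 2026-02-20T10:00:00Z
    }
}
```

One consequence worth noting: when the width changes, the same event can land in a different bucket than its neighbors from a minute earlier, so a real implementation also needs a rule for closing out the old buckets at the transition.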
What I’m Learning
Aggregation is where the event-driven world meets the user-facing world. Events are individual. Users want summaries. Bridging that gap requires choosing what to group, when to group it, and how much detail to preserve.
The hardest part isn’t the grouping logic. It’s deciding the time window. Too fine and you fragment. Too coarse and you lose signal.
How do you handle event aggregation? Pre-compute or aggregate at read time?