Aggregating over an infinite stream sounds easy until you realize you have no idea when it ends. You need to cut it into chunks. That’s what windows are.

Three Window Types

Tumbling windows are fixed, non-overlapping buckets. “Clicks per minute” is a tumbling window: minute 1, minute 2, minute 3, no overlap. Simple to implement, but related events that straddle a boundary get split into different buckets.
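The bucket assignment is just a floor operation. A minimal sketch, assuming epoch-millisecond timestamps (`windowStart` is a hypothetical helper, not a Kafka Streams API):

```java
public class TumblingWindow {
    // Window start = timestamp floored to the nearest multiple of the window size.
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        // Events at 0:59 and 1:01 land in different one-minute buckets.
        System.out.println(windowStart(59_000L, minute)); // 0
        System.out.println(windowStart(61_000L, minute)); // 60000
    }
}
```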

Sliding windows overlap. “Average clicks in the last 5 minutes, recomputed every minute” means each event can appear in up to 5 windows. More CPU, but a smoother signal. No event falls through a gap.
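The “up to 5 windows” claim can be checked by enumerating the overlapping windows an event belongs to. A sketch, assuming windows aligned to the advance interval (`windowStartsFor` is a hypothetical helper):

```java
import java.util.ArrayList;
import java.util.List;

public class HoppingWindows {
    // Returns the start times of every window [start, start + size) containing ts.
    static List<Long> windowStartsFor(long ts, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // The latest window that can contain ts starts at ts floored to the advance;
        // walk backwards while the window still covers ts.
        long latest = ts - (ts % advanceMs);
        for (long start = latest; start > ts - sizeMs; start -= advanceMs) {
            if (start >= 0) starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // 5-minute windows advancing every minute: an event at 6:00 is in 5 windows.
        System.out.println(windowStartsFor(6 * 60_000L, 5 * 60_000L, 60_000L).size()); // 5
    }
}
```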

Session windows are driven by behavior, not the clock. Events within 30 seconds of each other belong to one session. The window size is dynamic. Long pauses create natural boundaries.
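The gap-driven split can be sketched in a few lines. This assumes timestamps arrive sorted, which real stream processors don’t get to assume (`sessionize` is a hypothetical helper):

```java
import java.util.ArrayList;
import java.util.List;

public class SessionWindows {
    // Splits sorted event timestamps into sessions separated by an inactivity gap.
    static List<List<Long>> sessionize(List<Long> sortedTs, long gapMs) {
        List<List<Long>> sessions = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sortedTs) {
            if (!current.isEmpty() && ts - current.get(current.size() - 1) > gapMs) {
                sessions.add(current); // gap exceeded: close the session
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) sessions.add(current);
        return sessions;
    }

    public static void main(String[] args) {
        // 30s gap: events at 0s, 10s, 50s -> two sessions (the 40s pause splits them).
        System.out.println(sessionize(List.of(0L, 10_000L, 50_000L), 30_000L).size()); // 2
    }
}
```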

```mermaid
graph TD
    T1[Tumbling: fixed 1-min buckets] --> T2[Event at 0:59 and Event at 1:01 in different buckets]
    S1[Sliding: 5-min window every 1 min] --> S2[Both events in same window]
    SE1[Session: 30s inactivity gap] --> SE2[Window size varies by user activity]
    style T1 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style T2 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style S1 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style S2 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style SE1 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style SE2 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
```

In Kafka Streams the DSL looks like `.windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))`. State is stored keyed by `(user_id, window_start)`, which means you need to think about state store sizing when you have millions of users.
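To make the sizing concern concrete, a back-of-envelope estimate: one state entry per (key, retained window). The user count and retention numbers below are illustrative assumptions, not from any real deployment:

```java
public class StateSizing {
    // Rough state-store estimate: one entry per (key, open-or-retained window).
    static long estimateEntries(long activeKeys, long windowsRetained) {
        return activeKeys * windowsRetained;
    }

    public static void main(String[] args) {
        // Hypothetical: 1M active users, 1-minute windows retained for 1 hour.
        System.out.println(estimateEntries(1_000_000L, 60L)); // 60000000
    }
}
```

At that scale, even a small per-entry footprint adds up fast, which is why retention time is as much a sizing knob as window size.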

At Salesforce

We needed hourly aggregates for pipeline metrics. First attempt: a tumbling window inside a cron job. The edge case nobody caught in code review: a report event fired at 11:59:58 and arrived at 12:00:03 due to a 2-second network delay. The 12:00 cron already ran. That hour was under-counted.
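The failure mode can be reproduced with the timestamps from that incident. A sketch, assuming the cron finalizes a bucket the moment wall-clock time passes its end (`arrivesTooLate` is a hypothetical helper):

```java
public class LateEvent {
    // True if the event's bucket was already finalized when the event arrived,
    // assuming buckets close as soon as wall-clock time passes the bucket end.
    static boolean arrivesTooLate(long eventTs, long arrivalTs, long sizeMs) {
        long bucketEnd = eventTs - (eventTs % sizeMs) + sizeMs;
        return arrivalTs >= bucketEnd;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long eventTs = 11 * hour + 59 * 60_000L + 58_000L; // fired at 11:59:58
        long arrivalTs = 12 * hour + 3_000L;               // arrived at 12:00:03
        System.out.println(arrivesTooLate(eventTs, arrivalTs, hour)); // true: dropped
    }
}
```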

Switching to a sliding window with a 5-minute grace period fixed it. The intraday chart looked noisier, but the hourly total was now correct. Saved us a daily manual reconciliation job that took about 20 minutes every morning.
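The grace period changes only the finalization condition: a window stays open until wall-clock time passes window end plus grace. A sketch under the same assumptions as above (`accepted` is a hypothetical helper):

```java
public class GracePeriod {
    // With a grace period, the bucket is only finalized at bucket end + grace,
    // so slightly late events are still counted toward the correct window.
    static boolean accepted(long eventTs, long arrivalTs, long sizeMs, long graceMs) {
        long bucketEnd = eventTs - (eventTs % sizeMs) + sizeMs;
        return arrivalTs < bucketEnd + graceMs;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L, grace = 5 * 60_000L;
        long eventTs = 11 * hour + 59 * 60_000L + 58_000L; // 11:59:58
        long arrivalTs = 12 * hour + 3_000L;               // 12:00:03, 5s late
        System.out.println(accepted(eventTs, arrivalTs, hour, grace)); // true: counted
    }
}
```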

The lesson that stuck: tumbling windows are the right default until you have events that arrive near bucket boundaries. Then they aren’t.

What I’m Learning

Session windows confuse me the most because the state you have to maintain grows with user activity, not with time. How do you handle very long sessions that tie up memory?