Posts for: #Kafka

Handling Incompatible Schema Changes

2026-05-04 sohilladhani

Sometimes the change you need to make breaks compatibility. You can’t add a default. The field type genuinely needs to change. You can’t keep the old schema. Here’s what you do instead.

The New Topic Strategy

The cleanest approach: create a new topic with the new schema. Producers write to both old and new topics in parallel. Consumers migrate to the new topic one by one. When all consumers are on the new topic, stop writing to the old one.
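A minimal sketch of the dual-write phase, assuming a field that changes from a decimal string to integer cents. The topic names `orders.v1` and `orders.v2`, the `InMemoryProducer` stand-in, and the `to_v1`/`to_v2` helpers are all hypothetical; a real producer client would replace the in-memory one.

```python
from dataclasses import dataclass, field

OLD_TOPIC = "orders.v1"  # hypothetical topic carrying the old schema
NEW_TOPIC = "orders.v2"  # hypothetical topic carrying the new schema


@dataclass
class InMemoryProducer:
    """Stand-in for a real Kafka producer; records what was sent per topic."""
    sent: dict = field(default_factory=dict)

    def send(self, topic, value):
        self.sent.setdefault(topic, []).append(value)


def to_v1(order):
    # Old schema: amount is a decimal string.
    return {"id": order["id"], "amount": f"{order['amount_cents'] / 100:.2f}"}


def to_v2(order):
    # New schema: amount is an integer number of cents.
    return {"id": order["id"], "amount_cents": order["amount_cents"]}


def publish(producer, order, dual_write=True):
    """Write to the new topic always, and to the old one while consumers migrate."""
    producer.send(NEW_TOPIC, to_v2(order))
    if dual_write:
        producer.send(OLD_TOPIC, to_v1(order))


producer = InMemoryProducer()
publish(producer, {"id": 1, "amount_cents": 1250})                    # migration in progress
publish(producer, {"id": 2, "amount_cents": 300}, dual_write=False)   # after cutover
```

Flipping `dual_write` off is the "stop writing to the old one" step, so in practice it would likely sit behind a config flag rather than a function argument.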

[Read more]

Event Schema Evolution

2026-05-03 sohilladhani

#distributed-systems #system-design #kafka #architecture

Kafka retains messages for days or weeks. Your consumer code will be updated independently of your producer. That means old messages need to be readable by new consumer code, and new messages need to be readable by old consumer code. You can’t just change a field.

What Backward and Forward Mean

Backward compatibility: a new consumer can read messages written with the old schema. If you add an optional field with a default, old messages (which don’t have that field) are still valid.
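To make the backward-compatible case concrete, here is a rough sketch using plain JSON and a hand-rolled schema. The field names, the `SCHEMA_V2` dictionary, and the `decode` helper are illustrative assumptions, not the API of any real serialization library.

```python
import json

REQUIRED = object()  # sentinel: the field has no default

# Hypothetical new schema: "locale" was added as an optional field with a default.
SCHEMA_V2 = {"user_id": REQUIRED, "email": REQUIRED, "locale": "en"}


def decode(raw, schema):
    """Decode a JSON message, filling in defaults for fields it does not carry."""
    data = json.loads(raw)
    record = {}
    for name, default in schema.items():
        if name in data:
            record[name] = data[name]
        elif default is REQUIRED:
            raise ValueError(f"missing required field: {name}")
        else:
            record[name] = default
    return record


# A message written before "locale" existed.
old_message = json.dumps({"user_id": 42, "email": "a@example.com"})
record = decode(old_message, SCHEMA_V2)
```

The new consumer reads the old message and sees `locale` as `"en"`. Had `locale` been added as `REQUIRED`, the same old message would raise, which is the incompatible case.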

[Read more]

Schema Registry

2026-05-02 sohilladhani

#distributed-systems #system-design #kafka #architecture

Service A writes a Kafka message with field user_id. Service B reads it. Service A’s team renames it to userId next sprint. Service B starts throwing deserialization errors at runtime. Neither team knew about the other.

The Problem

In a microservices system passing messages through Kafka, producers and consumers evolve independently. There’s no enforced contract. A producer can change a field name, add a required field, or change a data type, and the consumer finds out when deserialization fails in production.

[Read more]

Lambda and Kappa Architecture

2026-04-04 sohilladhani

#distributed-systems #architecture #kafka #stream-processing #system-design

Real-time results are fast and approximate. Historical results are slow and accurate. The tension between them is where Lambda and Kappa architecture come from.

Lambda: Two Pipelines

Lambda runs two parallel systems. The batch layer processes all historical data on a schedule (Spark on HDFS, every few hours) and produces ground truth. The speed layer processes the live stream (Kafka Streams or Flink) for low-latency results. The serving layer merges both: “latest batch result plus stream delta since the last batch.

[Read more]

Watermarks and Late-Arriving Data

2026-04-03 sohilladhani

#distributed-systems #stream-processing #kafka #system-design

There are two clocks in any stream processing system. Event time: when the click actually happened, recorded in the payload. Processing time: when your system received it. On a healthy network they’re close. In reality they’re not. Mobile clients buffer events when offline. Retries add delay. A click at 10:00:05 might reach your processor at 10:00:47. The 10:00 window has long since closed.

The Problem With Never Waiting

If you never close a window, you never produce output.

[Read more]

Stream Processing Windows

2026-04-02 sohilladhani

#distributed-systems #kafka #stream-processing #system-design

Aggregating over an infinite stream sounds easy until you realize you have no idea when it ends. You need to cut it into chunks. That’s what windows are.

Three Window Types

Tumbling windows are fixed, non-overlapping buckets. “Clicks per minute” is a tumbling window: minute 1, minute 2, minute 3, no overlap. Simple to implement, but a burst of activity that straddles a boundary gets split across two buckets. Sliding windows overlap. “Average clicks in the last 5 minutes, recomputed every minute” means each event can appear in up to 5 windows.
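The two window types above can be sketched as plain functions over integer timestamps in seconds. The function names are made up for illustration, and the sketch assumes windows are aligned to multiples of the size (tumbling) or the slide (sliding), which is a common convention but not the only one.

```python
def tumbling_window(ts, size):
    """Return the single [start, end) bucket that contains timestamp ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts, size, slide):
    """Return every [start, end) window of length `size`, starting at a
    multiple of `slide`, that contains timestamp ts."""
    windows = []
    start = ts - ts % slide          # latest window start that is <= ts
    while start > ts - size:         # window still covers ts
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


# "Clicks per minute": one bucket per event.
per_minute = tumbling_window(125, size=60)

# "Last 5 minutes, recomputed every minute": the same event lands in 5 windows.
overlapping = sliding_windows(1000, size=300, slide=60)
```

With a 300-second window sliding every 60 seconds, an event at `ts=1000` belongs to the five windows starting at 720, 780, 840, 900, and 960, which is where the "up to 5 windows" figure comes from.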

[Read more]

Consumer Group Rebalancing: The Partition Shuffle

2026-03-23 sohilladhani

#distributed-systems #system-design #architecture #kafka #event-driven

You have 3 consumers reading from 6 Kafka partitions. One consumer crashes. The remaining 2 need to pick up its partitions. That handoff isn’t as smooth as you’d hope.
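As a rough illustration of why the handoff is bumpy, here is a naive round-robin assignment. This is a simplification for the example, not Kafka's actual assignor logic, and the consumer names are hypothetical.

```python
def assign(partitions, consumers):
    """Naive round-robin: deal partitions out to consumers in sorted order."""
    ordered = sorted(consumers)
    assignment = {c: [] for c in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


before = assign(range(6), ["c1", "c2", "c3"])
after = assign(range(6), ["c1", "c2"])  # c3 has crashed

# Partitions whose owner changed even though their old owner is still alive.
moved = [
    p for c in after for p in after[c]
    if any(p in before[other] for other in after if other != c)
]
```

Under this naive scheme, losing `c3` reshuffles more than just its two partitions: partitions 3 and 4 also change hands between the two surviving consumers.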

[Read more]

Log Compaction: Keeping the Latest Without Keeping Everything

2026-03-22 sohilladhani

#distributed-systems #system-design #architecture #kafka #event-driven

Your event log has 100 million records. Key ‘user-42’ has been updated 500 times. You only care about the latest value. But deleting old entries would break consumers who haven’t caught up yet.
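The core idea of keeping only the latest value per key can be sketched in a few lines. This is a toy model over an in-memory list with a made-up `compact` helper; a real broker compacts in the background and on a delay, which is how slow consumers get time to catch up.

```python
def compact(log):
    """Keep only the most recent record for each key.

    `log` is a list of (offset, key, value) tuples in offset order.
    Surviving records keep their original offsets.
    """
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)  # later records overwrite earlier ones
    return sorted(latest.values())          # offsets are unique, so this sorts by offset


log = [
    (0, "user-42", {"name": "Ann"}),
    (1, "user-7", {"name": "Bo"}),
    (2, "user-42", {"name": "Anne"}),
]
compacted = compact(log)
```

Offsets 1 and 2 survive; offset 0 is dropped because `user-42` has a newer value.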

[Read more]

Ordering Guarantees in Event-Driven Systems

2026-02-07 sohilladhani

#distributed-systems #kafka #messaging #event-driven #system-design

Events arrive out of order. User updates overwrite each other. Here’s how partition keys, sequence numbers, and causal ordering keep things straight.

[Read more]

Dead Letter Queues

2026-02-06 sohilladhani

#distributed-systems #kafka #messaging #resilience #system-design

Your consumer retried a bad message 10,000 times. It will never succeed. Dead letter queues catch the messages that can’t be processed so the rest of your system keeps moving.
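A minimal sketch of the pattern, assuming a bounded retry count and an in-memory list standing in for the dead letter topic. The `consume` function and the shape of the dead-letter record are illustrative, not taken from any particular client library.

```python
def consume(messages, handler, max_attempts=3):
    """Process each message; after max_attempts failures, park it and move on."""
    processed, dead_letters = [], []
    for msg in messages:
        last_error = None
        for _ in range(max_attempts):
            try:
                processed.append(handler(msg))
                break
            except Exception as exc:  # a real consumer would catch narrower errors
                last_error = exc
        else:
            # Every attempt failed: record the message with context for later inspection.
            dead_letters.append(
                {"message": msg, "error": str(last_error), "attempts": max_attempts}
            )
    return processed, dead_letters


processed, dead_letters = consume(["1", "bad", "3"], int)
```

The poison message `"bad"` ends up in `dead_letters` after three attempts, while `"1"` and `"3"` are processed normally instead of waiting behind it.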

[Read more]