Handling Incompatible Schema Changes
Sometimes the change you need to make breaks compatibility. You can’t add a default. The field type genuinely needs to change. You can’t keep the old schema. Here’s what you do instead.
The New Topic Strategy
The cleanest approach: create a new topic with the new schema. Producers write to both the old and new topics in parallel. Consumers migrate to the new topic one by one. When all consumers are on the new topic, stop writing to the old one. Once its retention period has passed and the remaining data has aged out, delete the old topic.
More work than a simple schema change, but safe. Old consumers never see a message they can’t deserialize. New consumers are isolated from the old format. Both topics run in parallel until migration completes.
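A minimal sketch of the dual-write step, in Python. Everything named here is an assumption for illustration: the topic names, the event fields, and the send callable, which stands in for whatever Kafka producer client you use. The only point is that one logical event is serialized twice, once per schema.

```python
from typing import Callable, List, Tuple

# Hypothetical topic names, not taken from any real system.
OLD_TOPIC = "orders.v1"
NEW_TOPIC = "orders.v2"

# Stand-in for a real producer's send method: (topic, payload) -> None.
Send = Callable[[str, dict], None]


def to_v1(event: dict) -> dict:
    """Old schema (assumed): amount is a stringified integer."""
    return {"order_id": event["order_id"], "amount": str(event["amount"])}


def to_v2(event: dict) -> dict:
    """New schema (assumed): amount is a real integer."""
    return {"order_id": event["order_id"], "amount": int(event["amount"])}


def dual_write(event: dict, send: Send, write_old: bool = True) -> None:
    """Publish one event to the new topic and, while migrating, the old one."""
    send(NEW_TOPIC, to_v2(event))
    if write_old:  # switch off once every consumer has migrated
        send(OLD_TOPIC, to_v1(event))


# Example run with an in-memory recorder instead of a real producer.
sent: List[Tuple[str, dict]] = []
dual_write({"order_id": "o-1", "amount": 42}, lambda t, p: sent.append((t, p)))
```

The two sends are not atomic, so a failure between them leaves the topics out of step. A real producer would need retries or some reconciliation on top of this.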
A Transformer Service
If having every producer write to both topics isn’t practical, a transformer service can do the bridging instead: it reads from the old topic, converts the message format, and writes to the new topic. Consumers on the new topic only deal with the new format. The risk: the transformer is a new service to operate, and it can fall behind if the old topic has high throughput. You’re trading coordination complexity for operational complexity.
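The core of such a transformer is a pure conversion function plus a decision about what to do with messages that can’t be converted. The sketch below assumes JSON-encoded messages and a hypothetical amount field that changes from a stringified integer to an integer. The Kafka consume and produce calls are left out; records is any iterable of raw message bytes.

```python
import json
from typing import Iterable, Iterator, Tuple


def transform(old: dict) -> dict:
    """Convert one old-format message to the new format (assumed field: amount)."""
    new = dict(old)
    new["amount"] = int(old["amount"])  # "42" -> 42
    return new


def run_transformer(records: Iterable[bytes]) -> Iterator[Tuple[str, bytes]]:
    """Yield ("ok", new_bytes) for converted messages and
    ("dead_letter", original_bytes) for ones that cannot be converted."""
    for raw in records:
        try:
            converted = transform(json.loads(raw))
            yield ("ok", json.dumps(converted).encode("utf-8"))
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, missing field, or non-numeric value:
            # keep the original bytes so nothing is silently dropped.
            yield ("dead_letter", raw)


results = list(run_transformer([
    b'{"id": 1, "amount": "42"}',
    b'{"id": 2, "amount": "n/a"}',
]))
```

Routing unconvertible messages to a separate destination, rather than failing the whole loop, is one way to keep a single bad record from stalling the transformer. Whether that fits depends on how strict your ordering and completeness requirements are.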
Versioned Field Names
A lower-tech option: keep the old field and add a new one alongside it. user_id stays. user_uuid is added. Consumers migrate from reading user_id to user_uuid at their own pace. Once all consumers have migrated, deprecate user_id (keep it in the schema for a while but stop populating it). Eventually remove it.
This bloats the schema over time but avoids multi-team coordination pressure.
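On the consumer side, the migration can be as small as preferring the new field and falling back to the old one. This is a rough sketch that assumes messages arrive as already-deserialized dicts; read_user is a hypothetical helper name.

```python
from typing import Tuple


def read_user(message: dict) -> Tuple[str, object]:
    """Return (field_name, value), preferring user_uuid over user_id."""
    if message.get("user_uuid") is not None:
        return ("user_uuid", message["user_uuid"])
    if message.get("user_id") is not None:
        # Old producers, or messages written before user_uuid existed.
        return ("user_id", message["user_id"])
    raise KeyError("message has neither user_uuid nor user_id")


during_migration = read_user({"user_id": 17, "user_uuid": "a1b2"})
before_migration = read_user({"user_id": 17})
after_deprecation = read_user({"user_id": None, "user_uuid": "a1b2"})
```

Returning the field name alongside the value lets a consumer log how often it still falls back to user_id, which gives a rough signal for when the old field is safe to stop populating.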
At Salesforce
We needed to change a field type in a platform event from a stringified integer to an actual integer. Not compatible. We created a new topic and ran both in parallel. The migration took eight weeks because one consumer team had a quarterly release cycle. Operating both topics wasn’t terrible, but the coordination effort was. After that experience, we documented every Kafka topic with a deprecation date from day one, so teams know upfront how long old topics will be supported.
What I’m Learning
The hardest part of an incompatible schema change isn’t the technical migration. It’s finding every consumer, getting them to prioritize the work, and confirming they’re done before you shut down the old topic. Tooling that shows which consumer groups are still active on a topic makes the last step much easier.
Have you managed an incompatible Kafka schema change, and how long did the migration take end to end?