Revision History and Snapshotting
A user hits Ctrl+Z forty times and expects to land exactly where they were yesterday. That is not just undo. That is a complete audit trail of every edit, stored efficiently, queryable at any point in time. The naive approach: store a full copy of the document after every change. Works for ten users. Collapses at ten thousand.
Deltas, Not Copies#
Instead of storing full document state after every edit, store only what changed: the operation (insert 3 chars at position 12, delete 5 chars at position 20). A revision is a delta. The current document is what you get by replaying all deltas from the beginning. Compact to store, expensive to query at a specific point in time as the delta log grows.
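The delta model above can be sketched in a few lines. This is a minimal illustration, not a production design: the op shapes (`insert`/`delete` dicts with `pos`, `text`, `len`) are hypothetical names chosen for this sketch, and real systems use richer operation formats.

```python
# A delta is a small edit op; the document is the replay of all deltas.
def apply_delta(doc: str, op: dict) -> str:
    """Apply one delta to a string document."""
    if op["type"] == "insert":
        return doc[:op["pos"]] + op["text"] + doc[op["pos"]:]
    if op["type"] == "delete":
        return doc[:op["pos"]] + doc[op["pos"] + op["len"]:]
    raise ValueError(f"unknown op type: {op['type']}")

def replay(deltas: list[dict], base: str = "") -> str:
    """Reconstruct the document by replaying every delta from the start."""
    doc = base
    for op in deltas:
        doc = apply_delta(doc, op)
    return doc

log = [
    {"type": "insert", "pos": 0, "text": "hello world"},
    {"type": "insert", "pos": 5, "text": ","},
    {"type": "delete", "pos": 0, "len": 1},
]
print(replay(log))  # "ello, world"
```

Note that `replay` walks the entire log: cheap at revision 50, painful at revision 5 million. That cost is what the next section addresses.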
Periodic Snapshots#
Take a full document snapshot every N revisions (say, every 100 edits). To reconstruct the document at revision 350, load the snapshot at revision 300, replay only 50 deltas. Snapshot frequency trades storage cost against replay cost. More snapshots: faster reads, more storage. Fewer: slower reads, less storage. Same trade-off as Checkpointing in long-running jobs.
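A sketch of snapshot-plus-replay reconstruction. To stay self-contained, a "delta" here is just a string to append; the point is the lookup: find the nearest snapshot at or before the target revision, then replay only the tail.

```python
SNAPSHOT_INTERVAL = 100  # illustrative; tune against your read-latency budget

# Build a delta log of 350 revisions, snapshotting every N revisions.
deltas = [chr(ord("a") + i % 26) for i in range(350)]
snapshots: dict[int, str] = {}
doc = ""
for rev, d in enumerate(deltas, start=1):
    doc += d
    if rev % SNAPSHOT_INTERVAL == 0:
        snapshots[rev] = doc  # full copy at revisions 100, 200, 300

def at_revision(target: int) -> str:
    """Reconstruct the document at `target` from the nearest prior snapshot."""
    base = max((r for r in snapshots if r <= target), default=0)
    state = snapshots.get(base, "")
    for d in deltas[base:target]:  # replay at most SNAPSHOT_INTERVAL - 1 deltas
        state += d
    return state

# Revision 350: load snapshot 300, replay 50 deltas instead of 350.
assert at_revision(350) == "".join(deltas[:350])
```

The `max(...)` scan stands in for an indexed lookup; a real store would fetch the snapshot row with the greatest revision ≤ target.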
Compaction#
Old deltas between two snapshots can be discarded once you no longer need fine-grained history at that resolution. A document version system might keep full per-keystroke deltas for 30 days, then compact to per-minute snapshots, then to per-day snapshots. Same tiered retention pattern as Downsampling and Data Retention.
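The tiered retention above can be sketched as a compaction pass. This version assumes just two tiers (everything within 30 days, then last-revision-per-day); the per-minute middle tier from the example would be one more bucket size.

```python
from datetime import datetime, timedelta

def compact(entries: list[tuple[datetime, str]], now: datetime):
    """entries: (timestamp, revision_id) pairs, oldest first.

    Keep every entry newer than 30 days; for older entries, keep only
    the last revision of each calendar day.
    """
    cutoff = now - timedelta(days=30)
    recent = [e for e in entries if e[0] >= cutoff]
    last_per_day: dict = {}
    for ts, rev in entries:
        if ts < cutoff:
            last_per_day[ts.date()] = (ts, rev)  # later entries overwrite earlier
    return sorted(last_per_day.values()) + recent

now = datetime(2024, 6, 1)
entries = [(datetime(2024, 4, 1, h), f"rev{h}") for h in range(5)]      # 5 old edits, same day
entries += [(datetime(2024, 5, 30, h), f"rev{5+h}") for h in range(2)]  # 2 recent edits
print(len(compact(entries, now)))  # 3: one survivor for the old day, both recent edits
```

Compaction is destructive by design: once the per-keystroke deltas are dropped, that resolution is gone, so the tier boundaries are a product decision as much as a storage one.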
Undo in Collaborative Systems#
Undo in a collaborative document is not just “apply the inverse of the last op.” Another user may have applied ops after yours. You cannot pop a local stack. Undo must be transformed against all intervening ops before being applied, which is exactly what Operational Transformation handles.
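A stripped-down sketch of why the inverse op must be transformed: another user's insert shifts the position your undo needs to target. This handles only insert-vs-insert position shifting; real OT also covers deletes, overlapping spans, and tie-breaking when positions are equal.

```python
def invert(op: dict) -> dict:
    """Inverse of an insert is a delete of the same span."""
    assert op["type"] == "insert"
    return {"type": "delete", "pos": op["pos"], "len": len(op["text"])}

def transform(op: dict, against: dict) -> dict:
    """Shift `op` forward if `against` inserted text at or before its position."""
    shifted = dict(op)
    if against["type"] == "insert" and against["pos"] <= op["pos"]:
        shifted["pos"] += len(against["text"])
    return shifted

def apply(doc: str, op: dict) -> str:
    if op["type"] == "insert":
        return doc[:op["pos"]] + op["text"] + doc[op["pos"]:]
    return doc[:op["pos"]] + doc[op["pos"] + op["len"]:]

doc = "abc"
mine = {"type": "insert", "pos": 1, "text": "XY"}
doc = apply(doc, mine)                              # "aXYbc"
theirs = {"type": "insert", "pos": 0, "text": "Z"}  # arrives after my edit
doc = apply(doc, theirs)                            # "ZaXYbc"
undo = transform(invert(mine), theirs)              # delete shifted from pos 1 to pos 2
print(apply(doc, undo))  # "Zabc" — my insert undone, their insert preserved
```

Naively applying the untransformed inverse (delete at position 1) would have chopped out `aX` and corrupted the document.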
At Salesforce, a record editing feature stored a full copy of the record JSON in a history table on every save. For a record with 60 fields, each save wrote about 4KB. At 2M active records with heavy editing, the history table hit 200GB within 6 months and became the top disk consumer. We switched to delta storage: only the changed fields per save, stored as a diff. Average history entry dropped from roughly 4KB to about 200 bytes. The history table shrank 95% from its projected size. The query to reconstruct a record at a past point in time went from a full JSON blob scan to a short delta replay.
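A hedged sketch of the field-level diff described above (the helper names and the `None`-as-deletion convention are illustrative, not the actual implementation):

```python
def record_diff(before: dict, after: dict) -> dict:
    """Return only the fields that changed; removed fields map to None."""
    diff = {k: v for k, v in after.items() if before.get(k) != v}
    diff.update({k: None for k in before if k not in after})
    return diff

def apply_diff(record: dict, diff: dict) -> dict:
    """Replay one history entry onto a record."""
    merged = {**record, **diff}
    return {k: v for k, v in merged.items() if v is not None}

v1 = {"name": "Acme", "stage": "Prospect", "owner": "kim"}
v2 = {"name": "Acme", "stage": "Closed Won", "owner": "kim"}
d = record_diff(v1, v2)
print(d)                        # {'stage': 'Closed Won'} — one field, not 60
assert apply_diff(v1, d) == v2  # replaying the diff reconstructs the new version
```

The storage win follows directly: a typical save touches one or two fields out of 60, so the history entry shrinks roughly in proportion. (Using `None` as a deletion marker would collide with legitimate null field values; a real schema needs an explicit tombstone.)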
What I’m Learning#
Snapshot frequency is one of those knobs nobody thinks about until the delta log is 10 million entries long and a point-in-time query takes 30 seconds. Size the snapshot interval against your P99 read latency budget, not against storage cost alone.
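Sizing the interval against a latency budget is back-of-envelope arithmetic. All numbers below are illustrative assumptions, not measurements from the system above:

```python
# Worst-case point-in-time read = load one snapshot + replay up to
# (interval - 1) deltas. Solve for the largest interval that fits the budget.
replay_cost_per_delta_ms = 0.05  # assumed: measured replay cost per delta
snapshot_load_ms = 10.0          # assumed: cost to fetch and parse one snapshot
p99_budget_ms = 20.0             # assumed: read latency budget

max_interval = int((p99_budget_ms - snapshot_load_ms) / replay_cost_per_delta_ms)
print(max_interval)  # 200 — snapshot at least every 200 revisions
```

The same arithmetic run in reverse tells you what a given interval costs in storage, which is the side of the trade-off people usually size first.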
Have you built revision history for a product? Did you go full copies, deltas, or some hybrid?