Discussion about this post

Sahar Massachi:

One thing I forgot to make clear in the article is that --

1. If you partition your data, doing snapshots adds zero latency.

2. If you send partitions to cold storage after, say, 3 months, then your tables stay roughly the same size over time. (See the sketch after this list.)
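A minimal sketch of how those two points combine, in Python, assuming daily Parquet snapshots partitioned by date. The table name, paths, and 90-day cutoff are illustrative, not from the article:

```python
from datetime import date, timedelta
from pathlib import Path
import shutil

import pandas as pd

SNAPSHOT_ROOT = Path("warehouse/users_snapshot")  # hypothetical hot-storage path
COLD_ROOT = Path("cold/users_snapshot")           # hypothetical cold-storage path
RETENTION_DAYS = 90                               # "say, 3 months"

def write_snapshot(df: pd.DataFrame, snapshot_date: date) -> None:
    """Each day's snapshot lands in its own partition, so writing it
    never rewrites or locks existing partitions: no added latency."""
    part = SNAPSHOT_ROOT / f"snapshot_date={snapshot_date.isoformat()}"
    part.mkdir(parents=True, exist_ok=True)
    df.to_parquet(part / "data.parquet", index=False)

def rotate_to_cold(today: date) -> None:
    """Move partitions older than the retention window to cold storage,
    so the hot table stays roughly constant in size."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    COLD_ROOT.mkdir(parents=True, exist_ok=True)
    for part in SNAPSHOT_ROOT.glob("snapshot_date=*"):
        if date.fromisoformat(part.name.split("=", 1)[1]) < cutoff:
            shutil.move(str(part), str(COLD_ROOT / part.name))
```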

Chris Gambill:

Date-stamped snapshots are solid for analytic state questions and fast backfills. No argument from me on that. Where this breaks down is human-edited operational data: Salesforce, NetSuite, HubSpot, finance apps. Those records flip after the fact, and auditors expect an immutable trail tied to who changed what and when.

If you need real auditability, capture changes first (CDC or event log). Then layer convenience: a “latest” view for analysts and, if needed, an SCD2 history table for business reporting. Snapshots alone are fine for product analytics; they’re not enough for regulated ops data.
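A sketch of that layering, with hypothetical record and column names, in Python. The change log is the immutable base; "latest" is just a fold over it:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)        # frozen: change events are immutable facts
class ChangeEvent:
    record_id: str
    field: str
    new_value: str
    changed_by: str            # the "who" auditors ask for
    changed_at: datetime       # the "when"

def latest_view(log: list[ChangeEvent]) -> dict[str, dict[str, str]]:
    """Convenience layer for analysts: fold the log, in time order,
    into one current row per record. The log itself is never mutated."""
    state: dict[str, dict[str, str]] = {}
    for ev in sorted(log, key=lambda e: e.changed_at):
        state.setdefault(ev.record_id, {})[ev.field] = ev.new_value
    return state

log = [
    ChangeEvent("acct-1", "stage", "Prospect",   "alice", datetime(2024, 1, 5)),
    ChangeEvent("acct-1", "stage", "Closed Won", "bob",   datetime(2024, 2, 9)),
]
print(latest_view(log))  # {'acct-1': {'stage': 'Closed Won'}}; audit trail stays in log
```

An SCD2 history table or a date-stamped snapshot can be rebuilt from the same log at any time, which is what makes it safe to treat them as derived views.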

So the choice isn’t “SCD2 vs datestamps.” It’s immutable facts at the base, with snapshots or SCD2 as views that match the model and the risk.

Your question might be: but how does SCD2 fix this?

It “fails” only if your ingestion cadence or diff method is too coarse. Fix with CDC or higher-frequency diffs. SCD2 will happily store the extra versions.
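In SCD2 terms, each detected change closes the current row and opens a new one. A minimal sketch, assuming the change feed (CDC or diff) hands us the record's new attributes and a change timestamp; the row shape is illustrative:

```python
from dataclasses import dataclass, replace
from datetime import datetime

OPEN_END = datetime.max        # sentinel meaning "still the current version"

@dataclass(frozen=True)
class Scd2Row:
    record_id: str
    payload: dict              # the tracked attributes at this version
    valid_from: datetime
    valid_to: datetime         # OPEN_END while the row is current

def apply_change(history: list[Scd2Row], record_id: str,
                 payload: dict, changed_at: datetime) -> list[Scd2Row]:
    """Close the open row for this record (if any) and append a new one.
    The finer-grained the feed, the more versions land here; SCD2
    happily stores them all."""
    out = [
        replace(row, valid_to=changed_at)   # close the old version
        if row.record_id == record_id and row.valid_to == OPEN_END
        else row
        for row in history
    ]
    out.append(Scd2Row(record_id, payload, changed_at, OPEN_END))
    return out
```

Run it once per event a CDC feed emits; a coarse daily diff simply calls it less often, which is exactly where the missing intermediate versions come from.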

Then SCD2 is the storage pattern and CDC is the microscope.
