Collection Versioning

Immutable versioning is one of Byline's differentiating pillars. The document half of that story is fundamental: every save writes a new documentVersions row keyed by UUIDv7, a current_documents view resolves "the latest" via ROW_NUMBER() OVER PARTITION, and a status change is the deliberate exception that mutates a row in place. Collection versioning is the schema-side companion: on every document save, Byline records which version of the collection's schema the document was written against.

What you can rely on. Every document version carries an integer collection_version you can read and reason about, and every collection row carries a schema_hash that bumps when the data-affecting parts of the schema change. The aim is that a document version can later be resolved against the collection schema as it was when the version was written, and migrated forward in memory. Today Byline records the version; it does not yet read documents by it — the read path uses the live CollectionDefinition regardless of collection_version. See the boundary below for exactly where that line sits.

What is recorded

Two columns on collections, one on document_versions:

collections
  version       integer  NOT NULL DEFAULT 1
  schema_hash   varchar(64)             -- nullable; see below

document_versions
  collection_version  integer  NOT NULL

Both current_documents and current_published_documents views project collection_version so it surfaces on every read. Every pre-existing row is implicitly v1. schema_hash is nullable to accommodate rows that predate the feature; any row written after ensureCollections() runs carries a hash.

Fingerprint

schema_hash is a SHA-256 over a canonicalised projection of the CollectionDefinition. The fingerprint defines what counts as a "schema change" for the purposes of bumping the version. It is deliberately narrow — only properties that affect the storable document shape participate.

Included (a change bumps the version):

path
useAsTitle, useAsPath
fields (recursive). Per field: name, type, optional, localized. Compound types recurse into fields / blocks. Per type: relation.targetCollection, relation.displayField, select.options.value, datetime.mode, and validation for text / textArea / richText / float / integer.
workflow — status names and defaultStatus. Labels and verbs are stripped.
Per-field upload — mimeTypes, maxFileSize, and sizes[].{name, width, height, fit, format, quality}.

Excluded (changes do NOT bump):

labels.singular, labels.plural
hooks (function values can't be JSON-stable anyway)
search, showStats (admin UX)
Field-level label, helpText, placeholder (admin UX)
Workflow status label, verb
upload.storage (provider implementation, not data shape)
Select option labels

The stripping rules are enforced by whitelist — known keys are copied; unknown keys are dropped. So adding a new presentational field to CollectionDefinition will not silently churn versions. Stability is covered by 19 contract tests in collection-fingerprint.test.node.ts: key-order invariance, function exclusion, every "does NOT bump" rule, and every "DOES bump" rule.

Why SHA-256, why Web Crypto. SHA-256 over a 32-bit hash because this is the tamper-evidence record for the lifetime of the installation — collision resistance matters. 64 hex chars is cheap to store and compare. The hash is computed via crypto.subtle.digest (Web Crypto), not Node's node:crypto. An earlier iteration imported node:crypto.createHash; Vite's module-graph walker pulled the import into the client bundle (via core.ts → collection-bootstrap.ts → fingerprint) even though the client never calls fingerprintCollection. Externalising node:crypto would have thrown at runtime. The Web Crypto switch eliminated the issue without conditional platform code; the side-effect is that fingerprintCollection is async.

Version-bump policy

CollectionDefinition carries an optional version?: number. Behaviour, in ensureCollections / reconcileCollection:

Compute the fingerprint of the in-memory definition.
No row exists → insert with version = definition.version ?? 1 and the fingerprint.
Row exists, hash matches → no-op. Independent of any definition.version pin: the hash is the source of truth for "did the shape change?", and a no-op write would just add noise.
Row exists, hash differs:
- definition.version pinned and > stored.version → use the pin.
- definition.version pinned and < stored.version → throw. Pinning backwards is always a developer error (it silently desynchronises the version from document history).
- definition.version pinned and == stored.version → use it. Effectively a "yes, I know the shape changed but don't bump" pin.
- definition.version omitted → auto-bump to stored.version + 1.
First-run-after-Phase-1 special case. When stored_hash is NULL (existing row pre-dating this feature), don't auto-bump. Backfill the hash at whatever version the DB already holds. Without this, every collection would bump from v1 to v2 on the first boot after Phase 1 deployed, for no information reason.

The hybrid — auto-bump as default, explicit pin as escape hatch — was chosen over both alternatives:

"Explicit only" is easy to forget and produces silent drift. A dev adds a field, forgets to bump, and collection_version = 3 is now stamped on a row authored against a different shape than v3.
"Hash-only, no pin" is the cleanest API but blocks two real workflows: aligning version numbers across environments (so staging catches up to prod), and reserving a round number for a planned major change.
The hybrid keeps the common case zero-effort while allowing either escape. Even under a manual pin the hash is still recorded, so Phase 2 can detect "the config on disk no longer matches the version we have written down." That's why schema_hash exists as a separate column rather than being implied by version.

A future Phase-5 strictCollectionVersions: true flag could invert the default for CI, requiring explicit bumps when the hash changes. The plumbing is already in place — it's only a policy knob.

Boundary — what does NOT read by version yet

Storage and the document lifecycle write collection_version but do not read by it. Every read still uses the current CollectionDefinition in memory.

A document from collection_version = 2 loaded against a live v3 definition reconstructs against v3's field set. If v3 added a field, the field is absent on the reconstructed document (no row exists for it). If v3 removed a field, the orphan store rows from v2 are silently ignored by restoreFieldSetData. If v3 renamed a field, the v2 rows are orphaned the same way and the new name is absent — which is the failure mode that motivates the future migration phases.

This is the deliberate scope of Phase 1: record now so the migration story can land later without a schema migration. Until Phase 3+ ships, treat collection_version as recorded data without semantics in the read path.

Startup reconciliation

initBylineCore() calls ensureCollections() once and caches the result on BylineCore. Reconciliation involves:

Fingerprinting every in-memory definition (sub-millisecond, no I/O).
Reading the stored row for each (one indexed SELECT on a ≤ 50-row table).
Comparing hashes.
Possibly an UPDATE (bump path) or INSERT (first boot).
Possibly throwing (backwards-pin error) before the process accepts traffic.

The loop is Promise.all(...) across all definitions, so wall-clock cost is one DB round-trip plus the fingerprint cost — not N round-trips. For a local Postgres that's ~5 ms total; for a managed DB across a VPC, ~10–50 ms. It's paid once per process.

Why startup, not lazy. A previous prototype used a lazy ensureCollection(path) that ran on every admin request. That worked when reconciliation was just "does the row exist? if no, insert." Phase 1 made the work semantic — decisions with consequences, including "should this throw and block the process?" — and where a semantic decision runs changes its failure surface.

Concern	Startup	Lazy (per-request, cached)
When a `version` pin error surfaces	Server refuses to start (loud, ops-visible)	First request to the offending route fails (scattered, user-visible)
When version-bump logs appear	All at boot, easy to grep	Scattered across the day's request logs
When an unreachable DB blocks you	Boot	First request per collection
First-request latency	Normal	Adds 1–2 round-trips on cold collection paths
State predictability for ops	"Everything reconciled by the time the server is up"	"Each collection reconciles when someone first hits it"
Consistency under parallel cold-starts	Single synchronous phase, no races	Two simultaneous first-requests can both attempt a bump

The last row matters under load. Lazy reconciliation inside a request handler has a lost-update window where two concurrent first-requests both compute the same hash, both see "no match," and both try to bump — yielding either a duplicate-key error or a double-bump. Startup reconciliation runs once, before any request.

Where lazy (or a hybrid) would actually win — three configurations would flip the trade, none of which apply today:

Serverless / edge / short-lived processes. Every cold start pays startup cost. For 20 collections at ~50 ms total, that's a meaningful slice of a 100 ms invocation budget. Byline's current target is a long-running Node process.
Hundreds or thousands of collections in a multi-tenant installation. At 500+ collections, even concurrent SELECTs get uncomfortable. Two better options at that scale: lazy DB reconciliation with synchronous in-memory fingerprinting at startup (catches authoring errors fail-fast without hitting the DB for unused schemas), or a "did anything change since last boot?" aggregate-hash check that reconciles individually only on a mismatch.
Reconciliation starts doing expensive work. If Phase 2's history-table writes get large enough that bumping 20 collections on a redeploy is painful, selective or deferred reconciliation would win.

The Phase-1 code is structured so that dropping in a lazy or hybrid strategy later is a localised change — collectionRecords stays the contract; only the population strategy moves.

Fail-fast by default. A concrete benefit worth pulling out: startup reconciliation means a backwards version pin, a duplicate collection path, or an adapter mis-configuration fails the process before it accepts traffic. Operators find out during deploy, not during the first affected request. For a CMS where the blast radius of a silent schema desync is "every document written during the window is mis-stamped," that's the correct trade even before considering performance.

Current limitations

Recording the schema version is in place; reading by it is not. The boundary is:

Reads ignore collection_version. A document written under v2 and loaded against a live v3 definition reconstructs against v3. A field v3 added is absent; a field v3 removed leaves orphaned store rows that are ignored; a renamed field reads as the old name orphaned and the new name absent. Materialising an old document against its original schema — and migrating it forward in memory — is not yet supported.
schema_hash is nullable. The runtime invariant is that any row written after ensureCollections() carries a hash; only rows predating the feature can legitimately be NULL.
Bootstrap is fail-fast, not fail-partial. If one collection throws (e.g. a backwards version pin), reconciliation rejects and the server refuses to start — other in-flight reconciliations may already have written. A partially-reconciled startup is deliberately treated as worse than no startup.
initBylineCore is async. The webapp awaits it via top-level await in byline/server.config.ts; a non-Vite consumer would need to await explicitly.