Byline CMS
  • Home
  • Docs
  • About
Byline CMS
View on GitHub
  • Getting Started
    • Experimental CLI
    • Development environment and example application
  • Why Byline
    • Mission & Vision
    • Content Management in the Time of AI
  • Key Architectural Decisions
    • Core Document Storage
    • Core Composition
    • Transactions
  • Collections
    • Fields API
    • Relationships
    • Document Trees
    • Document Paths
    • File / Media Uploads
    • Rich Text Editor
    • Collection Versioning
  • Reading & Delivery
    • Client SDK (@byline/client)
    • Routing & API
    • Transports
    • Markdown Export
    • MCP Server
    • Caching
  • Auth & Security
    • Authentication & Authorization
    • Auditability
  • Internationalization (i18n)
    • The host i18n system
    • Admin interface translations
    • Content locales
    • Administering content locales
  • Admin UI
    • UI Kit (@byline/ui)
    • Client-config registration
  • Testing
  • Home

Collection Versioning

Immutable versioning is one of Byline's differentiating pillars. The document half of that story is fundamental: every save writes a new documentVersions row keyed by UUIDv7, a current_documents view resolves "the latest" via ROW_NUMBER() OVER PARTITION, and a status change is the deliberate exception that mutates a row in place. Collection versioning is the schema-side companion: on every document save, Byline records which version of the collection's schema the document was written against.

What you can rely on. Every document version carries an integer collection_version you can read and reason about, and every collection row carries a schema_hash that bumps when the data-affecting parts of the schema change. The aim is that a document version can later be resolved against the collection schema as it was when the version was written, and migrated forward in memory. Today Byline records the version; it does not yet read documents by it — the read path uses the live CollectionDefinition regardless of collection_version. See the boundary below for exactly where that line sits.

What is recorded

Two columns on collections, one on document_versions:

collections
version integer NOT NULL DEFAULT 1
schema_hash varchar(64) -- nullable; see below
document_versions
collection_version integer NOT NULL

Both current_documents and current_published_documents views project collection_version so it surfaces on every read. Every pre-existing row is implicitly v1. schema_hash is nullable to accommodate rows that predate the feature; any row written after ensureCollections() runs carries a hash.

Fingerprint

schema_hash is a SHA-256 over a canonicalised projection of the CollectionDefinition. The fingerprint defines what counts as a "schema change" for the purposes of bumping the version. It is deliberately narrow — only properties that affect the storable document shape participate.

Included (a change bumps the version):

  • path
  • useAsTitle, useAsPath
  • fields (recursive). Per field: name, type, optional, localized. Compound types recurse into fields / blocks. Per type: relation.targetCollection, relation.displayField, select.options.value, datetime.mode, and validation for text / textArea / richText / float / integer.
  • workflow — status names and defaultStatus. Labels and verbs are stripped.
  • Per-field upload — mimeTypes, maxFileSize, and sizes[].{name, width, height, fit, format, quality}.

Excluded (changes do NOT bump):

  • labels.singular, labels.plural
  • hooks (function values can't be JSON-stable anyway)
  • search, showStats (admin UX)
  • Field-level label, helpText, placeholder (admin UX)
  • Workflow status label, verb
  • upload.storage (provider implementation, not data shape)
  • Select option labels

The stripping rules are enforced by whitelist — known keys are copied; unknown keys are dropped. So adding a new presentational field to CollectionDefinition will not silently churn versions. Stability is covered by 19 contract tests in collection-fingerprint.test.node.ts: key-order invariance, function exclusion, every "does NOT bump" rule, and every "DOES bump" rule.

Why SHA-256, why Web Crypto. SHA-256 over a 32-bit hash because this is the tamper-evidence record for the lifetime of the installation — collision resistance matters. 64 hex chars is cheap to store and compare. The hash is computed via crypto.subtle.digest (Web Crypto), not Node's node:crypto. An earlier iteration imported node:crypto.createHash; Vite's module-graph walker pulled the import into the client bundle (via core.ts → collection-bootstrap.ts → fingerprint) even though the client never calls fingerprintCollection. Externalising node:crypto would have thrown at runtime. The Web Crypto switch eliminated the issue without conditional platform code; the side-effect is that fingerprintCollection is async.

Version-bump policy

CollectionDefinition carries an optional version?: number. Behaviour, in ensureCollections / reconcileCollection:

  1. Compute the fingerprint of the in-memory definition.
  2. No row exists → insert with version = definition.version ?? 1 and the fingerprint.
  3. Row exists, hash matches → no-op. Independent of any definition.version pin: the hash is the source of truth for "did the shape change?", and a no-op write would just add noise.
  4. Row exists, hash differs:
    • definition.version pinned and > stored.version → use the pin.
    • definition.version pinned and < stored.version → throw. Pinning backwards is always a developer error (it silently desynchronises the version from document history).
    • definition.version pinned and == stored.version → use it. Effectively a "yes, I know the shape changed but don't bump" pin.
    • definition.version omitted → auto-bump to stored.version + 1.
  5. First-run-after-Phase-1 special case. When stored_hash is NULL (existing row pre-dating this feature), don't auto-bump. Backfill the hash at whatever version the DB already holds. Without this, every collection would bump from v1 to v2 on the first boot after Phase 1 deployed, for no information reason.

The hybrid — auto-bump as default, explicit pin as escape hatch — was chosen over both alternatives:

  • "Explicit only" is easy to forget and produces silent drift. A dev adds a field, forgets to bump, and collection_version = 3 is now stamped on a row authored against a different shape than v3.
  • "Hash-only, no pin" is the cleanest API but blocks two real workflows: aligning version numbers across environments (so staging catches up to prod), and reserving a round number for a planned major change.
  • The hybrid keeps the common case zero-effort while allowing either escape. Even under a manual pin the hash is still recorded, so Phase 2 can detect "the config on disk no longer matches the version we have written down." That's why schema_hash exists as a separate column rather than being implied by version.

A future Phase-5 strictCollectionVersions: true flag could invert the default for CI, requiring explicit bumps when the hash changes. The plumbing is already in place — it's only a policy knob.

Boundary — what does NOT read by version yet

Storage and the document lifecycle write collection_version but do not read by it. Every read still uses the current CollectionDefinition in memory.

A document from collection_version = 2 loaded against a live v3 definition reconstructs against v3's field set. If v3 added a field, the field is absent on the reconstructed document (no row exists for it). If v3 removed a field, the orphan store rows from v2 are silently ignored by restoreFieldSetData. If v3 renamed a field, the v2 rows are orphaned the same way and the new name is absent — which is the failure mode that motivates the future migration phases.

This is the deliberate scope of Phase 1: record now so the migration story can land later without a schema migration. Until Phase 3+ ships, treat collection_version as recorded data without semantics in the read path.

Startup reconciliation

initBylineCore() calls ensureCollections() once and caches the result on BylineCore. Reconciliation involves:

  1. Fingerprinting every in-memory definition (sub-millisecond, no I/O).
  2. Reading the stored row for each (one indexed SELECT on a ≤ 50-row table).
  3. Comparing hashes.
  4. Possibly an UPDATE (bump path) or INSERT (first boot).
  5. Possibly throwing (backwards-pin error) before the process accepts traffic.

The loop is Promise.all(...) across all definitions, so wall-clock cost is one DB round-trip plus the fingerprint cost — not N round-trips. For a local Postgres that's ~5 ms total; for a managed DB across a VPC, ~10–50 ms. It's paid once per process.

Why startup, not lazy. A previous prototype used a lazy ensureCollection(path) that ran on every admin request. That worked when reconciliation was just "does the row exist? if no, insert." Phase 1 made the work semantic — decisions with consequences, including "should this throw and block the process?" — and where a semantic decision runs changes its failure surface.

Concern

Startup

Lazy (per-request, cached)

When a version pin error surfaces

Server refuses to start (loud, ops-visible)

First request to the offending route fails (scattered, user-visible)

When version-bump logs appear

All at boot, easy to grep

Scattered across the day's request logs

When an unreachable DB blocks you

Boot

First request per collection

First-request latency

Normal

Adds 1–2 round-trips on cold collection paths

State predictability for ops

"Everything reconciled by the time the server is up"

"Each collection reconciles when someone first hits it"

Consistency under parallel cold-starts

Single synchronous phase, no races

Two simultaneous first-requests can both attempt a bump

The last row matters under load. Lazy reconciliation inside a request handler has a lost-update window where two concurrent first-requests both compute the same hash, both see "no match," and both try to bump — yielding either a duplicate-key error or a double-bump. Startup reconciliation runs once, before any request.

Where lazy (or a hybrid) would actually win — three configurations would flip the trade, none of which apply today:

  1. Serverless / edge / short-lived processes. Every cold start pays startup cost. For 20 collections at ~50 ms total, that's a meaningful slice of a 100 ms invocation budget. Byline's current target is a long-running Node process.
  2. Hundreds or thousands of collections in a multi-tenant installation. At 500+ collections, even concurrent SELECTs get uncomfortable. Two better options at that scale: lazy DB reconciliation with synchronous in-memory fingerprinting at startup (catches authoring errors fail-fast without hitting the DB for unused schemas), or a "did anything change since last boot?" aggregate-hash check that reconciles individually only on a mismatch.
  3. Reconciliation starts doing expensive work. If Phase 2's history-table writes get large enough that bumping 20 collections on a redeploy is painful, selective or deferred reconciliation would win.

The Phase-1 code is structured so that dropping in a lazy or hybrid strategy later is a localised change — collectionRecords stays the contract; only the population strategy moves.

Fail-fast by default. A concrete benefit worth pulling out: startup reconciliation means a backwards version pin, a duplicate collection path, or an adapter mis-configuration fails the process before it accepts traffic. Operators find out during deploy, not during the first affected request. For a CMS where the blast radius of a silent schema desync is "every document written during the window is mis-stamped," that's the correct trade even before considering performance.


Current limitations

Recording the schema version is in place; reading by it is not. The boundary is:

  • Reads ignore collection_version. A document written under v2 and loaded against a live v3 definition reconstructs against v3. A field v3 added is absent; a field v3 removed leaves orphaned store rows that are ignored; a renamed field reads as the old name orphaned and the new name absent. Materialising an old document against its original schema — and migrating it forward in memory — is not yet supported.
  • schema_hash is nullable. The runtime invariant is that any row written after ensureCollections() carries a hash; only rows predating the feature can legitimately be NULL.
  • Bootstrap is fail-fast, not fail-partial. If one collection throws (e.g. a backwards version pin), reconciliation rejects and the server refuses to start — other in-flight reconciliations may already have written. A partially-reconciled startup is deliberately treated as worse than no startup.
  • initBylineCore is async. The webapp awaits it via top-level await in byline/server.config.ts; a non-Vite consumer would need to await explicitly.
docsPreviousRich Text Editor
docsNextReading & Delivery
Byline CMS

Building the future of content management, one commit at a time.

Project

  • Documentation
  • Roadmap
  • Contributing
  • Releases

Community

  • GitHub Discussions
  • Blog
  • Newsletter

Legal

  • Privacy Policy
  • Terms of Use
  • Cookies

© 2026 Infonomic Company Limited and contributors. Open source and built with ❤️ by the community.