Document Data Model

This document defines the core document data model used by WildEditorInChief (WEIC).

The data model describes the primary records, relationships, and structural rules that support document storage, revision history, fragment reuse, tagging, and future governance features.

The purpose of this model is to establish a stable structural foundation for WEIC implementation.


Purpose

The document data model exists to support several goals:

  • provide stable document identity separate from changing content
  • store document content using revision records
  • support reusable fragments with matching revision behavior
  • classify documents using tags
  • create a foundation for review, publication, and evidence features later

This model is aligned with the WEIC decision to store canonical document content as HTML.


Core entities

The initial WEIC data model includes the following primary entities:

  • documents
  • document_revisions
  • fragments
  • fragment_revisions
  • tags
  • document_tags

Future models may add:

  • review records
  • publication records
  • attachments
  • evidence links

Documents

A document represents the stable identity of a knowledge item.

The document record contains identity, hierarchy, ownership, and current-state metadata.

Typical document fields include:

Field                  Purpose
id                     stable document identifier
parent_id              hierarchical parent document
title                  document title
slug                   URL-friendly or logical document name
owner                  responsible owner
status                 document lifecycle status
current_revision_id    pointer to the current document revision
created_utc            creation timestamp
updated_utc            last update timestamp

The document record should not store the canonical body content directly. Body content belongs to revisions.
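
As an illustration of the hierarchy field, the sketch below walks parent_id upward to list a document's ancestors. The Document class, the ancestor_chain helper, and the sample titles and slugs are hypothetical and only mirror a subset of the fields listed above; the cycle guard is an added safety assumption, not a stated WEIC rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Document:
    """Hypothetical in-memory view of a document record (subset of fields)."""
    id: int
    parent_id: Optional[int]
    title: str
    slug: str


def ancestor_chain(doc_id: int, docs: Dict[int, Document]) -> List[Document]:
    """Return the documents from the root down to doc_id, following parent_id."""
    chain: List[Document] = []
    seen = set()  # guards against an accidental parent_id cycle
    current = docs.get(doc_id)
    while current is not None and current.id not in seen:
        seen.add(current.id)
        chain.append(current)
        if current.parent_id is None:
            break
        current = docs.get(current.parent_id)
    chain.reverse()  # collected leaf-to-root, returned root-to-leaf
    return chain


# Sample data, invented for the example.
docs = {
    1: Document(id=1, parent_id=None, title="Handbook", slug="handbook"),
    2: Document(id=2, parent_id=1, title="Security", slug="security"),
    3: Document(id=3, parent_id=2, title="Access Policy", slug="access-policy"),
}

path = [d.slug for d in ancestor_chain(3, docs)]
```

Note that no body content appears on the record; only identity and placement are needed to resolve the hierarchy.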


Document revisions

A document revision stores a specific version of a document's content.

Typical document revision fields include:

Field            Purpose
id               revision identifier
document_id      owning document
body_html        canonical HTML document body
revision_note    optional revision summary
author           revision author
created_utc      revision creation timestamp

This model allows document identity to remain stable while content evolves through revision history.
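
One possible way to express the two tables above is shown here as a minimal SQLite sketch. The column types, the NOT NULL constraints, the 'draft' default status, and the choice of SQLite itself are assumptions made for illustration; the model only specifies field names and purposes.

```python
import sqlite3

# Assumed physical schema: integer keys and ISO-8601 text timestamps.
SCHEMA = """
CREATE TABLE documents (
    id                  INTEGER PRIMARY KEY,
    parent_id           INTEGER REFERENCES documents(id),
    title               TEXT NOT NULL,
    slug                TEXT NOT NULL,
    owner               TEXT,
    status              TEXT NOT NULL DEFAULT 'draft',  -- assumed default
    current_revision_id INTEGER REFERENCES document_revisions(id),
    created_utc         TEXT NOT NULL,
    updated_utc         TEXT NOT NULL
);

CREATE TABLE document_revisions (
    id            INTEGER PRIMARY KEY,
    document_id   INTEGER NOT NULL REFERENCES documents(id),
    body_html     TEXT NOT NULL,
    revision_note TEXT,
    author        TEXT,
    created_utc   TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Column names as SQLite reports them (index 1 of each table_info row).
document_columns = [row[1] for row in conn.execute("PRAGMA table_info(documents)")]
revision_columns = [row[1] for row in conn.execute("PRAGMA table_info(document_revisions)")]
```

The body column lives only on document_revisions, which keeps the documents table limited to identity and current-state metadata.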


Fragments

A fragment is a reusable unit of document content.

Fragments allow shared language or repeated content to be managed separately from full documents.

Typical fragment fields include:

Field                  Purpose
id                     stable fragment identifier
name                   fragment name
current_revision_id    pointer to the current fragment revision
created_utc            creation timestamp
updated_utc            last update timestamp

Fragments follow the same identity-plus-revision pattern as documents.


Fragment revisions

A fragment revision stores a specific version of a fragment's content.

Typical fragment revision fields include:

Field            Purpose
id               fragment revision identifier
fragment_id      owning fragment
body_html        canonical HTML fragment body
revision_note    optional revision summary
author           revision author
created_utc      revision creation timestamp

Fragments and documents share the same HTML storage principle.
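
The fragment records can be sketched with the same identity-plus-revision shape. The dataclasses and the current_body helper below are hypothetical names used for illustration, and the sample values are invented.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FragmentRevision:
    id: int
    fragment_id: int
    body_html: str
    author: str
    created_utc: str
    revision_note: Optional[str] = None


@dataclass
class Fragment:
    id: int
    name: str
    created_utc: str
    updated_utc: str
    current_revision_id: Optional[int] = None


def current_body(fragment: Fragment, revisions: Iterable[FragmentRevision]) -> Optional[str]:
    """Return the HTML of the revision that current_revision_id points to."""
    for rev in revisions:
        if rev.fragment_id == fragment.id and rev.id == fragment.current_revision_id:
            return rev.body_html
    return None  # no current revision has been set yet


# Sample data, invented for the example.
revisions = [
    FragmentRevision(1, 10, "<p>Contact the help desk.</p>", "alice", "2024-01-01T00:00:00Z"),
    FragmentRevision(2, 10, "<p>Contact the service desk.</p>", "bob", "2024-02-01T00:00:00Z"),
]
fragment = Fragment(
    id=10,
    name="support-contact",
    created_utc="2024-01-01T00:00:00Z",
    updated_utc="2024-02-01T00:00:00Z",
    current_revision_id=2,
)

body = current_body(fragment, revisions)
```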


Tags

A tag is a reusable classification label applied to documents.

Typical tag fields include:

Field    Purpose
id       tag identifier
name     tag name

Tags should remain simple and reusable.


Document tags

A document_tag record associates a document with a tag.

Typical fields include:

Field          Purpose
document_id    associated document
tag_id         associated tag

This model supports many-to-many classification without embedding tag lists directly in document records.
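
A minimal SQLite sketch of the association table follows. The composite primary key, the UNIQUE constraint on tag names, the trimmed-down documents table, and the titles_for_tag helper are assumptions for illustration, not part of the model as stated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

-- One row per (document, tag) pair; the composite key prevents duplicates.
CREATE TABLE document_tags (
    document_id INTEGER NOT NULL REFERENCES documents(id),
    tag_id      INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (document_id, tag_id)
);

-- Sample data, invented for the example.
INSERT INTO documents (id, title) VALUES (1, 'Access Policy'), (2, 'Onboarding Guide');
INSERT INTO tags (id, name) VALUES (1, 'policy'), (2, 'hr');
INSERT INTO document_tags (document_id, tag_id) VALUES (1, 1), (2, 1), (2, 2);
""")


def titles_for_tag(conn, tag_name):
    """Return the titles of all documents carrying the given tag, sorted."""
    rows = conn.execute(
        """
        SELECT d.title
        FROM documents AS d
        JOIN document_tags AS dt ON dt.document_id = d.id
        JOIN tags AS t ON t.id = dt.tag_id
        WHERE t.name = ?
        ORDER BY d.title
        """,
        (tag_name,),
    )
    return [row[0] for row in rows]
```

For example, titles_for_tag(conn, "policy") returns both sample documents, while "hr" returns only the onboarding guide.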


Entity relationships

The core relationships are:

document
   └─ many document_revisions

document
   └─ many document_tags
            └─ one tag

fragment
   └─ many fragment_revisions

Current state is represented through pointer fields such as current_revision_id.


Current revision model

The current document state is represented by a pointer to a revision.

This means:

  • the document record remains stable
  • content changes create new revisions
  • the active version is chosen by current_revision_id

This model is preferred over updating a single mutable body field in place.

The same rule applies to fragments.
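
The pointer rule can be sketched as a single save operation that appends a revision and then advances current_revision_id. The save_revision function, the reduced column set, and the use of one transaction for both steps are assumptions made for this example.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (
    id                  INTEGER PRIMARY KEY,
    title               TEXT NOT NULL,
    current_revision_id INTEGER,
    updated_utc         TEXT
);
CREATE TABLE document_revisions (
    id          INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    body_html   TEXT NOT NULL,
    author      TEXT,
    created_utc TEXT NOT NULL
);
INSERT INTO documents (id, title) VALUES (1, 'Access Policy');
""")


def save_revision(conn, document_id, body_html, author):
    """Append a new revision and point the document at it; never edit old rows."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits both statements together, rolls back on error
        cursor = conn.execute(
            "INSERT INTO document_revisions (document_id, body_html, author, created_utc) "
            "VALUES (?, ?, ?, ?)",
            (document_id, body_html, author, now),
        )
        revision_id = cursor.lastrowid
        conn.execute(
            "UPDATE documents SET current_revision_id = ?, updated_utc = ? WHERE id = ?",
            (revision_id, now, document_id),
        )
    return revision_id


first = save_revision(conn, 1, "<p>Draft wording.</p>", "alice")
second = save_revision(conn, 1, "<p>Final wording.</p>", "alice")
```

After the two calls the document row still has the same id, both revisions remain stored, and the pointer selects the second one. Rolling back to an earlier version would then be a pointer update rather than a content rewrite.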


Metadata location

The data model separates stable metadata from changing content.

Document-level metadata

Document-level metadata includes:

  • title
  • slug
  • hierarchy placement
  • owner
  • status
  • current revision pointer

Revision-level metadata

Revision records hold the versioned body together with its own metadata:

  • body_html
  • revision note
  • author
  • created timestamp

This separation preserves clean boundaries between identity and versioned content.


HTML body storage

Document and fragment body content is stored as HTML in revision records.

This means:

  • documents do not store Markdown as canonical content
  • revisions preserve authored HTML
  • fragments use the same canonical format as documents
  • rendering, export, and indexing begin from HTML

This model aligns with the broader WEIC HTML storage decision.


Search and indexing implications

Search and indexing should derive from the current HTML revision for a document or fragment.

This means indexing pipelines should:

  • select the current revision
  • parse or sanitize HTML content
  • extract searchable text
  • combine searchable body content with document metadata

Search should not depend on a separate Markdown source.
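
The indexing steps above can be sketched with Python's standard html.parser module. The TextExtractor class, the extract_text and build_index_entry helpers, and the shape of the index entry are hypothetical; a real pipeline would likely need a proper sanitizer and a smarter rule for joining inline text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of script and style elements."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def extract_text(body_html):
    """Return the text of an HTML body with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(body_html)
    parser.close()
    # Joining with a space keeps block elements apart; it can also split
    # words that span inline tags, which is acceptable for this sketch.
    return " ".join(" ".join(parser.parts).split())


def build_index_entry(document, current_revision):
    """Combine document metadata with text taken from the current revision."""
    return {
        "document_id": document["id"],
        "title": document["title"],
        "text": extract_text(current_revision["body_html"]),
    }


# Sample records, invented for the example.
document = {"id": 1, "title": "Access Policy", "current_revision_id": 7}
revision = {
    "id": 7,
    "document_id": 1,
    "body_html": "<h1>Access</h1><p>Request &amp; review access.</p><script>track()</script>",
}

entry = build_index_entry(document, revision)
```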


Future extensibility

The initial data model is intentionally focused, but it is designed to support later extensions.

Likely future entities include:

  • review workflows
  • approval or publication records
  • attachments
  • evidence associations
  • cross-document relationships

These can be added without changing the core identity and revision model.
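
To illustrate the claim, the sketch below adds a review table alongside the core tables and checks that the core definitions are untouched. The review_records table, all of its columns, and the table_sql helper are entirely hypothetical; this document does not define any future entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (
    id                  INTEGER PRIMARY KEY,
    title               TEXT NOT NULL,
    current_revision_id INTEGER
);
CREATE TABLE document_revisions (
    id          INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    body_html   TEXT NOT NULL
);
""")


def table_sql(conn, name):
    """Return the stored CREATE TABLE statement for a table, or None."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row[0] if row else None


core_tables = ("documents", "document_revisions")
core_before = {name: table_sql(conn, name) for name in core_tables}

# Hypothetical extension: a review attaches to a specific revision,
# so the core tables need no new columns.
conn.executescript("""
CREATE TABLE review_records (
    id          INTEGER PRIMARY KEY,
    revision_id INTEGER NOT NULL REFERENCES document_revisions(id),
    reviewer    TEXT NOT NULL,
    outcome     TEXT NOT NULL,
    created_utc TEXT NOT NULL
);
""")

core_after = {name: table_sql(conn, name) for name in core_tables}
```

Because the new table references a revision by id, the extension is additive: core_before and core_after hold identical definitions.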


Design principles

The document data model follows several principles.

Stable identity

A document or fragment remains the same logical entity even as its content changes.

Revision-first storage

Content changes are represented by new revisions rather than by mutating current content in place.

Canonical HTML

Body content is stored as HTML to preserve structure and rendering fidelity.

Fragment symmetry

Fragments should behave like lightweight documents using the same core revision ideas.

Extensible metadata

The model should support future governance and compliance needs without rewriting core entities.


Relationship to the Oryvin plan

WEIC is the knowledge core of the Oryvin ecosystem. The document data model defines the structure that stores that knowledge.

document identity
        ↓
revisioned HTML content
        ↓
rendering, export, workflow use, and evidence

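Because identity, revision history, and canonical HTML each have a defined place in the model, the WEIC storage layer can be built directly from it and behaves predictably.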