Document Data Model

This document defines the core document data model used by WildEditorInChief (WEIC).

The data model describes the primary records, relationships, and structural rules that support document storage, revision history, fragment reuse, tagging, and future governance features.

The model establishes a stable structural foundation for the WEIC implementation.

Purpose

The document data model exists to support several goals:

provide stable document identity separate from changing content
store document content using revision records
support reusable fragments with matching revision behavior
classify documents using tags
lay the foundation for later review, publication, and evidence features

This model is aligned with the WEIC decision to store canonical document content as HTML.

Core entities

The initial WEIC data model includes the following primary entities:

documents
document_revisions
fragments
fragment_revisions
tags
document_tags

Future models may add:

review records
publication records
attachments
evidence links

Documents

A document represents the stable identity of a knowledge item.

The document record contains identity, hierarchy, ownership, and current-state metadata.

Typical document fields include:

Field	Purpose
id	stable document identifier
parent_id	hierarchical parent document
title	document title
slug	URL-friendly or logical document name
owner	responsible owner
status	document lifecycle status
current_revision_id	pointer to the current document revision
created_utc	creation timestamp
updated_utc	last update timestamp

The document record should not store the canonical body content directly. Body content belongs to revisions.
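
As a rough illustration of the field list above, the document record could be sketched as a Python dataclass. The field names come from the table; the types (integer identifiers, timezone-aware timestamps) and the sample values are assumptions made for this sketch, not part of the model.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Document:
    """Stable identity record. Deliberately has no body field."""
    id: int                                    # integer ids are an assumption
    title: str
    slug: str
    owner: str
    status: str                                # allowed values are not defined here
    created_utc: datetime
    updated_utc: datetime
    parent_id: Optional[int] = None            # None for a top-level document
    current_revision_id: Optional[int] = None  # None until a first revision exists


now = datetime.now(timezone.utc)
doc = Document(id=1, title="Document Data Model", slug="document-data-model",
               owner="editor", status="draft", created_utc=now, updated_utc=now)

# The record carries identity and state only; body content lives in revisions.
field_names = [f.name for f in fields(Document)]
```

Keeping body_html out of this record is what lets the title, owner, or status change without touching revision history.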

Document revisions

A document revision stores a specific version of a document's content.

Typical document revision fields include:

Field	Purpose
id	revision identifier
document_id	owning document
body_html	canonical HTML document body
revision_note	optional revision summary
author	revision author
created_utc	revision creation timestamp

This model allows document identity to remain stable while content evolves through revision history.
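
A minimal sketch of a revision record, assuming the same field names as the table above. Marking the dataclass as frozen is one possible way to reflect that a saved revision is not edited afterwards; the model itself does not prescribe how immutability is enforced, and try_mutate is a hypothetical helper used only for this demonstration.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DocumentRevision:
    """One stored version of a document body."""
    id: int
    document_id: int
    body_html: str
    author: str
    created_utc: datetime
    revision_note: Optional[str] = None


def try_mutate(rev: DocumentRevision) -> bool:
    """Return True if the revision rejected an in-place edit."""
    try:
        rev.body_html = "<p>Edited in place.</p>"
    except FrozenInstanceError:
        return True
    return False


now = datetime.now(timezone.utc)
r1 = DocumentRevision(1, 10, "<p>First draft.</p>", "alice", now)
r2 = DocumentRevision(2, 10, "<p>Second draft.</p>", "alice", now,
                      revision_note="Tightened wording")

# Both revisions belong to document 10; the history grows by appending.
history = [r1, r2]
```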

Fragments

A fragment is a reusable unit of document content.

Fragments allow shared language or repeated content to be managed separately from full documents.

Typical fragment fields include:

Field	Purpose
id	stable fragment identifier
name	fragment name
current_revision_id	pointer to the current fragment revision
created_utc	creation timestamp
updated_utc	last update timestamp

Fragments follow the same identity-plus-revision pattern as documents.

Fragment revisions

A fragment revision stores a specific version of a fragment's content.

Typical fragment revision fields include:

Field	Purpose
id	fragment revision identifier
fragment_id	owning fragment
body_html	canonical HTML fragment body
revision_note	optional revision summary
author	revision author
created_utc	revision creation timestamp

Fragments and documents share the same HTML storage principle.
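
To make the fragment pattern concrete, the following sketch stores a fragment and two revisions in an in-memory SQLite database and resolves the current body through the pointer. The column types, the ISO-8601 text timestamps, and the helper name current_fragment_html are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fragments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    current_revision_id INTEGER,
    created_utc TEXT NOT NULL,
    updated_utc TEXT NOT NULL
);
CREATE TABLE fragment_revisions (
    id INTEGER PRIMARY KEY,
    fragment_id INTEGER NOT NULL REFERENCES fragments(id),
    body_html TEXT NOT NULL,
    revision_note TEXT,
    author TEXT NOT NULL,
    created_utc TEXT NOT NULL
);
""")

ts = "2024-01-01T00:00:00Z"  # sample timestamp
conn.execute("INSERT INTO fragments VALUES (1, 'legal-notice', NULL, ?, ?)", (ts, ts))
conn.execute("INSERT INTO fragment_revisions VALUES "
             "(1, 1, '<p>Old notice.</p>', NULL, 'alice', ?)", (ts,))
conn.execute("INSERT INTO fragment_revisions VALUES "
             "(2, 1, '<p>New notice.</p>', 'Updated wording', 'bob', ?)", (ts,))
conn.execute("UPDATE fragments SET current_revision_id = 2 WHERE id = 1")


def current_fragment_html(conn, fragment_id):
    """Return the body of the fragment's current revision, or None."""
    row = conn.execute(
        """SELECT r.body_html
           FROM fragments f
           JOIN fragment_revisions r ON r.id = f.current_revision_id
           WHERE f.id = ?""",
        (fragment_id,),
    ).fetchone()
    return row[0] if row else None
```

The older revision stays in fragment_revisions; only the pointer decides which body is read.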

Document tags

A document_tag record associates a document with a tag.

Typical fields include:

Field	Purpose
document_id	associated document
tag_id	associated tag

This model supports many-to-many classification without embedding tag lists directly in document records.
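
The many-to-many association can be sketched with an in-memory SQLite database. The tags table columns (id, name) and the composite primary key on document_tags are assumptions, since this document does not list them; the documents table is trimmed to two columns for brevity, and titles_for_tag is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE document_tags (
    document_id INTEGER NOT NULL REFERENCES documents(id),
    tag_id INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (document_id, tag_id)   -- one row per document/tag pair
);
""")
conn.executemany("INSERT INTO documents VALUES (?, ?)",
                 [(1, "Data Model"), (2, "Storage Decision")])
conn.executemany("INSERT INTO tags VALUES (?, ?)",
                 [(1, "architecture"), (2, "storage")])
conn.executemany("INSERT INTO document_tags VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])


def titles_for_tag(conn, tag_name):
    """Return titles of all documents carrying the given tag, sorted."""
    rows = conn.execute(
        """SELECT d.title
           FROM documents d
           JOIN document_tags dt ON dt.document_id = d.id
           JOIN tags t ON t.id = dt.tag_id
           WHERE t.name = ?
           ORDER BY d.title""",
        (tag_name,),
    ).fetchall()
    return [r[0] for r in rows]
```

Here one document carries two tags and one tag applies to two documents, which is the case an embedded tag list would handle poorly.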

Entity relationships

The core relationships are:

document
   └─ many document_revisions

document
   └─ many document_tags
            └─ one tag

fragment
   └─ many fragment_revisions

Current state is represented through pointer fields such as current_revision_id.
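
One way to express the document-to-revision relationship is with foreign keys. The sketch below uses SQLite and trims the tables to the columns needed to show the relationship; the choice of database, the enforcement through foreign keys, and the helper names are assumptions rather than stated WEIC decisions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    current_revision_id INTEGER REFERENCES document_revisions(id)
);
CREATE TABLE document_revisions (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    body_html TEXT NOT NULL
);
""")

# One document, many revisions; the pointer is set after the revisions exist.
conn.execute("INSERT INTO documents (id, title) VALUES (1, 'Data Model')")
conn.execute("INSERT INTO document_revisions VALUES (1, 1, '<p>v1</p>')")
conn.execute("INSERT INTO document_revisions VALUES (2, 1, '<p>v2</p>')")
conn.execute("UPDATE documents SET current_revision_id = 2 WHERE id = 1")


def orphan_revision_rejected(conn):
    """Return True if a revision for a non-existent document is refused."""
    try:
        conn.execute("INSERT INTO document_revisions VALUES (3, 999, '<p>orphan</p>')")
    except sqlite3.IntegrityError:
        return True
    return False


def revision_count(conn, document_id):
    """Count the revisions owned by one document."""
    return conn.execute(
        "SELECT COUNT(*) FROM document_revisions WHERE document_id = ?",
        (document_id,),
    ).fetchone()[0]
```

The two tables reference each other, so a new document is inserted with a null pointer first and the pointer is filled in once its first revision is stored.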

Current revision model

The current document state is represented by a pointer to a revision.

This means:

the document record remains stable
content changes create new revisions
the active version is chosen by current_revision_id

This model is preferred over updating a single mutable body field in place, because earlier versions remain available in the revision history.

The same rule applies to fragments.
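
The pointer-based rule can be sketched in memory as follows. RevisionStore, Record, and the revert method are hypothetical names introduced for this illustration; Record stands in for either a document or a fragment, and reverting by moving the pointer back is an assumption about how the pointer might be used, not a stated WEIC feature.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional


@dataclass(frozen=True)
class Revision:
    id: int
    owner_id: int      # document_id or fragment_id
    body_html: str
    author: str


@dataclass
class Record:
    """Stands in for a document or a fragment."""
    id: int
    current_revision_id: Optional[int] = None


class RevisionStore:
    def __init__(self) -> None:
        self._ids = count(1)
        self.revisions: Dict[int, Revision] = {}

    def save(self, record: Record, body_html: str, author: str) -> Revision:
        """Store a new revision and point the record at it."""
        rev = Revision(next(self._ids), record.id, body_html, author)
        self.revisions[rev.id] = rev           # earlier revisions are untouched
        record.current_revision_id = rev.id    # only the pointer changes
        return rev

    def current_body(self, record: Record) -> Optional[str]:
        if record.current_revision_id is None:
            return None
        return self.revisions[record.current_revision_id].body_html

    def revert(self, record: Record, revision_id: int) -> None:
        """Make an earlier revision current without deleting anything."""
        if self.revisions[revision_id].owner_id != record.id:
            raise ValueError("revision belongs to a different record")
        record.current_revision_id = revision_id


store = RevisionStore()
doc = Record(id=1)
first = store.save(doc, "<p>Draft.</p>", "alice")
second = store.save(doc, "<p>Final.</p>", "alice")
```

After both saves, the record id is unchanged, two revisions exist, and current_body(doc) returns the second body.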

Metadata location

The data model separates stable metadata from changing content.

Document-level metadata

Document-level metadata includes:

title
slug
hierarchy placement
owner
status
current revision pointer

Revision-level metadata

Revision-level data includes the versioned body and the metadata that describes each revision:

body_html
revision note
author
created timestamp

This separation preserves clean boundaries between identity and versioned content.

HTML body storage

Document and fragment body content is stored as HTML in revision records.

This means:

documents do not store Markdown as canonical content
revisions preserve authored HTML
fragments use the same canonical format as documents
rendering, export, and indexing begin from HTML

This model aligns with the broader WEIC HTML storage decision.

Search and indexing implications

Search and indexing should be derived from the HTML of the current revision of a document or fragment.

This means indexing pipelines should:

select the current revision
parse or sanitize HTML content
extract searchable text
combine searchable body content with document metadata

Search should not depend on a separate Markdown source.
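
The pipeline steps above can be sketched with the Python standard library. This is a minimal illustration that only extracts text; it does not sanitize HTML, and the function names, the dictionary shapes, and the list of block-level tags are assumptions made for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style content."""
    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
             "li", "ul", "ol", "br", "tr", "td", "th", "table", "section"}

    def __init__(self):
        super().__init__()          # character references are decoded by default
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append(" ")  # keep words in adjacent blocks apart

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def extract_text(body_html):
    """Return whitespace-normalized text from an HTML body."""
    parser = TextExtractor()
    parser.feed(body_html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def build_index_entry(document, revisions):
    """Select the current revision and combine its text with metadata."""
    rev = revisions[document["current_revision_id"]]
    return {
        "document_id": document["id"],
        "title": document["title"],
        "text": extract_text(rev["body_html"]),
    }


document = {"id": 1, "title": "Data Model", "current_revision_id": 2}
revisions = {1: {"body_html": "<p>Old text</p>"},
             2: {"body_html": "<h1>Search &amp; indexing</h1><p>Stable <b>identity</b>.</p>"}}
entry = build_index_entry(document, revisions)
```

Only revision 2 contributes to the entry, because the document's pointer selects it; revision 1 stays in history but is not indexed.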

Future extensibility

The initial data model is intentionally focused, but it is designed to support later extensions.

Likely future entities include:

review workflows
approval or publication records
attachments
evidence associations
cross-document relationships

These can be added without changing the core identity and revision model.

Design principles

The document data model follows several principles.

Stable identity

A document or fragment remains the same logical entity even as its content changes.

Revision-first storage

Content changes are represented by new revisions rather than by mutating current content in place.

Canonical HTML

Body content is stored as HTML to preserve structure and rendering fidelity.

Fragment symmetry

Fragments should behave like lightweight documents, using the same identity-plus-revision pattern.

Extensible metadata

The model should support future governance and compliance needs without rewriting core entities.

Relationship to the Oryvin plan

WEIC is the knowledge core of the Oryvin ecosystem. The document data model defines the structure that stores that knowledge.

document identity
        ↓
revisioned HTML content
        ↓
rendering, export, workflow use, and evidence

This layering keeps the WEIC storage layer straightforward to build and predictable in behavior.