Document Data Model
This document defines the core document data model used by WildEditorInChief (WEIC).
The data model describes the primary records, relationships, and structural rules that support document storage, revision history, fragment reuse, tagging, and future governance features.
The purpose of this model is to establish a stable structural foundation for WEIC implementation.
Purpose
The document data model exists to support several goals:
- provide stable document identity separate from changing content
- store document content using revision records
- support reusable fragments with matching revision behavior
- classify documents using tags
- create a foundation for review, publication, and evidence features later
This model is aligned with the WEIC decision to store canonical document content as HTML.
Core entities
The initial WEIC data model includes the following primary entities:
- documents
- document_revisions
- fragments
- fragment_revisions
- tags
- document_tags
Future models may add:
- review records
- publication records
- attachments
- evidence links
Documents
A document represents the stable identity of a knowledge item.
The document record contains identity, hierarchy, ownership, and current-state metadata.
Typical document fields include:
| Field | Purpose |
|---|---|
| id | stable document identifier |
| parent_id | hierarchical parent document |
| title | document title |
| slug | URL-friendly or logical document name |
| owner | responsible owner |
| status | document lifecycle status |
| current_revision_id | pointer to the current document revision |
| created_utc | creation timestamp |
| updated_utc | last update timestamp |
The document record should not store the canonical body content directly. Body content belongs to revisions.
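As an illustration, the document record could be sketched as a Python dataclass. The field types, the integer identifiers, and the example values are assumptions; the model itself does not prescribe them. The point of the sketch is that the record carries identity and current-state metadata only, with no body field.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Document:
    """Identity and current-state metadata only; body content lives in revisions."""
    id: int                                     # integer ids are an assumption
    title: str
    slug: str
    owner: str
    status: str                                 # e.g. "draft"; the value set is an assumption
    created_utc: datetime
    updated_utc: datetime
    parent_id: Optional[int] = None             # None for a top-level document
    current_revision_id: Optional[int] = None   # None until a first revision exists


now = datetime.now(timezone.utc)
doc = Document(
    id=1,
    title="Style Guide",
    slug="style-guide",
    owner="editorial",
    status="draft",
    created_utc=now,
    updated_utc=now,
)

# The record exposes no body field by design.
field_names = [f.name for f in fields(Document)]
```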
Document revisions
A document revision stores a specific version of a document's content.
Typical document revision fields include:
| Field | Purpose |
|---|---|
| id | revision identifier |
| document_id | owning document |
| body_html | canonical HTML document body |
| revision_note | optional revision summary |
| author | revision author |
| created_utc | revision creation timestamp |
This model allows document identity to remain stable while content evolves through revision history.
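One way to picture a revision record is as an immutable value. The sketch below uses a frozen dataclass; treating revisions as immutable after creation is an assumption that follows from the revision-first idea, not a stated rule, and the field types and sample values are illustrative.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DocumentRevision:
    """One stored version of a document's content."""
    id: int
    document_id: int
    body_html: str
    author: str
    created_utc: datetime
    revision_note: Optional[str] = None


rev = DocumentRevision(
    id=10,
    document_id=1,
    body_html="<p>First draft.</p>",
    author="alice",
    created_utc=datetime.now(timezone.utc),
)

# Editing a stored revision in place is rejected; a content change
# would be recorded as a new DocumentRevision instead.
try:
    rev.body_html = "<p>Edited in place.</p>"
    mutated = True
except FrozenInstanceError:
    mutated = False
```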
Fragments
A fragment is a reusable unit of document content.
Fragments allow shared language or repeated content to be managed separately from full documents.
Typical fragment fields include:
| Field | Purpose |
|---|---|
| id | stable fragment identifier |
| name | fragment name |
| current_revision_id | pointer to the current fragment revision |
| created_utc | creation timestamp |
| updated_utc | last update timestamp |
Fragments follow the same identity-plus-revision pattern as documents.
Fragment revisions
A fragment revision stores a specific version of a fragment's content.
Typical fragment revision fields include:
| Field | Purpose |
|---|---|
| id | fragment revision identifier |
| fragment_id | owning fragment |
| body_html | canonical HTML fragment body |
| revision_note | optional revision summary |
| author | revision author |
| created_utc | revision creation timestamp |
Fragments and documents share the same HTML storage principle.
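The symmetry between documents and fragments can be sketched with a single lookup helper that works for both. The `Revision` class, the `current_body` helper, and the in-memory dictionaries are hypothetical names used only for illustration; they are not part of the WEIC model.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


class HasCurrentRevision(Protocol):
    """Anything that points at its current revision (documents and fragments)."""
    current_revision_id: Optional[int]


@dataclass(frozen=True)
class Revision:
    id: int
    owner_id: int      # stands in for document_id or fragment_id
    body_html: str


@dataclass
class Document:
    id: int
    title: str
    current_revision_id: Optional[int] = None


@dataclass
class Fragment:
    id: int
    name: str
    current_revision_id: Optional[int] = None


def current_body(entity: HasCurrentRevision,
                 revisions: Dict[int, Revision]) -> Optional[str]:
    """Return the HTML body of the entity's current revision, if it has one."""
    if entity.current_revision_id is None:
        return None
    return revisions[entity.current_revision_id].body_html


doc_revisions = {1: Revision(1, 100, "<p>Guide body</p>")}
frag_revisions = {1: Revision(1, 200, "<p>Shared disclaimer</p>")}

doc = Document(id=100, title="Guide", current_revision_id=1)
frag = Fragment(id=200, name="disclaimer", current_revision_id=1)

doc_body = current_body(doc, doc_revisions)
frag_body = current_body(frag, frag_revisions)
```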
Tags
A tag is a reusable classification label applied to documents.
Typical tag fields include:
| Field | Purpose |
|---|---|
| id | tag identifier |
| name | tag name |
Tags should remain simple and reusable.
Document tags
A document_tag record associates a document with a tag.
Typical document_tag fields include:
| Field | Purpose |
|---|---|
| document_id | associated document |
| tag_id | associated tag |
This model supports many-to-many classification without embedding tag lists directly in document records.
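A minimal sketch of the association, using Python's built-in sqlite3 module with an in-memory database. The choice of SQLite, the composite primary key, the UNIQUE constraint on tag names, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE document_tags (
    document_id INTEGER NOT NULL REFERENCES documents(id),
    tag_id      INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (document_id, tag_id)   -- one row per document/tag pair
);
""")

conn.executemany("INSERT INTO documents (id, title) VALUES (?, ?)",
                 [(1, "Style Guide"), (2, "Release Policy")])
conn.executemany("INSERT INTO tags (id, name) VALUES (?, ?)",
                 [(1, "policy"), (2, "editorial")])
conn.executemany("INSERT INTO document_tags (document_id, tag_id) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])


def titles_for_tag(conn: sqlite3.Connection, tag_name: str) -> list:
    """Return titles of all documents carrying the given tag, sorted by title."""
    rows = conn.execute(
        """
        SELECT d.title
        FROM documents d
        JOIN document_tags dt ON dt.document_id = d.id
        JOIN tags t ON t.id = dt.tag_id
        WHERE t.name = ?
        ORDER BY d.title
        """,
        (tag_name,),
    ).fetchall()
    return [row[0] for row in rows]


policy_titles = titles_for_tag(conn, "policy")
editorial_titles = titles_for_tag(conn, "editorial")
```

Under these assumptions, one document can carry several tags and one tag can apply to several documents, while the composite key prevents the same pair from being recorded twice.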
Entity relationships
The core relationships are:
document
└─ many document_revisions
document
└─ many document_tags
   └─ one tag
fragment
└─ many fragment_revisions
Current state is represented through pointer fields such as current_revision_id.
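The six core entities and their relationships could be expressed as a relational schema. The sketch below uses SQLite through Python's sqlite3 module; the database engine, column types, nullability, and timestamp storage as text are assumptions, since the model does not specify a storage engine.

```python
import sqlite3

SCHEMA = """
CREATE TABLE documents (
    id                  INTEGER PRIMARY KEY,
    parent_id           INTEGER REFERENCES documents(id),
    title               TEXT NOT NULL,
    slug                TEXT NOT NULL,
    owner               TEXT,
    status              TEXT NOT NULL,
    current_revision_id INTEGER REFERENCES document_revisions(id),
    created_utc         TEXT NOT NULL,
    updated_utc         TEXT NOT NULL
);
CREATE TABLE document_revisions (
    id            INTEGER PRIMARY KEY,
    document_id   INTEGER NOT NULL REFERENCES documents(id),
    body_html     TEXT NOT NULL,
    revision_note TEXT,
    author        TEXT,
    created_utc   TEXT NOT NULL
);
CREATE TABLE fragments (
    id                  INTEGER PRIMARY KEY,
    name                TEXT NOT NULL,
    current_revision_id INTEGER REFERENCES fragment_revisions(id),
    created_utc         TEXT NOT NULL,
    updated_utc         TEXT NOT NULL
);
CREATE TABLE fragment_revisions (
    id            INTEGER PRIMARY KEY,
    fragment_id   INTEGER NOT NULL REFERENCES fragments(id),
    body_html     TEXT NOT NULL,
    revision_note TEXT,
    author        TEXT,
    created_utc   TEXT NOT NULL
);
CREATE TABLE tags (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE document_tags (
    document_id INTEGER NOT NULL REFERENCES documents(id),
    tag_id      INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (document_id, tag_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves enforcement off by default
conn.executescript(SCHEMA)

table_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

Because current_revision_id is nullable in this sketch, a document or fragment row can be created first and pointed at a revision once one exists.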
Current revision model
The current document state is represented by a pointer to a revision.
This means:
- the document record remains stable
- content changes create new revisions
- the active version is chosen by current_revision_id
This model is preferred over updating a single mutable body field in place.
The same rule applies to fragments.
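The pointer update can be sketched as follows. The helper names `add_revision` and `current_body`, the reduced table layout, and the use of SQLite are assumptions for illustration. The insert and the pointer update share one transaction so the pointer never refers to a revision that failed to save.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def add_revision(conn: sqlite3.Connection, document_id: int, body_html: str,
                 author: str, note: Optional[str] = None) -> int:
    """Store a new revision and make it the document's current one."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back if either statement fails
        cur = conn.execute(
            "INSERT INTO document_revisions "
            "(document_id, body_html, revision_note, author, created_utc) "
            "VALUES (?, ?, ?, ?, ?)",
            (document_id, body_html, note, author, now),
        )
        revision_id = cur.lastrowid
        conn.execute(
            "UPDATE documents SET current_revision_id = ?, updated_utc = ? "
            "WHERE id = ?",
            (revision_id, now, document_id),
        )
    return revision_id


def current_body(conn: sqlite3.Connection, document_id: int) -> Optional[str]:
    """Follow current_revision_id to the active HTML body."""
    row = conn.execute(
        "SELECT r.body_html FROM documents d "
        "JOIN document_revisions r ON r.id = d.current_revision_id "
        "WHERE d.id = ?",
        (document_id,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (
    id INTEGER PRIMARY KEY, title TEXT NOT NULL,
    current_revision_id INTEGER, updated_utc TEXT
);
CREATE TABLE document_revisions (
    id INTEGER PRIMARY KEY, document_id INTEGER NOT NULL,
    body_html TEXT NOT NULL, revision_note TEXT, author TEXT,
    created_utc TEXT NOT NULL
);
""")
conn.execute("INSERT INTO documents (id, title) VALUES (1, 'Style Guide')")

first = add_revision(conn, 1, "<p>Draft</p>", "alice")
second = add_revision(conn, 1, "<p>Final</p>", "bob", note="Tightened wording")

revision_count = conn.execute(
    "SELECT COUNT(*) FROM document_revisions WHERE document_id = 1"
).fetchone()[0]
```

After both calls the document row is unchanged apart from its pointer and timestamp, and both revisions remain in history.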
Metadata location
The data model separates stable metadata from changing content.
Document-level metadata
Document-level metadata includes:
- title
- slug
- hierarchy placement
- owner
- status
- current revision pointer
Revision-level metadata
Revision records carry the versioned content together with the metadata that describes it:
- body_html
- revision note
- author
- created timestamp
This separation preserves clean boundaries between identity and versioned content.
HTML body storage
Document and fragment body content is stored as HTML in revision records.
This means:
- documents do not store Markdown as canonical content
- revisions preserve authored HTML
- fragments use the same canonical format as documents
- rendering, export, and indexing begin from HTML
This model aligns with the broader WEIC HTML storage decision.
Search and indexing implications
Search and indexing should derive from the current HTML revision for a document or fragment.
This means indexing pipelines should:
- select the current revision
- parse or sanitize HTML content
- extract searchable text
- combine searchable body content with document metadata
Search should not depend on a separate Markdown source.
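The indexing steps above can be sketched with Python's standard html.parser module. The function names, the dictionary-based records, and the shape of the index entry are hypothetical; a production pipeline would likely use a dedicated sanitizer and a real search backend.

```python
from html.parser import HTMLParser
from typing import Dict, List


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Join chunks with spaces, then collapse runs of whitespace.
        # A word split by inline tags would gain a space; acceptable for a sketch.
        return " ".join(" ".join(self._chunks).split())


def extract_text(body_html: str) -> str:
    """Reduce an HTML body to plain searchable text."""
    parser = _TextExtractor()
    parser.feed(body_html)
    parser.close()
    return parser.text()


def build_index_entry(document: Dict, revisions: Dict[int, Dict]) -> Dict:
    """Combine document metadata with text from the current revision only."""
    current = revisions[document["current_revision_id"]]
    return {
        "document_id": document["id"],
        "title": document["title"],
        "slug": document["slug"],
        "owner": document["owner"],
        "status": document["status"],
        "text": extract_text(current["body_html"]),
    }


document = {"id": 1, "title": "Style Guide", "slug": "style-guide",
            "owner": "editorial", "status": "published",
            "current_revision_id": 2}
revisions = {
    1: {"body_html": "<p>Old text</p>"},
    2: {"body_html": "<h1>Style</h1><p>Use <em>plain</em> words.</p>"
                     "<script>track()</script>"},
}

entry = build_index_entry(document, revisions)
```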
Future extensibility
The initial data model is intentionally focused, but it is designed to support later extensions.
Likely future entities include:
- review workflows
- approval or publication records
- attachments
- evidence associations
- cross-document relationships
These can be added without changing the core identity and revision model.
Design principles
The document data model follows several principles.
Stable identity
A document or fragment remains the same logical entity even as its content changes.
Revision-first storage
Content changes are represented by new revisions rather than by mutating current content in place.
Canonical HTML
Body content is stored in HTML to preserve structure and rendering fidelity.
Fragment symmetry
Fragments should behave like lightweight documents using the same core revision ideas.
Extensible metadata
The model should support future governance and compliance needs without rewriting core entities.
Relationship to the Oryvin plan
WEIC is the knowledge core of the Oryvin ecosystem. The document data model defines the structure that stores that knowledge.
document identity
↓
revisioned HTML content
↓
rendering, export, workflow use, and evidence
Because every downstream use starts from the same revisioned HTML, the WEIC storage layer stays predictable to build against.