HTML Storage Model

This document defines the HTML storage model used by WildEditorInChief (WEIC).

WEIC stores document content in HTML as its canonical internal format. HTML is the authoritative representation of document body content. Markdown may appear as an import or publication format in some workflows, but it is not the primary storage format inside WEIC.

This decision aligns WEIC with the long-term goals of structured authoring, controlled rendering, fragment reuse, revision management, and flexible export.

Purpose

The HTML storage model exists to support several goals:

preserve document content in a rich, structured format
support controlled editing and rendering
allow fragment reuse within HTML documents
enable revision storage without lossy format conversion
support downstream export to other formats when required

The key principle is that WEIC should store the document as it is authored rather than converting it into a less expressive intermediate representation.

Canonical document format

The canonical body format for WEIC documents is HTML.

This means:

document body content is stored as HTML
revisions preserve HTML body content
rendering is based on stored HTML
exports begin from HTML rather than from Markdown

Markdown is not the canonical format.

Markdown may still appear in some contexts, such as:

bootstrap documentation repositories
imports from external sources
publication workflows
conversion pipelines

However, once content is stored in WEIC, the authoritative form is HTML.

Why HTML

HTML was selected as the canonical format because it supports the needs of a controlled documentation system better than plain Markdown.

Advantages include:

richer document structure
precise control over formatting and semantics
better support for embedded structure and reusable fragments
easier round-trip editing in browser-based editors
more direct rendering and export behavior

This model also aligns naturally with editor technologies that already produce structured HTML output.

Document structure

A WEIC document consists of structured metadata and body content.

Typical document elements include:

document identifier
title
slug
hierarchy placement
ownership metadata
status metadata
current revision reference
HTML body content stored in revisions

The document record itself should contain document identity and placement metadata, while content belongs to revisions.

This separation keeps document identity stable while allowing content to evolve over time.
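The separation described above can be sketched as a record type. This is a minimal illustration, not the WEIC schema: the class name, field names, and types are assumptions chosen to mirror the element list in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    """Identity and placement metadata only (hypothetical field names).

    The record deliberately has no body field: HTML content lives in
    revisions, and the document only points at the current one.
    """
    document_id: str
    title: str
    slug: str
    parent_id: Optional[str]   # hierarchy placement; None for a root document
    owner: str                 # ownership metadata
    status: str                # status metadata, e.g. "draft"
    current_revision_id: Optional[str] = None


doc = Document(
    document_id="doc-1",
    title="Access Policy",
    slug="access-policy",
    parent_id=None,
    owner="security-team",
    status="draft",
)

# Content changes never touch the identity fields; only the pointer moves.
doc.current_revision_id = "rev-1"
```

Keeping the body out of the document record means a title or slug change and a content change are independent operations.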

Revision storage

WEIC uses revision-based storage.

Each content change creates a new revision record. A revision stores the document body in HTML along with revision metadata.

Typical revision elements include:

revision identifier
document identifier
created timestamp
author
revision notes, where applicable
HTML body content

This model provides:

historical traceability
safe editing
rollback capability
comparison between revisions
controlled publication of approved content

The current document state is represented by the current revision reference rather than by mutating a single body field in place.
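The following sketch shows one way the revision model above could behave. It is an in-memory illustration under stated assumptions: `Revision`, `RevisionStore`, and their method names are hypothetical, and rollback is modeled here as moving the current revision reference, which is one possible interpretation rather than a confirmed WEIC behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Revision:
    """Immutable snapshot of a document body (hypothetical field names)."""
    revision_id: int
    document_id: str
    created_at: datetime
    author: str
    html_body: str
    notes: Optional[str] = None


class RevisionStore:
    """Append-only revision log with a current pointer per document."""

    def __init__(self) -> None:
        self._ids = count(1)
        self._revisions: Dict[int, Revision] = {}
        self._current: Dict[str, int] = {}

    def save(self, document_id: str, author: str, html_body: str,
             notes: Optional[str] = None) -> Revision:
        # Every content change appends a new record; nothing is overwritten.
        rev = Revision(next(self._ids), document_id,
                       datetime.now(timezone.utc), author, html_body, notes)
        self._revisions[rev.revision_id] = rev
        self._current[document_id] = rev.revision_id
        return rev

    def current(self, document_id: str) -> Revision:
        return self._revisions[self._current[document_id]]

    def history(self, document_id: str) -> List[Revision]:
        # Dicts preserve insertion order, so this is oldest first.
        return [r for r in self._revisions.values()
                if r.document_id == document_id]

    def rollback(self, document_id: str, revision_id: int) -> None:
        # Assumption: rollback re-points the reference and keeps all history.
        rev = self._revisions[revision_id]
        if rev.document_id != document_id:
            raise ValueError("revision belongs to a different document")
        self._current[document_id] = revision_id


store = RevisionStore()
first = store.save("doc-1", "alice", "<p>Version one.</p>")
store.save("doc-1", "bob", "<p>Version two.</p>", notes="Reworded")
store.rollback("doc-1", first.revision_id)
```

After the rollback, both revisions still exist in the history; only the current revision reference has changed.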

Fragment model

WEIC supports reusable fragments that can be inserted into documents.

Fragments are also stored as HTML.

This allows fragments to preserve rich structure and formatting without conversion.

Typical fragment properties include:

fragment identifier
fragment name
current revision reference
HTML body content in fragment revisions

A fragment may represent:

standardized language
reusable policy text
common operational instructions
shared document sections

The HTML storage model makes fragment reuse predictable because inserted content already exists in the same canonical format as the parent document.

Fragment inclusion behavior

Fragment use should remain explicit.

A document may:

embed fragment content directly at render time
store fragment references for later expansion
support controlled materialization for export or publication workflows

The important architectural point is that fragments and documents share the same canonical body format.

This avoids the complexity of mixing Markdown fragments into HTML documents or vice versa.
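As an illustration of storing fragment references and expanding them later, the sketch below replaces a placeholder element with the fragment's HTML. The `<weic-fragment ref="...">` placeholder syntax and the `materialize` function are hypothetical; this document does not define how references are encoded.

```python
import re
from typing import Mapping

# Hypothetical placeholder syntax: <weic-fragment ref="some-id"></weic-fragment>
FRAGMENT_REF = re.compile(r'<weic-fragment\s+ref="([^"]+)"\s*></weic-fragment>')


def materialize(html_body: str, fragments: Mapping[str, str]) -> str:
    """Expand fragment references into the fragments' current HTML bodies.

    Single pass only: references inside an inserted fragment are not
    expanded, which keeps inclusion explicit and avoids cycles.
    """
    def expand(match: "re.Match[str]") -> str:
        fragment_id = match.group(1)
        if fragment_id not in fragments:
            raise KeyError(f"unknown fragment: {fragment_id}")
        return fragments[fragment_id]

    return FRAGMENT_REF.sub(expand, html_body)


fragments = {"legal-notice": "<p>Confidential. Internal use only.</p>"}
stored = '<h1>Access Policy</h1><weic-fragment ref="legal-notice"></weic-fragment>'
expanded = materialize(stored, fragments)
```

Because both sides are HTML, expansion is plain substitution with no format conversion between fragment and parent document.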

Rendering model

Rendering in WEIC begins from stored HTML.

Typical rendering paths include:

editing in an HTML-capable editor
displaying rendered document content in the application
publishing HTML output
transforming HTML to other target formats

Because HTML is already the stored format, rendering does not require an intermediate conversion step from Markdown.

This reduces ambiguity and preserves structure more reliably.

Editing model

WEIC editing is expected to operate on structured HTML content.

This does not mean users edit raw HTML directly in all cases. In practice, an editor may provide a rich text or structured interface while still producing HTML as the persisted result.

This approach supports:

browser-based editing
consistent round-trip behavior
preservation of document structure
controlled sanitization and validation

The persisted output remains HTML even if the editing interface abstracts the markup from the user.

Export model

Exports should begin from canonical HTML content.

Possible export targets include:

HTML publication
PDF generation
DOCX generation
Markdown export where required

Because HTML is the canonical source, exports are treated as derived formats rather than as the authoritative record.

This keeps the storage model simple:

authoring
   ↓
HTML revision storage
   ↓
rendering and export

Rather than:

authoring
   ↓
Markdown storage
   ↓
HTML conversion
   ↓
export conversion

The HTML-first model reduces unnecessary conversion chains.
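The idea that every export is derived from canonical HTML can be sketched as a small registry of exporter functions. Everything here is an assumption for illustration: the registry, the function names, and the deliberately tiny Markdown exporter, which handles only headings and paragraphs and discards all other markup.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List

Exporter = Callable[[str], str]
EXPORTERS: Dict[str, Exporter] = {}


def exporter(name: str) -> Callable[[Exporter], Exporter]:
    """Register a function that derives one target format from HTML."""
    def register(fn: Exporter) -> Exporter:
        EXPORTERS[name] = fn
        return fn
    return register


@exporter("html")
def export_html(html_body: str) -> str:
    # HTML publication: wrap the stored body in a minimal page shell.
    return f"<!DOCTYPE html>\n<html><body>{html_body}</body></html>"


class _MarkdownWriter(HTMLParser):
    PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}
    BLOCKS = {"h1", "h2", "h3", "p"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(self.PREFIX.get(tag, ""))

    def handle_endtag(self, tag):
        if tag in self.BLOCKS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


@exporter("markdown")
def export_markdown(html_body: str) -> str:
    # Lossy by design: the export is a derived view, not the record.
    writer = _MarkdownWriter()
    writer.feed(html_body)
    writer.close()
    return "".join(writer.parts).strip()


def export(html_body: str, target: str) -> str:
    if target not in EXPORTERS:
        raise ValueError(f"unsupported export target: {target}")
    return EXPORTERS[target](html_body)


markdown = export("<h1>Title</h1><p>Body text.</p>", "markdown")
```

Each exporter takes stored HTML as its only input, so adding a target such as PDF or DOCX would not change how content is stored.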

Relationship to EIC

EditorInChief (EIC) is the knowledge editor that preceded WEIC.

EIC was originally designed to organize and manage policies and to gather evidence for HITRUST certification.

WEIC extends that model into a broader documentation and knowledge platform.

The HTML storage decision reflects lessons from that evolution:

document structure matters
revision fidelity matters
reusable content matters
rendered output matters

Using HTML as the canonical format supports these requirements more directly than Markdown.

Search and indexing implications

Because HTML is the canonical body format, indexing systems should extract searchable text from sanitized HTML content.

This means search pipelines should:

parse stored HTML
extract text content
preserve structural cues where useful
index metadata and body content separately where appropriate

Search should not rely on the existence of Markdown source.

The system should treat HTML as the source of truth for indexing and retrieval.
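The pipeline steps above can be sketched with Python's standard `html.parser` module. The `TextExtractor` class and `extract_text` function are hypothetical names, and treating headings as the structural cue worth preserving is an assumption; a real pipeline might keep other cues.

```python
from html.parser import HTMLParser
from typing import List, Tuple

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCKS = HEADINGS | {"p", "div", "li", "ul", "ol", "tr", "td", "th",
                     "br", "blockquote", "section"}


class TextExtractor(HTMLParser):
    """Collects body text and, separately, heading text."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []
        self.headings: List[str] = []
        self._heading_buf: List[str] = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKS:
            self.parts.append(" ")   # keep words in adjacent blocks apart
        if tag in HEADINGS:
            self._in_heading = True
            self._heading_buf = []

    def handle_endtag(self, tag):
        if tag in BLOCKS:
            self.parts.append(" ")
        if tag in HEADINGS and self._in_heading:
            self.headings.append(" ".join("".join(self._heading_buf).split()))
            self._in_heading = False

    def handle_data(self, data):
        self.parts.append(data)
        if self._in_heading:
            self._heading_buf.append(data)


def extract_text(html_body: str) -> Tuple[str, List[str]]:
    """Return (whitespace-normalized text, list of heading strings)."""
    parser = TextExtractor()
    parser.feed(html_body)
    parser.close()
    return " ".join("".join(parser.parts).split()), parser.headings


text, headings = extract_text(
    "<h1>Access Policy</h1><p>All users must use <strong>MFA</strong>.</p>"
)
```

The heading list can be indexed as a separate, higher-weighted field alongside the body text, which is one way to preserve structural cues.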

Sanitization and validation

Because HTML is stored directly, WEIC must apply controlled sanitization and validation rules.

Typical goals include:

remove disallowed markup
preserve approved structural elements
prevent unsafe script or embedded content
enforce predictable document structure

The storage model assumes HTML is controlled, not arbitrary.

This control is what makes HTML safe and reliable as a canonical format.
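An allowlist approach to the goals above might look like the following. This is a sketch under stated assumptions, not a vetted sanitizer: the allowed tags, allowed attributes, and URL prefixes are hypothetical policy choices, and the code does not repair unbalanced markup.

```python
from html import escape
from html.parser import HTMLParser
from typing import List

# Hypothetical policy; the real allowlist is not defined in this document.
ALLOWED_TAGS = {"p", "h1", "h2", "h3", "ul", "ol", "li", "strong", "em",
                "a", "table", "tr", "td", "th", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
DROP_WITH_CONTENT = {"script", "style", "iframe", "object"}
VOID_TAGS = {"br"}
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:", "/", "#")


class Sanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: List[str] = []
        self._skip_depth = 0   # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self._skip_depth += 1
            return
        if self._skip_depth or tag not in ALLOWED_TAGS:
            return   # disallowed tag is removed; its text content is kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue   # rejects javascript: and other unknown schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
            return
        if self._skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_body: str) -> str:
    parser = Sanitizer()
    parser.feed(html_body)
    parser.close()
    return "".join(parser.out)


clean = sanitize(
    '<p onclick="x()">Hello <b>world</b><script>alert(1)</script></p>'
    '<a href="javascript:alert(1)">bad</a><a href="https://example.com">ok</a>'
)
```

Anything not explicitly allowed is dropped, so new or unknown markup fails closed rather than passing through into stored revisions.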

Design principles

The HTML storage model follows several principles.

Canonical richness

Store the document in a format rich enough to preserve real structure.

Revision fidelity

Revisions should preserve authored content without unnecessary conversion loss.

Format separation

Canonical storage is the authoritative record; export formats are derived from it.

Fragment consistency

Documents and fragments should share the same storage model.

Controlled rendering

Stored content should be renderable in predictable ways across editing, viewing, and publication.

Relationship to the Oryvin plan

WEIC acts as the knowledge core of the Oryvin ecosystem. The HTML storage model defines how that knowledge is stored internally.

authored knowledge
        ↓
HTML document and fragment revisions
        ↓
rendering, publication, and export
        ↓
automation and operational use

This decision affects document governance, publication, fragment reuse, and long-term system evolution.