Dossier — The Knowledge Model (v0)

knowledge-model

index · read as: Orient · confidence: asserted · status: active · 2026-06-14
source: Dossier design; conforms to OKF v0.1 (markdown + YAML, single mandatory `type` field)

The Knowledge Model (v0)

The model is how Dossier turns messy source material (sites, SharePoint, files, tribal know-how) into an atomic, single-source-of-truth, OKF-native institutional memory that the client owns and that serves as both a human wiki and an agent-queryable graph.

Design principles

  1. Atomic. One concept per file. The file is the unit of truth and the unit of reuse. If two facts can change independently, they are two files.
  2. Single source of truth. A fact lives in exactly one place; everything else links to it ([[id]]). Never copy — reference.
  3. OKF-native. Markdown + YAML frontmatter; only type is mandatory. cat-able, git clone-able, no SDK. (See Adopt OKF as Dossier's canonical knowledge format.)
  4. Dual-surface. The same file reads cleanly for humans and parses cleanly for agents. One artifact, two audiences.
  5. Relationships are first-class. Typed edges in frontmatter (machine: builds the graph) + [[wiki-links]] in the body (human: navigation). Same edge, two surfaces → GraphRAG for free.
  6. Sovereign & legible. Plain files in the client's git; stable slugs as permanent addresses; index.how-grade URL legibility.
  7. Capture judgment, not just facts. Decisions and rationale are concept types, because the tacit "how we decide things here" is the IP that walks out the door.
  8. Provenance always. Every atom records where it came from and how sure we are. Auditable memory is trustable memory.

Base frontmatter (every concept)

| Field | Req | Meaning |
|---|---|---|
| type | | The concept type (below); the only OKF-mandatory field. Custom types allowed. |
| id | | Stable unique slug. The permanent address; never recycle. |
| title | | Human label. |
| description | | One-line summary (used for recall/relevance). |
| resource | | URI of the underlying real asset (the SharePoint doc, Figma file, repo). Provenance link. |
| owner | | Role id accountable for this concept (edge). |
| status | | draft \| active \| deprecated \| superseded |
| confidence | | verified (evidence) \| asserted (human judgment) \| inferred (LLM-derived) |
| source | | Provenance: connector/url/human the knowledge was extracted from. |
| tags | | Free tags. |
| timestamp | | ISO-8601 of last meaningful change. |
| supersedes / superseded_by | | Versioning edges. |

★ = strongly recommended · ○ = optional
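A minimal sketch of a base-frontmatter check follows. It assumes each atom has already been parsed into a plain object. `checkBase` and the slug pattern are hypothetical names and are not the `@dossier/okf` validator. The status and confidence values come from the table above. Task atoms use their own status lifecycle, so a real validator would branch on `type` before applying the base enum.

```typescript
// Hypothetical sketch, not the @dossier/okf validator: checkBase and SLUG are
// illustrative names. Enum values come from the base frontmatter table.
const STATUSES: readonly string[] = ["draft", "active", "deprecated", "superseded"];
const CONFIDENCES: readonly string[] = ["verified", "asserted", "inferred"];
// Assumed slug shape: lowercase letters and digits separated by single hyphens.
const SLUG = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Returns human-readable problems; an empty array means the base fields pass.
// Only `type` is required, so every other field is checked only when present.
function checkBase(fm: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const conceptType = fm["type"];
  if (typeof conceptType !== "string" || conceptType.length === 0) {
    problems.push("type is required");
  }
  const id = fm["id"];
  if (id !== undefined && (typeof id !== "string" || !SLUG.test(id))) {
    problems.push("id must be a stable slug");
  }
  const status = fm["status"];
  if (status !== undefined && !STATUSES.includes(String(status))) {
    problems.push(`unknown status: ${String(status)}`);
  }
  const confidence = fm["confidence"];
  if (confidence !== undefined && !CONFIDENCES.includes(String(confidence))) {
    problems.push(`unknown confidence: ${String(confidence)}`);
  }
  return problems;
}
```

Because only `type` is mandatory, a custom vertical type passes this check without registering anything extra.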

Core concept types (the spine)

| type | Captures | Key fields |
|---|---|---|
| role | A position/function: the "who's accountable" layer | responsibilities, decision_rights, reports_to, members |
| process | A standardized, repeatable unit of work: "how we do X" | trigger, inputs, outputs, owner, uses (systems), governed_by (policies), inline steps |
| workflow | An orchestration connecting processes/roles/systems end-to-end: the connective tissue | trigger, stages[] (ordered: process + responsible role + system + decision points), outcomes |
| decision | A judgment record (ADR-style): the tacit "why" | context, options_considered, decision, rationale, decided_by, reversibility, consequences, review_date |

process vs workflow: a process is a node (a documented procedure); a workflow is the path through nodes (the orchestration). A workflow references processes; a process never references a workflow.

Supporting concept types

| type | Captures | Key fields |
|---|---|---|
| system | A tool/software/platform the org uses (Salesforce, Figma, SharePoint). Ties workflows to tooling and to ingestion connectors. | |
| policy | A rule/standard/constraint/governance (entitlements, compliance); the governance layer. | |
| artifact | A deliverable/template/output (proposal template, brand guide). | |
| term | An atomic glossary definition; domain vocabulary. | |
| task | A single unit of work on the agentic board: the work item a bounded loop picks up, transitions, and hands off (Agentic "sprint board" architecture — a git-resident OKF task board worked by bounded, hook-governed Agent SDK loops). One atomic file per task; its frontmatter carries its own live state, so the git board is the source of truth. | status, priority, assignee, dependencies, acceptance_criteria, claimed_by, lease_expires (owner = base) |
| concept | Generic fallback for anything not yet typed (OKF base type). | |

Extensible by design. Because only type is mandatory, verticals add their own types freely — e.g. a Digital Experience Agency vertical might add engagement, client, capability. The model is a starting taxonomy, not a cage.

The task type — work items on the git board

A task is the OKF atom that DEC-0024's agentic board is built from: one markdown+YAML file per work item, stored in the client's own git, so the board carries no off-disk state. It conforms exactly to the base frontmatter (id immutable, owner = accountable role, source/confidence/timestamp provenance) and adds a small mutable spine:

| Field | Req | Meaning |
|---|---|---|
| status | | backlog \| claimed \| in_progress \| review \| done \| blocked. Free-text by design (verticals may add e.g. on-hold-legal); the baseline lifecycle is the convention, not a hard enum. |
| priority | | Relative urgency: p0 \| p1 \| p2 (or critical/high/medium/low). |
| owner | | The role accountable for the task's direction (base field, edge → role). |
| assignee | | Who is actually doing the work: a role id or a concrete agent/person handle. Mirrors the process owner-vs-doer split. |
| dependencies | | Prerequisite task ids (edge → task). A suggestion of ordering, not an automatic hold; agents and humans must check and decide. Dangling deps are graph-lint findings, never board breakers. |
| acceptance_criteria | | The done-definition: a checklist the review gate is judged against. |
| claimed_by | | The agent/person id currently holding the task (coordination, not ownership). Half of the claim/lease pair. |
| lease_expires | | ISO-8601 expiry of the claim. A PreToolUse hook denies edits that breach a live lease and treats an expired lease as reclaimable; governance is deterministic (hook verdict), not model-trusted (Agentic "sprint board" architecture — a git-resident OKF task board worked by bounded, hook-governed Agent SDK loops §5). |
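The lease rule is small enough to express as a pure function. This sketch assumes the hook has already loaded the task's frontmatter and knows the acting agent's id. `leaseVerdict` and its allow/deny result are hypothetical. The real PreToolUse hook's input and output shapes are not shown here. Treating an unparseable lease as live (failing closed) is also an assumption.

```typescript
// Hypothetical sketch of the lease decision a PreToolUse hook could make before an
// edit to a task file. Only claimed_by / lease_expires come from the task schema.
interface TaskLease {
  claimed_by?: string;
  lease_expires?: string; // quoted ISO-8601 date-time, e.g. "2026-06-16T14:30:00Z"
}

type Verdict = "allow" | "deny";

function leaseVerdict(task: TaskLease, actor: string, now: Date): Verdict {
  // No claim/lease pair on the task: there is no live lease to breach.
  if (!task.claimed_by || !task.lease_expires) return "allow";
  // The current holder may keep working on its own claim.
  if (task.claimed_by === actor) return "allow";
  const expires = Date.parse(task.lease_expires);
  // Assumption: an unparseable lease counts as live, so the hook fails closed.
  if (Number.isNaN(expires)) return "deny";
  // Expired lease: reclaimable. Live lease: the competing edit is denied.
  return now.getTime() >= expires ? "allow" : "deny";
}
```

The verdict depends only on the frontmatter, the actor and the clock. The same inputs always give the same answer, which is the sense in which governance here is deterministic rather than model-trusted.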

Conventions that keep the board atomic and legible:

  • One task per file, id-stamped and immutable (slug never recycled; retire via status: done + optional superseded_by, never rename/delete). Task ids take a task-… prefix so they never collide with decision/role ids in the route map.
  • State lives in the atom, nowhere else. There is no separate claims table or lock service; status / claimed_by / lease_expires are frontmatter edits + git commits. The git history is the audit trail.
  • Tasks live under tasks/ by convention. The loader is directory-agnostic (it globs **/*.md), but the convention keeps boards discoverable and tools optimized; a board surface queries type === 'task'.
  • Provenance always. Every task records source (how it was created — a curator, a Jira import), confidence (asserted when human-curated, inferred when agent-generated), and timestamp (last meaningful change).
  • Quote full date-times. A lease_expires (or any …T…Z date-time) is written quoted ("2026-06-16T14:30:00Z") so it parses as a string, not a YAML timestamp — the field is z.string(). A date-only value (2026-06-14) needs no quoting.
  • Verticals extend, never fork. A client needing custom fields registers a vertical task type (e.g. a DXA dxa-task adding service_tier/billable_hours) via the same registerType(defineType(...)) path the other verticals use — single source of truth preserved.
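A board surface can be sketched as a filter over whatever the loader returned. This assumes a hypothetical `Atom` shape with already-parsed frontmatter. `buildBoard` is an illustrative name. Placing tasks that have no status in a `backlog` column is an assumption, not a documented default.

```typescript
// Hypothetical board surface over loaded atoms. The loader (which globs **/*.md)
// is not shown; Atom and buildBoard are illustrative names.
interface Atom {
  type: string;
  id: string;
  status?: string;
}

interface Board {
  columns: Map<string, string[]>; // status -> task ids, in load order
  findings: string[];             // lint findings; never stop the board from rendering
}

function buildBoard(atoms: Atom[]): Board {
  const columns = new Map<string, string[]>();
  const findings: string[] = [];
  for (const atom of atoms) {
    if (atom.type !== "task") continue; // the board queries type === 'task' only
    if (!atom.id.startsWith("task-")) {
      findings.push(`${atom.id}: task ids should take the task- prefix`);
    }
    // status is free text, so a vertical value like on-hold-legal just gets its own column.
    const status = atom.status ?? "backlog"; // assumed default for a missing status
    const column = columns.get(status) ?? [];
    column.push(atom.id);
    columns.set(status, column);
  }
  return { columns, findings };
}
```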

process vs workflow vs task. A process is a node (a documented, repeatable procedure — "how we do X", stable). A workflow is the path through nodes (the standing orchestration). A task is a transient unit of execution — a single, ephemeral work item with mutable state (claimed, in progress, done) that an agent or human actually performs now. A process answers "how is this done in general"; a task answers "do this specific thing, here is its live state." Tasks may reference the processes/decisions/roles they enact (via relates_to/informed_by/owner), but a process never references a task (the durable concept does not depend on the ephemeral work item).
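The direction rules can be checked mechanically: a process never references a workflow and never references a task. This sketch assumes a hypothetical `Atom` shape and inspects only a handful of core edge fields. `lintDirections` is an illustrative name, not part of the documented tooling.

```typescript
// Hypothetical direction lint: the durable process must not point at a workflow
// or at a transient task. Atom, EDGE_FIELDS and lintDirections are illustrative.
interface Atom {
  type: string;
  id: string;
  [field: string]: unknown;
}

// A subset of the core edge fields a process might carry.
const EDGE_FIELDS = ["owner", "uses", "governed_by", "produces", "relates_to", "informed_by"];
const FORBIDDEN_FROM_PROCESS = new Set(["workflow", "task"]);

// Collect string ids from single-valued or list-valued edge fields.
function refsOf(atom: Atom): string[] {
  const refs: string[] = [];
  for (const field of EDGE_FIELDS) {
    const value = atom[field];
    if (typeof value === "string") refs.push(value);
    else if (Array.isArray(value)) refs.push(...value.filter((v): v is string => typeof v === "string"));
  }
  return refs;
}

function lintDirections(atoms: Atom[]): string[] {
  const typeById = new Map(atoms.map((a) => [a.id, a.type] as const));
  const findings: string[] = [];
  for (const atom of atoms) {
    if (atom.type !== "process") continue; // workflows and tasks may reference processes
    for (const ref of refsOf(atom)) {
      const targetType = typeById.get(ref);
      if (targetType !== undefined && FORBIDDEN_FROM_PROCESS.has(targetType)) {
        findings.push(`${atom.id} -> ${ref}: a process must not reference a ${targetType}`);
      }
    }
  }
  return findings;
}
```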

Relationship vocabulary (edges → the graph)

Core vocabulary (every atom may carry these): owner/owned_by → role · reports_to → role · steps/stages → ordered process refs · uses → system · governed_by → policy · produces → artifact · informed_by/decided_in/decided_by → decision/role · defines/uses_term → term · relates_to → generic · supersedes/superseded_by → versioning.

Task edges (board): dependencies → prerequisite task ids (ordering hint, not a transitive hold — agents must check and decide); assignee → the role/person doing the work; claimed_by → the agent/person currently holding the task (coordination). owner stays the accountable role. Reverse readings (blocked_by as the inverse of dependencies, owned_by as the inverse of owner) are derived at read time, never authored twice — single source of truth.
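Deriving reverse readings at read time means only `dependencies` is ever authored. Below is a minimal sketch, assuming tasks are already loaded. `indexDependencies` is a hypothetical name. It builds the reverse index (for each task, the tasks that list it as a prerequisite) and reports dangling ids as findings instead of failing.

```typescript
// Hypothetical read-time index over authored `dependencies`. Only that field comes
// from the schema; Task, DependencyIndex and indexDependencies are illustrative.
interface Task {
  id: string;
  dependencies?: string[];
}

interface DependencyIndex {
  dependents: Map<string, string[]>; // prerequisite id -> ids of tasks that list it
  findings: string[];                // dangling references: reported, never fatal
}

function indexDependencies(tasks: Task[]): DependencyIndex {
  const known = new Set(tasks.map((t) => t.id));
  const dependents = new Map<string, string[]>();
  const findings: string[] = [];
  for (const task of tasks) {
    for (const dep of task.dependencies ?? []) {
      if (!known.has(dep)) {
        // A dangling dep is a graph-lint finding, not a board breaker.
        findings.push(`${task.id}: dependency ${dep} does not resolve to a task`);
        continue;
      }
      const list = dependents.get(dep) ?? [];
      list.push(task.id);
      dependents.set(dep, list);
    }
  }
  return { dependents, findings };
}
```

Nothing here holds or blocks a task. The index only informs the agent or human who decides.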

Vertical edges (registry-driven). The relationship vocabulary is not a closed list. A vertical adds its own traversable edges by declaring them on its registered type, through the same registerType(defineType(...)) path it uses to add the type itself, so verticals extend the graph rather than fork it. The edge field's name is the relationship label, and a registered edge is a plain typed edge: it never inherits produces semantics (DEC-0007 reserves produces to a process). The DXA vertical contributes:

  • capability.delivered_by → workflow/process (associative, deliberately not produces; DEC-0007)
  • client.systems → system, client.engagements → engagement
  • engagement.client → client, engagement.capabilities → capability, engagement.runs → workflow/process, engagement.sow → artifact

(See "The produces edge is canonical on the producing process only" and "OKF edge vocabulary is registry-driven — a vertical declares its own traversable edges".) Edge extraction (edges()) unions the core vocabulary with every registered vertical field; a core label always wins, so a vertical can never shadow a core relationship.
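The "core label always wins" rule falls out naturally if core labels are written into the label map last. In this sketch, `VerticalType`, `edgeLabels` and this `edges` signature are assumptions. They stand in for whatever registerType(defineType(...)) actually records. `CORE_EDGES` lists only part of the core vocabulary, and only plain string targets become edges, so structured stages[] entries are skipped.

```typescript
// Hypothetical sketch: VerticalType, edgeLabels and this edges() signature are assumptions,
// not the real registerType/defineType API. CORE_EDGES is a subset of the core vocabulary.
const CORE_EDGES: string[] = [
  "owner", "reports_to", "steps", "stages", "uses", "governed_by", "produces",
  "informed_by", "decided_in", "decided_by", "defines", "uses_term", "relates_to",
  "supersedes", "superseded_by",
];

interface VerticalType {
  type: string;         // e.g. "engagement"
  edgeFields: string[]; // field name = relationship label
}

interface AtomFields {
  id: string;
  type: string;
  [field: string]: unknown;
}

interface Edge {
  from: string;
  label: string;
  to: string;
  kind: "core" | "vertical";
}

// Union of every registered vertical edge field with the core vocabulary.
// Core labels are written last, so a vertical can never shadow a core relationship.
function edgeLabels(registry: VerticalType[]): Map<string, "core" | "vertical"> {
  const labels = new Map<string, "core" | "vertical">();
  for (const vertical of registry) {
    for (const field of vertical.edgeFields) labels.set(field, "vertical");
  }
  for (const label of CORE_EDGES) labels.set(label, "core");
  return labels;
}

function edges(atom: AtomFields, registry: VerticalType[]): Edge[] {
  const labels = edgeLabels(registry);
  const out: Edge[] = [];
  for (const [label, value] of Object.entries(atom)) {
    const kind = labels.get(label);
    if (kind === undefined) continue; // id, type, title, ... are not edges
    const targets: unknown[] = Array.isArray(value) ? value : [value];
    for (const to of targets) {
      // Only plain id strings become edges; structured entries are skipped.
      if (typeof to === "string") out.push({ from: atom.id, label, to, kind });
    }
  }
  return out;
}
```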

Edges live in frontmatter and as [[wiki-links]] in the body. The frontmatter feeds the knowledge graph (GraphRAG); the body feeds the human wiki.
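Because the same edge appears on both surfaces, a linter can compare them. Below is a minimal sketch of the body side. It assumes links take the [[target]] or [[target|label]] forms used in this document. `wikiLinks` is a hypothetical helper.

```typescript
// Hypothetical helper: collect wiki-link targets from an atom body, in first-seen
// order, ignoring any |label alias. The link syntax is assumed from this document.
function wikiLinks(body: string): string[] {
  const pattern = /\[\[([^\]|]+)(?:\|[^\]]*)?\]\]/g;
  const targets: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(body)) !== null) {
    const target = match[1]?.trim();
    if (target && !targets.includes(target)) targets.push(target);
  }
  return targets;
}
```

Comparing the result with the targets of the frontmatter edges flags any relationship that exists on only one surface.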

Reserved files (OKF)

These are reserved structural types (type: index / type: log) — first-class in the schema, not coerced to concept. They organize the repo rather than capture a domain concept.

  • index.md (type: index) — overview/entry for a directory or composite concept.
  • log.md (type: log) — the atomic change/decision log for that directory or concept (audit trail at knowledge-atom granularity).

The judgment layer (why decision is core)

Facts (process, role) are the easy 80%. The IP is the judgment: why this vendor, why onboarding works this way, what we'd never do again. decision concepts capture that, and the log-auditor agent produces them continuously as the org (and this project) operates. Decisions link to the processes/workflows/roles they shaped, so the graph encodes not just what the org does but why: exactly the tacit knowledge that Nadella says models would otherwise commoditize away.

Example — a process atom

```markdown
---
type: process
id: client-onboarding
title: Client Onboarding
description: How a new enterprise client goes from signed SOW to active engagement.
owner: engagement-lead
uses: [salesforce, sharepoint]
governed_by: [data-handling-policy]
produces: [kickoff-deck]
confidence: verified
source: sharepoint://ops/onboarding-runbook.docx
status: active
timestamp: 2026-06-14
---

# Client Onboarding
**Trigger:** SOW signed. **Owner:** [[engagement-lead]].
1. Provision client workspace …
2. Run discovery against [[salesforce]] …
> Decided to front-load discovery — see [[0007-frontload-discovery]].
```
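Since an atom is plain text, separating its two surfaces needs no SDK. Below is a minimal sketch, assuming the frontmatter is delimited by `---` lines as in the example above. `splitAtom` is a hypothetical name. The YAML text it returns still needs a real YAML parser.

```typescript
// Hypothetical helper: split an OKF atom into raw YAML frontmatter and Markdown body.
interface SplitAtom {
  frontmatter: string; // raw YAML between the --- delimiters (not parsed here)
  body: string;        // Markdown after the closing ---
}

function splitAtom(text: string): SplitAtom | null {
  const lines = text.replace(/\r\n/g, "\n").split("\n");
  if (lines[0]?.trim() !== "---") return null; // no frontmatter block
  // Find the closing delimiter, skipping the opening one on line 0.
  const close = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (close === -1) return null; // unterminated frontmatter
  return {
    frontmatter: lines.slice(1, close).join("\n"),
    body: lines.slice(close + 1).join("\n").replace(/^\n+/, ""), // drop blank lines after ---
  };
}
```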

Example — a task atom

A work item on the agentic board, mid-flight (claimed), with the claim/lease in action. Note: owner is the accountable role and assignee is the doer; claimed_by + lease_expires handle coordination; dependencies is an ordering hint, not a hold. Quote a full ISO-8601 lease_expires (a date-time) so it parses as a string, not a YAML timestamp.

```markdown
---
type: task
id: task-extract-acme-handbook
title: Extract the Acme employee handbook into OKF
description: Run the loop over Acme's handbook PDF and emit conformant OKF atoms.
status: claimed
confidence: asserted
source: board-curator (engagement-lead)
timestamp: 2026-06-16
priority: p1
owner: engagement-lead
assignee: extraction-engineer
dependencies: [task-provision-acme-tenant]
acceptance_criteria:
  - "All handbook sections emitted as atoms; 0 rejected by `@dossier/okf` validate"
  - "Every atom carries `confidence: inferred` + the source PDF as provenance"
claimed_by: claude-agent-extract-20260616
lease_expires: "2026-06-16T14:30:00Z"
---

# Extract the Acme employee handbook into OKF
**Owner:** [[engagement-lead]] · **Doing it:** [[extraction-engineer]].
Prereq: [[task-provision-acme-tenant]] should land first (ordering hint, not a hard block).
> Claimed by `claude-agent-extract-20260616` until the lease expires; a [[0024-agentic-board-architecture|PreToolUse hook]] denies a competing edit while the lease is live.
```

Versioning

Concepts evolve via status + supersedes/superseded_by; ids are immutable. History is git. v0 of this model is itself a decision and will compound.
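Because ids are immutable and never recycled, an old reference can always be resolved forward along superseded_by. Below is a minimal sketch, assuming concepts are indexed by id. `resolveCurrent` is a hypothetical helper. It throws on a supersession cycle rather than looping forever.

```typescript
// Hypothetical resolver: follow superseded_by from an id to the concept that currently
// stands in for it. Concept and resolveCurrent are illustrative names.
interface Concept {
  id: string;
  status?: string;
  superseded_by?: string;
}

function resolveCurrent(id: string, byId: Map<string, Concept>): string {
  const seen = new Set<string>();
  let current = id;
  while (true) {
    const next = byId.get(current)?.superseded_by;
    // End of the chain, or a successor id that does not resolve: stop here.
    if (!next || !byId.has(next)) return current;
    if (seen.has(current)) throw new Error(`supersession cycle at ${current}`);
    seen.add(current);
    current = next;
  }
}
```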