Dossier — The Knowledge Model (v0)

knowledge-model

index read as Orient confidence asserted status active 2026-06-14

source Dossier design; conforms to OKF v0.1 (markdown + YAML, single mandatory `type` field)

The Knowledge Model (v0)

The model is how Dossier turns messy source material (sites, SharePoint, files, tribal know-how) into an atomic, single-source-of-truth, OKF-native institutional memory that is simultaneously a human wiki and an agent-queryable graph — and that the client owns.

Design principles

Atomic. One concept per file. The file is the unit of truth and the unit of reuse. If two facts can change independently, they are two files.
Single source of truth. A fact lives in exactly one place; everything else links to it ([[id]]). Never copy — reference.
OKF-native. Markdown + YAML frontmatter; only type is mandatory. cat-able, git clone-able, no SDK. (See Adopt OKF as Dossier's canonical knowledge format.)
Dual-surface. The same file reads cleanly for humans and parses cleanly for agents. One artifact, two audiences.
Relationships are first-class. Typed edges in frontmatter (machine: builds the graph) + [[wiki-links]] in the body (human: navigation). Same edge, two surfaces → GraphRAG for free.
Sovereign & legible. Plain files in the client's git; stable slugs as permanent addresses; index.how-grade URL legibility.
Capture judgment, not just facts. Decisions and rationale are concept types, because the tacit "how we decide things here" is the IP that walks out the door.
Provenance always. Every atom records where it came from and how sure we are. Auditable memory is trustable memory.

Base frontmatter (every concept)

Field	Req	Meaning
`type`	✅	The concept type (below) — the only OKF-mandatory field. Custom types allowed.
`id`	★	Stable unique slug. The permanent address; never recycle.
`title`	★	Human label.
`description`	★	One-line summary (used for recall/relevance).
`resource`	○	URI of the underlying real asset (the SharePoint doc, Figma file, repo). Provenance link.
`owner`	○	`role` id accountable for this concept (edge).
`status`	○	`draft` | `active` | `deprecated` | `superseded`
`confidence`	○	`verified` (evidence) | `asserted` (human judgment) | `inferred` (LLM-derived)
`source`	○	Provenance: connector/url/human the knowledge was extracted from.
`tags`	○	Free tags.
`timestamp`	○	ISO-8601 of last meaningful change.
`supersedes` / `superseded_by`	○	Versioning edges.

✅ = required · ★ = strongly recommended · ○ = optional
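The following minimal TypeScript sketch illustrates the table above as a conformance check over one already-parsed frontmatter object. It assumes YAML parsing happens elsewhere. The names `checkBaseFrontmatter` and `Findings`, and the split into hard errors versus ★ warnings, are illustrative choices, not the `@dossier/okf` API.

```typescript
// Sketch: base-frontmatter checks for one already-parsed YAML object.
// `checkBaseFrontmatter` and `Findings` are hypothetical names.
const BASE_STATUSES: readonly string[] = ["draft", "active", "deprecated", "superseded"];
const CONFIDENCES: readonly string[] = ["verified", "asserted", "inferred"];
const RECOMMENDED: readonly string[] = ["id", "title", "description"];

interface Findings {
  errors: string[];   // the atom is not conformant
  warnings: string[]; // a ★ field is missing: conformant, but weaker
}

function checkBaseFrontmatter(fm: Record<string, unknown>): Findings {
  const errors: string[] = [];
  const warnings: string[] = [];
  const atomType = fm["type"];
  // `type` is the only mandatory field; any non-empty string (custom types too) passes.
  if (typeof atomType !== "string" || atomType.length === 0) {
    errors.push("missing mandatory field: type");
  }
  for (const key of RECOMMENDED) {
    if (fm[key] === undefined) warnings.push(`missing recommended field: ${key}`);
  }
  const conf = fm["confidence"];
  if (conf !== undefined && !CONFIDENCES.includes(String(conf))) {
    errors.push(`unknown confidence: ${String(conf)}`);
  }
  // Assumption: tasks carry their own free-text lifecycle, so the base
  // status list is only enforced for non-task atoms.
  const st = fm["status"];
  if (atomType !== "task" && st !== undefined && !BASE_STATUSES.includes(String(st))) {
    errors.push(`unknown status: ${String(st)}`);
  }
  return { errors, warnings };
}
```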

Core concept types (the spine)

`type`	Captures	Key fields
`role`	A position/function — the "who's accountable" layer	`responsibilities`, `decision_rights`, `reports_to`, `members`
`process`	A standardized, repeatable unit of work — "how we do X"	`trigger`, `inputs`, `outputs`, `owner`, `uses` (systems), `governed_by` (policies), inline steps
`workflow`	An orchestration connecting processes/roles/systems end-to-end — the connective tissue	`trigger`, `stages[]` (ordered: process + responsible role + system + decision points), `outcomes`
`decision`	A judgment record (ADR-style) — the tacit "why"	`context`, `options_considered`, `decision`, `rationale`, `decided_by`, `reversibility`, `consequences`, `review_date`

process vs workflow: a process is a node (a documented procedure); a workflow is the path through nodes (the orchestration). A workflow references processes; a process never references a workflow.

Supporting concept types

`type`	Captures	Key fields
`system`	A tool/software/platform the org uses (Salesforce, Figma, SharePoint). Ties workflows to tooling and to ingestion connectors.	—
`policy`	A rule/standard/constraint/governance (entitlements, compliance) — the governance layer.	—
`artifact`	A deliverable/template/output (proposal template, brand guide).	—
`term`	An atomic glossary definition; domain vocabulary.	—
`task`	A single unit of work in the agentic board — the work item a bounded loop picks up, transitions, and hands off (Agentic "sprint board" architecture — a git-resident OKF task board worked by bounded, hook-governed Agent SDK loops). One atomic file per task; its frontmatter carries its own live state, so the git board is the source of truth.	`status`, `priority`, `assignee`, `dependencies`, `acceptance_criteria`, `claimed_by`, `lease_expires` (`owner` = base)
`concept`	Generic fallback for anything not yet typed (OKF base type).	—

Extensible by design. Because only type is mandatory, verticals add their own types freely — e.g. a Digital Experience Agency vertical might add engagement, client, capability. The model is a starting taxonomy, not a cage.
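The sketch below shows one way a type registry could enforce "extend, not fork". `defineType` and `registerType` borrow the names this document uses elsewhere. Their shapes here are assumptions: a `TypeDef` with `fields` and `edges`, and rejection of redefinitions. Only the idea that verticals add types through one shared path comes from the source.

```typescript
// Hypothetical re-creation of the registerType(defineType(...)) path;
// the real signatures are not given in this document.
interface TypeDef {
  name: string;              // the `type` value, e.g. "engagement"
  fields: readonly string[]; // extra frontmatter fields the type adds
  edges: readonly string[];  // the subset of fields that are traversable edges
}

const CORE_TYPES: readonly string[] = [
  "role", "process", "workflow", "decision",
  "system", "policy", "artifact", "term", "task", "concept",
  "index", "log",
];

const registry = new Map<string, TypeDef>();

function defineType(def: TypeDef): TypeDef {
  // Assumption: every declared edge must also be a declared field.
  for (const e of def.edges) {
    if (def.fields.indexOf(e) === -1) {
      throw new Error(`edge "${e}" is not a field of ${def.name}`);
    }
  }
  return def;
}

function registerType(def: TypeDef): void {
  // Extend, never fork: core and already-registered types cannot be redefined.
  if (CORE_TYPES.indexOf(def.name) !== -1 || registry.has(def.name)) {
    throw new Error(`type already defined: ${def.name}`);
  }
  registry.set(def.name, def);
}

function isKnownType(name: string): boolean {
  return CORE_TYPES.indexOf(name) !== -1 || registry.has(name);
}

// Usage: the DXA engagement type, with the edge fields listed in this document.
registerType(defineType({
  name: "engagement",
  fields: ["client", "capabilities", "runs", "sow"],
  edges: ["client", "capabilities", "runs", "sow"],
}));
```

Custom types that are never registered still satisfy OKF, since only `type` is mandatory. `isKnownType` only reports whether the registry has seen a given type.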

The `task` type — work items on the git board

A task is the OKF atom from which DEC-0024's agentic board is built: one markdown+YAML file per work item, stored in the client's own git, so the board holds no state outside those files. It conforms exactly to the base frontmatter (immutable id, owner as the accountable role, source/confidence/timestamp as provenance) and adds a small mutable spine:

Field	Req	Meaning
`status`	★	`backlog` | `claimed` | `in_progress` | `review` | `done` | `blocked`. Free-text by design (verticals may add e.g. `on-hold-legal`); the baseline lifecycle is the convention, not a hard enum.
`priority`	○	Relative urgency: `p0` | `p1` | `p2` (or `critical`/`high`/`medium`/`low`).
`owner`	○	The `role` accountable for the task's direction (base field, edge → role).
`assignee`	○	Who is actually doing the work — a role id or a concrete agent/person handle. Mirrors the `process` owner-vs-doer split.
`dependencies`	○	Prerequisite task ids (edge → task). A suggestion of ordering, not an automatic hold — agents and humans must check and decide. Dangling deps are graph-lint findings, never board breakers.
`acceptance_criteria`	○	The done-definition: a checklist the `review` gate is judged against.
`claimed_by`	○	The agent/person id currently holding the task (coordination, not ownership). Half of the claim/lease pair.
`lease_expires`	○	ISO-8601 expiry of the claim. A `PreToolUse` hook denies edits that breach a live lease and treats an expired lease as reclaimable — governance is deterministic (hook verdict), not model-trusted (Agentic "sprint board" architecture — a git-resident OKF task board worked by bounded, hook-governed Agent SDK loops §5).
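The claim/lease rule reduces to a pure function, and that is what makes a hook verdict deterministic. The sketch below assumes the hook has already read the task's frontmatter and knows the acting agent's id. The function name, the `Verdict` shape, and the fail-closed handling of a missing or malformed expiry are assumptions, and the actual `PreToolUse` wiring is not shown.

```typescript
// Hypothetical sketch of the lease verdict a PreToolUse hook could compute;
// the hook's real input/output shapes are not specified here.
interface TaskLease {
  claimed_by?: string;
  lease_expires?: string; // quoted ISO-8601, e.g. "2026-06-16T14:30:00Z"
}

interface Verdict {
  allow: boolean;
  reason: string;
}

function leaseVerdict(task: TaskLease, actor: string, now: Date): Verdict {
  if (!task.claimed_by) return { allow: true, reason: "unclaimed" };
  if (task.claimed_by === actor) return { allow: true, reason: "actor holds the lease" };
  // Assumption: a claim without a readable expiry fails closed (deny).
  if (!task.lease_expires) return { allow: false, reason: "claimed, no lease_expires" };
  const expiry = Date.parse(task.lease_expires);
  if (isNaN(expiry)) return { allow: false, reason: "malformed lease_expires" };
  // An expired lease is reclaimable; a live one blocks every other actor.
  if (now.getTime() >= expiry) return { allow: true, reason: "lease expired, reclaimable" };
  return { allow: false, reason: `live lease held by ${task.claimed_by}` };
}
```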

Conventions that keep the board atomic and legible:

One task per file, id-stamped and immutable (slug never recycled; retire via status: done + optional superseded_by, never rename/delete). Task ids take a task-… prefix so they never collide with decision/role ids in the route map.
State lives in the atom, nowhere else. There is no separate claims table or lock service; status / claimed_by / lease_expires are frontmatter edits + git commits. The git history is the audit trail.
Tasks live under tasks/ by convention. The loader is directory-agnostic (it globs **/*.md), but the convention keeps boards discoverable and tools optimized; a board surface queries type === 'task'.
Provenance always. Every task records source (how it was created — a curator, a Jira import), confidence (asserted when human-curated, inferred when agent-generated), and timestamp (last meaningful change).
Quote full date-times. A lease_expires (or any …T…Z date-time) is written quoted ("2026-06-16T14:30:00Z") so it parses as a string rather than a YAML timestamp, since the schema declares the field as z.string(). A date-only value (2026-06-14) needs no quoting.
Verticals extend, never fork. A client needing custom fields registers a vertical task type (e.g. a DXA dxa-task adding service_tier/billable_hours) via the same registerType(defineType(...)) path the other verticals use — single source of truth preserved.
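Two of these conventions are mechanical enough to lint, and the board surface is just a filter over whatever the loader found. Here is a minimal sketch over parsed frontmatter objects. `lintTask`, `boardColumns`, the `"(no status)"` bucket, and the rule that `claimed_by` and `lease_expires` travel together are illustrative assumptions. The string check on `lease_expires` matters because YAML parsers that implement the timestamp type (js-yaml's default schema, for example) turn an unquoted date-time into a `Date`.

```typescript
type Atom = Record<string, unknown>;

// Hypothetical lint for the task conventions above.
function lintTask(fm: Atom): string[] {
  const findings: string[] = [];
  const id = fm["id"];
  if (typeof id !== "string" || id.indexOf("task-") !== 0) {
    findings.push("id must carry the task- prefix");
  }
  const lease = fm["lease_expires"];
  if (lease !== undefined) {
    if (typeof lease !== "string") {
      // e.g. an unquoted date-time that the YAML parser resolved to a Date
      findings.push("lease_expires must be a quoted string");
    } else if (isNaN(Date.parse(lease))) {
      findings.push("lease_expires is not a parseable date-time");
    }
  }
  // Assumption: claimed_by and lease_expires are set (and cleared) together.
  if ((fm["claimed_by"] === undefined) !== (lease === undefined)) {
    findings.push("claimed_by and lease_expires must be set together");
  }
  return findings;
}

// Board surface: keep only type === "task" atoms and group their ids by status.
function boardColumns(atoms: Atom[]): Map<string, string[]> {
  const columns = new Map<string, string[]>();
  for (const a of atoms) {
    if (a["type"] !== "task") continue;
    const col = typeof a["status"] === "string" ? (a["status"] as string) : "(no status)";
    const ids = columns.get(col) ?? [];
    ids.push(String(a["id"]));
    columns.set(col, ids);
  }
  return columns;
}
```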

process vs workflow vs task. A process is a node (a documented, repeatable procedure — "how we do X", stable). A workflow is the path through nodes (the standing orchestration). A task is a transient unit of execution — a single, ephemeral work item with mutable state (claimed, in progress, done) that an agent or human actually performs now. A process answers "how is this done in general"; a task answers "do this specific thing, here is its live state." Tasks may reference the processes/decisions/roles they enact (via relates_to/informed_by/owner), but a process never references a task (the durable concept does not depend on the ephemeral work item).
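The one-way rule can be checked over the graph: workflows and tasks may reference a process, but a process never references them. This sketch assumes each atom has already been reduced to its id, type, and outgoing edge targets. `directionFindings` and the `FORBIDDEN` table are hypothetical.

```typescript
// Hypothetical graph check for the one-way reference rule.
interface AtomRef {
  id: string;
  type: string;
  refs: string[]; // ids this atom points at through any edge
}

// For each source type, the target types it must never reference.
const FORBIDDEN = new Map<string, readonly string[]>([
  ["process", ["workflow", "task"]],
]);

function directionFindings(atoms: AtomRef[]): string[] {
  const typeOf = new Map<string, string>();
  for (const a of atoms) typeOf.set(a.id, a.type);
  const findings: string[] = [];
  for (const a of atoms) {
    const banned = FORBIDDEN.get(a.type) ?? [];
    for (const ref of a.refs) {
      const target = typeOf.get(ref);
      if (target !== undefined && banned.indexOf(target) !== -1) {
        findings.push(`${a.id} (${a.type}) must not reference ${ref} (${target})`);
      }
    }
  }
  return findings;
}
```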

Relationship vocabulary (edges → the graph)

Core vocabulary (every atom may carry these): owner/owned_by → role · reports_to → role · steps/stages → ordered process refs · uses → system · governed_by → policy · produces → artifact · informed_by/decided_in/decided_by → decision/role · defines/uses_term → term · relates_to → generic · supersedes/superseded_by → versioning.

Task edges (board): dependencies → prerequisite task ids (ordering hint, not a transitive hold — agents must check and decide); assignee → the role/person doing the work; claimed_by → the agent/person currently holding the task (coordination). owner stays the accountable role. Reverse readings (blocked_by as the inverse of dependencies, owned_by as the inverse of owner) are derived at read time, never authored twice — single source of truth.
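A reverse reading can be derived rather than authored by building an inverted index at read time. For each task, the sketch below indexes the tasks that list it in `dependencies`. It reports dangling dependency ids as lint findings instead of failing. All names are hypothetical.

```typescript
// Hypothetical sketch: derive the reverse reading of `dependencies` at read time.
interface TaskRef {
  id: string;
  dependencies?: string[]; // prerequisite task ids (an ordering hint)
}

// Inverted index: task id -> ids of the tasks that list it in `dependencies`.
function dependentsOf(tasks: TaskRef[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const t of tasks) {
    for (const dep of t.dependencies ?? []) {
      const list = index.get(dep) ?? [];
      list.push(t.id);
      index.set(dep, list);
    }
  }
  return index;
}

// Dangling dependencies are graph-lint findings; nothing here blocks the board.
function danglingDeps(tasks: TaskRef[]): string[] {
  const known = new Set(tasks.map((t) => t.id));
  const findings: string[] = [];
  for (const t of tasks) {
    for (const dep of t.dependencies ?? []) {
      if (!known.has(dep)) findings.push(`${t.id} depends on unknown ${dep}`);
    }
  }
  return findings;
}
```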

Vertical edges (registry-driven). The set of relationship labels is not closed. A vertical adds its own traversable edges by declaring them on its registered type, through the same registerType(defineType(...)) path it uses to add the type itself, so verticals extend the graph rather than fork it. The edge field's name is the relationship label, and a registered edge is a plain typed edge: it never inherits produces semantics, because DEC-0007 reserves produces for a process. The DXA vertical contributes:

capability.delivered_by → workflow/process (associative, deliberately not produces; DEC-0007)
client.systems → system · client.engagements → engagement
engagement.client → client · engagement.capabilities → capability · engagement.runs → workflow/process · engagement.sow → artifact

(See "The produces edge is canonical on the producing process only" and "OKF edge vocabulary is registry-driven — a vertical declares its own traversable edges".) Edge extraction (edges()) unions the core vocabulary with every registered vertical field; a core label always wins, so a vertical can never shadow a core relationship.
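Here is a minimal sketch of the union-with-core-wins rule. The text names `edges()`, but the signature below is an assumption. It takes one parsed frontmatter object plus a map from vertical type name to that type's declared edge fields. It turns only plain id strings into edges and skips structured values such as `stages[]` entries.

```typescript
// Hypothetical re-creation of edges(): core vocabulary unioned with
// the edge fields a vertical registered for this atom's type.
const CORE_EDGES = new Set<string>([
  "owner", "owned_by", "reports_to", "steps", "stages", "uses", "governed_by",
  "produces", "informed_by", "decided_in", "decided_by", "defines", "uses_term",
  "relates_to", "supersedes", "superseded_by",
  "dependencies", "assignee", "claimed_by", // task board edges
]);

interface Edge {
  label: string;  // the field name is the relationship label
  target: string; // the id the edge points at
  kind: "core" | "vertical";
}

function edges(
  fm: Record<string, unknown>,
  verticalEdges: ReadonlyMap<string, readonly string[]>,
): Edge[] {
  const atomType = typeof fm["type"] === "string" ? (fm["type"] as string) : "";
  const declared = verticalEdges.get(atomType) ?? [];
  const out: Edge[] = [];
  for (const label of Object.keys(fm)) {
    const isCore = CORE_EDGES.has(label);
    if (!isCore && declared.indexOf(label) === -1) continue; // not an edge field
    const value = fm[label];
    const targets: unknown[] = Array.isArray(value) ? value : [value];
    for (const target of targets) {
      // Only plain id strings become edges; structured values are skipped.
      if (typeof target === "string") {
        // A core label always wins, even if a vertical also declares it.
        out.push({ label, target, kind: isCore ? "core" : "vertical" });
      }
    }
  }
  return out;
}
```

If a vertical also lists a core field such as `owner`, the edge still comes out tagged `core`. That is how this sketch keeps a vertical from shadowing a core relationship.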

Edges live in frontmatter and as [[wiki-links]] in the body. The frontmatter feeds the knowledge graph (GraphRAG); the body feeds the human wiki.
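On the body side, a regular expression can pull out the human-surface links. This sketch assumes the `[[id]]` and `[[id|label]]` forms used in this document's examples. The function name is hypothetical, and the sketch does not handle links inside code spans.

```typescript
// Hypothetical extractor for [[id]] and [[id|label]] links in an atom body.
function wikiLinks(body: string): string[] {
  const re = /\[\[([^\]|]+)(?:\|[^\]]*)?\]\]/g;
  const ids: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(body)) !== null) {
    const id = m[1].trim(); // the part before "|" is the target id
    if (ids.indexOf(id) === -1) ids.push(id); // de-duplicate, keep first-seen order
  }
  return ids;
}
```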

Reserved files (OKF)

These are reserved structural types (type: index / type: log) — first-class in the schema, not coerced to concept. They organize the repo rather than capture a domain concept.

index.md (type: index) — overview/entry for a directory or composite concept.
log.md (type: log) — the atomic change/decision log for that directory or concept (audit trail at knowledge-atom granularity).

The judgment layer (why `decision` is core)

Facts (process, role) are the easy 80%. The IP is the judgment: why this vendor, why onboarding works this way, what we'd never do again. decision concepts capture that, and the log-auditor agent produces them continuously as the org (and this project) operates. Decisions link to the processes/workflows/roles they shaped, so the graph encodes not just what the org does but why: exactly the tacit knowledge that, Nadella says, models would otherwise commoditize away.

Example — a `process` atom

---
type: process
id: client-onboarding
title: Client Onboarding
description: How a new enterprise client goes from signed SOW to active engagement.
owner: engagement-lead
uses: [salesforce, sharepoint]
governed_by: [data-handling-policy]
produces: [kickoff-deck]
confidence: verified
source: sharepoint://ops/onboarding-runbook.docx
status: active
timestamp: 2026-06-14
---

# Client Onboarding
**Trigger:** SOW signed. **Owner:** [[engagement-lead]].
1. Provision client workspace …
2. Run discovery against [[salesforce]] …
> Decided to front-load discovery — see [[0007-frontload-discovery]].

Example — a `task` atom

A work item on the agentic board, mid-flight (claimed), with the claim/lease in action. Note: owner/assignee are the accountable role / the doer; claimed_by + lease_expires are coordination; dependencies is an ordering hint, not a hold. Quote a full ISO-8601 lease_expires (a date-time) so it parses as a string, not a YAML timestamp.

---
type: task
id: task-extract-acme-handbook
title: Extract the Acme employee handbook into OKF
description: Run the loop over Acme's handbook PDF and emit conformant OKF atoms.
status: claimed
confidence: asserted
source: board-curator (engagement-lead)
timestamp: 2026-06-16
priority: p1
owner: engagement-lead
assignee: extraction-engineer
dependencies: [task-provision-acme-tenant]
acceptance_criteria:
  - All handbook sections emitted as atoms; 0 rejected by `@dossier/okf` validate
  - Every atom carries `confidence: inferred` + the source PDF as provenance
claimed_by: claude-agent-extract-20260616
lease_expires: "2026-06-16T14:30:00Z"
---

# Extract the Acme employee handbook into OKF
**Owner:** [[engagement-lead]] · **Doing it:** [[extraction-engineer]].
Prereq: [[task-provision-acme-tenant]] should land first (ordering hint, not a hard block).
> Claimed by `claude-agent-extract-20260616` until the lease expires; a [[0024-agentic-board-architecture|PreToolUse hook]] denies a competing edit while the lease is live.

Versioning

Concepts evolve via status + supersedes/superseded_by; ids are immutable. History is git. v0 of this model is itself a decision and will compound.
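Because ids are immutable, an old id remains a valid address that points forward to its successor. The sketch below resolves the current version by following `superseded_by`. The function name, the cycle guard, and stopping at the last known atom when a forward link dangles are all assumptions.

```typescript
// Hypothetical resolver: follow superseded_by links to the newest atom.
interface Versioned {
  id: string;
  superseded_by?: string;
}

function currentVersion(id: string, atoms: Versioned[]): string {
  const byId = new Map<string, Versioned>();
  for (const a of atoms) byId.set(a.id, a);
  const seen = new Set<string>();
  let cur = id;
  while (true) {
    if (seen.has(cur)) throw new Error(`supersession cycle at ${cur}`);
    seen.add(cur);
    const next = byId.get(cur)?.superseded_by;
    // Assumption: a dangling forward link stops at the last known atom.
    if (next === undefined || !byId.has(next)) return cur;
    cur = next;
  }
}
```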