Dossier — The Knowledge Model (v0)
The model is how Dossier turns messy source material (sites, SharePoint, files, tribal know-how) into an atomic, single-source-of-truth, OKF-native institutional memory that is simultaneously a human wiki and an agent-queryable graph — and that the client owns.
Design principles
- Atomic. One concept per file. The file is the unit of truth and the unit of reuse. If two facts can change independently, they are two files.
- Single source of truth. A fact lives in exactly one place; everything else links to it (`[[id]]`). Never copy; reference.
- OKF-native. Markdown + YAML frontmatter; only `type` is mandatory. `cat`-able, `git clone`-able, no SDK. (See Adopt OKF as Dossier's canonical knowledge format.)
- Dual-surface. The same file reads cleanly for humans and parses cleanly for agents. One artifact, two audiences.
- Relationships are first-class. Typed edges in frontmatter (machine: builds the graph) + `[[wiki-links]]` in the body (human: navigation). Same edge, two surfaces → GraphRAG for free.
- Sovereign & legible. Plain files in the client's git; stable slugs as permanent addresses; index.how-grade URL legibility.
- Capture judgment, not just facts. Decisions and rationale are concept types, because the tacit "how we decide things here" is the IP that walks out the door.
- Provenance always. Every atom records where it came from and how sure we are. Auditable memory is trustable memory.
Base frontmatter (every concept)
| Field | Req | Meaning |
|---|---|---|
| `type` | ✅ | The concept type (below); the only OKF-mandatory field. Custom types allowed. |
| `id` | ★ | Stable unique slug. The permanent address; never recycle. |
| `title` | ★ | Human label. |
| `description` | ★ | One-line summary (used for recall/relevance). |
| `resource` | ○ | URI of the underlying real asset (the SharePoint doc, Figma file, repo). Provenance link. |
| `owner` | ○ | `role` id accountable for this concept (edge). |
| `status` | ○ | `draft` \| `active` \| `deprecated` \| `superseded` |
| `confidence` | ○ | `verified` (evidence) \| `asserted` (human judgment) \| `inferred` (LLM-derived) |
| `source` | ○ | Provenance: connector/url/human the knowledge was extracted from. |
| `tags` | ○ | Free tags. |
| `timestamp` | ○ | ISO-8601 of last meaningful change. |
| `supersedes` / `superseded_by` | ○ | Versioning edges. |
✅ = required · ★ = strongly recommended · ○ = optional
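The table above can be read as a type plus a two-level check: a missing `type` is the only hard failure, while missing starred fields are advisory. The sketch below is a minimal illustration under that reading. The names `BaseFrontmatter`, `FrontmatterReport`, and `checkBaseFrontmatter` are hypothetical and are not taken from `@dossier/okf`.

```typescript
// Sketch only: field names follow the table above; the checker is hypothetical.
type BaseFrontmatter = {
  type: string; // the only mandatory field
  id?: string;
  title?: string;
  description?: string;
  resource?: string;
  owner?: string;
  status?: string; // kept as free text, since the task type treats it that way
  confidence?: "verified" | "asserted" | "inferred";
  source?: string;
  tags?: string[];
  timestamp?: string;
};

interface FrontmatterReport {
  errors: string[]; // hard failures (only a missing `type`)
  warnings: string[]; // advisory: starred fields that are absent
}

const RECOMMENDED_FIELDS = ["id", "title", "description"];

function checkBaseFrontmatter(fm: Record<string, unknown>): FrontmatterReport {
  const report: FrontmatterReport = { errors: [], warnings: [] };
  const typeValue = fm["type"];
  if (typeof typeValue !== "string" || typeValue.trim() === "") {
    report.errors.push("type is required");
  }
  for (const field of RECOMMENDED_FIELDS) {
    if (fm[field] === undefined) {
      report.warnings.push(field + " is strongly recommended");
    }
  }
  return report;
}

// A fully typed atom passes with no findings.
const onboarding: BaseFrontmatter = {
  type: "process",
  id: "client-onboarding",
  title: "Client Onboarding",
  description: "How a new enterprise client goes from signed SOW to active engagement.",
};
const onboardingReport = checkBaseFrontmatter(onboarding);

// A bare `type` is still valid; it only collects warnings.
const minimalReport = checkBaseFrontmatter({ type: "concept" });
```

Keeping errors and warnings separate mirrors the Req column: only the ✅ row can reject an atom.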
Core concept types (the spine)
| `type` | Captures | Key fields |
|---|---|---|
| `role` | A position/function: the "who's accountable" layer | `responsibilities`, `decision_rights`, `reports_to`, `members` |
| `process` | A standardized, repeatable unit of work: "how we do X" | `trigger`, `inputs`, `outputs`, `owner`, `uses` (systems), `governed_by` (policies), inline `steps` |
| `workflow` | An orchestration connecting processes/roles/systems end-to-end; the connective tissue | `trigger`, `stages[]` (ordered: process + responsible role + system + decision points), `outcomes` |
| `decision` | A judgment record (ADR-style): the tacit "why" | `context`, `options_considered`, `decision`, `rationale`, `decided_by`, `reversibility`, `consequences`, `review_date` |
process vs workflow: a `process` is a node (a documented procedure); a `workflow` is the path through nodes (the orchestration). A workflow references processes; a process never references a workflow.
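The one-way rule (a workflow references processes, a process never references a workflow) is easy to check mechanically. The following is a rough sketch of such a lint, assuming each atom has already been reduced to its id, type, and the ids it references. `LintAtom` and `processToWorkflowRefs` are illustrative names, not part of any existing tool.

```typescript
// Hypothetical, simplified view of an atom: just what the lint needs.
type LintAtom = {
  id: string;
  type: string;
  refs: string[]; // ids referenced from edge fields or wiki-links
};

// Returns one "from -> to" finding per process that points at a workflow.
function processToWorkflowRefs(atoms: LintAtom[]): string[] {
  const typeById = new Map<string, string>();
  for (const atom of atoms) typeById.set(atom.id, atom.type);

  const findings: string[] = [];
  for (const atom of atoms) {
    if (atom.type !== "process") continue;
    for (const ref of atom.refs) {
      if (typeById.get(ref) === "workflow") {
        findings.push(atom.id + " -> " + ref);
      }
    }
  }
  return findings;
}

const sampleAtoms: LintAtom[] = [
  { id: "client-onboarding", type: "process", refs: ["engagement-lead"] },
  { id: "engagement-lead", type: "role", refs: [] },
  // Allowed direction: workflow -> process.
  { id: "new-client-journey", type: "workflow", refs: ["client-onboarding"] },
  // Disallowed direction: process -> workflow.
  { id: "kickoff-prep", type: "process", refs: ["new-client-journey"] },
];

const workflowRefFindings = processToWorkflowRefs(sampleAtoms);
```

Here `new-client-journey` and `kickoff-prep` are invented ids used only to exercise both directions.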
Supporting concept types
| `type` | Captures | Key fields |
|---|---|---|
| `system` | A tool/software/platform the org uses (Salesforce, Figma, SharePoint). Ties workflows to tooling and to ingestion connectors. | - |
| `policy` | A rule/standard/constraint/governance (entitlements, compliance); the governance layer. | - |
| `artifact` | A deliverable/template/output (proposal template, brand guide). | - |
| `term` | An atomic glossary definition; domain vocabulary. | - |
| `task` | A single unit of work in the agentic board: the work item a bounded loop picks up, transitions, and hands off (see Agentic "sprint board" architecture, a git-resident OKF task board worked by bounded, hook-governed Agent SDK loops). One atomic file per task; its frontmatter carries its own live state, so the git board is the source of truth. | `status`, `priority`, `assignee`, `dependencies`, `acceptance_criteria`, `claimed_by`, `lease_expires` (`owner` = base) |
| `concept` | Generic fallback for anything not yet typed (OKF base type). | - |
Extensible by design. Because only `type` is mandatory, verticals add their own types freely; e.g. a Digital Experience Agency vertical might add `engagement`, `client`, `capability`. The model is a starting taxonomy, not a cage.
The task type — work items on the git board
A task is the OKF atom DEC-0024's agentic board is made of: one markdown+YAML file per work item, in the client's own git, so the board carries no off-disk state. It conforms to the base frontmatter exactly (id immutable, owner = accountable role, source/confidence/timestamp provenance) and adds a small mutable spine:
| Field | Req | Meaning |
|---|---|---|
| `status` | ★ | `backlog` \| `claimed` \| `in_progress` \| `review` \| `done` \| `blocked`. Free-text by design (verticals may add e.g. `on-hold-legal`); the baseline lifecycle is the convention, not a hard enum. |
| `priority` | ○ | Relative urgency: `p0` \| `p1` \| `p2` (or `critical`/`high`/`medium`/`low`). |
| `owner` | ○ | The role accountable for the task's direction (base field, edge → role). |
| `assignee` | ○ | Who is actually doing the work: a role id or a concrete agent/person handle. Mirrors the process owner-vs-doer split. |
| `dependencies` | ○ | Prerequisite task ids (edge → task). A suggestion of ordering, not an automatic hold: agents and humans must check and decide. Dangling deps are graph-lint findings, never board breakers. |
| `acceptance_criteria` | ○ | The done-definition: a checklist the review gate is judged against. |
| `claimed_by` | ○ | The agent/person id currently holding the task (coordination, not ownership). Half of the claim/lease pair. |
| `lease_expires` | ○ | ISO-8601 expiry of the claim. A PreToolUse hook denies edits that breach a live lease and treats an expired lease as reclaimable; governance is deterministic (hook verdict), not model-trusted (Agentic "sprint board" architecture, a git-resident OKF task board worked by bounded, hook-governed Agent SDK loops, §5). |
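The claim/lease rule in the last two rows reduces to a small, deterministic function. The sketch below shows only that verdict logic, not the hook wiring. `leaseVerdict` and its parameters are hypothetical, and two behaviors are assumptions of this sketch rather than statements from the model: the current holder may always edit, and an unparseable `lease_expires` fails closed.

```typescript
type Verdict = "allow" | "deny";

// Only the two coordination fields matter for the verdict.
type TaskClaim = {
  claimed_by?: string;
  lease_expires?: string; // ISO-8601 date-time, stored as a string
};

function leaseVerdict(task: TaskClaim, editor: string, now: Date): Verdict {
  // No claim, or no lease on the claim: nothing to breach.
  if (!task.claimed_by || !task.lease_expires) return "allow";
  // Assumption: the holder can keep editing its own task.
  if (task.claimed_by === editor) return "allow";
  const expiresAt = Date.parse(task.lease_expires);
  // Assumption: a lease that cannot be read is treated as live.
  if (Number.isNaN(expiresAt)) return "deny";
  // Live lease blocks competing edits; an expired lease is reclaimable.
  return now.getTime() < expiresAt ? "deny" : "allow";
}

const claimedTask: TaskClaim = {
  claimed_by: "claude-agent-extract-20260616",
  lease_expires: "2026-06-16T14:30:00Z",
};

const duringLease = leaseVerdict(claimedTask, "another-agent", new Date("2026-06-16T14:00:00Z"));
const afterLease = leaseVerdict(claimedTask, "another-agent", new Date("2026-06-16T15:00:00Z"));
```

Because the function takes `now` as an argument, the same inputs always give the same verdict, which is what makes the gate auditable.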
Conventions that keep the board atomic and legible:
- One task per file, id-stamped and immutable (slug never recycled; retire via `status: done` + optional `superseded_by`, never rename/delete). Task ids take a `task-…` prefix so they never collide with `decision`/`role` ids in the route map.
- State lives in the atom, nowhere else. There is no separate claims table or lock service; `status`/`claimed_by`/`lease_expires` are frontmatter edits + git commits. The git history is the audit trail.
- Tasks live under `tasks/` by convention. The loader is directory-agnostic (it globs `**/*.md`), but the convention keeps boards discoverable and tools optimized; a board surface queries `type === 'task'`.
- Provenance always. Every task records `source` (how it was created: a curator, a Jira import), `confidence` (`asserted` when human-curated, `inferred` when agent-generated), and `timestamp` (last meaningful change).
- Quote full date-times. A `lease_expires` (or any `…T…Z` date-time) is written quoted (`"2026-06-16T14:30:00Z"`) so it parses as a string, not a YAML timestamp; the field is `z.string()`. A date-only value (`2026-06-14`) needs no quoting.
- Verticals extend, never fork. A client needing custom fields registers a vertical task type (e.g. a DXA `dxa-task` adding `service_tier`/`billable_hours`) via the same `registerType(defineType(...))` path the other verticals use; single source of truth preserved.
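Two of these conventions translate directly into read-side helpers: the board is whatever has `type === 'task'`, and dependency problems are reported rather than enforced. The sketch below assumes atoms have already been loaded into plain objects. `BoardAtom`, `boardTasks`, and `dependencyFindings` are illustrative names, and treating a prerequisite that is not `done` as a reportable hint is an assumption of this sketch.

```typescript
type BoardAtom = {
  id: string;
  type: string;
  status?: string;
  dependencies?: string[];
};

interface DependencyFinding {
  task: string;
  dependency: string;
  kind: "dangling" | "not-done"; // both are hints, neither blocks the board
}

// The board surface: every atom whose type is "task", wherever it lives.
function boardTasks(atoms: BoardAtom[]): BoardAtom[] {
  return atoms.filter((atom) => atom.type === "task");
}

function dependencyFindings(atoms: BoardAtom[]): DependencyFinding[] {
  const tasks = boardTasks(atoms);
  const byId = new Map<string, BoardAtom>();
  for (const task of tasks) byId.set(task.id, task);

  const findings: DependencyFinding[] = [];
  for (const task of tasks) {
    for (const dependency of task.dependencies || []) {
      const prerequisite = byId.get(dependency);
      if (!prerequisite) {
        findings.push({ task: task.id, dependency, kind: "dangling" });
      } else if (prerequisite.status !== "done") {
        findings.push({ task: task.id, dependency, kind: "not-done" });
      }
    }
  }
  return findings;
}

const boardAtoms: BoardAtom[] = [
  {
    id: "task-extract-acme-handbook",
    type: "task",
    status: "claimed",
    dependencies: ["task-provision-acme-tenant", "task-missing"],
  },
  { id: "task-provision-acme-tenant", type: "task", status: "in_progress" },
  { id: "client-onboarding", type: "process" },
];

const board = boardTasks(boardAtoms);
const boardFindings = dependencyFindings(boardAtoms);
```

The findings list is returned rather than thrown, so a dangling id (`task-missing` is invented for the example) shows up in lint output without stopping anyone from working the board.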
process vs workflow vs task. A `process` is a node (a documented, repeatable procedure, "how we do X", stable). A `workflow` is the path through nodes (the standing orchestration). A `task` is a transient unit of execution: a single, ephemeral work item with mutable state (claimed, in progress, done) that an agent or human actually performs now. A process answers "how is this done in general"; a task answers "do this specific thing, here is its live state." Tasks may reference the processes/decisions/roles they enact (via `relates_to`/`informed_by`/`owner`), but a process never references a task (the durable concept does not depend on the ephemeral work item).
Relationship vocabulary (edges → the graph)
Core vocabulary (every atom may carry these): owner/owned_by → role · reports_to → role · steps/stages → ordered process refs · uses → system · governed_by → policy · produces → artifact · informed_by/decided_in/decided_by → decision/role · defines/uses_term → term · relates_to → generic · supersedes/superseded_by → versioning.
Task edges (board): dependencies → prerequisite task ids (ordering hint, not a transitive hold — agents must check and decide); assignee → the role/person doing the work; claimed_by → the agent/person currently holding the task (coordination). owner stays the accountable role. Reverse readings (blocked_by as the inverse of dependencies, owned_by as the inverse of owner) are derived at read time, never authored twice — single source of truth.
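Deriving reverse readings at read time amounts to building an index from each edge target back to the atoms that authored the edge. The sketch below is one minimal way to do that. `EdgeAtom` and `reverseIndex` are hypothetical names, and how a caller labels the resulting reading (for example `blocked_by`) is left open here.

```typescript
// An atom reduced to its id plus whatever edge fields it authored.
type EdgeAtom = { id: string; [field: string]: unknown };

// Maps each target id to the ids of the atoms that point at it via `field`.
function reverseIndex(atoms: EdgeAtom[], field: string): Record<string, string[]> {
  const index: Record<string, string[]> = {};
  for (const atom of atoms) {
    const raw = atom[field];
    // An edge field may hold a single id or a list of ids.
    const targets: unknown[] = Array.isArray(raw) ? raw : raw === undefined ? [] : [raw];
    for (const target of targets) {
      if (typeof target !== "string") continue;
      if (!index[target]) index[target] = [];
      index[target].push(atom.id);
    }
  }
  return index;
}

const edgeAtoms: EdgeAtom[] = [
  { id: "task-extract-acme-handbook", owner: "engagement-lead", dependencies: ["task-provision-acme-tenant"] },
  { id: "task-index-acme-handbook", owner: "engagement-lead", dependencies: ["task-provision-acme-tenant"] },
  { id: "task-provision-acme-tenant", owner: "engagement-lead" },
];

// Which tasks list this one as a prerequisite?
const dependents = reverseIndex(edgeAtoms, "dependencies")["task-provision-acme-tenant"];
// Which atoms name this role as owner?
const ownedAtoms = reverseIndex(edgeAtoms, "owner")["engagement-lead"];
```

Nothing is written back to the files: the index is rebuilt from the authored edges on each read, so the forward edge remains the only stored copy. `task-index-acme-handbook` is an invented id for the example.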
Vertical edges (registry-driven). A relationship label is not a closed list. A vertical adds its own traversable edges by declaring them on its registered type — the same registerType(defineType(...)) path it uses to add the type itself — so verticals extend the graph, never fork it. The edge field's name is the relationship label, and a registered edge is a plain typed edge (it never inherits produces semantics — DEC-0007 reserves produces to a process). The DXA vertical contributes: capability.delivered_by → workflow/process (associative, deliberately not produces — DEC-0007); client.systems → system, client.engagements → engagement; engagement.client → client, engagement.capabilities → capability, engagement.runs → workflow/process, engagement.sow → artifact. (See The produces edge is canonical on the producing process only and OKF edge vocabulary is registry-driven — a vertical declares its own traversable edges.) Edge extraction (edges()) unions the core vocabulary with every registered vertical field; a core label always wins, so a vertical can never shadow a core relationship.
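The "core label always wins" rule can be illustrated with a toy registry. This is a sketch under stated assumptions, not the real API: the signatures of `registerType`, `defineType`, and `edges()` are not shown in this document, so the stand-ins below (`registerVerticalType`, `labelOrigin`, `extractEdges`) are hypothetical, and the core list is abbreviated.

```typescript
// Abbreviated core vocabulary, taken from the list above.
const CORE_EDGE_LABELS = [
  "owner", "reports_to", "uses", "governed_by", "produces",
  "informed_by", "relates_to", "supersedes", "superseded_by",
];

interface VerticalTypeDef {
  typeName: string;
  edgeFields: string[]; // the field name doubles as the relationship label
}

interface GraphEdge {
  from: string;
  label: string;
  to: string;
  origin: "core" | "vertical";
}

const verticalTypes: Record<string, VerticalTypeDef> = {};

function registerVerticalType(def: VerticalTypeDef): void {
  verticalTypes[def.typeName] = def;
}

// Core is checked first, so a vertical field can never shadow a core label.
function labelOrigin(typeName: string, field: string): "core" | "vertical" | undefined {
  if (CORE_EDGE_LABELS.indexOf(field) !== -1) return "core";
  const def = verticalTypes[typeName];
  if (def && def.edgeFields.indexOf(field) !== -1) return "vertical";
  return undefined;
}

function extractEdges(atom: { id: string; type: string; [field: string]: unknown }): GraphEdge[] {
  const found: GraphEdge[] = [];
  for (const field of Object.keys(atom)) {
    const origin = labelOrigin(atom.type, field);
    if (!origin) continue; // not an edge field for this type
    const raw = atom[field];
    const targets: unknown[] = Array.isArray(raw) ? raw : [raw];
    for (const target of targets) {
      if (typeof target === "string") {
        found.push({ from: atom.id, label: field, to: target, origin });
      }
    }
  }
  return found;
}

// The vertical also lists "owner" to show that the core reading still wins.
registerVerticalType({
  typeName: "engagement",
  edgeFields: ["client", "capabilities", "runs", "sow", "owner"],
});

const engagementEdges = extractEdges({
  id: "eng-acme",
  type: "engagement",
  title: "Acme engagement",
  owner: "engagement-lead",
  client: "acme",
  capabilities: ["cap-discovery", "cap-migration"],
});
```

The ids `eng-acme`, `acme`, `cap-discovery`, and `cap-migration` are invented for the example; the `engagement` field names follow the DXA edges listed above.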
Edges live in frontmatter and as [[wiki-links]] in the body. The frontmatter feeds the knowledge graph (GraphRAG); the body feeds the human wiki.
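On the body side, the edges are recoverable with a small pattern match. The sketch below extracts link targets from both the `[[id]]` and `[[id|label]]` forms used in this document's examples. `extractWikiLinks` and `missingFromBody` are hypothetical helpers, and whether every frontmatter edge must also appear in the body is a team convention this sketch does not decide.

```typescript
// Collects unique link targets, in order of first appearance.
function extractWikiLinks(body: string): string[] {
  // [[id]] or [[id|label]]; the capture group is the id before any "|".
  const pattern = /\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]/g;
  const ids: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(body)) !== null) {
    const id = match[1].trim();
    if (ids.indexOf(id) === -1) ids.push(id);
  }
  return ids;
}

// Frontmatter edge targets that the body never links to.
function missingFromBody(frontmatterTargets: string[], body: string): string[] {
  const linked = extractWikiLinks(body);
  return frontmatterTargets.filter((target) => linked.indexOf(target) === -1);
}

const sampleBody = [
  "**Trigger:** SOW signed. **Owner:** [[engagement-lead]].",
  "2. Run discovery against [[salesforce]]",
  "> a [[0024-agentic-board-architecture|PreToolUse hook]] denies a competing edit",
].join("\n");

const bodyLinks = extractWikiLinks(sampleBody);
const unlinked = missingFromBody(["engagement-lead", "salesforce", "sharepoint"], sampleBody);
```

Run over the sample body, the first helper yields the three link targets and the second reports `sharepoint` as present in frontmatter but not linked in prose.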
Reserved files (OKF)
These are reserved structural types (type: index / type: log) — first-class in the schema, not coerced to concept. They organize the repo rather than capture a domain concept.
- `index.md` (`type: index`): overview/entry for a directory or composite concept.
- `log.md` (`type: log`): the atomic change/decision log for that directory or concept (audit trail at knowledge-atom granularity).
The judgment layer (why decision is core)
Facts (process, role) are the easy 80%. The IP is the judgment: why this vendor, why onboarding works this way, what we'd never do again. decision concepts capture that — and they are produced continuously, by the log-auditor agent as the org (and this project) operates. Decisions link to the processes/workflows/roles they shaped, so the graph encodes not just what the org does but why — exactly the tacit knowledge Nadella says models otherwise commoditize away.
Example — a process atom
---
type: process
id: client-onboarding
title: Client Onboarding
description: How a new enterprise client goes from signed SOW to active engagement.
owner: engagement-lead
uses: [salesforce, sharepoint]
governed_by: [data-handling-policy]
produces: [kickoff-deck]
confidence: verified
source: sharepoint://ops/onboarding-runbook.docx
status: active
timestamp: 2026-06-14
---
# Client Onboarding
**Trigger:** SOW signed. **Owner:** [[engagement-lead]].
1. Provision client workspace …
2. Run discovery against [[salesforce]] …
> Decided to front-load discovery — see [[0007-frontload-discovery]].
Example — a task atom
A work item on the agentic board, mid-flight (claimed), with the claim/lease in action. Note: owner/assignee are the accountable role / the doer; claimed_by + lease_expires are coordination; dependencies is an ordering hint, not a hold. Quote a full ISO-8601 lease_expires (a date-time) so it parses as a string, not a YAML timestamp.
---
type: task
id: task-extract-acme-handbook
title: Extract the Acme employee handbook into OKF
description: Run the loop over Acme's handbook PDF and emit conformant OKF atoms.
status: claimed
confidence: asserted
source: board-curator (engagement-lead)
timestamp: 2026-06-16
priority: p1
owner: engagement-lead
assignee: extraction-engineer
dependencies: [task-provision-acme-tenant]
acceptance_criteria:
- All handbook sections emitted as atoms; 0 rejected by `@dossier/okf` validate
- Every atom carries `confidence: inferred` + the source PDF as provenance
claimed_by: claude-agent-extract-20260616
lease_expires: "2026-06-16T14:30:00Z"
---
# Extract the Acme employee handbook into OKF
**Owner:** [[engagement-lead]] · **Doing it:** [[extraction-engineer]].
Prereq: [[task-provision-acme-tenant]] should land first (ordering hint, not a hard block).
> Claimed by `claude-agent-extract-20260616` until the lease expires; a [[0024-agentic-board-architecture|PreToolUse hook]] denies a competing edit while the lease is live.
Versioning
Concepts evolve via status + supersedes/superseded_by; ids are immutable. History is git. v0 of this model is itself a decision and will compound.
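Because ids are immutable, a reader holding an old id can follow `superseded_by` forward to the current atom. The sketch below shows one way to walk that chain, assuming atoms are available in a lookup keyed by id. `VersionedAtom` and `resolveCurrent` are hypothetical names, and stopping at the last known atom when a pointer dangles is an assumption of this sketch.

```typescript
type VersionedAtom = {
  id: string;
  status?: string;
  superseded_by?: string;
};

// Follows superseded_by until it reaches an atom with no successor.
function resolveCurrent(
  id: string,
  atoms: Record<string, VersionedAtom>
): VersionedAtom | undefined {
  const seen = new Set<string>(); // guards against accidental cycles
  let current: VersionedAtom | undefined = atoms[id];
  while (current && current.superseded_by && !seen.has(current.id)) {
    seen.add(current.id);
    const next: VersionedAtom | undefined = atoms[current.superseded_by];
    if (!next) break; // dangling pointer: stay on the last atom we have
    current = next;
  }
  return current;
}

const versioned: Record<string, VersionedAtom> = {
  "onboarding-v1": { id: "onboarding-v1", status: "superseded", superseded_by: "onboarding-v2" },
  "onboarding-v2": { id: "onboarding-v2", status: "superseded", superseded_by: "client-onboarding" },
  "client-onboarding": { id: "client-onboarding", status: "active" },
};

const currentAtom = resolveCurrent("onboarding-v1", versioned);
```

The old ids keep resolving, which is the point of never recycling a slug. `onboarding-v1` and `onboarding-v2` are invented ids for the example.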