OKF Upstream — GoogleCloudPlatform/knowledge-catalog — June 2026 Scan

okf-upstream-knowledge-catalog-2026-06

reference confidence verified status active 2026-06-16 owner principal-architect
source Deep-research pass, 2026-06-16, against PRIMARY sources (OKF SPEC.md / README.md, the Google Cloud Data Analytics blog posts, the Dataplex AI-overview docs, the Discovery Agent sample). Core technical facts are primary-sourced and confidence: verified; framing explicitly attributed to Google marketing is flagged inline as such. URLs in Sources.

OKF Upstream — GoogleCloudPlatform/knowledge-catalog — June 2026 Scan

Type note. reference is a custom concept type — a dated, provenance-bearing external-knowledge snapshot (see TypeScript Toolchain Competitive Landscape — June 2026 Scan and Brand Identity & Premium-Surface Research — June 2026 for the established pattern, and Dossier — The Knowledge Model (v0) for why custom types are allowed). It is not a decision: it records what the upstream looks like, not a judgment Dossier made. The judgment that consumed this evidence is OKF upstream relationship — complement at the format layer, competitor at the serving layer.

Snapshot dated 2026-06-16. Core technical facts are primary-sourced (confidence: verified). Extreme-recency caveat: OKF v0.1 was published 2026-06-12 — four days before this scan — and is labeled "v0.1 — Draft." Treat anything here as fast-moving.

Context

Dossier already adopted Google's Open Knowledge Format (OKF) v0.1 as its canonical format in Adopt OKF as Dossier's canonical knowledge format. This scan deep-researched OKF's upstream home to ground that adoption in current ground truth: where the spec actually lives, who governs it, what ships alongside it, and how Google itself serves OKF to agents. The findings confirm DEC-0001 (Dossier's format layer is this spec) and surface the relationship question answered in OKF upstream relationship — complement at the format layer, competitor at the serving layer.

Key findings

1. The upstream repo and spec

  • Repo: GoogleCloudPlatform/knowledge-catalog (https://github.com/GoogleCloudPlatform/knowledge-catalog) — the reference implementation and spec home for OKF v0.1.
  • Provenance: published by the Google Cloud Data Cloud team (authors Sam McVeety, Amir Hormati) on 2026-06-12, Apache-2.0. Reached ~2,878 stars within days; labeled "v0.1 — Draft."
  • What ships in the repo:
    • okf/SPEC.md + okf/README.md — the format definition.
    • A Discovery Agent sample (built on Google ADK / Agent Development Kit; queries the Dataplex / Knowledge Catalog API; uses Vertex AI gemini-3-flash-preview).
    • An enrichment agent, sample bundles, and a graph visualizer.

2. OKF the FORMAT is deliberately minimal, git-native, and disclaims central authority

  • OKF is markdown + YAML in git, SDK-free: the spec's stance is "if you can cat a file you can read OKF."
  • The spec explicitly disclaims central authority / a schema registry / required tooling.
  • Crucially for Dossier: "Producers MAY include any additional keys." This makes Dossier's extensions legal OKF, not a fork (see finding 4).
  • This is philosophically convergent with Dossier's sovereignty stance (Dossier — Mission & North Star, Adopt OKF as Dossier's canonical knowledge format) — OKF-the-format is not an opposed lock-in format.

3. Knowledge Catalog the SERVING layer is vendor-coupled

  • Google "updated Knowledge Catalog to ingest OKF and serve it to our agents."
  • Knowledge Catalog = rebranded Dataplex Universal Catalog (renamed 2026-04-10; the API / CLI / IAM names are unchanged — still dataplex.googleapis.com).
  • It serves context to agents via MCP — a remote MCP server at dataplex.googleapis.com/mcp plus a local MCP Toolbox. This is the same MCP primitive Dossier's MCP agentic foundation — tenant-scoped GraphRAG over the OKF KB builds on...
  • ...but it is coupled to GCP IAM / billing and the GCP data estate (BigQuery / AlloyDB / Spanner / Looker). That vendor-coupled serving path is precisely the lock-in vector Dossier positions against.

4. Dossier is a deliberate SUPERSET of OKF (verified, not a fork)

Dossier's format layer is Google's OKF spec, plus a small set of additions that the spec's "additional keys MAY" clause explicitly permits:

  • typed frontmatter edges (owner / uses / governed_by / produces / supersedes) on top of OKF's deliberately untyped-link base;
  • confidence (verified / asserted / inferred);
  • source / provenance;
  • a first-class decision judgment layer (the IP capture — Dossier — The Knowledge Model (v0)).

So Dossier extends a deliberately minimal, untyped-link base with typed edges + confidence + provenance + judgment — and stays legal OKF.

What Google's named agent consumers actually are

Google says it serves OKF to "our agents" — named consumers are Gemini-in-BigQuery, ADK, LangChain, and external Claude via MCP.

Overreach flagged. Framing OKF as "served to Vertex AI" is an overreach of the primary text — Google says only "our agents" (the named list above), not Vertex AI broadly. Do not propagate the stronger claim.

Caveats — preserve honestly

  • Extreme recency: OKF v0.1 is 4 days old (Draft) as of this snapshot; everything is subject to change.
  • Marketing vs. primary: some framing is Google's own marketing — e.g. "lock-in free" and the "three pillars: Aggregation / Enrichment / Search" — even though the core technical facts above come from primary SPEC.md / README.md.
  • Overreach: the "serves OKF to Vertex AI" phrasing is not supported (see above); Google says only "our agents."
  • Editorial vs. sourced: the complement-vs-competitor verdict (format = complement, serving = competitor) is editorial synthesis, recorded as the judgment in OKF upstream relationship — complement at the format layer, competitor at the serving layernot a sourced fact.

Implication for Dossier (evidence only — the judgment is the decision)

The evidence splits cleanly by layer: OKF-the-format is a sovereignty-convergent complement Dossier already adopts and supersets; Knowledge-Catalog-the-serving-layer is a vendor-coupled competitor on the same MCP primitive. The act of deciding what Dossier does with this — the positioning stance and the interop / typed-edge / governance follow-ups — is recorded in OKF upstream relationship — complement at the format layer, competitor at the serving layer, not here. This reference confirms but does not change Adopt OKF as Dossier's canonical knowledge format; no decision is taken in this atom.

Sources

Deep-research pass, 2026-06-16 (primary sources; core technical facts confidence: verified):