OKF Upstream — GoogleCloudPlatform/knowledge-catalog — June 2026 Scan
okf-upstream-knowledge-catalog-2026-06
Type note. reference is a custom concept type: a dated, provenance-bearing external-knowledge snapshot (see TypeScript Toolchain Competitive Landscape — June 2026 Scan and Brand Identity & Premium-Surface Research — June 2026 for the established pattern, and Dossier — The Knowledge Model (v0) for why custom types are allowed). It is not a decision: it records what the upstream looks like, not a judgment Dossier made. The judgment that consumed this evidence is OKF upstream relationship — complement at the format layer, competitor at the serving layer.
Snapshot dated 2026-06-16. Core technical facts are primary-sourced (confidence: verified). Extreme-recency caveat: OKF v0.1 was published 2026-06-12, four days before this scan, and is labeled "v0.1 — Draft." Treat anything here as fast-moving.
Context
Dossier already adopted Google's Open Knowledge Format (OKF) v0.1 as its canonical format in Adopt OKF as Dossier's canonical knowledge format. This scan deep-researched OKF's upstream home to anchor that adoption in current ground truth: where the spec actually lives, who governs it, what ships alongside it, and how Google itself serves OKF to agents. The findings confirm DEC-0001 (Dossier's format layer is this spec) and surface the relationship question answered in OKF upstream relationship — complement at the format layer, competitor at the serving layer.
Key findings
1. The upstream repo and spec
- Repo: GoogleCloudPlatform/knowledge-catalog (https://github.com/GoogleCloudPlatform/knowledge-catalog), the reference implementation and spec home for OKF v0.1.
- Provenance: published by the Google Cloud Data Cloud team (authors Sam McVeety, Amir Hormati) on 2026-06-12, Apache-2.0. Reached ~2,878 stars within days; labeled "v0.1 — Draft."
- What ships in the repo:
  - okf/SPEC.md + okf/README.md: the format definition.
  - A Discovery Agent sample (built on Google ADK / Agent Development Kit; queries the Dataplex / Knowledge Catalog API; uses Vertex AI gemini-3-flash-preview).
  - An enrichment agent, sample bundles, and a graph visualizer.
2. OKF the FORMAT is deliberately minimal, git-native, and disclaims central authority
- OKF is markdown + YAML in git, SDK-free: the spec's stance is "if you can cat a file you can read OKF."
- The spec explicitly disclaims central authority / a schema registry / required tooling.
- Crucially for Dossier: "Producers MAY include any additional keys." This makes Dossier's extensions legal OKF, not a fork (see finding 4).
- This is philosophically convergent with Dossier's sovereignty stance (Dossier — Mission & North Star, Adopt OKF as Dossier's canonical knowledge format) — OKF-the-format is not an opposed lock-in format.
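To make the SDK-free claim concrete, here is a minimal sketch of reading a markdown + YAML document with nothing but the Python standard library. It assumes the metadata sits in a block of flat key: value lines between two --- delimiter lines at the top of the file; that layout, the function name split_frontmatter, and every key in the sample (including x_custom_key) are illustrative assumptions, not taken from SPEC.md. Nested YAML would need a real YAML parser.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a document into (frontmatter, body).

    Assumption: frontmatter is a block of flat `key: value` lines between
    two `---` lines at the top of the file. Nested YAML is out of scope.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the body.
            return meta, "\n".join(lines[i + 1:])
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    # No closing delimiter: treat the whole text as body.
    return {}, text


# Hypothetical sample document; the keys are placeholders for illustration.
sample = """---
title: Example concept
type: reference
x_custom_key: anything
---
Body text in markdown.
"""

meta, body = split_frontmatter(sample)
print(meta)
print(body)
```

The parser keeps x_custom_key alongside the other keys and never rejects it, which is the reader-side behavior that the "Producers MAY include any additional keys" clause calls for.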
3. Knowledge Catalog the SERVING layer is vendor-coupled
- Google "updated Knowledge Catalog to ingest OKF and serve it to our agents."
- Knowledge Catalog = rebranded Dataplex Universal Catalog (renamed 2026-04-10; the API / CLI / IAM names are unchanged, still dataplex.googleapis.com).
- It serves context to agents via MCP: a remote MCP server at dataplex.googleapis.com/mcp plus a local MCP Toolbox. This is the same MCP primitive Dossier's MCP agentic foundation — tenant-scoped GraphRAG over the OKF KB builds on...
- ...but it is coupled to GCP IAM / billing and the GCP data estate (BigQuery / AlloyDB / Spanner / Looker). That vendor-coupled serving path is precisely the lock-in vector Dossier positions against.
4. Dossier is a deliberate SUPERSET of OKF (verified, not a fork)
Dossier's format layer is Google's OKF spec, plus a small set of additions that the spec's "additional keys MAY" clause explicitly permits:
- typed frontmatter edges (owner/uses/governed_by/produces/supersedes) on top of OKF's deliberately untyped-link base;
- confidence (verified/asserted/inferred);
- source / provenance;
- a first-class decision judgment layer (the IP capture: Dossier — The Knowledge Model (v0)).
So Dossier extends a deliberately minimal, untyped-link base with typed edges + confidence + provenance + judgment — and stays legal OKF.
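The superset relationship can be sketched as a check that inspects only Dossier's own extension keys and ignores everything else. The edge names and confidence values come from the list above; the function name check_extensions, the representation of each edge as a list of string identifiers, and the sample values are assumptions made for illustration, not Dossier's actual schema.

```python
# Confidence values and edge names as listed in this note.
ALLOWED_CONFIDENCE = ("verified", "asserted", "inferred")
EDGE_KEYS = ("owner", "uses", "governed_by", "produces", "supersedes")


def check_extensions(frontmatter: dict) -> list[str]:
    """Return a list of problems found in the extension keys.

    Keys outside the extension set are ignored, never rejected, mirroring
    the "additional keys MAY" clause. Assumption: each typed edge is a
    list of string identifiers.
    """
    problems = []
    confidence = frontmatter.get("confidence")
    if confidence is not None and confidence not in ALLOWED_CONFIDENCE:
        problems.append(f"confidence: unexpected value {confidence!r}")
    for key in EDGE_KEYS:
        targets = frontmatter.get(key, [])
        is_str_list = isinstance(targets, list) and all(
            isinstance(t, str) for t in targets
        )
        if not is_str_list:
            problems.append(f"{key}: expected a list of target identifiers")
    return problems


# Hypothetical frontmatter dictionaries, already parsed from YAML.
good = {
    "title": "Example",
    "confidence": "verified",
    "uses": ["okf-upstream-knowledge-catalog-2026-06"],
    "some_unrelated_key": 42,  # ignored: additional keys are permitted
}
bad = {"confidence": "certain", "supersedes": "some-id"}

print(check_extensions(good))
print(check_extensions(bad))
```

Because the check only ever adds constraints on keys Dossier itself introduces, a document that passes it is still readable by any consumer that knows nothing about typed edges or confidence.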
What Google's named agent consumers actually are
Google says it serves OKF to "our agents" — named consumers are Gemini-in-BigQuery, ADK, LangChain, and external Claude via MCP.
Overreach flagged. Framing OKF as "served to Vertex AI" is an overreach of the primary text — Google says only "our agents" (the named list above), not Vertex AI broadly. Do not propagate the stronger claim.
Caveats — preserve honestly
- Extreme recency: OKF v0.1 is 4 days old (Draft) as of this snapshot; everything is subject to change.
- Marketing vs. primary: some framing is Google's own marketing (e.g. "lock-in free" and the "three pillars: Aggregation / Enrichment / Search"), even though the core technical facts above come from primary SPEC.md / README.md.
- Overreach: the "serves OKF to Vertex AI" phrasing is not supported (see above); Google says only "our agents."
- Editorial vs. sourced: the complement-vs-competitor verdict (format = complement, serving = competitor) is editorial synthesis, recorded as the judgment in OKF upstream relationship — complement at the format layer, competitor at the serving layer — not a sourced fact.
Implication for Dossier (evidence only — the judgment is the decision)
The evidence splits cleanly by layer: OKF-the-format is a sovereignty-convergent complement Dossier already adopts and supersets; Knowledge-Catalog-the-serving-layer is a vendor-coupled competitor on the same MCP primitive. The act of deciding what Dossier does with this — the positioning stance and the interop / typed-edge / governance follow-ups — is recorded in OKF upstream relationship — complement at the format layer, competitor at the serving layer, not here. This reference confirms but does not change Adopt OKF as Dossier's canonical knowledge format; no decision is taken in this atom.
Sources
Deep-research pass, 2026-06-16 (primary sources; core technical facts confidence: verified):
- https://github.com/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/SPEC.md
- https://github.com/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/README.md
- https://github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/samples/discovery
- https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing/
- https://cloud.google.com/blog/products/data-analytics/introducing-the-google-cloud-knowledge-catalog
- https://docs.cloud.google.com/dataplex/docs/ai-overview