Deep Read / Librarian / June 2026

Which Graph Goes Under the Brain?

A company brain needs a substrate, and the loud debate — labeled property graph versus RDF ontology — is the wrong first question. The right one is the one a data-governance veteran has been asking for thirty years: do you govern context, or only data? The honest answer to both turns out not to be a new graph store. It is a layer almost nobody is building all the way: governed measurement context that is typed, evaluable, and able to refuse.


The companion to this piece argued that the most-shared company-brain diagram of the month leaves its load-bearing layer — retrieval — drawn as an empty box. The argument there was about what goes under that box: governed data models, not organizational exhaust. This piece is about the next question down. Once you have decided the substrate should be structured, you have to decide which structure. And here a second, quieter debate is waiting, with its own vendors and its own decades of history.

Four pieces landed in the reading queue at once, and read together they answer it. Jessica Talisman, a semantic engineer with twenty-five years in the field, wrote the clearest comparison of graph models I have seen. Robert Seiner, who has run data-governance programs since before "data governance" was a phrase, published a two-part argument that governing data is no longer enough. And Daniel Miessler, from the security world, offered the frame that ties them together: a company is just a graph of algorithms. None of the three authors is writing about our problem. All four pieces are circling it.


1. The graph debate is real, and mostly not ours

Talisman's essay reconstructs two intellectual traditions that the marketing word "knowledge graph" has flattened into one. The RDF / OWL lineage descends from formal logic, description logics, and library science: triples, globally dereferenceable identifiers, open-world reasoning, validation through SHACL, provenance through PROV, federation through SPARQL. It is built for meaning that has to travel between organizations. The labeled property graph — Neo4j, Cypher, the new ISO GQL standard — descends from graph theory and operational databases: nodes and edges with arbitrary properties, index-free adjacency, closed-world assumptions. It is built for fast traversal inside one application boundary: fraud rings, recommendations, identity resolution.

Her decision framework is admirably blunt. Publish to consumers who will not coordinate with you on schema? RDF. Need automated reasoning — subsumption, classification, consistency checking? RDF with a description-logic reasoner. Need deep real-time multi-hop traversal over operational data? Property graph. Rich edge attributes and developer velocity inside one boundary? Property graph. Several of those at once? A hybrid store, and budget for the conceptual overhead of maintaining two query surfaces.
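The framework is compact enough to write down as a function. What follows is a minimal sketch, not Talisman's own formulation: the `Requirements` fields and the `recommend` function are hypothetical names, "hybrid" is read here as "needs from both traditions at once", and the "neither" outcome is our addition for the case where no box is ticked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical checklist; the field names are ours, not Talisman's."""
    uncoordinated_consumers: bool = False  # publishing to parties who will not agree on schema
    automated_reasoning: bool = False      # subsumption, classification, consistency checking
    deep_realtime_traversal: bool = False  # multi-hop queries over operational data
    rich_edges_one_boundary: bool = False  # edge attributes and developer velocity in one app


def recommend(req: Requirements) -> str:
    """Return 'rdf', 'property_graph', 'hybrid', or 'neither' for a checklist."""
    wants_rdf = req.uncoordinated_consumers or req.automated_reasoning
    wants_lpg = req.deep_realtime_traversal or req.rich_edges_one_boundary
    if wants_rdf and wants_lpg:
        return "hybrid"  # assumption: hybrid means needs from both sides
    if wants_rdf:
        return "rdf"
    if wants_lpg:
        return "property_graph"
    return "neither"  # our addition: no requirement demands a graph store


print(recommend(Requirements(automated_reasoning=True)))
print(recommend(Requirements(automated_reasoning=True, deep_realtime_traversal=True)))
```

The useful property of writing it this way is the last branch: a checklist with nothing ticked returns no graph store at all, which is where the rest of this piece ends up.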

Two of her side-notes deserve to outlive the essay. First, the famous claim that graph databases are orders of magnitude faster — the "1,135×" figure — traces to a benchmark run against a MySQL schema with no index on the join column. With a sensible index, Talisman reports, MySQL was competitive and a plain Python script faster still. Modern triple stores and columnar SQL engines, she argues, are competitive on most workloads. The paradigm is not the speedup; the indexing is. Second, RDF 1.2 — on the W3C Recommendation track as of April 2026, not yet final — adds native statement-level annotation through triple terms and rdf:reifies, which erases one of the historical reasons to reach for a property graph in the first place.
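The indexing point is easy to see for yourself. The sketch below is an illustration only and does not reproduce the benchmark Talisman describes: it uses SQLite from the Python standard library rather than MySQL, the `follows` table and its data are made up, and it inspects the query plan instead of claiming any timing. It shows the same two-hop traversal planned before and after an index exists on the join column.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's query plan for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 is the readable detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (src INTEGER, dst INTEGER)")
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)",
    [(i, (i * 7 + 1) % 1000) for i in range(1000)],  # synthetic edges
)

# Two-hop traversal: whom do the accounts followed by one account follow?
TWO_HOP = """
    SELECT b.dst
    FROM follows AS a
    JOIN follows AS b ON b.src = a.dst
    WHERE a.src = ?
"""

plan_without_index = plan(conn, TWO_HOP, (1,))

conn.execute("CREATE INDEX idx_follows_src ON follows (src)")
plan_with_index = plan(conn, TWO_HOP, (1,))

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The exact plan text varies by SQLite version, but the named index only appears in the second plan. Same paradigm, same query, different access path.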

This is a genuinely useful map. It is also, for the brain we are building, mostly a map of territory we do not need to enter. We are not publishing FAIR linked data to uncoordinated external consumers. We are not running open-world ontological inference. Our substrate is a dbt semantic layer over a governed warehouse — metrics, dimensions, entities, and the meaning attached to them. The temptation, reading Talisman, is to conclude grandly that "the semantic layer is our knowledge graph" and move on. That conclusion is half right and worth being careful about, because the half that is wrong is the half a skeptic will go for.

2. The real question: govern the context, not just the data

Seiner has been making the same argument for thirty years and it has finally caught up to the moment. His framing is three layers, not two. Data is the evidence — the recorded transaction, the measured event. Metadata is the translator — names, types, lineage, classifications, the descriptive scaffolding that tells you what a column is and where it came from. And context is the missing ingredient — why the data matters in a given situation, when it should be trusted, what decision it should influence.

His central claim is the title of the first piece: to govern only data is to fall behind. Most organizations have plenty of data and increasingly decent metadata. What they lack, and what no catalog purchase delivers, is governed context — because context is "heavily human." It lives in conversations, experience, the developer who named the column CUST_STAT_CD fifteen years ago and the analyst who knows which of its values to never trust. When that person leaves, the structure survives and the meaning quietly vanishes. His follow-up sharpens the distinction against a reader who suggested metadata and context are the same thing: metadata describes; context explains. Metadata can exist without context, and routinely does — "beautifully documented data environments that still produce confusion."

Hold this against Talisman and the two essays click together. The graph debate — LPG versus RDF — is almost entirely a debate about data and metadata: how to store facts, how to attach descriptive attributes to them, how to traverse the result. Neither model, on its own, governs context in Seiner's sense. RDF gets closest, because the librarian tradition it inherits — controlled vocabularies, provenance, trust — is the closest thing the technical world has to encoding "when should this be believed." But a triple that records who said something and how confident they were is still describing a statement, not governing the conditions under which a number may be acted on.

So the question reframes. Not "which graph?" but "where does context live, and is it a governable asset or a liability waiting for someone to resign?"

FROM DATA TO GOVERNED MEASUREMENT CONTEXT

| | Data (the evidence; a recorded fact) | Metadata (describes; names · types · lineage) | Context, as prose (explains, in words; heads · Slack · ai_context) | Governed measurement context (typed · evaluable) |
| --- | --- | --- | --- | --- |
| Machine-readable? | yes | yes | as text only | yes, typed |
| Tells you what it means? | no | partly | yes | yes |
| Survives the person leaving? | yes | yes | no | yes |
| Can it refuse a bad question? | no | no | no | yes |

Context, as prose: most company brains stop here, one column short.

A retrieved sentence is not a governed number. A governed number is not yet a safe answer.
The substrate question, re-axed: not which graph, but how far up the context ladder you govern

3. The wedge: typed, evaluable, able to refuse

Here is the part that survived an adversarial pass and the part that did not. The idea of putting AI-facing context into the semantic layer is not new, and any claim that it is will be demolished in one reply. Cube ships an ai_context field on views, measures, and dimensions today, meant for exactly this — telling an agent which measure to prefer, where the data is nuanced, what the business logic is. Snowflake documents Semantic Views as native schema objects, and Cortex Analyst takes custom instructions on top of them. Looker has modeled measures and relationships for years; Malloy's docs describe it as a semantic model with a graph of related objects. And dbt, with its arbitrary meta blocks and its MCP server, lets you hang any key-value context you like off a metric. "Put the context in the model" is the consensus direction of the entire category.

So the differentiator cannot be the location. It has to be the form. Cube's ai_context is free text — a paragraph for the model to read. That is column three of the ladder: context as prose, now machine-accessible but not machine-evaluable. The agent can read "prefer the net-revenue measure for finance questions" the same way it reads a Slack message. It cannot check whether the conditions hold, score how much of the question it can actually answer, or decline.

The wedge is the move from prose to a typed contract. Governed measurement context, done properly, encodes the things a number's safety actually depends on as fields a system can act on, not sentences a model can paraphrase: the grain and the filters that make two numbers comparable; the coverage of the underlying data and what falls outside it; the known failure modes that should trigger a hedge; the conditions under which the honest answer is "I can't answer that from this metric." The test is not whether an agent can read the context. It is whether the context can make the agent refuse. Free text cannot refuse. A typed, evaluable contract can.
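To make "typed, evaluable, able to refuse" concrete, here is a minimal sketch of what such a contract could look like. Everything in it is an assumption made for illustration: `MeasurementContext`, `Question`, `Verdict`, and `evaluate` are hypothetical names, not part of dbt, MetricFlow, Cube, or any vendor API, and a real contract would carry more fields than these.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Verdict(Enum):
    ANSWER = "answer"
    HEDGE = "hedge"
    REFUSE = "refuse"


@dataclass(frozen=True)
class MeasurementContext:
    """Hypothetical typed contract attached to one metric."""
    metric: str
    grain: frozenset            # dimensions the metric may be sliced by
    required_filters: frozenset # filters that must be applied for comparability
    covered_from: date          # first day the underlying data covers
    covered_to: date            # last day the underlying data covers
    known_failure_modes: tuple = ()  # caveats that should trigger a hedge


@dataclass(frozen=True)
class Question:
    metric: str
    group_by: frozenset
    filters: frozenset
    period_start: date
    period_end: date


def evaluate(ctx: MeasurementContext, q: Question):
    """Check a question against the contract; return (verdict, reasons)."""
    refusals = []
    if q.metric != ctx.metric:
        refusals.append(f"contract is for {ctx.metric}, not {q.metric}")
    unsupported = q.group_by - ctx.grain
    if unsupported:
        refusals.append(f"grain not supported: {sorted(unsupported)}")
    missing = ctx.required_filters - q.filters
    if missing:
        refusals.append(f"required filters missing: {sorted(missing)}")
    if q.period_start < ctx.covered_from or q.period_end > ctx.covered_to:
        refusals.append("period falls outside data coverage")
    if refusals:
        return Verdict.REFUSE, refusals
    if ctx.known_failure_modes:
        return Verdict.HEDGE, list(ctx.known_failure_modes)
    return Verdict.ANSWER, []


ctx = MeasurementContext(
    metric="net_revenue",
    grain=frozenset({"month", "region"}),
    required_filters=frozenset({"exclude_test_accounts"}),
    covered_from=date(2024, 1, 1),
    covered_to=date(2026, 4, 30),
    known_failure_modes=("refunds can post up to 30 days late",),
)
in_scope = Question("net_revenue", frozenset({"month"}),
                    frozenset({"exclude_test_accounts"}),
                    date(2025, 1, 1), date(2025, 12, 31))
out_of_scope = Question("net_revenue", frozenset({"customer"}), frozenset(),
                        date(2026, 1, 1), date(2026, 6, 30))

print(evaluate(ctx, in_scope))
print(evaluate(ctx, out_of_scope))
```

The in-scope question comes back as a hedge carrying the known failure mode; the out-of-scope one is refused for three independent reasons. No model had to read a paragraph to get there, which is the whole difference from a free-text field.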

This is the same asymmetry the companion piece drew between recall and compute, pushed one rung higher. A retrieved sentence is not a governed number — that was the first article. A governed number is not yet a safe answer — that is this one. The number can be computed perfectly and still be the wrong number to hand a decision, because it was measured on a grain that does not match the question, over a population the asker did not mean, in a period the data does not yet cover. The job of governed measurement context is to carry exactly the facts needed to catch that — and to be checkable, so the catching is mechanical and not a matter of whether the right human happened to be in the thread.

4. Which graph, then — honestly

Back to Talisman's question, now answerable. The modeled brain does not need to stand up a separate knowledge-graph store to have a brain. The semantic layer is a graph — but the honest qualifier matters: it is an executable graph for metric computation, a planner that resolves entities into joins and generates SQL. It is not a knowledge graph in the RDF sense. It does no open-world inference, no ontological subsumption, no cross-organization federation, and its "entities" are join keys, not mastered identities with survivorship and match confidence. Anyone who says "the semantic layer is our knowledge graph" without that qualifier is handing a skeptic the rebuttal for free.
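A toy version makes the "executable graph" qualifier visible. The sketch below is an assumption-laden simplification, not how MetricFlow or any real semantic layer is implemented: `SemanticModel` and `plan_sql` are hypothetical, the tables are made up, and the planner only handles a single-hop join through one shared key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticModel:
    """Hypothetical, simplified stand-in for a semantic-layer model."""
    table: str
    entities: tuple          # join keys this table exposes
    dimensions: tuple = ()
    measures: tuple = ()     # pairs of (measure name, SQL aggregate expression)


MODELS = (
    SemanticModel("orders", entities=("customer_id",),
                  measures=(("revenue", "SUM(orders.amount)"),)),
    SemanticModel("customers", entities=("customer_id",),
                  dimensions=("region",)),
)


def _find(models, predicate, what):
    for model in models:
        if predicate(model):
            return model
    raise ValueError(f"unknown {what}")


def plan_sql(measure: str, dimension: str, models=MODELS) -> str:
    """Resolve a (measure, dimension) request into a join and emit SQL."""
    fact = _find(models, lambda m: measure in dict(m.measures), "measure")
    dim = _find(models, lambda m: dimension in m.dimensions, "dimension")
    expr = dict(fact.measures)[measure]
    if fact is dim:
        return (f"SELECT {fact.table}.{dimension}, {expr} AS {measure} "
                f"FROM {fact.table} GROUP BY {fact.table}.{dimension}")
    # The "entity" here is only a shared join key, not a mastered identity.
    shared = [key for key in fact.entities if key in dim.entities]
    if not shared:
        raise ValueError(f"no join path from {fact.table} to {dim.table}")
    key = shared[0]
    return (f"SELECT {dim.table}.{dimension}, {expr} AS {measure} "
            f"FROM {fact.table} "
            f"JOIN {dim.table} ON {fact.table}.{key} = {dim.table}.{key} "
            f"GROUP BY {dim.table}.{dimension}")


print(plan_sql("revenue", "region"))
```

Everything the planner knows is in those tuples of keys, dimensions, and measures. It can compute, and it can fail to find a join path, but nothing in it infers, classifies, or resolves who a customer actually is.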

So we borrow the disciplines, not the stores. From the RDF / librarian tradition: controlled vocabulary, statement-level provenance, and validation — the SHACL-and-PROV instinct that meaning should be governed and checkable, expressed in our case as the typed context contract rather than as triples. From the property-graph tradition: entity resolution, which Talisman correctly files under operational identity work — though it is more than graph traversal. Mature resolution needs golden identifiers, match confidence, survivorship rules, source priority, and a merge audit; a flawless metric computed over unresolved customers still returns a wrong answer, which is the one gap the companion piece admitted breaks both brains. Resolved entities and governed measurement context are the two layers that have to be first-class, and neither is a graph database. They are policies, expressed as schema, enforced at build and at query time.
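The resolution ingredients named above can be sketched as policy rather than as a store. This is a minimal illustration under stated assumptions: the record shapes, the `SOURCE_PRIORITY` table, the 0.9 confidence threshold, and the `merge` function are all hypothetical, and the match confidences are taken as given rather than computed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SourceRecord:
    source: str              # system the record came from
    record_id: str
    email: Optional[str]
    name: Optional[str]


# Hypothetical source priority: a lower number wins when values conflict.
SOURCE_PRIORITY = {"billing": 0, "crm": 1, "support": 2}


@dataclass
class GoldenRecord:
    golden_id: str
    email: Optional[str] = None
    name: Optional[str] = None
    audit: List[str] = field(default_factory=list)  # merge audit trail


def merge(golden_id: str,
          matches: List[Tuple[SourceRecord, float]],
          min_confidence: float = 0.9) -> GoldenRecord:
    """Merge matched records into one golden record, logging each decision."""
    golden = GoldenRecord(golden_id)
    accepted = []
    for rec, confidence in matches:
        if confidence < min_confidence:
            golden.audit.append(
                f"rejected {rec.source}:{rec.record_id} "
                f"(confidence {confidence:.2f})")
        else:
            accepted.append(rec)
    # Survivorship: per attribute, keep the first non-null value by priority.
    accepted.sort(key=lambda r: SOURCE_PRIORITY.get(r.source, len(SOURCE_PRIORITY)))
    for attr in ("email", "name"):
        for rec in accepted:
            value = getattr(rec, attr)
            if value is not None:
                setattr(golden, attr, value)
                golden.audit.append(f"{attr} <- {rec.source}:{rec.record_id}")
                break
    return golden


golden = merge("cust-0001", [
    (SourceRecord("crm", "c-9", "alice@example.com", "Alice Smith"), 0.97),
    (SourceRecord("billing", "b-1", None, "A. Smith"), 0.95),
    (SourceRecord("support", "s-4", "al@example.com", "Al"), 0.60),
])
print(golden.email, golden.name)
print(golden.audit)
```

Billing outranks the CRM on the name but has no email, so the email survives from the CRM; the low-confidence support record is rejected and the rejection is logged. None of this requires a graph database, only rules that are written down and enforced.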

There is a portability obligation that comes with this, and ignoring it would be the naive move. If the typed contract lives only as bespoke dbt meta, it is clever YAML that dies with our repo. The discipline is to define it as a schema in its own right and map it outward — to MetricFlow, to the Open Semantic Interchange effort dbt and others kicked off this year to standardize semantic-metadata exchange, and, where provenance or controlled vocabulary genuinely need it, to RDF's SKOS and PROV. The graph store is optional. The interoperable, typed contract is not.
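One way to picture "define it once, map it outward" is a contract held as plain data with separate serializers. The sketch below is an assumption throughout: the contract fields, the `example.com` namespace, and both functions are hypothetical. The SKOS and PROV property IRIs are the published W3C ones, but the choice of which field maps to which property is ours, and no attempt is made here to guess at MetricFlow or Open Semantic Interchange formats.

```python
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"
PROV = "http://www.w3.org/ns/prov#"
BASE = "https://example.com/metrics/"  # hypothetical namespace for our terms

# Hypothetical slice of a typed contract: vocabulary and provenance fields only.
contract = {
    "metric": "net_revenue",
    "label": "Net revenue",
    "definition": "Gross revenue minus refunds and discounts.",
    "derived_from": ["orders", "refunds"],
    "owner": "finance-analytics",
}


def to_json(c: dict) -> str:
    """Neutral serialization that any downstream mapping can start from."""
    return json.dumps(c, sort_keys=True)


def to_triples(c: dict) -> list:
    """Map vocabulary and provenance fields to SKOS and PROV properties."""
    subject = BASE + c["metric"]
    triples = [
        (subject, SKOS + "prefLabel", c["label"]),
        (subject, SKOS + "definition", c["definition"]),
        (subject, PROV + "wasAttributedTo", BASE + "team/" + c["owner"]),
    ]
    triples += [
        (subject, PROV + "wasDerivedFrom", BASE + "source/" + source)
        for source in c["derived_from"]
    ]
    return triples


print(to_json(contract))
for triple in to_triples(contract):
    print(triple)
```

The contract stays the source of truth and each target format is a projection of it. Adding another target means adding another function, not rewriting the YAML.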

5. Miessler's coda: the process graph rides on the data

Miessler's essay is the odd one in the stack — older, from security, and about a different graph entirely. His claim is that a company is a graph of algorithms: every workflow, from photo retouching to hiring, decomposes into steps, the steps into sub-steps, and AI's real power is mapping that graph and optimizing or eliminating nodes. "Explainability is the new currency," he writes. "AI is fueled by transparency."

He is describing the apex of our own diagram — the action layer, where agents do work. And his frame contains a warning he does not name. A graph of algorithms with no graph of governed data underneath it is the exhaust brain at the process layer: you can map every workflow in the company and the decisions those workflows make will still be only as good as the numbers feeding them. The process graph rides on the data substrate. Optimize the marketing workflow all you like; if "qualified lead" is measured four different ways across the four steps Miessler would draw, the optimized pipeline just makes the wrong decision faster. Transparency about process without governance of measurement is a more efficient way to be confidently wrong.
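The "qualified lead" failure is mechanical to detect once each step declares how it measures the terms it uses. The sketch below assumes exactly that, and everything in it is invented for illustration: the four steps, their definitions, and the `conflicting_terms` function.

```python
from collections import defaultdict

# Hypothetical workflow: each step declares its definition of the terms it uses.
steps = [
    {"step": "capture", "uses": {"qualified_lead": "form submitted"}},
    {"step": "score", "uses": {"qualified_lead": "score >= 70"}},
    {"step": "route", "uses": {"qualified_lead": "accepted by sales"}},
    {"step": "report", "uses": {"qualified_lead": "opportunity created"}},
]


def conflicting_terms(workflow: list) -> dict:
    """Return each term that is defined in more than one way across steps."""
    definitions = defaultdict(set)
    for step in workflow:
        for term, definition in step["uses"].items():
            definitions[term].add(definition)
    return {term: sorted(found)
            for term, found in definitions.items() if len(found) > 1}


print(conflicting_terms(steps))
```

A process map alone would draw these four steps as a clean pipeline. The check only becomes possible when the measurement definitions are data the process graph can be validated against.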


6. What we take, and what we build

From Talisman: the discipline to ask which structure the problem actually demands before standing one up, and the humility that most performance and "you need a graph DB" claims do not survive a sensible index. From Seiner: the framing that will headline our own work — govern the context, not just the data — and the warning that context which lives only in people is a liability with a resignation date. From Miessler: the reminder that the process graph everyone is about to draw is only as trustworthy as the measurement layer beneath it.

And the thing to build, stated without the overclaim: the missing primitive in the enterprise-AI stack is not another graph store. It is governed measurement context — attached to executable metrics and resolved entities, typed rather than prose, evaluable rather than merely readable, able to tell an agent when a number is valid, comparable, complete, or unsafe to answer. Most efforts to build a company brain retrieve context. Very few can prove it. That gap — one column short, on the right-hand edge of the ladder — is the layer worth building all the way.

Companion piece: The Exhaust Brain and the Modeled Brain — where the substrate argument starts. Sources (Talisman, Seiner ×2, Miessler; vendor docs for Cube, Snowflake, dbt/OSI, W3C RDF 1.2) are trust=untrusted-source per the workspace rule; vendor capabilities and standards status were verified against primary docs in an adversarial pre-pass. This article is trust=synthesis — librarian-authored, grounded in the four reads plus our own meta-context work, internally reviewed and editor-passed before publish.
data-centered.com — deep read — published 2026-06-01