← Meta Context Schema

Context Engineering for Analytics Engineers: A Business Context Schema dbt Already Supports

36 structured fields in dbt's meta: block give LLMs the business knowledge to interpret metrics correctly. No new tools — it works today.

Keith Binkly · March 2026

We've all read the articles and seen the posts: context graphs are the "trillion dollar platform opportunity;" AI needs business documentation / context to do anything truly valuable with enterprise data; while the semantic layer lays down necessary guardrails, it's not sufficient for delivering actionable insights.

Enterprise data warehouses are systems of record for low-level facts. But the meaning, the relationships, the intelligence — it's scattered across Excel workbooks, slide decks, Python notebooks, Confluence pages, Slack threads, and the heads of senior analysts who might not be around forever.

The empty row in the data stack

Layer	System of Record	Solved By
Storage	The warehouse	Snowflake, BigQuery, Redshift
Logic	Transformations + metric definitions	dbt, MetricFlow
Intelligence	???	???

Storage has a system of record. Logic has a system of record. The intelligence — the ad hoc analyses, the deep-dive investigations, the hard-won intuition about what's normal and what isn't — doesn't. It wastes away in incomprehensible folder trees and buried Slack threads, and walks out the door when senior analysts move on.

Meta context: structured business intelligence in dbt's `meta:` block

I've been collecting, synthesizing, and unpacking the work of ontology and semantics experts for many months, gradually wrapping my head around these concepts to make sense of what a "context layer" does, and looks like, in practice. My hunch has been that if any existing tool could be extended to support this next generation of AI-powered enterprise data analysis, dbt would be that tool.

Grounded in hundreds of thousands of words of expertise, and with Claude's help, I created a framework for that. It works today, in dbt, using infrastructure that already exists. The semantic layer arms the LLM with accurate and business-aligned queries for retrieving data; this framework ships the most relevant context LLMs can use to understand, interpret, and act on that data.

What it looks like

Five layers: Context, Expectations, Investigation, Relationships, Decisions. 36 fields in all, organized into tiers: 13 core fields that deliver immediate value, 10 recommended for mature teams, 13 optional for full coverage.

This lives in the meta: key that dbt already supports, and it works today in any dbt project — no feature request, no vendor dependency, no migration. When dbt compiles, the meta: block is preserved as nested JSON in the manifest — accessible through both the Semantic Layer GraphQL API and the Discovery API.

Start with the 13 Core fields on your 3 most-queried metrics. That's a morning's work.

Brief backstory

I've been accumulating a knowledge engineering library — all the info-dense pieces on semantic layers, ontologies, knowledge graphs, and context engineering written by credible experts and practitioners. Reading as much as I can, bookmarking more that I can't. I built a specialized Claude agent to read and synthesize on my behalf, reporting back on the most salient trends, tools, and concepts; tracking who agrees with whom, where the tensions are, and where the consensus lies.

The biggest influences on this work were Jessica Talisman's writing on layered meaning and process knowledge — the idea that procedural knowledge (how experts investigate, what decisions they make) is the most valuable and fastest-decaying. Brian Jin's work on context decay mechanisms. Justin Johnson's framing of the context graph as infrastructure. These aren't just citations — they shaped the architecture directly.

I asked Claude a practical question: how do we go beyond the semantic layer and connect business documentation — context — with our pipelines?

It came back with a first draft showing how you'd insert structured context using dbt's meta: field in the MetricFlow config — a freeform key-value space that dbt already supports.

I started poking at it. How did you choose these fields? Is it infinitely freeform — you might put anything in there? Or is there a structure, a framework for inputs? Go consult our KE library before answering.

Informed by the expert library, Claude had an answer for everything. It can hold expectations — healthy ranges, seasonality — and relationships between metrics. It could house decision context — what should someone do when a metric moves past a threshold?

Each question expanded what started as a simple key-value dump into something with real architecture.

"Can't a frontier model just read the docs?"

Yes, a frontier model with the right documents in context can certainly reason its way to the same conclusions. Dump a data dictionary, a Confluence page, and a Slack thread into the prompt and it will analyze/synthesize; the raw capability is there.

This framework buys you something else: reliability, cost, and consistency at scale.

Co-location eliminates retrieval. The context lives on the metric definition itself. No RAG pipeline deciding which of 200 Confluence pages is relevant. The retrieval problem is where most real-world failures happen, not the reasoning.

Structure eliminates interpretation. Compare a paragraph buried in a Confluence doc — "Revenue typically sees a seasonal uplift in Q4, usually somewhere in the 20-30% range, driven primarily by year-end enterprise contract cycles, though this has been less pronounced since we added the SMB segment in 2024..." — versus seasonality: "Q4 +20-30% (year-end contracts)". The model can extract the fact from the paragraph. But "can" isn't "will, every time, for every metric, across every query."

Consistent decomposition. When causal_dimensions is a structured field listing channel, then region, then product_category — every query decomposes along the same axes. Without it, the model decomposes differently depending on which docs got retrieved and how the user phrased the question. For an analytics tool, inconsistent decomposition across users is a real problem.

The cost math. 15 Confluence pages cost 50-100x more tokens than 8 structured fields. At interactive speed, across an org, that could mean the difference between viable and not.

It's the same reason we use database schemas instead of document stores for structured queries. Meta context is a schema for business knowledge — same trade-off, same payoff.

Start with just three fields

You don't need 36 fields to see the difference. We ran an ablation eval — stripping layers to measure what each contributes:

V0 (bare metric, no meta): LLM hallucinates confidently
V1 (add purpose, owner, grain): Interpretation failures drop significantly
V2 (add thresholds, seasonality): Calibration improves

Pick a frequently used metric. Ask an LLM to interpret an anomaly with only the definitions in the YAML. Add purpose, healthy_range, and seasonality values and repeat. The difference will be obvious.

One caveat from the eval: expectations without decision rules creates a false confidence — the LLM becomes more likely to offer incorrect answers. If you add thresholds, also add business_rules, even if it's just: "No formal SLA documented. Treat thresholds as analytical guidelines." Explicit "no rule" beats silence.

Populate with a simple prompt and your docs

You don't need to fill these fields manually. The whole point is that the knowledge already exists — in pipeline guides, Confluence pages, analysis artifacts, Slack threads — it just needs to be extracted and structured.

Here's the prompt pattern we used. Give this to Claude, GPT, or any frontier model along with your business context documents:

I'm enriching the dbt semantic layer metric [metric_name] with structured business context. The context will live in the meta: block of the YAML definition, organized into 5 layers.

Here are the source documents:
[paste or attach your pipeline guide, data dictionary, analysis reports, known issues doc]

For this metric, extract and structure the following into YAML:

Layer 1 — Context: purpose, business_question, owner
Layer 2 — Expectations: healthy_range, warning_threshold / critical_threshold, seasonality
Layer 3 — Investigation: causal_dimensions (with priority), investigation_path (conditional tree)
Layer 4 — Relationships: correlates_with (typed), affected_by (with magnitude)
Layer 5 — Decisions: when_this_drops, business_rules (or explicit "no SLA documented")

Output valid YAML starting at the meta: key. Only include fields where the source documents provide evidence — leave others out rather than guessing.

The key instruction is the last line: only include fields where the docs provide evidence. The schema is designed to be populated incrementally. An empty field is honest. A hallucinated threshold is dangerous.

dbt: the first context layer player?

The context layer conversation is moving fast across the data stack. What's exciting is that dbt already has the infrastructure for it.

The meta: block is a first-class, freeform config space that survives compilation into the manifest, flows through the Semantic Layer GraphQL API, and is accessible via the Discovery API. It's version-controlled in git, validated by dbt parse, and lives alongside the metric definition it describes — meaning there's no separate system to sync, and no integration to maintain.

That makes dbt the first tool — and right now the only tool, as far as I'm aware — that can implement a structured context layer without a product change. The infrastructure already exists.

This is what makes the timing so interesting. Across the ecosystem — vendors, practitioners, researchers — everyone is exploring what a context layer should look like. dbt is uniquely positioned to move from exploration to practice, not because dbt Labs built a context layer feature, but because the existing architecture was flexible enough to support one all along. The meta: block was designed for exactly this kind of extensibility — structured metadata that travels with the definition.

The full project lives at data-centered.com/meta-context, including the schema reference (all 36 fields with types and tiers), the ablation eval results, and a deep read covering the full research arc. Everything is open.